Educause Security Discussion mailing list archives

Re: Post Processing Google Webmaster Tools data


From: "James H. Moore" <jhmiso () RIT EDU>
Date: Wed, 30 Nov 2011 15:14:12 -0500

Thanks for your suggestion and question.  I think a little more background is in order.

What we have found is something that seems to be part of general black hat search engine optimization. A number of 
other universities have detected it, but I don't think it is confined to universities.

We set up Google Alerts to check for new web pages that have drug names associated with them.  We 
don't have a medical school, so this works with a fairly simple configuration.  We picked up a lot of hits in 
unmoderated blog comments and unauthenticated wiki entries, and we forwarded them on to the owners.  As a result, 
blogs became more heavily moderated, and wikis now require authentication or queue entries for review.

The perpetrators responded by becoming more subtle.  Since it is Google/search engine ranking they are after, they 
began hacking websites.  If they succeed, they insert code that checks whether the User-Agent string is Googlebot, for 
instance, or whether Google is in the referrer string, and if so executes a redirect to the cheap drugs website.  If 
not, the page displays normally.  That makes it more difficult to check.
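
One way to test for this kind of conditional redirect is to request the same page several times with different 
headers and compare what comes back.  The following is only a minimal sketch in Python, not something we run in 
production.  The header values, the probe names, and the function names are all illustrative assumptions.

```python
import urllib.request
from urllib.parse import urlsplit

# Header sets to compare. The User-Agent and Referer values are illustrative assumptions.
PROBES = {
    "browser": {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
    "googlebot_ua": {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    },
    "google_referer": {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Referer": "https://www.google.com/",
    },
}


def fetch(url, headers, timeout=10):
    """Fetch url with the given headers; return (final_url, body) after HTTP redirects."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")


def suspicious_probes(results, keywords=()):
    """Compare each probe against the plain 'browser' probe.

    results maps probe name -> (final_url, body). A probe is flagged when it
    ends up on a different host, or when its body contains a keyword that the
    plain browser response does not.
    """
    base_url, base_body = results["browser"]
    base_host = urlsplit(base_url).netloc
    flagged = []
    for name, (final_url, body) in results.items():
        if name == "browser":
            continue
        redirected = urlsplit(final_url).netloc != base_host
        new_keywords = [
            k for k in keywords
            if k.lower() in body.lower() and k.lower() not in base_body.lower()
        ]
        if redirected or new_keywords:
            flagged.append(name)
    return flagged


def check(url, keywords=()):
    """Run every probe against url and return the names of the suspicious ones."""
    results = {name: fetch(url, headers) for name, headers in PROBES.items()}
    return suspicious_probes(results, keywords)
```

This only sees HTTP-level redirects and differences in the returned HTML.  A redirect done in JavaScript, or code 
that keys on Google's source addresses rather than on the headers, would not show up this way.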

So I got the web development team to add me as a "webmaster" for Google.  You can be set up as a webmaster if you 
apply and then place some code that Google supplies in your webpage.  Then you have a view, through the "webmaster 
tools", of some of Google's data.  One of the pieces of data is the list of sites that link to your website.  In that 
large list, we have found numeric IPs, and IPs from countries and companies that we didn't expect.  If we could curl 
the linking webpage and then evaluate the link, we could possibly block the URL as a known weak website.  Sometimes it 
is good to have information of that sort to see when new links pop up.
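
As a first pass over that list, the numeric IPs can be picked out mechanically.  The following is a rough sketch 
in Python.  It assumes the list has been exported as one host or URL per line.  The suffix check is only a crude 
stand-in for "countries and companies that we didn't expect", and the default suffixes are placeholders, not a 
recommendation.

```python
import ipaddress
from urllib.parse import urlsplit


def host_of(entry):
    """Return the lowercased host part of a bare hostname or a full URL."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "http://" + entry
    return urlsplit(entry).hostname or ""


def is_numeric_ip(entry):
    """True when the host part is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host_of(entry))
        return True
    except ValueError:
        return False


def triage(entries, expected_suffixes=(".edu", ".com", ".org")):
    """Return (entry, reason) pairs for link sources worth a manual look.

    expected_suffixes is a hypothetical allow-list; adjust it to whatever
    domains normally link to the site.
    """
    flagged = []
    for entry in entries:
        if is_numeric_ip(entry):
            flagged.append((entry, "numeric IP"))
        elif not host_of(entry).endswith(tuple(expected_suffixes)):
            flagged.append((entry, "unexpected domain suffix"))
    return flagged
```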

I hope that this is clearer.  I have included the security list as well, since I believe my last note was a bit too 
terse to be clear.

One related question that I thought of as I was writing the above: what scanning tools do you use for new webpages?  
What about for re-authorization?

Thanks,

Jim Moore
  

-----Original Message-----
From: Paul Hanson [mailto:paulh () haas berkeley edu] 
Sent: Wednesday, November 30, 2011 2:41 PM
To: James H. Moore
Subject: RE: Post Processing Google Webmaster Tools data

I don't know the Google tools you speak of so please bear with me as I'm curious to know more about the problem you're 
trying to resolve.  Could it be something as simple as parsing the referrer field in your Apache/IIS logs?  Not 100% 
accurate, but it will certainly tell you where they're coming from.  One could then check the referring site to see if 
a link to your site exists.  
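
For the log-parsing idea, a minimal sketch in Python might look like the following.  It assumes Apache's 
"combined" log format, and `own_domain` is a placeholder for your own site.  IIS logs use a different layout and 
would need a different pattern.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" log format: the referrer is the second-to-last quoted field.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def external_referrers(lines, own_domain):
    """Count referring hosts that are not own_domain or one of its subdomains."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format; skip rather than guess
        host = urlsplit(m.group("referer")).hostname
        if not host:
            continue  # "-" (no referrer) or something unparseable
        if host == own_domain or host.endswith("." + own_domain):
            continue  # internal navigation
        counts[host] += 1
    return counts
```

The referrer header is supplied by the client, so this is a hint about where visitors come from rather than proof 
that a link exists.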

Thanks,
Paul

-----Original Message-----
From: The EDUCAUSE Security Constituent Group Listserv [mailto:SECURITY () LISTSERV EDUCAUSE EDU] On Behalf Of James H. Moore
Sent: Wednesday, November 30, 2011 10:51 AM
To: SECURITY () LISTSERV EDUCAUSE EDU
Subject: [SECURITY] Post Processing Google Webmaster Tools data

I am trying to gauge the effort needed to do some post processing on Google Webmaster Tools downloads, 
specifically links, to keep tabs on who is linking to us.  It can be an alert to backlinking and form spam.  I don't 
know if the limitation of 100K links will be an issue or not.  I just didn't know if someone had already written 
something like a script that will take a list of links, go out and retrieve each page with curl, and then pull out 
the line with our university (and maybe the line before and after).  I would also like to track it and see which links 
are new this week, and to check whether I have already analyzed a given link.

Some perl and curl, and a flat file would probably work.  I just didn't want to reinvent the wheel.
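
To make the idea concrete, here is a rough sketch of that workflow.  It is written in Python rather than perl 
and curl, and it is an illustration, not an existing tool.  It assumes the downloaded links have been reduced to 
one URL per line, and the file names and function names are made up for the example.

```python
import urllib.request
from pathlib import Path


def fetch(url, timeout=10):
    """Retrieve a page and return its text (the job curl would do)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def context_lines(text, needle, before=1, after=1):
    """Return each line containing needle, with surrounding lines, as lists of lines."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():
            hits.append(lines[max(0, i - before): i + after + 1])
    return hits


def load_seen(path):
    """Read the flat file of URLs (one per line); empty set if it is missing."""
    p = Path(path)
    return set(p.read_text().split()) if p.exists() else set()


def save_seen(path, urls):
    """Write the set of URLs back to the flat file, one per line."""
    Path(path).write_text("\n".join(sorted(urls)) + "\n")


def new_links(current, seen):
    """Links in this download that have not been analyzed before."""
    return sorted(set(current) - set(seen))


def run(links_file, seen_file, needle):
    """Fetch each new link, print the matching lines, and record it as analyzed."""
    seen = load_seen(seen_file)
    for url in new_links(load_seen(links_file), seen):
        try:
            page = fetch(url)
        except (OSError, ValueError) as exc:
            print(url, "fetch failed:", exc)
            continue  # left out of the seen file, so it is retried next run
        for snippet in context_lines(page, needle):
            print(url)
            print("\n".join(snippet))
        seen.add(url)
    save_seen(seen_file, seen)
```

A call such as `run("links.txt", "seen.txt", "rit.edu")` would then report only the links that are new since the 
last run.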

Or is some of this better handled through a private space crawler that could catalog outbound links as well?

Jim
- - - -
Jim Moore, CISSP, IAM
Senior Information Security Forensic Investigator Rochester Institute of Technology
151 Lomb Memorial Drive
Rochester, NY 14623-5603
(585) 475-5406 (office)
(585) 255-0809 (Cell - Incident Reporting & Emergencies)
(585) 475-7920 (fax)


If you consciously try to thwart opponents, you are already late.  Miyamoto Musashi, Japanese philosopher/samurai, 1645

A ship in harbor is safe -- but that is not what ships are built for.  John A. Shedd, Salt from My Attic, 1928 
CONFIDENTIALITY NOTE: The information transmitted, including attachments, is intended only for the person(s) or entity 
to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, 
dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other 
than the intended recipient is prohibited. If you received this in error, please contact the sender and destroy any 
copies of this information

