Educause Security Discussion mailing list archives

Post Processing Google Webmaster Tools data


From: "James H. Moore" <jhmiso () RIT EDU>
Date: Wed, 30 Nov 2011 13:51:19 -0500

I am trying to gauge the effort needed to do some post-processing on Google Webmaster Tools downloads,
specifically links, to keep tabs on who is linking to us. It can serve as an alert to backlinking and form spam. I
don't know whether the 100K-link limit will be an issue. Has anyone already written something like a script that
takes a list of links, retrieves each page with curl, and pulls out the line that mentions our university (and
maybe the lines before and after)? I would also like to track the results over time to see which links are new this
week, and to check whether I have already analyzed a given link.
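
As a rough illustration of the fetch-and-extract step described above, here is a minimal sketch in Python (rather than curl itself). The function names `fetch_page` and `lines_mentioning`, the search string "rit.edu", and the one-line context window are all assumptions made for the example, not an existing tool.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download a page and return its text (roughly what `curl URL` does)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def lines_mentioning(text, needle, context=1):
    """Return each line containing `needle`, plus `context` lines around it.

    The match is case-insensitive. `needle` is a hypothetical search string,
    e.g. the university's domain name.
    """
    lines = text.splitlines()
    needle = needle.lower()
    hits = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append("\n".join(lines[start:end]))
    return hits
```

Calling `lines_mentioning(fetch_page(url), "rit.edu")` for each URL in the downloaded list would give the snippets to review. Real pages often put a whole paragraph on one line, so a character window around the match may turn out to be more useful than a line window.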

Some Perl and curl, and a flat file, would probably work.  I just didn't want to reinvent the wheel.
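
The flat-file bookkeeping could look something like the sketch below, written in Python rather than Perl purely for illustration. It assumes one URL per line in a file whose name (for example `seen_links.txt`) is hypothetical, as are the helper names `load_seen`, `find_new`, and `record`.

```python
import os


def load_seen(path):
    """Read previously analyzed URLs from the flat file, one per line."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def find_new(links, seen):
    """Return links not yet in `seen`, de-duplicated, in original order."""
    known = set(seen)
    fresh = []
    for url in links:
        url = url.strip()
        if url and url not in known:
            known.add(url)
            fresh.append(url)
    return fresh


def record(path, links):
    """Append newly analyzed URLs to the flat file."""
    with open(path, "a", encoding="utf-8") as fh:
        for url in links:
            fh.write(url + "\n")
```

Each week, `find_new(this_weeks_links, load_seen(path))` would yield only the links that have not been looked at yet, and `record(path, ...)` would mark them as analyzed once reviewed. A plain set lookup like this should be workable at the scale of 100K links, though that has not been measured here.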

Or is some of this better handled through a private space crawler that could catalog outbound links as well?

Jim
- - - -
Jim Moore, CISSP, IAM
Senior Information Security Forensic Investigator
Rochester Institute of Technology
151 Lomb Memorial Drive
Rochester, NY 14623-5603
(585) 475-5406 (office)
(585) 255-0809 (Cell - Incident Reporting & Emergencies)
(585) 475-7920 (fax)


If you consciously try to thwart opponents, you are already late.  Miyamoto Musashi, Japanese philosopher/samurai, 1645

A ship in harbor is safe -- but that is not what ships are built for.  John A. Shedd, Salt from My Attic, 1928 

