Information Security News mailing list archives

Online Search Engines Help Lift Cover of Privacy

From: InfoSec News <isn () c4i org>
Date: Mon, 9 Feb 2004 03:39:26 -0600 (CST)
Forwarded from: William Knowles <wk () c4i org>

http://www.washingtonpost.com/wp-dyn/articles/A24053-2004Feb8.html

By Yuki Noguchi
Washington Post Staff Writer
Monday, February 9, 2004

Sitting at his laptop, Chris O'Ferrell types a few words into the 
Google search engine and up pops a link to what appears to be a 
military document listing suspected Taliban and al Qaeda members, 
along with their dates of birth, places of birth, passport numbers and 
national identification numbers.

Another search yields a spreadsheet of names and credit card numbers.

"All search engines will get you this," O'Ferrell said, pointing to 
files of spoils he has found on the Internet: Medical records, bank 
account numbers, students' grades, and the docking locations of 804 
U.S. Navy ships, submarines and destroyers.

And it is all legal, using the world's most powerful Internet search 
engine.

Cybersecurity experts say an increasing number of private or 
putatively secret documents are online in out-of-the-way corners of 
computers all over the globe, leaving the government, individuals, and 
companies vulnerable to security breaches. At some Web sites and 
various message groups, techno-hobbyists are even offering 
instructions on how to find sensitive documents using a relatively 
simple search. Though it does not technically trespass, the practice 
is sometimes called "Google hacking."

"There's a whole subculture that's doing this," said O'Ferrell, a 
long-time hacking expert and chief technology officer of Herndon-based 
security consultancy Netsec Inc.

In the decade they have been around, search engines like Google have 
become more powerful. At the same time, the Web has become a richer 
source of information as more businesses and government agencies rely 
on the Internet to transmit and share information. All of it is stored 
on computers called servers, each one linked to the Internet.

For a variety of reasons -- improperly configured servers, holes in 
security systems, human error -- a wide assortment of material not 
intended to be viewed by the public is, in fact, publicly available. 
Once Google or another search engine finds it, it is nearly impossible 
to draw back into secrecy.

That is giving rise to more activity from "Googledorks," who troll the 
Internet for confidential goods, security engineers said.

"As far as the number of sites affected by this, it's in the tens of 
thousands," said Johnny Long, 32, a researcher and developer for 
Computer Sciences Corp. and veteran hacker who maintains a Web site 
that he says keeps him connected to the hacker community. He spoke 
about Google hacking at the Def Con hacker convention in Las Vegas 
last summer, which has led to more awareness of vulnerabilities, he 
said.

Google gets singled out for these searches because of its 
effectiveness.

"The reason Google's good is that they give you more information and 
they give you more tools to search," O'Ferrell said.

Its powerful computer "crawls" over every Web page on the Internet at 
least every couple of weeks, which means surfing every public server 
on the globe, grabbing every page, and every link attached to every 
page. Those results are then catalogued using complex mathematical 
systems.

The most basic way to keep Google from reaching information in a Web 
server, security experts said, is to set up a digital gatekeeper in 
the form of an instruction sheet for the search-engine's crawler. That 
file, which is called robots.txt, defines what is open to the crawler 
and what is not. But if the robots.txt file is not properly 
configured, or is left off inadvertently, a hole is opened where 
Google gets in. 
And because Google's crawlers are legal, no alarms will go off.
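
As a rough illustration of how such an instruction sheet is read, the 
sketch below uses Python's standard urllib.robotparser module to ask 
whether a crawler would be allowed to fetch a given address. The 
robots.txt contents, the example.com addresses and the helper name 
crawler_may_fetch are assumptions made for this example; they do not 
come from the article or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /reports/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/customers.xls"):
        url = "http://example.com" + path
        print(path, crawler_may_fetch(ROBOTS_TXT, "Googlebot", url))
    # An empty (or missing) file leaves every path open to the crawler.
    print(crawler_may_fetch("", "Googlebot", "http://example.com/private/"))
```

With these assumed rules, the /private/ address is reported as closed 
to the crawler, while an empty file leaves the same address open, 
which is the "left off inadvertently" case described above.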

"The scariest thing is that this could be happening to the government 
and they may never know it was happening," Long said. "If there's a 
chink in the armor, [the hackers] will find it."

Google and other search-engine officials said they are sensitive to 
the problem, but are not in a position to control it.

With a vast system of more than 10,000 computer systems constantly 
collecting new information on more than 3 billion Web sites, the 
company cannot and does not want to police or censor what goes on the 
Web, said Craig Silverstein, Google's chief technology officer.

"I think Web masters have to be careful," he said. "The basic problem 
is that with 3 billion [Web sites], there's a lot of information out 
there." Google offers a tool on its own Web site, "Webmaster 
guidelines," that explains how to remove Web sites from Google's 
system, including Google's 
vast store of cached pages that may no longer be available online, 
Silverstein said.

For hacking experts, Google-hacking has a kind of populist allure: 
anyone with Internet access can do it if they know the right way to 
search.

"It's the easiest point-and-click hacking -- it's fun, it's new, 
quirky, and yet you can achieve powerful results," said Edward 
Skoudis, a security consultant for INS Inc., which helps government 
and business clients monitor what is visible from the Web. "This 
concept of using a search engine for hacking has been around for a 
while, but it's taken off in the last few months," probably because of 
a new-found enthusiasm in the underground hacking community, he said.

Search strings including "xls," or "cc," or "ssn" often bring up 
spreadsheets, credit card numbers, and Social Security numbers linked 
to a customer list. Adding the word "total" in searches often pulls up 
financial spreadsheets totaling dollar figures. A hacker with enough 
time and experience recognizing sensitive content can find an alarming 
amount of supposedly private information.
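
The same terms can be turned around by a site owner as a quick 
self-check. The sketch below walks a Web server's document directory 
and flags file names that contain the terms quoted above. The 
matching rules, the function names looks_sensitive and audit_web_root, 
and the example directory are all assumptions made for illustration; 
this is not a method described in the article.

```python
import os
import re

# Terms quoted in the article; how they are matched is an assumption.
SENSITIVE_TERMS = ("cc", "ssn", "total")
SPREADSHEET_EXTENSIONS = (".xls",)


def looks_sensitive(filename: str) -> bool:
    """Flag spreadsheets and names containing a sensitive term as a word."""
    name = os.path.basename(filename).lower()
    stem, ext = os.path.splitext(name)
    if ext in SPREADSHEET_EXTENSIONS:
        return True
    # Split on punctuation so "ssn_list" matches but "access" does not.
    tokens = re.split(r"[^a-z0-9]+", stem)
    return any(term in tokens for term in SENSITIVE_TERMS)


def audit_web_root(web_root: str) -> list:
    """Return a sorted list of flagged file paths under web_root."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for filename in filenames:
            if looks_sensitive(filename):
                flagged.append(os.path.join(dirpath, filename))
    return sorted(flagged)


if __name__ == "__main__":
    # "/var/www/html" is a placeholder; point this at your own server.
    for path in audit_web_root("/var/www/html"):
        print(path)
```

A name-based check like this only narrows down where to look; a file 
it flags still has to be opened and judged by a person.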

"On a [client's] bank site, I found an Excel spread sheet with 10,000 
Social Security and credit card numbers," Skoudis said of one of his 
successful treasure hunts.

The bank's Web server had been properly configured to keep such 
documents private, but someone had mistakenly put the information on 
the wrong side of the fence, he said. "Google found the open door and 
crawled in."

Skoudis confronted the "red-faced executives" with his findings, he 
said, and was told: "Just fix it, damn it."

Google and other search-engine operators are unable to gauge how 
frequently private documents are accessed using their sites, or how 
many are removed for security reasons.

"The challenge is that as the search-engine tool evolved, people got 
more lax about what they put on a publicly available Web server," said 
Tom Wilde, vice president and general manager of Terra Lycos's 19 
search engines. "It would be impossible to monitor" the tens of 
millions of searches that take place every day, Wilde said, adding 
that he has never been notified of a security breach on his sites.

Government officials said they were familiar with Google hacking, and 
were working with government agencies and businesses to secure 
sensitive documents on Web servers.

"It's an issue we're aware of and tracking," said Amit Yoran, director 
of the cybersecurity division of the Homeland Security Department. By 
law, each agency is responsible for its own security, and although 
hacking or security breaches are reported to Homeland Security, the 
cybersecurity division does not monitor the content of the Web, he 
said.

It is unclear who is at fault when someone digs up a confidential 
document.

"I don't know what law's been violated just for searching" on a 
publicly available search engine, said Paul Bresson, a spokesman for 
the FBI, noting the bureau has not yet taken actions against 
individuals who have found secure documents by using search engines. 
"If they use it for some sinister purpose, that's another issue."

The availability of private information contributes to the rising 
incidence of identity theft, which for the last four years has been 
the No. 1 consumer problem for the Federal Trade Commission. Last year 
the FTC received nearly 215,000 complaints about identity theft, up 
from about 152,000 in 2002.

Since 2001, the FTC has settled cases with Eli Lilly & Co., Microsoft 
Corp. and clothing maker Guess Inc. for not taking "reasonable" 
measures to keep medical or financial information secure, said Jessica 
Rich, assistant director of the commission's bureau of consumer 
protection. Letting customer information reside on an unsecure server 
can open up a business to such liability.

"There are unique vulnerabilities because of databases that are 
accessible through the Web," Rich said, adding that the FTC 
anticipates bringing more security-related cases in the future.

Once confidential pages are found, it is not easy to get them back 
under wraps.

Even after a document has been pulled off a Web server, as was the 
case when MTV removed from its Web site a pre-Super Bowl press release 
promising "shocking moments" at the halftime show, documents often 
remain cached, or stored, in other search engines' computers so they 
can still be accessed.

"Once it is placed online, it's very hard to get the digital horse 
back in the electronic barn," said Marc Rotenberg, executive director 
of the Electronic Privacy Information Center. "It's close to 
impossible to get it back."


 
*==============================================================*
"Communications without intelligence is noise;  Intelligence
without communications is irrelevant." Gen Alfred. M. Gray, USMC
----------------------------------------------------------------
C4I.org - Computer Security, & Intelligence - http://www.c4i.org
================================================================
Help C4I.org with a donation: http://www.c4i.org/contribute.html
*==============================================================*



-
ISN is currently hosted by Attrition.org

To unsubscribe email majordomo () attrition org with 'unsubscribe isn'
in the BODY of the mail.