Interesting People mailing list archives
Best of Luck, Jimmy Wales ...
From: David Farber <dave () farber net>
Date: Tue, 19 Jun 2007 15:15:28 -0400
Begin forwarded message:

From: Randall <rvh40 () insightbb com>
Date: June 19, 2007 3:02:38 PM EDT
To: johnmacsgroup () yahoogroups com
Cc: David Farber <dave () farber net>
Subject: Best of Luck, Jimmy Wales ...

Open-source search engine gangs up on Google
http://www.newscientisttech.com/article.ns?id=mg19426066.500&print=true

* 30 May 2007
* Paul Marks

If hope alone could spawn a world-class search engine, Google would be dead by now. In reality it's going to take more than faith to topple the search giant, which has pioneered cutting-edge technology and grabbed a "mindshare" that secured it a place in the Oxford English Dictionary.

Yet despite the sizeable hurdles ahead, a rebellious group of engineers is hoping to do just that. Led by Wikipedia's co-founder Jimmy Wales, hundreds of software engineers, from fledgling teenage coders to retired, respected software gurus, are joining forces in an unlikely attempt to overturn Google's domination of the search market.

Their weapon? The transparency provided by open source software. The idea underpinning their search engine, dubbed Wikia Search after Wales's umbrella company Wikia, is that its search algorithm, which determines which web pages appear at the top of the lists of links it serves up, will be made public. Wikia's search engineers think this will earn users' trust in a way that Google, which keeps its algorithm a closely guarded secret, never will.

Open source search results will also be more relevant, as the algorithm will continually be tweaked by its users, keeping it up to date with new technologies as they are deployed, Wales says. The Wikia Search team believes this process of continual improvement will also make it better than Google at dodging the spammers who constantly try to "game" Google's search algorithms to push their own nefarious web pages to the top of the search results (see "A spark for spam, or an end to it?").
Google is the top search engine today thanks to an innovative way of determining which pages are most relevant to a web user's query, pioneered by its founders, Sergey Brin and Larry Page, back in the 1990s. Yahoo and Microsoft followed with similar algorithms to rank pages (New Scientist, 20 November 2004, p 23). These algorithms form the heart of each company's intellectual property and so are kept secret.

But that, Wales told New Scientist earlier this year, is their Achilles' heel, because it means no one knows why search results appear in the order they do. Last month, for instance, Google upgraded its algorithm to serve up links to images, news, video, music and books, as well as web pages, in a single search results page, saving users the trouble of searching under different headings. The company is keeping quiet about how it does this too.

Faced with that silence, people rightly question the quality of search results, says Jeremie Miller, Wikia's technology chief, who is based in San Mateo, California. Some ask whether Google's algorithm skews results towards its advertising clients, whose business brought the company more than $10 billion in 2006. Google denies this, but equally, the secrecy makes it difficult to prove otherwise.

Similar criticism can be levelled at other search engines. Last year several companies filed lawsuits against Google and Yahoo alleging that they unfairly skew their search results (New Scientist, 19 August 2006, p 24).

Politicians are worried too. "European governments have been getting concerned about the competition aspects of search engines, particularly as Google has become so dominant," says Ian Brown, an electronic privacy expert at University College London. "They think there should be much more transparency with search algorithms."

Web surfers may wish to turn to Wikia Search for another reason: it is vowing not to record the terms people search for.
Google, Yahoo and Microsoft store this data, saying it helps them improve their technology, but there are concerns that it could be used more intrusively.

Wikia Search still has a long way to go before it becomes reality. Though the discussion forums on the project's website (search.wikia.com) and its associated email list have been up and running since January, and are brimming with ideas about better ways of running a search engine, no clear way forward has yet been decided. What has emerged is that the code will probably incorporate the best elements of two existing open source search programs, neither of which is ready for prime time. One, called Lucene, creates lists of websites and their contents; the other, called Nutch, picks out search results from vast clusters of computers.

Google says it welcomes the competition. "We're just really excited when a new development comes to the space because it is good for everybody," says Jon Steinback of Google.

To take on Google, Yahoo, Microsoft and the rest, Wales and his coterie of coders face some tough challenges. One is a lack of cash to buy a fleet of global data centres. Today's search engines create lists of the contents of billions of web pages, known as indexes, and store them on tens of thousands of servers around the globe. The exact number is another trade secret, but there is no doubt that maintaining and powering them is hugely expensive.

Wikia Search is already considering one solution. Rather than investing in data centres, it might store its index on a distributed computing "grid" made up of thousands of volunteers' home PCs and servers connected via the internet. The model for this is the SETI@home screen saver, which divvies up data from a radio telescope among volunteers' home PCs. Each computer would hold a small part of Wikia Search's index and handle search requests relevant to that part.

This strategy brings a bunch of problems of its own, though.
What do you do when individual machines are switched off? And how do you stop spammers posing as Wikia volunteers and flooding the index with nefarious web pages?

Miller is confident these problems can be overcome. Video distribution networks that use BitTorrent software also store material on users' machines, and they keep working even when some of those machines are switched off because copies of the data are spread across many machines. Google itself shows the distributed approach works, says Brown. Using clusters of desktop-class PCs, it deploys clever distributed algorithms to shunt search data between them.

Can Wikia Search's creators win the day? Clearly they are spirited. "Kill and destroy Google," jokes one contributor. "Let's drive a stake through the evil dragon's heart."

In the end it may come down to how much users value transparency. "Search needs to be part of the internet's infrastructure, not the domain of commercial giants," says Miller. "Google is an advertising service."

A spark for spam, or an end to it?

Going open source should ensure the ordering of a search engine's results cannot be secretly bent to its owner's whim. But will it make the results any less prone to manipulation by spammers?

Search engine spam has plagued Google's results since the company was founded. One way spammers initially "gamed" Google's search algorithm, which ranks pages more highly if lots of other pages link to them, was to put up spoof web pages crammed with links pointing to their own sites.

As Google got wise to this, spammers got more sophisticated, and the two sides are now locked in an arms race. Spammers deduce how Google's algorithm works by observing how it seems to rank pages, and then devise their own technologies to exploit the algorithm and propel their pages to the top. Meanwhile Google has to keep modifying its algorithm to dodge these tricks.
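The crude form of link-based ranking described above, and the spoof-page trick used against it, can be sketched in a few lines of Python. This is a minimal illustration under a loud assumption: a page's score is simply the number of distinct other pages that link to it. Google's actual algorithm (PageRank plus many undisclosed signals) is far more elaborate and is not reproduced here; the function names and the example.com-style page names are hypothetical.

```python
from collections import defaultdict


def inlink_scores(links: dict[str, set[str]]) -> dict[str, int]:
    """Hypothetical ranking rule: score = number of distinct other pages linking in."""
    scores: dict[str, int] = defaultdict(int)
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore self-links
                scores[target] += 1
    return dict(scores)


def rank(links: dict[str, set[str]]) -> list[str]:
    """Order pages by descending score, breaking ties alphabetically."""
    scores = inlink_scores(links)
    return sorted(scores, key=lambda page: (-scores[page], page))


# A small "honest" web; all page names are made up for illustration.
web = {
    "news.example": {"encyclopedia.example", "recipes.example"},
    "blog.example": {"encyclopedia.example"},
    "recipes.example": {"encyclopedia.example"},
}
print(rank(web))  # → ['encyclopedia.example', 'recipes.example']

# A spammer adds five spoof pages, each linking only to spam.example.
spammed = dict(web)
for i in range(5):
    spammed[f"spoof{i}.example"] = {"spam.example"}
print(rank(spammed))  # → ['spam.example', 'encyclopedia.example', 'recipes.example']
```

With five throwaway pages the spam site jumps from unranked to first place. Counting distinct linking pages and ignoring self-links, as this sketch does, stops only the most trivial tricks; PageRank-style schemes go further by weighting each link according to the importance of the page it comes from.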
Now Wikia, a company co-founded by Wikipedia pioneer Jimmy Wales, plans to build an open source search engine to rival Google that will publicise the way its algorithm ranks results. Ben Laurie, an open source programmer based in London, says this will make it easier for spammers to game the algorithms. Instead of having to guess at how an algorithm works, as they do now, they will simply be able to peek inside the software to come up with ways to manipulate it.

"By publishing its search algorithm, it's going to be pretty obvious to spammers how to get to the top of the search hits, risking a huge spamfest," Laurie says. "Some genius might come up with algorithms that, despite being published, are resistant to that. But it strikes me as unlikely."

The Wikia Search team, however, expects exactly that to happen. They hope their algorithms will respond faster than Google's to new spam techniques because of the vast number of volunteers' brains that will be thrown at the problem.

Danny Sullivan of the news site searchengineland.com thinks that Wikia Search will turn its army of volunteers to finding ways to block spammers, much as Wikipedia relies on an army of human editors to tackle vandalism in its articles. "I think they might come up with some novel technology to let humans shape or refine search results," he says.

My Original Writing blog: http://itgotworse.blogsource.com