Interesting People mailing list archives

Re WSJ ends Google users' free ride, then falls 44% in search results


From: "Dave Farber" <farber () gmail com>
Date: Wed, 14 Jun 2017 21:41:15 -0400




Begin forwarded message:

From: Chuck McManis <chuck.mcmanis () gmail com>
Date: June 14, 2017 at 8:21:27 PM EDT
To: Dave Farber <dave () farber net>
Subject: Re: [IP] Re WSJ ends Google users' free ride, then falls 44% in search results

[For IP if you wish]

Having worked at Google (although not directly in the Search group), deployed a competitive search engine 
(Blekko), and dealt directly with the issues of crawling and search, I think a number of things are important in this 
discussion:

First (and perhaps foremost), there is a big question of business models, costs, and value. That ground is fairly well 
covered, but to summarize: web advertising generates significantly (as in orders of magnitude) less revenue than print 
advertising. Subscription models have always had 'leakage' where content was shared as a print copy was handed 
around (or lent, in the case of libraries); content production costs (those costs that don't include printing and 
distribution of printed copies) have gone up; and information value (as a function of availability) has gone down. 
This is a fascinating (for me) economic system which is the subject of many learned papers, but its summary impact on 
this discussion is that publications like the Wall Street Journal are working hard to maximize the value extracted 
within the constraints of the web infrastructure.

Second, there is a persistent tension between people who apply classical economics to the system and those who would 
like to produce a financially viable work product. 

And finally, there is a "Fraud Surface Area" component that is enabled by the new infrastructure that is relatively 
easily exploited without a concomitant level of risk to the perpetrators.

So let's approach this from the fraud perspective first and directly answer the complaint, "The WSJ must have the 
world's worst web programmers if they can't figure out how to show Google the full articles even though normal users 
are paywalled."

Google is a target for fraudsters because subverting its algorithm can enable advertising click fraud, remote system 
compromise, and identity theft. One tactic that arose early in Google's history was sites that presented something 
interesting when the Google crawler came through reading the page, but something malicious when an individual came 
through. The choice of what to show in response to an HTTP request was determined largely from metadata 
associated with the connection, such as the User-Agent, source address, protocol options, and optional headers. To 
combat this, Google developed a crawling infrastructure that will crawl a web page and then, at a future date, audit 
that page by fetching it from an address with metadata that suggests a human viewer. When the contents of a page 
change based on whether or not the connection looks human, Google typically would immediately dump the page and 
penalize the domain in terms of its PageRank (this moves the page into the later pages of results, where it is less 
likely to be clicked on by the general public).
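The audit idea described above can be sketched roughly as follows. This is an illustrative assumption of how such a check might work, not Google's actual infrastructure: the header values, the word-set comparison, and the similarity threshold are all hypothetical, and a real system would also vary the source address and protocol options, which a simple client cannot do.

```python
import urllib.request

# Illustrative request metadata for the two "personas": a crawler-style
# fetch and a browser-style fetch. These values are assumptions.
CRAWLER_HEADERS = {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"}
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch(url, headers):
    """Fetch a page body using the given request metadata."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def normalized_words(body):
    # Compare word sets rather than raw bytes, so that timestamps,
    # ad slots, and other benign dynamic content don't trigger a flag.
    return set(body.lower().split())

def looks_cloaked(crawler_body, browser_body, threshold=0.5):
    """Flag the page when the two fetches share less than `threshold`
    of their vocabulary (Jaccard similarity over word sets)."""
    a, b = normalized_words(crawler_body), normalized_words(browser_body)
    if not a and not b:
        return False
    overlap = len(a & b) / len(a | b)
    return overlap < threshold
```

In practice, comparing the fetches is the hard part: a naive byte-for-byte comparison would flag nearly every dynamic page, which is why the sketch uses a loose similarity heuristic rather than equality.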

Google is also a company that doesn't generally like to put in "exemptions" for a particular domain. They have had 
issues in the past where an exemption was added, the company then went out of business, and the domain was acquired by 
a bad actor who subsequently exploited the exemption to expose users to malware-laced web pages. So (at least as of 
2010, when I left) the policy was not to provide exceptions, to avoid creating future problems when the circumstances 
around a specific exemption might no longer apply. As a result, significant coordination between the 
web site and Google is required to support anything out of the ordinary, and that costs resources which Google is not 
willing to donate to solve the web site's problems.

It is also important to note that both Google and the WSJ are cognizant of the sales conversion opportunity that 
arises when a reader *knows*, because of the snippet, that some piece of information is present in the document, 
and is then denied free access to that document. It connects the dots between "there is something here I want 
to know" and "you can pay me now and I'll give it to you."  As a result, if Google were to continue to rank the WSJ 
article on the first page of results, it would be providing a financial boost to the WSJ while not benefiting 
itself financially at all.

The bottom line is, as it usually is, that there is value here and the market maker is unwilling to cede all of it 
to the seller. Google has solved this problem with web shopping sites by telling them they have to pay Google a fee 
to appear on the first page of results; no doubt if the WSJ were willing to pay Google an ongoing maintenance fee, 
Google would be willing to put the WSJ pages back onto the first page of results (even without them being readable 
when clicked on).

As has been demonstrated in the many interactions between Google and the newspapers of the world, absent any 
externally applied regulation, there are three 'values' Google is willing to accept. You can give Google's users 
free access to a page found on Google (the one click free rule), which Google values because it keeps Google 
everyone's first choice for searching for information. Alternatively, you can allow only Google advertising on 
your pages, which Google values because it can extract some revenue from the traffic it sends your way. Or you can 
just pay Google for the opportunity to be in the set of results that the user sees first.

--Chuck McManis



On Wed, Jun 14, 2017 at 4:09 PM, Dave Farber <farber () gmail com> wrote:



Begin forwarded message:

From: "John Levine" <johnl () iecc com>
Date: June 14, 2017 at 6:37:48 PM EDT
To: dave () farber net
Cc: "Lauren Weinstein" <lauren () vortex com>, "Bcc" <johnl-sent () iecc com>
Subject: Re: [IP] WSJ ends Google users' free ride, then falls 44% in search results

In article <6D9A5574-7651-4048-B295-66085444E8F5 () gmail com> you write:
  After the Journal's free articles
  went behind a paywall, Google's bot only saw the first few
  paragraphs and started ranking them lower, limiting the
  Journal's viewership.  Executives at the Journal, owned by
  Rupert Murdoch's News Corp., argue that Google's policy is
  unfairly punishing them for trying to attract more digital
  subscribers. They want Google to treat their articles equally
  in search rankings, despite being behind a paywall.

The WSJ must have the world's worst web programmers if they can't
figure out how to show Google the full articles even though normal
users are paywalled.  That's what all the other paywalled papers do.

Sheesh.

R's,
John

PS: If the argument were "but then people can get them from the Google
cache" their programmers would be even worse than I thought.





-------------------------------------------
Archives: https://www.listbox.com/member/archive/247/=now
RSS Feed: https://www.listbox.com/member/archive/rss/247/18849915-ae8fa580
Modify Your Subscription: https://www.listbox.com/member/?member_id=18849915&id_secret=18849915-aa268125
Unsubscribe Now: 
https://www.listbox.com/unsubscribe/?member_id=18849915&id_secret=18849915-32545cb4&post_id=20170614214124:BAAFB59E-516B-11E7-8CD3-905071F2199E
Powered by Listbox: http://www.listbox.com
