Posts Tagged ‘google’

The Value of Number of Words Indexed by Google Equation

Wednesday, April 7th, 2010

Why?

I have a subscription to the Danish newspaper Weekendavisen. They publish some great articles on society, science and politics, articles that are still relevant many years after they were first published. The other day I wanted to look up an old article, so I went to their website to search the archive. I signed up using my customer number.

After signing up, I went to the Avisen section, only to find

  1. four (open and free) articles,
  2. an audio version of the entire latest issue, read aloud by a person rather than a machine (a quite cool feature),
  3. an e-paper.dk version of a PDF (these Flash/PDF/e-paper viewers are useless gimmicks with no real value to anyone, and if you are forced to use one, please use a proper one),
  4. and the archive, which you can access for an additional fee. The price is 49 DKK for seven days of access to the database, so my temptation was resistible.

You can complain about the prices being too high, and I think most will, but that is not what this post is about. This post is about Weekendavisen cheating themselves out of potential profits. According to FDIM, they have around 6.000 unique cookies a week. If just 1% of those bought a 49 DKK pass every single week (and in my experience that is a very high estimate), it would amount to revenue of 2.940 DKK a week, or 12.783 DKK a month, plus a tiny fraction more from the extra advertising revenue those 1% generate.
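To make the arithmetic above easy to rerun with other inputs, here is a minimal Python sketch. The inputs are the post's own figures (about 6.000 weekly unique cookies from FDIM, the guessed 1% weekly conversion and the 49 DKK seven-day price). The function name `paywall_revenue` is hypothetical, and converting weeks to months with the average month length is my assumption; it lands within a krone of the post's monthly figure.

```python
# Minimal sketch of the paywall-revenue estimate (inputs from the post;
# the weeks-per-month conversion is an assumption).
AVG_WEEKS_PER_MONTH = 365.25 / 12 / 7  # average month length in weeks (assumed)


def paywall_revenue(weekly_uniques: int, conversion: float, price_dkk: float) -> tuple[float, float]:
    """Return (weekly, monthly) archive revenue in DKK.

    weekly_uniques: unique cookies per week (FDIM figure in the post)
    conversion:     share of those who buy a 7-day pass every week (a guess)
    price_dkk:      price of one 7-day pass
    """
    buyers_per_week = weekly_uniques * conversion
    weekly = buyers_per_week * price_dkk
    monthly = weekly * AVG_WEEKS_PER_MONTH
    return weekly, monthly


weekly, monthly = paywall_revenue(6000, 0.01, 49)
print(f"{weekly:.0f} DKK a week, {monthly:.0f} DKK a month")
```

Swapping in Weekendavisen's real conversion rate instead of the 1% guess only means changing the second argument.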

The Missing Equation

My hypothesis is that by making their entire archive open, free and indexable, minus the last year's articles (to avoid cannibalizing their own market), they can get more visits and generate much greater monthly revenue from advertising alone than they do today.

If they have already digitized all their articles, the cost of granting access to everyone will be minimal. They can still keep the paywall for the audio versions and for articles that are less than 12 months old.

I was only able to find ad prices from 2008: two formats with a cost per thousand impressions (CPM) of 250 DKK and 200 DKK respectively. Provided they were able to sell both formats and show both on every page, every thousand page-views would bring in 450 DKK, so Weekendavisen would need only 6.533 extra page-views a week to break even with the 1% paywall revenue estimated above. Today they have approximately 30.000 page-views a week according to FDIM, so the objective is a 22% increase in page-views from releasing their entire article archive. This objective sounds quite reasonable. Note: (2940/(200+250))*1000 ≈ 6.533
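The break-even arithmetic can be reproduced in a few lines of Python. This sketch assumes, as the paragraph does, that both 2008 ad formats are sold and shown on every page-view; the function name `break_even_pageviews` is hypothetical, and the 30.000 weekly page-views are the FDIM figure quoted in the post.

```python
# Minimal sketch of the break-even calculation; assumes every ad format
# listed is sold and shown on every page-view.


def break_even_pageviews(weekly_revenue_dkk: float, cpms_dkk: list[float]) -> float:
    """Extra weekly page-views needed for ads to match a given weekly revenue."""
    # CPM is the price per 1000 impressions, so one page-view showing
    # every format earns sum(CPMs) / 1000 DKK.
    revenue_per_pageview = sum(cpms_dkk) / 1000
    return weekly_revenue_dkk / revenue_per_pageview


extra = break_even_pageviews(2940, [250, 200])  # 1% paywall revenue, 2008 CPMs
current_weekly_pageviews = 30000  # approx. weekly page-views according to FDIM
increase = extra / current_weekly_pageviews
print(f"{extra:.0f} extra page-views a week, a {increase:.1%} increase")
```

If only one of the two formats could be sold, passing a single CPM shows how quickly the required traffic grows.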

The only thing they would have to ensure is that their articles are indexed by Google (and preferably also search-engine optimized); Google would then supply the extra visitors needed to drive up revenue.

Unfortunately, I was unable to find any model or equation that could be used to calculate and support that last claim: something along the lines of “more content, more traffic”. Better yet would be a model that looked not only at content as a whole but specifically at the number of words, since Weekendavisen’s four open articles range from 460 to 2.400 words each. In any case, the general notion is that it’s easier to gain keyphrase relevance when Google has more words to choose from.

Anyway…

I hope Weekendavisen will try the calculations above with their own numbers (1% was just my guess), and even compare their indexed word count with their number of non-branded visits from search engines, to get a sense of what their content is worth.

I have only discussed the direct revenues from this – customer satisfaction is priceless.

If anybody else has relevant numbers, or the missing equation, please comment.

Crowds teach computers to read the scanned text

Wednesday, September 16th, 2009

From the “Google Acquires reCAPTCHA” article at Mashable.com:

Why exactly does Google want to own this technology?

… many of the CAPTCHAs provided by reCAPTCHA come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

… those 100,000+ captcha forms are now Google-powered, with the data being used to improve Google’s ability to digitize old books and newspapers to make them Web searchable. It makes a lot of sense, and gives Google yet another strategic advantage over would-be competitors.