2010-12-14 13:36:05
By Michael Blastland
GO FIGURE - Seeing stats in a different way
It's easy to think search engine queries could provide a gold mine of data, but
it's not easy to know how to exploit it, says Michael Blastland in his regular
column.
How many sheep? Come on, it's not hard. There are, after all, only two white
fluffy objects in the picture. So how many sheep?
Did I mention that the ewe is pregnant? Now how many sheep? So is it one? Two?
Three? Or maybe one and a half? Or maybe one and two halves?
Maybe some readers are saying: "Well, obviously, there is one sheep and one
lamb. Duh!"
But that's not what I asked. I asked how many sheep? In which case your answer
would be one, right? Anyone care to disagree with one?
That we could go on arguing about definitions in order to count up to three -
or fewer - tells us that counting just ain't like it used to be on Sesame
Street. As soon as you give a number, you insist on a definition, a definition
that others might not share.
At which point they accuse you of being slippery with statistics. "Look, buddy,
I'm just trying to count sheep," you reply. "Yeah, and what's your agenda?"
says your critic. "You some kind of sheep-denier?"
[Image: Tiger Woods, Kim Kardashian and Sandra Bullock. Caption: Why do we
search for who we search for?]
The sheep problem has a parallel in the Christmas glut of internet search
highlights.
So in the last week or so, we've had the most frequently asked questions of the
year from Ask Jeeves (or Ask if you're outside the UK), which apparently
included "Is the X Factor fixed?" and "Why are England so bad at football?".
They are at numbers one and two.
AOL had the general election as the most searched-for news item, and reported
that music fans searched for Lady Gaga more than any other artist. Bing US
listed the most popular overall 2010 searches as Kim Kardashian first, Sandra
Bullock second and Tiger Woods third.
On Google, it's possible to check up at any time on the frequency of any search
request you care to think of.
But to come back to the point about the sheep, what exactly are we counting when we
count internet searches?
[Image: Woman with mask. Caption: Google thinks certain search terms can be
used to map flu spread]
With searches for celebrities, are we measuring popularity, or just attention?
Some people might appear high up a list because they are hated. In other words,
how many of those interested in Lady Gaga think she's a sheep and how many a
goat? Music sales tell us her music is popular, but do internet searches?
So public opinion is not the same as popularity. One is what people think, the
other is loosely what people think about. Though we might wonder how much
attention is spontaneous. How much searching is prompted by what's talked about
on media web sites, TV and so on?
Then again, does it matter? Maybe. There are more heavyweight examples. Google
thinks it can use the volume of internet searches to tell us how much flu there
is before the medical authorities know themselves.
The study of patterns of disease is called epidemiology. Lately, the phrase
"info-demiology" has appeared, along with "info-veillance" - using internet
search volume to track or even forecast public health problems or economic
trends like claims for unemployment benefit.
Politicians might be tempted to think it can tell them what they once used to
assess from news coverage, what's known as "The Most Important Problem" to
voters.
Some people think you can use internet searches to measure investor interest
and so pick the shares that will rise.
But the counting problem is still there. For example, are we counting
popularity - or attention, or interest or whatever we call it - only among
internet users, who might be younger or richer than the population as a whole?
In other words, is there bias? And if so, how much? Enough to matter?
[Image: Ballot box. Caption: Politicians want to get inside voters' minds
through search result analysis]
Are people watching a share because they think it will rise or fall? You can
make money both ways.
It's also quite hard to know that you've got all the potential search terms
covered. If you want to measure public interest in the recession, for example,
how many different terms might people use?
There's something epic in the potential of the internet to measure the interest
of billions of people almost, by conventional standards, instantly. I'm not
knocking any of it, and some of the research links here are fascinating.
The rise of statistics corresponds almost exactly with the rise of big urban
populations and large-scale government. Understanding what's going on in a big,
bustling society is still a huge problem. So what if you could put the whole
lot online? Shades of Big Brother? Or data heaven?
But the bottom-line problem with the volume of internet searches is that - on
its own - there's no qualitative data here. The numbers count something, but
we're often unsure what - or why.
Comments
I manage a website focused on renewable energy, for certain single word generic
terms Google says I get 40% of visitors. I can tell an awful lot about what
people are interested in and how many. But, all of this interest comes through
Google, who know what the world is looking at and for and how many. This
private company knows more about trends and interests than any government, a
bit worrying?
I'm a daydreamer. I download files on Rolls Royces, fancy houses costing
millions of pounds, as it is an interest of mine - I don't have any money to
buy them, it's just a hobby, but what does it tell the estate agents and
dealerships? That they have huge amounts of interest in their products?
Probably. But sorry to disappoint. It's what they actually sell that counts.