Mainstream Media’s Miscomprehension of Search Results as Evidence

I decided to use a Sunday morning of new sinus congestion and early autumn chill to curl up and read some news. My formula’s always the same: start with a healthy dose of Reddit, where the worst of the world’s day gets sugared by wit, wisdom, and a sense of shared response. After reading a sweet story about a unique cab ride, more Tina Fey-inflicted Sarah Palin mockery, and a Business Week piece on Saudi oil perspectives, I followed a link to Newsweek, for a story on how to keep men from cheating: “Google the words “marriage and affair” and you get more than 17 million variations on how to heal. That’s because “fidelity in marriage” – which only gets about 3½ million hits – is a hard thing to come by these days.” Reading those opening lines, I cringed, took a sip of sweet, creamy chai, and fired up WordPress.

I’m a web strategist, so of course every page I open forces a quick evaluation of internet marketing and content best practices. Nice – a supremely Google News-friendly URL. No keywords, unfortunately, but short and with a clear six-digit article code. A quick scroll reveals a Top Ten Links box to internal articles. Double nice. Unfortunately, the box is in Flash, killing any hopes I had of opening links in new tabs. Oh well. At this point, I can deal with small-to-increasingly-medium usability issues like this. What I can’t deal with is lazy misunderstanding, and thus misinformation, from those who butcher the interpretation of total search results.

There is information to be gleaned from the figure at the top right of a search results page, sure, but it’s both less and more than the average person thinks. To understand what it’s not, we should remind ourselves of what search engines do when we search.
Google’s smart, but more in the sense of Deep Blue than of Garry Kasparov – as you would expect. It does not “understand” your query or try to decide which pages of content would best answer the questions your query implicitly asks. It plays a massively calculating word-match game based on the words you chose and the order in which you put them, paying complete attention to any use of quotes, which of course indicate an exact phrase match search. I try to make this difference clear when giving tips for searching and using search engines effectively.
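To make the word-match game concrete, here is a toy sketch in Python. It is nowhere near how Google actually works; the four "pages" and the function names are made up for illustration. It only shows why the same three words give different totals with and without quotes.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def matches_all_words(query, page):
    """Unquoted search: every query word appears somewhere on the page."""
    page_words = set(tokenize(page))
    return all(word in page_words for word in tokenize(query))

def matches_exact_phrase(phrase, page):
    """Quoted search: the query words appear consecutively, in order."""
    words, target = tokenize(page), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def count_results(query, pages, exact=False):
    """Count the pages that match under the chosen matching rule."""
    match = matches_exact_phrase if exact else matches_all_words
    return sum(1 for page in pages if match(query, page))

# Hypothetical mini "index" of four pages (invented for this example).
PAGES = [
    "How to heal your marriage and move on after an affair",
    "Marriage and affair: a terrible disaster",
    "An affair to remember, and a marriage to forget",
    "Staying faithful to your spouse",
]

unquoted_total = count_results("marriage and affair", PAGES)
quoted_total = count_results("marriage and affair", PAGES, exact=True)
print(unquoted_total, quoted_total)
```

Three of the toy pages contain all three words somewhere, but only one contains them as a phrase, and the page about staying faithful never shows up at all.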

Keeping what search engines do in mind, you can have a better idea of what the search results total indicates. It shows the number of pages that, in a nutshell, are indexed as using or getting links from that word. For example, a search for a particular word could very well not list a site that only uses its exact synonym, creating a semantic incompleteness. Yes, the page might eventually rank if it receives backlinks flavoured with the missing keyword in or around the anchor text of the linking page, but for the most part, if the word’s not on the site, search engines aren’t necessarily sure the page is about that word, too. This semantic gap highlights one way search engines will be getting smarter in the near future. For now, they’re not quite there. Showing the number of pages that include a particular word does not indicate anything conclusive about the appeal of or interest in that word. The most basic example is blatant negation, whereby searches for a particular thing will often bring up results opposing the keyword in question. A search for PETA brings up not just the organization’s main page, but pages critical of it as well.
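A tiny Python illustration of both problems, the synonym gap and negation, using three invented pages and a deliberately naive word counter (real indexing is far more involved, and backlinks are ignored entirely here):

```python
def count_pages_with_word(word, pages):
    """Count pages whose text contains the word as a whole token."""
    word = word.lower()
    return sum(1 for text in pages.values() if word in text.lower().split())

# Hypothetical pages, invented for this example.
TOY_PAGES = {
    "fan page": "we love the couch we bought",
    "critic page": "never buy a couch from this store",
    "synonym page": "our sofa collection is on sale",
}

couch_total = count_pages_with_word("couch", TOY_PAGES)
sofa_total = count_pages_with_word("sofa", TOY_PAGES)
print(couch_total, sofa_total)
```

The total for "couch" counts the critic's page right alongside the fan's, and misses the sofa page completely. The number alone says nothing about which kind of page it is counting.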

But writers in search of an introduction feel a temptation to use the total search result to whatever end they please. I’ll never forget when Unlocked Sports, now known for sports picks, managed to sneak by a “Top 6 Most Popular Athletes According to Google” blog post, based entirely on Google results totals. Athletes who popped up for relationships with celebrities, crimes committed, and other unrelated results would have been caught under this interpretive umbrella. In the current case with Newsweek, the numbers are equally meaningless.

I’m going to assume the author’s use of quotes was due to a lack of better formatting standards, since including the quotation marks in the search actually reverses the point. Searching in the way the author suggests, quotes included, I only get 149 results. Searching for “fidelity and marriage” (quotes included) yields over 31,000 results – reversing the number comparison. With quotations serving such an important search query role, using them in this way is irresponsible, and probably the worst possible choice of formatting. Try italics next time!

Back to my main beef. People searching marriage and affair will be taken to results that prominently showcase and have backlinks relating to those words. So, pages on “How to Balance Marriage and Affair”, all else being equal, should have a fairly equal shot at showing up as “A Terrible Disaster: Marriage and Affair”. While both are potentially relevant to a searcher, only a detailed analysis of each of these results would qualify the type of interest involved. Even then, this would only indicate web publisher interest. While public demand does shape published media, and while the quantity of published media should correlate with public interest (depending on the topic), observing the quantity and type of searches performed serves as a far better indicator of public interest than that which is published.

This brings up another failing of the article’s search comparison case, which you may have guessed by now: it doesn’t take into account other formulations of the phrase. Why search only “marriage and affair” and not “marriage and cheating”? Failure to do so ignores both searcher and publisher variations, and relies wholly on search engine semantic association ability.

Resolving many of these issues is a matter of using the right tools. Google Trends is an easy way for the layperson to get a sense of what is actually searched, which is an acceptable indicator of public interest. Publishers should use that, at a minimum. An even more professional job would include use of the Google AdWords Keyword Tool, which addresses many of the concerns brought up so far by not only giving search volumes on keywords on a much more detailed scale than Trends, but also suggesting alternative and related keyword phrases that support the initial request through synonym and relation. In the current case, putting marriage and affair into the keyword tool brings up searched phrases like marriage infidelity and cheating boyfriend. This tool takes advantage of Google’s semantic abilities thus far, and you can gain even more ground by learning how to use the tool fully with exact and phrase search totals. You can see how many people use the quotation-mark exact phrase search described above, taking advantage of Google’s recognition of word order on the flip side of search.

As for what search results totals do tell you, well, here’s what I tend to take from them:

– If searching for a particular expression between quotation marks, I can get a sense of the extent to which the expression is used in popular culture. For example, I might wonder “do people ever use the expression ‘what sorcery is this?’” A search in quotes gives 1,440 results, so yes, to a limited degree. While web results wouldn’t suffice as a perfect indicator, they give some idea. I might supplement this with a query in the keyword tool to see if people search it, too.
– I sometimes search for a phrase in quotes and see the results as further information to see if I should buy a domain name composed entirely of that phrase. As long as the query is specific enough, it might lend itself to some branding potential, even if rarely searched.
– If I’m debating which of two synonyms is more commonly used in popular culture, I might search both words individually and compare their totals. One being clearly more common than the other would be enough for me. Trends would be more accurate, but a Google search tends to be quicker.
– The above principle is used when checking spelling in Google. Barring a total academic failing of the majority of the Internet population, the version with the most results is probably the correct spelling. And even then, widespread use of particular spelling and meaning variations sometimes ends up changing the norm, in strange antonymous ways.

All to say, to anyone who knows how search engines work, search result total science tends to be remarkably pseudo. Publications whose authors and editors commit these errors would be wise to shape up before the majority of the populace catches on. Large results totals on searches not in quotations do not serve as an indicator of anything quantitatively relevant about human beings.

Oh – and search results totals are not a good indicator of term competitiveness, either, for you search engine optimizers. They might be, sort of, if meta-titles were the main factor in page indexation. While titles affect where you rank, they are not the principal factor in whether you rank at all. Text on the page can do the trick just fine. Total results will correlate with competitiveness to some degree, but they are neither necessary nor sufficient evidence of it.

All this ranting, and my chai’s gone cold.
