Bracing against the wind  

Wednesday, March 18, 2009

Using Google to Analyze English Word Usage

On Wikipedia, I came across this sentence:

'Hash' is the most common name for the mark (#) used in the English-speaking world outside North America. [citation needed]

I thought... sheesh... you really need a citation for that? I mean... it seems like common sense for anyone who's been to both places. I wouldn't be surprised to see "Wearing a sweater will keep you warm [citation needed]".

So I decided to use Google to analyze the use of the words "hash" and "number sign". Of course, the problem is that there's no way to know whether a page using those words is actually talking about the "#" symbol.

Fortunately, the word "octothorp" (also spelled "octothorpe") means the same thing in all English-speaking countries and can be used to constrain the search.

Among people using the word "octothorp" on a web page, authors in the UK top-level domain were more likely to also use the word "hash" (63%), whereas authors in the US top-level domain were about equally likely to use either term (48% "hash") (full analysis below). Australian users were also more likely to use the word "hash" (60%).

So, hypothesis proved? According to Wikipedia, this is original research and is not acceptable unless I am shown to be an "Expert in the field". What field is that exactly? Word frequency analysis? Wasting time? I've got credentials in both.

UK top-level domain:

"hash" octothorpe = 389
"hash" octothorp = 155
total: 544 (63%)

"number sign" octothorpe = 119
"number sign" octothorp = 194
total: 313 (37%)

US top-level domain:

"hash" octothorpe = 154
"hash" octothorp = 44
total: 198 (48%)

"number sign" octothorpe = 108
"number sign" octothorp = 116
total: 214 (52%)

Australian top-level domain:

"hash" octothorpe = 87
"hash" octothorp = 26
total: 113 (60%)

"number sign" octothorpe = 33
"number sign" octothorp = 41
total: 74 (40%)
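
For anyone who wants to check the arithmetic: each percentage is just one term's share of the combined hits for both terms. Here is a minimal Python sketch of that calculation. It assumes the three blocks of counts above are the UK, US, and Australian results in that order (matching the order they are discussed in), and it uses the totals as listed. The names `REPORTED_TOTALS`, `hash_share`, and `summarize` are made up for illustration.

```python
# Totals as listed in the post: (hits with "hash", hits with "number sign").
# Region labels are an assumption based on the order the post discusses them.
# Note: the US "number sign" components (108 + 116) sum to 224, but the
# listed total of 214 is what yields the post's 48% / 52% split.
REPORTED_TOTALS = {
    "UK": (544, 313),
    "US": (198, 214),
    "Australia": (113, 74),
}


def hash_share(hash_hits, number_sign_hits):
    """Return the percentage of combined hits that use "hash"."""
    total = hash_hits + number_sign_hits
    if total == 0:
        raise ValueError("no hits to compare")
    return 100.0 * hash_hits / total


def summarize(totals):
    """Map each region to its "hash" share, rounded to a whole percent."""
    return {
        region: round(hash_share(hash_hits, number_sign_hits))
        for region, (hash_hits, number_sign_hits) in totals.items()
    }


if __name__ == "__main__":
    for region, pct in summarize(REPORTED_TOTALS).items():
        print(f"{region}: hash {pct}%, number sign {100 - pct}%")
```

Keeping the counts in one dictionary makes it easy to swap in fresh search results and rerun the comparison.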


(C) 2002 Erik Aronesty/DocumentRoot.Com. Right to copy, without attribution, is given freely to anyone for any reason.
