I recently found myself in need of an English word list ordered by frequency, but could not find a free (in both the freedom-of-use and free-of-charge senses) one that satisfied me. So I have compiled one using word counts from the Google Ngrams database, doing just a little processing to extract counts since 2005 (to avoid archaic words) and to strip out part-of-speech identifiers from the word stems.
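A minimal Python sketch of this kind of processing might look like the following. It is an illustration only, not the actual script behind the list. It assumes tab-separated rows of the form `ngram, year, match_count, volume_count` and part-of-speech suffixes such as `_NOUN` or `_VERB`; the function name `count_recent_words`, the suffix list, and the sample rows are all hypothetical.

```python
import re
from collections import Counter

# Assumed part-of-speech suffixes (e.g. "run_VERB"); adjust to match your data.
POS_SUFFIX = re.compile(r"_(NOUN|VERB|ADJ|ADV|PRON|DET|ADP|NUM|CONJ|PRT|X)$")


def count_recent_words(lines, min_year=2005):
    """Sum match counts per lower-cased word for rows with year >= min_year."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip rows that do not have the assumed layout
        ngram, year, match_count = fields[0], fields[1], fields[2]
        if int(year) < min_year:
            continue  # drop older counts to avoid archaic words
        word = POS_SUFFIX.sub("", ngram).lower()
        if word:
            counts[word] += int(match_count)
    return counts


# Made-up sample rows, purely for illustration.
sample = [
    "Dog_NOUN\t2004\t10\t5",
    "Dog_NOUN\t2006\t7\t3",
    "dog\t2007\t5\t2",
    "run_VERB\t2008\t4\t1",
]
top = count_recent_words(sample).most_common(2)
print(top)
```

Merging the tagged and untagged forms of a word, as this sketch does, can count the same occurrences more than once, so treat the resulting totals as a ranking aid rather than as exact frequencies.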
It seems adequate for my purposes, but I have not done any extensive checking on it. It should be adequate for everyday use ("Hello, how is your dog?") as well as for more formal writing. For example, it contains the words "phylogenetic", "immunoblotting" and "histochemical" -- all fairly specialized molecular biology terms.
Be aware that the terms are not filtered in any way: if you want to strip out profanity, for example, you will need to do some further processing. The file begins with a header of comment lines, which can be filtered out by excluding lines beginning with "#". All entries are in lower case.
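As a rough sketch of how the list could be loaded in Python, the function below skips the "#" header lines and optionally drops unwanted terms. The name `load_wordlist` and the `exclude` parameter are hypothetical, and the code assumes that the word is the first whitespace-separated field on each data line, which you should check against the file itself.

```python
def load_wordlist(lines, exclude=frozenset()):
    """Return words in file order, skipping '#' header lines and blank lines."""
    words = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header comment or empty line
        # Assumption: the word is the first whitespace-separated field.
        word = line.split()[0]
        if word not in exclude:
            words.append(word)
    return words


# Made-up sample lines, purely for illustration.
sample = ["# header comment", "the\t1000", "of\t800", "", "dog\t50"]
words = load_wordlist(sample, exclude={"of"})
# Position in the list doubles as a frequency rank (1 = most common).
rank = {word: i + 1 for i, word in enumerate(words)}
print(words, rank)
```

To read the real file, pass an open file object instead of the sample list, for example `load_wordlist(open("top100k_words_ngrams_djp.txt", encoding="utf-8"))`, and supply your own set of terms to exclude.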
If this would be useful to you, the list of the 100,000 most common terms can be downloaded at: http://www.biophysengr.net/files/blog/wordlist/top100k_words_ngrams_djp.txt
The Ngrams data seems to be under the Creative Commons Attribution 3.0 Unported license (CC-BY-3.0), so I will follow suit and release my processed list under the same license.
Happy word frequency-ing