Word frequency lists and dictionary
from the Corpus of Contemporary American English



This site contains what we believe is the most accurate frequency data for English. It contains word frequency lists of the top 60,000 words (lemmas) in English, collocates lists (nearby words, which show a word's meaning and use), and n-grams (the frequency of all two- and three-word sequences in the corpora).

Any frequency list is only as good as the corpus (collection of texts) that it is based on. Our data is based on the only large, genre-balanced, up-to-date corpus of American English -- the 425-million-word Corpus of Contemporary American English. You can be sure that the data you find here represents what you would encounter in the real world.

If you are a language learner, you can use the frequency lists to maximize your study of vocabulary in a way that is not possible with any other resource.  If you are a (computational) linguist, you will have access to highly accurate, robust and useful data for research and for Natural Language Processing. (More information on how to use this data.)

The English frequency data comes in a number of different formats, shown below. For Spanish and Portuguese, click here.

  • Simple word lists of the top 5,000-60,000 words (you choose the size of the word list).

  • Word lists + genre frequency. See the frequency in spoken, fiction, popular magazine, newspaper, and academic texts, as well as in more than 40 sub-genres like NEWS-Financial or ACAD-Medicine (and then use this data to create your own customized lists).

  • Collocates (nearby words; maximum of 200-300 per word) for each of the 60,000 words, giving nearly 4,800,000 node word / collocate pairs. Collocates provide great insight into the meaning and use of words by showing which words occur together.

  • N-grams: more than 155 million 3-grams, which can be installed and queried on your own machine, to search for patterns by word form, lemma, part of speech, etc.

  • An eBook containing up to the top 20,000 words, along with the 20-30 most frequent collocates and the synonyms for each word.

  • A printed book (from Routledge) with the top 5,000 words (including collocates) and thematic lists.

  • Free word lists -- top 5,000 lemmas, or top 500,000 word forms + part of speech and number of texts (but not lemmatized).
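
For readers new to the terms used above, here is a rough illustration of what n-grams and collocates are. It is a minimal sketch only: the function names, the toy sentence, and the window size are assumptions made for the example, and it is not a description of how the corpus data on this site was produced.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocate_counts(tokens, node, span=4):
    """Count words within `span` tokens on either side of each `node` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]     # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]    # up to `span` words after the node
        counts.update(left + right)
    return counts


# Toy input, invented for this example; real data would come from a tokenized corpus.
tokens = "the cat sat on the mat and the cat slept".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
cat_collocates = collocate_counts(tokens, "cat", span=2)

print(bigrams.most_common(3))
print(cat_collocates.most_common(3))
```

Real collocate lists are usually ranked by an association measure as well as by raw counts, and work on lemmas and part-of-speech tags; the sketch above counts plain word forms only.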

