|
This site contains what we believe is the
most accurate
frequency data for English. It contains word frequency lists
of the top 60,000 words (lemmas) in English,
collocate lists (the words that occur near a given word, which show its meaning
and use), and n-grams (the frequency of all two- and
three-word sequences in the corpora).
Any frequency list is
only as good as the corpus (collection of texts) that it is based
on. Our data is based on the only large, genre-balanced,
up-to-date corpus of American English -- the 425-million-word
Corpus of
Contemporary American English. You can be sure that the data
that you find here represents what you would encounter in the real world.
If you are
a language learner, you can use
the frequency lists to maximize your study of vocabulary in a
way that is not possible with any other resource. If you are a
(computational) linguist, you will have access to highly accurate,
robust and useful data for research and for Natural Language
Processing. (More information
on how to use this data.)
The English frequency data comes in a number of
different formats, shown
below. For Spanish and Portuguese,
click here.
-
Simple word lists
of the top 5,000-60,000 words (you choose the size of the word
list).
-
Word lists + genre frequency.
See the frequency in spoken, fiction, popular magazine,
newspapers, and academic, as well as more than 40 sub-genres
like NEWS-Financial or ACAD-Medicine (and then use this data to
create your own customized lists).
-
Collocates
(nearby words; maximum of 200-300 per word) for each of the 60,000 words,
giving nearly 4,800,000 node word / collocate pairs.
Collocates provide great insight into the meaning and use of
words by showing which words occur together.
-
N-grams: more than 155
million 3-grams, which can be installed and queried on your own
machine, to search for patterns by word form, lemma,
part of speech, etc.
-
An
eBook
containing up to the 20,000 most frequent words, along with the
20-30 most frequent collocates and the synonyms
for each word
-
A
printed book
(from
Routledge)
with the top 5,000 words (including collocates) and thematic
lists.
-
Free word lists
-- top 5,000 lemmas, or top 500,000 word forms with part of speech
and number of texts (but not lemmatized)
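As a rough illustration of the "create your own customized lists" idea mentioned for the genre frequency lists, the Python sketch below picks out words that occur mostly in one genre. The tab-separated layout, the column names (lemma, pos, total, spoken, fiction, magazine, newspaper, academic), the counts, and the genre_list helper are all assumptions made up for this example; the actual files may be organized differently.

```python
import csv
import io

# Hypothetical sample in an assumed tab-separated layout. The real files may
# use different columns, and these counts are invented for illustration only.
SAMPLE = """lemma\tpos\ttotal\tspoken\tfiction\tmagazine\tnewspaper\tacademic
the\ta\t1000\t200\t200\t200\t200\t200
hypothesis\tn\t100\t5\t2\t8\t5\t80
grin\tv\t100\t5\t80\t8\t5\t2
"""

def genre_list(tsv_text, genre, min_share=0.5):
    """Return lemmas whose share of occurrences in `genre` is at least min_share."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    picked = []
    for row in reader:
        total = int(row["total"])
        genre_count = int(row[genre])
        # Keep the word if enough of its occurrences fall in the chosen genre.
        if total > 0 and genre_count / total >= min_share:
            picked.append((row["lemma"], genre_count))
    # Most frequent in the chosen genre first.
    picked.sort(key=lambda item: item[1], reverse=True)
    return [lemma for lemma, _ in picked]

academic_words = genre_list(SAMPLE, "academic")
fiction_words = genre_list(SAMPLE, "fiction")
```

Comparing raw counts this way only makes sense if the genres are of similar size; with unequal genres, per-million rates would be a safer basis for the comparison.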
Contact information
|