Collocates data


Most of the information on this website deals with data from the COCA corpus. You might also be interested in the collocates data from the 14 billion word iWeb corpus.


NEW: COCA 2020 data

Collocates are words that occur near a given word (the node word), and they can provide very useful insight into the meaning and usage of the words near which they occur.
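As a rough illustration of the idea, the Python sketch below counts the words that fall within a fixed window of a node word and records how often each one appears before the node. The function name `collocates`, the window size `span`, and the toy sentence are assumptions made for this example. The sketch does not lemmatize or tag parts of speech, and it is not a description of how the lists on this site were built.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count words within `span` tokens of each occurrence of `node`.

    Returns {collocate: (freq, share_before_node)}, where share_before_node
    is the fraction of co-occurrences in which the collocate precedes the node.
    """
    total = Counter()   # all co-occurrences, left or right of the node
    before = Counter()  # co-occurrences to the left of the node only
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        for w in left:
            total[w] += 1
            before[w] += 1
        for w in right:
            total[w] += 1
    return {w: (n, before[w] / n) for w, n in total.items()}

# Toy input (hypothetical); a real list would be built from a large corpus.
text = "the fire still smolders in the ruins while the ash still smolders"
result = collocates(text.split(), "smolders", span=2)
print(result["still"])  # (2, 1.0)
```

Here "still" occurs twice near "smolders", both times before it, so its share before the node is 1.0. Raw counts like these say nothing about how strongly two words are associated; that takes a measure that also accounts for how frequent each word is overall.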

This site contains the largest and most accurate lists of collocates of English -- about 13.5 million node/collocate pairs. This is about three times as many as the lists that were available from 2010 to early 2020 (more information). Feel free to take a look at some samples (expanded: 1.34 million entries).

Remember that any list of collocates is only as good as the corpus (collection of texts) that it is based on. The 13.5 million node/collocate pairs are based on the only large, genre-balanced, up-to-date corpus of English -- the one billion word Corpus of Contemporary American English (COCA).

Sample (see more)
nodeID  node     nodePoS  collocate  collPoS  freq  MutInfo  % preNode
17491   smolder  v        still      r        150    3.95    0.93
17491   smolder  v        fire       n         97    5.78    0.62
17491   smolder  v        eye        n         62    3.89    0.58
17491   smolder  v        ash        n         35    7.88    0.23
17491   smolder  v        cigarette  n         29    6.34    0.69
17491   smolder  v        ember      n         28   10.67    0.29
17491   smolder  v        burn       v         27    4.98    0.56
17491   smolder  v        ruin       n         24    7.76    0.33