Collocates data


NEW: Most of the information on this website deals with data from the COCA corpus, which was about 400 million words in size when the collocates data was compiled. In May 2018 we released collocates data from the 14 billion word iWeb corpus, which is about 35 times as large as COCA.


Collocates are words that occur near a given word (the node word), and they can provide very useful insight into the meaning and usage of the words near which they occur.
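To make the idea of "words that occur near a node word" concrete, here is a minimal sketch of window-based collocate counting. It is an illustration only, not the procedure used to build the lists on this site: the function name `count_collocates`, the toy sentence, and the span of 2 words on each side are all assumptions chosen for the example.

```python
from collections import Counter

def count_collocates(tokens, node, span=2):
    """Count words within `span` tokens before and after each occurrence of `node`.

    Hypothetical helper for illustration; real collocate lists are typically
    built from lemmatized, part-of-speech-tagged text.
    """
    pre, post = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words in the window to the left of the node word
        pre.update(tokens[max(0, i - span):i])
        # Words in the window to the right of the node word
        post.update(tokens[i + 1:i + 1 + span])
    return pre, post

# Toy input, invented for this example
tokens = ("the fire was still smoldering when the embers "
          "were still smoldering at dawn").split()
pre, post = count_collocates(tokens, "smoldering", span=2)
```

In this toy input, "still" falls inside the left-hand window of both occurrences of "smoldering", so it is counted twice as a pre-node collocate, while "fire" lies outside the two-word window and is not counted at all. Widening `span` would bring it in.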

This site contains the largest and most accurate lists of collocates of English -- up to 4.3 million node/collocate pairs. Feel free to take a look at some samples (expanded: 45,000 entries).

Remember that any list of collocates is only as good as the corpus (collection of texts) that it is based on. The 4.3 million node/collocate pairs are based on the only large, genre-balanced, up-to-date corpus of English -- the 520 million word Corpus of Contemporary American English (COCA).

Sample (see more)
nodeID  node     nodePoS  collocate   collPoS  freq  MutInfo  preNode  postNode  % preNode
15349   smolder  v        still       r        76    4.39     74       2         0.97
15349   smolder  v        fire        n        59    6.33     39       20        0.66
15349   smolder  v        eye         n        43    4.41     24       19        0.55
15349   smolder  v        cigarette   n        26    6.93     17       9         0.65
15349   smolder  v        ash         n        15    7.42     5        10        0.33
15349   smolder  v        ember       n        14    10.62    4        10        0.28
15349   smolder  v        resentment  n        14    8.26     2        12        0.14
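As a rough sketch of how rows like the ones above might be read, the following Python parses each whitespace-delimited line into a typed record and checks how the columns relate to one another. The class name `CollocateRow`, the field names, and the assumption that the downloadable files are whitespace-delimited like this sample are all hypothetical. Note that the last header, "% preNode", contains a space, so only the data rows (ten fields each) are split here.

```python
from typing import NamedTuple

class CollocateRow(NamedTuple):
    # Field names are illustrative renderings of the sample's column headers
    node_id: int
    node: str
    node_pos: str
    collocate: str
    coll_pos: str
    freq: int
    mut_info: float
    pre_node: int
    post_node: int
    pct_pre_node: float

def parse_row(line: str) -> CollocateRow:
    """Parse one whitespace-delimited data row (assumed layout: ten fields)."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    return CollocateRow(int(f[0]), f[1], f[2], f[3], f[4],
                        int(f[5]), float(f[6]), int(f[7]), int(f[8]), float(f[9]))

# The seven sample rows shown above, copied verbatim
SAMPLE = """\
15349 smolder v still r 76 4.39 74 2 0.97
15349 smolder v fire n 59 6.33 39 20 0.66
15349 smolder v eye n 43 4.41 24 19 0.55
15349 smolder v cigarette n 26 6.93 17 9 0.65
15349 smolder v ash n 15 7.42 5 10 0.33
15349 smolder v ember n 14 10.62 4 10 0.28
15349 smolder v resentment n 14 8.26 2 12 0.14
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]

# In every sample row, preNode + postNode equals freq
counts_add_up = all(r.pre_node + r.post_node == r.freq for r in rows)

# In every sample row, "% preNode" matches preNode / freq cut to two decimals
share_matches = all(
    r.pre_node * 100 // r.freq == round(r.pct_pre_node * 100) for r in rows
)
```

Both checks hold for these seven rows: freq is the sum of the pre-node and post-node counts, and "% preNode" is a proportion between 0 and 1 rather than a percentage. That the value appears truncated rather than rounded (for example, 4/14 is shown as 0.28) is an observation from this sample only, not a documented property of the data.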