Collocates data

from the 14 billion word iWeb corpus

Collocates (words that occur near a given word) can be used to examine that word's meaning and usage. You can purchase lists of collocates (up to 1,000 per word) for the top 60,000 words (lemmas) in the 14 billion word iWeb corpus, about 33 million node/collocate pairs in total. Each entry also shows the Mutual Information score (a measure of how strongly the two words are associated) and where the collocate is typically found relative to the node word.
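For readers unfamiliar with Mutual Information, here is a minimal sketch of the textbook pointwise mutual information calculation. It is an illustration only: this page does not state the exact formula behind the MutInfo column (corpus tools often adjust for the size of the search window), and the function name, parameters, and counts below are hypothetical.

```python
import math


def pointwise_mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """Textbook PMI: log2 of observed co-occurrence over chance co-occurrence.

    Assumption: this is the generic definition, not necessarily the exact
    formula used to produce the MutInfo column in the iWeb data.
    """
    observed = pair_count / corpus_size
    expected = (node_count / corpus_size) * (collocate_count / corpus_size)
    return math.log2(observed / expected)


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# the pair co-occurs 16 times more often than chance would predict.
score = pointwise_mutual_information(
    pair_count=8, node_count=16, collocate_count=32, corpus_size=1024
)
print(score)
```

A higher score means the two words occur together more often than their individual frequencies alone would predict.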

The following chart shows the information given for each word (lemma). The columns are explained in the downloadable sample, which contains every tenth word in the 60,000-word list.

wordID  nodeWord  nodPoS  collocates  collPoS  freq   pre-node  MutInfo
2251    organic   j       food        n        29792  5000      3.95
2251    organic   j       matter      n        21873  407       5.01
2251    organic   j       product     n        18092  5037      2.56
2251    organic   j       natural     j        17945  12084     4.29
2251    organic   j       material    n        16771  1407      3.67
2251    organic   j       compound    n        15622  600       6.93
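The rows above can be read with a short script. The sketch below rests on two assumptions that are not confirmed on this page: that the fields are whitespace-separated, and that pre-node counts the occurrences in which the collocate appears before the node word (the authoritative column descriptions are in the downloadable sample). The class and field names are hypothetical.

```python
from dataclasses import dataclass

# The six sample rows shown above, copied verbatim.
SAMPLE = """\
2251 organic j food n 29792 5000 3.95
2251 organic j matter n 21873 407 5.01
2251 organic j product n 18092 5037 2.56
2251 organic j natural j 17945 12084 4.29
2251 organic j material n 16771 1407 3.67
2251 organic j compound n 15622 600 6.93
"""


@dataclass
class CollocateRow:
    word_id: int
    node_word: str
    node_pos: str
    collocate: str
    coll_pos: str
    freq: int
    pre_node: int
    mut_info: float

    @property
    def pre_share(self) -> float:
        # Assumption: pre_node is the part of freq where the collocate
        # comes before the node word.
        return self.pre_node / self.freq


def parse_rows(text):
    """Parse whitespace-separated rows; skip lines without exactly 8 fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        word_id, node, node_pos, coll, coll_pos, freq, pre, mi = fields
        rows.append(
            CollocateRow(int(word_id), node, node_pos, coll, coll_pos,
                         int(freq), int(pre), float(mi))
        )
    return rows


rows = parse_rows(SAMPLE)
# List collocates from the strongest to the weakest Mutual Information score.
for row in sorted(rows, key=lambda r: r.mut_info, reverse=True):
    print(f"{row.collocate:<10}{row.mut_info:6.2f}{row.pre_share:8.1%}")
```

Read this way, "natural" precedes "organic" in about 67% of its co-occurrences, while "food" does so in only about 17%, which fits "organic food" being the usual word order.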