Collocates data

The following are the columns in the collocates files (see sample with 1.34 million node/collocate pairs; 1/10 of the total).

ID	The rank ID of the lemma, 1-60,000.
There are no entries for most of the 30-40 most frequent words (e.g. "the", "of", "with"), but all other words in the top 60,000 are included in the collocates data. This makes sense, at least from a simplistic psycholinguistic point of view. When most people hear "bread", they think of {loaf, slice, crumb, butter, eat}, etc. But it is not at all clear what they might think of when they hear "the", "of", or "with".
lemma	The "node word". Note that this is lemmatized, so that decide = {decide, decides, decided, deciding, etc.}
lemPoS	The part of speech of the node word
coll	The collocate (lemmatized).
collPoS	The part of speech of the collocate. (This is the first letter of the codes shown here.)
MI	The Mutual Information score (see https://www.english-corpora.org/mutualInformation.asp).
There are minimum values in terms of collocate frequency and Mutual Information (MI) score for inclusion in the list: ID 1-200: MI > 1.6 // ID 201-1000: MI > 2.0 // ID > 1000: MI > 2.4
freq	The frequency of the node / collocate pair.
[% coll < node]	The percentage of the tokens for this pair (node word and collocate) in which the collocate precedes the node word. This can be useful to distinguish, for example, subjects (which typically precede the verb) from objects (which typically follow it).
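
The columns above can be read into a small record type, which makes the ID-based MI cutoffs and the "collocate precedes node" percentage easy to apply. The following is only a minimal sketch under several assumptions that the description above does not confirm: that the file is tab-separated with the eight columns in the order listed, that it has no header row, and that the last column is stored as a fraction between 0 and 1. The names CollocateRow, min_mi_for_id, parse_rows and mostly_precedes are hypothetical, and the two sample rows are invented for illustration, not taken from the data. The minimum collocate frequency is not stated above, so it is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class CollocateRow:
    """One node/collocate pair; field order is an ASSUMPTION (see lead-in)."""
    id: int            # rank ID of the lemma, 1-60,000
    lemma: str         # node word (lemmatized)
    lem_pos: str       # part of speech of the node word
    coll: str          # collocate (lemmatized)
    coll_pos: str      # part of speech of the collocate
    mi: float          # Mutual Information score
    freq: int          # frequency of the node/collocate pair
    pct_before: float  # share of tokens where the collocate precedes the node


def min_mi_for_id(rank_id: int) -> float:
    """MI cutoff for inclusion, following the ID ranges given in the text."""
    if rank_id <= 200:
        return 1.6
    if rank_id <= 1000:
        return 2.0
    return 2.4


def parse_rows(text: str) -> list:
    """Parse tab-separated lines, skipping any line without 8 fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 8:
            continue
        rows.append(CollocateRow(
            int(fields[0]), fields[1], fields[2], fields[3], fields[4],
            float(fields[5]), int(fields[6]), float(fields[7]),
        ))
    return rows


def mostly_precedes(row: CollocateRow, cutoff: float = 0.5) -> bool:
    """True if the collocate precedes the node in more than `cutoff` of tokens.

    Assumes pct_before is a 0-1 fraction; rescale if the file uses 0-100.
    """
    return row.pct_before > cutoff


# Invented sample rows, for illustration only (not real corpus values).
SAMPLE = (
    "1234\tbread\tn\tslice\tn\t5.21\t310\t0.62\n"
    "1500\tdecide\tv\tjury\tn\t4.10\t95\t0.88\n"
)

rows = parse_rows(SAMPLE)
for row in rows:
    side = "before" if mostly_precedes(row) else "after"
    above_cutoff = row.mi > min_mi_for_id(row.id)
    print(row.lemma, row.coll, side, above_cutoff)
```

With a verb as the node word, a noun collocate that mostly precedes it is a candidate subject, and one that mostly follows it is a candidate object. The 0.5 cutoff is an arbitrary starting point and not a value taken from the data description.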