Collocates data

There are relatively few collocates dictionaries or lists, other than what we have here. Some sites advertise collocates dictionaries, but they are far too small to include many collocates for many words. For example, try smolder, adamantly, commemorative, boundless, or ebb: none of these words is found there, but all of them are found in our data, each with many collocates (up to 300 per word).

With the Oxford Collocations Dictionary, you would have to use their interface to copy and paste the collocates -- word by word by word, for tens of thousands of words. A second problem is that their data is quite sparse at times. For example, we took seven words at random from our sample list (words ~11,000-12,000), and the following chart shows the number of collocates for each word in our lists and in their dictionary.

# in file   word         PoS   Our lists   Oxford
10201       chopper      n     300         0
10401       intrude      v     300         0
10501       slowdown     n     300         0
10601       audible      j     300         21
10901       chute        n     300         5
11001       unreliable   j     300         3
12101       baffle       v     300         1

One other resource for collocates -- which we like very much (overall) -- is Sketch Engine. However, there are important differences between our lists and what you can retrieve from Sketch Engine:

  1. Depending on use, Sketch Engine can be very expensive -- several thousand dollars a year in some cases. Our lists are considerably less expensive.

  2. Most importantly, with our lists you can download all 13.5 million collocate pairs at once. With Sketch Engine, you must do it word ... by word ... by word -- for tens of thousands of different words. This could take weeks or even months. With our data, you'll have the entire list in a few minutes.
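
To illustrate why having the whole list locally matters, here is a minimal sketch of looking up every collocate for a word in one pass. It assumes a hypothetical tab-separated layout (word, part of speech, collocate, frequency); the actual file format is not described here and may differ. The rows in SAMPLE are invented placeholders, not real frequencies, and load_collocates is an illustrative helper name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: one collocate pair per line, tab-separated as
# word, part of speech, collocate, frequency. The real files may differ.
# These rows are invented placeholders, not real data.
SAMPLE = (
    "chopper\tn\tpolice\t120\n"
    "chopper\tn\trescue\t95\n"
    "audible\tj\tbarely\t210\n"
)


def load_collocates(handle):
    """Group collocates by (word, PoS), keeping the order found in the file."""
    table = defaultdict(list)
    for word, pos, collocate, freq in csv.reader(handle, delimiter="\t"):
        table[(word, pos)].append((collocate, int(freq)))
    return table


# With a real download, open the file instead of using io.StringIO.
collocates = load_collocates(io.StringIO(SAMPLE))
print(collocates[("chopper", "n")])
```

With the full list loaded this way, any word can be queried instantly, with no word-by-word retrieval through a web interface.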