Collocations

Under construction.

The Collocates page helps you find words or tags that frequently appear near a chosen “node word” in your corpus. This is a powerful way to explore patterns of association and meaning.


What is a Collocation?

Important

A collocation is a pair or group of words (or tags) that tend to appear together more often than would be expected by chance. For example, “strong tea” and “make a decision” are common collocations in English.


What You Can Do

  • Search for words or tags that frequently occur near a chosen node word
  • Set the span (window) to the left and right of the node word
  • Filter results by tag
  • Choose the association statistic (NPMI, PMI, PMI2, PMI3)
  • Download your results as an Excel file

Step-by-Step Guide

1. Enter a Node Word

  • In the sidebar, enter the word you want to explore (the “node word”).
  • You can optionally anchor your search to a specific tag (POS or DocuScope) for the node word.

2. Set the Span

  • Choose how many words to the left and right of the node word to include in your search window.
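
To make the idea of a span concrete, here is a minimal sketch of how collocates might be collected from a window around each occurrence of the node word. It is an illustration only, not the application's actual code: the function name collocates_in_span, its parameters, and the sample sentence are all hypothetical.

```python
from collections import Counter

def collocates_in_span(tokens, node, left=4, right=4):
    """Count tokens within `left`/`right` positions of each occurrence of `node`.

    Hypothetical helper for illustration; the app may count differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)               # clip the window at the start of the text
        end = min(len(tokens), i + right + 1)  # clip the window at the end of the text
        for j in range(start, end):
            if j != i:                         # skip the node word itself
                counts[tokens[j]] += 1
    return counts

tokens = "she made a decision and he made a strong tea".split()

# A narrow span only sees the word immediately to the right of "made".
narrow = collocates_in_span(tokens, "made", left=0, right=1)

# A wider span also picks up "decision", "strong", and "tea".
wide = collocates_in_span(tokens, "made", left=1, right=3)
```

A wider span captures more distant associations but also lets in more incidental words, which is why adjusting the span is a useful first step when results look noisy.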

3. Choose an Association Statistic

  • PMI (Pointwise Mutual Information): Measures how much more often two words appear together than would be expected by chance. It is very sensitive to rare co-occurrences, which can sometimes highlight spurious associations.
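  • PMI2 / PMI3: Variations of PMI that raise the co-occurrence probability to the second or third power, which gives more weight to frequent pairs. These provide more stable results by reducing the influence of very infrequent pairs.
  • NPMI (Normalized PMI): Rescales PMI to the range -1 to 1, where 1 means the two words only ever occur together and 0 means they co-occur about as often as chance predicts. This makes scores easier to compare across different word pairs and reduces the impact of rare events.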
Tip

If you want to avoid results dominated by rare word pairs, try PMI2, PMI3, or NPMI instead of standard PMI.
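
The difference between the statistics can be seen in a small worked sketch. It uses the standard textbook definitions of PMI, PMI2, PMI3, and NPMI with probabilities estimated as simple relative frequencies; the application's own counting (for example, how the span enters the calculation) may differ. The function name association_scores, its parameters, and the frequencies are made up for illustration.

```python
import math

def association_scores(pair_freq, node_freq, collocate_freq, total):
    """Textbook association scores from raw frequencies (illustrative only)."""
    p_xy = pair_freq / total       # estimated probability of the pair co-occurring
    p_x = node_freq / total        # estimated probability of the node word
    p_y = collocate_freq / total   # estimated probability of the collocate
    pmi = math.log2(p_xy / (p_x * p_y))
    pmi2 = math.log2(p_xy ** 2 / (p_x * p_y))
    pmi3 = math.log2(p_xy ** 3 / (p_x * p_y))
    npmi = pmi / -math.log2(p_xy)  # normalizes PMI into the range -1 to 1
    return {"PMI": pmi, "PMI2": pmi2, "PMI3": pmi3, "NPMI": npmi}

# Invented counts: the same node word (2,000 occurrences in 1,000,000 tokens)
# paired with a very rare collocate and with a common one.
rare = association_scores(pair_freq=2, node_freq=2_000, collocate_freq=3, total=1_000_000)
common = association_scores(pair_freq=500, node_freq=2_000, collocate_freq=1_000, total=1_000_000)

for name in ("PMI", "PMI2", "PMI3", "NPMI"):
    print(f"{name:5s} rare={rare[name]:8.3f}  common={common[name]:8.3f}")
```

With these invented counts, plain PMI ranks the rare pair (seen only twice) above the common pair, while PMI2, PMI3, and NPMI all rank the common pair higher.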

4. (Optional) Anchor by Tag

  • You can restrict your search to node words with a specific POS or DocuScope tag.
  • For POS, you can choose between general or specific tags.

5. Generate and Filter Results

  • Click the Collocations button to generate your table.
  • Filter the results by tag using the multiselect box above the table.

6. Download Your Results

  • Toggle Download to Excel? in the sidebar to enable download.
  • Click Download to Excel to save your results.

7. Create a New Collocations Table

  • Use the Create New Collocations Table button in the sidebar to reset and start a new search.

Understanding the Table

  • Each row shows a collocate (word or tag), its frequency, and the chosen association statistic.
  • Higher values indicate a stronger association with the node word.

Tips for New Users

Tip
  • Try different statistics to see which gives the most meaningful results for your data.
  • If you get unexpected results, try narrowing the span or anchoring by tag.
  • Download your results often so you can experiment without losing your work.

If You Get Stuck

Important
  • Make sure you have loaded and processed a target corpus.
  • If you see warnings, check your node word and tag selections.
  • Use the reset button on the Manage Corpus Data page if you need to start over.