Collocations
The Collocates page helps you find words or tags that frequently appear near a chosen “node word” in your corpus. This is a powerful way to explore patterns of association and meaning.
What is a Collocation?
Important
A collocation is a pair or group of words (or tags) that tend to appear together more often than would be expected by chance. For example, “strong tea” and “make a decision” are common collocations in English.
What You Can Do
- Search for words or tags that frequently occur near a chosen node word
- Set the span (window) to the left and right of the node word
- Filter results by tag
- Choose the association statistic (NPMI, PMI, PMI2, PMI3)
- Download your results as an Excel file
Step-by-Step Guide
1. Enter a Node Word
- In the sidebar, enter the word you want to explore (the “node word”).
- You can optionally anchor your search to a specific tag (POS or DocuScope) for the node word.
2. Set the Span
- Choose how many words to the left and right of the node word to include in your search window.
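To make the idea of a span concrete, here is a minimal Python sketch of window-based counting. It is not the application's own code: the function name `count_collocates`, its parameters, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def count_collocates(tokens, node, left=4, right=4):
    """Count the tokens found within `left`/`right` positions of each node hit.

    Hypothetical helper for illustration; the real tool may count differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)                 # clip the window at the text start
        end = min(len(tokens), i + right + 1)    # clip the window at the text end
        for j in range(start, end):
            if j != i:                           # skip the node word itself
                counts[tokens[j]] += 1
    return counts

tokens = "she made strong tea and he made weak tea".split()
print(count_collocates(tokens, "tea", left=1, right=1))
```

With a span of one word on each side, only "strong", "and", and "weak" are counted. Widening the left span to two words would also bring in "made".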
3. Choose an Association Statistic
- PMI (Pointwise Mutual Information): Measures how much more often two words appear together than would be expected by chance. It tends to give high scores to rare co-occurrences, so low-frequency pairs can show up as spuriously strong associations.
- PMI2 / PMI3: Variants of PMI that raise the co-occurrence probability to the second or third power. This gives more weight to frequent pairs and makes the results less sensitive to very infrequent ones.
- NPMI (Normalized PMI): Rescales PMI to the range -1 to 1, where 0 means no association and 1 means the two items only ever occur together. This makes scores easier to compare across different word pairs and reduces the impact of rare events.
Tip
If you want to avoid results dominated by rare word pairs, try PMI2, PMI3, or NPMI instead of standard PMI.
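The four statistics can be compared with a short Python sketch using the standard textbook formulas. It assumes probabilities are estimated as simple relative frequencies (count divided by corpus size); the application may estimate them differently, for example by adjusting for the span. The function name `association_scores` and the counts are hypothetical.

```python
import math

def association_scores(f_xy, f_x, f_y, n):
    """Return PMI, PMI2, PMI3 and NPMI for one node/collocate pair.

    f_xy: co-occurrence count, f_x and f_y: individual counts, n: corpus size.
    Assumes 0 < f_xy < n and relative-frequency probability estimates.
    """
    p_xy, p_x, p_y = f_xy / n, f_x / n, f_y / n
    pmi = math.log2(p_xy / (p_x * p_y))
    return {
        "pmi": pmi,
        "pmi2": math.log2(p_xy ** 2 / (p_x * p_y)),  # joint probability squared
        "pmi3": math.log2(p_xy ** 3 / (p_x * p_y)),  # joint probability cubed
        "npmi": pmi / -math.log2(p_xy),              # normalized to [-1, 1]
    }

# A frequent pair: 64 co-occurrences in a 1,024-token corpus (made-up counts).
print(association_scores(64, 128, 128, 1024))
# {'pmi': 2.0, 'pmi2': -2.0, 'pmi3': -6.0, 'npmi': 0.5}

# A pair that occurs exactly once, always together (made-up counts).
print(association_scores(1, 1, 1, 1024))
# {'pmi': 10.0, 'pmi2': 0.0, 'pmi3': -10.0, 'npmi': 1.0}
```

In this toy comparison the one-off pair gets by far the highest PMI, while PMI3 ranks the frequent pair above it. How the statistics rank pairs in your own corpus depends on your data, so it is worth comparing them.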
4. (Optional) Anchor by Tag
- You can restrict your search to node words with a specific POS or DocuScope tag.
- For POS, you can choose between general or specific tags.
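Anchoring by tag can be pictured as keeping only the occurrences of the node word that carry the chosen tag. The Python sketch below assumes the corpus is available as (word, tag) pairs; the function name `node_positions` and the tag labels are hypothetical and do not reflect the tagsets the application uses.

```python
def node_positions(tagged_tokens, node, tag=None):
    """Return the positions of `node`, optionally restricted to one tag."""
    return [
        i
        for i, (word, word_tag) in enumerate(tagged_tokens)
        if word == node and (tag is None or word_tag == tag)
    ]

# Illustrative tags only.
tagged = [
    ("they", "PRON"), ("can", "AUX"), ("open", "VERB"),
    ("the", "DET"), ("can", "NOUN"),
]

print(node_positions(tagged, "can"))              # [1, 4]
print(node_positions(tagged, "can", tag="NOUN"))  # [4]
```

Without an anchor, both uses of "can" count as node words. With the anchor, only the noun does, so its collocates are not mixed with those of the auxiliary verb.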
5. Generate and Filter Results
- Click the Collocations button to generate your table.
- Filter the results by tag using the multiselect box above the table.
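If you work with a downloaded table outside the application, the same kind of tag filter can be reproduced in a few lines of Python. This is a sketch under assumptions: the column names (`token`, `tag`, `freq`, `npmi`), the function name `filter_rows`, and all values are made up for illustration and may not match the actual export.

```python
def filter_rows(rows, selected_tags, stat="npmi"):
    """Keep rows whose tag is selected, sorted by the statistic (highest first).

    An empty selection keeps every row, like an empty multiselect box.
    """
    kept = [r for r in rows if not selected_tags or r["tag"] in selected_tags]
    return sorted(kept, key=lambda r: r[stat], reverse=True)

# Made-up rows; not real results.
rows = [
    {"token": "strong", "tag": "ADJ", "freq": 12, "npmi": 0.41},
    {"token": "cup", "tag": "NOUN", "freq": 9, "npmi": 0.38},
    {"token": "drink", "tag": "VERB", "freq": 7, "npmi": 0.29},
    {"token": "green", "tag": "ADJ", "freq": 5, "npmi": 0.44},
]

for row in filter_rows(rows, {"ADJ"}):
    print(row["token"], row["npmi"])
```

Here only the two adjectives remain, with "green" listed before "strong" because its score is higher.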
6. Download Your Results
- Toggle Download to Excel? in the sidebar to enable download.
- Click Download to Excel to save your results.
7. Create a New Collocations Table
- Use the Create New Collocations Table button in the sidebar to reset and start a new search.
Understanding the Table
- Each row shows a collocate (word or tag), its frequency, and the chosen association statistic.
- Higher values indicate a stronger association with the node word.
Tips for New Users
Tip
- Try different statistics to see which gives the most meaningful results for your data.
- If you get unexpected results, try narrowing the span or anchoring by tag.
- Download your results often so you can experiment without losing your work.
If You Get Stuck
Important
- Make sure you have loaded and processed a target corpus.
- If you see warnings, check your node word and tag selections.
- Use the reset button on the Manage Corpus Data page if you need to start over.