N-grams & Clusters
The N-gram and Cluster Frequency page lets you explore common word/tag sequences (n-grams) and search for clusters containing specific words or tags in your corpus.
What You Can Do
- Generate frequency tables for n-grams (sequences of 2, 3, or 4 tags)
- Search for clusters containing a specific word or tag
- Filter n-grams by tag at each position
- Download your results as an Excel file
Step 1: Choose Table Type
When you first visit the page, you’ll be asked to choose a table type:
- N-grams: Find the most frequent sequences of tags (e.g., POS or DocuScope tags).
- Clusters: Find all n-grams that contain a specific word or tag at a chosen position.
Important
What is an “n-gram”?
An n-gram is a sequence of n consecutive items (words or tags) in your text. For example, a 3-gram (trigram) could be “in the house” or “NOUN VERB NOUN”.
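To make the definition concrete, here is a minimal Python sketch of how n-grams can be extracted and counted. It is an illustration only, not the application's own code; the `ngrams` helper and the sample tokens and tags are hypothetical.

```python
from collections import Counter


def ngrams(items, n):
    """Return every run of n consecutive items as a tuple (hypothetical helper)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Made-up sample data: one short sentence and an illustrative tag per token.
tokens = ["the", "cat", "sat", "in", "the", "house"]
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]

# Word trigrams, in the order they occur in the text.
trigrams = ngrams(tokens, 3)

# Frequency table of tag bigrams, most frequent first.
tag_bigram_counts = Counter(ngrams(tags, 2))
for gram, freq in tag_bigram_counts.most_common():
    print(" ".join(gram), freq)
```

A frequency table like the one the page generates is, in essence, this kind of count taken over the whole corpus.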
Step 2: Set Your Options
For N-grams
- Span: Choose the length (2, 3, or 4).
- Tagset: Choose between Parts-of-Speech or DocuScope tags.
- Click N-grams Table to generate the table.
For Clusters
- Search mode: Choose to search by a specific token (word) or tag.
- Node word/tag: Enter the word or select the tag you want to anchor your search.
- Search type (for tokens): Choose Fixed, Starts with, Ends with, or Contains.
- Tagset: Choose Parts-of-Speech or DocuScope.
- Span & position: Choose the n-gram length and the position within the n-gram where your anchor word or tag must appear.
- Click Clusters Table to generate the table.
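The cluster options above can be pictured with the following Python sketch. It assumes that Fixed means an exact match, that matching ignores case, and that positions are counted from 1; none of these details is confirmed by this guide, and the `matches` and `clusters` functions are hypothetical, not part of the application.

```python
from collections import Counter


def matches(token, node, search_type):
    """Test one token against the node word (case-insensitive by assumption)."""
    token, node = token.lower(), node.lower()
    if search_type == "fixed":          # assumed to mean an exact match
        return token == node
    if search_type == "starts_with":
        return token.startswith(node)
    if search_type == "ends_with":
        return token.endswith(node)
    if search_type == "contains":
        return node in token
    raise ValueError(f"unknown search type: {search_type}")


def clusters(tokens, node, span, position, search_type="fixed"):
    """Count n-grams of length `span` whose token at `position` (1-based) matches `node`."""
    counts = Counter()
    for i in range(len(tokens) - span + 1):
        gram = tuple(tokens[i:i + span])
        if matches(gram[position - 1], node, search_type):
            counts[gram] += 1
    return counts


# Made-up sample text.
tokens = ["the", "house", "on", "the", "hill", "and", "the", "houses"]

# Bigrams with "the" anchored in the first position.
the_first = clusters(tokens, "the", span=2, position=1)

# Bigrams whose second word starts with "hous".
hous_second = clusters(tokens, "hous", span=2, position=2, search_type="starts_with")
```

Changing `position` moves the anchor within the n-gram, which is what the Span & position setting controls.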
Step 3: Filter and Explore
- Once your table is generated, use the filters at the top to focus on specific tags at each position in the n-gram.
- The table updates automatically as you filter.
Tip:
Filtering by tag position helps you find patterns, like all trigrams starting with a verb or ending with a specific rhetorical tag.
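As a rough sketch of what filtering by position does, the Python below keeps only the rows of a small tag-trigram table that satisfy a filter at each chosen position. The `filter_by_position` function, the tags, and the frequencies are all made up for illustration and do not reflect the application's internals.

```python
def filter_by_position(rows, filters):
    """Keep rows whose tag at every filtered position is in the allowed set.

    rows:    list of (tags, frequency) pairs, where tags is a tuple
    filters: dict mapping a 1-based position to a set of allowed tags
    """
    return [
        (tags, freq)
        for tags, freq in rows
        if all(tags[pos - 1] in allowed for pos, allowed in filters.items())
    ]


# Illustrative trigram table with invented frequencies.
rows = [
    (("VERB", "DET", "NOUN"), 12),
    (("DET", "ADJ", "NOUN"), 9),
    (("VERB", "ADP", "DET"), 4),
]

# All trigrams that start with a verb.
verb_first = filter_by_position(rows, {1: {"VERB"}})

# Trigrams that start with a verb and end with a noun.
verb_to_noun = filter_by_position(rows, {1: {"VERB"}, 3: {"NOUN"}})
```

Filters on different positions combine, so each added filter narrows the table further.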
Step 4: Download Your Table
- Toggle Download to Excel? in the sidebar to enable download.
- Click Download to Excel to save your results for further analysis.
Generating a New Table
- Use the Create a New Ngrams Table button in the sidebar to reset and start over with new settings.
Tips for New Users
- Try generating both n-gram and cluster tables to see which is more useful for your research.
- If you don’t see any data, make sure you have loaded and processed a target corpus.
- Download your results often so you can experiment without losing your work.
If You Get Stuck
Important
- Use the reset button on the Manage Corpus Data page if you need to start over.
- If you see warnings, check that your corpus is loaded and processed, and that your search settings are valid.