Model Comparison
DocuScope spaCy Models: Full vs. Common Dictionary
Important
DocuScope CA tags texts using a trained spaCy model.
Where this page refers to “dictionaries,” it means the DocuScope dictionaries that serve as the model's training data.
The “Full Dictionary” model uses the complete set of DocuScope tags, while the “Common Dictionary” model uses a reduced, more general set.
Model Overview
| Model Name | HuggingFace Model | Tagset | Description |
|---|---|---|---|
| Full Dictionary | en_docusco_spacy | Full DocuScope tagset | Most detailed; includes all original DocuScope categories. |
| Common Dictionary | en_docusco_spacy_cd | Common DocuScope tagset | Simplified; focuses on the most frequent and general categories. |
In addition to DocuScope categories, both models are trained to assign part-of-speech tags from the CLAWS7 tagset.
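The package names in the table are what you would pass to spaCy to load either model. The following is a minimal Python sketch, assuming spaCy is installed and the chosen model has been installed from HuggingFace as a Python package under that name. It also assumes, without confirmation from this page, that CLAWS7 tags are stored in each token's fine-grained `tag_` attribute and that DocuScope categories are exposed as entity labels (`ent_type_`). The helper names `model_name` and `tag_text` are hypothetical.

```python
# Minimal sketch. Assumptions: spaCy is installed, the chosen DocuScope model
# package has been installed from HuggingFace, CLAWS7 tags live in token.tag_,
# and DocuScope categories are exposed as entity labels (token.ent_type_).
# Check the model cards to confirm these details.

# Package names taken from the model overview table.
MODELS = {
    "full": "en_docusco_spacy",       # Full DocuScope tagset
    "common": "en_docusco_spacy_cd",  # Common DocuScope tagset
}


def model_name(dictionary: str) -> str:
    """Return the package name for the 'full' or 'common' dictionary model."""
    try:
        return MODELS[dictionary]
    except KeyError:
        raise ValueError(
            f"unknown dictionary {dictionary!r}; use 'full' or 'common'"
        ) from None


def tag_text(text: str, dictionary: str = "full"):
    """Tag text and return (token, CLAWS7 tag, DocuScope label) triples."""
    import spacy  # imported lazily so model_name() works without spaCy installed

    nlp = spacy.load(model_name(dictionary))
    doc = nlp(text)
    # ent_type_ is an empty string for tokens outside any labeled span.
    return [(tok.text, tok.tag_, tok.ent_type_) for tok in doc]
```

For example, `tag_text("The results suggest a clear trend.", dictionary="common")` would load the Common Dictionary model and return one triple per token; passing `dictionary="full"` tags the same text with the full tagset, which makes side-by-side comparison straightforward. Loading a model is relatively slow, so for many texts you would load it once and reuse the `nlp` object.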
Why Choose One Over the Other?
Choose the Full Dictionary model if:
- You need maximum detail and want to capture subtle rhetorical or genre distinctions.
- Your research or teaching requires the full range of DocuScope categories.
- You are analyzing academic, professional, or highly structured texts.
Choose the Common Dictionary model if:
- You want simpler, more interpretable output.
- You are working with general texts or need results that are easier to explain to non-specialists.
- You are doing exploratory analysis or classroom demonstrations.
Further Reading
- DocuScope spaCy Full Model on HuggingFace
- DocuScope spaCy Common Dictionary Model
- DocuScope Model Training Data
- Kaufer and Ishizaki: The Origins of DocuScope