Model Comparison

DocuScope spaCy Models: Full vs. Common Dictionary

Important

DocuScope CA runs on a model.
When “dictionaries” are referenced, they refer to the model training data.
The “Full Dictionary” model uses the complete set of DocuScope tags, while the “Common Dictionary” model uses a reduced, more general set.

Model Overview

Model Name HuggingFace Link Tagset Description
Full Dictionary en_docusco_spacy Full DocuScope tagset Most detailed, includes all original DocuScope categories.
Common Dictionary en_docusco_spacy_cd Common DocuScope tagset Simplified, focuses on the most frequent and general categories.

Both models are trained on the CLAWS7 part-of-speech tagset.


Tagset Differences

Full Dictionary

  • Scope: Contains the entire set of DocuScope tags, as used in the original dictionary-lookup technology.
  • Categories: Includes fine-grained distinctions (e.g., Citation Authorized vs. Citation Hedged, Confidence High vs. Confidence Low), specialized academic moves, and nuanced rhetorical features.
  • Use Case: Best for detailed rhetorical, genre, or discourse analysis where subtle distinctions matter.

Common Dictionary

  • Scope: Uses a reduced set of the most common and broadly applicable DocuScope tags.
  • Categories: Focuses on general rhetorical and linguistic features, omitting some of the more specialized or rare categories.
  • Use Case: Ideal for general text analysis, classroom use, or when interpretability and simplicity are priorities.

Why Choose One Over the Other?

Choose the Full Dictionary model if:

  • You need maximum detail and want to capture subtle rhetorical or genre distinctions.
  • Your research or teaching requires the full range of DocuScope categories.
  • You are analyzing academic, professional, or highly structured texts.

Choose the Common Dictionary model if:

  • You want a simpler, more interpretable output.
  • You are working with general texts or need results that are easier to explain to non-specialists.
  • You are doing exploratory analysis or classroom demonstrations.

Example: Tagset Comparison

Tag (Full) Present in Common? Description (abbreviated)
Academic Terms ✔️ Specialized terms
Citation Authorized (Merged into Citation)
Confidence High/Low (Generalized)
Metadiscourse Cohesive ✔️ Cohesive markers
Strategic ✔️ Strategy/goal language
Updates (Omitted)

The Common Dictionary omits or merges some specialized categories for clarity and ease of use.


Further Reading