spacy_parse
parse_utils.spacy_parse(
corp,
nlp_model,
n_process=1,
batch_size=25,
disable_ner=True,
)
Parse a corpus (legacy public API).
This function is maintained for backward compatibility and now delegates to the new class-based pipeline (CorpusProcessor.process_corpus).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| corp | pl.DataFrame | DataFrame with ‘doc_id’ and ‘text’ columns. | required |
| nlp_model | Language | Loaded spaCy pipeline instance (e.g., the object returned by loading ‘en_core_web_sm’). | required |
| n_process | int | Number of processes passed to spaCy’s pipe. | 1 |
| batch_size | int | Batch size for spaCy pipe. | 25 |
| disable_ner | bool | Whether to disable spaCy’s named entity recognizer (NER) component while parsing. | True |
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | Token-level dependency parse output identical in schema to the legacy implementation. |
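
A minimal usage sketch, under stated assumptions, may help show how the pieces fit together. The helper `build_corpus_columns` and the wrapper `run_example` are hypothetical names introduced only for illustration, and the import path `from parse_utils import spacy_parse` is an assumption that may need adjusting to match how the package is installed. The sketch also assumes the `en_core_web_sm` model is available locally.

```python
def build_corpus_columns(docs):
    """Turn a {doc_id: text} mapping into the two columns the parser expects.

    Hypothetical helper: not part of the documented API.
    """
    doc_ids = list(docs)
    return {"doc_id": doc_ids, "text": [docs[d] for d in doc_ids]}


def run_example():
    """Parse a two-document corpus with the legacy entry point."""
    # Third-party imports are local so the helper above has no dependencies.
    import polars as pl
    import spacy

    # Assumption: adjust this import to wherever parse_utils lives.
    from parse_utils import spacy_parse

    corp = pl.DataFrame(
        build_corpus_columns(
            {
                "doc_1.txt": "The cat sat on the mat.",
                "doc_2.txt": "Parsing corpora takes time.",
            }
        )
    )
    nlp = spacy.load("en_core_web_sm")  # the model must already be installed

    # Arguments mirror the documented defaults.
    return spacy_parse(corp, nlp, n_process=1, batch_size=25, disable_ner=True)
```

Calling `run_example()` should return the token-level DataFrame described under Returns. Passing the defaults explicitly is optional; they are spelled out here only to make the signature visible.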