spacy_parse
parse_utils.spacy_parse(
corp,
nlp_model,
n_process=1,
batch_size=25,
disable_ner=True,
)
Parse a corpus (legacy public API).
This function is maintained for backward compatibility and now delegates to the new class-based pipeline (CorpusProcessor.process_corpus).
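The delegation described above can be pictured with a minimal, library-free sketch. Everything in it is a hypothetical stand-in: `SketchCorpusProcessor`, `spacy_parse_sketch`, the list-of-dicts corpus, and the whitespace "model" are assumptions made for illustration, and the way the real CorpusProcessor receives its options (constructor versus method arguments) is not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class SketchCorpusProcessor:
    """Hypothetical stand-in for the class-based pipeline (not the real class)."""

    n_process: int = 1
    batch_size: int = 25
    disable_ner: bool = True

    def process_corpus(self, corp, nlp_model):
        # Stand-in logic: call the "model" on each text and emit one row per token.
        rows = []
        for doc in corp:
            for i, tok in enumerate(nlp_model(doc["text"])):
                rows.append({"doc_id": doc["doc_id"], "token_id": i, "token": tok})
        return rows


def spacy_parse_sketch(corp, nlp_model, n_process=1, batch_size=25, disable_ner=True):
    """Legacy-style function that keeps the old call shape and forwards the work."""
    processor = SketchCorpusProcessor(
        n_process=n_process, batch_size=batch_size, disable_ner=disable_ner
    )
    return processor.process_corpus(corp, nlp_model)


# Assumed toy inputs: a list of records instead of a pl.DataFrame,
# and str.split instead of a spaCy Language object.
corp = [{"doc_id": "a", "text": "Hello world"}]
rows = spacy_parse_sketch(corp, str.split)
print(rows)
```

The point of the pattern is that existing callers keep the function-style call, while the behavior lives in one place (the class).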
Parameters
Name | Type | Description | Default |
---|---|---|---|
corp | pl.DataFrame | DataFrame with ‘doc_id’ and ‘text’ columns. | required |
nlp_model | Language | Loaded spaCy Language instance (e.g., the object returned by spacy.load("en_core_web_sm")), not the model name string. | required |
n_process | int | Number of processes passed to spaCy’s pipe. | 1 |
batch_size | int | Batch size for spaCy pipe. | 25 |
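To make the batch_size parameter concrete, here is a small sketch of what grouping texts into batches of 25 looks like. It only illustrates the idea of batching; it is not spaCy's internal implementation, and the helper name `iter_batches` is an assumption introduced for this example.

```python
from itertools import islice


def iter_batches(texts, batch_size=25):
    """Yield consecutive lists of at most `batch_size` texts."""
    it = iter(texts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# 60 toy documents split with the default batch size of 25.
texts = [f"document {i}" for i in range(60)]
sizes = [len(batch) for batch in iter_batches(texts, batch_size=25)]
print(sizes)  # [25, 25, 10]
```

Larger batches generally mean fewer hand-offs between the caller and the pipeline at the cost of more memory per batch; when n_process is greater than 1, batches are also the unit of work handed to each process.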
Returns
Name | Type | Description |
---|---|---|
pl.DataFrame | Token-level dependency parse output identical in schema to the legacy implementation. |