spacy_parse

parse_utils.spacy_parse(
    corp,
    nlp_model,
    n_process=1,
    batch_size=25,
    disable_ner=True,
)

Parse a corpus (legacy public API).

This function is maintained for backward compatibility; it delegates to the class-based pipeline (CorpusProcessor.process_corpus).

Parameters

Name Type Description Default
corp pl.DataFrame DataFrame with ‘doc_id’ and ‘text’. required
nlp_model Language Loaded spaCy model instance (e.g., the object returned by spacy.load(‘en_core_web_sm’)), not the model name. required
n_process int Number of processes passed to spaCy’s pipe. 1
batch_size int Batch size for spaCy pipe. 25
disable_ner bool Whether to disable spaCy’s named-entity recognition component while parsing. True

Returns

Name Type Description
pl.DataFrame Token-level dependency parse output identical in schema to the legacy implementation.
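
Example

The following is a minimal usage sketch, not an excerpt from the library. It assumes the module is importable as pybiber.parse_utils and that the ‘en_core_web_sm’ model is installed; both are assumptions, as are the helper names build_corpus_columns and parse_texts, which are hypothetical and introduced only for illustration. It shows one way to build the required ‘doc_id’ and ‘text’ columns and to pass a loaded model rather than a model name.

```python
def build_corpus_columns(texts: dict[str, str]) -> dict[str, list[str]]:
    """Arrange raw texts into the two columns the parser expects.

    Hypothetical helper: keys become 'doc_id' values, values become 'text'.
    """
    doc_ids = list(texts.keys())
    return {"doc_id": doc_ids, "text": [texts[d] for d in doc_ids]}


def parse_texts(texts: dict[str, str], model_name: str = "en_core_web_sm"):
    """Hypothetical wrapper that loads a model and calls spacy_parse."""
    # Third-party imports are local so the helper above works without them.
    import polars as pl
    import spacy
    from pybiber import parse_utils  # assumed import path

    # Build the input DataFrame with the 'doc_id' and 'text' columns.
    corp = pl.DataFrame(build_corpus_columns(texts))

    # spacy_parse takes a loaded Language object, so load the model first.
    nlp = spacy.load(model_name)

    # n_process and batch_size are set to the documented defaults.
    return parse_utils.spacy_parse(corp, nlp, n_process=1, batch_size=25)
```

Calling parse_texts({"a.txt": "Hello world."}) would then return the token-level DataFrame described under Returns, provided the assumptions above hold.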