spacy_parse
parse_utils.spacy_parse(corp, nlp_model, n_process=1, batch_size=25)
Parse a corpus using the ‘en_core_web_sm’ model.
Parameters
Name | Type | Description | Default |
corp | pl.DataFrame | A polars DataFrame containing a ‘doc_id’ column and a ‘text’ column. | required |
nlp_model | Language | An ‘en_core_web_sm’ instance. | required |
n_process | | The number of parallel processes to use during parsing. | 1 |
batch_size | | The batch size to use during parsing. | 25 |
Returns
Name | Type | Description |
| pl.DataFrame | A polars DataFrame with token sequences identified by part-of-speech tags and dependency parses. |
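Examples
A minimal usage sketch, not taken from the package itself. The helper names `build_rows` and `parse_corpus` are hypothetical, and `spacy_parse` is passed in as an argument because the import path depends on how the package is installed. The sketch assumes polars, spaCy, and the ‘en_core_web_sm’ model are available in the environment.

```python
def build_rows(texts):
    """Turn a {doc_id: text} mapping into row dicts with the two
    columns the parser expects: 'doc_id' and 'text'.
    (Hypothetical helper, for illustration only.)"""
    return [{"doc_id": doc_id, "text": text} for doc_id, text in texts.items()]


def parse_corpus(spacy_parse, texts, n_process=1, batch_size=25):
    """Build the input DataFrame, load the model, and call spacy_parse.

    spacy_parse is the function documented above, supplied by the caller.
    Third-party imports are kept local so that build_rows can be used
    without polars or spaCy installed.
    """
    import polars as pl
    import spacy

    corp = pl.DataFrame(build_rows(texts))  # columns: doc_id, text
    nlp_model = spacy.load("en_core_web_sm")  # the model must be installed
    return spacy_parse(corp, nlp_model, n_process=n_process, batch_size=batch_size)


rows = build_rows({"doc_01.txt": "The cat sat.", "doc_02.txt": "Dogs bark loudly."})
```

Calling `parse_corpus(spacy_parse, texts)` with the defaults corresponds to `spacy_parse(corp, nlp_model, n_process=1, batch_size=25)`. Raising `n_process` is only likely to help on larger corpora, since starting worker processes has its own cost.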