spacy_parse

parse_utils.spacy_parse(corp, nlp_model, n_process=1, batch_size=25)

Parse a corpus using the ‘en_core_web_sm’ model.

Parameters

Name Type Description Default
corp pl.DataFrame A polars DataFrame containing a ‘doc_id’ column and a ‘text’ column. required
nlp_model Language A loaded spaCy ‘en_core_web_sm’ pipeline instance. required
n_process The number of parallel processes to use during parsing. 1
batch_size The batch size to use during parsing. 25

Returns

Name Type Description
pl.DataFrame A polars DataFrame with token sequences identified by part-of-speech tags and dependency parses.
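
The following is a minimal usage sketch, not an official example. It shows one way to assemble the required ‘doc_id’ and ‘text’ columns and pass them to spacy_parse together with a loaded model. The helper names build_corpus_columns and run_parse are hypothetical, and because this page does not state the import path for parse_utils, the sketch takes spacy_parse as an argument instead of importing it. It assumes polars, spaCy, and the ‘en_core_web_sm’ model are installed.

```python
def build_corpus_columns(docs):
    """Turn a {doc_id: text} mapping into the two columns spacy_parse expects.

    Hypothetical helper for illustration; not part of the documented API.
    """
    for doc_id, text in docs.items():
        if not isinstance(text, str):
            raise TypeError(f"text for {doc_id!r} must be a str")
    return {"doc_id": list(docs.keys()), "text": list(docs.values())}


def run_parse(spacy_parse, docs, n_process=1, batch_size=25):
    """Build the input frame, load the model, and call spacy_parse.

    `spacy_parse` is passed in by the caller (for example, after importing
    parse_utils from the package that provides it).
    """
    # Third-party imports are kept local so the helper above stays usable
    # even where polars or spaCy is not installed.
    import polars as pl
    import spacy

    corp = pl.DataFrame(build_corpus_columns(docs))
    nlp_model = spacy.load("en_core_web_sm")
    return spacy_parse(corp, nlp_model, n_process=n_process, batch_size=batch_size)
```

A call would then look like run_parse(parse_utils.spacy_parse, {"a.txt": "First document.", "b.txt": "Second document."}), with the defaults n_process=1 and batch_size=25 taken from the signature above.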