get_noun_phrases

parse_utils.get_noun_phrases(corp, nlp_model, n_process=1, batch_size=25)

Extract expanded noun phrases using the ‘en_core_web_sm’ model.

Parameters

Name Type Description Default
corp pl.DataFrame A polars DataFrame containing a ‘doc_id’ column and a ‘text’ column. required
nlp_model Language An ‘en_core_web_sm’ instance. required
n_process The number of parallel processes to use during parsing. 1
batch_size The batch size to use during parsing. 25

Returns

Name Type Description
pl.DataFrame A polars DataFrame with noun phrases and their associated part-of-speech tags.

Notes

Noun phrases can be extracted directly from the noun_chunks attribute of a spaCy Doc. However, per spaCy’s documentation, that attribute does not permit nested noun phrases, for example when a prepositional phrase modifies a preceding noun phrase. This function instead extracts elaborated noun phrases in their complete form.
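
The difference between flat noun chunks and expanded noun phrases can be illustrated with a small, self-contained sketch. It does not use spaCy and is not this function's actual implementation: the Token class, the hand-annotated TOKENS parse, and the helpers flat_chunks and expanded_phrases are all hypothetical names invented for illustration. The sketch assumes that an expanded phrase is the full dependency subtree of a noun, which is one plausible way to keep a trailing prepositional phrase attached.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the syntactic head (the root points to itself)


# Hypothetical hand-annotated parse of "the analysis of the data shows trends".
TOKENS = [
    Token("the", "DET", "det", 1),
    Token("analysis", "NOUN", "nsubj", 5),
    Token("of", "ADP", "prep", 1),
    Token("the", "DET", "det", 4),
    Token("data", "NOUN", "pobj", 2),
    Token("shows", "VERB", "ROOT", 5),
    Token("trends", "NOUN", "dobj", 5),
]


def subtree(tokens, i):
    """Return the sorted indices of token i and all of its descendants."""
    found = {i}
    changed = True
    while changed:
        changed = False
        for j, tok in enumerate(tokens):
            if j not in found and tok.head in found:
                found.add(j)
                changed = True
    return sorted(found)


def flat_chunks(tokens):
    """Flat chunks: each noun plus its left-hand modifiers, with no nesting."""
    modifiers = {"det", "amod", "compound"}
    chunks = []
    for i, tok in enumerate(tokens):
        if tok.pos == "NOUN":
            left = [j for j in range(i)
                    if tokens[j].head == i and tokens[j].dep in modifiers]
            chunks.append(" ".join(tokens[j].text for j in left + [i]))
    return chunks


def expanded_phrases(tokens):
    """Expanded phrases: the whole subtree of each noun, paired with its tags."""
    phrases = []
    for i, tok in enumerate(tokens):
        if tok.pos == "NOUN":
            idx = subtree(tokens, i)
            text = " ".join(tokens[j].text for j in idx)
            tags = " ".join(tokens[j].pos for j in idx)
            phrases.append((text, tags))
    return phrases


if __name__ == "__main__":
    print(flat_chunks(TOKENS))
    print(expanded_phrases(TOKENS))
```

In this toy parse the flat chunks split "the analysis" from "the data", while the expanded phrase for "analysis" keeps the prepositional phrase attached and carries the tag sequence alongside it.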