get_noun_phrases
parse_utils.get_noun_phrases(corp, nlp_model, n_process=1, batch_size=25)
Extract expanded noun phrases using the ‘en_core_web_sm’ model.
Parameters
Name | Type | Description | Default |
---|---|---|---|
corp | pl.DataFrame | A polars DataFrame containing a ‘doc_id’ column and a ‘text’ column. | required |
nlp_model | Language | An ‘en_core_web_sm’ instance. | required |
n_process | int | The number of parallel processes to use during parsing. | 1 |
batch_size | int | The batch size to use during parsing. | 25 |
Returns
Type | Description |
---|---|
pl.DataFrame | A polars DataFrame with noun phrases and their associated part-of-speech tags. |
Notes
Noun phrases can be extracted directly from the noun_chunks attribute. However, per spaCy’s documentation, the attribute does not permit nested noun phrases, for example when a prepositional phrase modifies a preceding noun phrase. This function extracts elaborated noun phrases in their complete form.
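The difference between a base noun chunk and an elaborated noun phrase can be illustrated with a small, dependency-free sketch. It is not the implementation of get_noun_phrases and does not call spaCy: the Token class, the hand-annotated parse in SENT, and the helper functions are all hypothetical, chosen only to show how including a noun's full subtree picks up a trailing prepositional phrase that a non-nested chunk leaves out.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    head: int  # index of the syntactic head (the root points to itself)
    dep: str   # dependency label


# Hand-annotated parse of "The author of the book wrote a sequel".
# Hypothetical data: a real pipeline would obtain this from a parser.
SENT = [
    Token("The", "DET", 1, "det"),
    Token("author", "NOUN", 5, "nsubj"),
    Token("of", "ADP", 1, "prep"),
    Token("the", "DET", 4, "det"),
    Token("book", "NOUN", 2, "pobj"),
    Token("wrote", "VERB", 5, "ROOT"),
    Token("a", "DET", 7, "det"),
    Token("sequel", "NOUN", 5, "dobj"),
]


def subtree(tokens, i):
    """Return the sorted indices of token i and all of its descendants."""
    found = {i}
    grew = True
    while grew:
        grew = False
        for j, tok in enumerate(tokens):
            if j not in found and tok.head in found:
                found.add(j)
                grew = True
    return sorted(found)


def base_chunks(tokens):
    """Non-nested chunks: from the noun's left edge up to the noun itself."""
    chunks = []
    for i, tok in enumerate(tokens):
        if tok.pos == "NOUN":
            left = subtree(tokens, i)[0]
            chunks.append(" ".join(t.text for t in tokens[left:i + 1]))
    return chunks


def expanded_phrases(tokens):
    """Elaborated phrases: the noun's whole subtree, including modifiers
    that follow it, such as a prepositional phrase."""
    phrases = []
    for i, tok in enumerate(tokens):
        if tok.pos == "NOUN":
            span = subtree(tokens, i)
            phrases.append(" ".join(t.text for t in tokens[span[0]:span[-1] + 1]))
    return phrases


print(base_chunks(SENT))       # ['The author', 'the book', 'a sequel']
print(expanded_phrases(SENT))  # ['The author of the book', 'the book', 'a sequel']
```

In this toy parse, the base chunks split ‘The author’ from ‘the book’, while the expanded form keeps ‘The author of the book’ together as one phrase.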