get_noun_phrases
parse_utils.get_noun_phrases(
corp,
nlp_model,
n_process=1,
batch_size=25,
disable_ner=True,
)

Extract expanded noun phrases using the ‘en_core_web_sm’ model.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| corp | pl.DataFrame | A polars DataFrame containing a ‘doc_id’ column and a ‘text’ column. | required |
| nlp_model | Language | An ‘en_core_web_sm’ instance. | required |
| n_process | int | The number of parallel processes to use during parsing. | 1 |
| batch_size | int | The batch size to use during parsing. | 25 |
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | A polars DataFrame with noun phrases and their associated part-of-speech tags. |
Notes
Noun phrases can be extracted directly from the noun_chunks attribute. However, per spaCy’s documentation, the attribute does not permit nested noun phrases, for example when a prepositional phrase modifies a preceding noun phrase. This function extracts elaborated noun phrases in their complete form.
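
The difference between a base noun chunk and an elaborated noun phrase can be illustrated with a small, dependency-free sketch. It is not this function's implementation, which is not shown here. The tokens and base chunk spans below are hand-tagged for illustration, and the helper names `expand_chunks` and `span_text` are hypothetical. The sketch simply merges a base chunk with any directly following "preposition + base chunk" sequence.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (text, coarse part-of-speech tag)
Span = Tuple[int, int]   # half-open token range [start, end)


def expand_chunks(tokens: List[Token], chunks: List[Span]) -> List[Span]:
    """Merge each base chunk with directly following 'ADP + chunk' sequences.

    Hypothetical helper for illustration only: it assumes every adjacent
    prepositional phrase modifies the preceding noun phrase.
    """
    ordered = sorted(chunks)
    expanded: List[Span] = []
    i = 0
    while i < len(ordered):
        start, end = ordered[i]
        i += 1
        # Absorb "preposition + next base chunk" while the pieces are adjacent.
        while (
            i < len(ordered)
            and end < len(tokens)
            and tokens[end][1] == "ADP"
            and ordered[i][0] == end + 1
        ):
            end = ordered[i][1]
            i += 1
        expanded.append((start, end))
    return expanded


def span_text(tokens: List[Token], span: Span) -> str:
    """Join the token texts covered by a span."""
    return " ".join(text for text, _ in tokens[span[0]:span[1]])


# Hand-tagged example sentence (assumed tags, not parser output).
tokens = [
    ("the", "DET"), ("history", "NOUN"), ("of", "ADP"),
    ("the", "DET"), ("English", "PROPN"), ("language", "NOUN"),
    ("is", "AUX"), ("long", "ADJ"),
]
# Base chunks as a non-nesting chunker might return them.
base_chunks = [(0, 2), (3, 6)]

for span in expand_chunks(tokens, base_chunks):
    print(span_text(tokens, span))  # the history of the English language
```

Merging on adjacency alone is a simplification: a prepositional phrase that attaches to a verb rather than to the preceding noun would be merged incorrectly, which is why a real implementation would be expected to consult the dependency parse.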