PybiberPipeline
pipeline.PybiberPipeline(
nlp=None,
model='en_core_web_sm',
disable_ner=True,
n_process=CONFIG.DEFAULT_N_PROCESS,
batch_size=CONFIG.DEFAULT_BATCH_SIZE,
show_progress=None,
)
End-to-end convenience wrapper for common pybiber workflows.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Optional[Language] | Pre-loaded spaCy model. If None, the model named by model will be loaded lazily on first use. | None |
| model | str | Name of the spaCy model to load when nlp is None. Defaults to “en_core_web_sm”. | 'en_core_web_sm' |
| disable_ner | bool | When True, disable spaCy’s NER component for speed and stability. Parser is still enabled and required. Defaults to True. | True |
| n_process | int | Number of processes to use for spaCy’s pipe. | CONFIG.DEFAULT_N_PROCESS |
| batch_size | int | Batch size for spaCy’s pipe. | CONFIG.DEFAULT_BATCH_SIZE |
| show_progress | Optional[bool] | Whether to show internal progress indicators when processing. If None, it is determined based on corpus size. | None |
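
The show_progress parameter is tri-state: an explicit True or False is honored, while None defers the decision to corpus size. The sketch below illustrates that pattern only. The function name `resolve_show_progress` and the cutoff `PROGRESS_THRESHOLD` are hypothetical, and the rule pybiber actually applies may differ.

```python
from typing import Optional

# Hypothetical cutoff chosen for illustration; not taken from pybiber.
PROGRESS_THRESHOLD = 100


def resolve_show_progress(
    show_progress: Optional[bool],
    n_docs: int,
    threshold: int = PROGRESS_THRESHOLD,
) -> bool:
    """Resolve a tri-state flag: an explicit value wins, None defers to corpus size."""
    if show_progress is None:
        # Assumed rule: only larger corpora get a progress indicator.
        return n_docs >= threshold
    return show_progress


print(resolve_show_progress(None, 5))     # False
print(resolve_show_progress(None, 500))   # True
print(resolve_show_progress(False, 500))  # False
```

Leaving the default at None lets small interactive runs stay quiet while long batch jobs still report progress, without the caller having to set the flag either way.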
Methods
| Name | Description |
|---|---|
| features | Compute Biber features from token-level parses. |
| from_folder | Read .txt files from a folder into a corpus DataFrame. |
| parse | Parse a corpus with spaCy using the configured settings. |
| run | Parse and compute features from an in-memory corpus DataFrame. |
| run_from_folder | Read, parse, and compute features from a folder of .txt files. |
| to_analyzer | Create a BiberAnalyzer from a Biber feature matrix. |
features
pipeline.PybiberPipeline.features(tokens, normalize=True, force_ttr=False)
Compute Biber features from token-level parses.
from_folder
pipeline.PybiberPipeline.from_folder(directory, recursive=False)
Read .txt files from a folder into a corpus DataFrame.
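
To make the effect of recursive concrete, here is a standard-library sketch of reading a folder of .txt files into one record per file. It is not pybiber's implementation: the helper `read_txt_folder` is hypothetical, it returns a plain list of dicts rather than a DataFrame, and the field names `doc_id` and `text` are assumptions.

```python
import tempfile
from pathlib import Path


def read_txt_folder(directory, recursive=False):
    """Collect .txt files as records; 'doc_id' and 'text' are assumed field names."""
    root = Path(directory)
    # rglob descends into subfolders, glob stays at the top level.
    paths = root.rglob("*.txt") if recursive else root.glob("*.txt")
    return [
        {"doc_id": p.name, "text": p.read_text(encoding="utf-8")}
        for p in sorted(paths)
    ]


# Usage: a throwaway folder with one top-level file and one nested file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("First document.", encoding="utf-8")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("Nested document.", encoding="utf-8")
    flat = read_txt_folder(tmp)
    deep = read_txt_folder(tmp, recursive=True)

print([r["doc_id"] for r in flat])  # ['a.txt']
print([r["doc_id"] for r in deep])  # ['a.txt', 'b.txt']
```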
parse
pipeline.PybiberPipeline.parse(corpus)
Parse a corpus with spaCy using the configured settings.
run
pipeline.PybiberPipeline.run(
corpus,
return_tokens=False,
normalize=True,
force_ttr=False,
)
Parse and compute features from an in-memory corpus DataFrame.
run_from_folder
pipeline.PybiberPipeline.run_from_folder(
directory,
recursive=False,
return_tokens=False,
normalize=True,
force_ttr=False,
)
Read, parse, and compute features from a folder of .txt files.
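
Going by the method descriptions, run_from_folder appears to chain from_folder, parse, and features. The sketch below spells out that composition under that assumption. `run_from_folder_stepwise` and `RecordingPipeline` are hypothetical names, the stub stands in for a real pipeline so the example runs without spaCy, and return_tokens is left out because its return shape is not documented here.

```python
def run_from_folder_stepwise(pipe, directory, recursive=False,
                             normalize=True, force_ttr=False):
    """Chain the three documented steps on any object exposing these methods."""
    corpus = pipe.from_folder(directory, recursive=recursive)  # read .txt files
    tokens = pipe.parse(corpus)                                # spaCy parse
    return pipe.features(tokens, normalize=normalize, force_ttr=force_ttr)


class RecordingPipeline:
    """Stand-in (not part of pybiber) that records the order of method calls."""

    def __init__(self):
        self.calls = []

    def from_folder(self, directory, recursive=False):
        self.calls.append("from_folder")
        return {"directory": directory, "recursive": recursive}

    def parse(self, corpus):
        self.calls.append("parse")
        return {"tokens_for": corpus}

    def features(self, tokens, normalize=True, force_ttr=False):
        self.calls.append("features")
        return {"features_of": tokens, "normalize": normalize,
                "force_ttr": force_ttr}


stub = RecordingPipeline()
result = run_from_folder_stepwise(stub, "corpus_dir", recursive=True)
print(stub.calls)  # ['from_folder', 'parse', 'features']
```

Calling the steps separately is useful when the intermediate corpus or token table needs inspecting. The single-call form is shorter when only the feature matrix matters.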
to_analyzer
pipeline.PybiberPipeline.to_analyzer(biber_df)
Create a BiberAnalyzer from a Biber feature matrix.