PybiberPipeline

pipeline.PybiberPipeline(
    nlp=None,
    model='en_core_web_sm',
    disable_ner=True,
    n_process=CONFIG.DEFAULT_N_PROCESS,
    batch_size=CONFIG.DEFAULT_BATCH_SIZE,
    show_progress=None,
)

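End-to-end convenience wrapper for common pybiber workflows: reading a folder of .txt files into a corpus, parsing it with spaCy, computing Biber features, and handing the feature matrix to a BiberAnalyzer.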

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| nlp | Optional[Language] | Pre-loaded spaCy model. If None, the model named by the model parameter is loaded lazily on first use. | None |
| model | str | Name of the spaCy model to load when nlp is None. | 'en_core_web_sm' |
| disable_ner | bool | When True, disable spaCy’s NER component for speed and stability. The parser stays enabled because it is required. | True |
| n_process | int | Number of processes to use for spaCy’s pipe. | CONFIG.DEFAULT_N_PROCESS |
| batch_size | int | Batch size for spaCy’s pipe. | CONFIG.DEFAULT_BATCH_SIZE |
| show_progress | Optional[bool] | Whether to show internal progress indicators when processing. If None, the choice is made from the corpus size. | None |
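
The parameters above can be collected into one settings dictionary and overridden per call site. The following is a minimal sketch, not the library's own recipe: the import path `pybiber.pipeline` is inferred from the signature shown above, the helper names `pipeline_settings` and `make_pipeline` are hypothetical, and the numeric values are placeholders rather than the actual CONFIG defaults.

```python
# Illustrative settings; the numeric values are placeholders, not library defaults.
BASE_SETTINGS = {
    "model": "en_core_web_sm",
    "disable_ner": True,
    "n_process": 1,
    "batch_size": 32,
    "show_progress": False,
}


def pipeline_settings(**overrides):
    """Return the base settings with any caller overrides applied."""
    return {**BASE_SETTINGS, **overrides}


def make_pipeline(**overrides):
    """Construct a pipeline from the merged settings (hypothetical helper)."""
    # Import path assumed from the `pipeline.PybiberPipeline` signature.
    from pybiber.pipeline import PybiberPipeline

    return PybiberPipeline(**pipeline_settings(**overrides))


# Construct a pipeline only if pybiber can be imported in this environment.
try:
    pipe = make_pipeline(batch_size=64)
except ImportError:
    pipe = None
```

Because nlp is left as None here, the spaCy model named by model would only be loaded on first use, as described in the table. A pre-loaded spaCy model could be passed through nlp in the same way.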

Methods

| Name | Description |
| --- | --- |
| features | Compute Biber features from token-level parses. |
| from_folder | Read .txt files from a folder into a corpus DataFrame. |
| parse | Parse a corpus with spaCy using the configured settings. |
| run | Parse and compute features from an in-memory corpus DataFrame. |
| run_from_folder | Read, parse, and compute features from a folder of .txt files. |
| to_analyzer | Create a BiberAnalyzer from a Biber feature matrix. |
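
The individual methods can also be chained by hand when the intermediate token table is worth keeping, for example to recompute features with different options without parsing again. The sketch below assumes only the method names and keyword arguments listed on this page. The helper `run_staged` is hypothetical, and it accepts any object that exposes those three methods.

```python
def run_staged(pipe, directory, recursive=False, normalize=True, force_ttr=False):
    """Chain from_folder -> parse -> features, returning (tokens, features).

    `pipe` is expected to be a PybiberPipeline, but any object with the same
    three methods works, which makes the flow easy to exercise with a stub.
    """
    # Step 1: read the .txt files into a corpus table.
    corpus = pipe.from_folder(directory, recursive=recursive)
    # Step 2: parse the corpus with the pipeline's configured spaCy settings.
    tokens = pipe.parse(corpus)
    # Step 3: aggregate the token-level parses into Biber features.
    features = pipe.features(tokens, normalize=normalize, force_ttr=force_ttr)
    return tokens, features
```

With a real pipeline this might be called as `run_staged(PybiberPipeline(), "texts")`, where `texts` is a placeholder folder name. The one-shot run_from_folder method covers the same steps when the tokens are not needed separately.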

features

pipeline.PybiberPipeline.features(tokens, normalize=True, force_ttr=False)

Compute Biber features from token-level parses.

from_folder

pipeline.PybiberPipeline.from_folder(directory, recursive=False)

Read .txt files from a folder into a corpus DataFrame.

parse

pipeline.PybiberPipeline.parse(corpus)

Parse a corpus with spaCy using the configured settings.

run

pipeline.PybiberPipeline.run(
    corpus,
    return_tokens=False,
    normalize=True,
    force_ttr=False,
)

Parse and compute features from an in-memory corpus DataFrame.
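
As a rough illustration of the in-memory path, the sketch below builds a two-document corpus and passes it to run. Several details are assumptions rather than facts stated on this page: that the corpus is a polars DataFrame with `doc_id` and `text` columns, that the class is importable from `pybiber.pipeline`, and the helper names `corpus_columns` and `run_in_memory`. The pipeline call is skipped when pybiber, polars, or the spaCy model is not installed.

```python
import importlib.util

DOCS = {
    "doc_a": "The committee has decided that the proposal should be revised.",
    "doc_b": "I think you'd love it, but we really don't know yet.",
}


def corpus_columns(docs):
    """Split a {doc_id: text} mapping into parallel column lists."""
    return {"doc_id": list(docs), "text": list(docs.values())}


def run_in_memory(docs):
    """Build the corpus table and run the pipeline on it (hypothetical helper)."""
    import polars as pl  # assumed corpus container
    from pybiber.pipeline import PybiberPipeline  # assumed import path

    corpus = pl.DataFrame(corpus_columns(docs))  # assumed column names
    pipe = PybiberPipeline(model="en_core_web_sm", show_progress=False)
    return pipe.run(corpus, return_tokens=False, normalize=True, force_ttr=False)


# Run only when every dependency, including the spaCy model package, is present.
needed = ("pybiber", "polars", "en_core_web_sm")
if all(importlib.util.find_spec(name) is not None for name in needed):
    print(run_in_memory(DOCS))
```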

run_from_folder

pipeline.PybiberPipeline.run_from_folder(
    directory,
    recursive=False,
    return_tokens=False,
    normalize=True,
    force_ttr=False,
)

Read, parse, and compute features from a folder of .txt files.
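
A possible end-to-end use of run_from_folder is sketched below. It writes two small .txt files into a temporary folder and then processes that folder. The helper `write_corpus` and the file names are hypothetical, and the import path `pybiber.pipeline` is an assumption based on the signature above. The pipeline call is skipped when pybiber or the spaCy model is not installed.

```python
import importlib.util
import tempfile
from pathlib import Path


def write_corpus(directory, docs):
    """Write each text to <directory>/<name>.txt and return the file paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, text in docs.items():
        path = directory / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_corpus(tmp, {
        "memo": "The report was submitted by the committee on Friday.",
        "chat": "Well, I guess we could try that, couldn't we?",
    })
    written = sorted(p.name for p in paths)

    # Run only when pybiber and the spaCy model package are both present.
    needed = ("pybiber", "en_core_web_sm")
    if all(importlib.util.find_spec(name) is not None for name in needed):
        from pybiber.pipeline import PybiberPipeline  # assumed import path

        pipe = PybiberPipeline(show_progress=False)
        print(pipe.run_from_folder(tmp, recursive=False))
```

Setting recursive=True would presumably also pick up .txt files in subfolders, which suits corpora organized into one folder per category.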

to_analyzer

pipeline.PybiberPipeline.to_analyzer(biber_df)

Create a BiberAnalyzer from a Biber feature matrix.