biber

parse_functions.biber(tokens, normalize=True, force_ttr=False, mattr_window=100)

Extract Biber features from a parsed corpus.

Parameters

Name Type Description Default
tokens pl.DataFrame A polars DataFrame with the output of the spacy_parse function. required
normalize Optional[bool] Normalize counts per 1000 tokens. True
force_ttr Optional[bool] Force the calculation of simple type-token ratio (TTR) rather than moving-average type-token ratio (MATTR). False
mattr_window int Window size (in tokens) for MATTR (moving-average TTR). If the shortest document in the corpus has fewer than mattr_window alphabetic tokens and force_ttr is False, the window is reduced to that minimum length with a warning. 100
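To make the normalize option concrete, the sketch below scales one raw feature count to a rate per 1,000 tokens. It assumes the denominator is the document's total token count. The helper normalize_per_1000 is hypothetical and is not part of parse_functions.

```python
def normalize_per_1000(count: int, n_tokens: int) -> float:
    """Hypothetical helper: convert a raw feature count to a rate per 1,000 tokens.

    Assumes n_tokens is the total number of tokens in the document.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    # Multiply first so that whole-number results stay exact in floating point.
    return count * 1000 / n_tokens


# e.g. 12 occurrences of a feature in a 2,400-token document
rate = normalize_per_1000(12, 2400)
```

Normalizing this way lets documents of different lengths be compared on the same scale: 12 hits in 2,400 tokens and 5 hits in 1,000 tokens both become 5.0.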

Returns

Name Type Description
pl.DataFrame A polars DataFrame of feature frequencies (raw counts, or rates per 1000 tokens when normalize is True).

Notes

MATTR is the default because it is less sensitive than TTR to variation in text length. For very short texts, MATTR depends on the chosen window size: if any document is shorter than the requested window, the window is reduced to the length of the shortest document (with a warning). Set force_ttr=True to always compute simple TTR.
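The window-reduction rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helpers ttr, mattr and lexical_diversity are hypothetical, each document is assumed to be a list of token strings, and lowercasing before counting types is an assumption. MATTR is computed naively here by averaging the TTR of every window of consecutive alphabetic tokens.

```python
import warnings
from typing import List, Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Simple type-token ratio: distinct types divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens: Sequence[str], window: int) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens."""
    n = len(tokens)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and the number of tokens")
    ratios = [ttr(tokens[i:i + window]) for i in range(n - window + 1)]
    return sum(ratios) / len(ratios)


def lexical_diversity(
    docs: List[List[str]], mattr_window: int = 100, force_ttr: bool = False
) -> List[float]:
    """Hypothetical per-document TTR/MATTR mirroring the documented window rule."""
    # Keep only alphabetic tokens; lowercasing is an assumption of this sketch.
    cleaned = [[t.lower() for t in doc if t.isalpha()] for doc in docs]
    if any(len(d) == 0 for d in cleaned):
        raise ValueError("every document needs at least one alphabetic token")
    if force_ttr:
        return [ttr(d) for d in cleaned]
    shortest = min(len(d) for d in cleaned)
    window = mattr_window
    if shortest < window:
        # Shrink the window to the shortest document, as the Notes describe.
        warnings.warn(
            f"Shortest document has {shortest} alphabetic tokens; "
            f"reducing MATTR window from {mattr_window} to {shortest}."
        )
        window = shortest
    return [mattr(d, window) for d in cleaned]
```

For example, with documents ["a", "b", "a", "b"] and ["a", "b", "c", "d", "e"] and the default window of 100, the window drops to 4 with a warning and the result is [0.5, 1.0]. A sliding window that recounts each slice is O(n·w); a production version would typically update a type counter incrementally instead.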