biber
parse_functions.biber(tokens, normalize=True, force_ttr=False, mattr_window=100)
Extract Biber features from a parsed corpus.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| tokens | pl.DataFrame | A polars DataFrame with the output of the spacy_parse function. | required |
| normalize | Optional[bool] | Normalize counts per 1000 tokens. | True |
| force_ttr | Optional[bool] | Force the calculation of type-token ratio (TTR) rather than moving-average type-token ratio (MATTR). | False |
| mattr_window | int | Window size (in tokens) for MATTR (moving-average TTR). If the shortest document in the corpus has fewer than mattr_window alphabetic tokens and force_ttr is False, the window is reduced to that minimum length with a warning. | 100 |
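As an illustration of the normalize option, the following minimal sketch scales a raw count to a rate per 1,000 tokens. The helper name per_thousand is hypothetical and not part of the library, and which tokens the library counts in the denominator is not stated here, so treat this as an approximation of the idea rather than the actual implementation.

```python
def per_thousand(count: int, n_tokens: int) -> float:
    """Scale a raw feature count to a rate per 1,000 tokens.

    Hypothetical helper for illustration only; the library's own
    choice of denominator may differ.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    # Multiply before dividing to keep the arithmetic exact where possible.
    return count * 1000 / n_tokens


# A made-up document: 12 occurrences of some feature in 800 tokens.
rate = per_thousand(12, 800)
print(rate)
```

Normalized rates make documents of different lengths comparable, which is why normalization is on by default.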
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | A polars DataFrame with counts of feature frequencies. |
Notes
MATTR is the default because it is less sensitive than TTR to variation in text length. For very short texts, MATTR depends on the chosen window size; if any document is shorter than the requested window, the window is reduced to the shortest document length (with a warning). Set force_ttr=True to always compute simple TTR.
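The difference between the two measures, and the window reduction described above, can be sketched in plain Python. The functions ttr, mattr, and effective_window are hypothetical names written for this illustration; they are not the library's internals, and details such as tokenization, case folding, and the filtering of alphabetic tokens are assumed away.

```python
import warnings


def ttr(tokens: list[str]) -> float:
    """Simple type-token ratio: distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens: list[str], window: int = 100) -> float:
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(tokens)
    if n == 0:
        return 0.0
    w = min(window, n)  # a window can never exceed the text itself
    ratios = [len(set(tokens[i:i + w])) / w for i in range(n - w + 1)]
    return sum(ratios) / len(ratios)


def effective_window(docs: list[list[str]], window: int = 100) -> int:
    """Shrink the window to the shortest document, warning when that happens."""
    shortest = min(len(doc) for doc in docs)
    if shortest < window:
        warnings.warn(
            f"Window reduced from {window} to {shortest} tokens "
            "to fit the shortest document."
        )
        return shortest
    return window


# Toy corpus (made-up data, already tokenized and lowercased).
docs = [
    "the cat sat on the mat the dog sat on the log".split(),
    "a bird sang in the tree".split(),
]
w = effective_window(docs, window=100)
for doc in docs:
    print(len(doc), round(ttr(doc), 3), round(mattr(doc, w), 3))
```

Because every window has the same size, MATTR compares texts on equal footing, whereas simple TTR tends to fall as a text grows and common words repeat. When the window equals the full length of a document, the two measures coincide.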