Technical Notes
Version 0.2.0 Changes
Lightweight Dependencies
Starting with version 0.2.0, google_ngrams has been redesigned to use minimal dependencies while maintaining full functionality. The package now includes custom implementations that replace the previous dependencies on scipy and statsmodels.
What Changed
- Hierarchical Clustering: Custom implementation in vnc_helpers.py replaces scipy’s clustering functions
- Smoothing: Cubic regression splines with ridge regularization in scatter_helpers.py replace statsmodels GAM fitting
- Package Size: Significantly reduced installation footprint
- Performance: Maintained or improved performance for core VNC and smoothing operations
Benefits
- Faster Installation: scipy and statsmodels no longer need to be installed to use google_ngrams
- Reduced Conflicts: Fewer dependency conflicts in virtual environments
- Consistent Behavior: Custom implementations ensure consistent results across different platforms
- Maintained Functionality: All user-facing functions work exactly the same way
Supported Methods
The package continues to support all the same visualization and analysis methods:
- timeviz_vnc(): Variability-based neighbor clustering dendrograms
- timeviz_scatterplot(): Scatterplots with smoothed fits using cubic splines
- timeviz_barplot(): Bar plots for frequency data
- timeviz_screeplot(): Scree plots for cluster analysis
- cluster_summary(): Cluster analysis results
Implementation Details
VNC Clustering
The VNC implementation follows Gries and Hilpert’s original methodology exactly:
- Distances calculated using standard deviations or coefficients of variation
- Hierarchical clustering merges only chronologically adjacent periods, so leaf order is preserved for periodization analysis
- Custom dendrogram truncation preserves temporal relationships
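The merging procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in vnc_helpers.py: it assumes the distance between two neighboring clusters is the sample standard deviation (or coefficient of variation) of their pooled values, and the function name `vnc` and its `n_clusters` parameter are hypothetical.

```python
from statistics import mean, stdev


def vnc(values, method="sd", n_clusters=1):
    """Variability-based neighbor clustering of a chronological series.

    Hypothetical helper for illustration, not the package's vnc_helpers API.
    Returns (merges, periods): merges is a list of (start, end, distance)
    tuples in merge order; periods lists the (start, end) index ranges that
    remain once only n_clusters clusters are left.
    """
    if method not in ("sd", "cv"):
        raise ValueError("method must be 'sd' or 'cv'")

    def distance(a, b):
        # Assumed distance: variability of the two clusters' pooled values.
        pooled = a[2] + b[2]
        sd = stdev(pooled)  # sample SD (n - 1 denominator)
        return sd if method == "sd" else sd / mean(pooled)

    # Each cluster is (first_index, last_index, pooled_values).
    clusters = [(i, i, [v]) for i, v in enumerate(values)]
    merges = []
    while len(clusters) > max(n_clusters, 1):
        # Only neighboring clusters are candidates, so time order is kept.
        dists = [distance(clusters[i], clusters[i + 1])
                 for i in range(len(clusters) - 1)]
        i = min(range(len(dists)), key=dists.__getitem__)
        left, right = clusters[i], clusters[i + 1]
        merges.append((left[0], right[1], dists[i]))
        # Replace the two neighbors with their union, in place.
        clusters[i:i + 2] = [(left[0], right[1], left[2] + right[2])]
    periods = [(c[0], c[1]) for c in clusters]
    return merges, periods
```

Because only neighbors are ever compared, every cluster (and every period returned for a given `n_clusters`) is a contiguous run of time points, which is what makes the resulting tree usable for periodization. Note that `method="cv"` divides by the pooled mean, so this sketch assumes non-zero means.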
Cubic Spline Smoothing
The smoothing implementation uses:
- Truncated power basis with interior knots
- Ridge regularization for stability
- Bootstrap confidence intervals
- Automatic handling of edge cases and numerical stability
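The ingredients listed above can be combined into a small NumPy sketch. It is not the code in scatter_helpers.py; it assumes knots at evenly spaced quantiles of x, a ridge penalty on the truncated-power coefficients only (so the cubic polynomial part is not shrunk), and a percentile bootstrap over resampled (x, y) pairs. The names `truncated_power_basis`, `fit_ridge_spline`, and `bootstrap_band`, and their parameters and defaults, are hypothetical.

```python
import numpy as np


def truncated_power_basis(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, then (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, 4, increasing=True)  # columns 1, x, x^2, x^3
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** 3
    return np.hstack([poly, trunc])


def fit_ridge_spline(x, y, n_knots=5, lam=1e-6):
    """Ridge-penalized cubic regression spline (hypothetical helper).

    Interior knots sit at evenly spaced quantiles of x; only the knot
    coefficients are penalized. Returns a function evaluating the fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()
    scale = (hi - lo) or 1.0  # guard against a constant x
    xs = (x - lo) / scale  # rescale to [0, 1] for better conditioning
    knots = np.quantile(xs, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = truncated_power_basis(xs, knots)
    penalty = np.diag([0.0] * 4 + [lam] * len(knots))
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)

    def predict(x_new):
        xn = (np.asarray(x_new, dtype=float) - lo) / scale
        return truncated_power_basis(xn, knots) @ beta

    return predict


def bootstrap_band(x, y, grid, n_boot=200, alpha=0.05, seed=0, **fit_kw):
    """Percentile bootstrap band from refitting on resampled (x, y) pairs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))  # sample rows with replacement
        curves[b] = fit_ridge_spline(x[idx], y[idx], **fit_kw)(grid)
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper
```

Rescaling x to [0, 1] before building the basis keeps the polynomial columns on a comparable scale, and the small ridge term keeps the normal equations solvable even when bootstrap resamples produce duplicated or nearly coincident knots. The `lam` default here is an arbitrary illustrative value that would need tuning in practice.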
These technical details are transparent to users: all functions work the same way as before, just with a lighter dependency footprint.