Download Corpus Data
The Download Corpus Files page lets you save your processed corpus and related data tables to your computer. You can choose between downloading just the tokenized corpus or all processed data, in formats suitable for future analysis or use in other tools.
What You Can Do
- Download your target or reference corpus as a single file for future use in this tool
- Download all processed data (including frequency tables and document-term matrices) for use in other environments (like R or Python)
- Choose between CSV and Parquet formats for bulk downloads
Step-by-Step Guide
1. Choose a Corpus
- Select Target or Reference corpus using the radio buttons.
- If you have not loaded a reference corpus, you will see a message explaining how to do so.
2. Choose Data to Download
- Corpus file only: Download just the tokenized corpus as a .parquet file. Use this if you plan to reload your corpus into this tool later.
- All of the processed data: Download a ZIP archive containing the token file, frequency tables, and document-term matrices. Use this if you want to analyze your data in other tools (like R or Python).
3. Choose File Format (for all data)
- Select CSV or PARQUET as the format for your ZIP archive.
4. Download
- Click the Download button in the sidebar to save your files.
- The download will start automatically.
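Once the ZIP archive is saved, you can inspect it from Python before loading anything. The following is a minimal sketch that uses only the standard library and assumes you chose the CSV format. The helper names `list_tables` and `read_csv_table` are hypothetical, and the file names inside the archive depend on what the tool produces, so list them first rather than guessing.

```python
import csv
import io
import zipfile


def list_tables(zip_path):
    """Return the names of the files stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def read_csv_table(zip_path, member):
    """Read one CSV member of the archive into a list of row dictionaries."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Decode the raw bytes as text; UTF-8 is an assumption here.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

Call `list_tables` with the path to your downloaded archive to see which tables it holds, then pass one of those names to `read_csv_table`. All values come back as strings, so convert counts to numbers as needed.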
Tips for New Users
Tip
- If you want to continue your analysis in this tool later, download the corpus file only.
- If you want to use your data in other software, download all of the processed data in your preferred format.
- CSV files are widely compatible; Parquet files are more efficient for large datasets and work well with Python and R.
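If you work in Python, a small loader can pick the right reader from the file extension, so the same code handles either format. This is a sketch, assuming pandas is installed (plus a Parquet engine such as pyarrow for .parquet files). The names `READERS`, `reader_name_for`, and `load_table` are hypothetical and not part of this tool.

```python
from pathlib import Path

# Map each supported extension to the pandas reader that handles it.
READERS = {".csv": "read_csv", ".parquet": "read_parquet"}


def reader_name_for(path):
    """Return the name of the pandas reader for a downloaded file."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or path}")
    return READERS[suffix]


def load_table(path):
    """Load a CSV or Parquet file into a pandas DataFrame."""
    reader = reader_name_for(path)
    # Imported here so the extension check above works without pandas.
    import pandas as pd
    return getattr(pd, reader)(path)
```

Pass `load_table` the path to an extracted table or to the corpus .parquet file. Unsupported extensions raise a `ValueError` instead of failing inside pandas.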
If You Get Stuck
Important
- Make sure you have processed a target corpus before trying to download.
- If you see a warning about missing data, use the Load Data button in the sidebar.
- If you have not loaded a reference corpus, you can do so from the Manage Corpus Data page.