A Comprehensive Digital Corpus of Song Ci Poetry for Computational Analysis

By: Yao Song  
Open Access | Jan 2026

Abstract

This paper presents a comprehensive, open-access digital corpus of Song Ci poetry, comprising over 20,000 poems from the Song Dynasty. The corpus was created by aggregating and standardizing texts from multiple public-domain sources into a single, curated SQLite database (‘ci_curated.db’). Each poem record includes the author’s name, rhythmic title (cipai), and poem text in machine-readable form. As a large-scale, well-documented resource, the dataset supports a wide range of computational tasks in the digital humanities, including authorship attribution, stylometry, and the training of language models on classical Chinese, and it facilitates reproducible and comparative research.
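
To make the record structure concrete, the following is a minimal sketch of how a database of this shape might be queried for simple stylometric counts. The abstract names the three fields but not the schema, so the table name `poems` and the column names `author`, `cipai`, and `text` are assumptions, and the three sample rows are well-known opening lines inserted for illustration rather than records read from ‘ci_curated.db’.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: the table name `poems` and the column names
# `author`, `cipai`, `text` are assumptions, not taken from the paper.
SCHEMA = """
CREATE TABLE poems (
    id     INTEGER PRIMARY KEY,
    author TEXT NOT NULL,
    cipai  TEXT NOT NULL,
    text   TEXT NOT NULL
)
"""

# Illustrative rows only (famous opening lines), not an excerpt of the corpus.
SAMPLE_ROWS = [
    ("苏轼", "水调歌头", "明月几时有，把酒问青天。"),
    ("苏轼", "念奴娇", "大江东去，浪淘尽，千古风流人物。"),
    ("李清照", "声声慢", "寻寻觅觅，冷冷清清，凄凄惨惨戚戚。"),
]


def open_demo_db():
    """Build an in-memory database with the assumed schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO poems (author, cipai, text) VALUES (?, ?, ?)",
        SAMPLE_ROWS,
    )
    conn.commit()
    return conn


def poems_per_author(conn):
    """Return a dict mapping each author to their number of poems."""
    cursor = conn.execute(
        "SELECT author, COUNT(*) FROM poems GROUP BY author"
    )
    return dict(cursor.fetchall())


def char_frequencies(conn, author):
    """Count CJK characters across one author's poems, skipping punctuation."""
    counts = Counter()
    query = "SELECT text FROM poems WHERE author = ?"
    for (text,) in conn.execute(query, (author,)):
        # Keep only characters in the basic CJK Unified Ideographs block.
        counts.update(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return counts


if __name__ == "__main__":
    conn = open_demo_db()
    print(poems_per_author(conn))
    print(char_frequencies(conn, "李清照").most_common(3))
```

To try this against the published file, one would replace `":memory:"` with the path to the database and adjust the table and column names to whatever the actual schema uses.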

DOI: https://doi.org/10.5334/johd.405 | Journal eISSN: 2059-481X
Language: English
Submitted on: Oct 8, 2025 | Accepted on: Dec 8, 2025 | Published on: Jan 16, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Yao Song, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.