Abstract
Benchmarking for literary analysis is complicated by a persistent mismatch between the fixed context windows of classification models and the emergent properties of literary forms. Here, I approach this challenge by reconsidering semantic parallelism in Chinese regulated verse (lüshi 律詩) as a problem of scale. I first employ a “teacher” model to label parallelism at the couplet (meso) level and then test which “student” model architecture—micro (character), meso (couplet), or macro (poem)—most effectively recovers this labeling rule. The experiment points to a Goldilocks hypothesis: performance is maximized when the classifier is structurally aligned with the scale at which the feature has been encoded. This finding yields further practical insights: (1) bottom-up aggregation of local predictions sacrifices raw performance but offers greater interpretability by exposing the specific decisions of a misaligned model; (2) top-down inference requires additional training compute to compensate for global noise and achieve performance comparable to that of aligned models; (3) if the goal is to better understand how artificial intelligence represents specific literary phenomena internally (“vector poetics”), aligned classifiers afford the most direct and promising access. By examining different forms of (mis)alignment between texts and models, the study invites discussion of whether meaningful benchmarking requires matching the computational “unit of analysis” with the humanistic “unit of inquiry.”
