Abstract
The RISE Humanities Data Benchmark is a framework and a collection of curated datasets for evaluating large language models (LLMs) on humanities-related tasks. The datasets are designed to be small and task-specific, each accompanied by a manually verified ground truth. An accompanying tool systematically submits the datasets to various LLM providers and models using shared prompts and configurations, then automatically scores the results against the ground truths. The results are published and made searchable through a web interface. The framework aims to promote greater reproducibility, transparency, and consistency in LLM-based data processing in the humanities.
