(1) Overview
Akira Kurosawa’s films have been widely studied, yet openly available, filmography-wide quantitative datasets that support reproducible analysis of his visual style remain scarce. Existing computational approaches tend toward one of two extremes: labor-intensive manual cinemetrics that scales poorly (Tsivian, 2009), or frame-level annotation workflows that provide rich visual labels but do not directly represent shot-level structure and editing patterns (e.g., Savardi et al., 2021; Argaw et al., 2022). Moreover, single-value summaries such as average shot length (ASL) can obscure skewness and within-film variation, motivating distributional and structural measures (Redfern, 2023). Drawing on “distant viewing” principles (Arnold & Tilton, 2019), we release a shot-level dataset covering 30 Kurosawa feature films, including shot boundaries and durations, per-shot color descriptors, and face-derived proxies for facial presence and shot scale. The release contains derived numerical measurements only and includes no audiovisual content, enabling open redistribution and reuse for quantitative film-style research, methodological validation, and teaching.
Repository location
Context
This dataset was produced as part of an independent research project on computational film-style analysis. As of the dataset release date (2025-12-31), no peer-reviewed publications based on this dataset have been published.
(2) Method
Steps
Figure 1 summarizes the automated pipeline for generating shot-level “syntax proxy” measures. We apply TransNet v2 (Souček & Lokoč, 2024) for shot boundary detection and discard shots shorter than 0.5 s. From each retained shot, five frames are sampled at fixed fractional positions (t = 1/6, 2/6, 3/6, 4/6, 5/6 of shot duration) and decoded with OpenCV (Bradski, 2000). We then extract (i) full-frame face statistics with RetinaFace (Deng et al., 2020), using the maximum ratio of face bounding-box height to frame height as a face-size proxy; and (ii) color metrics from a central region of interest (ROI) obtained by removing 12.5% from each of the four borders, which reduces the influence of subtitles, watermarks, and border artifacts (Figure 2).
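For concreteness, the following minimal Python sketch illustrates the fixed-position frame sampling step. The function name and interface are illustrative only and do not correspond to the released script’s API.

```python
# Minimal sketch of fixed-position frame sampling (illustrative interface,
# not the released generate_kurosawa_dataset.py API).
import cv2

FRACTIONS = (1/6, 2/6, 3/6, 4/6, 5/6)  # fixed relative positions within a shot

def sample_shot_frames(video_path, start_s, end_s):
    """Decode five frames at t = 1/6 ... 5/6 of the shot's duration."""
    cap = cv2.VideoCapture(video_path)
    duration = end_s - start_s
    frames = []
    for frac in FRACTIONS:
        cap.set(cv2.CAP_PROP_POS_MSEC, (start_s + frac * duration) * 1000.0)
        ok, frame = cap.read()  # frame is a BGR uint8 array
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```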

Figure 1
Overview of the shot-level data extraction pipeline.

Figure 2
Schematic definition of the region of interest (ROI) used for color feature extraction. The ROI is obtained by removing 12.5% from all four borders of the frame.
On the ROI, we compute HSV saturation (S) and CIELAB-derived brightness (L*) and warmth (b* − 128) using a 1% trimmed mean, and average the frame-level values to shot level. On the full frame, RetinaFace yields a per-frame face count and the face-size proxy; shot-level summaries include face coverage (the fraction of the five sampled frames containing at least one face) and the median face-size proxy. Shot-scale labels are assigned only when face coverage is ≥0.6 (≥3 of 5 frames); otherwise, the shot is coded as No_Face/Other. We also export auxiliary variables (e.g., raw and truncated ASL, flags for shots >40 s, and log-transformed shot duration) with all shot records to CSV.
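A hedged sketch of the per-frame color computation is shown below. It assumes OpenCV’s 8-bit CIELAB encoding (L* scaled to 0–255; a*/b* stored with a +128 offset, hence the b* − 128 warmth term) and reads the 1% trimmed mean as trimming 1% from each tail; helper names are ours, not the released script’s.

```python
# Sketch of per-frame ROI color metrics, assuming OpenCV's 8-bit Lab
# encoding (a*/b* offset by +128); helper names are illustrative.
import cv2
import numpy as np
from scipy.stats import trim_mean

def roi_color_metrics(frame_bgr, border=0.125):
    """Trimmed-mean saturation, brightness, and warmth over the central ROI."""
    h, w = frame_bgr.shape[:2]
    dy, dx = int(h * border), int(w * border)
    roi = frame_bgr[dy:h - dy, dx:w - dx]  # remove 12.5% from each border
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(roi, cv2.COLOR_BGR2LAB)
    tmean = lambda ch: trim_mean(ch.astype(np.float64).ravel(), 0.01)  # 1% per tail
    return {
        "saturation": tmean(hsv[:, :, 1]),      # HSV S channel
        "brightness": tmean(lab[:, :, 0]),      # L* (0-255 scale)
        "warmth": tmean(lab[:, :, 2]) - 128.0,  # b* - 128
    }
```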
Sampling strategy
This dataset is a film-level census of all 30 feature films directed by Akira Kurosawa. For each film, TransNet v2 (Souček & Lokoč, 2024) produces a complete shot list with start/end timestamps and durations; shots shorter than 0.5 s are excluded from the released shot-level dataset and therefore from film-level summaries.
We report two film-level pacing statistics: raw ASL (mean duration of retained shots, ≥0.5 s) and a truncated ASL (mean after additionally excluding shots longer than 40 s) as a sensitivity analysis. Importantly, shots >40 s are kept in the shot-level dataset but flagged, as they may reflect genuine long takes or occasional boundary misses (e.g., gradual transitions). Reporting both values provides a conservative range and supports auditing of extreme-duration candidates.
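Both pacing statistics reduce to simple means over a film’s retained shot list. A minimal sketch, assuming `durations` holds the retained shot durations in seconds (all ≥0.5 s); names are ours:

```python
# Raw vs. truncated film-level ASL (sketch; variable names are illustrative).
def film_asl(durations, cap_s=40.0):
    """Return (raw ASL, truncated ASL) for one film's retained shots."""
    raw = sum(durations) / len(durations)
    kept = [d for d in durations if d <= cap_s]  # exclude >40 s long-take candidates
    truncated = sum(kept) / len(kept)
    return raw, truncated
```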
For within-shot visual measurements, we use deterministic systematic sampling (Figure 3). From each retained shot, five frames are sampled at uniformly spaced relative positions (1/6 to 5/6 of shot duration) to avoid boundary-adjacent frames, and the same sampled frames are reused across feature types. Color features (HSV saturation; CIELAB-derived brightness and warmth) are computed per sampled frame and averaged to shot-level values.

Figure 3
Schematic illustration of deterministic within-shot sampling (five frames at 1/6–5/6 of shot duration) and the reuse of the same frames for color and face/shot-scale feature extraction.
From the same frames, RetinaFace (Deng et al., 2020) detects faces and records the face count and face-size proxy. A shot is labeled face-present if faces are detected in at least 3 of the five frames (coverage ≥0.6). For face-present shots, shot scale is assigned from the median face-size proxy using the thresholds <0.13 (Long/Full), 0.13–0.25 (Medium), 0.25–0.45 (Medium Close-up), and ≥0.45 (Close-up); all other shots are coded as No_Face/Other.
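The gating and threshold logic can be summarized in a few lines. The sketch below is illustrative and assumes the coverage value and median proxy have already been computed per shot:

```python
# Illustrative shot-scale assignment from face coverage and the median
# face-size proxy (max face-box height / frame height).
def shot_scale(face_coverage, median_face_size):
    if face_coverage < 0.6:  # fewer than 3 of 5 frames with a face
        return "No_Face/Other"
    if median_face_size < 0.13:
        return "Long/Full"
    if median_face_size < 0.25:
        return "Medium"
    if median_face_size < 0.45:
        return "Medium Close-up"
    return "Close-up"
```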
To contextualize this proxy-based labeling choice, we contrast it with human-annotated alternatives that differ in scope and scalability. CineScale (Savardi et al., 2021) provides frame-level labels across nine categories; AVE (Argaw et al., 2022) annotates shot size as one of eight attributes using a professional workflow; and ShotBench (Liu et al., 2025) benchmarks visual grammar for VLMs yet still reports fine-grained inter-category confusion. Unlike these resources, our dataset prioritizes a scalable, filmography-wide operational proxy derived from face size. Because it targets reproducible statistics rather than an exhaustive taxonomy, we (i) gate labels by face coverage, (ii) retain a No_Face/Other code, and (iii) report both exact and adjacent agreement.
All analysis scripts and configuration files used to generate the dataset are publicly released to support reproducibility.
Quality control
As an external benchmark for pacing, we compared film-level ASL against Cinemetrics for all 30 films, prioritizing Barry Salt’s entries (n = 11) and otherwise using community-contributed values (n = 7); 12 films had no matching entry. To mitigate under-segmentation around gradual transitions, we report raw ASL (shots ≥0.5 s) and truncated ASL (excluding shots >40 s) as a plausible range. Of the 18 films with references, 16 fall within this range; the remaining discrepancies are flagged for manual audit (Table 1).
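The audit rule applied in Table 1 is a simple interval check; a sketch, with hypothetical names:

```python
# A reference ASL "falls within range" if it lies between the truncated
# and raw ASL for that film (sketch; names are hypothetical).
def reference_within_range(reference_asl, raw_asl, truncated_asl):
    lo, hi = sorted((truncated_asl, raw_asl))
    return lo <= reference_asl <= hi
```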
Table 1
Film-level ASL for 30 Kurosawa films (raw and truncated) compared with Cinemetrics references (Salt prioritized; otherwise community). “Not checked” indicates the community column was not consulted because a Salt value exists; “Missing” indicates no matching entry.
| MOVIES | RAW ASL | TRUNCATED ASL | ASL RANGE (RAW–TRUNCATED) | BARRY SALT | COMMUNITY | SOURCE TIER |
|---|---|---|---|---|---|---|
| Sanshiro Sugata, 1943 | 10.0s | 8.9s | 10.0–8.9s | NA | NA | Missing |
| The Most Beautiful, 1944 | 10.3s | 7.5s | 10.3–7.5s | NA | NA | Missing |
| Sanshiro Sugata Part II, 1945 | 11.1s | 8.1s | 11.1–8.1s | NA | 10.4s | Community |
| No Regrets for Our Youth, 1946 | 11.0s | 8.4s | 11.0–8.4s | NA | NA | Missing |
| One Wonderful Sunday, 1947 | 13.8s | 9.9s | 13.8–9.9s | NA | NA | Missing |
| Drunken Angel, 1948 | 14.1s | 11.1s | 14.1–11.1s | NA | 12.9s | Community |
| The Quiet Duel, 1949 | 20.2s | 10.6s | 20.2–10.6s | NA | NA | Missing |
| Stray Dog, 1949 | 14.3s | 8.5s | 14.3–8.5s | NA | 13.0s | Community |
| Rashomon, 1950 | 13.3s | 11.0s | 13.3–11.0s | 12.5s | Not checked | Barry Salt |
| Scandal, 1950 | 16.6s | 10.5s | 16.6–10.5s | NA | NA | Missing |
| The Idiot, 1951 | 17.6s | 10.4s | 17.6–10.4s | NA | NA | Missing |
| Ikiru, 1952 | 18.3s | 11.1s | 18.3–11.1s | 16.1s | Not checked | Barry Salt |
| The Men Who Tread on the Tiger’s Tail, 1952 | 13.8s | 9.9s | 13.8–9.9s | NA | NA | Missing |
| Seven Samurai, 1954 | 9.1s | 7.8s | 9.1–7.8s | 8.0s | Not checked | Barry Salt |
| I Live in Fear, 1955 | 26.7s | 13.6s | 26.7–13.6s | NA | 24.0s | Community |
| Throne of Blood, 1957 | 13.1s | 9.3s | 13.1–9.3s | 12.5s | Not checked | Barry Salt |
| The Lower Depths, 1957 | 17.3s | 10.2s | 17.3–10.2s | NA | NA | Missing |
| The Hidden Fortress, 1958 | 11.0s | 7.8s | 11.0–7.8s | 10.2s | Not checked | Barry Salt |
| The Bad Sleep Well, 1960 | 17.9s | 10.1s | 17.9–10.1s | NA | 16.5s | Community |
| Yojimbo, 1961 | 14.0s | 9.6s | 14.0–9.6s | 12.9s | Not checked | Barry Salt |
| Sanjuro, 1962 | 13.2s | 10.0s | 13.2–10.0s | NA | 13.8s | Community |
| High and Low, 1963 | 17.5s | 11.9s | 17.5–11.9s | NA | 17.5s | Community |
| Red Beard, 1965 | 21.8s | 11.1s | 21.8–11.1s | 21.0s | Not checked | Barry Salt |
| Dodes’ka-den, 1970 | 19.0s | 10.4s | 19.0–10.4s | 20.0s | Not checked | Barry Salt |
| Dersu Uzala, 1975 | 20.5s | 11.4s | 20.5–11.4s | 20.1s | Not checked | Barry Salt |
| Kagemusha, 1980 | 11.0s | 8.2s | 11.0–8.2s | 10.5s | Not checked | Barry Salt |
| Ran, 1985 | 11.1s | 7.8s | 11.1–7.8s | NA | NA | Missing |
| Dreams, 1990 | 17.1s | 10.7s | 17.1–10.7s | 14.0s | Not checked | Barry Salt |
| Rhapsody in August, 1991 | 16.5s | 10.0s | 16.5–10.0s | NA | NA | Missing |
| Madadayo, 1993 | 13.0s | 9.1s | 13.0–9.1s | NA | NA | Missing |
To assess internal reliability, we manually annotated 62 shots sampled across films and visual conditions (black-and-white vs. color; dialogue-driven vs. action). Shot-boundary detection achieved F1 = 0.91 within a ±0.5 s tolerance. Using the “face-present” rule (≥3/5 frames), face detection reached shot-level F1 = 0.96. Face-size–based shot-scale classification achieved 82% exact and 87% adjacent accuracy (±1 scale bin).
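For readers reproducing the shot-scale evaluation, exact and adjacent agreement can be computed as below, assuming the four face-present scale labels are encoded as ordered integer bins 0–3 (Long/Full to Close-up); the encoding is our illustrative convention, not part of the released data.

```python
# Exact and adjacent (+/- 1 bin) agreement on ordinal shot-scale bins
# (0 = Long/Full ... 3 = Close-up); encoding is an illustrative convention.
def scale_agreement(pred_bins, gold_bins):
    n = len(gold_bins)
    exact = sum(p == g for p, g in zip(pred_bins, gold_bins)) / n
    adjacent = sum(abs(p - g) <= 1 for p, g in zip(pred_bins, gold_bins)) / n
    return exact, adjacent
```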
Taken together, the Cinemetrics comparison and our manual audit support the reliability of the released measures. Remaining discrepancies—often linked to gradual transitions or challenging face conditions—are handled via conservative reporting (raw vs. truncated ASL) and flagged for targeted review. Face-derived shot scale should be interpreted as an operational proxy.
(3) Dataset Description
Repository name
Zenodo
Object name
kurosawa_shot_level_data.csv;
Kurosawa30.zip;
generate_kurosawa_dataset.py
Format names and versions
CSV (UTF-8);
Python script (.py) [Python 3.10]
Creation dates
2025-12-22 to 2025-12-29
Dataset creators
Xueran Wu (Luoyang Normal University; principal investigator; data curation).
Language
English
License
Creative Commons Attribution 4.0 International
Publication date
2025-12-31
Notes
Kurosawa30.zip contains 30 per-film shot-level CSV files (one CSV per film).
(4) Reuse Potential
This dataset provides a filmography-wide resource for quantitative research on Akira Kurosawa’s 30 feature films. The timestamps and derived metrics enable pacing analysis (e.g., ASL profiles; Figure 4) and stylistic clustering, bridging classical scholarship with recent digital humanities and computational film-style research. Specifically, the face-derived shot scale and face-size proxy offer granular evidence for revisiting Stephen Prince’s (1991) formalist observations, while cutting rates allow a re-evaluation of the rhythmic patterns described by Donald Richie (1996). Methodologically, these structured metrics support modern “distant viewing” frameworks and answer recent calls for distributional data beyond simple averages. Finally, this quantitative groundwork complements recent reception studies (e.g., Martin, 2017), which emphasize that Kurosawa’s global appeal rests on visual aesthetics, a dimension for which this dataset enables systematic quantitative characterization.

Figure 4
Film-level average shot length (ASL) across 30 Akira Kurosawa feature films in the dataset.
Users should note that unlike human-annotated datasets such as CineScale (frame-level) or AVE (rich taxonomy), our labels are face-derived operational proxies emphasizing person-centric framing. Consequently, shots dominated by landscapes or heavy occlusion may be coded as No_Face/Other. Limitations also include the absence of audiovisual content and residual automation errors (e.g., under-segmentation in gradual transitions). Researchers requiring exhaustive annotation may use this dataset as a scalable, low-cost baseline for training dedicated classifiers.
AI assistance
Generative AI tools were used to assist with language polishing and code refactoring under the author’s direction. The author reviewed all outputs and takes full responsibility. No AI tools were used to generate or manipulate the research data, measurements, figures, or results.
Acknowledgements
The author gratefully acknowledges everyone who supported the production of this dataset. Special thanks to Tao Wu and Shujia Shen for their contributions. The author is also deeply grateful to Professor Heng Liu, Associate Professor Feixuan Hu, and Lecturer Shaobo Zhang for their valuable guidance. Finally, the author thanks Xingyun Liu, Yuya Fu, Hexuan Ma, and Jiankun Wang for their assistance.
Competing Interests
The author has no competing interests to declare.
Author contributions
Xueran Wu: Conceptualization, Methodology, Software, Data Curation, Formal Analysis, Writing – Original Draft, Visualization.
