Automatic Detection of Four-Panel Cartoon in Large-Scale Korean Digitized Newspapers using Deep Learning

Open Access | Jun 2024

Figures & Tables

Table 1

Research phases of this paper.

| NUMBER | RESEARCH PHASE |
| --- | --- |
| 1 | Training data collection of Four-Panel Cartoons (FPCs). |
| 2 | Labelling process of the training dataset. |
| 3 | Fine-tuning of the YOLOv5 model. |
| 4 | YOLOv5_FPC model evaluation: F1-score. |
| 5 | Image file collection from the Chosun Ilbo News Library (1920–1940), totaling 47,777 JPG files. |
| 6 | Data mining: Deployment of the YOLOv5_FPC model on the 47,777 JPG files from the Chosun Ilbo News Library to detect FPC image objects. |
| 7 | Database curation: Uploading the Excel and CSV files, which contain metadata and URLs for the 1035 image files detected by YOLOv5_FPC (1040 FPC objects), including previously undiscovered FPCs, to the JOHD Dataverse (Lee et al., 2024a). |
| 8 | Data analysis of the detected FPC objects. |
| 9 | Development of the YOLOv5_FPC-Detector script, leveraging the Google Colab platform for enhanced computational efficiency and wider public use. |

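
Phase 2 (labelling) can be illustrated with the plain-text label format that YOLOv5 training expects: one line per object, holding a class index followed by the box centre, width and height, all normalised by the image size. The sketch below is only an illustration; the page dimensions, box coordinates and the single class index 0 are hypothetical, and the source does not state which annotation tool the authors used.

```python
def to_yolo_label(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel bounding box to one YOLO-format label line."""
    x_center = (xmin + xmax) / 2 / img_w   # box centre, normalised to [0, 1]
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w          # box size, normalised to [0, 1]
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a tall 4 x 1 strip on a 2000 x 3000 px page scan.
print(to_yolo_label(0, 1500, 300, 1900, 2100, 2000, 3000))
```

Each labelled page would then have a `.txt` file with one such line per FPC object.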
Figure 1

The initial YOLOv5 Model could not detect an FPC.

Table 2

Matrix and era of the “Four-panel Cartoon Image Dataset”.

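| SET | FPC MATRIX | COLONIAL ERA | POST-COLONIAL ERA |
| --- | --- | --- | --- |
| Training | 4 × 1 | 131 | 37 |
| Training | 2 × 2 | 26 | 28 |
| Validation | 4 × 1 | 12 | 3 |
| Validation | 2 × 2 | 4 | 7 |
| Testing | 4 × 1 | 8 | 6 |
| Testing | 2 × 2 | 6 | 7 |
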
Figure 2

Model performance while fine-tuning.

Figure 3

F1-score of our YOLOv5_FPC model.
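
For reference, the F1-score is the harmonic mean of precision and recall. The minimal sketch below computes it from detection counts; the counts in the example are hypothetical and are not the paper's results.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Return the F1-score (harmonic mean of precision and recall)."""
    detected = true_pos + false_pos   # everything the model flagged
    actual = true_pos + false_neg     # everything that is really an FPC
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct detections, 10 false alarms, 10 misses.
print(f1_score(90, 10, 10))
```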

Figure 4

Chosun Ilbo News Library newspaper metadata (1920–1940) (ChosunIlboNewsLibrary, 2024).

Figure 5

47,777 image files collected from the Chosun Ilbo News Library (1920–1940) (ChosunIlboNewsLibrary, 2024).

Figure 6

Our dataset, “Metadata for the YOLOv5_FPC Detected Images” (Lee et al., 2024a), containing the URLs of the 1035 image files detected by YOLOv5_FPC (1040 FPC objects in total) and their publication dates, sourced from the Chosun Ilbo News Library (1920–1940).

Figure 7

Previously undiscovered FPC image data from the Chosun Ilbo News Library digital archive (ChosunIlbo, 2024).

Table 3

Definitions of the metadata columns in the Excel file shown in Figure 4.

| COLUMN NAMES | DEFINITION |
| --- | --- |
| id | The unique identifier for each article of Chosun Ilbo. |
| page_no | The page number of the article. |
| title | The title of the newspaper article. |
| regdate | The registration date of the article. |
| type | The type of the article. |
| publication_day | The day of the week when the article was published. |
| section | The section of the newspaper where the article is placed. |
| publication_date | The date when the article was published (Year-Month-Day). |
| completeness | The completeness of the article (“Y” indicates Yes). |
| body | The main text of the article. |
| publication_no | The publication number. |
| node_id | Numeric identifier associated with the article. |
| source_image_file | The name of the image file. |
| @timestamp | The timestamp indicating when the newspaper textual data was collected into the Excel file (Figure 4). |
| isn | The International Standard Number (ISN) associated with the article. |
| source_xml_file | The XML file name. |
| page_section | The section of the newspaper (society, general section, advertisement, politics, culture). |
| url | The URL to the article. |
| image | The URLs containing the website links to the scanned and digitized image files of the Chosun Ilbo newspaper database (used for collecting 47,777 JPG image files in this research). |
| sub_title | The subtitle of the article. |
| authors | The author(s) who wrote the newspaper article. |

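
As a rough illustration of how the `image` and `publication_date` columns could drive the image collection step, the sketch below reads a CSV export of the metadata and keeps the image URLs that fall within 1920–1940. The sample rows and the `example.org` URLs are invented for illustration, and the sketch assumes one URL per `image` cell, which the source does not confirm.

```python
import csv
import io
from datetime import date

# Invented sample rows using three of the column names defined above.
SAMPLE_CSV = """id,publication_date,image
a1,1925-03-14,https://example.org/scans/a1.jpg
a2,1941-01-02,https://example.org/scans/a2.jpg
a3,1933-11-30,
"""


def image_urls(csv_file, first_year=1920, last_year=1940):
    """Collect non-empty image URLs published within the given years."""
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get("image") or "").strip()
        if not url:
            continue  # no scanned image recorded for this article
        year = date.fromisoformat(row["publication_date"]).year
        if first_year <= year <= last_year:
            urls.append(url)
    return urls


print(image_urls(io.StringIO(SAMPLE_CSV)))
```

The same function accepts an open file handle, so a real export could be passed in place of the in-memory sample.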
Table 4

Frequency analysis of the 1040 YOLOv5_FPC-detected FPCs discovered from the Chosun Ilbo News Library (1920–1940) (ChosunIlboNewsLibrary, 2024).

| INDEX | NAME OF FPC DETECTED USING THE YOLOV5_FPC MODEL (CHOSUN ILBO NEWS LIBRARY, SPANNING 1920–1940) | FREQUENCY (PER FPC) |
| --- | --- | --- |
| 1 | Meongteongguri | 726 |
| 2 | Byeokchangho | 126 |
| 3 | Japanese Language-Written Cartoon | 104 |
| 4 | Baekgongsan | 13 |
| 5 | Dolbo and Mikki | 12 |
| 6 | Ttukdugiui Seollori | 11 |
| 7 | Chador’s Adventure | 9 |
| 8 | Arctic Exploration | 8 |
| 9 | Rubber Balloon | 7 |
| 10 | Football Player | 8 |
| 11 | Makdongi and Goose | 2 |
| 12 | Biography of a Fool | 2 |
| 13 | Paengkenggun’s Monkey Catching | 1 |
| 14 | Buffalo and Fish | 1 |
| 15 | Then Yes | 1 |
| 16 | Hobang Bridge | 1 |
| 17 | Ice Snack | 1 |
| 18 | The Evil of Alcohol | 1 |
| 19 | Put Your Hands Up | 1 |
| 20 | Better Radio | 1 |
| 21 | Tiger Den | 1 |
| 22 | Hide and Seek | 1 |
| 23 | Samyeong’s Caramel Cartoon | 1 |
| 24 | Love Trees | 1 |

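
The frequencies in Table 4 can be cross-checked against the reported total of 1040 FPC objects. The short sketch below tallies the counts as listed and reports the share of the most frequent title; the variable names are illustrative only.

```python
# Frequencies copied from Table 4 (title -> number of detected FPCs).
fpc_counts = {
    "Meongteongguri": 726,
    "Byeokchangho": 126,
    "Japanese Language-Written Cartoon": 104,
    "Baekgongsan": 13,
    "Dolbo and Mikki": 12,
    "Ttukdugiui Seollori": 11,
    "Chador’s Adventure": 9,
    "Arctic Exploration": 8,
    "Rubber Balloon": 7,
    "Football Player": 8,
    "Makdongi and Goose": 2,
    "Biography of a Fool": 2,
    "Paengkenggun’s Monkey Catching": 1,
    "Buffalo and Fish": 1,
    "Then Yes": 1,
    "Hobang Bridge": 1,
    "Ice Snack": 1,
    "The Evil of Alcohol": 1,
    "Put Your Hands Up": 1,
    "Better Radio": 1,
    "Tiger Den": 1,
    "Hide and Seek": 1,
    "Samyeong’s Caramel Cartoon": 1,
    "Love Trees": 1,
}

total = sum(fpc_counts.values())
top_name, top_count = max(fpc_counts.items(), key=lambda item: item[1])

print(total)                                   # total FPC objects in Table 4
print(f"{top_name}: {top_count / total:.1%}")  # share of the top title
```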
Table 5

Frequency comparison of the “Meongteongguri” series.

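| SERIES OF “MEONGTEONGGURI” FPC | CHUNG’S RESEARCH FINDINGS OF FPCS (CHUNG, 2016) | OUR YOLOV5_FPC FINDINGS OF FPCS | DIFFERENCES (PER FPC) |
| --- | --- | --- | --- |
| Reporter Life Part 1 | None | 35 | +35 |
| Modern Life | None | 4 | +4 |
| Social Work | 50 | 62 | +12 |
| Heonmulkyeoji | 48 | 55 | +7 |
| Ssutdeokdaegi | 18 | 19 | +1 |
| Student Life | 12 | 12 | 0 |
| Ssonawatso | 9 | 9 | 0 |
| Self-sufficiency | 87 | 86 | –1 |
| Round the World | 148 | 147 | –1 |
| Hunger Life | 50 | 19 | –31 |
| Dating Life | 181 | 178 | –3 |
| Family Life | 102 | 100 | –2 |
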
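
The difference column in Table 5 is the YOLOv5_FPC count minus Chung's count, with “None” treated as zero. The sketch below recomputes it from the table values; it also shows that the YOLOv5_FPC counts per series sum to 726, the “Meongteongguri” frequency reported in Table 4. The variable and function names are illustrative.

```python
# (series, Chung's count or None, YOLOv5_FPC count), copied from Table 5.
rows = [
    ("Reporter Life Part 1", None, 35),
    ("Modern Life", None, 4),
    ("Social Work", 50, 62),
    ("Heonmulkyeoji", 48, 55),
    ("Ssutdeokdaegi", 18, 19),
    ("Student Life", 12, 12),
    ("Ssonawatso", 9, 9),
    ("Self-sufficiency", 87, 86),
    ("Round the World", 148, 147),
    ("Hunger Life", 50, 19),
    ("Dating Life", 181, 178),
    ("Family Life", 102, 100),
]


def difference(chung, ours):
    """YOLOv5_FPC count minus Chung's count; None counts as zero."""
    return ours - (chung or 0)


diffs = {name: difference(chung, ours) for name, chung, ours in rows}
ours_total = sum(ours for _, _, ours in rows)
chung_total = sum(chung or 0 for _, chung, _ in rows)

for name, diff in diffs.items():
    print(f"{name}: {diff:+d}")
print(ours_total, chung_total, ours_total - chung_total)
```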
Figure 8

Automatic FPC detection using the weights of the YOLOv5_FPC model on Google Colab. This script imports the weights and downloads dependencies required for the automatic detection process.

Figure 9

Users can simply upload files from their local computers to detect FPCs.

Figure 10

The detected FPCs are saved to the user's local computer.
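
The workflow in Figures 8–10 (load fine-tuned weights, run detection on uploaded pages, save the results) might look roughly like the sketch below, which uses the PyTorch Hub interface of the public `ultralytics/yolov5` repository. This is an assumption-laden illustration, not the authors' YOLOv5_FPC-Detector script: the weights file name `yolov5_fpc.pt`, the confidence threshold and the helper names are placeholders.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def is_image(path):
    """Return True if the file name looks like a supported page scan."""
    return Path(path).suffix.lower() in IMAGE_SUFFIXES


def detect_fpcs(image_paths, weights_path="yolov5_fpc.pt", min_conf=0.25):
    """Run a custom YOLOv5 model over page scans and list pages with hits.

    weights_path and min_conf are placeholder values (assumptions).
    """
    import torch  # imported lazily; requires PyTorch and network access

    # Load custom weights through the ultralytics/yolov5 hub entry point.
    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
    model.conf = min_conf  # discard detections below this confidence

    hits = []
    for path in filter(is_image, image_paths):
        results = model(str(path))
        boxes = results.pandas().xyxy[0]  # one row per detected object
        if len(boxes) > 0:
            results.save()  # writes annotated images under runs/detect/
            hits.append((Path(path).name, len(boxes)))
    return hits
```

Calling `detect_fpcs(["page_001.jpg"])` would return a list of `(file name, number of detected objects)` pairs for pages with at least one detection.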

DOI: https://doi.org/10.5334/johd.205 | Journal eISSN: 2059-481X
Language: English
Submitted on: Mar 6, 2024
Accepted on: Apr 19, 2024
Published on: Jun 6, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Seojoon Lee, Byungjun Kim, Bong Gwan Jun, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.