Observing the Coming of Age of Video Game Graphics: Exploring the historical development of video game graphics through distant viewing, hermeneutics and image clustering

Adrian Demleitner

doi:10.5334/johd.251


(1) Context and motivation

Viewing Video Game Graphics History from Afar

Video game graphics evolved from simple text-based or basic representations in the 1970s through 8-bit and 16-bit eras in the 1980s and early 1990s, culminating in the introduction of 3D graphics in the mid-1990s. This briefly outlined history is based on anecdotal and partial knowledge and is usually linked to a few generally known games or singular case studies (Gingold, 2024; Kushner, 2013; Montfort & Bogost, 2009; Ruggill & McAllister, 2015). This raises the question of how this development from text to graphics and 3D ties into the larger, and certainly more complex, history of video game graphics. Further, recent studies in video game history have expanded beyond the traditional focus on the USA or Japan, revealing the significance of local video game cultures in regions such as the UK, Czechoslovakia, France, and Australia (Blanchet, 2020; Kirkpatrick, 2015; Švelch, 2018; Swalwell, 2021; Wade, 2016). These works highlight the need to research video game graphics not only as a globally unified process, but also as a complicated development arising out of geographical and temporal patches.

Video game graphics have been researched from various perspectives. Whereas some case studies concentrate on the aesthetics and applications of specific graphics rendering methods (Ruggill & McAllister, 2015), others focus on the computing systems underlying hardware and their graphic capabilities (Montfort & Bogost, 2009). Further research has been done on the techno-historic limits that set the frame within which video games’ graphics could develop (Hutchison, 2008), player reception of improved graphics (Birken, 2023) or the media-theoretical nature of video game images (Fizek, 2022). While this research is fundamental to an understanding of video game graphics, it is usually limited in its historic scope.

Although ground-breaking research has been published on the subject, to this day little progress has been made in studying the material conditions of what makes video game graphics on a larger scale. This necessitates an approach that takes a broad range of different video games into account, spanning not only several years of publication, but also different computing systems and countries of origin. Recent studies that focus on local video game history have shown large gaps in our knowledge and archives, motivating multiple initiatives for geographical counter-histories attempting to counteract this neglect (Pfister et al., 2023). As it becomes increasingly evident that our understanding of the field is incomplete, and as crucial aspects of the history of digital technology are being lost due to obsolescence and bit-rot, we need to consider alternative approaches to studying the evolution of video game graphics (Guay-Bélanger, 2022).

I attempt to provide the base material for such an approach with the Video Game History Screenshots (VHS) dataset. In this paper, I showcase a first foray into this specific historical development over the span of three decades, from 1960 to 1990. Given the scope of video game graphics history, the question arises: how can we gain an overview? This shall serve as the guiding question for this inquiry. Further, is a distant viewing approach feasible for analysing the historical development of video game graphics across different computing systems and geographical contexts? What insights can be gained about the evolution of video game graphics by examining a large dataset of screenshots from 1960 to 1990, beyond traditional singular case studies?

In line with Melanie Swalwell’s call to study ordinary culture (Swalwell, 2021), I propose a distant viewing approach to the history of video game graphics, including under-researched and less popular video games. As a hermeneutic approach, distant viewing becomes a generous, “rich, and browsable interface that reveals the scale and complexity of digital heritage” (Whitelaw, 2015). It is located within the larger framework of cultural analytics, which applies computation, visualisation, and data to explore culture at a large scale (Manovich, 2020). Distant viewing as a method is concerned with the interpretative act of analysing large visual datasets (Arnold & Tilton, 2019, 2023). Being able to work with such a dataset while also analysing formal aspects makes distant viewing a fitting approach to the questions outlined above.

(2) Dataset description

The Video Game History Screenshots Dataset

To view the larger history of video game graphics from a distance, my methodological approach included the sourcing, visualisation and analysis of a new dataset. It consists of 113,555 in-game screenshots from 4,316 video games published between 1962 and 1990, with Figure 1 showing an example. Whereas I applied mainly machine-assisted techniques in sourcing and visualising, the analysis was partially of a hermeneutic and interpretative nature. Sourcing focused on collecting a uniform set of images from MobyGames and enriching these with metadata from MobyGames and Wikidata. Visualising involved different approaches to reducing the dimensionality of the dataset, arranging the samples by similarity, as well as detecting clusters. The analysis of the computational results involved interpreting the global structure of the visualisation, as well as studying global and local structures of interest. The VHS dataset is, to date, the largest structured collection of screenshots of older video games. Whereas MobyGames, the source for the images contained in the VHS dataset, hosts far more screenshots, the platform poses limits regarding availability for research inquiries and prohibits applications through linked open data.

Figure 1: Exemplary dataset sample with a screenshot of Bruce Lee (U.S. Gold/Datasoft, 1986, Amstrad PC).

Repository location: https://doi.org/10.5281/zenodo.13349250
Repository name: Zenodo
Object name: thgie/VHS dataset-v1.1
Format names and versions: PNG, JSON, CSV, IPYNB
Creation dates: 19. June–17. July 2024
Dataset creators: Adrian Demleitner
Language: English
Licence: ODC Open Database Licence v1.0
Publication date: 17. July 2024

For the analysis process, I used the open-source suite FiftyOne, which offers a scriptable environment as well as an advanced user interface to work with image datasets in various ways. Besides having the support of a large and established community, this choice allowed for more customisation and control over the whole process. FiftyOne is a toolkit for curating, visualising, and managing unstructured visual data, and it streamlines data-centric workflows for working with visual datasets. Besides offering various ways of visualising image similarity, it also enables working with different computer vision models, making it highly adaptable. After initial tests with different models, I applied the self-supervised transformer model DINOv2 ViT-B/14 (Oquab et al., 2024) to calculate the image embeddings, and used Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2020) for the reduction of dimensionality. For the cluster detection, I relied on k-means.
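To make the final clustering step concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the implementation used for the VHS dataset, which the text does not specify beyond naming k-means. The toy 2D points stand in for projected screenshots, and the evenly spaced initialisation is a simplification chosen here to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on a list of equal-length coordinate tuples."""
    # Simplified, deterministic initialisation: evenly spaced input points.
    # Production libraries typically use k-means++ instead.
    step = max(len(points) // k, 1)
    centroids = points[::step][:k]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                # Keep the centroid of an empty cluster where it was.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy 2D positions standing in for dimensionality-reduced screenshots.
projected = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(projected, k=2)
print(labels)
```

Varying `k` in such a routine corresponds to producing coarser or finer cluster groups over the same arrangement.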

To explore the questions at hand, I needed to include in-game screenshots that represent a video game’s visual modes as fully as possible. A visual mode is the structure of what is visible on the screen, as well as that screen’s specific function (Arsenault et al., 2015). Examples are the start screen, full-screen menus, or game play screens. These modes encompass different interactive functionalities and are relevant to inquiring into video game images as a unique type of visual interface. Although there can be variations within a mode, for example in different in-game moments, most often the scope of game play and graphics is covered when all modes are present as data points. A large and complete repository of video game screenshots is difficult to come by, and I chose MobyGames as the sole provider of screenshots. MobyGames is one of the largest community-driven platforms for the collection of knowledge on video games. It is also the only archive with a sufficient amount of visual material that spans video game history and computing platforms. Creating secondary media of video games, such as screenshots or recordings, involves the games being played. The resources required to do so on a large scale quickly exceed the capacity of a single researcher or even a research project. This is why crowdsourced repositories such as MobyGames are invaluable.

At the time of writing this article, MobyGames hosts entries for over 289,650 video games, spanning all known video game systems and attempting to collect as much factual information per game as possible. The first entries are from 1950, and the first entry with screenshots is Spacewar! (Mainframe) from 1962, setting this year as the starting point for the VHS dataset. These screenshots are provided by members of the platform and, in the case of older titles, usually produced by playing a game in an emulator. MobyGames encourages its members to submit unaltered screenshots that show different aspects of a game as it was supposed to look when it was played at the time of its release. As a result, a large part of the images in the dataset are in their original lower resolutions.

Being maintained mainly by amateurs and video game enthusiasts, MobyGames is not without problems (Pfister et al., 2023). There are open questions on accessing data, searchability, and most importantly, completeness. Working with MobyGames makes it difficult to assess what will be missing from the dataset. A comparison with figures of published games could be a possible approach, but would favour the countries on which video game history research has focused to this day. Large parts of video game history beyond Japan and the USA are less researched, and scholars have only just started evaluating what games other geographical regions developed and published. These issues highlight an inherent bias in the dataset, especially regarding year of publication and country of origin. There is a steady increase of games per year (Figure 3) and an over-representation of video games from the USA, Japan, and the UK (Figure 2). As is, this dataset represents the history of video game graphics as seen by the MobyGames communities, from which I sourced the dataset. It is difficult to evaluate whether those leanings correlate with real-world numbers or are solely the effect of the focus of the work by the MobyGames members.

Figure 2: Histogram of the video games’ countries of origin.

Figure 3: Histogram of the video games’ years of publication.
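Distributions like the ones in the two histograms can be tallied from per-game metadata with a simple count. The sketch below assumes hypothetical records with `year` and `country` fields; the field names and sample values are illustrative and are not taken from the published metadata files.

```python
from collections import Counter

# Hypothetical per-game metadata records (illustrative values only).
records = [
    {"title": "Game A", "year": 1984, "country": "USA"},
    {"title": "Game B", "year": 1986, "country": "Japan"},
    {"title": "Game C", "year": 1986, "country": "USA"},
    {"title": "Game D", "year": 1989, "country": "UK"},
    {"title": "Game E", "year": 1989, "country": None},  # missing origin
]


def count_by(records, field):
    """Count records per value of `field`, skipping missing values."""
    return Counter(r[field] for r in records if r.get(field) is not None)


def shares(counts):
    """Convert absolute counts into relative shares of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


per_country = count_by(records, "country")
per_year = count_by(records, "year")
country_shares = shares(per_country)

for country, count in per_country.most_common():
    print(country, count, f"{country_shares[country]:.0%}")
```

Reporting shares next to absolute counts makes an over-representation visible at a glance, and tracking the skipped records separately shows how much of the metadata is incomplete.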

(3) Method

Accessing MobyGames, I encountered a two-fold problem. As of 2022, it is owned by Atari, which invests in the development of the platform, but also expects a return on investment. While browsing the platform is free of charge, some advanced features, such as exporting search results as structured data files or API access, are only available to paying members. Further, the API imposes rate limits and does not allow filtering by date of publication. To circumvent these issues, I instead queried Wikidata for video games with a MobyGames ID and a release date up to 1990. This cut-off year was chosen to exclude the advent of 3D graphics in a first attempt at this study’s approach. Although taking the route via Wikidata enabled easier access to the screenshots and allowed me to profit from Wikidata’s metadata, it also limited the number of video games in the dataset. Whereas MobyGames returned roughly 22,200 video games for the chosen timespan, Wikidata returned only a little over 4,300, amounting to only a fifth of what would be available.
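A query of this kind could look roughly like the following sketch, which only builds the SPARQL text and does not contact any service. The identifiers `P31` (instance of), `Q7889` (video game) and `P577` (publication date) are standard Wikidata identifiers. The MobyGames property is assumed here to be `P1933` and should be checked against Wikidata before use, since the exact query used for the dataset is not given in the text.

```python
# Assumption: Wikidata property holding the MobyGames game ID.
# Verify on Wikidata before running the query against the live endpoint.
MOBYGAMES_ID_PROPERTY = "P1933"


def build_query(last_year=1990, moby_property=MOBYGAMES_ID_PROPERTY):
    """Return SPARQL text selecting video games with a MobyGames ID
    and a publication date no later than `last_year`."""
    return f"""
SELECT DISTINCT ?game ?mobyId ?date WHERE {{
  ?game wdt:P31 wd:Q7889 ;            # instance of: video game
        wdt:{moby_property} ?mobyId ; # MobyGames ID (assumed property)
        wdt:P577 ?date .              # publication date
  FILTER(YEAR(?date) <= {last_year})
}}
""".strip()


query = build_query()
print(query)
```

The resulting text can be submitted to the Wikidata Query Service. Games with several recorded release dates would appear once per date, so results need to be de-duplicated by game afterwards.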

Distant viewing includes various methodological considerations (Arnold & Tilton, 2023). In my case, I attempted to meaningfully read a large quantity of images in relation to each other as well as to their metadata, through critical hermeneutics (Gallagher, 2016). Analysing such a large visual dataset exceeds purely interpretative approaches and depends on the aid of computational methods. I expected meaningful insights from observing the dataset after visualising the images by their similarity. Such a layout offers a first glimpse of the images’ overarching relationality through observation of the images in conjunction with their contexts. To visualise images by their similarity, computational methods offer different approaches, all of which depend on various algorithms to calculate similarity. Images are very dense and rich in terms of the information that they contain, which would lead to infeasible demands on computational resources if processed directly. A first step is thus the reduction of this complexity through calculating the images’ embeddings. These are simplified numerical representations of each image, usually produced through a computer vision model. Embeddings stand in for the semantic content that machine learning models have recognised in an image. This implies that different computer vision models produce different embeddings, since they were usually trained in different ways, and on different visual material.
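As a small illustration of how similarity can be calculated once images are reduced to embeddings, the sketch below compares toy three-dimensional vectors with cosine similarity. Both the vectors and the choice of cosine similarity are assumptions made for this example: real model embeddings have hundreds of dimensions, and the text does not state which distance measure was used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings with made-up values, keyed by hypothetical screenshot names.
embeddings = {
    "text_screen_1": [0.9, 0.1, 0.0],
    "text_screen_2": [0.8, 0.2, 0.1],
    "action_screen": [0.1, 0.3, 0.9],
}


def nearest(name, embeddings):
    """Return the name of the most similar other screenshot."""
    query = embeddings[name]
    others = (key for key in embeddings if key != name)
    return max(others, key=lambda key: cosine_similarity(query, embeddings[key]))


print(nearest("text_screen_1", embeddings))
```

Because the two text screens point in nearly the same direction, they count as neighbours, while the action screen sits far from both. A similarity layout arranges images so that such neighbourhoods stay visible in two dimensions.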

To be able to make meaningful observations, I needed a balance between the global and local structures in the resulting visualisation. t-SNE and UMAP were considered for the visualisation of similarity. For the embeddings, I tested the computer vision models Inception-v3 (Szegedy et al., 2015), ResNet-101 (He et al., 2015) and DINOv2. Inception-v3 is in use in the popular distant viewing toolset PixPlot by the Digital Humanities Lab at Yale. With the first two models, I was unable to configure the visualisations satisfactorily. The global structure tended to become overly dispersed, without opening up to meaningful hermeneutic interpretations. Likewise, visualisations produced with the dimensionality reduction technique t-SNE tended to be overly dense, making it difficult to study local structures and details. The combination of UMAP and DINOv2 finally allowed me to produce results that were accessible in their global and local structures and allowed for detailed readings of semantic as well as formal clusterings.

I calculated UMAP projections for each unique combination of the two hyperparameters (minimum distance being 0.001, 0.01 and 0.5, and number of neighbours being 100, 500 and 700). The global structure tended to become overly dense and uniform when minimum distance and number of neighbours leaned towards higher values. Instead of deciding on one final visualisation, I used the combinations of parameters to visualise and highlight aspects in global as well as local structures. Whereas combinations of higher values tended to create dense visualisations and displayed global trajectories, lower values created better accentuated local structures, opening up the view into interesting details of the visualisation. I also produced several clusterings through k-means, with a varying number of cluster groups. These differently sized and detailed clusters supported my analysis of general visual aspects of the global and local structures, as well as a focus on specific formal or semantic elements.
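The parameter combinations described above can be enumerated as a small grid. The sketch below only generates the nine settings and a label for each run; it does not run UMAP itself. The names `min_dist` and `n_neighbors` follow the parameter names of the umap-learn library, and the label format is a hypothetical convention introduced for this example.

```python
from itertools import product

# Values reported in the text for the two UMAP hyperparameters.
MIN_DIST_VALUES = (0.001, 0.01, 0.5)
N_NEIGHBORS_VALUES = (100, 500, 700)


def parameter_grid():
    """Every unique combination of minimum distance and neighbour count."""
    return [
        {"min_dist": min_dist, "n_neighbors": n_neighbors}
        for min_dist, n_neighbors in product(MIN_DIST_VALUES, N_NEIGHBORS_VALUES)
    ]


def run_key(params):
    """Hypothetical label for storing one visualisation run."""
    return f"umap_md{params['min_dist']}_nn{params['n_neighbors']}"


grid = parameter_grid()
for params in grid:
    print(run_key(params), params)
```

Keeping every run under its own label makes it possible to switch between a dense, trajectory-oriented layout and a layout with accentuated local structures without recomputing anything.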

(4) Results and discussion

The visualisations result from fine-tuning the hyperparameters of the UMAP dimensionality reduction, and I found them to present an interesting balance between global and local structures, highlighting global trends and local clusterings. As outlined before, the main focus is the exploration of the history of video game graphics, as well as finding new openings for detailed research inquiries. To that end, two aspects of the analysis were of importance: the distribution of data points in the global structure according to their games’ year of publication, and what the local structures cluster around.

Figure 4 shows the visualisation of all screenshots arranged by similarity. The samples are coloured by their video game’s year of publication. It is important to note that some dimensionality reduction techniques, such as UMAP, which was applied here, produce axes that do not have an inherent direct meaning. Instead, meaning is derived by analysing the dataset samples’ position in relation to their context and clusters. Clearly visible are several gradients regarding the year of publication, with darker colours indicating older games and brighter colours newer ones. In general, screenshots of older games published until 1980 can be found on one side, whereas newer games published after 1985 tend to be located on the other, amounting to a general development from older to newer games. I welcomed this result in relation to the initially outlined problem of studying single video games in relation to the larger history of video game graphics. The visualisation locates text-only images on one end and gradually develops to images incorporating graphics, and finally to video games with more dynamic images, higher resolutions and more colours. I could observe a strong correlation between time of publication and visual complexity in the global structure.

Figure 4: Visualisation of the roughly 113,000 data points by similarity.

The visualisation in Figure 5 highlights the development of video game graphics along formal elements in eight distinctive clusters. A large area on the right (a–c), splitting into several peninsulas, contains screenshots with mostly text. The middle section (d–e) signifies the arrival of crude graphics in video games and their development towards improved graphic capabilities and professionalised video game development. Finally, in the area on the left (g–h), containing predominantly newer games, there is a general tendency towards screenshots of games for platforms with higher resolution and colour depth, and with a higher degree of gameplay action and moving graphics. A larger area in the upper centre contains few data points. Around this area are clusterings of early games (c, d, e, h), which consist of screenshots that are very close to desktop application interfaces, with windows, framed messages or skeuomorphic user-interface elements. This area partially lacks the graphical elements that make a screenshot intuitively recognisable as being of a video game. Other larger aspects of the similarity visualisation include an island for team-sport games on top (g–h), islands for racing games and flight simulators on the left (g), and various smaller peninsulas focusing on specific genre-defining formal aspects.

Figure 5: Visualisation of eight distinctive clusters.

My general observation was that the global layout is structured through basic formal aspects that include text, graphic styles, colours, resolution as well as basic graphic elements such as grid systems or illustration styles. Other groupings are clearly based on the screenshots’ contents and semantics. Due to the diversity of the screenshots in the dataset, from abstract graphics to screens with only text and full-screen illustrations, large parts of the visualisation flow into each other. Although the arrangement by similarity is readable intuitively on a per-screenshot basis, this makes meaningful inquiries into the larger clusters difficult without a clear research focus. This relates to the second initially outlined question, on the types of insights such an overview could generate. As I did not aim for specific results regarding the clusterings, the following highlights have to be taken as an exploration of the possibilities of this approach. While the global structure of the visualisation confirms existing knowledge, smaller clusters show the potential of complicating the history of video game graphics.

The following two groups were highlighted through the k-means clustering. They are among the groups with a clear focus, and I chose to outline them to demonstrate potential entry points for further research on the history of video game graphics or contents. The first cluster concentrates on the inclusion of problematic image material in video games, and the second focuses on a specific video game genre.

(4.1) Problematic Media

The visualisation in Figure 6 highlights the position of this cluster in blue, locating it in an area of newer games from the late 1980s. This correlates with new graphic capabilities for including and displaying better illustrations in video games. The ability to include images, graphics and illustrations, of course, also introduced visual material to video games that can be deemed harmful content. This includes displays of violence, glorification of war and fascist symbolism, as well as disturbing and pornographic images. This specific cluster (Figure 6) centres on large illustrations and photos that contain such imagery. These illustrations appear in a wide range of game modes, such as title screens, cut-scenes, in-game screens and others. The range of covered video game genres is also quite broad, ranging from war simulation to puzzle games. A few memorable examples, shown in Figures 7, 8, 9, and 10, include depictions of violence, threatening fascist politicians on title screens, naked women posing lasciviously as the background of strip poker games, and nightmare-inducing illustrations.

Figure 6: Overview of the visualisation with the problematic media cluster highlighted.

Figure 7: PowerMonger (Bullfrog Productions/Electronic Arts, 1992, Genesis).

Figure 8: Rocket Ranger (Cinemaware/Mirrorsoft, 1988, Apple IIGS).

Figure 9: Weird Dream (Best Ever Games/Rainbird Software, 1988, DOS).

Figure 10: Strip Poker: A Sizzling Game of Chance (Artworx Software, 1982, Apple II).

(4.2) Formal Aspects

The Boxing cluster highlighted in Figure 11 is an example of a grouping in which formal and semantic aspects mingle and evolve in the direction of a genre. The cluster is singled out on an island, together with sport-related video games, such as baseball, rugby, basketball, and tennis. These types of sports potentially point towards their popularity and towards the USA as an early video game nation. The clustering indicates that the visualisation was able to clearly distinguish these images from the rest of the dataset. The cluster is held together by a few semantic-formal elements. Despite aesthetic similarities, the video games contained differ fundamentally in game play, mechanics, and visuality. The screenshots shown in Figures 12, 13, 14, and 15 exemplify how games that differ in mechanics and aesthetics, in perspective and graphics, can cognitively be understood as following the same video game principle. Given their individual differences, the unifying semantic-formal elements are classic boxing gloves, ring ropes, and half-naked combatants in short pants. The cluster centres on a few formal aspects that together build the semantics of the boxing video game genre. To some extent, this cluster also includes screenshots with less extravagant wrestling scenes, highlighting the two sports’ similarities.

Figure 11: Overview of the visualisation with the Boxing cluster highlighted.

Figure 12: World Championship Boxing Manager (Goliath Games, 1990, Atari ST).

Figure 13: Sierra Championship Boxing (Evryware/Sierra On-Line, 1985, PC Booter).

Figure 14: TKO (Accolade/Electronic Arts, 1988, DOS).

Figure 15: Panza Kick Boxing (Futura/Loriciels, 1992, TurboGrafx CD).

(5) Implications/Applications

Finding Ludic Interfaces

The VHS dataset was meant to look at the history of the development of video game graphics. My approach of distant viewing the history of video game graphics through the reduction of dimensionality and image clustering was done through computer-assisted methods and hermeneutical interpretation.

The visualisation, produced with the help of a newer computer vision model, was able to deal with the dataset’s diverse material, adequately reconstruct the overall development, and create clusters around different formal and semantic aspects, which now opens up the dataset for detailed inquiries. Further, the model was able to distinguish video game modes, although mixed-mode clusters were created when the group centred on semantic aspects. The clustering by modes implies that screenshots of the same mode share a high degree of formal elements, which is an important finding for me when reading video game images with regard to their function as interfaces. A mode usually pertains to what a player’s main activities are within that moment in a video game. This finding suggests that the dataset and visualisation merit further research regarding the formal development of video game graphics. Certain areas of the visualisation have also been shown to be closely related to developments in desktop application interfaces, opening a pathway for a larger investigation into the history of user interfaces.

The two exemplary clusters discussed in the results point to the potential for further research and reuse of the dataset. The group with problematic media shows how technological advancements in late 1980s video game graphics enabled the introduction of controversial visual content across diverse game genres, complicating linear narratives of the development of video game graphics. By mapping the emergence of problematic imagery, further research could open critical pathways for interdisciplinary investigations into the complex relationships between technological capability, cultural representation, and interactive media. Further, video game classification remains a complex challenge, with current approaches predominantly focused on gameplay mechanics that often inadequately capture the medium. The cluster around boxing-related semantic-formal elements proposes an alternative approach, integrating semantic-formal and visual elements to develop a genre classification system based on visuality, and challenging traditional taxonomic approaches.

As I outlined before, the dataset has the potential to be expanded. The accompanying metadata has not been fully considered in the current study. In particular, metadata regarding geographical contexts or production information can be of interest for inquiring into local video game histories. On the other hand, this is also the most urgent gap to consider. Gathering what has been developed and published beyond traditional video game nations, such as the USA and Japan, is in its infancy. Likewise, despite the scope of MobyGames, it will have blind spots regarding these local video game histories. After expanding the VHS dataset with the already existing material, it could be used as a vantage point to inquire into what is missing.

Either way, it could profit from more sources. Next to tapping into other video game archive platforms, the inclusion of video game screencast stills is a viable option (Burghardt & Piontkowitz, 2024). Video game graphics do not define themselves via static composition and construction alone, but often unfold over time. This is to say that there are insights to be found in researching the moving images and animations of video games. As the clusterings have indicated, meaning also unfolds in the interplay between the semiotic layer and the functional affordances. An approach concerning just static images or video stills is omitting a vital aspect of video game studies.

Nonetheless, a specific aspect of video game graphics was not well captured by the model. Ludemes are a specific type of interface element unique to video games (Hansen, 2023). These elements are linked to game play rules and are the “meeting point and mediating factor between the player’s agency and the game’s visual representation of its internal state” (Arsenault et al., 2015). A good example is the pushable blocks in Soko-Ban (Thinking Rabbit, 1982, PC-88), shown in Figure 16, or in The Legend of Zelda: A Link to the Past (Nintendo, 1991, Super Nintendo). I expected some clustering around better known ludemes, such as pushable blocks, but could not observe them in any significant way. This indicates a blind spot in the model’s understanding of video game interfaces. Tackling this issue calls for the training and fine-tuning of a specialised computer vision model, necessitating the annotation of the dataset as base data. I have incorporated the conceptual element of ludemes in a CIDOC-CRM based ontology (Demleitner, 2024) that could be used for such an annotation task, but it remains in an experimental state and has not been applied on such a large scale. For further development and support of this finding, a larger discussion on ludemes and the specifics of video game interfaces is necessary.

Figure 16: In-game screenshot from Soko-Ban (Thinking Rabbit, 1982, DOS).

Acknowledgements

I would love to thank my supervisors Dr. Tobias Hodel, Dr. Eugen Pfister, and my dear colleague Florian Spiess for discussing this article with me as well as gifting me their professional expertise. Further, I want to thank my significant other, Florence Aellen, for being a vital source of support and encouragement.

Competing Interests

The author has no competing interests to declare.

Author Contributions

Adrian Demleitner: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft
