
Named Experts in Fine Art Auction Text from the Getty Provenance Index German Sales (1900–1945)

Open Access | Dec 2025


(1) Overview

The Getty Provenance Index is a digital repository of art market information including archival inventories, sales catalogues, dealer stock books, and public collections (Davis, 2019). This data spans art auctions and private contract sales from 1650 to 1945, with extensive data from Belgium, France, Germany, Austria, Switzerland, Great Britain, the Netherlands, and Scandinavia. The largest subset of the art auction sales catalogue data is the German sales collection covering the period of 1900–1945 (hereafter GPI).

The GPI is not only a data tool for provenance research on Nazi-era looted artwork (Schuhmacher, 2024) but also offers opportunities for the analysis of art-market networks. To date, research has focused on networks of commercial participants, including dealer networks (Schich et al., 2017). An under-researched area of the art market is the crossover between its commercial interests and the art experts or connoisseurs who authenticated works in a commercial context (Gaskell, 2002; Watrelot, 2023). This gap is partly due to a lack of usable data. The data exists, but only in unstructured form, embedded within auction descriptions. To address this, SQL was used to extract raw text strings containing named experts, and LLM inference was used to refine the raw data structures. The result is a new dataset of cleanly extracted named experts and their academic titles.

Repository location

University of Leeds Research Data Repository. https://doi.org/10.5518/1781.

(2) Context

The data was produced as part of a PhD project investigating how machine learning and LLMs can be applied to the analysis of commercial connoisseurship in the GPI.

(3) Method

The dataset was created by combining SQL text extraction with LLM processing, which refined the data into a usable format (Karjus, 2025). SQL was used to filter auction records containing key terms relating to expert attestations. These terms are academic or institutional titles (Prof., Dr., Hofrat, etc.) or the German ‘Gutachten’ (expert opinion). A new property (extracted_text) was then added: a raw string snippet of up to 150 characters following any keyword match related to expert opinions or titles. Only the second half of each auction entry was searched, because this is where such references appear in the auction catalogues. The new data structures were then exported as JSON to facilitate processing of the text with LLMs (Mukanova et al., 2024). LLM processing was favoured over rule-based extraction (e.g. regular expressions) because LLMs can handle the many spelling and annotation variations present in the GPI (Keraghel et al., 2024).
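The filtering and snippet steps described above can be sketched in Python as follows. This is a minimal illustration, not the author's pipeline. It rests on several assumptions:

- a hypothetical SQLite table named `sales`, with `record_id` and `auction_entry` columns;
- an illustrative keyword list drawn from the titles mentioned in the text;
- a snippet that starts at the first keyword found in the second half of the entry (as in the examples) and runs for at most 150 characters.

The sample entries are invented.

```python
import re
import sqlite3
from typing import Optional

# Illustrative keyword list (assumption): titles named in the text plus "Gutachten".
KEYWORDS = ["Gutachten", "Geh.-Rat", "Hofrat", "Prof.", "Dr."]
KEYWORD_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS))
MAX_SNIPPET = 150


def filter_records(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (record_id, auction_entry) rows whose text mentions any keyword."""
    where = " OR ".join("auction_entry LIKE ?" for _ in KEYWORDS)
    params = [f"%{k}%" for k in KEYWORDS]
    sql = f"SELECT record_id, auction_entry FROM sales WHERE {where} ORDER BY record_id"
    return conn.execute(sql, params).fetchall()


def extract_snippet(entry: str, max_len: int = MAX_SNIPPET) -> Optional[str]:
    """Return up to max_len characters starting at the first keyword in the entry's second half."""
    second_half = entry[len(entry) // 2:]
    match = KEYWORD_PATTERN.search(second_half)
    if match is None:
        return None
    return second_half[match.start():match.start() + max_len]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (record_id INTEGER PRIMARY KEY, auction_entry TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [
            (1, "Landschaft mit Figuren. Leinwand, 40 x 55 cm. Gutachten von Prof. W. von Bode."),
            (2, "Bildnis eines Mannes. Holz, 30 x 24 cm."),
        ],
    )
    for record_id, entry in filter_records(conn):
        print(record_id, extract_snippet(entry))  # prints: 1 Gutachten von Prof. W. von Bode.
```

SQLite's `LIKE` is case-insensitive only for ASCII characters by default. Umlauted keywords would therefore need extra care if this filter were extended.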

An example JSON structure following the SQL extraction is shown below.

{
  "artist_name_1": "Pieter Aertsen",
  "auction_house_1": "Fischer",
  "full_auction_entry": "001 Pieter Aertsen Amersterdam 1507–1875 – Ecce Homo. Gutachten von Dr. Max J Friedländer. Siehe die Abbildung. Holz H. 57cm B. 31,5cm",
  "object_type": "Gemälde",
  "record_id": 1258,
  "sale_date": "1927-07-19 00:00:00",
  "extracted_text": "Gutachten von Dr. Max J. Friedländer. Siehe die Abbildung…"
}

The LLM processing focussed only on the extracted_text field of each JSON object, which sped up inference and reduced computational overhead. The model chosen for the inference task was Qwen3-8B (Yang et al., 2025), a mid-sized open-source LLM. Open-source models were chosen for the flexibility, transparency and reproducibility they offer (Manchanda et al., 2025; Hwaszcz et al., 2025). Several models were evaluated for extraction quality, including Llama 3 8B (Meta, 2024), Gemma 2 9B (Google & DeepMind, 2024), Mistral 7B (Jiang et al., 2023), and Salamandra (Gonzalez-Agirre et al., 2025). Qwen3-8B performed best at extracting academic titles and clean names. An example of the enhanced data structure with clean academic titles and expert names is shown below.

{
  "artist_name_1": "Claes Molenaer",
  "auction_house_1": "Achenbach (Walther)",
  "extracted_text": "Geh.-Rat Friedländer: … ist ein echt signier",
  "gpi_auction_entry": "0031 Claes Molenaer gest. 1676 Winterlandschaft. – Holz, signiert, Größe 62:46 cm. S. R. Gutachten Geh.-Rat Friedländer: … ist ein echt signiertes Werk von Claes Molenaer. Tafel 11 62 cm x 46 cm",
  "object_type": "Gemälde",
  "record_id": 31,
  "sale_date": "1937-03-10 00:00:00",
  "expert_names": [
    "Friedländer"
  ],
  "titles_in_text": [
    "Geh.-Rat"
  ],
  "heidelberg_url": "http://digi.ub.uni-heidelberg.de/diglit/achenbach1937_03_10"
}
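The LLM step described above can be sketched as follows, covering the point from the extracted_text snippet to the expert_names and titles_in_text fields. This is a hedged sketch under several assumptions:

- The prompt wording is hypothetical; the actual prompt is not published here.
- `call_llm` is a hypothetical stand-in for whatever serves Qwen3-8B, taking a prompt string and returning the model's raw reply.
- Replies are parsed defensively, because a model may wrap its JSON in extra text.

```python
import json
import re
from typing import Callable

# Hypothetical prompt; the prompt actually used for the dataset is not given in the paper.
PROMPT_TEMPLATE = (
    "Extract the named experts and their titles from this German auction-catalogue snippet.\n"
    'Reply only with JSON of the form {{"expert_names": [...], "titles_in_text": [...]}}.\n'
    "Snippet: {snippet}"
)


def parse_llm_reply(reply: str) -> dict:
    """Pull the JSON object out of a model reply; fall back to empty lists on failure."""
    empty = {"expert_names": [], "titles_in_text": []}
    # Qwen3 models can prepend a <think>...</think> block when thinking mode is on.
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    names = data.get("expert_names", [])
    titles = data.get("titles_in_text", [])
    return {
        "expert_names": [str(n) for n in names] if isinstance(names, list) else [],
        "titles_in_text": [str(t) for t in titles] if isinstance(titles, list) else [],
    }


def enrich_record(record: dict, call_llm: Callable[[str], str]) -> dict:
    """Send only extracted_text to the model and merge the parsed fields into a copy of record."""
    prompt = PROMPT_TEMPLATE.format(snippet=record["extracted_text"])
    return {**record, **parse_llm_reply(call_llm(prompt))}
```

Passing the model in as a callable keeps the parsing logic testable with a fake reply. This matches the paper's point that only the short snippet, not the full entry, is sent to the model.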

The data is also provided in Excel (XLSX) format for ease of use.

The enhanced dataset comprises c. 5,000 catalogued fine art objects whose descriptions feature named experts. The dataset includes the fields listed in Table 1.

Table 1

Outline of fields, their data types, formats and descriptions.

| Field name | Type | Format/structure | Description |
|---|---|---|---|
| record_id | Integer | Numeric | Unique identifier assigned to each auction record within the internal research database. |
| sale_date | String | YYYY-MM-DD | Date on which the auction took place. |
| artist_name_1 | String | Text | Name of the principal artist associated with the auctioned object. |
| object_type | String | Text | Category of object sold (e.g., painting, sculpture, print), recorded in German (e.g., "Gemälde"). |
| auction_house_1 | String | Text | Name of the auction house that conducted the sale. |
| gpi_auction_entry | String | Full text | Complete auction entry text as recorded in the Getty Provenance Index. |
| extracted_text | String | Snippet of up to 150 characters | Substring automatically extracted from the second half of the GPI entry, containing references to expert opinions. |
| expert_names | Array of strings | List | Cleaned list of expert surnames identified in the extracted text by the LLM pipeline. |
| titles_in_text | Array of strings | List | Honorifics, academic titles, or institutional designations found in the auction text (e.g., "Dr.", "Prof.", "Geh.-Rat"). |
| heidelberg_url | String | URI | Link to the corresponding digitised catalogue page hosted by Heidelberg University Library. |
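A record can be checked against the fields in Table 1 with a small validator like the one below. This is a sketch, not a schema shipped with the dataset. Its assumptions are:

- every field is required and non-null;
- only the first ten characters of sale_date are checked as YYYY-MM-DD, because the examples above carry a 00:00:00 time component;
- the 150-character limit applies to extracted_text.

```python
from datetime import datetime

# Expected Python types per field, following Table 1 (sketch: nullable fields are not modelled).
FIELD_TYPES = {
    "record_id": int,
    "sale_date": str,
    "artist_name_1": str,
    "object_type": str,
    "auction_house_1": str,
    "gpi_auction_entry": str,
    "extracted_text": str,
    "expert_names": list,
    "titles_in_text": list,
    "heidelberg_url": str,
}


def validate_record(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    date = record.get("sale_date")
    if isinstance(date, str):
        # Examples store "YYYY-MM-DD 00:00:00", so only the date part is parsed.
        try:
            datetime.strptime(date[:10], "%Y-%m-%d")
        except ValueError:
            problems.append("sale_date: not YYYY-MM-DD")
    snippet = record.get("extracted_text")
    if isinstance(snippet, str) and len(snippet) > 150:
        problems.append("extracted_text: longer than 150 characters")
    for field in ("expert_names", "titles_in_text"):
        values = record.get(field)
        if isinstance(values, list) and not all(isinstance(v, str) for v in values):
            problems.append(f"{field}: non-string item")
    return problems
```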

Quality control

To control the quality of the extraction and mitigate LLM hallucinations (Banerjee et al., 2025), random samples were checked manually for retrieval quality. An additional automated quality check was performed in Python to screen for retrieval issues. It is acknowledged, however, that mining historical data is an imperfect process, and some entries may show inconsistencies. Users are encouraged to check whether the data in its current extracted form suits their downstream tasks. For example, the data may benefit from further standardisation using the Getty Union List of Artist Names (Harpring, 2010). The data has been left in its current form to stay as close as possible to the original text, while the LLM refinement maximises its usability.
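The automated check mentioned above is not described in detail. One plausible form is a hallucination screen that flags extracted values which cannot be found verbatim in the source text; the sketch below shows that approach. The specific rules are assumptions:

- names and titles must occur in extracted_text;
- extracted_text must occur in the full entry, ignoring trailing ellipses;
- comparison is Unicode-normalised and case-insensitive.

```python
import unicodedata


def normalise(text: str) -> str:
    """NFC-normalise and casefold so that composed and decomposed umlauts compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def screen_record(record: dict) -> list[str]:
    """Flag extracted values not found verbatim in the source text (possible hallucinations)."""
    flags = []
    snippet = normalise(record.get("extracted_text", ""))
    entry = normalise(record.get("gpi_auction_entry", ""))
    for name in record.get("expert_names", []):
        if normalise(name) not in snippet:
            flags.append(f"expert not in snippet: {name}")
    for title in record.get("titles_in_text", []):
        if normalise(title) not in snippet:
            flags.append(f"title not in snippet: {title}")
    if not record.get("expert_names"):
        flags.append("no expert extracted")
    # Trailing dots, ellipses and spaces (truncation marks) are ignored for the substring test.
    if snippet.rstrip("…. ") not in entry:
        flags.append("snippet not found in full entry")
    return flags
```

Flagged records would then go to manual review rather than being dropped automatically. This is consistent with the paper's choice to keep the data close to the original text.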

(4) Dataset Description

Repository name

University of Leeds Research Data Repository

Object name

named_experts_getty_prov_index_german_sales.json/named_experts_getty_prov_index_german_sales.xlsx

Format names and versions

JSON and CSV/XLSX file formats

Creation dates

2025-10-22.

Dataset creator

Mathew Henrickson, School of Computer Science (AI for Language Group), University of Leeds, UK

Language

English (fields/keys)/German (content/values)

License

CC-BY (4.0)

Publication date

2025-10-22.

DOI reference

https://doi.org/10.5518/1781

(5) Reuse Potential

This dataset supports historical, data-driven analysis of named expert opinions in the German-language fine art auction market (1900–1945). It enables examination of how named expertise was deployed in auction catalogue documentation to support attribution, authenticity, and value. The dataset is designed to be used in conjunction with existing art market historical research; its interpretation depends on being situated within broader historical, institutional, and biographical contexts.

The reuse potential of the dataset can be divided into three main categories:

  1. quantitative profiling of named expert references,

  2. empirical analysis of the overlap between institutional authority and commercial interests, and

  3. network-based analysis of auction house activity and named expert citation patterns.

a) Quantitative Profiling of Art Market Experts

The dataset can be used to construct quantitative profiles of art market experts, analysing their prominence, institutional affiliations, citation frequency, and areas of material expertise. This complements existing qualitative scholarship (Montias, 1999). One example is the museum director Wilhelm von Bode, one of the most frequently cited experts in the data. His profile is strongly concentrated on paintings, with a smaller extension into sculpture. It can be set alongside established research on his career and influence (Paul & Levis, 1995) to examine how museum-based connoisseurship was used in commercial contexts. Such analysis contributes to current academic discussions on the crossover between institutional authority and the art market in the early twentieth century (Gaskell, 2002).

The dataset also features experts such as Max J. Friedländer, whose influential writings on attribution and connoisseurship (Friedländer, 1946) underpin a distinct profile within the data. Friedländer appears almost exclusively in relation to paintings, reflecting a highly specialised form of scholarly authority. These profiles are of relevance to current research on expertise, provenance, and value formation, as they allow long-assumed distinctions between scholarly authority and commercial practice to be tested empirically.
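Profiles of the kind described above can be built with a simple count. The sketch below maps each expert to the object types they are cited for, and ranks experts by the number of records citing them. It assumes records loaded from the JSON file as dictionaries with the Table 1 field names. Expert surnames are used as-is, so spelling variants (e.g. "Bode" versus "von Bode") would need the standardisation suggested under Quality control before counts are reliable. The toy records in the test are invented.

```python
from collections import Counter, defaultdict


def expert_profiles(records: list[dict]) -> dict[str, Counter]:
    """Map each expert surname to a Counter of the object types they are cited for."""
    profiles: dict[str, Counter] = defaultdict(Counter)
    for record in records:
        # set() so an expert named twice in one entry counts once for that record.
        for name in set(record.get("expert_names", [])):
            profiles[name][record.get("object_type", "unknown")] += 1
    return dict(profiles)


def citation_ranking(records: list[dict], top: int = 10) -> list[tuple[str, int]]:
    """Rank experts by the number of records that cite them."""
    totals: Counter = Counter()
    for record in records:
        totals.update(set(record.get("expert_names", [])))
    return totals.most_common(top)
```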

b) Institutional and Market Context Analysis

The dataset also supports analysis of the structural relationships between scholarship and commerce in the German art market (1900–1945). Museum directors and prominent scholars operated in both academic and commercial spheres (Gaskell, 2002). Rather than focusing on individual careers, the data allows expert attributions to be aggregated across auction houses and time periods. As argued by Baudrillard (1972), museums function as epistemological guarantors whose authority stabilises value within the market, while the market in turn reinforces institutional prestige. By revealing patterns in when, where, and how named expert opinions appear, the dataset enables empirical investigation of these commercial and scholarly interdependencies.
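Aggregating expert attributions across auction houses and time periods, as described above, amounts to counting records per (expert, auction house, period). The sketch below uses five-year periods, which is an arbitrary choice, and reads the year from the first four characters of sale_date. It also derives how many distinct auction houses cite each expert, as a rough indicator of an expert's commercial reach. Function names are illustrative.

```python
from collections import Counter


def aggregate_by_house_and_period(records: list[dict], period_years: int = 5) -> Counter:
    """Count records per (expert, auction house, period start year)."""
    counts: Counter = Counter()
    for record in records:
        year = int(record["sale_date"][:4])
        period = year - year % period_years  # e.g. 1927 -> 1925 for five-year bins
        house = record.get("auction_house_1", "unknown")
        for name in set(record.get("expert_names", [])):
            counts[(name, house, period)] += 1
    return counts


def houses_per_expert(counts: Counter) -> dict[str, int]:
    """Number of distinct auction houses citing each expert."""
    houses: dict[str, set] = {}
    for name, house, _period in counts:
        houses.setdefault(name, set()).add(house)
    return {name: len(hs) for name, hs in houses.items()}
```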

c) Network-Based Analysis

The dataset can also be reused for comparative and network-based research. While prior studies have demonstrated the prestige effects of artist names on market value (Oosterlinck & Radermecker, 2019), the influence of named expert opinions remains under-researched. The data facilitates analysis of expert co-occurrence within catalogues, repeated associations with auction houses, and thematic concentrations of expertise, extending existing art-market network approaches (Schich et al., 2017; Fletcher & Helmreich, 2018).
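The co-occurrence and auction-house associations mentioned above can be turned into weighted edge lists that any graph library can load. This sketch uses only the standard library and makes two assumptions:

- A catalogue is identified by the pair (auction_house_1, sale_date), which would merge two sales held by the same house on the same day.
- Expert surnames are compared as-is.

The records in the test are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(records: list[dict]) -> Counter:
    """Weighted undirected edges between experts named in the same catalogue."""
    catalogues: dict[tuple, set] = defaultdict(set)
    for record in records:
        # Assumption: house + sale date identifies one catalogue.
        key = (record.get("auction_house_1"), record.get("sale_date"))
        catalogues[key].update(record.get("expert_names", []))
    edges: Counter = Counter()
    for names in catalogues.values():
        # Sorting makes each pair canonical, so (A, B) and (B, A) share one edge.
        for pair in combinations(sorted(names), 2):
            edges[pair] += 1
    return edges


def expert_house_edges(records: list[dict]) -> Counter:
    """Weighted bipartite edges linking experts to the auction houses that cite them."""
    edges: Counter = Counter()
    for record in records:
        house = record.get("auction_house_1", "unknown")
        for name in set(record.get("expert_names", [])):
            edges[(name, house)] += 1
    return edges
```

Edge weights count catalogues (for co-occurrence) or records (for expert and house links). Using one convention or the other changes how strongly large sales dominate the network.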

Finally, the dataset could be integrated into the GPI, which was recently remodelled as a linked-data resource using an event-based approach aligned with the Linked Art data model (Sanderson, 2024). In this model, events connect objects, agents, institutions, and transactions. Named expert opinions and Gutachten map naturally onto this structure as discrete attribution events linking scholars to artworks. By transforming unstructured text into structured, machine-readable assertions of expertise, the data supports semantic modelling and future integration with other datasets (Davis, 2019; Schich et al., 2017).
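To make the mapping described above concrete, the sketch below turns one record into a JSON-LD node for an attribution event. The node is loosely modelled on Linked Art's `AttributeAssignment` pattern, and it is only an illustration:

- The property choices (`carried_out_by`, `assigned_to`, `referred_to_by`) have not been validated against the Linked Art schema.
- All URIs under `https://example.org/gpi-experts/` are hypothetical placeholders, not GPI identifiers.

```python
import json
from urllib.parse import quote

BASE_URI = "https://example.org/gpi-experts/"  # hypothetical namespace


def expert_opinion_event(record: dict, base_uri: str = BASE_URI) -> dict:
    """Build one attribution-event node for a record (not validated against Linked Art)."""
    record_uri = f"{base_uri}record/{record['record_id']}"
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": f"{record_uri}/opinion",
        "type": "AttributeAssignment",
        "_label": f"Expert opinion cited in auction record {record['record_id']}",
        # One Person node per extracted expert surname; quote() percent-encodes umlauts.
        "carried_out_by": [
            {"id": f"{base_uri}person/{quote(name)}", "type": "Person", "_label": name}
            for name in record.get("expert_names", [])
        ],
        # The catalogued artwork the opinion concerns.
        "assigned_to": {
            "id": f"{record_uri}/object",
            "type": "HumanMadeObject",
            "_label": f"Work attributed to {record.get('artist_name_1', 'unknown artist')}",
        },
        # The catalogue wording that records the opinion.
        "referred_to_by": [
            {"type": "LinguisticObject", "content": record.get("extracted_text", "")}
        ],
    }


if __name__ == "__main__":
    example = {
        "record_id": 31,
        "artist_name_1": "Claes Molenaer",
        "expert_names": ["Friedländer"],
        "extracted_text": "Geh.-Rat Friedländer: … ist ein echt signier",
    }
    print(json.dumps(expert_opinion_event(example), ensure_ascii=False, indent=2))
```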

Illustrative Analysis

The following example demonstrates how the dataset can be used to analyse temporal patterns in the citation of expert opinions. Figure 1 shows a marked increase in references to Gutachten and named expert opinions from the late 1920s onwards. This rise may reflect periods of instability, in which authentication and named expert opinions were increasingly used to underpin both value and buyer confidence. The pattern also suggests growing interdependency between scholarly institutions and the art market, as well as processes of market professionalisation during the interwar period (Kräussl, 2007).

Figure 1. References to Gutachten and named expert opinions in auction catalogues from 1900–1945.
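A yearly series like the one in Figure 1 can be approximated by counting records per sale year. The author's exact counting procedure is not described, so this sketch counts one reference per record and may not reproduce the figure exactly. The helper `mean_per_year` is illustrative; it compares average yearly counts across spans such as the years before and after 1933.

```python
from collections import Counter


def references_per_year(records: list[dict], start: int = 1900, end: int = 1945) -> dict[int, int]:
    """Count records per sale year, filling years without records with zero."""
    counts = Counter(int(record["sale_date"][:4]) for record in records)
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def mean_per_year(per_year: dict[int, int], first: int, last: int) -> float:
    """Average yearly count over the inclusive range first..last."""
    years = range(first, last + 1)
    return sum(per_year[year] for year in years) / len(years)
```

For example, `mean_per_year(series, 1928, 1933)` and `mean_per_year(series, 1934, 1939)` would give a simple before-and-after comparison for the post-1933 decline discussed next. Raw counts also depend on how many catalogues survive for each year, which is worth keeping in mind when reading them.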

The peak in expert references in the early 1930s is followed by a sustained decline after 1933. This may reflect structural changes in the German art market rather than a straightforward reduction in the use of expertise. The period coincides with significant disruption to scholarly and commercial networks, including the dismissal and forced emigration of many established experts, particularly from museum and academic positions (Petropoulos, 2016). At the same time, changes in regulatory oversight and documentation practices may have reduced the explicit naming of experts in auction catalogues (Bähr, 2018). The decline therefore suggests a shift in how expertise was documented rather than its disappearance from art-market practice.

Acknowledgements

This work was undertaken on Aire, the High Performance Computing facility at the University of Leeds, with support from the Aire HPC team and Research Computing at the University of Leeds.

Competing Interests

The author has no competing interests to declare.

Author Contributions

Mathew Henrickson – Conceptualisation, Data Curation, Formal Analysis, Methodology, Writing.

DOI: https://doi.org/10.5334/johd.415 | Journal eISSN: 2059-481X
Language: English
Submitted on: Oct 14, 2025 | Accepted on: Dec 1, 2025 | Published on: Dec 23, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Mathew Henrickson, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.