Cross-Referenced Data on Electoral Disputes and French Legislative Election Results

Caroline Bligny; Frédérique Letué; Marie-José Martinez; Romain Rambaud; Alya Hafsaoui

doi:10.5334/johd.315


(1) Overview

Repository location

https://doi.org/10.57745/0BI3LB

Context

The dataset was produced within the framework of the JADE project (“Justice Algorithmique Des Elections”, or “algorithmic justice for elections”) and has already been used in several papers (Rambaud et al., 2023a, 2024, 2025a, 2025b). The project aims to identify what leads a judge to cancel an election. In France, court decisions on legislative and senatorial elections are handled by the Conseil constitutionnel (Constitutional Council), while election results are managed by the Ministère de l’Intérieur (Ministry of the Interior). The proposed dataset brings together these two sources of data on the Fifth French Republic, from November 1958 to April 2024. It is expected to be extended as the data are enriched or new elections are held.

(2) Method

The raw data come from two open-source datasets available on the website https://www.data.gouv.fr/fr/. At the date of the last update (March 30, 2024), the court decisions dataset consisted of 4,158 XML files and the election dataset consisted of 30 Excel files. Table 1 shows the information included in both source datasets, namely information on the electoral disputes relating to legislative (AN) and senatorial (SEN) elections, as well as legislative election results since 1958.

Table 1

Main information contained in both source datasets.

	ELECTORAL DISPUTES OF THE LEGISLATIVE (AN) AND SENATORIAL (SEN) ELECTIONS	RESULTS OF THE LEGISLATIVE ELECTIONS
Source	https://www.data.gouv.fr/fr/datasets/constit-les-decisions-du-conseil-constitutionnel/	https://www.data.gouv.fr/fr/pages/donnees-des-elections/
Original data sets	Decision number; decision date; decision title; decision type (‘nature’); solution adopted by the court; text of the court decision; URL of the court decision	Election year; election round; electoral subdivisions (‘departement’ and ‘circonscription’); number of registered voters; number of voters; number of votes cast; for each candidate: gender, last name, first name, political nuance, number of votes

We first loaded and reshaped both source datasets into a relational database (RDBMS), then established the link between them and added legal or project-related information. Next, the JADE dataset was extracted from this database.

Using an internal project database, built with the open-source PostgreSQL system, allows us to manage the data and extract them from different perspectives.

Steps

The following steps are partly iterative.

1) JADE database design: Define the appropriate tables

Figure 1 shows the database model built from the two source datasets.

Figure 1

Database model built from the two source datasets.

The information on court decisions is gathered into a main ‘decision’ table. The input data (one file per decision) are XML files including metadata. We added two tables: the first contains personal data, and the second contains additional computed fields. The Excel election results files were split into several tables to remove the redundant information repeated in each row. There are now eight tables: electoral subdivisions (‘subdivision’ and ‘departement’ tables), general data for each election (‘election’ table), results for each election (‘resultat’ table), each election round (‘resultat_tour’ table) and each candidate (‘resultat_candidat’, ‘candidat’ and ‘appartenance’ tables).

2) Integrate data from the source data files

We developed Python code that downloads the source data files from the Web, reads them and inserts the desired data into the corresponding JADE database tables. The program can be run multiple times if needed to rebuild the entire database.

Since the XML decision files follow a standard structure, we were able to rely on the XML tags to select the appropriate data. However, the Excel election results files are more heterogeneous: the same information can have different column names and locations in different files. The software code handles the different formats, reads each row and then populates the tables.
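One way to cope with such heterogeneous headers is to map every spelling to a canonical field name before reading the rows. The sketch below is only an illustration of that idea, not the project's code: the alias table `COLUMN_ALIASES`, the header spellings and the function names are all hypothetical, since the article does not list the actual variants found in the source files.

```python
import unicodedata

# Hypothetical alias table: the real header variants in the source files are
# not listed in the article, so these spellings are illustrative only.
COLUMN_ALIASES = {
    "inscrits": "inscrits",
    "nombre d'inscrits": "inscrits",
    "votants": "votants",
    "nombre de votants": "votants",
    "exprimes": "exprimes",
    "suffrages exprimes": "exprimes",
}


def normalize_header(label: str) -> str:
    """Lower-case a header, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def map_columns(header_row):
    """Return {canonical_name: column_index} for the headers we recognise."""
    mapping = {}
    for index, label in enumerate(header_row):
        canonical = COLUMN_ALIASES.get(normalize_header(str(label)))
        if canonical is not None and canonical not in mapping:
            mapping[canonical] = index
    return mapping


def read_row(row, mapping):
    """Pick the recognised cells out of one spreadsheet row."""
    return {name: row[index] for name, index in mapping.items()}
```

With this approach, two files that order or label their columns differently yield rows with the same keys, so the code that populates the tables does not need to know which file a row came from.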

3) Link elections and decisions

In the database, the link between one decision and one election is established by the field ‘id_election’, a foreign key in the ‘decision’ table. The link is created by matching fields present in both source datasets, namely ‘departement’, ‘circonscription’, ‘date_dec’ and ‘date_premier_tour’. For ‘departement’ and ‘circonscription’, the information is included in the title of the decision, with some name variations. For the dates, we searched for the most recent election date before the decision date. This may create a false link if the decision applies to a partial election that is not in the election database. To solve this problem, we added a Boolean field named ‘election_partielle’. To determine its value, the algorithm searches for the potential regular election date at the beginning of the decision text. If this date is not given, the decision is checked by a legal specialist and the value is set to TRUE if applicable.
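The date rule described above (most recent election before the decision date, within the same subdivision) can be sketched as follows. This is a minimal illustration under assumptions: the tuple layout, the function names and the use of a strict "before" comparison are ours, and the sketch ignores the name variations in decision titles as well as the ‘election_partielle’ check.

```python
from bisect import bisect_left
from datetime import date


def build_index(elections):
    """Group election dates by (departement, circonscription), sorted ascending.

    `elections` is an iterable of (id_election, departement, circonscription,
    date_premier_tour) tuples; this tuple layout is an assumption.
    """
    index = {}
    for id_election, dep, circ, first_round in elections:
        index.setdefault((dep, circ), []).append((first_round, id_election))
    for entries in index.values():
        entries.sort()
    return index


def link_decision(index, dep, circ, date_dec):
    """Return the id of the latest election held strictly before `date_dec`.

    Returns None when the subdivision is unknown or no earlier election exists.
    """
    entries = index.get((dep, circ), [])
    dates = [first_round for first_round, _ in entries]
    position = bisect_left(dates, date_dec)  # number of dates < date_dec
    if position == 0:
        return None
    return entries[position - 1][1]


# Illustrative data only.
elections = [
    (1, "38", 1, date(2017, 6, 11)),
    (2, "38", 1, date(2022, 6, 12)),
]
index = build_index(elections)
linked = link_decision(index, "38", 1, date(2022, 12, 2))  # links to election 2
```

The sketch also shows why a partial election missing from the election table produces a false link: the rule silently falls back to the previous regular election.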

In some cases, the link between a decision and an election result cannot be established, either because the decision applies to many electoral subdivisions or because the decision deals with another decision and not with an election result. A similar case arises with the 8th legislature, when the voting method was not a majority vote (proportional election in 1986). For 3.8% of AN decisions, the link cannot be established.

4) Enrich data

We added different calculated fields to query the database and extract the JADE dataset. Table 2 displays the list of the database calculated fields needed for the statistical and legal analysis.

Table 2

List of the database calculated fields.

	INTERNAL PURPOSES	ANALYSIS PURPOSES
Derived fields	Year of request; rank of candidate; round number	Legislature; reelection; nuance and gender of the elected candidate; outgoing candidate (yes/no)
JADE additional information		JADE solution (regrouped categories); Article 38 of order n°58–1067 ruling on the inadmissibility of the request (yes/no)

5) Clean data

a) Error detection in the source datasets

The election results files were created by hand and include a few typos. We searched for errors by 1) comparing the sum of the votes received by all candidates with the number of votes cast (55 errors found), and 2) checking that the number of registered voters is greater than the number of voters, and that the latter is greater than the number of votes cast (108 errors found). With 15,317 results per round, this represents an error rate of about 1%. We corrected these data in the database using the values given by Wikipedia.
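The two checks can be expressed as a small validation function. The sketch below is an illustration rather than the project's code; the function name and argument names are assumptions, and it uses non-strict comparisons (`>=`) so that a round with no abstention or no blank ballots is not flagged, which is our choice.

```python
def check_round(inscrits, votants, exprimes, candidate_votes):
    """Return the list of consistency rules violated by one round result."""
    problems = []
    # Check 1: the candidates' votes must add up to the votes cast.
    if sum(candidate_votes) != exprimes:
        problems.append("candidate votes do not sum to votes cast")
    # Check 2: registered voters >= voters >= votes cast.
    if not (inscrits >= votants >= exprimes):
        problems.append("expected inscrits >= votants >= exprimes")
    return problems
```

If the two kinds of errors do not overlap, the reported counts give (55 + 108) / 15,317 ≈ 1.06%, consistent with the rate of about 1% stated above.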

b) Variation in the source datasets

In the election results files, we found many inconsistencies in the spelling of the candidates’ names: first name spelled out in full or reduced to an initial, last name given as either the birth name or the married name. Correcting this information is necessary to accurately define the fields ‘reelection’ and ‘sortant’ (outgoing). To solve the problem, we carried out a detailed search in the database for elected candidates who follow each other in chronological order within the same electoral subdivision, and who have either the same first name or the same last name. This method allowed us to correct 191 names, approximately 5% of the total data.
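The search for suspicious name pairs can be sketched as below. This is a hypothetical reconstruction: the input layout and function name are assumptions, and we interpret "either the same first name or the same last name" as "exactly one of the two matches", so that identical names are not flagged. The flagged pairs would still need manual review.

```python
def name_variant_candidates(winners):
    """Yield pairs of consecutive winners that may be the same person.

    `winners` maps a subdivision key to a chronologically sorted list of
    (year, first_name, last_name) tuples; this layout is an assumption.
    A pair is flagged when exactly one of the two name parts matches,
    i.e. the names are similar but not identical.
    """
    for subdivision, sequence in winners.items():
        for previous, current in zip(sequence, sequence[1:]):
            same_first = previous[1].casefold() == current[1].casefold()
            same_last = previous[2].casefold() == current[2].casefold()
            if same_first != same_last:
                yield subdivision, previous, current


# Illustrative, made-up names.
winners = {
    ("38", 1): [
        (1988, "J.", "Dupont"),
        (1993, "Jean", "Dupont"),
        (1997, "Jean", "Dupont"),
        (2002, "Marie", "Martin"),
    ]
}
flagged = list(name_variant_candidates(winners))  # only the 1988/1993 pair
```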

6) Extract JADE dataset

Once the above steps have been executed, the database is ready to be used for various purposes, including dataset extraction for statistical analysis. To prepare the dataset extraction, we added an SQL view that aggregates data by election result (i.e., results for a given election date and a given electoral subdivision) and computes several vote differences in absolute value and percentage. This view is then exported into a CSV file.
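The project computes the vote differences in an SQL view; the same computation can be sketched in Python for readers who work from the exported file. The function below is an illustration only: its name is hypothetical, and expressing the percentage relative to the number of votes cast is an assumption about the denominator, which the article does not specify.

```python
def vote_gaps(candidate_votes, exprimes):
    """Return [(absolute_gap, percentage_gap), ...] between consecutive ranks.

    Entry 0 is the gap between the first and second candidates, entry 1 the
    gap between the second and third. Percentages are relative to the votes
    cast, which is an assumption about the denominator used in the dataset.
    """
    ranked = sorted(candidate_votes, reverse=True)[:3]
    gaps = []
    for leader, follower in zip(ranked, ranked[1:]):
        absolute = leader - follower
        percentage = 100.0 * absolute / exprimes if exprimes else None
        gaps.append((absolute, percentage))
    return gaps
```

A round with a single candidate yields an empty list, which corresponds to a missing gap in the exported row.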

Dataset summary

The final JADE dataset includes a total of 4,158 court decisions: 3,881 AN and 277 SEN. The dataset contains 59 variables, described in detail in the file “jade_champs.tab”, available at https://doi.org/10.57745/0BI3LB. Table 3 displays the main variables.

Table 3

Main variables of the JADE dataset.

VARIABLES	DESCRIPTION
solution	Court decision (15 possible values)
solution_jade	Court decision (10 regrouped categories)
article38	Article 38 of order n°58–1067 ruling on the inadmissibility of the request: 6 possible values depending on the probability of inadmissibility
nuance_elu	Political nuance of the elected candidate: 130 possible values
genre_elu	Gender of the elected candidate (M = Male/F = Female)
reelection	Outgoing status of the candidate (TRUE/FALSE)
tx_ry_nuance	Political nuance of the first (y = 1), second (y = 2) or third (y = 3) candidate in the first (x = 1) or second (x = 2) round: 130 possible values
tx_ry_genre	Gender of the first (y = 1), second (y = 2) or third (y = 3) candidate in the first (x = 1) or second (x = 2) round (M = Male/F = Female)
tx_ecarty	Vote gap between the first and second candidates (y = 1) or between the second and the third candidates (y = 2) in the first (x = 1) or second (x = 2) round

Analysis of the “solution” field shows that almost 50% of the decisions result in ineligibility, while 40% of the requests are rejected. Cancellations account for 1.7% of the court decisions. Additional statistics are available on the JADE website (https://jade.univ-grenoble-alpes.fr). These statistics may evolve with future versions of the dataset.
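Such shares can be recomputed from the tab-delimited export with the Python standard library. The sketch below is a minimal illustration: the function name is hypothetical, and the in-memory sample uses placeholder labels rather than actual values of the “solution” field.

```python
import csv
import io
from collections import Counter


def solution_shares(tab_file):
    """Return {solution: percentage of rows} from a tab-delimited file object."""
    reader = csv.DictReader(tab_file, delimiter="\t")
    counts = Counter(row["solution"] for row in reader)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Tiny in-memory stand-in for the real file; the labels are placeholders,
# not actual values of the "solution" field.
sample = io.StringIO("solution\tgenre_elu\nA\tM\nA\tF\nB\tM\nA\tM\n")
shares = solution_shares(sample)
```

To run it on the published file, the `io.StringIO` object would be replaced by something like `open("20240702_jade_extract.tab", encoding="utf-8", newline="")`, where the UTF-8 encoding is an assumption to be checked against the file.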

Data Quality

Each row of the dataset corresponds to a court decision, supplemented, where possible, by a summary of the corresponding election results (3,598 out of 4,158 decisions, i.e., 86.5%). Incomplete rows correspond either to senatorial elections (6.7%) and some partial or old legislative elections (subdivisions abroad before 1988) (3.2%), for all of which the source files contain no data, or to the 8th legislature, rectification decisions, and court decisions covering several electoral subdivisions (3.6%), for which the data do not fit our fields. Finally, 3.4% of AN court decisions are incomplete. Additionally, the source files do not contain the candidates’ names before 1988, so we could not compute the ‘reelection’ field for 15% of the relevant court decisions. For similar reasons, gender is missing in 34% of the relevant court decisions. We plan to address this in a future version of the dataset by looking for a complementary data source.

(3) Dataset Description

Repository name

https://entrepot.recherche.data.gouv.fr/

Object name

20240702_jade_extract.tab

Format names and versions

CSV file, Tab-Delimited

Creation dates

2024-07-02

Dataset creators

Caroline Bligny (LJK/UGA): database designer, software architect, developer, data curation, supervision; Marie-José Martinez and Frédérique Letué (LJK/UGA): dataset designer, data validation, statistical expertise; Mickael Pereira (UGA Master student): developer; Romain Rambaud and Alya Hafsaoui (CRJ/UGA): legal expertise.

Language

French

License

ODbL (Open Database License)

Publication date

2024-10-16

(4) Reuse Potential

The main interest of the JADE dataset lies in the link between court decisions and election results. The JADE dataset can be used by researchers in legal studies or political science to analyze electoral disputes. By examining the vote gaps, Rambaud et al. (2025a) were able to confirm that the electoral judge tends to cancel an election when the vote gap is small enough, i.e., less than 1.4% of the votes. However, other variables in the JADE dataset, such as gender, political nuance, or the outgoing status of the candidate, remain unexplored. It would be interesting to ask whether gender or some political nuances are associated with higher rates of election cancellation. It would also be interesting to ask whether electoral disputes are treated differently for outgoing candidates or for candidates with certain political nuances. The JADE dataset can also be used to raise public awareness of electoral disputes.
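As a starting point for such questions, cancellation rates can be tabulated by any categorical variable of the dataset. The sketch below is a hedged illustration: the function name is hypothetical, and because the labels of the ‘solution’ and ‘solution_jade’ fields are not listed here, the caller must supply the predicate that decides which rows count as cancellations.

```python
from collections import defaultdict


def rate_by_group(rows, group_field, is_cancelled):
    """Return {group_value: (n_decisions, cancellation_rate)}.

    `rows` is an iterable of dicts keyed by JADE variable names and
    `is_cancelled` is a caller-supplied predicate on a row. Rows where
    `group_field` is empty are skipped, since some values are missing.
    """
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for row in rows:
        group = row.get(group_field)
        if not group:
            continue
        totals[group] += 1
        if is_cancelled(row):
            cancelled[group] += 1
    return {g: (totals[g], cancelled[g] / totals[g]) for g in totals}
```

For example, `rate_by_group(rows, "genre_elu", predicate)` would give one rate per gender of the elected candidate. Raw rates of this kind are descriptive only; any claim of association would require an appropriate statistical test and attention to the missing values described under Data Quality.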

Acknowledgements

The authors would like to thank all the students who contributed to data cleaning: Thibault Abeille, Samuel Ibghi, Ilda Sehitaj, Mariia Kliueva, Aymeric Sciers. The authors would like to thank Nadine Lynn-Martinsons and Marjolaine Tauveron for their help in improving the English writing of the article.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Caroline Bligny: Data curation, Methodology, Software, Writing – original draft, Writing – review & editing

Marie-José Martinez and Frédérique Letué: Formal analysis, Writing – original draft, Writing – review & editing

Romain Rambaud: Conceptualization, Data curation, Funding acquisition, Project administration, Writing – review & editing

Alya Hafsaoui: Data curation, Writing – review & editing