(1) Overview
Introduction
Scaling, “the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units” [1], remains one of the most common but difficult tasks in empirical social science research. Many modern scaling techniques rely on variations and extensions of statistical approaches that combine factor analysis and generalized linear models. However, unlike other applications of factor models, the key quantity of interest in scaling problems is the estimates of the underlying latent traits associated with each individual/experimental unit.
In the case of instruments composed exclusively of binary items, item response theory (IRT) models [2] are well-established scaling tools. Correspondingly, software that can fit IRT models to data is widely available. In the case of the R computing environment, examples of packages that can accomplish this include eRM [3], ltm [4], mirt [5], psychotree [6, 7], and MCMCpack [8]. Furthermore, IRT models can also be fitted using general-purpose multilevel modeling packages such as lme4 [9], lavaan [10], MCMCglmm [11], and bmrs [12, 13]. Here, we are mainly interested in scaling applications in political science, and especially those related to the estimation of revealed preferences from voting data in deliberative bodies such as the U.S. Congress or the U.S. Supreme Court, with the latent scale interpreted as representing the “ideology” of the legislator or judge in a liberal-conservative scale. In that context, IRT models can be seen as examples of spatial voting models [14, 15]. Various R packages for fitting models in this class are available, including wnominate [16] and pscl [17, 18].
Most of the latent variable models for binary data implemented in the packages listed above implicitly assume that the response functions of the items (which describe how the probability of a positive outcome depends on the latent trait) are monotonic. While such an assumption is appropriate in some applications (e.g., in educational testing), it can lead to misleading results in other settings. For example, it is not uncommon to see members of the U.S. House of Representatives who are understood to be on opposite sides of the ideological spectrum vote together against a measure supported by the majority [19, 20, 21, 22]. In those circumstances, scaling methods that assume that response functions are monotonic will lead to estimates of the latent traits that make these extreme legislators appear centrists.
The literature on scaling models that can accommodate non-monotonic response functions is limited. Probably, the best-known example is the generalized graded unfolding model (GGUM). GGUM “folds” the response function of a traditional graded response model, which allows individuals to exhibit lower response probabilities when an issue is further away from their ideal points in either direction of the latent scale. Computational methods for GGUMs have been implemented in various R packages, including GGUM [23], which provides an interface to the original Fortran code developed for the GGUM2004 software [24], as well as mirt [5] and ScoreGGUM [25]. Bayesian estimation of GGUMs using Markov chain Monte Carlo (MCMC) algorithms was first considered by [26] and [27]. In particular, [26] demonstrated that Bayesian estimates avoid the kind of bias to which frequentist estimates are prone when the model parameters lie at the end of the latent scale. MCMC algorithms for GGUMs have been implemented in the R packages MCMC GGUM [28] and bmggum [29]. However, the algorithms implemented in these packages can mix slowly and require substantial ad hoc tuning. More recently, [22] proposed a Metropolis-Coupled MCMC (MC3) method, available in the R package bggum. This MC3 algorithm enhances the exploration of parameter spaces by allowing multiple parallel chains to exchange information while running at different temperatures.
Recently, [30] introduced probit unfolding models (PUMs) as an alternative to GGUMs. Unlike GGUMs, PUMs can be directly motivated as spatial voting models, making their use especially appealing in political science applications. Furthermore, [30] provides substantial empirical evidence that PUMs fit data lead to complexity-adjusted fit that is at least as good as that of GGUMs in real applications. This paper describes pumBayes, a new R package that enables Bayesian estimation of these PUMs using MCMC algorithms that require minimal or no tuning. pumBayes can accommodate both static models in which the latent traits of the given individual are the same for all items, and dynamic models in which batches of observations are collected at different points in time and the latent trait is allowed to evolve slowly from one batch to another. The package also includes various support functions that can be used to pre-process data, select hyperparameters, summarize output, and compute metrics of model fit.
The PUM
The PUM [30] is a one-dimensional scaling model closely related to item response models. It can be motivated through the use of random utility functions. To ground our presentation, we describe it using the usual political science terminology. In this context, we typically have I legislators/judges who vote in favor or against J issues. PUM assumes that the (latent) preferences of legislator/judge i, denoted by , belong to a one-dimensional Euclidean latent “policy” space. It is common to refer to as the “ideal point” of legislator/judge i, as it represents their preferred policy. Similarly, each issue has associated with it three positions in the policy space, , and such that either or , where is associated with an affirmative vote, while both and are associated with a negative vote on issue j. Legislators/judges choose among these three options based on (random) utility functions of the form:
where , , and are independent standard Gaussian shocks. Then, if we let represent the vote of legislator/judge i on issue j, the probability of an affirmative vote is given by
where and .
Note that this formulation includes a two-parameter probit IRT model as a limiting case. Indeed,
Prior distributions
pumBayes uses prior distributions for the issue-specific parameters that are independent across and take the form
where , , denotes the q-variate normal distribution with mean vector a and variance matrix A and, similarly, denotes the q-variate normal distribution with mean vector a and variance matrix A truncated to .
For the static version of the model, pumBayes uses a standard normal distribution as the prior for the ideal point, i.e., independently for all . For the dynamic version of the model, the ideal points are still assumed to be independent across individuals, but they are linked across time for a given individual using a first-order autoregressive process whose stationary distribution is again a standard normal distribution,
The autocorrelation parameter of this latent autoregressive process is then given a hyperprior (a Gaussian distribution truncated to the (0,1) interval) and its value learned from the data. This choice of prior distributions ensures that the parameters are identifiable to shifts and scalings of the latent policy space. To ensure identifiability to reflections of the policy space, users must identify individual(s) in the data for whom the sign of their ideal point will be fixed to be positive.
Posterior computation
pumBayes relies on an MCMC algorithm to generate samples from the posterior distribution of the model. The algorithm assumes that any missing data are missing completely at random. The construction of the sampler involves two data augmentation tricks. The first augmentation trick is reminiscent of that described in [31] and involves the introduction of vectors of latent variables, for every and . The definition of these latent variables is tightly linked to the form of the utility functions in (1):
where , and are standard normal distributions. These augmentations ensure that most of the full conditional distributions belong to standard families from which direct sampling is possible.
The second augmentation trick breaks down the mixture prior on and into the two fully connected regions associated with their support. In particular, for , we let if and only if and , and otherwise. Then, , and
This second augmentation is important to ensure that the algorithm can fully explore the posterior distribution of and, in particular, that it can move between the quadrant where and , and the quadrant where and . To sample zj, the algorithm relies on two Metropolis-Hasting steps with different relative frequencies. For the first of these moves, proposed values are generated by flipping the signs of and , while for the second move, the proposed values for and are generated from the prior conditioned on the proposed quadrant. The first of these moves is expected to be most effective for items for which is close to (e.g., quasi-unanimous votes in our political science application). The number of pairs in this category is typically relatively low, but when the move is useful, the acceptance probability tends to be very high. Hence, it is often appropriate to attempt it relatively infrequently. On the other hand, the second move is most useful when the posterior distribution of is clearly bimodal. This move tends to be useful for a larger number of pairs, but the proposals also tend to have more moderate acceptance probabilities. Hence, we propose to perform it relatively more frequently. Full details of the algorithm can be found in [30] and [32].
Implementation and Architecture
pumBayes is an R package that provides an implementation of the MCMC algorithms described above, as well as a series of support functions to preprocess data and construct various summaries of the posterior samples. The main components of the algorithms are implemented in C++, with the R functions providing a simplified interface to that code. Because of this architecture, pumBayes relies on the external package Rcpp for its core functionality. Its two main functions are sample_pum_static (which, as the name suggests, generates posterior samples from the static version of PUM) and sample_pum_dynamic (which generates samples from the dynamic version).
Installing the package
pumBayes is a package built on top of the R software environment for statistical computing and graphics. For instruction on how to install R, see https://www.r-project.org/. pumBayes requires that various additional R packages be installed first. Please see the “Dependencies” section under “Availability” below.
pumBayes is available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/package=pumBayes and from the original GitHub site https://www.github.com/SkylarShiHub/pumBayes. pumBayes can be installed either directly from CRAN using the install.packages function in R:

or from GitHub using the install_github function in the devtools package:

Fitting static PUMs using pumBayes
We start by demonstrating the use of the function sample_pum_static to fit the static version of PUM. sample_pum_static can handle data in the form of either a rollcall object from the pscl package [17], or in the form of a logical matrix where TRUE values correspond to affirmative votes, FALSE values correspond to negative votes, and NAs correspond to any type of missing data. If a matrix object is used as input, the names of the rows are assumed to correspond to the legislators names.
For our demonstration, we use pumBayes to recover legislators’ preferences from roll-call voting data from the 116th U.S. House of Representatives. The data can be downloaded from https://voteview.com using the pscl package.

h116 is an object of class rollcall. For our purposes, the key component of any rollcall object is votes, which, as the name suggests, contains the outcomes of the votes cast by each legislator who served in the House during this period. These outcomes are typically encoded as 1, 2 or 3 for an affirmative vote (“Yea”, “Paired Yea” and “Announced Yea”), 4, 5, or 6 for a negative vote (“Announced Nay”, “Paired Nay” and “Nay”), and 0, 7, 8, 9 or NA for different types of missing values (“Not Yet a Member”, “Present” and “Not Voting”). The component codes of the rollcall object typically includes an explanation of these codes.

Another useful component of a rollcall object is legis.data, which contains information about the legislators involved with this dataset.

Typically, datasets downloaded from https://voteview.com need to be preprocessed before they can be analyzed. The function preprocess_rollcall in pumBayes can handle transformations of the original rollcall object, including the removal of pre-specified legislators or votes, or the merging of pairs of legislators. In the specific case of h116, we will use this function to perform the following operations on the data:
The President of the United States at the time (Donald Trump) needs to be removed. The following code identifies the row index in the matrix of votes that corresponds to him (note that there are no other legislators in this House that share the last name TRUMP):

In the analysis of congressional voting records, lopsided (unanimous or quasi-unanimous) votes are often removed before scaling because they provide little or no information about legislators’ preferences. In these cases, we are interested in removing only completely unanimous votes.

Two legislators (Justin Amash, representing MI-3, and Paul Mitchell, representing MI-10) left the Republican Party during this period to become independents. Hence, each of them appears twice in h116. For the purpose of this analysis, we merge their records before and after becoming independents and label the combined voting records for the whole 116th House as such.

Finally, after all the above filters are applied, we are interested in removing any legislator who missed more than 60% of the remaining votes from the dataset.

All of the above parameters are bound into a list that is passed as an argument to the function preprocess_rollcall. This function returns a “clean” version of the original rollcall object:

The final step before being able to fit the model is to select the hyperparameters. The default values used by pumBayes correspond to , and . A way to evaluate the suitability of these default values is to visualize the implied prior on the probability of an affirmative outcome obtained from Equation (2). The function tune_hyper allows us to generate a random sample from this implied distribution, and a histogram of these samples provides a convenient visualization.

As Figure 1 shows, the default hyperparameter values imply a prior that places most of its mass close to either 0 or 1. This in turn implies an assumption that, for most votes, the majority of legislators are quite certain about how they will vote. This seems most appropriate in the context of this application. The prior also places slightly more mass close to 1 than it does close to 0. Again, this is appropriate, as the agenda-setting powers of the majority mean that most motions put to a vote in the U.S. House tend to pass, and therefore, receive more affirmative than negative votes.

Figure 1
Histogram of samples from our default prior distribution for affirmative voting probability.
The function tune_hyper can also be used to explore the implications of alternative prior specifications. For example, a prior that is concentrated around 0.5 can be obtained by setting , and (please see Figure 2).


Figure 2
Histogram of samples from our alternative prior distribution for affirmative voting probability.
We are now ready to invoke the function sample_pum_static to sample from the posterior distribution. We aim at generating a total of 40,000 samples, of which the first 20,000 will be discarded as part of the burn-in period. These samples are obtained by thinning a longer chain every 10 iterations.

Typically, the ideological ranks implied by the ideal points are the key quantities of interest in this kind of analysis. These can be obtained using the function post_rank.

Note that the posterior median rank for Mo Brooks (who represented Alabama’s 5th district) is 428, placing him among the most conservative Republicans in the House during this period. On the other hand, the posterior median rank for Martha Roby (representing Alabama’s 2nd district) is 280, placing her as a centrist Republican.
It is illustrative to compare the ideological ranks estimated by PUM with those estimated using traditional approaches such as IDEAL [33], which is implemented in the pscl package:

An effective way to visualize this comparison is a scatterplot of the posterior median ranks estimated by the two models.

As Figure 3 shows, the rankings generated by PUM can differ significantly from those generated by IDEAL, especially for Democratic legislators. The top four discrepancies are for the members of the “Squad”: Representatives Alexandria Ocasio-Cortez (NY), Ilhan Omar (MN), Ayanna Pressley (MA), and Rashida Tlaib (MI). IDEAL positions them as centrists, while PUM places them among the most liberal in the Democratic caucus. Given the Squad’s advocacy for progressive policies, PUM’s ranking seems more accurate [19, 20].

Figure 3
Posterior median rank of the ideal points of legislators in the 116th House of Representatives under IDEAL (x axis) vs. those under PUM (y axis).
The results look similar if we instead compare against the ranks generated using the wnominate package (see Figure 4):


Figure 4
Maximum likelihood estimates of the ideal points of legislators in the 116th House of Representatives under WNOMINATE (x axis) vs. the posterior median rank of legislators’ ideal points under PUM (y axis).
As we mentioned in the introduction, the differences in the estimates generated by these models are due to the fact that PUM allows for both monotonic and non-monotonic response functions. To illustrate this, we construct estimates of the response functions associated with three separate votes using the function item_char in pumBayes:

As Figure 5 shows, PUM can capture both monotonic and non-monotonic response functions depending on the information contained in the observed data. Vote 5 (clerk session vote number 6) and vote 9 (clerk session vote number 10) are examples of votes with monotonic response functions, while vote 6 (clerk session vote number 7) is an example of a vote with a non-monotonic response function.

Figure 5
Posterior mean (solid lines) and 95% pointwise posterior credible intervals (gray regions) for the response functions associated with selected votes in the 116th House of Representatives.
Comparisons between PUM and other Bayesian models can be carried out using a blocked version of the Watanabe–Akaike Information Criterion (WAIC) [34, 35, 36]. For a given House and model, the WAIC is given by
where represents the probability that legislator i votes “Aye” in issue j. Lower values of WAIC provide evidence in favor of that particular model. Like the Akaike Information Criterion and Bayesian Information Criterion, the WAIC attempts to balance model fit with model complexity. However, unlike these two criteria, WAIC is appropriate for hierarchical settings and is invariant to reparametrizations of the model.
To calculate WAIC values, we first calculate the probability array of voting “Aye” using functions predict_pum and predict_ideal contained in pumBayes. The output of each of these functions is a three-dimensional array, where the dimensions correspond to members, issues, and MCMC iterations. Then the function calc_waic can be used to evaluate the complexity-adjusted fit of both PUM and IDEAL models using Equation (3):

The values clearly indicate that PUM outperforms IDEAL on this particular dataset, as evidenced by its significantly smaller WAIC value.
Fitting dynamic PUMs using pumBayes
We now demonstrate the use of the function sample_pum_dynamic for estimating time-varying preferences. Data is passed into sample_pum_dynamic through two arguments. The first is a logical matrix containing the vote outcomes and in which TRUE values correspond to affirmative votes, FALSE values to negative votes, and NAs correspond to any type of missing data. This is the same as one of the formats accepted by sample_pum_static. The second argument is a vector that indicates the time period to which each vote belongs.
For our illustration, we focus on voting data for the U.S. Supreme Court between 1937 and 2021, which is available for download at https://mqscores.wustl.edu/replication.php. This dataset includes the voting outcomes on non-unanimous decisions, with the outcome being encoded as 1 (TRUE) if the judge voted to reverse the lower court decision, and as 0 (FALSE) if they voted to unhold it. The data is available as part of pumBayes through the object scotus.1937.2021.

The syntax for sample_pum_dynamic is very similar to that for sample_pum_static. Furthermore, note that we use the same hyperparameters for as we did in the static case:

The path of the ideal points is often a quantity of interest. For example, we might want to plot their posterior means and associated credible intervals for a few selected Justices.

Figure 6 suggests that Hugo Black’s preferences shifted substantially during his long service in the Court. Early in his tenure, his preferences tended to become more negative (’liberal’) over time. However, starting around 1950, his ideal point shows a consistent trend toward becoming more positive (“conservative”). Antonin Scalia’s preferences also seem to have evolved over time but, in his case, he became first more positive (“conservative”) and then more negative (“liberal”). In contrast to these two Justices, Warren E. Burger’s preferences remained relatively steady during his term as Chief Justice.

Figure 6
Posterior mean (solid lines) and 95% credible intervals (shaded regions) of the trajectories of the ideal points of selected U.S. Supreme Court Justices.
A salient feature of PUM is that it can sometimes produce posterior distributions for the ideal points that are bimodal. Hugo Black’s preferences during the late 60s is a good example (see Figure 7).

Figure 7
Posterior distributions for Justice Hugo Black’s preferences.

A key hyperparameter in dynamic PUM models is the autocorrelation coefficient of the underlying autoregressive process. To investigate the performance of the model, we might want to compare the prior and posterior distributions for this parameter. Figure 8 shows that, in this dataset, the posterior distribution of is centered close to the prior, but is much more concentrated. Indeed, its posterior mean is 0.891, and the associated 95% posterior credible interval is (0.863,0.905).


Figure 8
Prior and posterior distributions of the autocorrelation parameter ρ.
Similar to the static case, it is illustrative to compare the complexity-adjusted fit of PUM against that of the more standard dynamic IRT model proposed by [37], which is implemented by the function MCMCdynamicIRT1d in the package MCMCpack [8]:


Similar to the static case, the functions predict_pum and predict_irt can be used to calculate the predicted voting probabilities across MCMC samples, and calc_waic can be used to evaluate the WAIC score for each term. The difference between the score under the standard model and PUM can then be displayed as a line plot.

As seen in Figure 9, the difference in WAIC is positive on most terms, suggesting that PUM tends to provide a better explanation to the U.S. Supreme Court Justices’ voting patterns than the standard dynamic IRT model.

Figure 9
Differences in WAIC scores between a dynamic IRT model and a dynamic PUM. In this graph, ΔWAIC = WAIC(IRT)– WAIC(PUM), so that positive values of ΔWAIC indicate that PUM provides better complexity-adjusted fit to the data.
Quality control
Packages submitted to CRAN undergo rigorous testing and quality control. Details are available at https://cran.r-project.org/web/packages/submission_checklist.html. In particular, pumBayes successfully passed the CRAN R CMD check. Results from this check can be found on CRAN (https://cran.r-project.org/package=pumBayes). Post-deployment quality control is ensured by CRAN’s automatic compliance system, which checks whether packages are compatible with new releases of the base R system. Incompatible/non-compliant packages are archived unless issues are resolved within a few weeks of each release.
This manuscript was written as a fully reproducible document with embedded code that is executed every time the file is compiled using knitr and laTex. The examples presented in the sections “Fitting static PUMs using pumBayes” and “Fitting dynamic PUMs using pumBayes” faithfully reproduced results included in earlier peer-reviewed publications [30, 32]. These detailed instructions enable users to replicate these results and, therefore, ensure that the package operates correctly. Furthermore, these two sections include comparisons against alternative methodologies that serve as additional correctness checks.
(2) Availability
Operating system
pumBayes can be run on any operating system that supports R.
Programming language
R 3.6.0 or above.
Additional system requirements
None.
Dependencies
pumBayes requires the following R packages:
Rcpp
RcppArmadillo
RcppDist
mvtnorm
RcppTN
pscl
List of contributors
All authors contributed to the software.
Software location
Code repository
Name: CRAN
Persistent identifier: https://cran.r-project.org/package=pumBayes
Licence: GPL-3
Publisher: Skylar Shi
Version published: 1.0.0
Date published: 2025-05-30
Language
English
(3) Reuse potential
pumBayes streamlines the applications of PUMs to scaling problems, which are ubiquitous in the social sciences. For example, in political science applications, the scaling of voting data to estimate the ideology of members of deliberative bodies is prevalent, with the estimates often being used as either predictors or as outcomes in regression and classification problems. But while the package (and the illustrations) presented in this paper are geared towards political science applications, the methods are quite general and can be applied to other settings (such as marketing or non-cognitive measurement applications) in which we might expect non-monotonic response functions. Hence, there is extensive potential for reuse by researchers and practitioners in the social sciences.
Acknowledgements
The authors would like to thank Kevin M. Quinn for help using the R package pscl.
