
Best Practices and Guidelines with Respect to Psychometric Consumer Reported Outcome Measures for Use in Research on Tobacco- and Nicotine-Containing Products – A Consensus-Based Approach

Open Access | Oct 2025


INTRODUCTION

Self-report plays a key role in data collection in research on tobacco- and nicotine-containing products (TNPs), including tobacco harm reduction (1,2,3), public health research (4), behavioral research, and regulatory research (5,6). Consumer reported outcome measures (CROM) are measurement instruments that can be divided into two fundamentally different forms. We refer to CROM intended to measure observable characteristics and behavior as “Descriptive CROM.” Examples of Descriptive CROM include items pertaining to TNP use, such as the average number of cigarettes smoked per day, or to demographics. In contrast, CROM intended to measure underlying (‘latent’) attributes that are not directly observable are referred to as “Psychometric CROM.” Examples of Psychometric CROM commonly used in TNP research include product perceptions (e.g., relative and absolute risk perceptions), responses to product use (e.g., psychological dependence, withdrawal symptoms, sensory effects, liking/satisfaction), claim perceptions (e.g., believability of a modified risk claim), and impact on quality of life. In a Psychometric CROM, a measurement of a latent variable can be inferred from the items, which are manifestations of the latent variable. The aim of this contribution is to codify best practices and guidelines for the selection, development, modification, and implementation of Psychometric CROM in research on TNPs.

Currently, only one guidance document exists pertaining to CROM in the tobacco product space; this is the United States (U.S.) Food and Drug Administration (FDA) Center for Tobacco Products tobacco product perception and intention (TPPI) study guidance (7). The scope of the TPPI study guidance is confined to providing general recommendations related to the development, adaptation, and use of measures of perception, intentions, and understanding in tobacco product perception studies conducted as part of applications to U.S. FDA. It does not address other Psychometric CROM, or the use of CROM in research other than TPPI regulatory studies submitted to this U.S. regulator. Therefore, additional guidance that pertains to other types of Psychometric CROM (e.g., psychological dependence, withdrawal symptoms, satisfaction, etc.), and the use of CROM in contexts other than TPPI studies submitted to the U.S. FDA is needed.

The absence of additional guidance around CROM in the TNP regulatory space in the U.S. and globally is noteworthy given the need for CROM data to support regulatory filings, such as premarket tobacco product applications and modified risk tobacco product applications in the U.S. While there are several guidance documents related to patient reported outcomes (PRO) measures developed by FDA’s drug evaluation center (8, 9), these documents may not be directly applicable to CROM in the TNP space. Consequently, these guidelines were developed to provide comprehensive support for academic, industry, and regulatory researchers using Psychometric CROM in studies evaluating TNPs.

METHODS

Guideline development was initiated in 2020 by a CORESTA (Cooperation Centre for Scientific Research Relative to Tobacco) CROM (10) working group (WG) composed of 11 researchers from different TNP companies with diverse expertise and backgrounds, including psychometrics, PRO measures, survey methodology, and TNP use behaviors. As a starting point for guideline development, WG members reviewed 48 different documents (see Supplementary Material), including peer-reviewed publications and publicly available guidelines and best practices published by other prominent organizations from related fields.

The WG adopted a consensus-based approach inspired by prominent outcomes research organizations, such as The International Society for Pharmacoeconomics and Outcomes Research. Development of the guidelines was a collaborative, iterative process, and the WG sought feedback from the research community, specifically from subject matter experts (SMEs) with diverse perspectives and expertise representing measurement science, public health, academia, and the tobacco industry throughout the guideline development process. Outlines and drafts of the guidelines were also presented and discussed at various health and tobacco research conferences. Based on feedback and suggestions provided, the guidelines were updated and revised (11,12,13).

RESULTS

The guidelines and best practices related to the use of Psychometric CROM in TNP research include four sections. The first section, Defining the construct to be measured and identifying the ideal CROM characteristics, is considered a critical first step in the use of Psychometric CROM and is intended to facilitate the researcher’s decision as to whether an existing “off the shelf” CROM is appropriate for their study (without any additional modifications or testing), whether an existing CROM might be modified to meet the researcher’s needs, or whether it is necessary to develop a new CROM for purposes of the study. It should be reviewed before the other sections. The second section of the guidelines presents best practices for modifying an existing Psychometric CROM and recommendations for empirical evidence to support the modification. The third section provides an overview of the general stages of Psychometric CROM development and validation, including best practices for executing each stage. Finally, the fourth section presents considerations for the implementation of Psychometric CROM in research, including considerations pertaining to scoring and interpretation of the scores and findings generated from the CROM.

Section 1: Defining the construct to be measured and identifying the ideal CROM characteristics

The definition and conceptualization of the construct to be measured should be the starting point of a research study that requires a Psychometric CROM, as it dictates the content of the CROM needed to address the study’s objective. However, the context of measurement (e.g., the population being studied, the mode of administration of the CROM), driven by the objectives of the study, is equally important when it comes to considering the qualities of the ideal CROM for a study. Often, several Psychometric CROM assessing a given construct (e.g., risk perception) already exist; therefore, the exercise described here of defining the construct to be measured and identifying the ideal CROM characteristics can help the researcher determine which existing CROM would be the most appropriate fit for the study.

As a hypothetical example, in a longitudinal study lasting 6 months with a 30-day assessment schedule, a researcher may want to evaluate self-reported changes in respiratory symptoms when adults who smoke and are not diagnosed with pulmonary disease switch completely from combustible cigarettes to another TNP. The characteristics of the ideal Psychometric CROM for this study would be a CROM assessing respiratory symptoms that is also appropriate for use with the target population, e.g., covering a range of respiratory symptoms relevant to a non-diseased population of people who smoke. Additionally, the CROM should have an appropriate recall period (i.e., 30 days) and should be sufficiently sensitive to detect the changes in respiratory symptoms expected to occur following switching within the timeframe of the study.

Engaging in this exercise of specifying the features of the ideal CROM may result in one of three possible outcomes:

  1. an existing CROM is appropriate, with or without any additional testing;

  2. an existing CROM appears to be applicable but needs to be modified to meet the study’s needs; or

  3. a new CROM needs to be developed.

To illustrate using the example above, the researcher conducts a review of peer-reviewed literature, national surveys, and CROM databases but is unable to identify an existing CROM appropriate for use with a non-diseased population of people who smoke. That is, existing tools were developed and validated specifically for use with clinical populations, such as those with asthma or chronic obstructive pulmonary disease (COPD) (e.g., St. George’s Respiratory Symptom Questionnaire (14)); hence, their content focuses on severe respiratory symptoms that are inappropriate for the study population. Importantly, such a mismatch between the severity of the CROM’s items and the study participants’ symptoms may lead to reduced measurement precision and a profound floor effect, inhibiting the researcher’s ability to detect reductions in respiratory symptoms over time. Therefore, the researcher decides to develop a new CROM.
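The floor effect described above can be screened for directly in pilot data: if a large share of the target population already scores at the scale minimum at baseline, there is little room left to register improvement after switching. A minimal Python sketch (all scores below are hypothetical, and any cut-off for "too many respondents at the floor" is a judgment call, not a standard):

```python
def floor_effect_share(scores, scale_min):
    """Proportion of respondents scoring at the scale minimum (the 'floor')."""
    return sum(1 for s in scores if s == scale_min) / len(scores)

# Hypothetical baseline symptom scores (possible range 0-40) from a
# non-diseased population completing a COPD-oriented questionnaire
baseline = [0, 0, 1, 0, 2, 0, 0, 3, 0, 1, 0, 0, 5, 0, 0, 2, 0, 0, 1, 0]
print(round(floor_effect_share(baseline, scale_min=0), 2))  # 0.65
```

A share this large would suggest the instrument cannot capture symptom reductions in this population, supporting the decision to seek or develop a more appropriate CROM.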

Even a commonly used, validated CROM with relevant content may not be the correct choice for a study if it does not align with what the researcher needs to measure within the study context. A CROM is not ‘valid’ for all uses and applications, as validity evidence applies to a specific context of use. Therefore, if the intended context of use of a Psychometric CROM substantially departs from the study’s needs, the existing CROM may not be appropriate in its current form, and modifications or, at least, some additional testing may be necessary. An instance where additional testing of a CROM may be warranted is if a psychometric property that is considered critical for the study has not been evaluated (e.g., sensitivity to change over time in a longitudinal study). Whether additional testing is indicated also depends on the role the Psychometric CROM plays in a study; if it addresses a primary objective, the researcher may place greater importance on the rigor of the validity evidence for the Psychometric CROM than if it addressed a secondary, tertiary, or exploratory objective.

Several other key aspects should also be considered at this stage. Firstly, CROM are typically developed and validated in a specific language; if the study’s language differs, a translation or linguistic validation may be necessary. Secondly, the reading level of a CROM should be taken into consideration, especially if the CROM will be administered to individuals with low health literacy or limited language skills. Thirdly, CROM are usually developed and validated for specific age groups, such as adults or adolescents; a CROM validated solely for adults should not be administered to adolescents without further research confirming its applicability. Lastly, CROM may be designed for specific TNPs (e.g., cigarettes) or TNP user groups (e.g., former cigarette smokers). Thus, researchers should be cautious when using a CROM outside its intended product or user group without the necessary adaptations and supporting evidence.

Table 1 includes considerations when determining the optimal Psychometric CROM characteristics for a particular study. This list is not exhaustive but highlights some key factors for consideration. Except for the first consideration (definition of the construct to be measured), the considerations are not listed in order of importance, as this will be dictated by the study requirements.

Table 1.

Considerations when defining the Psychometric CROM characteristics of greatest importance for a particular study.

Consideration | Description / examples
Definition of the construct to be measured
  • What is the concept to be measured, and how is it defined? What are the components/aspects of the construct that should be represented in the CROM? How broad should the scope of the CROM conceptually be (particularly relevant for multi-faceted constructs, such as quality of life)?

  • Is the construct likely stable (a trait) or unstable (a state) over time?

  • Especially for constructs that vary over time, is a specific timeframe required (e.g., urge to smoke “over the past 7 days” vs. “right now”) as part of the construct’s definition?

Score interpretation
  • Is the researcher looking for a single total score, or separate scores to reflect the different aspects of the construct?

  • What sort of guidelines for score interpretation are desired?

Defining context of use within the study
  • What target population do the participants in the study represent (e.g., adults who smoke cigarettes)? An ideal CROM would have psychometric evidence supporting its use with participants representing the target population. Will the CROM need to be appropriate for any subpopulation(s) of interest (e.g., those with limited health literacy, youth)?

  • What is the study type and study design (e.g., cross-sectional vs. longitudinal; if longitudinal, over what duration)?

  • Will the CROM be applied to different products (candidate and comparator products)? Is the intent to make quantitative comparisons between products?

Psychometric functioning
  • What are the psychometric properties of greatest importance within the context of the study (e.g., ability to detect change, equivalence of scores across product categories and/or user groups, predictive validity, etc.)?
Administration considerations (mode/method of administration)
  • Does the study require electronic administration? If so, on devices with different screen sizes?

  • Does the CROM require electronic administration/scoring (e.g., computer adaptive testing, display of digital images, skip logic)?

  • Does administration of the CROM involve conditions that might affect participant responses (e.g., responding to CROM in front of study staff)?

  • Does the CROM need to be self-administered or interviewer-administered?

  • If administered repeatedly, how frequently will the CROM be completed? Is assessing change over time part of the objective? Does a high frequency of administration impact data quality (lack of engagement and attention resulting in straightlining, increased frequency of missing responses, etc.)?

  • Are there study restrictions regarding the length (time required for administration) of the CROM?

  • In which countries / languages will the CROM be administered?

Accessibility of the CROM
  • Licensing fees, permission to use, copyright clearance

Section 2: Modifying an existing Psychometric CROM

This section presents definitions and examples of modifications that a researcher might make to an existing CROM. CROM modifications can vary greatly in type (i.e., changes to content, administration, and/or application) and extent (i.e., minor, moderate, substantial). Additionally, it discusses qualitative and quantitative strategies that can be used to gather evidence to support the modifications, as well as the factors that influence the type and extent of evidence recommended to support the modifications.

Types of CROM modifications

CROM modifications can be classified as related to content, administration, and application. Content modifications include alterations to the CROM instructions, sentence stems, item wording, response options, etc. Administration modifications imply an adaptation of how the CROM is presented to the study participants, whereas application modifications relate to changes in the population or product to which the CROM is applied. Table 2 provides (non-exhaustive) examples of all three types of modifications.

Table 2.

Types of Psychometric CROM modifications.

Type of modification | Illustrative examples (non-exhaustive)
Content: Modifying the instructions, items, and/or response options
  • Changing instructions and/or item content to reference a different product category (e.g., “electronic nicotine delivery systems (ENDS)” instead of “cigarettes”, updating language/terminology)

  • Adding item(s)

  • Removing item(s)/only administering a subset of items

  • Adding images to items to improve clarity/comprehension

  • Changing the recall period (e.g., “in the past 30 days” to “in the past 7 days”)

  • Removing or introducing a response option of “I don’t know”

  • Adding response labels so that a scale is fully labeled

  • Changing the number of response categories

  • Changing response category labels

Administration: Changing the mode, method, and/or format of administration
  • Administering a CROM developed for paper-and-pencil electronically

  • Changing the method of administration from self-report to interviewer-administered

  • Modifying a CROM to fit a small screen device (smartphone) by administering one item per screen instead of the items together as a grid

  • Changing a rating task (asking the participant to respond to each item by selecting a value on a numerical rating scale) to a drag-and-drop task

  • Changing the order of item administration (fixed order vs. randomized)

Application: Applying the CROM in a new way, such as to a new population or product other than the one for which it was originally developed/validated
  • A measure of smoking susceptibility developed to assess susceptibility among youth is used to assess smoking susceptibility in adults

  • A measure of cigarette dependence developed for use with adults who smoke cigarettes is administered to people who use ENDS to assess dependence on ENDS

  • Translating a CROM into a different language and administering it to a new population (i.e., individuals whose primary language differs from languages in which the CROM has been validated)

  • Administering a CROM to individuals from another culture (i.e., individuals whose cultural background differs in a relevant way from the background of individuals for whom the CROM was originally validated)

The three types are not mutually exclusive, as a single CROM modification may span multiple areas. For instance, modifying a CROM for use with a new population would likely include both content and application modifications.

Extent of CROM modification

The extent of CROM modifications can, in principle, be mapped onto a continuum ranging from very minor to very substantial. That said, the distinction between gradations of modification severity is only relevant to the extent that it provides direction regarding the scope of the evidence needed to support the modification. Thus, the guidelines propose two broadly defined classes of modification, “Minor” and “Substantial” (Table 3). The key distinction is whether the modification changes participants’ interpretation of the CROM content and, consequently, their responses. A Minor modification is not reasonably likely to impact interpretation of and response to the CROM above and beyond changes that result from improving clarity/reducing measurement error. Conversely, Substantial modifications are reasonably likely to change interpretation of and response to the CROM.

Table 3.

Recommendations pertaining to CROM modifications.

Modification | Minor | Substantial

Definition

  • Minor: Modifications that are not reasonably likely to impact end-users’ interpretation of CROM content and response to the CROM, above and beyond changes to interpretation and response that are a result of improving clarity/reducing measurement error. a

  • Substantial: Modifications that could reasonably change end-users’ interpretation of the CROM content and response to the CROM items.
Examples

Minor:

  • Making the text bold and underlining the recall period in the instructions (“In the past 7 days”) for visibility and emphasis

  • Changing font size or font style

  • Adding additional clarifying language to an item or instruction without changing the substance

  • Adding an image of the product being referenced

  • Administering a paper-and-pencil CROM electronically, without changing the presentation of the CROM

  • Administering items forming a single dimension from a multi-dimensional CROM b

Substantial:

  • Administering a subset of items from a unidimensional CROM (developing a “short form”)

  • Changing the type of response task (e.g., a numerical rating task is changed to a drag-and-drop task)

  • Changing the type of response scale (e.g., from a 5-category fully labeled scale to a visual analog scale, from a 5-point descriptive response scale to an 11-point numerical rating scale)

  • Changing the content of the response scale (e.g., replacing a frequency scale with an intensity scale)

  • Adding items to a CROM

  • Administering the CROM to a population for which it was not developed (e.g., a measure of cigarette dependence developed for use with adults who smoke cigarettes is administered to adolescents)

  • Applying the CROM to TNPs for which it was not developed (e.g., a measure of cigarette dependence is administered to individuals who use ENDS to assess dependence on ENDS)

  • Translating a CROM into a new language and administering it to individuals who speak this new language

Recommended approach(es) to support modification

Minor:

  • Generally, no evidence is needed

  • In certain circumstances, qualitative evidence may be helpful (e.g., to ensure that new clarifying language added to instructions is clear)

  • Usability testing may be helpful when modifying a paper-and-pencil CROM for electronic administration

Substantial:

  • Qualitative and/or quantitative evidence is always recommended

  • Quantitative evidence is recommended to support development of a short form c

  • If CROM content is substantially changed (e.g., changing the response task or response scale, adding new items), either (or both) qualitative and quantitative evidence could be used to support the modification d

  • If scores from two versions of a CROM are being directly compared in a study (e.g., ENDS dependence vs. cigarette dependence), quantitative evidence is recommended

  • Quantitative and/or qualitative evidence is needed when administering a CROM to a new population (e.g., youth vs. adults) or product

  • Qualitative and, in some cases, quantitative evidence is recommended when translating a CROM into a new language

a Often, Minor modifications are made with the explicit intention of correcting inaccurate interpretation or misunderstanding (reducing measurement error), which may subsequently change participants’ responses.

b This modification would be considered Minor, assuming that the items from that dimension are scored and interpreted separately from the remaining items comprising the full CROM. If the researcher is dropping items of a unidimensional CROM to create a short form, impacting scoring, this would generally constitute a Substantial modification.

c Qualitative strategies may also be helpful, such as asking SMEs to review the items comprising the new short form to ensure that no critical content from the long form of the CROM is missing.

d Depending on the modification, qualitative evidence is generally helpful to ensure that participants understand the new content, such as a new response task (e.g., drag-and-drop task), a new type of rating scale (e.g., participants perceive that the new response categories reflecting intensity make sense given the construct being measured and are the appropriate level of granularity), or new item(s). In many cases, quantitative evidence is recommended to verify adequate psychometric functioning of the modified CROM (e.g., that the response categories are ordered, that new/modified items are internally consistent with other items and load onto factors as anticipated, etc.).

In practice, it may not always be obvious whether a given modification is Minor or Substantial, as some modifications may seem “Moderate.” However, such a classification is not conclusive, as ultimately the researcher will need to decide whether additional evidence will be collected to support the modification. If in doubt, collecting qualitative and/or quantitative evidence can be helpful to support modifications even when it may not be considered necessary. Of note, when modifying content of an existing CROM, it is recommended to clearly document such changes as well as their rationale.

An example of a Minor modification could be adding clarifying language to the question “In your opinion, how harmful is smoking to your health?” by inserting the word “cigarettes” (i.e., “In your opinion, how harmful is smoking cigarettes to your health?”) to reduce potential confusion that the item may refer to another product (e.g., cigars). Of note, by correcting misinterpretation, Minor modifications may indeed have an impact on participants’ responses to the CROM by reducing measurement error caused by misunderstanding. A similar example of a Minor modification would be adding an image of the referenced product to the CROM to reduce response error.

Conversely, if a CROM with a 4-point descriptive response scale (say, from “not at all likely” to “extremely likely”) is modified so that it now uses an 11-point numerical rating scale (from 0% to 100% likely), this is considered a Substantial modification, as the change to the response options may impact how participants think about and respond to the CROM content. Accordingly, the psychometric properties of the CROM may change significantly. At the very least, measurements based on these different response scales are generally not comparable. For example, a participant who previously said “not at all likely” may select any of several responses when provided with the more granular numeric rating scale (e.g., 0%, 10%, 20%), as the granular scale may allow them to express their perception of likelihood more precisely. Therefore, as with other Substantial modifications, such a modification warrants additional testing.
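To give a concrete flavor of the quantitative evidence such testing might include: internal consistency, commonly summarized with Cronbach's alpha, is one basic psychometric property typically re-examined after a Substantial content change. A minimal Python sketch (the formula is standard; the item responses below are purely hypothetical):

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to per-item scores
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 5-point responses from six respondents to a four-item scale
data = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(data), 2))  # 0.96 for these illustrative data
```

Alpha values are conventionally read against rules of thumb (e.g., 0.70 as a lower bound for group-level research), although acceptability ultimately depends on the CROM's intended use, and alpha alone does not establish comparability across response scales.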

Recommended types of evidence that can be gathered to support the modification
Overview of approaches to gather evidence

For many instances of CROM modifications, especially when content is modified, conducting cognitive debriefing interviews (15,16,17) to qualitatively assess understanding and interpretation of the modified CROM is recommended. Usability testing may also be useful in certain circumstances, such as when modifying CROM formatting to fit a small-screen electronic device (e.g., a smartphone). Quantitative evidence typically requires larger data sets and the application of statistical analyses assessing psychometric properties. Researchers may choose to leverage both qualitative and quantitative strategies to ensure data quality, interpretability, and comparability, particularly when Substantial modifications are made.

Factors influencing type and extent of evidence recommended to support the modification

Generally, the type and extent of evidence recommended to support a CROM modification is influenced by two factors. The first factor is the extent of the modification (Minor or Substantial). Minor modifications generally do not necessitate additional testing; however, a conservative approach would be to conduct cognitive testing to verify that the modification did not interfere with accurate interpretation of the CROM. Conversely, collecting evidence to support a Substantial modification is typically recommended. In the extreme, modifications to a CROM may be so substantial that the CROM might reasonably be considered a “new” CROM, as opposed to being “modified” from the original; in such instances, the researcher should consider following the recommendations applicable to new Psychometric CROM development (see Section 3).

The second factor is how the modified CROM will be used and interpreted. For instance, whenever measurements based on the original CROM and the modified CROM are to be compared, evidence of psychometric equivalence is strongly recommended. If, for example, a CROM assessing cigarette dependence is modified to reference electronic nicotine delivery systems (ENDS) dependence, and both versions of the CROM are being used in a study to directly assess differences in cigarette and ENDS dependence as a primary study objective addressing a regulatory requirement, quantitative evidence of measurement equivalence would be needed to support comparability. The same is true for comparisons of measurements from people who smoke cigarettes and people who use ENDS. The main purpose of these analyses is to rule out bias in the measurements. Commonly applied approaches for testing and supporting the application of a CROM to a new product or population, and the comparability of the resulting measurements, include multi-group analysis (18,19,20) when applying confirmatory factor analysis (21, 22), and differential item functioning (DIF) analysis when using Rasch measurement methods or item response theory (18,19,20). Some examples of quantitative assessments of psychometric equivalence in the TNP space are documented in the literature (23,24,25,26).
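As a toy illustration of DIF screening, the Mantel-Haenszel procedure (a classical complement to the IRT-based approaches cited above) compares, within strata of matched total scores, how often two groups endorse a dichotomized item; a pooled odds ratio far from 1.0 flags possible uniform DIF. A minimal Python sketch with hypothetical counts (the group labels, strata, and numbers are illustrative only, not data from any study):

```python
def mh_common_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio across total-score strata.

    Each stratum table is ((a, b), (c, d)):
      a, b = reference group (e.g., people who smoke cigarettes) endorsing / not endorsing the item
      c, d = focal group (e.g., people who use ENDS) endorsing / not endorsing the item
    """
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts for one item, stratified by low/mid/high total score
strata = [
    ((30, 20), (28, 22)),
    ((40, 10), (38, 12)),
    ((45, 5), (44, 6)),
]
print(round(mh_common_odds_ratio(strata), 2))  # 1.22
```

An odds ratio this close to 1.0 is consistent with the item functioning similarly across the two groups; in practice, researchers would pair such screening with a formal significance test and an effect-size classification before concluding measurement equivalence.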

Linguistic/cultural adaptations

If the original CROM is to be applied in a cultural context other than that for which it was validated, a cultural adaptation is required. A cultural adaptation is always a Substantial modification, since cultural background profoundly impacts how the CROM is interpreted by respondents. A change of culture typically also involves a change of language, as language is a key aspect of culture. Therefore, a cultural adaptation usually also involves a linguistic adaptation that ought to ensure the CROM’s conceptual and content equivalence and preserve the psychometric meaning of the items.

A researcher should attempt linguistic/cultural adaptations only with support from relevant experts or organizations specializing in this type of work, due to the numerous factors that need to be taken into consideration to maintain the integrity of the CROM (10, 27,28,29,30,31). The process typically requires professional translators who are bilingual and fluent or native speakers of the languages involved. Consultation with SMEs or the developers of the CROM also contributes to the quality of the linguistic validation. A Translatability Assessment (TA) (32) conducted during a CROM’s development stage generally raises the chance of a successful linguistic adaptation.

Linguistic validation typically starts with a clear explanation of the different items/concepts present in the CROM. It then involves a forward translation from the source to the target language, a back translation into the source language, and subsequent comparisons of the source version and the back-translated version, with iterative adaptations of the translation until a linguistically equivalent version is established. Finally, cognitive debriefing interviews with speakers of the target language who also represent the target population may be conducted to help finalize the linguistically equivalent version and to provide qualitative evidence of a successful linguistic adaptation. Establishing psychometric comparability of measurements across the original and translated CROM versions additionally requires quantitative evidence of measurement equivalence.

Section 3: Developing and validating a new Psychometric CROM

When existing Psychometric CROM do not satisfy the researcher’s needs and modifying an existing CROM would not be sufficient, it may be necessary to develop a new Psychometric CROM (or to modify an existing Psychometric CROM so substantially that it could arguably be considered “new”). This determination would be made based on the outcome of the exercise described in Section 1.

This section of the guidelines provides an overview of the general stages of Psychometric CROM development, including best practices for executing each stage. The specifics of a CROM’s development and validation will (and should) be idiosyncratic and iterative, depending on the construct to be measured and the qualities of the Psychometric CROM that are most important.

Conceptual model development

The starting point should always be a well-considered definition of the construct to be measured by a multi-item Psychometric CROM. By completing the exercise described in Section 1, the researcher would have already started the conceptual model development process. The conceptual model describes the core content of the construct and identifies various components of the construct that need to be represented in the CROM.

This section begins by introducing the reader to a conceptual model through hypothetical examples. Next, the reader is provided with a high-level overview of various strategies that could be used to develop a conceptual model.

General principles

Building upon the definition of a construct, a conceptual model, generally depicted in the form of a figure or diagram, presents the key content/components to be measured by the CROM, as well as the theoretical structure of the concept of interest. For example, a conceptual model of psychological dependence might include components such as craving, withdrawal, tolerance, and perceived loss of control. Having a conceptual model can help to ensure adequate construct representation during item generation, that is, the content of the new CROM reflects all critical parts of the construct. A conceptual model should also include information about the theorized structure, which should be empirically evaluated during later stages of the CROM development process. If confirmed quantitatively, this structure will inform scoring and interpretation.

A new CROM’s content covering all components from the conceptual model provides evidence of adequate construct representation and content validity. Conversely, if the new CROM’s content does not cover all components from the conceptual model, the CROM may have construct underrepresentation, threatening its validity.

Methods to develop a conceptual model

Qualitative, quantitative, and mixed-methods approaches can be used to develop a conceptual model. For example, a researcher might choose to use one or more of the following: surveys, individual interviews with SMEs or individuals representing the target population, focus groups, a card sorting task, social media analysis, literature review, etc. The purpose of these approaches is to identify and gain in-depth information about relevant aspects, domains, and facets of the concept(s) of interest, such as, in our example above, the experience of dependence among people who use TNPs. While the involvement of subjects representing the target population is highly recommended, experts in the field, or key opinion leaders, also provide valuable assistance when it comes to synthesizing evidence from multiple sources. Finally, regulatory perspectives may also inform the development of the conceptual model, if applicable.

There is no “right” way to develop a conceptual model that fits all circumstances. The approach that the researcher chooses to pursue will likely depend on various factors, including the construct to be measured and its complexity, the extent of peer-reviewed literature published on the topic (which could be leveraged to inform conceptual model development), the regulatory need that the CROM is being used to address (if applicable), etc. See Supplemental Material for hypothetical examples of conceptual model development. Interested readers are referred to other source documents for additional information (33,34,35) and for examples of conceptual model development in the TNP space, see (36, 37).

Item generation and CROM drafting

Drafting the CROM includes developing any instructions, items, and response options (here collectively referred to as “CROM components”). Item generation should aim to develop an adequate range of items to cover the breadth of content within the concepts of interest defined in the draft conceptual model. If prior qualitative research was conducted with the target population, items can be constructed using as many of the respondents’ own words and descriptions of the concepts of interest as possible and appropriate to strengthen relevance and content validity of the CROM, and to facilitate comprehension of the items. At this stage, the researcher should also consider the intended mode and method of administration (e.g., if a CROM is to be administered on the participant’s preferred device, the CROM should be drafted in a way that retains its integrity across screen sizes). Considering plans for CROM administration early in the CROM drafting process can help prevent having to modify the CROM later.

Using a table or tracking matrix is recommended to document CROM component sourcing (if applicable), and to track revisions, additions, and removal of items and rationale for changes as the researcher moves through the remaining phases of the CROM development process.

At this stage, a researcher may also wish to start proactively preparing and planning for linguistic/cultural adaptation as part of the development process by conducting a TA (32) as part of the item generation process. This should be considered even if an intercultural application is not immediately intended as it broadens the scope of the CROM’s applicability in the future.

When drafting a new Psychometric CROM it is crucial to follow best practices (7, 9, 38,39,40). These include starting with an item pool that adequately represents the conceptual model, using simple and direct language or images to aid comprehension, and avoiding jargon, slang, and biasing language. Each item should focus on a single concept, avoiding “double-barreled” questions (questions that ask about more than one topic) and hypothetical questions. The recall period should be relevant. Response options should align with the construct being measured, cover all potential responses, and be distinguishable. Biasing response labels should be avoided. Skip logic can be used to avoid the use of "not applicable" response options. Bipolar scales should be symmetrical, and “I don’t know” should be visually distinct and placed last in the response set.

Measurement precision is an important aspect for consideration when drafting a CROM. Measurement precision, or the CROM’s ability to distinguish between participants with similar amounts of the construct being measured, is optimized when the CROM content is well-targeted to the end-users (i.e., as with the respiratory symptom CROM described in Section 1, this means that the CROM asks about mild-to-moderate respiratory symptoms that are likely applicable to people who smoke without COPD). Longer CROM with very granular response options will not necessarily lead to higher measurement precision, e.g., if items are redundant, if the items are not well-targeted to the end-users, if participants cannot distinguish between the response options (leading to guessing and increased measurement error), if the length or complexity of the CROM is too cognitively burdensome (reducing CROM acceptability and quality of respondent data), etc. That said, during CROM development, the researcher may intentionally start with a larger item pool, including similar (or potentially redundant) items, as it is easier to remove items based on quantitative evaluation than to add new items, which may require additional qualitative testing.

Of note, these recommendations pertain to the drafting of a single CROM (e.g., respiratory symptoms, or psychological dependence CROM). Additional recommendations and considerations when developing a survey (combining multiple CROM) for a research study are discussed in Section 4.

Refine the draft CROM through cognitive testing

General principles

For a new Psychometric CROM, individual cognitive debriefing interviews with individuals who represent the target population of the CROM are highly recommended (41). Participants should include individuals who represent the full range of the target population, being sure to include socio-demographically diverse participants and those who have limited health literacy, as appropriate. Through cognitive testing, the researcher can determine whether all components of the CROM are understood and interpreted as intended, whether items are perceived as applicable and relevant, whether relevant content may be missing, etc. In brief, cognitive interviews provide the researcher with the opportunity to refine the CROM (enhance content validity, improve measurement precision, and avoid problems such as misinterpretation or bias).

Measurement challenges in TNP research addressed with cognitive testing

There are specific measurement challenges often faced in TNP research that researchers can attempt to mitigate through cognitive testing. For example, cognitive testing or other qualitative research strategies can help verify that appropriate language is being used within the CROM with respect to products, behaviors, and product descriptions (which may be especially important in instances where the CROM references new or unfamiliar products). Another challenge is the potential for social desirability to bias responses if participants perceive that a particular response is socially acceptable, or conversely, that a particular response is socially disapproved (e.g., participants may feel that it is not acceptable to express no intention to quit smoking). Such challenges can often be identified and addressed through cognitive testing, reducing measurement error and avoiding additive biases.

Methodological considerations for the conduct of cognitive testing

Here we briefly list general best practices for cognitive testing when used to develop and refine CROM for use in TNP research. General guidance related to the conduct of these interviews and analysis of cognitive interviewing data can be found elsewhere (15,16,17). Cognitive testing should be conducted individually with participants representing the target population, and participants should be appropriately diverse with respect to potentially relevant variables (e.g., TNP use, demographics, health literacy). If interviews are conducted virtually, the interviewer should observe the participant completing the CROM via video and screenshare to allow for behavioral observations, such as when the participant hesitates when formulating an answer. Interviewers should be experienced with cognitive interviewing techniques and use a semi-structured guide, which permits deviations as appropriate. Think-aloud protocols asking the participants to verbalize everything that comes to their mind when completing a proposed CROM may be useful in early rounds of cognitive testing. However, in later rounds they can undermine realism by prompting participants to think more deeply than they naturally would, potentially leading to responses that do not reflect typical behavior. Although there are several different approaches to probing, it is generally recommended that, at least during final waves of testing, retrospective probing be used for purposes of enhancing realism and generalizability of findings. The administration of the CROM during cognitive testing should mimic the intended mode and method in which the CROM will be administered in the study. Testing should continue until saturation has been reached, which is defined as the point at which additional interviews seem unlikely to yield new or useful information (9). This is typically determined by tracking feedback from several participants at a time (e.g., 6–8) using a saturation tracking table.
While it is not uncommon to reach saturation after approximately 25–30 participants, the number of interviews needed depends on various factors, such as the heterogeneity of the end-users, and cannot be determined in advance. Modifications to the CROM should be documented and tested in subsequent interview waves.
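The wave-based saturation tracking described above can be sketched in a few lines of code. This is a minimal illustration with invented issue codes and wave data (the code names and wave contents are hypothetical, not drawn from any actual study); a real tracking table would also record which participant raised each issue and how it was resolved.

```python
# Hypothetical saturation tracking across cognitive-interview waves:
# each wave holds the distinct issue/concept codes raised by its 6-8 participants.
waves = [
    {"unclear_recall_period", "item3_wording", "missing_symptom"},  # wave 1
    {"item3_wording", "response_option_overlap"},                   # wave 2
    {"response_option_overlap"},                                    # wave 3
]

seen = set()
for i, wave in enumerate(waves, start=1):
    new_codes = wave - seen          # issues not raised in any earlier wave
    seen |= wave
    print(f"Wave {i}: {len(new_codes)} new code(s): {sorted(new_codes)}")
    if not new_codes:                # a wave with nothing new suggests saturation
        print(f"Saturation reached at wave {i}.")
        break
```

In this invented example, wave 3 raises no new issues, so testing would stop after the third wave; with more heterogeneous end-users, additional waves would be needed.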

Quantitative methods to evaluate key psychometric properties of the CROM

After cognitive testing, it is typically recommended that the researcher conducts a quantitative study to assess relevant psychometric properties of the new CROM. The psychometric properties of greatest importance will dictate the design and analysis plan of the quantitative study. For example, if test-retest reliability, ability to detect change over time, or predictive validity are of great importance, then the quantitative study may be a prospective longitudinal study where the CROM is administered multiple times. The sample for a quantitative psychometric evaluation typically includes individuals representing the target population in which the CROM will be administered. That said, other groups may be included for purposes of establishing further psychometric properties (e.g., known-groups validity, which is a CROM’s ability to distinguish among groups known to be distinct (42)). In contrast to other quantitative studies (e.g., TNP use prevalence studies) where the purpose is to generate a population-level estimate, the researcher typically need not aim to have a sample that is representative of the population as a whole, but instead may target specific populations for purposes of facilitating the psychometric evaluation (e.g., imposing a soft target for the minimum number of female persons who complete the study to permit evaluation of item functioning across gender).

Detailed recommendations regarding the conduct of an appropriate psychometric evaluation are beyond the scope of these guidelines. It is always recommended that researchers work with appropriately qualified experts in designing and executing quantitative psychometric evaluations. Some examples of psychometric frameworks for CROM evaluation include traditional Classical Test Theory (43) and more recent approaches, such as Item Response Theory (44) or the Rasch Measurement Model (45, 46). Interested researchers may consult other publications and documents on the topic (e.g., 9, 47, 48, 49). Recent publications presenting the development and validation of CROM in the TNP space may also be helpful reference documents (25, 36, 50, 51, 52, 53). Whenever feasible, we recommend validating a (modified or newly developed) CROM before implementing it in a study. The CROM validation process can necessitate changes to the CROM, provide direction for scoring, and ensure that the CROM in its final form functions properly, meets the requirements of the study, and is therefore suitable for its intended purpose. That said, there may be instances where it is not possible to conduct a validation study before the CROM is used in a research study. In such cases, data collected in the study may be used to assess the CROM’s psychometric properties; however, there are substantial limitations with such an approach. Validation studies often benefit from additional data collection that may not be feasible as part of such a study. For example, in the validation study, the researcher may

  1. administer other CROM alongside the new/modified CROM for purposes of assessing convergent or discriminant validity,

  2. administer the new/modified CROM at various assessment timepoints to assess sensitivity or predictive validity, and/or

  3. administer the new/modified CROM to additional populations not included in the validation study for purposes of establishing known-groups validity.
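As a concrete illustration of the Classical Test Theory framework mentioned above, internal-consistency reliability is often summarized with Cronbach's alpha. The sketch below computes it for a small, entirely hypothetical 4-item CROM (the response matrix is invented for illustration); real evaluations would use far larger samples and additional statistics.

```python
from statistics import pvariance

# Hypothetical response data: rows = respondents, columns = items (1-5 ratings).
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
]

k = len(responses[0])                                # number of items
item_vars = [pvariance(col) for col in zip(*responses)]
total_var = pvariance([sum(row) for row in responses])

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")
```

High alpha values in such a toy example reflect the strongly correlated invented items; in practice, interpretation must also account for sample composition and item redundancy.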

More critically, results from the validation analyses may reveal that the CROM is not functioning as expected or requires modification, issues that can no longer be addressed at that point. If a combined study is the only option, it should include all necessary variables for CROM validation and consider sampling requirements to meet both study objectives. Further, over-recruiting for the study can allow a random subset of data to be set aside for CROM validation.
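The over-recruitment strategy described above can be implemented as a simple reproducible random split. The participant identifiers, sample sizes, and seed below are all assumed for illustration; actual holdout sizes should be driven by the planned psychometric analyses.

```python
import random

# Assumed sample: 600 completers from an over-recruited study.
participant_ids = [f"P{i:04d}" for i in range(1, 601)]

rng = random.Random(2025)          # fixed seed so the split is reproducible/auditable
shuffled = participant_ids[:]
rng.shuffle(shuffled)

n_validation = 200                 # assumed size needed for the CROM validation analyses
validation_set = set(shuffled[:n_validation])
analysis_set = set(shuffled[n_validation:])

# The two subsets must not overlap, so validation results are independent
# of the main study analysis.
assert validation_set.isdisjoint(analysis_set)
print(f"{len(validation_set)} held out for validation, {len(analysis_set)} for the main analysis")
```

Documenting the seed and split procedure in the statistical analysis plan keeps the holdout auditable and prevents post hoc reassignment.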

To facilitate appropriate implementation of the CROM in future studies, development of a User’s Guide is strongly recommended. It should include the final CROM with administration instructions (e.g., recommended mode and method of administration, notes for electronic administration (e.g., item presentation, “back” button, skip logic, forced responses)), as well as recommendations related to the handling of missing data, scoring, and interpretation. All information in the User’s Guide should be science-based and derived from the CROM’s development and validation studies. Finally, the User’s Guide should include any relevant copyright or licensing information, available languages, and the process for new translations.

Section 4: Application and interpretation of a Psychometric CROM

This final section of the guidelines contains recommendations pertaining to the application of CROM in research, including cautions and common pitfalls that could jeopardize the validity of the data collected. At this stage, it is assumed that the researcher completed the exercise described in Section 1 of these guidelines, and that psychometric functioning of Psychometric CROM has been evaluated in accordance with earlier sections of these guidelines.

Implementation of a CROM

When it comes to implementing a CROM, established recommendations, such as those outlined in a User’s Guide, should be carefully followed to preserve integrity of the CROM data. If the researcher intends to deviate from such recommendations, recommendations outlined in Section 2 (Modifying an existing Psychometric CROM) should be followed.

Empirical evidence should support the intended mode and method of CROM administration. If the researcher intends to combine data from different modes/methods in a single study (e.g., some participants complete the CROM electronically and others by paper-and-pencil), empirical evidence of comparability across modes/methods is strongly recommended.

If a CROM is interviewer-administered, either face-to-face or over the telephone, consistency between interviewers and adherence to interviewer-guidelines (e.g., instruction to read all questions in full, how to probe, etc.) is crucial. In cases where the interviewer is making a judgment that shapes the data, such as coding a response to a category, the consistency of coding between interviewers becomes an important part of the psychometric performance of the CROM and must be empirically established.

Related to the mode of administration, but typically not covered in CROM guides, is the context of administration, i.e., the setting in which data collection takes place, which may affect validity. For example, participants may feel uncomfortable when responding to a CROM in front of study staff or in the presence of family members, which could impact responses to the CROM. Contexts that do not provide privacy or anonymity may be particularly prone to social desirability bias.

Psychometric CROM are often implemented in multi-CROM surveys. Such surveys must be carefully constructed, as factors such as the order in which the CROM are administered (potentially resulting in order effects) and survey length can impact the data. First, the order of CROM administration within a survey must be carefully considered. While in rare instances the researcher may randomize CROM presentation within an electronic survey to try to reduce the potential for order effects, there are numerous reasons why a researcher should instead choose to strategically fix the order of CROM administration within a survey. For example, CROM within the same survey may differ in important design aspects that could cause confusion or, if not noticed by participants, could lead to systematic measurement error. This may be particularly problematic if response option polarity “flips” between CROM, if CROM have different recall periods, or if CROM instructions ask participants to respond in different ways (e.g., ask participants to respond based only on the information presented vs. based on their opinion; give participants permission to guess [“take your best guess”] vs. offer a “don’t know” response, etc.). The researcher might also choose to strategically place CROM in a certain order because exposure to the content of one CROM would likely affect responses to CROM administered later, e.g., if risk perception CROM are administered before behavioral intention CROM. If the order were randomized, such factors could increase measurement error within a survey. Furthermore, the measurements of different participants who complete the sequence of CROM in different orders would no longer be easily comparable.

Second, long surveys can lead to excessive response burden and fatigue, which can compromise the quality of the data collected. Researchers should therefore try to limit survey length to the extent possible. As a rule of thumb, an average duration of no more than about 15 minutes is generally recommended. In longer surveys, if feasible, it is generally advisable to place the CROM most relevant to the study objectives at the beginning when participants may be most alert and able to give their most thoughtful and engaged responses. While incentives may help improve participant retention, they do not protect against rapid, careless responding (“speeding”). Time stamps marking the start and end times of electronic surveys can be used to identify, and potentially eliminate, “speeders”. In addition, validity checks (e.g., checks for straightlining, manipulation checks, attention checks such as instructed response items, consistency checks) can be used to identify problematic responders.
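The time-stamp and validity checks described above can be sketched as a small data-quality screen. The records, the minimum-duration threshold, and the flagging rules below are illustrative assumptions; in practice, thresholds should be justified (e.g., from pilot timing data) and pre-specified in the protocol.

```python
# Hypothetical survey records: completion time in seconds plus responses
# to a five-item CROM (values invented for illustration).
surveys = [
    {"id": "P001", "seconds": 840, "crom_items": [4, 3, 4, 2, 3]},
    {"id": "P002", "seconds": 95,  "crom_items": [3, 3, 3, 3, 3]},
    {"id": "P003", "seconds": 610, "crom_items": [5, 5, 5, 5, 5]},
]

MIN_SECONDS = 180  # assumed threshold for plausible completion time

flags = {}
for s in surveys:
    reasons = []
    if s["seconds"] < MIN_SECONDS:
        reasons.append("speeding")               # implausibly fast completion
    if len(set(s["crom_items"])) == 1:
        reasons.append("straightlining")         # identical response to every item
    if reasons:
        flags[s["id"]] = reasons

print(flags)
```

Flagged records are candidates for review, not automatic exclusion; straightlining on a short, well-targeted scale can also be a legitimate response pattern.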

If inattentive response behaviors from respondents are observed in a study, it raises the question of whether to exclude such data. The potential advantage of cleaner data could be offset by reduced statistical power due to the decreased sample size. Data cleaning may also result in differential removal of participants representing certain sociodemographic backgrounds or other potentially relevant subgroups (e.g., those with limited health literacy), which could introduce bias. As data deletion can be controversial, the study protocol or the statistical analysis plan should prescribe any data correction or removal procedure in advance. A sensitivity analysis can also be conducted comparing the results based on the complete and the reduced data.
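A sensitivity analysis of the kind recommended above can be as simple as computing the same summary statistic on the complete and reduced samples and comparing the results. The scores and flagged identifiers below are invented placeholders; a real analysis would compare full distributions and model estimates, not just means.

```python
from statistics import mean

# Hypothetical CROM total scores and the set of participants flagged by
# pre-specified data-quality rules (both invented for illustration).
scores  = {"P001": 14.0, "P002": 9.0, "P003": 25.0, "P004": 16.5, "P005": 12.5}
flagged = {"P002", "P003"}

complete_mean = mean(scores.values())
reduced_mean  = mean(v for p, v in scores.items() if p not in flagged)

# If the two estimates diverge meaningfully, the exclusion decision matters
# and both results should be reported.
print(f"complete: {complete_mean:.2f}, reduced: {reduced_mean:.2f}")
```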

Scoring and interpretation of the CROM

Ultimately, the purpose of administering a Psychometric CROM is to estimate a latent variable based on observed responses. Researchers should rely on the CROM User’s Guide (if one exists) or other guidance by the CROM developers to determine CROM scoring, including handling of missing data, transformation of scores (if applicable), etc. If a CROM features multiple domains, it should be clarified whether individual domain scores are to be formed and/or whether a sum score across all domains is permissible (i.e., a total or composite score). Similarly, with respect to score interpretation, researchers should also rely on the User’s Guide or guidance provided by the CROM developers or other publications.
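As one illustration of a documented scoring rule, the sketch below applies a common "half-scale" missing-data convention: a domain is scored only if at least half of its items were answered, with the score prorated to the full domain length. The domain names, item data, and the half-scale rule itself are assumptions for illustration; the actual rules must come from the CROM's own development evidence and User's Guide.

```python
# Hypothetical two-domain Psychometric CROM; None marks a skipped item.
domains = {
    "craving":    [4, 3, None, 4],
    "withdrawal": [2, None, None, None],
}

def domain_score(items):
    """Prorated domain score under an assumed half-scale missing-data rule."""
    answered = [x for x in items if x is not None]
    if len(answered) < len(items) / 2:
        return None                                  # too much missing data to score
    # Mean of answered items, rescaled to the full domain length.
    return sum(answered) / len(answered) * len(items)

scores = {d: domain_score(items) for d, items in domains.items()}
print(scores)
```

Here the "withdrawal" domain would be left unscored rather than imputed, which is typically preferable to silently treating skipped items as zeros.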

Documentation

The way in which Psychometric CROM were implemented in a study should be documented in the study report and any related publications. For example, if multiple CROM were implemented in a survey, the order of CROM administration within the survey should be clarified. It should also be stated whether the participants were allowed to skip responses or whether all responses were mandatory. If CROM were adapted/modified for purposes of the study, such modifications and their justifications should be clearly documented. It can also be helpful to include images of the programmed survey as an appendix to a study report, so that readers can see how CROM were administered. Mode and method of CROM administration should be noted, as well as any potentially relevant contextual factors. Finally, deviations to the planned application of the CROM should be reported, and potential consequences or limitations related to such deviations should be described.

DISCUSSION

Consistent with other areas of science, the social and behavioral sciences require objective measurement that is precise, replicable, comparable, and measures what it is intended to measure. Research in the field of tobacco and nicotine should be no exception. When measuring latent behavioral constructs relevant to TNPs, such as psychological dependence or risk perceptions, it is critical that the researcher has reliable and valid measurement tools, i.e., Psychometric CROM. These guidelines equip researchers with recommendations and best practices for the use of Psychometric CROM in the context of TNP research (54). Separate guidelines related to the use of Descriptive CROM in the context of TNP research are reported elsewhere (55).

The validity of conclusions drawn from research studies involving the measurement of latent constructs depends on the validity of the Psychometric CROM used to measure those constructs. Thus, CROM selection, modification, or new development needs to be an informed decision. The guidelines walk the reader through construct definition and development of a conceptual model, which together imply the required content of the Psychometric CROM. The context of measurement determines the qualities and properties of the CROM needed to ensure valid measurement. An existing CROM should only be used if it meets these requirements. Otherwise, modifications are needed, which can be Minor or Substantial based on their type and extent. The guidelines assist the researcher in determining which modifications may be qualified as Minor and do not require additional evidence, and which should be considered Substantial, necessitating qualitative and/or quantitative evidence to support the CROM’s validity. When in doubt, gathering further evidence is always preferable as it will help mitigate limitations and increase the credibility and acceptability of the study data.

If no existing CROM is suitable and a modification of an existing CROM is unlikely to meet the needs of the study, a new CROM must be developed. These guidelines provide general recommendations for the development and validation of a new Psychometric CROM. However, researchers should be aware that the development of a new CROM requires a comprehensive research project that may involve several studies requiring specialized psychometric expertise. Regardless of whether the researcher is using an existing, modified, or new CROM, the implementation of a CROM and the derivation of measurements in a study should always adhere to available specific CROM guidelines and recommendations, or other available evidence. As Psychometric CROM are frequently administered as part of a survey, their implementation also needs to consider survey and study design issues, particularly factors such as survey length and order of CROM administration within a survey. Decisions about which CROM should be prioritized (for example, by administering it earlier, when participant attention is generally higher) depend on the purpose of the study and underscore the interplay among study requirements, CROM selection, and CROM implementation.

In all, these guidelines offer non-binding recommendations grounded in scientific rationale and measurement science regarding the use of Psychometric CROM in TNP research. They do not authoritatively represent the views of regulatory bodies or any guidance that such bodies may publish. These guidelines are not intended to reflect unattainable standards; the researcher is ultimately responsible for defending their research and may choose to implement or not implement the recommendations in these guidelines. Furthermore, these guidelines do not endorse or recommend specific Psychometric CROM to assess different constructs; the optimal CROM for a particular study depends on the context of the study and application of the CROM.

CONCLUSIONS

These guidelines aim to support researchers by providing science-based recommendations for the use of Psychometric CROM in TNP research. While these guidelines reflect the consensus-based approach of the CORESTA CROM working group’s current thinking regarding best practices related to Psychometric CROM, best practices may also evolve over time with advances in research on TNPs, psychometrics, and measurement science.

Language: English
Page range: 148 - 163
Submitted on: Mar 14, 2025
|
Accepted on: Jul 17, 2025
|
Published on: Oct 4, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Stacey McCaffrey, Esther F. Afolalu, Thomas Salzberger, Christelle Chrea, Saul Shiffman, published by Institut für Tabakforschung GmbH
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.