Emotion regulation difficulties are considered a transdiagnostic factor across psychopathologies (1). The Difficulties in Emotion Regulation Scale (DERS) is a widely used self-report measure in both clinical and non-clinical adolescent populations (2). Since its original validation, the DERS-36 has been cited more than 13,000 times, translated into multiple languages, and adapted into several short versions (e.g. DERS-8 (3), DERS-16 (4), DERS-18 (5), DERS-32 (6)). The DERS-36 is founded on a conceptual model of emotion regulation involving six overarching processes: (1) non-acceptance of negative emotional responses, (2) difficulties engaging in goal-directed behavior when experiencing negative emotions, (3) difficulties controlling impulses when experiencing negative emotions, (4) lack of awareness of emotional responses, (5) limited access to emotion regulation strategies perceived as effective, and (6) lack of clarity of emotional responses. According to Gratz and Roemer (2), the relative absence of any or all of these abilities indicates difficulties with emotion regulation.
Although extensive clinical research has demonstrated an association between higher scores on the DERS-36 and mental health issues in adolescents (7,8,9), no normative threshold for emotion regulation has been established for the DERS-36 in this population. The absence of reference values in healthy youth populations limits our ability to interpret DERS-36 scores in clinical studies, regardless of the population investigated. This gap highlights the need for establishing normative reference values for the DERS-36. To address this, we conducted a literature review aimed at estimating the 90% reference interval for the DERS-36 total score (DERS-T) in youths, drawing on all available studies in PubMed (MEDLINE) that include either community-based populations or healthy volunteers.
We conducted a systematic search of PubMed (MEDLINE) on 12 March 2024. The full search strategy is provided in Supplementary material 1 (p. 1). Titles and abstracts of studies retrieved from the search were screened by CT to identify studies potentially meeting the following inclusion criteria: (1) empirical studies in English; (2) inclusion of youths aged 11 to 19 years, either as healthy volunteers in clinical studies or as participants in community-based studies; (3) use of the DERS-36 as the outcome measure; and (4) reporting of either the total score or scores from all six subscales using summed, mean or median scores.
Studies were excluded based on the following criteria: (1) non-English language; (2) inclusion of participants outside the specified age range; (3) use of short forms of the DERS; (4) reliance on parent-reported outcomes; and (5) insufficient reporting, such as the omission of standard deviations or other essential statistical data. To standardize across studies, mean scores were converted to summed scores, and subscale scores were aggregated to form a total score.
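The conversion to a common summed-score metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' extraction procedure: the function names and example values are hypothetical, and it assumes that a reported mean score is each participant's summed score divided by the 36 items, so that both the mean and its standard deviation rescale by 36. Only the total-score mean is derived from subscales here, because combining subscale standard deviations would also require the correlations between subscales, which the text does not describe.

```python
N_ITEMS = 36  # the DERS-36 has 36 items, giving a summed range of 36 to 180


def item_mean_to_sum(mean_score, sd_score, n_items=N_ITEMS):
    """Rescale a mean-item score and its SD to the summed-score metric.

    Assumes the reported mean score is (summed score / n_items) per
    participant, so mean and SD both scale linearly by n_items.
    """
    return mean_score * n_items, sd_score * n_items


def total_mean_from_subscales(subscale_means):
    """Mean of the total score as the sum of the six subscale means."""
    if len(subscale_means) != 6:
        raise ValueError("the DERS-36 has six subscales")
    return sum(subscale_means)


# Hypothetical values for illustration only (not taken from any included study).
total_mean, total_sd = item_mean_to_sum(2.5, 0.5)
print(total_mean, total_sd)
print(total_mean_from_subscales([14.0, 13.5, 12.0, 15.0, 18.5, 11.0]))
```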
The full text of each study marked as eligible was then reviewed to confirm eligibility. For studies reporting the DERS-36 at multiple time points, only baseline scores were extracted. A flow diagram detailing the study selection process and reasons for exclusion is provided in Supplementary material 1 (p. 3).
The 90% reference interval was calculated as the sample mean ± 1.645 × the sample standard deviation, where 1.645 is the z-score that leaves 5% of a normal distribution in each tail. Given that higher scores on the DERS-36 reflect greater emotion regulation difficulties, the upper reference limit is interpreted as the threshold for identifying emotional dysregulation. Thus, results falling above the reference interval (the top 5%) are flagged as “abnormal”. Since the DERS-36 does not have a designated lower threshold, all results below the upper reference limit (the remaining 95%) are considered to reflect normative levels of emotion regulation.
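As a worked illustration, the interval can be computed from a sample mean and standard deviation as mean ± 1.645 × SD. The sketch below is illustrative rather than the authors' code; the function names are hypothetical, and the input values are the healthy-volunteer weighted mean and SD from Table 1.

```python
Z_90 = 1.645  # z-score leaving 5% in each tail of a normal distribution


def reference_interval(mean, sd, z=Z_90):
    """Return (lower, upper) limits of the interval mean +/- z * sd."""
    half_width = z * sd
    return mean - half_width, mean + half_width


def exceeds_upper_limit(score, upper):
    """True if a DERS-T score lies above the upper reference limit."""
    return score > upper


# Healthy-volunteer summary values reported in Table 1.
lower, upper = reference_interval(76.5, 20.0)
print(round(lower, 1), round(upper, 1))  # 43.6 109.4
print(exceeds_upper_limit(115, upper))   # True
```

The calculation assumes approximately normally distributed scores. The limits obtained here match the healthy-volunteer interval (43.6–109.4) given in the Figure 1 legend.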
A total of 1,454 articles were screened, of which 33 studies met the inclusion criteria. In addition, we included DERS-T data for healthy volunteers from a study by the first author, currently under review for publication. Thus, 34 studies were ultimately included in the calculation of the reference interval for normative emotion regulation. Among these, 20 studies involved community-based populations (n = 6,960), while the remaining 14 studies included healthy volunteers (n = 766), yielding a total sample size of 7,726 participants. Notably, two of the studies classified as community-based contributed a total of five different groups (10,11). A summary of the study characteristics is provided in Table 1.
Table 1. Study characteristics
| | Community-based | Healthy controls |
|---|---|---|
| Number of studies – n | 20 | 14 |
| Total number of participants – n | 6,960 | 766 |
| Age – mean (SD)¹ | 15.25 (1.7) | 14.77 (1.8) |
| Sex – n (%) | | |
| Female | 4,221 (60.6) | 381 (49.7) |
| Male | 2,732 (39.3) | 385 (50.3) |
| Other | 7 (0.1) | 0 (0) |
| Main nationality – n | Australia (1) | Denmark (1) |
| | China (2) | Norway (1) |
| | Hungary (1) | Turkey (12) |
| | India (1) | |
| | Iran (1) | |
| | Italy (4) | |
| | Portugal (1) | |
| | Spain (1) | |
| | Sweden (1) | |
| | Turkey (1) | |
| | UK (2) | |
| | USA (4) | |
| DERS-T score – weighted mean (SD) | 84.4 (23.4) | 76.5 (20.0) |

¹ Mean age in community-based populations was available for 22 groups. An age range of 13–17 years was reported by Weinberg et al. (2009) and of 12–17 years by Kring et al. (2023).
The overall mean DERS-T score was 83.6 (SD = 23.2). The reference interval for all included participants ranged from 45.5 to 121.8. Because the DERS-T does not have a designated lower threshold, the upper limit of 121.8 serves as the benchmark for normative emotion regulation. The reference interval is illustrated in Figure 1. Further details on the reference interval can be found in Supplementary material 2.

Figure 1. Reference interval for DERS-T in youths from community-based populations and healthy volunteers.
The range for the DERS-T is 36 to 180. The reference interval for the community-based population stems from studies using background population data, including youths experiencing heightened levels of psychopathology. The reference interval for healthy volunteers stems from studies including youths described as symptom-free and psychiatrically healthy. The black reference intervals represent the dispersion of DERS-T scores from individual studies. The blue reference intervals represent the combined dispersion for each of three groups: community-based population (46.0–122.9), healthy volunteers (43.6–109.4), and all participants (45.5–121.8).
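The combined values can be approximated from the group-level summaries in Table 1. The sketch below merges groups using a sample-size weighted mean and a combined variance that includes between-group differences in means. That pooling formula is an assumption, since the text does not state how the combined SD was obtained, and the function name is hypothetical. With the rounded Table 1 inputs it reproduces the reported overall mean (83.6) and SD (23.2) to one decimal.

```python
from math import sqrt


def combine_groups(groups):
    """Merge (n, mean, sd) summaries into one (n, mean, sd) summary.

    The combined variance adds the within-group sums of squares and the
    between-group sums of squares around the weighted grand mean.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    sum_sq = sum((n - 1) * sd**2 + n * (m - grand_mean) ** 2
                 for n, m, sd in groups)
    return total_n, grand_mean, sqrt(sum_sq / (total_n - 1))


# Rounded group summaries from Table 1: (n, weighted mean, SD).
community = (6960, 84.4, 23.4)
healthy = (766, 76.5, 20.0)

n, mean, sd = combine_groups([community, healthy])
print(n, round(mean, 1), round(sd, 1))  # 7726 83.6 23.2
```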
The present analysis identified a threshold of 121.8 for normative emotion regulation in youths, derived from the DERS-36. This threshold provides a point of reference for clinicians and researchers to evaluate whether an individual's DERS-36 total score aligns with expected normative values or deviates significantly, potentially indicating emotional dysregulation. The established threshold is notably higher than the mean DERS-T scores observed in various diagnostic categories, including suicidality and self-harm, prior to treatment initiation (14,15,16,17). However, exceptions exist, particularly in studies examining self-harm where mean DERS-T scores exceed this threshold (18,19,20).
The substantial variability in reference intervals across studies may reflect a true continuum of normative emotion regulation, suggesting that emotion regulation in youths exists along a spectrum (21). Alternatively, these differences could be attributed to methodological discrepancies in the use and reporting of the DERS-36, which complicate the generalizability of results across studies and necessitate careful interpretation. Below, we highlight five major themes related to the methodological variation in DERS-36 usage that could impact research findings and clinical interpretation.
Despite recommendations to remove the emotional awareness subscale due to weak psychometric properties (8,22,23,24), our study used the DERS-36, incorporating all six subscales. While most research continues to include all six subscales, some studies adhere to the recommendations by modelling each of the subscales individually (25,26,27). This discrepancy between psychometric recommendations and current practice may raise concerns about the validity and reliability of DERS-36 scores when comparing studies that use different versions of the scale.
Consistent reporting is essential for the comparability and synthesis of results across studies. While studies in our review followed the guidelines from Gratz and Roemer (2) by providing standard deviations for the summed scale items, other reporting formats, such as reporting mean scores to reflect the original Likert scale (28,29) or using median scores to reflect non-normal distributions (30), are also prevalent. The variation in reporting formats hampers the interpretation and replication of results. To enhance standardization, future research should consider following the reporting practice proposed by Gratz and Roemer.
Missing data and outliers can have a significant impact on results by distorting statistical analyses, thereby leading to inaccurate conclusions (31,32). Of the 34 studies included in our reference interval calculation, only six explicitly addressed missing data, using either series mean imputation (33,34,35) or multilevel modeling (36,37,38). Only one study explicitly accounted for outliers, albeit in generic terms, stating that it used “a Boxplot method to deal with outliers” (35). The lack of consistent handling of missing data and outliers raises concerns about the robustness and validity of the reported findings.
Observed variability in the reference interval may be influenced by sociodemographic factors, including age, sex, and nationality. Preliminary work suggests that the psychometric properties of the DERS-36 are consistent across sociodemographic groups in young adults (39). Future studies should explore whether this consistency holds for adolescents, as differences in sociodemographic characteristics could affect the generalizability of the reference interval for normative emotion regulation in youths.
The primary goal of establishing a reference interval was to distinguish between normal and abnormal emotion regulation. Following the work of Schulte-van-Maaren et al. (40) on reference intervals for anxiety measures, we recognize the need to include participants with varied levels of psychopathology to improve the representativeness of the reference intervals. Although not necessarily symptom-free, the participants should, however, be psychiatrically healthy, which we cannot ensure in the studies included as community-based populations. Thus, the essential question of how to define normality remains.
Given the five issues outlined above, along with the significant variability in DERS-T scores across studies, we caution against interpreting the reference interval from the DERS-36 as definitively representing normative emotion regulation in youth. Further validation is necessary to establish the diagnostic accuracy and research validity of this reference interval. Future studies should re-evaluate the threshold through a more comprehensive systematic review, incorporating additional databases and multiple assessors, and ideally conduct a meta-analysis to determine whether DERS scores reliably differentiate between normative emotion regulation and clinically significant dysregulation.
In conclusion, the 90% reference interval for the DERS-36 total score in youths, derived from community-based and healthy volunteer-based studies included in this literature review, is not sufficiently reliable for guiding clinical and scientific interpretation. The considerable heterogeneity in the application and reporting of the DERS-36 across studies limits the generalizability of the findings and underscores the need for caution when interpreting results. To validate the proposed reference interval, future research should conduct a meta-analysis, adhering to established methodological guidelines, to ensure greater consistency and reliability in the interpretation of DERS-36 scores.