Beyond surface correlations: Reference behavior mediates the disruptiveness-citation relationship

Open Access | May 2025

Figures & Tables

Figure 1.

Quantifying the 5-year CD index.
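
As a rough illustration of what the figure quantifies, the sketch below computes a CD index in its commonly used form, (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing only the focal paper, n_j counts those citing both the focal paper and its references, and n_k counts those citing only its references. The function name, the dictionary inputs, and the inclusive 5-year window are assumptions made for this example, not details taken from the article.

```python
def cd_index(focal_year, focal_citers, reference_citers, window=5):
    """Return the CD index of a focal paper, or None if nothing cites it or its references.

    focal_citers: dict mapping paper id -> publication year for papers citing the focal paper.
    reference_citers: dict mapping paper id -> publication year for papers citing
        at least one of the focal paper's references.
    window: number of years after publication to consider (assumed inclusive).
    """
    cutoff = focal_year + window
    focal = {pid for pid, year in focal_citers.items() if year <= cutoff}
    refs = {pid for pid, year in reference_citers.items() if year <= cutoff}

    n_i = len(focal - refs)   # cite the focal paper only
    n_j = len(focal & refs)   # cite the focal paper and its references
    n_k = len(refs - focal)   # cite the references only

    total = n_i + n_j + n_k
    if total == 0:
        return None
    return (n_i - n_j) / total


# Hypothetical toy data: paper "c" falls outside the 5-year window.
focal_citers = {"a": 2001, "b": 2002, "c": 2010, "e": 2004}
reference_citers = {"b": 2002, "d": 2003}
score = cd_index(2000, focal_citers, reference_citers)
```

With the toy data, n_i = 2, n_j = 1, and n_k = 1, so the score is 0.25. Values near 1 indicate that later work cites the focal paper without its references, and values near -1 indicate the opposite.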

Figure 2.

Basic yearly split-sample Poisson regression coefficients of the CD index on citation counts from 1950 to 2016. Error bars depict the upper and lower bounds of the 95% confidence intervals based on robust standard errors. Dark-red coloring indicates significant positive coefficients (p < 0.05), and light-red coloring indicates non-significant positive coefficients. Dark-green coloring indicates significant negative coefficients (p < 0.05), and light-green coloring indicates non-significant negative coefficients. * p < 0.05, ** p < 0.01, *** p < 0.001.
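
The yearly split-sample design in this caption can be sketched as one Poisson regression per publication year, each yielding a coefficient and a robust standard error. The example below is a minimal pure-Python version with a single predictor, fitted by Newton-Raphson. The HC0 sandwich estimator, the function name, and the toy data are all assumptions for illustration; the article does not state which robust variant or software was used.

```python
import math


def fit_poisson(x, y, max_iter=100, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, robust_se_b1). The standard error uses the HC0
    sandwich estimator, which is an assumption of this sketch.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), symmetric 2x2.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    # Sandwich variance H^-1 M H^-1, where M is built from squared residuals.
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    h00 = sum(mu)
    h01 = sum(mi * xi for xi, mi in zip(x, mu))
    h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
    det = h00 * h11 - h01 * h01
    inv10, inv11 = -h01 / det, h00 / det  # second row of H^-1
    sq = [(yi - mi) ** 2 for yi, mi in zip(y, mu)]
    m00 = sum(sq)
    m01 = sum(s * xi for s, xi in zip(sq, x))
    m11 = sum(s * xi * xi for s, xi in zip(sq, x))
    a0 = inv10 * m00 + inv11 * m01
    a1 = inv10 * m01 + inv11 * m11
    var_b1 = a0 * inv10 + a1 * inv11
    return b0, b1, math.sqrt(var_b1)


# Hypothetical toy samples: year -> (CD index values, citation counts).
samples = {
    1950: ([-0.5, 0.0, 0.2, 0.8, 1.0], [3, 2, 4, 6, 7]),
    2016: ([-0.6, -0.1, 0.0, 0.3, 0.9], [9, 7, 6, 4, 2]),
}
yearly = {}
for year, (cd, cites) in samples.items():
    _, coef, se = fit_poisson(cd, cites)
    # 95% confidence interval, analogous to the error bars in the figure.
    yearly[year] = (coef, se, coef - 1.96 * se, coef + 1.96 * se)
```

Each entry of `yearly` corresponds to one point and error bar in a plot of this kind. A real analysis at this scale would normally rely on an established statistics package rather than a hand-written solver.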

Figure 3.

Yearly split-sample Poisson regression coefficients of the CD index on citation counts from 1950 to 2016, with (a) fixed effects for 292 fields, (b) reference count fixed effects, (c) team size fixed effects, and (d) all fixed effects. Error bars depict the upper and lower bounds of the 95% confidence intervals based on robust standard errors. Dark-red coloring indicates significant positive coefficients (p < 0.05), and light-red coloring indicates non-significant positive coefficients. Dark-green coloring indicates significant negative coefficients (p < 0.05), and light-green coloring indicates non-significant negative coefficients. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 4.

Split-sample Poisson regression coefficients of the CD index on citation counts based on reference counts. (a) Basic models without any fixed effects. (b) Full models including all fixed effects. Error bars depict the upper and lower bounds of the 95% confidence intervals based on robust standard errors. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 5.

Reference counts as a contributing factor to the bias against the CD index. (a) The average reference count of papers increases over time. (b) The average 5-year citation count of papers increases over time. (c) The average 5-year CD index of papers decreases over time. (d) Papers with higher reference counts are associated with higher 5-year citation counts. (e) Papers with higher reference counts are associated with a higher 5-year CD index. (f) Papers with higher reference counts are associated with higher values of the 5-year CD index parameters (ni, nj, and nk). (g) The number of papers with a given reference count follows a lognormal distribution. (h-i) The complementary cumulative distribution function (ccdf) and probability density function (pdf) of the CD index for papers with varying reference counts. Shaded areas indicate 95% confidence intervals.

Figure 6.

Yearly split-sample Poisson regression coefficients of new words, new word combinations, and their reuse on citation counts from 1950 to 2016. Error bars depict the upper and lower bounds of the 95% confidence intervals based on robust standard errors. Dark-red coloring indicates significant positive coefficients (p < 0.05), and light-red coloring indicates non-significant positive coefficients. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 7.

Yearly split-sample Poisson regression coefficients of atypicality and disruptive citations on citation count from 1950 to 2016. Error bars depict the upper and lower bounds of the 95% confidence intervals based on robust standard errors. Dark-red coloring indicates significant positive coefficients (p < 0.05), and light-red coloring indicates non-significant positive coefficients. * p < 0.05, ** p < 0.01, *** p < 0.001.

The effect of the CD index on citation count with team-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| 5-year CD index | -0.1993*** (0.0133) | -0.1541*** (0.0137) | -0.1639*** (0.0137) | -0.1789*** (0.0135) | -0.2018*** (0.0135) | -0.1618*** (0.0147) | -0.1234*** (0.0151) |
| ln(Team size) | | 0.3405*** (0.0016) | | | | | 0.2533*** (0.0018) |
| ln(Institution count) | | | 0.3960*** (0.0020) | | | | 0.2159*** (0.0023) |
| ln(Country count) | | | | 0.4382*** (0.0029) | | | 0.0358*** (0.0034) |
| ln(Home field count) | | | | | 0.1950*** (0.0018) | | -0.0498*** (0.0018) |
| Gender diversity | | | | | | 0.2868*** (0.0021) | 0.0579*** (0.0021) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 12,262,497 | 12,262,497 |
| Pseudo R² | 0.05826 | 0.08234 | 0.0755 | 0.06733 | 0.06115 | 0.05924 | 0.08252 |
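
Because these are Poisson models with a log link, a coefficient is easier to read once exponentiated into a rate ratio. The helper below is a small sketch of that standard conversion, together with the normal-approximation 95% confidence interval implied by a coefficient and its standard error. The function name and the returned keys are chosen for this example only.

```python
import math


def summarize_coefficient(coef, se, z=1.96):
    """Convert a Poisson coefficient and standard error into readable quantities."""
    lower, upper = coef - z * se, coef + z * se
    rate_ratio = math.exp(coef)
    return {
        "ci_95": (lower, upper),
        "rate_ratio": rate_ratio,
        # Percent change in expected citations per one-unit increase in the predictor.
        "pct_change": (rate_ratio - 1.0) * 100.0,
        # Significant at the 5% level when the interval excludes zero.
        "significant_5pct": lower > 0 or upper < 0,
    }


# Model (1) above: 5-year CD index coefficient -0.1993 with standard error 0.0133.
model_1 = summarize_coefficient(-0.1993, 0.0133)
```

For model (1), the rate ratio is about 0.82, so a one-unit increase in the CD index is associated with roughly 18% fewer expected 5-year citations, holding the field and year fixed effects constant. For log-transformed predictors such as ln(Team size), the coefficient can instead be read approximately as an elasticity: a 1% increase in team size corresponds to about a 0.34% increase in expected citations in model (2).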

The effect of the number of new words on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ln(New words count+1) | 0.2191*** (0.0021) | 0.2195*** (0.0020) | 0.2094*** (0.0021) | 0.1970*** (0.0021) | 0.1956*** (0.0021) | 0.1771*** (0.0020) |
| ln(Ref count) | | 0.6233*** (0.0009) | | | | 0.8398*** (0.0011) |
| ln(Ref age+1) | | | -0.4137*** (0.0009) | | | -0.6649*** (0.0015) |
| ln(Avg ref cit+1) | | | | 0.1836*** (0.0004) | | 0.6474*** (0.0017) |
| ln(Max ref cit+1) | | | | | 0.1439*** (0.0002) | -0.3873*** (0.0011) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 25,263,987 | 25,263,987 | 25,174,032 | 25,262,880 | 25,262,880 | 25,174,032 |
| Pseudo R² | 0.06392 | 0.17867 | 0.08835 | 0.09575 | 0.10168 | 0.26690 |

The effect of the number of new word combinations on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ln(New words count+1) | 0.0978*** (0.0004) | 0.0665*** (0.0004) | 0.1002*** (0.0004) | 0.0894*** (0.0004) | 0.0841*** (0.0004) | 0.0598*** (0.0004) |
| ln(Ref count) | | 0.6144*** (0.0010) | | | | 0.8333*** (0.0011) |
| ln(Ref age+1) | | | -0.4242*** (0.0009) | | | -0.6654*** (0.0015) |
| ln(Avg ref cit+1) | | | | 0.1800*** (0.0004) | | 0.6491*** (0.0017) |
| ln(Max ref cit+1) | | | | | 0.1398*** (0.0002) | -0.3900*** (0.0011) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 25,263,987 | 25,263,987 | 25,174,032 | 25,262,880 | 25,262,880 | 25,174,032 |
| Pseudo R² | 0.07284 | 0.18198 | 0.09806 | 0.10322 | 0.10814 | 0.26989 |

The effect of atypicality on citation counts with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Atypicality (percentile) | 0.4457*** (0.0018) | 0.0249*** (0.0019) | 0.5505*** (0.0017) | 0.3110*** (0.0018) | 0.2475*** (0.0019) | 0.0217*** (0.0018) |
| ln(Ref count) | | 0.6557*** (0.0009) | | | | 0.8536*** (0.0011) |
| ln(Ref age+1) | | | -0.5450*** (0.0010) | | | -0.6462*** (0.0014) |
| ln(Avg ref cit+1) | | | | 0.1764*** (0.0004) | | 0.6479*** (0.0017) |
| ln(Max ref cit+1) | | | | | 0.1407*** (0.0002) | -0.3834*** (0.0011) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 27,656,587 | 27,656,587 | 27,584,165 | 27,656,569 | 27,656,569 | 27,584,165 |
| Pseudo R² | 0.06756 | 0.17913 | 0.10371 | 0.09461 | 0.0997 | 0.26379 |

The effect of new word reuse on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ln(New words reuse+1) | 0.1213*** (0.0011) | 0.1180*** (0.0011) | 0.1147*** (0.0011) | 0.1112*** (0.0011) | 0.1113*** (0.0011) | 0.0936*** (0.0011) |
| ln(Ref count) | | 0.6230*** (0.0009) | | | | 0.8390*** (0.0011) |
| ln(Ref age+1) | | | -0.4117*** (0.0009) | | | -0.6622*** (0.0015) |
| ln(Avg ref cit+1) | | | | 0.1827*** (0.0004) | | 0.6453*** (0.0017) |
| ln(Max ref cit+1) | | | | | 0.1434*** (0.0002) | -0.3865*** (0.0011) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 25,263,987 | 25,263,987 | 25,174,032 | 25,262,880 | 25,262,880 | 25,174,032 |
| Pseudo R² | 0.06638 | 0.18096 | 0.09056 | 0.09790 | 0.10385 | 0.26838 |

The effect of the CD index on citation count with author career-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| 5-year CD index | -0.1993*** (0.0133) | -0.1012*** (0.0137) | 0.0429** (0.0146) | 0.0590*** (0.0146) | 0.2263*** (0.0157) | 0.2307*** (0.0157) | 0.3443*** (0.0166) |
| ln(Avg career age+1) | | 0.3368*** (0.0011) | | | | | -0.3891*** (0.0019) |
| ln(Avg career productivity+1) | | | 0.2601*** (0.0007) | | | | -0.2974*** (0.0058) |
| ln(Max career productivity+1) | | | | 0.2445*** (0.0005) | | | -0.0278*** (0.0048) |
| ln(Avg career citations+1) | | | | | 0.2198*** (0.0005) | | 0.4242*** (0.0044) |
| ln(Max career citations+1) | | | | | | 0.2065*** (0.0004) | 0.0573*** (0.0040) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 |
| Pseudo R² | 0.05826 | 0.08442 | 0.11291 | 0.12075 | 0.16595 | 0.1694 | 0.20231 |

The effect of the CD index on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| 5-year CD index | -0.1993*** (0.0133) | 0.6228*** (0.0204) | 0.2267*** (0.0123) | -0.0006 | 0.0310 (0.0170) | 1.920*** (0.0212) |
| ln(Ref count) | | 0.5939*** (0.0013) | | | | 0.8642*** (0.0016) |
| ln(Ref age+1) | | | -0.4257*** (0.0011) | | | -0.7780*** (0.0019) |
| ln(Avg ref cit+1) | | | | 0.1910*** (0.0005) | | 0.6947*** (0.0021) |
| ln(Max ref cit+1) | | | | | 0.1478*** (0.0003) | -0.4027*** (0.0014) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 | 13,180,603 |
| Pseudo R² | 0.05826 | 0.16705 | 0.08522 | 0.09294 | 0.09803 | 0.27983 |

The effect of new word combination reuse on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ln(New words reuse+1) | 0.0970*** (0.0003) | 0.0741*** (0.0003) | 0.0952*** (0.0003) | 0.0888*** (0.0003) | 0.0860*** (0.0003) | 0.0597*** (0.0003) |
| ln(Ref count) | | 0.6078*** (0.0010) | | | | 0.8260*** (0.0011) |
| ln(Ref age+1) | | | -0.4181*** (0.0009) | | | -0.6545*** (0.0014) |
| ln(Avg ref cit+1) | | | | 0.1755*** (0.0004) | | 0.6402*** (0.0017) |
| ln(Max ref cit+1) | | | | | 0.1363*** (0.0002) | -0.3860*** (0.0011) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 25,263,987 | 25,263,987 | 25,174,032 | 25,262,880 | 25,262,880 | 25,174,032 |
| Pseudo R² | 0.08425 | 0.19005 | 0.10848 | 0.11290 | 0.11761 | 0.27463 |

The effect of disruptive citations on citation count with reference-level controls

DV: 5-year citation count (Poisson regression)

| Models | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| ln(Disruptive citations+1) | 0.9791*** (0.0004) | 0.9435*** (0.0004) | 0.9657*** (0.0004) | 0.9682*** (0.0004) | 0.9663*** (0.0004) | 0.8993*** (0.0005) |
| ln(Ref count) | | 0.1827*** (0.0004) | | | | 0.2734*** (0.0006) |
| ln(Ref age+1) | | | -0.2951*** (0.0005) | | | -0.3831*** (0.0007) |
| ln(Avg ref cit+1) | | | | 0.0497*** (0.0002) | | 0.1780*** (0.0007) |
| ln(Max ref cit+1) | | | | | 0.0435*** (0.0001) | -0.0992*** (0.0005) |
| Field FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 29,009,690 | 29,009,690 | 28,888,580 | 29,007,831 | 29,007,831 | 28,888,580 |
| Pseudo R² | 0.70948 | 0.71983 | 0.71953 | 0.71123 | 0.71217 | 0.73775 |

DOI: https://doi.org/10.2478/jdis-2025-0029 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 7 - 31
Submitted on: Feb 9, 2025
Accepted on: Apr 24, 2025
Published on: May 28, 2025
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Alex J. Yang, Fanming Wang, Yujie Shi, Yiqin Zhang, Hao Wang, Sanhong Deng, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution 4.0 License.