On July 29, 2020, the Academic Degrees Committee of the State Council, an advisory council of the Chinese government, announced that the category “interdiscipline” had been added to the list of national disciplines accessible for academic degrees. This initiative not only changes the structure of China's classification of academic degrees; it was also designed to promote the future development of interdisciplinarity in China. As a case in point, three months later, in late October 2020, the National Natural Science Foundation of China (NSFC) announced the launch of a new department for “interdisciplinary studies”. This will be the ninth department of the NSFC, and it will focus on funding interdisciplinary projects. As the first change to the NSFC funding scheme in 11 years, the decision has drawn much attention.
Interdisciplinarity is a hot topic in science and technology policy. However, the concept is both abstract and complex, which makes it difficult to represent or measure interdisciplinarity fully in terms of indicators that can be compared with one another. A variety of diversity measures have been proposed in the literature as proxies for interdisciplinarity. Such indicators have been used to measure the interdisciplinarity of sets of articles, patents, or journals. In this study, we ask: Can one rank institutions in terms of their disciplinary diversity? And, if so, what does this tell us about interdisciplinarity? Note that diversity is not necessarily a goal universities strive for; some aspire to be the best in a particular discipline.
During the last few years, we, the authors of this paper, have explored the scientometric measurement of interdisciplinarity and diversity in scholarly communications in collaboration with a number of colleagues. Contributions to this program of studies were made (in alphabetical order) by Lutz Bornmann, Wolfgang Glänzel, Inga Ivanova, Ronald Rousseau, Caroline S. Wagner, and Ping Zhou (Leydesdorff & Ivanova, 2020; Leydesdorff, Wagner, & Bornmann, 2018 and 2019; Zhang, Rousseau, & Glänzel, 2016; Zhang, Sun, Chinchilla-Rodríguez, Chen, & Huang, 2018; Zhang, Sun, Jiang, & Huang, 2021). One of our objectives has been to develop a non-commercial, public-domain application that allows researchers and policy analysts to measure the diversity of any document set or network structure using a range of indicators. To the best of our knowledge, no such tool has been developed before, at least not for public use.
A large number of indicators of “diversity” have been proposed in the literature, e.g. Rao-Stirling diversity (Stirling, 2007), the Gini coefficient, the Simpson (1949) index, and the Hirschman-Herfindahl index (Herfindahl, 1950; Hirschman, 1945). In this communication, we report on the facilities that we created during the last two years. In particular, we introduce the freely available program interd_vb.exe (available at http://www.leydesdorff.net/software/interdisc.2020/) for this purpose. We document the various options and provide instructions for practitioners interested in measuring diversity and interdisciplinarity. By elaborating on the measurement of the disciplinary diversity of the research portfolios of the 42 top universities listed as the “Double First-Class” universities (Liu et al., 2018), we are able to show the options and choices to be made given the current state of the art.
Technical instructions are additionally available at http://www.leydesdorff.net/software/interdisc.2020/index.htm. The inputs and outputs are in .csv format. The same output is also stored in interdis.dbf. The subsequent analysis demonstrates the options and choices that can be made en route to a final comparison. As a disclaimer, note that we are in no way professional programmers. We cannot guarantee that our routines are error-free, and we acknowledge that the user interface could be improved. However, as a test, one of us programmed the application in two different computer languages, and the results were virtually the same. Additionally, we do believe the functionality is unique and, therefore, state of the art for what it is.
One of the advantages of the application is its ability to handle large volumes of data. For example, the need to analyze an entire database, such as Web-of-Science (WoS), Scopus, or Google Scholar, is becoming increasingly common. Analyses of this magnitude can generate baselines for evaluating the disciplinary diversity of articles, journals, topics, etc. The Interdisc program can relieve the computational overhead of processing massive amounts of data. That said, although the equations used to calculate diversity indicators are often mathematically transparent, specifying the terms as computer code forces analysts to make explicit, precise decisions that can remain implicit in a manual calculation.
Interdisciplinarity can be operationalized as references to different literatures. Such co-citing is known in scientometrics as bibliographic coupling (Kessler, 1963). When a document, for example, cites both articles in physics journals and in sociology journals, this can be expected to indicate interdisciplinarity more than citing chemical physics and solid-state physics in the same document or in the same set. In other words, one couples literature from different disciplines in the references. This coupling can be at the level of articles, journals, or Web-of-Science Subject Categories (WCs).
Bibliographic coupling is an indicator on the citing side and thus the operation opposite to co-citation: co-citations across disciplinary borders indicate interdisciplinary diffusion, whereas the measurement of interdisciplinarity by bibliographic coupling focuses on aggregated citing behaviour.
Whereas “interdisciplinarity” by citing papers refers to documents, documents are often not the units of analysis in the case of research evaluation at the institutional level. The interdisciplinary operator of bibliographic coupling is defined in terms of disciplines and not in terms of institutions. Does the diversity of a university in terms of departments indicate interdisciplinarity or only comprehensiveness of a research portfolio? Since there is no coupling in terms of different fields, one may measure only comprehensiveness, and not interdisciplinarity.
Institutional units are primarily administratively and not disciplinarily organized. The diversity indicators apply to disciplinary differentiations; social differentiation in terms of departments, etc., may have a different meaning. For example, diversity may also indicate comprehensiveness. How does this work out empirically?
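In this section, we discuss the following indicators of diversity and interdisciplinarity in terms of their basic equations: Shannon entropy, Simpson diversity, Rao-Stirling diversity, True RS diversity, DIV and DIV*, the Gini coefficient, and coherence.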
Using Shannon's (1948) information theory, one can measure diversity as the uncertainty in a distribution. The equation of the Shannon entropy can be stated as follows:
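In the usual notation, with $p_i$ denoting the share of category $i$ in the publications of the unit under study and $n$ the number of categories (symbols assumed here for illustration), the entropy reads:

$$ H = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1} $$

Categories with $p_i = 0$ contribute nothing to the sum. The base of the logarithm only sets the unit; base 2 (bits) is assumed here because several Shannon scores in Table 2 exceed $\ln 254 \approx 5.54$, the maximum attainable with natural logarithms over 254 categories.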
The Simpson index was originally developed to measure “concentration” (Rousseau, 2018; Simpson, 1949). Stirling (2007) introduced the concept into the field of scientometrics as a way to evaluate the variety of subject categories and the unevenness in the distribution of these categories. For this reason, Simpson diversity is often called a “dual concept” indicator of diversity. It combines variety with balance in a single number. The equation for Simpson's diversity index is
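Written in its common complement form, with $p_i$ the share of category $i$ (notation assumed here), the index can be sketched as:

$$ D_{\mathrm{Simpson}} = 1 - \sum_{i=1}^{n} p_i^{2} = \sum_{i \neq j} p_i p_j \tag{2} $$

The sum of squares is the original “concentration”; its complement is the diversity. The second equality follows from $\left(\sum_i p_i\right)^2 = 1$.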
Stirling (2007) proposed Rao-Stirling (RS) diversity to measure interdisciplinarity, distinguishing variety, balance, and disparity as the three components of interdisciplinarity. Formally, the indicator is calculated as
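In Stirling's general form, with $p_i$ and $p_j$ the shares of categories $i$ and $j$, $d_{ij}$ the distance between them, and $\alpha$ and $\beta$ weighting exponents (the summation over all ordered pairs $i \neq j$ is an assumed convention), the indicator reads:

$$ \Delta = \sum_{i \neq j} (p_i p_j)^{\alpha} (d_{ij})^{\beta} \tag{3} $$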
The novelty of RS lies in the disparity term (dij). The other part of Eq. 3 is the same as the Simpson index, which measures both variety and balance.
In most scientometric applications, α and β are set to 1 (Rafols & Meyer, 2010), which simplifies Eq. (3) to:
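With both exponents equal to 1, and keeping the assumed convention of summing over all ordered pairs $i \neq j$, the expression becomes:

$$ \Delta = \sum_{i \neq j} p_i p_j d_{ij} \tag{4} $$

If every pair of categories were maximally distant ($d_{ij} = 1$), this would reduce to the Simpson diversity $\sum_{i \neq j} p_i p_j$.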
True RS diversity has its origins in a variant of the Hill indicator proposed by Leinster and Cobbold (2012) which adds disparity into the Hill equation traditionally used in ecology. This indicator was subsequently modified by Zhang et al. (2016) as follows:
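As it is usually written, with $\Delta$ the Rao-Stirling diversity for $\alpha = \beta = 1$ and $s_{ij} = 1 - d_{ij}$ the similarity between categories $i$ and $j$ (notation assumed here), the indicator is:

$$ {}^{2}D^{S} = \frac{1}{1 - \Delta} = \frac{1}{\sum_{i,j} s_{ij}\, p_i p_j} \tag{5} $$

The two forms agree because $\sum_{i,j} p_i p_j = 1$ and $d_{ii} = 0$, so that $\sum_{i,j} s_{ij} p_i p_j = 1 - \sum_{i \neq j} d_{ij} p_i p_j$.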
Stirling (1998) stated that “any integration of variety and balance into dual concept diversity must necessarily involve the implicit or explicit prioritization of the subordinate properties”. From this, Leydesdorff et al. (2019) proposed a new diversity indicator, called DIV, that divides interdisciplinarity into its three components (variety, balance, and disparity) and recombines them by multiplication. An empirical experiment proves the advantages of this new indicator over RS diversity. Formally, DIV is expressed as follows:
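Following the description above, and assuming the notation $n_c$ for the number of categories in which unit $c$ is active, $N$ for the total number of categories available, $\mathrm{Gini}_c$ for the Gini coefficient of the distribution of unit $c$ over its categories, and $d_{ij} = 1 - \cos_{ij}$, the three factors (variety, balance, disparity) can be written as:

$$ \mathrm{DIV}_c = \frac{n_c}{N} \times \left(1 - \mathrm{Gini}_c\right) \times \sum_{i \neq j} \frac{d_{ij}}{n_c (n_c - 1)} \tag{6} $$

where $i$ and $j$ run over the $n_c$ categories of unit $c$.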
Rousseau (2019) suggested some improvements to DIV. He showed that DIV can be turned into a measure of True Diversity by removing the term N (variety) in the denominator of Eq. 6. Rousseau argued that a better framework for diversity measurement would account for several requirements, not all of which are met by existing frameworks. Responding to the improvements made by Rousseau (2019), Leydesdorff, Wagner, and Bornmann (2019) provided an updated version of the improved DIV* as a True Diversity measure:
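With the same assumed notation as for DIV ($n_c$ categories in which unit $c$ is active, $\mathrm{Gini}_c$ the Gini coefficient of its distribution, $d_{ij} = 1 - \cos_{ij}$), dropping $N$ from the variety term gives:

$$ \mathrm{DIV}^{*}_c = n_c \times \left(1 - \mathrm{Gini}_c\right) \times \sum_{i \neq j} \frac{d_{ij}}{n_c (n_c - 1)} \tag{7} $$

Variety now enters as a count rather than a fraction, so the indicator is no longer bounded by 1.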
The Gini coefficient is a well-known indicator for representing income inequality among people and wealth inequality among nations (Lorenz, 1905). Hence, when measuring the diversity of interdisciplinary research with the Gini coefficient, the research is treated as a system comprised of three elements—variety, balance, and disparity (Porter & Rafols, 2009; Rafols & Meyer, 2010) where (1 – Gini) is used as the indicator of balance (Nijssen et al., 1998).
The theory of relative mean differences defines the Gini coefficient as (e.g. Buchan, 2002):
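In the relative-mean-difference form, with $x_i$ the value (e.g., the number of publications) for element $i$, $n$ the number of elements, and $\bar{x}$ their mean (symbols assumed here), the coefficient is:

$$ G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \left| x_i - x_j \right|}{2 n^{2} \bar{x}} $$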
Note, however, that there are several alternative definitions of the Gini coefficient; see, for example, those listed at https://en.wikipedia.org/wiki/Gini_coefficient (cf. Rousseau, 1992).
If the x values are first placed in ascending order such that each x has rank i, some of the comparisons above can be avoided and computation is therefore more efficient, i.e.:
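With the values sorted so that $x_1 \le x_2 \le \dots \le x_n$ (notation assumed here), the double sum collapses to a single rank-weighted sum:

$$ G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n \sum_{i=1}^{n} x_i} $$

This yields the same value as the pairwise form, but it requires one pass over the sorted values instead of $n^2$ comparisons.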
For G to be an unbiased estimate of the true population value, it should be multiplied by n/(n-1) (Dixon, 1987; Mills & Zandvakili, 1997). In the bibliometric literature, this index is also known as the Pratt index (Pratt, 1977). Both the Gini coefficient and the normalized G are provided by interd_vb.exe.
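The two formulations and the n/(n-1) correction can be compared in a few lines of Python. This is a minimal sketch and not the code of interd_vb.exe; the function names and the sample counts are hypothetical.

```python
from itertools import product


def gini_mean_difference(values):
    """Gini as the relative mean absolute difference over all ordered pairs."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    mean = total / n
    pair_sum = sum(abs(a - b) for a, b in product(values, repeat=2))
    return pair_sum / (2 * n * n * mean)


def gini_sorted(values):
    """Same quantity from the ascending ranks, avoiding the pairwise loop."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum((2 * rank - n - 1) * x for rank, x in enumerate(xs, start=1))
    return weighted / (n * total)


def gini_normalized(values):
    """Small-sample correction: multiply G by n / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the correction needs at least two values")
    return gini_sorted(values) * n / (n - 1)


counts = [1, 2, 3, 4]  # hypothetical publication counts per category
print(gini_mean_difference(counts), gini_sorted(counts), gini_normalized(counts))
```

For these counts both formulations return 0.25, and the corrected value is one third. A fully concentrated distribution such as `[0, 0, 0, 10]` gives 0.75 before and 1.0 after the correction, which illustrates why the normalization matters for small n.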
The concept of coherence based on network analysis has attracted attention from researchers in scientometrics (e.g. Rafols, 2014). While the diversity indicators rely on a pre-defined category system, coherence can be generated via a bottom-up approach that describes the intensity of the relations between any elements in a network. From this perspective, comprehensive frameworks composed of diversity and coherence have been proposed to improve the depiction of interdisciplinary systems (Rafols & Meyer, 2010).
The program interd_vb.exe (http://www.leydesdorff.net/software/interdisc.2020/interd_vb.exe) was rewritten based on the routine Mode2Div.exe previously programmed in the so-called xBase language. Unfortunately, computing cosine values for large matrices can be time-consuming with xBase, which imposes a soft limit on the size of the datasets that can be processed. Hence, we rewrote Mode2Div.exe in Visual Basic 6 to become interd_vb.exe, i.e. the online Interdisc application. Visual Basic 6 runs on Win10 (32/64 bits) and does not require a predetermined amount of memory to be allocated to processing. Therefore, the only limitation to the size of the dataset that can be processed is hardware. The two programs, interd_vb.exe and Mode2Div.exe, have similar objectives but a different organization and architecture, and the results they produce are exactly the same. Both programs are documented in Leydesdorff et al. (2018, 2019) and the software is available for download from https://www.leydesdorff.net/software/interdisc.2020/ and Figshare (https://figshare.com/account/articles/12871529).
One key difference between the two versions of the program is their input requirements. In the case of mode2div.exe, the input is stored listwise using the Pajek format, each line describing the row and column of a cell in a matrix of values. Thus, the input can be read as three fields without any system limitations. The data is assumed to be 2-mode so that an asymmetrical (citation) matrix can be processed. The program then computes the diversity measures along the column vectors of a data matrix saved in .csv format. As an example, to measure the interdisciplinarity of a set of documents, one could use jcitnetw.exe(1) to easily generate a co-occurrence matrix of cited journals in the Pajek format, using plain text downloaded from the Web of Science. More details on this can be found at https://www.leydesdorff.net/software/mode2div/.
Stirling (2007) added a new element to diversity measurement: disparity. Disparity indicates the distance between two subjects in the sample(s) under study. For example, if the distances in a subset are small, this space can be considered a niche of related variety (Frenken et al., 2007). However, disparity as a factor in both RS and the DIV requires the choice of a distance metric. Following Salton and McGill (1983), Ahlgren, Jarneving, and Rousseau (2003) proposed cosine as a non-parametric measure of similarity for bibliometrics. From a comparison of a number of similarity/distance measures, Egghe and Leydesdorff (2009) concluded that the cosine fulfills a number of requirements.
Like Pearson correlations, cosine values are defined in a vector space and are therefore positional, whereas the very similar Jaccard index is relational. Unlike the Pearson correlation, however, the cosine does not normalize with reference to the mean; since bibliometric distributions are highly skewed, normalizations using the mean are to be avoided. Our routines use (1 – cosine), which can be considered a distance measure. Pragmatically, the cosine can be written with the co-occurrence of the two vectors x and y in the numerator and the square root of the product of their sums of squares in the denominator. Note that, here, the matrix rows contain the disciplines and the columns contain the universities, so the cosine values are computed between the row vectors.
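The (1 – cosine) distance between row vectors can be sketched as follows. This is an illustrative Python sketch under the stated layout (rows are disciplines, columns are universities), not the routine used in interd_vb.exe; the function names and the toy matrix are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity: co-occurrence over the root of the product of sums of squares."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine is undefined for an all-zero vector")
    return dot / norm


def distance_matrix(rows):
    """Symmetric matrix of (1 - cosine) between every pair of row vectors."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - cosine(rows[i], rows[j])
    return dist


# Hypothetical toy data: 3 disciplines (rows) by 3 universities (columns).
rows = [
    [3, 0, 1],
    [6, 0, 2],  # proportional to the first row, so its distance to row 0 is 0
    [0, 5, 0],  # no overlap with the first row, so its distance to row 0 is 1
]
for line in distance_matrix(rows):
    print(line)
```

Because the cosine depends only on the direction of the vectors, two disciplines with proportional profiles across universities are at distance zero regardless of their size.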
One disadvantage of Mode2Div.exe is that data is often not readily available in Pajek format and converting the data into this format may generate other problems (Pfeffer, Mrvar, & Batagelj, 2013). The most generic format for data, however, is a matrix stored as a comma- or tab-separated plain ASCII file. There are no size limitations for this data, although Excel (depending on the Office version) may not allow for more than 255 variables. This data, however, can also be written using a text editor (e.g. the freeware Notepad++) or any other program. The size of the matrix is only limited by external factors such as free disk space.
The routine begins by asking for the name of the .csv file containing the variables and, for the purposes of error correction, the number of vectors to be compared. The output is then written to the files interdis.dbf and, equivalently, interdis.csv. The specific differences in terms of inputs, outputs, and other related items about these programs are summarized in Appendix Table S1.(2)
As empirical data, we used the portfolios of research articles published between 2017 (when the list was first released) and 2019 by the 42 Chinese universities listed as “Double First-Class universities”. The Chinese government offers substantial support to this select group of universities through a series of special programs. Additionally, although this particular list has only been published since 2017, similar initiatives under different names have existed periodically since the 1990s, with the majority of universities considered to be elite remaining much the same this whole time. Thus, these 42 institutions were selected because this group is both clearly delineated and large enough to provide a large-scale sample. In addition, we included the portfolios of two well-known American universities, Harvard and Stanford, to provide a benchmark that may be more familiar to Western readers. In a subsequent article (Leydesdorff, Wagner, & Zhang, 2021), we compare these results further with those for 205 Chinese universities.
Each of the universities in the sample promotes itself as a comprehensive university. However, some note specific missions or strengths; for instance, the agricultural universities. The publications associated with each university were retrieved using the organization's name and/or its variants from the Preferred Organization Index in WoS.
The domains searched include the Science Citation Index Expanded (SCI-E), the Social Sciences Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI) in the Web of Science (WoS) Core Collection. We limited the document type to articles and reviews. The numbers of articles retrieved per university are listed in Table 1 in decreasing order.
Table 1. Number of publications associated with the 44 universities in our sample (2017–2019), in decreasing order.
| No. | University name | Papers | No. | University name | Papers |
|---|---|---|---|---|---|
| 1 | Harvard Univ | 76,144 | 23 | Northeastern Univ | 14,893 |
| 2 | Shanghai Jiao Tong Univ | 37,016 | 24 | Beihang Univ | 14,484 |
| 3 | Zhejiang Univ | 35,204 | 25 | Dalian Univ of Technology | 13,861 |
| 4 | Tsinghua Univ | 32,681 | 26 | Zhengzhou Univ | 12,993 |
| 5 | Stanford Univ | 32,428 | 27 | Northwestern Polytechnical Univ | 12,497 |
| 6 | Peking Univ | 30,160 | 28 | Chongqing Univ | 12,451 |
| 7 | Sun Yat-Sen Univ | 26,823 | 29 | Univ of Electronic S & T of China | 12,334 |
| 8 | Huazhong Univ of S & T | 24,822 | 30 | Xiamen Univ | 11,607 |
| 9 | Fudan Univ | 24,475 | 31 | Beijing Institute of Technology | 11,206 |
| 10 | Sichuan Univ | 23,259 | 32 | Beijing Normal Univ | 10,043 |
| 11 | Central South Univ | 22,870 | 33 | Nankai Univ | 9,970 |
| 12 | Xi’an Jiaotong Univ | 22,698 | 34 | Hunan Univ | 9,811 |
| 13 | Shandong Univ | 21,601 | 35 | Lanzhou Univ | 9,156 |
| 14 | Jilin Univ | 21,068 | 36 | China Agricultural Univ | 8,762 |
| 15 | Harbin Institute of Technology | 20,750 | 37 | Northwest A & F Univ | 7,817 |
| 16 | Univ of S & T of China | 20,747 | 38 | East China Normal Univ | 7,610 |
| 17 | Wuhan Univ | 19,748 | 39 | National Univ of Defense Technology | 6,601 |
| 18 | Nanjing Univ | 19,246 | 40 | Ocean Univ of China | 6,390 |
| 19 | Tianjin Univ | 17,778 | 41 | Renmin Univ of China | 2,946 |
| 20 | Tongji Univ | 17,226 | 42 | Yunnan Univ | 2,835 |
| 21 | Southeast Univ | 16,959 | 43 | Xinjiang Univ | 1,979 |
| 22 | South China Univ of Technology | 15,595 | 44 | Minzu Univ of China | 760 |
We first organized the data into an asymmetrical occurrence matrix of the 44 universities against 254 WoS categories. We then computed the six diversity measures using Interd_vb.exe.
The interdisciplinarity scores for each indicator and university are listed in Table 2. Additionally, we have provided a ranking against each indicator. For example, for the DIV* indicator, Stanford University is ranked No. 1, whereas, according to the True RS indicator, it is ranked No. 15. Tsinghua University, which is widely considered to be the top university in China, sits in 21st place on the list of DIV*. Keep in mind, however, that this is a ranking of comprehensiveness as measured by disciplinary diversity, not of impact. As mentioned in Section 2.6, the Gini coefficient is a measure of unbalance, and therefore (1 – Gini) is used in the computation of DIV* (Eq. 7; Table 2).
Table 2. Indicator scores generated by the interd_vb.exe routine.
| University | DIV* | Rank | True RS | Rank | Simpson | Rank | Shannon | Rank | Variety | Rank | Disparity | Rank | (1-Gini) | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stanford Univ | 40.260 | 1 | 1.503 | 15 | 0.986 | 1 | 6.831 | 1 | 0.988 | 1 | 0.472 | 23 | 0.340 | 1 |
| Sun Yat-Sen Univ | 35.754 | 2 | 1.549 | 6 | 0.983 | 5 | 6.663 | 2 | 0.945 | 5 | 0.474 | 12 | 0.314 | 2 |
| Peking Univ | 33.352 | 3 | 1.516 | 13 | 0.982 | 7 | 6.568 | 4 | 0.953 | 3 | 0.474 | 15 | 0.291 | 4 |
| Zhejiang Univ | 33.237 | 4 | 1.549 | 7 | 0.983 | 3 | 6.594 | 3 | 0.949 | 4 | 0.473 | 18 | 0.292 | 3 |
| Harvard Univ | 32.328 | 5 | 1.288 | 39 | 0.983 | 6 | 6.512 | 7 | 0.988 | 1 | 0.471 | 25 | 0.274 | 7 |
| Shanghai Jiao Tong Univ | 31.151 | 6 | 1.553 | 5 | 0.984 | 2 | 6.565 | 5 | 0.921 | 9 | 0.471 | 26 | 0.283 | 5 |
| Sichuan Univ | 30.092 | 7 | 1.527 | 12 | 0.983 | 4 | 6.517 | 6 | 0.913 | 11 | 0.473 | 19 | 0.274 | 6 |
| Wuhan Univ | 29.117 | 8 | 1.548 | 8 | 0.982 | 9 | 6.465 | 8 | 0.917 | 10 | 0.473 | 16 | 0.264 | 8 |
| Northeastern Univ | 28.892 | 9 | 1.485 | 18 | 0.975 | 24 | 6.335 | 15 | 0.945 | 5 | 0.466 | 37 | 0.258 | 10 |
| Fudan Univ | 28.102 | 10 | 1.457 | 22 | 0.979 | 17 | 6.361 | 12 | 0.929 | 7 | 0.468 | 35 | 0.254 | 12 |
| Shandong Univ | 27.683 | 11 | 1.492 | 17 | 0.981 | 11 | 6.416 | 9 | 0.898 | 13 | 0.472 | 24 | 0.257 | 11 |
| East China Normal Univ | 27.471 | 12 | 1.495 | 16 | 0.980 | 12 | 6.415 | 10 | 0.886 | 18 | 0.470 | 28 | 0.260 | 9 |
| Nanjing Univ | 26.735 | 13 | 1.444 | 24 | 0.977 | 20 | 6.256 | 21 | 0.929 | 7 | 0.475 | 9 | 0.238 | 19 |
| Beijing Normal Univ | 26.427 | 14 | 1.575 | 4 | 0.976 | 22 | 6.328 | 16 | 0.890 | 16 | 0.467 | 36 | 0.250 | 13 |
| Xiamen Univ | 26.392 | 15 | 1.439 | 25 | 0.977 | 18 | 6.301 | 19 | 0.894 | 14 | 0.474 | 13 | 0.245 | 16 |
| Tongji Univ | 26.286 | 16 | 1.538 | 11 | 0.979 | 13 | 6.341 | 13 | 0.894 | 14 | 0.471 | 27 | 0.246 | 15 |
| Huazhong Univ of S&T | 25.700 | 17 | 1.452 | 23 | 0.979 | 15 | 6.308 | 18 | 0.886 | 18 | 0.475 | 10 | 0.241 | 18 |
| Central South Univ | 25.298 | 18 | 1.545 | 9 | 0.979 | 16 | 6.336 | 14 | 0.843 | 23 | 0.479 | 2 | 0.247 | 14 |
| Lanzhou Univ | 23.959 | 19 | 1.513 | 14 | 0.981 | 10 | 6.362 | 11 | 0.815 | 26 | 0.472 | 22 | 0.245 | 17 |
| Jilin Univ | 23.323 | 20 | 1.431 | 26 | 0.975 | 26 | 6.177 | 22 | 0.866 | 21 | 0.474 | 14 | 0.224 | 22 |
| Tsinghua Univ | 23.253 | 21 | 1.371 | 29 | 0.975 | 27 | 6.120 | 25 | 0.913 | 11 | 0.470 | 29 | 0.213 | 25 |
| Xi’an Jiaotong Univ | 22.879 | 22 | 1.409 | 27 | 0.975 | 25 | 6.128 | 24 | 0.890 | 16 | 0.472 | 21 | 0.214 | 24 |
| Zhengzhou Univ | 21.553 | 23 | 1.463 | 20 | 0.976 | 23 | 6.152 | 23 | 0.811 | 27 | 0.475 | 7 | 0.220 | 23 |
| Southeast Univ | 20.325 | 24 | 1.385 | 28 | 0.970 | 33 | 5.971 | 29 | 0.870 | 20 | 0.470 | 30 | 0.196 | 28 |
| Renmin Univ | 20.323 | 25 | 1.458 | 21 | 0.979 | 14 | 6.277 | 20 | 0.748 | 35 | 0.457 | 41 | 0.234 | 20 |
| Yunnan Univ | 18.896 | 26 | 1.540 | 10 | 0.982 | 8 | 6.314 | 17 | 0.681 | 39 | 0.469 | 31 | 0.233 | 21 |
| Nankai Univ | 18.091 | 27 | 1.334 | 32 | 0.969 | 40 | 5.894 | 32 | 0.827 | 24 | 0.462 | 40 | 0.187 | 30 |
| Univ of S&T – China | 17.782 | 28 | 1.303 | 37 | 0.968 | 41 | 5.797 | 38 | 0.850 | 22 | 0.477 | 5 | 0.172 | 38 |
| Tianjin Univ | 17.466 | 29 | 1.296 | 38 | 0.970 | 35 | 5.852 | 37 | 0.819 | 25 | 0.475 | 11 | 0.177 | 35 |
| South China Univ of Technol | 17.286 | 30 | 1.349 | 31 | 0.970 | 36 | 5.882 | 33 | 0.795 | 28 | 0.473 | 20 | 0.181 | 31 |
| Chongqing Univ | 17.029 | 31 | 1.313 | 34 | 0.970 | 34 | 5.856 | 36 | 0.795 | 28 | 0.478 | 3 | 0.176 | 36 |
| Hunan Univ | 16.958 | 32 | 1.307 | 36 | 0.973 | 30 | 5.913 | 31 | 0.772 | 32 | 0.480 | 1 | 0.180 | 32 |
| Ocean Univ of China | 16.824 | 33 | 1.734 | 1 | 0.977 | 21 | 6.102 | 26 | 0.677 | 40 | 0.473 | 17 | 0.207 | 26 |
| Dalian Univ of Technol | 16.509 | 34 | 1.315 | 33 | 0.974 | 29 | 5.922 | 30 | 0.756 | 34 | 0.478 | 4 | 0.180 | 33 |
| Harbin Inst of Technology | 15.412 | 35 | 1.286 | 40 | 0.969 | 38 | 5.769 | 39 | 0.776 | 31 | 0.475 | 8 | 0.165 | 39 |
| Beihang Univ | 15.007 | 36 | 1.311 | 35 | 0.970 | 37 | 5.762 | 40 | 0.780 | 30 | 0.465 | 38 | 0.163 | 40 |
| China Agricultural Univ | 14.671 | 37 | 1.670 | 3 | 0.973 | 31 | 5.873 | 35 | 0.701 | 37 | 0.469 | 34 | 0.176 | 37 |
| Northwest A&F Univ | 14.040 | 38 | 1.681 | 2 | 0.972 | 32 | 5.881 | 34 | 0.665 | 41 | 0.469 | 33 | 0.177 | 34 |
| Beijing Inst of Technol | 13.944 | 39 | 1.269 | 44 | 0.969 | 39 | 5.728 | 41 | 0.724 | 36 | 0.475 | 6 | 0.159 | 41 |
| Xinjiang Univ | 12.921 | 40 | 1.369 | 30 | 0.975 | 28 | 5.997 | 28 | 0.571 | 42 | 0.463 | 39 | 0.193 | 29 |
| Univ of Electronic S&T of China | 12.847 | 41 | 1.281 | 42 | 0.950 | 44 | 5.428 | 43 | 0.768 | 33 | 0.448 | 43 | 0.147 | 42 |
| Minzu Univ of China | 12.104 | 42 | 1.464 | 19 | 0.977 | 19 | 6.049 | 27 | 0.535 | 44 | 0.448 | 42 | 0.199 | 27 |
| Northwestern Polytechnical Univ | 12.062 | 43 | 1.275 | 43 | 0.962 | 42 | 5.571 | 42 | 0.693 | 38 | 0.469 | 32 | 0.146 | 43 |
| National Univ of Defense Technol | 7.783 | 44 | 1.285 | 41 | 0.951 | 43 | 5.274 | 44 | 0.563 | 43 | 0.446 | 44 | 0.122 | 44 |
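As a rough check on how the columns of Table 2 combine, assume that Variety is reported as the fraction $n_c / N$ with $N = 254$ categories and that Disparity is the mean distance $\sum_{i \neq j} d_{ij} / [n_c (n_c - 1)]$ (both are assumptions about the column definitions). The rounded values for Stanford then give:

$$ n_c \approx 0.988 \times 254 \approx 251, \qquad \mathrm{DIV}^{*} \approx 251 \times 0.340 \times 0.472 \approx 40.28 $$

This agrees with the tabulated 40.260 up to the rounding of the three inputs. The same calculation for Sun Yat-Sen University ($240 \times 0.314 \times 0.474 \approx 35.72$ against 35.754) behaves similarly.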
The Spearman rank-order correlations are provided in Table 3. The DIV* indicator correlates much more closely with the VARIETY and GINI indicators, as is to be expected since (1 – GINI) is actually used to calculate DIV*. However, there is only a moderate correlation between the two true diversity indicators, True RS and DIV* (ρ = 0.50; p < 0.01). Further, the rankings of the top five universities according to these two indicators are inconsistent. These unexpected results raise further questions.
Table 3. Spearman's correlations for ranking order generated by Interd_vb.exe (N = 42).
| | DIV* | TRUE RS | VARIETY | DISPARITY | (1 – GINI) | SIMPSON |
|---|---|---|---|---|---|---|
| DIV* | | | | | | |
| TRUE RS | .563** | | | | | |
| VARIETY | .926** | .323* | | | | |
| DISPARITY | .215 | −.092 | .230 | | | |
| (1 – GINI) | .936 | .717** | .772** | .074 | | |
| SIMPSON | .789** | .766** | .551** | .085 | .917** | |
| SHANNON | .911** | .734** | .725** | .087 | .990** | .950** |
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
The new element that Stirling (2007) added to the measurement of diversity and interdisciplinarity was disparity. In Table 3, disparity is indeed not significantly correlated with any of the other diversity indicators. Factor analysis of these data (Table 4) shows disparity (and variety) loading on a second component. Unlike True RS, DIV* captures both dimensions, as was Stirling's theoretical intention.
Table 4. Factor analysis of the interdisciplinarity and diversity indicators (N = 42).
| Rotated component matrix | Component 1 | Component 2 |
|---|---|---|
| True RS | .881 | −.133 |
| Shannon | .877 | .455 |
| (1-Gini) | .862 | .456 |
| Simpson | .830 | .390 |
| Div* | .703 | .657 |
| Variety | .329 | .853 |
| Disparity | | .792 |
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations; 85.1% of the variance explained.
As stated above, interd_vb.exe computes the cosine pragmatically: the numerator contains the co-occurrences in the sample, and the denominator contains the square root of the product of the sums of squares of the two vectors x and y being compared. Disparity is then defined as the sum of the local values of (1 – cosine) over the set. The resulting matrix is a “sample-dependent” local matrix, since it reflects the disparity within the data sample. Consequently, these values vary with the sample used as input. It may often be convenient for analysts and developers to calculate the diversity values locally in this way, particularly when one has no access to a global disparity matrix. However, the systems of reference for the cosine normalization then differ among samples.
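The dependence of the indicators on the distance matrix can be made concrete with a short Python sketch. It is not the interd_vb.exe code: the function names, the toy counts, and the two distance matrices are hypothetical, and two conventions are assumed, namely that Rao-Stirling sums over all ordered pairs i ≠ j and that the Gini coefficient is taken over the categories in which the unit is active.

```python
def gini(values):
    """Gini coefficient from ascending ranks (no small-sample correction)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    return sum((2 * r - n - 1) * x for r, x in enumerate(xs, start=1)) / (n * total)


def indicators(counts, dist):
    """Diversity scores for one unit.

    counts[i] is the unit's number of publications in category i;
    dist[i][j] is (1 - cosine) between categories i and j, local or global.
    """
    active = [i for i, c in enumerate(counts) if c > 0]
    n_c = len(active)
    if n_c < 2:
        raise ValueError("at least two active categories are needed")
    total = sum(counts)
    p = [c / total for c in counts]
    pairs = [(i, j) for i in active for j in active if i != j]
    # Rao-Stirling with alpha = beta = 1 (assumed: all ordered pairs)
    rs = sum(p[i] * p[j] * dist[i][j] for i, j in pairs)
    balance = 1.0 - gini([counts[i] for i in active])
    disparity = sum(dist[i][j] for i, j in pairs) / (n_c * (n_c - 1))
    return {
        "variety": n_c,
        "balance": balance,
        "disparity": disparity,
        "DIV_star": n_c * balance * disparity,
        "RS": rs,
        "TrueRS": 1.0 / (1.0 - rs),
    }


counts = [2, 2, 0]  # hypothetical unit, active in two of three categories
local = [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]   # hypothetical
global_ = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # hypothetical
print(indicators(counts, local))
print(indicators(counts, global_))
```

With the same publication counts, swapping the distance matrix changes RS, True RS, and DIV*, which is the comparability problem across samples described above.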
In contrast to local disparity, using a global matrix solves (almost by definition) the problem of comparability across samples. To demonstrate the difference between “local” and “global” matrices, we recalculated the diversity scores using a global cosine matrix based on the full set of JCR data for 2019. These data include 236 subject categories in the Science and Social Sciences Citation Indexes (but not the 25 in the Arts & Humanities Citation Index).
The results for both DIV* and True RS are shown in Table 5, and Table 6 shows the Spearman's correlations for the ranking order of the two indicators.(3) As expected, the correlation between DIV* and True RS (or RS) increased (from 0.502 to 0.695), demonstrating that the consistency between different diversity indicator values can be improved by using a global matrix instead of a local matrix.
Table 5. Local vs. global disparity using JCR data for 2019.
| University | DIV* | Rank | TRUE RS | Rank |
|---|---|---|---|---|
| Stanford Univ | 72.956 | 1 | 5.488 | 2 |
| Sun Yat-Sen Univ | 68.429 | 2 | 4.741 | 5 |
| Zhejiang Univ | 63.343 | 3 | 4.300 | 12 |
| Peking Univ | 62.654 | 4 | 4.632 | 8 |
| Shanghai Jiao Tong Univ | 60.907 | 5 | 4.643 | 7 |
| Sichuan Univ | 58.367 | 6 | 4.033 | 19 |
| Harvard Univ | 58.301 | 7 | 4.461 | 9 |
| Wuhan Univ | 56.903 | 8 | 4.702 | 6 |
| Northeastern Univ | 54.887 | 9 | 4.161 | 15 |
| Fudan Univ | 54.196 | 10 | 3.897 | 22 |
| Shandong Univ | 53.921 | 11 | 4.162 | 14 |
| East China Normal Univ | 51.757 | 12 | 4.309 | 11 |
| Xiamen Univ | 51.354 | 13 | 3.705 | 24 |
| Tongji Univ | 51.348 | 14 | 4.823 | 4 |
| Beijing Normal Univ | 50.815 | 15 | 5.082 | 3 |
| Huazhong Univ of S&T | 50.632 | 16 | 3.963 | 20 |
| Central South Univ | 50.535 | 17 | 4.107 | 17 |
| Nanjing Univ | 50.285 | 18 | 3.851 | 23 |
| Lanzhou Univ | 47.622 | 19 | 4.087 | 18 |
| Jilin Univ | 46.049 | 20 | 3.292 | 34 |
| Xi’an Jiaotong Univ | 45.655 | 21 | 3.664 | 25 |
| Tsinghua Univ | 45.121 | 22 | 3.601 | 26 |
| Zhengzhou Univ | 43.389 | 23 | 3.442 | 29 |
| Southeast Univ | 39.662 | 24 | 3.902 | 21 |
| Renmin Univ | 37.896 | 25 | 5.563 | 1 |
| Nankai Univ | 36.427 | 26 | 2.950 | 43 |
| Yunnan Univ | 36.236 | 27 | 4.382 | 10 |
| Univ of S & T – China | 35.002 | 28 | 2.876 | 44 |
| Tianjin Univ | 34.613 | 29 | 2.995 | 41 |
| South China Univ of Technol | 34.388 | 30 | 2.978 | 42 |
| Ocean Univ of China | 32.747 | 31 | 4.202 | 13 |
| Chongqing Univ | 32.519 | 32 | 3.260 | 35 |
| Hunan Univ | 32.394 | 33 | 3.379 | 32 |
| Dalian Univ of Technol | 31.933 | 34 | 3.355 | 33 |
| Harbin Inst of Technol | 30.166 | 35 | 3.191 | 36 |
| Beihang Univ | 30.029 | 36 | 3.508 | 28 |
| China Agricultural Univ | 29.258 | 37 | 3.396 | 31 |
| Northwest A & F Univ | 27.904 | 38 | 3.402 | 30 |
| Beijing Inst of Technol | 27.102 | 39 | 3.184 | 37 |
| Univ of Electronic S&T of China | 26.892 | 40 | 3.073 | 39 |
| Xinjiang Univ | 25.828 | 41 | 3.531 | 27 |
| Northwestern Polytechnical Univ | 23.873 | 42 | 3.031 | 40 |
| Minzu Univ of China | 22.645 | 43 | 4.132 | 16 |
| National Univ of Defense Technol | 16.236 | 44 | 3.118 | 38 |
Table 6. Spearman's correlations for consistency of rank order – local vs. global disparity.
| | DIV*_local | TRUE RS_local | DIV*_global | TRUE RS_global |
|---|---|---|---|---|
| DIV*_local | | | | |
| TRUE RS_local | .502** | | | |
| DIV*_global | .996** | .516** | | |
| TRUE RS_global | .697** | .707** | .695** | |
** Correlation is significant at the 0.01 level (2-tailed).
With a correlation between the local and global values of DIV* at .996, DIV* is obviously not sensitive to the scaling. As Rousseau (2019) noted, the disparity in DIV* “is just a relative (normalized) sum.” With hindsight, this seems an advantage of DIV* when compared with True RS.
There are some interesting observations to be made in terms of the results of specific universities. Comparing Stanford University and Tsinghua University as examples, Stanford University ranks significantly higher than Tsinghua according to both DIV* and True RS, as shown in Table 4. The science overlay maps in Figures 1 and 2 illustrate this vividly (Carley et al., 2017; Leydesdorff et al., 2016; Rafols et al., 2010). Using VOS Viewer for the visualization (Waltman et al., 2010), each node represents a WoS category, and the size of the node indicates the number of publications.
Figure 1
Science overlay map of the publications with an address at Tsinghua University. [Note: The base map of disciplines was developed from the matrix of 227 × 227 cells of WoS categories. This was generated on the basis of direct citation counting and normalized with the cosine function (Carley et al., 2017).]
Figure 2
Science overlay map of the publications with an address at Stanford University. [Note: The base map of disciplines was developed from the matrix of 227 × 227 cells of WoS categories. This was generated on the basis of direct citation counting and normalized with the cosine function (Carley et al., 2017).]
Visual inspection of these two maps shows that the category distributions of the two universities are very different. Stanford University prioritizes research in Clinical Medicine, Biomedicine, and other medical disciplines, while Tsinghua University has a clear focus on Computer Science & Engineering, Materials Science, and other Engineering fields. However, although each university has strengths in particular disciplines, the distribution of disciplines across Stanford's portfolio is more balanced than that across Tsinghua's.
DIV* values were more in line with our intuition about the diversity of these universities than the RS or True RS values. The latter deviate from expectations particularly when the results are based on local disparity matrices. With a local matrix, some field-specific universities, such as Ocean University of China and the Northwest Agriculture & Forestry University, are found to have high diversity values on the True RS (and RS) indicators. These results raise further questions.
The results for RS/True RS are more sensitive than DIV* to the choice of similarity measures (Rafols & Leydesdorff, 2010). As Rousseau (2019) notes, DIV takes disparity into account “as just a relative (normalized) sum” and is therefore not sensitive to scaling. In Eq. (8), disparity is only defined at the level of the sample: the interaction of the shares p_i and p_j of categories i and j with the disparity d_ij is not taken into account at the cell level; only the total sum of all disparity values is.
Table 2 (above) showed that Ocean University of China has the highest True RS diversity of all universities. However, when checking the specific distribution of Web of Science categories, we found that more of its papers are published within Oceanography (14.01%) than in any other category. Yet Oceanography is a relatively marginal category in our sample, with much lower cosine similarities than other categories. As a result, the disparity (1 − cosine) between Oceanography and other categories is much higher than average (0.73 vs. 0.47). The extraordinarily high proportion of publications in Oceanography, combined with the category's high disparity from other categories, leads to an unexpectedly high diversity value when measured with RS/True RS. However, when using a global similarity matrix (Table 4), the RS/True RS scores of most field-specialized universities decreased. As noted, the rankings based on DIV* were not affected by this effect.
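The mechanism described here can be made concrete with a toy calculation. The sketch below assumes the forms of the indicators commonly given in the literature, RS = Σ_{i≠j} p_i p_j d_ij and True RS = 1 / (1 − RS) (Zhang, Rousseau, & Glänzel, 2016); the equations used elsewhere in this paper may be normalized differently. The four-category portfolio is invented, the disparity values 0.73 and 0.47 are borrowed from the text purely for illustration, and `mean_disparity` is a simplified stand-in for a sample-level disparity term, not the full DIV* indicator.

```python
def rao_stirling(p, d):
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * d_ij."""
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)


def true_rs(p, d):
    """True diversity variant, assumed here to be 1 / (1 - RS)."""
    return 1.0 / (1.0 - rao_stirling(p, d))


def mean_disparity(d):
    """Sample-level disparity: average d_ij over all pairs i != j.
    The shares p_i do not enter this term at all."""
    n = len(d)
    return sum(d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


# Toy disparity matrix: category 0 is "marginal" (disparity 0.73 to all
# others); the three core categories are closer to each other (0.47).
n = 4
d = [[0.0 if i == j else (0.73 if 0 in (i, j) else 0.47) for j in range(n)]
     for i in range(n)]

# Same set of shares in both portfolios; only the category holding the
# largest share differs, so variety and balance are identical.
p_marginal_dominant = [0.40, 0.20, 0.20, 0.20]
p_core_dominant = [0.20, 0.40, 0.20, 0.20]

rs_marginal = rao_stirling(p_marginal_dominant, d)
rs_core = rao_stirling(p_core_dominant, d)

print(f"RS, marginal category dominant: {rs_marginal:.4f}")
print(f"RS, core category dominant:     {rs_core:.4f}")
print(f"True RS: {true_rs(p_marginal_dominant, d):.3f} vs "
      f"{true_rs(p_core_dominant, d):.3f}")
print(f"Mean disparity (same for both): {mean_disparity(d):.3f}")
```

Under these assumptions, concentrating publications in the high-disparity category raises RS and True RS even though variety and balance are unchanged, whereas a disparity term defined only at the level of the sample is the same for both portfolios.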
The portfolio of papers with a Harvard address covers a wide range of categories and the distribution is relatively balanced. However, the cosine similarities of the categories with the most publications are relatively high, i.e., they tend to have low disparity values, which results in a lower value of RS/True RS when using a local similarity matrix. These empirical results suggest that RS diversity values based on a global disparity matrix provide results that are more in line with expectations. Therefore, users who have access to a global matrix are advised to use it instead of the values generated endogenously by our software.
When universities operate in similar markets with the same institutional imperatives, such as tasks specified in national legislation, one might expect them to develop isomorphism (Halffman & Leydesdorff, 2010; Powell & DiMaggio, 1991; Wagner, Bornmann, Cai, & Leydesdorff, in preparation). However, our results indicate that universities do not tend toward isomorphism when it comes to comprehensiveness, as they do with impact. We reason that this is because impact is measured and prioritized in the bureaucratic frameworks of the state, whereas comprehensiveness is influenced by local opportunities, such as emerging technologies in the companies geographically or intellectually nearby. Hence, developing a deeper understanding of institutional comprehensiveness demands consideration of a broader context and more aspects of society, such as missions of specific universities.
Our analysis clarifies further differences between impact and comprehensiveness. Competition for impact pertains to quality, while competition for diversity/specialty pertains to differentiation. For example, shielding intellectual property rights is specific to a university's relations with industry. When it comes to comprehensiveness, the specificity of the knowledge content matters more than the formal criteria of measuring and comparing output and impact. In our opinion, interdisciplinarity, diversity, or comprehensiveness should not be considered another type of impact. While impact can be formalized across units of operation, e.g. faculties, departments, etc., after proper normalization, diversity or comprehensiveness remains content-based.
In other words, the analytical distinction between intellectual and social organization does not mean that the two dimensions can be traded off at the level of a university. On the contrary, one can expect a correlation, whether positive or negative, between the different types of research efforts. However, the differences between the two make it urgent that we develop a set of indicators for measuring diversity comparable to those of impact. By making an application available that allows users to generate the various measures of diversity for any data matrix, we hope to have contributed to this objective of quantifying and measuring diversity.
Finally, we note that although diversity is often used as a proxy for measuring interdisciplinarity, one should not expect any simplistic index to produce an informative outcome on its own (Abramo et al., 2018). Indicator values should always be interpreted in light of the context, the purpose, and the specific object under study. The empirical analysis of the 42+ Chinese universities in terms of diversity measures not only relates to interdisciplinarity at the intellectual level, but also reflects comprehensiveness at the institutional level. Although comprehensiveness is not necessarily a goal of universities, it may reflect the status quo of disciplinary diversity within a university (or at least the structural features of its disciplinary distribution). The measurement results of this study provide a knowledge base for understanding portfolios. A better understanding may open new windows on potential policies and thus facilitate the development of interdisciplinarity within a university.
For further details, see http://www.leydesdorff.net/software/interdisc.2020/
The local cosine matrix was generated with interdisc_vb.exe; the global one was retrieved from http://www.leydesdorff.net/software/wc19. The cosine similarity matrix for the WoS categories based on JCR 2019 data is also provided at http://www.leydesdorff.net/wc15/wc19.