Going Breiger on White: Harrison C. White’s identity in Social Network Science

Introduction

In 2006, Social Network Analysis was already about a century old. But Network Science, the research domain concerned with studying the structure and dynamics of complex networks in general, was still starting up. People were reading the books of Watts (2003) and Barabási (2003) and were trying to find small-world and scale-free properties of their network datasets. I was one of them. To make progress with network analysis, I had luckily secured a spot in the sociology graduate program at Columbia University. Unfortunately, Watts was on sabbatical, but fortunately, there was also Harrison C. White. I enrolled. The task of the class was to help rephrase his grand structural theory of social action called Identity and Control, eventually published as a second edition (White, 2008). Chapters were circulating for editing, and I was happy that no one had picked the one that seemed to be the lowest-hanging fruit, the chapter on “Networks and Stories”. I had practical knowledge in centrality analysis and blockmodeling, but what were all those other concepts floating around, like “catnets”? Small-world networks were covered in the book. But where were scale-free networks?

Well-prepared and months into the rewriting, I put a scale-free “discipline” up for discussion that had intrigued me since I had seen it in Barabási’s book (2003, p. 233). Harrison said this model was going in the right direction but saw no need to change the three disciplines that were already in the book. How about adding a reference to scale-free networks, I asked. No need either, he replied, since he was already citing previous work by Ijiri & Simon (1977). During the discussion, Harrison, with a respectful grin, referred to me as “one of those techy types,” meaning those with a technical and formal take on networks. Though my input had just been rejected, I felt as if I had been knighted. The point of this introduction is that the social networks domain is anything but coherent. It harbors techy types and theoreticians like Harrison, it has currents and banks, and it consists of subdomains that hardly know each other’s work. Freeman (2004) has described the dynamics of styles in Social Network Analysis, including the influx of physicists from Network Science. I have called the combination of these two domains Social Network Science or SNS (Lietz, 2016).

In this paper, I take an empirical look at the intellectual position and influence of Harrison C. White, commonly abbreviated as HCW, in SNS. HCW was both a sociologist and a physicist. As a central figure in Social Network Analysis, did he give attention to Network Science? And did he get attention from the physicists once they entered the domain? More generally, in which streams of SNS did he position himself? And how did the streams of SNS refer to his works? Ultimately, these are questions of identity, and duality is a way to understand identity. Duality means that two phenomena or processes are mutually dependent, like two sides of one coin (Mützel & Breiger, 2021). Here, duality means that HCW is a story of SNS, and SNS is a story of HCW. By crafting methods and models that turned out to be very useful, he helped establish the domain, and the more he did so, the more he was received as the identity that helped establish the domain.

Empirically, I represent HCW and SNS by scholarly publications. These have an active or agentic aspect and a passive or – in the Durkheimian sense – factual aspect, both of which are indicative of identity. I construct the network of citing papers to see how HCW had actively positioned himself in SNS and the network of cited references to see how he was passively received and, hence, had influenced the domain. These two networks are known as the “bibliographic coupling” (Kessler, 1963) and “co-citation” (Small, 1973) networks in the scientometric literature. They are dual because references constitute the edges in the network of papers and papers constitute the edges in the network of references. My work is an application of a simple matrix calculus, intended to model the duality of persons and groups, introduced into sociology by Harrison’s student Ronald L. Breiger (1974). The title pun to “go Breiger” is from Lee & Martin (2018) who have generalized the approach. The duality of papers and references is depicted in Figure 1. It shows how the two kinds of networks can be projected from the bipartite network of papers and references.
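The two projections can be illustrated with a minimal sketch. It assumes a toy mapping from citing papers to the references they cite; the identifiers are hypothetical and not taken from the SNS dataset. In matrix terms, with A as the paper-by-reference incidence matrix, the off-diagonal entries of A·Aᵀ give the bibliographic coupling weights and those of Aᵀ·A give the co-citation weights.

```python
from collections import defaultdict
from itertools import combinations

# Toy bipartite data (hypothetical identifiers): each citing paper maps to
# the set of references it cites.
citations = {
    "paper_1": {"ref_a", "ref_b"},
    "paper_2": {"ref_a", "ref_b", "ref_c"},
    "paper_3": {"ref_c"},
}


def bibliographic_coupling(citations):
    """Project onto papers: edge weight = number of shared references."""
    weights = {}
    for p, q in combinations(sorted(citations), 2):
        shared = len(citations[p] & citations[q])
        if shared:
            weights[(p, q)] = shared
    return weights


def co_citation(citations):
    """Project onto references: edge weight = number of papers citing both."""
    weights = defaultdict(int)
    for refs in citations.values():
        for r, s in combinations(sorted(refs), 2):
            weights[(r, s)] += 1
    return dict(weights)


coupling = bibliographic_coupling(citations)
cocitation = co_citation(citations)
```

In this toy case, references are the edges between papers in `coupling`, and papers are the edges between references in `cocitation`, which is the duality described above.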

Subdomains of Social Network Science

My source of materials is a data set of 24,873 citing publications making up the network domain of SNS as retrieved from the Web of Science database. Unfortunately, only the information we are not using here (authorship and word usage) is publicly available (Lietz, 2020). Publications range from 1916 to 2012. To infer subdomains, I employ a topic model based on Latent Dirichlet Allocation (LDA). This class of models is generative and was originally developed to identify patterns (topics) in text corpora. LDA being a generative model means that text documents are assumed to be generated from topics which are, in turn, assumed to be patterns of words contained in the documents (Blei et al., 2003). Note how LDA implements a duality of topic inference and document generation. The process of inferring topics as distributions over words is dual to the process of generating documents as distributions over topics.

LDA can easily be applied to citation data – the model does not care if the integers that are processed represent words or cited references. That is, the input of the topic model is a matrix of citing papers (rows) and cited references (columns). Publications that both cite and are cited are duplicated in rows and columns. SNS covers almost a century. Hence, I employ a dynamic topic model that accounts for the temporal structure of the data, that is, the evolution of the domain (Blei & Lafferty, 2007). Since SNS grows exponentially, the domain was roughly similar in size in the first 80 years as well as in each of the last six years. Hence, I aggregated papers into nine almost equally sized time bins. The topic model requires the number of desired topics to be set, and five topics have proven to be well interpretable (Lietz, 2020). ⁽¹⁾
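How citation data can stand in for text may be sketched as follows. The example assumes that each paper is given as a list of cited-reference strings and that publication years are known. The function names and the greedy binning rule are illustrative assumptions, not the procedure actually used for the SNS data.

```python
from collections import Counter


def to_bag_of_references(papers):
    """Encode each paper as sorted (reference_id, count) pairs, the same
    integer format a topic model would expect for words."""
    ref_ids = {}
    corpus = []
    for refs in papers:
        counts = Counter(ref_ids.setdefault(r, len(ref_ids)) for r in refs)
        corpus.append(sorted(counts.items()))
    return corpus, ref_ids


def equal_sized_year_bins(years, n_bins):
    """Greedily cut publication years into at most n_bins consecutive bins
    of similar size; a single year is never split across bins."""
    per_year = sorted(Counter(years).items())
    total = len(years)
    bins, start, size, cumulative = [], None, 0, 0
    for year, n in per_year:
        if start is None:
            start = year
        size += n
        cumulative += n
        # Close the bin once the running total reaches the next quantile.
        if cumulative * n_bins >= total * (len(bins) + 1):
            bins.append((start, year, size))
            start, size = None, 0
    return bins


# Hypothetical toy input: two papers and twelve publication years.
corpus, ref_ids = to_bag_of_references([["ref_a", "ref_b"], ["ref_a"]])
bins = equal_sized_year_bins([1990] * 2 + [1995] * 2 + [2000] * 4 + [2001] * 4, 3)
```

With exponential growth, such a rule yields one long early bin and short recent ones, in the spirit of the nine bins described above.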

The subdomains described in Table 1 and Figure 2 are very well in line with those described by Freeman (2004) and those found using a hybrid clustering method based on citation and text data (Lietz, 2020). Historically, the domain had its origins in the topic that would later be called “social support”, bloomed in a period when it differentiated into the three subdomains “methods & models”, “organizations & innovation”, and “social capital”, and more recently changed face with the emergence of “complex networks”. The last subdomain is where the physicists dwell. Each subdomain has its own internal dynamics of change. For example, HCW’s foundational paper WHITE_1976_A_730 on blockmodeling was the most probable reference to be cited in the beginning of “methods & models”, but more recently it gave way to MCPHERSO_2001_A_415 on homophily.

Table 1.

Subdomains of SNS. Labels were derived from inspecting titles, abstracts, and author keywords. The size refers to the number of papers. Years are those in which the first 25%, 50%, and 75% of all papers in the subdomain were published.

Subdomain	Size	Quartile years
Social support	4,124	1995 / 2003 / 2009
Methods & models	1,822	2001 / 2007 / 2010
Organizations & innovation	5,293	2001 / 2007 / 2010
Social capital	5,280	2002 / 2008 / 2010
Complex networks	8,354	2007 / 2009 / 2011

How HCW had positioned himself in SNS

In the SNS dataset, I have tried to identify all publications by HCW, both on the citing and cited sides. I have taken HCW’s bibliography from Azarian (2005) and supplemented its 112 references, which range from 1952 to 2003, with a few recent ones not listed therein. The Web of Science is mostly a database of journal papers, and the SNS dataset contains the 12 papers by HCW listed in Table 4. Figure 3 shows where these papers are positioned in the overall network of citing papers. The network is reduced to the maximum spanning tree to be visually interpretable. Node colors indicate subdomains (see Table 1 as a legend), and weighted edges give the number of cited references two papers have in common. Interpreting Figure 3 requires care because the maximum spanning tree is devoid of weak ties, which means that many pairs of papers that do have references in common do not have an edge in the figure. What we see is a skeleton of the whole structure.
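The reduction to a maximum spanning tree can be sketched with Kruskal's algorithm on edges sorted by descending weight. The article does not state which algorithm or software was used, so this is only one possible implementation, and the edge list is a hypothetical toy example.

```python
def maximum_spanning_tree(edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    edges: iterable of (u, v, weight) tuples.
    Returns the kept edges; for a disconnected network this is a forest.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Toy coupling network: weights are numbers of shared references.
edges = [("p1", "p2", 5), ("p2", "p3", 3), ("p1", "p3", 1), ("p3", "p4", 2)]
tree = maximum_spanning_tree(edges)
```

The weak tie between `p1` and `p3` is dropped although the two papers share a reference, which is why the resulting figure shows only a skeleton.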

The two papers on blockmodeling (WHITE_1976_A_730 and BOORMAN_1976_A_1384) are both found in the “methods & models” subdomain. According to Freeman, blockmodels helped institutionalize the domain because they solved a “methodological problem for which there was no method” (2004, p. 126). Accordingly, we find the two papers in the multicolored – hence: multi-subdomain – core of the network. Eight of the 12 papers belong to “organizations & innovation”. Five papers are obvious fits. Three works on meanings in social dynamics (WHITE_1995_S_1035, MISCHE_1998_S_695, and MOHR_2008_T_485) add texture to that subdomain. Two other papers on meaning from dynamics (GODART_2010_P_567 and FONTDEVI_2011_S_178) co-constitute the “social capital” subdomain which is unified around core references to Putnam and Coleman but also to another student of HCW, Mark S. Granovetter.

We can further quantify HCW’s active citation behavior. Consider that the black nodes with numbers in Figure 1 are his papers. Since we know which subdomain a paper belongs to, we can compute the probability P_HCW of HCW to cite in a subdomain, reported in Table 2. P_HCW is 3.50 times higher in “methods & models” and 2.21 times higher in “organizations & innovation” than the baseline, the overall probability P to cite in a subdomain. To put this into perspective, we can compute a probability P_public of papers, citing references cited by HCW, to cite in a subdomain. If all black and numbered nodes in Figure 1 are HCW’s papers, then papers citing references cited by HCW are all neighbors of those nodes in the network of papers (left projection). P_public is a contextual probability to cite in a subdomain. It is no longer higher than expected for “methods & models”, which means, this subdomain is not where the references that HCW cites are more likely to be cited. It is only in “organizations & innovation” where his cited references resonate beyond chance. This is where HCW actively positioned himself after his initial methodological contributions to “methods & models”. Though he shared the physics background of those in “complex networks”, he and those similar to him gave less attention to that subdomain than expected.
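The comparison with the baseline can be sketched as a simple ratio of shares. The sketch assumes that every citation carries a subdomain label and that a probability is the share of citations with that label. This is one plausible reading, not a confirmed description of the computation, and the counts are invented for illustration.

```python
from collections import Counter


def subdomain_probabilities(labels):
    """Share of citations that fall into each subdomain."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {subdomain: n / total for subdomain, n in counts.items()}


def ratio_to_baseline(focal, baseline):
    """How many times more (or less) likely than the baseline a focal
    actor is to cite in each subdomain."""
    return {s: focal.get(s, 0.0) / p for s, p in baseline.items() if p > 0}


# Invented toy labels, not the values behind Table 2.
all_citations = ["methods"] * 2 + ["organizations"] * 4 + ["complex"] * 4
focal_citations = ["methods", "organizations"]

p_baseline = subdomain_probabilities(all_citations)
p_focal = subdomain_probabilities(focal_citations)
ratios = ratio_to_baseline(p_focal, p_baseline)
```

A ratio above 1 means the focal actor cites in a subdomain more than expected. The same ratio could be formed for P_public by swapping in the citations made by the neighboring papers.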

Table 2.

Probabilities to cite in subdomains. Values in parentheses give the ratio to the baseline P.

Subdomain	P	P_HCW	P_public
Social support	0.18	0.00	0.08 (0.43)
Methods & models	0.06	0.22 (3.50)	0.07 (1.10)
Organizations & innovation	0.26	0.58 (2.21)	0.41 (1.55)
Social capital	0.26	0.20 (0.79)	0.30 (1.16)
Complex networks	0.24	0.00	0.15 (0.62)

How HCW was received in SNS

The cited side of the publication dataset is more complete than the citing side. After some additional cleaning, I found 61 references by HCW and 50 percent of the publications from Azarian’s bibliography in the reference lists of the 25k citing papers. The 16 references with more than 10 citations are listed in Table 5. The network of cited references is shown in Figure 4. Edge weights give the number of papers that co-cite pairs of references. Due to duality, subdomains are now indicated as edge colors, not node colors. With 564,412 nodes, the network is much larger. It is also reduced to the maximum spanning tree. Hence, Figure 4 depicts the skeleton that underlies the full structure. This uncovers many unicolored regions. These are the meaning structures that harbor the stories (methods and exemplars) specific to the corresponding subdomains.

Many of HCW’s references cluster in the red region, which means, they are dominantly co-cited by papers in “organizations & innovation”. These are his works on markets and general theory, including his books Chains of Opportunity, Careers and Creativity, Identity and Control, and Markets from Networks. The second largest group of references clusters in the bottom-left region. These are his works on structural equivalence and blockmodels. This whole region of the network is rather multicolored, which means, the references co-cited there get attention from many subdomains. This is particularly true for the large core of references in the middle of which we find HCW’s book Canvases and Careers. The references there get cited from all subdomains, including “complex networks” but excluding “social support”, because they are methods with broad application potential. These works make up the consensus that held large parts of the whole domain together towards the end of the data recording period. The blue region is the intellectual core of “complex networks”, and HCW’s publication on “Search parameters for the small world problem” (WHITE_1970_S_259) was picked up by the physicists. It is worth mentioning that the historical “social support” origin of the domain is rather decoupled from the rest of the network.

As with the citing side of things, we reflect these interpretations quantitatively in Table 3. The probability P_HCW of references by HCW to get cited is 4.09 times higher than expected in “methods & models” and 1.90 times higher in “organizations & innovation”. Again, we look at references neighboring his publications in the network to put these citations into context. The probabilities P_public of references, cited by papers citing references by HCW, to be cited are still higher than expected in both subdomains. While his influence on these two subdomains is quite large, his direct as well as contextual impact on “complex networks” is much lower than expected by chance.

Table 3.

Probabilities to get cited in subdomains. Values in parentheses give the ratio to the baseline P.

Subdomain	P	P_HCW	P_public
Social support	0.18	0.02 (0.10)	0.02 (0.11)
Methods & models	0.06	0.26 (4.09)	0.17 (2.68)
Organizations & innovation	0.26	0.50 (1.90)	0.52 (1.96)
Social capital	0.26	0.15 (0.59)	0.19 (0.73)
Complex networks	0.24	0.08 (0.32)	0.11 (0.46)

The identities of HCW and SNS

We have inferred and described five subdomains of SNS. In his early career, HCW had actively contributed to the “methods & models” subdomain. When his focus shifted to markets and meanings, he settled in the “organizations & innovation” subdomain. At no point in the history of SNS were subdomains ever called out loud – which authority could have done this, anyway? – and taking position in a subdomain is not a decision that actors would consciously make. Individual and collective identities emerge from the feedback loops we have modeled via duality. Decisions to cite a certain set of references in a scholarly paper are influenced by what others in the domain have cited in the past. Subdomain boundaries emerge because they reduce the amount of information one must process. Identities are “styles”, to use the proper concept from Identity and Control. They are sensibilities of sorting and selecting stories from a menu of options. As collectives behave in the manner of styles, network domains and subdomains, which are social networks conditioned on narratives, form and enable future action (White, 2008, ch. 4).

As HCW’s works became cited and highly cited in subdomains of SNS and his name became increasingly connected to an idea, his identity became more and more factual and gained aspects of a set of stories. These are the stories of catnets, structural equivalence, blockmodels, relational markets, switching, meaning, and self-similarity. Cultural change is the rule, not the exception. As the techy types, themselves method developers and model builders like the early HCW, entered the domain in large numbers, the style HCW had already moved on towards less technical though still formal concepts in SNS. His connection to “complex networks” was through small-world networks. In 2000, he hired Watts as a sociology professor at Columbia University. Their exchange led to one of the few papers in Network Science that actually refers to the concept of identities (Watts et al., 2002). After Network Science somewhat reinvented the wheel (Lietz, 2016), it made tremendous progress by uncovering universal laws that underlie the complexity of networked systems. Harrison has by no means been quiet about this. According to Abbott, concepts in physics map to his theoretical concepts: “renormalization theory becomes self-similarity, gravitational strings become catnets” (1994, p. 897). These stories remain to be read outside the network domains where they are still confined.

⁽¹⁾ I have employed the LDA Sequence Model implemented in Python’s Gensim package. To allow topics to change strongly over time, I have set the chain-variance parameter to a large value of 0.5. Data preprocessing has a strong influence on the outcome. To bring out topics that are defined in themselves, I have only kept publications that cite references that are cited at least five times.
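The setup described in this note might look roughly like the following sketch. The filter implements one possible reading of the preprocessing rule: references cited by fewer than five papers are dropped, as are papers left without references. The helper names are hypothetical. `LdaSeqModel` and its `chain_variance`, `time_slice`, and `num_topics` parameters are part of Gensim, while all other settings are left at their defaults here, which is an assumption.

```python
from collections import Counter


def filter_by_citation_count(papers, min_citations=5):
    """Keep references cited by at least min_citations papers, then drop
    papers that are left without any reference (assumed reading)."""
    counts = Counter(ref for refs in papers for ref in set(refs))
    kept = [[r for r in refs if counts[r] >= min_citations] for refs in papers]
    return [refs for refs in kept if refs]


def fit_dynamic_topic_model(corpus, id2word, time_slice, num_topics=5):
    """Fit Gensim's LDA Sequence Model (requires the gensim package).

    corpus: papers as lists of (reference_id, count), ordered by time.
    time_slice: number of papers in each consecutive time bin.
    """
    from gensim.models.ldaseqmodel import LdaSeqModel  # optional dependency

    return LdaSeqModel(
        corpus=corpus,
        id2word=id2word,
        time_slice=time_slice,
        num_topics=num_topics,
        chain_variance=0.5,  # large value so topics can change strongly
    )


# Toy check of the filter with a lowered threshold of two citations.
filtered = filter_by_citation_count([["a", "b"], ["a"], ["c"]], min_citations=2)
```

Fitting the sequence model is slow on large corpora, so the fitting function is only defined here and not called.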
