PATHS AND SEMIPATHS: RECONCEPTUALIZING STRUCTURAL COHESION IN TERMS OF DIRECTED RELATIONS

Authors


Direct correspondence to Rick Grannis, Sociology Department, University of California, Los Angeles, 264 Haines Hall, Box 951551, Los Angeles, CA 90095-1551; e-mail: grannis@soc.ucla.edu.

Abstract

In a groundbreaking article, Moody and White (2003) introduced the concept of structural cohesion, simultaneously characterizing emergent communities and their internally embedded layers by the number of node-independent paths interconnecting individuals. Like many studies, however, they “corrected” the directionality discovered in some of their data. While often done for important purposes, doing so potentially confounds structural cohesion with unrelated concepts. Some relations, especially those relating to the dynamic aspects of social life, are inherently directed, in whole or in part, and it may prove worthwhile to respect this directionality. In this article, I recast structural cohesion in terms of directed social relations and identify four distinct ways of measuring it. In two example data sets—hiring relations among graduate programs and trust relations among neighborhood residents—I show that only strong embeddedness, a type of structural cohesion emerging from directed relations, proves to be a powerful, robust, independent explanatory factor. I further show that if the directionality in the data in these examples had been “corrected,” the importance of structural cohesion would have been dramatically undervalued.

1. PATHS AND SEMIPATHS: STRONG AND UNILATERAL STRUCTURAL COHESION

In a groundbreaking article, Moody and White (2003) introduced the concept of structural cohesion, simultaneously defining a group property characterizing emergent communities and their internally embedded layers as a function of the number of node-independent paths interconnecting nodes.1 They noted that, analytically, solidarity could be partitioned into an ideational component, referring to members' sense of this togetherness and identification with a collectivity, and a relational component, referring to the actual connections among members of the collectivity. They argued that the most important qualitative relational feature to focus on in understanding solidarity is the dependence of a group upon particular individuals for its connectedness and the appropriate quantitative measure of this feature is the minimum number of individuals who, if removed from a group, would disconnect the group. This definition identifies the cohesion of a set of actors, of subsets nested within this set, and of individual actors embedded in the set. Using the logic of the flow of social and cultural resources within a social network, Moody and White (2003) argued quite effectively for the importance of this conception.

Their discussion, however, ignored the potential for directionality inherent in these flows. The importance of social flow for social life makes a focus on directionality vital. Social life consists, in part, of the flow and exchange of social goods and resources (norms, values, symbols, etc.) among various social actors. While social structure is often conceived of as static and the product of mutual, two-way relationships, social flow is often directed.

In this article, I recast structural cohesion in terms of potentially directed social relations and identify four ways of measuring structural cohesion: (1) recursive, (2) strong, (3) unilateral, and (4) weak. I show that, whenever researchers treat innately directed relationships as if they were undirected, they implicitly choose one of these types. Each type, however, offers different implications for our understandings of community and social roles. In Section 2, I formally outline the four types of directed structural cohesion and argue that, in many cases, analysis should maintain the directionality inherent in data. I then empirically demonstrate the importance of a focus on directed structural cohesion with two examples: sociology graduate programs sending their PhDs to other sociology graduate programs (in Section 3) and trust relations among residents in a neighborhood (in Section 4). Section 5 concludes the article.

2. STRUCTURAL COHESION, EMBEDDEDNESS, AND DIRECTED SOCIAL RELATIONS

2.1. Paths, Semipaths, and Structural Cohesion

Moody and White's (2003) analysis focused only on undirected relations and did not explore structural cohesion emerging from directed relations. In fact, to conduct their analysis, they modified one of their data sets to treat it “as if” the relationships had been reciprocated (Moody and White 2003, ftn. 16). Although this practice is common, a long line of social psychological research indicates that relationships often are not reciprocated (Heider 1979; Newcomb 1981; Wellman 1988). Some relations are inherently reciprocal; others are inherently directional; and some are a complicated mixture of both.2 In many instances, there may be much to gain by respecting the directionality inherent in the data,3 and we should at least check the assumption that directionality can be safely ignored.

To model directed social relations and the social structure emerging from them, I use the language of directed graphs, or digraphs.4 A relation is directed if it is oriented from one actor to another. We can represent a directed relation between actors as an arc between two nodes in a digraph directed from the origin or sender to the terminus or receiver.5 Thus there are two potential arcs between any pair of nodes, one going in each direction. Ignoring directionality assumes that either both exist if either exists or that neither exists unless both do.

A path is a sequence of distinct nodes and arcs, beginning and ending with nodes, in which each arc has its origin at the previous node and its terminus at the subsequent node. A semipath is also a sequence of distinct nodes and arcs, where all successive pairs of nodes are connected by an arc from the first to the second, or by an arc from the second to the first. Essentially, in a semipath the direction of the arcs is irrelevant. Thus, every path is a semipath, but not every semipath is a path. Two paths, or semipaths, from i to j are node-independent if they have only nodes i and j in common. Thus, paths 1 and 2 between nodes i and j are node-independent if none of the intermediaries on path 1 are also intermediaries on path 2.
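The distinction between paths and semipaths can be made concrete with a minimal sketch. The toy digraph, the function names, and the node labels below are hypothetical illustrations and do not correspond to any of the article's figures; arcs are written as (sender, receiver) pairs.

```python
# Hypothetical toy digraph (not one of the article's figures).
ARCS = {("a", "b"), ("c", "b"), ("a", "d"), ("d", "c")}


def is_path(arcs, seq):
    """Distinct nodes, and every step follows an arc from the previous node to the next."""
    distinct = len(set(seq)) == len(seq)
    return distinct and all((u, v) in arcs for u, v in zip(seq, seq[1:]))


def is_semipath(arcs, seq):
    """Distinct nodes, and every step follows an arc in either direction."""
    distinct = len(set(seq)) == len(seq)
    return distinct and all(
        (u, v) in arcs or (v, u) in arcs for u, v in zip(seq, seq[1:])
    )


def node_independent(seq_1, seq_2):
    """Same endpoints, and no intermediary of one sequence appears on the other."""
    same_ends = (seq_1[0], seq_1[-1]) == (seq_2[0], seq_2[-1])
    return same_ends and not set(seq_1[1:-1]) & set(seq_2[1:-1])


via_b = ["a", "b", "c"]  # a -> b <- c: a semipath but not a path
via_d = ["a", "d", "c"]  # a -> d -> c: a path, hence also a semipath
print(is_path(ARCS, via_b), is_semipath(ARCS, via_b))
print(is_path(ARCS, via_d), is_semipath(ARCS, via_d))
print(node_independent(via_b, via_d))
```

In this toy case the two sequences are node-independent, yet only one of them could carry a directed flow from a to c.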

A network's k-connectivity is simultaneously equal to the minimum number, k, of nodes that, if removed, would disconnect the network and the minimum number of node-independent paths, k, connecting each pair of actors in the network.6 A k-component consists of a maximal set of nodes with a maximal node connectivity layer of k. Essentially, in a k-component, the nodes that form each of the k paths must also be members of the component. Moody and White (2003) explicitly equated a network's structural cohesion to its k-components. In their operationalization of the data, however, they did not actually measure paths but instead measured semipaths since they ignored the potential directionality of the ties (see Moody and White 2003, ftn. 16).
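One way to count node-independent paths between a single pair of nodes is a unit-capacity maximum flow on a node-split graph, which relies on the same equivalence (Menger's theorem) invoked above. The sketch below is an illustration under that assumption, not a description of the procedure used in this article or by Moody and White (2003); the function name and the toy arcs are hypothetical. Setting `directed=False` counts semipaths instead of paths.

```python
from collections import defaultdict, deque


def count_independent_paths(arcs, s, t, directed=True):
    """Maximum number of node-independent s-to-t paths (assumes s != t)."""
    arcs = {(u, v) for u, v in arcs if u != v}
    if not directed:
        arcs |= {(v, u) for u, v in arcs}  # treat every arc as a two-way tie
    nodes = {n for arc in arcs for n in arc} | {s, t}
    cap = defaultdict(int)  # residual capacities
    adj = defaultdict(set)  # residual adjacency

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse residual arc, capacity 0 until flow is pushed

    for n in nodes:
        # Each intermediary may be used by one path only; endpoints are unrestricted.
        add_arc((n, "in"), (n, "out"), len(nodes) if n in (s, t) else 1)
    for u, v in arcs:
        add_arc((u, "out"), (v, "in"), 1)

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical toy digraph: two directed routes from a to d, one arc back.
TOY = {("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("d", "a")}
print(count_independent_paths(TOY, "a", "d"))
print(count_independent_paths(TOY, "d", "a"))
print(count_independent_paths(TOY, "a", "d", directed=False))
```

The same pair of nodes thus receives different counts depending on the direction considered and on whether direction is ignored, which is the distinction the four types of structural cohesion are meant to capture.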

Below, I distinguish four types of structural cohesion—weak, unilateral, strong, and recursive—which emerge when we consider directed data. Distinguishing these types is not merely a mathematical idiosyncrasy, but rather has potentially profound implications in many situations. For example, when we consider the flow of information or influence, this flow is naturally directed; it goes from one social actor to another. While some, at times perhaps most, of it is reciprocated, it is still the net sum of directed flows and measuring it as such is more faithful to the data and thus important for more accurate theoretical conclusions.

2.2. Weak Embeddedness

For a digraph, a pair of nodes, i and j, is weakly connected if the nodes are joined by a semipath (see Figure 1). (This usage of “weak” has no correspondence whatsoever to Granovetter's [1973] “weak ties.”)

Figure 1.

The four types of structural connectivity.

When we consider the k-connectivity of a set of nodes, their weak k-connectivity equals the minimum number, k, of node-independent semipaths connecting each pair of nodes in the set. Figure 2 displays seven nodes and eleven directed arcs. All nodes are weakly connected to each other. All nodes except node 7 are weakly 2-connected to each other; nodes 2 and 3 are weakly 3-connected to each other; and nodes 1 and 3 are weakly 4-connected to each other, as shown in Table 1(a).

Figure 2.

K-connectivities illustrated.

Table 1.
Pairwise Connectivities of Nodes in Figure 2

(a) Weak k-Connectivity

          Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Node 1      0       2       4       2       2       2       1
Node 2      2       0       3       2       2       2       1
Node 3      4       3       0       2       2       2       1
Node 4      2       2       2       0       2       2       1
Node 5      2       2       2       2       0       2       1
Node 6      2       2       2       2       2       0       1
Node 7      1       1       1       1       1       1       0

(b) Unilateral k-Connectivity

          Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Node 1      0       1       2       1       2       2       1
Node 2      1       0       1       1       2       1       1
Node 3      2       1       0       1       2       2       1
Node 4      1       1       1       0       1       1       1
Node 5      0       0       0       0       0       0       0
Node 6      0       0       0       0       0       0       0
Node 7      0       0       0       0       0       0       0

(c) Strong k-Connectivity

          Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Node 1      0       1       2       1       0       0       0
Node 2      1       0       1       1       0       0       0
Node 3      2       1       1       1       0       0       0
Node 4      1       1       1       0       0       0       0
Node 5      0       0       0       0       0       0       0
Node 6      0       0       0       0       0       0       0
Node 7      0       0       0       0       0       0       0

(d) Recursive k-Connectivity

          Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Node 1      0       0       1       0       0       0       0
Node 2      0       0       0       0       0       0       0
Node 3      1       0       0       0       0       0       0
Node 4      0       0       0       0       0       0       0
Node 5      0       0       0       0       0       0       0
Node 6      0       0       0       0       0       0       0
Node 7      0       0       0       0       0       0       0

To be in a k-component, however, is different from being k-connected, as it requires the nodes that form the path to also be members of the component. While every node in a k-component is k-connected to every other node in the k-component, not every set of k-connected nodes is a k-component. In Figure 2, nodes 1 and 3 are connected through four node-independent semipaths but three of these involve other nodes; similarly, nodes 2 and 3 are connected through three node-independent semipaths but two of these involve other nodes. Nodes 1, 2, 3, 4, 5, and 6 are each connected to the others through two node-independent semipaths and these use only the connected nodes as path members. Thus, while all nodes are members of a weak 1-component, the only other connectivity that translates into a component is that all nodes except node 7 are members of a weak 2-component.

This distinction highlights the need to define one more concept, the k-connectivity, or the embeddedness, of a single node. For a single node, its weak embeddedness equals the highest valued weak k-component of which it is a part. Thus, in the example illustrated in Figure 2, nodes 1, 2, 3, 4, 5, and 6 have a weak embeddedness of 2 and node 7 has a weak embeddedness of 1 (see Table 2).

Table 2.
Embeddedness of Nodes in Figure 2

          Recursive     Strong        Unilateral    Weak
          Embeddedness  Embeddedness  Embeddedness  Embeddedness
Node 1         1             1             2             2
Node 2         0             1             2             2
Node 3         1             1             2             2
Node 4         0             1             1             2
Node 5         0             0             2             2
Node 6         0             0             2             2
Node 7         0             0             1             1

The directed flow of information, influence, or other social resources may not occur at all between social actors who are only weakly connected.7 Thus, weak connectivity potentially confounds structurally relevant connectivity with apparent connectivity. In the examples, I will show that this may cause us to underestimate the importance of structural cohesion.

2.3. Unilateral Embeddedness

For a digraph, a pair of nodes, i and j, is unilaterally connected if they are joined by a path from i to j, or a path from j to i, or both (see again Figure 1).8 Notice that this is a stricter form of connectivity than weak connectivity, and thus it implies the latter (i.e., if a pair of nodes is unilaterally connected, they are also weakly connected; but the reverse is not necessarily true). The term unilateral is graph-theoretic (Bang-Jensen and Gutin 2001), having no necessary implications beyond the fact that the path may potentially orient in only one direction.

When we consider the k-connectivity of a set of nodes, their unilateral k-connectivity equals the minimum number, k, of node-independent paths, not merely semipaths, connecting each pair of nodes in the set. Note that these paths may orient in only one direction. A set of nodes' unilateral k-connectivity will always be less than or equal to its weak connectivity since it will not include those semipaths that are not also paths. Examining Figure 2 again, nodes 1, 2, 3, and 4 are unilaterally connected to each other and each individually to nodes 5, 6, and 7. Nodes 1 and 3 are unilaterally 2-connected to each other and each individually to nodes 5 and 6. Node 2 is further unilaterally 2-connected to node 5, as shown in Table 1(b).

Defining unilaterally connected components is a bit more complicated. Nodes 1, 2, 3, and 4 are unilaterally connected to each other and they are also each individually connected to nodes 5, 6, and 7. Nodes 5, 6, and 7, however, are not unilaterally connected to each other and therefore cannot as a set be members of the unilateral component. Each of them individually, however, is a member of a unilateral component involving nodes 1, 2, 3, and 4. Therefore, there are three overlapping unilateral components (1, 2, 3, 4, and 5; 1, 2, 3, 4, and 6; and, 1, 2, 3, 4, and 7).9 While these three unilateral components overlap, they are distinct. This ability to overlap is a unique quality of unilateral components.

Even higher unilateral k-components exist. Nodes 1, 3, and 6 are members of a unilateral 2-component. There are two node-independent paths connecting each pair of nodes and they use only these three nodes as path members. Also, nodes 1, 2, 3, and 5 are members of a unilateral 2-component. There are two node-independent paths connecting each pair of nodes and they use only these four nodes as path members. Again, while these two 2-components overlap, they are distinct.

For a single node, its unilateral embeddedness equals the highest valued unilateral k-component of which it is a part. Thus, nodes 1, 2, 3, 5, and 6 would have a unilateral embeddedness of 2 while nodes 4 and 7 would have a unilateral embeddedness of 1.

2.4. Strong Embeddedness

For a digraph, a pair of nodes, i and j, is strongly connected if there is a path from i to j, and a path from j to i, although the path from i to j may contain different nodes and arcs than the path from j to i (see again Figure 1). As a consequence of this definition, two strongly connected nodes are also on a cycle with each other. A cycle is similar to a path except that the beginning and ending nodes are the same. Notice that this is a stricter form of connectivity than weak or unilateral connectivity, and thus it implies both (i.e., if a pair of nodes is strongly connected, they are also unilaterally and weakly connected; but the reverse is not necessarily true).

When we consider the k-connectivity of a set of nodes, their strong k-connectivity equals the minimum number, k, of node-independent cycles shared by each pair of nodes in the set. The strong k-connectivity of a set of nodes will always be less than or equal to its unilateral or weak connectivity. Examining again Figure 2, we note that nodes 1, 2, 3, and 4 are strongly connected to each other and that nodes 1 and 3 are further strongly 2-connected, as shown in Table 1(c).

While nodes 1, 2, 3, and 4 are members of a strong 1-component, there is no strong 2-component since the second strong path between nodes 1 and 3 must utilize other nodes. For a single node, its strong embeddedness equals the highest valued strong k-component of which it is a part. Thus, in the example illustrated in Figure 2, nodes 1, 2, 3, and 4 have a strong embeddedness of 1, while nodes 5, 6, and 7 have a strong embeddedness of 0.

Strong connectivity is interesting in that it is quite possible to have strongly connected k-components without having any cohesion at lower levels. In this sense, strong structural cohesion is a truly emergent property. For example, in Figure 3 there are seven nodes. Each node has a directed arc either to or from every other node but in no case both. There are three asymmetric cycles (1 > 2 > 3 > 4 > 5 > 6 > 7 > 1; 1 > 3 > 5 > 7 > 2 > 4 > 6 > 1; and 1 > 4 > 7 > 3 > 6 > 2 > 5 > 1) and there are two node-independent paths between each pair, which means they are all strongly 2-connected. Because each of the seven nodes is strongly 2-connected to every other one of the seven nodes, all seven nodes are members of a strong 2-component, but not a single edge is reciprocated.10 Furthermore, there is no subset of five or fewer of these nodes that is strongly connected at all.
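Two of the properties described for Figure 3 can be checked mechanically. The sketch below assumes that the arcs of Figure 3 are exactly those traced by the three cycles listed above (the figure itself is not reproduced here), and it verifies only that no arc is reciprocated and that every node can reach every other node; the helper name is hypothetical.

```python
# The three cycles as listed in the text; consecutive nodes define an arc.
CYCLES = [
    [1, 2, 3, 4, 5, 6, 7, 1],
    [1, 3, 5, 7, 2, 4, 6, 1],
    [1, 4, 7, 3, 6, 2, 5, 1],
]
arcs = {(u, v) for cycle in CYCLES for u, v in zip(cycle, cycle[1:])}
nodes = sorted({n for arc in arcs for n in arc})


def reachable(arcs, start):
    """Set of nodes reachable from start by following arcs forward."""
    seen, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen


reciprocated = {(u, v) for (u, v) in arcs if (v, u) in arcs}
strongly_connected = all(reachable(arcs, n) == set(nodes) for n in nodes)

print(len(arcs), "arcs among", len(nodes), "nodes")
print("reciprocated arcs:", len(reciprocated))
print("every node reaches every other:", strongly_connected)
```

Under this reading, every pair of nodes is joined by exactly one arc, so any undirected treatment that requires reciprocation would see no ties at all in this network.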

Figure 3.

The emergent nature of strong k-connectivity.

2.5. Recursive Embeddedness

For a digraph, a pair of nodes, i and j, is recursively connected if there is a path from i to j, and a path from j to i and if the path from i to j uses the same nodes and arcs as the path from j to i, in reverse order (see again Figure 1). Notice that this is the strictest form of connectivity and thus it implies all of the others (i.e., if a pair of nodes is recursively connected, the nodes are also strongly, unilaterally, and weakly connected; but the reverse is not necessarily true).
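The nesting of the four connection types can be summarized in a short sketch that reports the strictest type holding for a pair of nodes. It assumes that a recursive connection is equivalent to a path made up entirely of reciprocated arcs, which follows from the definition above; the toy digraph, node labels, and function names are hypothetical.

```python
def reaches(arcs, start, goal):
    """True if goal can be reached from start by following arcs forward."""
    seen, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return goal in seen


def connection_type(arcs, i, j):
    """Strictest connection type between two distinct nodes i and j."""
    arcs = set(arcs)
    mutual = {(u, v) for (u, v) in arcs if (v, u) in arcs}  # reciprocated arcs only
    either = arcs | {(v, u) for (u, v) in arcs}  # direction ignored
    if reaches(mutual, i, j):
        return "recursive"
    if reaches(arcs, i, j) and reaches(arcs, j, i):
        return "strong"
    if reaches(arcs, i, j) or reaches(arcs, j, i):
        return "unilateral"
    if reaches(either, i, j):
        return "weak"
    return "none"


# Hypothetical toy digraph: a <-> b, a cycle b -> c -> d -> b, then d -> e <- f.
TOY = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "b"),
       ("d", "e"), ("f", "e")}
for pair in [("a", "b"), ("b", "c"), ("a", "e"), ("f", "d"), ("a", "g")]:
    print(pair, connection_type(TOY, *pair))
```

Because the tests run from strictest to weakest, each label implies all of the weaker ones, mirroring the implication chain stated in the definitions.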

Recursive connectivity identifies a subset of those who are strongly k-connected, at any level k, engaged in one particular relationship, giving and receiving trust along the exact same paths. To understand this distinction, imagine person 1 transmitted norms and values to person 2 who then passed them along to person 3 while person 4 transmitted norms and values to person 5 who then passed them along to person 6. Recursive connectivity distinguishes the case where person 3 transmitted them back to person 1 through person 2 from the case where they did so through person 5 (and similarly, the cases where person 6 transmitted them back to person 4 through either person 5 or person 2). In both cases, recursive connectivity focuses solely on the former condition and completely ignores the latter (see Figure 4).

Figure 4.

Contrasting recursive and strong connectivity. Note that these figures are not meant to represent the entirety of the relevant networks, or even all of the paths between the displayed nodes. They rather serve only as examples of path types between nodes 1 and 3 (as well as between nodes 4 and 6).

Does this distinction make a difference? Arguably, if the same intermediary brokers social resources in both directions between two persons, this gives them power over both directions of the flow between these two particular persons, whereas if connectivity was strong but not recursive, a single intermediary could disrupt one, but not both, directions of the flow. However, in both scenarios persons 2 and 5 have power over the same amount of flows between the same numbers of people. When we consider the entirety of the scenarios, and not particular persons, their power is unchanged.

When we consider the k-connectivity of a set of nodes, their recursive k-connectivity equals the minimum number, k, of node-independent recursive paths connecting each pair of nodes in the set. The recursive k-connectivity of a set of nodes will always be less than or equal to its strong, unilateral, or weak connectivity. Examining Figure 2, we see that nodes 1 and 3 are recursively connected to each other and are also members of a recursive 1-component as the recursive path includes only these two nodes, as shown in Table 1(d).

For a single node, its recursive embeddedness equals the highest-valued recursive k-component of which it is a part. Thus, nodes 1 and 3 have a recursive embeddedness of 1, while all other nodes have a recursive embeddedness of 0.

2.6. The Implications of Translating Directed Data into Undirected Data

Although much research has focused on how to correctly determine whether an undirected relation exists or does not exist given only a directed relation, it needs to be emphasized that such research implicitly assumes that the underlying relations are undirected. It is certainly true that some relations are basically undirected in nature and should be treated as such. However, it is equally true that some relations are basically directed in nature and should also be treated as such. I do not mean to imply that all relations should be treated as directed; I am saying only that this decision needs to be made theoretically, not as a consequence of available methodological tools. Some things are nails and some things are screws and the determination of which is which should not depend upon whether we have only a hammer.

Researchers often “clean” or simplify a set of data by converting it from directed to undirected. I argue that doing so has potentially profound theoretical implications when measuring structural cohesion because the method researchers choose to analyze their data predetermines the embeddedness type they are able to measure. There are essentially two methods by which directed data are translated into undirected relations. The most common is to assume a relation exists in both directions if it can be shown to exist in either direction, that is, to treat all relationships “as if” they had been reciprocated. The less common method involves assuming that no relation exists in either direction unless it can be shown to exist in both directions.

Measuring structural cohesion on data transformed through the first method measures weak connectivity, while measuring structural cohesion on data transformed through the second method measures recursive connectivity. If relations are truly directed, then weak connectivity confounds structural cohesion with unrelated concepts by including semipaths, which are unable to transmit social resources, in its calculations. When studies generate significant findings measuring weak connectivity, this may result from the fact that, since all other forms of connectivity are subsets of it, weak connectivity is confounded with them. The concern with recursive connectivity, in contrast, arises not from the paths it includes but from those it excludes; all recursively connected paths may be valid, but in many instances they are not the only ones that are valid.
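The two translation methods can be sketched as follows. The toy arcs and function names are hypothetical, and undirected ties are represented as unordered pairs; the point is only that the same directed data yield different undirected networks, and hence different connectivity, depending on the method chosen.

```python
def symmetrize_either(arcs):
    """First method: a tie exists if an arc exists in either direction."""
    return {frozenset((u, v)) for u, v in arcs if u != v}


def symmetrize_both(arcs):
    """Second method: a tie exists only if arcs exist in both directions."""
    return {frozenset((u, v)) for u, v in arcs if u != v and (v, u) in arcs}


def connected(edges, i, j):
    """True if i and j are joined by a chain of undirected ties."""
    seen, frontier = {i}, [i]
    while frontier:
        u = frontier.pop()
        for edge in edges:
            if u in edge:
                for w in edge - {u}:
                    if w not in seen:
                        seen.add(w)
                        frontier.append(w)
    return j in seen


# Hypothetical toy data: a and b name each other; b names c, who does not reciprocate.
TOY = {("a", "b"), ("b", "a"), ("b", "c")}
either = symmetrize_either(TOY)
both = symmetrize_both(TOY)
print(len(either), connected(either, "a", "c"))  # weak connectivity of a and c
print(len(both), connected(both, "a", "c"))      # recursive connectivity of a and c
```

Under the first method a and c appear connected even though nothing can flow from c to a; under the second, the directed tie from b to c disappears entirely.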

Converting directed data to undirected relations predetermines the embeddedness type that we index. I now show that such a predetermination has potentially dramatic empirical implications.

3. PHD EXCHANGE NETWORK

To demonstrate that we may reach substantially different conclusions about important issues by retaining information about the directionality of social relations, I explore two quite different examples, sociology graduate programs sending PhD students to other sociology graduate programs and trust relations among residents in a neighborhood closely associated with a gang. In both cases, I make no attempt to treat the subtle theoretical and substantive issues. Instead, my purpose is to show that a focus on directionality more accurately represents these social networks and captures our intuitions about these examples as well as better accounting for important outcome variables.

The first example I examine is sociology graduate programs sending PhD students to other sociology graduate programs. To create this network model, I consulted the American Sociological Association's Guide to Graduate Programs of Sociology for information on the institution where each full-time faculty member at a graduate program received his or her PhD. Of the 216 sociology graduate programs in the United States, 124 produced PhDs who were currently employed at another U.S. sociology graduate program. I treated these 124 programs as nodes in a network with a directed arc from program A to program B if program B hired as professors individuals who received their PhD from program A.11 This network is clearly directed;12 only 130 of the 1137 directed relations (11 percent) are reciprocated.
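The share of reciprocated relations reported above can be computed directly from an arc list. The sketch below is illustrative only: the toy arcs and the function name are hypothetical, and it assumes that the reported count of 130 refers to individual arcs rather than to mutual pairs, which the text does not state explicitly.

```python
def reciprocity(arcs):
    """Return (reciprocated arcs, total arcs, share) for a non-empty arc list."""
    arcs = {(u, v) for u, v in arcs if u != v}
    mutual = sum(1 for u, v in arcs if (v, u) in arcs)
    return mutual, len(arcs), mutual / len(arcs)


# Hypothetical toy network: programs A and B hire from each other; A -> C and C -> D do not cycle back.
TOY = {("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")}
print(reciprocity(TOY))

# Arithmetic check of the figures reported in the text: 130 of 1137 relations.
print(round(100 * 130 / 1137), "percent")
```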

To some extent, these PhD students may influence the norms, values, symbols, ideas, and beliefs of their social contacts at their new graduate programs. This influence has implications. Kuhn (1970) argued that belief in the empirical validity of a theory could be sustained long past the available empirical evidence if scientists were embedded in research communities who systematically interpreted data in similar ways. Similarly, Friedkin (1998) showed that scientists generate consensus by exchanging ideas, research questions, methods, and implicit rules for evaluating evidence with their collaborators. Martin (2002) argues that we can link the shape of an idea space to the structure of a network. Thus, the set of ideas we hold to be true is largely a function of the group of people we interact with, and belief consensus depends critically on the shape of the underlying social network (Burt 1987).13 It would follow that much of the sense of the unity of U.S. sociology comes from the primary interaction that graduate programs have with one another, exchanging PhD students. It is the mechanism by which they “reaffirm the boundaries of the group” and express “mutual affirmation” (Burris 2004).

I suggest that the social structure that results from the flow of PhD students may be mirrored in the cognitive frameworks through which individuals interpret their social world and their own interactions within it. In the case of sociology graduate programs, these cognitive frameworks are best indicated by the rankings sociologists ascribe to the various graduate programs regarding the scholarly quality of their faculty and the effectiveness of the program in educating researchers.14 To analyze these rankings, I used the 1995 ranking of graduate programs in sociology done by the National Research Council.15 I hypothesize that the more embedded a graduate program is in the U.S. sociology PhD exchange network, the more important it will be perceived to be in U.S. sociology.

3.1. Strong Embeddedness

I begin by examining strong embeddedness. Most sociology graduate programs exist in a single strongly connected component, cycling PhD students amidst themselves.16 This component contains 104 graduate programs with the remaining 20 graduate programs being isolates. This component has six structurally cohesive layers with a median embeddedness of 3 (and an arithmetic mean of 2.98). Thus, graduate programs are typically embedded in three cycles of strong connectivity, with each cycle involving distinct graduate programs. Figure 5(a) displays the strong component with graduate programs shaded by embeddedness level. Figure 5(b) displays only those graduate programs in the top (6th) level of strong embeddedness and the relations between them. This smaller set of labeled nodes not only allows the reader to trace connectivity by sight but also provides a substantive sense of the relation between embeddedness and rank (see below). Figure 5(c) displays the network that remains when those in the strong 6-component have been removed. While less dense than the entire network (37 percent of all ties involved the 11 programs in the strong 6-component), it is still quite connected.

Figure 5(a).

The strong component within the sociology PhD exchange network. Nodes represent graduate programs; arcs indicate transfer of PhD students; node shade indicates embeddedness level (black = 6 layers; white = 1 layer).

Figure 5(b).

The strong 6-component within the sociology PhD exchange network.

Figure 5(c).

The strong component within the sociology PhD exchange network. Nodes represent graduate programs; arcs indicate transfer of PhD students; nodes in the strong 6-component have been removed.

While the network contains departments that send and receive a disproportionate number of PhDs, it is not the case that a single graduate program can act or behave in such a way as to substantially influence the diffusion of PhD students because typically two alternative paths, of which they are not a member, exist.17 Therefore, this exchange of PhD students so infuses departments that it creates a real social entity of U.S. sociology, at least for the 104 members of the strongly connected component.

Exactly how does strong embeddedness relate to the rankings of the various graduate programs? The strongly connected component matches quite well with the generally understood set of ranked graduate programs. Of the 104 graduate programs in the strongly connected component, 88 are among the 95 graduate programs ranked by the National Research Council. The seven ranked graduate programs that are not members of the strongly connected component have an average rank of 85th and include four of the bottom six rankings. Furthermore, within the strongly connected component, increasing embeddedness corresponded to higher rank. Those only in the first layer have an average rank of 81st, those in the second layer 53rd, in the third layer 50th, in the fourth layer 33rd, and in the top two layers 9th. These mean differences are highly significant (p << .001) and eta² = .802, clearly indicating a powerful monotonic relationship between strong embeddedness and rank. While the embeddedness layers are an interval variable, the rankings are probably not. If we treat them as such, however, the analysis yields consistent findings, with an extremely strong linear relationship between strong embeddedness and rank (adjusted r² = .763, p << .001). However we choose to approximate it, more than three-fourths of the variation in rankings appears to be captured by strong embeddedness layer for the PhD exchange network. Strong embeddedness corresponds to graduate program ranking in a powerfully monotonic, almost linear, fashion. Thus, the social structure that results from the exchange of PhD students is mirrored in the rankings ascribed to the various graduate programs.

3.2. Strong Embeddedness and Traditional Network Measures

Because other network measures can be shown to relate to these rankings, we need to explore how the explanatory power of strong embeddedness compares with that of more traditional network measures, both nodal level (e.g., out-degree, in-degree,18 clustering19) and network level (e.g., out-closeness, in-closeness,20 betweenness21). Altogether, these measures have about the same explanatory power as strong embeddedness alone and three of them—out-closeness, betweenness, and out-degree—prove significant (Table 3, Model 1). Including strong embeddedness in the model leaves only strong embeddedness, betweenness, and out-degree significant (Model 2).

Table 3.
Regression of Sociology Graduate Program Ranking on Embeddedness Types and Traditional Network Measures

                          Model 1       Model 2       Model 3       Model 4       Model 5
Strong embeddedness                     −7.209
                                        (2.317)***
Unilateral embeddedness                               −0.883
                                                      (0.516)
Recursive embeddedness                                              −.991
                                                                    (2.016)
Weak embeddedness                                                                 −2.077
                                                                                  (1.222)
Out-closeness             −1.249        −.543         −1.153        −1.214        −.978
                          (.246)***     (.326)        (.250)***     (.257)***     (.291)***
In-closeness              −48.081       −25.225       −42.369       −44.530       −33.950
                          (33.121)      (32.426)      (24.589)      (34.041)      (33.805)
Betweenness               2.946         2.307         2.560         2.805         1.977
                          (1.025)**     (1.036)*      (1.080)*      (1.069)*      (1.163)
Out-degree                −.78          −.695         −.544         −.709         −8.48
                          (.193)***     (.186)***     (.235)*       (.241)**      (.195)***
In-degree                 −.579         −.53          −.184         −.564         −.080
                          (.5)          (.477)        (.545)        (.503)        (.575)
Clustering                −44.112       −45.632       −45.369       −43.109       −68.507
                          (24.849)      (23.7)        (24.589)      (25.041)      (28.466)*
Constant                  336.191       225.368       312.772       318.839       290.030
                          (151.681)*    (148.957)     (150.646)*    (156.378)*    (152.497)
Adjusted R²               .780***       .800***       .785***       .778***       .784***

Note: Standard errors in parentheses.
*p < .05 **p < .01 ***p < .001

We can directly compare the explanatory power of strong embeddedness against those other variables that also proved significant (e.g., out-degree, out-closeness, and betweenness) by examining partial correlations (Table 4). Strong embeddedness maintains robustness in the presence of the other measures while they lose significance in its presence.
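For readers unfamiliar with partial correlations, the standard first-order formula is sketched below. The article does not state how its partial correlations were computed, so this formula is an assumption, and the input values are hypothetical rather than taken from the data.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling out a third variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical inputs: x = rank, y = one network index, z = the index partialled out.
print(round(partial_correlation(0.6, 0.8, 0.5), 3))

# If z is unrelated to both x and y, partialling leaves the correlation unchanged.
print(partial_correlation(0.5, 0.0, 0.0))
```

A partial correlation near zero indicates that the index adds little once the partialled variable is taken into account, which is how the rows of the table can be read.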

Table 4.
Correlations Between Ranking and Various Network Indices (Columns) Partialling for Each Network Index (Rows)

                          Correlation Between Rank and …
Partialling for           Strong        Unilateral    Recursive     Weak          Out-degree    Out-closeness Betweenness
                          Embeddedness  Embeddedness  Embeddedness  Embeddedness
Strong embeddedness                     −.124         −.044         −.170         −.181         −.060         −.026
Unilateral embeddedness   −.758**                     −.542**       −.242**       −.607**       −.740**       −.081
Recursive embeddedness    −.690**       −.270**                     −.383**       −.445**       −.638**       −.092
Weak embeddedness         −.846**       −.549**       −.576**                     −.767**       −.784**       −.254*
Out-degree                −.631**       −.341**       −.238**       −.386**                     −.564**        .129
Out-closeness             −.539**       −.430**       −.380**       −.152**       −.405**                      .023
Betweenness               −.870**       −.629**       −.733**       −.482**       −.781**       −.822**

*p < .05 **p < .001

In sum, strong embeddedness powerfully accounts for the variability in rank. Other measures appear to be more like misspecifications of strong embeddedness than independent explanations since they offer no additional explanatory power beyond strong embeddedness, but strong embeddedness offers additional explanatory power beyond them.
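
The partial correlations reported in Table 4 can be illustrated with the standard first-order formula, which combines three pairwise Pearson correlations. The following is a minimal sketch only: the function names and any input vectors are illustrative assumptions, not the code or data used for the analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, partialling for z.

    For example, x could be rank, y strong embeddedness, and z out-degree
    (hypothetical variable assignment for illustration).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When the control variable is uncorrelated with both of the others, the partial correlation equals the simple correlation; when both depend on it and on nothing else in common, the partial correlation falls toward zero.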

3.3. Unilateral Embeddedness

The unilateral embeddedness of a set of nodes equals the minimum number, k, of node-independent paths connecting each pair of nodes in the set. Unilateral embeddedness, to the extent to which it is distinct from strong embeddedness, is about flow from one set of social actors (e.g., graduate programs) to another in one direction only; it is about diffusion, not exchange. All 124 graduate programs exist in a single unilaterally connected component. This component has 17 structurally cohesive layers with a median embeddedness of 8 (and an arithmetic mean of 9.2). Thus, graduate programs' unilateral embeddedness is quite intense, typically involving nine paths, each of which involves distinct graduate programs. Unilateral embeddedness is greater than strong embeddedness since strong embeddedness is necessarily a subset of unilateral embeddedness, specifically the subset that cycles back on itself.

Within the unilaterally connected component, increasing embeddedness corresponded to higher rank, ranging from 92nd for those in the second and third layers (none of the ranked graduate programs had a unilateral embeddedness less than 2) to 9th for those in the 17th layer. These mean differences are highly significant (p << .001) and eta² = .640, clearly indicating a powerful monotonic relationship between unilateral embeddedness and rank. Furthermore, unilateral embeddedness displays a linear relationship with rank with an adjusted r² = .603 (p < .001). About three-fifths of the variation in rankings thus appears to be captured by the unilateral embeddedness layer for the PhD exchange network.
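
The eta-squared statistic used for these layer comparisons is the share of total variation in rank that lies between embeddedness layers. A minimal sketch follows, assuming ranks are grouped by layer in a dictionary; the function name and data layout are illustrative assumptions rather than the procedure actually used.

```python
def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares.

    groups: dict mapping an embeddedness layer to the list of ranks of the
    programs in that layer (hypothetical layout for illustration).
    """
    values = [v for ranks in groups.values() for v in ranks]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(ranks) * (sum(ranks) / len(ranks) - grand_mean) ** 2
        for ranks in groups.values()
    )
    return ss_between / ss_total
```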

Unilateral embeddedness, however, is not significant in the presence of the traditional network measures (Table 3, Model 3). Examining partial correlations (Table 4) shows that it is a little less powerful than the other measures and loses some of its robustness in their presence. It loses significance entirely in the presence of strong embeddedness, as do the other measures.

3.4. Ignoring Directionality

If we had ignored directionality, how closely would recursive or weak embeddedness have approximated the results discovered by strong embeddedness? For the PhD exchange network, recursive embeddedness identified fewer than half (51 of the 104) of the strongly connected graduate programs, distributed across four embeddedness layers.22 Weak embeddedness, in contrast, placed all graduate programs in a single component with 31 embeddedness layers.

Increasing recursive embeddedness does correspond to higher rank. A strong linear relationship appears between recursive embeddedness and rank with an adjusted r² = .587 (p < .001).23 Thus, it initially appears to be a less powerful version of strong embeddedness. Including traditional network measures as explanatory variables (Table 3, Model 4), however, leaves recursive embeddedness insignificant.

Similarly, weak embeddedness also linearly relates to rank, although much less powerfully, with an adjusted r² = .273 (p < .001).24 Most importantly, however, as with recursive embeddedness, including traditional network measures as explanatory variables (Table 3, Model 5) leaves weak embeddedness insignificant.

Comparing the four types of embeddedness to each other (Table 4) and partialling each one against the others shows that strong embeddedness maintains its robustness in the presence of the other variables but that the other types of embeddedness maintain neither robustness nor significance in its presence.

We clearly reach substantially different conclusions by retaining information about the directionality of social relations. While strong, unilateral, and recursive embeddedness all powerfully relate to graduate program ranking, only strong embeddedness maintains robustness in the presence of traditional network measures. Unilateral, recursive, and weak embeddedness lose robustness and significance in the presence of out-closeness and out-degree. In contrast, these traditional network measures maintain robustness in the presence of these forms of embeddedness. All other forms of embeddedness and all traditional network measures lose robustness (and usually significance) in the presence of strong embeddedness.

It is unsurprising that strong embeddedness is the most powerful correlate of rank since the exchange of PhD students is clearly directed in nature and strong embeddedness therefore most closely measures intuitive descriptions of connectivity, measuring the extent to which there are cycles of PhD students flowing back and forth among graduate programs. That the other network measures also lose explanatory power in the presence of strong embeddedness is intriguing, however. For example, with reference to the centrality measures, this would suggest that it is not a program's control of the flow of PhD students that relates to its rank (an intuition suggested by the centrality measures) but rather the depth of its immersion in that flow.

4. NEIGHBORHOOD TRUST NETWORK

The second example I explore involves trust relations among residents in a neighborhood closely associated with a gang. During two years of participant observation, I identified hundreds of respondents as part of an adaptive link-tracing method that began with multiple convenience samples.25 The initial respondents were asked to identify others whom they knew personally and whom they believed to be “important” or “influential” in the neighborhood. This process was repeated across a period of two years, although by the second year very few new names were generated.26

The network I focus on was based on bonds of trust and loyalty, specifically among youth and young adults (aged 12 to 25). While there were several ways these sentiments were expressed in interviews, one succinct and poignant phrase was that someone was “for you.” This phrase was used widely throughout the community as an expression of trust in someone else's loyal commitment to one's well-being.27 During the course of my interviews, respondents were asked to identify who in the neighborhood was “for them.” One hundred and thirty-eight respondents identified 792 alters who were “for” them. Excluding the 59 parents of the person doing the nominating, 733 alters were identified, 129 of these being unique members of the respondent set. I treated individuals as nodes in a network with a directed relationship from person B to person A if person A reported that person B was “for them.”28 While more reciprocal than the academic network, this network is clearly directed as well, with only 390 of the 733 relations (53 percent) being reciprocal.
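
The construction of the directed “for you” network and the share of reciprocal relations can be sketched as follows. This is an illustrative sketch under stated assumptions: the dictionary layout of the nominations and the function names are hypothetical, and an arc is counted as reciprocal when the reverse arc is also present.

```python
def build_arcs(nominations):
    """Turn nominations into directed arcs.

    nominations: dict mapping each respondent (ego) to the alters the
    respondent reported as being "for" them. Following the text, the arc
    runs from the alter (person B) to the respondent (person A).
    """
    return {(alter, ego) for ego, alters in nominations.items() for alter in alters}


def reciprocity(arcs):
    """Proportion of arcs whose reverse arc is also present."""
    if not arcs:
        return 0.0
    reciprocated = sum(1 for (u, v) in arcs if (v, u) in arcs)
    return reciprocated / len(arcs)
```

For example, if A names B and C while B names A, the arcs are B→A, C→A, and A→B, and two of the three arcs are reciprocated.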

While both the nodes and the relations comprising the PhD exchange network are easily documentable, this network is less so. Our sample did, however, capture the vast majority of those identified as “for” another respondent. Eighty-eight percent of all those identified by one respondent as “for” them, who were also in the correct age range, were actually interviewed. Specifically, 20 individuals were identified as “for” one of the respondents but were not themselves successfully interviewed. To deal with this missing data, I performed all calculations in three distinct ways: treating the 20 arcs as if they would have been reciprocated in all cases if the alter had been interviewed; treating the 20 arcs as if they would have been unreciprocated in all cases if the alter had been interviewed; and excluding all 20 arcs in which the alter was not interviewed. In all results reported, modifying the data in these ways to account for missing values fails to change any significant coefficient by more than 2 percent.

In this neighborhood, the local gang was in direct, albeit somewhat friendly, competition with the local Catholic youth group. While most of the youth and young adults (even the handful of Protestants) regularly attended gang parties as well as events of the Catholic youth group, they differed as to which they gave priority. In the interviews, I asked respondents to rate their identification with the gang or the Catholic fellowship on a 5-point Likert scale, with 4 representing complete identification with the gang, 0 representing sole identification with the Catholic fellowship, and 2 representing an equal identification with both.29 Figure 6 displays the network with node shading ranging from black (indicating neighborhood identification of 4) to white (indicating neighborhood identification of 0).

Figure 6.

The neighborhood trust “for you” network. Node shading indicates identification ranging from black (complete identification with gang) to white (complete identification with Catholic fellowship).

I hypothesize that people tend to identify similarly to others to the extent to which they are structurally embedded with them. Furthermore, I hypothesize that structural embeddedness will more powerfully affect neighborhood identification than simpler network indices such as raw connectivity (e.g., the number of paths of various lengths connecting individuals that would influence them to identify similarly). I operationalize this “raw connectivity” in terms of three variables: a dummy variable indicating whether or not two residents were directly connected and two interval variables representing the number of two-step and three-step paths interconnecting them. This is a subtle distinction, but an important one. While related to the number of paths reaching an individual, an individual's structural embeddedness indexes mutual interconnectedness among a set of individuals. It is a community-level property emerging from individual interactions.
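
The three “raw connectivity” controls can be illustrated with powers of the adjacency matrix. The sketch below is illustrative only and rests on two assumptions that the text does not settle: connections are counted in the direction i → j, and the two-step and three-step counts are walks, which may revisit a node. The function names are hypothetical.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def raw_connectivity(adj, i, j):
    """Return (direct tie dummy, number of 2-step walks, number of 3-step walks).

    adj[u][v] is 1 if there is an arc from u to v and 0 otherwise.
    Entry (i, j) of the k-th matrix power counts k-step walks from i to j.
    """
    a2 = mat_mult(adj, adj)
    a3 = mat_mult(a2, adj)
    return (1 if adj[i][j] else 0, a2[i][j], a3[i][j])
```

These counts describe a single pair of residents in isolation, whereas shared embeddedness depends on the mutual interconnection of the whole set to which the pair belongs.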

4.1. Strong and Unilateral Embeddedness

With reference to the “for you” relation, most neighborhood residents were strongly connected to each other but with relatively low strong embeddedness. The “for you” network produced four strong embeddedness layers but the median strong embeddedness was 1 (with a mean of 1.35). Thirty-seven residents were isolates with reference to strong connectivity.
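
Identifying which residents are strongly connected, and which are isolates with reference to strong connectivity, amounts to finding the strongly connected components of the digraph. A minimal sketch follows, using forward and backward reachability rather than a faster specialized algorithm; the function names are illustrative assumptions, and node labels are assumed to be sortable.

```python
from collections import deque


def reachable(successors, start):
    """Set of nodes reachable from start (including start) by directed paths."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def strong_components(arcs):
    """Partition nodes into sets whose members reach each other in both directions."""
    nodes = {n for arc in arcs for n in arc}
    forward, backward = {}, {}
    for u, v in arcs:
        forward.setdefault(u, set()).add(v)
        backward.setdefault(v, set()).add(u)
    components, assigned = [], set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # Nodes that n can reach and that can also reach n.
        component = reachable(forward, n) & reachable(backward, n)
        components.append(component)
        assigned |= component
    return components


def strong_isolates(arcs):
    """Nodes that are not strongly connected to any other node."""
    return {next(iter(c)) for c in strong_components(arcs) if len(c) == 1}
```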

Strong embeddedness related closely to neighborhood identification. The absolute difference in two residents' neighborhood identification proved to be a linear function of the deepest level of strong embeddedness that they shared (p < .001) with an adjusted r² of .314. When I included the three control variables in the model (Table 5, Model 4), the resulting adjusted r² of .324 was virtually identical to that of strong embeddedness alone, indicating that they yielded no additional explanatory power.30 Examining standardized coefficients indicated that strong embeddedness was clearly the most powerful explanatory factor.31 Testing these control variables separately (Models 1 through 3) showed that they failed to account for the differences in neighborhood identification.

Table 5. 
Regression of Difference in Respondents' Neighborhood Identification Choices on Embeddedness Types and Number of Paths of Various Lengths

| Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| Directly connected | −.830* (−.129) | −.256 (−.040) | −.464 (−.072) | −.365 (−.057) | −.327 (−.051) | −.360 (−.056) | −.333 (−.060) | −.267 (−.042) |
| Number of 2-step or fewer paths | | −.117 (−.130) | −.259 (−.083) | −.239 (−.266) | −.194 (−.216) | −.224 (−.249) | −.244 (−.023) | −.271 (−.302) |
| Number of 3-step or fewer paths | | | −.243 (−.097) | .029* (.336) | .037* (.432) | .032* (.374) | .025 (.002) | .016 (.194) |
| Shared strong embeddedness | | | | −1.113** (−.574) | | −.840** (−.433) | | |
| Shared unilateral embeddedness | | | | | −.903** (−.608) | −.270 (−.182) | | |
| Shared recursive embeddedness | | | | | | | −.750** (−.355) | |
| Shared weak embeddedness | | | | | | | | .023 (−.036) |
| Constant | 1.524** | 1.553** | 1.661** | 2.171** | 2.057** | 2.171** | 1.779** | 1.617** |
| Adjusted R² | .017* | .026 | .039 | .324*** | .289*** | .329*** | .134*** | .029 |

Note: Standardized coefficients in parentheses.
*p < .05 **p < .001

With reference to the “for you” relation, most neighborhood residents are unilaterally connected to each other and with somewhat deeper embeddings. In this network, unilateral embeddedness was more nuanced than strong embeddedness. The “for you” network produced seven unilateral embeddedness layers with a median unilateral embeddedness of 2 (with a mean of 1.95).

As with strong embeddedness, unilateral embeddedness related closely to neighborhood identification. The absolute difference in two residents' rating of their identification with their neighborhood was linearly related to the deepest level of unilateral embeddedness that they shared (p < .001) with an adjusted r² of .257. Again, including the three control variables (direct connection, number of two-step paths, and number of three-step paths) added little explanatory power and the standardized coefficients show unilateral embeddedness to be the most powerful explanatory variable (Model 5). Combining unilateral and strong embeddedness in the same model (Model 6), however, causes unilateral embeddedness to lose significance.

4.2. Ignoring Directionality

If we had ignored directionality, how closely would recursive or weak embeddedness have approximated the results discovered by strong and unilateral embeddedness?

For the neighborhood trust network, recursive connectivity identified four embeddedness layers with a median recursive embeddedness of 1 (and a mean of 1.15). As with strong embeddedness, 37 residents were isolates with reference to recursive connectivity. The absolute difference in two residents' rating of their identification with their neighborhood was linearly related to the deepest level of recursive embeddedness that they shared (p < .001), although the adjusted r² of .126 was substantially weaker. Recursive embeddedness nonetheless maintains significance in the presence of the control variables (Model 7).

Weak embeddedness, in contrast, indiscriminately placed all neighborhood members in a single component with no isolates. Weak connectivity identified eight embeddedness layers with a median weak embeddedness of 5 (and a mean of 4.37). There was no apparent relationship between weak embeddedness and neighborhood identification, either by itself or in the presence of the control variables (Model 8).

Residents identified similarly to others to the extent to which they were structurally embedded with those individuals. This would have been missed, however, if we did not account for the directionality inherent in these relations. Both strong and unilateral embeddedness related to neighborhood identification. If we had ignored directionality, recursive embeddedness would have yielded substantially less explanatory power, and weak embeddedness would have discovered no relationship with neighborhood identification at all. More traditional network measures such as the number of paths interconnecting residents would also have failed to account for the differences in neighborhood identification.

Finally, it is interesting to consider the implications of the fact that, in this scenario, the semipaths measured by weak connectivity were not relevant to neighborhood identification. Thus, we can “reverse engineer” embeddedness and suggest that, to the extent to which they correspond to neighborhood identification, the “for you” influence relations are truly directional in nature.

5. CONCLUSION

Moody and White (2003) used the logic of the flow of social and cultural resources within a social network to argue quite effectively for their conceptualization of structural cohesion. Such cohesion proved to be a powerful explanatory aid to understand the data sets reviewed here as well. In the case of the PhD exchange networks, structural embeddedness powerfully related to graduate program ranking while all traditional network measures lost robustness (and usually significance) in the presence of one of its forms, strong embeddedness. In the neighborhood example, residents identified similarly with others to the extent to which they were structurally embedded with them. In contrast, the number of paths interconnecting residents failed to account for the differences in neighborhood identification.

Not all structural cohesion is equal, however. If the only form of embeddedness we had focused on was weak embeddedness, the default used by many researchers, we would have dismissed it. In both examples, it offered little or no explanatory power. For the sociology PhD exchange network, it only marginally correlated to rank and lost significance when traditional network measures were included in the model. For the neighborhood trust network, it proved to be completely unrelated to neighborhood identification. Residents' patterns of similar identification to others with whom they were structurally embedded would have been missed if we did not account for the directionality inherent in these relations. In fact, if we had used only the number of interconnecting paths and weak embeddedness, we might have dismissed network models entirely for the neighborhood data, and prematurely. In contrast, in both examples, strong embeddedness was the single most powerful explanatory variable; furthermore, the traditional network measures as well as the other forms of embeddedness lost significance when combined with it, suggesting that they may not capture independent explanatory dimensions.

Moody and White (2003) noted that the ability to extend social theory in formal network terms depends on our ability to unambiguously attribute social mechanisms to topological features. In these examples, if we had not respected the inherent directionality of our data, we would have reached substantially different conclusions.

Footnotes

  • 1

    Two paths from i to j are node-independent if they have only nodes i and j in common.

  • 2

    These complicated relations can be dissected into reciprocal and directed components or into multiple complementary, but distinct, directed relations (such as the flow of money in one direction and the flow of goods in the other). This can always be done theoretically and often empirically as well.

  • 3

    A focus on directionality is especially important when we consider the dynamic aspects of social life. While social structure is often conceived in terms of mutual, two-way relationships, the flow and exchange of social resources among social actors is necessarily directed.

  • 4

    There is only one difference between a graph and a digraph: in a digraph the direction of the lines is specified.

  • 5

    The only difference between an arc (in a digraph) and a line (in a graph) is that the orientation of the relation is specified—that is, an arc has direction whereas a line simply records the presence of a tie between two nodes.

  • 6

    Menger proved that a graph in which k is the minimum number of nodes whose removal would disconnect the graph also has at least k node-independent paths connecting every pair of nodes, and vice versa; see Harary (1969) for Menger's proof.

  • 7

    Of course, this is only for the ideal type of truly directed relations. Many, if not most, relations measured as directed will themselves be simplifications of more nuanced relations. For example, we can imagine a three-point Likert scale asking respondents to categorize a relationship as “mostly directed toward me,” “mostly directed away from me,” and “roughly reciprocal.” In such a case, a relation identified as primarily directed in one direction might still contain a percentage of reciprocation.

  • 8

    Note that if both exist, the pair is also strongly connected.

  • 9

    For example, in the first case, node 1 is unilaterally connected to nodes 2, 3, 4, and 5; node 2 is unilaterally connected to nodes 1, 3, 4, and 5; node 3 is unilaterally connected to nodes 1, 2, 4, and 5; node 4 is unilaterally connected to nodes 1, 2, 3, and 5; and node 5 is unilaterally connected to nodes 1, 2, 3, and 4. Furthermore, the paths that connect them use only those nodes as members.

  • 10

    I thank an anonymous reviewer for providing Figure 3.

  • 11

    Seventy-seven percent of the interprogram transfers that occur involve one and only one professor being sent from one graduate program to another.

  • 12

    I should note that while I have defined ties for the PhD exchange network in a way that accounts for their directed nature, it still fails to account for two additional limitations. First, each tie is created at a particular moment in time, and paths should be characterized in ways that respect this timing information (Moody 2002); specifically, if program i sends a PhD graduate to program j after program j has sent a graduate to program k, then there is no prospect of diffusion from program i to k through j. Second, if a graduate from program i spends time at program j before moving to program k, then flows from i to j and from j to k are also implicated but only the flow from i to k is recognized in the current construction. We can imagine addressing these issues with even more complicated models fully accounting for each academic's entire work history, at a year-by-year level, including how many person-years of influence have been given from one graduate program to the next. This would, of course, involve gathering more data, but that is not my intent. Instead, I offer a simple solution showing that more information can be gleaned from the same data set.

  • 13

    The literature on citation networks suggests similar intellectual integration of scientific disciplines (Crane and Small 1992; Newman 2001).

  • 14

    Granted, these rankings are not merely peer assessments, but they do reflect and are reflected by them.

  • 15

    The rankings themselves as well as a discussion of the method of research for this ranking appear in the Chronicle of Higher Education (Magner 1995). To correspond the network data to the date of the rankings, I used only those PhD exchanges occurring before 1995.

  • 16

    To identify k-components, we need both to identify maximal sets of nodes that share k node-independent paths and to verify that these paths involve only nodes in the set.

    We first need to calculate the number of node-independent paths between each pair of nodes, using network flow algorithms such as those by Ford-Fulkerson (1962) or Edmonds-Karp (1972). To use these algorithms to calculate node-connectivity rather than edge-connectivity, we “split” each node into two nodes connected by a single tie so that flow can pass through a node only once.

    To identify maximal sets of nodes that share a minimum of k node-independent paths between every member of the set, in the case of recursive, strong, or weak k-connectivity, we reorder the rows and columns of the connectivity matrix to create square blocks, centered on the matrix diagonal, all of whose members share a flow of at least k.

    To identify maximal unilaterally k-connected sets, we conduct a similar process, but, instead of identifying squares, we identify upper triangular matrices. The relevant nodes for the unilateral component will be those corresponding to either the rows or the columns of the upper triangular matrix because nodes that are exclusively sources will have a zero in the diagonal entry of their row while nodes that are exclusively sinks will have a zero in the diagonal entry of their column.

    To verify that the k paths use only nodes in the set, we then extract this subset of identified nodes as a potential k-component and rerun our network flow algorithm on it alone. Three distinct possibilities result: (1) all of the nodes in the set might still be connected by at least k node-independent paths involving only nodes in the set, in which case we have successfully identified a k-component; (2) none of the nodes in the set might still share a connectivity of at least k and thus do not form a k-component; and (3) some, but not all, of the nodes in the set might still share a connectivity of at least k. In this case, we extract this new subset of nodes as a new potential k-component and again rerun our network flow algorithm with the same three potential outcomes. We repeat this process until either all of the extracted nodes share a connectivity of at least k or until none do.

    Note that, at each stage of this process, several potential k-components might emerge, each of which needs to be tested independently.
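
The node-splitting step described in this footnote can be sketched as follows. This is a minimal illustration, not the implementation used for the analysis: it applies a breadth-first augmenting-path search in the style of Edmonds-Karp, gives every tie unit capacity, and assumes the source and target are distinct. The function name and the ("in", "out") labels are illustrative assumptions.

```python
from collections import defaultdict, deque


def node_independent_paths(arcs, source, target):
    """Count node-independent directed paths from source to target.

    Each node v is split into (v, "in") and (v, "out"), joined by a single
    tie of capacity 1, so that flow can pass through a node only once.
    """
    capacity = defaultdict(lambda: defaultdict(int))
    nodes = {n for arc in arcs for n in arc}
    for v in nodes:
        capacity[(v, "in")][(v, "out")] = 1
    for u, v in set(arcs):
        if u != v:
            capacity[(u, "out")][(v, "in")] = 1

    s, t = (source, "out"), (target, "in")
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, residual in capacity[u].items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow along the path and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= 1
            capacity[v][u] += 1
            v = u
        flow += 1
```

Running this function for every ordered pair of nodes would give the entries of a connectivity matrix of the kind that the footnote reorders into blocks. The verification step, which reruns the calculation on each candidate subset, is not shown.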

  • 17

    Since there were shown to be three node-independent paths between any typical pair of graduate programs, by the definition of node independence, a program that was on one path between any typical pair would not be on the other two.

  • 18

    The out-degree of a node is the number of ties directed away from it; the in-degree of a node is the number of ties directed toward it.

  • 19

    Clustering, or transitivity, can be measured as the proportion of all two-step paths (i → j, j → k) that are also direct paths (i → k) (Watts 1999).
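
The proportion described in this footnote can be sketched as below. This is an illustrative sketch only: it computes a single network-wide proportion, assumes that i, j, and k are distinct, and uses a hypothetical function name.

```python
def transitivity(arcs):
    """Share of two-step paths (i -> j, j -> k) that also have a direct arc i -> k."""
    arc_set = {(u, v) for u, v in arcs if u != v}
    successors = {}
    for u, v in arc_set:
        successors.setdefault(u, set()).add(v)
    two_step = closed = 0
    for i, j in arc_set:
        for k in successors.get(j, ()):
            if k == i:
                continue  # skip i -> j -> i, which returns to its origin
            two_step += 1
            if (i, k) in arc_set:
                closed += 1
    return closed / two_step if two_step else 0.0
```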

  • 20

    In its most common formulation, a node's out-closeness is the reciprocal of its out-farness, or the sum of the lengths of the shortest paths from it to every other node; its in-closeness is the reciprocal of its in-farness, or the sum of the lengths of the shortest paths from every other node to it (Freeman 1979). Because the digraph is disconnected and some distances are infinite, however, I took the reciprocals before, rather than after, the summation.
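
Taking reciprocals before summing means that an unreachable node, at infinite distance, simply contributes zero. A minimal sketch of that variant follows; the function names are illustrative assumptions, and no normalization is applied because the text does not specify one.

```python
from collections import deque


def out_distances(successors, source):
    """Shortest directed path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reciprocal_out_closeness(arcs, node):
    """Sum of 1/distance from node to each other node; unreachable nodes add 0."""
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, set()).add(v)
    dist = out_distances(successors, node)
    return sum(1.0 / d for other, d in dist.items() if other != node)


def reciprocal_in_closeness(arcs, node):
    """Same measure computed on reversed arcs (distances from others to node)."""
    return reciprocal_out_closeness([(v, u) for u, v in arcs], node)
```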

  • 21

    A node's betweenness is a function of the proportion of shortest paths linking all pairs of other nodes that pass through it (Freeman 1979).

  • 22

    Recursive connectivity would compare even more poorly to unilateral connectivity.

  • 23

    A similarly strong monotonic relationship exists. Those not in any recursive embeddedness level have an average rank of 66th and those only in the first layer have an average rank of 46th; the average rank is 33rd in the second layer, 19th in the third layer, and 9th in the fourth layer. These mean differences are highly significant (p << .001) and eta² = .599.

  • 24

    Again, a monotonic relationship exists. Average rank ranges from 92nd for those in the lowest two levels of weak embeddedness to 14th for those in the 31st level. Again, these mean differences are highly significant (p << .001) and eta² = .463.

  • 25

    The starting points included a middle school, a Catholic parish, several small Protestant congregations, two gang outreach groups, and a newsletter operated by and for gang members.

  • 26

    The advantage of such a method is that, as network studies have found, central individuals (socially involved [Casciaro 1998], boundary spanning [Calloway, Morrissey, and Paulson 1993], or highly visible [Brewer and Yang 1994]) are both more accurate in their reports and more often recalled by others. Furthermore, recall errors tend to bias in favor of more common and long-term, routine, and typical interaction (Freeman and Romney 1987; Freeman, Romney, and Freeman 1987); that is, the more the individuals in a pair interact, the more they recall each other and agree about their interactions with the same third alters (Romney and Faust 1982). Therefore, respondents’ reports will be biased toward persons who are central to normative interaction patterns.

  • 27

    A common instance of one person being “for” another was a juvenile nominating his or her mother.

  • 28

    Note that the relationship comes toward those doing the reporting since they are being asked whom they believe is “for” them, not who they are “for.”

  • 29

    Of the 138 respondents, 22 identified as a 4 (sole identification with the gang), 23 identified as a 3, 56 identified as a 2 (equal identification with both), and 37 identified as a 0 (sole identification with the Catholic fellowship). No one identified as a 1. Note that during the data collection process the orientation of the Likert scale was randomly rotated (e.g., whether the gang or the Catholic fellowship was 4 or 0). I use only one orientation for simplicity of presentation.

  • 30

    The calculation of degrees of freedom bears noting. While there are 18,906 (138 × 137) potential directed relations and 9453 (half this number) differences in ratings, the latter at least are certainly not all independent. When calculating differences, there are only 137 uniquely independent differences. For example, if there are four persons (A, B, C, and D) and we know the difference between A and each of the others, we can immediately derive all of the differences between them. For example, if B was 2 more than A and C was 1 less than A and D was 2 less than A, then, of necessity, B is 3 more than C and 4 more than D and C is 1 more than D. Once any individual's differences with all others are known, so are the differences among them.

    There are also fewer degrees of freedom in some of the independent variables than might first appear. If A, B, and C are members of a component, then by knowing that A and B share component membership and that A and C share component membership, I know that B and C do so as well. The converse, however, is not true. Knowing that neither A and B nor A and C share component membership does not tell me anything about B and C. The limiting case is when all are members of a single component, in which case, as soon as I know that one person shares component membership with all others, I know that all others share this membership with each other. Thus, the limiting case here is 137 degrees of freedom as well (in the case of any universal component, more in the case of a more limited component).
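
The arithmetic in this footnote can be checked with a short sketch. It uses signed differences, as in the footnote's own example, and the function names are illustrative assumptions.

```python
from itertools import combinations


def pair_counts(n):
    """Return (directed relations, pairwise differences, independent differences)."""
    return n * (n - 1), n * (n - 1) // 2, n - 1


def derive_pairwise_differences(offsets):
    """Derive every pairwise difference from differences with one reference person.

    offsets: dict mapping each other person to (their rating minus the
    reference person's rating), e.g. B is 2 more than A, C is 1 less than A.
    """
    return {
        (p, q): offsets[p] - offsets[q]
        for p, q in combinations(sorted(offsets), 2)
    }
```

With 138 respondents this gives 18,906 directed relations, 9,453 differences, and 137 independent differences, and the offsets B = 2, C = −1, D = −2 relative to A yield B − C = 3, B − D = 4, and C − D = 1, as stated above.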

  • 31

    Note that the coefficient for strong embeddedness is negative, which indicates that increasing embeddedness corresponds to decreasing differences in how a person identifies (as we hypothesized it would).
