This paper presents preliminary results from a study of disciplinarity and interdisciplinarity based on a new source of data: doctoral dissertations. By examining doctoral dissertations spanning more than a century, this paper aims to describe the overall development of disciplines and the interactions among them. Preliminary results demonstrate significant increases across the 20th century both in the number of disciplines and in the level of interaction among disciplines. This poster focuses on the 30 highest-producing disciplines and examines the dependency of these disciplines on other disciplines (where dependency is measured by the proportion of dissertations within a discipline that are also labeled with another discipline). The data reveal that science disciplines (e.g., mathematics, chemistry, and physics) display fewer interdisciplinary features than social science and humanities disciplines, and that many contemporary social science and humanities disciplines (e.g., black studies and women's studies) are highly dependent on other disciplines. The implications of these preliminary findings and areas for further investigation using this dataset are discussed.
Journal articles have provided the primary data source for scientometricians and other scholars of science seeking to describe the scientific landscape. However, the resulting analyses and visualizations share problems with early cartographic representations of the earth. As explorers gained more information about certain areas, those areas were represented with high levels of accuracy and granularity, while unexplored areas remained vague and spatially underrepresented. Current knowledge maps suffer from the same problem, reflecting our partial understanding of the scholarly enterprise. For example, Boyack's canonical “Map of Science”[1] gives the natural and medical sciences large spatial representation and fine granularity, but the social sciences and humanities appear small and relatively undifferentiated. Hence, these areas seem less important, less productive, and less connected to other areas of knowledge. However, the biases of the data source behind this and most other knowledge maps often go unacknowledged. When working with Web of Knowledge data, disciplines that produce high rates of journal articles become grossly overrepresented, while disciplines with other types of output are omitted or marginalized. Therefore, a new source that provides comprehensive and commensurate data for science, social science, and humanities disciplines is needed.
The doctoral dissertation is a new data source that can enhance our current understanding of the landscape of science for several reasons. First, all research disciplines produce dissertations, so this genre does not favor certain disciplines over others. Second, each individual generally produces only one dissertation. Therefore, dissertations are not skewed toward subdomains or authors who might be inordinately prolific. Finally, dissertations are heralded as a student's original and independent contribution to the research landscape (Isaac, Quinlan, & Walker, 1992). We should, therefore, expect dissertations to provide indicators of innovation and novelty for a discipline (Machlup, 1982).
Despite the richness of dissertation data, they have been largely neglected in science studies, likely because large-scale data are difficult to obtain. While those with institutional subscriptions to Web of Knowledge can download data in moderately sized batches, ProQuest does not offer such a feature. Therefore, most large-scale academic genealogy projects to date have relied on manual and crowdsourced data gathering (e.g., MPACT[2] and the Mathematics Genealogy Project[3]). Because data gathering is so laborious, the resulting analyses have typically focused on a single discipline (e.g., Sugimoto, Ni, Russell, & Bychowski, 2011; Malmgren, Ottino, & Amaral, 2010).
This paper presents preliminary results on disciplinarity and interdisciplinarity based on doctoral dissertations across all disciplines. In this paper, each subject category provided by the ProQuest dissertation database[4] is treated as a proxy for a discipline. The goal is to provide a macro-level analysis of the growth of disciplines and the interactions among disciplines, using dissertations as the primary data source.
This project relies on data provided by the ProQuest dissertation database. The dataset (referred to in this manuscript as PQuest) was provided by ProQuest and covers about 2.3 million dissertations from 1,490 research institutions across 66 countries, from 1848 to 2009. Only dissertations submitted for a research doctorate (n = 1,850,855) are investigated in this paper. The major attributes of each dissertation in the PQuest dataset are shown in Figure 1: each dissertation has document attributes (i.e., title, abstract, keywords), institution attributes (i.e., school name, address), degree attributes (i.e., degree year, degree type), mentorship attributes (i.e., mentor names), and subject category attributes (i.e., subject categories). Due to space constraints, this paper focuses on the subject categories and leaves the relationships with other variables for future papers.
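To make the attribute groups above concrete, the following is a minimal sketch of how one PQuest record might be held in memory and filtered to research doctorates. The field names, the `Dissertation` class, the `research_doctorates` helper, and the set of degree labels counted as research doctorates are all hypothetical illustrations, not ProQuest's actual schema or the filter used in this study.

```python
from dataclasses import dataclass, field

# Hypothetical record layout mirroring the attribute groups in Figure 1
# (document, institution, degree, mentorship, subject category).
# Field names are illustrative assumptions, not ProQuest's real schema.
@dataclass
class Dissertation:
    diss_id: int
    title: str
    school: str
    degree_year: int
    degree_type: str                                    # e.g. "Ph.D."
    advisors: list[str] = field(default_factory=list)   # mentor names
    subjects: list[str] = field(default_factory=list)   # e.g. "Health Sciences, Education"

# Assumed set of degree labels treated as research doctorates.
RESEARCH_DOCTORATES = {"Ph.D.", "D.Phil."}

def research_doctorates(records):
    """Keep only dissertations whose degree type is in RESEARCH_DOCTORATES."""
    return [r for r in records if r.degree_type in RESEARCH_DOCTORATES]

records = [
    Dissertation(1, "Title A", "School X", 1990, "Ph.D.", subjects=["Health Sciences, Education"]),
    Dissertation(2, "Title B", "School Y", 1995, "Psy.D."),  # professional doctorate, excluded
]
print([r.diss_id for r in research_doctorates(records)])  # [1]
```

In practice the degree-type vocabulary in the raw data would need to be inspected before fixing such a set.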
Note that some dissertations in the dataset are assigned more than one subject category (Figure 1). As an example, Table 1 lists the dissertation with ID=15 in the PQuest database and the three subject categories assigned to it. Each subject category listed in Table 1 has two levels: the part preceding the comma is the first-level category, and the part following the comma is the second-level category. Therefore, the example dissertation belongs to two first-level subject categories (psychology and health sciences) but three second-level subject categories.
Table 1. Example dissertation with multiple subject categories
Health Sciences, Education
In this paper, a dissertation assigned more than one second-level subject category is called a multi-subject dissertation, and a dissertation assigned only one second-level subject category is called a single-subject dissertation.
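The two-level reading of subject labels and the single/multi-subject definition can be sketched as follows. This assumes each label is a string in the "First, Second" form shown in Table 1, that a label without a comma (e.g. "Philosophy") still counts as one second-level category, and that second-level categories are identified by the full label. The function names and all labels other than "Health Sciences, Education" are hypothetical.

```python
def levels(labels):
    """Return (distinct first-level categories, distinct second-level categories).

    Assumption: the text before the first comma is the first-level category,
    and each full label string counts as one second-level category.
    """
    firsts = {label.partition(",")[0].strip() for label in labels}
    seconds = {label.strip() for label in labels}
    return firsts, seconds

def is_multi_subject(labels):
    """Multi-subject: more than one second-level subject category assigned."""
    return len(levels(labels)[1]) > 1

# Illustrative labels, only loosely modeled on Table 1.
example = ["Psychology, Clinical", "Psychology, Developmental", "Health Sciences, Education"]
firsts, seconds = levels(example)
print(len(firsts), len(seconds))        # 2 3
print(is_multi_subject(example))        # True
print(is_multi_subject(["Philosophy"])) # False
```

Identifying second-level categories by the full label avoids conflating, say, "Health Sciences, Education" with the first-level category "Education".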
Overall Trends of Interdisciplinarity
The dissertations in the PQuest dataset were first examined by the number of subject categories assigned to them. Figure 2 displays the frequency distribution of dissertations by the number of second-level subject categories assigned; the inset shows the cumulative percentage.
As shown, approximately 60% of the dissertations investigated are single-subject. Of the multi-subject dissertations, about 90% are assigned two or three subject categories. Only a few dissertations were assigned more than five second-level subject categories, and the maximum, reached by a single dissertation, was 13 second-level subject categories.
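A frequency distribution with a cumulative percentage, of the kind summarized in Figure 2, could be computed along these lines. This is a sketch under the assumption that each dissertation is given as a list of subject label strings and that duplicate labels on one dissertation are counted once; `category_count_distribution` is a hypothetical name and the toy data are illustrative only.

```python
from collections import Counter

def category_count_distribution(subject_lists):
    """Map k -> (number of dissertations with k distinct second-level
    categories, cumulative percentage of dissertations with <= k)."""
    counts = Counter(len(set(labels)) for labels in subject_lists)
    total = sum(counts.values())
    result, running = {}, 0
    for k in sorted(counts):
        running += counts[k]
        result[k] = (counts[k], 100.0 * running / total)
    return result

toy = [["A, x"], ["A, x", "B, y"], ["C, z"], ["A, x", "B, y", "C, z"]]
print(category_count_distribution(toy))
# {1: (2, 50.0), 2: (1, 75.0), 3: (1, 100.0)}
```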
However, the proportions of single- to multi-subject dissertations have not been stable over time. As shown in Figure 3, single-subject dissertations were the norm until the 1980s, whereas in the 2000s, 63% of dissertations were multi-subject. This finding should be interpreted cautiously, as changes in indexing practices may be partially responsible for the growth in multi-subject dissertations.
The growth in multi-subject dissertations may also reflect the growing number of available subject categories (Figure 4). Subject categories increased from 23 first-level and 44 second-level categories before 1900 to 163 first-level and 427 second-level categories in the 2000s.
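The temporal trends in Figures 3 and 4 (share of multi-subject dissertations, and number of distinct first- and second-level categories in use) could both be derived from a per-decade grouping. The sketch below assumes input as `(degree_year, [subject labels])` pairs and uses plain calendar decades; the paper's own periods (e.g. "before 1900") may be grouped differently. `decade_summary` and the toy records are hypothetical.

```python
from collections import defaultdict

def decade_summary(records):
    """records: iterable of (degree_year, [subject labels]).
    Returns {decade: (share of multi-subject dissertations,
                      number of distinct first-level categories,
                      number of distinct second-level categories)}."""
    by_decade = defaultdict(list)
    for year, labels in records:
        by_decade[year // 10 * 10].append(set(labels))
    summary = {}
    for decade, label_sets in sorted(by_decade.items()):
        multi = sum(1 for s in label_sets if len(s) > 1)
        seconds = set().union(*label_sets)  # full labels = second-level categories
        firsts = {label.partition(",")[0].strip() for label in seconds}
        summary[decade] = (multi / len(label_sets), len(firsts), len(seconds))
    return summary

toy = [
    (1995, ["Education"]),
    (1998, ["Education", "Psychology, Clinical"]),
    (2003, ["Health Sciences, Education", "Psychology, Clinical"]),
    (2007, ["Physics"]),
]
print(decade_summary(toy))  # {1990: (0.5, 2, 2), 2000: (0.5, 3, 3)}
```

Counting categories that actually appear in each period measures categories in use, which may differ from the categories formally available in the indexing scheme at the time.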
Dependency and independence of disciplines
Subsequent analyses examined the level of multi-disciplinarity within individual disciplines. For this analysis, the first-level category is treated as a proxy for the discipline, while the operationalizations of single- and multi-subject dissertations (based on second-level categories) remain unchanged.
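Under this operationalization, a discipline's dependency can be sketched as the share of its dissertations (those carrying at least one label with that first-level category) that are multi-subject. This is one plausible reading of the measure described in the abstract, not necessarily the exact computation used here; `discipline_dependency` and the toy labels are hypothetical.

```python
from collections import Counter

def discipline_dependency(subject_lists):
    """For each first-level discipline, the share of its dissertations that
    are multi-subject (more than one distinct second-level category)."""
    total, multi = Counter(), Counter()
    for labels in subject_lists:
        labels = set(labels)
        disciplines = {label.partition(",")[0].strip() for label in labels}
        for d in disciplines:
            total[d] += 1
            if len(labels) > 1:
                multi[d] += 1
    return {d: multi[d] / total[d] for d in total}

toy = [
    ["Mathematics"],
    ["Mathematics"],
    ["Mathematics", "Physics"],
    ["Women's Studies", "Sociology, General"],
]
dep = discipline_dependency(toy)
print(dep["Mathematics"], dep["Physics"])  # 0.3333333333333333 1.0
```

A value near 0 marks a largely self-contained discipline; a value of 1 means the discipline never appears without another category.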
Education is the largest discipline in the dataset (Figure 5), and Baltic Studies is the smallest. Philosophy is the discipline with the longest presence in the dataset (see inset in Figure 5). As in the previous figures, blue bars show single-subject dissertations and orange bars show multi-subject ones. Some disciplines have roughly equal shares of single-subject and multi-subject dissertations (e.g., education [50.2%, 49.5%], computer science [51.2%, 48.8%], and agriculture [49.3%, 50.7%]). In other disciplines (e.g., women's studies, sociology, energy and environmental sciences), the majority of dissertations are multi-subject, perhaps indicating a higher degree of interdisciplinarity. In contrast, many disciplines are characterized by predominantly single-subject dissertations (e.g., mathematics, chemistry, and physics), which may indicate lower levels of interdisciplinarity in these disciplines.
Thirty subject categories appeared only alongside other subject categories. We call these completely dependent disciplines and list them in Table 2. Most of these subject categories contain relatively few dissertations: 22 have fewer than 100 dissertations. No discipline is completely independent, although horticulture, mathematics, chemistry, musical performances, and physics top the list as the most “independent.”
Table 2. Completely dependent subjects and dissertations
Latin American Studies
Pacific Rim Studies
Middle Eastern Studies
North African Studies
Sub Saharan Africa Studies
South Asian Studies
Alternative Dispute Resolution
Area Planning and Development
Near Eastern Studies
French Canadian Culture
Cultural Resources Management
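Identifying completely dependent categories (those never assigned alone) and ranking the most "independent" ones could be sketched as below. Here each full subject label is treated as a category, and dependency is the share of dissertations carrying that label that also carry another label; these choices, the function names, and the toy data are assumptions for illustration.

```python
from collections import Counter

def dependency_by_category(subject_lists):
    """Share of dissertations carrying each subject label that also carry
    at least one other label."""
    total, shared = Counter(), Counter()
    for labels in subject_lists:
        labels = set(labels)
        for label in labels:
            total[label] += 1
            if len(labels) > 1:
                shared[label] += 1
    return {label: shared[label] / total[label] for label in total}

def completely_dependent(dep):
    """Labels that never appear alone (dependency exactly 1)."""
    return sorted(c for c, v in dep.items() if v == 1.0)

def most_independent(dep, n=5):
    """The n labels with the lowest dependency (ties broken alphabetically)."""
    return sorted(dep, key=lambda c: (dep[c], c))[:n]

toy = [
    ["Mathematics"],
    ["Mathematics"],
    ["Mathematics", "Latin American Studies"],
    ["Latin American Studies", "History, Latin American"],
    ["Chemistry, Organic"],
    ["Chemistry, Organic", "Physics"],
]
dep = dependency_by_category(toy)
print(completely_dependent(dep))  # ['History, Latin American', 'Latin American Studies', 'Physics']
print(most_independent(dep, n=2)) # ['Mathematics', 'Chemistry, Organic']
```

A category would be completely independent only if its dependency were 0, which matches the observation that no discipline reaches that value.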
DISCUSSION, LIMITATIONS & FUTURE WORK
This paper provides a preliminary overview of subject categories in dissertations. Using nearly two million dissertations spanning more than a century, we provide a unique overview of the scholarly landscape. These data demonstrate the growth in dissertations, the rising number of first- and second-level subject categories, the growing trend toward interdisciplinarity, and the variation in levels of dependency among disciplines. Discipline size also varies: education is the largest discipline in terms of the number of doctorates granted. This suggests that dissertation data may capture information that journal articles miss: in Web of Knowledge data, education would certainly not be the discipline with the largest number of publications.
One major limitation of this research should be noted. By operationalizing multi-disciplinarity as a dissertation being assigned more than one second-level category, we were unable to distinguish between dissertations with several second-level categories under a single first-level category and those spanning multiple first-level categories. The former may represent growing maturation and specialization within a discipline, while the latter may be a more accurate barometer of interdisciplinarity. Therefore, future research will analyze both second- and first-level categories to reach a more nuanced understanding of disciplinarity and interdisciplinarity.
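The distinction proposed here can be sketched as a three-way classification of each dissertation: single-subject, multi-subject within one first-level category, or multi-subject across first-level categories. The sketch assumes the "First, Second" label format of Table 1; `classify_multi_subject` and the example labels are hypothetical.

```python
def classify_multi_subject(labels):
    """Return 'single', 'within' (several second-level categories under one
    first-level category), or 'across' (spans several first-level categories)."""
    labels = set(labels)
    if len(labels) <= 1:
        return "single"
    firsts = {label.partition(",")[0].strip() for label in labels}
    return "within" if len(firsts) == 1 else "across"

print(classify_multi_subject(["Physics"]))                                           # single
print(classify_multi_subject(["Psychology, Clinical", "Psychology, Developmental"]))  # within
print(classify_multi_subject(["Psychology, Clinical", "Health Sciences, Education"])) # across
```

Under this scheme, only the "across" class would count toward interdisciplinarity, while "within" could be tracked separately as a signal of specialization.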
Additionally, this work did not examine the clustering of disciplines by co-assigned subject categories. Future work should create knowledge maps from these data that can be compared and evaluated against current knowledge maps based on journal articles and conference proceedings.
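A natural first step toward such clustering is a co-assignment count between first-level disciplines, which could then serve as edge weights for a network or clustering tool. The sketch below counts each unordered pair of distinct first-level categories once per dissertation; pairs within the same first-level category are not counted, and any normalization (e.g. by discipline size) is left open. `co_assignment_counts` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_assignment_counts(subject_lists):
    """Count how often each unordered pair of first-level disciplines is
    co-assigned to the same dissertation (pairs stored in sorted order)."""
    pairs = Counter()
    for labels in subject_lists:
        firsts = sorted({label.partition(",")[0].strip() for label in labels})
        pairs.update(combinations(firsts, 2))
    return pairs

toy = [
    ["Psychology, Clinical", "Health Sciences, Education"],
    ["Psychology, Developmental", "Health Sciences, Nursing"],
    ["Psychology, Clinical", "Education"],
]
print(co_assignment_counts(toy).most_common(1))  # [(('Health Sciences', 'Psychology'), 2)]
```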
Figure 5 provided some evidence of the presence and longevity of disciplines in the dataset. However, more research is needed to evaluate the lifecycles of disciplines in terms of interdisciplinarity, topical diffusion, and dependency.
As shown in Figure 1, each dissertation contains many metadata elements that could be examined in subsequent research. For example, advisor names could be used to examine the birth of disciplines and the transfer of disciplinary knowledge through individuals. Additional analyses of knowledge diffusion could be conducted at the level of the institution or geographic region. Combining these with topic analyses of dissertation titles and abstracts would provide a rich understanding of knowledge growth and transfer.
These analyses lay the groundwork for understanding this novel dataset. Future analyses would do well to build on this groundwork by combining and analyzing heterogeneous datasets. For example, merging the ProQuest dataset with journal article metadata from Web of Knowledge could provide a fruitful source of information about mentorship and mobility in the scientific workforce.
This manuscript is based upon work supported by the Faculty Research Support Program from Indiana University and the international funding initiative Digging into Data. Specifically, the Digging into Data funding comes from the National Science Foundation in the United States (Grant No. 1208804), JISC in the United Kingdom, and the Social Sciences and Humanities Research Council of Canada. The authors would also like to thank ProQuest for making this data available for research.