Assaying the importance of system complexity for the systems engineering community

How should organizations approach the evaluation of system complexity at the early stages of system design in order to inform decision making? Since system complexity can be understood and approached in several different ways, such evaluation is challenging. In this study, we define the term "system complexity factors" to refer to a range of different aspects of system complexity that may contribute differentially to systems engineering outcomes. Views on the absolute and relative importance of these factors for early-life cycle system evaluation are collected and analyzed using a qualitative questionnaire of International Council on Systems Engineering (INCOSE) members (n = 55). We identified and described the following trends in the data: there is little between-participant agreement on the relative importance of system complexity factors, even for participants with a shared background and role, and participants tend to be internally consistent in their ratings of the relative importance of system complexity factors. Given the lack of alignment on the relative importance of system complexity factors, we argue that successful evaluation of system complexity can be better ensured by explicit determination and discussion of the (possibly implicit) perspective(s) on system complexity that are being taken.

This research was funded through the Thales Bristol Hybrid Partnership in Autonomous Systems (Grant EP/R004757/1) to address the challenge of hybrid autonomous systems engineering. Growth in autonomous systems, smart cities, and systems-of-systems deployments foregrounds the challenges of increasing system complexity, where a large number of diverse and interdependent components, subsystems, and systems interact via nonlinear relationships, resulting in emergent behavior and properties that can be difficult to predict and understand, such as the resilience of these complex systems.11-13 While the complexity of a system is an important characteristic for organizations trying to realize systems, and the term "system complexity" is used frequently, the reality is that this is a contested term, subsuming a myriad of constituent definitions, perspectives, and emphases. The motivation for the study presented here is to explore the extent of the apparent tension between these multiple perspectives on what the term "system complexity" means to the systems engineering community. To do so, we collect judgments of the overall importance of a number of system complexity factors, and also pair-wise comparisons between them, from members of the systems engineering community.
In this study, we define the term "system complexity factors" to refer to a range of different aspects of system complexity that may contribute differentially to systems engineering outcomes (e.g., structural complexity, functional complexity, development complexity). The present study explored systems engineers' views on the absolute and relative importance of these different contributing factors to system complexity for early-life cycle system evaluation. The study data were collected using an online questionnaire of 55 members of the International Council on Systems Engineering (INCOSE) conducted over a four-month period (March to June 2019). Participants were asked to rate the importance of six candidate factors contributing to system complexity (system complexity factors) on a Likert scale and also via a series of pairwise comparisons. Participants were also asked to rate their prior experience evaluating the same system complexity factors, and the influence of this experience on system complexity factor importance judgments was assessed.
If the community is using a set of terms relating to "system complexity" in a mature and coherent manner, we would expect the following features in the responses to our survey. While some system complexity factors might be more important than others, and some might be of roughly equal importance, we would expect to see, at the individual and sample population levels, evidence of coherent mental models of system complexity factor importance. That is, we would expect most respondents to be transitive in their judgments, and their relative and absolute judgments of importance to be consistent. Although there may be some inconsistencies in responses, we would expect these to occur for terms that are judged to be of similar importance or of low overall importance. We might also expect more experienced practitioners to have more consistent judgments than less experienced practitioners, or practitioners with similar backgrounds and experiences to share similar judgments.
The purpose of this paper is to collate judgments on the relative and absolute importance of terms relating to "system complexity" in order to explore the maturity of the community's lexicon. To enable this, we collect judgments on system complexity factor importance in an absolute sense by asking for judgments on an ordinal scale and in a relative sense by asking for judgments on pair-wise comparisons. The results are presented as trends in the data when system complexity factors are evaluated on an ordinal scale, trends in the data when system complexity factors are evaluated in a pair-wise comparison, and reflections on free-text answers.
This paper is structured as follows: first, a literature review contextualizes the identified system complexity factors used in the qualitative questionnaire; then, the design of the questionnaire is described before the trends in the data are discussed. Finally, potential rationales for the results are offered and implications for organizations hoping to better evaluate system complexity are discussed.

LITERATURE REVIEW
A significant challenge for those wishing to evaluate system complexity, and one that persists despite considerable research effort, is finding a single, agreed definition of the term "system complexity" itself.3,6,14,15 Even the distinction between a complex system and a complicated system is not unanimously agreed upon by the community: some argue the distinction concerns how ordered, and therefore how predictable, a system is, owing to the presence (or absence) of nonlinearities and changes within the system that give rise to emergent behavior,16-18 while others frame the distinction in terms of how difficult a system is to understand or successfully realize, stressing that complexity is observer dependent.19 Some researchers argue that engineering efforts should be concerned primarily with dynamic complexity,20-22 while others emphasize sociopolitical complexity22,23 or structural complexity3-5,22,24-28 (see also descriptive complexity29), and others have provided extensive reviews of different definitions, further highlighting the diverse conceptual landscape.30,31 These ideas, and those of Sillitto,22 illustrate the several different types of system complexity identified in the literature: Fischi, Nilchiani, and Wade draw a distinction between complexity from the perspectives of "the system being observed," "the capabilities of the observer," and "the behavior the observer is attempting to predict,"21 while Simpson and Simpson categorize the complexity of an engineered system as one of the following four types: "cognitive complexity," "behavioral complexity," "organic complexity," and "computational complexity."33,34
Moreover, what counts as a reasonable approach to defining system complexity depends on what type of system of interest (SoI) is being considered: is it limited to the technical system(s) being developed and deployed, or does it also include the systems of processes and resources that are involved in developing and deploying such technical systems?35 Does it include the project that strives to realize the system?36-39 Does it include the processes of utilizing the system once deployed, or the user's perceptions of how complex the system is (e.g., how familiar users of the system are with important features of the system)?40,41 What is the boundary of the SoI: is it the physical context of the implemented system, or does it also include the more extended strategic/business context?42-46 While several approaches purport to provide a quantitative measure of the complexity of a system, they more realistically provide a quantitative measure of the complexity of a particular representation of a system (i.e., a particular view on the architecture of a system).25,26,47-50 A distinction is also required between the complexity of a representation of a system (i.e., the structural complexity of a system architecture) and the qualitative, perceived, and observer-dependent complexity of the system.25,26,40 As a consequence, the development of unambiguous and reliable measures of system complexity is a considerable challenge.
While several criteria linked to system complexity, such as "requirement difficulty," "cognitive fog," and "stable stakeholder relationships," have been found to have statistically significant correlations with systems engineering outcomes, we structure the present study around six candidate system complexity factors, which we define as follows. We use the term "Technical Novelty" to represent the number of similar systems the organization has already developed in the same deployment domain, the amount of reuse in the system, the number of high added-value elements, and the level of innovation required to deliver the system. This notion of system complexity has been given multiple different terms: "the difficulty of creation,"10 the "implementation context and system context,"43 "socio-political complexity,"2-4,23,32,61 and "system engineering effort and criticality."59 A commonly used term is "Structural Complexity,"2-4,22,24,32 which we define as the number, diversity, distribution, connectivity, and constraints on constituent components, subsystems, systems, and operational nodes. The term is also related to what has been termed the "implementation context"43 and the "system engineering effort and criticality."59 We define "Functional Complexity" as the number, behavior, interdependencies, and synchronization of functions and functional chains, including data types, processing and memory constraints, and algorithms.58,60 This notion of system complexity can be considered as related to the difficulty of conducting functional analysis and allocation.13,62,63
We use the term "Behavioural Complexity" to mean the ability to define and predict system modes, functions, states, behavior, performance, and missions, including the degree of autonomy and the impact of the environment. This is akin to what has been termed "dynamic complexity,"3,4,20,21,23,32,64 relates to both the "strategic" and "system context,"43 and has also been termed "operational concept stability" and "system behaviour stability."58-60 "Development Complexity" is the amount, and availability, of resources required to develop the system throughout its life cycle, including the interlacing of programs, the degree of challenge of requirements, and the maturity of technology, regulations, standards, processes, and methodologies. Again, this notion of system complexity has been used under various labels: "the difficulty of creation"10 and "socio-political complexity."2-4,23,32,61 It relates to the "system context" and "implementation context,"43 and has also been termed "development process complexity," "operational complexity," and the "impact of environment on the solution."59 Finally, "Organisational Complexity" is the number, diversity, level of support, and involvement of internal and external system stakeholders. This notion of system complexity is akin to the terms "sociopolitical complexity,"2-4,23,32,61 "life-cycle interlacing," "user diversity," and "engineering organisation,"59 and relates to the "stakeholder context."43

METHODOLOGY
The questionnaire contained seven sections, beginning with (1) consent to participate. We use Fleiss' κ to measure the degree to which respondents agree on rankings of system complexity factor importance, taking the impact of chance agreements into consideration (see Appendix A).
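As an illustration, Fleiss' κ can be computed from a table of category counts as follows. This is a minimal pure-Python sketch, not the study's exact computation (which, including the accompanying Z statistic, is described in Appendix A):

```python
from typing import List

def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa for an N x k matrix of category counts.

    ratings[i][j] = number of raters assigning subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(ratings)                      # subjects (e.g., complexity factors)
    n = sum(ratings[0])                   # raters per subject (respondents)
    k = len(ratings[0])                   # rating categories

    # Proportion of all assignments falling in each category.
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]

    # Per-subject agreement: fraction of rater pairs that agree.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    P_bar = sum(P) / N                    # observed agreement
    P_e = sum(pj * pj for pj in p)        # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)
```

Perfect agreement (every rater choosing the same category for each subject) yields κ = 1, while κ near 0, as reported below for this survey, indicates agreement no better than chance.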

RESULTS
The results of the data analysis of the questionnaire responses are presented in the following manner. First, we describe the makeup of the sampled population, before reporting the overall between-participant agreement on the relative importance of system complexity factors.
Then, the results are grouped under subheadings in response to specific research questions: How important are different system complexity factors? How are system complexity factors related? Are there distinct views within the participant population? How does experience evaluating a system complexity factor relate to its perceived importance? Are the system complexity factors explored here exhaustive?
Are participants internally consistent in their ratings of system complexity factor importance? Are ratings of system complexity factor importance consistent between question types?
The makeup of the sampled population in terms of their experience, role, employment sector, and employment location was as follows. Most respondents were experienced working in a systems engineering context; only some (16%) were relatively inexperienced. The roles that best describe the respondents were "Systems Engineer" (44% of respondents) and "Systems Architect" (24% of respondents). A range of employment sectors were represented by the sample population, the most frequent sector being "Defence and Space" (38%). The respondents were predominately employed within Europe (76%). When asked how much experience respondents have conducting system complexity evaluation, from options of "Not Sure," "None," "Not A Lot," "Some," "Quite A Lot," and "Lots," the modal response was "Some" experience. When asked if the organization they are affiliated with conducts system complexity evaluation, from options of "Not Sure," "Never," "Rarely," "Sometimes," "Very Often," and "Always," the modal response was "Sometimes." Between-participant agreement on the relative importance of system complexity factors is low; when asked to rate the importance of the six complexity factors on an ordinal (Likert) scale ("Extremely Important," "Moderately Important," "Somewhat Important," "Slightly Important," "Not At All Important"), Fleiss' κ = 0.021 (Z = 3.158, p-value = 0.002).
It could be argued that the lack of between-participant agreement on the relative importance of system complexity factors is due to the different backgrounds and experiences participants have had with system development projects. We consider subpopulations based on their responses to self-reported background questions and recalculate Fleiss' κ.
Respondents who reported they had over 20 years of experience working in a systems engineering context had a different ranking of system complexity factor importance compared with the overall population, shown in Table 1.

TABLE 1 Mean rank of experience and importance rating for the six complexity factors for the population (n = 55), including mean rank of importance rating for those who self-report as highly experienced working in a systems engineering context (> 20 years of experience, n = 24) and for those who self-report as "Systems Architects" (n = 13). Note: Respondents tend to view technical novelty as the least important aspect when evaluating a novel system to be engineered and organizational complexity as the most important. Experience ratings ranked as 0 = "Not At All Experienced", 1 = "Slightly Experienced", 2 = "Somewhat Experienced", 3 = "Moderately Experienced", 4 = "Extremely Experienced". Complexity factor importance ratings ranked as 0 = "Not At All Important", 1 = "Slightly Important", 2 = "Somewhat Important", 3 = "Moderately Important", 4 = "Extremely Important".

FIGURE 3 Frequency of importance ratings for each complexity factor (n = 55) when respondents were asked to rate the importance of the six complexity factors, shown for each complexity factor. No factors were rated as "Not At All Important".

For this highly experienced subpopulation, Fleiss' κ = 0.008 (Z = 0.299, p-value = 0.764), indicating a lack of between-participant agreement even with others of the same experience level.

Importance of different system complexity factors
When respondents were asked to rate the importance of the six system complexity factors on an ordinal (Likert) scale ("Extremely Important," "Moderately Important," "Somewhat Important," "Slightly Important," "Not At All Important"), "Organisational Complexity," "Behavioural Complexity," "Development Complexity," and "Functional Complexity" appear to be particularly important to the community as a whole, as shown in Figure 3 and Table 2, with modal ratings of "Extremely Important" for each factor. "Structural Complexity" was not considered to be as important, with a modal rating of "Moderately Important." "Technical Novelty" appeared to be the least important, with a modal rating of "Somewhat Important." Respondents who reported "Systems Architect" to be the best role descriptor of their work rated "Structural Complexity" third most important, whereas the whole population rated it fifth most important.
We test whether the responses are essentially random by conducting a χ² test on the distribution of responses for each system complexity factor. The results are shown in Tables 2 and 3. The χ² test implies that the null hypothesis that the results are random can be rejected with high confidence, particularly for "Organisational Complexity," "Behavioural Complexity," and "Functional Complexity."
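The goodness-of-fit test against an equal distribution of ratings can be sketched as follows (a minimal pure-Python illustration; for one degree of freedom, the χ² upper-tail probability reduces to the complementary error function, so no statistics library is needed):

```python
import math

def chi_square_equal(observed):
    """Goodness-of-fit chi-squared statistic against an equal distribution
    over the observed categories."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    # For df = 1, chi2 is the square of a standard normal variate,
    # so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

For example, an illustrative split of 13 "average" versus 42 "particular" ratings (hypothetical counts summing to n = 55) gives χ² ≈ 15.3 with p < 0.001, rejecting the null hypothesis of an equal split.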

Relationships between system complexity factors
Next, we examine correlations (Spearman's rank order correlations, ρ) between the scorings of system complexity factor importance. The results are shown in Table 4 and Figure 4. Note that we are considering 15 correlations here and hence will only be interested in correlation coefficients that are significant at the p < 0.01 level or better. Interestingly, "Technical Novelty," with a low median rank and mode, has a significant positive correlation with "Functional Complexity."

TABLE 2 Frequency of importance ratings for each complexity factor (and shown as a percentage of respondents)
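The rank correlation used here can be sketched as follows: tied values share their average rank, and ρ is the Pearson correlation of the resulting ranks (a minimal pure-Python illustration):

```python
def rank_with_ties(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = rank_with_ties(x), rank_with_ties(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Two respondents who order the factors identically yield ρ = 1, while opposite orderings yield ρ = -1; the averaged ranks matter here because Likert responses contain many ties.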

FIGURE 4 Spearman's rank order correlation coefficients (ρ(55)) between complexity factors for the sample population; * corresponds to p < 0.05, ** corresponds to p < 0.01, and *** corresponds to p < 0.001. All correlations shown are positive.

TABLE 3 Frequency of ratings with corresponding χ² test statistic (d.f. = 1) and p-value against an equal distribution of ratings for each complexity factor, with "Slightly Important" and "Somewhat Important" collapsed into the term "Average Importance" and "Moderately Important" and "Extremely Important" collapsed into the term "Particular Importance". Again, *** denotes p < 0.001, ** denotes p < 0.01, and * denotes p < 0.05; otherwise not significant.

Distinct views within the participant population
In this section, we examine whether there are distinct clusters of the population that share the same judgments on system complexity factor importance. The data were analyzed using a hierarchical agglomerative clustering algorithm, with importance ratings converted to integers. A number of metrics could be used to examine the "distance" between different respondents' views of system complexity factor importance; other approaches could account for individuals who agree on which factor is most important but disagree on which factor is least important, or those in the "middle of the pack." Here, however, we use the simplest approach (Manhattan distance). From the cluster analysis, flat clusters were determined which grouped the respondents into one of four groups.

TABLE 4 Spearman's rank order correlation coefficients (ρ(55)) between complexity factors when respondents were asked to rate their importance on a Likert scale (d.f. = 53); *p < 0.05, **p < 0.01, and ***p < 0.001, otherwise values are not significant.

The resulting clusters (A-D) represent 80%, 15%, 4%, and 2% of the sample, respectively. The distribution of ratings within these clusters is shown in Figure 5 and in Table 5. Cluster A has a Fleiss' κ of 0.044, and for a χ² test of the distribution of importance ratings across two categories ("average importance" and "particular importance"), all complexity factors can be considered of "particular importance" apart from "Technical Novelty" (statistically insignificant test result). Cluster B has a Fleiss' κ of 0.002, and for the same χ² test, all of the complexity factors can be considered to be of "average importance" (p < 0.05) apart from "Organisational Complexity" and "Behavioural Complexity" (statistically insignificant test results).
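The clustering step can be sketched as follows. This is an illustrative pure-Python implementation using the Manhattan distance named above; the linkage criterion is not stated in the text, so complete linkage is assumed here:

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two rating vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerative_flat_clusters(points, n_clusters):
    """Naive hierarchical agglomerative clustering cut into flat clusters.

    points: list of integer rating vectors (one per respondent).
    Repeatedly merges the two closest clusters (complete linkage,
    Manhattan distance) until n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(manhattan(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

In practice a library routine (e.g., SciPy's hierarchical clustering) would be used; the sketch shows only how Manhattan distance and flat-cluster extraction interact.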
Based on this analysis, it seems there are distinct clusters within the sample population with two competing views: a large proportion of the sample population who do not agree on the relative importance of these factors but agree on the absolute importance of all factors apart from "Technical Novelty," contrasted with a smaller cluster who suggest a lack of absolute importance of all of the factors but also do not agree on the relative importance of the factors.

TABLE 5 Mean rank of experience and importance rating for the six complexity factors for the population (n = 55), Cluster A (n = 44), and Cluster B (n = 8). Note: Clusters A and B have a similar ranking to the overall population for experience evaluating each complexity factor. Cluster A has the ranking of the top two important complexity factors reversed, with "Behavioural Complexity" rated as the most important and "Organisational Complexity" as the second most important. Conversely, Cluster B has different importance rankings for all the complexity factors apart from the top two most important. Experience ratings ranked as 0 = "Not At All Experienced", 1 = "Slightly Experienced", 2 = "Somewhat Experienced", 3 = "Moderately Experienced", 4 = "Extremely Experienced". Complexity factor importance ratings ranked as 0 = "Not At All Important", 1 = "Slightly Important", 2 = "Somewhat Important", 3 = "Moderately Important", 4 = "Extremely Important".

FIGURE 6 Responses to Likert-type rating of respondents' experience evaluating complexity factors, shown for each complexity factor.

Relationship between experience evaluating a system complexity factor and its perceived importance
We asked "To what extent do you have experience evaluating the following aspects?" with options of "Not At All Experienced," "Slightly Experienced," "Somewhat Experienced," "Moderately Experienced," or "Extremely Experienced" for each of the six complexity factors. The results are shown in Figure 6.
We collate the ratings of system complexity factor importance and examine the Spearman's rank order correlation coefficient (ρ) with collated ratings of experience evaluating that factor. Overall, ρ(330) = 0.334 (d.f. = 328, p < 0.001, α = 0.01), demonstrating that generally respondents rate system complexity factors that they have experience evaluating as more important than those that they have less experience evaluating, shown in Table 6. We also examine correlations between the ratings of experience evaluating each system complexity factor, finding that experience evaluating one system complexity factor generally seems correlated with experience evaluating every other, apart from "Technical Novelty."

TABLE 6 Note: The leading diagonal shows Spearman's rank order correlation coefficients (ρ(330)) between complexity factor importance and experience evaluating that complexity factor; *p < 0.05, **p < 0.01, and ***p < 0.001, otherwise values are not significant. When the ratings of complexity factor importance and experience evaluating that factor are collated into two variables, the overall correlation is ρ = 0.334 (d.f. = 328, p < 0.001, α = 0.01), demonstrating that generally respondents rate complexity factors that they have experience evaluating as more important than those that they have less experience evaluating. Off-diagonals show Spearman's rank order correlation coefficients (ρ(55)) between complexity factors when respondents were asked to rate their level of experience evaluating each factor on a Likert scale (d.f. = 53); *p < 0.05, **p < 0.01, and ***p < 0.001, otherwise values are not significant.

Relevance of the system complexity factors used in the questionnaire
We check the relevance of the six system complexity factors by examining whether the community judges them all to be unimportant. None of the respondents gave any of the six system complexity factors an importance rating of "Not At All Important." Although this does not mean that the six system complexity factors chosen in this survey are exhaustive, they are at least relevant.
We asked each respondent "What other aspects are important to you when evaluating system complexity?" and received a mixture of free-text responses, suggesting that there is a wide range of contextually relevant aspects that are important when evaluating system complexity. The most frequent emerging themes include system interfaces and dependencies (nine responses), nonfunctional requirements including safety and security (eight responses), and client/customer/user complexity (e.g., their understanding of the system, novelty of the system to them, willingness to accept change) (seven responses). We also find further evidence supporting the relevance of the six terms used in the survey, as seven respondents answered that these factors were sufficient. When we asked "When evaluating system complexity, what other aspects do you have experience with?" a range of answers were received, suggesting a wide range of aspects that are currently evaluated; the usefulness of these aspects was not reported, however. The most frequent answers included nonfunctional requirements (including safety, security, and regulatory compliance requirements) and the "ilities" (e.g., flexibility, adaptability) (seven respondents), financial and commercial complexity (six respondents), and stakeholder complexity (diversity, expectations) (three respondents), while seven respondents answered with no other aspects.
We also asked each respondent to "Please describe your experience of complexity evaluation (for example; the extent to which this type of activity has been a part of your job, the purpose of any complexity evaluation that you have been involved in, how successful or otherwise you felt complexity evaluation was, the challenges you faced, etc.)." While the most common response (23 respondents) was to provide no answer, the second most frequent answer (seven respondents) related system complexity evaluation to risk evaluation (technical, project/program). Answers relating complexity evaluation to risk suggested complexity evaluation is "performed to understand the program risks," "to identify where we carry our biggest risks," and "highlights the complexity and its associated risks to leadership," while another respondent answered "what seems to me to be important is to understand what the complexities are -so identification rather than evaluation -and then what the risks /potential impacts associated with those complexities are -and then take management action. . . " Four respondents answered that a subjective evaluation has been done, where "nothing formal" was done, with one respondent answering that "Evaluation has been on a 'gut' basis; I haven't used any systematic approach," and another that complexity evaluation "is often completed at a high level view and too often based on the experience of the assessor making it a subjective process rather than objective. Complexity is in the eye of the beholder." These answers provide evidence that some systems engineering practitioners are already using the term "system complexity" as a proxy for system risk (whether technical or programme) but that some lack formal approaches.

FIGURE 7 Distribution of the number of non-transitive triples (n = 55) for the survey respondents (filled bars), where 58% of respondents gave at least one non-transitive response, compared with the distribution of the number of non-transitive triples from a null model (n = 10,000,000) (open bars). A two-sample Kolmogorov-Smirnov test was used to test whether the null model and the survey results came from the same distribution. Here, D = 0.502, p < 0.001 at α = 0.05, giving confidence that the two samples are not drawn from the same distribution.
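The two-sample Kolmogorov-Smirnov statistic D used in this comparison is simply the maximum gap between the two empirical distribution functions, which can be sketched as (a minimal pure-Python illustration; a library routine would also supply the p-value):

```python
def ks_two_sample_D(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic D: the maximum absolute
    difference between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    D = 0.0
    for x in s1 + s2:  # the maximum gap is attained at a data point
        cdf1 = sum(1 for v in s1 if v <= x) / n1
        cdf2 = sum(1 for v in s2 if v <= x) / n2
        D = max(D, abs(cdf1 - cdf2))
    return D
```

Identical samples give D = 0 and fully separated samples give D = 1, so the reported D = 0.502 indicates a substantial divergence between the survey responses and the null model.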

Participant internal consistency in ratings of system complexity factor importance
An alternative to rating each system complexity factor's importance on a Likert-scale is to elicit pair-wise comparisons between system complexity factors. Pair-wise comparisons may be nontransitive, where a respondent rates system complexity factor A as equally or more important than B, and rates B to be equally or more important than C, but rates C to be equally important or more important than A. By counting the number of nontransitive triples in the pair-wise comparisons, we have an approach to characterize how inconsistent respondents were in their answers. We use the procedure from Ref. 67 to count the nontransitive triples. We compare the results of a single randomly selected simulation run (55 sets of responses) with the survey responses, Figure 8, which shows that while there are some triples that are more likely to be answered in a nontransitive way by the respondents, this distribution could be explained by chance selection.
Other than a random distribution of nontransitive triples, we might expect the most frequent nontransitive triples to be those rated to be of similar importance. Surprisingly, the most frequent nontransitive triples include both system complexity factors that are on average particularly important and those that are not considered to be as important (e.g., "Structural Complexity" contrasted with "Technical Novelty"). We count the number of nontransitive responses that occur for each possible pair of system complexity factors and examine the correlation between this count and the difference in mean rating for the same pair of system complexity factors, finding no significant correlation. This suggests the nontransitive responses are not systematic; they are not entirely explained by similarity between judgments of system complexity factors.
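The triple-counting idea can be sketched as follows. This is an illustrative implementation, not the exact procedure of Ref. 67, and it assumes a triple counts as nontransitive only when the preference cycle closes with a strict preference:

```python
from itertools import combinations, permutations

def count_nontransitive_triples(factors, pref):
    """Count unordered triples containing a preference cycle.

    pref[(a, b)] is '>', '<', or '=' for each ordered pair of factors.
    A triple is nontransitive if some ordering (x, y, z) has
    x rated equally or more important than y, y equally or more
    important than z, yet z strictly more important than x.
    """
    def weakly(a, b):       # a rated equally or more important than b
        return pref[(a, b)] in ('>', '=')

    def strictly(a, b):
        return pref[(a, b)] == '>'

    count = 0
    for triple in combinations(factors, 3):
        if any(weakly(x, y) and weakly(y, z) and strictly(z, x)
               for x, y, z in permutations(triple)):
            count += 1
    return count
```

Applied to each respondent's 15 pairwise answers over the six factors, this yields the per-respondent counts whose distribution is compared against the null model in Figure 7.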

Consistency of system complexity factor importance ratings between question types

FIGURE 9 The proportion of participants that rated each factor as more or equally important than each of the others. The majority proportion is reported for each relationship. An arrow from complexity factor A to complexity factor B means that more participants judged A to be more important than B than vice versa, with the percentage value showing what proportion of participants judged A to be more important than B. Solid arrowheads are consistent with the overall population judgments. Open arrowheads represent judgments that are intransitive with respect to the overall population judgments. An equals symbol represents majority judgments of equal importance.

Judgments of system complexity factor importance should not change depending on whether the sample population is asked to rate importance on a Likert scale or in a pair-wise manner. The proportion of participants that rated each system complexity factor as more important, or equally important, than each of the others is shown in Figure 9, where the highest frequency of responses is shown for each relationship. Interestingly, the population is generally consistent over longer "distances" between system complexity factors in terms of their importance, and inconsistencies are generally found for the most highly rated factors. Similarly, Figure 10 shows the discrepancy between the two question types, where the order of system complexity factor importance differs depending on whether the sample population is providing judgments on a Likert scale or in a pair-wise comparison. The sample population agreed between the two question types in rating "Organisational Complexity" and "Behavioural Complexity" as being particularly important. We compared the responses between the two question types by aggregating the Likert-scale responses (by taking the normalized mean rank) and aggregating the pair-wise comparison responses (using the procedure from the Analytic Hierarchy Process [AHP] to calculate the aggregate of individual judgments68), normalizing the results between 0 and 1. For further details, see Appendix B.
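The aggregation of pair-wise responses can be sketched as follows, assuming the standard geometric-mean aggregation of individual judgments (AIJ) and the row-geometric-mean approximation to the AHP priority vector; the study's exact procedure is given in its Appendix B:

```python
from math import prod

def aggregate_judgments(matrices):
    """Element-wise geometric mean of individual pairwise comparison
    matrices (the AHP 'aggregation of individual judgments')."""
    k, n = len(matrices), len(matrices[0])
    return [[prod(m[i][j] for m in matrices) ** (1.0 / k)
             for j in range(n)] for i in range(n)]

def priorities(matrix):
    """Approximate AHP priority vector via the row geometric mean,
    normalized to sum to 1."""
    n = len(matrix)
    gm = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]
```

Normalizing the resulting priorities to [0, 1] then makes them comparable with the normalized mean ranks from the Likert-scale responses.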

DISCUSSION
Before discussing the implications of the results and the limitations of the research, we briefly summarize the main results of the analysis of survey responses. All six of the terms used here relating to "system complexity" are relevant, but they are not an exhaustive list. Participants identify "Organisational Complexity," "Behavioural Complexity," and "Structural Complexity" as significantly important in absolute terms, but do not exhibit significant agreement on the relative importance of these factors. We found two competing views within the sample population: a large majority who judge that all six system complexity factors are absolutely important, but do not agree on their relative importance, and a small minority who judge that all six system complexity factors are of only average importance, again without agreement on their relative importance.
Considering more homogeneous subpopulations did not increase the amount of agreement on the relative importance of system complexity factors. Several system complexity factors are considered by the sample population to be related to each other: for example, "Organisational Complexity" and "Developmental Complexity" both relate to the system which develops the SoI, while "Functional Complexity," "Structural Complexity," and "Behavioural Complexity" all relate to the technical SoI to be developed. These correlations are stable across subpopulations. Generally, respondents rate system complexity factors that they have experience evaluating as more important than those they have less experience evaluating, although this does not hold for every individual system complexity factor.
Most of the sample population gave fully transitive responses when asked to evaluate the importance of the system complexity factors in a pair-wise manner, with a low proportion of nontransitive responses. Where nontransitive responses did occur, they were not systematic, and it is not possible to rule out noise in the results.
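The transitivity check on a respondent's pair-wise judgments can be sketched as follows (the judgments here are hypothetical and include a deliberate cycle; this is not the authors' code):

```python
from itertools import permutations

# Hypothetical judgments: pref[(a, b)] = 1 if a was judged more
# important than b, 0 if equal, -1 if less important. Only one
# orientation of each pair is stored; the other is derived by negation.
pref = {
    ("Org", "Beh"): 1, ("Beh", "Str"): 1, ("Org", "Str"): -1,  # cycle: Org>Beh>Str>Org
    ("Org", "Fun"): 1, ("Beh", "Fun"): 1, ("Str", "Fun"): 0,
}

def judgment(a, b):
    """Return the comparison for (a, b), flipping sign for the stored orientation."""
    return pref[(a, b)] if (a, b) in pref else -pref[(b, a)]

def is_transitive(factors):
    """True if no ordered triad violates weak transitivity of importance."""
    for a, b, c in permutations(factors, 3):
        ab, bc = judgment(a, b), judgment(b, c)
        # a >= b and b >= c, with at least one strict, must imply a >= c
        if ab >= 0 and bc >= 0 and (ab > 0 or bc > 0) and judgment(a, c) < 0:
            return False
    return True

print(is_transitive(["Org", "Beh", "Str", "Fun"]))  # the cycle above -> False
```

Counting, per respondent, how many of the ordered triads violate this condition gives the proportion of nontransitive responses reported above.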
Inconsistencies appear for the sample population when comparing the results of pair-wise comparisons with judgments on an ordinal scale, suggesting that while the majority of respondents have coherent and consistent mental models of system complexity factor importance, the community overall could stand to improve these mental models. While the perceived relevance and importance of system complexity factors likely remains strongly linked to individual experiences on system development projects, the community could improve the consistency of its judgments on the relative importance of these factors by conducting more frequent, formal evaluations.
The nontransitivity found may be due to inconsistencies in respondents' mental models of system complexity factor importance. Future studies could examine which factors relate to which, a potential hierarchy of terms, the distillation of broader terms relating to system complexity into more quantifiable or atomic terms, and how system complexity factors relate to system type, a consideration that was not explicitly examined in this study. Further, they could take a through-life cycle perspective, whereas this study emphasized the early-life cycle implications of system complexity. Future work should investigate these points and develop a richer ontology of system complexity factors: one that moves beyond the perceived importance of system complexity factors and instead seeks unambiguous, objective measures that differentially impact system development projects, along with a framework to support the through-life cycle evaluation of system complexity, sensitive to the impact of system type on system complexity.
There are inherent limitations to questionnaires as a research instrument. First, the Likert-type scale assumes linearity of responses, which may not strictly be true. Second, there may have been a fatigue effect as respondents completed the questionnaire, which may have contributed to the nontransitivity found. However, as many respondents were consistent in their answers, and the importance ratings of system complexity factors do not appear to have been given at random (tested using a χ² test), there can be some confidence in the results despite any fatigue effects. The questionnaire also used pair-wise comparisons to mitigate this concern, as they in theory place a lower cognitive burden on respondents, although the number of individual comparisons required of each respondent may itself have contributed to a fatigue effect.
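The randomness check mentioned above can be sketched as a χ² goodness-of-fit test against a uniform distribution; the counts here are illustrative, not the survey data:

```python
# Sketch (illustrative counts, not the authors' analysis) of a chi-square
# goodness-of-fit test that importance ratings were not given at random:
# observed rating frequencies are compared against a uniform expectation.
observed = [2, 5, 8, 20, 20]               # hypothetical counts of ratings 1-5
expected = sum(observed) / len(observed)   # uniform expectation: 11 per category

chi2 = sum((o - expected) ** 2 / expected for o in observed)

# Critical value for df = 4 at the 5% significance level
CRITICAL = 9.488
ratings_random = chi2 <= CRITICAL
print(f"chi2 = {chi2:.2f}, rated at random: {ratings_random}")
```

A statistic above the critical value rejects the hypothesis that respondents distributed their ratings uniformly at random across the scale.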

CONCLUSION
This research has sought to address the question: "To what extent can an organization effectively evaluate system complexity during the early phases of a system life cycle?" Here, we examine the judgments of systems engineers on the importance of six different factors which may contribute to system complexity, revealing a lack of significant consensus on which aspects of system complexity are most important when engineering a novel system.
The between-participant agreement on the relative importance of system complexity factors is low (Fleiss' κ = 0.021, Z = 3.158, p-value = 0.002). In terms of absolute importance, the overall participant population rated "Organisational Complexity," "Behavioural Complexity," and "Functional Complexity" as particularly important but did not rate "Developmental Complexity," "Structural Complexity," and "Technical Novelty" as particularly important. However, the overall participant population includes two competing views: a majority view that all of the factors are important, but with no agreement on relative importance amongst them, contrasted with a minority view that the terms are only of average importance, but again with no agreement on the relative importance among them. Self-reported demographics do not appear to explain the variation in views. Several system complexity factors are considered to be related to each other: for example, "Organisational Complexity" and "Developmental Complexity" are seemingly related terms, and "Functional Complexity," "Structural Complexity," and "Behavioural Complexity" are seemingly related terms, with the same correlation structures stable across subpopulations. Generally, respondents rate system complexity factors that they have experience evaluating as more important than those they have less experience evaluating, although this does not hold for every individual system complexity factor.
The majority of respondents gave fully transitive responses when asked to evaluate system complexity factor importance in a pair-wise manner, indicating maturity in respondents' mental models of the construct of system complexity, and the overall level of nontransitivity in participant judgments was low (16% of responses were nontransitive). Where nontransitive responses were given, they did not appear to be systematic. If the differences between individual perspectives on system complexity are well recognized, well understood, and well articulated, complex systems evaluation can be undertaken profitably (although the process would be more onerous), allowing effective trade-offs to be made during a project's design phase or later.
However, given the lack of a consensus view on the relative and absolute importance of the terms explored here, systems engineers, architects, and organizations currently are left without clear guidance on which features of system complexity to pay particular attention to.
While the results of this study cannot be used to provide a full model of complexity factor importance that is universally applicable to all in the community, the results of the survey can instead be used to make recommendations on mitigating the challenges presented by the revealed ambiguity of system complexity terms.
First, care should currently be taken when using terms related to subcomponents of system complexity because they remain open to interpretation and do not automatically avoid the ambiguity that is recognized to be associated with the overarching term "system complexity" itself. Organizations wishing to evaluate system complexity should work with a set of clear definitions of the terms that they use, defined in such a way as to address relevant subcomponents of complexity.
This task of defining complexity-related terms should take into consideration the type of system under evaluation and the contextual factors that are most relevant to the evaluation. For instance, the complexity of a predominately mechanical system's architecture may require a different language from that appropriate to the evaluation of a predominately software-based system's architecture, even if the terms being used appear to be the same.
Moreover, combining evaluations of subsystems to achieve an evaluation of overall system complexity should not be regarded as a simple process of addition. For example, the super-system composed by combining the mechanical system with its relevant software-based system may raise entirely new issues that require careful evaluation and may trigger re-evaluation of the original subsystems. It is at this point that different interpretations or assessments of key terms and concepts can cause most damage.
Consideration should also be given to the complexity of a candidate system throughout its life cycle, rather than relying solely on early-phase evaluations. Care should be taken to consider that the relative importance of different system complexity factors may change as the system progresses through its life cycle. Finally, consideration should be given to ensuring that a closed set of system complexity factors is used during evaluations.
Foregrounding the current lack of consensus on the factors implicated in system complexity also provides an opportunity for the community to direct future research toward the development of a holistic framework to support the evaluation of system complexity.

APPENDIX A: FLEISS' KAPPA

To calculate the degree of agreement between respondents on their ratings of complexity factor importance, we use Fleiss' κ. Fleiss' κ measures the degree of agreement in ratings beyond that which would be expected by chance.
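A minimal sketch (with hypothetical rating counts, not the survey data or the authors' code) of the Fleiss' κ calculation:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters assigning subject i
    to category j; the number of raters per subject is assumed constant."""
    N = len(counts)            # subjects rated (here, the six factors)
    n = sum(counts[0])         # raters per subject
    k = len(counts[0])         # rating categories
    total = N * n
    # p_j: proportion of all assignments made to category j
    p = [sum(row[j] for row in counts) / total for j in range(k)]
    # P_i: proportion of agreeing rater-rater pairs for subject i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N                    # mean observed agreement
    P_e = sum(pj * pj for pj in p)        # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Illustrative data: 6 subjects, 10 raters each, 3 rating categories
counts = [
    [0, 2, 8], [1, 3, 6], [4, 4, 2],
    [3, 3, 4], [6, 3, 1], [2, 5, 3],
]
print(round(fleiss_kappa(counts), 3))
```

κ = 1 indicates perfect agreement and κ near 0 indicates agreement no better than chance, which is how the low between-participant agreement reported in the conclusion should be read.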
Let N be the number of subjects to be rated (N = 6), n be the number of raters per subject, and k be the number of rating categories, with n_ij the number of raters who assigned the i-th subject to the j-th category. To calculate κ, first calculate p_j, the proportion of all assignments which were to the j-th category:

p_j = (1 / (N n)) Σ_i n_ij

Then calculate P_i, the extent to which raters agree for the i-th subject (i.e., compute how many rater-rater pairs are in agreement, relative to the number of all possible rater-rater pairs):

P_i = (1 / (n(n − 1))) (Σ_j n_ij² − n)

κ is then given by κ = (P̄ − P̄_e) / (1 − P̄_e), where P̄ = (1/N) Σ_i P_i and P̄_e = Σ_j p_j².

APPENDIX B: AGGREGATING PAIR-WISE COMPARISONS

TABLE B.1 Modified Saaty scale used for pair-wise comparisons

Value  Interpretation
1      Equal importance in a pair. Corresponding to "They are equally unimportant" and "They are equally important"
3      Moderate importance. Corresponding to "A is slightly more important than B"
5      Strong importance. Corresponding to "A is much more important than B"
1/3    Corresponding to "B is slightly more important than A"
1/5    Corresponding to "B is much more important than A"

RESPONSES
The pair-wise responses were converted to numeric values using the modified Saaty scale (Table B.1) and stored as a matrix of pair-wise elements, C_ij. A normalized pair-wise response matrix, X_ij, is then created by dividing each element of C_ij by the sum of the values in its column. The rows of X_ij are then summed and divided by the number of complexity factors evaluated (6) to generate a weighted "priority vector" for each respondent, W_ij.
The weighted priority vector is averaged over all respondents and normalized between zero and one to obtain an overall ranking of complexity factor importance for the sample population.
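The steps above can be sketched as follows, using two hypothetical respondents and three factors instead of six (not the authors' code):

```python
# Each respondent's pair-wise matrix C is column-normalized, row means
# give a priority vector, and the vectors are averaged over respondents
# and min-max normalized to [0, 1].
def priority_vector(C):
    """Column-normalize C, then take row means (standard AHP approximation)."""
    n = len(C)
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    X = [[C[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in X]

# Modified Saaty-scale judgments (1, 3, 5 and their reciprocals);
# C[i][j] is how much more important factor i is than factor j
C1 = [[1, 3, 5], [1/3, 1, 3], [1/5, 1/3, 1]]
C2 = [[1, 1, 3], [1, 1, 5], [1/3, 1/5, 1]]

vectors = [priority_vector(C) for C in (C1, C2)]
W = [sum(v[i] for v in vectors) / len(vectors) for i in range(3)]
lo, hi = min(W), max(W)
overall = [(w - lo) / (hi - lo) for w in W]   # overall ranking, normalized
print(overall)
```

Each priority vector sums to one before the final min-max normalization, so the normalized output expresses only the relative ordering of factor importance across the sample.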