Documentation for model transparency

Authors


  • The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. Work funded in part by the U.S. Department of Homeland Security, Transportation Security Administration.

Ignacio J. Martinez-Moyano, Argonne National Laboratory and The University of Chicago, Chicago, IL, U.S.A. E-mail: imartinez@anl.gov

Introduction

The system dynamics (SD) modeling approach champions model transparency and facilitates ease of communication about model elements. Transparency—the state or quality of being easily observed and comprehended—is an important attribute of useful models because it enables users to identify and understand the assumptions, relationships, and data used in the model. Model transparency, therefore, is a central element in good modeling practice. SD models, like other types of models, embody scientific theories about how the world works. Specifically, the SD approach calls for the use of dynamic hypotheses as the basis for developing scientific models. We believe that transparency in our models is akin to openness in our scientific research. One way to achieve transparency in our modeling efforts is through the use of thorough documentation processes. As Sterman states, “perhaps the most important pragmatic issue for modelers is documentation” (Sterman, 2000, p. 865). In general, system dynamicists struggle with how to document their models so that other modelers can examine the elements efficiently and effectively, thus increasing confidence in the model's usefulness.

To help modelers increase the transparency of their models through enhanced documentation, scientists at Argonne, building on model documentation work by Oliva (2001), developed a tool that enables modelers to create practical, efficient, HTML-based model documentation and customizable model assessments. The tool generates documentation and assessments quickly enough that they can be produced concurrently with model development. Facilitating the generation of documentation and diagnostics of models—documenting and discovering “as you go”—makes it possible, as stated by Sterman, to “uncover errors that otherwise might not be detected until much later, preventing costly rework” (2000, p. 865).

The System Dynamics Model Documentation and Assessment Tool (SDM-Doc), created at Argonne and hosted at the System Dynamics Society Web pages (http://tools.systemdynamics.org/sdm-doc/), is designed to provide documentation of models built using the Vensim modeling software. Each page produced has four general sections:

  • The Model Assessment section shows assessment results in three categories: model information, warnings, and potential omissions. This section allows modelers and model users to gain a better understanding of the basics of the model in terms of its elements and confidence-building tests.
  • The Model Summary section shows a summary of the model variables by type, group, module, and view.
  • The Model Equations section, the heart of the documentation process, includes a description of each equation. Hypertext links enable the user to navigate the model by clicking directly on the variable names on the right-hand side of the equation, or on the names of the variables that use the documented variable as an input. The equations in this section can be displayed using seven different grouping criteria: variable name, variable type, model view, group, module, module/group/name, and level structure.
  • The Model Assessment Details section provides details about the document generated by the tool, as well as tables reporting details of model assessment results and a table of variable usage in model views.

The SDM-Doc tool is very simple to use. The user needs only to run the application and select a model to be documented (an .MDL file). Once the model is selected, the results are automatically shown (see http://tools.systemdynamics.org/sdm/Handbook-Model-V.html for an example of the SDM-Doc tool output).

The following sections describe the documentation output and its potential use to achieve model transparency.

Model assessment

Model assessment results are organized into three categories: model information, warnings, and potential omissions (see Figure 1).[1] Ideally, well-formulated models will show zero warnings and zero potential omissions (with the exception of supplementary variables). When instances of model information, warnings, or potential omissions are found, the category title is hyperlinked to a list, included at the end of the HTML page, of the variables that meet that criterion.

Figure 1. Model assessment results

In the SDM-Doc tool, the model assessment results displayed are customizable according to the options shown in Figure 2. The tool allows different profiles, each with its own options, to be created simply by changing the profile name. Different profiles might be used for different purposes, such as model development, model publishing, or model sharing.

Figure 2. Model assessment results options

Model information

The model information section is organized into four subsections. The first subsection provides a general idea of model size and complexity by identifying the number of variables, stocks, and macros in the model. The second subsection shows the count of variables that the modeler has flagged using metadata; variables are flagged by adding the metadata in the comment box of the equation editor. Currently, there are three metadata identifiers available:

  1. [Function Sensitivity Parameter: True] to flag sensitivity parameters;
  2. [Source: Source of Data] to identify variables with source information; and
  3. [Data Lookup: True] to identify lookup tables used as data repositories.

When variables in these categories exist, the category name is hyperlinked to a separate list of those variables, which is useful for reporting purposes and for model exploration and testing.
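
To make the mechanism concrete, the following is a minimal sketch, in Python, of how such metadata tags might be detected in a plain-text .mdl file. The helper name `flagged_variables` and the simplified parsing (splitting each equation record of the form `name = expression ~ units ~ comment |`) are illustrative assumptions, not the tool's actual implementation; real .mdl files contain additional record types that this sketch ignores.

```python
import re

# The three metadata identifiers listed above, as found in equation comments.
TAGS = {
    "sensitivity_parameter": r"\[Function Sensitivity Parameter:\s*True\]",
    "source":                r"\[Source:\s*[^\]]+\]",
    "data_lookup":           r"\[Data Lookup:\s*True\]",
}

def flagged_variables(mdl_text):
    """Return {tag: [variable names]} for metadata found in equation comments.

    Assumes the plain-text .mdl layout in which each equation record reads
    'name = expression ~ units ~ comment |'.
    """
    found = {tag: [] for tag in TAGS}
    for record in mdl_text.split("|"):
        fields = record.split("~")
        if len(fields) < 3:              # not a complete equation record
            continue
        name = fields[0].split("=")[0].strip()
        comment = fields[2]
        for tag, pattern in TAGS.items():
            if re.search(pattern, comment, re.IGNORECASE):
                found[tag].append(name)
    return found
```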

The third subsection provides information about the simulation controls: time step, time units, and time horizon of the model. The fourth, and final, subsection provides information about model completeness by showing whether the model can be simulated, whether groups were defined by the modeler, and whether a packaged version of the model (i.e., a .VPM file) is available so that the tool can extract information about the graphical representation of the views of the model.[2]
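
A minimal sketch of how these simulation controls might be read from the .mdl source follows; the helper name and the regex-based parsing are illustrative assumptions. INITIAL TIME, FINAL TIME, and TIME STEP are Vensim's standard control parameters, and the time horizon follows as FINAL TIME minus INITIAL TIME.

```python
import re

# Vensim's standard simulation control parameters.
CONTROLS = ("INITIAL TIME", "FINAL TIME", "TIME STEP")

def simulation_controls(mdl_text):
    """Extract control constants and their units from plain-text .mdl source.

    Matches definitions such as 'TIME STEP = 0.125 ~ Month ~ ... |' and
    returns {name: (value, units)}.
    """
    controls = {}
    for name in CONTROLS:
        pattern = re.escape(name) + r"\s*=\s*([0-9.eE+-]+)\s*~\s*([^~|]*)"
        match = re.search(pattern, mdl_text)
        if match:
            controls[name] = (float(match.group(1)), match.group(2).strip())
    return controls
```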

Warnings

The warnings subsection reports the number of equations that fail a set of documentation and formulation tests. The warnings reported include the following:

  • Undocumented equations: equations for which no documenting text is provided in the comments box of the equation editor. The tool does not verify the thoroughness of the description provided by the modeler; it reports only on the existence of a description, not on its quality.
  • Equations with embedded data: equations in which numeric data were found. Data should never be a part of an equation; they should exist only in constants, data variables, or lookup tables. When data are mixed with logic in equations, the model is obscured and the results may be suspect. In many cases, modelers forget to create additional variables to hold the numeric parameters (data) used in equations.
  • Equations with unit errors: equations that fail the basic dimensional consistency test (see Forrester and Senge, 1980). Although “dimensional analysis cannot prove an equation is correct, it can certainly prove some equations to be incorrect” (Richardson and Pugh, 1981, p. 264). The tool, however, does not test for the existence of arbitrary scaling factors without real-world meaning to achieve dimensional consistency. Equations should be inspected for the existence of such factors, as suggested by Sterman (2000, p. 866).
  • Variables not in any view: variables that, although they exist in the model, are not used as part of the structure shown in the model views. This is a potential problem because it can be a way to hide structure in the model.
  • Incompletely defined subscripted variables: subscripted variables with components that are not comprehensively defined. This is a potential problem and might be an indication of faulty specification of the subscripted variable.
  • Nonmonotonic lookup functions: lookup functions that have changes in their slope (from positive to negative or negative to positive). The functions “should be either nondecreasing (flat or rising) or nonincreasing (flat or falling)” (Sterman, 2000, p. 577) to ensure that the polarity of the causal link between input and output is unambiguous. When a nonmonotonic function is discovered, the curve should be separated into its upward- and downward-sloping components to isolate the different dynamics embedded in it. Although mathematically identical, the disaggregated function enables the modeler to clearly identify the different feedback processes at work (Rudolph and Repenning, 2002). Lookup functions identified as “used for data” are not reported in this category even if they are nonmonotonic. (A sketch of this monotonicity test follows the list.)
  • Cascading (chained) lookup functions: the number of lookup functions that use as input the output of another lookup function. As identified by Martinez-Moyano and Richardson (2001), the causal mechanism may be obscured by the compounded effects of the sequential functions.
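
As a worked illustration of the nonmonotonicity test described above, the following is a minimal sketch in Python, assuming the lookup's output values have already been extracted as a list; the function name is hypothetical and this is not the tool's implementation.

```python
def is_monotonic(ys):
    """Check the monotonicity condition for a lookup's output values.

    A lookup passes if its y-values are entirely nondecreasing or entirely
    nonincreasing; any slope sign change (rise then fall, or fall then
    rise) makes it nonmonotonic and would trigger the warning above.
    """
    nondecreasing = all(a <= b for a, b in zip(ys, ys[1:]))
    nonincreasing = all(a >= b for a, b in zip(ys, ys[1:]))
    return nondecreasing or nonincreasing

# Example: a rising lookup passes; a U-shaped lookup fails.
assert is_monotonic([0, 1, 2, 2, 3])        # flat or rising: OK
assert not is_monotonic([3, 1, 0, 1, 4])    # falls then rises: warning
```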

Potential omissions

The potential omissions subsection reports the number of unused variables, supplementary variables, supplementary variables being used, overly complex variable formulations, and overly complex stock formulations.

  • Unused variables: variables that are part of the structure of the model but are not used as input in any equation of the model.
  • Supplementary variables: variables that are flagged as supplementary in the Vensim software and that are not used as input in any equation of the model but are still computed. Most of the time, supplementary variables are used as reporting variables. There is no theoretical limit to the number of supplementary variables in a model.
  • Supplementary variables being used: variables that are identified as supplementary but are used as input in at least one equation of the model. This is an indication of a potential formulation problem, as the modeler considered such variables to be reporting variables, yet they are now part of the feedback structure of the model.
  • Overly complex variable formulations: equations that have more input variables than a predefined threshold. The default threshold is three input variables per equation, following Richardson's rule (2000), which states that, to maximize the transparency of equation formulation, an equation should have at most three variables as input.

    The threshold used by the tool, however, is customizable by the modeler; the higher the threshold, the lower the number of violations of this rule of equation formulation parsimony. A natural tension arises between the quest for overall model parsimony and equation simplicity. This category, like many others, is hyperlinked to a list of the variables that violate the rule, each with a complexity score based on the number of variables used as input for the equation. For example, a complexity score of 10 means that the equation computing the reported variable uses 10 variables as input. (A minimal scoring sketch follows this list.)

  • Overly complex stock formulations: stock variables that have computations other than addition or subtraction in them. Stocks are accumulations of inflows and outflows; therefore, in principle, the only admissible arithmetic operations are addition and subtraction, as depicted in the following integral equation (from Sterman, 2000, p. 194):
    $$\mathrm{Stock}(t) = \int_{t_0}^{t} \left[\, \mathrm{Inflow}(s) - \mathrm{Outflow}(s) \,\right] ds + \mathrm{Stock}(t_0)$$

    In some cases, however, to simplify the detail complexity of the model, modelers may choose to embed other computations in the stock variable, making the stock formulation opaque and difficult to follow. All computations should be moved out of the stock variable and into the corresponding rate variables to increase the transparency of the formulation.
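
The complexity scoring referenced above can be sketched compactly. The following Python fragment is illustrative only: the function, its default threshold of three (Richardson's rule), and the example variable names are assumptions, not taken from the tool.

```python
def complexity_report(inputs_by_variable, threshold=3):
    """Flag equations whose number of distinct inputs exceeds the threshold.

    'inputs_by_variable' maps each variable name to the set of variables
    appearing on the right-hand side of its equation; the default threshold
    of three follows Richardson's rule.
    """
    return {name: len(inputs)
            for name, inputs in inputs_by_variable.items()
            if len(inputs) > threshold}

# Hypothetical example: 'indicated orders' has four inputs, so it is
# reported with a complexity score of 4; 'inventory gap' passes.
model = {
    "indicated orders": {"demand", "inventory gap",
                         "adjustment time", "backlog"},
    "inventory gap": {"desired inventory", "inventory"},
}
print(complexity_report(model))   # {'indicated orders': 4}
```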

Grouping criteria for variables

After the model assessment results, the tool reports the results for types of variables, groups, modules, and views in summary form, as shown in Figure 3. Depending on which grouping criterion is active, the corresponding hyperlinks become active[3] (in Figure 3, the active grouping criterion is by type).

Figure 3. Grouping criteria summary reports

The SDM-Doc tool is capable of sorting the model output by each of these four groupings—type, group, module, and view—as well as by variable name and the combined module/group/name sequence. Also available is an option to sort by level structure, which corresponds to the “natural way” to make sense of a model and matches the order proposed by the original DYNAMO documentor (see Richardson and Pugh, 1981, pp. 214–229, for details).[4]

In addition, a view summary (i.e., a table of variable usage in model views) is created at the end of the HTML file to visualize the distribution of variables included in model views (see Figure 4 for a partial example).

Figure 4. View summary (partial)

Equation documentation

Each equation is displayed showing what module, group, and type it belongs to (see Figure 5) and all its associated information: name, units, range (when defined), formula, description, views in which the variable is used, whether source information has been defined for it, and what other variables in the model use it.

Figure 5. Equation description

In the case of a lookup function, the description includes the graphical representation of the tabular data, as shown in Figure 6.

Figure 6. Equation description: lookup function

This type of model documentation allows users to efficiently navigate the model structure by simply clicking on the variable names on the right-hand side of the equations or on the links to the variables that use the documented variable as input (“used by”). In this way, model transparency is achieved because all details of the formulation can be quickly examined, even in very large models. In addition, in the by-view grouping (see Figure 7), the user can see a diagram of the model structure[5] followed by the equations that belong to that part of the structure, as suggested by Sterman (2000, p. 856); this pairing enables an important cognitive process, allowing for a thorough inspection of the model using all available sources of information.

Figure 7. Equation description: view graphics (partial)

Conclusion

While the SDM-Doc development at Argonne started as a way to easily navigate model structure (providing hyperlinks to equation components and other features), it has evolved to automate some basic model tests. The tool now also provides a powerful overview of model structure, leading to enhanced model clarity and transparency. Judging models based solely on inputs and outputs is not enough to promote deep understanding of complex systems. The idea behind the development of the tool was to help modelers move from opaque models—“black-box” models—to as close as possible to completely transparent models—“glass-box” models.

Although we may never be able to measure model transparency in an absolute way, and although not every modeler and decision maker will necessarily measure, or know how to value, improved model transparency, we believe that enhanced documentation and assessment practices will increase the transparency of models, and that increased transparency will improve model quality and usefulness. In addition, as model transparency increases, model users will be able to understand the model structure and use the results more quickly, and models will become more open to objective evaluation (than their opaque alternatives), allowing comparison with other models and possibly leading to their use for further development and understanding (i.e., as benchmarks and/or building blocks). Increased transparency enables model builders to develop explicit, common definitions of model constructs—leading to quality peer review of models, increased trust and confidence in model results, and an improved ability to communicate the importance of modeling results.

Increased model transparency also enhances the clarity of model assumptions, the traceability of relationships among model variables, the understanding of data used in the model (its sources, boundaries, and uses—parametrization and calibration), the understanding of model components at different levels (e.g., equation, view, group), and the understanding of the model as a whole. As models become more transparent, model builders and the decision makers who use the models can better understand the components of the models, resulting in an increased understanding of the structure of the problems they face and in less unbounded (i.e., “blind faith”) reliance on models and their output.

Finally, increased model transparency also leads to a higher level of modeler accountability and helps remove barriers to understanding and reproducing model results, potentially leading to enhanced insights, model reusability, and adaptations and extensions of models to address related issues. Model transparency, then, is not only about being able to see inside our models as through a glass box, but also about how the inside of that glass box is organized, making what is behind the glass more easily comprehensible.

The SDM-Doc reports on model components and complexity (e.g., variables, number of stocks, time units) and tests the accuracy of model documentation (e.g., variable usage, warnings, potential omissions, equation documentation, simulatability). By automating these tests, we believe the tool will help the modeling community improve its internal documentation standards, leading to an improved model development process. By providing an overview of model size and complexity, as well as supporting multiple perspectives on the model, we believe the tool provides important support for model development, analysis, and eventual understanding.

We hope to be able to continue expanding the model assessment capability as additional tests are identified and vetted by the community, and are delighted to provide the tool as open source and encourage other groups to develop the functionality for other system dynamics modeling platforms.

Notes

  1. All screenshot examples of documentation presented in this paper come from the Oliva and Sterman (2010) model.
  2. If the .VPM file is not available, the tool will not be able to display graphics of the model views.
  3. The views' hyperlinks are always active.
  4. This option is still under development and will use metadata provided by the modeler and other heuristics to determine the order of display.
  5. When the .VPM file is available.
