WRF‐TEB: Implementation and Evaluation of the Coupled Weather Research and Forecasting (WRF) and Town Energy Balance (TEB) Model

Urban land surface processes need to be represented to inform future urban climate and building energy projections. Here, the single layer urban canopy model Town Energy Balance (TEB) is coupled to the Weather Research and Forecasting (WRF) model to create WRF‐TEB. The coupling method is described generically, implemented into software, and the code and data are released with a Singularity image to address issues of scientific reproducibility. The coupling is implemented modularly and verified by an integration test. Results show no detectable errors in the coupling. Separately, a meteorological evaluation is undertaken using observations from Toulouse, France. The latter evaluation, during an urban canopy layer heat island episode, shows reasonable ability to estimate turbulent heat flux densities and other meteorological quantities. We conclude that new model couplings should make use of integration tests as meteorological evaluations by themselves are insufficient, given that errors are difficult to attribute because of the interplay between observational errors and multiple parameterization schemes (e.g., radiation, microphysics, and boundary layer).

Coupled numerical weather prediction (NWP) and urban land surface models allow a diverse range of urban climate phenomena to be studied (e.g., Best, 2005;Chen et al., 2011;Hamdi et al., 2012). While NWP models simulate the prevailing meteorological conditions at kilometer resolution, land surface models (LSM) parameterize subgrid surface processes that are too small-scale, or (currently) too complex, to be explicitly modeled. Urban land surface models (ULSM) or urban canopy models (UCM), aim to capture the urban form (and sometimes function) created by buildings, roads, and vegetation. be solved for individual facets (e.g., roof, walls, roads) with different levels of complexity (Grimmond et al., 2010. A coupled NWP-UCM may treat the surface as a single vertical layer (single layer UCM), where the entire urban canopy layer is collapsed into a single point, or as multiple layers (multilayer UCM), where the UCM is "immersed" in the NWP to account for the interaction between bluff-bodies (e.g., buildings) and the atmosphere (Chen et al., 2011). The assumptions and simplifications can also vary from treating buildings as being arranged to create infinitely long street canyons (e.g., Kondo et al., 2005, Kusaka et al., 2001Martilli et al., 2002;Masson, 2000), or as cuboids (e.g., Mills, 1997).
Complex models may not perform systematically better than simpler ones (Grimmond et al., 2010. However, simpler models tend to lack features, thus limiting the study of specific urban climate processes (e.g., estimation of building energy consumption or details of it such as air conditioners (AC) energy demand, energy production from solar photovoltaic panels (PV)), which may be of interest to the broad urban climate community.
Here we both outline a technical approach to couple and verify model components and link TEB with WRF to add to other UCM options already available in WRF (UCAR, 2020). To date, these are: bulk urban parameterization within the Noah-LSM (Chen & Dudhia, 2001;Ek et al., 2003), single layer Urban Canopy Model (SLUCM Chen et al., 2011), and the multi-layer Building Effect Parameterization (BEP Martilli et al., 2002) with optional Building Energy Model (BEP+BEM Salamanca et al., 2010). Given the conclusions from (Grimmond et al., 2010 model comparison, we do not expect WRF-TEB to perform systematically better than other models currently available in WRF but we have undertaken this work to (a) offer researchers and practitioners a greater number of features currently unavailable in other models (section 2.2), (b) simplify the evaluation of offline and online TEB in future research and comparison projects (section 4.1), and (c) simplify the integration with future TEB developments (section 4.1).
We describe the coupling between WRF and TEB (section 2) both conceptually (section 3) and technically (section 4) in a way that may be generalizable beyond the scope of WRF (or WRF-TEB). We release the complete source code, data, and tools to make our results reproducible (section 5) and evaluate the model (section 6) with a technical integration test (section 6.3) and meteorological observations (section 6.4).

Weather Research and Forecasting (WRF)
WRF is a popular atmospheric model used in research and NWP applications (Powers et al., 2017). It has been developed under two variants: the Advanced Research WRF (ARW Skamarock et al., 2008Skamarock et al., , 2019, and the Nonhydrostatic Mesoscale Model (NMM Janjic et al., 2001;Janjic, 2003). The support for the latter recently ended (see Developmental Testbed Center, 2018). Here, we exclusively refer to the WRF-ARW variant (as WRF).
WRF-TEB is developed using WRF-CMake version 4.1.5 , 2020 Table 1) as it adds CMake (Kitware Inc., 2019a) support to the latest versions of WRF to simplify the configuration and build process of WRF and WPS (WRF Preprocessing System). Although, WRF-CMake version 4.1.5 does not Table 1 Models and Software Used in the Coupling, Where "Model" Refers to the Science (i.e., as Outlined in the Literature), "Software" Refers to the Actual Software and "Version" the Exact Software-Version Used in Running a Simulation  Masson et al. (2020) include support for WRF-Chem (Grell et al., 2005), WRF-DA (Huang et al., 2009), WRFPLUS (Guerrette & Henze, 2015), or WRF-Hydro (Gochis et al., 2018), its benefits may outweigh these limitations to model developers, code maintainers, and end-users wishing to build WRF, as it includes: robust incremental rebuilds, dependency analysis of Fortran code, flexible library dependency discovery, integrated support for shared (Open Multi-Processing; OpenMP) and distributed (Message Passing Interface; MPI) memory, support for automated testing using continuous integration (CI), and availability of experimental prebuilt binary releases for Linux, macOS, and Windows from the project's GitHub page or through the integration with GIS4WRF , a QGIS (QGIS Development Team, 2019) toolkit for preprocessing and postprocessing, visualizing, and running simulations in WRF. Here we refer to both the physical model and the software (i.e., WRF-CMake) as WRF, unless highlighting specific software features.

Town Energy Balance (TEB)
The physically based single-layer UCM TEB (Masson, 2000) characterizes cities by their surface area of building roofs, walls, roads, and integrated vegetation using a simplified infinite street canyon geometry. The energy balance of impervious and pervious (vegetation) surfaces are calculated independently before being aggregated. To characterize the urban area, TEB requires a surface fraction of vegetation/garden, building, and road area, building height and vertical to horizontal surface ratio. For the calculation of shadowing effects and radiative trapping, the street orientation is assumed isotropic.
The outer surface of each facet is assumed to be sufficiently thin that the layer-averaged temperature can be used to determine the radiative and turbulent surface flux densities (i.e., the impervious skin temperature equals that of first layer-averaged temperatures). Thermal diffusion into materials is calculated using the thermal properties and thickness of the specified layers. The momentum flux is calculated for the whole canopy using a representative roughness length of the city (at model grid point scale), whereas thermal and hydrological fluxes for impervious areas are computed using an aerodynamic resistances network that considers local energy exchange within and above the canyon. Turbulent exchanges inside the urban canyon, and those between the canyon and the atmosphere above, depend on an aerodynamic resistances network with exchange coefficients that depend on wind speed and stability conditions (see Figure 1 in Lemonsu et al., 2004). Other TEB original features include the following: a water reservoir on roofs and road, and a snow mantel on roofs and roads (Masson, 2000), but more recent TEB developments now also include the following: 1. Building Energy Model (BEM Bueno et al., 2012): internal building energy balance (indoor air, floor, and internal mass), windows, heat-ventilation-air-conditioning (HVAC), infiltration, shading devices, and natural ventilation (opening of windows). 2. Road orientation (Lemonsu et al., 2012): specified road orientation, and separate energy balance for adjacent walls. 3. Gardens (Lemonsu et al., 2012): vegetation inside canyons. 4. Green roofs (de Munck et al., 2013). 5. Human behavior related to building energy consumption (Schoetter et al., 2017). 6. Calculation of urban carbon dioxide fluxes (Goret et al., 2019). 7. Irrigation (de Munck et al., 2013): irrigation of green roofs, gardens, and watering of roads. 8. Solar panels (Masson et al., 2014) for hot water and/or photovoltaic (PV).
To implement WRF-TEB, TEB version 4.0.1  Table 1) is used as it includes MinimalDX (Meyer & Raustad, 2020) to improve the modeling of air conditioners (AC), and Psychrolib (Meyer & Thevenard, 2019) to calculate psychrometric functions. Furthermore, support for Linux, macOS, and Windows with CMake allows direct integration in WRF-CMake. In TEB 4.0.1, Features 3 and 4 use a simplified vegetation scheme with a fixed albedo and Bowen ratio, whereas Lemonsu et al. (2012) treats the vegetation by coupling to ISBA (Interaction Soil Biosphere Atmosphere; Noilhan and Planton (1989); see simplifications in section 3). Furthermore, feature 5 and 6 are not available in TEB 4.0.1.

Coupling Approach
TEB is coupled to WRF following the generalized coupling methodology described in Best et al. (2004), where atmospheric quantities from the NWP's lowest model level are passed to the LSM to improve the 10.1029/2019MS001961 Journal of Advances in Modeling Earth Systems calculation of surface fluxes (Best et al., 2004). The current implementation of WRF-TEB is designed to work in both WRF (UCAR, 2020) and WRF-CMake  alongside the Yonsei University (YSU) planetary boundary layer (PBL) scheme (Hong et al., 2006;Noh et al., 2003), the revised MM5 Monin-Obukhov surface layer scheme (Jimnez et al., 2012), and the Noah-LSM as they have been shown to perform reasonably in recent comparison studies for several types of environments (Greve et al., 2013;Hari Prasad et al., 2016;Hu et al., 2010Hu et al., , 2013Shin & Dudhia, 2016;Xie et al., 2012).
The general workflow used in WRF-TEB is as follows: (1) for each grid cell defined as urban, and each time step, WRF provides TEB with atmospheric and model-specific quantities from the lowest model level (Table 2a), and site-specific characteristics such as grid cell spatial coordinates and surface parameters (e.g., building height and roof albedo) from a lookup table in WRF (URBPARM.TBL).
(2) From these, TEB computes area-averaged surface quantities (Table 2b) and passes them to WRF (Figure 1). (3) Surface diagnostics (Table 2c) are calculated in TEB and passed directly to WRF as outputs without affecting calculations in the WRF dynamical core.
As with other TEB features (section 2.2), anthropogenic heat flux options in WRF-TEB are specified in the URBPARM file and modeled as follows: (i) traffic and industrial heat fluxes are user specified, (ii) heating Mass mixing ratio of water vapor kg kg

O O
Note. Full list of parameters are reported in URBAPARM.TBL. a WRF atmospheric surface pressure is assumed to be the same as the atmospheric pressure at the roof level in TEB due to the coupling assumption (see h 0,TEB and h 0,WRF ; Figure 1). WRF-TEB assumes the atmospheric pressure to be the uniform throughout the canyon. Given the different nature of the two models, with atmospheric equations solved explicitly (WRF) and processes parameterized (TEB), several assumptions and simplifications are made: In TEB, the urban canopy layer (UCL) is represented as a point. This results in a mismatch in elevation between TEB and WRF (i.e., h 0,TEB and h 0,WRF ; Figure 1). As with other single-layer UCMs, h 0,WRF is located at the mean building height (given in URBPARM.TBL). This means that the urban canopy layer is located below the surface of the NWP model. This is an important assumption of any single-layer UCM and rarely explicitly stated. This assumption may be acceptable if buildings are low to midrise (e.g., <20m) and of uniform height, such as typical of extensive suburban areas. However, areas with tall and/or variable height roughness elements (e.g., skyscrapers), as in central business districts in many cities worldwide, may not suit the use of a single-layer UCM. Depending on the user's needs and model configuration, outputs from WRF-TEB may require postprocessing to account for this assumption.
Surface diagnostics in WRF are given at standard World Meteorological Organization weather station heights (i.e., 10m for wind, and 2m for air temperature and humidity), thus representing quantities below the mean building height in the case of single layer UCMs. In WRF-TEB, we do not calculate these explicitly but simply rely on TEB's surface diagnostics (i.e., representative at half building height; see Masson, 2000). Given the single-layer assumption, WRF-TEB's surface diagnostics are, in effect, representative at half the building height below the first model level.  Table 2a) are passed to TEB. In turn, TEB computes area-averaged surface quantities such as net all-wave radiation flux density (Q * ), and turbulent sensible and latent heat flux density (Q H and Q E ; Table 2b) and passes them to WRF at the surface (t n+1 ). Under the single-layer UCM assumption, the vertical extent of the atmospheric model extend from h 0,WRF to the top of the atmosphere (TOA) whereas the UCM is assumed below the ground. This therefore creates a mismatch between the real ground level h 0,TEB and the ground level as seen by the atmospheric model h 0,WRF . model simulation. The temporal evolution of soil temperature, surface water storage or vegetation is not represented. The approach of Lemonsu et al. (2012) to simulate in-canyon vegetation with the ISBA model is expected to be more universally applicable than the approach implemented in the current WRF-TEB. The default option in WRF-TEB (b) is to calculate the surface energy balance separately for urban and vegetated areas and to aggregate the fluxes. For the vegetated areas, the Noah-LSM cropland (MODIS class 12 in VEGPARM.TBL) class is assumed, whereas for built areas, TEB is used assuming no vegetation. In (c), users define the fraction of vegetation to model as integrated and nonintegrated respectively.

Software Implementation
The implementation of model to software and its testing are critical aspects of any model development. In WRF-TEB (Figure 2), the data flow and sequence of Fortran subroutine invocations is similar to existing couplings (e.g., SLUCM, BEM) and can be coupled serially as parallelization will be inherited from WRF over the number of grid cells in the domain(s). The main differences are in its code structure, implementation, and location. Although a generalizable framework (Common Community Physics Package; CCPP) is currently being developed (see Developmental Testbed Center, 2019), WRF does not yet provide a clear extension or plugin mechanism for integrating external models. Here we summarize the main techniques and issues encountered during its development that may aid other future meteorological model developments: 1. Modularity: As existing coupled models are tightly interwoven into WRF and partly modified, deviations from the original code can be hard to detect. By treating models as libraries we increase modularity. 2. Clarity: As different couplings are typically in one large Fortran file or subroutine, understanding existing coupling is difficult. This is complicated further by reuse of some variables and parameters. By separating the coupling code, using consistent naming conventions, and introducing a small amount of duplication, we aim to increase clarity and reduce the time to undertake the coupling. 3. Reliability: WRF allows simulations to run even when required inputs are omitted, resulting in uninitialized values that may change results. By using a stricter input validation, we increase reliability of model output.

Modularity
Copying the coupled model's source code (e.g., TEB) directly into the WRF codebase is likely to cause code fragmentation, ultimately hindering model development, collaboration, and (possibly) the formation of a strong coherent community around that model. Once coupled in this way, any improvements, bug fixes, or other changes to the ULSM are inherited from the model source code repository (e.g., on GitHub) and included in WRF with a new commit (or version) identifier that is downloaded during the build process. This provides a central location for "issues," maintains the community around the model, and facilitates the creation of offline tests (e.g., on units or individual components, or end-to-end integration tests of the whole model).
From this, a natural step is to make use of freely available Continuous Integration (CI) services to automate the execution of such tests on new commits on different operating systems and using multiple compiler versions and options, thus providing a stronger sense of reproducibility and trust. Where an appropriate testing methodology exists, CI can also enable model developers to accept code contributions from the community with more confidence.

Clarity
WRF/urban models (e.g., SLUCM, BEM/BEP) often reuse a subset of variables, parameters, and parts of the coupling "glue" code. Problems can arise as WRF allows each model to declare which state variables should be allocated in memory (via "schemes" in its registry). As the glue code for all models of one type is typically in a single large Fortran file (e.g., module_sf_noahdrv.F) with many conditional branches (e.g., IF(SF_URBAN_PHYSICS == 1) THEN …) that is run irrespective of which model is activated, a model code contributor must understand most of the glue code to not introduce unintended memory issues. In the best case, using a variable that is not allocated leads to program crashes. In the worst case, it may lead to reading from, or writing to, other variables that are nearby in memory. In such case, the program may not fail but simply change results, possibly without the user being aware.
Here, we separate the configuration and coupling of TEB as much as possible from other models to improve clarity and reduce the time needed to understand the coupling. The prefix TEB_ is added to state variables, array dimension names, and parameters used by TEB, while as much glue code as possible is moved into the TEB coupling module (module_sf_teb.F; Figure 2). Without shared variables and parameters both the coupling and model can be understood in isolation and evolve independently from other models, hopefully encouraging community contributions. Despite duplicating state variables and parameters between urban models, we believe the benefits outweigh the disadvantages. . The data archive (Meyer, 2020) contains all data and code used to evaluate the models. Users wishing to reproduce our results can download the data archive and run Singularity on their local or remote systems. As small differences in outputs may be possible because of the different hardware used, model outputs are also provided as reference.

Reliability
WRF checks the validity of some, but not all, user inputs. Unfortunately, when an urban parameter is unspecified, unexpected simulation results can occur as uninitialized memory values are used instead of an error being raised. These errors may be hard to trace or can go undetected.
To solve this we separate parameter reading (in module_sf_urban.F) into three phases: (a) initialize all parameters with a known out-of-bounds value, (b) read the user-supplied parameters (from URBPARM.TBL), and (c) check that parameter values are not equal to the out-of-bounds value.
Missing parameters cause an error message with the parameter name provided and stop WRF, thus saving time by early detection. Additional improvements could include checking value ranges for each parameter.

Scientific Reproducibility
Issues of lack of scientific reproducibility have been noted by several authors in various disciplines (e.g., To achieve a reasonable level of scientific reproducibility, several aspects must be considered. In the current context we identify: CPU (micro) architecture, operating system (OS), compiler vendor, compiler version, compiler options, external library versions, source and version of data sets, preprocessing and postprocessing steps of data, model version and configuration, and plotting routines.
One way to achieve a reasonable level of scientific reproducibility is to automate the generation of results included in this article by using a combination of Shell and Python scripts run through a Singularity container . This approch remains architecture dependent but provides full control over OS (i.e., Ubuntu 18.04 in our case), compiler vendor (i.e., GNU in our case), source of data sets, processing steps for data, model version and configuration, and plotting routines. Although this level of reproducibility may be deemed sufficient, the container would still rely on the Ubuntu package repository to install the compiler and external libraries without defining exact versions, or to download the "latest" (un-versioned) WPS high-resolution geographical data set (from the UCAR website). This can therefore lead to downloading newer compiler, library, or data set versions when re-building the Singularity image and ultimately alter results included in this paper.
Here, we achieve a reasonable level of scientific reproducibility by archiving all tools, data (including our results for reference, as equal results can only be guranteed on the same hardware), and software together with a Singularity image containing OS and external libraries to Zenodo (Meyer, 2020; Figure 3). By doing this we remove the need for duplicating configuration settings in tables or appendices, thus reducing accidental errors and allowing reproducibility on local or high-performance computing (HPC) systems. Users wishing to reproduce the results described in this paper can download the data archive (Meyer, 2020) and run Singularity on their local or remote systems (Figure 3).

Model Evaluation
A fundamental aspect of any software development is testing. Although neither WRF nor TEB have been developed with testing in mind, in this section we outline tests for: the coupling (integration test, section 6.3) and meteorological evaluation (section 6.4). The former assesses the coupling technically, while the latter is used to explore the scientific benefit of the coupling. Both tests use similar model configurations (section 6.2) with meteorological and geographical data for Toulouse, France (section 6.1).
During the CAPITOUL (Canopy and Aerosol Particles Interactions in TOulouse Urban Layer Masson et al., 2008) campaign, a Gill HS 50 sonic anemometer for eddy covariance (EC), and other meteorological sensors, were mounted on a tower at 48m to 27m a.g.l. (depending on local wind conditions). EC sensors should be above the roughness sublayer to observe local-scale rather than microscale (e.g., individual urban obstacles) fluxes Roth, 2000). The Goret et al. (2019) analysis of observed momentum fluxes confirms that they are located in the inertial sublayer (constant flux layer).
EC measurements (sampled at 50 Hz) postprocessing includes double-rotation (azimuth and pitch correction), recursive filtering according to McMillen (1988) with filter parameter set to 200s, prior to 30min covariance flux calculation (Pigeon et al., 2007). The EC flux footprint, calculated using the Kljun et al. (2015) model for each 30min interval, identifies that the probable mean 80% fetch extends to around 500m in all wind directions, except for southerlies where it extends to 1km (Goret et al., 2019). Given the homogeneous characteristics within 500m radius of the tower and areas further south, we assume that the observed turbulent fluxes are comparable to the modeled turbulent fluxes (horizontal grid spacing 1km; Figure 4, d4).
Radiation fluxes observed with a Kipp and Zonen CNR1 radiometer (sample rate 0.1Hz) mounted at the tower top are averaged to 1min (used to force the offline TEB) and to 30min (used in the meteorologicalevaluation). Air temperature and relative humidity measured with a Vaisala HMP233 Thermo-hygrometer (sample rate 0.1Hz) at 43.3, 34.2, and 25m a.g.l. when the mast was in the high, medium, and lowest position (respectively) are used as 1 and 30min averages. The atmospheric pressure measured with a Vaisala PTB220 class A barometer (sample rate 0.1Hz) at 20m a.g.l., is used as 1min average. Missing data are gap-filled every 1min using observations from the routine observation station at Toulouse-Blagnac airport (7km northwest of the tower) or a station operating at the site of Météo-France (flat grassland 6.5km west southwest of the tower). The temperature and relative humidity values measured at these stations (2m a.g.l.) are corrected by the average daily cycle of the differences between the values measured at the mast and at these sites. Wind speed measured at these stations (10m a.g.l.) is corrected to the height of the mast assuming a logarithmic wind profile and neutral stratification. The values of aerodynamical roughness length and displacement height are 1.5 and 10.5m, respectively.
The evaluation is undertaken between 2 and 5 July 2004 when the tower was at 48m a.g.l. (28m above roof height). A strong urban canopy layer heat island was present before sunrise on 4 July 2004 (Hidalgo et al., 2008). An offline TEB simulation is forced with the required meteorological data and the surface morphological parameters averaged for the area within in a 500m radius from the tower (see Figure 1 in Goret et al., 2019 anddata in Meyer, 2020).

Model Setup
Four nested domains (Figure 4) are set up using GIS4WRF version 0.14.2  with the innermost domain (Figure 4, d4) centered on the EC tower. The grid spacing is set to 1km for the horizontal and to 66m (increasing with height and in pressure (η) level equivalent) for the vertical, thus allowing equal comparison with observations (i.e., under single layer UCM assumptions (section 3). The 48m a.g.l. EC tower is represented at 33m a.g.l., because the vertical extent of the buildings (mean building height 15m) is not represented in WRF (see single-layer UCM assumptions in section 3). The WPS MODIS land use (UCAR, 2019) and WRF urban fraction for the innermost domain are modified using GIS4WRF:

10.1029/2019MS001961
Journal of Advances in Modeling Earth Systems (a) lake (MODIS class 21) used to indicate the River Garonne within central Toulouse is reassigned to cropland (MODIS class 12) as the river would otherwise be over represented in the 1km grid (the river is ≈ 200 m wide), and (b) the urban fraction (i.e., constant for the whole grid for pixels defined urban; MODIS class 13, i.e., built) is replaced with that from the MApUCE (100m resolution) data (Bocher et al., 2018) linearly interpolated to 1 km to provide spatial variability within the domain. Grid cells classified as urban in the MODIS 30 arcsec data set but not present in MApUCE (Figure 4, d4 yellow) are given a default fraction of 0.15 to be representative of the area. For the initial and boundary conditions we used ECMWF Cycle 28r2 analysis (ECMWF, 2015) with native horizontal grid spacing TL511 (≈ 40km), gridded to 0.36 arc-degree.
Domains are generated using WPS-CMake version 4.1.0 . WPS/WRF are configured with parameters from Table 3 (see namelists in configs/wrf/capitoul in Meyer (2020) for the complete configuration). TEB (offline and online) surface characteristics are derived and adapted from Lemonsu et al. (2004) and Schoetter et al. (2017). To reduce the computational and storage cost during the integration test, we only run the outermost two domains with a longer simulation timestep (Table 3). Other differences in configuration settings between integration test and meteorological evaluation are reported in Table 4.

Integration Test
The model coupling is validated using an integration (i.e., end-to-end) test. Although neither WRF nor TEB carry out unit tests on their components, we assume that each on their own is working correctly. In this test, variables passed in the coupling (Table 2a) and TEB specific output variables (Tables 2b and 2c) are compared between offline (i.e., TEB) and online (i.e., WRF-TEB) models ( Figure 6). As the inputs required to force TEB are not provided as standard WRF outputs, we introduce new variables in WRF's registry. To avoid permanently allocating memory for the additional 11 variables (Table 2) a new registry package (teb_test) is enabled through the namelist configuration option teb_integra-tion_test=1. Thus, only the necessary variables are allocated without performance overheads when tests are not conducted, and further testing can be performed when new software releases are available. Any difference larger than machine precision is attributed to coupling implementation errors. Results are evaluated graphically and statistically (metrics Appendix A).
The implemented test detects errors that cause incorrect 1. loading of parameters from the urban parameter table file; 2. passing of parameters to TEB; 3. conversion of date/time from WRF to TEB conventions; 4. passing of geographical and date/time coordinates to TEB; 5. passing of TEB-internal state variables; 6. storing of TEB-specific output diagnostics; 7. updating of WRF state variables from TEB outputs; 8. grid cell looping; and 9. activation of TEB based on grid cell vegetation type and global scheme selector.
These errors can lead to software crashes or nonidentical results between TEB offline and WRF-TEB. With offline TEB forced with data from the WRF dynamical core quantities converted prior to use in TEB are not assessed. For example, as WRF uses mixing ratio whereas TEB uses specific humidity, it must be converted Vertical grid spacing increasing with height (h) and first level (L1) set to 66m a.g.l. See namelists in configs/wrf/capitoul in Meyer (2020) for the complete list of options used. a ECMWF (2015), b Dudhia (1989), c Mlawer et al. (1997) d Hong et al. (2004, e Hong et al. (2006) f Jimnez et al. (2012), g Chen and Dudhia (2001) h  , i Masson et al. (2020) for TEB. Conversion errors would propagate in TEB and be evident in this comparison. Meteorological evaluation is undertaken separately (section 6.4).
The regular WRF output are directly comparable to offline TEB outputs, as the relevant WRF quantities are not modified further. Modifications are prevented by setting the test grid urban fraction to 100%. As this may be inappropriate for other submodels, the same method as used to obtain TEB forcing could be used. However, it may not be easy to test if the output quantities are correctly passed without using other techniques.
This testing approach ( Figure 6) requires a common calling method of the offline model (TEB). As TEB can be compiled both as a library for online use and as an executable for offline use (Figure 6), the integration test tool (Figure 6) can run both TEB and WRF-TEB with the same source code. Thus, there is a strict testing of the coupling. The CCPP effort (Developmental Testbed Center, 2019) aims to organize models (/schemes) in a central location independent of a target framework. This may solve similar issues in the future. Unfortunately, it is not (yet) ready for use within WRF, or as an offline model and coupling testing tool.
Although a single configuration cannot represent all the degrees of freedom defined by the different model options and input parameters, we activate as many TEB options as possible (Table 4). Results show no visible differences between TEB and WRF-TEB (Figure 7). Similarly, no errors are detected using statistical metrics (NRMSE = 0%, Figure 7).

Meteorological Evaluation
With the coupling code verified (section 6.3), a meteorological evaluation allows the scientific benefit of the coupling to be explored. Given the wide range of WRF options (Powers et al., 2017) the individual choices (e.g., radiation, microphysics, and boundary layer) may have a larger impact than the UCM selected. Here, the evaluation is focused on TEB.
For the evaluation period (2-5 July 2004) the net all-wave radiation flux density (Q * ) is simulated well by TEB but only moderately well by WRF-TEB (Figure 8a; Table 5). Unsurprisingly, TEB forced with observations has a lower mean absolute error (MAE) for Q * (MAE ≈ 7.7W m −2 ; Figure 8a) than when forced with   Table 5). This difference is most likely caused by the cloud microphysics scheme, which simulates too much cloud overnight (2 and 3 July) and the next morning (3 July). This leads to an overestimation of Q * during the night and an underestimation during ; (d, f), 20m a.g.l. (e). Atmospheric pressure corrected for height by linearly interpolating between surface pressure and first model level (center) to measurement height at roof level because of the assumption used in single-layer UCM. All other quantities are uncorrected as changes would be minimal.  Table 5).
Dry-bulb air temperature (T) at 48m a.g.l. (28m above roof level) is generally underestimated by WRF-TEB (mean bias error (MBE) ≈ −1.2 K; Figure 8d; Table 5) for the whole period and during the day, but slightly overestimated at night. Such underestimation requires further investigation but may be caused by other WRF processes (e.g., too low advected temperature). Mass mixing ratio of water vapor (r) is overestimated (MBE ≈ 1.4g kg −1 ; Figure 8f). Pressure (p) and wind components (u, v) are simulated reasonably well (MAE ≈ 0.5Pa and 1.4m s −1 , respectively; Figures 8e, 8g, and 8h; Table 5), indicating that WRF has captured the general atmospheric dynamics.
Overall, the choice of vegetation scheme used (i.e., WRF-TEB with Noah-LSM (WT-N) or with Bowen ratio (WT-B)) results in similar simulation performance (Table 5, Figure 8).

Concluding Remarks
The coupled WRF-TEB model enables a wide range of urban climate processes to be analyzed. In this paper, we describe techniques to help with the coupling approach, implementation, verification, and scientific reproducibility.
In implementing the coupling interface, we do not alter the current WRF framework but, instead, implement techniques to help with software modularity, clarity, and reliability, for example, treating TEB as an external library. We assess the software linkage with an integration test to ensure that the coupling is technically correct. The results of the integration test show no detectable differences with the offline TEB. The meteorological evaluation is used to confirm that the results are physically reasonable; these generally show reasonable agreement with net all-wave radiation and turbulent heat flux densities and other near-surface observations. Although improvements of surface fluxes and near-surface meteorological quantities may possibly be gained from using alternative parameters or parameterization schemes (e.g., microphysics, radiation) when configuring WRF, the interplay of these make attribution difficult. Furthermore, errors may arise from differences between observational source area (e.g., eddy covariance) and model grid length; parameter specification and uncertainties (e.g., from lack of availability, difficulty of "measurement" and theoretical understanding). This highlights the importance of undertaking both integration tests and meteorological evaluations.
Scientific reproducibility is addressed by providing model source code, configurations, data, and scripts with a Singularity image deposited on Zenodo. The coupled WRF-TEB model has been integrated into WRF and WRF-CMake and released as a free, open-source software on GitHub at https://github.com/tebmodel/wrf-teb.
We encourage future versions of WRF to include the implementation of a flexible number (i.e., beyond three) of urban classes to allow for a greater heterogeneity of urban form and function to be represented.

Appendix A: Evaluation Metrics
The vector of differences d = (d 1 ,…,d i ) between two vectors x a and x b of paired quantities x a,i and x b,i is defined as The mean bias error (MBE), mean-absolute error (MAE), and normalized root-mean-square error (NRMSE) for a time series of times 1,…,N are defined as where x a is the arithmetic mean of x a and the RMSE is defined as

Data Availability Statement
Data for this paper are available in Meyer (2020).