Depth camera indoor mapping for 3D virtual radio play

By combining game engines with indoor mapping, it is possible to create interactive virtual environments that represent the real world. In this paper a virtual version of an audio installation in a historic building is produced, where the user freely explores the building and experiences a set of audio clips, creating a virtual radio play. A depth camera indoor mapping system, Matterport, captured a set of staged interiors. The aim was to evaluate the quality and usability of the indoor models and to demonstrate applying them in 3D application development. The quality of the models was evaluated by comparison with laser scanning, revealing limitations with Matterport: increasing the size of the measured area lowered the level of detail and accuracy of volumetric models. The quality of textures was not affected by this limitation, maintaining the appearance of models. To obtain optimised models for mobile 3D applications, a manual revision process was applied.

Introduction
Storytelling and experiencing narratives is a fundamental cultural action, serving social, cognitive, emotional and expressive functions. While narratives are a mainstay in artistic genres, such as literature, theatre and cinema, they have also been explored in the context of interactive virtual systems (Roussou, 2001). Psychologically, the use of virtual environments in simulation relies on the assumption that the virtual environments triggering our senses (Allen and Madden, 1985) are perceived, and responded to, in the same manner as the real world (Arthur et al., 1997). Therefore, increasing the visual resemblance of the virtual environment should also increase the resemblance of the reaction triggered by the virtual experience. Consequently, the use of a virtual 3D environment as a pedagogical platform and interface between human sensory inputs and behavioural outputs may intensify both mental and physical learning processes (McGann, 2015). However, it has also been found that the presence and quality of a narrative play a significant role in the depth of the virtual experience (Gorini et al., 2011). The immersion in virtual environments appears to be formed as a combination of sensory stimulation and narrative.
Game engines (for example, Trenholme and Smith, 2008) have been extensively used to create interactive virtual environments. Application areas include gaming and entertainment, urban planning (Herwig and Paar, 2002; Greenwood et al., 2009), landscape visualisation (Manyoky et al., 2014), construction (Shiratuddin and Thabet, 2011) and cultural heritage (Anderson et al., 2010). The term serious gaming has been used for applications containing game-like elements, such as tasked interaction (Bellotti et al., 2012), one of the most common applications being education (Arhippainen et al., 2011). Virtual 3D environments can mimic the real world, be completely fictive or mix reality with fiction.
Virtual environments used for simulation are typically realistic (for example, Heinrichs et al., 2008), while collaboration and education tasks have also been performed in fictive environments (for example, Arhippainen et al., 2011). In cultural heritage, virtual environments may combine current environments with historic reconstructions (Ogleby, 1999). Alavesa et al. (2014) also combine real and fictive elements in facilitating virtual storytelling.
Digital geometric models are needed for virtual environments. In the built environment, these may either be interiors or outdoor environments (Fritsch and Kada, 2004). In this area of interest, several methods have been developed to convert geographical information system (GIS) data and building models (Bille et al., 2014) into formats compatible with game engines. This allows creation of a virtual environment from existing data sources. Digital geometric models can also be made specifically for the virtual application. Alatalo et al. (2016) present the creation of a photorealistic city model as a platform for multi-user interaction, made by manual modelling. The problem is that accurate manual modelling of large environments requires substantial resources (for example, Fassi et al., 2011).
An alternative approach is to employ 3D measuring methods and automated reconstruction algorithms (Zhu et al., 2011). Accurate 3D reconstructions help to increase the accessibility of poorly reachable sites by producing a digital counterpart easily distributed over the Internet (Remondino et al., 2010; Schmidt et al., 2011). Indoor 3D mapping can be accomplished using several different techniques, including photogrammetry (Shao et al., 2015), depth camera (Henry et al., 2012) and laser scanning (Lee et al., 2013). While applications of virtual environments have been reported for several disciplines, for example, cultural heritage (Ogleby, 1999; Portales et al., 2009; Anderson et al., 2010), architecture and construction (Herwig and Paar, 2002; Fassi et al., 2011) and medical training (Heinrichs et al., 2008), the 3D models used have typically been produced by combining manual modelling techniques with 3D data (Fassi et al., 2011) or 3D reconstruction has been applied for producing partial models of larger targets (Portales et al., 2009).
Fewer papers address the use of automated 3D mapping to produce virtual environments of large indoor settings.
The aims of this article are: first, to evaluate the quality and usability of geometric indoor models produced with an indoor mapping system based on a depth camera; and second, to demonstrate a process for applying these models in mobile- and browser-based 3D application development.

Creating a Virtual Radio Play
In this paper, interdisciplinary cooperation is presented between researchers and the staff of the radio theatre of Svenska Yle, the Swedish language unit of Yle, the Finnish Broadcasting Company. Yle's tasks as an organisation include the creation of media productions utilising the possibilities offered by new technology. Consequently, Yle is increasingly carrying over its content and media offer to the Internet. The changes in media platforms used by consumers, especially the growing prevalence of mobile devices, have improved Yle's opportunities for also reaching small specialist groups. The aim of the cooperation was to combine online 3D platforms and indoor mapping technology with radio theatre content. A planned audio installation was used as a prototype case.
The production, entitled "Sounds from a room of souls" (in Swedish "Röster ur själarnas rum"), was staged in a historic mental institution building in Helsinki (Finland). Being one of the first buildings in Europe designed specifically to operate as a mental hospital, the neoclassical edifice was designed in the 1830s by Carl Ludwig Engel, with the main building completed in 1841. The hospital operated in the premises for over 160 years, after which the building has been adopted for other uses. Yle had plans to realise a staged audio journey in the facilities during the Night of Arts event in Helsinki in 2016. A set of interiors was to be staged as historical mental hospital facilities and installed with a sound system to play short audio stories describing the history of the building. The scenes of the play were based on documentary material, such as the journals of the Chief Physician, Anders Thiodolf Saelan, from the 19th century, as well as other documents from various eras (Kajander-Maavuori, 2016; Karjalainen, 2016; Rothberg, 2016).
To increase the accessibility and temporal duration of the installation (the event was to be open for only two days), a virtual equivalent was also produced. This consisted of 3D models of the staged interiors with audio clips in a game engine. The main focus was on the design and creation of the audio content, with the 3D environment used to support exploration and experience of the sound clips. This allowed the user to experience the audio stories while virtually exploring the environment, creating a so-called "virtual radio play". Indoor mapping was to be applied to create a virtual representation of the staged environment.

Image-based Reconstruction of Indoor Environments
Photogrammetry can be used to reconstruct both indoor and outdoor environments. Manual photogrammetric reconstructions, such as the iWitness approach of Wendt and Fraser (2007), are labour intensive. Due to the amount of manual work involved, they are not feasible for large and complex environments. Recent studies (Furukawa et al., 2009b; Georgantas et al., 2012; Xiao and Furukawa, 2012) show that automatic techniques, such as dense image matching, are suitable provided that the environment supports automatic feature matching, for example, using the scale-invariant feature transform (SIFT; Lowe, 1999). However, these automated methods are sensitive to errors caused by reflective or textureless surfaces (Furukawa et al., 2009a; Jancosek and Pajdla, 2011; Lehtola et al., 2014), even if the camera is pre-calibrated in a laboratory (Fig. 1). In addition, geometric constraints, such as narrow doorways, reduce achievable geometric orientation accuracy of the rooms to the common frame.
Usually, indoor environments have been reconstructed using an image sequence taken with a camera with a wide-angle rectilinear (non-fisheye) lens. To avoid the problems of external orientation, spherical panoramic images can also be applied if the projection centre of the camera is located at the rotation centre of the camera mount (Haggrén et al., 2004; Kauhanen et al., 2016). It is also possible to generate dense point clouds and textured mesh models from spherical panoramic images (Pagani and Stricker, 2011; Gava and Stricker, 2016). Panoramic image sequences can also be used to produce photorealistic virtual environments like Google Street View (Anguelov et al., 2010). In such cases, geometric information is needed for solving the relative positions of the panoramic images. This approach does not allow completely free movement of the user, as the images are only available from a limited number of positions. Increasing the number of image acquisition points allows moving with finer steps, but also adds to capturing work.

Laser Scanning of Indoor Environments
Laser scanning is employed either from a static instrument (terrestrial laser scanning; TLS) or from a moving platform (mobile laser scanning; MLS) (Thomson et al., 2013; Kaijaluoto and Hyyppä, 2015). In outdoor environments, most MLSs rely on global navigation satellite systems (GNSS). As these are not available indoors, other localisation methods are required (Lehtola et al., 2015). The concept of simultaneous localisation and mapping (SLAM) is increasingly applied in indoor MLS. Commercial indoor MLSs have emerged, ranging from light, highly portable devices to cart-mounted devices intended for highly efficient measuring of large indoor environments. Table I gives an overview of three different laser scanning systems, including a conventional TLS and two indoor MLSs. Laser scanning systems typically produce dense 3D point clouds. As most systems operate only using a single laser wavelength, imaging sensors have to be applied if coloured point clouds are to be obtained. Several laser scanning systems include integrated cameras for this purpose. For producing textured mesh models from laser scanning point clouds, additional processing is required. Some commercial services have emerged offering mesh model production from laser scanning point clouds (Autodesk, 2017; Sequoia, 2017), but a dense, coloured point cloud remains the most common end product of laser scanning.

Depth Camera Indoor Mapping Systems
The emergence of highly affordable consumer depth cameras, such as the Microsoft Kinect, stimulated considerable research into applying depth cameras to 3D modelling of indoor environments, especially for robot navigation applications (Henry et al., 2012). Depth camera systems are not well suited to outdoor use, as sunlight interferes with their sensors. The geometric quality of depth cameras has been studied and the results show that the random error of depth measurement ranges from a few millimetres up to about four centimetres (Khoshelham and Elberink, 2012). Following this, several commercial products utilising depth camera sensors have become available (DPI-8, 2017; Structure, 2017). In addition, depth camera sensors integrated into smartphones have been introduced (Tango, 2017). For a performance comparison of handheld systems ranging from consumer grade to professional instruments, see Kersten et al. (2016). Due to their limited field of view, single depth sensors are not ideal for mapping large indoor environments (Diakité and Zlatanova, 2016). For a more comprehensive 3D acquisition, it is possible to combine data from several sensors (Kim et al., 2008).
Matterport System. As a highly integrated commercial indoor mapping system, Matterport combines a 3D measuring instrument with dedicated cloud-based services. One of the intended applications of the system is efficient production of indoor marketing material for real-estate applications. To accomplish this, the system combines three depth camera sensors with red/green/blue (RGB) cameras. Primary products of the system are panoramic image walkthroughs, produced with the RGB cameras. In addition, a 3D model of the environment is produced, facilitating localisation of panoramic imaging locations and rendering an overview of the environment.
In operation, the measuring speed of Matterport is approximately 30 seconds per scan; the recommended spacing between scanning locations is 1.5 to 2.5 m. The weight of the scanner is about 3 kg, excluding the obligatory tripod. The processing of scans in Matterport is performed with a high degree of automation utilising a cloud computing service associated with the instrument (Matterport, 2017a). Once the scans have been uploaded to the service, an optimised alignment of the scans is solved and a textured mesh model created (Bell et al., 2013). The dimensional accuracy of the final model is estimated to be 1% (Matterport, 2017a). After processing, the 3D model with its texture images can be downloaded from the cloud.
While research exists comparing the performance of various depth-camera-based instruments (Kersten et al., 2016), the application of mobile devices with a depth camera for indoor mapping (Diakité and Zlatanova, 2016) and development of depth camera indoor mapping systems (Henry et al., 2012), very little research exists on the performance or application of the Matterport system.

Unity 5 Game Engine
The Unity 5 game engine (Unity, 2017a) was used in the project for developing the final application. It offers a comprehensive environment for development of interactive 3D applications for several platforms, including browser-based applications with WebGL and mobile applications, for example, on Android platforms.
For 3D content, Unity utilises mesh models that can be imported into the Unity editor in a number of formats (Unity, 2017b). Extensions that allow Unity to render point clouds have been developed (Unity, 2017d), but mesh models remain its core 3D content format. Ideally, the mesh models should have a limited vertex count and a metric scale, contain UV coordinates for texture mapping and be accompanied by 2D texture images of limited resolution (Unity, 2017b).
For developing mobile or browser applications, more attention should be given to optimising the applications in terms of memory use and performance (Unity, 2017c). This further reduces the usability of high-resolution bitmaps and high-polygon-count meshes, as well as possible shader complexity in application development.
In the presented development case, a memory limit of 2048 MB and the exclusion of advanced shader programs were imposed to allow sufficient performance on lower-end platforms. The texture files used were resampled to a resolution of 1024 × 1024 pixels, audio files were recompressed and converted to mono, and only diffuse shaders were used in the rendering.
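As a rough illustration of why texture resolution matters under such a memory budget, the following sketch estimates the uncompressed footprint of a square texture. The 4 bytes per pixel (RGBA) and the one-third overhead for a mipmap chain are assumptions made for illustration; the figures actually reached in a game engine depend on the compression format and import settings, which are not reported here.

```python
def texture_bytes(side_px, bytes_per_pixel=4, mipmaps=False):
    """Approximate uncompressed size of a square texture in bytes.

    bytes_per_pixel=4 assumes 8-bit RGBA; this is an illustrative
    assumption, not a setting taken from the project.
    """
    base = side_px * side_px * bytes_per_pixel
    # A full mipmap chain adds roughly one third on top of the base level.
    return base * 4 // 3 if mipmaps else base


MIB = 1024 * 1024
for side in (4096, 2048, 1024):
    print(f"{side} x {side}: {texture_bytes(side) / MIB:.0f} MiB uncompressed")
```

Under these assumptions, each halving of the texture side length reduces the memory needed by a factor of four.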

Indoor Mapping with Matterport
Characteristics of the historic building had to be taken into account when planning for the indoor mapping. First, no GNSS coverage was available inside the building. Nor was it possible to mount any permanent reference targets to the interior. As the interiors had to be measured as staged, the use of moveable targets was also considered problematic. This meant that the measuring method chosen had to operate using merely the features of the indoor environment for localisation. Second, the building contained a number of stairwells, some of which were quite narrow. A highly portable instrument was therefore needed. Third, the scanning was to be performed in a limited time frame, working in cooperation with the Yle scenographers. As it was impossible to empty the entire building for the survey at one time, the measuring was to be done in a number of sessions.
Out of available indoor mapping systems, the depth-camera-based Matterport, along with its associated cloud service, offered a high degree of automation for modelling combined with a portable instrument capable of mapping multi-room indoor environments. MLS and TLS were considered to offer too low a degree of automation in model processing compared with a depth camera. TLS was also considered too laborious for mapping large indoor environments with low occlusions. As for photogrammetry, the environment contained several featureless and poorly lit areas, making image-based acquisition difficult.
Indoor mapping of the building was performed with the Matterport instrument over the course of three days (Fig. 2). The scanned interiors included a number of smaller rooms (maximum size of ~20 m²), a large corridor complex with a length of tens of metres and a number of stairwells. Five individual rooms were scanned twice, varying the number of scanning stations in the room on each occasion (Table II). In one particular space where a room was scanned twice, the position of a single bed in the room was shifted from one side of the room to the other between scans. This was done as the room was to be staged having two beds, but only one was available at the time. The scans were partially overlapping: some of the individual rooms were scanned together with the adjacent corridor and later on scanned separately. In a similar fashion, the scans of the stairwells also contained some segments of the corridors. This was done to obtain the necessary degree of overlapping detail, for example, from door frames, for combining the data from several rooms in further processing.
The projective top area of models was estimated by projecting all the mesh vertices onto the XY plane, performing 2D Delaunay triangulation and then calculating the area of the resulting mesh. This area roughly resembles the floor area in single-storey interiors. The maximum edge length used in triangulation was 0.1 m, except for the models marked with an asterisk, where a length of 1 m was used, as the point clouds were too sparse to be triangulated with a shorter maximum edge length.
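The idea of a projective top area can be illustrated with a much simpler stand-in than the Delaunay-based procedure described above: the vertices are projected onto the XY plane and binned into square cells, and the area is taken as the number of occupied cells times the cell area. This grid-occupancy approach is a swapped-in approximation for illustration only; the function name and the default cell size of 0.1 m (chosen to mirror the maximum edge length) are assumptions.

```python
import math


def projected_area_grid(vertices, cell=0.1):
    """Estimate the XY footprint area (m^2) of a set of 3D vertices.

    Grid-occupancy approximation, NOT the Delaunay triangulation used
    in the paper: every vertex marks one square cell of side `cell`.
    """
    occupied = set()
    for x, y, _z in vertices:
        occupied.add((math.floor(x / cell), math.floor(y / cell)))
    return len(occupied) * cell * cell


# Synthetic 2 m x 3 m floor sampled at the centre of every 0.1 m cell.
floor = [((i + 0.5) * 0.1, (j + 0.5) * 0.1, 0.0)
         for i in range(20) for j in range(30)]
print(round(projected_area_grid(floor), 3))
```

As with the maximum edge length in the triangulation, the cell size has to be matched to the vertex density: sparse meshes leave empty cells and the area is underestimated unless a larger cell is used.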

Terrestrial Laser Scanning
TLS was performed with a Faro Focus3D S120 scanner with 905 nm wavelength and a ranging error of ±2 mm at 10 m (90% reflectivity) and 25 m (10% reflectivity) (Fig. 3). The evaluation data was acquired with two resolution settings, providing point spacing of 6.3 and 12.3 mm at a distance of 10 m from the scanner. The scanner also captures panoramic images in order to provide colour for the 3D measurement points. In total, 45 scans were conducted. The principle for gathering the scanning data was that every room had one to three scans depending on the degree of occlusion. The rooms were also scanned from beyond the doorways to ensure that rooms and adjacent corridors could be registered together using sufficient mutual data overlap. The long corridors were measured by setting up a station every 5 m. The scans were matched together using visual alignment and an automatic cloud-to-cloud method provided by the Faro Scene software (Version 6.0).

Evaluation of Geometric Deviations of Indoor Models
Models produced with the Matterport instrument were compared with the TLS point clouds using CloudCompare (64 bit version 2.6.1). First, the point clouds produced by co-registering the individual scans were downsampled to produce a homogeneous point density with a 1 cm point spacing and manually segmented to remove reflections from windows and other outlier points.
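Downsampling to a roughly homogeneous density can be sketched with a generic voxel-grid filter, shown below. This is a minimal illustration under the assumption that one centroid per 1 cm voxel is an acceptable approximation of a 1 cm point spacing; it is not a reproduction of the subsampling tool in CloudCompare, whose internals may differ.

```python
import math


def voxel_downsample(points, spacing=0.01):
    """Keep one representative point (the centroid) per cubic voxel.

    `points` is an iterable of (x, y, z) tuples in metres; `spacing`
    is the voxel edge length in metres.
    """
    voxels = {}
    for p in points:
        key = tuple(math.floor(c / spacing) for c in p)
        voxels.setdefault(key, []).append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in voxels.values()
    ]


cloud = [(0.001, 0.001, 0.001), (0.002, 0.002, 0.002), (0.5, 0.5, 0.5)]
print(len(voxel_downsample(cloud)))  # the first two points share a voxel
```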
Matterport models were analysed by using the unedited mesh models from the cloud service. As the datasets did not completely overlap, the non-overlapping segments were removed. Data were first aligned manually and then refined using iterative closest point (ICP) registration with farthest-point removal. Three separate sets were then compared: one large corridor environment; one smaller corridor environment; and an individual small room. The comparison was performed by using the "Mesh to cloud" function of the software, in effect searching for pairs of closest mesh triangles and point cloud points and calculating their separation.
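The principle behind such a mesh-to-cloud comparison can be sketched as follows: for every point in the cloud, the closest point on each triangle is found and the smallest distance is kept. The closest-point routine below follows the standard region-based test on a triangle; the brute-force search over all triangles and the function names are assumptions made for illustration, and the software used in the study relies on its own, far more efficient implementation.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(a, d, t):
    return (a[0] + t * d[0], a[1] + t * d[1], a[2] + t * d[2])


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc (vertex, edge or face region)."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                  # vertex region A
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                  # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _along(a, ab, d1 / (d1 - d3))      # edge AB
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                  # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _along(a, ac, d2 / (d2 - d6))      # edge AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return _along(b, _sub(c, b), w)           # edge BC
    denom = 1.0 / (va + vb + vc)
    v, w = vb * denom, vc * denom
    return _along(_along(a, ab, v), ac, w)        # interior of the face


def cloud_to_mesh_distances(points, triangles):
    """Unsigned distance from each point to the nearest triangle."""
    return [
        min(math.dist(p, closest_point_on_triangle(p, *tri)) for tri in triangles)
        for p in points
    ]


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(cloud_to_mesh_distances([(0.25, 0.25, 0.5), (2.0, 0.0, 0.0)], [tri]))
```

For datasets of realistic size, a spatial index such as an octree would be needed instead of the exhaustive loop.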

Manual Model Refinement
The first stage of processing was performed automatically by the cloud service associated with the instrument. The pre-aligned scans were further registered, triangulated and textured automatically in the cloud service. The resulting mesh models and texture maps were then downloaded and 3D Studio Max was applied for refining the models.
The manual processing began by manual orientation of the different models, using overlapping areas, such as door frames, as a reference. A number of rooms were present in the large scan projects, but had also been scanned as separate projects. In these cases, the separately reconstructed rooms were preferred, as they offered a higher level of detail. The processing then progressed with refining the object structure of the automatically produced models. Issues in detail level and object structure are further illustrated in the Results section.
After this, the modelling efforts progressed room by room. Several steps were taken to improve the visual appearance and technical functionality of the models, using both manual and automatic tools. Non-manifold geometry was manually repaired. Holes left in the mesh were filled, distortions in surface geometry were smoothed out and unnecessary edge points were deleted. Difficulties were encountered in badly occluded areas, for which the geometry had to be manually crafted. Manual crafting was also needed for highly reflective surfaces, such as windows, and geometrically complicated details, like radiators and railings, both of which were poorly reconstructed in the scans. The aim was to accomplish as low a polygon count as possible, whilst maintaining the essential geometry and appearance of the model (Fig. 4). The texture maps were manually edited to remove historically inaccurate details, such as post-it notes accidentally left on the walls. The final model used in the application was assembled by combining several separately processed models, merging the individual scans of rooms and corridors into a continuous indoor environment. Some alterations were made to the floor plan of the building to facilitate the storytelling. For example, the isolation room, which was in reality located in the basement, was moved to the first floor. This was done as there were no other spaces in the basement required for the story, and the route there would have been very complex and would have required expanding the scanning campaign to a number of stairwells and corridors. In total, the final model consisted of 292 000 polygons.

Audio Production and Set-up
In total 42 audio clips, resulting in a total play length of approximately 50 minutes, were produced for the virtual radio play. In the completed clips, 23 actors played a total of 46 scripted roles. In addition, a piano was used as a musical instrument.
For triggering the audio content in the application, the location of the user is utilised: the audio clips are played when the user appears within a defined boundary. In addition, maximum and minimum volume boundaries were defined. The minimum volume boundary defines where the volume begins to gradually increase towards the maximum boundary. Within the maximum boundary, the volume remains constant. For a number of doors that the user can pass through, both location and orientation are used: the user enters when standing next to a door and facing towards it. Audio content was also associated with the user opening (or failing to open, if locked) doors. These audio effects were not triggered by the user location, but by the user clicking, or trying to operate, on the defined objects. The final audio set-up in the virtual radio play, therefore, consisted of a set of audio clips varying in intensity, triggered by the user's presence, and another set of clips triggered by interacting with an object (Fig. 5).
The extents of the triggering boundaries for the audio clips varied greatly in size. In practice, some of the sounds were atmospheric in nature (for example, a piano playing in the background) and some very intensive, located dialogue. Therefore, it was possible for the user to hear several clips at once, depending on the position.
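The triggering scheme described above can be sketched in a few lines. The sketch assumes spherical boundaries around each clip, a linear volume ramp between the minimum and maximum volume boundaries, and a door test based on a distance threshold and a facing angle; the function names, the ramp shape and all threshold values are hypothetical and do not reproduce the settings of the published application.

```python
import math


def clip_volume(distance, full_radius, silent_radius, max_volume=1.0):
    """Volume of a location-triggered clip at `distance` from its source.

    Inside `full_radius` (maximum volume boundary) the volume is constant.
    Between `full_radius` and `silent_radius` (minimum volume boundary)
    it ramps linearly; the linear shape is an assumption.
    """
    if distance <= full_radius:
        return max_volume
    if distance >= silent_radius:
        return 0.0
    return max_volume * (silent_radius - distance) / (silent_radius - full_radius)


def can_enter_door(user_pos, user_heading, door_pos, reach=1.5, max_angle_deg=45.0):
    """True if the user stands within `reach` of the door and faces it (2D).

    `reach` and `max_angle_deg` are hypothetical thresholds.
    """
    dx, dy = door_pos[0] - user_pos[0], door_pos[1] - user_pos[1]
    dist = math.hypot(dx, dy)
    if dist > reach:
        return False
    heading_norm = math.hypot(*user_heading)
    if heading_norm == 0:
        return False
    if dist == 0:
        return True
    cos_angle = (dx * user_heading[0] + dy * user_heading[1]) / (dist * heading_norm)
    return cos_angle >= math.cos(math.radians(max_angle_deg))


# Two overlapping clips: a wide atmospheric one and a tightly located dialogue.
user_to_piano, user_to_dialogue = 8.0, 1.0
print(clip_volume(user_to_piano, full_radius=5.0, silent_radius=15.0))
print(clip_volume(user_to_dialogue, full_radius=2.0, silent_radius=4.0))
```

Because each clip is evaluated independently, overlapping boundaries naturally lead to several clips being audible at once, as noted above.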

Results
The overall data acquisition and modelling process is summarised in Fig. 6. Three working days were used for staging and 3D scanning of the interiors. In the data acquisition campaign, all of the encountered indoor environments were successfully mapped with the Matterport system. This was the expected result as the environments did not contain problem features such as large open interiors, areas with only repetitive features, outdoor or brightly sunlit areas, or highly mirroring environments, all of which are reported to potentially cause issues (Matterport, 2017a). As such, the measuring system was efficient, with scanning times for an individual scan being shorter than for most TLS instruments (for example). In smaller spaces, such as individually scanned rooms smaller than 20 m², the limited range of the instrument when compared with laser scanners did not necessitate more measuring positions either. In longer corridors, more measuring positions were needed by Matterport than a typical TLS would have required. The automated reconstruction in the cloud service was initiated immediately following measurement. Accurate processing times were not recorded, but the automatically produced models were in all cases available in the morning following the scanning day. The editing phase for the models consisted of editing the meshes and the texture images separately, and formed the most laborious part of the project, requiring several weeks of working time from an experienced modeller. A model revision phase was required for combining separately processed models and optimising the models for mobile 3D. Afterwards, the Unity 5 game engine was used to program the final application, published both as online content and mobile software. In practice, the application development and model editing partially overlapped, in total requiring approximately seven weeks.
The model editing work was estimated to have taken roughly 40% of the working time during this stage, with the remaining time spent on programming the application (about 30%) and debugging and revising the application according to feedback from the project group (about 30%).
In the application development phase, a user interface was designed and implemented for the application. The main focus was on keeping the user interface simple so that it did not obstruct the actual content. Movement and camera control were realised following the conventions of mobile- and browser-based 3D games. The user moves through the building in a first-person view (Fig. 7). A 2D map function was also implemented, allowing the user to jump directly to a specified space or location (Fig. 8). The resulting application was published online both as a WebGL version and as a downloadable Android application for mobile users.
In total, during the two-day event, approximately 3000 people visited the installation on site. The produced game engine version received 813 unique visitors during the first two weeks following its publication. During the following six months, the number of unique visits had increased to 1001, with a total of 1663 downloads.

Geometric Deviations and Detail Level of Models
When the Matterport models were compared with the TLS point clouds, several deviations were observed (Fig. 9). In general, it appears that Matterport produces models that are slightly too large, with this being most visible in the largest models. This produces significant mismatches in larger datasets, when the models can no longer be fully aligned.
Most significant local deviations were found in small details visible in the TLS clouds but not reproduced by the mesh generation algorithm applied in the Matterport automated processing including, for example, houseplants, small objects and radiator installations (Fig. 9). There was also a minor corner rounding effect visible in models.
The aforementioned issue was significantly more severe when the models covered a larger area. When the measured target zone was smaller, the Matterport system produced more detailed meshes. With larger entities, on the other hand, the mesh quality decreased. This is clearly visible when comparing the reconstruction of details in large target areas (such as a corridor with several associated rooms) and a scan of an individual small room (Fig. 10). In a large corridor combined dataset, there is also a noticeable difference between the component datasets. For example, in Fig. 9 observe the deviations on the ceiling surface, comparing the middle of the corridor with its ends. The corridor appears slightly bent vertically. This was investigated further by segmenting the floor surface from both TLS and Matterport, and performing a planar fitting with subsequent distance evaluation. From the results (Fig. 11) it can be observed that, for the long corridor dataset, the TLS point cloud reproduces the floor in a significantly more planar form than that found in Matterport data.
To study this issue further, the planarity of the floor surface was evaluated from an individual TLS scan. This was to eliminate the possibility of a cumulative co-registration error in the TLS. In the Matterport cloud, a fairly smooth descent of approximately 5 cm can be seen from one end of the corridor to the other (area marked in Fig. 11(a)). When viewed as a single scan, some variations in the floor can be observed (Fig. 11(b)), but they generally fall in the ±2 cm range and do not display the same trend.
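A planar fit followed by a distance evaluation can be sketched as below. The sketch fits a plane of the form z = ax + by + c to the floor points by ordinary least squares and reports the signed perpendicular distance of each point from that plane. The least-squares formulation and the function names are assumptions for illustration; the fitting method used by the analysis software is not stated in the text and may differ (for example, an orthogonal fit).

```python
import math


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) points.

    Coordinates are centred first for numerical stability. Assumes the
    surface is not vertical, which holds for a floor.
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sxy = syy = sxz = syz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - mx, y - my, z - mz
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxz += dx * dz
        syz += dy * dz
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-15:
        raise ValueError("points are collinear in XY; plane is undefined")
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    return a, b, mz - a * mx - b * my


def plane_distances(points, plane):
    """Signed perpendicular distance of each point from the plane."""
    a, b, c = plane
    norm = math.sqrt(a * a + b * b + 1.0)
    return [(z - (a * x + b * y + c)) / norm for x, y, z in points]


# Synthetic, slightly tilted floor patch sampled on a 1 m grid.
floor = [(float(x), float(y), 0.01 * x - 0.02 * y + 0.5)
         for x in range(5) for y in range(3)]
plane = fit_plane(floor)
print(max(abs(d) for d in plane_distances(floor, plane)))
```

Plotting the signed distances along the corridor axis would make a systematic trend, such as a smooth descent, easy to distinguish from local variation.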

Effects Induced by Model Processing
The model optimisation process described earlier had a significant impact on the computational size of the generated models, as intended. Fig. 12 provides the vertex and polygon counts for a set of four models in their original state (as downloaded from the cloud) and after optimisation/editing (before being transferred to the game engine). In addition, the number of neighbouring points has been calculated for the vertices, using a search radius of 0.1 m. The maximum and minimum values are given, along with the median for the mesh. The impact of processing is clearly visible in the smaller rooms. Fig. 13 shows the mesh vertices of a room coloured according to the number of neighbouring points. In the original model, the density remains relatively homogeneous over large planar surfaces, such as walls. In the processing, the vast majority of vertices are removed from these planar surfaces. The vertices that remain relate to edges and areas with complex curvature, such as furniture details.
Fig. 11. Comparison of floor surface deviations from a plane for TLS (a, above) and Matterport (a, below). For one end of the corridor, analysis was also performed for a single TLS scan (b); note the different scale.
Manual editing also affected the object structure of models. The automated reconstruction created several texture maps for each project and divided the mesh into corresponding objects. Using this structure directly would have created a scene with a large number of separate objects, resulting in a large number of draw calls for the graphics processing unit (GPU), thereby reducing performance. Additionally, in larger spaces, some of the automatically built meshes were large, which would have caused issues for optimisation by occlusion culling and frustum culling (removing models beyond the viewing area), as bounding-box dimensions are used in determining whether objects are rendered.
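The neighbour count used as a density measure for the mesh vertices can be sketched with a simple grid hash, as below. The sketch counts, for each vertex, the other vertices within the search radius; whether the vertex itself is included in the count reported in the figures is not stated, so excluding it here is an assumption, as are the function name and the data layout.

```python
import math
from collections import defaultdict
from statistics import median


def neighbour_counts(vertices, radius=0.1):
    """Number of OTHER vertices within `radius` (metres) of each vertex."""
    grid = defaultdict(list)
    for i, v in enumerate(vertices):
        grid[tuple(math.floor(c / radius) for c in v)].append(i)
    r2 = radius * radius
    counts = []
    for i, v in enumerate(vertices):
        cx, cy, cz = (math.floor(c / radius) for c in v)
        n = 0
        # Only the 27 cells around the vertex can hold neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        if j == i:
                            continue
                        w = vertices[j]
                        d2 = sum((a - b) ** 2 for a, b in zip(v, w))
                        if d2 <= r2:
                            n += 1
        counts.append(n)
    return counts


verts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.09, 0.0, 0.0), (1.0, 1.0, 1.0)]
counts = neighbour_counts(verts)
print(min(counts), median(counts), max(counts))
```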
The object structure of models was revised, combining meshes from individual rooms and splitting meshes from large corridor spaces (Fig. 14). The goal was to maintain a sufficiently low number of meshes to reduce draw calls, but simultaneously keep the extents of individual meshes limited, so that they are properly removed from rendering by occlusion culling and frustum culling.
Fig. 13. Vertex points of an individual hospital room (data for the ceiling and two walls removed for illustration purposes), coloured according to the number of neighbouring points before (left) and after (right) processing.
Fig. 14. Left: the object structure of an individual room created by the Matterport system. Right: the revised object structure. Colours for objects have been assigned randomly.
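Why the extents of a mesh matter for culling can be shown with a simplified bounding-box test. The sketch below is a generic, conservative test of an axis-aligned bounding box against a set of view-volume planes; it is not the implementation of any particular game engine, and the names `aabb_outside_plane` and `aabb_visible` are hypothetical.

```python
def aabb_outside_plane(box_min, box_max, normal, d):
    """True if the whole box lies on the negative side of the plane n.p + d = 0."""
    # Pick the box corner that lies furthest along the plane normal.
    corner = [box_max[i] if normal[i] >= 0 else box_min[i] for i in range(3)]
    return sum(normal[i] * corner[i] for i in range(3)) + d < 0


def aabb_visible(box_min, box_max, planes):
    """Conservative test: the box is culled only if it is fully outside some plane.

    `planes` is a list of ((nx, ny, nz), d) tuples whose normals point
    into the view volume.
    """
    return not any(aabb_outside_plane(box_min, box_max, n, d) for n, d in planes)


# Hypothetical view volume covering 0 <= x <= 10 (other axes unbounded).
view = [((1.0, 0.0, 0.0), 0.0), ((-1.0, 0.0, 0.0), 10.0)]

small_room = ((20.0, 0.0, 0.0), (25.0, 4.0, 3.0))     # entirely beyond the view volume
long_corridor = ((5.0, 0.0, 0.0), (60.0, 3.0, 3.0))   # mostly beyond it, but overlapping
```

With these hypothetical extents, the small room is culled, whereas the long corridor mesh is still submitted for rendering in full because its bounding box overlaps the view volume. Splitting the corridor into shorter meshes lets the parts outside the view be discarded.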

Discussion
The ability of the measuring system to automatically produce textured mesh models of indoor environments was a critical factor in the work presented and also the reason for applying the specific system. While the mesh models were, technically, directly applicable in the game engine content production, they did require a significant amount of manual editing.
The amount of work in the manual editing exceeded, by far, the hours required for performing the measuring or developing the application, undermining the benefits of automated reconstruction offered. The need to optimise the final applications for smooth operation on low-end platforms increased the need for manual optimisation of the polygon geometry and object structure. Had the end product only been intended for high-end PCs, the amount of manual work would have been reduced. The estimation of working times was further complicated by the number of changes to the final application that arose from the project team during the application development and model revision phase.
When discussing the quality of models produced with the depth camera system, it should be noted that the primary intended use of the system was the production of panoramic image walkthroughs, with the 3D model being somewhat of a side product. Further, the experiments were carried out with the first production version of Matterport. The manufacturer has since released a new version of the instrument (Matterport, 2017b) and made new features available as a beta version in the cloud service (Matterport, 2017a).
In this work, analysis of the automatically produced models revealed several issues. First, the models have significantly varying polygon counts and densities. Comparing the polygon densities (polygons per unit of mesh surface area) shows that the density is low (23 polygons/m²) for the large corridor dataset, higher in the smaller corridors on the ground floor (355 polygons/m²) and high for small rooms (for example, 2211 polygons/m² in hospital room 1) (Table II). This seems to correlate with the size of the measured area (Fig. 15; see also Fig. 10). In large datasets, the system's ability to represent small geometric details was greatly reduced. Adding measuring positions did not have a significant influence on the polygon count of the model produced (in hospital room 5, a model with four scans actually had more polygons than one with 10 scan positions). This would imply that, to obtain detailed models, the size of the measuring area should be kept as small as possible. Since the observed densities closely follow a mathematical model of the form y = 1/x² (Fig. 15), the limitation appears to come from the memory that the Matterport cloud service allocates for a given object volume. The phenomenon is assumed to be more pronounced if the studied object varies greatly in all three dimensions, not just in two.
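The inverse-square relationship can be checked with a short sketch. It assumes that y is the polygon density, that x is a linear measure of the dataset size, and that the model carries a free scale factor k (y = k/x²); the exact variables plotted in Fig. 15 are not restated here, so these are assumptions, as are the function names. Under them, a fixed total polygon budget spread over an area that grows with x² would produce exactly this form.

```python
def polygon_density(polygon_count, surface_area_m2):
    """Polygons per square metre of mesh surface."""
    return polygon_count / surface_area_m2


def fit_inverse_square(xs, ys):
    """Least-squares estimate of k in the model y = k / x**2."""
    us = [1.0 / (x * x) for x in xs]
    return sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)


def residuals(xs, ys, k):
    """Observed minus modelled density for each dataset."""
    return [y - k / (x * x) for x, y in zip(xs, ys)]
```

Small residuals relative to the observed densities would support the inverse-square description. They would not by themselves identify the cause, which is why the explanation by memory allocation remains an inference.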
The limitations of the system are also acknowledged by the manufacturer, which states a relative accuracy of 1% for the mesh models. In this project, for the smaller hospital rooms (bounding-box corner-to-corner distance below 10 m), a 10 cm accuracy could be expected. For the large corridor dataset (corner-to-corner distance 65.7 m), this would imply an accuracy of approximately 0.65 m. The observed mismatches between the TLS data and Matterport fell within these specified limitations (Fig. 11).
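The arithmetic behind these expectations is simple enough to write down. The sketch below assumes that the stated 1% applies to the corner-to-corner distance of the bounding box, as in the text; the function names are hypothetical.

```python
import math


def bbox_diagonal(box_min, box_max):
    """Corner-to-corner distance of an axis-aligned bounding box, in metres."""
    return math.dist(box_min, box_max)


def expected_accuracy(diagonal_m, relative=0.01):
    """Expected absolute accuracy for a given relative accuracy (default 1%)."""
    return diagonal_m * relative


room = expected_accuracy(10.0)       # upper bound for the smaller rooms
corridor = expected_accuracy(65.7)   # large corridor dataset
```

For a 10 m diagonal this gives 0.1 m, and for the 65.7 m corridor roughly 0.66 m, in line with the figures quoted above.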
As the models used in game engine content production also commonly feature image textures, the geometric properties of the models are not the only factors affecting their appearance. The Matterport system automatically acquires photographic textures for the models and maps them to the mesh model. As seen in Table II, the models contained a varying number of texture maps, each 2048 by 2048 pixels in size. Each texture map therefore contains over 4 megapixels of image data (although, as there are some gaps in the texture maps, the actual amount of information is smaller). The number of texture pixels per model surface area for the large corridor was actually higher (194 831 pixels/m²) than for the smaller corridor (167 623 pixels/m²) or for the smallest room (166 276 pixels/m²). Even though the degree of detail decreased for the model geometry, the texture resolution did not decrease with area size.
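The texture figures follow from a simple ratio, sketched below. The `fill` parameter is a hypothetical way of accounting for the gaps in the texture maps; the text does not quantify them, so the default of 1.0 gives an upper bound. The function names are likewise assumptions.

```python
TEXTURE_SIDE = 2048  # pixels per side of each texture map, as stated in the text


def texels_per_map(side=TEXTURE_SIDE):
    """Total pixels in one square texture map (2048 * 2048 = 4 194 304)."""
    return side * side


def texel_density(n_maps, surface_area_m2, side=TEXTURE_SIDE, fill=1.0):
    """Texture pixels per square metre of mesh surface.

    `fill` is the fraction of each map that actually holds image data
    (hypothetical parameter; 1.0 ignores gaps in the maps).
    """
    return n_maps * texels_per_map(side) * fill / surface_area_m2
```

Because the number of texture maps grows with the size of the model, this ratio can stay roughly constant even when the polygon density falls.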
For content production, these observations imply that Matterport is suited for producing 3D overviews of indoor environments with a high degree of automation. For well-optimised game engine models, further manual steps are required. More detailed models that display smaller features are only produced when the measured areas are small enough. However, these details remain in the texturing. The benefits and disadvantages of measuring and the automated reconstruction process applied are summarised in Table III.
Alternative model production techniques for this case might have been the application of surface reconstruction algorithms from dense point clouds obtained with TLS (for example, Lorensen and Cline, 1987; Kazhdan and Hoppe, 2013). Some commercial services offering textured mesh model generation from point cloud datasets are already available (Autodesk, 2017; Sequoia, 2017). By applying these, the production of game engine compatible mesh models from TLS data might have been possible. In this respect, one of the limitations of the applied system is the unavailability of raw data. In Matterport, downloading anything other than the automatically produced mesh models has so far not been possible. This feature is currently in the beta version (Matterport, 2017a). The availability of co-registered point clouds would significantly increase the utility of the instrument with other workflows and potentially help reduce the issues encountered with automatically generated mesh models.
An additional benefit of applying the Matterport system is that it also produces a panoramic image set automatically. While this was not used in the presented case, it nevertheless is an easily comprehensible presentation of the spaces as they were at the time of imaging. As such, the system can be used to efficiently document temporary indoor setups, such as the staged interiors in this case.
Table III. Summary of benefits and disadvantages of the system applied for indoor measurements.
Benefits | Disadvantages
Measuring work possible by single operator | Models had significant variation in density
By automated reconstruction the initial models were attained with very little manual work | No access to raw measuring data for user
Results of automated reconstruction were compatible with game engine content production workflow | Extensive manual refinement of models was still required

Impact of Virtual Radio Play
Using a game engine as a platform, a virtual version of the audio installation was produced. The application was published online alongside the unveiling of the installation. There were some differences between the virtual version and the actual installation. The lighting conditions of the virtual version matched those of the scanning, which was performed in daytime. As the actual event occurred late in the evening, the atmosphere of the building was quite different (Fig. 16). The situation was also different since other visitors were present in the space during the event. In the virtual version, the user is alone in the building.
The presented case also provides an example of applying new technical possibilities in cultural productions. The end result contains elements from computer gaming, installation and radio theatre. A virtual environment can be used to produce spatial experiences and multisensory works. This has an influence on the traditional concepts of installations and

Virtual and Augmented Reality Technologies
Virtual reality (VR) offers, by definition, any kind of experience that enables an effective immersion for its user in a responsive virtual world (Brooks, 1999). It is a medium with the essential goal of assuring you that you are somewhere else (Parisi, 2015). The user can interact with the virtual environment analogously to the real world (Mazuryk and Gervautz, 1996). Thus, VR has an enormous potential to revolutionise the way people experience virtual spaces. The crucial enabling technologies for VR are: (1) 3D displays; (2) motion tracking; (3) input devices; and (4) software frameworks and development tools that allow real-time graphics rendering of 3D virtual worlds (Brooks, 1999; Parisi, 2015). VR has numerous applications in gaming and entertainment, virtual worlds, tourism, architecture and real estate, design and planning, education and training, telepresence and cooperative working, among many others (Mazuryk and Gervautz, 1996; Parisi, 2015). Developments of browser-based VR, computer gaming hardware and headsets utilising smartphones are making VR available to the masses. A second significant development direction is the integration of VR and augmented reality (AR) into mixed reality, allowing immersive presentation of virtual elements on top of the real world (Boland and McGill, 2015).
For creating a virtual experience, utilising VR technology would surely have enabled a deeper immersion. However, as the adoption of VR hardware is still low, this would have limited accessibility, which was central to producing a virtual version in this case. Utilising VR would also have required the development of a user interface for moving in and interacting with the space, as the current keyboard and mouse/touchscreen user interface is not ideally suited for VR. Whether the models are sufficient for head-mounted VR displays also remains an open question.
AR mixes virtual elements with the real world. It has been successfully applied in cultural heritage by bringing additional "layers" to historically significant sites (Portalés et al., 2009). In the presented case, location-based technologies and augmented reality could have enabled bringing interactive content to the physical installation. However, this would not have allowed studying the content outside the site.

Digital 3D Models as a Platform
In the literature, digital geometric models are often suggested as a platform for various uses (for example, Alatalo et al., 2016). However, in the project presented here, the focus shifted from mapping and modelling to storytelling. The user explores not only a model, but an entity composed of literary content (stories) presented as audio, an interaction system (moving, location-based triggers) and the environment (model). The model became subordinate to the story. Further, the storytelling aspect of the project conflicted with the measured model of the interior: some elements had to be removed from the model to maintain the credibility of the narrative and some of the spaces were relocated to make discovery of the story easier. Furthermore, the model used in application development is not only a model of a space, but may contain other elements, such as certain furnishings. In this case, the interiors were measured as staged, fitted out with a set of furniture and with other items in them. If the application had been built using an empty model of the building, all the staging would have had to be done virtually. The idea of a "general purpose model" that can be applied to a variety of different cases therefore becomes problematic. Even if a pre-existing interior model of the building had been available, it would have required extensive editing to be applicable for this use.

Conclusions
A game engine is suited for producing interactive virtual environments for various purposes. In the presented case, a virtual environment for exploring audio content was created using a game engine and an indoor mapping system, allowing the "virtualisation" of a real environment. In the final product, the user freely explores a building and experiences a set of audio clips creating a narrative. The final application was published as a web-based production and as a downloadable application for Android smartphones and tablets.
For applications that take place in the real world, digital geometric models of the environment are needed. Building these models manually is time consuming. 3D measuring methods and automated reconstruction algorithms offer an alternative to this manual process. The authors applied a combination of depth camera indoor mapping and manual model revision for producing indoor models of a historic hospital building, which were used to develop a game engine application. The indoor mapping system was found to be efficient, exceeding the measuring speed of a TLS in confined interiors.
The automatically reconstructed models produced with the system were found to be usable as a starting point for the game engine content production. However, the quality of the automated reconstruction left plenty of room for improvement. The models were extensively refined by manual processing to produce a model of a complex interior measured in several stages and to achieve sufficient optimisation for mobile 3D applications. The amount of work for optimising the models for game engine use exceeded the effort needed for the indoor mapping or application development.
The applied indoor mapping system, Matterport, was best suited for producing geometrically detailed models of smaller areas. In larger areas, the detail level and accuracy of models decreased, potentially due to fixed memory allocation in the cloud service. The quality of the textures, however, was not affected by this, which partly helped to maintain the appearance of the models. In addition, some warping was observed in the model of a long corridor environment. Observed errors remained well within the tolerance stated by the manufacturer. The nature and intended use of the system should be taken into account when discussing these limitations. Objects having large volumetric variation in all three dimensions cannot always be mapped with the required accuracy, even if the number of scans is increased.
Despite the issues found, the applied system, together with the editing process introduced, proved successful in documenting a large indoor environment and producing models suited for mobile 3D applications.