The reconstruction of cities is a topic of significant intellectual and commercial interest. It is therefore no surprise that this research area has received considerable attention over time. Despite the high volume of existing work, many problems remain unsolved, especially in the development of fully automatic algorithms.
Urban reconstruction is an exciting area of research with several applications that benefit from reconstructed three-dimensional (3D) urban models:
- In the entertainment industry, the storylines of several movies and computer games take place in real cities. To make these cities believable, at least parts of the models are obtained by urban reconstruction.
- Digital mapping for mobile devices, cars and desktop computers requires 2D and 3D urban models. Examples of such applications are Google Earth and Microsoft Bing Maps.
- Urban planning in a broad sense relies on urban reconstruction to obtain the current state of the urban environment. This forms the basis for developing future plans or for judging new plans in the context of the existing environment.
- Training and simulation applications for emergency management, civil protection, disaster control, driving, flying and security benefit from virtual urban worlds.
Urban habitats consist of many objects, such as cars, streets, parks, traffic signs, vegetation and buildings. In this paper, we focus on the reconstruction of 3D geometric models of urban areas, individual buildings and façades.
Most papers mentioned in this survey were published in the fields of computer graphics, computer vision, and photogrammetry and remote sensing. Multiple other fields contain interesting publications relevant to urban reconstruction, for example, machine learning, computer-aided design, geo-sciences, mobile technology, architecture, civil engineering and electrical engineering. Our emphasis is on geometric reconstruction, and we do not discuss aspects such as the construction of hardware and sensors, details of data acquisition processes or particular applications of urban models.
We also exclude procedural modelling, which has been covered in a recent survey by Vanegas et al. [VAW*10]. Procedural modelling, which can also be referred to as forward procedural modelling, is an elegant and fast way to generate huge, complex and realistic-looking urban sites, but due to its generative nature it is not well suited for exact reconstruction of existing architecture. Nevertheless, in this survey we do address its counterpart, called inverse procedural modelling (Section 'Inverse procedural modelling'), in addition to other urban reconstruction topics.
We also omit manual modelling, even though it is probably still the most widely applied form of reconstruction in many architectural and engineering bureaus. From a scientific point of view, the manual modelling pipeline is well researched. An interesting overview of methods for the generation of polygonal 3D models from CAD plans has been presented by Yin et al. [YWR09].
To allow computer graphics researchers who are new to the field to step into 3D reconstruction, we provide a slightly more detailed description of the fundamentals of stereo vision in Section 'Point Clouds & Cameras'. We omit concepts like the trifocal tensor or details of multi-view vision. Instead, we refer to the referenced papers and textbooks, for example, by Hartley and Zisserman [HZ04], Moons et al. [MvGV09] and, more recently, Szeliski [Sze11]. Due to the enormous range of the literature, our report is designed to provide a broad overview rather than a tutorial.
1.3. Input data
Various types of input data are suitable as a source for urban reconstruction algorithms. In this survey, we focus on methods which utilise imagery and Light Detection and Ranging (LiDAR) scans.
Imagery is perhaps the most obvious input source. Common images acquired from the ground have the advantage of being very easy to obtain, store and exchange. Nowadays, an estimated tens of billions of photos are taken worldwide each year, which results in hundreds of petabytes of data. Many are uploaded and exchanged over the Internet, and many of them depict urban sites. In various projects this information has been recognised as a valuable source for large-scale urban reconstruction [SSS06, IZB07, ASSS10, FFGG*10]. Aerial and satellite imagery, on the other hand, was for many years restricted to the professional sector of the photogrammetry and remote sensing community. Only in the last decade has this kind of input data become more easily available, especially due to the advances of Web-mapping projects like Google Maps and Bing Maps, and it has been successfully utilised for reconstruction [VAW*10].
Another type of input that is well suited to urban reconstruction is LiDAR data. A LiDAR scanner typically projects laser light onto surfaces and captures the reflected backscatter; structure is then determined through the time-of-flight principle [CW11]. It delivers semi-dense 3D point clouds which are fairly precise, especially for long-distance acquisition. Although scanning devices are expensive and still not available for mass markets, scanning technology is frequently used by land surveying offices and civil engineering bureaus. Many recent algorithms rely on input from LiDAR, both terrestrial and aerial.
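As a minimal illustration of the time-of-flight principle mentioned above, the following sketch converts the round-trip time of a laser pulse into a range and then into a 3D point. It is not taken from any of the cited systems: the function names `tof_range` and `polar_to_point`, the angle convention (azimuth in the x-y plane, elevation above it) and the assumption of an ideal, noise-free measurement are ours.

```python
import math

# Speed of light in vacuum in m/s (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458.0


def tof_range(round_trip_seconds: float) -> float:
    """Range to the reflecting surface from the pulse's round-trip time.

    The pulse travels to the surface and back, hence the division by two.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def polar_to_point(r: float, azimuth: float, elevation: float) -> tuple:
    """Convert a range and beam direction (radians) to sensor-centred x, y, z.

    Hypothetical convention: azimuth is measured in the x-y plane from the
    x axis, elevation is measured upwards from that plane.
    """
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)


if __name__ == "__main__":
    # A pulse that returns after one microsecond hit a surface ~150 m away.
    r = tof_range(1e-6)
    print(round(r, 3))
    print(polar_to_point(r, math.radians(30.0), math.radians(5.0)))
```

Repeating this conversion for every emitted pulse yields a point cloud in the sensor's coordinate frame; real scanners additionally have to handle registration between scans, noise and multiple returns, none of which is modelled here.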
Furthermore, some approaches incorporate both data types in order to combine their complementary strengths: imagery is inherently a 2D source of extremely high resolution and density, but it is view dependent and lacks depth information. A laser scan is inherently a 3D source of semi-regular and semi-dense structure, but it is often incomplete and noisy. Combining both inputs promises to introduce more insights into the reconstruction process [LCOZ*11].
Finally, both types can be acquired from the ground or from the air (cf. Figure 1), providing a source for varying levels of detail (LOD). The photogrammetry community proposes a predefined standard (OpenGIS) for urban reconstruction LODs [GKCN08] for Geographic Information Systems (GIS). According to this scheme, airborne data is more suitable for the reconstruction of coarse building models (LOD1, Section 'Blocks & Cities'), whereas ground-based data is more useful for individual buildings (LOD2, Section 'Buildings & Semantics') and façade details (LOD3, Section 'Façades & Images').
1.4.1. Full automation
The goal of most reconstruction approaches is to provide solutions that are as automatic as possible. In practice, full automation turns out to be hard to achieve. The related vision problems quickly result in huge optimisation tasks, where global processes are based on local circumstances, and local processes often depend on global estimates. In other words, the detection of regions of interest is both context dependent (top-down), since we expect a well-defined underlying object, and context free (bottom-up), since we do not know the underlying object and want to estimate a model from the data. This is a paradox, and these dependencies can be compared to the ‘chicken or egg’ dilemma.
There is no unique solution to this fundamental problem of automatic systems. Most approaches try to find a balance between these constraints; for instance, they combine two or more passes over the data, or they incorporate the human user in order to provide some necessary cues.
1.4.2. Quality and scalability
An additional price to pay for automation is often a loss of quality. From the point of view of interactive computer graphics, the quality of solutions of pure computer vision algorithms is quite low, while the expected standard of the models is very high, especially for high-quality productions such as those of the movie industry. In such situations, the remedy is either pure manual modelling or at least manual quality control over the data. The downside of this approach is its poor scalability: human interaction does not scale well with huge amounts of input data.
For these reasons, many recent approaches employ compromise solutions that cast the problem in such a way that both the user and the machine can focus on tasks which are easy to solve for each of them. Simplified user interaction that can be performed even by unskilled users often provides the quantum of knowledge that is needed to break out from the mentioned dilemma.
1.4.3. Acquisition constraints
Other problems that occur in practice are due to limitations of the data acquisition process.
For example, it is often difficult to acquire coherent and complete data of urban environments. Buildings are often located in narrow streets surrounded by other buildings and other obstructions, so photographs, videos or scans from certain positions may be impossible to obtain, either from the ground or from the air. A second common handicap is the problem of unwanted objects in front of the buildings, such as vegetation, street signs, vehicles and pedestrians. Finally, there are obstacles like glass surfaces, which are problematic to acquire with laser scans. Photographs of glass are also difficult to process due to many reflections. Lighting conditions, for example, direct sunshine or shadows, influence the acquisition as well. The recovery of visual information that has been lost through such obstructions is therefore also one of the challenges.
A common remedy is to make multiple overlapping acquisition passes and to combine or compare them. In any case, however, post-processing is required.
It is a difficult task to classify all the existing reconstruction approaches, since they can be differentiated by several properties, such as input data type, level of detail, amount of automation or output data. Some methods are bottom–up, some are top–down and some combine both approaches.
In this paper, we propose an output-based ordering of the presented approaches. This ordering helps us to sequentially explain important concepts of the field, building one on top of another; but note that this is not always strictly possible, since many approaches combine multiple methodologies and data types.
Another advantage of this ordering is that we can specify the expected representation of the actual outcome for each section. Figure 2 depicts the main categories that we handle. In this paper, the term modelling is generally used for interactive methods, and the term reconstruction for automatic ones.
- Point Clouds & Cameras. Image-based stereo systems have reached a rather mature state and often serve as preprocessing stages for many other methods since they provide quite accurate camera parameters. Many other methods, even the interactive ones which we present in later sections, rely on this module as a starting point for further computations. For this reason we first introduce the Fundamentals of Stereo Vision in Section 'Fundamentals of stereo vision'. Then, in Section 'Structure from motion', we provide the key concepts of image-based automatic Structure from Motion methodology, and in Section 'Multi-view stereo', we discuss Multi-View Stereo approaches.
- Buildings & Semantics. In this section, we introduce a number of concepts that aim at the reconstruction of individual buildings. We start in Section 'Image-based modelling' with Image-Based Modelling approaches. Here, we present a variety of concepts based on photogrammetry and adapted for automatic as well as for interactive use. In Section 'LiDAR-based modelling', we introduce concepts of interactive LiDAR-Based Modelling aiming at reconstruction of buildings from laser-scan point clouds. In Section 'Inverse procedural modelling', we describe the concept of Inverse Procedural Modelling.
- Façades & Images. We handle the façade topic explicitly because it is of particular importance in our domain of modelling urban areas. In Section 'Façade imagery', we handle generation of panoramas and textures from Façade Imagery. In Section 'Façade decomposition', we introduce various concepts for Façade Decomposition that aim at segmenting façades into elements such as doors, windows, and other domain-specific features, detection of symmetry and repetitive elements, and higher-order model fitting. In Section 'Façade modelling', we introduce concepts which aim at interactive Façade Modelling, such as subdivision into highly detailed sub-elements.
- Blocks & Cities. In this section, we discuss automatic reconstruction of models of large areas or whole cities. Such systems often use multiple input data types, like aerial images and LiDAR. We first mention methods performing Ground Reconstruction in Section 'Ground-based reconstruction'. In Section 'Aerial reconstruction', we focus on Aerial Reconstruction from aerial imagery, LiDAR or hybrids, and finally, in Section 'Massive city reconstruction', we discuss methods which aim at automatic Massive City Reconstruction of large urban areas.
In the remainder of this paper we review those categories.