Residential Property Price Indexes: Spatial Coordinates versus Neighbourhood Dummy Variables

The paper addresses the following question: can satisfactory residential property price indexes be constructed using hedonic regression techniques where location effects are modeled using local neighbourhood dummy variables or is it necessary to use spatial coordinates to model location effects. Hill and Scholz (2018) addressed this question and found, using their hedonic regression model, that it was not necessary to use spatial coordinates to obtain satisfactory property price indexes for Sydney. However, their hedonic regression model did not estimate separate land and structure price indexes for residential properties. In order to construct national balance sheet estimates, it is necessary to have separate land and structure price indexes. The present paper addresses the Hill and Scholz question in the context of providing satisfactory residential land price indexes. The spatial coordinate model used in the present paper is a modification of Colwell’s (1998) spatial interpolation method. The modification can be viewed as a general nonparametric method for estimating a function of two variables.


Introduction
It is a difficult task to construct constant quality price indexes for residential (and commercial) properties. Properties with structures on them consist of two main components: the land component and the structure component. The problem is that each property has a unique location (which affects the price of the land component) and given the fact that the same property is not sold in every period, it is difficult to apply the usual matched model methodology when constructing constant quality price indexes. Bailey, Muth and Nourse (1963) developed the repeat sales methodology in an attempt to apply the matched model methodology to the problem of constructing property price indexes but this methodology does not allow for the use of single sales of the same property over the sample period and thus in particular, the sales of properties with new structures do not affect the resulting indexes, which could lead to biased indexes. Moreover, properties with structures on them do not retain the same quality over time due to structure depreciation and renovations or additions to the structure. Thus the matched model methodology for the construction of constant quality price indexes does not work in the property price index context.
A possible solution to the above measurement problem is to use a hedonic regression model approach to the construction of property price indexes. 2 This approach regresses the sale price of a property (or the logarithm of the sale price) on various characteristics of the properties in the sample. An important price determining characteristic of a property is its location. The location of a property can be described by its neighbourhood (a local government area or a postal code) or by its latitude and longitude, the spatial coordinates of the property. Most hedonic property regressions use the former approach to describing the location of a property but in recent years, the availability of spatial coordinate information has grown. Colwell (1998) was an early pioneer in the use of spatial coordinate information in a property price regression and more recently, Hill and Scholz (2018) used spatial coordinates to model Sydney house prices.
The main question that this paper addresses is the following one: can satisfactory residential property price indexes be constructed using hedonic regression techniques where location effects are modeled using local neighbourhood dummy variables or is it necessary to use spatial coordinates to model location effects. Hill and Scholz (2018) addressed this question and found that it was not necessary to use spatial coordinates to obtain satisfactory property price indexes for Sydney. However, their hedonic regression model did not estimate separate land and structure price indexes for residential properties. In order to construct national balance sheet estimates, it is necessary to have separate land and structure price indexes. The present paper addresses the Hill and Scholz question in the context of providing satisfactory residential land price indexes. The spatial coordinate model used in the present paper is a modification of Colwell's (1998) spatial interpolation method. The modification can be viewed as a general nonparametric method for estimating a function of two variables.
A basic building block in Colwell's method is a method of bilinear interpolation over a square that was developed in the mathematics literature. We explain this method in section 2 below.
In section 3, we explain how this bilinear method of interpolation over a square can be extended to a method of interpolation over a grid of squares. We then follow the example of Poirier (1976) and Colwell (1998) and convert the interpolation method into an econometric estimation model. The resulting method will be used in later sections to model the land price of a property as a function of its spatial coordinates.
In section 4, we compare Colwell's spatial coordinate model with the penalized least squares approach used by Hill and Scholz (2018) in their study of Sydney property prices. We note some problems with the Hill and Scholz approach.
Section 5 describes our data on sales of residential properties in Tokyo over the 44 quarters starting in first quarter of 2000 and ending in the last quarter of 2010. We used the same data as we used in Diewert and Shimizu (2015a) on sales of residential houses in Tokyo except the present study added additional data on sales of residential properties with no structures on the land plot.
Section 6 sets out the builder's model approach to hedonic property price regressions. This approach uses the property's sale price as the dependent variable and splits up property value into the sum of the land and structure components. This additive decomposition approach has a long history in the property hedonic regression literature but what is relatively recent is the use of an exogenous construction cost price series to value the structure component of the decomposition. It is this use of an exogenous index that allows us to decompose property value into plausible land and structure components. 3 This section uses both the nonparametric spatial coordinate approach due to Colwell as well as the neighbourhood approach to model the influence of location on land prices. We look at the resulting land price indexes as we increase the size of the grid and we find that there is little change in these land price indexes over a reasonable range of alternative grid sizes. Section 7 adds more characteristics to the model and again looks at how the resulting land prices change as we add more characteristics.
Section 8 compares the overall property price indexes generated by the important models explained in the previous sections (instead of comparing just the land price components of residential property sales). For comparison purposes, we also compared our "best" model results with a "traditional" hedonic property price hedonic regression which regresses the logarithm of property price on a linear combination of the property characteristics and time dummy variables. 4 This traditional approach does not generate reasonable subindexes for land and structures but it can generate reasonable results for an overall property price index. Section 9 concludes. An Appendix contains the results of selected regression models as well as the data underlying the charts in the main text.

Bilinear Interpolation on the Unit Square
Our task in this section is to explain how a particular method of bilinear interpolation works for functions of two variables defined on the unit square. This method of interpolation is a basic building block that can be used to construct a method for approximating a function of two variables that is defined over the unit square. Suppose that f(x,y) is a continuous function of two variables, x and y, where 0  x  1 and 0  y  1. Suppose that f takes on the values  ij at the corners of the unit square; i.e., we have: (1)  00  f(0,0);  10  f(1,0);  01  f(0,1);  11  f(1,1).
Assuming that we know (or can estimate) the heights of the function at the corners of the unit square, we look for an approximating continuous function that satisfies counterparts to equations (1) at the corners of the unit square and is a linear function along the four line segments that make up the boundary of the unit square. Colwell (1998;89) showed that the following quadratic function of x and y, g(x,y), satisfies these requirements: 5 (2) g(x,y)   00 (1x)(1y) +  10 x(1y) +  01 (1x)y +  11 xy. Colwell (1998;89) also showed that g(x,y) is a weighted average of  00 ,  10 ,  01 and  11 for (x,y) belonging to the unit square. 6 In order to gain more insight into the properties of g(x,y), rewrite g(x,y) as follows: (3) g(x,y) =  00 + ( 10   00 )x + ( 01   00 )y + [( 00 +  11 )  ( 01 +  10 )]xy.
In the following section, we will follow the example of Colwell (1998;91) and show how the function g(x,y) defined over the unit square can be extended in order to define a continuous function over a grid of squares. 5 The function g(x,y) defined by (2) is a special case of the bilinear function defined in matrix algebra textbooks such as Mirsky (1955;353). Poirier (1976;61) also defined the counterpart to (2) that is defined over a rectangle. The extension of our algebra to a grid of rectangles is straightforward. 6 It is straightforward to show that the sum of the nonnegative weights (1x)(1y), x(1y), (1x)y and xy is equal to 1. Thus g(x,y) will satisfy the inequalities min { 00 ,  10 ,  01 ,  11 }  g(x,y)  max { 00 ,  10 ,  01 ,  11 }.

Bilinear Spline Interpolation over a Grid
In order to explain how Colwell's method works over a grid of squares, we will explain his method for the case of a 3 by 3 grid of squares. The method will be applied to the variables X and Y that are defined over a rectangular region in X,Y space. We assume that X and Y satisfy the following restrictions: where X min < X max and Y min < Y max . We translate and scale X and Y so that the range of the transformed X and Y, x and y, lie in the interval joining 0 and 3; i.e., define x and y as follows: Define the following 3 dummy variable (or indicator) functions of x: Note that if 0  x  3, then D 1 (x) + D 2 (x) + D 3 (x) = 1 so that the 3 dummy variable functions sum to 1 if x lies in the interval between 0 and 3.
The domain of definition for the D ij (x,y) is the square S 3 in two dimensional space with each side of length 3; i.e., S 3  { (x,y) : 0  x  3; 0  y  3}. Note that for any (x,y) belonging to S 3 , we have  i=1 3  j=1 3 D ij (x,y) = 1. Thus the bilateral dummy variable functions D ij (x,y) will allocate any (x,y)S 3 to one of the nine unit square cells that make up S 3 . Denote the cell of area 1 that corresponds to x and y such that D ij (x,y) = 1 as C ij for i,j = 1,2,3. Thus the 3 cells in the grid of 9 cells that correspond to y values that satisfy 0  y < 1 are C 11 , C 21 and C 31 . The 3 cells that correspond to y values such that 1  y < 2 are C 12 , C 22 and C 32 and the 3 cells that correspond to y values such that 2  y  3 are C 13 , C 23 and C 33 .
Let f(x,y) be the function defined over S 3 that we wish to approximate. Define the heights  ij of the function f(x,y) at the 16 vertices of the grid of unit area cells as follows: (8)  ij  f(i,j) ; i = 0,1,2,3; j = 0,1,2,3.
Define the Colwell (1998;91-92) bilinear spline interpolating approximation g 3 (x,y) to f(x,y) for any (x,y)S 3 as follows: (9)  It can be verified that g 3 (x,y) is a continuous function of x and y over S 3 and g 3 (x,y) is equal to the underlying function f(x,y) when (x,y) is a vertex point of the grid; i.e., we have the following equalities for the 16 vertex points in S 3 : (10) g 3 (i,j) =  ij  f(i,j); i = 0,1,2,3; j = 0,1,2,3.
For each square of unit area in the grid, it can be seen that g 3 (x,y) behaves like the bilinear interpolating function g(x,y) that was defined by (2) in the previous section. Thus if (x,y) belongs to the cell C ij where i and j are equal to 1, 2 or 3, then g 3 (x,y) is bounded from below by the minimum of the 4 vertex point values  i1,j1 ,  i,j1 ,  i1,j ,  i,j and bounded from above by the maximum of the 4 vertex point values  i1,j1 ,  i,j1 ,  i1,j ,  i,j .
Following Colwell (1998;89), if we set y = j where j = 0, 1, 2 or 3, then the resulting function of x, g 3 (x, j), is a linear spline function in x between 0 and 3; i.e., g 3 (x, j) is a continuous, piecewise linear function of x that has 3 (joined) linear segments that can change their slopes at the break points x = 1 and x = 2. Similarly, if we set x = i where i = 0, 1, 2 or 3, then the resulting function of y, g 3 (i, y), is also a linear spline function in y between 0 and 3. Thus we can view g 3 (x,y) as an interpolating function that merges these linear spline functions in the x and y directions into a consistent continuous function of two variables, where the interpolating function is equal to the function of interest at the 16 vertex points of the grid.
Following Poirier (1976;11-12) and Colwell (1998), we can move from the interpolation model defined by (9) to an econometric estimation model. Thus suppose that we can observe x and y for N observations, say (x n ,y n ) for n = 1,...,N. Suppose also that we can observe f(x n ,y n ) for n = 1,...,N. Finally, suppose that we can approximate the function f(x,y) by g 3 (x,y) over S 3 . Let   [ 00 ,  10 ,..., 33 ] be the vector of the 16  ij which appear in (9) and rewrite g 3 (x,y) as g 3 (x,y,). Now view  as a vector of parameters which appear in the following linear regression model: (11) z n = g 3 (x n ,y n ,) +  n ; n = 1,...,N.
If we are willing to assume that the approximation errors  n are independently distributed with 0 means and constant variances, the unknown parameters  ij in (11) (which are the heights of the "true" function f(x,y) at the vertices in the grid) can be estimated by a least squares regression. It can be seen that this method for fitting a two dimensional surface over a bounded set is essentially a nonparametric method. If the number of observations N is sufficiently large and the observations are more or less uniformly distributed over the grid, then we can make the grid finer and finer and obtain ever closer approximations to the true underlying function. 7 To see how this nonparametric approach to the estimation of a surface could be applied in the context of sales of land plots in a geographical area, suppose that in a particular time period, we have information on the selling price of N land plots. Suppose that the selling price of land plot n is P n and the area of the property is L n square meters. Suppose also that we have data on the latitude and longitude of property n, X n and Y n for n = 1,...,N. Translate and scale these spatial coordinates into the variables x n and y n using definitions (4) and (5) above. We suppose that N is large enough and the observations are dispersed through all 9 cells in the 3 by 3 geographical grid. An approximation to the true land price surface in the geographical area under consideration (which gives the price of land per meter squared as a function of the transformed spatial coordinates) can be generated by estimating the following linear regression model: (12) P n /L n = g 3 (x n ,y n ,) +  n ; n = 1,...,N where the g 3 (x n ,y n ,) are defined by (9) for each (x n ,y n ) in the sample of observations. Thus estimates for the 16 unknown height parameters  ij in equations (12) can be obtained by solving a simple least squares minimization problem.
If observations are plentiful, then the grid can be made finer. Thus the 3 by 3 grid could be replaced by a k by k grid where k is an arbitrary positive integer. In this case, definitions (5) are replaced by x  k(X  X min )/(X max  X min ) and y  k(Y  Y min )/(Y max  Y min ). Definitions (6) to (9) can readily be modified to define the approximating function g k (x,y,) in place of g 3 (x,y,). Of course the new parameter vector  in g k (x,y,) will have dimension (k+1) 2 in place of the parameter vector  in g 3 (x,y,) which had dimension 4 2 = 16. Thus Colwell (1998) realized that the well known bilinear interpolation function g(x,y) defined by (2) could be used as a basic building block in a powerful nonparametric method for approximating an arbitrary continuous function of two variables. 8 7 If the dependent variable is observed with random errors, then the method for fitting the surface can also be regarded as a smoothing method. The smoothing parameter is the number of cells in the grid, k 2 (or k can be used as the smoothing parameter); the smaller the number of cells, the smoother will be the estimated g k (x,y) function. For a discussion of smoothing methods and alternative smoothing parameters, see Buja, Hastie and Tibshirani (1989). 8 Colwell (1998;87) summarized his method as follows: "A simple, non-parametric approach is neededone that fits any function with the fewest possible restrictions. The purpose of this article is to describe a method for using a single, standard OLS regression to estimate a continuous price function in space that can approximate any shape. The cost of the method developed here is found in terms of degrees of freedom. It achieves flexibility by requiring large numbers of observations." Colwell (1998;88) after noting that his approximating function was differentiable in the interior of each square in the grid but not necessarily at boundary points of each square offered the following view on the importance of continuity versus differentiability: "This tradeoff of continuity for differentiability is worth accepting because continuity is However, Colwell did not exhibit the explicit representation for g 3 (x,y) defined by (9) so it is not clear exactly how he defined his linear regression model. Colwell (1998;92) also made the following statement about his method of parameterization: "As indicated earlier, one of the location variables must be omitted if perfect multicollinearity is to be avoided. Finally, it is not necessary to have data points within every section." Thus he seemed to suggest that one of the  ij on the right hand side of (9) needed to be omitted in order to avoid perfect multicollinearity. But such an omission would seem to destroy the flexibility of his method; i.e., setting say  ij = 0 means that we would no longer have g k (i,j) = f(i,j). Moreover, as we shall see later in our empirical application of his method, problems can arise if some cells have no observations. Thus although the spirit of his model is clear, the exact details on how to implement it are not spelled out in his paper. 9

Colwell's Nonparametric Method versus Penalized Least Squares
It is useful to compare the nonparametric method for estimating a function of two variables explained in the previous section with the nonparametric method used by Hill and Scholz (2018) in their property price regressions for Sydney Australia. These authors used a penalized least squares approach for their nonparametric method.
Using the notation surrounding (11) above, a simplified version of this approach works as follows: find a function g(x,y) which is a solution to the following penalized least squares minimization problem: (13) min g  n=1 N [z n  g(x n ,y n )] 2 + J(g) where it is assumed that g(x,y) is twice continuously differentiable and J(g) is some function of the second order partial derivatives of g evaluated at the N observed (x n ,y n ). 10 The positive parameter  trades off how well each g(x n ,y n ) approximates the observed z n with how variable g is.
There is an extensive literature on solving this problem which is quite complicated. 11 In order to illustrate some of the problems associated with this penalized least squares approach, we will consider a simplified one dimensional version of this approach using finite differences of g(x) in J(g) in place of partial derivatives of g. For simplicity, we will also assume that the x n data are unique and we order the N observations on x as x 1 < compelling, whereas the worth of differentiability is dubious. Continuity of the price function is important because markets produce continuity. Discontinuities are destroyed by arbitrage." We agree with his assessment that differentiability of the approximating surface is not essential. 9 Poirier (1976;59-62) developed an approach which is equivalent to our Colwell based approach except that his parameterization of the approximating function is in terms of changes in levels rather than in the levels themselves. Thus his interaction terms are difficult to interpret. He also did not deal with the difficulties associated with empty cells. 10 For example, J(g) could equal  n=1 N [g xx (x n ,y n ) + 2g xy (x n ,y n ) + g yy (x n ,y n )] where g xx (x n ,y n )   2 g(x n ,y n )/xy, etc. 11 For example, see Wahba andWendelberger (1980), Silverman (1985;19), Wahba (2000) and Wood (2004). x 2 < ... < x N . The corresponding observed z values are z n for n = 1,...,N. Again, for simplicity, we assume that the x n are equally spaced.
Set g(x n ) = s n for n = 1,...,N. Our first highly simplified version problem (13) is the following penalized least squares minimization problem: choose s 1 , s 2 , ..., s N to solve the following unconstrained minimization problem: where  > 0 is a positive tradeoff parameter and the first and second order finite differences of the s n are defined as follows: (15) s n  s n  s n1 ; n = 2,3,...,N; (16)  2 s n  s n  s n1 ; n = 3,4,...,N.
For a given , (14) can readily be solved using the first order conditions for the minimization problem and a bit of linear algebra. Denote the solution to (14) as the vector s()  [s 1 (),...,s N ()]. Denote the vector of observed z n as z  [z 1 ,...,z N ]. As  tends to 0, s() will tend to the observed vector z. As  tends to plus infinity, the s n () will tend to a linear function of n; i.e., s n () will tend to  + n for n = 1,...,N for some  and . This smoothing model was originally suggested by Henderson (1924;30). Note that this smoothing method depends on the choice of . The method of cross validation can be used to choose ; see Silverman (1985;5) for references to the literature.
Our second highly simplified version problem (13) is the following penalized least squares minimization problem: choose s 1 , s 2 , ..., s N to solve the following unconstrained minimization problem: where  > 0 is again a positive tradeoff parameter between fit and the variability of the s n and the third order finite differences of the s n, are defined as follows: (18)  3 s n   2 s n   2 s n1 ; n = 4,5,...,N.
Denote the solution to (17) as the vector s()  [s 1 (),...,s N ()]. As  tends to 0, s() will tend to the observed vector z. As  tends to plus infinity, the s n () will tend to a quadratic function of n; i.e., s n () will tend to  + n + n 2 for n = 1,...,N for some  and . This smoothing model was originally suggested by Whittaker (1923).
In the actuarial literature, the smoothing methods that chose the s n = g(x n ) for n = 1,...,N by solving (14) or (17) for an exogenous  is known as the Whittaker-Henderson method of graduation 12 and in the economics literature, using (14) to smooth a time series is known as the Hodrick-Prescott (1980) filter.
This penalized least squares approach to smoothing a series implicitly defines a series to be smooth if its higher order differences are all "small". However, Bizley (1958;126) criticized this definition of smoothness by noting that the rather smooth exponential function, g(x)  e x , has derivatives and differences that never become small, no matter how high a difference is taken. This fairly compelling criticism of the penalized least squares approach has been largely ignored by the current literature.
A method for smoothing a discrete series z n can be modeled as a mapping of the vector z  [z 1 ,...,z N ] into a "smoothed" vector s  [s 1 ,...,s N ]. Let F(z)  [F 1 (z),...,F N (z)] be the vector valued smoothing function that transforms the "rough" z into the "smooth" s so that s  F(z). The function F(z) is a representation of the smoothing method. Diewert and Wales (2006;107-110) developed a test or axiomatic approach to describe desirable properties of a smoothing method. We list their tests below along with two additional tests. 13 Test 1; Sum Preserving Test. If s = F(z), then 1 N F(z)   n=1 N F n (z) =  n=1 N z n  1 N z where 1 N is an N dimensional vector of ones. The test says that the sum of the values of the smoothed series should equal the sum of the values of the original series. 14 Test 2; First Moment Preserving Test: If s = F(z), then  n=1 N ns n =  n=1 N nz n . This test was suggested by Whittaker (1923;68).
Test 3; Identity Test: If z = k1 N where k is a scalar, then s = F(k1 N ) = k1 N . Thus if the rough z is constant, then its smooth s reproduces this constant vector.
Test 4; The Linear Trend Test: If z n =  + n for n = 1,...,N where  and  are constants, then F(z) = z.
The last 3 tests were listed in Diewert and Wales (2006;106). The following two tests were not listed in Diewert and Wales but they are obvious tests that are similar to Tests 4-6: if the rough is a smooth elementary function of one variable, then the smooth should be identical to the rough.
Test 7; The Exponential Trend Test: If z n = e n for n = 1,...,N where  is a constant, then F(z) = z.
Test 8; The Logarithmic Trend Test: If z n = ln(n) for n = 1,...,N where  is a constant, then F(z) = z.

Test 9; The Diminishing Variation Test
If the smoothing method satisfies Tests 1 and 9, then the variance of the smooth cannot exceed the variance of the rough. Test 9 was proposed by Schoenberg (1946;52).
The next test is of fundamental importance in our view and it is essentially due to Sprague. It is worth quoting him on this test: "I now proceed to a different part of my subject and prove that it is undesirable to employ such formulas as Mr. Woolhouse's, Mr. Higham's, or Mr. Ansell's, not only because, as already mentioned, they will never entirely get rid of the irregularities in our observations, but also because they all have a tendency to introduce an error even into a regular series of numbers. I start with the proposition which I think will command universal consent, that, if we attempt to graduate a perfectly regular series of numbers, the result should be to leave it unaltered; and that, if our method of procedure alters the law of the series, and substitutes for the original series one following a different law, this proves that our method of procedure is faulty." T.B. Sprague (1887; 107-108).
It can be seen that Tests 3-8 above are essentially due to Sprague. Diewert and Wales (2006;109) argued that a consequence of the above quotation is the following test: Thus if we smooth the raw data z once and obtain the smooth s = F(z) and then if we smooth the resulting s and obtain F(s), we find that the second round of smoothing just reproduces s so that F(s) = s. Put another way, the smoothing method defined by the function F should produce a smooth series and so another round of smoothing should not change the smooth series produced by the initial use of F.
It can be shown that the Henderson smoothing method defined by the solution to (14) satisfies all of the above tests except Tests 6-8 and Test 10 and the Whittaker method defined by the solution to (17) satisfies all of the above tests except Tests 7, 8 and 10. 15 The failure of a method to pass Test 10 is, in our view, is a serious problem with the method. Unfortunately, as noted by Diewert and Wales (2006;109), most smoothing methods fail this test. For example, of the seven main types of nonparametric smoothing models listed by Buja, Hastie and Tibshirani (1989;456-460): (i) running mean smoothers; (ii) bin smoothers; (iii) running line smoothers; (iv) polynomial regression; (v) cubic smoothing splines (the Henderson (1924) model); (vi) regression splines with fixed knots or break points; (vii) kernel smoothers; only methods (ii), (iv) and (vi) pass Test 10. The reason why these three smoothing methods pass Test 10 is that they are linear smoothers that are based on linear regression models.
Thus suppose that the rough z satisfies the linear regression model, z = X +  where X is an N by K matrix of exogenous variables of full rank K  N and  is a vector of independently distributed error terms with means 0 and constant variances. Then the least squares estimator for  is b  (X T X) 1 Xz and the predicted z vector is the smoothed vector s  Xb = X(X T X) 1 Xz = Sz where S  X(X T X) 1 X is the linear smoothing matrix for this regression based smoothing method. Thus F(z)  Sz for this linear smoothing method. Any linear regression based smoothing method will satisfy Test 10 since F[F(z)] = S[Sz] = Sz = F(z) for this class of methods since SS = X(X T X) 1 X X(X T X) 1 X = X(X T X) 1 X = S. Methods (ii), (iv) and (vi) are all special cases of a linear regression based smoothing method. 16 What we are arguing in this section is that linear regression based smoothing methods have some advantages over the penalized least squares approach used by Hill and Scholz in their study. Looking at the definition of g 3 (x,y) in equation (9), it can be seen that our suggested bivariate nonparametric smoothing method is a linear regression based smoothing method, where the  ij are the components of the  vector in a linear regression. Thus our suggested method inherits the useful properties of a linear regression model including satisfaction of a bivariate version of Test 10; i.e., if we smooth the raw data once and then smooth once again, we do not change the original smooth, whereas the penalized least squares approach used by Hill and Scholz does not satisfy this key test. Also, if there are no random errors in the model and if the observed z n occur precisely at the vertex points of the chosen k by k grid, then the Colwell model will reproduce the underlying functional values; i.e., we will have g k (x n ,y n ) = f(x n ,y n ) for all n where f(x n ,y n ) is the "true" function value at the nth grid point. Thus our model has an underlying flexibility which is not attained by a penalized least squares approach.
We turn now to our empirical application of the Colwell nonparametric method of surface fitting.

The Tokyo Residential Property Sales Data
Our basic data set consists of quarterly data on V (the selling price of a residential property in Tokyo), L (the land area of the property in square meters), S (the floor space area of the structure if any on the land plot), A (the age in years of the structure if any on the land plot), the location of the property (specified in terms of longitude x and latitude y and in terms of the 23 Wards or local neighbourhoods of Tokyo) and some additional characteristics to be explained below. These data were obtained from a weekly magazine, Shukan Jutaku Joho (Residential Information Weekly) published by Recruit Co., Ltd., one of the largest vendors of residential listings information in Japan. The Recruit dataset covers the 23 special wards of Tokyo for the period 2000 to 2010, including the minibubble period in the middle of 2000s and its later collapse caused by the Great Recession.
Shukan Jutaku Joho provides time series of housing prices from the week when it is first posted until the week it is removed due to its sale. 17 We only used the price in the final week because this can be safely regarded as sufficiently close to the contract price. 18 After range deletions, there were a total of 5580 observations with structures on the property in our sample of sales of residential property sales in the Tokyo area over the 44 quarters covering 2000-2010. 19 In addition, we had 8493 observations on residential properties with no structure on the land plot. 20 Thus there was a total of 14,073 properties in our sample. The variables used in our regression analysis to follow and their units of measurement are as follows: V = The value of the sale of the house in 10,000,000 Yen; S = Structure area (floor space area) in units of 100 meters squared; L = Lot area in units of 100 meters squared; A = Approximate age of the structure in years; NB = Number of bedrooms; W = Width of the lot in 1/10 meters; TW = Walking time in minutes to the nearest subway station; TT = Subway running time in minutes to the Tokyo station from the nearest station during the day (not early morning or night); X = Longitude of the property; Y = Latitude of the property; P S = Construction cost for a new structure in 100,000 Yen per meter squared. 17 There are two reasons for the listing of a unit being removed from the magazine: a successful deal or a withdrawal (i.e. the seller gives up looking for a buyer and thus withdraws the listing). We were allowed access to information regarding which the two reasons applied for individual cases and we discarded those transactions where the seller withdrew the listing. 18 Recruit Co., Ltd. provided us with information on contract prices for about 24 percent of all listings. Using this information, we were able to confirm that prices in the final week were almost always identical with the contract prices; see Shimizu, Nishimura and Watanabe (2016). 19 We deleted 9.2 per cent of the observations with structures because they fell outside our range limits for the variables V, L, S, A, NB and W. It is risky to estimate hedonic regression models over wide ranges when observations are sparse at the beginning and end of the range of each variable. The a priori range limits for these variables were as follows: 1.8  V  20; 0.5  S  2.5; 0.5  L  2.5; 1  A  50; 2  NB  8; 25  W  90. For properties with no structure, we set the corresponding S equal to 0. 20 The large number of plots with no structures can be explained by the preference of Japanese buyers of residential properties to construct their own house. Thus sellers of residential properties that have a relatively old structure on the property tend to demolish the structure and sell the property as a land only property.
In addition, we have the address of each property and so we can allocate each property to one of the 23 Wards of Tokyo. This information was used to construct Ward dummy variables for each property in our sample. The basic descriptive statistics for the above variables are listed in Table 1 below. Thus over the sample period, the sample average sale price was approximately 62.5 million Yen, the average structure space was 43.5 m 2 (but for properties with structures, the average was 110 m 2 ), the average lot size was 103.9 m 2 , the average age of the structure was 5.8 years (for properties with a structure, the average age was 14.7 years), the average number of bedrooms in the properties that had structures was 3.95, the average lot width was 4.7 meters, the average walking time to the nearest subway station was 9.4 minutes and the average subway travelling time from the nearest station to the Tokyo Central station was 31.2 minutes.
As is usual in property regressions using L and S as independent variables, we can expect multicollinearity problems in a simple linear regression of V on S and L. 21 In order to eliminate a possible multicollinearity problem between the lot size L and floor space area S for properties with a structure and to make our estimates of structure value consistent with structure value estimates in the Japanese national accounts, we will assume that the value of a new structure in any quarter is equal to a Residential Construction Cost Index per m 2 for Tokyo 22 (equal to P St for quarter t) times the floor space area S of the structure. The builder's model for valuing a residential property postulates that the value of a residential property is the sum of two components: the value of the land which the structure sits on plus the value of the residential structure.

The Basic Builder's Model using Spatial Coordinates to Model Land Prices
In order to justify the model, consider a property developer who builds a structure on a particular property. The total cost of the property after the structure is completed will be equal to the floor space area of the structure, say S square meters, times the building cost per square meter,  say, plus the cost of the land, which will be equal to the cost per square meter,  say, times the area of the land site, L. Now think of a sample of properties of the same general type, which have prices or values V tn in period t 23 and structure areas S tn and land areas L tn for n = 1,...,N(t) where N(t) is the number of observations in period t. Assume that these prices are equal to the sum of the land and structure costs plus error terms  tn which we assume are independently normally distributed with zero means and constant variances. This leads to the following hedonic regression model for period t where the  t and  t are the parameters to be estimated in the regression: 24 (19) V tn =  t L tn +  t S tn +  tn ; t = 1,...,44; n = 1,...,N(t).
Note that the two characteristics in our simple model are the quantities of land L tn and the quantities of structure floor space S tn associated with property n in period t and the two constant quality prices in period t are the price of a square meter of land  t and the price of a square meter of structure floor space  t . Finally, note that separate linear regressions can be run of the form (19) for each period t in our sample.
The hedonic regression model defined by (19) applies to new structures. But it is likely that a model that is similar to (19) applies to older structures as well. Older structures will be worth less than newer structures due to the depreciation of the structure. Assuming that we have information on the age of the structure n at time t, say A tn = A(t,n) and assuming a geometric depreciation model, a more realistic hedonic regression model than that defined by (19) above is the following basic builder's model: 25 (20) V tn =  t L tn +  t (1  ) A(t,n) S tn +  tn ; t = 1,...,44; n = 1,...,N(t) where the parameter  reflects the net depreciation rate as the structure ages one additional period. 26 Note that (20) is now a nonlinear regression model whereas (19) was a simple linear regression model.
Note that the above model is a supply side model as opposed to the demand side model of Muth (1971) and McMillen (2003). Basically, we are assuming competitive suppliers of housing so that we are in Rosen's (1974;44) Case (a), where the hedonic surface identifies the structure of supply. This assumption is justified for the case of newly built houses but it is less well justified for sales of existing homes. 27 As was mentioned in the previous section, we have 14,073 observations on sales of houses in Tokyo over the 44 quarters in years 2000-2010. Thus equations (20) above could be combined into one big regression and a single depreciation rate  could be estimated along with 44 land prices  t and 44 new structure prices  t so that 89 parameters would have to be estimated. However, experience has shown that it is usually not possible to estimate sensible land and structure prices in a hedonic regression like that defined by (20) due to the multicollinearity between lot size and structure size. 28 Thus in order to deal with the multicollinearity problem, we draw on exogenous information on new house building costs from the Japanese Ministry of Land, Infrastructure, Transport and Tourism (MLIT). Thus if the sale of property n in period t has a new structure on it, we assume that the value of this new structure is equal to this measure of residential building costs p St time the floor space area of the new structure, S tn . We apply this same line of reasoning to property sales that have old structures on them as well. Thus our new builder's model replaces the parameter  t which appears in equations (20) with the exogenous official price P St . Our new model becomes the following one: (21) V tn =  t L tn + P St (1  ) A(t,n) S tn +  tn ; t = 1,...,44; n = 1,...,N(t).
Thus we have 14,073 degrees of freedom to estimate 44 land price parameters  t and one annual geometric depreciation rate parameter , a total of 45 parameters. We estimated the nonlinear regression model defined by (21) for our Tokyo data set using the econometric programming package Shazam; see White (2004). The R 2 for the resulting preliminary nonlinear regression Model 0 was only 0.5545, 29 which is not very satisfactory. However, there are no location variables in Model 0.
The value of a structure of the same type and age should not vary much from location to location. However, the price of land will definitely depend on the location of the property. Thus for our next model, we assume that the per meter price of land of a property is a function f(x,y) of its spatial coordinates, x and y. Thus let x tn and y tn equal the normalized longitude and latitude of property n sold in period t. We will initially approximate the true land price surface f(x,y) by the 4 by 4 Colwell spatial grid function g 4 (x,y) defined above in section 3. If X tn and Y tn are the raw longitude and latitude of property n sold in period t, then define the corresponding transformed spatial coordinates as x tn  4(X tn  X min )/(X max  X min ) and y tn  4(Y tn  Y min )/(Y max  Y min ) and define the Colwell approximation to f(x tn ,y tn ) as g 4 (x tn ,y tn ) using the definitions in section 3. Model 1 is the following nonlinear regression model: (22) V tn =  t g 4 (x tn ,y tn ,)L tn + P St (1  ) A(t,n) S tn +  tn ; t = 1,...,44; n = 1,...,N(t).
Note that the  vector of parameters in g 4 (x tn ,y tn ,) consists of the 25 spatial grid parameters  ij where i, j = 0,1,2,3,4. Thus equations (22) contain 44 unknown period t land price parameters  t , 25 unknown  ij spatial grid parameters and 1 depreciation rate parameter  for a total of 70 unknown parameters. However, not all of these parameters can be estimated. If we multiply all components of  by the positive number  and divide all  t by , it can be verified that the terms  t g 4 (x tn ,y tn ,)L tn remain unchanged. Thus a normalization on the  t and the  ij is required. We impose the normalization  1 = 1 which means that the sequence, 1,  2 ,..., 44 , can be interpreted as an index of residential land prices for Tokyo for the 44 quarters in our sample, where the index is set equal to 1 in the first quarter of 2000. 30 There are 44 = 16 cells C ij in our grid of squares where C 11 is the cell in the southwest corner of the grid, C 41 is the southeast corner cell, C 14 is the northwest corner cell and C 44 is the northeast corner cell. It turns out that cell C 41 has no observed property sales over the entire sample period. 31 This means that  44 , the value of land per meter squared at the southeast corner of the grid, cannot be identified. Thus in addition to the normalization  29 All of the R 2 reported in this paper are equal to the square of the correlation coefficient between the dependent variable in the regression and the corresponding predicted variable. The estimated net annual geometric depreciation rate was  = 10.49%, with a T statistic of 23.3. This depreciation rate is too high to be believable. As we add more explanatory variables, we will obtain more reasonable depreciation rates. 30 Note that the  t shift the entire land price surface g 4 (x,y,) in a proportional manner over time. Thus all reasonable index numbers of the land price components of individual residential properties in Tokyo will be proportional to the estimated parameter sequence 1,  2 * ,...,  44 * . This is perhaps a weakness of our model but given the nonparametric nature of our modeling of land prices, some simplifying assumptions had to be made in order to estimate all of the parameters in our model. In a real time setting, a rolling window approach would be used in order to implement our model which would allow the height parameters to change over time; see Shimizu, Nishimura and Watanabe (2010) for an example of this approach. 31 This cell is defined as properties with normalized spatial coordinates (x,y) where x and y satisfy the restrictions 3  x  4 and 0  y < 1. = 1, we set  44 = 0 in equations (22). These normalizations will ensure that the nonlinear minimization problem associated with estimating Model 1 will have a unique solution. Thus Model 2 has 68 unknown parameters.
We used Shazam's nonlinear regression option to estimate the unknown parameters in (22). The R 2 for Model 1 turned out to be 0.7973, a huge jump from the R 2 for Model 0, which was only 0.5545. This large jump indicates the importance of including locational variables in a property regression. The log likelihood for Model 1 increased by 5524.50 points over the final log likelihood of Model 0 for adding 23 new location parameters. Since Model 0 is a special case of Model 1, this is a highly significant increase in log likelihood. The estimated geometric depreciation rate from Model 1 was 6.33 % per year (T statistic = 31.7) which is more reasonable than the Model 0 estimate of 10.49 %.
We now address the problem of how exactly should the land, structure and overall house price index be constructed? Our nonlinear regression model defined by (22) decomposes the period t value of property n into two terms: one which involves the land area L tn of the property,  t g 4 (x tn ,y tn ,)L tn , and another term, P St (1  ) A(t,n) S tn , which involves the structure area S tn of the property. The first term can be regarded as an estimate of the land value of house n that was sold in quarter t while the second term is an estimate of the structure value of the house (if S tn > 0). Our problem now is how exactly should these two value terms be decomposed into constant quality price and quantity components? Our view is that a suitable constant quality land price index for all houses sold in period t should be  t and for property n sold in period t, the corresponding constant quality quantity should be g 4 (x tn ,y tn ,)L tn . 32 Turning to the decomposition of the structure value of property n sold in period t, P St (1  ) A(t,n) S tn , into price and quantity components, we take P St as the price and (1  ) A(t,n) S tn as the corresponding quantity for property n sold in quarter t.
Note that the above value decompositions of individual property values into land and structure components sets the price of a square meter of land in quarter t equal to  t * , the estimated parameter value for  t and sets the price of a square meter of structure equal to P St , the official per meter structure cost for quarter t. These prices are assumed to be the same across all properties sold in period t and thus we can set the aggregate land and structure price for all residential properties sold in period t equal to P Lt and P St where P Lt   t * for t = 1,...,44. The corresponding aggregate constant quality quantities of land and structures sold in period t are defined as follows: The price and quantity series for land and structures need to be aggregated into an overall Tokyo residential property sales price index. We use the Fisher (1922) ideal index to perform this aggregation. Thus define the overall house price level for quarter t for Model 2, P t , as the chained Fisher price index of the land and structure series {P Lt ,P St ,Q Lt ,Q St }. Since these aggregate price and quantity series are generated by the Model 1 nonlinear regression model defined by equations (22), we relabel Q Lt , Q St , P t , P Lt , P St , as Q L1t , Q S1t , P 1t , P L1t , P S1t for t = 1,...,T. 34 33 We could use hedonic imputation or index number theory to form aggregate price and quantity indexes of land and structures but because our model makes the constant quality price of land and structures the same across all property sales in a quarter, our aggregation procedure can be viewed as an application of Hicks' Aggregation Theorem; i.e., if the prices in a group of commodities vary in strict proportion over time, then the factor of proportionality can be taken as the price of the group and the deflated group expenditures will obey the usual properties of a microeconomic commodity. "Thus we have demonstrated mathematically the very important principle, used extensively in the text, that if the prices of a group of goods change in the same proportion, that group of goods behaves just as if it were a single commodity." J.R. Hicks (1946;312-313). 34 The Fisher chained index P 1t is defined as follows. For t = 1, define P 1t  1. For t > 1, define P 1t in terms of P 1t1 and P Ft as P 1t  P 1t1 P Ft where P Ft is the quarter t Fisher chain link index. The chain link index for t  2 is defined as P Ft  [P LASt P PAAt ] 1/2 where the Laspeyres and Paasche chain link indexes are defined as P LASt  [P L1t Q L1t1 +P S1t Q S1t1 ]/[P L1t1 Q L1t1 +P S1t1 Q S1t1 ] and P PAAt  [P L1t Q L1t +P S1t Q S1t ]/[P L1t1 Q L1t +P S1t1 Q S1t ]. Diewert (1976Diewert ( ) (1992 showed that the Fisher formula had good justifications from both the perspectives of the economic and axiomatic approaches to index number theory. The overall Model 1 house price index P 1t as well as the land and structure price indexes P L1t and the normalized structure price index, p St  P St /P S1 , for Tokyo over the 44 quarters in the years 2000-2010 are graphed in Chart 1. 35 We have also computed the quarterly mean selling price of properties traded in quarter t and then normalized this average property price series to start at 1 in Quarter 1 of 2000. This mean price series, P Mean t , is also graphed in Chart 1. 36 It can be seen that the official structure price series gradually trends downward over the sample period, which is not surprising since general deflation occurred in Japan during our sample period. Because there are so many land only properties in our sample and since the value of structures is relatively small for properties which have structures on them, it can be seen that our estimated land price series, P L1t , is relatively close to our Model 1 overall property price index, P 1t . It can also be seen that the average property price series, P Mean t , has the same general shape as our overall property price index P 1t , but the average property price series lies well below our constant quality property price series by the end of the sample period. This is to be expected since the mean property price series does not take into account depreciation of the structures for properties that have structures on them. However, the extent of the downward bias in the mean property price series by the end of the sample period is somewhat surprising. Can we vary the number of cells in the spatial grid and explain more of the variation in residential property prices? We address this question in the next four hedonic regression models (Models 2-5) where we progressively increase the number of cells in the locational grid. Thus we will replace the land price approximating function g 4 (x tn ,y tn ,) in (22) by g 5 (x tn ,y tn ,), g 6 (x tn ,y tn ,), g 7 (x tn ,y tn ,) and g 8 (x tn ,y tn ,). The resulting Models 2-5 have 25, 36, 49 and 64 cells C ij and 36, 49, 64 and 81 spatial land price height parameters  ij respectively. Setting up the corresponding nonlinear regressions using (22) as a template is straightforward except that the existence of cells with no sample observations means that not all height parameters can be estimated.
For Model 2, which used g 5 (x tn ,y tn ,) in (22) in place of g 4 (x tn ,y tn ,), the following cells in the 5 by 5 grid of cells had no sales over our sample period: C 11 , C 41 , C 51 and C 42 . This means that 3 height parameters could not be estimated so we imposed the following restrictions on the parameters of Model 2:  00 =  40 =  50 = 0. We also set  1 = 1 so that the remaining land price parameters  t could be identified. Thus Model 2 had 36  3 = 33  ij parameters, 43 land price parameters  t and 1 depreciation rate parameter  for a total of 77 parameters. The final log likelihood for Model 2 was 155.04 points higher than the final log likelihood for Model 1 for adding 9 extra land price location parameters. The resulting R 2 was 0.8035 and the estimated geometric depreciation rate was  * = 6.29% with a T statistic of 31.6. We expected that all of the estimated height parameters would be positive but two of them ( 51 * and  05 * ) turned out to be negative. However, the 35 Define the normalized official structure price series as p St = P St /P S1 for t = 1,...,44. This is the series that is plotted in Chart 1. It will not change as we introduce additional hedonic property regression models. We note that the official index P St = 18.5p St ; i.e., P S1 = 18.5. 36 The series P Mean , P 1 , P L1 and p S are also listed in Table A1 of the Appendix. estimated land prices for each observation tn in our sample, g 5 (x tn ,y tn , * ), turned out to be positive for t = 1,...,44 and n = 1,...,N(t), and so we did not worry about these 3 negative  ij * at this stage of our investigation. 37 The sequence of estimated  t * is our estimated land price series for Model 2, P L2t , and this series is plotted in Chart 2 below and is listed in Table A2 of the Appendix. For Model 3, which used g 6 (x tn ,y tn ,) in (22) in place of g 4 (x tn ,y tn ,), the following 5 cells in the 6 by 6 grid of cells had no sales over our sample period: C 11 , C 51 , C 61 , C 52 and C 62 . Thus we set the following 5 height parameters equal to 0 in order to identify the remaining height parameters:  00 =  50 =  60 =  51 =  61 = 0. We also set  1 = 1 so that the remaining land price parameters  t could be identified. Thus Model 3 had 49  5 = 44  ij parameters, 43 land price parameters  t and 1 depreciation rate parameter  for a total of 88 parameters. The final log likelihood for Model 3 was 82.43 points lower than the final log likelihood for Model 2 for adding 11 extra land price location parameters. Model 3 is not a special case of Model 2 so it can happen that a moving to a larger number of squares in the grid does not improve the fit of the model. The problem is that there are likely to be discrete neighbourhood land price effects and our relatively course partition of the city into squares does not adequately capture these discrete neighbourhood effects. The resulting R 2 for Model 3 was 0.8014 (less than the Model 2 R 2 of 0.8035) and the estimated geometric depreciation rate was  * = 6.25% with a T statistic of 31.8. There were 5 negative estimates for the land price height parameters:  01 * ,  10 * ,  41 * ,  06 * and  56 * . However, the estimated land prices g 6 (x tn ,y tn , * ) turned out to be positive for each observation in our sample. The sequence of estimated  t * is our estimated land price series for Model 3, P L3t , and this series is plotted in Chart 2 below and is listed in Table  A2 of the Appendix.
Model 4 used g 7 (x tn ,y tn ,) in (22) in place of g 4 (x tn ,y tn ,). The following 9 cells in the 7 by 7 grid of cells had no sales over our sample period: C 11 , C 21 , C 51 , C 61 , C 71 , C 52 , C 62 , C 72 and C 17 . Thus we set the following 9 height parameters equal to 0 in order to identify the remaining height parameters:  00 =  10 =  50 =  60 =  70 =  51 =  61 =  71 =  07 = 0. We also set  1 = 1 so that the remaining land price parameters  t could be identified. Thus Model 4 had 64  9 = 55  ij parameters, 43 land price parameters  t and 1 depreciation rate parameter  for a total of 99 parameters. The final log likelihood for Model 4 was 501.88 points higher than the final log likelihood for Model 3 for adding 11 extra land price location parameters. The resulting R 2 for Model 4 was 0.8156 and the estimated geometric depreciation rate was  * = 5.99% with a T statistic of 31.9. There were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . As usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The sequence of estimated  t * is our 37 The city of Tokyo is adjacent to the Pacific Ocean and so the boundaries of the city do not fit nicely into a rectangular grid (which we transformed into a square grid). Thus as the number of squares in the grid becomes larger, some squares at the boundaries of the grid will end up having no observations or very few observations. Thus suppose the observations in cell C 11 are concentrated in the top north east corner of this cell. Then a better fit to the observed data in cell C 11 may be obtained by setting  00 equal to a negative number. estimated land price series for Model 4, P L4t , and this series is plotted in Chart 2 below and is listed in Table A2 of the Appendix.
Finally, Model 5 used g 8 (x tn ,y tn ,) in (22) in place of g 4 (x tn ,y tn ,). The following 14 cells in the 8 by 8 grid of cells had no sales over our sample period: C 11 , C 12 , C 21 , C 18 , C 61 , C 62 , C 63 , C 71 , C 72 , C 73 , C 81 , C 82 , C 83 and C 88 . All 4 corner cells were empty along with many other boundary cells. Thus we set the following 14 height parameters equal to 0 in order to identify the remaining height parameters:  00 =  10 =  01 =  60 =  61 =  62 =  70 =  71 =  72 =  80 =  81 =  82 =  88 = 0. We also set  1 = 1 so that the remaining land price parameters  t could be identified. . As usual, the estimated land prices, the g 8 (x tn ,y tn , * ), turned out to be positive for each observation in our sample. The sequence of estimated  t * is our estimated land price series of index numbers for Model 5, P L5t , and this series is plotted in Chart 2 below and is listed in Table A2 of the Appendix.
At this point, we decided stop the process of increasing the number of height parameters. It is clear that our best model up to this point was Model 4.
One of the main purposes of this paper is to see if the use of spatial coordinates in a residential hedonic property value regression can lead to more accurate estimates for a property price index and for a land price subindex for residential properties than can be obtained using just postal codes or other neighbourhood locational variables. Hill and Scholz (2018) made this comparison for residential property price indexes but not for the land price component of their overall property price index since their methodological approach did not allow for separate land and structure subindexes.
An alternative to using spatial coordinates to measure the influence of location on property prices is to use postal codes or neighbourhoods as indicators of location. There are 23 Wards in Tokyo and each property in our sample belongs to one of these Wards. In order to take into account possible neighbourhood effects on the price of land, we introduced ward dummy variables, D W,tn,j , into the hedonic regression (20). These 23 dummy variables are defined as follows: for t = 1,...,44; n = 1,...,N(t); j = 1,...,23: 38 (24) D W,tn,j  1 if observation n in period t is in Ward j of Tokyo;  0 if observation n in period t is not in Ward j of Tokyo.
We now modify the model defined by (20) to allow the level of land prices to differ across the 23 Wards of Tokyo. The new Model 6 is defined by the following nonlinear regression model: (25) V tn =  t ( j=1 23  j D W,tn,j )L tn + P St (1  ) A(t,n) S tn +  tn ; t = 1,...,44; n = 1,...,N(t).
Comparing the models defined by equations (20) and (25), it can be seen that we have added an additional 23 ward relative land value parameters,  1 ,..., 23 , to the model defined by (20). However, looking at (25), it can be seen that the 44 land price time parameters (the  t ) and the 23 ward parameters (the  j ) cannot all be identified. Thus we need to impose at least one identifying normalization on these parameters. We chose the following normalization  1 = 1. Thus equations (25) contain 43 unknown period t land price parameters  t , 23 Ward relative land price parameters, the  j , which replace the 25 unknown  ij spatial grid parameters in (22), and 1 depreciation rate parameter  for a total of 67 unknown parameters. Thus this Ward dummy variable hedonic regression (Model 6) has roughly the same number of parameters as our spatial coordinate Model 1, which had 68 unknown parameters.
The final log likelihood for Model 6 was 24318.67, a gain of 5045.90 over the final log likelihood of Model 0 defined by equations (21). The R 2 for Model 6 was 0.7853. The final log likelihood for Model 1 was 23840.07 and the R 2 was 0.7993. Thus the spatial coordinates Model 1 fit the data better than the dummy variable Model 6. Both models had roughly the same number of parameters. However, how different are the resulting land price indexes generated by these two models? As usual, the sequence of estimated  t * is our estimated land price series for Model 6, P L6t , and this series is plotted in Chart 2 below and is listed in Table A2 of the Appendix. 39 It can be seen that all 6 models produce much the same land price indexes. 40 Since our best fitting model was Model 4, P L4t is our preferred land price series. Note that the Ward dummy variable model land price index, P L6t , is fairly close to our preferred series. 39 Define the aggregate constant quality amounts of residential land and structures sold in period t by Q Lt   n=1 N(t) ( j=1 23  j * D W,tn,j )L tn and Q St   n=1 N(t) (1   * ) A(t,n) S tn for t = 1,...,44. The overall period t property price index for Model 6, P 6t , is defined as the chained Fisher price index using the above Q Lt and Q St as the period t quantity series and P L6t   t * and the official structure prices P St as the period t price series when constructing the Fisher index chain links. 40 Since the structure component of overall property prices is relatively small compared to the land component and since the structure price index is the same across all 6 models, the overall property price indexes generated by Models 1-6 are all very similar.
The above 6 models make use of information on land plot size, structure floor space, the age of the structure (if the property has a structure) and its location, either in terms of spatial coordinates or terms of its neighbourhood. 41 These are the most important residential property price determining characteristics in our view. In the following section, we make use of additional information on housing characteristics and see if this extra information materially changes our estimated land price indexes. 42 We will use the spatial coordinate Model 4 as our starting point in the models which follow, since it was the best fitting model studied in this section. This model used the Colwell nonparametric model for modeling the land price surface with the 77 = 49 cell grid.

Spatial Coordinate Models that Use Additional Information
It is likely that property sales that have an older structure on the property will have a different land valuation than a nearby property of the same size that consists of cleared land, since demolition costs are not trivial. Our Model 7 takes this possibility into account. Define the dummy variable D L,tn as follows for t = 1,...,44 and n = 1,...,N(t): (26) D L,tn  1 if observation n in period t is a land only sale;  0 otherwise. 41 We also require an exogenous building cost per square meter in order to get realistic land and structure subindexes. 42 We are also interested in determining whether the extra information will change our estimates of structure depreciation rates. Define D S,tn  1  D L,tn for t = 1,...,44; n = 1,...,N(t). Thus if property n sold in period t has a structure on it, D S,tn will equal 1. Model 7 estimates the following nonlinear regression: (27) V tn =  t (D S,tn + D L,tn )g 7 (x tn ,y tn ,)L tn + P St (1  ) A(t,n) S tn +  tn ; t = 1,...,44; n = 1,...,N(t).
Thus the parameter  gives the added premium to the property's land price (per meter squared) if the property has no structure on it. We expect  to be a small number. Since we are using the interpolation function g 7 (x tn ,y tn ,) as a basic building block in the nonlinear regression model defined by (27), we need to impose the same restrictions on the  ij that were imposed in Model 4. Thus we set the following 9 height parameters equal to 0 in order to identify the remaining height parameters:  00 =  10 =  50 =  60 =  70 =  51 =  61 =  71 =  07 = 0. We also set  1 = 1 so that the remaining land price parameters  t could be identified. Thus Model 7 had 64  9 = 55  ij parameters, 43 land price parameters  t , 1 depreciation rate parameter  and one land only premium parameter  for a total of 100 parameters. As starting coefficient values for Model 7, we used the final coefficient estimates from Model 4 plus the starting value  = 0. The final log likelihood for Model 7 was 128.75 points higher than the final log likelihood for Model 4 for adding 1 land only parameter. The resulting R 2 for Model 7 was 0.8175 (the Model 4 R 2 was 0.8156) and the estimated geometric depreciation rate was  * = 2.85% with a T statistic of 15.4. Recall that the estimated depreciation rate from Model 4 was 5.99%. Our new estimated depreciation rate is much more reasonable. The estimated  was  * = 1.110 (T statistic = 153.4). Thus a property without a structure sold at an 11.0% premium compared to a similar property without a structure. There were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . As usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The sequence of estimated  t * is our land price series for Model 7, P L7t , and this series is plotted in Chart 3 below and is listed in Table A3 of the Appendix.
The most important point we learned from running this regression model is that residential property sales in Japan with and without a structure on the property are qualitatively different. Taking this difference into account led to much more reasonable estimated structure depreciation rates. In our next model, we allow the per square meter price of land to vary as the size of the land plot increases. Recall that we have restricted the range of the land variable to 0.5  L tn  2.5. 43 We allow the price of land to be piecewise linear function of the plot size with 3 break points; 1, 1.5 and 2. Using these land area break points, we found that 7492 observations fell into the interval 0.5  L tn < 1, 4711 observations fell into the interval 1  L tn < 1.5, 1414 observations fell into the interval 1.5  L tn < 2 and 456 observations fell into the interval 2  L tn  2.5. We label the four sets of observations that fall into the 43 Recall that our units of measurement for land are in 100 meters squared so that L tn = 1 means that observation n in period t had a land area equal to 100 m 2 . above four groups as groups 1-4. For each observation n in period t, we define the four land dummy variables, D L,tn,k , for k = 1,2,3,4 as follows: 44 (28) D L,tn,k  1 if observation tn has land area that belongs to group k;  0 if observation tn has land area that does not belong to group k.
These dummy variables are used in the definition of the following piecewise linear function of L tn , f L (L tn ), defined as follows: (29) f L (L tn ,)  D L,tn,1 [ 0 L 0 +  1 (L tn  L 0 )] + D L,tn,2 [ 0 L 1 +  1 (L 1  L 0 ) +  2 (L tn L 1 )] + D L,tn,3 [ 0 L 0 +  1 (L 1  L 0 ) +  2 (L 2 L 1 ) +  3 (L tn L 2 )] + D L,tn,4 [ 0 L 0 +  1 (L 1  L 0 ) +  2 (L 2 L 1 ) +  3 (L 3 L 2 ) +  4 (L tn L 3 )] where   [ 0 , 1 , 2 , 3 , 4 ] and the  k are 5 unknown parameters and L 0  0.5, L 1  1, L 2  1.5 and L 3  2. The function f L (L tn ,) defines a relative valuation function for the land area of a house as a function of the plot area. Thus  0 can be interpreted as the marginal price of land for plots between 0 and 0.5,  1 can be interpreted as the marginal price of land for plots between 0.5 and 1,  2 can be interpreted as the marginal price of land for plots between 1 and 1.5 and so on.
where the function f L is defined above by (29) and  tn is an error term. There are 44 unknown land price parameters  t , 1 land only premium parameter , 64 land price height parameters  ij , 5 marginal price of land parameters  k and 1 depreciation rate  to estimate. However, as was the case with Model 4 and the previous Model 7, some cells in our grid of cells are empty and so we cannot estimate 9 of the  ij and so there are only 55  ij parameters to estimate. Also, we cannot identify all of the  t and  k so we impose the normalizations  1 = 1 and  1 = 1. Thus we are left with 104 unknown parameters to estimate.
As starting coefficient values for Model 8, we used the final coefficient estimates from Model 7 plus the starting values  0 =  2 =  3 =  4 = 1. The final log likelihood for Model 8 was 328.27 points higher than the final log likelihood for Model 7 for adding 4 lot size parameters. The resulting R 2 for Model 8 was 0.8222 (the Model 7 R 2 was 0.8175) and the estimated geometric depreciation rate was  * = 3.44% with a T statistic of 16.1. The estimated  was  * = 1.091 (T statistic = 155.9). Thus a property without a structure sold at an 9.1% premium compared to a similar property without a structure. There were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . As usual, the 44 Note that for each observation, the land dummy variables sum to one; i.e., for each tn, D L,tn,1 + D L,tn,2 + D L,tn,3 + D L,tn,4 = 1. estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The sequence of relative marginal valuations of land (the  k * ) were as follows (with the T statistics in brackets):  0 * = 1.50 (41.2),  1 * = 1 (imposed restriction),  2 * = 1.16 (35.1),  3 * = 1.23 (35.9) and  4 * = 0.89 (13.9). Thus as lot size increases, the per meter price of land eventually decreases. The sequence of estimated  t * is our index of land prices for Model 8, P L8t , and this series is plotted in Chart 3 below and is listed in Table A3 of the Appendix. In our next model, we allow the per square meter price of a square meter of structure to vary as the floor space of the structure increases. The rational for this model is that bigger houses are likely to be of higher quality. Recall that we have restricted the range of the structure floor space variable to 0.5  S tn  2.5. 45 We allow the price of a square meter of floor space area to be piecewise linear function of the overall floor space size with 2 break points; 1 and 1.5. Using these structure area break points, we found that 2768 observations fell into the interval 0.5  S tn < 1, 2020 observations fell into the interval 1  S tn < 1.5 and 792 observations fell into the interval 1.5  S tn  2.5. We label the 3 sets of observations that fall into the above 3 groups as groups 1-3. For each observation n in period t, we define the 3 structure dummy variables, D S,tn,m , for m = 1,2,3 as follows: 46 (31) D S,tn,m  1 if observation tn has structure area that belongs to group m;  0 if observation tn has structure area that does not belong to group m.
These dummy variables are used in the definition of the following piecewise linear function of S tn , f S (S tn ), defined as follows: (32) f S (S tn ,)  D S,tn,1 [ 0 S 0 +  1 (S tn  S 0 )] + D S,tn,2 [ 0 S 1 +  1 (S 1  S 0 ) +  2 (S tn S 1 )] + D S,tn,3 [ 0 S 0 +  1 (S 1  S 0 ) +  2 (S 2 S 1 ) +  3 (S tn S 2 )] where   [ 0 , 1 , 2 , 3 ] and the  m are 4 unknown parameters and S 0  0.5, S 1  1 and S 2  1.5. The function f S (S,) defines a relative valuation function for the floor space area of a house as a function of the total floor space area, S. Thus  0 can be interpreted as the marginal price of a structure for structures with total floor between 0 and 0.5,  1 can be interpreted as the marginal price of an additional square meter of structure for total structure areas between 0.5 and 1,  2 can be interpreted as the marginal price of an additional square meter of structure for total structure areas between 1 and 1.5 and  3 can be interpreted as the marginal price of an additional square meter of structure for total structure areas between 1.5 and 2.5.
Model 9 is the following nonlinear regression: 45 Recall that our units of measurement for floor space are in 100 meters squared so that S tn = 1 means that observation n in period t had floor space area equal to 100 m 2 . 46 Note that for each observation tn where S tn > 0, the structure dummy variables sum to one; i.e., for each such tn, D S,tn,1 + D S,tn,2 + D S,tn,3 = 1. There were 5580 observations which had a positive S tn . Note also that f S (S tn ,) = 0 if S tn = 0.
where the function f L is defined above by (29), the function f S is defined by (32) and  tn is an error term. There are 44 unknown land price parameters  t , 1 land only premium parameter , 64 land price height parameters  ij , 5 marginal price of land parameters  k , 4 marginal price of structure parameters  m and 1 depreciation rate  to estimate. However, as was the case with the previous Model 8, some cells in our grid of cells are empty and so we cannot estimate 9 of the  ij and so there are only 55  ij parameters to estimate. Also, we cannot identify all of the  t and  k so we impose the normalizations  1 = 1 and  1 = 1. We also impose the normalization  1 = 1 in order to use the official structure building cost index to value new buildings with total floor space area between 0.5 and 1. This will ensure that our estimated structure values for new buildings are close to estimated structure values based solely on the official cost index. Thus we are left with 107 unknown parameters to estimate.
As starting coefficient values for Model 9, we used the final coefficient estimates from Model 8 plus the starting values  0 =  2 =  3 = 1. The final log likelihood for Model 9 was 136.32 points higher than the final log likelihood for Model 8 for adding 3 structure size parameters. The resulting R 2 for Model 9 was 0.8256 (the Model 8 R 2 was 0.8222) and the estimated geometric depreciation rate was  * = 4.35% with a T statistic of 18.6. The estimated  was  * = 1.159 (96.8). Thus a property without a structure sold at an 15.9% premium compared to a similar property without a structure. As usual, there were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . Also as usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The sequence of relative marginal valuations of land (the  k * ) were as follows:  0 * = 1.509 (41.7),  1 * = 1 (imposed restriction),  2 * = 1.160 (35.7),  3 * = 1.238 (36.8) and  4 * = 0.898 (14.3). The sequence of relative marginal valuations of floor space area (the  m * ) were as follows:  0 * = 1.461 (21.8),  1 * = 1 (imposed restriction),  2 * = 2.331 (18.6) and  3 * = 1.572 (10.0). The sequence of estimated  t * is our index of land prices for Model 9, P L9t , and this series is plotted in Chart 3 below and is listed in Table A3 of the Appendix.
Note that the 3 models that we introduced in this section did not require any additional property characteristics: we simply made better use of the information on S and L. However, for our next model, we make use of the two subway variables: TW, the walking time in minutes to the nearest subway station, and TT, the subway running time in minutes to the Tokyo central station. The sample minimum time for TW was 1 minute and the minimum time for TT was 8 minutes. Our next model allows the price of land to decrease as these two subway time variables increase. These variables have proven to be highly significant in other studies of Tokyo property prices. 47 Thus Model 10 is the following nonlinear regression: (34) V tn =  t [D S,tn + D L,tn ]g 7 (x tn ,y tn ,)f L (L tn ,) [1+(TW tn 1)][1+(TT tn 8)] + P St (1  ) A(t,n) f S (S tn ,) +  tn ; t = 1,..., 44; n = 1,...,N(t).
where the function f L is defined above by (29), the function f S is defined by (32),  is the percentage change in the price of land due to a one minute increase in walking time,  is the percentage change in the price of land due to a one minute increase in subway running time to Tokyo central station and  tn is an error term. There are 44 unknown land price parameters  t , 1 land only premium parameter , 64 land price height parameters  ij , 5 marginal price of land parameters  k , 4 marginal price of structure parameters  m , 2 subway time parameters and 1 depreciation rate  to estimate. However, as was the case with the previous models in this section, some cells in our grid of 49 cells are empty and so we cannot estimate 9 of the  ij and so there are only 55  ij parameters to estimate. Also, we cannot identify all of the  t and  k so we impose the normalizations  1 = 1 and  1 = 1. We also impose the normalization  1 = 1. Thus we are left with 109 unknown parameters to estimate.
As starting coefficient values for Model 10, we used the final coefficient estimates from Model 9 plus the starting values  = 0 and  = 0. The final log likelihood for Model 10 was 531.13 points higher than the final log likelihood for Model 9 for adding 2 subway time parameters. The resulting R 2 for Model 10 was 0.8383 (the Model 9 R 2 was 0.8256) and the estimated geometric depreciation rate was  * = 4.52% (21.7). The estimated  was  * = 1.137 (104.1). Thus a property without a structure sold at an 13.7% premium compared to a similar property without a structure. As usual, there were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . Also as usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The sequence of relative marginal valuations of land (the  k * ) were as follows:  0 * = 1.481 (42.1),  1 * = 1 (imposed restriction),  2 * = 1.137 (35.1),  3 * = 1.216 (33.5) and  4 * = 0.930 (13.3). The sequence of relative marginal valuations of floor space area (the  m * ) were as follows:  0 * = 1.418 (23.0),  1 * = 1 (imposed restriction),  2 * = 2.255 (19.3) and  3 * = 1.540 (10.2). Thus the estimated  k * and  m * did not change significantly from the estimates for Model 9. The estimated subway time parameter were  * =  0.0123 (33.3) and  * = 0.00606 (19.1). Thus a 1 minute increase in walking time to the nearest subway station decreases the per square meter land price by 1.2%. A 1 minute time in commuting time to the Tokyo central station decreases land value by 0.6%. The sequence of estimated  t * is our land price index for Model 10, P L10t , and this series is plotted in Chart 3 below and is listed in Table  A3 of the Appendix. In our next model, we introduce the number of bedrooms NB tn as a property characteristic that can affect structure value if the property n in quarter t has a structure on it. For the properties in our sample, the number of bedrooms ranged from 2 to 8. 48 Since there were relatively few observations with 6, 7 or 8 bedrooms, we grouped these last 3 categories into a single category. Define the bedroom dummy variables D NB,tn,i for observation tn as follows for i = 2,3,4,5; t = 1,...,44 and n = 1,...,N(t): (35) D NB,tn,i  1 if observation tn has a structure on it with i bedrooms;  0 elsewhere.
For bedroom group 6, define D NB,tn,6  1 if observation tn has a structure on it with 6, 7 or 8 bedrooms and define D NB,tn,6  0 elsewhere.
Model 11 is the following nonlinear regression: where the all of the functions and parameters which appear in (36) were defined in the previous model except that we have now added 5 bedroom variables,  2 ,  3 ,  4 ,  5 and  6 . We make the same normalizations as we made in Model 10 and in addition, we set  2 = 1. Thus we have added 4 additional unknown  i parameters to Model 10 so Model 11 has a total 113 unknown parameters. One might expect the  i parameters to monotonically increase as i increases; i.e., more bedrooms indicates a higher quality structure. But we have already introduced the  m into our hedonic regression model which allows structure quality to increase as the floor space increases. The correlation between the number of bedrooms and the structure area is 0.93500 and thus there will be a multicollinearity problem in using both of these variables in our nonlinear regression. Thus we cannot expect the  i and  m to increase monotonically as i and m increase.
As starting coefficient values for Model 11, we used the final coefficient estimates from Model 10 plus the starting values  i = 1 for i = 3,4,5,6.The final log likelihood for Model 11 was 75.03 points higher than the final log likelihood for Model 10 for adding 4 number of bedroom parameters. The resulting R 2 for Model 11 was 0.8400 (the Model 10 R 2 was 0.8383). As usual, there were 3 negative estimates for the land price height parameters:  01 * ,  67 * and  77 * . Also as usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. Recall that  2 was set equal to 1. The remaining bedroom parameter estimates were as follows:  2 * = 1 (imposed restriction),  3 * = 1.1587 (20.6),  4 * = 1.0863 (21.4),  5 * = 0.9116 (20.0) and  6 * = 0.7701 (18.6). Evidently, a house with more bedrooms did not seem to increase the quality of the structure. The sequence of relative marginal valuations of floor space area (the  m * ) were as follows:  0 * = 1.1573 (11.1),  1 * = 1 (imposed restriction),  2 * = 2.4201 (14.2) and  3 * = 1.7643 (10.7). Thus, for the most part, houses with more floor space were of higher quality. The collinearity of the number of bedrooms with the floor space of the structure explains the counterintuitive results for the  i * . The remaining parameters for Model 11 were much the same as the Model 10 estimated parameters. As usual, the sequence of estimated  t * is our land price series for Model 11, P L11t , and this series is plotted in Chart 3 below and is listed in Table A3 of the Appendix.
The final additional variable that we introduced into our property nonlinear regression model was the width of the land plot, W tn for property sale n in period t. Recall that W tn is measured in 10ths of a meter and the range of this property width variable was 25 to 90. Other residential property hedonic regression models have shown that this variable is a very significant one: the greater is the lot width, the more valuable is the land plot. Model 12 is the following nonlinear regression: (37)V tn =  t [D S,tn +D L,tn ]g 7 (x tn ,y tn , ...,44; n = 1,...,N(t).
where the all of the functions and parameters which appear in (37) were defined in the previous model except that we have now added the property width parameter . We make the same normalizations as we made in Model 11. Thus we have added 1 additional unknown parameter to Model 11 so Model 12 has a total 114 unknown parameters. Our expectation is that  will be positive. also turned out to be negative. But, as usual, the estimated land prices, g 7 (x tn ,y tn , * ) for t = 1,...,T and n = 1,...,N(t), turned out to be positive for each observation in our sample. The estimated parameters for this model are listed in the Appendix; see Table A4. As usual, the sequence of estimated  t * is our land price series for Model 12, P L12t , and this series is plotted on Chart 3 is also listed in Table A3 of the Appendix.
Although the fact that Model 12 generated 4 negative estimated  ij * did not lead to any negative predicted prices for land for the properties in our sample, these negative estimates could lead to negative land prices for properties not in our sample. Hence, it may be useful to perform a final regression where we restrict the  ij to be nonnegative. 49 This can be done by replacing  01 ,  67 ,  77 and  52 in the function g 7 (x tn ,y tn ,) by squares of parameters and then rerunning the model defined by (37). Model 13 is the resulting model. The final log likelihood for Model 13 was 1.19 points lower than the final log likelihood for Model 12 as a result of imposing nonnegativity on the height parameters  01 ,  67 ,  77 and  52 by entering these parameters as squares in the function g 7 (x tn ,y tn ,). The R 2 for Model 13 was 0.8488, which is the same as the R 2 for Model 12. Thus imposing nonnegativity constraints on the  ij did not lead to a significant loss of fit. The estimated 49 Nonparametric methods for the estimation of functions of one variable tend to become unreliable for observations close to the boundaries of the domain of definition of the independent variable because the nonparametric method will tend to fit the error terms near the end points of the sample range; see Diewert and Wales (2006;118) and the literature cited there. The same problem carries over to the nonparametric estimation of surfaces. Forcing the  ij to be nonnegative will tend to mitigate this problem. parameters for this model are listed in the Appendix; see Table A5. As usual, the sequence of estimated  t * is our land price series for Model 13, P L13t , and this series is listed in Table A3 of the Appendix. It does not appear on Chart 3 because the P L13t are virtually identical to the P L12t .
Our final model in this section is a Ward dummy variable model that adds more explanatory property characteristics to Model 6 in the previous section. This model was defined by equations (25). Model 14 is defined by the following nonlinear regression model:  Table A6 in the Appendix. 50 The sequence of land price indexes is the series of estimated coefficients, the  t * . This series is labeled as P L14t and is listed in Table A3 and plotted in Chart 3 below.
Excluding the location parameters (the  ij * for Model 13 and the  j * for Model 14), it can be seen that the remaining parameters ( * , the  k * ,  * ,  * ,  * , the  m * ,  * and the  i * ) are roughly similar across Models 13 and 14 and the land price coefficients (the  t * ) are very close to each other. Our tentative conclusion at this point is that neighbourhood dummy variable models do not fit the data quite as well as a spatial coordinate model but the two types of model generate much the same land prices and hence overall residential property price indexes. 51 Looking at Chart 3, it can be seen that Model 14, the model that used Ward dummy variables to take into account location effects on the price of land, produced the lowest measure of residential land price inflation in Tokyo. Our best spatial coordinate models, Models 12 and 13, 52 had the next lowest measure of land price inflation. The land price indexes generated by Models 7-11 are marginally above the Model 13 and 14 indexes. 50 The estimated parameter values for , , ,  and  for Models 13 and 14 were 1.125, 0.0130, 0.0668, 0.00402, 0.0417 and 1.155, 0.0136, 0.00945, 0.004393, 0.0402 respectively. Thus the estimates were broadly similar for our best spatial coordinates model and our best Ward dummy variable model. 51 Hill and Scholz (2018) came to the same conclusion for Sydney overall residential property price indexes. 52 We did not plot the land price index for Model 13 since it could not be distinguished from the Model 12 index.
In the following section, we compute the overall residential property price indexes that are generated by Models 7-14 and we compare the resulting indexes with a traditional log price time dummy property price index.

Overall Residential Property Price Indexes
Models 7-14 in the previous section all have the same general structure in that property value is decomposed into the sum of land value plus structure value plus an error term. For example, using Model 8, the predicted value of property n in quarter t, V tn , is equal to the predicted land value,  t * (D S,tn +  * D L,tn )g 7 (x tn ,y tn , * )f L (L tn , * )  V Ltn , plus predicted structure value, P St (1   * ) A(t,n) S tn  V Stn . Thus quarter t total predicted land value is V Lt   n=1 N(t) V Ltn and quarter t total predicted structure value is V St   n=1 N(t) V Stn . The period t price of land for Models 7-14, P Lt , is always  t * and the corresponding period t price of a structure is always p St  P St /P S1 for t = 1,...,44 where P St is the official structure cost per m 2 of structure. For all models, define the corresponding period t aggregate quantity of land and structure as Q Lt  V Lt /P Lt and Q St  V St /p St for t = 1,...,44. Thus the basic price and quantity data for each model are (P Lt ,p St ,Q Lt ,Q St ) for t = 1,...,44. The overall property price indexes for Models 7-14 are calculated as Fisher (1922) chained indexes using the price and quantity data for land and structures that has just been defined. Label the resulting overall property price indexes for quarter t as P 7t , P 8t , P 9t , P 10t , P 11t , P 12t , P 13t , and P 14t . These series are listed in Table A7 in the Appendix. As was the case with the corresponding land price indexes, there these overall property price indexes approximate each other fairly closely. There is one additional overall property price index that we calculate in this section and that is an index that is based on a "traditional" hedonic property price regression that uses the logarithm of price as the dependent variable and has time dummy variables. 53 Define the kth time dummy variable D T , tn,k for property n sold in period t as follows: for t = 1,...,44; n = 1,...,N(t); k = 2,3,...,44: (39) D T,tn,k  1 if t = k; D T,tn,k  0 if t  k.
where lnV tn and lnL tn denote the natural logarithms of property value V tn and property lot size L tn respectively, the D T,tn,k are time dummy variables, the D W,tn,j are Ward dummy variables, S tn is the floor space area of the property (if there was no structure on the property n in period t, S tn  0), TW tn and TT tn are the subway time variables, W tn is the lot width variable, A tn is the age of the structure on property n sold in period t (A tn  0 if the property had no structure) and the D NB,tn,i are the bedroom dummy variables. The log likelihood of this model cannot be compared to the log likelihood of the previous models because the dependent variable is now the logarithm of the property price instead of the property price. There are 75 unknown parameters in the model defined by equations (40). The R 2 for Model 15 was 0.8323. Set  1 * = 0 and denote the estimated  2 to  44 by  2 * ,  3 * ,...,  44 * . The sequence of overall property price indexes P 15t generated by this model are the exponentials of the  t * ; i.e., define P 15t  exp[ t * ] for t = 1,...,44. This series is also listed in Table A7 of the Appendix.
Chart 4 below compares several of the overall residential property prices that are defined above: the mean property price index P Mean t that appeared in Chart 1 above, P 9t (this is based on Model 9 which did not use information on the subway variables, the number of bedrooms and the lot width variable), Model 13 (P 13t :our best Colwell spatial coordinates model), Model 14 (P 14t :our best Ward dummy variable model) and Model 15 (P 15t :our best traditional log price time dummy hedonic regression model that used all of our property price characteristics except the spatial coordinates). 53 This type of model does not generate reasonable separate land and structure subindexes; see Diewert, Huang and Burnett-Isaacs (2017;24-25) for an explanation of this assertion. 54 We ran an initial linear regression using L tn as an independent variable in place of lnL tn . However, this regression had a log likelihood which was 204.99 points lower than our final linear regression defined by (40). The R 2 for this preliminary regression was 0.8274. Note that we could not use lnS tn as an independent variable because many observations had no structure on them and hence S tn is equal to 0 for these properties and thus we could not take the logarithm of 0.
Several points emerge from a study of Chart 4:  The mean index, P Mean t , has a large downward bias as compared to the other 4 indexes which is due to its neglect of structure depreciation. However, the movements in this index are similar to the movements in the other indexes.  The property price index P 15t generated by a traditional log price time dummy hedonic regression model has a downward bias but it is not large. 55  The Model 9 property price index, a Colwell spatial coordinates model that used only the 4 fundamental characteristics of a residential property (land plot area, structure floor space area, the age of the structure and some locational variable) 56 generated an overall property price index P 9t that is quite close to our best Colwell spatial model, Model 14 which generated the overall property price index P 14t . Thus it is probably not necessary for national statistical agencies to collect a great deal of information on housing characteristics in order to produce a decent overall property price index (as well as decent land and structure subindexes).  The Model 14 property price index, P 14t , that used local neighbourhood information about properties instead of spatial coordinate information turned out to be fairly close to our best Colwell spatial index, P 13t . Thus following the advice of Hill and Scholz (2018), it is probably not necessary to utilize spatial coordinate information in order to construct a satisfactory overall residential property price index. 55 Diewert (2010) also observed a similar result. 56 In addition to these four fundamental variables, we need an exogenous building cost measure in order to implement our basic models.

Conclusion
Here are the main points that emerge from our paper:  Satisfactory residential land price indexes and overall residential property price indexes can be constructed using local neighbourhood dummy variables as explanatory variables in residential property regression models. It is not necessary to use spatial coordinates to model location effects on property prices.  However, the use of spatial coordinates to model location effects does lead to better fitting regression models.  The most important housing characteristics information that is needed in order to construct satisfactory residential land and overall property price indexes is information on lot size, floor space area of the property structure (if there is a structure on the property), the age of the structure and some information on the location of the property. In order to obtain a satisfactory land price index, our method requires the use of exogenous information on residential construction costs.  However, additional information on the characteristics of the property will improve the fit of our hedonic regressions but the effects of the additional information on the resulting land and structure price indexes was minimal for our application to Tokyo residential property price indexes.  Having land only sales of residential properties should help improve the accuracy of the land price index that is generated by a property regression model. However, for our Japanese data, we found that the value of the land component of a land only property earned a 10-15% premium over the land value of a neighbouring property of the same size but with a structure on the property. We attribute this premium to the costs of demolishing an older structure.  Our models that used spatial coordinates to account for locational effects on the value of land used Colwell's nonparametric method for fitting a surface. This nonparametric method is much easier to implement than the penalized least squares approach used by Hill and Scholz (2018) to model locational effects on property prices. In section 4 of the paper, we pointed out some of the theoretical advantages of Colwell's method.  The potential bias in using property price indexes that are based on taking mean or median averages of property prices in a period can be very large. Typically, these methods will have a downward bias due to their neglect of structure depreciation.  A traditional log price time dummy hedonic regression model that has structure age as an explanatory variable will typically reduce the bias that is inherent in an index based on taking averages of property prices. For our Tokyo data, we found that the traditional hedonic regression model led to an index which had a small downward bias; see Chart 4 in the previous section.
It should be noted that if a national statistical agency were to apply the regression models that were explained in this paper, they would not just run a regression using the entire sample data. A rolling window approach would be used: a window length of say 12 to 16 quarters would be chosen and as the data for each new quarter was processed, the movements in the index over the last two quarters in the sample would be used to update the last published index value; see Shimizu, Nishimura and Watanabe (2010) for an application of this rolling window approach. 57 Our emphasis in this paper (and in other papers 58 ) has been to develop reliable methods for the construction of the land component of residential property price indexes. This task is important for national statistical agencies because the Balance Sheet Accounts in the System of National Accounts requires estimates for the price and volume of land used in production and consumption. In particular, this information is required in order to obtain more accurate estimates of national (and sectoral) Total Factor Productivity growth 59 but for the vast majority of countries, this information is simply not available. We hope that the methods explained in the present paper will be of use to national statistical agencies in developing improved estimates for the price and volume of land used in production and consumption.  19230 1.19785 1.19483 1.20930 1.19765 1.20595  28 1.20716 1.20434 1.19851 1.22033 1.20889 1.20877  29 1.23350 1.23573 1.22889 1.24735 1.23748 1.24309  30 1.27970 1.28027 1.28135 1.29594 1.28089 1.28484  31 1.22695 1.23436 1.22913 1.23777 1.23266 1.22324  32 1.26592 1.26387 1.26272 1.28812 1.27056 1.26938  33 1.20469 1.21042 1.21491 1.22545 1.21578 1.18474  34 1.15314 1.15323 1.15710 1.17238 1.15621 1.17714  35 1.15982 1.15548 1.15607 1.16891 1.16001 1.17180  36 1.05014 1.05138 1.05357 1.05976 1.04599 1.