### Abstract

- Top of page
- Abstract
- 1. Introduction
- 2. Measurement Model for the Latent Variable Human Capital
- 3. The Use of Administrative Archives as Data Source in Italy
- 4. Labour Market Administrative Archives in Lombardy Region
- 5. Application to HC Static Model: the Data
- 6. HC in a Longitudinal Perspective
- 7. Conclusions
- References

This paper focuses on estimating the latent variable human capital (HC) at a disaggregated level (the worker) from routinely available institutional data flows. In particular, we use the Lombardy region administrative archive ‘Employment Centers of the Province of Milan’, which collects information about the careers of workers in the private sector of the Milan area, together with administrative flows of workers' mandatory individual income tax returns, filed with the National Internal Revenue Service. First, we propose and empirically estimate HC scores in a static framework (referring to 2004), by means of a realistic measurement model that specifies causal relationships among endogenous and exogenous (investment) HC indicators. The model also specifies a set of concomitant indicators that do not belong to the HC investment indicators but have a causal impact on the endogenous variables and on the HC scores. Second, we propose a longitudinal analysis (period 2000–2004) that investigates how the growth rates of workers' earned income vary with educational level and other personal characteristics. The empirical results of both analyses confirm the known characteristics of the Italian job market, which is marked by strong inequalities, and existing knowledge of the school-to-work transition, characterized by a weak effect of education on the longitudinal trajectories of earned income.

### 1. Introduction

Over the last 50 years, the concept of human capital (HC) or productivity of the human factor has been systematically developed in a broad swath of the economic literature.

Much of the original work that led to the ‘human capital revolution’ of the late 1950s and early 1960s followed an earlier revolution in economic thought spawned by the neoclassical growth model (Solow, 1956). This theory, attempting to provide a systematic and quantitative assessment of the sources of economic growth in market economies, includes HC as an important factor. New growth theories were typically divided into those theories that extend capital to include accumulation of HC as a normal input (Lucas, 1988) and those that see HC stock just as a facilitator of technological development (Romer, 1986).

While the neoclassical model focused on the contribution of conventional measures of labour and physical capital as the basic reproducible inputs, early implementations of the model (Solow, 1956; Denison, 1962) recognized the existence of a large unexplained residual, which was generally ascribed to the unaccounted role of technology.

The quality of the labour input, as measured by education, skill and entrepreneurship, was one obvious missing link in this accounting experiment, and this may have set the stage for turning attention to investment in human beings. Also, while the neoclassical model accounted for the ‘functional’ distribution of income, it did not address the sources of wage disparity and the personal distribution of income.

To support empirical evidence in cross-country growth regressions, several proxies have been proposed (Temple, 2000; Krueger and Lindahl, 2001; Lee and Barro, 2001; Pritchett, 2001) in the literature to measure HC. These measures were generally based on formal education such as enrolment ratios, literacy rates or average years of schooling. However, the principal defect common to all of them is that they do not really estimate the amount of HC, providing only an aggregate measure of the education stock, excluding things such as costs of health, work experience, ability etc.

To overcome these shortcomings, other approaches have gained popularity in the economic literature for the quantitative estimation of HC: the retrospective (Kendrick, 1976; Eisner, 1985) and the prospective (Jorgenson and Fraumeni, 1989) methods. The former deals with the cost of production, in the tradition of Engel (1883), whereas the latter, following the pioneering contribution of William Farr (1853), deals with the present actuarial value of an individual's expected income related to his skill and education (Dagum and Slottje, 2000).

Notwithstanding the early awareness of the importance of HC, it was not before the second half of the twentieth century that economists such as Theodore W. Schultz, Gary S. Becker and Jacob Mincer (see Becker, 1964) developed a thorough theory of HC.

These authors undertook a thorough study of the concept of HC and analysed the main forces that contribute to its formation and accumulation, focusing particularly on the role of knowledge and ability in accounting for productivity growth and on investment in HC as a determinant of personal earnings. Since individuals are paid wages in accordance with their productive capabilities, education, by increasing future labour productivity and future income, can be seen as an investment in HC, which is then embodied in the human being. This form of productivity typically depends on amount of education, educational achievement, work experience and on-the-job training, job title, health history, parents' level of education and socio-economic status.

To empirically assess the role of investment in HC as a key determinant of personal earnings, Mincer (1958) introduced earnings functions, known also as age–earnings profiles, HC production functions or Mincerian wage equations. The standard HC wage function is of the form specified in equation (1):

- ln *y*_{i} = β_{0} + β_{1}*s*_{i} + β_{2}*x*_{i} + β_{3}*x*_{i}^{2} + *u*_{i}   (1)

where ln *y*_{i} is the natural logarithm of earned income for the *i*th economic unit, *s*_{i} is the level of education (usually years of schooling), *x*_{i} is labour market experience and *u*_{i} is the stochastic error term. In this equation the parameter β_{1} plays an important role, as it measures the return on an additional year of education.
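As a rough illustration of how such a wage equation is estimated, the sketch below fits it by ordinary least squares on simulated data. It assumes the usual Mincerian specification with a quadratic experience term; the coefficient values, sample size and variable ranges are illustrative assumptions, not estimates from this paper.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares through the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate_and_fit(n=5000, seed=0):
    rng = random.Random(seed)
    # Illustrative "true" coefficients (assumed, not taken from the paper).
    true_beta = [9.0, 0.08, 0.04, -0.0007]
    rows, log_income = [], []
    for _ in range(n):
        s = rng.randint(8, 18)        # years of schooling
        x = rng.uniform(0, 40)        # years of labour market experience
        u = rng.gauss(0, 0.3)         # stochastic error term
        row = [1.0, s, x, x * x]
        rows.append(row)
        log_income.append(sum(b * v for b, v in zip(true_beta, row)) + u)
    return ols(rows, log_income)


beta_hat = simulate_and_fit()
print(f"estimated coefficient on years of schooling: {beta_hat[1]:.3f}")
```

Because schooling is generated independently of the error term here, the estimate of β_{1} is unbiased by construction; the problems discussed next arise precisely when that independence fails.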

Many authors discuss several problems of this basic Mincerian wage equation (see Le *et al.*, 2003; Wößmann, 2003; Oxley *et al.*, 2008; Folloni and Vittadini, 2010, for a complete review). Among others, Griliches (1977) questions whether the wage equation really has this particular form, whether other explanatory variables should be included as well, whether the relation is stable across subpopulations and in time and how this equation should be estimated. In turn these questions gave rise to numerous other publications. In particular, authors concentrate on two sub-problems: omitted individual ability and the self-selection of education.

Since individual ability, though not perfectly described by amount or level of education, probably has an important impact on productive potential and therefore earnings, these authors question whether the above-stated wage equation (1) suffers from omitted ability bias (Le *et al.*, 2003). Ability or proxies of this variable – some measure of intelligence or grades obtained during education – are often not included in available datasets. Several empirical investigations indicate that such a problem does indeed exist: the return on schooling appears to be underestimated because of this omitted variable bias (Blackburn and Neumark, 1993, 1995), whereas other investigations do not detect a bias attributable to leaving ability out of the specification (Ashenfelter and Krueger, 1994).

Second, in HC theory, education is considered to be an investment problem that every individual must face. The returns on this investment include the joy of understanding and a large store of knowledge and, perhaps most importantly, the wage that a worker can earn in the labour market. The problem is that individuals choose their own level of education, and that this choice depends, among other things, on the wage they expect to earn in the labour market. Individuals who believe they have relatively high returns on education will opt for additional schooling and will usually also earn a high wage. Consequently, we can question whether the schooling variable in the Mincerian wage equation is truly exogenous. Willis and Rosen (1979) discuss this problem of the self-selection of education, which has been the object of many empirical contributions to the literature; examples are the studies of Garen (1984) and Harmon and Walker (1995). Two alternative estimation techniques have been proposed to overcome the endogeneity of schooling. First, the instrumental variables estimation technique can be applied (Bound *et al.*, 1995). Second, a simultaneous equation model can be built which models both wages and, say, years of schooling. Heckman (1979), discussing the problem of endogeneity of labour market participation, provides the pioneering two-step estimation technique for this type of model. Both methods are applied by Harmon and Walker (1995), who conclude that the endogeneity of schooling is a serious problem which gives rise to an underestimation of the returns on education by over 50%.
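The instrumental variables idea can be sketched with a single instrument on simulated data. Everything in the example is an assumption made for illustration: the instrument, the unobserved factor and all numerical values are hypothetical, and the sign of the unobserved factor's effect on schooling is chosen so that ordinary least squares understates the return, mirroring the direction reported above rather than establishing it.

```python
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, seed=1, true_return=0.08):
    """Schooling is endogenous: it shares an unobserved factor with the wage."""
    rng = random.Random(seed)
    z, s, log_y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # shifts schooling, excluded from the wage equation
        unobserved = rng.gauss(0, 1)   # raises wages, lowers schooling (assumed sign)
        schooling = 12 + instrument - unobserved + rng.gauss(0, 1)
        wage = 9 + true_return * schooling + 0.10 * unobserved + rng.gauss(0, 0.3)
        z.append(instrument)
        s.append(schooling)
        log_y.append(wage)
    return z, s, log_y


z, s, log_y = simulate()

# OLS slope: cov(s, ln y) / var(s); inconsistent when s is correlated with the error.
ols_slope = covariance(s, log_y) / covariance(s, s)

# IV slope with one instrument: cov(z, ln y) / cov(z, s).
iv_slope = covariance(z, log_y) / covariance(z, s)

print(f"OLS estimate of the return: {ols_slope:.3f}")
print(f"IV estimate of the return:  {iv_slope:.3f}")
```

In this design the IV estimate stays close to the simulated return of 0.08 while the OLS estimate falls below it; with a different sign for the unobserved factor the OLS bias would go the other way.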

### 2. Measurement Model for the Latent Variable Human Capital

The problems stated above, which plague Mincer's wage functions, largely stem from equating an individual's HC stock with the number of years of schooling. The main drawback of the Mincer wage function approach is that it does not really estimate HC but makes it coincide with years of schooling (Le *et al.*, 2006). Yet it is hard to accept that all students in one class have the same amount of HC.

In this section we propose a different approach, which also offers a way to address omitted ability bias and the endogeneity of education.

Within a more pragmatic approach, HC can be reasonably considered a broader multi-dimensional non-observable construct, depending on several and interrelated causes, and indirectly measured by many observed indicators; briefly, it is supposed to be a latent variable (LV).

Sharing the approach of considering HC as an LV (whose dimensionality has to be investigated), we have to specify a measurement model consistent with a realistic economic process of HC accumulation, and to use a proper LV technique for the estimation of its scores. To this end, the schemes hypothesized in recent research (Dagum and Slottje, 2000; Dagum *et al.*, 2007; Vittadini and Lovaglio, 2007; Lovaglio, 2008a) suggest that a set of indicators (including, for example, earned income), called reflective indicators, is directly affected by the unidimensional LV HC, whose unobservable scores depend on specific indicators, called formative indicators (including, for example, years of schooling), which measure the amount (money, time) of investment in HC.

Nevertheless, the question then is whether such specification, aiming to conceptualize the relationships among these blocks of observed variables, really reflects the full picture.

In fact, a more consistent system would take into account other exogenous socio-demographic factors (like sex, ethnicity, marital status, area of residence, occupation, wealth of origin household, parents' socio-economic status and so on) that do not belong to the HC investment indicators set but may have a causal impact on reflective indicators (endogenous) of HC.

Statistically speaking, these factors (observed exogenous variables directly linked with the reflective indicators of an LV, without being embedded in its formative block) are called concomitant indicators.

Moreover, a subset of concomitant indicators may also have a causal impact on the LV scores (e.g. indicators that reflect opportunity factors of HC formation, such as traditions, cultural elements, natural environmental factors and some social, political and institutional ones). To take all these factors into account, a consistent measurement model for HC is proposed in Figure 1.

The *r*-dimensional LV HC, whose components are collected as columns of matrix **Ξ**, admits two separated blocks of formative (or exogenous) indicators: specific indicators of investment in HC (**Z**) and concomitant indicators (**W**_{1}), that causally impact on the HC scores and on their reflective indicators (**Y**).

Methodologically, the LV approach potentially resolves the problems of omitted individual ability bias because, utilizing the entire covariates' profile of individuals, the interactions of individual information (HC investment indicators, viewed as instruments) may capture the ability dimension, surely more than considering only years of schooling; second, since HC will be estimated with error, the proposed method (a sort of ‘covariate measurement error model’) takes into account other sources of misspecification or uncertainty, not captured by instruments. Finally, the specification of causal relations between concomitant indicators **W**_{1} on HC investment indicators **Ξ** allows us to empirically explore the significance of the self-selection process of education.

Consistently with the path diagram of Figure 1, equations (2)–(3) specify the structural model and the measurement model for the *r*-dimensional LV HC, respectively:

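- **Y** = **ΞΛ** + **W**_{1}**C**_{1} + **U**,  vec(**U**) ∼ (**0**, **Θ** ⊗ **I**_{n})   (2)

- **Ξ** = **Z****G**_{1} + **W**_{1}**G**_{2} + **Ψ** = **W**_{2}**G** + **Ψ**   (3)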

where **Y** is an (*n*, *q*) matrix of reflective indicators, **W**_{1}(*n*, *p*) and **Z**(*n*, *m*) – horizontally merged in the (*n*, *m*+*p*) matrix **W**_{2}= (**Z**, **W**_{1}) – are matrices of formative observed variables, **C**_{1} a (*p*, *q*) matrix of regression parameters of **Y** onto **W**_{1}, **Λ** an (*r*, *q*) matrix of regression coefficients between **Y** and the latent scores, collected in the (*n*, *r*) matrix **Ξ**= (**ξ**_{1}, … ,**ξ**_{r}), **G**_{1}(*m*, *r*), **G**_{2}(*p*, *r*) – vertically merged in the (*m*+*p*, *r*) matrix **G**= (**G**_{1}, **G**_{2}) – are weight matrices for **Z** and **W**_{1} columns to define latent scores **Ξ**, **I**_{n} is an identity matrix of dimension *n*, vec the operator stacking columns and ⊗ the Kronecker product.

Finally, vec(**U**) = (**u**_{1}, … , **u**_{i}, … , **u**_{q})′ is the random vector of errors in equations, with zero means, zero covariances (**Θ** is a *q*-dimensional diagonal matrix) and variances θ_{i}. **Ψ** is the matrix of random errors in variables for **Ξ**, whose terms are typically correlated, with unit variances, and uncorrelated with the terms of vec(**U**). Equation (3) assumes that the dimensionality (*r*) of the LV **Ξ**, and therefore that of the linear combination that defines it (**W**_{2}**G**), is less than the sum of the ranks of **Z** and **W**_{1} (*m*+*p*).

It is interesting to observe that this general model includes many known models as ‘special cases’.

If there is no **W**_{1}, model (2)–(3) is the standard LISREL model (Jöreskog, 1973), except that the exogenous variables are observed rather than latent.

Constraining *r*= 1 and eliminating the **W**_{1} matrix in equations (2)–(3), the model is equivalent to the multiple indicators and multiple causes (MIMIC) model proposed by Jöreskog and Goldberger (1975), whereas in the presence of **W**_{1} and for *r* > 1 we have the MIMIC model with covariates (Moustaki, 2003).

Estimation of the model given by equations (2) and (3) can be carried out by two methods. The first is maximum likelihood estimation applied to the model obtained by substituting equation (3) into equation (2) for **Ξ**. The second method derives structural parameter estimates by the generalized method of moments, minimizing the distance between the theoretical means (or thresholds if **Y** collects categorical indicators), variances and covariances of the observed variables, which are nonlinear functions of the model parameters, and their sample estimates. The generalized method of moments procedure is made optimal by using the variance–covariance matrix of the estimators as the weight matrix. All these estimators are consistent and asymptotically normal (Muthen, 1989; Browne and Arminger, 1995).

However, the first method, based on the maximum likelihood approach, requires multivariate normality of the LVs, whereas the second approach does not guarantee necessary and sufficient conditions for model identification (see, among others, Vittadini, 1989). Thus, following the spirit of the first method, HC scores are estimated within the complex causal relationships specified in the structural model, in a non-parametric framework.

#### 2.1 *Estimation of Latent Scores*

Figure 1 and model (2)–(3) make evident that latent scores **Ξ** must be estimated as combinations of HC investment variables **Z** (having causal impact on reflective variables **Y**), but isolating two kinds of causal effects: the direct effect of **W**_{1} on **Y** variables and the spurious effect of **W**_{1} on **Y**, mediated by **Ξ**. Substituting equation (3) into equation (2) and aggregating terms, the reduced form is

- **Y** = **W**_{1}**C**_{1} + **W**_{2}**C**_{2} + **V**,  vec(**V**) ∼ (**0**, **Ω** ⊗ **I**_{n})   (4)

where **C**_{2}=**G****Λ** and **V**= (**ΨΛ**+**U**).

For identification purposes **W**_{2}**G** is assumed to be orthonormal, and the errors' structures **Θ** and **Ψ**, empirically indistinguishable in equations (2)–(3), are absorbed in the full (not diagonal) matrix **Ω** (Lovaglio, 2008a).

Following the path analysis rules, the reduced form suggests considering **W**_{2}= (**Z, W**_{1}) as the formative block of **Ξ**, once the spurious effect of **W**_{1} on **Y** has been removed into **W**_{2}.

Equation (4) exhibits an *r*-dimensional LV admitting a reflective (**Y**) and two formative blocks: one full rank (**W**_{1}) and one of deficient rank (**W**_{2}). Thus, parameters and latent scores can be estimated by extending the classical reduced rank regression (RRR) (Davis and Tso, 1982; van der Leeden, 1990), a model aimed at explaining a high proportion of responses' variability by linear components of predictor variables with structured errors, in the presence of concomitant indicators (full rank block).

In situations where concomitant indicators **W**_{1} do not impact on the latent score but only on reflective indicators, **W**_{1} and are perfectly orthogonal (Lovaglio, 2008a). Nevertheless, in the present case this is not possible (because **W**_{1} enters as a sub-matrix in the **W**_{2} matrix). However, since the matrix of residuals of a generalized multivariate regression of **W**_{2} onto **W**_{1} marginalizes the contribution of **W**_{1} within the **W**_{2} block, it appears as a consistent formative block for **Ξ**. In this perspective, HC scores are estimated as those combinations of **Z** and **W**_{1} indicators explaining **Y**, net of the spurious contributions of **W**_{1} on **Y** and net of the direct effect of **W**_{1} on **Y**.

As regards the reduced rank block, various methods (canonical correlation, redundancy analysis, MIMIC model, fixed factor analysis) may be applied to estimate parameters, depending essentially on the assumed error structure **Ω**. In our experience, the most reasonable solution, based on a less restrictive error structure, is alternating maximum likelihood, an iterative procedure with Gauss–Seidel optimization, proposed in the context of regression with ordinal data (Gifi, 1981; Breiman and Friedman, 1985). The algorithm iterates the following two steps: with **G** known, **Λ** and **Ω** are easily estimated by multivariate regression, whereas with **Ω** known or estimated (**Ω***), **G** and **Λ** are estimated by RRR with known error structure; in particular, **G** is estimated by maximizing the largest *r* eigenvalues of **Y**(**Ω***)^{−1}**Y** (with the Gram–Schmidt orthogonalization of ).

Since the parsimony criterion suggests considering **Ξ** of unitary rank, the statistical interpretation of HC scores agrees with the unit variance first linear combination (with weights **g**) of the deficient rank matrix that best fits **Y** in the RRR framework, controlling for other causal effects, specified in the structural model.
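The unitary-rank case can be illustrated with a deliberately simplified sketch. It ignores the concomitant block **W**_{1} and assumes an identity error structure, so it is plain alternating least squares for rank-one reduced rank regression, not the alternating maximum likelihood procedure described above. The function names, the simulated weights and the noise levels are all illustrative assumptions.

```python
import random
import statistics


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, target):
    """Regression coefficients of target on the columns of rows (no intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def centre(rows):
    """Subtract column means so that regressions need no intercept."""
    means = [statistics.fmean(col) for col in zip(*rows)]
    return [[v - mu for v, mu in zip(row, means)] for row in rows]


def rank_one_rrr(z_rows, y_rows, iterations=100):
    """Alternate between loadings (given weights) and weights (given loadings)."""
    z_rows, y_rows = centre(z_rows), centre(y_rows)
    q = len(y_rows[0])
    g = [1.0] * len(z_rows[0])
    for _ in range(iterations):
        xi = [sum(gk * zk for gk, zk in zip(g, row)) for row in z_rows]
        xi_ss = sum(v * v for v in xi)
        lam = [sum(v * row[j] for v, row in zip(xi, y_rows)) / xi_ss for j in range(q)]
        lam_ss = sum(l * l for l in lam)
        target = [sum(l * yj for l, yj in zip(lam, row)) / lam_ss for row in y_rows]
        g = least_squares(z_rows, target)
    xi = [sum(gk * zk for gk, zk in zip(g, row)) for row in z_rows]
    scale = statistics.stdev(xi)  # rescale so the latent scores have unit variance
    return [gk / scale for gk in g], [l * scale for l in lam], [v / scale for v in xi]


def simulate(n=2000, seed=2):
    rng = random.Random(seed)
    g_true, lam_true = [0.6, 0.3, 0.1], [1.0, 0.8]   # assumed values
    z_rows, y_rows = [], []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in g_true]
        xi = sum(g * v for g, v in zip(g_true, z)) + rng.gauss(0, 0.2)
        z_rows.append(z)
        y_rows.append([l * xi + rng.gauss(0, 0.5) for l in lam_true])
    return z_rows, y_rows


z_rows, y_rows = simulate()
g_hat, lam_hat, scores = rank_one_rrr(z_rows, y_rows)
print("weight ratio g2/g1:", round(g_hat[1] / g_hat[0], 2))
```

The estimated weights are identified only up to scale, which is why the scores are normalized to unit variance and the check above looks at a ratio of weights rather than their levels.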

### 3. The Use of Administrative Archives as Data Source in Italy

Another criticism of utilizing the wage function approach is typically imputed to the lack of information contained in the available databases. Willis (1986, pp. 542–543) affirms that Mincer's equation represents a pragmatic method of incorporating some of the major implications of the optimal HC models into a simple econometric framework, which can be applied to the limited information available in census or cross-sectional survey data.

Survey data are periodic and occasional, and hence, by their very nature, discontinuous. To this end, it is necessary to have access to data as detailed and complete as the data found in surveys, with the added factors of more frequent updating capacity and much faster access capability, especially in research fields characterized by rapid structural changes and innovative phenomena such as the job market.

Given that it is not possible to replicate data collection systems as complex and costly as a census every 10 years, administrative registers can be used as an alternative. In fact, while censuses (sample surveys) are complete (representative) only at the predetermined instants at which they are taken, administrative archives describe, by definition, all of the relevant units.

In administrative archives, three main statistical units are considered: (i) juridical units (JU), individual or collective subjects capable of exercising one or more productive activities in compliance with the juridical requirements set out by national law, and, if decreed, registered in mandatory administrative registers; (ii) active units (AU), consisting of JU that are not only capable of, but also actively perform, one or more productive activities on a regular, organized basis; (iii) local units, being the variously denominated places of business (buildings, offices, sales points etc.) where the AU have designated the production equipment (storage facilities, machines, tools and equipment) necessary to perform their productive activity.

First, continuity, meaning the registration of an event at the exact moment in which it occurs since the JU are required to report every modification in real time, is a characteristic unique to administrative registers.

Second, the cost of collecting administrative data, recorded in computerized archives, is notably lower than that of one-shot surveys; moreover, because reporting is mandatory, the information is captured with a low margin of uncertainty. For example, the balance sheet data that Italian companies report in statistical surveys are not aligned with their administrative and financial declarations (Martini, 2000).

The limitations of statistical information obtained by updating identifying micro data from a census performed every 10 years have stimulated various experimental efforts by national statistical institutions, aimed at integrating census data with those of administrative registers. In this perspective, the orientation of national statistical institutions and international regulations was aimed at distributing the utility of administrative data as widely as possible, while at the same time limiting the use of survey data which cannot be described indirectly by administrative registers (Statistics Canada, 1988; Eurostat, 1997, 1999).

Beginning with an in-depth study of the experiences of countries such as Denmark, France, Switzerland and Canada, utilizing administrative registers as the basis for reorganizing their statistical systems for industries (Statistics Canada, 1988; Eurostat, 1997), during the 1990s, the Italian institutions responsible for statistical systems were working on the creation of the Register Based Business Statistical System, inspired by the criteria of being all inclusive, manifesting continuity, cost consciousness, and the responsibility of the respondents.

Given the unusual situation presented by Italian administrative registers, managed independently as distinct entities, Italy was forced to develop original statistical integration methodologies (Eurostat, 1999). In 1998, the Italian Institute of Statistics (ISTAT) was already integrating identifying micro data from the Statistical Archive of Active Firms with the partial data from analyses originating from the main statistical surveys conducted on industries, creating an informative statistical system which provided a significant improvement in data quality, enriching and broadening analysis opportunities. Above all, the primary advantage of these experiments, conducted on five large archives and in 31 Italian Provinces, has been the progressive implementation of complex statistical procedures, such as normalization, linkage, optimum modality choice, estimation of missing data and control of data quality (Martini, 2000).

These experiments have demonstrated that the reconstruction of identifying characters of the statistical units through multiple administrative archives (given that none of these, taken individually, proved satisfactory regarding the number and completeness of the characters under consideration) is feasible and much less expensive than using traditional survey results.

### 4. Labour Market Administrative Archives in Lombardy Region

Following the first effort aimed at integrating large provincial administrative archives that collect information about the job market from the workers' perspective, since 2000 the University Research Centre CRISP (based at the University of Milan-Bicocca) has been working intensively on an integration–normalization process of the different provincial employment databases of the Lombardy region, in order to obtain an accurate picture of the job market at local level.

The available archive Employment Centers of the Province of Milan refers to compulsory transmissions of subordinate contract employee declarations, mandatory for industries and recently also for the public administration. It collects information about the longitudinal sequence of vocational experiences for workers in the Province of Milan, who have registered a variation in their employment position after year 2000.

This archive considers subordinated work typologies (temporary agency work, fixed-term contract, with permanent contract, direct attendance and other subordinate contract), but it does not collect information on independent workers and on workers stable in their jobs before 2000 (e.g. all workers with permanent contracts in the public/private sector starting work before year 2000).

Available data focus on information about workers in each vocational experience, such as duration, type of contract, industry, type of occupation and other workers' socio-demographic characteristics, such as age, gender, marital status, region/nation of birth, number of children, schooling level, type of last certification awarded, and participation in regional training courses (duration and type).

In order to perform an HC analysis, this administrative archive is matched with administrative flows of income tax return declarations, which provide information on annual gross earnings declared to official institutions. These flows collect workers' individual income tax returns, filed with the National Internal Revenue Service. The resulting database covers the period 2000–2004, referring to workers' individual income tax returns filed from 2001 to 2005.
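A match of this kind can be sketched as a join on a worker identifier. The field names (`worker_id`, `tax_year`, `gross_income`, `contract`, `days`) and the toy records are hypothetical; the actual archive layout and linkage keys are not described in the text.

```python
def link_archives(career_spells, tax_returns, year):
    """Keep workers who have career spells on file and a non-missing income for `year`."""
    income_by_worker = {
        record["worker_id"]: record["gross_income"]
        for record in tax_returns
        if record["tax_year"] == year and record["gross_income"] is not None
    }
    linked = {}
    for spell in career_spells:
        worker = spell["worker_id"]
        if worker in income_by_worker:
            entry = linked.setdefault(
                worker,
                {"worker_id": worker, "gross_income": income_by_worker[worker], "spells": []},
            )
            entry["spells"].append(spell)
    return list(linked.values())


# Toy records, invented purely to exercise the function.
career_spells = [
    {"worker_id": "A", "contract": "permanent", "days": 365},
    {"worker_id": "A", "contract": "fixed-term", "days": 120},
    {"worker_id": "B", "contract": "temporary agency", "days": 90},
    {"worker_id": "C", "contract": "permanent", "days": 200},
]
tax_returns = [
    {"worker_id": "A", "tax_year": 2004, "gross_income": 28000.0},
    {"worker_id": "B", "tax_year": 2004, "gross_income": None},
    {"worker_id": "D", "tax_year": 2004, "gross_income": 19000.0},
]

linked = link_archives(career_spells, tax_returns, year=2004)
print([(w["worker_id"], len(w["spells"])) for w in linked])
```

Only workers present in both sources with a non-missing income survive the join, which is the same kind of restriction that defines the study population later in the paper.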

Both available sources of data have important implications for HC analysis.

The first dataset provides fine-grained longitudinal information (e.g. employment position, career, occupational indicators, training courses, educational attainment and demographic characteristics) on workers' paths over time. By contrast, much of the data typically available on the labour market experience of cohorts of workers covers only their initial labour market experiences (Harmon *et al.*, 2003; Heckman *et al.*, 2003), ignoring job heterogeneity in terms of factors other than earnings.

The second dataset collects gross earnings as objective measurements, rather than purely self-reports typical of survey data. Since they are before tax, available gross earned incomes allow a better exploration of level, variability and concentration of annual earnings and their implications with workers' characteristics.

### 5. Application to HC Static Model: the Data

The first empirical aim of this paper is to perform a static analysis for HC estimation. We have considered 2004 as the year of reference (income declarations presented in 2005), as it is the most recent available period.

As regards the choice of the population for the HC estimation, although the entire Employment Centers of the Province of Milan database considers more than 2,700,000 vocational experiences (occurring in the period 2000–2005) referring to more than 1,200,000 workers, the utilized population refers to 95,896 workers resident in the City of Milan, because, at the moment, workers' individual income declarations are available only for workers resident in this area. The population is thus composed of workers, resident in the City of Milan, with subordinate contract in the private sector, and vocational experiences recorded in the database of employment offices of the Province of Milan having non-missing gross income in 2004.

In order to perform the static HC analysis, following the model proposed in Section 2, we must choose useful indicators (reflective, formative and concomitant). To this end, the only HC outcome (reflective indicator) is 2004 gross earned income (workers' individual income tax returns filed in 2005; from here on, *earned income*), composed of compensation of employees (including also transfers and economic assistance).

As HC investment indicators we have considered years of schooling (imputed from the last certification awarded), days of full-time work in the entire period of observation (2000–2004) and days of training in the period 2000–2003 (before 2004). Note that the available training periods do not refer to on-the-job training, but to Lombardy regional courses (financed by the European Social Fund, which typically targets workers with low education) attended outside the workplace. Finally, as concomitant indicators impacting only earned income, we have considered gender, age, marital status, number of children, type of contract, industry and type of occupation (for the last three, the imputed values correspond to those having the longest duration in 2004). In the block of concomitant indicators that also causally impact on HC we have inserted gender, nationality and age, whereas parents' socio-economic status and wealth of origin household are not available in the database.

Another limitation of the available information is the lack of data on workers' careers prior to 2000 and, in particular, on years of work experience. Instead of using the well-known imputation rule (age – years of schooling – 6), we prefer to insert workers' age in the model.

Looking at the composition of the population in 2004, 52% of workers were male. The age structure (mean age 35) was homogeneous by gender. Regarding contract types (also homogeneous by gender), 60% of workers had a permanent contract, 23% a fixed-term contract, 5% were in temporary agency work (the youngest group, with a mean age of 31 years), and the remaining workers had other types of contracts. Considering the schooling level, 41% of workers (34% males, 46% females) had completed secondary school (8 years of schooling), 16% were technical high school graduates, 13% scientific or humanities high school graduates (16% males, 9% females), 7% vocational high school graduates (9% and 6%, respectively) and 15% college graduates (same percentage for both sexes). Over 15,000 workers attended regional courses: the most attended training typologies were post compulsory school training (27%), lifelong training (23%) and training courses to complete compulsory schooling (21%).

#### 5.1 *The Results: Determinants of Earned Income*

Before applying the static model that estimates HC as an LV, we briefly discuss the impact of the indicators on the (natural logarithm of) 2004 gross earned income, estimated by means of a generalized linear model. The final model (*R*^{2} = 0.49) exhibits the following significant (all at the 0.0001 level) determinants of earned income variability, in order of significance: gender, age, years of schooling, type of contract, duration of stable working (days of full-time working in 2000–2004), kind of occupation and total number of training days in the period 2000–2003.

The estimated education rate of return is 8.1% on the entire population; the highest (lowest) marginal rate of return is 9.1% (5.2%), referring to males (females) at the post high school level relative to the compulsory secondary school level. The estimated partial coefficient (net of the other specified worker characteristics) shows that each additional year of schooling increases annual income by a constant €700, whereas the contribution of training is significant but moderate: the income difference between two workers who differ by one month of a training course is approximately €300.

Adjusted *post hoc* Tukey–Kramer contrasts show that males earn €3500 more than females (*p* < 0.001), married workers have the highest incomes (singles the smallest), while workers with prevalent permanent contract in 2004 have mean earned income greater than workers with prevalent fixed-term contract (nearly €4000, *p* < 0.001) and with prevalent temporary work contract (nearly €6000, *p* < 0.001).

#### 5.2 *The Results: the Estimation of HC*

With the methodology proposed in Section 2 we obtain the estimate of standardized HC. The presence of qualitative variables, expressed as dummy variables, does not pose any obstacle. The model's fit indicates that estimated HC and concomitant indicators explain 56% of the variability of earned income. Table 1 displays the indicators of HC investment, the standardized regression coefficients (Std coeff.), their significance (Sig.) and, recalling that HC is estimated as an exact linear combination of its formative indicators, the percentage impact of each indicator in the formation of HC scores (% Weights).

Table 1. Coefficients and Significance for HC Indicators.

| HC indicators | Std coeff. | Sig. | % Weights |
|---|---|---|---|
| Years of schooling | 0.756 | <0.0001 | 57.1% |
| Days of full-time work | 0.564 | <0.0001 | 31.8% |
| Days of training | 0.333 | <0.0001 | 11.1% |

HC formation is largely attributed to formal schooling, showing the largest weight (57%), followed by the stability component, measured by the period of full-time work (32%), whereas the weight of training duration is significant but marginal (11%).
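As a numerical aside, the % Weights in Table 1 are reproduced (to one decimal) by squaring the standardized coefficients and normalizing them to sum to 100. The sketch below shows that arithmetic; whether this is the computation actually used for the table is an assumption, and the function name `percent_weights` is hypothetical.

```python
# Standardized coefficients as reported in Table 1.
std_coeff = {
    "Years of schooling": 0.756,
    "Days of full-time work": 0.564,
    "Days of training": 0.333,
}


def percent_weights(coeffs):
    """Normalized squared coefficients, in percent.

    Assumption: this is one reading of '% Weights' that matches Table 1;
    the source does not state the formula explicitly.
    """
    squared = {name: value ** 2 for name, value in coeffs.items()}
    total = sum(squared.values())
    return {name: 100.0 * sq / total for name, sq in squared.items()}


weights = percent_weights(std_coeff)
for name, weight in weights.items():
    print(f"{name}: {weight:.1f}%")
```

Under this reading the squared coefficients sum to almost exactly one, which is consistent with HC being an exact linear combination of its three formative indicators.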

Table 2 shows significant indicators contributing to explain earned income variability. Other than HC, we found significant concomitant indicators, collected in the **W**_{1} block, causally affecting income variance; the impact of gender, age and nationality is estimated net of spurious effects mediated by HC. More than two thirds of explained income variance depends on factors that typically characterize the Italian labour market, such as gender, age and worker occupation, whereas less than one third is attributable to HC. The HC standardized coefficient shows an increase of earned income of €4000 for each additional HC standard deviation.

Table 2. Significance of HC and of Concomitant Indicators on Labour Income.

| Covariate | *F*-value | Sig. |
|---|---|---|
| Human capital | 4641.54 | <0.0001 |
| Gender | 1232.88 | <0.0001 |
| Age | 1141.55 | <0.0001 |
| Occupation | 932.27 | <0.0001 |
| Type of contract | 420.06 | <0.0001 |
| Number of children | 119.22 | <0.0001 |
| Industry | 73.93 | <0.0001 |
| Marital status | 69.84 | <0.0001 |
| Nationality | 3.29 | <0.0001 |

In particular, starting from average earnings by workers' age, we construct the series representing the expected flow of earned income at each age (HC life cycle value), based on the assumption that the expected mean income at age *x*+*t* of a person of age *x* equals the mean earned income of individuals who are at present *x*+*t* years old.

To discount future earnings we have assumed a discount rate of 5% (approximately equal to the interest on Treasury bonds), a productivity rate taking a maximum value of 3% at age 24 and declining at a constant pace until age 64, when it becomes null, and survival probabilities to older ages taken from the ISTAT life tables for males in the 2001 Census.

We have assumed that working life varies from 20 years up to 67 years, although other accurate methods, separating the worklife expectancy by workers' characteristics, may be applied (Millimet *et al.*, 2010).

Having obtained the HC monetary mean μ (by averaging the series of expected earnings' flows over ages), the standardized distribution of HC, estimated with the LV approach, is exponentiated and then translated so as to have mean μ.
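The life cycle computation described above can be sketched as a discounted sum of expected future earnings. This is a minimal illustration under stated assumptions, not the procedure actually implemented: the function names, the flat income profile, the unit survival probabilities, the use of the peak productivity rate below age 24, and the unweighted average over ages are all hypothetical choices made only to keep the example self-contained.

```python
def productivity_rate(age, peak_age=24, zero_age=64, peak_rate=0.03):
    """Productivity growth: 3% at age 24, declining linearly to 0 at age 64.

    Assumption: the peak rate is also applied below age 24.
    """
    if age <= peak_age:
        return peak_rate
    if age >= zero_age:
        return 0.0
    return peak_rate * (zero_age - age) / (zero_age - peak_age)


def life_cycle_value(age, mean_income, survival, discount=0.05,
                     last_age=67, rate=productivity_rate):
    """Present value of expected earnings from `age` up to `last_age`.

    mean_income[a]: current mean earned income of workers aged a.
    survival(x, a): probability that a person aged x survives to age a.
    """
    value = 0.0
    growth = 1.0  # cumulative productivity factor since `age`
    for t, future_age in enumerate(range(age, last_age + 1)):
        if t > 0:
            growth *= 1.0 + rate(future_age)
        value += (mean_income[future_age] * survival(age, future_age)
                  * growth / (1.0 + discount) ** t)
    return value


# Hypothetical inputs, for illustration only (not the paper's data).
flat_income = {a: 15_000.0 for a in range(20, 68)}
always_survive = lambda x, a: 1.0

values = [life_cycle_value(a, flat_income, always_survive)
          for a in range(20, 68)]
mu = sum(values) / len(values)  # unweighted mean over ages (an assumption)
```

In an actual application `mean_income` would hold the observed age-specific averages and `survival` would be derived from the life tables.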

Statistics on the monetary HC and earned income distributions show that the HC average (€129,089, median €79,757) is about eight times the average income (€16,190, median €13,907), and the HC distribution inequality (Gini ratio = 0.501) is higher than the income distribution inequality (Gini ratio = 0.389), confirming results in recent studies (Dagum and Slottje, 2000; Dagum *et al.*, 2007; Vittadini and Lovaglio, 2007).
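For readers who wish to reproduce inequality figures of this kind on their own data, the Gini ratio can be computed from sorted values with the standard rank formula. The sketch below is a generic illustration (the function name `gini` is hypothetical) and makes no claim about the estimator used for the figures quoted above.

```python
def gini(values):
    """Gini ratio of non-negative values via the sorted-rank formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

A perfectly equal distribution gives 0, while concentrating everything on one of *n* units gives (*n* − 1)/*n*.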

In order to show relationships of monetary HC with other relevant variables, Table 3 exhibits correlations (all significant at the 0.001 level) among estimated monetary HC, its formative indicators and earned income.

Table 3. Correlation Matrix between HC, its Indicators and Earned Income.

| | Monetary HC | Earned income | Years of schooling | Days of full-time work |
|---|---|---|---|---|
| Earned income | 0.450 | 1 | | |
| Years of schooling | 0.602 | 0.278 | 1 | |
| Days of full-time work | 0.453 | 0.264 | 0.092 | 1 |
| Days of training | 0.210 | 0.146 | 0.069 | −0.157 |

### 6. HC in a Longitudinal Perspective

Another source of inconsistency plaguing the Mincerian wage function approach is the lack of longitudinal information in available data: wages analyses are typically performed only in a cross-sectional framework (census and survey data). Instead, the availability of longitudinal information of individuals' earnings associated with information about personal characteristics could enable the estimation of statistical earnings' functions and education rate of return in a consistent way.

To this end, since the database of the Province of Milan collects information longitudinally, covering the period from 2000 to 2005, we investigate the temporal dynamics of earned income trajectories and the extent to which they depend on workers' characteristics. For a longitudinal analysis of workers' earned income trends, a set of statistical methodologies variously known as ‘hierarchical’, ‘multilevel’ or ‘growth’ modelling provides a proper treatment of growth or change data in nested designs (temporal occasions are ‘nested’ within individuals), which are common in many fields of empirical research.

A multilevel growth model (MGM) (Bryk and Raudenbush, 1992; Singer and Willett, 2003) was specified in the analysis. It is composed of an equation that models, for each specific worker, the evolution of income over time as a linear trajectory, depending on a random income level at a fixed instant of time (from here on, *intercept*) and on a random annual income growth rate (from here on, *slope*).

Both random parameters may be modelled as a function of workers' characteristics.

These models, embodying random effects for intercepts and time parameters in order to describe specific and separate trajectories over time among individuals, place the levels and growth rates of a considered outcome on an equal footing.

As a preliminary step for the MGM, a random intercept model called the unconditional means model (UMM) (Singer and Willett, 2003), which requires no covariates and excludes the linear time trend from the specification, is typically used to assess whether systematic variability exists in the outcome and how much of this variability lies between subjects (averaged over time) and within subjects (depending on time or on other individual factors that change over time). The ratio of the between variance to the total variance is called the intraclass correlation coefficient (ICC).

Second, inserting time (as a random continuous covariate) in the UMM yields the unconditional growth model (UGM), which shows how much of the within-subject variability is due to a time effect.
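The three specifications just described can be written compactly as follows. The notation is an assumption introduced here for illustration, in the style of standard growth-modelling texts, and does not appear in the original: *y* is earned income of worker *i* at occasion *t* (with *t* = 0 at the reference year), **x** and **w** are the covariate vectors for intercept and slope, ζ are worker-level random effects and ε is the time-specific disturbance.

```latex
\begin{align*}
% MGM, level 1: linear trajectory of worker i
y_{it} &= \pi_{0i} + \pi_{1i}\, t + \varepsilon_{it} \\
% MGM, level 2: random intercept and slope as functions of worker covariates
\pi_{0i} &= \gamma_{00} + \boldsymbol{\gamma}_{01}^{\top}\mathbf{x}_i + \zeta_{0i},
\qquad
\pi_{1i} = \gamma_{10} + \boldsymbol{\gamma}_{11}^{\top}\mathbf{w}_i + \zeta_{1i} \\
% UMM: no covariates, no time trend
y_{it} &= \gamma_{00} + \zeta_{0i} + \varepsilon_{it},
\qquad
\mathrm{ICC} = \frac{\sigma^{2}_{\zeta_0}}{\sigma^{2}_{\zeta_0} + \sigma^{2}_{\varepsilon}} \\
% UGM: time added as a random covariate, still no worker covariates
y_{it} &= (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i})\, t + \varepsilon_{it}
\end{align*}
```

Comparing the within-worker variance of the UMM with that of the UGM gives the share of within-subject variability attributable to the linear time trend.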

#### 6.1 *Data in the Longitudinal Analysis*

Table 4 shows the number of workers and statistics for earned income in each of five waves (workers having at least one vocational experience in the analysed period). The longitudinal dataset, collecting workers having at least two repeated observations of earned incomes over time, involves 70,533 workers. Although averages of gross incomes increase over time (whereas inequality decreases), the means trajectory appears relatively flat over time. The longitudinal analysis will investigate the degree of variability existing among individual income trajectories over time (some trajectories may increase, with different rates, while others may decrease), exploring possible individual factors that affect it.

Table 4. Number of Records and Statistics for Gross Earned Income, by Year.

| Year | Number of workers | Mean (€) | Median (€) | 90th percentile (€) | Standard deviation | Coeff. of variation | Gini ratio |
|---|---|---|---|---|---|---|---|
| 2000 | 84,745 | 14,178.7 | 11,721.0 | 26,444.1 | 96,465.4 | 6.80 | 0.493 |
| 2001 | 86,051 | 15,550.1 | 13,234.4 | 28,046.2 | 97,834.2 | 6.29 | 0.457 |
| 2002 | 93,837 | 16,594.9 | 14,216.0 | 29,852.0 | 55,532.6 | 3.35 | 0.463 |
| 2003 | 92,398 | 17,371.0 | 14,999.0 | 30,370.0 | 56,547.9 | 3.26 | 0.388 |
| 2004 | 95,896 | 18,209.9 | 15,707.5 | 32,135.0 | 46,146.5 | 2.53 | 0.389 |

First, covariates for the intercept (level of 2004 earned income) are the same as those specified in the static analysis referred to year 2004; the only exception deals with the education dimension, measured in this analysis with the variable *schooling level*, a categorical variable whose modalities refer to the type of last certification awarded (Table 9 shows its modalities and associated years of schooling).

Table 9. Income Trajectories (Intercept and Slope) for Education Group.

| Schooling level | Years of schooling | Education group | Intercept | *t*-value (intercept) | Slope | *t*-value (slope) |
|---|---|---|---|---|---|---|
| Post university degree (Master, PhD) | 18–21 | D | 29,167 | 5.6 | 3397 | 2.6 |
| University degree (after reform) | 18–19 | D | 23,067 | 18.4 | 2828 | 8.9 |
| University degree (before reform) | 17–18 | D | 33,031 | 122.6 | 2857 | 42.3 |
| Short university degree (after reform) | 16 | D | 23,569 | 39.9 | 2471 | 16.7 |
| Post high school | 14 | C | 21,551 | 4.9 | 2575 | 2.3 |
| High school (science and humanities) | 13 | C | 20,259 | 73.5 | 1703 | 24.7 |
| High school (technical) | 13 | C | 19,905 | 77.3 | 1482 | 22.9 |
| High school (vocational) | 13 | C | 18,558 | 60.6 | 1379 | 17.9 |
| Post secondary school (general) | 10 | B | 13,352 | 24.6 | 1185 | 8.7 |
| Post secondary school (specific) | 10 | B | 14,376 | 32.8 | 1028 | 9.4 |
| Secondary school (compulsory) | 8 | A | 10,010 | 43.5 | 1181 | 20.5 |

In this regard, the 1999 reform of the Italian university system (Ministerial Decree 509/99) introduced some important innovations in the organization of academic degrees. Before the reform, universities awarded a degree with a legal duration of essentially 4 years (*University degree before reform*), apart from some exceptions with durations no longer than 6 years. After the reform the system established two cycles: the first (*Short university degree*) has a 3-year duration (leading to a UK Bachelor of Science equivalent degree); the second (*University degree after reform*) has a 2-year duration (leading to a UK Master of Science equivalent degree).

The choice of schooling level rather than years of schooling reflects the criticism of the Mincer earnings function approach advanced by adherents of the so-called ‘screening theory’ (Layard and Psacharopoulos, 1974). They argue that, even granting that years of schooling increase salaries, time spent in education without attaining a degree has either no effect or a negative influence, contrary to Mincer's theory, which supposes that every additional year of schooling increases the HC stock.

As regards the slope's covariates (modelling individual annual growth rates), they may change in time, especially when they refer to characteristics of vocational experiences. However, since the Employment Centers of the Province of Milan database is of incremental type and more recent data present better quality than data far off in time, we are not able to consider in the entire allowable period all available worker covariates of the static analysis; in particular, worker occupation and industry, presenting many missing data (75% and 68%, respectively, in the entire period 2000–2003), were dropped. Thus, type of contract is the only worker covariate changing in time; however, for ease of interpretation, instead of considering it as a time-variant indicator, we have preferred to substitute it by a new variable called ‘contractual evolution’ that directly elicits the longitudinal evolution of workers' careers in terms of contractual typologies in the period 2000–2004.

Table 5 shows the frequency distribution of the contractual evolution, whose modalities have been obtained by means of a longitudinal clustering algorithm that summarizes and classifies longitudinal sequences of workers' vocational experiences (Lovaglio, 2008b). A *Random career* is associated with indeterminate contractual fluctuations over time.

Table 5. Frequency Distribution for ‘Contractual Evolution’ (Period 2000–2004).

| Contractual evolution (2000–2004) | Code | Workers | % |
|---|---|---|---|
| Stable in permanent contract | Perm | 37,380 | 53.0 |
| Convergence to permanent contract | >Perm | 8765 | 12.4 |
| Random career | Random | 8250 | 11.7 |
| Fixed-term contract | TimeFix | 6377 | 9.0 |
| Losing permanent contract | <Perm | 4658 | 6.6 |
| Other evolutions | Other | 4000 | 5.7 |
| Stable in temporary agency work | Temp AW | 1103 | 1.6 |

#### 6.2 *Results of the Longitudinal Analysis*

As an explorative step we have evaluated the individual variability in intercepts and slopes by running ordinary least squares regressions for each worker, specifying only time as continuous covariate.

Descriptive statistics on the distribution of estimated slopes in the period 2000–2004 (1st quartile =−€6, median =€1208, 3rd quartile =€2889, mean =€1442, coefficient of variation 246.4) indicate large variability.

The distribution of *R*^{2} for the individual linear time trajectories (mean = 0.56, median = 0.64, modal value = 0.75) shows that only a small proportion of cases exhibit quite low *R*^{2} (recalling that even a zero *R*^{2} is compatible with the linear specification). This suggests that the shape of individual changes may be consistently modelled by a linear MGM.
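The exploratory step (one ordinary least squares line per worker, with time as the only covariate) can be sketched as below. The function name `worker_trend` and the centring of time at 2004, so that the intercept is the fitted 2004 income level, are assumptions made for illustration; a constant income series is given an *R*^{2} of zero by convention.

```python
def worker_trend(years, incomes, reference_year=2004):
    """OLS line for one worker: returns (intercept, slope, r_squared).

    Time is centred at `reference_year`, so the intercept is the fitted
    income level in that year and the slope is the annual change.
    """
    t = [year - reference_year for year in years]
    n = len(t)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = sum(t) / n
    y_mean = sum(incomes) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("observations must refer to different years")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, incomes))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_tot = sum((yi - y_mean) ** 2 for yi in incomes)
    ss_res = sum((yi - (intercept + slope * ti)) ** 2
                 for ti, yi in zip(t, incomes))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return intercept, slope, r_squared


# Hypothetical worker observed in all five waves.
intercept, slope, r2 = worker_trend(
    [2000, 2001, 2002, 2003, 2004],
    [10_000, 11_000, 12_000, 13_000, 14_000],
)
```

Applying the function to every worker with at least two observations gives the empirical distributions of slopes and *R*^{2} summarized above.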

The estimated ICC of the UMM shows that 78% of the total income variability is attributable to differences between workers (i.e. individual characteristics), whereas the remaining 22% depends on within-subject income differences. Furthermore, the UGM reveals that only 38% of the within-subject variability is explained by a linear time trend, the remaining part depending on individual characteristics that change over time. Hence only a small share (8%, i.e. 38% of 22%) of total income variability is attributable to individual income trajectories over time.
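The 8% figure follows from the two reported shares by simple multiplication, as the short check below illustrates (the function name is hypothetical; the inputs are the values quoted in the text).

```python
def trajectory_share(between_share, time_share_of_within):
    """Share of total variability due to linear time trends.

    between_share: ICC of the UMM (between-worker share of total variance).
    time_share_of_within: part of the within-worker variance explained
    by the linear time trend in the UGM.
    """
    within_share = 1.0 - between_share
    return within_share * time_share_of_within


share = trajectory_share(between_share=0.78, time_share_of_within=0.38)
print(f"{share:.1%} of total income variability")
```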

In the UGM, variances among intercepts (*z*-test = 180.9) and slopes (*z*-test = 112.8) are both highly significant, indicating a strong individual variability around their mean values (€18,210 for 2004 earned income and €1313 for annual growth rate, respectively), whereas the covariance between intercepts and growth rates is significant, evidencing an inverse relationship.

Before exploring the significance of workers' covariates, the structure of the temporal errors has to be correctly specified. Correctly modelling the error covariance structure, taking into account heterogeneity of error variances or correlations depending on the temporal lag between occasions, is fundamental for valid inference and estimation of fixed effects (Cnaan *et al.*, 1997), for example when assessing the impact of schooling level on the income growth rate. Estimates of between-worker variances at each year and of correlations between pairs of time-specific errors show that income variances are approximately equal across years and that correlations decrease as the lag between occasions increases. Indices of relative goodness-of-fit (Akaike information criterion, Schwarz Bayesian criterion) comparing different error structures suggested structuring the covariance matrix of the time-specific disturbances as first-order autoregressive, AR(1). This structure specifies homogeneous variances over time and covariances that depend on the temporal lag and decrease toward zero as the lag increases: the estimated correlation parameter for unitary lag is 0.3256 (*z*-test = 54.1). Thus, the UGM with an AR(1) structure becomes the reference model for judging the predictive power of the selected model, called *Final model with AR*(*1*), which contains the significant worker explanatory variables able to explain the residual variability existing between intercepts and slopes.
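To make the AR(1) structure concrete, the sketch below builds the implied correlation matrix for the five annual occasions using the estimated unit-lag correlation of 0.3256. The function names are hypothetical, and the covariance version assumes a single common error variance, as the AR(1) structure requires.

```python
def ar1_correlation(n_times, rho):
    """AR(1) correlation matrix: corr(e_s, e_t) = rho ** |s - t|."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def ar1_covariance(n_times, rho, variance):
    """AR(1) covariance matrix with a common variance on the diagonal."""
    return [[variance * r for r in row]
            for row in ar1_correlation(n_times, rho)]


corr = ar1_correlation(5, 0.3256)  # five waves, 2000-2004
for row in corr:
    print("  ".join(f"{value:.4f}" for value in row))
```

With this parameter the correlation between errors two years apart is already close to 0.11 (0.3256 squared), so dependence fades quickly with the lag.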

Table 6 shows the variance decomposition and goodness-of-fit statistics of both models.

Table 6. Sources of Variability and Fit Statistics (UGM and Final Model, both with an AR(1) Structure).

| Models | Between workers: intercept | Between workers: slope | Within workers | −2 Log likelihood |
|---|---|---|---|---|
| UGM with AR(1) | 215,660,000 | 2,302,293 | 45,136,935 | 7,205,241 |
| Final model with AR(1) | 121,280,000 | 1,013,009 | 41,379,143 | 4,437,265 |

Comparing values of columns 2 and 3 listed in Table 6, the specified covariates explain 44% and 56% of the variability existing between (2004) workers' income levels and growth rates, respectively.
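The two percentages are proportional reductions in the between-worker variance components when moving from the UGM to the final model. The check below recomputes them from the Table 6 entries (the function name is hypothetical).

```python
def explained_share(unconditional, conditional):
    """Proportional reduction in a variance component after adding covariates."""
    return (unconditional - conditional) / unconditional


# Between-worker variance components from Table 6.
intercept_share = explained_share(215_660_000, 121_280_000)
slope_share = explained_share(2_302_293, 1_013_009)

print(f"intercepts: {intercept_share:.0%}, slopes: {slope_share:.0%}")
```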

The distribution of estimated slopes in the selected model fits a normal distribution very well (skewness = 0.33; kurtosis = 0.65, qq-plot correlation test = 0.993), whereas the distribution of intercepts is highly right skewed.

Table 7 shows the significant covariates for intercepts and slopes (the last four rows) contained in the *Final model*. For 2004 earned income, gender, duration of full-time job, schooling level, age class and type of contract are highly significant, whereas for slopes schooling level and contractual evolution are highly significant.

Table 7. Significant Covariates in the ‘Final Model with AR(1)’.

| Covariates | *F*-value | Sig. |
|---|---|---|
| Gender | 1102.24 | <0.0001 |
| Days of full-time job | 876.16 | <0.0001 |
| Schooling level | 611.98 | <0.0001 |
| Age class | 506.25 | <0.0001 |
| Type of contract | 503.46 | <0.0001 |
| Time | 441.00 | <0.0001 |
| Marital status | 102.70 | <0.0001 |
| Number of children | 54.76 | <0.0001 |
| Days of training | 37.95 | <0.0001 |
| Schooling level * time | 288.77 | <0.0001 |
| Contractual evolution * time | 260.65 | <0.0001 |
| Gender * time | 129.32 | <0.0001 |
| Age class * time | 109.48 | <0.0001 |

The estimated fixed effects, shown in Table 8 only for slopes, are presented as contrasts between growth rates for a specific modality and a reference category (labelled with 0 in the third column): females, less educated workers and employees stable in temporary agency work have lower annual slopes, whereas male workers in the age class 21–30 with a university degree (before reform) and contractual evolution convergent towards a permanent contract have the fastest slopes in the observed period. Instead, the profile with the highest 2004 income level is constituted by male workers, with a university degree (before reform) in the age class 41–50 with a permanent contract.

Table 8. Fixed Effects of Covariates on Annual Slope.

| Covariate | Contrasts | Slope | Std. error | *t*-value | Sig. |
|---|---|---|---|---|---|
| Schooling level | Post university degree (Master, PhD) | 2172.6 | 1218.6 | 1.83 | 0.0746 |
| Schooling level | University degree (before reform) | 1617.4 | 90.3 | 17.92 | <0.0001 |
| Schooling level | Post high school | 1178.7 | 864 | 1.41 | 0.1725 |
| Schooling level | University degree (after reform) | 668.9 | 267.4 | 2.50 | 0.0124 |
| Schooling level | Short university degree (after reform) | 581.7 | 142.5 | 4.10 | <0.0001 |
| Schooling level | High school (science and humanities) | 363.3 | 91.6 | 4.00 | <0.0001 |
| Schooling level | High school (technical) | 48.8 | 89.6 | 0.51 | 0.5864 |
| Schooling level | High school (art and music) | −30.6 | 222.7 | −0.13 | 0.8906 |
| Schooling level | High school (vocational) | −173.7 | 95.9 | −1.82 | 0.0701 |
| Schooling level | Post secondary school (generalist) | −542.7 | 132.4 | −4.12 | <0.0001 |
| Schooling level | Secondary school | −558.7 | 86.6 | −6.51 | <0.0001 |
| Schooling level | Post secondary school (specific) | −667.5 | 114.1 | −5.91 | <0.0001 |
| Schooling level | Missing | 0 | | | |
| Gender | Female versus male | −263.1 | 14.70 | −8.46 | <0.0001 |
| Contractual evolution | Convergence to permanent contract | 323.7 | 11.63 | −78.25 | <0.0001 |
| Contractual evolution | Losing permanent contract | −124.4 | 13.12 | 24.67 | <0.0001 |
| Contractual evolution | Stable in permanent contract | −667.1 | 14.06 | −593.84 | <0.0001 |
| Contractual evolution | Other | −910.6 | 17.27 | −82.54 | <0.0001 |
| Contractual evolution | Fixed-term contract | −1423.6 | 15.85 | −42.0 | <0.0001 |
| Contractual evolution | Stable temporary agency work | −8350.1 | 14.70 | −8.46 | <0.0001 |
| Contractual evolution | Random career | 0 | | | |
| Age class | 21–30 | 469.8 | 131.49 | 3.57 | 0.0004 |
| Age class | <20 | 373.7 | 300.65 | 1.24 | 0.2139 |
| Age class | 31–40 | −38.8 | 130.74 | −0.30 | 0.7668 |
| Age class | 41–50 | −428.4 | 132.67 | −3.23 | 0.0012 |
| Age class | 51–60 | −478.2 | 137.91 | −3.47 | 0.0005 |
| Age class | >60 | 0 | | | |

#### 6.3 *Income Trajectories Over Time for Specified Workers' Profiles*

Often, in the presence of complex models, for ease of interpretation it is preferable to directly estimate time trends for specific profiles of individuals, by a simple reformulation of the MGM in an equivalent specification (omitting overall intercept and slope).

Table 9 shows the estimated longitudinal income trajectories, separately by schooling level and adjusted for the covariates inserted in the longitudinal model (see Table 7). They show higher income levels and growth for more educated workers, especially those with a (long) university degree. Table 9 also reports the variable ‘Education group’, a reduced version of the ‘Schooling level’ variable reflecting the steps of the Italian education system. It will be preferred in subsequent analyses, because its smaller number of modalities provides more robust results.

To this end, taking into account the most significant covariates for intercept (gender) and slopes (education group and contractual evolution) of Table 7, we have estimated longitudinal income trajectories for specific workers' profiles defined by these interactions.

Table 10 lists all allowable profiles (interaction of gender, education group and contractual evolution) regarding intercept, whereas only a selection of profiles characterized by fastest (upper part) and slowest (lower part) growth rates are presented.

Table 10. Income Trajectories (Intercept and Slope, in Euro) for Worker Profiles (*Non-significant at 5%).

| Intercept: education group | Intercept: gender | Intercept: type of contract | Intercept: estimate | Intercept: *t*-value | Slope: education group | Slope: gender | Slope: contractual evolution | Slope: estimate | Slope: *t*-value |
|---|---|---|---|---|---|---|---|---|---|
| D | Male | Perm. | 45,725 | 185.95 | D | Male | Perm | 3543 | 58.00 |
| D | Female | Perm. | 31,223 | 122.37 | D | Male | >Perm | 3373 | 18.92 |
| C | Male | Perm. | 26,817 | 141.96 | D | Male | Temp. AW | 2875 | 5.60 |
| D | Male | TimeFix | 26,201 | 41.12 | D | Female | >Perm | 2576 | 17.66 |
| C | Female | Perm. | 21,465 | 117.04 | D | Male | TimeFix | 2482 | 11.17 |
| D | Female | TimeFix | 21,405 | 43.13 | D | Female | Perm. | 2432 | 36.76 |
| D | Male | Temp. AW | 18,549 | 16.76 | D | Female | TimeFix | 2189 | 12.56 |
| D | Female | Temp. AW | 17,903 | 20.12 | D | Female | Temp. AW | 2129 | 4.60 |
| C | Male | TimeFix | 17,828 | 52.88 | C | Male | >Perm | 2072 | 20.88 |
| B | Male | Perm. | 17,109 | 29.03 | C | Male | Perm | 2066 | 41.88 |
| C | Female | TimeFix | 16,847 | 57.62 | D | Male | Random | 1822 | 9.54 |
| B | Female | Perm. | 16,245 | 29.13 | … | … | … | … | … |
| C | Female | Temp. AW | 14,986 | 30.28 | D | Male | <Perm | 833 | 2.89 |
| C | Male | Temp. AW | 14,641 | 27.92 | A | Male | <Perm | 826 | 8.30 |
| B | Female | Temp. AW | 14,093 | 10.47 | B | Male | Rand | 802 | 3.35 |
| B | Female | TimeFix | 13,264 | 16.76 | A | Male | Other | 801 | 7.02 |
| A | Male | Perm. | 12,965 | 77.42 | C | Male | Other | 797 | 5.46 |
| B | Male | Temp. AW | 12,705 | 9.31 | A | Male | Temp. AW | 764 | 2.66 |
| B | Male | TimeFix | 12,702 | 14.89 | C | Male | <Perm | 750 | 5.41 |
| A | Female | Perm. | 12,419 | 54.49 | C | Female | <Perm | 636 | 5.14 |
| A | Male | TimeFix | 11,244 | 44.19 | B | Female | Temp. AW | 851 | *1.03 |
| A | Male | Temp. AW | 10,600 | 19.99 | B | Male | <Perm | 459 | *1.37 |
| A | Female | TimeFix | 10,350 | 32.69 | B | Male | Temp. AW | 452 | *0.51 |
| A | Female | Temp. AW | 9717 | 13.31 | B | Male | TimeFix | 354 | *0.77 |

Greatest slopes are linked to highly educated males with stable permanent contracts or converging to it, followed by highly educated males stable in temporary agency work and by three homogeneous profiles defined by males in fixed-term contracts, females with permanent contract or converging to it and characterized by highest education group. Lowest or flat slopes (labelled with an asterisk in the last column of Table 10) are associated with workers of education group B without (or having lost) a permanent contract.

Table 10 offers several points of discussion, especially with a view to linking the evolution of contract types along the career path with monetary performance. In particular, we can empirically investigate a topic that is increasingly at the centre of political debate: the role of temporary positions along a worker's career. Does flexibility provide economic benefits? Does flexibility *assure* the same economic benefits (and growth) as permanent employment?

Empirical results show that temporary agency work does not penalize annual growth rates for the most educated workers (group D). Adjusted *post hoc* Tukey–Kramer contrasts between annual slopes are all significant among educational groups for males, whereas for females the only significant contrast among slopes is between education group D (€2129) and the other educational groups, whose mean values are statistically homogeneous.

On the contrary, temporary agency work heavily penalizes income levels even for highly educated workers, and especially for females: six of the eight profiles referring to workers stable in temporary agency work have mean incomes lower than the overall mean income. Furthermore, the adjusted mean income of the most educated workers in stable temporary agency work does not differ significantly from that of male high school graduates with a permanent contract (*t*-test = 1.24; *p* = 0.231). Instead, separating workers by gender, only the most educated females in temporary agency work present slopes that do not differ significantly from that of male high school graduates with a stable permanent contract.

Thus, gender discrimination plays an important role in investigations of longitudinal income trajectories. Section 6.4 focuses specifically on the gender gap.

Before summarizing the detailed income time trends of Table 10, we present longitudinal income trajectories by gender and education group only (adjusted for other worker characteristics), because of their significant impact on income levels and slopes, respectively. The first four columns of Table 11 refer to these time trends, once contract type (for intercept) and contractual evolution (for slope) have been considered as covariates in the MGM and not as stratification variables, as they appeared in Table 10.

Table 11. Income Trajectories by Gender and Education Group (Columns 3 and 4), and only by Education Group (Columns 5 and 6).

| Education group | Gender | Intercept (*t*-value) | Slope (*t*-value) | Education group intercept (*t*-value) | Education group slope (*t*-value) |
|---|---|---|---|---|---|
| A | Female | 10,183 (38.6) | 1361 (20.3) | 10,555 (43.2) | 1213 (20.9) |
| A | Male | 10,714 (45.0) | 1191 (19.7) | | |
| B | Female | 13,615 (29.5) | 1162 (9.9) | 13,883 (37.9) | 1114 (12.1) |
| B | Male | 13,947 (29.0) | 1045 (8.5) | | |
| C | Female | 17,719 (71.2) | 1371 (21.7) | 19,539 (84.7) | 1557 (26.8) |
| C | Male | 21,271 (84.7) | 1742 (27.3) | | |
| D | Female | 25,898 (86.3) | 2354 (30.9) | 31,870 (121.4) | 2844 (43.0) |
| D | Male | 37,847 (126.3) | 3332 (43.8) | | |

Adjusted *post hoc* Tukey–Kramer contrasts among intercepts evidence a non-significant contrast between males' and females' intercepts in education group A (*p*= 0.106) and B (*p*= 0.582), whereas the following contrasts among slopes are non-significant: education group A against B for both sexes (*p*= 0.865 for females, *p*= 0.201 for males), and males against females in education group B (*p*= 0.445). The last two columns of Table 11 show income trends only by education groups. They evidence higher economic advantages as regards levels and trends for more educated workers: *post hoc* contrasts between intercepts are all highly significant; the same holds for slopes, except the non-significant contrast between educational groups A and B (*p*= 0.261).

#### 6.4 *Focus on Growth Rate Gender Gap*

As previously mentioned the final step of the analysis focuses on gender discrimination.

Only slopes are considered, since they show a more attenuated and less evident effect than income levels. For a better investigation of gender discrimination in annual slopes, we have extended the MGM by also considering two- and three-way interactions among the slopes' covariates (age class, gender, contractual evolution and education group).

Gender discrimination has been assessed by analysing the differences in income slopes between males and females within (slicing by) different education groups, contractual evolutions and age classes (considering only the three most representative age classes: 20–30, 31–40 and 41–50), adjusted for other significant covariates such as days of training, days of full-time job, marital status and number of children.

First, we unexpectedly found that, slicing workers by education group and considering age class and contractual evolution in the model as main effects, no evidence of slope gender gap arises in any education group. This emerges because, for both sexes, income slopes significantly vary depending on the combined effect of education group, contractual evolution and age class.

With regard to the age effect on the gender gap, we have found that age class interacts with gender (on slopes) only in profiles involving essentially the most educated workers with stable permanent contracts. More specifically, among the profiles defined by education group and contractual evolution, age class plays a significant role on slopes only for workers in education group C with a stable permanent contract (or converging to one) and, especially, for workers in education group D with a stable permanent contract. This last profile shows the greatest slope variability among age classes for both sexes (*F*-test = 59.4 for males and *F*-test = 50.8 for females). By contrast, the income slopes of less educated workers and of workers with other contractual evolutions do not vary significantly over age classes.

Since age class interacts with the other stratification variables only in some specific profiles, we first present the results (items 1, 2 and 3) on adjusted slope differences, considering age class as a simple covariate in the MGM (the slopes' gender gap is averaged over age classes); second, age class is considered as a stratification variable in some specific worker profiles (item 4).

Empirical analyses on adjusted gender differences have provided the following findings.

1. Stratifying by contractual evolution, a significant interaction of gender by education group emerges, showing that, in almost all longitudinal contractual evolutions (except for workers in the category *other* and those stable in temporary agency work), the income slopes' gender gap varies among education groups, especially for workers with a stable permanent contract. In this profile the income slopes' variability among education groups is greater for males (*F*-test = 176.6) than for females (*F*-test = 45.9); the gender difference in slopes for workers with a permanent contract increases as the level of education increases: the *F* statistics testing the gender gap for workers in groups A and B are not significant, whereas they become significant for groups C (*F*-test = 51.0) and D (*F*-test = 110.6) (*gender gap over education groups by contractual evolution, averaged over age class*).

2. On the other hand, in every education group, income slopes vary significantly over sexes and contractual evolutions, except for workers in education group B, who show the same gender gap pattern across all categories of contractual evolution (*gender gap over contractual evolution by education groups, averaged over age class*).

3. Slicing by contractual evolution and education group, we find no significant differences in slopes between males and females except for five profiles: workers in education group A converging to a permanent contract, workers in group B with a fixed-term contract, workers with stable permanent contracts in education group C, workers converging to a permanent contract in education group D and, especially, workers of group D with a stable permanent contract. Apart from the first two profiles, the other three exhibit faster trajectories for males. The greatest gender gap is found in the last profile (male slope = €3934, female slope = €2726), evidencing higher discrimination in more educated profiles (*gender gap by education groups and contractual evolution, averaged over age class*).

4. Considering age class too as a stratification variable, slopes are significantly different by gender in a consistent way only for workers with stable permanent contracts in education group C (age classes 21–30 and 31–40, with *F*-test = 38.5 and *F*-test = 43.9, respectively) and in education group D (in all three age classes). Within these three profiles the gender gap is greatest in age class 30–40 (*F*-test = 180.7), then in age class 20–30 (*F*-test = 33.8), and less evident, but still significant, in age class 40–50 (*F*-test = 15.3).

   In particular, young (21–30) male workers of education group D with stable permanent contracts show bigger growth rates than female workers in the same profile. Other profiles showing a moderate but significant gender gap are workers in group D gaining a permanent contract in age class 21–30 (*F*-test = 4.2) or 31–40 (*F*-test = 10.1) and workers of age class 41–50 (*F*-test = 9.5) with a fixed-term contract (*gender gap by education group, contractual evolution and age class*).
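The size of the largest gap reported in item 3 (group D workers with a stable permanent contract) can be made concrete with a few lines of arithmetic. The sketch below uses only the two slopes quoted in the text; the helper name `slope_gap` is hypothetical and is not part of the authors' analysis.

```python
def slope_gap(male_slope, female_slope):
    """Return the absolute gap (euros per year) and the female-to-male slope ratio."""
    return male_slope - female_slope, female_slope / male_slope


# Slopes quoted in item 3 for group D workers with a stable permanent contract.
gap, ratio = slope_gap(3934, 2726)
print(f"gap = {gap} euros/year, female/male ratio = {ratio:.3f}")
```

Read this way, the annual income increment of females in this profile is roughly seven tenths of that of males, which is a descriptive ratio and not a test of significance.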

Thus, the surprising earlier result that no slope gender gap exists within any education group was masking gender differences that emerge in many profiles defined by education group, age and contractual evolution. In particular, gender discrimination is most evident among workers with a permanent contract in the highest education group (D) and in the more productive age classes (20–40).

### 7. Conclusions

The present paper has focused on a consistent technique for estimating the LV HC, specified in a realistic measurement model, by utilizing routinely collected administrative archives.

The methodology has estimated HC as the rank-one linear combination (in an RRR framework) of its formative indicators that best fits its reflective indicators (**Y**), net of the direct effects of concomitant indicators on **Y**, and net of the spurious effects of concomitant indicators on both HC and **Y**. Although this paper was not intended to resolve the problems affecting Mincerian equations, it has shown that the LV approach and the availability of longitudinal administrative archives enable some unresolved methodological problems to be overcome.

The use of administrative data from the Employment Centers of the Province of Milan, providing fine information about the longitudinal evolution of the career path, reduces the risk of misspecification of the relationship between education and earnings, due to heterogeneity of vocational experiences and to omission of jobs' evolution in terms of timing and characteristics (Harmon *et al.*, 2003).

Nevertheless, administrative data in this context present some limitations.

First, the choice of considering only the monetary effects of investments in HC, ignoring other interesting outcomes such as the well-being or the health status of workers (Lye and Hirschberg, 2010), is dictated by the lack of this information in the available administrative archives.

Second, administrative data do not collect information about workers' educational performance, such as measures of the quality of education, the characteristics of the schools or universities attended, or on-the-job training.

Finally, the main problem affecting the use of administrative archives is the self-selection and endogeneity of labour market participation. Missing data and omitted variable problems in large datasets reflect not only unmeasured job characteristics, but also many other factors which affect individuals' willingness to exploit the economic potential of their educational attainment. Thus, even the large datasets available to researchers in this area will have relatively small numbers of individuals in many of the low-participation groups, reducing the likelihood of statistically robust results.

Further research in this direction will consider the problem of job mobility endogeneity (Tchernis, 2010) and the role of labour demand (organization and innovation level) as a determinant of HC formation (Antonelli *et al.*, 2010).

With regard to the empirical results for the Milan area, they have confirmed and highlighted the known situation regarding the *transition school to work* in the Italian context. In the cross-section analysis, although HC is the most significant factor of earned income variability, it explains less than one third of explained labour income variance (56%), whereas an ample part is affected by dimensions linked to discrimination and career progression factors such as gender, age and type of occupation, rather than education/training dimensions.

With regard to the longitudinal analysis, 78% of total income variability is between workers and only 8% is attributable to variation over time within individual longitudinal trajectories. Schooling is the most significant covariate for the variability of income slopes; its role is nevertheless marginal, since all selected slope covariates together explain 56% of the variability among individual growth rates.

Finally, the analysis of the slope gender gap (more attenuated than the gap in income levels) has shown that discrimination between the sexes exists, particularly in the most educated groups (especially for university graduates) and in the most productive age classes (20–40). In this respect, adjusted gender differences suggest one constant pattern: within each age class, the income slopes for females of education group *j* are not significantly different from the income slopes for males of education group *j* − 1. The only exception concerns females with the highest education degree (at least 15 years of schooling) having stable permanent contracts.

Hence, not only is (slope) gender discrimination significant in each specific education group, but earning a college degree (even a short one) and gaining a permanent contract is the only way for females to preserve an economic advantage (in income progression over time) with regard to male high school graduates.

These results are in accordance with existing knowledge of the Italian economy, which is characterized by marked inequalities in the job market and by a weak incidence of education on the longitudinal evolution of earned income, an evolution that depends instead on automatic annual increments due to years of work experience and to contractual seniority.