Local attributes and migration balance – evidence for different age and skill groups from a machine learning approach

Many European regions are currently experiencing a significant population decline and, related to this, are increasingly confronted with labour shortage. Migration is a main driver of changes in regional labour supply and the local level of human capital. A region's ability to attract residents thus becomes more and more important for its growth prospects. We use a large panel dataset for the period 2003 to 2017 to investigate the relationship between local attributes and the migration balance of regions in Germany. In particular, we examine whether the factors that determine the migration balance of regions significantly differ across age and skill groups because their contribution to regional human capital likely varies. Our econometric specification can be understood as an aggregate formulation of a two-region random utility model. The dataset includes 30 factors that might potentially influence a region's migration balance. Given this large number of explanatory variables and significant multicollinearity issues, we apply

consider a broader variety of factors, including economic variables and location-specific amenities as well as social and cultural characteristics. Moreover, systematic evidence on a potentially varying role of factors for the migration behaviour of different demographic groups is still scarce, and findings tend to be ambiguous.
Germany is particularly suitable for an empirical study that aims at providing evidence on population imbalances, internal migration and the determinants of labour migration. The country shows striking disparities with respect to regional migration balances, labour market conditions, local infrastructure and other amenities. In addition to a rather persistent East-West gap for different socioeconomic indicators, we observe significant differences between rural areas and large urban regions. Furthermore, demographic change, that is, a declining and aging workforce, is already a challenge for the economic prospects of many regions, in particular for rural areas in East Germany.
The paper is organized as follows. The next section provides a brief survey of the relevant literature. Section 3 describes the data and the econometric approach. We describe the results of the regression analysis in Section 4 and provide a more detailed discussion of some important findings in Section 5. Section 6 concludes the paper.

| LITERATURE
Migration theory discusses a variety of factors that are thought to influence the migration behaviour of individuals and households. The majority of models treat the migration decision as resulting from the evaluation of local labour market conditions and location-specific amenities. The basic idea is that the observed migration behaviour is based on the maximization of the utility of the individuals/households. The utility level of all potential places of residence, which is influenced by various local characteristics, is compared, taking into account relocation costs (see, e.g., Faggian et al., 2015). Sjaastad (1962) notes that the migration balance of a region can, thus, be understood as a function of the sum of individual utility levels (and therefore local attributes).
Numerous studies provide evidence on the significance of various factors that are supposed to influence migration behaviour and, thus, the migration balance of regions. There is robust evidence that interregional migration flows respond to (changes in) regional wage differentials and unemployment disparities (e.g., Etzo, 2011) as well as employment growth (Buch et al., 2014). Amenities that reflect local living conditions may also influence the attractiveness of regions as places of residence. Several studies show that first nature amenities such as a pleasant climate, a nice landscape and recreation areas positively correlate with the net migration rate of regions (see, e.g., Buch et al., 2014; Porell, 1982). Furthermore, second nature characteristics of a region, which include public infrastructure, cultural facilities and touristic sites, likely matter for its migration balance (Alperovich et al., 1977; Buettner & Ebertz, 2009). A detailed survey of the vast literature is beyond the scope of this paper. Our literature review therefore focuses on empirical studies that examine internal migration and consider differences in migration behaviour across skill and age groups.
Often the focus of this specific literature is on the relationship between individual attributes and the probability of migrating (see Bernard et al., 2014, and Faggian et al., 2015, for a survey) rather than on the factors behind the migration balance of regions for distinct age and skill groups. Moreover, migration decisions of specific age or skill groups are frequently investigated, and much of this literature has focused on high-skilled individuals. In particular, there is an extensive body of literature on graduate migration which provides robust evidence on the importance of individual, study-related and regional factors for the migration decisions of young high-skilled workers after graduation (see, e.g., Faggian et al., 2006; Haapanen & Tervo, 2012; Venhorst et al., 2011). However, while these studies offer very detailed information on various factors that influence the migration of this specific group, they do not consider differences across levels of educational attainment. Yet there are good reasons to expect that the importance of factors that influence migration decisions differs across demographic groups and skill levels. The impact of regional characteristics on migration behaviour likely varies between groups of workers if migration motives and preferences differ systematically between individuals.
Examining the impact of migration determinants for various age groups provides information on the extent to which residence preferences change over the course of working life (see Clark & Onaka, 1983; Kramer & Pfaffenbach, 2016). Findings by Niedomysl (2011) indicate that the living environment and housing seem to become more important as migration motives in Sweden as the age of individuals increases. The study also points to a positive correlation between the level of education and the significance of housing as a determinant of migration behaviour. Millington (2000) investigates how the importance of factors that influence migration behaviour changes over the life cycle in the United Kingdom. The findings point to a declining responsiveness to regional disparities in labour market conditions as age increases, while the opposite is true of housing and amenities, in line with results by Niedomysl (2011). Millington (2000) emphasizes the importance of disaggregating by age and laments the lack of corresponding studies.
Other studies point to a significant heterogeneity of preferences and migration motives with respect to skill groups. Chen and Rosenthal (2008) note that highly educated households in the United States and younger age groups between 20 and 35 years seem to attach great importance to favourable economic conditions, while individuals aged 55 years or older tend to move to places that offer highly valued consumer amenities. Some authors argue that the utility attached to specific (dis)amenities likely differs across skill groups. Cullen and Levitt (1999) show, for instance, that migration decisions of highly educated households and those of families with children are particularly responsive to changes in crime. Urban shopping possibilities and cultural facilities are supposed to matter primarily for highly educated individuals (see Dalmazzo & de Blasio, 2011; Shapiro, 2006). Couture and Handbury (2017) show that urban amenities like restaurants and nightlife are increasingly valued by young well-educated workers in the United States, explaining at least partly their movement towards large urban centres. Amenities as well as labour market conditions seem to influence the migration decision of high-skilled workers in Germany according to results of Buch et al. (2017). Moreover, there is no evidence that their importance varies systematically across skill levels. This is in line with findings by Arntz et al. (2021). Their results indicate that preferences for urban amenities do not differ systematically by skill level. In contrast, a study by Buettner and Janeba (2016) suggests that subsidizing theatres might be effective in attracting highly educated people to a location.
Altogether, systematic evidence on a potentially varying role of factors for the migration behaviour of different demographic groups is still scarce, and existing findings are ambiguous.

| Migration data and local characteristics
We use the Integrated Employment Biographies (IEB) of the Institute for Employment Research (IAB) to generate our regional migration data. The IEB includes detailed individual-level information on the workforce (15 to 65 years old) in Germany, more precisely on all registered unemployed people and all employees subject to social security contributions. Self-employed individuals, family workers and civil servants are not covered by the data. However, the migration data should be representative with respect to labour mobility because the IEB covers about 90% of the workforce in Germany. 1 Age, gender, skill level and residence of the workers have been available in the dataset since 1999.
Our analysis makes use of annual migration data for the period 2000-2017. A migration event is defined as the change of residence, that is, the county region, between two reference dates (June 30 of the present and the previous year).
The individual migration events are aggregated at the county region level (360 regions). We use the net migration rate $nmr_{it}$ of region $i$ in year $t$, proposed by Mitze (2019), as a dependent variable to investigate migration behaviour: 2

$$nmr_{it} = \frac{inmig_{it} - outmig_{it}}{pop_{it-1}} \qquad (1)$$

where $(inmig_{it} - outmig_{it})$ denotes net in-migration, that is, the difference between gross in-migration and gross out-migration, and $pop_{it-1}$ is the regional workforce in $t-1$. Thus, the variable shows the relative change of the regional workforce caused by interregional migration. A value of 0.01 indicates that net in-migration gives rise to an increase of the regional workforce by 1% (10 net immigrants per 1,000 workers). We generate migration data for three qualification groups, low-skilled workers (no formal vocational qualification), medium-skilled workers (completed apprenticeship training) and high-skilled workers (university degree), and four age groups (< 25 years, 25-29 years, 30-39 years, and 40-65 years).

1 See Frodermann et al. (2021) for a detailed description of the IEB.
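As a minimal sketch of how the net migration rate can be computed for a single region and year, consider the following Python function. The function name and the numbers are hypothetical and serve only to reproduce the interpretation given in the text (a rate of 0.01 corresponds to 10 net immigrants per 1,000 workers).

```python
def net_migration_rate(inmig, outmig, pop_prev):
    """Net in-migration in year t relative to the regional workforce in t-1."""
    if pop_prev <= 0:
        raise ValueError("previous-year workforce must be positive")
    return (inmig - outmig) / pop_prev


# Hypothetical region: 1,500 in-migrants, 1,000 out-migrants,
# and a workforce of 50,000 in the previous year.
rate = net_migration_rate(1500, 1000, 50000)
per_thousand = rate * 1000  # net immigrants per 1,000 workers
print(rate, per_thousand)
```

A negative value would indicate a net migration loss of the corresponding size.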
The annual net migration rates range between a migration loss of 20 individuals per 1,000 workers and a migration gain of the same absolute size. Furthermore, we observe that the net migration loss of East Germany declined between 2004 and 2017. Substantial out-migration was characteristic of most East German regions in the 1990s after German reunification. As regards the group of rural areas, we detect a considerable migration loss between 2007 and 2011 (see Figure A1 in the appendix). However, they perform better before and after this period in terms of their net migration rate. In particular, rural regions close to large metropolitan areas tend to experience a favourable migration balance. We also observe important regional disparities for different age and skill groups. In particular, a strong out-migration of young workers from rural regions is noteworthy, while older workers often seem to move in the opposite direction.
Migration theory considers various factors that might influence a region's migration balance. To examine the impact of potential factors on labour mobility, we merge our migration data with regional information from several data sources. In addition to a set of indicators for regional labour market conditions, we include measures of local amenities as a second group of factors that might explain spatial disparities in migration balances. Information on regional unem-

| Econometric approach
To investigate the importance of various potential determinants of the regional net migration rate, we apply a regression model given by:

$$nmr_{it} = \sum_{k} \beta_k \ln\left(\frac{x^k_{it-1}}{x^k_{Gt-1}}\right) + \delta_i + \theta_t + \varepsilon_{it} \qquad (2)$$

We consider different local labour market conditions and other local characteristics ($x^k_{it-1}$; see, e.g., Buch et al., 2014; Mitze, 2019). All explanatory variables enter as logarithms of ratios, which measure the relative deviations of the local conditions from the respective German average, excluding the region under consideration ($x^k_{Gt-1}$). Our econometric specification can therefore be understood as an aggregate formulation of a two-region random utility model of labour migration between region $i$ and the rest of Germany (see Mitze, 2019). All regressors are predetermined to account for potential endogeneity of explanatory variables. The panel specification includes region-specific effects $\delta_i$ that capture the impact of unobserved time-constant determinants of interregional labour migration. $\theta_t$ denotes time effects and the white noise error term is given by $\varepsilon_{it}$. 4 It is important to keep in mind that a fixed effects estimation does not provide a perfect solution to the identification problems that arise from unobserved heterogeneity. The fixed effects model makes use of the within variation only. However, the cross-sectional variation often makes up the major part of the variation in regional datasets.
Effects of explanatory variables that show only a minor variation in the time dimension might therefore be weakly identified (see Hausman & Taylor, 1981).
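The construction of the regressors and the within transformation can be illustrated with a short Python sketch. It assumes, for simplicity, that the German average excluding the region under consideration is an unweighted mean over all other regions; the text does not state the weighting, so this is an assumption, as are the function names and the example values.

```python
import math


def log_relative_deviation(values):
    """For each region i, ln(x_i / mean of x over all other regions).

    Assumption: the reference value is an unweighted leave-one-out mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two regions")
    total = sum(values)
    return [math.log(x / ((total - x) / (n - 1))) for x in values]


def within_transform(series):
    """Subtract a region's own time mean, which removes the region fixed effect."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


# Hypothetical wage levels of three regions in t-1.
wages = [100.0, 100.0, 130.0]
devs = log_relative_deviation(wages)

# Hypothetical time series of one regressor for a single region.
demeaned = within_transform([0.10, 0.12, 0.14])
print(devs, demeaned)
```

A regressor that hardly changes over time has demeaned values close to zero, which is why its effect is only weakly identified in a fixed effects model.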
Our dataset includes 30 factors that, according to theoretical considerations outlined in Section 2, might potentially influence a region's migration balance. Moreover, we also consider spatial lags of the variables to allow for spillover effects from characteristics of neighbouring locations. 5 Given this large number of explanatory variables and significant multicollinearity issues, we apply machine learning techniques (LASSO, complete subset regression) to identify important local characteristics. Regional datasets often suffer from multicollinearity problems, as many variables exhibit a similar (cross-sectional) variation. Population density, housing prices and rents show, for instance, a strong positive correlation, and it is difficult to precisely identify the impact of a specific factor if explanatory variables co-vary. 6 Panel data tends to alleviate multicollinearity problems because co-variation is primarily related to cross-sectional variation. However, local characteristics often exhibit only small variation over time. Thus, regression analyses that consider many factors, which potentially affect the outcome variable, face a tradeoff between bias and precision. Including a comprehensive set of explanatory factors will help to avoid an omitted variable bias, but reduces the precision of the estimates as measured by the standard errors.
Model selection is challenging, in particular when there are many factors that are assumed to be important. Ahrens et al. (2020) note that iterative selection procedures, such as the general-to-specific approach, often result in pretesting biases and that hypothesis tests frequently lead to false positives. They argue in favour of machine learning techniques as a model selection approach because methods like the least absolute shrinkage and selection operator (LASSO) set some coefficients to exactly zero, thereby excluding these predictors from the model. Cochrane et al. (2022) use machine learning techniques to identify predictors of regional resilience because stepwise regression approaches often lead to a selection of over-fitted specifications, whereas machine learning techniques can be used to select a robust set of explanatory variables from a large set of potential predictors.
We apply two machine learning approaches to deal with the tradeoff between bias and precision. These techniques support the selection of important explanatory variables, which should show a robust effect in many different specifications, ensure a precise estimation and avoid biased estimates. Identification of key determinants becomes increasingly difficult as the number of regressors in the model grows (see Hastie et al., 2009). The least absolute shrinkage and selection operator (LASSO) extends the OLS estimator by a term that penalizes the inclusion of additional regressors (see Tibshirani, 1996). We use the method proposed by Belloni et al. (2016) for panel models to determine the penalty factor and apply LASSO to detect variables that stand out due to their relatively high explanatory power for the net migration rate.
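The selection property of LASSO can be illustrated with a small coordinate descent sketch in Python. This is the textbook estimator with a single, manually chosen penalty `lam`; it is not the panel procedure of Belloni et al. (2016) used in the paper, and the data, function names and penalty value are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; return exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # cross product of regressor j with the partial residual
            zj = 0.0   # sum of squares of regressor j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
    return beta


# Hypothetical toy data: y depends on the first regressor only (y = 2 * x1).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

In this toy example, the coefficient of the irrelevant second regressor is set to exactly zero, while the coefficient of the first regressor is shrunk below its OLS value of 2. The shrinkage is one reason why the selected variables are re-estimated in a conventional fixed effects model rather than interpreted directly.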
We use LASSO to identify sparse models which can be precisely estimated. However, these models may suffer from a comparatively high risk of an omitted variable bias. The focus of machine learning techniques is on prediction rather than on producing good estimates of parameters. As a consequence, these approaches often give rise to inconsistent coefficient estimates. A main weakness of machine learning is that it produces stable parameter estimates only under strong and mostly unverifiable assumptions (see Mullainathan & Spiess, 2017).
We address these risks by combining LASSO with a second approach, complete subset regression (CSR). Moreover, we do not rely on machine learning alone but use LASSO and CSR to select explanatory variables that we include in our fixed effects models. CSR enables us to identify important factors in addition to the variables selected via LASSO. We also use CSR to check the robustness of the results across various specifications (see Elliott et al., 2013; Sala-i-Martin, 1997). CSR estimates the model given by Equation (2) for all different specifications (combinations of explanatory variables) for a fixed number of regressors to be included. If we choose to consider 5 explanatory variables in every specification, this implies that we estimate 142,506 different specifications, given that the overall number of potential factors is 30. The advantage of CSR is that it allows us to mitigate multicollinearity while obtaining information on the robustness of a specific coefficient across numerous specifications.
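The number of specifications follows from the binomial coefficient, that is, the number of ways to choose 5 regressors out of 30. The Python sketch below verifies this count and shows how the subsets could be enumerated; the short list of variable names is hypothetical and only serves to keep the example small.

```python
from itertools import combinations
from math import comb

n_candidates = 30  # number of potential factors stated in the text
k = 5              # regressors included in every specification
n_specs = comb(n_candidates, k)
print(n_specs)

# Enumerating all specifications for a small hypothetical candidate set.
candidates = ["wage", "density", "crime", "flat_size"]
specs = list(combinations(candidates, 2))
for spec in specs:
    print(spec)  # each tuple is one set of regressors to be estimated
```

Every candidate variable appears in the same number of specifications, so the share of significant estimates is comparable across variables.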

| RESULTS
Table 1 summarizes the regression results for different age groups. As a reference we also include the estimates for all workers in column (1). All coefficients are semi-elasticities. For instance, the coefficient of the regional wage level in column (1) indicates that an increase in the regional wage level by 10% relative to the rest of the country gives rise to a change of the net migration rate by 0.05 percentage points or, in other words, an increase of net in-migration of 5 workers per 10,000 workers who reside in the region. Furthermore, it is important to keep in mind that coefficient estimates should be interpreted as conditional correlations rather than causal effects. Thus, they provide information on how we can describe regions that suffer from a net migration loss or benefit from an important net in-migration of workers. Some explanatory variables are likely endogenous although we reduce the risk of omitted variable bias via a large number of regional characteristics and region fixed effects included in the regression analysis. However, economic theory suggests that the relationship between migration and regional disparities is interdependent. This applies in particular to labour market/economic conditions. Regional differences in wages and unemployment might influence migration behaviour, but at the same time migration likely affects local labour market conditions (see, e.g., Granato et al., 2015).
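The interpretation of a semi-elasticity can be made explicit with a short Python sketch. It assumes that the regressor enters as the logarithm of the ratio to the rest of the country, so that a 10% relative increase changes the regressor by ln(1.10). The coefficient value used below is hypothetical and is not an estimate reported in Table 1.

```python
import math


def nmr_change(beta, pct_increase):
    """Change in the net migration rate when x relative to the rest of the
    country rises by pct_increase percent (log-ratio regressor)."""
    return beta * math.log(1.0 + pct_increase / 100.0)


def per_10000(delta_nmr):
    """Express a change in the net migration rate as workers per 10,000."""
    return delta_nmr * 10_000


# The example from the text: 0.05 percentage points equal 0.0005 in rate terms.
text_example = per_10000(0.0005)

# Hypothetical coefficient, chosen for illustration only.
beta_hypothetical = 0.005
effect = per_10000(nmr_change(beta_hypothetical, 10))
print(text_example, effect)
```

Under these assumptions, a coefficient of 0.005 would imply slightly fewer than 5 additional net in-migrants per 10,000 workers for a 10% relative wage increase.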
The results show that labour market conditions as well as amenities correlate with the net migration rate.
However, there are considerable differences across age groups when the impact of specific factors is concerned. The estimates for labour market indicators that are selected by our machine learning approach suggest that primarily young workers attach high importance to these attributes, confirming evidence for the United Kingdom provided by Millington (2000). We detect a significant positive correlation between the regional wage level and the migration balance only for the two youngest age groups. The older workforce seems to bring other local characteristics to the foreground when deciding on the place of residence. This is in line with previous findings for the United States by Chen and Rosenthal (2008) and Clark and Hunter (1992). Moreover, we observe for the youngest workers (< 25 years) that they seem to prefer regions that offer an extensive supply of apprenticeship training positions. The sectoral structure of the regions also matters. An increasing share of the primary sector and other low-knowledge industries tends to go along with a declining net migration rate. This applies in particular to the age group below 25 years. Being specialized in low knowledge-intensive services, in contrast, seems to increase the attractiveness of a region for specific age groups.
Furthermore, the estimates point to a robust negative correlation between changes in the net migration rate and changes in population density across age groups. For workers between 25 and 39 years, we also detect an important positive effect of density in neighbouring regions. 7 This implies that suburbanization trends distinguish the migration behaviour of these age groups. They tend to leave dense metropolitan areas and often move to the urban hinterland of large cities. This is in stark contrast to the migration behaviour of the youngest age group. Workers aged below 25 years tend to leave regions that border on large metropolitan areas. At the same time, there is also a negative impact of population density in the region itself on the net migration rate of these workers. However, the latter effect is less robust compared with other age groups. 8 There is also a negative correlation between the net migration rate of the age groups 25-39 years and the share of young inhabitants (< 25 years). The latter result might be driven by the migration of young graduates who leave the region of study/vocational training after completion and a first phase of their career. Workers in their thirties also seem to prefer regions showing a high share of foreign inhabitants and a high voter turnout. The latter also applies to the oldest age group. These findings may be interpreted as indicating the role of political and societal participation and the benefits of cultural diversity in the region of residence.
Evidence on crime being a disamenity that affects migration behaviour is rather weak. The variable is only selected for the age group 30-39 years, and the corresponding negative effect is not precisely estimated. However, there are also estimates that point to an impact of (urban) amenities. The importance of gastronomy, the creative economy, places of (touristic) interest and the availability of recreation area correlate positively with the net migration rate of different age groups, with a slight indication that amenities might be more important for younger workers, which is in contrast to findings by Niedomysl (2011) for Sweden. For instance, the public financial capacity of the region, which is used in our analysis to approximate the provision of public services and infrastructure, is only selected as an influential factor for the youngest age group.
As regards the housing market, only the flat size is selected as an influential factor for the period from 2004 to 2017. It is noteworthy that regions with an above-average supply of small flats seem to be rather attractive for almost all age groups, but especially for the youngest workers. This might reflect the availability of affordable housing, which is likely important, in particular for households that possess only a small budget. Restricting the analysis to the period after 2008 enables us to examine the role of housing prices and childcare facilities. Evidence on relevant effects is, however, fairly weak for these variables. We observe a negative correlation between changes in the net migration rate and changes in childcare infrastructure (significant for workers < 25 years) for all age groups apart from the workforce between 30 and 39 years, but most effects are not precisely estimated. We also detect a negative correlation between the house price index and a region's migration balance across all age groups, which is significant at the 5% level for the workers aged 30 to 39 years. 9

7 When attributes of neighbouring regions are concerned, only the spatial lag of the population density turns out to significantly influence the net migration rate of different age groups. Other spatially lagged explanatory variables are not selected by the machine learning techniques.
8 Population density does not correlate with the net migration rate of workers aged below 25 years in a regression model that only includes population density and its spatial lag.
9 See Table A4 in the appendix.
To evaluate the robustness of the findings in more detail, we apply CSR, which provides information on the percentage of significant estimates and on the variation of the coefficient estimates across various specifications of our migration model. In the following, we focus on the youngest workers aged below 25 years. 10 Figure 1 and
Source: Own calculation using the R-package rrsim by Thomas de Graaff, IEB and regional database described in Table A1 in the appendix.
the main explanatory factors that have been selected beforehand based on LASSO (cf. Table 1) and are marked by an asterisk, and five additional variables are included. While Figure 1 shows the incidence of significant estimates at the 5% level and the sign of the correlation, Figure 2 indicates the size and range of standardized coefficient estimates.
Figure 1 reveals that the results of those explanatory variables that we identified as important factors in Table 1 are also characterized by a high degree of robustness. Altogether, there are 12 variables that are precisely estimated with the same sign in every regression. Some additional characteristics, such as the fiscal capacity of the region and the share of knowledge-intensive services, turn out to show a significant correlation only in some specifications. A third group of variables does not correlate with the net migration rate at all. It is important to take note of the potential determinants that are not chosen at all with our approach, such as the regional unemployment rate, meteorological indicators and recreation area in the case of the young workforce. Figure 2 more or less confirms the previous results. The variation of coefficient estimates is moderate for the majority of factors identified as influential. Rare exceptions include the population density and its spatial lag. Thus, we notice that robustness also refers to the size of the 'effect' for the majority of variables.
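The two robustness measures discussed here, the share of significant estimates by sign and the range of the coefficient estimates, can be sketched in Python as follows. The function name and the input values are hypothetical; in an application, the inputs would be the coefficient and p-value of one variable from every CSR specification in which it is included.

```python
def summarize_csr(estimates, alpha=0.05):
    """Summarize one variable across specifications.

    estimates: list of (coefficient, p_value) pairs, one per specification.
    """
    n = len(estimates)
    if n == 0:
        raise ValueError("no estimates supplied")
    sig_pos = sum(1 for b, p in estimates if p < alpha and b > 0)
    sig_neg = sum(1 for b, p in estimates if p < alpha and b < 0)
    coefs = [b for b, _ in estimates]
    return {
        "share_sig_pos": sig_pos / n,
        "share_sig_neg": sig_neg / n,
        "coef_min": min(coefs),
        "coef_max": max(coefs),
    }


# Hypothetical estimates for one variable in four specifications.
demo = [(-0.8, 0.001), (-0.6, 0.01), (-0.7, 0.20), (-0.5, 0.03)]
summary = summarize_csr(demo)
print(summary)
```

A variable would be regarded as robust in the sense used above if one of the two shares equals one, that is, if it is significant with the same sign in every specification.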
Table 2 shows the findings of a regression analysis that differentiates between three skill groups. To provide results by skill group, we restrict our sample to employed workers aged above 25 years. A significant percentage of the age group below 25 years has not yet completed university education or apprenticeship training. These workers would be assigned to the low-skilled workforce, thus introducing a substantial measurement error. The number of relevant labour market indicators declines once we exclude workers below age 25 years. This corresponds with the above-average importance of labour market conditions detected in Table 1 for the youngest workers. However, the economic structure still matters for migration in the reduced sample. The low- and medium-skilled workforce tends to prefer regions specialized in low-knowledge services, which might thus offer many job opportunities for these skill groups.
We do not detect important effects of the economic structure on the net migration rate of high-skilled workers.
The coefficient estimates for population density and its spatial lag confirm the results for different age groups.
We detect a significant negative correlation for all skill levels. However, there are important differences in the size of the 'effect' in absolute terms. 11 Changes in population density seem to correlate in particular with changes in the migration balance of high-skilled workers, while we observe the smallest coefficient in absolute terms for the medium-skilled employees. A robust positive correlation between the share of the foreign population and the migration balance is also visible for all skill groups. An above-average importance of gastronomy seems to increase the attractiveness of locations for low- and medium-skilled workers. This result might point to the influence of an amenity (Buch et al., 2017), but could also be driven by corresponding job opportunities relevant to these skill groups. The latter interpretation is in line with weak evidence on a robust relationship for the high-skilled workers. In contrast, there is some indication that the crime rate becomes more important for the net migration rate as the skill level of the workforce increases. Cullen and Levitt (1999) provide corresponding evidence for the United States.
Finally, we observe a significant negative correlation between the net migration rate and the average flat size per capita for the medium- and high-skilled workforce.

| DISCUSSION
Our results point to a robust negative relationship between the net migration rate and population density, yet locations in close proximity to large urban centres seem to be rather attractive destination regions. Thus, agglomeration disadvantages seem to prevail when labour migration in Germany is concerned, conditional on a number of covariates. From a theoretical perspective, it is ambiguous how density influences the migration balance (see Glaeser & Gottlieb, 2006). Congestion of the local infrastructure and environmental problems may accompany high density and act as centrifugal forces, giving rise to low net in-migration (Brown & Scott, 2012). Moreover, high housing costs may contribute to net migration losses that we tend to observe for large cities. On the other hand, density may give rise to agglomeration advantages such as intense interaction and knowledge transfer from which the region's migration balance should benefit as they represent centripetal forces (see, e.g., Buch et al., 2017).
Alvarez and Royuela (2022) note that new economic geography models also have implications for interregional migration and the role of centrifugal and centripetal forces in this context. For instance, Crozet (2004) describes migration decisions as a function of wages, unemployment, transport costs and the region's access to markets. He shows that access to markets significantly affects interregional migration in Europe, in line with the forward linkage discussed in Krugman (1991). In this analysis, in contrast, agglomeration disadvantages seem to outweigh benefits when migration behaviour is concerned. However, the negative net effect of agglomeration might at least partly be caused by the inclusion of other factors in the regression model that correlate with agglomeration benefits such as the wage level and different amenities.
In contrast, the utility that a location outside large cities offers as a place of residence seems to increase as the distance to major metropolitan areas declines.These findings suggest that, for rural regions, low density is a factor that promotes net in-migration of workers conditional on other determinants of migration behaviour.However, it is primarily those rural areas close to metropolitan regions that benefit from this constellation, and it does not apply to same extent to all age groups.Areas bordering on large cities offer good access to agglomeration advantages while workers at the same time avoid locational factors that negatively affect quality of life in metropolitan regions.The underlying suburbanization processes are primarily driven by high-skilled workers and the age group between 25 and 39 years.In contrast, for young workers aged below 25 years these regions do not seem to offer a particularly high utility level.For the youngest workforce the negative correlation between density and migration balance is less robust, suggesting that they might benefit more than other age groups from agglomeration advantages not captured by other explanatory variables.Large cities offer a broad range of training opportunities and jobs, which are of special importance for early career workers.
Moreover, labour market conditions and some amenities are significantly correlated with the region's migration balance. The role of specific factors for migration behaviour seems to vary across age groups. We observe that the wage level and, in particular, facilities for vocational training matter primarily for young workers (see also Chen & Rosenthal, 2008, for corresponding US evidence). However, even for the youngest workers, these factors are not more important than amenities that reflect the recreational value of the location. It is also noteworthy that we do not detect an impact of regional unemployment on the net migration rate. Interestingly, this applies to all groups of workers considered in our analysis. This is in line with evidence provided by Buch et al. (2017) and, at least partly, by Arntz (2010).
We identify heterogeneous effects for the sectoral structure of the region. While the youngest workers show disproportionate out-migration from (rural) regions with an above-average share of the primary sector, low- and medium-skilled workers seem to prefer locations specialized in low-knowledge services, which probably offer many employment opportunities, especially for the former group.
Within the group of amenities, indicators for cultural facilities, places of interest and a variety of gastronomic offerings show a robust positive correlation with a region's migration balance. Locations with a relatively high share of foreign population are also often among those regions with a significant net in-migration of workers (see also Buch et al., 2014, for corresponding evidence on German cities). We interpret this variable as indicating a diverse supply of consumption goods and services (see Alesina & La Ferrara, 2005; Ottaviano & Peri, 2005). However, the relationship between cultural diversity and the net migration rate seems to be driven exclusively by the age group between 30 and 39 years. The migration balance of workers beyond their twenties also correlates positively with our measure of political and societal participation, voter turnout. Finally, the regression results point to an adverse effect of crime on the liveability that a location offers (see also Cullen & Levitt, 1999, for the United States).
Altogether, we should not overrate the strength of these 'effects'. The size of the coefficient estimates indicates that the impact of a specific factor on the net migration rate is moderate. For instance, an increase of a region's wage level by 10% relative to all other regions improves the migration balance by 5 net immigrants per 10,000 workers living in the region. This is small in comparison to the migration losses that many areas face. Among the quarter of regions that experienced the highest net out-migration between 2014 and 2017, the migration loss amounted on average to 20 persons per 10,000 workers.
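To put these magnitudes side by side, the following minimal sketch backs out the coefficient implied by the wage example and asks how large a relative wage increase would be needed to offset the average loss of 20 persons per 10,000 workers. It assumes that the wage enters the model as the logarithm of the region's wage relative to the rest of the country and that the effect can be extrapolated linearly in that logarithm; both the extrapolation and the function names are illustrative assumptions, not part of the estimation reported here.

```python
import math

# Figures quoted in the text (both per 10,000 workers living in the region).
GAIN_PER_10PCT_WAGE = 5.0      # net immigrants gained for a 10% relative wage increase
AVG_LOSS_WORST_QUARTER = 20.0  # average net loss 2014-2017, quarter of regions with highest out-migration

# Assumption: the effect is linear in the log of the relative wage,
# so the implied coefficient is the quoted gain divided by ln(1.10).
implied_beta = GAIN_PER_10PCT_WAGE / math.log(1.10)


def migration_gain(relative_wage_increase, beta=implied_beta):
    """Change in net migrants per 10,000 workers for a relative wage increase (0.10 = 10%)."""
    return beta * math.log(1.0 + relative_wage_increase)


def wage_increase_needed(target_gain, beta=implied_beta):
    """Relative wage increase needed to gain `target_gain` net migrants per 10,000 workers."""
    return math.exp(target_gain / beta) - 1.0


needed = wage_increase_needed(AVG_LOSS_WORST_QUARTER)
print(f"implied coefficient per log point: {implied_beta:.1f}")
print(f"relative wage increase needed to offset the loss: {needed:.1%}")
```

Under these assumptions the required increase is 1.1^4 - 1, that is, roughly 46%, which illustrates why a single factor is unlikely to reverse the migration balance of a strongly losing region.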

| CONCLUSIONS
Our findings point to substantial spatial disparities in the net migration rate. However, rural areas do not show net out-migration of workers in Germany per se. There is considerable heterogeneity between rural regions where labour migration is concerned. While rural areas close to large metropolitan regions tend to experience high net migration gains, rural sites in the periphery suffer from considerable net out-migration, as do cities on average.
This suggests that access to large urban centres might be a key factor. The results of our regression analysis also indicate that there is a robust relationship between a region's net migration rate and different labour market/economic conditions as well as (dis)amenities. However, the importance of specific factors varies significantly between age and skill groups.
Rural regions close to large metropolitan areas have great appeal, especially to high-skilled workers and those aged between 25 and 39 years, that is, parts of the workforce that are assumed to play a decisive role in the economic prospects of regions. Thus, there seems to be no immediate need for political action for this category of rural areas. In view of high population growth, caused by net migration gains, that might increase housing prices and capacity utilization of local facilities, policy might rather focus on preserving favourable conditions in these regions.
In contrast, accessibility is likely an important starting point for policy action in those rural locations that are not within commuting distance of a large metropolitan area. However, improving the transport infrastructure might not be the first-best option in this context. Progressive suburbanization has severe environmental consequences (urban sprawl, land consumption). Moreover, rural areas do not necessarily benefit from improved physical accessibility via investments in transport infrastructure. Some studies suggest, in fact, that it is primarily the large urban areas that benefit from improved traffic links between rural and urban regions (e.g., Faber, 2014). However, there might be alternative options to increase accessibility in rural sites if working from home becomes more common.
Another issue is whether policy can effectively influence important determinants of migration behaviour at all. It is infeasible or at least extremely difficult to change some of the local attributes that show a robust correlation with the net migration rate, such as the wage level and the sectoral structure. Of course, well-designed (regional) economic policy might help to create attractive employment opportunities in rural areas.
However, numerous studies provide evidence on persistent spatial wage disparities, which are partly due to agglomeration effects (Hamann et al., 2019, and Peters, 2019, provide corresponding evidence for Germany). Moreover, effectively improving some local attributes might only influence the migration behaviour of specific groups of workers. This also implies that the cost-effectiveness of some policies might be relatively high simply because they focus on local attributes that matter for age groups that attach a relatively high utility to rural locations anyway (e.g., workers between 25 and 39 years). Corresponding starting points for policy design include the recreational value of the region, public safety and measures to strengthen political and societal participation. In contrast, we detect no important effects of childcare facilities and return initiatives of local authorities for the considered groups of workers. Hence, at least on average, these factors apparently do not significantly impact internal labour migration in Germany.
Where net out-migration of young workers from rural areas is concerned, labour market conditions and, in particular, facilities for vocational training might be crucial. However, in view of demographic change and declining population figures in many rural areas, the financial feasibility of providing training infrastructure close to a young worker's place of residence is increasingly called into question (OECD, 2021). Our results underline the significance of access to training positions for the migration behaviour of young workers. A policy that aims at improving the migration balance of rural areas in this age group should therefore consider how to stabilize a sustainable educational infrastructure in these regions. We refrain from a detailed presentation of different approaches (e.g., training networks of SMEs, branch offices of universities) that are discussed in this context (see Daniel et al., 2019, for the German context).
Moreover, findings by Teichert et al. (2020) suggest that providing opportunities to gain knowledge about the local labour market and establishing labour market contacts before and during studies might be a possible strategy to deepen the ties of young skilled workers to the region of residence.
There is considerable variation within the group of rural regions in Germany when it comes to the endowment with factors that positively influence the migration balance. This implies that a 'one-size-fits-all' policy for rural regions will not work. The starting point of a local strategy to improve the migration balance should therefore be a thorough analysis of locational advantages and disadvantages vis-à-vis other (rural) areas. Based on the evidence provided by such an analysis, region-specific strategies might be developed that allow for the strengths and weaknesses of the region under consideration. However, in the end we need to keep in mind that small improvements in some fields will probably not be sufficient to achieve a fundamental change of a region's migration balance because the size of the effects, as indicated by the coefficient estimates, tends to be rather small.

The creative economy includes literature, music, the performing arts, film, broadcasting service, design, architecture, the press, advertising and gaming (see Söndermann et al., 2009).

TABLE A1 Variable definition and data sources
TABLE A2 Summary statistics. Source: IEB and regional characteristics (Table A1), own calculations.
TABLE A3 Correlation matrix
FIGURE A2 Share of significant coefficient estimates: results of CSR for all workers. Notes: Share of statistically significant coefficient estimates (p < 0.05) in a total number of 20,349 regressions. Each regression model includes all variables marked by an asterisk and five additional variables. Source: Own calculation using the R-package rrsim by Thomas de Graaff, IEB and regional database described in Table A1 in the appendix.

Abstract. Many European regions are currently experiencing a significant population decline and, related to this, are increasingly confronted with labour shortage. Migration is one of the main drivers of changes in regional labour supply and the local level of human capital. A region's ability to attract residents thus becomes more and more important for its growth prospects. A large panel dataset for the period 2003 to 2017 was used to investigate the relationship between local attributes and the migration balance of regions in Germany. In particular, it was examined whether the factors that determine the migration balance of regions differ significantly across age and skill groups, because their contribution to regional human capital likely varies. The econometric specification can be understood as an aggregate formulation of a two-region random utility model. The dataset includes 30 factors that might influence a region's migration balance. Given this large number of explanatory variables and significant multicollinearity issues, machine learning techniques (least absolute shrinkage and selection operator [LASSO], complete subset regression) were applied to identify important local characteristics. The results point to a robust negative relationship between the net migration rate and population density, although locations close to large urban centres appear to be rather attractive destination regions, and the size of the effects differs significantly across age and skill groups. Moreover, labour market conditions and some amenities are significantly correlated with the region's migration balance. However, the former and, in particular, facilities for vocational training matter primarily for young workers.

Figure 2 summarizes the results of 20,349 regressions of the migration model given by Equation (2). In every regression, ...

FIGURE A1 Net migration rate of rural areas in Germany by age groups. Notes: The net migration rate refers to the migration of all workers captured by the IEB. Source: Own illustration based on Meister et al. (2019: Figure 6); definition of rural areas according to Küpper (2016).
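The figure notes state that each of the 20,349 regressions contains the variables marked by an asterisk plus five additional variables. The following minimal sketch illustrates how such a complete subset design can be enumerated and how a share of significant estimates is computed from a list of p-values. The number 20,349 equals the binomial coefficient C(21, 5); that the pool of additional candidates contains 21 variables is an inference from this number, not something stated in the text, and the variable names and helper functions are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

# Assumption: 21 candidate variables, inferred from comb(21, 5) == 20349.
N_CANDIDATES = 21
SUBSET_SIZE = 5  # "five additional variables" per regression

# Hypothetical placeholder names; the actual variables are defined in Table A1.
candidates = [f"x{i:02d}" for i in range(1, N_CANDIDATES + 1)]


def enumerate_subsets(names, k):
    """Return an iterator over every k-element subset of the candidate variables."""
    return combinations(names, k)


def share_significant(p_values, alpha=0.05):
    """Share of coefficient estimates with p < alpha across the regressions supplied."""
    if not p_values:
        raise ValueError("no p-values supplied")
    return sum(p < alpha for p in p_values) / len(p_values)


# Total number of regression specifications in the complete subset design.
n_models = sum(1 for _ in enumerate_subsets(candidates, SUBSET_SIZE))

# Number of specifications in which one particular candidate variable appears.
n_models_per_variable = sum(
    1 for subset in enumerate_subsets(candidates, SUBSET_SIZE) if "x01" in subset
)

print(n_models)               # 20349
print(n_models_per_variable)  # 4845
```

Under this assumption every candidate variable enters 4,845 of the specifications, so a share computed by `share_significant` for such a variable would rest on 4,845 p-values, whereas the variables marked by an asterisk appear in all 20,349 regressions.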

Note: With robust standard errors in parentheses. All models include region and year fixed effects. The selection of explanatory variables relies on a LASSO regression, which is combined with CSR. The superscripts (1)-(5) indicate all variables that are selected based on LASSO if the (sub-)sample in the respective column is considered. *p < 0.05, **p < 0.01, and ***p < 0.001.

FIGURE A3 Share of significant coefficient estimates: results of CSR for the age group 25-29 years. Notes and source: See Figure A2.
FIGURE A4 Share of significant coefficient estimates: results of CSR for the age group 30-39 years. Notes and source: See Figure A2.
FIGURE A5 Share of significant coefficient estimates: results of CSR for the age group 40-65 years. Notes and source: See Figure A2.
FIGURE A6 Share of significant coefficient estimates: results of CSR for low-skilled workers. Notes and source: See Figure A2.
FIGURE A7 Share of significant coefficient estimates: results of CSR for medium-skilled workers. Notes and source: See Figure A2.
FIGURE A8 Share of significant coefficient estimates: results of CSR for high-skilled workers. Notes and source: See Figure A2.
FIGURE A9 Variation of coefficient estimates: results of CSR for all workers. Notes: Range of coefficient estimates from 20,349 regressions, excluding the most extreme values (5%). Each regression model includes all variables marked by an asterisk and five additional variables. The dot indicates the mean estimate. Source: Own calculation using the R-package rrsim by Thomas de Graaff, IEB and regional database described in Table A1 in the appendix.
FIGURE A10 Variation of coefficient estimates: results of CSR for the age group 25-29 years. Notes and source: See Figure A9.
FIGURE A11 Variation of coefficient estimates: results of CSR for the age group 30-39 years. Notes and source: See Figure A9.
FIGURE A12 Variation of coefficient estimates: results of CSR for the age group 40-65 years. Notes and source: See Figure A9.
FIGURE A13 Variation of coefficient estimates: results of CSR for low-skilled workers. Notes and source: See Figure A9.
FIGURE A14 Variation of coefficient estimates: results of CSR for medium-skilled workers. Notes and source: See Figure A9.
FIGURE A15 Variation of coefficient estimates: results of CSR for high-skilled workers. Notes and source: See Figure A9.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2023 The Authors. Regional Science Policy & Practice published by John Wiley & Sons Ltd on behalf of Regional Science Association International.
TABLE 1 Correlation between net migration rates and local characteristics across age groups, 2004-2017ᵃ
Note: With robust standard errors in parentheses. All models include region and year fixed effects. The selection of explanatory variables relies on a LASSO regression, which is combined with CSR. The superscripts (1)-(5) indicate all variables that are selected based on LASSO if the (sub-)sample in the respective column is considered.
ᵃ We restrict the regression analysis to the period 2004-2017 because many explanatory variables are only available from 2003 onwards. *p < 0.05, **p < 0.01, and ***p < 0.001.
TABLE 2 Correlation between net migration rates and local characteristics across skill groups, 2004-2017ᵃ
ᵃ We restrict the regression analysis to the period 2004 to 2017 because many explanatory variables are only available from 2003 onwards. *p < 0.05, **p < 0.01, and ***p < 0.001.
Note: With robust standard errors in parentheses. All models include region and year fixed effects. The selection of explanatory variables relies on a LASSO regression, which is combined with CSR. The superscripts (1)-(4) indicate all variables that are selected based on LASSO if the (sub-)sample in the respective column is considered.
ᵃ Summary statistics refer to the variables used in the regression analysis. All explanatory variables are logarithms of the ratio of the indicators in the respective regions to the value in the rest of the country. Results for categorical variables (return initiatives, share of students) refer to the value of the corresponding region.
Correlation between net migration rates and local characteristics across age groups, 2009-2017