How Wealthy Are the Rich?

Underreporting and undersampling biases in top tail wealth, although widely acknowledged, have not been statistically quantified so far, essentially because they are not readily observable. Here we exploit the functional form of power law-like regimes in top tail wealth to derive analytical expressions for these biases, and employ German microdata from a popular survey and rich list to illustrate that tiny differences in non-response rates lead to tail wealth estimates that differ by an order of magnitude, in our case ranging from one to nine trillion euros. Underreporting seriously compounds the problem, and we find that the estimation of totals in scale-free systems oftentimes tends to be spurious. Our findings also suggest that recent debates on the existence of scale- or type-dependence in returns to wealth are ill-posed because the available data cannot discriminate between scale- or type-dependence on the one hand, and statistical biases on the other. Yet both economic theory and mathematical formalism indicate that sampling and reporting biases are more plausible explanations for the observed data than scale- or type-dependence.


Introduction
The starting point for this analysis was a conscientious effort to quantify the total wealth of the richest Germans from survey microdata. This seemingly innocuous exercise pointed us to a problem which, to the best of our knowledge, has not yet been adequately addressed in the pertinent literature. The problem arises in the top tail of wealth, generally following power law-like distributions, where survey data apparently suggest total wealth to be orders of magnitude smaller than implied by named rankings of the super-rich, often referred to as rich lists. Extrapolating the power law backward from observed top wealth levels to some unobserved minimum is asymptotically unbiased. Severe biases can arise, however, when extrapolating forward from relatively low levels to unreliable or missing maximum wealth levels (Cristelli et al., 2012). In survey data the latter typically leads to strongly downward biased estimates of wealth and inequality (Eckerstorfer et al., 2016;Vermeulen, 2018). Since we cannot quantify this effect without data that go beyond the available, we propose two limit interpretations to gauge the potential impact of this bias. In what we term the data first limit, we assume both upper and lower truncated samples to deliver unbiased estimates. Put differently, we attribute all observed differences between upper and lower truncated samples to truly existing differences in the data generating process. In the complementary theory first limit, we assume the data generating process to be homogeneous across samples on the different scales, attributing the entire observed difference to statistical bias. We show that tail wealth estimates differ by an order of magnitude, depending on which of the two pre-analytical visions we employ.
The literature so far has implicitly taken a data first stance on this issue (Eckerstorfer et al., 2016; Vermeulen, 2016, 2018; Bach et al., 2019). Our primary goal with this paper is to argue that a theory first perspective is at the very least equally plausible. To show this we introduce different categories of biases that affect measured inequality, and provide closed-form expressions that are readily estimated. First, we show that underreporting incentives by themselves are insufficient to generate biased estimates, as the estimate is asymptotically unbiased if the entire population unanimously underreports their wealth. Inequality is underestimated only if underreporting is more pronounced for the richest, which seems intuitively plausible as the super-rich have greater means at their disposal to avoid taxes than the average person or household. Second, we demonstrate that differential underreporting by the super-rich indeed leads to downward biased estimates of inequality for the entire population. Finally, and most importantly, the impact of underreporting rates is highly non-linear. Even if only a fraction of actual wealth is reported, this will greatly reduce the resulting bias compared to when information on a fraction of the super-rich is missing altogether. We call the latter case undersampling, which is typical of survey data that essentially use equiprobable sampling and therefore do not adequately capture the richest individuals in power law-like regimes. We also show that logarithmic sampling would greatly improve the statistical quality of wealth surveys. The named rich lists, on the other hand, will be subject to reporting biases as they explicitly try to account for the super-rich but typically suffer from data availability and salience issues, as well as adverse (tax) reporting incentives.
Without additional information, both the estimated underreporting and undersampling rates remain within plausible bounds, so the polar data first and theory first perspectives would appear equally plausible at first.
While it is hardly surprising that the two perspectives imply different estimates for top tail wealth, the difference turns out to be enormous. The lowest estimate arising from data first is around one trillion euros for Germany's top tail wealth, while the theory first estimates reach about nine trillion euros. These vast differences, spanning almost one order of magnitude in top tail wealth, are caused by tiny non-response rates on the order of a tenth of a percent. This disconcerting result suggests that aggregate findings within the data first framework can become heavily distorted by tiny degrees of undersampling. The severity of the problem extends far beyond the German dataset since our results are functions of the power law tail of wealth distributions that applies across many countries and time periods. 1 Consequently, estimates of total wealth will crucially depend on the pre-analytical perspective and should thus be treated with extreme caution. If total wealth estimates are to be stated, we believe that scientific integrity at least demands to report the range from the smallest estimates of a data first perspective to the largest estimates of a theory first perspective, especially if these estimates are intended to inform economic policy or public debate.
The ubiquity of power laws has led to numerous suggestions for potential generating mechanisms, reviewed for instance by Gabaix (2009) or Luttmer (2010). In the case of top tail wealth, any candidate mechanism should be based on a property that is common across the various time periods, countries, or proxies of wealth. One common property, at least across the different varieties of capitalism, concerns the primary types of assets in super-rich portfolios, namely entrepreneurial stakes, financial assets, and speculative (that is non-owner occupied) real estate, which are perpetually reinvested into or reallocated among these asset classes (Davies and Shorrocks, 2000; Wachter and Yogo, 2010). 2 Thus a random growth model featuring a multiplicative component seems to be the most adequate candidate for a sensible generating mechanism. The idea to explain the emergence of power law tails with stochastic multiplicative processes has a long history but has fallen out of fashion in economics for many decades, essentially for its lack of microfoundations. Yet random multiplicative growth has recently regained traction within economically motivated partial and general equilibrium models that endogenously generate power law tails in wealth from stochastic capital or asset accumulation (Levy, 2003; Levy and Levy, 2003; Nirei and Souma, 2007; Nirei, 2009; Benhabib et al., 2011; Toda, 2014; Piketty and Zucman, 2015; Hubmer et al., 2016; Aoki and Nirei, 2016; Benhabib and Bisin, 2018).
The literature on random multiplicative growth has typically placed weak restrictions on the particular form of return distributions governing the stochastic process. One notable exception, however, is the assumption of an equilibrating tendency for the expected (risk-adjusted) rate of return or, in more technical terms, of a homogeneous return distribution across wealth portfolios. This is consistent with the classical notion of competition, the implications of (semi-strong) informationally efficient capital markets, and the idea that investors' superior talent in either fundamental or technical analysis cannot lead to excess returns over extended periods of time (Fama, 1965, 1970, 1991). Indeed, as Levy (2003) and Levy and Levy (2003) show both experimentally and via Monte Carlo simulations, the scope for differential talent is very limited in light of power law distributed top wealth. If one group of investors were to consistently outperform another group of less talented investors in terms of their expected returns by only a tiny margin, the functional form of the emergent stationary distribution would differ significantly from a power law and exhibit concavity on double-logarithmic scale. 3 Hence the defining characteristic of the theory first perspective is to assume a homogeneous return distribution, thereby implying equivalent data generating processes across samples.
2 From an accounting standpoint, this perpetual reallocation and investment is closely related to saving and there is a consensus in the literature that propensities to save are strongly positively correlated with (lifetime) income or wealth (Dynan et al., 2004; Jappelli and Pistaferri, 2014). This also holds for entrepreneurial households (Quadrini, 1999). As a major reason for this relationship, Deaton (2003) identifies credit constraints that are only binding for low wealth households and individuals. Alan et al. (2015), on the other hand, provide a critical discussion of the identification strategy and find no differential savings behavior with respect to long-term income. The major limitation of Alan et al. (2015) is the exclusion of the wealthiest one percent that we are primarily concerned with here.
3 They consider two Gaussian return distributions that merely differ in expected value, showing that a difference of just one percentage point in expected returns already leads to a stationary distribution that significantly differs from the Pareto type.
A more recent strand of literature has started to challenge the homogeneity hypothesis on both theoretical and empirical grounds. Bach et al. (2017) and Fagereng et al. (2020) find excess risk-adjusted returns for the wealthiest portfolios, the latter even claiming persistence in abnormal returns, indicating persistent heterogeneity in financial information and talent if we take the data at face value. From a more theoretical perspective, Luttmer (2011) and Gabaix et al. (2016) build on the well-known limitation of random growth models to typically generate very slow transitions. The former puts this in terms of the stationary distribution of assets, with a half-life of assets that would be far too high from an empirical point of view, while the latter argue (formally equivalently) that the rate of convergence to the new stationary distribution after a shock to the variance in the permanent component of earnings is too slow to account for the observed rapid rise in top-level income inequality. Gabaix et al. (2016) and Jones and Kim (2018) thus put forward the hypothesis of heterogeneous returns to explain the observed rise in income and wealth inequality, whereby excess returns are either correlated with wealth levels ("scale-dependence") or result from differential talent ("type-dependence"). In informationally efficient capital markets, scale-dependence can only occur when the set of investment opportunities increases in wealth. Hedge funds and some private banks perhaps provide anecdotal evidence, as hedge funds typically require high minimum investments (King and Maier, 2010), while some private banks like JP Morgan Chase require their private clients to hold at least ten million dollars (Glazer, 2016). Concerning type-dependence, Gabaix et al. (2016) circumvent the formal problem that differential talent is inconsistent with a Pareto distribution by essentially assuming that "high growth types" only stay in the high growth regime for a limited amount of time and cannot return there. This idea not only lacks theoretical appeal, it also introduces another degree of freedom into any empirical investigation that now has to justify after how many periods of abnormally high returns one can safely claim type-dependence.
Moreover, given that our data lack information on investors' sophistication, this notion of type-dependence is phenomenologically equivalent to scale-dependence since we cannot control for investors' ability. Put differently, we cannot distinguish between the hypothesis that individuals are rich because of their excess returns, and the alternative hypothesis that they have excess returns because they are rich. We will thus only focus on testing for scale-dependence. This hypothesis corresponds to the data first interpretation, as observed differences between high and low scale samples are then assumed to reflect true differences in the data generating process, that is to say scale-dependent random growth. We will argue, however, that the idea of scale-dependent growth is not only problematic from a formal point of view, but that it also violates economic intuitions like informational efficiency or the classical concept of competition that predicts a tendency for the equalization of returns. Theory first leaves these economic intuitions intact by attributing observed deviations in the data to statistical biases arising from undersampling and underreporting, and also casts a different light on the apparently reversed risk-return trade-off that we observe in the data.
The remainder of this paper is organized as follows: section 2 derives the biases in estimates of the tail exponent that arise from underreporting and undersampling, respectively. Throughout the paper we have relegated all derivations to the appendix in order to emphasize important conceptual differences over technical detail. Section 3 introduces the data and discusses our estimation procedure. Our results are presented in section 4, where we put forward two mutually exclusive yet on their own reasonably plausible explanations for the observed behavior in the data. Section 5 discusses the implications of our results for existing work on top tail wealth, and concludes with the suggestion to improve future surveys through logarithmic sampling.

Model
The data first and theory first interpretations are purposefully designed to be antithetical, although we will show that the formal explication of both interpretations can be reduced to conceptually closely related mechanisms that affect measured tail inequality at different stages of empirical estimation. For both interpretations, Zipf's (1949) law with a tail exponent of unity is an attractor for a parsimonious stochastic multiplicative process that does not exhibit scale-dependence in accumulation or reporting. Consequently, an observed tail exponent that differs from unity implies scale-dependent behavior in both frameworks. Data first attributes this to scale-dependent stochastic growth at the level of accumulation, while theory first assumes that it is fully caused by scale-dependent reporting behavior at the level of measurement. Though the mechanisms are formally quite similar, the pre-analytical vision obviously differs substantially between narratives.
Within the data first framework, the measured tail inequality is a sufficient statistic for both the true (snapshot) inequality among the richest and scale-dependence within the wealth accumulation process over time, because tail inequality is intricately linked to the nature of the underlying stochastic process of multiplicative growth. In Appendix A, we consider a standard drift-diffusion process to show that the tail index of the stationary power law distribution is uniquely determined by the expected return and variance of the stochastic growth process. As Gabaix (1999) shows, and we rederive in greater detail in Appendix A, the stationary distribution of the right tail for this type of general process is a power law, with tail index $\alpha$ given by
$$\alpha = 1 - \frac{2\,[\gamma(w) - \bar{\gamma}]}{\sigma^2(w)} + \frac{w}{\sigma^2(w)} \frac{\partial \sigma^2(w)}{\partial w}, \tag{1}$$
where $\bar{\gamma}$ is the average wealth growth rate and $\gamma(w)$ the (normalized) mean growth rate for a given wealth level $w$. Expression (1) has intuitive comparative statics with respect to the degree of scale-dependence in both mean growth rates $\gamma$ and variance $\sigma^2$. Whenever the expected (excess) mean growth rate $\gamma(w) - \bar{\gamma}$ increases in wealth, the tail exponent decreases and stationary inequality rises. Thus positive scale-dependence in expected returns increases measured inequality. When variance exhibits positive scale-dependence, $\partial \sigma^2(w)/\partial w > 0$, tail exponents increase and system-wide inequality hence decreases. Zipf's law with $\alpha = 1$ is an interesting limit case for a situation without any scale-dependence (positive or negative), that is $\gamma(w) = \bar{\gamma}$ for all $w \in \mathbb{R}_+$, and $\partial \sigma^2(w)/\partial w = 0$. These two conditions are typically called Gibrat's law after the seminal study by Gibrat (1931). Therefore Gibrat's law in growth rates is a sufficient condition for Zipf's law to hold in wealth levels. Córdoba (2008a,b) proves that it is also a necessary condition.
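The comparative statics described above can be checked with a few lines of Python. This is a minimal sketch that assumes the tail index takes the Gabaix (1999) form, that is one minus twice the excess mean growth rate over the variance, plus the wealth-weighted relative slope of the variance. The function name, its arguments, and all numerical values are illustrative and not taken from the data.

```python
def local_tail_index(excess_growth, variance, w=1.0, dvariance_dw=0.0):
    """Tail index implied by scale-dependence in mean growth and variance.

    excess_growth : gamma(w) - gamma_bar, the excess mean growth rate at w
    variance      : sigma^2(w), the variance of growth rates at w (> 0)
    dvariance_dw  : slope of sigma^2(w) with respect to w
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return 1.0 - 2.0 * excess_growth / variance + w * dvariance_dw / variance


# Gibrat's law (no excess growth, constant variance) yields the Zipf benchmark.
zipf = local_tail_index(excess_growth=0.0, variance=0.04)

# Positive scale-dependence in expected returns lowers the tail index,
# which corresponds to higher stationary inequality.
richer_grow_faster = local_tail_index(excess_growth=0.01, variance=0.04)

# Positive scale-dependence in variance raises the tail index.
richer_more_volatile = local_tail_index(
    excess_growth=0.0, variance=0.04, w=2.0, dvariance_dw=0.01
)
```

The three calls mirror the three cases discussed in the text: the Zipf benchmark, scale-dependent expected returns, and scale-dependent variance.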
The data first interpretation takes the Zipf benchmark in the stationary distribution as an indication for scale-independence, while statistically significant deviations are evidence to the contrary. The data first approach thus implicitly assumes that the measured tail inequality $\hat{\alpha}$ is equivalent to the true stationary $\alpha$ or, at least, that the estimate is not systematically biased in any direction. The polar theory first interpretation assumes no systematic scale-dependence of either type, that is $\alpha = 1$, and attributes significant deviations from this Zipf benchmark to underreporting and undersampling biases. While the relevance of distorted or missing observations has already been argued on empirical grounds and in Monte Carlo simulations (Vermeulen, 2016, 2018), we derive closed-form expressions here that quantify the resulting bias in the tail exponent when the number of observations, denoted $N$, becomes large. 4 In addition, we will differentiate between unanimous and differential reporting behavior on the one hand, and underreporting versus undersampling on the other. Since undersampling or underreporting rates are impossible to estimate by the very nature of the problem, we consider three stylized scenarios that are analytically tractable: i) unanimous (proportional) underreporting, ii) differential (proportional) underreporting and iii) undersampling.
First, we consider the case of unanimous underreporting, that is all respondents only report a fraction $\rho$ of their wealth. Call this fraction the reporting rate. We show in Appendix B that this leads to an unbiased estimator of the tail index, hence unanimous underreporting does not pose problems for the estimation of inequality. This holds symmetrically for unanimous overreporting, $\rho > 1$, also showing that the estimator is invariant with respect to inflation. Whenever there are differential reporting rates, however, the bias in the tail index is unambiguously positive, so inequality is underestimated. We call this case differential underreporting. For this, consider the case where the upper $q$-quantile of the wealth distribution only reports a fraction $\rho$ of their wealth, from which we show that for large $N$ the estimated tail index, now denoted $\hat{\alpha}_{du}$, will differ from the Zipf benchmark such that
$$\hat{\alpha}_{du} = \frac{1}{1 + q \ln \rho}$$
with $q$ and $\rho \in (0, 1)$, and the additional restriction that $\rho > q$. The latter restriction is needed to preserve the minimum of the true power law distribution on which the maximum likelihood estimator (MLE) is anchored. It is easily verified that $\hat{\alpha}_{du}$ is always upward biased compared to Zipf's law for these parameter restrictions, implying that true inequality is underestimated. The effect of varying the parameters is also quite intuitive: an increase in $q$ for a given $\rho$ and a decrease in $\rho$ for a given $q$ increases the bias, as in both cases relatively less wealth is reported for the richest. 5 Furthermore $\hat{\alpha}_{du}$ is only unity when either $q = 0$ or $\rho = 1$, so there is no differential behavior to begin with. Thus, when it comes to underreporting, the differential behavior of the very richest compared to the relatively less wealthy is necessary to cause upward biases from the theory first perspective. While we cannot derive analytical expressions for $\rho < q$ in general, this is possible for the limit case of $\rho = 0$.
In our stylized scenario, this would correspond to a case where the upper $q$-quantile is non-respondent and the wealth distribution is therefore $q$-truncated. In this case of differential undersampling or non-response, the richest quantile is not included at all in the sample, corresponding to a reporting rate of zero. This scenario actually appears to be empirically relevant, and connected to sampling and social desirability biases (Kennickell and Woodburn, 1999; Eckerstorfer et al., 2016; Vermeulen, 2016, 2018). As we show in Appendix B, non-response leads asymptotically to a (strong) upward bias in the MLE of the tail exponent, now denoted by $\hat{\alpha}_{nr}$, such that
$$\hat{\alpha}_{nr} = \frac{1 - q}{1 - q + q \ln q}$$
for large $N$ and $q \in (0, 1)$. For this parameter range of $q$, $\hat{\alpha}_{nr}$ is always upward biased compared to the Zipf benchmark, and monotonically increasing in the quantile $q$ of non-respondents. The quantile $q$ of upper non- or underreporting individuals is thus the only formal difference between the competing narratives of data first and theory first. 6 If $\hat{\alpha} \neq 1$, data first implicitly assumes $q = 0$ and therefore attributes all the observed deviation from the Zipf benchmark to scale-dependence in either expected returns or their variance. In contrast, theory first takes $\hat{\alpha} \neq 1$ to imply $q \neq 0$ and therefore differential reporting behavior according to sample inclusion rates and the level of wealth.
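The two stylized scenarios can be illustrated with a small Monte Carlo experiment. The sketch below is not the derivation from Appendix B. It assumes a true Zipf distribution (tail exponent of one), an MLE anchored on the known minimum, and large-N limits of 1/(1 + q ln ρ) for differential underreporting and (1 − q)/(1 − q + q ln q) for non-response, which follow from the mean log-wealth of the distorted samples under these assumptions. All function names and parameter values are hypothetical, and the upper q-quantile is identified by its population threshold rather than by a sample quantile.

```python
import math
import random


def hill_mle(wealth, w_min):
    """MLE of the tail exponent, anchored on the known minimum w_min."""
    logs = [math.log(w / w_min) for w in wealth]
    return len(logs) / sum(logs)


def alpha_du(q, rho):
    """Assumed large-N limit when the upper q-quantile reports a fraction rho > q."""
    return 1.0 / (1.0 + q * math.log(rho))


def alpha_nr(q):
    """Assumed large-N limit when the upper q-quantile is missing, 0 < q < 1."""
    return (1.0 - q) / (1.0 - q + q * math.log(q))


def simulate(n, q, rho=None, w_min=1.0, seed=0):
    """Draw n Zipf wealth levels, distort the upper q-quantile, re-estimate.

    rho=None drops the upper q-quantile (non-response); otherwise the
    upper q-quantile reports only the fraction rho of its wealth.
    """
    rng = random.Random(seed)
    # Inverse-transform sampling: w_min / U is Pareto with exponent one.
    wealth = [w_min / (1.0 - rng.random()) for _ in range(n)]
    cutoff = w_min / q  # population threshold of the upper q-quantile
    if rho is None:
        observed = [w for w in wealth if w <= cutoff]
    else:
        observed = [rho * w if w > cutoff else w for w in wealth]
    return hill_mle(observed, w_min)


if __name__ == "__main__":
    for q in (0.001, 0.01, 0.1):
        print(q, round(alpha_nr(q), 4), round(simulate(200_000, q), 4))
    print(round(alpha_du(0.1, 0.5), 4), round(simulate(200_000, 0.1, rho=0.5), 4))
```

For equal q, the simulated bias under partial reporting stays well below the bias under complete non-response, in line with the non-linearity discussed in the text.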

Data and Estimation
To test the hypothesis of scale-dependence, we examine two samples covering distinct scales in the upper tail of the German wealth distribution. We need two non-overlapping samples that both exhibit power law-like top tails, as is often the case for surveys and rich lists. The German data described below comfortably meet this requirement as the minimum wealth level in the rich lists is about three times as large as the maximum wealth level in the surveys. Non-overlapping samples are necessary to isolate potential scale effects in the accumulation of wealth, and to ensure that we consider two distinct sets of wealth portfolios. The latter condition minimizes potential Type II error in hypothesis testing, because failure to reject the null hypothesis of insignificant scale differences could otherwise arise from the simultaneous presence of identical wealth portfolios, thereby affecting the estimated parameters in both sample types.

Data
The Socio-Economic Panel (SOEP), compiled by Deutsches Institut für Wirtschaftsforschung, is probably the most prominent source for microdata on German households and individuals. The 2002, 2007 and 2012 waves of the panel include items on personal wealth that we use in our analysis. Assuming different weighting and imputation techniques for the market value and disaggregation to individual values, the SOEP sample claims to be representative of the entire German population, implying that each person or household in Germany is chosen with equal probability (Frick et al., 2007). With a total population of 82.5 million in Germany and about 25,000 individuals in the sample, the sampling ratio thus corresponds to about 0.035 percent (Statistisches Bundesamt, 2017). While the SOEP sample probably provides a reasonable approximation to the distribution of wealth for the majority of Germans, it is well known that wealth data from household surveys become increasingly inaccurate for the tails of the distribution (see, e.g., Davies and Shorrocks, 2000). Casual empiricism indeed suggests that the reported maximum wealth level in the SOEP of around seventy million euros is far from being "representative" of the richest Germans, whose fortunes are about three orders of magnitude larger according to the rich lists compiled by manager magazin. These named lists rank the five hundred richest Germans according to their net wealth in the years 2010 to 2016. Since the rich lists are not curated for statistical inference, the data likely suffer from numerous issues regarding their consistency both in the time-series and cross-sectional domain. We discuss both datasets and their respective limitations in Online Appendix H.

Estimation
Our empirical analysis starts with the parameter estimates of the power law distributions in the upper tail of the SOEP and manager magazin samples. We interpret these as the stationary distributions resulting from a general random growth process, as described in Appendix A. The assumption that the empirically observed state coincides with the stationary state of the distribution for time t → ∞ is frequently challenged though. In particular, Gabaix et al. (2016) and Luttmer (2011, 2018) show that the convergence to a new stationary distribution from a shock resulting in deviations from the steady-state is extremely slow. A back-of-the-envelope calculation in Luttmer (2018) suggests that for a firm size distribution close to Zipf, but with slightly thinner tails, a shock to the aggregate capital stock would be extremely persistent with a half-life of around seventy years, implying unrealistically low rates of recovery. 7 Given slow convergence, it is questionable whether the empirical distribution truly reflects the dynamics of an underlying random growth process or whether it is merely in a transient state on the way to stationarity. On the other hand, as Levy and Levy (2003) show, the convergence to the approximate power law is much faster even though convergence to the asymptotic distribution is indeed very slow for these types of random growth processes. Levy and Levy (2003) understand approximate convergence as convergence to a distribution that cannot be statistically distinguished from the stationary state by means of a Kolmogorov-Smirnov (KS) test. If the parameter estimates are at least approximating the true stationary state of the random growth process, the pronounced differences we find between samples will not be mere artefacts of one distribution being in a transient state but not the other, thus reflecting genuine differences in reporting, sampling, or the underlying growth process.
7 For a distribution that is exactly Zipf, there would be no recovery at all.
Given the diffusion in Appendix A, we reject the null hypothesis of scale-independence for $\hat{\alpha}$ significantly different from unity. This procedure is advantageous in the sense that it relies on observables to test for scale-dependence and thus allows inferences about the (at least partially) unobservable random growth process. Additionally, we also consider the distribution of growth rates in wealth to judge whether scale-(in)dependence characterizes the wealth accumulation process. Notice that the diffusion in Appendix A requires us to consider scale-dependence in both expected value and risk, which are readily measured by the MLEs of the location and dispersion parameters of the growth rate distributions.
We estimate the tail exponent of the power law using maximum likelihood. 8 For the estimation of the minimum wealth level $w_{\min}$ in the SOEP sample, we use the standard procedure suggested by Clauset et al. (2009), yielding $\hat{w}_{\min}$ = 280,000 euros for the 2002 sample, $\hat{w}_{\min}$ = 200,000 euros for the 2007 sample, and $\hat{w}_{\min}$ = 180,000 euros for the 2012 sample. 9 It seems reasonable to assume that a net worth of around 200,000 euros already gives rise to primarily multiplicative returns, especially considering that most households hold their wealth in the form of owner-occupied housing. Since the rich lists should be characterized by power laws, we do not estimate $w_{\min}$ but rather take it directly from the data, so $w_{\min}$ simply corresponds to the minimal wealth level in each rich list, ranging from 200 to 250 million euros in the different years. 10 The minimum in the rich lists is thus three orders of magnitude larger than in the surveys.
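The threshold selection can be sketched as follows. This is a simplified illustration in the spirit of Clauset et al. (2009), not their reference implementation: every observed wealth level is tried as a candidate minimum, the tail exponent is fitted by maximum likelihood above it, and the candidate with the smallest Kolmogorov-Smirnov distance between the empirical and the fitted tail is kept. The function names and the `min_tail` parameter are hypothetical. The exponent here is that of the complementary CDF, so it equals the density exponent used by Clauset et al. minus one, and the sketch assumes continuous data without ties at the candidate minimum.

```python
import math


def fit_tail(sorted_tail, w_min):
    """Return (alpha, ks) for an ascending tail sample with all values >= w_min."""
    n = len(sorted_tail)
    alpha = n / sum(math.log(w / w_min) for w in sorted_tail)
    ks = 0.0
    for i, w in enumerate(sorted_tail):
        model_cdf = 1.0 - (w_min / w) ** alpha
        # Compare the fitted CDF with the empirical CDF just below and at w.
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks


def select_w_min(wealth, min_tail=50):
    """Scan candidate minima; return (w_min, alpha, ks) with the smallest KS distance.

    Returns None if fewer than min_tail positive observations are available.
    """
    data = sorted(w for w in wealth if w > 0)
    best = None
    for start in range(len(data) - min_tail + 1):
        tail = data[start:]
        alpha, ks = fit_tail(tail, tail[0])
        if best is None or ks < best[2]:
            best = (tail[0], alpha, ks)
    return best


def pareto_quantiles(n, alpha, w_min=1.0):
    """Deterministic Pareto sample: quantiles at the midpoints (i + 0.5) / n."""
    return [w_min * (1.0 - (i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


if __name__ == "__main__":
    sample = pareto_quantiles(1000, alpha=1.5)
    print(fit_tail(sample, 1.0))
    print(select_w_min(sample))
```

The scan is quadratic in the sample size, which is acceptable for survey tails of a few thousand observations but would need thinning of the candidate set for larger samples.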
Finally, we would expect the distribution of wealth growth rates to be Laplacian (or double-exponential) since we measure wealth growth by the logarithmic difference in wealth levels, that is
$$r_t = \log(w_t) - \log(w_{t-\tau}),$$
where $\tau$ denotes the length of the interval over which growth is measured. It can be shown that $\log(w)$ follows an exponential distribution if $w$ follows a power law, and that the difference between two exponentially distributed variables is Laplacian (Kotz et al., 2001). The symmetric Laplace distribution for returns $r$ then has a probability density function (PDF) that is given by
$$f(r; m, \sigma) = \frac{1}{2\sigma} \exp\left(-\frac{|r - m|}{\sigma}\right),$$
where $m \in \mathbb{R}$ and $\sigma > 0$ are location and dispersion parameters, respectively. 11 From a conventional point of view, $m$ measures the expected return in a set of wealth portfolios, while $\sigma$ measures the associated risk in these portfolios. Our estimation strategy considers the cross-sectional distributions of wealth in both samples, each interpreted as the outcome of a parsimonious random growth process like the one described in Appendix A, whose realizations are at least partially unobservable. The estimated tail index then allows us to infer scale-(in)dependence within this unobservable process. Moreover, using the Laplace estimates from the actual growth rate distributions, we can test parametrically for scale-(in)dependence in expected returns or risk, and we also employ several nonparametric tests.
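The link between power law wealth levels and Laplacian growth rates can be illustrated numerically. The sketch below relies on the standard result that the maximum likelihood estimates of the symmetric Laplace distribution are the sample median (location) and the mean absolute deviation from the median (dispersion). It further assumes, purely for illustration, that the two log-wealth levels entering each growth rate are independent exponential draws with the same rate, which real panel data will not satisfy. Function names are hypothetical.

```python
import random
import statistics


def laplace_mle(returns):
    """MLE of the symmetric Laplace distribution.

    Location m is the sample median, dispersion sigma is the mean
    absolute deviation of the returns from that median.
    """
    m = statistics.median(returns)
    sigma = sum(abs(r - m) for r in returns) / len(returns)
    return m, sigma


def simulate_log_returns(n, alpha=1.0, seed=0):
    """Differences of two independent exponential log-wealth levels.

    If w follows a power law with exponent alpha above w_min, then
    log(w / w_min) is exponential with rate alpha.
    """
    rng = random.Random(seed)
    returns = []
    for _ in range(n):
        log_w_now = rng.expovariate(alpha)
        log_w_before = rng.expovariate(alpha)
        returns.append(log_w_now - log_w_before)
    return returns


if __name__ == "__main__":
    m_hat, sigma_hat = laplace_mle(simulate_log_returns(100_000))
    print(round(m_hat, 3), round(sigma_hat, 3))
```

Under these assumptions the fitted location should be close to zero and the fitted dispersion close to 1/alpha, so a Zipf tail corresponds to unit dispersion in this stylized setting.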

Results
Our parametric estimation strategy is based on the two distributional regularities in the upper tail of cross-sectional wealth portfolios, namely the power law distribution in wealth levels and the Laplace distribution of portfolio returns, because the respective empirical densities are reasonably in line with the theoretically expected functional forms. The observed complementary cumulative distribution functions (complementary CDFs) above the minimum thresholds w min are approximately linear on a double-logarithmic scale, indicating power law-like patterns for the richest individuals in both samples (see Online Appendices I and J). The empirical densities of returns to wealth portfolios are also reasonably well approximated by the expected Laplace distribution. This is readily indicated by their (symmetric) tent shape on semi-logarithmic scale that is characteristic of the Laplace, and shown in Appendix C. Apart from mere visual inspection, the standard procedure to test for a Laplace distribution is to fit an exponential power (or Subbotin) distribution to the data (Subbotin, 1923). Since the Subbotin distribution includes the Laplace as a special case when its shape parameter equals unity, an MLE fit of the Subbotin parameters provides a convenient test. As we show in Appendix C, a shape parameter of unity cannot be rejected in any of the considered cases, so our findings should not be distorted by systematic deviations from the parametric forms we impose for the estimations.

Distributional Results
We estimate the parameters for the power law distribution separately for the survey and the rich list. The tail indices are estimated via maximum likelihood, employing the respective empirical minima from the rich list, and using the procedure described in Clauset et al. (2009) to estimate the respective minima $\hat{w}_{\min}$ in the survey. Tables 1 and 2 report tail index estimates for the survey tails and rich lists, respectively. Two peculiarities stand out. First, asymptotic normality of the tail index estimates (De Haan and Resnick, 1997) implies that Zipf's law (with $\alpha = 1$) can be rejected with at least 95% confidence in all survey years. Wealth in the survey tails therefore appears more equally distributed than scale-independent growth would imply. Second, the wealth maxima in the survey are not even on the same order of magnitude as the wealth minima reported in the rich lists. These implausibly small maxima indicate severe undersampling (or rather the complete absence) of the super-rich in the survey, and are a major reason for the relatively low degree of measured inequality in the survey tails. Tail index estimates for the rich list, on the other hand, stand in stark contrast to those for the survey. As shown in Table 2, we cannot reject Zipf's law at the usual significance levels in any of the years other than 2013 and 2016, with Zipf's law being only barely rejected in 2013. 12 In the language of the stochastic accumulation process, Zipf's law indicates scale-independent returns among the super-rich. Yet significant deviations from Zipf's law in the survey tails point to scale-dependent wealth returns within the survey populations, and obviously also to scale-dependent returns between the two sample types. Hence we consider the distribution of wealth returns in the two sample types, and to facilitate comparison we construct wealth returns over five year intervals.
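The test of the Zipf benchmark can be sketched as a simple z-test. The snippet assumes the standard asymptotic result that the maximum likelihood estimator of the tail exponent is approximately normal with standard error equal to the estimate divided by the square root of the number of tail observations. The function name and the example inputs are hypothetical and are not the estimates reported in Tables 1 and 2.

```python
import math


def zipf_z_test(alpha_hat, n_tail, alpha_null=1.0):
    """Two-sided z-test of the null hypothesis alpha = alpha_null.

    Uses the asymptotic standard error alpha_hat / sqrt(n_tail) of the
    maximum likelihood estimator of the tail exponent.
    """
    se = alpha_hat / math.sqrt(n_tail)
    z = (alpha_hat - alpha_null) / se
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show both outcomes of the test.
    print(zipf_z_test(alpha_hat=1.30, n_tail=1000))  # clear rejection
    print(zipf_z_test(alpha_hat=1.05, n_tail=500))   # no rejection at 5%
```

Because the standard error shrinks with the square root of the tail size, the same point estimate can lead to rejection in a large survey tail and to non-rejection in a rich list of five hundred names.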
Several nonparametric tests reject the null hypothesis of distributional equivalence between the two sample types in both periods, but fail to reject it within the samples between periods (cf. Online Appendix K). Apparently the data suggest that wealth dynamics are time-invariant but scale-dependent between sample types. The parameter estimates for the Laplace distribution of wealth returns, summarized in Table 3, strengthen the impression from the non-parametric tests. The estimates for the location parameter m (the "average" return) and the dispersion σ (the "average" risk) do not vary much within the respective samples, yet vastly differ between the two sample types. While average returns do not significantly differ from zero in the survey tails, zero can safely be rejected at the five percent level in the rich lists, where m is significantly positive, implying that Germany's super-rich on average became wealthier during the considered period. Paradoxically, however, σ is significantly lower in the rich lists than in the survey tails, apparently indicating that super-rich portfolios are less risky than the ones in the survey tails. 13 So how can we interpret these findings?

Table 3. Maximum likelihood parameter estimates for the Laplace distribution of wealth returns, with standard errors in parentheses. While a location measure or "average" return of zero cannot be rejected for the survey tails, it is significantly greater than zero in the rich lists. Note that the dispersion of returns, that is the "average" risk across portfolios, is markedly lower in the rich lists than in the survey tails.
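For readers who want to reproduce the type of estimates in Table 3, the Laplace maximum likelihood estimators have a simple closed form: the location estimate is the sample median and the dispersion estimate is the mean absolute deviation around it (as also noted in Appendix C). The sketch below assumes that wealth returns have already been computed; the function name is illustrative.

```python
import statistics

def laplace_mle(returns):
    """Maximum likelihood estimates (m, sigma) of a Laplace distribution."""
    m_hat = statistics.median(returns)  # location: sample median
    sigma_hat = sum(abs(r - m_hat) for r in returns) / len(returns)  # mean absolute deviation
    return m_hat, sigma_hat
```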

Data First
Taken at face value our findings indicate that the accumulation process is scale-dependent in the survey study but scale-independent in the rich list, where we find higher average returns and lower volatility compared to the survey. From a theoretical point of view this is puzzling. How plausible is it that the investment strategies of the super-rich converge to roughly equivalent risk profiles that not only outperform other (still rather) wealthy individuals, but do so at a lower risk? The conventional rationale for the risk-return tradeoff, as for instance in the canonical intertemporal capital asset pricing model of Merton (1973), suggests that the conditional expected excess return should grow linearly with its conditional variance. 14 But both the non-parametric tests as well as the parameter estimates for the Laplace distribution of wealth returns indicate that the super-rich enjoy higher expected returns at lower risk. The excess returns of Germany's super-rich cannot be explained by a higher risk tolerance, because this should be reflected in a higher dispersion of returns among the super-rich.
The data first interpretation thus not only suggests scale-dependence, but scale-dependence that cannot be explained by heterogeneous risk preferences alone. To explain the estimation results within the framework of random multiplicative growth, we need to assume that financial markets are not fully competitive in the conventional sense. This would suggest that investors' talent or the increased set of possibilities that comes with being very wealthy enables the richest to persistently beat the market and achieve above-average risk-adjusted returns at a lower risk. Such an interpretation would also be at odds with empirical findings on risk preferences that observe decreasing risk-aversion in wealth levels, such that higher net worth correlates positively with a higher dispersion in returns to wealth (Guiso et al., 1996; King and Leape, 1998; Calvet and Sodini, 2014). The data first interpretation thus poses a challenge both to the empirically observed risk profiles and to the idea that financial markets with rapid feedbacks and a low degree of informational asymmetries should be close to the benchmark of a fully competitive market.

Theory First
Our central point here is that these "puzzles" can be resolved within the theory first interpretation once we agree that estimates of the tail exponent in the two samples suffer from two different sources of bias. Equiprobable sampling in the survey makes it very unlikely to observe the largest wealth levels that are necessary for reliable estimation of the tail index, as we quantify in Appendix E. Note that the probability of including the maximum wealth level for the SOEP sampling ratio under equiprobable sampling is 0.035 percent and thus practically equal to zero. Adding to this problem are concerns of social desirability biases, particularly the phenomenon that the super-rich tend not to respond to survey requests. As the probability of non-response is therefore positively correlated with wealth levels, the survey is subject to differential non-response (Kennickell and Woodburn, 1999; Eckerstorfer et al., 2016; Vermeulen, 2016, 2018). These two considerations lead to undersampling, that is, the largest wealth levels are not included at all in the survey sample. In contrast, the rich list is a carefully selected sample aimed at covering the super-rich, and one can therefore expect that undersampling is not an issue. On the other hand, the manager magazin staff relies on public records for their compilation of the rich list, likely underestimating the actual wealth levels for Germany's super-rich due to privacy considerations and tax avoidance that is particularly pronounced among the wealthiest (Alstadsaeter et al., 2019). Consequently, the manager magazin sample should be subject to underreporting, not undersampling. In more colloquial terms, the upward bias in the survey sample arises because the richest are not included at all in the sample, while the upward bias in the rich list arises because the richest are not included with the full extent of their wealth.
To study the relative biases arising from differential undersampling and underreporting, we plot the upward deviation from the theoretically expected tail exponent of α = 1 for different reporting rates ρ and undersampling in the (empirically motivated) quantile q ∈ (0, 0.2). The case ρ = 0 corresponds to undersampling, and is also the only case for ρ < q that we can examine along the lines elaborated in section 2.  Figure 1 supports the intuition that the relative bias is decreasing in the reporting rate ρ, since for smaller ρ a larger fraction of wealth is not reported. When ρ = 1 we recover the initial distribution from eq. (2), and there is no bias for any q. Compared to the underreporting bias, the undersampling bias is rather unexpected though. If merely 25% of wealth were to be reported by the richest q-quantile, this would lead to a disproportionately smaller bias in the estimator, indicating that tail index estimates from the rich list are in all likelihood much less (upward) biased than estimates from the survey. This is reminiscent of the finding by Cristelli et al. (2012) that the maximum in a power law is most informative. Our result is more general in the sense that even partial inclusion of these top observations by only a fraction of their true level will greatly reduce the bias in measured inequality. Given the limited impact of differential underreporting, we conclude that the true inequality of the system is substantially closer to the Zipf benchmark than the survey estimates seem to suggest, as indicated by the less biased estimates for the rich list.
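The comparison between the two biases can be made concrete with a short numerical sketch. It assumes the limiting expressions that follow from the q-truncation appendix under Zipf's law, namely $(1-q)/(1-q+q\ln q)$ for undersampling and $1/(1+q\ln\rho)$ for differential underreporting with $\rho > q$; the function names are illustrative and not part of the original analysis.

```python
import math

def alpha_undersampling(q):
    """Limiting Hill estimate when the top q-quantile is missing entirely (rho = 0)."""
    return (1 - q) / (1 - q + q * math.log(q))

def alpha_underreporting(q, rho):
    """Limiting Hill estimate when the top q-quantile reports only a fraction rho of its wealth."""
    if not q < rho <= 1:
        raise ValueError("the closed form requires q < rho <= 1")
    return 1 / (1 + q * math.log(rho))

# Example with q = 0.2: missing the top quantile versus reporting 25% of its wealth
bias_missing = alpha_undersampling(0.2) - 1
bias_quarter_reported = alpha_underreporting(0.2, 0.25) - 1
```

Under these assumptions the example yields a markedly larger deviation from α = 1 for complete omission than for reporting a quarter of true wealth, in line with the pattern described for Figure 1.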
Furthermore, our closed-form expression (2) that quantifies the impact of underreporting on the tail exponent also allows us to back out the reporting rates ρ for the rich lists. We assume that the upper 20% quantile exhibits different reporting behavior, in the sense that the richest one hundred Germans constitute a rather salient set on the rich list, where the manager magazin staff focuses their efforts to compile reliable data (Balz et al., 2014), and thus the effect of tax avoidance should not be compounded by rounding errors or selection bias for the considered sources. 15 So fixing q = 0.2 and further assuming that Zipf's law governs the true distribution, we obtain the reporting rates ρ in Table 4.

Table 4. Implied reporting rates ρ for differential underreporting in the rich lists, with standard errors in parentheses. For illustrative purposes, we assume that q = 0.2 and that the true distribution follows Zipf's law exactly. Except for 2016, the estimates suggest little differential reporting behavior, adding to the plausibility of the theory first interpretation.
The implied reporting rates appear to be plausible, except for the 2016 estimate that neatly reflects the qualitative change in the data collection procedure by the manager magazin staff. 16 Note the highly non-linear and perhaps counterintuitive effect of a mere twenty percent decrease in $\hat{\alpha}$ between 2015 and 2016 that requires the implied reporting rate to more than triple, showing that the change of sampling procedures between 2015 and 2016 is a qualitative shift that would easily be missed if we were to exclusively look at the twenty percent increase in measured inequality. Thus the assumption of Zipf's and consequently Gibrat's law along with scale-independence seems entirely plausible in the theory first interpretation, especially since we cannot reject Zipf's law in any of the years other than 2013 and 2016.
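Backing out a reporting rate amounts to inverting the underreporting expression. As a hedged sketch, assuming the limiting form $\hat{\alpha} = 1/(1+q\ln\rho)$ and a true Zipf distribution, the implied rate is $\rho = \exp\big((1/\hat{\alpha}-1)/q\big)$. The function name and the default q = 0.2 are illustrative.

```python
import math

def implied_reporting_rate(alpha_hat, q=0.2):
    """Solve alpha_hat = 1 / (1 + q * ln(rho)) for the reporting rate rho."""
    return math.exp((1.0 / alpha_hat - 1.0) / q)
```

Because ρ depends exponentially on $1/\hat{\alpha}$, moderate changes in the estimated tail index translate into large changes in the implied reporting rate. Estimates below one would imply ρ > 1 and fall outside the interpretation given here.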
Regarding the survey, theory first suggests that the deviation from Zipf's law in the SOEP data originates from undersampling (ρ = 0) such that the super-rich are entirely absent in the sample. Using the closed-form expression (3), we can infer the q-quantiles of non-respondents from the estimated tail exponents both in the survey tail, denoted $q_{pl}$, and also for the survey as a whole, $q_{tot} = q_{pl}\,(n_{pl}/n_{tot})$, where $n_{tot}$ denotes the size of the SOEP sample and $n_{pl}$ denotes the size of the survey tail (reported in the last row of Table 1). The results are summarized in Table 5.

Table 5. Implied non-response rates in the survey tail, $\hat{q}_{pl}$, and in the entire survey, $\hat{q}_{tot}$, calculated from eq. (3) under the assumption of Zipf's law, with standard errors in parentheses. Non-response rates are tiny and imply that missing merely twenty-five to a hundred of the super-rich in the survey can already explain the observed deviations from Zipf's law.
The implied non-response rates relative to the size of the survey sample are remarkably low. The effects of equiprobable sampling combined with differential non-response quite plausibly lead to non-response rates $q_{tot}$ of 0.1 to 0.4%. Consequently, the survey data are not inconsistent with the interpretation of scale-independent multiplicative growth and therefore Zipf's law in wealth levels. Since the mixture of non-overlapping Zipf samples is also distributed as a Zipf law, the theory first interpretation supports scale-independence across the entire tail of the German wealth distribution. After all, our results underline the importance of maximum wealth levels for the estimation of tail indices because failing to account for merely 0.1 to 0.4% of the richest individuals already leads to substantial biases, and the descriptive statistics for the two samples clearly indicate that the actual response rate of the super-rich in the survey is zero. As we show in the upcoming subsection, differences in tail index estimates translate into substantial differences in estimated top tail wealth, and therefore also lead to enormous differences in measures of wealth inequality. 17
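The inference of non-response quantiles from an estimated tail exponent can be sketched numerically. The snippet assumes the undersampling limit $\hat{\alpha} = (1-q)/(1-q+q\ln q)$, which is increasing in q, and inverts it by bisection; the function names, the search interval, and the sample sizes in the usage example are hypothetical.

```python
import math

def alpha_undersampling(q):
    """Limiting Hill estimate when the top q-quantile is missing from a Zipf sample."""
    return (1 - q) / (1 - q + q * math.log(q))

def implied_nonresponse(alpha_hat, tol=1e-12):
    """Bisection for the q in (0, 0.999) with alpha_undersampling(q) = alpha_hat."""
    if alpha_hat <= 1:
        return 0.0  # no upward deviation from Zipf, so no implied non-response
    lo, hi = 1e-12, 0.999
    if alpha_hat > alpha_undersampling(hi):
        raise ValueError("alpha_hat is outside the range covered by the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if alpha_undersampling(mid) < alpha_hat:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def survey_wide_rate(q_pl, n_pl, n_tot):
    """Rescale the tail non-response quantile to the whole survey sample."""
    return q_pl * n_pl / n_tot
```

For instance, `survey_wide_rate(implied_nonresponse(1.19), 1000, 28875)` would rescale a tail quantile to a survey of 28,875 respondents with a hypothetical tail of 1,000 observations.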

Total Wealth Estimates
How much wealth is concentrated in the power law tail? The most recent literature on this matter extrapolates the estimates from survey studies to a maximum determined from rich lists (Vermeulen, 2018;Bach et al., 2019). Even within this established methodology, three very different kinds of answers emerge depending on the pre-analytical vision one employs. In line with the literature, we use the continuous analogue of the power law distribution and integrate to derive a measure for the total power law wealth W . The minima correspond to the estimates for the survey study, while we take the maxima from the rich lists. Within the data first interpretation, we need to choose between the estimated tail exponents from the survey study and the rich list corresponding to the respective belief that either the inequality within the SOEP or the manager magazin sample is more representative of the power law tail as a whole. The theory first perspective suggests Zipf's law and thus leaves no such degree of freedom. The estimation strategy is elaborated in more detail in Appendix D, where we also detail how to estimate the population n inhabiting the power law tail.
In the data first estimations, we essentially extrapolate the power law population in-sample to the entire German population of N = 82.5 million (Statistisches Bundesamt, 2017). This simple extrapolation is justified since data first assumes no systematic non-response rates for the richest. The estimates for the population from the survey study reveal a relatively large power law population with a relatively homogeneous wealth distribution, while the estimates for the rich list imply a very small population characterized by an extremely heterogeneous wealth distribution. The theory first perspective implies Zipf's law for the entire top tail and attributes observed differences from this benchmark in the survey to differential non-response. We thus correct our population estimates for the survey by the estimated non-response rates. Without exception, we find the largest estimated power law populations for this theory first perspective (see Appendix D). Both the corrected as well as the uncorrected estimates for the survey study differ up to one order of magnitude with respect to the estimates for the rich list. The correction within the theory first approach has a very limited effect on the estimated total population, resulting from the fact that the estimated non-response rates are tiny. This leaves us with three estimation strategies, each with 18 possible combinations of $\hat{w}^{SOEP}_{\min}$ and $w^{mm}_{\max}$ for all sampling years. Table 6 shows how the differences in the estimated power law populations and tail indices translate into differences in total wealth. We note first that especially the 2007 estimates for the SOEP are in remarkably close agreement with the latest estimates in Bach et al. (2019) based on the Household Finance and Consumption Survey (HFCS), even though our samples differ substantially from theirs. 18 This is also the case where the estimation procedure for the total power law population is closest to theirs. We take this as evidence that our results are not driven by idiosyncrasies in our data and instead testify to the external validity of our approach.

Table 6. Estimated wealth in the power law tail for combinations of minima and maxima from the respective survey and rich list samples in billions of euros (inflation-adjusted with base year 2010). Details regarding the underlying estimation strategies and parameter constellations are described in Appendix D. The estimates exhibit tremendous variation, almost spanning one order of magnitude.
Second, and more importantly, the results differ enormously between the two pre-analytical visions. The estimates for the pure Zipf case are higher than the rich list estimates by at least a factor of six, in some cases even by one order of magnitude. This is primarily caused by the huge differences in estimated population, with both estimation strategies appearing to be plausible. Even when populations do not differ too much, the pre-analytical vision has a large effect on estimated total wealth, as the uncorrected and corrected estimates for the pure survey and Zipf case show, differing by up to a factor of three. Hence even state-of-the-art methods for this kind of estimation will likely severely underestimate the degree of inequality both within the richest group, and also between the top tail and the rest of the population. A case can be made (more or less convincingly) for all three estimation strategies, and it seems fair to say that total wealth estimates are influenced at least as much by pre-analytical belief as they are by the data used for estimation.

Discussion
We have shown that the pre-analytical vision decisively informs the research agenda as well as the conclusions drawn from it. So how wealthy are the rich, and are returns to wealth scale-dependent for them? As we have argued here, the proposed mutually contradicting interpretations of data first versus theory first are observationally equivalent to each other. Data first interprets the observed deviations from Gibrat's law in wealth returns, and consequently Zipf's law in wealth levels, as evidence for scale-dependence.
Theory first, on the other hand, explains these deviations through sampling and reporting biases that affect the two sample types differently. Ultimately, we cannot discriminate between the two narratives based on the data alone, and seem to face a classic instance of the underdetermination of scientific theory by evidence, featuring prominently in the philosophy of science at least since the turn of the 20th century (Quine, 1975). On the other hand, dearly held convictions in economic theory, such as the risk-return trade-off, informationally efficient markets, and the classical notion of competition that requires an equalization of rates of return, patently suggest that theory first is a more plausible explanation for the data.
The proposed differential biases cast doubt on the validity of conclusions drawn across and within sample types, both in the cross-sectional and the time series domain. Valid inference in the presence of reporting biases requires stability of parameters over time, otherwise identified trends might become spurious and instead reflect changes in bias. Since the proposed explanation of biases is behavioral and builds on empirically well-established phenomena such as salience, differential tax avoidance, or social desirability rather than being based on sampling method, there is no reason to expect stability. The estimated parameters within the theory first framework indeed suggest such variable behavioral responses over time.
Improving data availability and quality, for instance through the use of wealth or capitalized income tax data, might mitigate the severity of undersampling. Data availability then depends on the political willingness to impose such taxes in the first place, while future research will still be confined to the taxed population and a legal definition of wealth that is generally not catered towards the needs of statistical inference (Galbraith, 2019). Our results thus highlight the need to improve on survey and sampling methods, not to abandon them altogether. The recently conducted SOEP-P sample, which uses information on stock holdings to target high net worth individuals, is a first step in this direction, but still fails to adequately capture the super-rich in the targeted random sample, and thus fails to include the maximum wealth levels that we show to be crucial for valid inference. This is why data from the SOEP-P need to be complemented by the manager magazin rich lists in the hope of adequately capturing tail wealth. Yet our findings strongly suggest that the SOEP-P supplement and the resulting composite sample from the rich lists still must be scrutinized along the lines of the fundamental theory first versus data first distinction. In the end, our results cast serious doubt on simply pooling data from different sample types and comparing trends therein, which has been standard practice so far (see, e.g. Vermeulen, 2018;Bach et al., 2019;Schröder et al., 2020).
While it is not surprising that the two narratives yield different estimates for total wealth in the top tail, the magnitude of this difference is probably unexpected for most, because power laws have the counter-intuitive property that supposedly small variations in the tail index lead to enormous variations in totals. This property is substantially compounded by tiny degrees of undersampling, here on the order of a tenth of a percent, that lead to differences in estimated total wealth by a factor of up to three. Such small degrees of undersampling are easily explained by equiprobable sampling from a power law, leading to an inclusion probability of the maximum that is on the order of a hundredth of a percent in our case, and thus practically equal to zero. Since we have shown how important the inclusion of an accurately measured maximum is for unbiased tail index estimation, this is disconcerting.
The enormous differences between total wealth estimates suggest that inferences from survey studies regarding the cross-sectional distribution of wealth and its time variation tend to be severely distorted, illustrating that discussions about the notion of "representativeness" in scale-free systems are not discussions about technical subtleties but disagreements in substance. In Appendix F, we conduct a simple analytical thought experiment for an extreme case of unrepresentative oversampling of the rich using logarithmic sampling across different orders of magnitude in wealth levels. We show that for Zipf's law, the necessary sampling ratio to surely include the maximum decays extremely fast by a power function. If the maximum is indeed as important as our analytical results on the undersampling bias indicate, even conventional oversampling techniques will be insufficient, and should instead try to implement logarithmic sampling in order to allow for unbiased estimations. After all, our results show that accurate representations of total wealth require us to be highly "unrepresentative" in the sampling of individuals.
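where $\mu$ is the mean growth rate of normalized wealth, $\sigma$ its standard deviation and $dW_t$ are Wiener increments. Denote by $f(w,t)$ the distribution of normalized wealth levels at $t$, and by $f(w)$ the stationary density for $t \to \infty$. The Fokker-Planck equation is then given by

$$\frac{\partial f(w,t)}{\partial t} = -\frac{\partial}{\partial w}\Big[\mu(w)\,w\,f(w,t)\Big] + \frac{1}{2}\frac{\partial^2}{\partial w^2}\Big[\sigma^2(w)\,w^2 f(w,t)\Big]. \tag{6}$$

For the stationary state, it has to hold that

$$0 = -\frac{\partial}{\partial w}\Big[\mu(w)\,w\,f(w,t)\Big] + \frac{1}{2}\frac{\partial^2}{\partial w^2}\Big[\sigma^2(w)\,w^2 f(w,t)\Big]. \tag{7}$$

Integration yields

$$\mu(w)\,w\,f(w,t) = \frac{1}{2}\frac{\partial}{\partial w}\Big[\sigma^2(w)\,w^2 f(w,t)\Big]. \tag{8}$$

This allows us to solve for $f(w)$ by differentiating the right term and omitting the dependence on $t$ for the stationary density by

$$\mu(w)\,w\,f(w) = \frac{1}{2}\Big[\frac{\partial \sigma^2(w)}{\partial w}\,w^2 f(w) + 2\,\sigma^2(w)\,w\,f(w) + \sigma^2(w)\,w^2 f'(w)\Big] \tag{9}$$

and therefore

$$\frac{f'(w)}{f(w)} = \frac{2\,\mu(w)}{\sigma^2(w)\,w} - \frac{2}{w} - \frac{1}{\sigma^2(w)}\frac{\partial \sigma^2(w)}{\partial w}. \tag{10}$$

Establishing conditions for stationarity or convergence to a power law distribution is far from trivial. Informed by his application to city sizes, Gabaix (1999) assumed both a time-invariant population size $N$ and minimum level $w_{\min}$, unaware of the result in Blank and Solomon (2000) that the distribution approaches a degenerate case with $\alpha = 0$ if both variables are held constant. To guarantee convergence to a stationary power law, we follow Malcai et al. (1999) and Blank and Solomon (2000) and assume a time-invariant population size $N$ and a time-varying reflecting boundary $w_{\min}(t)$ that depends on the average wealth $\bar{w}(t)$ by some small constant $c \in \mathbb{R}^+$. The minimum threshold to "join the super-rich" should therefore increase over time, at the very least through inflation in the monetary value of wealth portfolios, so the latter assumption does not appear too restrictive to be of general interest. Under these assumptions, the tail exponent of the stationary density $\alpha$ is

$$\alpha(w) = -1 - \frac{w\,f'(w)}{f(w)}. \tag{11}$$

Substituting equation (10) in (11) yields

$$\alpha(w) = 1 - \frac{2\,\mu(w)}{\sigma^2(w)} + \frac{w}{\sigma^2(w)}\frac{\partial \sigma^2(w)}{\partial w}. \tag{12}$$

Since we consider normalized wealth levels, $\mu(w)$ corresponds to the excess expected growth rate relative to the average growth rate across all wealth levels $\bar{\gamma}$ by $\gamma(w) - \bar{\gamma}$, implying

$$\alpha(w) = 1 - \frac{2\,\big(\gamma(w) - \bar{\gamma}\big)}{\sigma^2(w)} + \frac{w}{\sigma^2(w)}\frac{\partial \sigma^2(w)}{\partial w}. \tag{13}$$

Zipf's law emerges as a special case of growth rates characterized by Gibrat's law. This implies that the partial $\partial \sigma^2(w)/\partial w$ is zero, as there is no scale-dependence in the variance.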
Also, Gibrat's law implies that the expected normalized growth rate, that is, the excess growth rate of wealth levels w in relation to the average growth rate, is independent of w for any w and thus must be zero, thereby implying the Zipf exponent of α(0, σ) = 1.
To confirm this, consider the general diffusion in equation (5), with $\mu(w) = \mu = 0$ and $\sigma(w) = \sigma$. The general Fokker-Planck equation (6) under these assumptions is

$$\frac{\partial f(w,t)}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2}{\partial w^2}\Big[w^2 f(w,t)\Big]. \tag{14}$$

It is easy to see that a stationary density $f(w)$ solves equation (14) whenever the differentiated term in (14) is independent of $w$. This is exactly the case for $f(w) = C/w^2$, that is, Zipf's law with a normalizing constant $C$ independent of $w$.

q-Truncation
Zipf's Law and Hill Estimator. Preliminaries. Suppose a discrete quantity $w$ is distributed according to Zipf's law, so its tail index $\alpha$ equals unity. According to the rank-size formulation, its values are therefore given by

$$w(s) = \frac{w_{\max}}{s}, \tag{15}$$

with $s = 1, 2, \ldots, N$ as the respective ranks of a given $w$ in descending order, $N$ as the number of values with $N \in \mathbb{N}^+$, and $w_{\max}$ as the maximum value of the distribution. Equivalently, rewriting equation (15) in terms of the minimum value $w_{\min}$ yields

$$w(s) = w_{\min}\cdot\frac{N}{s}, \tag{16}$$

since for Zipf it holds that $w_{\max} = N \cdot w_{\min}$. Maximum likelihood estimation (MLE) for any given (continuous) power law yields the Hill estimator (Clauset et al., 2009), that is

$$\hat{\alpha} = N\left[\sum_{s=1}^{N}\ln\frac{w(s)}{w_{\min}}\right]^{-1}, \tag{17}$$

which is by equation (16)

$$\hat{\alpha} = N\left[\sum_{s=1}^{N}\ln\frac{N}{s}\right]^{-1} = \frac{N}{N\ln N - \ln N!}, \tag{18}$$

now independent of $w_{\min}$ and converging asymptotically for $N \to \infty$ to Zipf's law, that is $\alpha = 1$.

† In Gabaix (1999), there is a minor typographical error on page 757, where the correct expression for the tail exponent in equation (13) should read γ(S), not ζ(S), as in our analogous expression in equation (13).

Unanimous Proportional Underreporting. Unbiasedness Result.
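Suppose that a discrete quantity is perfectly distributed according to Zipf's law. All individuals report only a fraction $\rho \in (0, 1)$ of this quantity (the reporting rate), which implies unanimous (proportional) underreporting. The rank-size rule for unanimous underreporting thus reads

$$w^{uu}(s) = \rho\cdot w_{\min}\cdot\frac{N}{s}, \tag{19}$$

with $s = 1, 2, \ldots, N$ as the ranks. Notice that we also require $w^{uu}_{\min} = \rho \cdot w_{\min}$ by $w(N) = w_{\min}$ and $w^{uu}(N) = w^{uu}_{\min}$. The Hill estimator for $\alpha^{uu}$ under unanimous underreporting by equation (19) is

$$\hat{\alpha}^{uu} = N\left[\sum_{s=1}^{N}\ln\frac{w^{uu}(s)}{w^{uu}_{\min}}\right]^{-1} = N\left[\sum_{s=1}^{N}\ln\frac{\rho\cdot w_{\min}\cdot N/s}{\rho\cdot w_{\min}}\right]^{-1} = N\left[\sum_{s=1}^{N}\ln\frac{N}{s}\right]^{-1},$$

which is the unbiased estimator of equation (17).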
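The two results so far, convergence of the Hill estimator to one on an exact Zipf rank-size sample and its invariance under proportional underreporting by everyone, can be checked numerically. The sketch below builds the rank-size values $w(s) = w_{\min} N/s$ directly; the sample size and the reporting rate of 0.3 are arbitrary illustrative choices.

```python
import math

def hill_from_values(values):
    """Hill estimate that uses the smallest observed value as the minimum."""
    w_min = min(values)
    return len(values) / sum(math.log(w / w_min) for w in values)

def zipf_rank_size(n, w_min=1.0):
    """Exact Zipf rank-size values w(s) = w_min * n / s for s = 1, ..., n."""
    return [w_min * n / s for s in range(1, n + 1)]

N = 100_000
true_values = zipf_rank_size(N)
reported_values = [0.3 * w for w in true_values]  # everyone reports 30% (hypothetical rate)

alpha_true = hill_from_values(true_values)
alpha_reported = hill_from_values(reported_values)
```

Both estimates are close to one and agree with each other up to floating point error, since the common factor cancels in every ratio $w(s)/w_{\min}$.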
Differential Non-Response of the Upper q Quantile. Asymptotic Properties. Suppose that a discrete quantity $w$ is distributed according to Zipf's law but $q$-truncated during measurement. The $q$-truncated rank-size rule therefore reads

$$w(s) = w_{\min}\cdot\frac{N}{s},$$

now with $s = q \cdot N + 1, q \cdot N + 2, \ldots, N$ as the ranks. For the $q$-truncated distribution, the MLE for the tail index under differential non-response, denoted $\hat{\alpha}^{nr}$, becomes

$$\hat{\alpha}^{nr} = (N - q\cdot N)\left[\sum_{s=q\cdot N+1}^{N}\ln\frac{w(s)}{w_{\min}}\right]^{-1}. \tag{24}$$

Further simplifying equation (24) yields

$$\hat{\alpha}^{nr} = \frac{N(1-q)}{N(1-q)\ln N - \ln N! + \ln\,(N\cdot q)!}. \tag{25}$$

Utilizing Stirling's approximation, in particular Ramanujan's version $\ln n! \approx n\ln n - n + \frac{1}{6}\ln\big(n(1+4n(1+2n))\big) + \frac{1}{2}\ln\pi$ (Ramanujan, 1988), equation (25) now becomes

$$\hat{\alpha}^{nr} \approx \frac{N(1-q)}{N(1-q)\ln N - \Big(N\ln N - N + \frac{1}{6}\ln\big(N(1+4N(1+2N))\big)\Big) + N q\ln(N q) - N q + \frac{1}{6}\ln\big(N q\,(1+4N q\,(1+2N q))\big)}. \tag{26}$$

‡ Finally, taking the limit of expression (26) yields

$$\lim_{N\to\infty}\hat{\alpha}^{nr} = \frac{1-q}{1-q+q\ln q}. \tag{27}$$

In the limit, the impact of $N$ has completely vanished and the distortion of $\alpha$ is now only dependent on $q$. As we easily see, even for large values of $N$, the estimator is (upward) biased for any positive $q$, since for any $q > 0$, the numerator is larger than the denominator, so $\hat{\alpha} > 1$. The result in equation (27) shows that the upward bias is not merely an artefact of sample size, but holds true for any sufficiently large $N$.
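As a numerical cross-check of the limiting result, the following sketch applies the Hill estimator to an exact Zipf rank-size sample from which the top $q \cdot N$ ranks have been removed and compares it with the limit $(1-q)/(1-q+q\ln q)$. Function names and parameter values are illustrative assumptions.

```python
import math

def hill_q_truncated(n, q, w_min=1.0):
    """Hill estimate on a Zipf rank-size sample that is missing its top q*n ranks."""
    first_rank = int(round(q * n)) + 1
    kept = [w_min * n / s for s in range(first_rank, n + 1)]
    return len(kept) / sum(math.log(w / w_min) for w in kept)

def limit_q_truncated(q):
    """Large-N limit of the q-truncated Hill estimate."""
    return (1 - q) / (1 - q + q * math.log(q))

finite_sample = hill_q_truncated(100_000, 0.1)
asymptotic = limit_q_truncated(0.1)
```

For these values the finite-sample estimate and the limit agree closely, and both lie clearly above the Zipf benchmark of one.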
‡ In particular, Ramanujan (1988) shows that the asymptotic error for the above approximation is $1/(1440\,N^3)$, which suffices for the current purpose.

Differential Underreporting of the Upper q Quantile. Asymptotic Properties. Suppose that a quantity $w$ is distributed according to Zipf's law. Consider the case where only the upper $q$-quantile is proportionally underreporting with rate $\rho \in (0, 1)$. The rank-size rule is now a piecewise function for the upper $q$-quantile and the remaining $1 - q$, yielding

$$w^{du}(s) = \rho\cdot w_{\min}\cdot\frac{N}{s}, \qquad s = 1, \ldots, q\cdot N, \tag{28}$$

$$w^{du}(s) = w_{\min}\cdot\frac{N}{s}, \qquad s = q\cdot N + 1, \ldots, N.$$

We require that $w_{\min}$, the minimum of the unchanged initial distribution, stays the minimum for the distribution with differential underreporting to avoid issues with the MLE, which is based on this minimum. For this, the smallest reported value in the underreporting region has to be greater than $w_{\min}$, that is,

$$w^{du}(q\cdot N) = \frac{\rho}{q}\cdot w_{\min} > w_{\min} \quad\Longleftrightarrow\quad \rho > q. \tag{*}$$

Thus, for the minimum not to be affected, it has to hold by (*) that the reporting rate exceeds the affected population share of the highest wealth levels. By the linearity of the sum function and assuming condition (*) to hold, we obtain the Hill estimator for the tail exponent $\hat{\alpha}^{du}$ under differential (proportional) underreporting as

$$\hat{\alpha}^{du} = N\left[\sum_{s=1}^{q\cdot N}\ln\frac{\rho\cdot N}{s} + \sum_{s=q\cdot N+1}^{N}\ln\frac{N}{s}\right]^{-1}.$$

Further simplifying yields

$$\hat{\alpha}^{du} = \frac{N}{N q\ln\rho + N\ln N - \ln N!}.$$

Utilizing again Stirling's approximation, we get

$$\hat{\alpha}^{du} \approx \frac{N}{v},$$

with

$$v = -\frac{1}{6}\ln\Big(8N^3 + 4N^2 + N + \frac{1}{30}\Big) + N q\ln(N\rho) + (N - N q)\ln(N) + N + N\big(-\ln(N)\big) - \frac{1}{2}\ln\pi.$$

For large $N$, the terms of order $N$ dominate and

$$\hat{\alpha}^{du} \approx \frac{1}{1 + q\ln\rho}. \tag{33}$$

Notice that for $q \in (0, 1)$ and $\rho \in (q, 1)$, the estimator is therefore always upward biased compared to the Zipf benchmark of $\alpha = 1$. Condition (*) precludes the possibility of a negative induced bias which would result from $q\ln(\rho) < -1$ and would be uninterpretable. For this, note that condition (*) implies $\ln(\rho) > \ln(q)$, since $\ln(\cdot)$ is monotonically increasing in its argument. It is thus sufficient to show that $q\ln(q) > -1$. Rearranging yields

$$\ln(q) + \frac{1}{q} > 0. \tag{34}$$

Define $f(q) = \ln(q) + \frac{1}{q}$. By

$$f'(q) = \frac{1}{q} - \frac{1}{q^2} = \frac{q-1}{q^2} < 0 \quad \text{for } q \in (0, 1) \tag{35}$$

and

$$f(1) = 1 > 0, \tag{36}$$

we know that the function is monotonically decreasing for the whole considered interval and positive at the upper interval boundary. From (35) and (36), we can therefore conclude that $f(q) > 0$ for all $q \in (0, 1)$. This is exactly the non-negativity constraint in (34) and thus the desired result that condition (*) implies strictly non-negative induced biases in approximation (33).
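The differential underreporting case can be checked in the same spirit. The sketch constructs a Zipf rank-size sample in which only the top $q \cdot N$ ranks are scaled by ρ, enforces the condition ρ > q, and compares the Hill estimate with the assumed large-N form $1/(1+q\ln\rho)$. All names and parameter values are illustrative.

```python
import math

def hill_differential_underreporting(n, q, rho, w_min=1.0):
    """Hill estimate when only the top q*n ranks report a fraction rho of their value."""
    if not q < rho <= 1:
        raise ValueError("the minimum is only preserved for q < rho <= 1")
    cutoff = int(round(q * n))
    values = [(rho if s <= cutoff else 1.0) * w_min * n / s for s in range(1, n + 1)]
    return n / sum(math.log(w / w_min) for w in values)

def limit_differential_underreporting(q, rho):
    """Large-N approximation of the estimate under differential underreporting."""
    return 1 / (1 + q * math.log(rho))

finite_sample = hill_differential_underreporting(100_000, 0.2, 0.5)
asymptotic = limit_differential_underreporting(0.2, 0.5)
```

With q = 0.2 and ρ = 0.5 both values are close to 1.16, that is, upward biased but much less so than under complete omission of the same quantile.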
C Empirical Densities of Returns to Wealth

Note that returns for the rich list samples exhibit a positive median, while the median for the survey samples is indistinguishable from zero at the usual significance levels. The reason we consider the median (instead of, say, the expectation or mode) is that the median corresponds to the maximum likelihood estimate of the location parameter for the Laplace distribution (4). By the same token, the MLE of the dispersion parameter in (4) is the mean absolute deviation, not the variance. Overall, visual inspection already confirms the Laplacian nature of wealth returns, since the empirical densities exhibit a linear tent shape on semi-logarithmic scale that is characteristic of the Laplace distribution. To test parametrically for the hypothesis of a Laplace distribution in wealth returns, we follow standard procedure and consider the exponential power (or Subbotin) distribution, because the Laplace is a special case of the Subbotin when the shape parameter $\kappa$ equals unity. This is readily verified from the Subbotin density given by

$$f(x; m, \sigma, \kappa) = \frac{1}{2\,\sigma\,\kappa^{1/\kappa}\,\Gamma(1 + 1/\kappa)}\exp\left(-\frac{1}{\kappa}\left|\frac{x - m}{\sigma}\right|^{\kappa}\right),$$

where $\kappa, \sigma \in \mathbb{R}^+$, $m \in \mathbb{R}$, and $\Gamma(\cdot)$ denotes the Gamma function. MLEs of the shape parameter are reported in the table below, showing that we cannot reject the Laplace hypothesis in our data.

Table 7. Maximum likelihood estimates of the Subbotin shape parameter, denoted $\hat{\kappa}$, for the distribution of wealth returns cannot reject the Laplace hypothesis at the usual significance levels. We employed Subbotools 1.3.0 for estimation as it delivers the most accurate and efficient estimates in simulation runs (Bottazzi, 2004).
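The nesting of the Laplace within the Subbotin family can be verified numerically. The sketch assumes the common parametrization $f(x) = \exp\big(-|x-m|^{\kappa}/(\kappa\sigma^{\kappa})\big) / \big(2\sigma\kappa^{1/\kappa}\Gamma(1+1/\kappa)\big)$, which is an assumption about the exact form used here; function names and evaluation points are illustrative.

```python
import math

def subbotin_pdf(x, m, sigma, kappa):
    """Exponential power (Subbotin) density in the assumed parametrization."""
    norm = 2 * sigma * kappa ** (1 / kappa) * math.gamma(1 + 1 / kappa)
    return math.exp(-abs((x - m) / sigma) ** kappa / kappa) / norm

def laplace_pdf(x, m, sigma):
    """Laplace density with location m and dispersion sigma."""
    return math.exp(-abs(x - m) / sigma) / (2 * sigma)

def normal_pdf(x, m, sigma):
    """Gaussian density, recovered from the Subbotin family at kappa = 2."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```

Setting κ = 1 reproduces the Laplace density exactly, while κ = 2 gives the Gaussian, which is why a fitted shape parameter close to one supports the Laplace hypothesis against thinner-tailed alternatives.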

D Total Wealth Estimates
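We estimate the total wealth levels $\hat{W}$ by numerical integration according to

$$\hat{W}_i = \hat{n}_i\int_{\hat{w}^{SOEP}_{\min}}^{w^{mm}_{\max}} w\,\hat{f}_i(w)\,dw,$$

where $\hat{f}_i(w)$ denotes the estimated PDF of the power law given by

$$\hat{f}_i(w) = \frac{\hat{\alpha}_i}{\hat{w}^{SOEP}_{\min}}\left(\frac{w}{\hat{w}^{SOEP}_{\min}}\right)^{-(\hat{\alpha}_i + 1)},$$

where $\hat{w}^{SOEP}_{\min}$ is the estimated minimum from the SOEP sample and $w^{mm}_{\max}$ is the maximum from the rich list, both of which are common across approaches, while the estimated population $\hat{n}_i$ and the estimated tail index $\hat{\alpha}_i$ are chosen dependent on the pre-analytical vision. There are three possible $\hat{\alpha}_i \in \{\hat{\alpha}_{SOEP}; \hat{\alpha}_{mm}; \hat{\alpha}_{Zipf} = 1\}$, corresponding to the estimates from the survey study, the rich lists and for Zipf's law, respectively. The same holds for the estimated population.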
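To illustrate how strongly the total depends on the tail index, the following sketch integrates $w \cdot f(w)$ between a minimum and a maximum, once in closed form and once by the trapezoidal rule in logarithmic wealth. It assumes the untruncated continuous power law density $f(w) = \alpha\,w_{\min}^{\alpha}\,w^{-(\alpha+1)}$; the population size, minimum, and maximum in the example are hypothetical round numbers and not estimates from the tables.

```python
import math

def tail_wealth_closed_form(n, alpha, w_min, w_max):
    """n times the integral of w * f(w) from w_min to w_max for a power law density."""
    if math.isclose(alpha, 1.0):
        return n * w_min * math.log(w_max / w_min)
    return n * alpha * w_min**alpha * (w_max**(1 - alpha) - w_min**(1 - alpha)) / (1 - alpha)

def tail_wealth_numeric(n, alpha, w_min, w_max, steps=10_000):
    """Trapezoidal rule in x = ln(w), where w * f(w) dw = alpha * w_min**alpha * exp((1 - alpha) * x) dx."""
    a, b = math.log(w_min), math.log(w_max)
    h = (b - a) / steps

    def g(x):
        return alpha * w_min**alpha * math.exp((1 - alpha) * x)

    total = 0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, steps))
    return n * h * total

# Hypothetical inputs: 3 million individuals, minimum 300,000, maximum 30 billion
wealth_zipf = tail_wealth_closed_form(3_000_000, 1.0, 3e5, 3e10)
wealth_survey_like = tail_wealth_closed_form(3_000_000, 1.4, 3e5, 3e10)
```

With these hypothetical inputs the Zipf total exceeds the total for a tail index of 1.4 by a factor of more than three, even though population, minimum, and maximum are held fixed.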

The population estimates based on the SOEP extrapolate the in-sample ratio of the power law population to the whole German population. In particular, let $\omega$ be the ratio of the estimated power law population relative to the whole sample size. For the uncorrected estimates, we set the sample size to 0.035% of 82,500,000, which equals 28,875, where 0.035% is the approximate sample ratio in the SOEP surveys. The power law population in Germany is then calculated as $\omega \cdot N$ for the uncorrected case $\hat{n}_{SOEP}$, with $N$ = 82,500,000 (Statistisches Bundesamt, 2017). For the Zipf case, we correct $\omega$ by the estimated non-response rates in Table 5 and calculate $\hat{n}_{Zipf} = N(\omega + q_{tot})$. Given the relatively low estimates of $q_{tot}$, the estimated population levels do not differ too much from the uncorrected estimates. The results are reported in Table 8.

Population from SOEP

| | Uncorrected | Corrected |
|---|---|---|
| Population estimate $\hat{n}$ (2002) | 2,745,714 | 2,753,993 |
| Population estimate $\hat{n}$ (2007) | 3,600,000 | 3,603,629 |
| Population estimate $\hat{n}$ (2012) | 3,805,714 | 3,820,703 |

Table 8. The estimates assume a total population $N = 82{,}500{,}000$ and are calculated from the in-sample power law population fractions $\omega$ and, for the corrected case, the estimated non-response rates $q_{pl}$.
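The arithmetic behind these population figures can be sketched as follows. The population size and sampling ratio are the values stated in the text. The fraction `omega_example` and the rate `q_example` are hypothetical placeholders, since the underlying in-sample fractions and non-response rates are not restated in this appendix.

```python
N_GERMANY = 82_500_000    # total population used in the text
SAMPLE_RATIO = 0.00035    # approximate SOEP sampling ratio (0.035%)

def sample_size(n_total=N_GERMANY, ratio=SAMPLE_RATIO):
    """Implied survey sample size for a given sampling ratio."""
    return round(n_total * ratio)

def power_law_population(omega, q=0.0, n_total=N_GERMANY):
    """N * (omega + q); q = 0 reproduces the uncorrected case."""
    return round(n_total * (omega + q))

# Hypothetical inputs for illustration only.
omega_example = 0.04
q_example = 0.0001

uncorrected = power_law_population(omega_example)
corrected = power_law_population(omega_example, q_example)
```

Because the correction enters additively and the rate is small, the corrected and uncorrected figures stay close together, as in the table above.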
We estimate the different population levels for the manager magazin by taking the CDF $P(w; w_{\min}, \alpha)$ of a continuous power law with the minima $\hat{w}^{SOEP}_{\min}$ from the SOEP samples and, from the manager magazin samples, the tail indices $\hat{\alpha}_{mm}$, the in-sample power law populations $N_{mm}$, and the minima $w^{mm}_{\min}$. For a specific parameter combination, $\hat{n}_{mm}$ is then calculated as

$$\hat{n}_{mm} = \frac{N_{mm}}{1 - P(w^{mm}_{\min}; \hat{w}^{SOEP}_{\min}, \hat{\alpha}_{mm})}.$$

The intuition is that $\hat{n}_{mm}$ corresponds to the power law population when the power law in the manager magazin sample is extended to the minima determined from the SOEP surveys. The results are reported in Table 9 below.

Table 9. The estimates are calculated from the parameter combinations of the various estimated minima $\hat{w}^{SOEP}_{\min}$ in the SOEP samples, and the tail indices $\hat{\alpha}_{mm}$, the in-sample power law populations $N_{mm}$, and the minima $w^{mm}_{\min}$ of the manager magazin samples for each of the respective years given in the table.
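A minimal sketch of this extrapolation, assuming the continuous power-law CDF $P(w; w_{\min}, \alpha) = 1 - (w/w_{\min})^{-\alpha}$. The function names and the numbers in the example are hypothetical and do not correspond to the estimates in Table 9.

```python
def tail_share(w, w_min, alpha):
    """Share of a power law population above w, i.e. 1 - P(w; w_min, alpha)."""
    if w < w_min:
        return 1.0
    return (w / w_min) ** (-alpha)

def extrapolated_population(n_in_sample, w_min_list, w_min_survey, alpha):
    """Scale the rich-list count up to the lower, survey-based minimum."""
    return n_in_sample / tail_share(w_min_list, w_min_survey, alpha)

# Hypothetical example: 500 rich-list entries above 100 million, extended
# down to a survey minimum of 1 million under a tail index of 1.
n_hat_example = extrapolated_population(500, 1e8, 1e6, 1.0)
```

In this example only one percent of the extended power law population lies above the rich-list minimum, so the 500 observed entries imply a population of roughly 50,000.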

E Equiprobable Sampling From a Power Law
Simple Random Sampling without Replacement. Equiprobable Selection of Elements. Let $\mathcal{N}$ denote the total population with size $N \in \mathbb{N}_+$ and $\mathcal{S}$ a sample out of $\mathcal{N}$ with size $S \in \mathbb{N}_+$ and $S \le N$. The sampling procedure selects each element of the set $\mathcal{N}$ with equal probability and without replacement. If the maximum value of $\mathcal{N}$, denoted by $w_{\max}$, is unique, the probability that $w_{\max}$ is included in the sample $\mathcal{S}$, that is, $p(w_{\max} \in \mathcal{S})$, is equivalent to the probability of any unique element being chosen under these conditions. The inclusion probability of $w_{\max}$ in the chosen set $\mathcal{S}$ is therefore given by

$$p(w_{\max} \in \mathcal{S}) = \binom{N-1}{S-1} \bigg/ \binom{N}{S} = \frac{S}{N}.$$

The inclusion probability under simple random sampling without replacement for $w_{\max}$ therefore corresponds to the sampling ratio $S/N$ and is equal to unity only if $S = N$.
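The inclusion probability can be checked both exactly and by simulation. The sketch below counts the samples of size S that contain one fixed unit and divides by the number of all samples of size S. The Monte Carlo routine, its seed, and the population sizes in the example are illustrative choices.

```python
from fractions import Fraction
from math import comb
import random

def inclusion_probability_srs(N, S):
    """Exact probability that one given unit enters a simple random sample
    of size S drawn without replacement from N units (requires 1 <= S <= N)."""
    return Fraction(comb(N - 1, S - 1), comb(N, S))

def simulate_inclusion(N, S, draws=20_000, seed=0):
    """Monte Carlo frequency with which the unique maximum is sampled."""
    rng = random.Random(seed)
    top = N - 1  # label the unit holding w_max as the last index
    hits = sum(top in rng.sample(range(N), S) for _ in range(draws))
    return hits / draws

exact = inclusion_probability_srs(1000, 35)   # equals 35/1000
approx = simulate_inclusion(100, 10)          # close to 10/100
```

The exact value reduces to the sampling ratio for any admissible pair (N, S), so a survey with a sampling ratio of a few hundredths of a percent will almost never contain the single richest unit.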

F Logarithmic Sampling From a Power Law
Logarithmic Random Sampling without Replacement. Assumption of Zipf's Law. Let again $\mathcal{N}$ with size $N \in \mathbb{N}_+$ denote the total population and $\mathcal{S}$ the sample out of $\mathcal{N}$ with size $S \in \mathbb{N}_+$ and $S \le N$. Furthermore, assume that the total population is now divided into $v \in \mathbb{N}_+$ different intervals or "slices", where the length of each slice corresponds to one order of magnitude of the relevant quantity $w$, so that the intervals are scaled logarithmically. The same procedure as above is now applied to each logarithmic slice, that is, every element in each slice is selected with equal probability and without replacement.
For every slice, $S/v$ elements are included in the sample of size $S$, where $S$ needs to be an integer multiple of $v$, since $(S/v) \in \mathbb{N}_+$. The slice covering the highest order of magnitude for $w$ also has to include $w_{\max}$ as the maximum value. If one assumes Zipf's law to hold, this range of $w$ includes a proportion $10^{-(v-1)}$ of the total population with size $N$. Under Zipf's law this procedure therefore chooses $S/v$ elements out of a set of size $N/10^{v-1}$. The probability of including $w_{\max}$ in the chosen set $\mathcal{S}$ is therefore

$$p(w_{\max} \in \mathcal{S}) = \frac{S/v}{N/10^{v-1}} = \frac{10^{v-1}}{v} \cdot \frac{S}{N}.$$

The inclusion probability $p(w_{\max} \in \mathcal{S})$ under logarithmic sampling thus converges $10^{v-1}/v$ times faster to unity compared to simple random sampling. The sampling ratio $S/N$ has to equal merely $v/10^{v-1}$ for $p(w_{\max} \in \mathcal{S}) = 1$. For $v = 2$, it has to equal 0.2, for $v = 3$, it has to equal 0.03, and so on. This is the case because every element in the interval covering the highest order of magnitude for $w$ then has to be included in the sample.
For $v = 1$, this procedure corresponds to the case of pure equiprobable sampling, where the inclusion probability is equal to the sampling ratio $S/N$, as $10^{v-1}/v = 1$ for $v = 1$.
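The comparison between the two sampling schemes can be reproduced with exact rational arithmetic. The sketch assumes, as in the text, that under Zipf's law the top slice holds a share $10^{-(v-1)}$ of the population and that $S/v$ units are drawn from each slice. The function names and the example sizes are illustrative.

```python
from fractions import Fraction

def inclusion_probability_log(N, S, v):
    """Probability that w_max is sampled when S/v units are drawn from each
    of v logarithmic slices and the top slice holds N / 10**(v - 1) units."""
    if S % v != 0:
        raise ValueError("S must be an integer multiple of v")
    top_slice_size = Fraction(N, 10 ** (v - 1))
    p = Fraction(S, v) / top_slice_size
    return min(p, Fraction(1))  # the top slice cannot be sampled more than fully

def required_sampling_ratio(v):
    """Sampling ratio S/N at which the inclusion probability reaches one."""
    return Fraction(v, 10 ** (v - 1))

# Illustrative comparison for N = 100,000 and S = 300.
p_equiprobable = inclusion_probability_log(100_000, 300, 1)  # 3/1000
p_three_slices = inclusion_probability_log(100_000, 300, 3)  # 1/10
speed_up = p_three_slices / p_equiprobable                   # 100/3
```

With three slices the same sample size raises the inclusion probability of the maximum from 0.3% to 10%, which matches the factor $10^{v-1}/v = 100/3$.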