The standard model
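The likelihood function for a random sample of size n, y = (y_{1}, … ,y_{n})′, from the LPN(α) distribution with density (9), is given by
$$L(\alpha; y) = \alpha^{n} \prod_{i=1}^{n} \frac{\phi(\log y_{i})}{y_{i}} \{\Phi(\log y_{i})\}^{\alpha - 1}.$$
It follows that the maximum likelihood estimator (MLE) of α is given by
$$\hat{\alpha} = -\frac{n}{\sum_{i=1}^{n} \log \Phi(\log y_{i})},$$
which is a function of the complete and sufficient statistic for α, namely
$$T = \sum_{i=1}^{n} \log \Phi(\log y_{i}).$$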
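Therefore,
$$-E\left\{\frac{\partial^{2} \log L(\alpha; y)}{\partial \alpha^{2}}\right\} = \frac{n}{\alpha^{2}},$$
and we can conclude that the asymptotic variance of $\hat{\alpha}$ is given by α^{2} ∕ n, which agrees with the asymptotic variance in the alpha-power model.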
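Defining m_{i} = {Φ(log(y_{i}))}^{α} ∼ U(0,1), we have that the random variable G_{i} = − α log{Φ(log(y_{i}))} = − log(m_{i}) ∼ Gamma(1,1) and hence $\sum_{i=1}^{n} G_{i} \sim \mathrm{Gamma}(n,1)$. Consequently, $\hat{\alpha} = n\alpha / \sum_{i=1}^{n} G_{i}$ is distributed as nα times an inverse gamma IG(n,1) random variable, whose first two moments exist for n > 2. Therefore,
$$E(\hat{\alpha}) = \frac{n}{n-1}\,\alpha,$$
so that an unbiased estimator of α is given by
$$\tilde{\alpha} = \frac{n-1}{n}\,\hat{\alpha} = -\frac{n-1}{\sum_{i=1}^{n} \log \Phi(\log y_{i})}.$$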
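Similarly, we have
$$\mathrm{Var}(\hat{\alpha}) = \frac{n^{2}\alpha^{2}}{(n-1)^{2}(n-2)}, \qquad \mathrm{MSE}(\hat{\alpha}) = \frac{(n+2)\,\alpha^{2}}{(n-1)(n-2)},$$
where Var and MSE denote variance and mean squared error, respectively. From Chebyshev's inequality, it follows that, for any ε > 0,
$$P(|\hat{\alpha} - \alpha| \geq \varepsilon) \leq \frac{\mathrm{MSE}(\hat{\alpha})}{\varepsilon^{2}} = \frac{(n+2)\,\alpha^{2}}{(n-1)(n-2)\,\varepsilon^{2}}.$$
Therefore, when n → ∞, $P(|\hat{\alpha} - \alpha| \geq \varepsilon) \to 0$, which proves that $\hat{\alpha}$ is a consistent estimator for the parameter α. Finally, for the unbiased estimator $\tilde{\alpha} = (n-1)\hat{\alpha}/n$,
$$\mathrm{Var}(\tilde{\alpha}) = \frac{\alpha^{2}}{n-2} > \frac{\alpha^{2}}{n},$$
where α^{2} ∕ n is the Cramér–Rao lower bound for the variance of unbiased estimators of α. Moreover, $2n\alpha/\hat{\alpha} \sim \chi^{2}_{2n}$ is a pivotal quantity for α, so that a 100(1 − δ)% confidence interval for α is given by
$$\left[\frac{\hat{\alpha}\,\chi^{2}_{2n,\,\delta/2}}{2n},\ \frac{\hat{\alpha}\,\chi^{2}_{2n,\,1-\delta/2}}{2n}\right],$$
where $\chi^{2}_{2n,\,p}$ denotes the p-th quantile of the chi-square distribution with 2n degrees of freedom.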
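The estimators above can be checked by simulation. The following is a minimal sketch, assuming the standard LPN(α) distribution function is F(y) = {Φ(log y)}^α, so that samples can be drawn by inversion. It also assumes the MLE is −n ∕ Σ log Φ(log y_i) and the unbiased estimator is the MLE multiplied by (n − 1) ∕ n. The function names (`rlpn`, `mle_alpha`, `unbiased_alpha`, `simulate`) are illustrative and do not come from the paper.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def norm_cdf(x):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rlpn(n, alpha, rng):
    """Draw n values from the standard LPN(alpha) law by inverting
    the assumed CDF F(y) = Phi(log y) ** alpha."""
    sample = []
    while len(sample) < n:
        p = rng.random() ** (1.0 / alpha)
        if 0.0 < p < 1.0:  # inv_cdf is only defined on the open interval
            sample.append(math.exp(STD_NORMAL.inv_cdf(p)))
    return sample


def neg_log_cdf_sum(y):
    """Return -sum(log Phi(log y_i)), the (negated) sufficient statistic."""
    return -sum(math.log(norm_cdf(math.log(v))) for v in y)


def mle_alpha(y):
    return len(y) / neg_log_cdf_sum(y)


def unbiased_alpha(y):
    return (len(y) - 1) / neg_log_cdf_sum(y)


def simulate(alpha=2.0, n=20, reps=2000, seed=1):
    """Monte Carlo means of the MLE, the unbiased estimator and the
    pivot 2 * alpha * (-sum log Phi(log y_i)), whose mean should be 2n."""
    rng = random.Random(seed)
    mle, unbiased, pivot = [], [], []
    for _ in range(reps):
        y = rlpn(n, alpha, rng)
        mle.append(mle_alpha(y))
        unbiased.append(unbiased_alpha(y))
        pivot.append(2.0 * alpha * neg_log_cdf_sum(y))
    return fmean(mle), fmean(unbiased), fmean(pivot)


if __name__ == "__main__":
    print(simulate())
```

With α = 2 and n = 20, the Monte Carlo mean of the MLE should sit near nα ∕ (n − 1), that of the unbiased estimator near α, and that of the pivot near 2n = 40.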
Expected information matrix
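The expected (or Fisher) information matrix follows by taking the expectation of the elements of the Hessian matrix. Let z = (log(y) − ξ) ∕ η and consider the quantities a_{kj} = E{z^{k}(ϕ(z) ∕ Φ(z))^{j}} for k = 0,1,2,3 and j = 1,2, with θ_{1} = ξ, θ_{2} = η and θ_{3} = α. The elements of the Fisher information matrix for a single observation, denoted
$$i_{\theta_{r}\theta_{s}} = -E\left\{\frac{\partial^{2} \ell(\theta)}{\partial \theta_{r}\,\partial \theta_{s}}\right\}, \qquad r, s = 1, 2, 3,$$
are given by
$$i_{\xi\xi} = \frac{1 + (\alpha - 1)(a_{11} + a_{02})}{\eta^{2}}, \qquad i_{\xi\eta} = \frac{(\alpha - 1)(a_{01} + a_{21} + a_{12})}{\eta^{2}}, \qquad i_{\xi\alpha} = \frac{a_{01}}{\eta},$$
$$i_{\eta\eta} = \frac{2 + (\alpha - 1)(a_{11} + a_{31} + a_{22})}{\eta^{2}}, \qquad i_{\eta\alpha} = \frac{a_{11}}{\eta}, \qquad i_{\alpha\alpha} = \frac{1}{\alpha^{2}}.$$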
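The preceding expressions have to be computed numerically. In the particular case of α = 1, that is, the LN model, Z = (log(Y) − ξ) ∕ η follows the standard normal distribution and it follows that a_{01} ≈ 0.903 and a_{11} ≈ −0.596, so that the Fisher information matrix for θ = (ξ,η,α = 1)′ is given by
$$I(\theta) = \begin{pmatrix} 1/\eta^{2} & 0 & a_{01}/\eta \\ 0 & 2/\eta^{2} & a_{11}/\eta \\ a_{01}/\eta & a_{11}/\eta & 1 \end{pmatrix},$$
which agrees with the Fisher information matrix of the power-normal distribution (Pewsey et al., 2012). Hence, by using numerical procedures, it can be shown that
$$|I(\theta)| = \frac{2 - 2a_{01}^{2} - a_{11}^{2}}{\eta^{4}} \neq 0,$$
so that the Fisher information matrix is not singular at α = 1.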
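The non-singularity claim at α = 1 can be checked numerically. The sketch below approximates a_{01} and a_{11} under the standard normal distribution by composite Simpson integration. It assumes that the information matrix at α = 1 has diagonal entries 1 ∕ η², 2 ∕ η² and 1, off-diagonal entries a_{01} ∕ η and a_{11} ∕ η in the α row and column, and zero elsewhere. That form was derived here from the LPN log-likelihood and is not copied from the paper. The function names and the integration range are illustrative choices.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # erfc keeps relative accuracy in the lower tail, where phi/Phi is large
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def a_kj_normal(k, j, lo=-10.0, hi=10.0, steps=4000):
    """Composite Simpson approximation of E{Z^k (phi(Z)/Phi(Z))^j}
    for Z ~ N(0, 1); steps must be even."""
    def integrand(z):
        pdf = norm_pdf(z)
        return z ** k * (pdf / norm_cdf(z)) ** j * pdf

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


def fisher_information_alpha_one(eta):
    """Assumed per-observation information matrix of (xi, eta, alpha) at alpha = 1."""
    a01 = a_kj_normal(0, 1)
    a11 = a_kj_normal(1, 1)
    return [
        [1.0 / eta ** 2, 0.0, a01 / eta],
        [0.0, 2.0 / eta ** 2, a11 / eta],
        [a01 / eta, a11 / eta, 1.0],
    ]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


if __name__ == "__main__":
    print(a_kj_normal(0, 1), a_kj_normal(1, 1))
    print(det3(fisher_information_alpha_one(1.0)))
```

The determinant is small but strictly positive, and it scales as 1 ∕ η⁴, so the conclusion does not depend on the value of η.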
However, this is not the case with the LSN model (Mateu-Figueras et al., 2003, 2004), for which the Fisher information matrix is singular for λ = 0. This important feature allows normality to be tested (with the LPN model) using the ordinary large-sample property of the likelihood ratio statistic, which states that, in large samples, it approximately follows a chi-square distribution. Another important difference between the two models (LSN and LPN) concerns the asymmetry and kurtosis ranges. Whereas the kurtosis range is wider for the LSN model, the asymmetry range is wider for the LPN model. Such differences can help one to select a more appropriate model. The upper-left 2 × 2 submatrix coincides with the Fisher information matrix of the ordinary LN distribution. Therefore, for large n,
$$\hat{\theta} \overset{A}{\sim} N_{3}\left(\theta, I(\theta)^{-1}\right),$$
meaning that $\hat{\theta}$ is consistent and asymptotically normally distributed with I(θ)^{ − 1} as the large-sample variance.
An illustration
The data set studied in this illustration was previously analyzed by Nadarajah (2008) and Leiva et al. (2010). It is related to air pollution in the city of New York, USA. For air pollutant concentrations, it is usually assumed that the data are uncorrelated and independent and thus do not require diurnal or cyclic trend analysis (Gokhale and Khare, 2007). The data correspond to daily measurements of ozone concentration in the atmosphere (in ppb = ppm × 1000) in the city of New York in May–September 1973, from the New York State Department of Conservation.
The average concentration of air pollutants has been used in epidemiological surveillance as an indicator of atmospheric contamination and of its associated adverse effects in humans, which include diseases such as bronchitis. The distribution of this concentration is skewed to the right, and the random variable is always positive. A model that has these characteristics is the LN, and it has been frequently used for modeling the concentration of air contaminants and chemical concentration in soil samples (Ahrens' law), mainly on theoretical grounds. However, the level of air pollution varies depending on factors such as the source of contamination, local weather and topography. Therefore, the actual distribution of the concentration of atmospheric pollutants does not always agree with an LN model, especially at high contamination levels.
Descriptive statistics for the data set are presented in Table 3. Quantities √b_{1} and b_{2} indicate sample asymmetry and kurtosis coefficients.

Table 3. Descriptive statistics for variables Y and log(Y)

| Variable | n | Mean | Variance | √b_{1} | b_{2} |
|---|---|---|---|---|---|
| Y | 116 | 42.1293 | 1088.2010 | 1.2098 | 1.1122 |
| log(Y) | 116 | 3.4185 | 0.7490 | −0.5478 | 0.7755 |

Table 3 reveals a positively skewed distribution for the variable Y. Moreover, the asymmetry and kurtosis coefficients for log(Y) depart from the values expected under the normal distribution (0 for asymmetry and 3 for kurtosis; the tabulated b_{2} values appear to be on the excess scale, for which the normal value is 0), justifying the use of a more flexible model such as the LPN model discussed in the paper.
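The descriptive quantities in Table 3 can be computed from any positive sample with a few lines of code. This is a minimal sketch, assuming moment-based coefficients: the variance uses the n − 1 divisor, skewness is m_3 ∕ m_2^{3∕2} and kurtosis is reported on the excess scale, m_4 ∕ m_2² − 3. The paper does not state which conventions it used, so these are assumptions, and the function name `describe` is illustrative. The ozone data themselves are not reproduced here.

```python
import math


def describe(x):
    """Sample size, mean, variance (n - 1 divisor), moment skewness
    and excess kurtosis of a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    # central moments with divisor n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n,
        "mean": mean,
        "variance": m2 * n / (n - 1),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def describe_raw_and_log(y):
    """Apply describe() to a positive sample and to its logarithm."""
    return describe(y), describe([math.log(v) for v in y])


if __name__ == "__main__":
    raw, logged = describe_raw_and_log([1.0, 2.0, 4.0, 8.0, 16.0])
    print(raw)
    print(logged)
```

In the small demonstration sample the values are equally spaced on the log scale, so the skewness of the logged data is zero while the raw data are right-skewed.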
To model the ozone-level concentration in the atmosphere, we use LN, LSN and LPN models. We also fitted the ordinary two-parameter Birnbaum–Saunders (BS) model (denoted BS(γ,β), Birnbaum and Saunders, 1969), which can be used for studying this type of data (Leiva et al. 2010).
To compare model fitting, we use the Akaike information criterion (AIC) (Akaike, 1974), namely $\mathrm{AIC} = -2\ell(\hat{\theta}) + 2k$. We also consider the Bayesian information criterion, $\mathrm{BIC} = -2\ell(\hat{\theta}) + k\log(n)$, and the modified AIC, typically called the consistent AIC (CAIC), namely $\mathrm{CAIC} = -2\ell(\hat{\theta}) + k\{1 + \log(n)\}$, where $\ell(\hat{\theta})$ is the maximized log-likelihood, k is the number of parameters for the model being considered and n is the sample size. The best model is the one with the smallest AIC (or BIC or CAIC). MLEs and estimated standard errors (in parentheses) for the LN, LSN, BS and LPN models were computed by maximizing the log-likelihood using the function optim in R. Results are presented in Table 4 together with the AIC, BIC and CAIC. The LPN model presents the best fit to the data set according to the AIC, BIC and CAIC, and the graphs in Figure 6(a, b) reveal that the LPN model fitting is quite good.
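The three criteria can be recomputed from the log-likelihoods reported in Table 4. The sketch below assumes the usual definitions AIC = −2ℓ + 2k, BIC = −2ℓ + k log(n) and CAIC = −2ℓ + k{1 + log(n)}, with n = 116 taken from Table 3 and k = 2 for LN and BS, k = 3 for LSN and LPN. The function and variable names are illustrative.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, BIC, CAIC) for a maximized log-likelihood `loglik`,
    `k` estimated parameters and sample size `n`."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (1.0 + math.log(n))
    return aic, bic, caic


# (maximized log-likelihood, number of parameters), as reported in Table 4
FITS = {
    "LN": (-543.883, 2),
    "LSN": (-541.655, 3),
    "BS": (-549.097, 2),
    "LPN": (-540.266, 3),
}
N_OBS = 116

CRITERIA = {name: information_criteria(ll, k, N_OBS) for name, (ll, k) in FITS.items()}

if __name__ == "__main__":
    for name, (aic, bic, caic) in CRITERIA.items():
        print(f"{name:4s} AIC={aic:.3f} BIC={bic:.3f} CAIC={caic:.3f}")
```

Under these definitions the AIC and BIC rows of Table 4 are reproduced up to rounding, as are the CAIC values for LSN, BS and LPN. The formula gives 1099.273 for the CAIC of the LN model, whereas Table 4 reports 1100.273; the ranking of the models is unaffected.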
Table 4. Parameter estimates (estimated standard errors in parentheses) and fit criteria for the LN, LSN, BS and LPN distributions

| Parameter | LN | LSN | BS | LPN |
|---|---|---|---|---|
| Log-likelihood | −543.883 | −541.655 | −549.097 | −540.266 |
| AIC | 1091.766 | 1089.310 | 1102.194 | 1086.532 |
| BIC | 1097.273 | 1097.570 | 1107.701 | 1094.792 |
| CAIC | 1100.273 | 1100.570 | 1109.701 | 1097.792 |
| ξ | 3.418 (0.079) | 4.372 (0.079) | - | 4.986 (0.117) |
| η | 0.861 (0.056) | 0.7048 (0.075) | - | 0.146 (0.053) |
| λ | - | 1.5381 (0.478) | - | - |
| α | - | - | - | 0.012 (0.009) |
| γ | - | - | 0.982 (0.064) | - |
| β | - | - | 28.031 (2.265) | - |

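We now consider testing the hypothesis of no difference between the LPN and LN distributions for the data set under study, which corresponds to testing the hypotheses
$$H_{0}: \alpha = 1 \quad \text{versus} \quad H_{1}: \alpha \neq 1,$$
using the statistic
$$\Lambda = \frac{L_{LN}(\hat{\xi}, \hat{\eta})}{L_{LPN}(\hat{\xi}, \hat{\eta}, \hat{\alpha})},$$
leading to
$$-2\log(\Lambda) = 2(543.883 - 540.266) = 7.234,$$
which is greater than the 5% chi-square critical value, $\chi^{2}_{1,5\%} = 3.84$. Hence, the LPN model seems to be a useful alternative for modeling air pollution data, particularly the ozone-level concentration in the atmosphere of the city of New York, USA.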
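The likelihood ratio comparison can be reproduced from the maximized log-likelihoods in Table 4. This is a minimal sketch, assuming the statistic is twice the difference between the LPN and LN log-likelihoods and that it is referred to a chi-square distribution with one degree of freedom (the single restriction α = 1). The tail probability uses the identity P(χ²_1 > x) = erfc(√(x ∕ 2)). The function name is illustrative.

```python
import math


def lr_test_one_df(loglik_null, loglik_alt):
    """Likelihood ratio statistic and chi-square(1) p-value for two nested
    fits that differ by one parameter."""
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # LN (null, alpha = 1) against LPN (alternative), values from Table 4
    stat, p_value = lr_test_one_df(-543.883, -540.266)
    print(f"-2 log(Lambda) = {stat:.3f}, p-value = {p_value:.4f}")
```

The statistic exceeds the 5% critical value of 3.84, so the LN restriction is rejected at that level.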
Figure 7(a–c) shows the Q-Q plots for the LPN, LN and LSN models, calculated with the parameter estimates for each model. Figure 6(b) contains the empirical CDF for variable Y (solid line), whereas the dotted line corresponds to the CDF for the LPN model.
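A test to compare the LPN model against the LSN model requires a non-nested approach. With F_{θ} and G_{β} as two non-nested models and f(y_{i} | x_{i},θ) and g(y_{i} | x_{i},β) as the corresponding non-nested densities, the likelihood ratio statistic to compare both models is given by
$$T_{LR,NN} = \frac{1}{\sqrt{n}}\,\frac{1}{\hat{\omega}} \sum_{i=1}^{n} \log \frac{f(y_{i} \mid x_{i}, \hat{\theta})}{g(y_{i} \mid x_{i}, \hat{\beta})},$$
where
$$\hat{\omega}^{2} = \frac{1}{n} \sum_{i=1}^{n} \left\{\log \frac{f(y_{i} \mid x_{i}, \hat{\theta})}{g(y_{i} \mid x_{i}, \hat{\beta})}\right\}^{2} - \left\{\frac{1}{n} \sum_{i=1}^{n} \log \frac{f(y_{i} \mid x_{i}, \hat{\theta})}{g(y_{i} \mid x_{i}, \hat{\beta})}\right\}^{2}$$
is an estimator for the variance of $n^{-1/2} \sum_{i=1}^{n} \log\{f(y_{i} \mid x_{i}, \hat{\theta}) / g(y_{i} \mid x_{i}, \hat{\beta})\}$ (Vuong, 1989). This statistic corresponds to the distance between the two models measured in terms of the Kullback–Leibler information criterion. It was shown that, as n → ∞,
$$T_{LR,NN} \overset{d}{\to} N(0,1)$$
under
$$H_{0}: E\left[\log \frac{f(Y_{i} \mid x_{i}, \theta)}{g(Y_{i} \mid x_{i}, \beta)}\right] = 0,$$
that is, the models are equivalent.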
At the significance level δ, with z_{δ ∕ 2} as the critical value, we reject the hypothesis that the models are equivalent if |T_{LR,NN}| > z_{δ ∕ 2}.
More specifically, we reject at the significance level δ the null hypothesis that the models are equivalent in favor of model F_{θ} (or model G_{β}) if T_{LR,NN} > z_{δ ∕ 2} (or T_{LR,NN} < − z_{δ ∕ 2}).
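The decision rule above can be sketched in code. The example assumes the standard form of Vuong's statistic: the sum of pointwise log-density differences divided by √n times the square root of their empirical variance (divisor n). The inputs are the per-observation log-densities of the two fitted models, which would have to be computed separately. The function names and the returned labels are illustrative.

```python
import math


def vuong_statistic(logf, logg):
    """Vuong-type statistic from per-observation log-densities of two
    non-nested fitted models F (logf) and G (logg)."""
    n = len(logf)
    diffs = [a - b for a, b in zip(logf, logg)]
    mean_diff = sum(diffs) / n
    # empirical variance of the log-density differences (divisor n)
    omega2 = sum(d * d for d in diffs) / n - mean_diff ** 2
    if omega2 <= 0.0:
        raise ValueError("log-density differences have zero variance")
    return sum(diffs) / (math.sqrt(n) * math.sqrt(omega2))


def vuong_decision(t_stat, z_crit=1.96):
    """Two-sided decision at the level implied by z_crit (1.96 for 5%)."""
    if t_stat > z_crit:
        return "prefer F"
    if t_stat < -z_crit:
        return "prefer G"
    return "equivalent"


if __name__ == "__main__":
    # toy log-densities, for illustration only
    logf = [-1.0, -0.5, -2.0, -1.5]
    logg = [-2.0, -2.5, -5.0, -3.5]
    t = vuong_statistic(logf, logg)
    print(t, vuong_decision(t))
```

Swapping the roles of the two models changes only the sign of the statistic, which matches the symmetric form of the rejection regions.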
For the data set under study, with F_{θ} being the LPN model and G_{β} the LSN model, Vuong's approach leads to the observed value T_{LR,NN} = 21.819, which is greater than the critical value z_{0.025} = 1.96, and hence, the LPN distribution is preferred to the LSN distribution at the 5% level. The preceding results illustrate the fact that the LPN model is a viable alternative for fitting positive data with asymmetry and kurtosis not accommodated by the LN model. Moreover, the results call for an update of Ahrens' law. Parameter estimates for the preceding models were computed using the function optim in R (R Development Core Team, 2012).
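The log-likelihood that such an optimizer maximizes can be written down directly. The sketch below is not the authors' R code. It assumes the three-parameter LPN density f(y) = α ∕ (η y) ϕ(z){Φ(z)}^{α − 1} with z = (log(y) − ξ) ∕ η. For fixed (ξ, η) the maximizing α has the closed form −n ∕ Σ log Φ(z_i), so a numerical search is only needed over the other two parameters. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def log_norm_pdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(z):
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def lpn_loglik(y, xi, eta, alpha):
    """Log-likelihood of the assumed LPN(xi, eta, alpha) density."""
    total = 0.0
    for v in y:
        z = (math.log(v) - xi) / eta
        total += (math.log(alpha) - math.log(eta) - math.log(v)
                  + log_norm_pdf(z) + (alpha - 1.0) * log_norm_cdf(z))
    return total


def profile_alpha(y, xi, eta):
    """Value of alpha maximizing the log-likelihood for fixed (xi, eta)."""
    s = sum(log_norm_cdf((math.log(v) - xi) / eta) for v in y)
    return -len(y) / s


def rlpn(n, xi, eta, alpha, rng):
    """Simulate by inverting the assumed CDF F(y) = Phi(z) ** alpha."""
    std = NormalDist()
    sample = []
    while len(sample) < n:
        p = rng.random() ** (1.0 / alpha)
        if 0.0 < p < 1.0:  # inv_cdf needs the open interval
            sample.append(math.exp(xi + eta * std.inv_cdf(p)))
    return sample


if __name__ == "__main__":
    y = rlpn(500, 1.0, 0.5, 2.0, random.Random(7))
    a_hat = profile_alpha(y, 1.0, 0.5)
    print(a_hat, lpn_loglik(y, 1.0, 0.5, a_hat))
```

Profiling out α in this way reduces the search to two dimensions, which can make a general-purpose optimizer less sensitive to starting values.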