In the maximum likelihood framework, the score test of Rao (1948) is preferable to the Wald test and the likelihood ratio test in terms of computation time, because it requires only the estimation of the model restricted by the null hypothesis, whereas the Wald test requires the more time-consuming estimation of the unrestricted model, and the likelihood ratio test requires the even more time-consuming estimation of both the restricted and the unrestricted model. In addition, if it is desired to test models with correlated statistics or to test homogeneity assumptions, root-finding algorithms for estimating the unrestricted model may need a large number of iterations and may struggle to find the root(s) of estimating function (9). Therefore, in situations where computation time and computational stability are important concerns, the score test of Rao (1948) is preferable to the Wald test and the likelihood ratio test. Under regularity conditions, the asymptotic behaviour of the three tests is equivalent (see Rao, 2002; Lehmann & Romano, 2005), so in practice it is both legitimate and prudent to let such practical considerations guide the choice of test.
However, given a medium-sized to large data set, maximum likelihood estimation is hardly necessary on statistical grounds and hardly attractive on computational grounds (see Section 2.2), so tests in the method of moments framework are preferable to tests in the maximum likelihood framework. It is therefore desirable to obtain a score-type test in the method of moments framework. Such a test, which is the natural counterpart of the score test of Rao (1948) when nuisance parameters are estimated by maximum likelihood estimators and of the C(α) test of Neyman (1959) when nuisance parameters are estimated by consistent estimators, can be obtained by replacing the score function by regular estimating functions (Godambe, 1960, 1991) along the lines of Basawa (1985, 1991).
The section proceeds as follows. In Section 3.2.1, first the case is considered where the estimating function is given by the score function (9), and the score test of Rao (1948) and the C(α) test of Neyman (1959) in the family of models of interest are introduced; then the case is considered where the estimating function is given by (8), and a new score-type test in the family of models of interest is introduced. Remarks and extensions are given in Section 3.2.2.
3.2.1. Basic score-type test
Let θ1 be a vector of nuisance parameters and θ2 be a vector of parameters of primary interest, and θ′= (θ′1, θ′2).
In the classical Neyman–Pearson tradition, goodness of fit can be studied by specifying hypotheses regarding the postulated family of probability distributions , for instance the null hypothesis
where θ2,0 is a specified value (such as θ2,0=0), and θ1 is unspecified. Let θ′0= (θ′1, θ′2,0) be the parameter vector under H0: θ2=θ2,0. Let gn= g n(zn, θ0) be an estimating function satisfying regularity conditions (see Godambe, 1960, 1991; Basawa, 1985, 1991). Partition g′n= (g′1n, g′2n) in accordance with θ′0= (θ′1, θ′2,0).
3.2.1.1. Estimating function: score function
Score test without nuisance parameters
In the absence of nuisance parameters, θ0 reduces to θ2,0. If θ2,0 were translated by , the local change in the log-likelihood function due to the local change in θ2,0 would be given approximately by
Under H0: θ2=θ2,0, (13) has mean 0 and variance . A test of H0: θ2=θ2,0 could be based on the test statistic
Test statistic (14) is based on a linear function of the score function g2n and raises the question: which linear function of g2n is optimal in the sense that, under H1: θ2≠θ2,0, test statistic (14) is as large as possible? By the Cauchy–Schwarz inequality,
where the maximum on the right-hand side of (15) is attained at . The right-hand side of (15) is the score test statistic of Rao (1948).
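To make the construction concrete, the following sketch computes the score test statistic for a deliberately simple case, an i.i.d. Poisson sample under H0: λ = λ0, rather than for the network models considered in this paper; the function name and the Poisson setting are illustrative assumptions.

```python
import numpy as np

def rao_score_stat_poisson(x, lam0):
    """Rao's score statistic for H0: lambda = lam0 in an i.i.d. Poisson model.

    The score at lam0 is g = sum(x - lam0) / lam0 and the Fisher
    information is I = n / lam0, so g**2 / I is asymptotically
    chi-squared with 1 degree of freedom under H0.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    g = (x - lam0).sum() / lam0    # score function evaluated at the null value
    info = n / lam0                # Fisher information at lam0
    return g**2 / info

rng = np.random.default_rng(0)
x = rng.poisson(2.0, size=500)
stat = rao_score_stat_poisson(x, lam0=2.0)   # typically small under H0
```

Under H0 the statistic is asymptotically chi-squared with one degree of freedom; large values indicate evidence against H0.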
Score test with nuisance parameters
In the presence of nuisance parameters, the score test statistic of Rao (1948) is given by
where is the restricted maximum likelihood estimator of θ0 under H0: θ2=θ2,0, obtained by maximizing the log-likelihood function with respect to θ0 subject to the constraint H0: θ2=θ2,0.
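In the same spirit, a minimal sketch of the score test in the presence of nuisance parameters, using a Gaussian linear model with known unit variance (an illustrative assumption, not a model from this paper): only the restricted model is fitted, and the score for the tested parameters is evaluated at the restricted estimate.

```python
import numpy as np

def score_test_gaussian_regression(y, X1, X2):
    """Rao's score test of H0: beta2 = 0 in y = X1 beta1 + X2 beta2 + e,
    with e ~ N(0, 1); unit variance is assumed known, purely to keep the
    illustration short. Only the restricted model (y regressed on X1) is fitted.
    """
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0]   # restricted MLE of the nuisance beta1
    r = y - X1 @ b1                              # restricted residuals
    g2 = X2.T @ r                                # score for beta2 at the restricted fit
    # effective information for beta2 after profiling out beta1
    P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # projection onto the columns of X1
    i22_1 = X2.T @ X2 - X2.T @ P1 @ X2
    return g2 @ np.linalg.solve(i22_1, g2)       # ~ chi2, df = X2.shape[1], under H0
```

Note that the unrestricted coefficient beta2 is never estimated, mirroring the computational advantage discussed above.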
C(α) test with nuisance parameters
The C(α) test is designed to test hypotheses in the presence of nuisance parameters, where the nuisance parameters are replaced by consistent estimators. If θ1 and θ2,0 were translated by and , respectively, the local change in the log-likelihood function due to the local changes in θ1 and θ2,0 would be given approximately by
If θ0 were estimated by the restricted maximum likelihood estimator under H0: θ2=θ2,0, then would vanish. If, however, θ0 were replaced by a consistent estimator under H0: θ2=θ2,0, then would not, in general, vanish. Neyman (1959) showed that, under regularity conditions, the impact of replacing θ0 by a consistent estimator under H0: θ2=θ2,0 on the test can be eliminated by basing tests on
Under H0: θ2=θ2,0, (18) has mean 0 and variance , where Cn is the variance–covariance matrix of en. A test of H0: θ2=θ2,0 could be based on the test statistic
An argument along the lines of the score test shows that the optimal choice of is given by , giving rise to the C(α) test statistic of Neyman (1959):
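The adjustment in (18) can be sketched numerically: given the score blocks and the information blocks, the following hypothetical helper forms the projected score e = g2 - I21 I11^(-1) g1 and its quadratic form; the function name and the block notation are illustrative assumptions.

```python
import numpy as np

def c_alpha_stat(g1, g2, i11, i12, i22):
    """C(alpha)-type statistic from score blocks and information blocks.

    g1, g2: scores for the nuisance and tested parameters, evaluated at a
    consistent (not necessarily maximum likelihood) estimator of the
    nuisance parameters under H0. The adjusted score
    e = g2 - I21 I11^{-1} g1 is insensitive, to first order, to the
    estimation error in the nuisance parameters.
    """
    b = np.linalg.solve(i11, i12)       # I11^{-1} I12
    e = g2 - b.T @ g1                   # projected (effective) score
    v = i22 - i12.T @ b                 # variance of e: I22 - I21 I11^{-1} I12
    return e @ np.linalg.solve(v, e)    # ~ chi2 with dim(g2) df under H0
```

If the nuisance parameters are estimated by restricted maximum likelihood, g1 vanishes and the statistic reduces to a score statistic of Rao's form, consistent with the remark above.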
3.2.1.2. Estimating function: non-score function
where denotes convergence in distribution, NL refers to the L-variate Gaussian distribution, and is non-singular.
The entities and can be replaced by and , respectively, without changing the asymptotic distribution of (29). The parameter vector θ0 can be replaced by a restricted moment estimator under H0: θ2=θ2,0 (see Section 2.2). The test statistic, obtained by replacing , , and θ0 in (29) by , , and , respectively, is given by
3.2.2. Remarks and extensions
Observe that, to test restrictions on the parameter vector θ2, θ2 need not be estimated, saving computation time and avoiding computational issues which may arise in the estimation of unrestricted models with correlated statistics or without homogeneity assumptions.
If θ2 (and therefore bn and ) is a scalar, then (30) can be used both in its quadratic form, as presented above, and in its corresponding linear form,
The linear form is convenient when one-sided one-parameter tests are desired. The minus sign in (31) facilitates the interpretation in the sense that, if u2 denotes the statistic corresponding to the parameter θ2 and its conditional expectations given X(t1) = x(t1) are increasing functions of θ2, then, by the definition of gn in (8), θ2−θ2,0 > 0 is associated with positive values of (31). By (27), the asymptotic distribution of (31) under H0: θ2=θ2,0 is standard Gaussian.
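For a scalar parameter, the relation between the quadratic and linear forms can be sketched as follows; the sign of the linear form is an assumption chosen to match the minus sign described for (31), and the function name is illustrative.

```python
import numpy as np

def scalar_forms(g2, v):
    """Quadratic and linear forms of a one-parameter score-type statistic.

    c = g2**2 / v is the two-sided form (chi-squared, 1 df, under H0);
    the linear form d = -g2 / sqrt(v) is its signed square root, so that
    d**2 == c. Under H0, d is asymptotically standard Gaussian, which
    permits one-sided tests.
    """
    c = g2**2 / v          # quadratic (two-sided) form
    d = -g2 / np.sqrt(v)   # linear (one-sided) form; sign follows (31)
    return c, d
```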
Furthermore, tests with R > 1 degrees of freedom can be complemented with one-degree-of-freedom tests, testing the restrictions one by one; two-sided one-parameter tests can be based on the test statistic Cn, while one-sided one-parameter tests can be based on the test statistic Dn. It is convenient to compute the one-parameter test statistics from the simulations under the null hypothesis of the multi-parameter test, so that no additional, time-consuming simulations are required. If the null hypothesis of the multi-parameter test is true, these one-parameter test statistics are computed correctly. Otherwise, they are computed incorrectly, but they may still be taken as an informal indication of where the model deviates from the null hypothesis of the multi-parameter test.
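The computation of multi-parameter and one-parameter statistics from a single set of null-hypothesis simulations can be sketched as follows; the Monte Carlo moment estimates and the function name are illustrative assumptions, not the estimators of this paper.

```python
import numpy as np

def mc_score_type_tests(u_obs, u_sims):
    """Monte Carlo score-type statistics from ONE set of null simulations.

    u_obs : observed vector of statistics (length R)
    u_sims: (S, R) array of the same statistics simulated under H0

    Returns the R-degree-of-freedom quadratic statistic and the R
    one-parameter z-type statistics, all computed from the same draws.
    """
    u_sims = np.asarray(u_sims, dtype=float)
    mean = u_sims.mean(axis=0)                 # Monte Carlo estimate of E[u] under H0
    cov = np.atleast_2d(np.cov(u_sims, rowvar=False))  # estimate of Var(u) under H0
    d = np.atleast_1d(np.asarray(u_obs, dtype=float) - mean)  # deviation from H0
    c = d @ np.linalg.solve(cov, d)            # quadratic form, ~ chi2 with R df
    z = d / np.sqrt(np.diag(cov))              # per-parameter z statistics
    return c, z
```

The quadratic form is a rough analogue of Cn (with R degrees of freedom) and the per-parameter z statistics are rough analogues of Dn, all obtained from the same simulated draws.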
Observe that test statistics Cn and Dn have an appealing interpretation in terms of goodness of fit in the classic sense, because both are based on
where u2 is the vector of statistics corresponding to the parameter vector θ2. In other words, the test statistics are based on the ‘distance’ between the expected value of the function u2 of the data – evaluated under H0: θ2=θ2,0– and the observed value of u2.
Since the one- and multi-parameter tests do not require the estimation of the unrestricted model, the tests are most useful in forward model selection and as tests of homogeneity assumptions with respect to time and nodes. To complement the tests, we derive one-step estimators that are useful as starting values of parameters in forward model selection. Suppose that tests indicate empirical evidence against the model restricted by H0: θ2=θ2,0 and it is desired to estimate the unrestricted model. If gn is differentiable at , then, by definition (Magnus & Neudecker, 1988, p. 82),
Thus, in the limit, solving gn= g n(zn, θ) =0 is the same as solving
suggesting the one-step estimator
where is non-singular. The one-step estimator θ★ is an approximation of the unrestricted estimator . The one-step estimator θ★ is useful as a starting value of the parameter vector θ in the root-finding algorithm which is used to find the root(s) of the estimating function gn= g n(zn, θ). If either gn is approximately linear as a function of θ or is sufficiently close to , the linear approximation of gn around can be expected to result in good one-step estimators θ★ and therefore good starting values of θ. Otherwise, the one-step estimator θ★ is at least an improvement on .
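The linearization above suggests a Newton-type one-step update, theta* = theta0 - J^(-1) gn(theta0), where J is the Jacobian of gn at theta0. A minimal numerical sketch, using a toy estimating function (an illustrative assumption, not estimating function (8)):

```python
import numpy as np

def one_step_estimator(g, jac, theta0):
    """One Newton step toward a root of the estimating function g.

    Linearizing g around theta0 gives g(theta0) + J (theta - theta0) = 0,
    hence theta* = theta0 - J^{-1} g(theta0), where J is the Jacobian of
    g at theta0 (assumed non-singular).
    """
    theta0 = np.atleast_1d(np.asarray(theta0, dtype=float))
    return theta0 - np.linalg.solve(jac(theta0), np.atleast_1d(g(theta0)))

# toy estimating function with root at theta = (1, 2)
g = lambda t: np.array([t[0] - 1.0, 2.0 * (t[1] - 2.0)])
jac = lambda t: np.diag([1.0, 2.0])
theta_star = one_step_estimator(g, jac, [0.0, 0.0])
```

Because the toy estimating function is linear, a single step lands exactly on the root; for nonlinear gn the one-step estimator is only an approximation, as noted above.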
Finally, concerning the asymptotic Gaussian distribution of the estimating function gn (see (25)), note that the choice of the statistics u in the estimating function gn is arbitrary as long as gn is increasing in θ and sensitive to changes in θ. The test statistics Cn and Dn are admissible for all choices of gn for which gn is asymptotically, or at least approximately, Gaussian. In most applications, verifying the asymptotic distribution of gn is hard. Indeed, hardly anything is known about the asymptotics of estimators and tests in the field of social networks, leaving aside simplistic models without dependence; for example, as noted in Section 3.1, the distribution of the t-type test is unknown.