Standard Errors of Fitted Category Probabilities by the Delta Method for the Nested Logit Model

This document uses the delta method (Fox, 2021, sec. 6.3.5) to derive approximations to the variances of estimated probabilities for dichotomous logit models, and from these, for the nested logit model. The standard errors of these estimated probabilities are the square roots of their respective variances.

Notation

Let ψj represent the probability that the dichotomous response in the jth nested dichotomous logit model is Yj = 1 (i.e., a “success”), j = 1, …, m − 1, where m is the number of response categories for the polytomy. Then 1 − ψj is the probability that Yj = 0 (i.e., a “failure”). I assume that the regression coefficients and their covariance matrix for each dichotomous logit model are estimated in the usual manner.

Let λj = log [ψj/(1 − ψj)] represent the (estimated) logit (log-odds) for the jth dichotomous logit model, with variance V(λj) (see below).

Let ϕk, k = 1, …, m represent the probability that the polytomous response is Y = k.

Let ψ̂j and ϕ̂k represent the estimates of these probabilities.

In the sequel, which involves only the estimates of these and other parameters, I’ll omit the hats so as to simplify the notation.

In the nested logit model, the polytomous probabilities ϕk are each products of probabilities ψj or 1 − ψj for j ∈ ℳk ⊆ {1, …, m − 1}; that is, ℳk is the subset of the dichotomous logit models that enter into ϕk. Let ψj, kj represent either ψj or 1 − ψj, as appropriate for category k of the polytomous response. Then

$$\phi_k = \prod_{j \in \mathcal{M}_k} \psi_{j, k_j}$$

for k = 1, …, m.

Finally, the individual-category probabilities ϕk can be converted into logits, Λk = log [ϕk/(1 − ϕk)]. The estimates of these logits should approach asymptotic normality more rapidly than the estimates of the corresponding probabilities.

An Example

I’ll use the following example to illustrate the results in this document: Suppose that we have a three-category response variable Y with categories A, B, and C, and define the two nested dichotomies Y1 coded 0 or 1 for categories {A} and {B, C}, respectively, and Y2 coded 0 and 1 for categories {B} and {C}. Then ψ1 = Pr (Y1 = 1) = Pr ({B, C}); 1 − ψ1 = Pr (Y1 = 0) = Pr ({A}). Similarly, ψ2 = Pr (Y2 = 1) = Pr ({C} | {B, C}); 1 − ψ2 = Pr (Y2 = 0) = Pr ({B} | {B, C}). As well, ℳA = {1} and ℳB = ℳC = {1, 2}. Consequently,

$$\begin{aligned} \phi_A &= 1 - \psi_1 \\ \phi_B &= \psi_1 (1 - \psi_2) \\ \phi_C &= \psi_1 \psi_2 \end{aligned}$$

Here, I abuse the notation slightly in the interest of clarity, using letters rather than numbers for the response categories, so the index of response categories, k, takes on the values A, B, and C, rather than 1, 2, and 3.
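
The products for this example can be checked numerically. The following is a minimal sketch in Python, not part of the original derivation: the function name category_probs and the values 0.6 and 0.25 are made up for illustration. It computes the three category probabilities from ψ1 and ψ2 and confirms that they sum to 1.

```python
def category_probs(psi1, psi2):
    """Category probabilities for the dichotomies {A} vs {B, C} and {B} vs {C}."""
    return {
        "A": 1.0 - psi1,           # failure on dichotomy 1
        "B": psi1 * (1.0 - psi2),  # success on 1, failure on 2
        "C": psi1 * psi2,          # success on 1, success on 2
    }

# Hypothetical fitted probabilities for the two dichotomies (illustration only).
phi = category_probs(psi1=0.6, psi2=0.25)
total = sum(phi.values())  # the three probabilities should sum to 1
```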

Variances of the Estimated Probabilities

For the Dichotomous Logit Models

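The estimated probability of success ψj for the jth dichotomous logit model is

$$\psi_j = \frac{1}{1 + \exp(-\lambda_j)} = \frac{1}{1 + \exp\left(-\mathbf{x}^T \boldsymbol{\beta}^{(j)}\right)}$$

Then ψj is a function of the regression coefficients through the logit λj = xTβ(j), where xT = [1, x1, …, xp] (an arbitrary vector of values of the regressors) and β(j) = [α(j), β1(j), …, βp(j)]T. The probability of failure is

$$1 - \psi_j = \frac{\exp(-\lambda_j)}{1 + \exp(-\lambda_j)} = \frac{1}{1 + \exp(\lambda_j)}$$

The variance of the logit is V(λj) = xTV(β(j))x, where V(β(j)) is the estimated covariance matrix of the regression coefficients for the jth dichotomous logit model.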

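The derivatives of ψj and 1 − ψj with respect to λj are

$$\frac{d\psi_j}{d\lambda_j} = \frac{\exp(-\lambda_j)}{[1 + \exp(-\lambda_j)]^2} = \psi_j (1 - \psi_j), \qquad \frac{d(1 - \psi_j)}{d\lambda_j} = -\psi_j (1 - \psi_j)$$

By the univariate delta method,

$$V(\psi_j) = V(1 - \psi_j) \approx [\psi_j (1 - \psi_j)]^2 \, V(\lambda_j)$$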
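
As a rough numerical illustration of this step, the sketch below computes a fitted probability and its delta-method variance, [ψj(1 − ψj)]² V(λj), for a single dichotomous logit model. It is only a sketch under stated assumptions: the function names and the regressor values, coefficients, and covariance matrix are all hypothetical, chosen so that the arithmetic is easy to follow.

```python
import math

def logit_variance(x, cov):
    """Quadratic form x' V x for regressor vector x and coefficient covariance V."""
    p = len(x)
    return sum(x[r] * cov[r][c] * x[c] for r in range(p) for c in range(p))

def psi_and_variance(x, beta, cov):
    """Fitted success probability and its delta-method variance."""
    lam = sum(xi * bi for xi, bi in zip(x, beta))  # logit: x' beta
    psi = 1.0 / (1.0 + math.exp(-lam))             # inverse-logit
    var_psi = (psi * (1.0 - psi)) ** 2 * logit_variance(x, cov)
    return psi, var_psi

# Hypothetical inputs, made up for illustration only.
x = [1.0, 2.0]            # constant regressor and one predictor value
beta = [-1.0, 0.5]        # intercept and slope
cov = [[0.04, -0.01],
       [-0.01, 0.01]]     # covariance matrix of the two coefficients

psi, var_psi = psi_and_variance(x, beta, cov)
se_psi = math.sqrt(var_psi)  # standard error of the fitted probability
```

Because V(ψj) = V(1 − ψj), the same standard error applies to the fitted probability of failure.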

For the Nested Logit Model

The variances of the estimated response-category probabilities for the polytomous response can be obtained similarly by the multivariate delta method, recognizing that these probabilities are products of the dichotomous probabilities. The result is greatly simplified because the dichotomies are independent, and so the covariance matrix of the estimated dichotomous probabilities is diagonal.

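The required derivatives are

$$\frac{\partial \phi_k}{\partial \psi_{j, k_j}} = \prod_{j' \in \mathcal{M}_k - \{j\}} \psi_{j', k_{j'}}$$

for j ∈ ℳk and k = 1, …, m. Here, − denotes set difference. Because V(ψj) = V(1 − ψj), it’s always the case that V(ψj, kj) = V(ψj), and so

$$V(\phi_k) \approx \sum_{j \in \mathcal{M}_k} \left[ \prod_{j' \in \mathcal{M}_k - \{j\}} \psi_{j', k_{j'}} \right]^2 V(\psi_j)$$

for k = 1, …, m.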

Applying these results to the example, recall, first, that ℳA = {1}, and so the index set for the product of the ψj′, Aj′ over j′ ∈ ℳA − {j}, that is, j′ ∈ {1} − {1}, is empty. In this case, the product is taken to equal 1, and V(ϕA) = V(ψ1). That makes intuitive sense, because, as noted previously, ϕA = 1 − ψ1, and V(1 − ψ1) = V(ψ1).

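Proceeding with B and C, ℳB = ℳC = {1, 2}. Consequently, each product of the ψj′, Bj′ over j′ ∈ ℳB − {j} and of the ψj′, Cj′ over j′ ∈ ℳC − {j} has only one term, for j′ = 2 when j = 1 or j′ = 1 when j = 2:

$$\begin{aligned} V(\phi_B) &\approx (1 - \psi_2)^2 \, V(\psi_1) + \psi_1^2 \, V(\psi_2) \\ V(\phi_C) &\approx \psi_2^2 \, V(\psi_1) + \psi_1^2 \, V(\psi_2) \end{aligned}$$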
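
The general sum over j ∈ ℳk can be sketched in a few lines of Python and applied to this example. This is an illustrative sketch, not the author's implementation: the function category_variances, the paths dictionary encoding which of ψj or 1 − ψj enters each category, and all numerical values are assumptions made up for the illustration. The code relies on the independence of the dichotomies, so only the variances V(ψj) are needed.

```python
def category_variances(psi, var_psi, paths):
    """Delta-method variances of the category probabilities.

    psi, var_psi: dicts keyed by dichotomy index j.
    paths: dict mapping category k to {j: side}, where side 1 means that
           psi_j enters phi_k and side 0 means that 1 - psi_j enters.
    """
    def factor(j, side):
        return psi[j] if side == 1 else 1.0 - psi[j]

    result = {}
    for k, path in paths.items():
        total = 0.0
        for j in path:
            prod = 1.0  # an empty product is taken to equal 1
            for j_other, side in path.items():
                if j_other != j:
                    prod *= factor(j_other, side)
            total += prod ** 2 * var_psi[j]
        result[k] = total
    return result

# The example: A uses 1 - psi_1; B uses psi_1 and 1 - psi_2; C uses psi_1 and psi_2.
paths = {"A": {1: 0}, "B": {1: 1, 2: 0}, "C": {1: 1, 2: 1}}
psi = {1: 0.6, 2: 0.25}            # hypothetical fitted probabilities
var_psi = {1: 0.0025, 2: 0.0016}   # hypothetical variances of psi_1 and psi_2

var_phi = category_variances(psi, var_psi, paths)
```

For category A the inner product is empty, so the function returns V(ψ1) unchanged, in agreement with the result for ϕA above.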

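Yet another application of the delta method produces approximate variances for the individual-category logits. The relevant derivative is

$$\frac{d\Lambda_k}{d\phi_k} = \frac{1}{\phi_k (1 - \phi_k)}$$

for k = 1, …, m, and so

$$V(\Lambda_k) \approx \frac{V(\phi_k)}{[\phi_k (1 - \phi_k)]^2}$$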
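
The logit-scale variance, V(ϕk)/[ϕk(1 − ϕk)]², can be sketched as follows. The interval computation is an assumption about how the result might be used (forming an interval on the logit scale and transforming its endpoints back to probabilities); it is not stated in the text above. The function names, the multiplier z = 1.96, and the input values are hypothetical.

```python
import math

def logit_variance(phi, var_phi):
    """Delta-method variance of log[phi / (1 - phi)], given V(phi)."""
    return var_phi / (phi * (1.0 - phi)) ** 2

def interval_via_logit(phi, var_phi, z=1.96):
    """Interval for phi formed on the logit scale, then back-transformed."""
    logit = math.log(phi / (1.0 - phi))
    half_width = z * math.sqrt(logit_variance(phi, var_phi))
    lower = 1.0 / (1.0 + math.exp(-(logit - half_width)))
    upper = 1.0 / (1.0 + math.exp(-(logit + half_width)))
    return lower, upper

# Hypothetical category probability and variance (illustration only).
phi_B, var_phi_B = 0.45, 0.00198225
var_logit_B = logit_variance(phi_B, var_phi_B)
lower_B, upper_B = interval_via_logit(phi_B, var_phi_B)
```

An interval built this way always lies inside (0, 1), which is one practical reason to work on the logit scale.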

Acknowledgment

I’m grateful to Georges Monette of York University for a close reading of an earlier version of this document, and in particular for his suggested simplification of the notation employed.

Reference

Fox, J. (2021). A mathematical primer for social statistics (Second edition). Thousand Oaks CA: Sage. Retrieved from https://www.john-fox.ca/MathPrimer/index.html