# 16 Appendix

## 16.1 Notation

We use the following notation:

Notation | Explanation |
---|---|
\(u\) | The random effects vector |
\(\theta\) | Parameter vector (first part) |
\(\beta\) | Parameter vector (second part) |
\(f(u,\beta,\theta)\) | Joint negative log likelihood |
\(x\) | Data |
\(E(u\vert x)\) | Conditional expectation of random effects given data |
\(\hat u\) | The posterior mode \(\arg \min_{u} f(u,\beta,\theta)\) |

## 16.2 Profiling the inner problem

This section describes the theory underlying the `profile` argument to `MakeADFun`, which is intended to speed up and robustify linear mixed effect models with a large number of fixed effects. Under two common model properties (Assumptions 1 and 2 below), which must be checked by the user, the `profile` argument can be used to move outer parameters to the inner problem without affecting the model results.
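As a hedged illustration of the mechanics (all names here — the compiled template `lmm_model` and the objects `dat` and `pars` — are hypothetical), the argument is passed directly to `MakeADFun`:

```r
library(TMB)

## Hypothetical linear mixed model template "lmm_model" with random
## effects "u" and linear fixed effects "beta"; the user must verify
## Assumptions 1 and 2 before profiling "beta".
obj <- MakeADFun(data       = dat,
                 parameters = pars,
                 random     = "u",     # integrated out by the Laplace approximation
                 profile    = "beta",  # moved from the outer to the inner problem
                 DLL        = "lmm_model")

## The outer optimization now runs over theta only.
opt <- nlminb(obj$par, obj$fn, obj$gr)
```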

**Theorem 1 (Profiling the inner problem)**
Assume that for any \(\beta\) and \(\theta\):

- **Assumption 1:** The partial derivative \(\partial_{\beta} f(u,\beta,\theta)\) is a linear function of \(u\).
- **Assumption 2:** The posterior mean is equal to the posterior mode: \(E(u|x)=\hat u\).

Then the MLE

\[\hat \beta := \arg \max_{\beta} \left( \int \exp(-f(u,\beta,\theta)) \: du \right) \]

is a solution to the augmented system

\[ \begin{split} \partial_{u} f(u,\beta,\theta) &= 0 \\ \partial_{\beta} f(u,\beta,\theta) &= 0 \end{split} \]

The augmented system defines \(\hat \beta\) implicitly as a function of the posterior mode \(\hat u\).

*Proof*

Differentiation of the negative log marginal likelihood gives

\[ \begin{split} \partial_{\beta} \left( -\log \int \exp(-f(u,\beta,\theta)) \: du \right) &= E(\partial_{\beta}f(u,\beta,\theta) |x) \\ &= \partial_{\beta} f(u,\beta,\theta)_{|u=\hat u(\beta,\theta)} \end{split} \]

where the first equality holds in general (differentiation under the integral sign), and the second follows from Assumptions 1 and 2: linearity of \(\partial_{\beta} f\) in \(u\) lets the conditional expectation move inside, after which the posterior mean can be replaced by the mode \(\hat u\). Setting this derivative to zero at the optimum therefore gives the second equation of the augmented system, while the first equation \(\partial_{u} f = 0\) holds by the definition of \(\hat u\).

\(\square\)

### 16.2.1 Example

The standard situation in which **Assumption 1** holds is when the
\(\beta\)s are the linear fixed effects of a mixed model. In this case
the joint negative log density takes the form
\[ f(u,\beta,\theta) = \frac{1}{2}(u-A\beta)'\Sigma_{\theta}^{-1}(u-A\beta) + \dots \]
for some design matrix \(A\), where '\(\dots\)' does not depend on
\(\beta\). The derivative
\[ \partial_{\beta} f(u,\beta,\theta) = -A'\Sigma_{\theta}^{-1}(u-A\beta) \]
is thus a linear function of the random effect \(u\).
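In this setting, under the additional assumption that \(A\) has full column rank, the augmented system can be solved explicitly: setting \(\partial_{\beta} f\) to zero yields the generalized least squares expression

\[ \hat\beta = (A'\Sigma_{\theta}^{-1}A)^{-1}A'\Sigma_{\theta}^{-1}\hat u \]

which makes explicit how \(\hat\beta\) is determined by the posterior mode \(\hat u\).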

In general, **Assumption 2** holds exactly for models with a symmetric
(e.g. Gaussian) posterior distribution.

## 16.3 Theory underlying sdreport

This section supplements the documentation of `?sdreport` by adding some missing details.

As previously, we consider a general latent variable model with parameter vector \(\theta\), random effect vector \(u\) and observation vector \(x\). The TMB estimation procedure works as follows:

- The MLE \(\hat\theta=\hat\theta(x)\) is calculated and used as the estimator of \(\theta\).
- Denote by \(\hat u(\theta,x)\) the random effect mode, which depends on \(\theta\) and \(x\). *Plugging in* the MLE gives our estimator \(\hat u\left(\hat\theta(x),x\right)\) of \(u\).

In general, we assume that \(\hat\theta\) is a consistent estimator of \(\theta\); however, we do not require \(\hat u\) to be consistent for \(u\). The purpose of `sdreport` is, for a given realization of the pair \((u,x)\), to quantify the joint uncertainty of \((\hat u,\hat\theta)\) as an estimator of \((u,\theta)\). That is, we are interested in the variance matrix of the difference

\[D:=\begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - u\\ \hat\theta(x) - \theta\end{pmatrix}\]

An important point of the uncertainty quantification is to account for plugging in \(\hat\theta\) rather than using the true \(\theta\).

We calculate this variance using the law of total variance:

\[V[D]=E(V[D|x])+V(E[D|x])\]

Consider \(D\) conditionally on \(x\). The second component does not depend on \(u\), and \(\hat u\) is constant given \(x\), so

\[V[D|x]=\begin{pmatrix}V[u|x] & 0 \\ 0 & 0 \end{pmatrix}\]

It follows that

\[E(V[D|x])=\begin{pmatrix}E(V[u|x]) & 0 \\ 0 & 0 \end{pmatrix}\]

As a central estimator of \(E(V[u|x])\) we use \(V[u|x]\), which is
approximated by the inverse random effect Hessian \(H_{uu}^{-1}\), based
on the assumption that \(u|x\) is well approximated by a Gaussian
distribution (a reasonable assumption given that we are using the
Laplace approximation). This explains the first term of the variance formula in `?sdreport`:

\[E(V[D|x]) \approx \begin{pmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{pmatrix}\]
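As a practical sketch (assuming an optimized TMB object `obj` as in the earlier hypothetical example, so that the internally stored parameter values are at the mode), this Hessian can be inspected directly:

```r
## Sparse Hessian of f with respect to the random effects u at the mode;
## its inverse approximates E(V[u|x]) above.
H_uu <- obj$env$spHess(random = TRUE)

## Alternatively, sdreport can return the full joint precision matrix
## of (u, theta), of which H_uu is the random effect block.
rep <- sdreport(obj, getJointPrecision = TRUE)
Q   <- rep$jointPrecision
```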

Likewise,

\[E[D|x]=\begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - E(u|x)\\ \hat\theta(x) - \theta\end{pmatrix}\]

Again, assuming a Gaussian approximation of \(u|x\), it follows that \(E(u|x) \approx \hat u(\theta,x)\):

\[E[D|x] \approx \begin{pmatrix}\hat u\left(\hat\theta(x),x\right) - \hat u(\theta,x)\\ \hat\theta(x) - \theta\end{pmatrix}\]

We approximate this expectation using a linearization of \(\theta \rightarrow \hat u(\theta,x)\) around \(\hat\theta(x)\):

\[E[D|x] \approx J_x \cdot (\hat\theta(x) - \theta)\]

where \(J_x\) stacks the Jacobian of \(\theta \rightarrow \hat u(\theta,x)\), evaluated at \(\hat\theta(x)\), on top of the identity matrix corresponding to the \(\theta\)-component.

We now have the second term of the variance formula in `?sdreport`:

\[V(E[D|x]) \approx J_x V(\hat\theta(x)) J_x'\]

This term becomes negligible as the amount of data grows, due to the assumed asymptotic consistency of \(\hat\theta\).
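A minimal usage sketch tying the two terms together (continuing the hypothetical `obj` from the earlier example; `sdreport` computes both terms of the decomposition internally):

```r
rep <- sdreport(obj)    # variance of D: E(V[D|x]) + V(E[D|x])

summary(rep, "fixed")   # theta-hat with standard errors
summary(rep, "random")  # u-hat with standard errors accounting for the plug-in of theta-hat
```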
