# 2 Tutorial

A TMB project consists of an R file (*.R) and a C++ file (*.cpp). The
R file does pre- and post processing of data in addition to maximizing
the log-likelihood contained in *.cpp. See Examples for more
details. All R functions are documented within the standard help
system in R. This tutorial describes how to write the C++ file, and
assumes familiarity with C++ and to some extent with R.

The purpose of the C++ program is to evaluate the objective function,
i.e. the negative log-likelihood of the model. The program is compiled
and called from R, where it can be fed to a function minimizer like
`nlminb()`

.

The objective function should be of the following C++ type:

```
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
.... Here goes your C++ code .....
}
```

The first line includes the source code for the whole TMB package (and
all its dependencies). The objective function is a templated class
where `<Type>`

is the data type of both the input values and
the return value of the objective function. This allows us to
evaluate both the objective function and its derivatives using the
same chunk of C++ code (via the AD package
CppAD). The
technical aspects of this are hidden from the user. There is however
one aspect that surprises the new TMB user. When a constant like “1.2”
is used in a calculation that affects the return value it must be
“cast” to Type:

```
Type nll; // Define variable that holds the return value (neg. log. lik)
nll = Type(1.2); // Assign value 1.2; a cast is needed.
```

## 2.1 Obtaining data and parameter values from R

Obviously, we will need to pass both data and parameter values to the objective function. This is done through a set of macros that TMB defines for us.

### 2.1.1 List of data macros

`DATA_ARRAY()`

,`DATA_FACTOR()`

,`DATA_IARRAY()`

,`DATA_IMATRIX()`

,`DATA_INTEGER()`

,`DATA_IVECTOR()`

,`DATA_MATRIX()`

,`DATA_SCALAR()`

,`DATA_SPARSE_MATRIX()`

,`DATA_STRING()`

,`DATA_STRUCT()`

,`DATA_UPDATE()`

,`DATA_VECTOR()`

### 2.1.2 List of parameter macros

`PARAMETER()`

,`PARAMETER_ARRAY()`

,`PARAMETER_MATRIX()`

,`PARAMETER_VECTOR()`

To see which macros are available start typing
`DATA_`

or `PARAMETER_`

in the Doxygen search field of
your browser (you may need to refresh the browser window between each
time you make a new search). A simple example if you want to read a
vector of numbers (doubles) is the following

`DATA_VECTOR(x); // Vector x(0),x(1),...,x(n-1), where n is the length of x`

Note that all vectors and matrices in TMB uses a **zero-based**
indexing scheme. It is not necessary to explicitly pass the dimension
of `x`

, as it can be retrieved inside the C++ program:

## 2.2 An extended C++ language

TMB extends C++ with functionality that is important for formulating likelihood functions. You have different toolboxes available:

- Standard C++ used for infrastructure like loops etc.
- Vector, matrix and array library (see Matrices and arrays)
- Probability distributions (see Densities and R style distributions)

In addition to the variables defined through the `DATA_`

or
`PARAMETER_`

macros there can be “local” variables, for which
ordinary C++ scoping rules apply. There must also be a variable that
holds the return value (neg. log. likelihood).

```
DATA_VECTOR(x); // Vector x(0), x(1), ..., x(n-1)
Type tmp = x(1);
Type nll = tmp * tmp;
```

As in ordinary C++ local variable tmp must be assigned a value before it can enter into a calculation.

## 2.3 Statistical modelling

TMB can handle complex statistical problems with hierarchical
structure (latent random variables) and multiple data sources. Latent
random variables must be continuous (discrete distributions are not
handled). The `PARAMETER_`

macros are used to pass two types
of parameters.

**Parameters**: to be estimated by maximum likelihood. These include fixed effects and variance components in the mixed model literature. They will also correspond to hyper parameters with non-informative priors in the Bayesian literature.**Latent random variables**: to be integrated out of the likelihood using a Laplace approximation.

Which of these are chosen is controlled from R, via the
`random`

argument to the function `MakeADFun`

. However,
on the C++ side it is usually necessary to assign a probability
distribution to the parameter.

The purpose of the C++ program is to calculate the (negative) joint density of data and latent random variables. Each datum and individual latent random effect gives a contribution to log likelihood, which may be though of as a “distribution assignment” by users familiar with software in the BUGS family.

```
PARAMETER_VECTOR(u); // Latent random variable
Type nll = Type(0); // Return value
nll -= dnorm(u(0),0,1,true) // Distributional assignment: u(0) ~ N(0,1)
```

The following rules apply:

- Distribution assignments do not need to take place before the latent variable is used in a calculation.
- More complicated distributional assignments are allowed, say
*u(0)-u(1) ~ N(0,1)*, but this requires the user to have a deeper understanding of the probabilistic aspects of the model. - For latent variables only normal distributions should be used (otherwise the Laplace approximation will perform poorly). For response variables all probability distributions (discrete or continuous) are allowed. If a non-gaussian latent is needed the “transformation trick” can be used.
- The namespaces R style distributions and Densities contain
many probability distributions, including multivariate normal
distributions. For probability distributions not available from
these libraries, the user can use raw C++ code:
`{.cpp} DATA_VECTOR(y); // Data vector Type nll = Type(0); // Return value nll -= sum(-log(Type(1.0)+y*y)); // y are i.i.d. Cauchy distributed`

See Toolbox for more about statistical modelling.