# Inside a GLM

When fitting any statistical model to data, what we’re really trying to do is find the ‘correct’ set of parameters. What correct means is fairly subjective, but for generalised linear models we commonly use the maximum likelihood estimate – the set of parameters which gives us the highest likelihood (the probability we’d observe the data from the model and its parameters).

Whilst there is a straightforward formula for calculating the maximum likelihood parameter estimates for standard linear models, it’s a bit more complicated for the generalised linear models which we use to analyse things like presence/absence or count data. In these cases, we can calculate the estimates by numerical optimisation – setting loose an algorithm which will quickly find the set of parameters with the highest likelihood. Below is a brief illustration of an optimisation routine at work on some normally distributed data, with R code at the end of the post.

First we’ll generate some fake data from the model y = a + bx, with a = -1 and b = 3, and add some normally-distributed random noise to it:

Then we set the optimisation routine to work (using R’s optim function) to find the parameters that best fit the data. In the left panel below we look at the whole ‘parameter space’ – all of the different potential combinations of the parameters in which the algorithm is searching for the best estimates. In the background are contour lines showing the actual log-likelihood (it’s easier to deal with the really small numbers involved if we log the likelihood first) and in blue is the path of the optimisation algorithm, starting in the lower right and moving to the highest point. In the right panel are the lines fitted at each step of the algorithm. In each plot, the blue lines get more opaque as they approach their target.

The optimisation routine used here (Nelder-Mead) is slightly different to the one used in R’s glm function, but both give more or less the same result as the standard Ordinary least squares approach.

Here’s the R code used to produce these figures:

# set up the data, fixing the random number generator for repeatable results set.seed(1) n <- 100 x <- rnorm(n) y <- -1 + 3 * x + rnorm(n) # and plot it plot(y ~ x, type = 'n', cex.lab = 2, cex.axis = 1.3) abline(-1, 3, lwd = 2, col = 'dark grey') points(y ~ x, pch = 16) # ~~~~~~~~~~~~~~~ # define the likelihood for the model # with a trace to record the algorithm's path llik <- function(pars) { tr <<- rbind(tr, pars) -sum(dnorm(y, pars[1] + pars[2] * x, log = TRUE)) } tr <- vector('numeric', len = 2) # set up a grid of parameter values to plot the surface with res <- 30 a <- seq(-5, 5, len = res) b <- seq(-5, 5, len = res) pars <- expand.grid(a, b) # calculate the log likelihood at each place on the grid ll <- apply(pars, 1, llik) # re-set the trace tr <- vector('numeric', len = 2) # run the optimisation o <- optim(c(3, -4), llik, hessian = T) # and plot the path of the optimisation routine par(mfrow = c(1, 2), mar = c(5, 5, 4, 2)) contour(a, b, matrix(-ll, res), cex.axis = 1.3, cex.lab = 2, ylab = 'b', xlab = 'a', col = 'dark grey') for(i in 3:nrow(tr)) lines(tr[i:(i - 1), ], col = rgb(0, 0, 1, i / nrow(tr)), lwd = 3) plot(y ~ x, type = 'n', cex.lab = 2, cex.axis = 1.3) for(i in 2:nrow(tr)) abline(tr[i, ], col = rgb(0, 0, 1, i / nrow(tr)), lwd = 2) points(y ~ x, pch = 16, cex = 1.3) # here are our parameter estimates round(o$par, 3) # and here are the parameter estimates from glm # (which uses a different algorithm): round(glm(y ~ x)$coefficients, 3) # and lm, which uses Ordinary least squares round(lm(y ~ x)$coefficients, 3)