And how they interact with machine learning
2025-10-12
Hierarchical
We typically define hierarchies where \(θ\) are shared (population) parameters and \(η\) are subject-specific (random) effects.
Simple
Conditional probability / Joint likelihood / MLE
Probability of the response \(y\) according to the model given specific values of \(θ\), \(η\), and \(x\).
\[ p_c(y | θ, η, x) \]
Could we fit the model by simply finding the values of \(θ\) and \(η\) that jointly maximize this probability?
For Gaussian residual error this is equivalent to minimizing a distance metric (e.g. MSE) between observed and predicted data.
Not what we do
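For Gaussian residual error the equivalence is easy to verify numerically: jointly maximizing the conditional log-likelihood gives the same estimates as minimizing the MSE. A minimal sketch with a toy Emax-type model (the model, data, and names here are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.1, 2.0, 50)
y = 1.0 * x / (0.5 + x) + rng.normal(0, 0.05, x.size)  # toy Emax-type data

def pred(p):
    emax, ec50 = p
    return emax * x / (ec50 + x)

# Gaussian conditional log-likelihood with known sigma:
# up to a constant this is SSE / (2 sigma^2), so it shares its optimum with MSE.
def negloglik(p, sigma=0.05):
    r = y - pred(p)
    return 0.5 * np.sum(r**2) / sigma**2 + x.size * np.log(sigma)

def mse(p):
    return np.mean((y - pred(p))**2)

bounds = [(0.0, 5.0), (1e-3, 5.0)]
p_mle = minimize(negloglik, [0.5, 0.2], method="L-BFGS-B", bounds=bounds).x
p_mse = minimize(mse, [0.5, 0.2], method="L-BFGS-B", bounds=bounds).x
```

Both optimizations recover (approximately) the same \((E_{max}, EC_{50})\), since the two objectives differ only by an affine transform.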
Marginal probability
Integrates out the effect of the random effects.
\[ p_m(y | θ, x) = \int p_c(y | θ, η, x) \cdot p_{prior}(η | θ) dη \]
Average conditional probability weighted by a prior
This is a multivariate integral
Each random effect contributes one dimension (degree of freedom) along which between-subject variability in the data is accounted for.
Marginalization incentivizes each random effect to control a single smooth dimension of between-subject variability.
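For a single Gaussian random effect, the marginal likelihood integral can be sketched with Gauss–Hermite quadrature. A toy conditional model is used here so the result can be checked against a closed form (all names and values are illustrative):

```python
import numpy as np

# p_m(y|θ,x) = ∫ p_c(y|θ,η,x) · p_prior(η|θ) dη  with  η ~ N(0, ω²).
# Substituting η = √2·ω·t turns the integral into a Gauss–Hermite sum.
def marginal_lik(cond_lik, omega, n=40):
    t, w = np.polynomial.hermite.hermgauss(n)
    eta = np.sqrt(2.0) * omega * t
    return np.sum(w * cond_lik(eta)) / np.sqrt(np.pi)

# Toy conditional density: y | η ~ N(θ + η, σ²)
theta, sigma, omega, y = 1.0, 0.3, 0.5, 1.2
cond = lambda eta: np.exp(-0.5 * ((y - theta - eta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
pm = marginal_lik(cond, omega)

# Closed form for this Gaussian case: marginally y ~ N(θ, σ² + ω²)
exact = np.exp(-0.5 * (y - theta) ** 2 / (sigma**2 + omega**2)) / np.sqrt(2 * np.pi * (sigma**2 + omega**2))
```

Each additional random effect adds one quadrature dimension, which is why the integral quickly becomes expensive as the number of random effects grows.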
Classical NLME \[ \mathrm{EFF} = 1 + \frac{S_{max} \cdot C}{tvSC50 \cdot \exp(\eta) + C} \]
DeepNLME
\[ \mathrm{EFF} = 1 + \mathrm{NN}(C, \eta) \]
Data-generating function: \[ Y = \frac{E_{max} \cdot x}{EC_{50} + x} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2) \]
where \[ \begin{align} E_{max} &\sim \mathcal{U}(0.5, 1.5) \\ EC_{50} &\sim \mathrm{LogNormal}(-2, 1.0) \end{align} \]
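The data-generating process above can be simulated directly; a minimal numpy sketch (the number of subjects, observations per subject, and σ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_obs, sigma = 20, 15, 0.05
x = np.linspace(0.0, 1.0, n_obs)

# Subject-level parameters drawn from the population distributions
emax = rng.uniform(0.5, 1.5, size=n_subjects)               # Emax ~ U(0.5, 1.5)
ec50 = rng.lognormal(mean=-2.0, sigma=1.0, size=n_subjects)  # EC50 ~ LogNormal(-2, 1)

# Y = Emax·x / (EC50 + x) + ε,  ε ~ N(0, σ²); one row per subject
Y = emax[:, None] * x / (ec50[:, None] + x) + rng.normal(0, sigma, (n_subjects, n_obs))
```

The DeepNLME model is then asked to recover these subject-level Emax curves from \((x, Y)\) alone, with the two random effects \(η_1, η_2\) absorbing the between-subject variation in \(E_{max}\) and \(EC_{50}\).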
DeepNLME model: \[ \begin{align} Y &= {\color{orange}NN(x, η₁, η₂)} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)\\ η &\sim \mathcal{N}(0, I) \end{align} \]
The marginal likelihood is often intractable to compute exactly. \[ p_m(y | θ, x) = \int p_c(y | θ, η, x) \cdot p_{prior}(η | θ) dη \]
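Methods such as LaplaceI() and FOCE() approximate this integral by expanding the log-integrand around its mode \(\hat{η}\). A 1-D Laplace approximation sketch, using a Gaussian toy case where the approximation happens to be exact (all names and values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Laplace approximation: ∫ exp(h(η)) dη ≈ exp(h(η̂)) · √(2π / -h''(η̂)),
# where h(η) = log p_c(y|θ,η,x) + log p_prior(η|θ) and η̂ maximizes h.
def laplace_marginal(h, d2h, bounds=(-5.0, 5.0)):
    res = minimize_scalar(lambda e: -h(e), bounds=bounds, method="bounded")
    eta_hat = res.x
    return np.exp(h(eta_hat)) * np.sqrt(2 * np.pi / -d2h(eta_hat))

# Gaussian toy case: y|η ~ N(θ+η, σ²), η ~ N(0, ω²)  →  marginal is N(θ, σ²+ω²)
theta, sigma, omega, y = 1.0, 0.3, 0.5, 1.2
h = lambda e: (-0.5 * ((y - theta - e) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
               - 0.5 * (e / omega) ** 2 - np.log(omega * np.sqrt(2 * np.pi)))
d2h = lambda e: -1.0 / sigma**2 - 1.0 / omega**2
pm = laplace_marginal(h, d2h)

# Closed form for comparison
exact = np.exp(-0.5 * (y - theta) ** 2 / (sigma**2 + omega**2)) / np.sqrt(2 * np.pi * (sigma**2 + omega**2))
```

For non-Gaussian conditional models the approximation is no longer exact, and the estimation methods below differ mainly in how (and how accurately) they approximate this integral.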
NaivePooled()
JointMAP()
LaplaceI()
FOCE() - often our first choice
FO()
SAEM()
BayesMCMC()
MarginalMCMC()
Many of these can be wrapped in MAP() to do maximum a-posteriori estimation of the fixed effects.