And how they interact with machine learning
2025-10-12
Hierarchical
We typically define hierarchies where \(θ\) are shared (population) parameters and \(η\) are subject-specific (random) effects.
Simple
Conditional probability / Joint likelihood / MLE
Probability of the response \(y\) according to the model given specific values of \(θ\), \(η\), and \(x\).
\[ p_c(y | θ, η, x) \]
Could we fit the model by simply finding the values of \(θ\) and \(η\) that jointly maximize this probability?
For Gaussian residual error this is equivalent to minimizing a distance metric (e.g. MSE) between observed and predicted data.
Not what we do
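For Gaussian residual error the equivalence is easy to verify numerically: jointly maximizing the conditional log-likelihood gives the same estimates as minimizing the MSE. A minimal sketch with a toy Emax-type model (the model, data, and names here are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.1, 2.0, 50)
y = 1.0 * x / (0.5 + x) + rng.normal(0, 0.05, x.size)  # toy Emax-type data

def pred(p):
    emax, ec50 = p
    return emax * x / (ec50 + x)

# Gaussian conditional log-likelihood with known sigma:
# up to a constant this is SSE / (2 sigma^2), so it shares its optimum with MSE.
def negloglik(p, sigma=0.05):
    r = y - pred(p)
    return 0.5 * np.sum(r**2) / sigma**2 + x.size * np.log(sigma)

def mse(p):
    return np.mean((y - pred(p))**2)

bounds = [(0.0, 5.0), (1e-3, 5.0)]
p_mle = minimize(negloglik, [0.5, 0.2], method="L-BFGS-B", bounds=bounds).x
p_mse = minimize(mse, [0.5, 0.2], method="L-BFGS-B", bounds=bounds).x
```

Both optimizations recover (approximately) the same \((E_{max}, EC_{50})\), since the two objectives differ only by an affine transform.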
Marginal probability
Integrates out the effect of the random effects.
\[ p_m(y | θ, x) = \int p_c(y | θ, η, x) \cdot p_{prior}(η | θ) dη \]
Average conditional probability weighted by a prior
This is a multivariate integral
Each random effect contributes one dimension (degree of freedom) along which between-subject variability in the data is accounted for.
Marginalization incentivizes each random effect to control a single smooth dimension of between-subject variability.
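For a single Gaussian random effect, the marginal likelihood integral can be sketched with Gauss–Hermite quadrature. A toy conditional model is used here so the result can be checked against a closed form (all names and values are illustrative):

```python
import numpy as np

# p_m(y|θ,x) = ∫ p_c(y|θ,η,x) · p_prior(η|θ) dη  with  η ~ N(0, ω²).
# Substituting η = √2·ω·t turns the integral into a Gauss–Hermite sum.
def marginal_lik(cond_lik, omega, n=40):
    t, w = np.polynomial.hermite.hermgauss(n)
    eta = np.sqrt(2.0) * omega * t
    return np.sum(w * cond_lik(eta)) / np.sqrt(np.pi)

# Toy conditional density: y | η ~ N(θ + η, σ²)
theta, sigma, omega, y = 1.0, 0.3, 0.5, 1.2
cond = lambda eta: np.exp(-0.5 * ((y - theta - eta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
pm = marginal_lik(cond, omega)

# Closed form for this Gaussian case: marginally y ~ N(θ, σ² + ω²)
exact = np.exp(-0.5 * (y - theta) ** 2 / (sigma**2 + omega**2)) / np.sqrt(2 * np.pi * (sigma**2 + omega**2))
```

Each additional random effect adds one quadrature dimension, which is why the integral quickly becomes expensive as the number of random effects grows.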
Classical NLME \[ \mathrm{EFF} = 1 + \frac{S_{max} \cdot C}{tvSC50 \cdot \exp(\eta) + C} \]
DeepNLME
\[ \mathrm{EFF} = 1 + \mathrm{NN}(C, \eta) \]
Data-generating function: \[ Y = \frac{E_{max} \cdot x}{EC_{50} + x} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2) \]
where \[ \begin{align} E_{max} &\sim \mathcal{U}(0.5, 1.5) \\ EC_{50} &\sim \mathrm{LogNormal}(-2, 1.0) \end{align} \]
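The data-generating process above can be simulated directly; a minimal numpy sketch (the number of subjects, observations per subject, and σ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_obs, sigma = 20, 15, 0.05
x = np.linspace(0.0, 1.0, n_obs)

# Subject-level parameters drawn from the population distributions
emax = rng.uniform(0.5, 1.5, size=n_subjects)               # Emax ~ U(0.5, 1.5)
ec50 = rng.lognormal(mean=-2.0, sigma=1.0, size=n_subjects)  # EC50 ~ LogNormal(-2, 1)

# Y = Emax·x / (EC50 + x) + ε,  ε ~ N(0, σ²); one row per subject
Y = emax[:, None] * x / (ec50[:, None] + x) + rng.normal(0, sigma, (n_subjects, n_obs))
```

The DeepNLME model is then asked to recover these subject-level Emax curves from \((x, Y)\) alone, with the two random effects \(η_1, η_2\) absorbing the between-subject variation in \(E_{max}\) and \(EC_{50}\).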
DeepNLME model: \[ \begin{align} Y &= {\color{orange}NN(x, η₁, η₂)} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)\\ η &\sim \mathcal{N}(0, I) \end{align} \]
The marginal likelihood is often intractable to compute exactly. \[ p_m(y | θ, x) = \int p_c(y | θ, η, x) \cdot p_{prior}(η | θ) dη \]
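Methods such as LaplaceI() and FOCE() approximate this integral by expanding the log-integrand around its mode \(\hat{η}\). A 1-D Laplace approximation sketch, using a Gaussian toy case where the approximation happens to be exact (all names and values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Laplace approximation: ∫ exp(h(η)) dη ≈ exp(h(η̂)) · √(2π / -h''(η̂)),
# where h(η) = log p_c(y|θ,η,x) + log p_prior(η|θ) and η̂ maximizes h.
def laplace_marginal(h, d2h, bounds=(-5.0, 5.0)):
    res = minimize_scalar(lambda e: -h(e), bounds=bounds, method="bounded")
    eta_hat = res.x
    return np.exp(h(eta_hat)) * np.sqrt(2 * np.pi / -d2h(eta_hat))

# Gaussian toy case: y|η ~ N(θ+η, σ²), η ~ N(0, ω²)  →  marginal is N(θ, σ²+ω²)
theta, sigma, omega, y = 1.0, 0.3, 0.5, 1.2
h = lambda e: (-0.5 * ((y - theta - e) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
               - 0.5 * (e / omega) ** 2 - np.log(omega * np.sqrt(2 * np.pi)))
d2h = lambda e: -1.0 / sigma**2 - 1.0 / omega**2
pm = laplace_marginal(h, d2h)

# Closed form for comparison
exact = np.exp(-0.5 * (y - theta) ** 2 / (sigma**2 + omega**2)) / np.sqrt(2 * np.pi * (sigma**2 + omega**2))
```

For non-Gaussian conditional models the approximation is no longer exact, and the estimation methods below differ mainly in how (and how accurately) they approximate this integral.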
NaivePooled()
JointMAP()
LaplaceI()
FOCE() - often our first choice
FO()
SAEM()
BayesMCMC()
MarginalMCMC()
Many of these can be wrapped in MAP() to do maximum a-posteriori estimation of the fixed effects.