using Pumas
using ModelingToolkit
using StructuralIdentifiability
using AlgebraOfGraphics
using CairoMakie
using DataFramesMeta
using SummaryTables
import Logging
Structural Identifiability Analysis
Which Parameters Can We Estimate from Clinical Measurements?
This is Tutorial 3 of 6 in the HBV series.
| # | Tutorial | Status |
|---|---|---|
| 1 | Model Introduction | ✓ Complete |
| 2 | Global Sensitivity Analysis | ✓ Complete |
| 3 | Structural Identifiability (current) | ← You are here |
| 4 | Copula Vpop Generation | |
| 5 | MILP Calibration | |
| 6 | VCT Simulation |
1 Learning Outcomes
For the theory behind structural identifiability (definitions, classifications, why it matters before estimation), see TB-03: Structural Identifiability.
In this tutorial, you will learn:
- How to handle non-rational ODE terms that
StructuralIdentifiability.jlcannot process directly - How to reformulate a Hill function with a non-integer exponent into a rational system via an auxiliary state variable
- How different measurement scenarios affect identifiability in a large QSP model (11 ODEs, 9 estimated parameters)
2 Libraries
3 Part 1: Rational Reformulation of the HBV Model
3.1 The Non-Rational Function Problem
The HBV model from HBV-01 contains a Hill function in the X' equation with a non-integer exponent nS = 0.486:
X' = r_X * (1 - X) * (I / (ϕ_E + I)) *
(1 - S_max * (HBsAg_ug_mL^nS / (ϕ_S^nS + HBsAg_ug_mL^nS))) - d_X * Xwhere HBsAg_ug_mL = c * (V + S) is a linear function of state variables V and S and c = 96 · 24000 · 1e6 / 6.022e23 is a known unit-conversion constant.
StructuralIdentifiability.jl requires the ODE right-hand side to be a quotient of polynomials (i.e., rational). The term (V + S)^0.486 violates this requirement.
3.2 Auxiliary Variable Trick
To resolve this problem, we introduce an auxiliary state variable:
\[H(t) = \bigl(c \cdot (V(t) + S(t))\bigr)^{n_S}\]
By the chain rule:
\[H'(t) = n_S \bigl(c \cdot (V(t) + S(t))\bigr)^{n_S - 1} \cdot c \cdot (V'(t) + S'(t)) = n_S \cdot H(t) \cdot \frac{V'(t) + S'(t)}{V(t) + S(t)}\]
Substituting the ODEs for \(V(t)\) (\(V'(t) = p_V \epsilon_\text{NA} \epsilon_\text{IFN} I(t) - d_V V(t)\)) and \(S(t)\) (\(S'(t) = p_S I(t) - d_V S(t)\)):
\[H'(t) = n_S \cdot H \cdot \frac{(p_V \epsilon_\text{NA} \epsilon_\text{IFN} + p_S) I(t) - d_V (V(t) + S(t))}{V(t) + S(t)}\]
This right-hand side is rational in \(H(t)\), \(I(t)\), \(V(t)\), \(S(t)\) and the parameters.
The Hill function in the differential equation for \(X'(t)\) becomes \(H(t) / (\psi + H(t))\) where \(\psi = \phi_S^{n_S}\) is a known constant — also rational.
3.3 Model Definition
We reuse the HBV model from HBV-01 with two modifications:
- The Hill function in
X'is replaced byH / (ψ + H) - An ODE for the auxiliary variable
His added
As in HBV-02, we use I and P directly without the Ieff/Peff threshold switch as such conditionals are not supported by StructuralIdentifiability.jl.
hbv_si_model = @model begin
@param begin
# Estimated parameters (log10 scale)
"Production/synthesis rate of HBsAg subviral particles from infected hepatocytes [molec./cell/d]"
tv_log10_p_S ∈ RealDomain(; lower = 2, upper = 12)
"HBV infectivity [mL/virus/d]"
tv_log10_β ∈ RealDomain(; lower = -10, upper = 0)
"Killing rate of infected hepatocytes by cellular immunity [mL/cells/d]"
tv_log10_m ∈ RealDomain(; lower = -3, upper = 5)
"Synthesis rate of antigen presenting cells [cells/d]"
tv_log10_k_P ∈ RealDomain()
"Synthesis rate of ALT from debris hepatocytes [10^3 U/cells/d]"
tv_log10_k_A ∈ RealDomain()
"PD efficacy of NA on HBV production []"
tv_log10_ϵ_NA ∈ RealDomain(; lower = -5, upper = 0)
"PD efficacy of IFNα on HBV production []"
tv_log10_ϵ_IFN ∈ RealDomain(; lower = -5, upper = 0)
"PD efficacy of IFNα on activation of cellular immunity []"
tv_log10_r_E_IFN ∈ RealDomain(; lower = 0, upper = 4)
"Scaling factor for observed effector CD4/8 T cell numbers []"
tv_log10_conv_E ∈ RealDomain()
# IIV of estimated parameters in log10 scale
Ω ∈ PDiagDomain(9)
# Fixed parameters
"Baseline of HBV [copies/mL]"
ini_V ∈ RealDomain(; lower = 0.0)
"Baseline of ALT [U/L]"
ini_A ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of uninfected hepatocytes [1/d]"
r_T ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of cellular immunity [1/d]"
r_E ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of innate immunity [1/d]"
r_X ∈ RealDomain(; lower = 0.0)
"Fraction of cellular immunity that killing uninfected and refractory cells []"
n ∈ RealDomain(; lower = 0.0)
"Max. rate of conversion of infected cells to refractory cells due to the action of innate immunity [1/d]"
ρ ∈ RealDomain(; lower = 0.0)
"Max. rate of conversion of refractory cells back to target cells in the absence of innate immune responses []"
ρ_o_ρ_r ∈ RealDomain(; lower = 0.0)
"Production/synthesis rate of HBV from infected hepatocytes [virus/cell/d]"
p_V ∈ RealDomain(; lower = 0.0)
"Death rate of infected hepatocytes [1/d]"
d_TI ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of HBV HBsAg subviral particles [1/d]"
d_V ∈ RealDomain(; lower = 0.0)
"Rate of systemic induction/recovery from exhaustion [1/d]"
d_Q ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of innate immunity [1/d]"
d_X ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of cellular immunity [1/d]"
d_E ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of antigen presenting cells [1/d]"
d_P ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of debris [1/d]"
d_D ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of ALT [1/d]"
d_A ∈ RealDomain(; lower = 0.0)
"Max. rate of exhaustion [1/d]"
d_E_Q ∈ RealDomain(; lower = 0.0)
"Hepatocyte carrying capacity in the liver [cells/mL]"
T_max ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for the effect of systemic exhaustion []"
ϕ_Q ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for the activation of cellular immunity [cells/mL]"
ϕ_E ∈ RealDomain(; lower = 0.0)
"Max. inhibitory effect of HBsAg subviral particles on innate immunity []"
S_max ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for inhibitory effect of HBsAg subviral particles on the generation of innate immunity [μg/mL]"
ϕ_S ∈ RealDomain(; lower = 0.0)
"Hill coefficient for describing HBsAg-induced inhibition of innate immunity []"
nS ∈ RealDomain()
# Residual errors
σ_add_log10_HBsAg ∈ RealDomain(lower = 0.0)
σ_prop_log10_HBsAg ∈ RealDomain(lower = 0.0)
σ_add_log10_HBV ∈ RealDomain(lower = 0.0)
σ_prop_log10_HBV ∈ RealDomain(lower = 0.0)
σ_prop_log10_ALT ∈ RealDomain(lower = 0.0)
σ_prop_log10_CD8 ∈ RealDomain(lower = 0.0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates NA IFN
@pre begin
# Logit-normal distribution of log10(p_S), scaled to range [2, 12]
p_S = exp10(10 * logistic(logit((tv_log10_p_S - 2) / 10) + η[1]) + 2)
# Logit-normal distribution of log10(β), scaled to range [-10, 0]
β = exp10(10 * (logistic(logit(tv_log10_β / 10 + 1) + η[2]) - 1))
# Logit-normal distribution of log10(m), scaled to range [-3, 5]
m = exp10(8 * logistic(logit((tv_log10_m + 3) / 8) + η[3]) - 3)
# Normal distribution of log10(k_P)
k_P = exp10(tv_log10_k_P + η[4])
# Normal distribution of log10(k_A)
k_A = exp10(tv_log10_k_A + η[5])
# Logit-normal distribution of log10(ϵ_NA), scaled to range [-5, 0]
ϵ_NA = exp10(NA * 5 * (logistic(logit(tv_log10_ϵ_NA / 5 + 1) + η[6]) - 1))
# Logit-normal distribution of log10(ϵ_IFN), scaled to range [-5, 0]
ϵ_IFN = exp10(IFN * 5 * (logistic(logit(tv_log10_ϵ_IFN / 5 + 1) + η[7]) - 1))
# Logit-normal distribution of log10(r_E_IFN), scaled to range [0, 4]
r_E_IFN = exp10(IFN * 4 * logistic(logit(tv_log10_r_E_IFN / 4) + η[8]))
# Normal distribution
conv_E = exp10(tv_log10_conv_E + η[9])
# Precompute ψ = ϕ_S^nS for the rational Hill function
ψ = ϕ_S^nS
end
@init begin
V = ini_V
I = d_V * ini_V / p_V
T = T_max - d_V * ini_V / p_V
S = p_S * ini_V / p_V
A = ini_A
H = (96 * 24000.0e6 / 6.022e23 * (ini_V + p_S * ini_V / p_V))^nS
end
@dynamics begin
T' = -β * V * T + r_T * T * (1 - (T + I + R) / T_max) -
d_TI * T + ρ_o_ρ_r * R - (m / n) * r_E_IFN * E * T
I' = β * V * T + r_T * I * (1 - (T + I + R) / T_max) -
ρ * X * I - d_TI * I - m * r_E_IFN * E * I
R' = ρ * X * I - ρ_o_ρ_r * R + r_T * R * (1 - (T + I + R) / T_max) -
d_TI * R - (m / n) * r_E_IFN * E * R
D' = m * r_E_IFN * E * I + (m / n) * r_E_IFN * E * (T + R) - d_D * D
A' = d_A * (ini_A - A) + k_A * D
S' = p_S * I - d_V * S
V' = p_V * I * ϵ_NA * ϵ_IFN - d_V * V
P' = k_P * (I / (ϕ_E + I)) - d_P * P
E' = (d_E + r_E * E) * P - d_E_Q * E * (Q^4 / (ϕ_Q^4 + Q^4)) - d_E * E
Q' = d_Q * (P - Q)
# MODIFIED: rational Hill function via auxiliary variable H
X' = r_X * (1 - X) * (I / (ϕ_E + I)) *
(1 - S_max * (H / (ψ + H))) - d_X * X
# NEW: auxiliary variable H = (c * (V + S))^nS
H' = nS * H * ((p_V * ϵ_NA * ϵ_IFN + p_S) * I - d_V * (V + S)) / (V + S)
end
@vars begin
f_molec2ug := 24000 * 1.0e6 / 6.022e23
HBsAg_ug_mL := 96 * (V + S) * f_molec2ug
end
# Error models: Values are log10-transformed
@derived begin
"HBsAg [IU/mL]"
log10_HBsAg ~ @. CombinedNormal(log10(max(0, HBsAg_ug_mL) / 0.98e-3), σ_add_log10_HBsAg, σ_prop_log10_HBsAg)
"Viral load [copies/mL]"
log10_HBV ~ @. CombinedNormal(log10(max(0, V)), σ_add_log10_HBV, σ_prop_log10_HBV)
"ALT [U/L]"
log10_ALT ~ @. ProportionalNormal(log10(max(0, A)), σ_prop_log10_ALT)
"CD8+ T cell counts [cells/mL]"
log10_CD8 ~ @. ProportionalNormal(log10(conv_E * max(0, E)), σ_prop_log10_CD8)
end
endPumasModel
Parameters: tv_log10_p_S, tv_log10_β, tv_log10_m, tv_log10_k_P, tv_log10_k_A, tv_log10_ϵ_NA, tv_log10_ϵ_IFN, tv_log10_r_E_IFN, tv_log10_conv_E, Ω, ini_V, ini_A, r_T, r_E, r_X, n, ρ, ρ_o_ρ_r, p_V, d_TI, d_V, d_Q, d_X, d_E, d_P, d_D, d_A, d_E_Q, T_max, ϕ_Q, ϕ_E, S_max, ϕ_S, nS, σ_add_log10_HBsAg, σ_prop_log10_HBsAg, σ_add_log10_HBV, σ_prop_log10_HBV, σ_prop_log10_ALT, σ_prop_log10_CD8
Random effects: η
Covariates: NA, IFN
Dynamical system variables: T, I, R, D, A, S, V, P, E, Q, X, H
Dynamical system type: Nonlinear ODE
Derived: log10_HBsAg, log10_HBV, log10_ALT, log10_CD8
Observed: log10_HBsAg, log10_HBV, log10_ALT, log10_CD8
3.4 Extract ODE System
ode_system = hbv_si_model.sysModel ##Pumas#232: Equations (12): 12 standard: see equations(##Pumas#232) Unknowns (12): see unknowns(##Pumas#232) T(t) I(t) R(t) D(t) ⋮ Parameters (33): see parameters(##Pumas#232) ini_V ini_A r_T r_E ⋮
4 Part 2: Running Identifiability Analysis
4.1 Define Measurement Scenarios
We define three clinically motivated scenarios. Monotonic transformations (log₁₀) and known scaling constants do not affect structural identifiability, so we use the raw state variables as measurements.
t = ModelingToolkit.independent_variable(ode_system)
@variables y_HBsAg(t) y_HBV(t) y_ALT(t) y_CD8(t)Scenario A (Routine clinical): HBsAg + HBV DNA
scenario_A = [
y_HBsAg ~ ode_system.V + ode_system.S
y_HBV ~ ode_system.V
]Scenario B (Standard + ALT): HBsAg + HBV DNA + ALT
scenario_B = [
y_HBsAg ~ ode_system.V + ode_system.S
y_HBV ~ ode_system.V
y_ALT ~ ode_system.A
]Scenario C (All observables): HBsAg + HBV DNA + ALT + CD8
scenario_C = [
y_HBsAg ~ ode_system.V + ode_system.S
y_HBV ~ ode_system.V
y_ALT ~ ode_system.A
y_CD8 ~ ode_system.conv_E * ode_system.E
]4.2 Estimated Parameters
We are interested in studying the identifiability of the estimated parameters:
estimated_params = [
ode_system.p_S, ode_system.β, ode_system.m, ode_system.k_A, ode_system.k_P, ode_system.ϵ_NA, ode_system.ϵ_IFN, ode_system.r_E_IFN, ode_system.conv_E,
]9-element Vector{Num}:
p_S
β
m
k_A
k_P
ϵ_NA
ϵ_IFN
r_E_IFN
conv_E
4.3 Known Parameters as Measurements
All fixed parameters are assumed to be known (measured or taken from literature). To communicate this to StructuralIdentifiability.jl, we include them as measured quantities.
@variables begin
y_ini_V
y_ini_A
y_r_T
y_r_E
y_r_X
y_n
y_ρ
y_ρ_o_ρ_r
y_p_V
y_d_TI
y_d_V
y_d_Q
y_d_X
y_d_E
y_d_P
y_d_D
y_d_A
y_d_E_Q
y_T_max
y_ϕ_Q
y_ϕ_E
y_S_max
y_nS
y_ψ
end
measured_fixed = [
y_ini_V ~ ode_system.ini_V,
y_ini_A ~ ode_system.ini_A,
y_r_T ~ ode_system.r_T,
y_r_E ~ ode_system.r_E,
y_r_X ~ ode_system.r_X,
y_n ~ ode_system.n,
y_ρ ~ ode_system.ρ,
y_ρ_o_ρ_r ~ ode_system.ρ_o_ρ_r,
y_p_V ~ ode_system.p_V,
y_d_TI ~ ode_system.d_TI,
y_d_V ~ ode_system.d_V,
y_d_Q ~ ode_system.d_Q,
y_d_X ~ ode_system.d_X,
y_d_E ~ ode_system.d_E,
y_d_P ~ ode_system.d_P,
y_d_D ~ ode_system.d_D,
y_d_A ~ ode_system.d_A,
y_d_E_Q ~ ode_system.d_E_Q,
y_T_max ~ ode_system.T_max,
y_ϕ_Q ~ ode_system.ϕ_Q,
y_ϕ_E ~ ode_system.ϕ_E,
y_S_max ~ ode_system.S_max,
y_nS ~ ode_system.nS,
y_ψ ~ ode_system.ψ,
]4.4 Run Local Identifiability Analysis
For a 12-state, 9-parameter system, we start with assess_local_identifiability (rank-based, fast) before attempting the more expensive global analysis.
result_local_A = assess_local_identifiability(
ode_system;
measured_quantities = vcat(scenario_A, measured_fixed),
funcs_to_check = estimated_params,
loglevel = Logging.Warn,
)OrderedCollections.OrderedDict{Num, Bool} with 9 entries:
p_S => 1
β => 1
m => 0
k_A => 0
k_P => 1
ϵ_NA => 0
ϵ_IFN => 0
r_E_IFN => 0
conv_E => 0
result_local_B = assess_local_identifiability(
ode_system;
measured_quantities = vcat(scenario_B, measured_fixed),
funcs_to_check = estimated_params,
loglevel = Logging.Warn,
)OrderedCollections.OrderedDict{Num, Bool} with 9 entries:
p_S => 1
β => 1
m => 0
k_A => 1
k_P => 1
ϵ_NA => 0
ϵ_IFN => 0
r_E_IFN => 0
conv_E => 0
result_local_C = assess_local_identifiability(
ode_system;
measured_quantities = vcat(scenario_C, measured_fixed),
funcs_to_check = estimated_params,
loglevel = Logging.Warn,
)OrderedCollections.OrderedDict{Num, Bool} with 9 entries:
p_S => 1
β => 1
m => 0
k_A => 1
k_P => 1
ϵ_NA => 0
ϵ_IFN => 0
r_E_IFN => 0
conv_E => 1
5 Part 3: Interpreting Results
5.1 Compile Results
function result_to_df(result, scenario_name)
return DataFrame(
parameter = [string(p) for p in keys(result)],
local_identifiable = collect(values(result)),
scenario = scenario_name,
)
end
all_results = vcat(
result_to_df(result_local_A, "A: HBsAg + DNA"),
result_to_df(result_local_B, "B: + ALT"),
result_to_df(result_local_C, "C: + CD8"),
)
simple_table(
all_results,
[
:parameter => "Parameter",
:local_identifiable => "Locally identifiable",
:scenario => "Measurement scenario",
],
)| Parameter | Locally identifiable | Measurement scenario |
| p_S | true | A: HBsAg + DNA |
| β | true | A: HBsAg + DNA |
| m | false | A: HBsAg + DNA |
| k_A | false | A: HBsAg + DNA |
| k_P | true | A: HBsAg + DNA |
| ϵ_NA | false | A: HBsAg + DNA |
| ϵ_IFN | false | A: HBsAg + DNA |
| r_E_IFN | false | A: HBsAg + DNA |
| conv_E | false | A: HBsAg + DNA |
| p_S | true | B: + ALT |
| β | true | B: + ALT |
| m | false | B: + ALT |
| k_A | true | B: + ALT |
| k_P | true | B: + ALT |
| ϵ_NA | false | B: + ALT |
| ϵ_IFN | false | B: + ALT |
| r_E_IFN | false | B: + ALT |
| conv_E | false | B: + ALT |
| p_S | true | C: + CD8 |
| β | true | C: + CD8 |
| m | false | C: + CD8 |
| k_A | true | C: + CD8 |
| k_P | true | C: + CD8 |
| ϵ_NA | false | C: + CD8 |
| ϵ_IFN | false | C: + CD8 |
| r_E_IFN | false | C: + CD8 |
| conv_E | true | C: + CD8 |
6 Part 4: Visualization
6.1 Identifiability Status by Scenario
specs = data(all_results) *
mapping(
:parameter => "Parameter",
:scenario => sorter("C: + CD8", "B: + ALT", "A: HBsAg + DNA") => "Measurement Scenario",
color = :local_identifiable => sorter(true, false) => "Locally identifiable",
) * visual(Scatter; markersize = 30, marker = :rect)
draw(
specs,
scales(Color = (; palette = [:orange, :red]));
axis = (; xticklabelrotation = π / 4),
)6.2 Summary: Impact of Measurements
result_counts = @by all_results [:scenario, :local_identifiable] begin
:count = length(:parameter)
end
status_mapping = :local_identifiable => sorter(true, false) => "Locally identifiable"
specs = data(result_counts) * mapping(
:scenario => sorter("A: HBsAg + DNA", "B: + ALT", "C: + CD8") => "Measurement Scenario",
:count => "Number of Parameters",
stack = status_mapping,
color = status_mapping,
) * visual(BarPlot)
draw(specs, scales(Color = (; palette = [:green, :orange, :red])))7 Summary
7.1 Key Takeaways
Non-rational functions (e.g., Hill functions with non-integer exponents) can be handled by introducing auxiliary state variables whose derivatives are rational
Measurement scenario matters: Adding clinical biomarkers (ALT, CD8 counts) can make previously non-identifiable parameters identifiable
Local analysis first: For complex QSP models, start with
assess_local_identifiability(fast screening) before running the more expensive global analysis
7.2 Quick Reference: Key Functions
| Function | Purpose |
|---|---|
assess_local_identifiability() |
Fast local analysis (identifiable yes/no) |
7.3 What’s Next?
In Tutorial 4: Copula Vpop Generation, we’ll generate a virtual population for the HBV model using Gaussian copula sampling.