using Pumas
using Copulas
using AlgebraOfGraphics
using CairoMakie
using PairPlots
using StatsBase
using SummaryTables
using LinearAlgebra
using Random
using Statistics
# Set seed for reproducibility
Random.seed!(42)
Virtual Population Generation using Gaussian Copulas
Generating Realistic HBV Virtual Patients
This is Tutorial 4 of 6 in the HBV series.
| # | Tutorial | Status |
|---|---|---|
| 1 | Model Introduction | ✓ Complete |
| 2 | Global Sensitivity Analysis | ✓ Complete |
| 3 | Structural Identifiability | ✓ Complete |
| 4 | Copula Vpop Generation (current) | ← You are here |
| 5 | MILP Calibration | |
| 6 | VCT Simulation |
1 Learning Outcomes
In this tutorial, you will learn how to generate a virtual population for the HBV model using Gaussian copula sampling. For the theory behind copulas and why correlated sampling matters, see TB-04: Copula Vpop Generation.
In this tutorial, you will learn:
- How to specify the 8 correlated HBV parameters and their NLME-estimated correlations (Cortés‐Ríos et al., 2025)
- How to apply the Gaussian copula algorithm to a complex QSP model with 9 random effects
- How to validate that the generated population preserves target correlations and marginal distributions
- How to visualize the virtual population characteristics
2 Libraries
3 Part 1: HBV Model and Parameter Specifications
3.1 Model
The HBV QSP model (Cortés‐Ríos et al., 2025) is an 11-ODE system with 9 estimated parameters and 23 fixed parameters. For details on the model biology and parameters, see HBV-01: Model Introduction.
hbv_model = @model begin
@param begin
# Estimated parameters (log10 scale)
"Production/synthesis rate of HBsAg subviral particles from infected hepatocytes [molec./cell/d]"
tv_log10_p_S ∈ RealDomain(; lower = 2, upper = 12)
"HBV infectivity [mL/virus/d]"
tv_log10_β ∈ RealDomain(; lower = -10, upper = 0)
"Killing rate of infected hepatocytes by cellular immunity [mL/cells/d]"
tv_log10_m ∈ RealDomain(; lower = -3, upper = 5)
"Synthesis rate of antigen presenting cells [cells/d]"
tv_log10_k_P ∈ RealDomain()
"Synthesis rate of ALT from debris hepatocytes [10^3 U/cells/d]"
tv_log10_k_A ∈ RealDomain()
"PD efficacy of NA on HBV production []"
tv_log10_ϵ_NA ∈ RealDomain(; lower = -5, upper = 0)
"PD efficacy of IFNα on HBV production []"
tv_log10_ϵ_IFN ∈ RealDomain(; lower = -5, upper = 0)
"PD efficacy of IFNα on activation of cellular immunity []"
tv_log10_r_E_IFN ∈ RealDomain(; lower = 0, upper = 4)
"Scaling factor for observed effector CD4/8 T cell numbers []"
tv_log10_conv_E ∈ RealDomain()
# IIV of estimated parameters in log10 scale
Ω ∈ PDiagDomain(9)
# Fixed parameters
"Baseline of HBV [copies/mL]"
ini_V ∈ RealDomain(; lower = 0.0)
"Baseline of ALT [U/L]"
ini_A ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of uninfected hepatocytes [1/d]"
r_T ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of cellular immunity [1/d]"
r_E ∈ RealDomain(; lower = 0.0)
"Max. proliferation rate of innate immunity [1/d]"
r_X ∈ RealDomain(; lower = 0.0)
"Fraction of cellular immunity that killing uninfected and refractory cells []"
n ∈ RealDomain(; lower = 0.0)
"Max. rate of conversion of infected cells to refractory cells due to the action of innate immunity [1/d]"
ρ ∈ RealDomain(; lower = 0.0)
"Max. rate of conversion of refractory cells back to target cells in the absence of innate immune responses []"
ρ_o_ρ_r ∈ RealDomain(; lower = 0.0)
"Production/synthesis rate of HBV from infected hepatocytes [virus/cell/d]"
p_V ∈ RealDomain(; lower = 0.0)
"Death rate of infected hepatocytes [1/d]"
d_TI ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of HBV HBsAg subviral particles [1/d]"
d_V ∈ RealDomain(; lower = 0.0)
"Rate of systemic induction/recovery from exhaustion [1/d]"
d_Q ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of innate immunity [1/d]"
d_X ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of cellular immunity [1/d]"
d_E ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of antigen presenting cells [1/d]"
d_P ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of debris [1/d]"
d_D ∈ RealDomain(; lower = 0.0)
"Degradation/clearance of ALT [1/d]"
d_A ∈ RealDomain(; lower = 0.0)
"Max. rate of exhaustion [1/d]"
d_E_Q ∈ RealDomain(; lower = 0.0)
"Hepatocyte carrying capacity in the liver [cells/mL]"
T_max ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for the effect of systemic exhaustion []"
ϕ_Q ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for the activation of cellular immunity [cells/mL]"
ϕ_E ∈ RealDomain(; lower = 0.0)
"Max. inhibitory effect of HBsAg subviral particles on innate immunity []"
S_max ∈ RealDomain(; lower = 0.0)
"Half-maximal (saturation) parameter for inhibitory effect of HBsAg subviral particles on the generation of innate immunity [μg/mL]"
ϕ_S ∈ RealDomain(; lower = 0.0)
"Hill coefficient for describing HBsAg-induced inhibition of innate immunity []"
nS ∈ RealDomain()
"Threshold of infected hepatocytes to produce dynamical switch from progressive disease to long-lasting control of HBV infection [cells/mL]"
I_threshold ∈ RealDomain(; lower = 0.0)
# Residual errors
σ_add_log10_HBsAg ∈ RealDomain(lower = 0.0)
σ_prop_log10_HBsAg ∈ RealDomain(lower = 0.0)
σ_add_log10_HBV ∈ RealDomain(lower = 0.0)
σ_prop_log10_HBV ∈ RealDomain(lower = 0.0)
σ_prop_log10_ALT ∈ RealDomain(lower = 0.0)
σ_prop_log10_CD8 ∈ RealDomain(lower = 0.0)
end
@random begin
η ~ MvNormal(Ω)
end
# Time-varying covariates for treatment phases
@covariates NA IFN
@pre begin
# Individual parameters
# Logit-normal distribution of log10(p_S), scaled to range [2, 12]
p_S = exp10(10 * logistic(logit((tv_log10_p_S - 2) / 10) + η[1]) + 2)
# Logit-normal distribution of log10(β), scaled to range [-10, 0]
β = exp10(10 * (logistic(logit(tv_log10_β / 10 + 1) + η[2]) - 1))
# Logit-normal distribution of log10(m), scaled to range [-3, 5]
m = exp10(8 * logistic(logit((tv_log10_m + 3) / 8) + η[3]) - 3)
# Normal distribution of log10(k_P)
k_P = exp10(tv_log10_k_P + η[4])
# Normal distribution of log10(k_A)
k_A = exp10(tv_log10_k_A + η[5])
# Logit-normal distribution of log10(ϵ_NA), scaled to range [-5, 0]
ϵ_NA = exp10(NA * 5 * (logistic(logit(tv_log10_ϵ_NA / 5 + 1) + η[6]) - 1))
# Logit-normal distribution of log10(ϵ_IFN), scaled to range [-5, 0]
ϵ_IFN = exp10(IFN * 5 * (logistic(logit(tv_log10_ϵ_IFN / 5 + 1) + η[7]) - 1))
# Logit-normal distribution of log10(r_E_IFN), scaled to range [0, 4]
r_E_IFN = exp10(IFN * 4 * logistic(logit(tv_log10_r_E_IFN / 4) + η[8]))
# Normal distribution
conv_E = exp10(tv_log10_conv_E + η[9])
end
# Initial conditions
@init begin
V = ini_V
I = d_V * V / p_V
T = T_max - I
S = p_S * I / d_V
A = ini_A
end
@dynamics begin
# Target uninfected hepatocytes
T' = -β * V * T + r_T * T * (1 - (T + Ieff + R) / T_max) -
d_TI * T + ρ_o_ρ_r * R - (m / n) * r_E_IFN * E * T
# Infected hepatocytes
I' = β * V * T + r_T * I * (1 - (T + I + R) / T_max) -
ρ * X * I - d_TI * I - m * r_E_IFN * E * I
# Resistant hepatocytes
R' = ρ * X * Ieff - ρ_o_ρ_r * R + r_T * R * (1 - (T + Ieff + R) / T_max) -
d_TI * R - (m / n) * r_E_IFN * E * R
# Dead cell marker
D' = m * r_E_IFN * E * Ieff + (m / n) * r_E_IFN * E * (T + R) - d_D * D
# Immune response (ALT proxy)
A' = d_A * (ini_A - A) + k_A * D
# HBsAg dynamics
S' = p_S * Ieff - d_V * S
# Viral load (NA and IFN reduce production)
V' = p_V * Ieff * ϵ_NA * ϵ_IFN - d_V * V
# Dendritic cell activation
P' = k_P * (Ieff / (ϕ_E + Ieff)) - d_P * P
# Effector T cells
E' = (d_E + r_E * E) * Peff - d_E_Q * E * (Q^4 / (ϕ_Q^4 + Q^4)) - d_E * E
# Delayed effector signal
Q' = d_Q * (Peff - Q)
# Cytotoxic effect
X' = r_X * (1 - X) * (Ieff / (ϕ_E + Ieff)) *
(1 - S_max * (HBsAg_ug_mL^nS / (ϕ_S^nS + HBsAg_ug_mL^nS))) - d_X * X
end
@vars begin
# Switch between progressive infection (I ≥ I_threshold) and clearance of infection (I < I_threshold)
Ieff = ifelse(I < I_threshold, zero(I), I)
Peff = ifelse(I < I_threshold, zero(P), P)
# HBsAg [μg/mL]: Each particle of V and S is assumed to harbor 96 molecules of S envelope protein
f_molec2ug := 24000 * 1.0e6 / 6.022e23
HBsAg_ug_mL := 96 * (V + S) * f_molec2ug
end
# Error models: Values are log10-transformed
@derived begin
"HBsAg [IU/mL]"
log10_HBsAg ~ @. CombinedNormal(log10(HBsAg_ug_mL / 0.98e-3), σ_add_log10_HBsAg, σ_prop_log10_HBsAg)
"Viral load [copies/mL]"
log10_HBV ~ @. CombinedNormal(log10(V), σ_add_log10_HBV, σ_prop_log10_HBV)
"ALT [U/L]"
log10_ALT ~ @. ProportionalNormal(log10(A), σ_prop_log10_ALT)
"CD8+ T cell counts [cells/mL]"
log10_CD8 ~ @. ProportionalNormal(log10(conv_E * E), σ_prop_log10_CD8)
end
endPumasModel
Parameters: tv_log10_p_S, tv_log10_β, tv_log10_m, tv_log10_k_P, tv_log10_k_A, tv_log10_ϵ_NA, tv_log10_ϵ_IFN, tv_log10_r_E_IFN, tv_log10_conv_E, Ω, ini_V, ini_A, r_T, r_E, r_X, n, ρ, ρ_o_ρ_r, p_V, d_TI, d_V, d_Q, d_X, d_E, d_P, d_D, d_A, d_E_Q, T_max, ϕ_Q, ϕ_E, S_max, ϕ_S, nS, I_threshold, σ_add_log10_HBsAg, σ_prop_log10_HBsAg, σ_add_log10_HBV, σ_prop_log10_HBV, σ_prop_log10_ALT, σ_prop_log10_CD8
Random effects: η
Covariates: NA, IFN
Dynamical system variables: T, I, R, D, A, S, V, P, E, Q, X
Dynamical system type: Nonlinear ODE
Derived: log10_HBsAg, log10_HBV, log10_ALT, log10_CD8, Ieff, Peff
Observed: log10_HBsAg, log10_HBV, log10_ALT, log10_CD8, Ieff, Peff
3.2 Parameter Values
From HBV-01 and Cortés‐Ríos et al. (2025):
hbv_fixed_params = (
ini_V = exp10(-0.481),
ini_A = exp10(1.25),
r_T = 1.0,
r_E = 0.1,
r_X = 1.0,
n = 100,
ρ = 0.001,
ρ_o_ρ_r = 1.0e-5,
p_V = 100.0,
d_TI = 0.0039,
d_V = 0.67,
d_Q = 0.00385,
d_X = 0.2,
d_E = 0.01,
d_P = 0.239,
d_D = 1.66,
d_A = 0.47,
d_E_Q = 0.3,
T_max = 1.36e7,
ϕ_Q = 0.8,
ϕ_E = 100.0,
S_max = 1.0,
ϕ_S = 0.1473,
nS = 0.486,
I_threshold = 1.0e-4,
)The 9 estimated parameters and their IIV from the NLME fit:
# IIV standard deviations (on logit/log10 scale)
ω_p_S = 0.632
ω_β = 2.489
ω_m = 0.827
ω_k_P = 1.292
ω_k_A = 1.156
ω_ϵ_NA = 1.583
ω_ϵ_IFN = 1.626
ω_r_E_IFN = 2.739
ω_conv_E = 1.248
hbv_estimated_params = (;
tv_log10_p_S = 7.643,
tv_log10_β = -4.358,
tv_log10_m = 1.618,
tv_log10_k_P = -3.968,
tv_log10_k_A = -4.869,
tv_log10_ϵ_NA = -1.485,
tv_log10_ϵ_IFN = -1.466,
tv_log10_r_E_IFN = 0.313,
tv_log10_conv_E = 2.745,
Ω = Diagonal([ω_p_S^2, ω_β^2, ω_m^2, ω_k_P^2, ω_k_A^2, ω_ϵ_NA^2, ω_ϵ_IFN^2, ω_r_E_IFN^2, ω_conv_E^2]),
σ_add_log10_HBsAg = 0.0972,
σ_prop_log10_HBsAg = 2.2204e-16,
σ_add_log10_HBV = 0.3206,
σ_prop_log10_HBV = 0.0838,
σ_prop_log10_ALT = 0.0478,
σ_prop_log10_CD8 = 0.4149,
)Full set of parameters:
hbv_params = merge(hbv_fixed_params, hbv_estimated_params)4 Part 2: The Gaussian Copula Algorithm
As explained in TB-04, independently sampling random effects destroys correlation structures estimated by NLME fitting. Gaussian copulas solve this by preserving Spearman (rank) correlations through any monotonic transformation (Cortés-Ríos et al., 2025).
The copula-based sampling algorithm is illustrated in Figure 1.
In Pumas this algorithm corresponds to the following steps:
flowchart TB
A["Step 1: Define Model Parameters"] --> B["Step 2: Define Correlation Matrix<br/>(Spearman correlations)"]
B --> C["Step 3: Create Gaussian Copula<br/>with correlation matrix"]
C --> D["Step 4: Create Joint Distribution of Random Effects<br/>from marginals and copula"]
D --> E["Step 5: Sample Correlated Random Effects"]
E --> F["Step 6: Perform Model Simulation"]
style A fill:#e1f5fe,stroke:#01579b
style F fill:#e8f5e9,stroke:#2e7d32
4.1 Step 1: Define Subjects
We create subjects with both treatments active (NA = true, IFN = true) so that all 9 parameters, including the treatment-dependent ones (\(\epsilon_{NA}\), \(\epsilon_{IFN}\), \(r^E_{IFN}\)), are expressed.
patients = [
Subject(; id, time = [0.0], covariates = (; NA = true, IFN = true))
for id in 1:10_000
]Population
Subjects: 10000
Covariates: NA, IFN
Observations:
4.2 Step 2: Define the Correlation Matrix
The Spearman correlation matrix from NLME estimation (Cortés‐Ríos et al., 2025) captures the dependence structure among the 8 correlated parameters. conv_E is independent (off-diagonal entries in the last row and the last column are all zeros):
# Spearman correlation matrix (9×9) from @cortes2025HBV Figure S5B
Rho = [
# p_S β m k_P k_A ϵ_NA ϵ_IFN r_E_IFN conv_E
1.0 0.43 0.46 0.51 -0.7 0.58 -0.14 -0.38 0.0; # p_S
0.43 1.0 0.3 -0.01 -0.5 0.38 -0.52 -0.21 0.0; # β
0.46 0.3 1.0 -0.16 -0.5 0.29 0.07 -0.33 0.0; # m
0.51 -0.01 -0.16 1.0 -0.45 0.25 0.14 -0.1 0.0; # k_P
-0.7 -0.5 -0.5 -0.45 1.0 -0.41 0.01 0.05 0.0; # k_A
0.58 0.38 0.29 0.25 -0.41 1.0 -0.61 -0.36 0.0; # ϵ_NA
-0.14 -0.52 0.07 0.14 0.01 -0.61 1.0 0.26 0.0; # ϵ_IFN
-0.38 -0.21 -0.33 -0.1 0.05 -0.36 0.26 1.0 0.0; # r_E_IFN
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0; # conv_E
]9×9 Matrix{Float64}:
1.0 0.43 0.46 0.51 -0.7 0.58 -0.14 -0.38 0.0
0.43 1.0 0.3 -0.01 -0.5 0.38 -0.52 -0.21 0.0
0.46 0.3 1.0 -0.16 -0.5 0.29 0.07 -0.33 0.0
0.51 -0.01 -0.16 1.0 -0.45 0.25 0.14 -0.1 0.0
-0.7 -0.5 -0.5 -0.45 1.0 -0.41 0.01 0.05 0.0
0.58 0.38 0.29 0.25 -0.41 1.0 -0.61 -0.36 0.0
-0.14 -0.52 0.07 0.14 0.01 -0.61 1.0 0.26 0.0
-0.38 -0.21 -0.33 -0.1 0.05 -0.36 0.26 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
The parameter conv_E is a scaling factor used to standardize CD8+ T cell measurements across assays (Cortés‐Ríos et al., 2025). It has no estimated correlations with the other parameters and is treated as independent in the copula.
4.3 Step 3: Create the Gaussian Copula
Create the Gaussian copula with the specified correlation structure:
copula = GaussianCopula(Rho)Copulas.GaussianCopula{9, Matrix{Float64}}(Σ = [1.0 0.43 0.46 0.51 -0.7 0.58 -0.14 -0.38 0.0; 0.43 1.0 0.3 -0.01 -0.5 0.38 -0.52 -0.21 0.0; 0.46 0.3 1.0 -0.16 -0.5 0.29 0.07 -0.33 0.0; 0.51 -0.01 -0.16 1.0 -0.45 0.25 0.14 -0.1 0.0; -0.7 -0.5 -0.5 -0.45 1.0 -0.41 0.01 0.05 0.0; 0.58 0.38 0.29 0.25 -0.41 1.0 -0.61 -0.36 0.0; -0.14 -0.52 0.07 0.14 0.01 -0.61 1.0 0.26 0.0; -0.38 -0.21 -0.33 -0.1 0.05 -0.36 0.26 1.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0]))
4.4 Step 4: Create the Joint Distribution
Combine the copula with the marginal distributions of each random effect:
correlated_randeffs_dist = SklarDist(
copula, (
Normal(0, ω_p_S),
Normal(0, ω_β),
Normal(0, ω_m),
Normal(0, ω_k_P),
Normal(0, ω_k_A),
Normal(0, ω_ϵ_NA),
Normal(0, ω_ϵ_IFN),
Normal(0, ω_r_E_IFN),
Normal(0, ω_conv_E),
)
)Copulas.SklarDist{Copulas.GaussianCopula{9, Matrix{Float64}}, NTuple{9, Distributions.Normal{Float64}}}(
C: Copulas.GaussianCopula{9, Matrix{Float64}}(Σ = [1.0 0.43 0.46 0.51 -0.7 0.58 -0.14 -0.38 0.0; 0.43 1.0 0.3 -0.01 -0.5 0.38 -0.52 -0.21 0.0; 0.46 0.3 1.0 -0.16 -0.5 0.29 0.07 -0.33 0.0; 0.51 -0.01 -0.16 1.0 -0.45 0.25 0.14 -0.1 0.0; -0.7 -0.5 -0.5 -0.45 1.0 -0.41 0.01 0.05 0.0; 0.58 0.38 0.29 0.25 -0.41 1.0 -0.61 -0.36 0.0; -0.14 -0.52 0.07 0.14 0.01 -0.61 1.0 0.26 0.0; -0.38 -0.21 -0.33 -0.1 0.05 -0.36 0.26 1.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0]))
m: (Distributions.Normal{Float64}(μ=0.0, σ=0.632), Distributions.Normal{Float64}(μ=0.0, σ=2.489), Distributions.Normal{Float64}(μ=0.0, σ=0.827), Distributions.Normal{Float64}(μ=0.0, σ=1.292), Distributions.Normal{Float64}(μ=0.0, σ=1.156), Distributions.Normal{Float64}(μ=0.0, σ=1.583), Distributions.Normal{Float64}(μ=0.0, σ=1.626), Distributions.Normal{Float64}(μ=0.0, σ=2.739), Distributions.Normal{Float64}(μ=0.0, σ=1.248))
)
4.6 Step 6: Perform Model Simulation
We now have correlated random effects that we can use for performing model simulations.
vrandeffs = [(; η) for η in eachcol(correlated_randeffs)]
sims = simobs(hbv_model, patients, hbv_params, vrandeffs; simulate_error = false)Simulated population (Vector{<:Subject})
Simulated subjects: 10000
Simulated variables: log10_HBsAg, log10_HBV, log10_ALT, log10_CD8, Ieff, Peff
We create a DataFrame of the simulated parameters of the dynamical system:
vpop = postprocess(sims) do generated, _
(;
p_S = only(unique(generated.p_S)),
β = only(unique(generated.β)),
m = only(unique(generated.m)),
k_P = only(unique(generated.k_P)),
k_A = only(unique(generated.k_A)),
ϵ_NA = only(unique(generated.ϵ_NA)),
ϵ_IFN = only(unique(generated.ϵ_IFN)),
r_E_IFN = only(unique(generated.r_E_IFN)),
conv_E = only(unique(generated.conv_E)),
)
end |> DataFrame
table_one(vpop)| Total | |
| p_S | |
| Mean (SD) | 1.64e9 (7.88e9) |
| Median [Min, Max] | 4.77e7 [975, 2.22e11] |
| β | |
| Mean (SD) | 0.109 (0.234) |
| Median [Min, Max] | 4.11e-5 [1.0e-10, 0.998] |
| m | |
| Mean (SD) | 987 (3296) |
| Median [Min, Max] | 39.8 [0.00429, 73146] |
| k_P | |
| Mean (SD) | 0.0109 (0.193) |
| Median [Min, Max] | 0.000117 [2.82e-9, 11.3] |
| k_A | |
| Mean (SD) | 0.00044 (0.00479) |
| Median [Min, Max] | 1.29e-5 [9.16e-11, 0.234] |
| ϵ_NA | |
| Mean (SD) | 0.157 (0.23) |
| Median [Min, Max] | 0.0329 [1.09e-5, 0.99] |
| ϵ_IFN | |
| Mean (SD) | 0.169 (0.244) |
| Median [Min, Max] | 0.0366 [1.07e-5, 0.987] |
| r_E_IFN | |
| Mean (SD) | 443 (1489) |
| Median [Min, Max] | 2.05 [1, 9949] |
| conv_E | |
| Mean (SD) | 47723 (902393) |
| Median [Min, Max] | 570 [0.00275, 5.95e7] |
The nonlinear transformations of η entries in the @pre block are all monotonically increasing, so they preserve the ordering (ranks) of values and therefore preserve Spearman correlations. For a detailed explanation, see TB-04.
5 Part 3: Validation
5.1 Check Spearman Correlations
The most critical validation is checking that our target correlations are preserved.
Compute observed Spearman correlations:
param_cols = [:p_S, :β, :m, :k_P, :k_A, :ϵ_NA, :ϵ_IFN, :r_E_IFN, :conv_E]
param_matrix = Matrix(vpop[:, param_cols])
observed_corr = corspearman(param_matrix)9×9 Matrix{Float64}:
1.0 0.424178 0.43856 … -0.361265 0.0171833
0.424178 1.0 0.277427 -0.210621 0.0146976
0.43856 0.277427 1.0 -0.311802 0.0182236
0.491311 -0.00096101 -0.133576 -0.104756 -0.000553539
-0.684156 -0.483684 -0.478523 0.046769 -0.0118301
0.546566 0.361237 0.262448 … -0.346221 0.0124758
-0.120676 -0.506189 0.0883368 0.254462 -0.0130531
-0.361265 -0.210621 -0.311802 1.0 0.00642249
0.0171833 0.0146976 0.0182236 0.00642249 1.0
The absolute difference from the target correlation matrix is:
@. abs(observed_corr - Rho)9×9 Matrix{Float64}:
0.0 0.00582187 0.0214397 … 0.019324 0.0187348 0.0171833
0.00582187 0.0 0.022573 0.0138113 0.00062089 0.0146976
0.0214397 0.022573 0.0 0.0183368 0.0181981 0.0182236
0.0186892 0.00903899 0.0264236 0.00555759 0.00475606 0.000553539
0.0158442 0.0163165 0.0214765 0.0160846 0.00323102 0.0118301
0.0334343 0.0187627 0.027552 … 0.0212148 0.0137793 0.0124758
0.019324 0.0138113 0.0183368 0.0 0.0055381 0.0130531
0.0187348 0.00062089 0.0181981 0.0055381 0.0 0.00642249
0.0171833 0.0146976 0.0182236 0.0130531 0.00642249 0.0
The observed correlations closely match the expected correlations from Cortés‐Ríos et al. (2025). The small differences are due to sampling variability and decrease with larger sample sizes.
5.2 Check Marginal Distributions
Let’s verify that the parameters follow their expected distributions by comparing expected and observed medians:
marginals_df = DataFrame(
parameter = String.(param_cols),
median_exp = [
hbv_params.tv_log10_p_S,
hbv_params.tv_log10_β,
hbv_params.tv_log10_m,
hbv_params.tv_log10_k_P,
hbv_params.tv_log10_k_A,
hbv_params.tv_log10_ϵ_NA,
hbv_params.tv_log10_ϵ_IFN,
hbv_params.tv_log10_r_E_IFN,
hbv_params.tv_log10_conv_E,
],
median_obs = [median(vpop[!, c]) for c in param_cols],
)
simple_table(
marginals_df,
[
:parameter => "Parameter",
:median_exp => "Median (expected)",
:median_obs => "Median (observed)",
],
)| Parameter | Median (expected) | Median (observed) |
| p_S | 7.64 | 4.77e7 |
| β | -4.36 | 4.11e-5 |
| m | 1.62 | 39.8 |
| k_P | -3.97 | 0.000117 |
| k_A | -4.87 | 1.29e-5 |
| ϵ_NA | -1.48 | 0.0329 |
| ϵ_IFN | -1.47 | 0.0366 |
| r_E_IFN | 0.313 | 2.05 |
| conv_E | 2.74 | 570 |
6 Part 4: Visualization
6.1 Parameter Distributions
specs_vpop = data(vpop) * mapping(
[:p_S, :β, :m, :k_P, :k_A, :ϵ_NA, :ϵ_IFN, :r_E_IFN, :conv_E] .=> log10 .=> ["p_S", "β", "m", "k_P", "k_A", "ϵ_NA", "ϵ_IFN", "r_E_IFN", "conv_E"],
layout = dims(1),
) * histogram(; normalization = :pdf, datalimits = extrema)
specs_median = data(
(;
p_S = [hbv_params.tv_log10_p_S],
β = [hbv_params.tv_log10_β],
m = [hbv_params.tv_log10_m],
k_P = [hbv_params.tv_log10_k_P],
k_A = [hbv_params.tv_log10_k_A],
ϵ_NA = [hbv_params.tv_log10_ϵ_NA],
ϵ_IFN = [hbv_params.tv_log10_ϵ_IFN],
r_E_IFN = [hbv_params.tv_log10_r_E_IFN],
conv_E = [hbv_params.tv_log10_conv_E],
)
) * mapping(
[:p_S, :β, :m, :k_P, :k_A, :ϵ_NA, :ϵ_IFN, :r_E_IFN],
layout = dims(1),
linestyle = AlgebraOfGraphics.direct("Expected median"),
) * visual(VLines)
draw(
specs_vpop + specs_median;
axis = (; xlabel = "log₁₀ value", ylabel = "Density"),
facet = (; linkxaxes = false, linkyaxes = false),
legend = (; position = :bottom),
)6.2 Corner Plot: All Pairwise Relationships
pairplot(
log10.(vpop) => (
PairPlots.Scatter(; rasterize = 1),
PairPlots.MarginHist(),
PairPlots.Calculation(corspearman),
),
labels = Dict(
:p_S => Makie.rich("log", Makie.subscript("10"), " p", Makie.subscript("S")),
:β => Makie.rich("log", Makie.subscript("10"), " β"),
:m => Makie.rich("log", Makie.subscript("10"), " m"),
:k_P => Makie.rich("log", Makie.subscript("10"), " k", Makie.subscript("P")),
:k_A => Makie.rich("log", Makie.subscript("10"), " k", Makie.subscript("A")),
:ϵ_NA => Makie.rich("log", Makie.subscript("10"), " ϵ", Makie.subscript("NA")),
:ϵ_IFN => Makie.rich("log", Makie.subscript("10"), " ϵ", Makie.subscript("IFN")),
:r_E_IFN => Makie.rich("log", Makie.subscript("10"), " r", Makie.subsup("IFN", "E")),
:conv_E => Makie.rich("log", Makie.subscript("10"), " conv", Makie.subscript("E"))
),
)7 Summary
7.1 Key Takeaways
The HBV model has 8 correlated parameters whose NLME-estimated Spearman correlations must be preserved in the virtual population
Gaussian copulas preserve rank (Spearman) correlations through the complex logit-normal and log-normal transformations in the HBV model’s
@preblockThe algorithm in Pumas:
- Define parameter specifications and correlation matrix from NLME estimation
- Create Gaussian copula and joint distribution from marginals and copula
- Sample random effects from joint distribution
- Perform simulations using the sampled random effects
- Validate results
Always validate correlations and marginals before using the virtual population
7.2 Quick Reference: Key Functions
| Function | Package | Purpose |
|---|---|---|
corspearman(X) |
StatsBase.jl | Compute Spearman correlations |
GaussianCopula(Rho) |
Copulas.jl | Create copula with correlation matrix |
SklarDist(copula, marginals) |
Copulas.jl | Create joint distribution from marginals and copula |
7.3 What’s Next
In Tutorial 5: MILP Calibration, we will:
- Calibrate the virtual population to match clinical trial baseline characteristics
- Apply MILP optimization to match target HBsAg distributions
