using Pumas
using StatsBase
using JuMP
using HiGHS
import MultiObjectiveAlgorithms as MOA
using Copulas
using AlgebraOfGraphics
using CairoMakie
using CategoricalArrays
using DataFramesMeta
using SummaryTables
using Random
using Statistics
# Set seed for reproducibility
Random.seed!(1234)
MILP-Based Virtual Population Calibration
Matching Response Rate Distributions with Optimization
This is Tutorial 5 of 6 in the Tumor Burden series.
| # | Tutorial | Status |
|---|---|---|
| 1 | Model Introduction | ✓ Complete |
| 2 | Global Sensitivity Analysis | ✓ Complete |
| 3 | Structural Identifiability | ✓ Complete |
| 4 | Copula Vpop Generation | ✓ Complete |
| 5 | MILP Calibration (current) | ← You are here |
| 6 | VCT Simulation | |
1 Learning Outcomes
Virtual populations generated using copula sampling (see Tutorial 4) may not match real clinical data distributions. Calibration ensures the virtual population is representative of the actual patient population being studied (Cortés-Ríos et al., 2025).
In this tutorial, you will learn:
- Why virtual population calibration is essential for realistic ISCTs
- The mathematical formulation of an integer linear programming (ILP) problem
- How to build a JuMP optimization model step-by-step
- How to calibrate to response rate distributions (not just baseline characteristics)
- Multi-objective optimization with maximum displacement using MultiObjectiveAlgorithms.jl
- The mixed-integer linear programming (MILP) formulation with distribution error bound for inter-category trade-offs
- Validating calibration results with visualizations
This tutorial builds on concepts from earlier tutorials:
- Tutorial 1: Tumor burden model definition
- Tutorial 4: Virtual population generation with copulas
We’ll generate a new vpop using the copula approach and then calibrate it to match clinical response rates.
2 Background
2.1 The Problem: Why Calibration Matters
After generating a virtual population (Vpop) using copula sampling, we have patients with realistic parameter correlations. However, the marginal distributions and outcome distributions may not match what we observe in real clinical trials.
Consider this scenario for tumor burden:
- Your Vpop has 10,000 virtual patients with parameters (f, g, k)
- When you simulate treatment, you get certain response rates
- The real clinical trial showed: 30% complete response, 45% partial response, 25% no response
- Your Vpop shows different rates due to parameter distribution differences
If your Vpop outcome distribution doesn’t match the clinical trial:
- Treatment effect estimates will be biased
- Subgroup analyses won’t reflect real clinical heterogeneity
- Regulators may question whether the population is representative
Calibration selects a subset of your Vpop that, when simulated, matches the target response distribution.
2.2 What is ILP-Based Calibration?
ILP stands for integer linear programming. It is an optimization technique where variables are integer or binary (0 or 1), and both the objective function and the constraints are linear.
For Vpop calibration, we use binary variables to select or reject each virtual patient:
\[ x_j = \begin{cases} 1 & \text{if patient } j \text{ is selected} \\ 0 & \text{if patient } j \text{ is rejected} \end{cases} \]
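As a toy illustration (hypothetical labels, not the tutorial's Vpop), selecting with a binary vector reshapes the category proportions of the resulting subpopulation:

```julia
# Hypothetical response labels and a candidate selection vector.
responses = ["CR", "PR", "PR", "SD", "PD", "PD", "PD"]
x = [1, 1, 1, 1, 1, 0, 0]  # x[j] = 1 selects patient j

# Subpopulation induced by the selection.
selected = responses[x .== 1]

# Proportion of a category within a pool of patients.
prop(cat, pool) = count(==(cat), pool) / length(pool)

prop("PD", responses)  # 3/7 ≈ 0.43 before selection
prop("PD", selected)   # 1/5 = 0.2 after selection
```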
2.3 Why ILP? Alternatives and Trade-offs
| Approach | Pros | Cons |
|---|---|---|
| Random Rejection Sampling | Simple, fast | Inefficient, may reject too many |
| Importance Weighting | No rejection needed | Weights can be extreme, high variance |
| ILP | Optimal selection, precise matching | Computationally more expensive |
ILP is the preferred approach when you need precise control over the distribution matching.
3 Libraries
What each package does:
- JuMP: Algebraic modeling language for optimization problems
- HiGHS: High-performance ILP solver (open source, fast)
- MultiObjectiveAlgorithms (MOA): Multi-objective optimization algorithms for JuMP
- Copulas: For generating correlated parameter samples
- AlgebraOfGraphics + CairoMakie: Publication-quality visualizations
4 Part 1: Understanding the Calibration Problem
4.1 The Tumor Burden Model Recap
From Tutorial 1, our tumor burden model has:
- f: Fraction of treatment-sensitive cells (0-1)
- g: Growth rate of resistant cells (1/day)
- k: Death rate of sensitive cells (1/day)
The tumor dynamics follow:
\[ N(t) = N_0 \cdot f \cdot e^{-kt} + N_0 \cdot (1-f) \cdot e^{gt} \]
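With the baseline normalized to \(N_0 = 1\), this closed form can be evaluated directly. A minimal sketch, using the population estimates from Qi & Cao (2022) quoted later in this tutorial:

```julia
# Closed-form normalized tumor burden: the sensitive fraction f decays
# at rate k while the resistant fraction 1 - f grows at rate g.
tumor_size(t; f, g, k) = f * exp(-k * t) + (1 - f) * exp(g * t)

tumor_size(0.0; f = 0.27, g = 0.0013, k = 0.0091)    # 1.0 at baseline by construction
tumor_size(126.0; f = 0.27, g = 0.0013, k = 0.0091)  # normalized size after 18 weeks
```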
The full non-linear mixed effect model, implemented and explained in Tutorial 1: Model Introduction, is:
tumor_burden_model = @model begin
@metadata begin
desc = "Tumor Burden Model for NSCLC Chemotherapy"
timeu = u"d" # Time unit is days
end
@param begin
# Typical values (population parameters)
tvf ∈ RealDomain(lower = 0.0, upper = 1.0)
tvg ∈ RealDomain(lower = 0.0)
tvk ∈ RealDomain(lower = 0.0)
# Inter-individual variability (diagonal covariance matrix)
# Values on the transformed scale (logit for f, log for g and k)
Ω ∈ PDiagDomain(3)
# Residual error (standard deviation)
σ ∈ RealDomain(lower = 0.0)
end
@random begin
# Random effects vector (one per parameter)
η ~ MvNormal(Ω)
end
@covariates treatment
@pre begin
# Transform random effects to individual parameters
# f: logit-normal transformation (bounded 0-1)
f = logistic(logit(tvf) + η[1])
# g: log-normal transformation (positive value)
g = tvg * exp(η[2])
# k: log-normal transformation (positive value)
# Additionally, treatment affects death rate:
# treatment = false → control arm (no killing)
# treatment = true → treatment arm (killing at rate tvk * exp(η[3]))
k = treatment * tvk * exp(η[3])
end
# Initial conditions (normalized tumor = 1)
@init begin
N_sens = f # Sensitive cells
N_res = 1 - f # Resistant cells
end
@dynamics begin
# Sensitive cells: die at rate k (0 if control)
N_sens' = -k * N_sens
# Resistant cells: grow at rate g
N_res' = g * N_res
end
@derived begin
# Total normalized tumor size
Nt := @. N_sens + N_res
# Observation with residual error
tumor_size ~ @. Normal(Nt, σ)
end
end
PumasModel
Parameters: tvf, tvg, tvk, Ω, σ
Random effects: η
Covariates: treatment
Dynamical system variables: N_sens, N_res
Dynamical system type: Matrix exponential
Derived: tumor_size
Observed: tumor_size
4.2 Defining Response Categories
For tumor burden, we define response based on tumor size at a fixed time point (e.g., 18 weeks):
- tumor size < 14%: Complete response (CR)
- 14% ≤ tumor size < 70%: Partial response (PR)
- 70% ≤ tumor size ≤ 100%: Stable disease (SD)
- tumor size > 100%: Progressive disease (PD)
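These thresholds can be sketched as a plain Julia function (an illustrative helper, not part of the Pumas workflow), with tumor size expressed as a fraction of baseline:

```julia
# Map final normalized tumor size to a response category.
function classify_response(tumor_size)
    tumor_size < 0.14 && return "CR"   # complete response
    tumor_size < 0.70 && return "PR"   # partial response
    tumor_size <= 1.0 && return "SD"   # stable disease
    return "PD"                        # progressive disease
end

classify_response.([0.05, 0.50, 0.90, 1.20])  # ["CR", "PR", "SD", "PD"]
```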
4.3 Generating the Virtual Population
Let us generate a vpop using the copula approach from Tutorial 4.
The parameter values come from Qi & Cao (2022), who fit this model to NSCLC clinical trial data:
# Population (typical) parameter values
pop_f = 0.27 # 27% of tumor is treatment-sensitive
pop_g = 0.0013 # Resistant cells grow ~0.13% per day
pop_k = 0.0091 # Sensitive cells die ~0.91% per day
# Inter-individual variability (standard deviations on transformed scale)
ω_f = 2.16 # High variability in sensitive fraction
ω_g = 1.57 # Moderate variability in growth rate
ω_k = 1.24 # Moderate variability in death rate
# Residual error (standard deviation)
resid_σ = 0.095
# Critical correlation from the data
cor_f_g = -0.64 # Negative correlation: high f ↔ low g
We create 2000 virtual subjects for simulations with the estimated population parameters:
patients = [
Subject(;
id, time = 7 * (0:18), covariates = (; treatment = true),
) for id in 1:2_000
]
param = (; tvf = pop_f, tvg = pop_g, tvk = pop_k, Ω = Diagonal([ω_f^2, ω_g^2, ω_k^2]), σ = resid_σ)
Based on the estimated correlation matrix, we define a Gaussian copula:
Rho = [
1.0 -0.64 0.0
-0.64 1.0 0.0
0.0 0.0 1.0
]
copula = GaussianCopula(Rho)
We define the joint distribution of correlated random effects based on the marginal distributions and the Gaussian copula:
correlated_randeffs_dist = SklarDist(copula, (Normal(0, ω_f), Normal(0, ω_g), Normal(0, ω_k)))
We sample correlated random effects for each virtual patient:
correlated_randeffs = rand(correlated_randeffs_dist, length(patients))
vrandeffs = [(; η) for η in eachcol(correlated_randeffs)]
Now let us simulate the tumor dynamics for each virtual patient:
sims = simobs(
tumor_burden_model, patients, param, vrandeffs; simulate_error = false
)
Simulated population (Vector{<:Subject})
Simulated subjects: 2000
Simulated variables: tumor_size
We compute the observed Spearman correlation of the simulated parameters (f, g, and k):
sims_f_g_k = postprocess(sims) do generated, _
(; f = only(unique(generated.f)), g = only(unique(generated.g)), k = only(unique(generated.k)))
end |> DataFrame
corspearman(Matrix(sims_f_g_k))
3×3 Matrix{Float64}:
1.0 -0.643481 -0.0191359
-0.643481 1.0 0.0374227
-0.0191359 0.0374227 1.0
Additionally, we validate the marginal distributions of the simulated parameters:
marginals_df = DataFrame(
parameter = ["f", "g", "k"],
median_exp = [pop_f, pop_g, pop_k],
median_obs = [median(sims_f_g_k.f), median(sims_f_g_k.g), median(sims_f_g_k.k)],
omega_exp = [ω_f, ω_g, ω_k],
omega_obs = [std(logit.(sims_f_g_k.f)), std(log.(sims_f_g_k.g)), std(log.(sims_f_g_k.k))],
)
simple_table(
marginals_df,
[:parameter => "Parameter", :median_exp => "Median (expected)", :median_obs => "Median (observed)", :omega_exp => "ω (expected)", :omega_obs => "ω (observed)"]
)
| Parameter | Median (expected) | Median (observed) | ω (expected) | ω (observed) |
|---|---|---|---|---|
| f | 0.27 | 0.262 | 2.16 | 2.18 |
| g | 0.0013 | 0.00129 | 1.57 | 1.57 |
| k | 0.0091 | 0.00905 | 1.24 | 1.24 |
4.4 Classifying the Tumor Response
Now let us classify the response of each virtual patient after 18 weeks.
vpop = postprocess(sims) do generated, _
if generated.t[end] != 7 * 18 # days
error("final time point must be 18 weeks")
end
tumor_size_final = generated.tumor_size[end]
(; tumor_size_final)
end |> DataFrame
@transform! vpop :response = cut(
:tumor_size_final,
[0.14, 0.7, 1.0];
labels = ["CR", "PR", "SD", "PD"],
extend = true,
)
We compute the distribution of responses:
vpop_distribution = @by vpop :response begin
:pct = 100 * length(:response) / nrow(vpop)
end
simple_table(
vpop_distribution, [:response => "Response", :pct => "Vpop (%)"],
)
| Response | Vpop (%) |
|---|---|
| CR | 2.35 |
| PR | 22 |
| SD | 26 |
| PD | 49.6 |
4.5 Defining the Target Distribution
Let us define a target response distribution that we want to match. In practice, one would use the response rates observed in a clinical trial.
target_distribution = DataFrame(
response = categorical(["CR", "PR", "SD", "PD"]),
pct = [25.0, 40.0, 20.0, 15.0],
)
simple_table(
target_distribution, [:response => "Response", :pct => "Target (%)"],
)
| Response | Target (%) |
|---|---|
| CR | 25 |
| PR | 40 |
| SD | 20 |
| PD | 15 |
We compare the target distribution with the distribution of response rates in the current virtual population.
comparison_df = leftjoin(target_distribution, vpop_distribution; on = :response, renamecols = "_target" => "_vpop")
@rtransform! comparison_df :pct_vpop = @coalesce(:pct_vpop, 0.0)
@rtransform! comparison_df :diff = :pct_vpop - :pct_target
simple_table(
comparison_df,
[
:response => "Response",
:pct_vpop => "Vpop (%)",
:pct_target => "Target (%)",
:diff => "Difference (%)",
]
)
| Response | Vpop (%) | Target (%) | Difference (%) |
|---|---|---|---|
| CR | 2.35 | 25 | -22.6 |
| PR | 22 | 40 | -18.0 |
| SD | 26 | 20 | 6.05 |
| PD | 49.6 | 15 | 34.6 |
5 Part 2: The Mathematical Formulation
Before writing code, let’s understand the ILP optimization problem mathematically.
5.1 Decision Variables
We have binary selection variables for each virtual patient: \[ x_j \in \{0, 1\} \quad \text{for } j = 1, \ldots, N \]
5.2 Objective Function
We want to maximize the number of selected patients (larger calibrated populations give better statistical power):
\[\max \sum_{j=1}^N x_j \]
5.3 Constraints
Additionally, we want the distribution of observed response rates of the selected patients to match the target distribution.
Since we generally cannot expect the observed and target distributions to match exactly, we enforce only approximate equality of the two distributions. That is, for each response category \(c\) with target fraction \(p_c\), the proportion of selected VPs in category \(c\) is constrained to lie within an absolute error \(\epsilon > 0\) of the target:
\[ (p_c - \epsilon) \, N_{\text{total}} \leq N_c \leq (p_c + \epsilon) \, N_{\text{total}} \]
where \(N_{\text{total}} = \sum_{j=1}^N x_j\) is the total number of selected VPs and \(N_c = \sum_{j \text{ with category } c} x_j\) is the number of selected VPs in category \(c\).
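A direct feasibility check for a candidate selection can be sketched from these inequalities (the helper `feasible` and the toy data are illustrative only, not part of the optimization model):

```julia
# Verify (p_c - ε)·N_total ≤ N_c ≤ (p_c + ε)·N_total for every category.
function feasible(selected, targets, ε)
    N_total = length(selected)
    all(targets) do (cat, p_c)
        N_c = count(==(cat), selected)
        (p_c - ε) * N_total <= N_c <= (p_c + ε) * N_total
    end
end

targets = Dict("CR" => 0.25, "PR" => 0.40, "SD" => 0.20, "PD" => 0.15)
exact = vcat(fill("CR", 25), fill("PR", 40), fill("SD", 20), fill("PD", 15))

feasible(exact, targets, 0.05)            # true: matches the target exactly
feasible(fill("PD", 100), targets, 0.05)  # false: everything in one category
```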
5.4 Why Binary Variables Make This Hard (But Solvable)
The fundamental insight is that VP selection is an all-or-nothing decision. Each virtual patient is either:
- Selected (\(x_j = 1\)): included in the calibrated population
- Rejected (\(x_j = 0\)): excluded from the calibrated population
This discrete choice makes the problem an integer linear program (ILP) rather than a standard linear program (LP).
Binary optimization is NP-hard: there is no known algorithm that solves all instances quickly. With \(N\) virtual patients, there are \(2^N\) possible selections:
| VPs | Possible Selections |
|---|---|
| 10 | 1,024 |
| 20 | ~1 million |
| 100 | \(2^{100}\) (astronomical) |
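The growth of the search space is easy to confirm with arbitrary-precision integers:

```julia
# Number of possible binary selections for N virtual patients is 2^N.
n_selections(N) = big(2)^N

n_selections(10)   # 1024
n_selections(20)   # 1048576
n_selections(100)  # a 31-digit number
```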
Fortunately, modern solvers like HiGHS use sophisticated algorithms (branch-and-bound, cutting planes) that solve well-structured problems like ours in seconds (Dunning et al., 2017; Huangfu & Hall, 2018).
6 Part 3: Building the JuMP Model Step-by-Step
6.1 Step 3a: Build the Optimization Model
We implement the optimization problem using a maximum absolute error of 5% for each category:
# Calibration tolerance
epsilon = 0.05
# Create optimization model
model = Model(HiGHS.Optimizer)
set_silent(model)
set_attribute(model, "presolve", "on")
set_time_limit_sec(model, 60.0)
# Decision variables
@variable(model, x[1:nrow(vpop)], Bin) # Binary selection for each VP
# Objective: maximize number of selected VPs
N_total = sum(x)
@objective(model, Max, N_total)
# Constraints: distribution matching for each category
for row in eachrow(target_distribution)
p_c = row.pct / 100
# Count of selected VPs in this category
N_c = sum(x[vpop.response .== row.response])
# Upper bound: count <= (p_c + ε) × N_total
@constraint(model, N_c <= (p_c + epsilon) * N_total)
# Lower bound: count >= (p_c - ε) × N_total
@constraint(model, N_c >= (p_c - epsilon) * N_total)
end
model
A JuMP Model
├ solver: HiGHS
├ objective_sense: MAX_SENSE
│ └ objective_function_type: AffExpr
├ num_variables: 2000
├ num_constraints: 2008
│ ├ JuMP.AffExpr in MOI.GreaterThan{Float64}: 4
│ ├ JuMP.AffExpr in MOI.LessThan{Float64}: 4
│ └ JuMP.VariableRef in MOI.ZeroOne: 2000
└ Names registered in the model
└ :x
6.2 Solve the Model
Solve the optimization problem:
optimize!(model)
We ensure that the problem was solved successfully:
assert_is_solved_and_feasible(model)
In case the problem was not solved successfully, the function assert_is_solved_and_feasible throws an error describing the state of the solver.
6.3 Extract and Analyze Results
We can inspect the results in more detail:
solution_summary(model)
solution_summary(; result = 1, verbose = false)
├ solver_name : HiGHS
├ Termination
│ ├ termination_status : OPTIMAL
│ ├ result_count : 1
│ ├ raw_status : kHighsModelStatusOptimal
│ └ objective_bound : 2.35000e+02
├ Solution (result = 1)
│ ├ primal_status : FEASIBLE_POINT
│ ├ dual_status : NO_SOLUTION
│ ├ objective_value : 2.35000e+02
│ ├ dual_objective_value : NaN
│ └ relative_gap : 0.00000e+00
└ Work counters
├ solve_time (sec) : 1.00106e-02
├ simplex_iterations : 5
├ barrier_iterations : -1
└ node_count : 1
From the summary, we can already see that the selected number of virtual patients (which corresponds to the objective function value of the solution) is 235.
We extract the selected virtual patients, i.e., those for which the solution is 1:
ilp_selected = findall(xi -> value(xi) > 0.5, x)  # > 0.5 guards against solver tolerance in binary values
ilp_vpop = vpop[ilp_selected, :]
The distribution of observed responses for this subset of virtual patients is:
ilp_distribution = @by ilp_vpop :response begin
:pct = 100 * length(:response) / nrow(ilp_vpop)
end
simple_table(
ilp_distribution, [:response => "Response", :pct => "Vpop (%)"],
)
| Response | Vpop (%) |
|---|---|
| CR | 20 |
| PR | 35.3 |
| SD | 24.7 |
| PD | 20 |
We compare it with the target distribution of response rates:
ilp_comparison_df = leftjoin(target_distribution, ilp_distribution; on = :response, renamecols = "_target" => "_vpop")
@rtransform! ilp_comparison_df :pct_vpop = @coalesce(:pct_vpop, 0.0)
@rtransform! ilp_comparison_df :pct_diff = :pct_vpop - :pct_target
simple_table(
ilp_comparison_df,
[
:response => "Response",
:pct_vpop => "Vpop (%)",
:pct_target => "Target (%)",
:pct_diff => "Difference (%)",
]
)
| Response | Vpop (%) | Target (%) | Difference (%) |
|---|---|---|---|
| CR | 20 | 25 | -5.0 |
| PR | 35.3 | 40 | -4.68 |
| SD | 24.7 | 20 | 4.68 |
| PD | 20 | 15 | 5 |
6.4 Evaluating Distribution Error
The ILP constrains the maximum per-category proportion error to \(\epsilon\), so the natural evaluation metric is the realized maximum absolute error:
\[ \epsilon_{\max} = \max_c \left| \frac{N_c}{N_{\text{total}}} - p_c \right| \]
This tells us how far the worst-matching category deviates from its target.
ilp_max_error = maximum(abs, ilp_comparison_df.pct_diff) / 100
0.05
6.5 Visualizing the Calibration Result
original_comparison_df = vcat(
target_distribution,
vpop_distribution;
source = "type" => ["Target", "VPop"],
)
calibrated_comparison_df = vcat(
target_distribution,
ilp_distribution;
source = "type" => ["Target", "VPop"],
)
comparison_df = vcat(
original_comparison_df,
calibrated_comparison_df;
source = "source" => ["Original", "Calibrated"],
)
type_mapping = :type => sorter("VPop", "Target") => ""
specs = data(comparison_df) * mapping(
:response => renamer(
"CR" => "Complete Response",
"PR" => "Partial Response",
"SD" => "Stable Disease",
"PD" => "Progressive Disease",
) => "Response",
:pct => "Percentage (%)",
dodge = type_mapping,
row = :source => sorter("Original", "Calibrated") => "VPop",
color = type_mapping
) * visual(BarPlot)
draw(specs; axis = (; xticklabelrotation = π / 4))
7 Part 4: Multi-Objective Optimization
7.1 Motivation
The ILP from Part 3 requires choosing \(\epsilon\) before solving. What if we want to explore the full trade-off between population size and per-category accuracy without committing to a specific tolerance?
Multi-objective optimization lets us find the entire Pareto front directly, revealing all optimal trade-offs between two competing objectives:
- Maximize \(N_{\text{total}} = \sum x_i\) (number of selected VPs)
- Minimize the maximum per-category error (the same quantity \(\epsilon\) constrains)
7.2 Mathematical Formulation
We introduce a single continuous variable \(\Delta \geq 0\) representing the maximum absolute count displacement across all categories:
\[ \begin{align} \min \quad & [-N_{\text{total}},\; \Delta] \\ \text{s.t.} \quad & N_c - p_c \cdot N_{\text{total}} \leq \Delta & \forall\, c \\ & p_c \cdot N_{\text{total}} - N_c \leq \Delta & \forall\, c \\ & x_j \in \{0, 1\} \\ & \Delta \geq 0 \end{align} \]
The key relationship is \(\Delta / N_{\text{total}} = \max_c |N_c / N_{\text{total}} - p_c|\), which is exactly the quantity that \(\epsilon\) constrains in the ILP. So this formulation directly explores the Pareto front of \((N_{\text{total}},\, \epsilon_{\max})\).
7.3 Recovering the \(\epsilon_{\max}\)-Pareto Front
Using \(\Delta\) as the second objective keeps the problem as a MILP, solvable efficiently by HiGHS. But \(\Delta\) is not the metric we ultimately care about — \(\epsilon_{\max}\) is. The Pareto fronts in \((N_\text{total}, \Delta)\) and \((N_\text{total}, \epsilon_{\max})\) space are generally not identical.
Fortunately, the \(\Delta\)-Pareto front is guaranteed to contain all \(\epsilon_{\max}\)-Pareto-optimal solutions:
If solution \(T\) dominates \(S\) in \((N_\text{total}, \Delta)\) space (i.e., \(N_T \geq N_S\) and \(\Delta_T \leq \Delta_S\), with at least one strict inequality), then:
\[ \epsilon_{\max,T} = \frac{\Delta_T}{N_T} \leq \frac{\Delta_S}{N_S} = \epsilon_{\max,S} \]
This holds because \(\Delta_T \leq \Delta_S\) and \(\frac{1}{N_T} \leq \frac{1}{N_S}\), so their product satisfies \(\frac{\Delta_T}{N_T} \leq \frac{\Delta_S}{N_S}\).
Consequence: every \(\epsilon_{\max}\)-Pareto-optimal solution is also \(\Delta\)-Pareto-optimal. We can therefore find the \(\Delta\)-Pareto front efficiently (as a MILP), compute \(\epsilon_{\max} = \Delta / N_\text{total}\) for each solution, and post-filter for \(\epsilon_{\max}\)-dominance to recover the exact \(\epsilon_{\max}\)-Pareto front.
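This post-filtering step can be sketched as a small standalone helper (hypothetical data, standing in for the DataFrame-based filtering used in this tutorial): with solutions sorted by descending \(N_\text{total}\), keep each solution whose \(\epsilon_{\max}\) strictly improves on the best seen so far.

```julia
# Keep solutions not dominated in (N_total, ε_max) space.
# Assumes `ns` is sorted in strictly descending order.
function pareto_filter(ns, εs)
    keep = falses(length(ns))
    best = Inf
    for i in eachindex(ns)
        if εs[i] < best  # strictly better error than any larger population
            keep[i] = true
            best = εs[i]
        end
    end
    keep
end

pareto_filter([2000, 1320, 600, 235], [0.347, 0.214, 0.25, 0.05])
# the third solution is dominated: fewer VPs and a larger error than the second
```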
7.4 Multi-Objective Formulation with MultiObjectiveAlgorithms.jl
MultiObjectiveAlgorithms.jl (MOA) extends JuMP with multi-objective optimization algorithms. It wraps a single-objective solver (here, HiGHS) and orchestrates multiple subproblem solves to trace the Pareto front.
We use the ε-constraint method (MOA.EpsilonConstraint), which is well-suited for bi-objective integer programs. It iterates over the range of one objective, constraining it at each step while optimizing the other, systematically enumerating nondominated solutions.
For bi-objective problems, MOA offers several algorithms:
| Algorithm | What it finds | Best for |
|---|---|---|
| EpsilonConstraint() | All nondominated points | General bi-objective (recommended) |
| Chalmet() | All nondominated points | Bi-criterion integer programs |
| Dichotomy() | Supported extreme points | Quick overview (may miss points) |
| TambyVanderpooten() | All nondominated points | Multi-objective (p ≥ 2) discrete programs |
EpsilonConstraint is the most intuitive and gives fine-grained control over the number of Pareto points via MOA.SolutionLimit().
moa_model = Model(() -> MOA.Optimizer(HiGHS.Optimizer))
set_attribute(moa_model, MOA.Algorithm(), MOA.EpsilonConstraint())
set_attribute(moa_model, MOA.SolutionLimit(), 50)
set_silent(moa_model)
# Binary selection variables
@variable(moa_model, y[1:nrow(vpop)], Bin)
# Single continuous variable: maximum absolute count displacement
@variable(moa_model, Δ >= 0)
# Total selected VPs
N_total = sum(y)
# Bi-objective: minimize [-N_total, Δ]
# (negate N_total since JuMP requires a uniform optimization sense)
@objective(moa_model, Min, [-N_total, Δ])
# Maximum displacement constraints
for row in eachrow(target_distribution)
p_c = row.pct / 100
N_c = sum(y[vpop.response .== row.response])
@constraint(moa_model, N_c - p_c * N_total <= Δ)
@constraint(moa_model, p_c * N_total - N_c <= Δ)
end
moa_model
A JuMP Model
├ solver: MOA[algorithm=MultiObjectiveAlgorithms.EpsilonConstraint, optimizer=HiGHS]
├ objective_sense: MIN_SENSE
│ └ objective_function_type: Vector{AffExpr}
├ num_variables: 2001
├ num_constraints: 2009
│ ├ JuMP.AffExpr in MOI.LessThan{Float64}: 8
│ ├ JuMP.VariableRef in MOI.GreaterThan{Float64}: 1
│ └ JuMP.VariableRef in MOI.ZeroOne: 2000
└ Names registered in the model
└ :y, :Δ
optimize!(moa_model)
assert_is_solved_and_feasible(moa_model)
7.5 Extracting and Filtering the Pareto Front
We extract all nondominated solutions found by the ε-constraint method, compute the maximum absolute error for each, and filter for \(\epsilon_{\max}\)-Pareto-optimality:
n_solutions = result_count(moa_model)
moa_solutions_df = map(1:n_solutions) do i
obj = objective_value(moa_model; result = i)
nselected = round(Int, -obj[1])
delta = obj[2]
# Recover max absolute error: ε_max = Δ / N_total
max_error = nselected > 0 ? delta / nselected : NaN
(; nselected, max_error)
end |> DataFrame
We filter for \(\epsilon_{\max}\)-Pareto-optimality by removing solutions that are dominated in \((N_\text{total}, \epsilon_{\max})\) space:
sort!(moa_solutions_df, [order(:nselected; rev = true), :max_error])
moa_solutions_df.pareto_optimal .= false
moa_solutions_df.pareto_optimal[begin] = true
dominating_error = moa_solutions_df.max_error[begin]
for row in eachrow(moa_solutions_df)
if row.max_error < dominating_error
row.pareto_optimal = true
dominating_error = row.max_error
end
end
sort!(moa_solutions_df, :nselected; rev = true)
simple_table(
moa_solutions_df,
[
:nselected => "# selected VPs",
:max_error => "Max. absolute error",
:pareto_optimal => "Pareto-Optimal",
],
)
| # selected VPs | Max. absolute error | Pareto-Optimal |
|---|---|---|
| 2000 | 0.347 | true |
| 1320 | 0.214 | true |
| 1282 | 0.213 | true |
| 1244 | 0.212 | true |
| 1206 | 0.211 | true |
| 1168 | 0.21 | true |
| 1130 | 0.208 | true |
| 1092 | 0.207 | true |
| 1054 | 0.205 | true |
| 1016 | 0.204 | true |
| 978 | 0.202 | true |
| 940 | 0.2 | true |
| 902 | 0.198 | true |
| 864 | 0.196 | true |
| 826 | 0.193 | true |
| 788 | 0.19 | true |
| 750 | 0.187 | true |
| 712 | 0.184 | true |
| 674 | 0.18 | true |
| 636 | 0.176 | true |
| 598 | 0.171 | true |
| 560 | 0.166 | true |
| 522 | 0.16 | true |
| 484 | 0.153 | true |
| 446 | 0.145 | true |
| 408 | 0.135 | true |
| 370 | 0.123 | true |
| 332 | 0.108 | true |
| 294 | 0.0901 | true |
| 256 | 0.0664 | true |
| 218 | 0.0344 | true |
| 180 | 0 | true |
7.6 Visualization
We visualize the Pareto front and overlay the ILP solution from Part 3:
specs = (
data(@rsubset(moa_solutions_df, :pareto_optimal)) * mapping(color = direct("Pareto front")) +
data((; max_error = [ilp_max_error], nselected = [nrow(ilp_vpop)])) * mapping(color = direct("ILP solution (ϵ = 0.05)"))
) * mapping(:max_error => "Max. absolute error", :nselected => "# selected VPs") * visual(Scatter)
draw(specs)
The Pareto front reveals the full trade-off between population size and maximum per-category error. The ILP solution from Part 3 is a single point on (or near) this front, corresponding to the specific choice of \(\epsilon = 0.05\). The multi-objective approach recovers the entire curve without requiring \(\epsilon\) to be chosen a priori.
8 Part 5: Inter-Category Trade-offs
8.1 Why Allow Trade-offs Between Categories?
The per-category formulation from Part 3 (and its multi-objective extension in Part 4) treats all categories equally: no single category’s proportion can deviate from its target by more than \(\epsilon\). But what if you are willing to accept a slightly larger error in one category, as long as other categories compensate?
Constraining the mean total error (MTE) instead of the per-category maximum allows this trade-off. This leads to a mixed-integer linear program (MILP) that is strictly more flexible than the per-category ILP.
8.2 Mean Total Error
The mean total error across all response categories (Cortés-Ríos et al., 2025) is defined as:
\[ \text{MTE} = \frac{1}{C} \sum_c \left| \frac{N_c}{N_{\text{total}}} - p_c \right| \]
where \(C\) is the number of categories. This measures the average absolute deviation between the observed and target proportions.
The total variation distance between two discrete probability distributions is a standard metric defined as:
\[ \text{TV} = \frac{1}{2} \sum_c |q_c - p_c| \]
where \(q_c\) and \(p_c\) are the observed and target proportions. TV is always in \([0, 1]\) regardless of the number of categories.
The mean total error is simply a rescaled version: \(\text{MTE} = 2 \cdot \text{TV} / C\). In particular, this implies that the MTE is always in \([0, 2/C]\).
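The rescaling is easy to verify numerically (the observed proportions here are illustrative):

```julia
# Mean total error and total variation distance for discrete distributions.
mte(q, p) = sum(abs, q .- p) / length(p)
tv(q, p) = sum(abs, q .- p) / 2

p = [0.25, 0.40, 0.20, 0.15]    # target proportions
q = [0.20, 0.353, 0.247, 0.20]  # illustrative observed proportions

mte(q, p) ≈ 2 * tv(q, p) / length(p)  # true: MTE = 2·TV/C
```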
8.3 Mathematical Formulation
As in Part 2, we maximize the number of selected VPs. But instead of per-category constraints, we bound the mean total error using auxiliary count deviation variables.
Using the MTE directly as a constraint is problematic: \(N_c / N_{\text{total}}\) is a ratio of decision variables, making the problem a nonlinear fractional program. Instead, we work with count deviations:
\[ \delta_c = |N_c - p_c \cdot N_{\text{total}}| \]
Since \(N_c - p_c \cdot N_\text{total} = \sum_{j \in c} x_j - p_c \sum_j x_j = \sum_j (\mathbb{1}_{j \in c} - p_c) \cdot x_j\), this is a linear function of the decision variables \(x_j\). The absolute value can be linearized with auxiliary variables \(\delta_c \geq 0\) and two-sided inequality constraints:
\[ \delta_c \geq N_c - p_c \cdot N_\text{total}, \qquad \delta_c \geq p_c \cdot N_\text{total} - N_c \]
The total count deviation \(D = \sum_c \delta_c\) is related to MTE by:
\[ D = C \cdot \text{MTE} \cdot N_\text{total} \]
so the constraint \(\text{MTE} \leq \epsilon\) is equivalent to:
\[ \sum_c \delta_c \leq C \cdot \epsilon \cdot N_\text{total} \]
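A numeric spot check of this equivalence, using hypothetical category counts:

```julia
# MTE ≤ ε is equivalent to Σ_c δ_c ≤ C·ε·N_total.
p = [0.25, 0.40, 0.20, 0.15]  # target proportions
N = [47, 126, 63, 77]         # hypothetical selected counts per category
N_total = sum(N)              # 313
C = length(p)

δ = abs.(N .- p .* N_total)   # per-category count deviations
mte = sum(δ) / (C * N_total)  # recover the mean total error

(mte <= 0.05) == (sum(δ) <= C * 0.05 * N_total)  # the two tests agree
</imports>
```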
Both formulations use \(\epsilon\) but with different interpretations:
- The ILP (Part 3) enforces \(|q_c - p_c| \leq \epsilon\) for every category, which guarantees \(\text{MTE} \leq \epsilon\).
- The MILP enforces \(\text{MTE} \leq \epsilon\) directly, allowing individual categories to exceed \(\epsilon\) as long as the average stays within the bound.
Since the ILP’s feasible set is a subset of the MILP’s, the MILP can always select at least as many VPs.
8.4 Implementation
We implement the MILP with the same \(\epsilon = 0.05\) used in Part 3:
model_milp = Model(HiGHS.Optimizer)
set_silent(model_milp)
set_attribute(model_milp, "presolve", "on")
set_time_limit_sec(model_milp, 60.0)
# Decision variables
@variable(model_milp, x_milp[1:nrow(vpop)], Bin)
@variable(model_milp, δ[1:nrow(target_distribution)] >= 0)
# Objective: maximize number of selected VPs
N_total = sum(x_milp)
@objective(model_milp, Max, N_total)
# Count deviation constraints (linearized absolute values)
for (δ_c, row) in zip(δ, eachrow(target_distribution))
p_c = row.pct / 100
N_c = sum(x_milp[vpop.response .== row.response])
@constraint(model_milp, δ_c >= N_c - p_c * N_total)
@constraint(model_milp, δ_c >= p_c * N_total - N_c)
end
# Upper bound on mean total error: MTE ≤ ε
# Since sum(δ) = C · MTE · N_total, this is equivalent to: sum(δ) ≤ C · ε · N_total
@constraint(model_milp, sum(δ) <= nrow(target_distribution) * epsilon * N_total)
model_milp
A JuMP Model
├ solver: HiGHS
├ objective_sense: MAX_SENSE
│ └ objective_function_type: AffExpr
├ num_variables: 2004
├ num_constraints: 2013
│ ├ JuMP.AffExpr in MOI.GreaterThan{Float64}: 8
│ ├ JuMP.AffExpr in MOI.LessThan{Float64}: 1
│ ├ JuMP.VariableRef in MOI.GreaterThan{Float64}: 4
│ └ JuMP.VariableRef in MOI.ZeroOne: 2000
└ Names registered in the model
└ :x_milp, :δ
optimize!(model_milp)
assert_is_solved_and_feasible(model_milp)
8.5 Results and Comparison with ILP
milp_selected = findall(xi -> value(xi) > 0.5, x_milp)  # > 0.5 guards against solver tolerance in binary values
milp_vpop = vpop[milp_selected, :]
milp_distribution = @by milp_vpop :response begin
:pct = 100 * length(:response) / nrow(milp_vpop)
end
milp_comparison_df = leftjoin(target_distribution, milp_distribution; on = :response, renamecols = "_target" => "_vpop")
@rtransform! milp_comparison_df :pct_vpop = @coalesce(:pct_vpop, 0.0)
@rtransform! milp_comparison_df :pct_diff = :pct_vpop - :pct_target
simple_table(
milp_comparison_df,
[:response => "Response", :pct_vpop => "Vpop (%)", :pct_target => "Target (%)", :pct_diff => "Difference (%)"]
)
| Response | Vpop (%) | Target (%) | Difference (%) |
|---|---|---|---|
| CR | 15 | 25 | -9.98 |
| PR | 40.3 | 40 | 0.256 |
| SD | 20.1 | 20 | 0.128 |
| PD | 24.6 | 15 | 9.6 |
milp_mte = mean(abs, milp_comparison_df.pct_diff) / 100
0.04992012779552717
We compare the ILP and MILP solutions:
ilp_mte = mean(abs, ilp_comparison_df.pct_diff) / 100
milp_max_error = maximum(abs, milp_comparison_df.pct_diff) / 100
summary_df = DataFrame(
method = ["ILP (max. absolute error ≤ $epsilon)", "MILP (mean total error ≤ $epsilon)"],
nselected = [nrow(ilp_vpop), nrow(milp_vpop)],
max_error = [ilp_max_error, milp_max_error],
mte = [ilp_mte, milp_mte],
)
simple_table(
summary_df,
[
:method => "Method",
:nselected => "# selected VPs",
:max_error => "Max. absolute error",
:mte => "Mean total error",
]
)
| Method | # selected VPs | Max. absolute error | Mean total error |
|---|---|---|---|
| ILP (max. absolute error ≤ 0.05) | 235 | 0.05 | 0.0484 |
| MILP (mean total error ≤ 0.05) | 313 | 0.0998 | 0.0499 |
The MILP selects at least as many virtual patients as the ILP, because its feasible space is strictly larger. The inter-category trade-off allows the solver to find calibrated populations that per-category constraints would reject — but note that the maximum per-category error may exceed \(\epsilon\).
8.6 Multi-Objective Optimization with MTE
Just as we used multi-objective optimization in Part 4 to explore the per-category error trade-off, we can do the same for the MTE-based formulation. Instead of a single \(\Delta\) variable, we use per-category count deviations \(\delta_c\) and minimize their sum \(D = \sum_c \delta_c\) as the second objective:
8.6.1 From MTE to Count Deviations
We cannot use MTE directly as an optimization objective because it involves a ratio of decision variables (\(N_c / N_\text{total}\)), which is nonlinear. Instead, we minimize the total count deviation \(D = \sum_c \delta_c\), which is related to MTE by \(D = C \cdot \text{MTE} \cdot N_\text{total}\).
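The identity can be checked numerically with made-up counts and targets (none of these numbers come from the calibration above):

```julia
N_c = [40, 85, 45, 30]           # hypothetical selected VPs per category
p_c = [0.25, 0.40, 0.20, 0.15]   # hypothetical target proportions
N_total = sum(N_c)               # 200
C = length(N_c)                  # 4 response categories

# MTE computed directly from achieved vs. target proportions
mte = sum(abs.(N_c ./ N_total .- p_c)) / C   # 0.025

# Total count deviation D = Σ_c δ_c, with δ_c = |N_c - p_c · N_total|
D = sum(abs.(N_c .- p_c .* N_total))         # 20.0

D ≈ C * mte * N_total                        # true
```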
8.6.2 Recovering the MTE-Pareto Front
If solution \(T\) dominates \(S\) in \((N_\text{total}, D)\) space (i.e., \(N_T \geq N_S\) and \(D_T \leq D_S\), with at least one strict inequality), then:
\[ \text{MTE}_T = \frac{D_T}{C \cdot N_T} \leq \frac{D_S}{C \cdot N_S} = \text{MTE}_S \]
This holds because \(D_T \leq D_S\) and \(\frac{1}{N_T} \leq \frac{1}{N_S}\), so their product satisfies \(\frac{D_T}{N_T} \leq \frac{D_S}{N_S}\).
Consequence: every MTE-Pareto-optimal solution is also \(D\)-Pareto-optimal. We can therefore find the \(D\)-Pareto front efficiently (as a MILP), compute \(\text{MTE} = D / (C \cdot N_\text{total})\) for each solution, and post-filter for MTE-dominance to recover the exact MTE-Pareto front.
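A quick numeric check of the dominance argument, using two hypothetical solutions \(S\) and \(T\) (these points are illustrative, not taken from the Pareto front computed in this section):

```julia
C = 4   # number of response categories

# Hypothetical solutions: T dominates S in (N_total, D) space
N_S, D_S = 200, 24.0
N_T, D_T = 250, 20.0   # N_T ≥ N_S and D_T ≤ D_S

# Recover MTE from the count deviation
mte(D, N) = D / (C * N)

mte(D_T, N_T)                    # 0.02
mte(D_S, N_S)                    # 0.03
mte(D_T, N_T) <= mte(D_S, N_S)   # true, as the derivation predicts
```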
moa_mte_model = Model(() -> MOA.Optimizer(HiGHS.Optimizer))
set_attribute(moa_mte_model, MOA.Algorithm(), MOA.EpsilonConstraint())
set_attribute(moa_mte_model, MOA.SolutionLimit(), 50)
set_silent(moa_mte_model)
# Binary selection variables
@variable(moa_mte_model, y_mte[1:nrow(vpop)], Bin)
# Continuous count deviation variables (one per category)
@variable(moa_mte_model, δ_mte[1:nrow(target_distribution)] >= 0)
# Total selected VPs
N_total = sum(y_mte)
# Bi-objective: minimize [-N_total, D]
# (negate N_total since JuMP requires a uniform optimization sense)
@objective(moa_mte_model, Min, [-N_total, sum(δ_mte)])
# Count deviation constraints
for (δ_c, row) in zip(δ_mte, eachrow(target_distribution))
p_c = row.pct / 100
N_c = sum(y_mte[vpop.response .== row.response])
@constraint(moa_mte_model, δ_c >= N_c - p_c * N_total)
@constraint(moa_mte_model, δ_c >= p_c * N_total - N_c)
end
moa_mte_model
A JuMP Model
├ solver: MOA[algorithm=MultiObjectiveAlgorithms.EpsilonConstraint, optimizer=HiGHS]
├ objective_sense: MIN_SENSE
│ └ objective_function_type: Vector{AffExpr}
├ num_variables: 2004
├ num_constraints: 2012
│ ├ JuMP.AffExpr in MOI.GreaterThan{Float64}: 8
│ ├ JuMP.VariableRef in MOI.GreaterThan{Float64}: 4
│ └ JuMP.VariableRef in MOI.ZeroOne: 2000
└ Names registered in the model
└ :y_mte, :δ_mte
optimize!(moa_mte_model)
assert_is_solved_and_feasible(moa_mte_model)
8.7 Extracting and Filtering the MTE-Pareto Front
n_mte_solutions = result_count(moa_mte_model)
moa_mte_solutions_df = map(1:n_mte_solutions) do i
obj = objective_value(moa_mte_model; result = i)
nselected = round(Int, -obj[1])
count_dev = obj[2]
# Recover MTE from count deviation: MTE = D / (C · N_total)
mte = nselected > 0 ? count_dev / (nrow(target_distribution) * nselected) : NaN
(; nselected, mte)
end |> DataFrame
# Remove MTE-dominated solutions
sort!(moa_mte_solutions_df, [order(:nselected; rev = true), :mte])
moa_mte_solutions_df.pareto_optimal .= false
moa_mte_solutions_df.pareto_optimal[begin] = true
dominating_error = moa_mte_solutions_df.mte[begin]
for row in eachrow(moa_mte_solutions_df)
if row.mte < dominating_error
row.pareto_optimal = true
dominating_error = row.mte
end
end
sort!(moa_mte_solutions_df, :nselected; rev = true)
simple_table(
moa_mte_solutions_df,
[
:nselected => "# selected VPs",
:mte => "Mean total error",
:pareto_optimal => "Pareto-optimal",
],
)
| # selected VPs | Mean total error | Pareto-optimal |
|---|---|---|
| 2000 | 0.204 | true |
| 1966 | 0.201 | true |
| 1928 | 0.199 | true |
| 1890 | 0.196 | true |
| 1852 | 0.194 | true |
| 1814 | 0.191 | true |
| 1776 | 0.188 | true |
| 1738 | 0.185 | true |
| 1700 | 0.182 | true |
| 1662 | 0.179 | true |
| 1624 | 0.175 | true |
| 1586 | 0.172 | true |
| 1548 | 0.168 | true |
| 1510 | 0.164 | true |
| 1472 | 0.16 | true |
| 1434 | 0.156 | true |
| 1396 | 0.151 | true |
| 1358 | 0.146 | true |
| 1320 | 0.141 | true |
| 1282 | 0.135 | true |
| 1244 | 0.13 | true |
| 1206 | 0.124 | true |
| 1168 | 0.117 | true |
| 1130 | 0.11 | true |
| 1092 | 0.103 | true |
| 1054 | 0.103 | true |
| 1016 | 0.102 | true |
| 978 | 0.101 | true |
| 940 | 0.1 | true |
| 902 | 0.0989 | true |
| 864 | 0.0978 | true |
| 826 | 0.0965 | true |
| 788 | 0.0952 | true |
| 750 | 0.0937 | true |
| 712 | 0.092 | true |
| 674 | 0.0901 | true |
| 636 | 0.0881 | true |
| 598 | 0.0857 | true |
| 560 | 0.083 | true |
| 522 | 0.08 | true |
| 484 | 0.0764 | true |
| 446 | 0.0723 | true |
| 408 | 0.0674 | true |
| 370 | 0.0615 | true |
| 332 | 0.0542 | true |
| 294 | 0.0451 | true |
| 256 | 0.0332 | true |
| 218 | 0.0172 | true |
| 180 | 0 | true |
8.7.1 Visualization
We visualize the MTE-Pareto front and overlay the MILP solution:
specs = (
data(@rsubset(moa_mte_solutions_df, :pareto_optimal)) * mapping(
color = direct("Pareto front (mean total error ≤ $epsilon)")
) +
data((; mte = [milp_mte], nselected = [nrow(milp_vpop)])) * mapping(
color = direct("MILP solution (mean total error ≤ $epsilon)")
)
) * mapping(
:mte => "Mean total error", :nselected => "# selected VPs"
) * visual(Scatter)
draw(specs)
The MTE-Pareto front shows the attainable trade-off between population size and mean total error when inter-category trade-offs are allowed. The MILP solution from earlier in this section is a single point on (or near) this front. Comparing with the max-displacement Pareto front from Part 4 shows how the two error metrics lead to different trade-off landscapes.
9 Part 6: Practical Recommendations
The calibration approaches offer different trade-offs:
- Per-category ε constraints (Part 3): Simple, interpretable, and guarantees that each response category individually matches the target within ε. Use this when per-category accuracy matters (e.g., matching the CR rate within 5%).
- Multi-objective with max displacement (Part 4): Finds the full Pareto front of population size vs. maximum per-category error without choosing ε a priori. Use this to explore the trade-off space before committing to a tolerance.
- MILP with MTE bound (Part 5): More flexible, allows inter-category trade-offs while bounding the overall distribution error. This is the recommended approach for VCT simulations (see Tutorial 6).
- Multi-objective with MTE (Part 5): Explores the MTE-based trade-off without choosing ε a priori.
When using per-category ε constraints (Part 3) or the MTE-bounded MILP (Part 5), start with ε = 0.10 (10% tolerance). Note that ε has different interpretations in the two formulations: the ILP guarantees \(|q_c - p_c| \leq \epsilon\) for every category, while the MILP only bounds the mean error, \(\text{MTE} \leq \epsilon\):
| Epsilon | Effect |
|---|---|
| 0.05 (5%) | Very strict, fewer VPs selected |
| 0.10 (10%) | Good balance (recommended) |
| 0.15-0.20 | More VPs, acceptable for preliminary work |
| > 0.25 | Too loose, poor distribution matching |
If the solver returns INFEASIBLE:
- Increase epsilon from 0.10 to 0.15 or 0.20
- Check target achievability - do you have VPs in all response categories?
- Merge sparse categories - combine SD and PD if both are rare
- Increase Vpop size - generate more VPs to have more options
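The first remedy can be automated with a simple retry loop. This is a generic sketch: `calibrate` is a hypothetical callback standing in for whatever build-and-solve routine you use, assumed to return `nothing` when the solver reports INFEASIBLE:

```julia
# Sketch: relax ε stepwise until `calibrate` (hypothetical) finds a feasible solution.
function calibrate_with_relaxation(calibrate; ϵ0 = 0.10, ϵmax = 0.25, step = 0.05)
    ϵ = ϵ0
    while ϵ <= ϵmax + eps()          # eps() guards against float drift near ϵmax
        result = calibrate(ϵ)        # `nothing` signals INFEASIBLE
        result === nothing || return (; ϵ, result)
        ϵ += step
    end
    error("Infeasible even at ε = $ϵmax; try the other remedies above.")
end
```

For example, with a stand-in `calibrate` that only succeeds once ε reaches 0.20, the loop returns on the third attempt with ε ≈ 0.20.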
If your Vpop has very few VPs in a target category (e.g., only 2% CR but target is 25% CR):
- Calibration may fail or reject most VPs
- Consider: Is the Vpop generation realistic?
- Solution: Adjust population parameters or use importance sampling
10 Summary
10.1 Key Concepts
Calibration ensures realistic virtual populations that match clinical response distributions
ILP optimization provides precise control over distribution matching through binary selection variables and per-category constraints
Multi-objective optimization with max displacement finds the full Pareto front of population size vs. maximum per-category error without pre-specifying ε
MILP with MTE bound allows inter-category trade-offs by constraining the mean total error rather than individual categories
Multi-objective optimization with MTE explores the MTE-based trade-off using count-deviation objectives and MTE post-filtering
10.2 Quick Reference: Steps
- Generate virtual population (Tutorial 4)
- Simulate responses for each VP
- Define target response distribution
- Choose epsilon and formulation (ILP, MILP, or multi-objective)
- Run calibration
- Validate: compare calibrated vs target distributions
- Save calibrated vpop for VCT (Tutorial 6)
10.3 What’s Next?
In Tutorial 6: VCT Simulation, we’ll use the calibrated virtual population to run a complete Virtual Clinical Trial, comparing treatment effects and analyzing endpoints.