Structural Identifiability Analysis

Can We Actually Estimate These Parameters?

Authors

Saugandhika Minnikanti

Vijay Ivaturi

David Müller-Widmann

Tumor Burden Tutorial Series

This is Tutorial 3 of 6 in the Tumor Burden series.

#	Tutorial	Status
1	Model Introduction	✓ Complete
2	Global Sensitivity Analysis	✓ Complete
3	Structural Identifiability (current)	← You are here
4	Copula Vpop Generation
5	MILP Calibration
6	VCT Simulation

1 Learning Outcomes

Before attempting parameter estimation in pharmacometric models, it is essential to verify that parameters can actually be uniquely determined from the available data. This prerequisite check is called structural identifiability analysis, and is a critical early step in the In Silico Clinical Trial (ISCT) workflow (Cortés-Ríos et al., 2025).

In this tutorial, you will learn:

What structural identifiability means and why it matters for NLME modeling
The three classifications: globally identifiable, locally identifiable, and non-identifiable
How to use StructuralIdentifiability.jl to analyze Pumas models
How different measurement scenarios affect identifiability
How to interpret identifiability results and what actions to take
How to find identifiable parameter combinations for reparameterization

Prerequisites

This tutorial uses the tumor burden model introduced in Tutorial 1. You should be familiar with:

The biological meaning of parameters f, g, and k
The two-population model structure (sensitive vs resistant cells)
Basic Pumas model syntax (@param, @dynamics, @derived)

2 Background

2.1 The Problem: Can We Actually Estimate These Parameters?

Imagine you’ve built a pharmacometric model with several parameters. You collect clinical data, run your estimation algorithm, and… it doesn’t converge. Or worse, it converges to wildly different values depending on your initial estimates.

Why does this happen?

Often, the issue isn’t your data quality or your algorithm— it’s that some parameters are fundamentally impossible to estimate from the available measurements. This is a property of the model structure itself, and it’s called structural identifiability.

The Key Question

Structural identifiability asks: Given perfect, noise-free data measured over infinite time, can each parameter be uniquely determined?

If the answer is “no” for some parameters, then no amount of data or sophisticated algorithms will help— you need to change your model or add measurements.

2.2 What is Structural Identifiability?

Structural identifiability analyzes whether model parameters can be uniquely estimated from input-output data under ideal conditions:

Perfect (noise-free) observations — no measurement error
Infinite time horizon — unlimited observation duration
Known model structure — the equations are correct
Known initial conditions (or treated as parameters)

This is a theoretical property of the model structure, independent of actual data quality.

2.3 The Three Classifications

Every parameter falls into one of three identifiability classes:

Classification	Symbol	What It Means	Practical Impact
Globally identifiable	`globally`	Exactly one parameter value produces the observed output	Safe to estimate — unique solution exists
Locally identifiable	`locally`	A finite number of values produce the same output	Estimation depends on initial values — may find wrong solution
Non-identifiable	`nonidentifiable`	Infinitely many values produce identical output	Cannot be estimated — must fix or reparameterize

2.4 A Simple Analogy

Think of identifiability like solving equations:

Globally identifiable: x + 3 = 5 → Only one solution: x = 2
Locally identifiable: x² = 4 → Two solutions: x = 2 or x = -2
Non-identifiable: x + y = 5 with only this equation → Infinite solutions

In pharmacometrics, the “equations” are your ODEs, and the “solutions” are your parameter values.

2.5 Why Check Identifiability Before Estimation?

Common Pitfalls Without Identifiability Analysis

Estimation fails to converge — optimizer wanders in flat likelihood regions
Different initial values give different final estimates — locally identifiable parameters
Parameter confidence intervals are enormous — sign of non-identifiability
Biologically implausible estimates — compensating for unidentifiable parameters
Model predictions are good but parameters are wrong — many parameter sets give same output

3 Libraries

Load the necessary packages for this tutorial.

using Pumas
using ModelingToolkit
using StructuralIdentifiability
using AlgebraOfGraphics
using CairoMakie
using DataFramesMeta
using SummaryTables

import Logging

What each package does:

Pumas: Pharmacometric modeling and simulation
StructuralIdentifiability: The core analysis package — implements algorithms to determine identifiability (Dong et al., 2023)
ModelingToolkit: Symbolic math framework — needed to define measurement equations
AlgebraOfGraphics + CairoMakie: Visualization — for creating publication-quality plots
SummaryTables: Clean rendering of data frames in documents

4 Part 1: The Tumor Burden Model

Let’s start with the 3-parameter tumor burden model from cancer chemotherapy. This model is small enough to understand completely, making it perfect for learning identifiability concepts.

4.1 The Clinical Scenario

A patient receives chemotherapy for lung cancer. We measure tumor size over time using imaging (CT scans). The tumor initially shrinks (sensitive cells die), but eventually regrows (resistant cells take over).

Question: From tumor size measurements alone, can we estimate all the model parameters?

4.2 Model Definition

As introduced in Tutorial 1, the tumor burden model divides tumor cells into two populations:

flowchart LR
    subgraph Tumor["Total Tumor: Nt = N_sens + N_res"]
        direction TB
        N_sens["N_sens<br/>(sensitive cells)<br/>Initial: f × N₀"]
        N_res["N_res<br/>(resistant cells)<br/>Initial: (1-f) × N₀"]
    end

    N_sens -->|"dies at rate k"| Death["☠️ Death"]
    N_res -->|"grows at rate g"| Growth["📈 Growth"]

    Tumor -.->|"What we measure"| Obs["🔬 Observed:<br/>Nt"]

    style N_sens fill:#ffcccc,stroke:#cc0000
    style N_res fill:#ccffcc,stroke:#00cc00
    style Obs fill:#cce5ff,stroke:#0066cc

tumor_burden_model = @model begin
    @metadata begin
        desc = "Tumor Burden Model for NSCLC Chemotherapy"
        timeu = u"d"  # Time unit is days
    end

    @param begin
        # Typical values (population parameters)
        tvf ∈ RealDomain(lower = 0.0, upper = 1.0)
        tvg ∈ RealDomain(lower = 0.0)
        tvk ∈ RealDomain(lower = 0.0)

        # Inter-individual variability (diagonal covariance matrix)
        # Values on the transformed scale (logit for f, log for g and k)
        Ω ∈ PDiagDomain(3)

        # Residual error (standard deviation)
        σ ∈ RealDomain(lower = 0.0)
    end

    @random begin
        # Random effects vector (one per parameter)
        η ~ MvNormal(Ω)
    end

    @covariates treatment

    @pre begin
        # Transform random effects to individual parameters
        # f: logit-normal transformation (bounded 0-1)
        f = logistic(logit(tvf) + η[1])
        # g: log-normal transformation (positive value)
        g = tvg * exp(η[2])
        # k: log-normal transformation (positive value)
        # Additionally, treatment affects death rate:
        # treatment = false → control arm (no killing)
        # treatment = true → treatment arm (killing at rate tvk * exp(η[3]))
        k = treatment * tvk * exp(η[3])
    end

    # Initial conditions (normalized tumor = 1)
    @init begin
        N_sens = f           # Sensitive cells
        N_res = 1 - f        # Resistant cells
    end

    @dynamics begin
        # Sensitive cells: die at rate k (0 if control)
        N_sens' = -k * N_sens

        # Resistant cells: grow at rate g
        N_res' = g * N_res
    end

    @derived begin
        # Total normalized tumor size
        Nt := @. N_sens + N_res

        # Observation with residual error
        tumor_size ~ @. Normal(Nt, σ)
    end
end

PumasModel
  Parameters: tvf, tvg, tvk, Ω, σ
  Random effects: η
  Covariates: treatment
  Dynamical system variables: N_sens, N_res
  Dynamical system type: Matrix exponential
  Derived: tumor_size
  Observed: tumor_size

Understanding the Model

The @dynamics block defines our ODEs:

N_sens' = -k * N_sens: Sensitive cells decrease exponentially with rate k
N_res' = g * N_res: Resistant cells grow exponentially with rate g

The @derived block defines what we observe: total tumor = sensitive + resistant cells.

5 Part 2: Running Identifiability Analysis

5.1 Step 1: Extract the ODE System

Pumas models contain an underlying ModelingToolkit ODE system. This is what we’ll analyze for identifiability.

ode_system = tumor_burden_model.sys

Model ##Pumas#232:
Equations (2):
  2 standard: see equations(##Pumas#232)
Unknowns (2): see unknowns(##Pumas#232)
  N_sens(t)
  N_res(t)
Parameters (3): see parameters(##Pumas#232)
  f
  g
  k

What happened: We extracted the symbolic ODE system from the Pumas model. This contains all the equations in a form that StructuralIdentifiability.jl can analyze mathematically.

5.2 Step 2: Define What We Can Measure

In clinical practice, we can only measure certain quantities. The choice of measurements directly affects which parameters are identifiable.

Key Insight: Measurements Determine Identifiability

The same model can have different identifiability properties depending on what you measure. Adding a new biomarker or measurement can make previously unidentifiable parameters identifiable.

Inputs (Doses) Also Affect Identifiability

Which compartments receive inputs (doses) can also affect structural identifiability. In this tutorial, we do not consider inputs. To include them, you would define a separate ModelingToolkit ODE system where dose terms appear as variables that are neither ODE states nor outputs — StructuralIdentifiability.jl automatically treats these as known inputs. See the StructuralIdentifiability.jl documentation for details.

We create symbolic variables for measurements. As the existing variables in the ODE system, they depend on time.

t = ModelingToolkit.independent_variable(ode_system)
@variables Nt(t) N_sens_obs(t) N_res_obs(t)

Here, @variables Nt(t) creates a symbolic variable Nt that depends on time variable t, the independent variable already present in the ODE system.

We consider two measurement scenarios:

In the first scenario, only total tumor burden is measured (most common clinically):

measured_total_only = [Nt ~ ode_system.N_sens + ode_system.N_res]

Nt ~ ode_system.N_sens + ode_system.N_res defines the measurement equation: “Nt equals the sum of N_sens and N_res”

The ~ operator means “equals” in measurement equations. It connects the name of what we observe (left side) and the model expression being measured (right side).

In the second scenario, both populations of treatment-sensitive and treatment-resistant cells are measured separately (e.g., with biomarkers):

measured_both = [
    N_sens_obs ~ ode_system.N_sens,
    N_res_obs ~ ode_system.N_res,
]

5.3 Step 3: Run Global Identifiability Analysis

Now we ask the key question: Can we estimate f, g, and k from total tumor measurements?

result_total = assess_identifiability(
    ode_system;
    measured_quantities = measured_total_only,
    loglevel = Logging.Warn,
)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 5 entries:
  N_sens(t) => :locally
  N_res(t)  => :locally
  f         => :nonidentifiable
  g         => :locally
  k         => :locally

6 Part 3: Interpreting the Results

6.1 Understanding the Output

Looking at the output above:

Parameter	Status	What It Means
f	`nonidentifiable`	The sensitive fraction CANNOT be estimated from total tumor data alone
g	`locally`	The growth rate has a finite number of possible values (ambiguous)
k	`locally`	The death rate has a finite number of possible values (ambiguous)
N_sens(t)	`locally`	The sensitive cell trajectory is locally identifiable
N_res(t)	`locally`	The resistant cell trajectory is locally identifiable

The key finding: When we only measure total tumor burden, the parameter f (sensitive fraction) is completely non-identifiable! This means infinitely many values of f would produce the exact same tumor size curve.

6.2 Why is f Non-Identifiable?

Think about it intuitively: If we only see total tumor size, we can’t distinguish between:

80% sensitive cells dying fast, 20% resistant cells growing slow
60% sensitive cells dying medium, 40% resistant cells growing medium

Many combinations of (f, g, k) give the same total tumor curve!

6.3 What About Measuring Both Populations?

Let’s see if measuring both populations separately helps:

result_both = assess_identifiability(
    ode_system;
    measured_quantities = measured_both,
    loglevel = Logging.Warn,
)

OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 5 entries:
  N_sens(t) => :globally
  N_res(t)  => :globally
  f         => :nonidentifiable
  g         => :globally
  k         => :globally

Dramatic Improvement

Look at how the results changed:

Parameter	Total Only	Both Populations
f	❌ nonidentifiable	❌ nonidentifiable
g	⚠️ locally	✅ globally
k	⚠️ locally	✅ globally
N_sens(t)	⚠️ locally	✅ globally
N_res(t)	⚠️ locally	✅ globally

Key insight: By measuring both cell populations (perhaps using flow cytometry or specific biomarkers), we go from locally identifiable to globally identifiable for g and k!

But wait: Parameter f is STILL non-identifiable! Why?

6.4 Why is f Still Non-Identifiable?

The parameter f represents the initial fraction of sensitive cells. But look at our initial conditions:

N_sens(0) = f
N_res(0) = 1 - f

If we measure both N_sens and N_res at time 0, we directly observe f! The issue is that f isn’t really a dynamic parameter — it’s embedded in the initial conditions. The analysis treats initial conditions and parameters differently.

In practice, you would either:

Estimate f directly from the first measurement
Treat it as a known value
Reparameterize so initial conditions become explicit parameters

7 Part 4: Finding Identifiable Combinations

When individual parameters aren’t identifiable, we can find combinations of parameters that ARE identifiable. This is extremely useful for reparameterization.

find_identifiable_functions(
    ode_system;
    measured_quantities = measured_total_only,
    loglevel = Logging.Warn,
)

2-element Vector{Num}:
 g - k
   g*k

Interpreting Identifiable Functions

The output shows:

g - k: The difference between growth and death rates is identifiable
g × k: The product of growth and death rates is identifiable

What this means practically:

Even though we can’t estimate g and k individually from total tumor data, we CAN estimate:

The net rate difference (g - k): How much faster resistant cells grow than sensitive cells die. If positive, tumor eventually grows; if negative, it shrinks.
The product g × k: From this, we can derive the geometric mean √(g×k), which represents a “typical” rate between g and k.

Example: If the true values were g = 0.04/day and k = 0.16/day:

Product: g × k = 0.0064/day² (directly identifiable)
Geometric mean: √(0.0064) = 0.08/day (derived from product)
Difference: g - k = -0.12/day (tumor shrinks overall)

Clinical interpretation: We can determine the NET tumor behavior (is it overall growing or shrinking?) but not the individual rates.

Reparameterization strategy: Instead of estimating (g, k), estimate (g-k, g×k) and back-calculate if needed. Or fix one rate to a literature value and estimate the other.

8 Part 5: Visualization

Let’s create visualizations to communicate identifiability findings clearly.

8.1 Compile Results into DataFrames

# Tumor burden results
result_total_df = DataFrame(
    parameter = [string(p) for p in keys(result_total)],
    status = map(string, values(result_total))
)
result_both_df = DataFrame(
    parameter = [string(p) for p in keys(result_both)],
    status = map(string, values(result_both))
)
tumor_results = outerjoin(
    result_total_df,
    result_both_df;
    on = :parameter,
    renamecols = "_total" => "_both",
)

simple_table(
    tumor_results,
    [
        :parameter => "Parameter",
        :status_total => "Status (total only)",
        :status_both => "Status (both populations)",
    ],
)


Parameter	Status (total only)	Status (both populations)

N_sens(t)	locally	globally
N_res(t)	locally	globally
f	nonidentifiable	nonidentifiable
g	locally	globally
k	locally	globally

8.2 Identifiability Status Chart

long_tumor_results = vcat(
    result_total_df,
    result_both_df;
    source = :scenario => [:total, :both],
)
specs = data(long_tumor_results) * mapping(
    :parameter => "Parameter",
    :scenario => renamer(:both => "Both populations", :total => "Total only") => "Measurement scenario",
    color = :status => renamer(
        "globally" => "Globally identifiable",
        "locally" => "Locally identifiable",
        "nonidentifiable" => "Non-identifiable",
    ) => "Status",
) * visual(Scatter; markersize = 30, marker = :rect)
draw(
    specs,
    scales(Color = (; palette = [:green, :orange, :red]));
    axis = (; xticklabelrotation = π / 4, yticklabelrotation = π / 2),
)

Tumor Burden Model: How measurement choice affects identifiability. Green = globally identifiable (unique solution), Orange = locally identifiable (finite solutions), Red = non-identifiable (infinite solutions).

Reading the Chart

Row 1 (Total Only): When we only measure total tumor size:

f is non-identifiable (red) — cannot be estimated
g, k, state variables are locally identifiable (orange) — finite ambiguity

Row 2 (Both Populations): When we measure both cell types:

f is STILL non-identifiable — it’s a structural issue
g, k, state variables become globally identifiable (green) — unique solutions!

Takeaway: Adding measurements dramatically improves identifiability for g and k.

8.3 Summary: Impact of Measurements

# Count parameters in each category, per scenario
tumor_results_counts = vcat(
    combine(groupby(result_both_df, :status), nrow => :count),
    combine(groupby(result_total_df, :status), nrow => :count);
    source = :scenario => [:total, :both],
)
status_mapping = :status => renamer(
    "globally" => "Globally identifiable",
    "locally" => "Locally identifiable",
    "nonidentifiable" => "Non-identifiable",
) => "Status"
specs = data(tumor_results_counts) * mapping(
    :scenario => renamer(:both => "Both populations", :total => "Total only") => "Measurement scenario",
    :count => "Number of parameters/states",
    stack = status_mapping,
    color = status_mapping,
) * visual(BarPlot)
draw(specs, scales(Color = (; palette = [:green, :orange, :red])))

Adding measurements increases the percentage of globally identifiable parameters.

9 Part 6: Practical Recommendations

Based on what we’ve learned, here are actionable guidelines for the tumor burden model and beyond.

1. Always Check Identifiability BEFORE Estimation

Run identifiability analysis as the first step after building your model, not as a debugging step when estimation fails. This saves enormous time and frustration.

Workflow:

Build model → Check identifiability → Fix issues → THEN estimate

2. Design Measurements for Identifiability

When planning a clinical study:

List the parameters you need to estimate
Run identifiability analysis with your planned measurements
If key parameters are non-identifiable, consider adding biomarkers
Balance cost/invasiveness against identifiability gains

Example: In the tumor burden model, adding biomarkers to distinguish sensitive vs resistant cells would make g and k globally identifiable.

3. Handle Non-Identifiable Parameters

When you find non-identifiable parameters, you have four options:

Strategy	When to Use	Example
Fix to literature value	Parameter well-characterized elsewhere	Fix k to published death rate
Reparameterize	Only combinations are identifiable	Estimate (g-k, g×k) instead of (g, k)
Add measurements	Feasible in study design	Add biomarker for cell types
Use informative Bayesian priors	Have prior knowledge	Strong prior on growth rate

4. Local Analysis First, Global Analysis to Confirm

For complex models (>5 parameters, >3 state variables):

Start with local identifiability — fast screening
If locally identifiable — follow up with global analysis for complete picture
If locally non-identifiable — definitely non-identifiable globally, fix first

This saves computational time on large models.

5. Structural vs Practical Identifiability

Remember: Structural identifiability is necessary but not sufficient! (Wieland et al., 2021)

Type	Assumes	Answers
Structural	Perfect data, infinite time	Can parameters theoretically be identified?
Practical	Real data, finite samples	Will parameters be well-estimated in practice?

Even structurally identifiable parameters may be practically non-identifiable due to:

Limited data
Poor sampling times
High measurement noise
Parameter values near boundaries

Always follow up with practical identifiability (FIM, profile likelihood, bootstrap) on your actual data (Raue et al., 2009).

10 Summary

10.1 Key Takeaways

Structural identifiability tells you whether parameters CAN be estimated, assuming perfect data
Three classifications matter:
- Globally identifiable: Safe to estimate (unique solution)
- Locally identifiable: Estimation depends on starting values (finite solutions)
- Non-identifiable: Cannot estimate (infinite solutions)
Measurements determine identifiability: The same model can be identifiable or not depending on what you observe
When parameters aren’t identifiable individually, look for identifiable combinations (reparameterization)
Always check BEFORE estimation to avoid wasted effort on impossible problems

10.2 Tumor Burden Model Findings

Finding	Practical Implication
`f` non-identifiable from total tumor	Can’t separate sensitive/resistant fractions without biomarkers
`g-k` and `g×k` identifiable	Can estimate net tumor behavior, not individual rates
Measuring both populations helps	Biomarkers for cell types dramatically improve estimation

10.3 Quick Reference: Key Functions

Function	Purpose
`model.sys`	Extract ODESystem from Pumas model
`@variables x(t)`	Define symbolic measurement variable
`assess_identifiability()`	Global analysis (globally/locally/non)
`assess_local_identifiability()`	Local analysis (identifiable yes/no)
`find_identifiable_functions()`	Find identifiable parameter combinations

10.4 What’s Next?

In Tutorial 4: Copula Vpop Generation, we’ll learn how to generate realistic virtual populations using Gaussian copulas that preserve the parameter correlations from NLME estimation.

10.5 References

Cortés-Ríos, J., Magee, M., Sher, A., Jusko, W. J., & Desikan, R. (2025). A step-by-step workflow for performing in silico clinical trials with nonlinear mixed effects models. CPT: Pharmacometrics & Systems Pharmacology. https://doi.org/10.1002/psp4.70122

Dong, R., Goodbrake, C., Harrington, H. A., & Pogudin, G. (2023). StructuralIdentifiability.jl: Structural identifiability analysis of differential equation models in julia. Journal of Open Source Software, 8(90), 5610. https://doi.org/10.21105/joss.05610

Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U., & Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15), 1923–1929. https://doi.org/10.1093/bioinformatics/btp358

Wieland, F.-G., Hauber, A. L., Rosenblatt, M., Tönsing, C., & Timmer, J. (2021). On structural and practical identifiability. Current Opinion in Systems Biology, 25, 60–69. https://doi.org/10.1016/j.coisb.2021.03.005