Structural Identifiability Analysis

Can We Actually Estimate These Parameters?

Authors

Saugandhika Minnikanti

Vijay Ivaturi

David Müller-Widmann

NoteTumor Burden Tutorial Series

This is Tutorial 3 of 6 in the Tumor Burden series.

# Tutorial Status
1 Model Introduction ✓ Complete
2 Global Sensitivity Analysis ✓ Complete
3 Structural Identifiability (current) ← You are here
4 Copula Vpop Generation
5 MILP Calibration
6 VCT Simulation

1 Learning Outcomes

Before attempting parameter estimation in pharmacometric models, it is essential to verify that parameters can actually be uniquely determined from the available data. This prerequisite check is called structural identifiability analysis, and is a critical early step in the In Silico Clinical Trial (ISCT) workflow (Cortés-Ríos et al., 2025).

In this tutorial, you will learn:

  • What structural identifiability means and why it matters for NLME modeling
  • The three classifications: globally identifiable, locally identifiable, and non-identifiable
  • How to use StructuralIdentifiability.jl to analyze Pumas models
  • How different measurement scenarios affect identifiability
  • How to interpret identifiability results and what actions to take
  • How to find identifiable parameter combinations for reparameterization
TipPrerequisites

This tutorial uses the tumor burden model introduced in Tutorial 1. You should be familiar with:

  • The biological meaning of parameters f, g, and k
  • The two-population model structure (sensitive vs resistant cells)
  • Basic Pumas model syntax (@param, @dynamics, @derived)

2 Background

2.1 The Problem: Can We Actually Estimate These Parameters?

Imagine you’ve built a pharmacometric model with several parameters. You collect clinical data, run your estimation algorithm, and… it doesn’t converge. Or worse, it converges to wildly different values depending on your initial estimates.

Why does this happen?

Often, the issue isn’t your data quality or your algorithm— it’s that some parameters are fundamentally impossible to estimate from the available measurements. This is a property of the model structure itself, and it’s called structural identifiability.

ImportantThe Key Question

Structural identifiability asks: Given perfect, noise-free data measured over infinite time, can each parameter be uniquely determined?

If the answer is “no” for some parameters, then no amount of data or sophisticated algorithms will help— you need to change your model or add measurements.

2.2 What is Structural Identifiability?

Structural identifiability analyzes whether model parameters can be uniquely estimated from input-output data under ideal conditions:

  • Perfect (noise-free) observations — no measurement error
  • Infinite time horizon — unlimited observation duration
  • Known model structure — the equations are correct
  • Known initial conditions (or treated as parameters)

This is a theoretical property of the model structure, independent of actual data quality.

2.3 The Three Classifications

Every parameter falls into one of three identifiability classes:

Classification Symbol What It Means Practical Impact
Globally identifiable globally Exactly one parameter value produces the observed output Safe to estimate — unique solution exists
Locally identifiable locally A finite number of values produce the same output Estimation depends on initial values — may find wrong solution
Non-identifiable nonidentifiable Infinitely many values produce identical output Cannot be estimated — must fix or reparameterize

2.4 A Simple Analogy

Think of identifiability like solving equations:

  • Globally identifiable: x + 3 = 5 → Only one solution: x = 2
  • Locally identifiable: x² = 4 → Two solutions: x = 2 or x = -2
  • Non-identifiable: x + y = 5 with only this equation → Infinite solutions

In pharmacometrics, the “equations” are your ODEs, and the “solutions” are your parameter values.

2.5 Why Check Identifiability Before Estimation?

WarningCommon Pitfalls Without Identifiability Analysis
  1. Estimation fails to converge — optimizer wanders in flat likelihood regions
  2. Different initial values give different final estimates — locally identifiable parameters
  3. Parameter confidence intervals are enormous — sign of non-identifiability
  4. Biologically implausible estimates — compensating for unidentifiable parameters
  5. Model predictions are good but parameters are wrong — many parameter sets give same output

3 Libraries

Load the necessary packages for this tutorial.

using Pumas
using ModelingToolkit
using StructuralIdentifiability
using AlgebraOfGraphics
using CairoMakie
using DataFramesMeta
using SummaryTables

import Logging

What each package does:

  • Pumas: Pharmacometric modeling and simulation
  • StructuralIdentifiability: The core analysis package — implements algorithms to determine identifiability (Dong et al., 2023)
  • ModelingToolkit: Symbolic math framework — needed to define measurement equations
  • AlgebraOfGraphics + CairoMakie: Visualization — for creating publication-quality plots
  • SummaryTables: Clean rendering of data frames in documents

4 Part 1: The Tumor Burden Model

Let’s start with the 3-parameter tumor burden model from cancer chemotherapy. This model is small enough to understand completely, making it perfect for learning identifiability concepts.

4.1 The Clinical Scenario

A patient receives chemotherapy for lung cancer. We measure tumor size over time using imaging (CT scans). The tumor initially shrinks (sensitive cells die), but eventually regrows (resistant cells take over).

Question: From tumor size measurements alone, can we estimate all the model parameters?

4.2 Model Definition

As introduced in Tutorial 1, the tumor burden model divides tumor cells into two populations:

flowchart LR
    subgraph Tumor["Total Tumor: Nt = N_sens + N_res"]
        direction TB
        N_sens["N_sens<br/>(sensitive cells)<br/>Initial: f × N₀"]
        N_res["N_res<br/>(resistant cells)<br/>Initial: (1-f) × N₀"]
    end

    N_sens -->|"dies at rate k"| Death["☠️ Death"]
    N_res -->|"grows at rate g"| Growth["📈 Growth"]

    Tumor -.->|"What we measure"| Obs["🔬 Observed:<br/>Nt"]

    style N_sens fill:#ffcccc,stroke:#cc0000
    style N_res fill:#ccffcc,stroke:#00cc00
    style Obs fill:#cce5ff,stroke:#0066cc

tumor_burden_model = @model begin
    @metadata begin
        desc = "Tumor Burden Model for NSCLC Chemotherapy"
        timeu = u"d"  # Time unit is days
    end

    @param begin
        # Typical values (population parameters)
        tvf  RealDomain(lower = 0.0, upper = 1.0)
        tvg  RealDomain(lower = 0.0)
        tvk  RealDomain(lower = 0.0)

        # Inter-individual variability (diagonal covariance matrix)
        # Values on the transformed scale (logit for f, log for g and k)
        Ω  PDiagDomain(3)

        # Residual error (standard deviation)
        σ  RealDomain(lower = 0.0)
    end

    @random begin
        # Random effects vector (one per parameter)
        η ~ MvNormal(Ω)
    end

    @covariates treatment

    @pre begin
        # Transform random effects to individual parameters
        # f: logit-normal transformation (bounded 0-1)
        f = logistic(logit(tvf) + η[1])
        # g: log-normal transformation (positive value)
        g = tvg * exp(η[2])
        # k: log-normal transformation (positive value)
        # Additionally, treatment affects death rate:
        # treatment = false → control arm (no killing)
        # treatment = true → treatment arm (killing at rate tvk * exp(η[3]))
        k = treatment * tvk * exp(η[3])
    end

    # Initial conditions (normalized tumor = 1)
    @init begin
        N_sens = f           # Sensitive cells
        N_res = 1 - f        # Resistant cells
    end

    @dynamics begin
        # Sensitive cells: die at rate k (0 if control)
        N_sens' = -k * N_sens

        # Resistant cells: grow at rate g
        N_res' = g * N_res
    end

    @derived begin
        # Total normalized tumor size
        Nt := @. N_sens + N_res

        # Observation with residual error
        tumor_size ~ @. Normal(Nt, σ)
    end
end
PumasModel
  Parameters: tvf, tvg, tvk, Ω, σ
  Random effects: η
  Covariates: treatment
  Dynamical system variables: N_sens, N_res
  Dynamical system type: Matrix exponential
  Derived: tumor_size
  Observed: tumor_size
NoteUnderstanding the Model

The @dynamics block defines our ODEs:

  • N_sens' = -k * N_sens: Sensitive cells decrease exponentially with rate k
  • N_res' = g * N_res: Resistant cells grow exponentially with rate g

The @derived block defines what we observe: total tumor = sensitive + resistant cells.

5 Part 2: Running Identifiability Analysis

5.1 Step 1: Extract the ODE System

Pumas models contain an underlying ModelingToolkit ODE system. This is what we’ll analyze for identifiability.

ode_system = tumor_burden_model.sys
Model ##Pumas#232:
Equations (2):
  2 standard: see equations(##Pumas#232)
Unknowns (2): see unknowns(##Pumas#232)
  N_sens(t)
  N_res(t)
Parameters (3): see parameters(##Pumas#232)
  f
  g
  k

What happened: We extracted the symbolic ODE system from the Pumas model. This contains all the equations in a form that StructuralIdentifiability.jl can analyze mathematically.

5.2 Step 2: Define What We Can Measure

In clinical practice, we can only measure certain quantities. The choice of measurements directly affects which parameters are identifiable.

TipKey Insight: Measurements Determine Identifiability

The same model can have different identifiability properties depending on what you measure. Adding a new biomarker or measurement can make previously unidentifiable parameters identifiable.

WarningInputs (Doses) Also Affect Identifiability

Which compartments receive inputs (doses) can also affect structural identifiability. In this tutorial, we do not consider inputs. To include them, you would define a separate ModelingToolkit ODE system where dose terms appear as variables that are neither ODE states nor outputs — StructuralIdentifiability.jl automatically treats these as known inputs. See the StructuralIdentifiability.jl documentation for details.

We create symbolic variables for measurements. As the existing variables in the ODE system, they depend on time.

t = ModelingToolkit.independent_variable(ode_system)
@variables Nt(t) N_sens_obs(t) N_res_obs(t)

Here, @variables Nt(t) creates a symbolic variable Nt that depends on time variable t, the independent variable already present in the ODE system.

We consider two measurement scenarios:

In the first scenario, only total tumor burden is measured (most common clinically):

measured_total_only = [Nt ~ ode_system.N_sens + ode_system.N_res]

Nt ~ ode_system.N_sens + ode_system.N_res defines the measurement equation: “Nt equals the sum of N_sens and N_res

The ~ operator means “equals” in measurement equations. It connects the name of what we observe (left side) and the model expression being measured (right side).

In the second scenario, both populations of treatment-sensitive and treatment-resistant cells are measured separately (e.g., with biomarkers):

measured_both = [
    N_sens_obs ~ ode_system.N_sens,
    N_res_obs ~ ode_system.N_res,
]

5.3 Step 3: Run Global Identifiability Analysis

Now we ask the key question: Can we estimate f, g, and k from total tumor measurements?

result_total = assess_identifiability(
    ode_system;
    measured_quantities = measured_total_only,
    loglevel = Logging.Warn,
)
OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 5 entries:
  N_sens(t) => :locally
  N_res(t)  => :locally
  f         => :nonidentifiable
  g         => :locally
  k         => :locally

6 Part 3: Interpreting the Results

6.1 Understanding the Output

Looking at the output above:

Parameter Status What It Means
f nonidentifiable The sensitive fraction CANNOT be estimated from total tumor data alone
g locally The growth rate has a finite number of possible values (ambiguous)
k locally The death rate has a finite number of possible values (ambiguous)
N_sens(t) locally The sensitive cell trajectory is locally identifiable
N_res(t) locally The resistant cell trajectory is locally identifiable

The key finding: When we only measure total tumor burden, the parameter f (sensitive fraction) is completely non-identifiable! This means infinitely many values of f would produce the exact same tumor size curve.

6.2 Why is f Non-Identifiable?

Think about it intuitively: If we only see total tumor size, we can’t distinguish between:

  • 80% sensitive cells dying fast, 20% resistant cells growing slow
  • 60% sensitive cells dying medium, 40% resistant cells growing medium

Many combinations of (f, g, k) give the same total tumor curve!

6.3 What About Measuring Both Populations?

Let’s see if measuring both populations separately helps:

result_both = assess_identifiability(
    ode_system;
    measured_quantities = measured_both,
    loglevel = Logging.Warn,
)
OrderedCollections.OrderedDict{SymbolicUtils.BasicSymbolic{Real}, Symbol} with 5 entries:
  N_sens(t) => :globally
  N_res(t)  => :globally
  f         => :nonidentifiable
  g         => :globally
  k         => :globally
NoteDramatic Improvement

Look at how the results changed:

Parameter Total Only Both Populations
f ❌ nonidentifiable ❌ nonidentifiable
g ⚠️ locally ✅ globally
k ⚠️ locally ✅ globally
N_sens(t) ⚠️ locally ✅ globally
N_res(t) ⚠️ locally ✅ globally

Key insight: By measuring both cell populations (perhaps using flow cytometry or specific biomarkers), we go from locally identifiable to globally identifiable for g and k!

But wait: Parameter f is STILL non-identifiable! Why?

6.4 Why is f Still Non-Identifiable?

The parameter f represents the initial fraction of sensitive cells. But look at our initial conditions:

N_sens(0) = f
N_res(0) = 1 - f

If we measure both N_sens and N_res at time 0, we directly observe f! The issue is that f isn’t really a dynamic parameter — it’s embedded in the initial conditions. The analysis treats initial conditions and parameters differently.

In practice, you would either:

  1. Estimate f directly from the first measurement
  2. Treat it as a known value
  3. Reparameterize so initial conditions become explicit parameters

7 Part 4: Finding Identifiable Combinations

When individual parameters aren’t identifiable, we can find combinations of parameters that ARE identifiable. This is extremely useful for reparameterization.

find_identifiable_functions(
    ode_system;
    measured_quantities = measured_total_only,
    loglevel = Logging.Warn,
)
2-element Vector{Num}:
 g - k
   g*k
TipInterpreting Identifiable Functions

The output shows:

  • g - k: The difference between growth and death rates is identifiable
  • g × k: The product of growth and death rates is identifiable

What this means practically:

Even though we can’t estimate g and k individually from total tumor data, we CAN estimate:

  1. The net rate difference (g - k): How much faster resistant cells grow than sensitive cells die. If positive, tumor eventually grows; if negative, it shrinks.

  2. The product g × k: From this, we can derive the geometric mean √(g×k), which represents a “typical” rate between g and k.

Example: If the true values were g = 0.04/day and k = 0.16/day:

  • Product: g × k = 0.0064/day² (directly identifiable)
  • Geometric mean: √(0.0064) = 0.08/day (derived from product)
  • Difference: g - k = -0.12/day (tumor shrinks overall)

Clinical interpretation: We can determine the NET tumor behavior (is it overall growing or shrinking?) but not the individual rates.

Reparameterization strategy: Instead of estimating (g, k), estimate (g-k, g×k) and back-calculate if needed. Or fix one rate to a literature value and estimate the other.

8 Part 5: Visualization

Let’s create visualizations to communicate identifiability findings clearly.

8.1 Compile Results into DataFrames

# Tumor burden results
result_total_df = DataFrame(
    parameter = [string(p) for p in keys(result_total)],
    status = map(string, values(result_total))
)
result_both_df = DataFrame(
    parameter = [string(p) for p in keys(result_both)],
    status = map(string, values(result_both))
)
tumor_results = outerjoin(
    result_total_df,
    result_both_df;
    on = :parameter,
    renamecols = "_total" => "_both",
)

simple_table(
    tumor_results,
    [
        :parameter => "Parameter",
        :status_total => "Status (total only)",
        :status_both => "Status (both populations)",
    ],
)
Parameter Status (total only) Status (both populations)
N_sens(t) locally globally
N_res(t) locally globally
f nonidentifiable nonidentifiable
g locally globally
k locally globally

8.2 Identifiability Status Chart

long_tumor_results = vcat(
    result_total_df,
    result_both_df;
    source = :scenario => [:total, :both],
)
specs = data(long_tumor_results) * mapping(
    :parameter => "Parameter",
    :scenario => renamer(:both => "Both populations", :total => "Total only") => "Measurement scenario",
    color = :status => renamer(
        "globally" => "Globally identifiable",
        "locally" => "Locally identifiable",
        "nonidentifiable" => "Non-identifiable",
    ) => "Status",
) * visual(Scatter; markersize = 30, marker = :rect)
draw(
    specs,
    scales(Color = (; palette = [:green, :orange, :red]));
    axis = (; xticklabelrotation = π / 4, yticklabelrotation = π / 2),
)

Tumor Burden Model: How measurement choice affects identifiability. Green = globally identifiable (unique solution), Orange = locally identifiable (finite solutions), Red = non-identifiable (infinite solutions).

Tumor Burden Model: How measurement choice affects identifiability. Green = globally identifiable (unique solution), Orange = locally identifiable (finite solutions), Red = non-identifiable (infinite solutions).
NoteReading the Chart

Row 1 (Total Only): When we only measure total tumor size:

  • f is non-identifiable (red) — cannot be estimated
  • g, k, state variables are locally identifiable (orange) — finite ambiguity

Row 2 (Both Populations): When we measure both cell types:

  • f is STILL non-identifiable — it’s a structural issue
  • g, k, state variables become globally identifiable (green) — unique solutions!

Takeaway: Adding measurements dramatically improves identifiability for g and k.

8.3 Summary: Impact of Measurements

# Count parameters in each category, per scenario
tumor_results_counts = vcat(
    combine(groupby(result_both_df, :status), nrow => :count),
    combine(groupby(result_total_df, :status), nrow => :count);
    source = :scenario => [:total, :both],
)
status_mapping = :status => renamer(
    "globally" => "Globally identifiable",
    "locally" => "Locally identifiable",
    "nonidentifiable" => "Non-identifiable",
) => "Status"
specs = data(tumor_results_counts) * mapping(
    :scenario => renamer(:both => "Both populations", :total => "Total only") => "Measurement scenario",
    :count => "Number of parameters/states",
    stack = status_mapping,
    color = status_mapping,
) * visual(BarPlot)
draw(specs, scales(Color = (; palette = [:green, :orange, :red])))

Adding measurements increases the percentage of globally identifiable parameters.

Adding measurements increases the percentage of globally identifiable parameters.

9 Part 6: Practical Recommendations

Based on what we’ve learned, here are actionable guidelines for the tumor burden model and beyond.

Tip1. Always Check Identifiability BEFORE Estimation

Run identifiability analysis as the first step after building your model, not as a debugging step when estimation fails. This saves enormous time and frustration.

Workflow:

Build model → Check identifiability → Fix issues → THEN estimate
Tip2. Design Measurements for Identifiability

When planning a clinical study:

  1. List the parameters you need to estimate
  2. Run identifiability analysis with your planned measurements
  3. If key parameters are non-identifiable, consider adding biomarkers
  4. Balance cost/invasiveness against identifiability gains

Example: In the tumor burden model, adding biomarkers to distinguish sensitive vs resistant cells would make g and k globally identifiable.

Tip3. Handle Non-Identifiable Parameters

When you find non-identifiable parameters, you have four options:

Strategy When to Use Example
Fix to literature value Parameter well-characterized elsewhere Fix k to published death rate
Reparameterize Only combinations are identifiable Estimate (g-k, g×k) instead of (g, k)
Add measurements Feasible in study design Add biomarker for cell types
Use informative Bayesian priors Have prior knowledge Strong prior on growth rate
Tip4. Local Analysis First, Global Analysis to Confirm

For complex models (>5 parameters, >3 state variables):

  1. Start with local identifiability — fast screening
  2. If locally identifiable — follow up with global analysis for complete picture
  3. If locally non-identifiable — definitely non-identifiable globally, fix first

This saves computational time on large models.

Warning5. Structural vs Practical Identifiability

Remember: Structural identifiability is necessary but not sufficient! (Wieland et al., 2021)

Type Assumes Answers
Structural Perfect data, infinite time Can parameters theoretically be identified?
Practical Real data, finite samples Will parameters be well-estimated in practice?

Even structurally identifiable parameters may be practically non-identifiable due to:

  • Limited data
  • Poor sampling times
  • High measurement noise
  • Parameter values near boundaries

Always follow up with practical identifiability (FIM, profile likelihood, bootstrap) on your actual data (Raue et al., 2009).

10 Summary

10.1 Key Takeaways

  1. Structural identifiability tells you whether parameters CAN be estimated, assuming perfect data

  2. Three classifications matter:

    • Globally identifiable: Safe to estimate (unique solution)
    • Locally identifiable: Estimation depends on starting values (finite solutions)
    • Non-identifiable: Cannot estimate (infinite solutions)
  3. Measurements determine identifiability: The same model can be identifiable or not depending on what you observe

  4. When parameters aren’t identifiable individually, look for identifiable combinations (reparameterization)

  5. Always check BEFORE estimation to avoid wasted effort on impossible problems

10.2 Tumor Burden Model Findings

Finding Practical Implication
f non-identifiable from total tumor Can’t separate sensitive/resistant fractions without biomarkers
g-k and g×k identifiable Can estimate net tumor behavior, not individual rates
Measuring both populations helps Biomarkers for cell types dramatically improve estimation

10.3 Quick Reference: Key Functions

Function Purpose
model.sys Extract ODESystem from Pumas model
@variables x(t) Define symbolic measurement variable
assess_identifiability() Global analysis (globally/locally/non)
assess_local_identifiability() Local analysis (identifiable yes/no)
find_identifiable_functions() Find identifiable parameter combinations

10.4 What’s Next?

In Tutorial 4: Copula Vpop Generation, we’ll learn how to generate realistic virtual populations using Gaussian copulas that preserve the parameter correlations from NLME estimation.

10.5 References

Cortés-Ríos, J., Magee, M., Sher, A., Jusko, W. J., & Desikan, R. (2025). A step-by-step workflow for performing in silico clinical trials with nonlinear mixed effects models. CPT: Pharmacometrics & Systems Pharmacology. https://doi.org/10.1002/psp4.70122
Dong, R., Goodbrake, C., Harrington, H. A., & Pogudin, G. (2023). StructuralIdentifiability.jl: Structural identifiability analysis of differential equation models in julia. Journal of Open Source Software, 8(90), 5610. https://doi.org/10.21105/joss.05610
Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U., & Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15), 1923–1929. https://doi.org/10.1093/bioinformatics/btp358
Wieland, F.-G., Hauber, A. L., Rosenblatt, M., Tönsing, C., & Timmer, J. (2021). On structural and practical identifiability. Current Opinion in Systems Biology, 25, 60–69. https://doi.org/10.1016/j.coisb.2021.03.005