Mastering Interrupted Time Series (ITS) Design: A Comprehensive Guide for Robust Drug Utilization and Healthcare Policy Evaluation

Evelyn Gray | Nov 29, 2025

Abstract

This article provides a comprehensive guide to Interrupted Time Series (ITS) design, a powerful quasi-experimental method for evaluating the impact of interventions in healthcare and drug development. Tailored for researchers and professionals, it covers foundational concepts, advanced methodological approaches including segmented regression and ARIMA models, and common analytical pitfalls. Drawing on current literature and empirical findings, the guide offers practical solutions for troubleshooting issues like autocorrelation and model specification, compares the performance of different analytical techniques, and outlines best practices for validation and reporting to ensure rigorous and reliable study outcomes.

What is ITS Design? Foundational Principles and When to Use It in Healthcare Research

In health services and policy research, the Interrupted Time Series (ITS) design has emerged as a powerful quasi-experimental method for evaluating the effects of interventions when randomized controlled trials (RCTs) are infeasible, unethical, or impractical [1]. ITS analyzes data collected at multiple time points before and after a well-defined interruption—such as the implementation of a new policy, drug approval, or clinical guideline—to assess whether the intervention caused a significant change in the level or trend of the outcome of interest [2] [3]. This design is particularly valuable for evaluating the real-world effectiveness of large-scale health interventions and is increasingly employed in pharmacoepidemiology and health services research.

The fundamental strength of ITS lies in its ability to establish a pre-intervention trend and compare it with post-intervention data, creating a counterfactual framework that strengthens causal inference beyond simpler pre-post designs [1]. By accounting for underlying secular trends, ITS can distinguish between changes that would have occurred naturally and those truly attributable to the intervention. This methodological rigor, combined with its practical applicability, makes ITS an indispensable tool for researchers and drug development professionals seeking robust evidence for regulatory and policy decisions.

Theoretical Foundations and Key Principles

Core Statistical Model

The standard ITS model can be represented mathematically as [3]:

Yₜ = β₀ + β₁T + β₂Xₜ + β₃TXₜ + εₜ

Where:

  • Yₜ is the outcome measured at time t
  • T is the time since the start of the study
  • Xₜ is a dummy variable representing the intervention (0 = pre-intervention, 1 = post-intervention)
  • TXₜ is an interaction term between time and intervention
  • β₀ represents the baseline level of the outcome
  • β₁ is the pre-intervention trend
  • β₂ is the immediate level change following the intervention
  • β₃ is the change in trend after the intervention
  • εₜ represents error terms
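To make this parameterization concrete, the sketch below simulates a series with known coefficients and recovers them by ordinary least squares. Python is used here only as a dependency-free illustration (the workflows cited in this article use SAS, R, or Stata), and the series length, coefficient values, and noise level are invented for the example.

```python
import random

def ols(X, y):
    """Least-squares fit via the normal equations (Gaussian elimination).
    Adequate for the small 4-column ITS design matrix used here."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for j in range(k):                       # forward elimination with partial pivoting
        piv = max(range(j, k), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            A[r] = [a - f * b for a, b in zip(A[r], A[j])]
            c[r] -= f * c[j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):           # back substitution
        beta[r] = (c[r] - sum(A[r][q] * beta[q] for q in range(r + 1, k))) / A[r][r]
    return beta

# Simulated monthly series: 24 points pre- and 24 post-intervention (hypothetical values)
random.seed(42)
n, t_i = 48, 24
b_true = (50.0, 0.5, -8.0, -0.3)            # β0 level, β1 trend, β2 level change, β3 trend change
X, y = [], []
for t in range(1, n + 1):
    x_t = 1 if t > t_i else 0               # intervention dummy X_t
    X.append([1.0, float(t), float(x_t), float(t * x_t)])
    y.append(sum(b * v for b, v in zip(b_true, X[-1])) + random.gauss(0, 1.0))

beta_hat = ols(X, y)
print([round(b, 2) for b in beta_hat])      # estimates should sit near (50, 0.5, -8, -0.3)
```

Note that with the TXₜ interaction, β₂ is the level shift extrapolated back to time zero; the alternative (t − T_I)Xₜ coding used later in this guide places the level change at the interruption itself.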

Comparison with Other Quasi-Experimental Methods

Table 1: Comparison of Quasi-Experimental Methods in Health Research

| Method | Data Requirements | Key Assumptions | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Interrupted Time Series | Multiple time points before & after intervention | Underlying trend would continue without intervention; no concurrent interventions | Controls for secular trends; no control group needed | Requires sufficient data points; vulnerable to autocorrelation |
| Difference-in-Differences | Pre/post data for treatment and control groups | Parallel trends assumption | Controls for time-invariant confounders | Violation of parallel trends can bias estimates |
| Synthetic Control Method | Time-series data for treated unit & multiple control units | Combination of control units approximates treatment unit | Flexible approach for single-unit interventions | Complex implementation; limited statistical inference |
| Pre-Post Design | Single point before & after intervention | No other factors changed between measurements | Simple implementation and analysis | Highly vulnerable to confounding and secular trends |

Among these approaches, ITS performs particularly well when data for a sufficiently long pre-intervention period are available and the underlying model is correctly specified [1]. When all included units have been exposed to treatment (single-group designs), ITS provides a robust framework for impact evaluation without requiring identification of control groups [1].

ITS Application Protocols and Methodologies

Standard Implementation Protocol

Table 2: Step-by-Step ITS Implementation Protocol for Drug Development Research

| Phase | Key Activities | Methodological Considerations | Quality Checks |
| --- | --- | --- | --- |
| Protocol Development | Define research question; identify intervention point; select outcome measures; determine sample size | Ensure clinical relevance of outcomes; specify primary and secondary analyses | Protocol registration; ethical approvals; statistical review |
| Data Collection | Extract time-series data; ensure consistent measurement; document data sources | Adequate pre- and post-intervention periods; consistent frequency; manage missing data | Data quality audit; verification of intervention timing; outlier assessment |
| Model Specification | Select statistical model; account for autocorrelation; control for covariates | Check stationarity; identify seasonal patterns; select appropriate correlation structure | Residual diagnostics; goodness-of-fit tests; variance inflation factors |
| Analysis Execution | Parameter estimation; hypothesis testing; effect size calculation | Adjust for autocorrelation; consider segmented regression; model level and slope changes | Sensitivity analyses; validation of model assumptions; robustness checks |
| Interpretation & Reporting | Estimate intervention effects; contextualize findings; discuss limitations | Differentiate statistical vs. clinical significance; consider confounding factors | Comparison with prior evidence; assessment of publication bias |

Advanced Analytical Approaches

For complex interventions, researchers may employ two-stage ITS designs to evaluate multiple intervention components simultaneously [4]. The statistical code provided in [4] illustrates how to model two sequential interventions while controlling for covariates, using PROC MIXED for continuous outcomes and Poisson regression for count outcomes. This approach is particularly relevant in drug development when evaluating phased implementation or combination therapies.

When implementing ITS analyses, researchers must address autocorrelation (correlation between successive observations), which violates the independence assumption of standard regression models [2]. Appropriate techniques include using autoregressive integrated moving average (ARIMA) models or including correlation structures in generalized estimating equations [2].
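Before choosing a correction, autocorrelation first has to be detected in the model residuals. A minimal sketch of the Durbin-Watson statistic, applied to simulated residual series (the AR(1) coefficient and series lengths are illustrative assumptions):

```python
import random

def durbin_watson(e):
    """DW = Σ(e_t − e_{t−1})² / Σ e_t²; ranges 0 to 4.
    Values near 2 indicate no first-order autocorrelation;
    values toward 0 indicate positive autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)

random.seed(1)
# AR(1) residuals with rho = 0.7: DW should fall well below 2 (roughly 2·(1 − rho))
rho, ar = 0.7, [random.gauss(0, 1)]
for _ in range(199):
    ar.append(rho * ar[-1] + random.gauss(0, 1))
# Independent (white-noise) residuals: DW should sit near 2
white = [random.gauss(0, 1) for _ in range(200)]

dw_ar, dw_white = durbin_watson(ar), durbin_watson(white)
print(round(dw_ar, 2), round(dw_white, 2))
```

A low DW on the fitted model's residuals is the usual trigger for moving from plain OLS to an ARIMA or correlated-error specification.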

Data Visualization and Presentation Standards

Effective visualization is crucial for interpreting and communicating ITS findings. The following standards ensure accurate representation of time series data and intervention effects:

Core Graphing Principles

  • Plot all raw data points used in the analysis to allow examination of variation and facilitate data extraction [2]
  • Clearly indicate the interruption time with a vertical line or shaded region, labeled with the intervention type [2]
  • Display fitted pre- and post-intervention trend lines using bold, solid lines to distinguish from raw data [2]
  • Include counterfactual trend lines showing what would have occurred without the intervention, using different line patterns [2]
  • Ensure axes are clearly labeled with variables and units of measurement, aligning tick marks with data points [2]

Recent assessments of ITS graphs in published literature found that only 33% allowed accurate data extraction, highlighting the need for greater adherence to visualization standards [2]. Common deficiencies included unclear data points, missing trend lines, and poorly defined interruption points.

Visual Workflow for ITS Analysis

[Figure: ITS Analytical Workflow]

Comparative Performance and Validation

Evidence from methodological comparisons demonstrates that ITS provides reliable effect estimates when its assumptions are met. A simulation study comparing quasi-experimental methods found that ITS performs very well when all included units have been exposed to treatment and data for a sufficiently long pre-intervention period are available [1]. The key advantage of ITS over simpler pre-post designs is its ability to account for and separate underlying secular trends from intervention effects.

In an empirical comparison of methods evaluating the introduction of activity-based funding in Irish hospitals, ITS produced statistically significant results that differed in interpretation from control-group methods like Difference-in-Differences and Synthetic Control [3]. This highlights the importance of method selection based on the specific research context and available data structure.

Essential Research Reagents and Tools

Table 3: Essential Analytical Tools for ITS Implementation

| Tool Category | Specific Software/Solutions | Primary Application in ITS | Key Functions |
| --- | --- | --- | --- |
| Statistical Software | SAS (PROC AUTOREG, PROC MIXED) | Model estimation and inference | Time series analysis; autocorrelation correction; parameter estimation |
| Statistical Packages | R (its.analysis, forecast packages) | Flexible model specification | Segmented regression; ARIMA modeling; visualization |
| Data Visualization | Stata, ggplot2, specialized ITS graphing tools | Creating standard-compliant graphs | Raw data plotting; trend line fitting; counterfactual display |
| Quality Assurance | Statistical diagnostic tools | Validation of model assumptions | Residual analysis; autocorrelation tests; goodness-of-fit assessment |

Specialized statistical software is essential for proper ITS implementation, as standard analytical packages may not adequately address autocorrelation or enable appropriate counterfactual modeling [4]. The SAS code provided in [4] illustrates a comprehensive approach to ITS analysis, including model specification, estimation of key parameters, and generation of appropriate visualizations.

Interrupted Time Series design represents a methodologically rigorous approach for evaluating intervention effects when RCTs are not feasible. By properly implementing ITS protocols—including appropriate model specification, accounting for autocorrelation, and adhering to visualization standards—researchers can generate robust evidence to inform drug development, health policy, and clinical practice. The structured protocols and analytical frameworks presented in this document provide researchers with practical guidance for applying ITS methods across diverse healthcare contexts.

The Interrupted Time Series (ITS) design is a robust quasi-experimental methodology used to evaluate the impact of interventions or exposures when randomized controlled trials (RCTs) are impractical due to high costs, ethical concerns, or the population-level nature of the intervention [5]. This design is characterized by the collection of data at multiple time points both before and after a clearly defined interruption. By modeling the pre-interruption trend, researchers can establish a counterfactual—what would have likely occurred without the intervention—and compare it to the observed post-interruption data [5] [6]. This allows for the estimation of both immediate and long-term intervention effects, accounting for underlying secular trends. ITS designs are particularly valuable in implementation science and public health for assessing the effects of policy changes, health system interventions, and large-scale quality improvement initiatives [5] [7].

Core Analytical Components of ITS

The analysis of ITS data typically employs segmented regression models to quantify intervention effects. These effects are conceptualized through two primary components: level changes and slope changes. The standard segmented regression model for a single interruption can be parameterized as follows [6]:

Y_t = β₀ + β₁·t + β₂·D_t + β₃·(t − T_I)·D_t + ε_t

Where:

  • Y_t is the outcome measured at time t.
  • t is a continuous variable indicating time elapsed since the start of the observation period.
  • D_t is a dummy variable representing the post-interruption period (0 before the interruption, 1 after).
  • T_I is the time at which the interruption occurs.
  • ε_t represents the error term.

The core components derived from this model are summarized in the table below.

Table 1: Core Components of the Interrupted Time Series Model

| Component | Statistical Parameter | Interpretation | Visual Representation |
| --- | --- | --- | --- |
| Level Change | β₂ | Represents the immediate effect of the intervention. It is the change in the outcome's level that occurs immediately following the interruption, measured as the difference between the observed value just after the intervention and the value predicted by the pre-interruption trend. | A vertical shift in the time series line at the point of interruption. |
| Slope Change | β₃ | Represents the long-term effect of the intervention. It quantifies the change in the trajectory (steepness) of the time series after the interruption compared to the pre-interruption trend. | A change in the angle of the time series line after the interruption. |
| Pre-Interruption Slope | β₁ | Describes the underlying secular trend of the outcome before the intervention was implemented. | The direction and steepness of the line in the pre-interruption segment. |
| Baseline Level | β₀ | Represents the starting level of the outcome at time zero. | The Y-intercept of the time series. |

The Critical Assumption of ITS

The primary assumption underpinning a causal interpretation of ITS results is that the pre-interruption trend would have persisted unchanged into the post-interruption period in the absence of the intervention [5]. This assumption cannot be empirically proven and relies on the researcher's contextual knowledge and methodological rigor to ensure that no other concurrent events or changes (confounders) could plausibly explain the observed deviation in the time series. Violations of this assumption threaten the validity of the study's conclusions.
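The counterfactual can be made concrete with a short calculation: project the pre-interruption trend past T_I and read the intervention effect at each time point as the gap between the fitted and counterfactual values. The coefficient values below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical coefficients for Y_t = β0 + β1·t + β2·D_t + β3·(t − T_I)·D_t
b0, b1, b2, b3 = 40.0, 0.8, -5.0, -0.4
T_I = 12                                     # interruption after the 12th time point

def counterfactual(t):
    """Pre-interruption trend projected past T_I: the no-intervention scenario."""
    return b0 + b1 * t

def fitted(t):
    """Model prediction including the intervention terms."""
    d = 1 if t > T_I else 0
    return b0 + b1 * t + b2 * d + b3 * (t - T_I) * d

# For t > T_I the intervention effect is the gap β2 + β3·(t − T_I), growing over time
effects = {t: round(fitted(t) - counterfactual(t), 1) for t in (13, 18, 24)}
print(effects)  # → {13: -5.4, 18: -7.4, 24: -9.8}
```

The widening gap is exactly why ITS reports both an immediate effect (β₂) and a sustained trend effect (β₃) rather than a single difference.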

Quantitative Data and Statistical Protocols

Comparison of Statistical Methods for ITS Analysis

A critical consideration in ITS analysis is accounting for autocorrelation (serial correlation), where data points close in time are more similar than those further apart. Failure to account for positive autocorrelation can lead to underestimated standard errors, inflated test statistics, and an increased risk of Type I errors [6]. Several statistical methods are available, each handling autocorrelation differently. An empirical evaluation of 190 published ITS studies found that the choice of method can lead to substantially different conclusions, with statistical significance (at the 5% level) differing in 4% to 25% of pairwise comparisons between methods [6].

Table 2: Comparison of Statistical Methods for Analyzing Interrupted Time Series Data

| Statistical Method | Description | Handling of Autocorrelation | Key Considerations |
| --- | --- | --- | --- |
| Ordinary Least Squares (OLS) | Most basic method; fits model via standard linear regression. | Does not account for autocorrelation. Standard errors are likely biased if autocorrelation is present. | Simple to implement but not recommended for ITS due to high risk of biased inference [6]. |
| OLS with Newey-West Standard Errors (NW) | Uses OLS for parameter estimates but adjusts the standard errors to account for autocorrelation and heteroscedasticity. | Corrects the standard errors post-estimation, providing more robust confidence intervals and p-values. | A pragmatic improvement over OLS; provides some protection against autocorrelation [6]. |
| Prais-Winsten (PW) | A generalized least squares (GLS) method that transforms the data to account for first-order autocorrelation. | Directly models the autocorrelation in the error term (AR1 process) and uses this to improve estimation. | Often more statistically efficient than OLS/NW when autocorrelation is correctly specified [6]. |
| Restricted Maximum Likelihood (REML) | A likelihood-based method that reduces bias in the estimation of variance components. | Can model autocorrelation directly. The Satterthwaite approximation (Satt) can be used for small samples. | Provides less biased variance estimates, which is beneficial for shorter time series [6]. |
| Autoregressive Integrated Moving Average (ARIMA) | A flexible class of models that can capture complex patterns, including autocorrelation, trends, and seasonality. | Explicitly models the dependency structure using lagged values of the series and the error terms. | Highly flexible but requires more expertise to specify the correct model order [5] [6]. |
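The Prais-Winsten idea can be sketched in a few steps: fit OLS, estimate the first-order autocorrelation from the residuals, transform the data, and refit, iterating until stable. The simulation below is a dependency-free illustration with invented values; in practice the `prais` command in Stata or `nlme::gls` with an AR1 correlation structure in R would be used.

```python
import random

def ols(X, y):
    # Normal-equation least squares (Gaussian elimination); fine for a 4-column design
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for j in range(k):
        piv = max(range(j, k), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            A[r] = [a - f * b for a, b in zip(A[r], A[j])]
            c[r] -= f * c[j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][q] * beta[q] for q in range(r + 1, k))) / A[r][r]
    return beta

def prais_winsten(X, y, iters=5):
    """Iterated PW: estimate rho from residuals, quasi-difference, refit.
    The first observation is rescaled by sqrt(1 − rho²) rather than dropped."""
    beta, rho = ols(X, y), 0.0
    for _ in range(iters):
        e = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(len(y))]
        rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e[:-1])
        s = (1 - rho ** 2) ** 0.5
        ys = [s * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        Xs = [[s * v for v in X[0]]] + [
            [X[t][j] - rho * X[t - 1][j] for j in range(len(X[0]))] for t in range(1, len(X))
        ]
        beta = ols(Xs, ys)
    return beta, rho

# Simulated ITS with AR(1) errors (rho = 0.6); true β = (20, 0.4, −3, −0.2), T_I = 30
random.seed(7)
n, T_I = 60, 30
u = [random.gauss(0, 0.8)]
for _ in range(n - 1):
    u.append(0.6 * u[-1] + random.gauss(0, 0.8))
X = [[1.0, float(t), float(t > T_I), float((t - T_I) * (t > T_I))] for t in range(1, n + 1)]
y = [20 + 0.4 * X[i][1] - 3 * X[i][2] - 0.2 * X[i][3] + u[i] for i in range(n)]

beta, rho_hat = prais_winsten(X, y)
print(round(rho_hat, 2), [round(b, 2) for b in beta])
```

Comparing these estimates against a plain OLS fit of the same data is the kind of pairwise sensitivity check the 190-study evaluation performed.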

Protocol for Conducting an ITS Analysis

The following protocol provides a step-by-step methodology for designing, conducting, and analyzing an ITS study.

Protocol 1: ITS Design and Analysis Workflow

  • Define the Intervention and Interruption Point: Clearly specify the intervention and the exact time point (T_I) at which it is implemented.
  • Ensure Sufficient Data Points: Collect data for a sufficient number of time points before and after the interruption. A minimum of 3 points per segment is often cited, but more are strongly recommended for robust trend estimation and autocorrelation modeling [5] [6].
  • Check and Control for Confounders: Systematically identify and document any other events or changes that occurred around the time of the intervention that could affect the outcome. If possible, collect data on these potential confounders for inclusion in the statistical model.
  • Conduct Descriptive Analysis: Plot the raw data to visually inspect for trends, seasonality, outliers, and any obvious level or slope changes at the interruption point.
  • Specify the Statistical Model Pre-Analysis: Pre-specify the segmented regression model (as in Equation 1) and the primary statistical method for estimation (e.g., Prais-Winsten) in a study protocol to avoid data-driven choices [6].
  • Estimate Model Parameters: Fit the segmented regression model using the chosen statistical method.
  • Test for and Model Autocorrelation: Examine the residuals of the model for significant autocorrelation (e.g., using Durbin-Watson or Ljung-Box tests). If autocorrelation is present, consider using a method that accounts for it (PW, REML, ARIMA) or confirm that the chosen method (NW) provides adequate correction.
  • Interpret the Core Components: Interpret the estimated coefficients (β₂ for level change, β₃ for slope change) along with their confidence intervals and p-values. The level change indicates the immediate effect, while the slope change indicates the sustained, long-term effect on the trend.
  • Perform Sensitivity Analyses: Assess the robustness of the findings by re-analyzing the data using alternative statistical methods (e.g., comparing PW, NW, and ARIMA results) to ensure conclusions are not dependent on a single methodological choice [5] [6].

Visualization of ITS Concepts and Workflow

The following diagrams, generated using Graphviz, illustrate the core model of an ITS and the recommended analytical workflow.

[Diagram: Core Interrupted Time Series (ITS) Model. The pre-interruption segment, with slope β₁, leads to the interruption point (T_I), which produces an immediate level change (β₂); the post-interruption segment carries the slope change (β₃), the long-term effect.]

[Diagram: ITS Analysis Protocol Workflow. 1. Define intervention and T_I; 2. Collect data pre and post T_I; 3. Control for confounders; 4. Plot and inspect data; 5. Pre-specify model and method; 6. Fit segmented regression model; 7. Check residuals for autocorrelation, switching to a method that accounts for it (e.g., PW) if present; 8. Interpret β₂ (level) and β₃ (slope); 9. Sensitivity analysis with alternative methods.]

The Scientist's Toolkit: Essential Reagents for ITS Research

Table 3: Key Research Reagent Solutions for Interrupted Time Series Analysis

| Tool / Reagent | Function / Application | Example / Note |
| --- | --- | --- |
| Statistical Software (R/Stata/SAS) | Platform for executing segmented regression and advanced time series analyses. | R with packages like nlme, forecast, lmtest, and sandwich is widely used for its flexibility and comprehensive time series capabilities. |
| Segmented Regression Code | The script specifying the statistical model to estimate level and slope changes. | Pre-written code templates for methods like Prais-Winsten or Newey-West prevent errors and ensure reproducibility. |
| Autocorrelation Diagnostic Tests | Statistical tests to detect the presence and structure of autocorrelation in model residuals. | The Durbin-Watson test for first-order autocorrelation and the Ljung-Box test for higher-order autocorrelation are essential diagnostics. |
| WebPlotDigitizer | A tool for digitally extracting aggregated data points from published graphs in systematic reviews or meta-analyses. | Critical for including data from published ITS studies when raw data are not otherwise available [6]. |
| Implementation Science Frameworks (e.g., CFIR) | Conceptual tools to guide the understanding of context and determinants influencing the intervention being evaluated. | The Consolidated Framework for Implementation Research (CFIR) helps systematically identify potential confounders and facilitators affecting the outcome [8]. |
| Pre-Analysis Plan | A formal document pre-specifying the research question, model, primary analysis method, and outcomes. | Registering a pre-analysis plan reduces bias and enhances the credibility of reported ITS findings [6]. |

Application Notes: The Role of ITS in Health Research

Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of interventions when randomized controlled trials (RCTs) are not feasible, ethical, or practical [9] [10]. ITS analyses involve collecting data at multiple, equally spaced time points before and after a defined intervention to determine whether the intervention has caused a significant change in the level or trend of the outcome of interest [9]. This design is particularly valuable in health services research, where policymakers and researchers need to understand the real-world effects of interventions implemented at the population or health system level.

Within the traditional translational research pipeline, ITS designs occupy a crucial space in dissemination and implementation research [11]. They help answer how evidence-based clinical and preventive interventions can be successfully adopted, scaled up, and sustained within community or service delivery systems after efficacy and effectiveness have been established [11]. The strength of ITS lies in its ability to control for underlying trends and secular changes, providing a robust counterfactual for what would have happened without the intervention.

Table 1: Key Characteristics of Interrupted Time Series Design

| Characteristic | Description | Importance in Health Evaluation |
| --- | --- | --- |
| Pre-intervention Data Points | Multiple measurements before intervention | Establishes baseline trend and pattern |
| Post-intervention Data Points | Multiple measurements after intervention | Captures intervention effects over time |
| Known Intervention Time | Exact timing of intervention is specified | Allows precise modeling of intervention effects |
| Autocorrelation Consideration | Accounting for correlation between consecutive measurements | Ensures proper statistical inference |
| Seasonality Adjustment | Controlling for periodic fluctuations | Isolates intervention effects from seasonal patterns |

Experimental Protocols for ITS Analysis

Protocol 1: Segmented Regression Analysis for ITS

Purpose: To quantify intervention effects using segmented regression, the most commonly applied method in healthcare ITS studies [10].

Methodology:

  • Data Collection: Collect equally spaced time series data (e.g., monthly, quarterly) with a minimum of 2 pre-intervention and 1 post-intervention points, though substantially more are recommended for adequate power [9].
  • Model Specification: Fit a segmented regression model with terms for:
    • Baseline trend
    • Immediate level change post-intervention
    • Trend change post-intervention
  • Autocorrelation Testing: Use Durbin-Watson or other statistical tests to detect autocorrelation, which is present in 55% of healthcare ITS studies but formally tested in only 63% of those [10].
  • Model Adjustment: If autocorrelation is detected, use autoregressive integrated moving average (ARIMA) models or include correlation structures in segmented regression.
  • Effect Estimation: Calculate both absolute and relative effects with confidence intervals for:
    • Immediate level change (reported in 70% of studies)
    • Slope change (reported in 84% of studies)
    • Long-term level change (reported in 21% of studies) [10]

Interpretation: The coefficients for level and slope changes represent the intervention's impact, adjusted for pre-existing trends.

Protocol 2: ARIMA Modeling for Complex Time Series

Purpose: To model ITS data with complex autocorrelation patterns, seasonality, or non-stationarity [9].

Methodology:

  • Stationarity Assessment: Check for constant mean and variance over time using differencing if needed.
  • Model Identification: Determine appropriate ARIMA(p,d,q) parameters where:
    • p = autoregressive order
    • d = degree of differencing
    • q = moving average order
  • Intervention Components: Include step or pulse functions to model intervention effects.
  • Model Validation: Check residuals for white noise properties using ACF/PACF plots and Ljung-Box test.
  • Comparison with Alternative Models: Consider Generalized Additive Models (GAMs) when the form of non-linear relationships is unknown [9].

Application Notes: ARIMA demonstrates consistent performance across different policy effect sizes and seasonal patterns, while GAMs show greater robustness to model misspecification [9].
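The residual validation step (ACF inspection and the Ljung-Box test) can be sketched directly: compute the sample autocorrelation function and the Ljung-Box Q statistic, which is compared against a χ² distribution with `max_lag` degrees of freedom. The simulated white-noise and AR(1) series below are illustrative assumptions.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n, m = len(x), sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x) / n                # lag-0 autocovariance
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def ljung_box_q(x, max_lag):
    """Q = n(n+2) Σ_k r_k² / (n − k); compare to χ² with max_lag d.f."""
    n = len(x)
    return n * (n + 2) * sum(r ** 2 / (n - k)
                             for k, r in enumerate(acf(x, max_lag), start=1))

random.seed(3)
white = [random.gauss(0, 1) for _ in range(300)]         # adequate model: residuals ≈ white noise
ar = [random.gauss(0, 1)]
for _ in range(299):
    ar.append(0.8 * ar[-1] + random.gauss(0, 1))         # inadequate model: leftover AR structure

q_white, q_ar = ljung_box_q(white, 10), ljung_box_q(ar, 10)
print(round(q_white, 1), round(q_ar, 1))
```

A Q statistic far above the χ²₁₀ reference (critical value ≈ 18.3 at the 5% level) signals that the ARIMA order should be re-identified before adding intervention components.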

[Diagram: ARIMA workflow. Start with time series data; check stationarity (apply differencing if non-stationary); identify ARIMA(p,d,q) parameters; estimate parameters; validate the model (returning to identification if inadequate); add intervention components; generate estimates; interpret results.]

ITS Analysis with ARIMA Modeling

Ideal Use Cases in Healthcare

Health Policy Evaluation

ITS designs are exceptionally well-suited for evaluating population-level health policies because they can detect both immediate and gradual effects of policy implementation [9]. The strength of ITS in policy analysis lies in its ability to account for pre-existing trends, which is crucial when policies are implemented in dynamic healthcare environments.

Exemplar Applications:

  • Taxation Policies: Evaluating the impact of alcohol or sugar-sweetened beverage taxes on consumption patterns and health outcomes [9]. These analyses often reveal anticipatory effects (declines before implementation) and lagged effects (full impact realized over years).
  • Regulatory Changes: Assessing the effect of prescription drug monitoring programs on opioid prescribing patterns and overdose rates.
  • Coverage Decisions: Measuring the impact of insurance coverage expansion on healthcare utilization and outcomes.

Table 2: Health Policy Interventions Evaluated with ITS Designs

| Policy Type | Exemplar Study Focus | Typical Outcomes Measured | Data Collection Frequency |
| --- | --- | --- | --- |
| Legislative Policies | Bans on alcohol marketing [9] | Consumption rates, Mortality | Monthly/Quarterly |
| Fiscal Policies | Taxation changes [9] | Sales data, Hospitalizations | Monthly |
| Regulatory Policies | Prescription restrictions [11] | Prescribing rates, Adverse events | Monthly |
| Coverage Policies | Insurance expansion [11] | Utilization rates, Health outcomes | Quarterly/Annual |

Drug Utilization Review and Evaluation

Drug Utilization Review (DUR) represents a prime application for ITS designs in pharmaceutical research and regulation [12]. ITS methods can evaluate the impact of prospective, concurrent, and retrospective DUR programs on prescribing patterns, medication safety, and healthcare utilization.

Implementation Framework:

  • Prospective DUR: ITS can evaluate screening interventions applied before medication dispensing to assess their impact on inappropriate prescribing, drug-disease contraindications, and therapeutic duplication [12].
  • Concurrent DUR: Analyze ongoing monitoring interventions during treatment to assess effects on medication adherence, appropriate duration, and early problem detection.
  • Retrospective DUR: Evaluate systematic reviews of medication use after treatment to identify patterns of overuse, underuse, or misuse and assess corrective interventions [12].

Sample Protocol for Drug Policy Evaluation:

  • Objective: Evaluate the impact of a prior authorization policy for a new high-cost medication.
  • Data: Monthly prescription claims data for 24 months pre-implementation and 24 months post-implementation.
  • Outcomes: Primary - appropriate use rate; Secondary - alternative medication use, costs.
  • Analysis: Segmented regression with adjustment for seasonality and autocorrelation.
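As a concrete sketch of the analysis called for in this sample protocol, the design matrix below combines the standard segmented-regression terms with one annual sine/cosine pair to absorb seasonality. The row counts follow the protocol above, but the harmonic representation is an assumption (monthly indicator variables are a common alternative); any OLS or Prais-Winsten routine can then be fit to this matrix.

```python
import math

T_I = 24                                    # 24 monthly points pre-policy, 24 post
design = []
for t in range(1, 49):
    d = 1 if t > T_I else 0
    design.append([
        1.0,                                # intercept: baseline level
        float(t),                           # baseline (secular) trend
        float(d),                           # immediate level change at implementation
        float((t - T_I) * d),               # post-implementation trend change
        math.sin(2 * math.pi * t / 12),     # annual seasonality, harmonic pair
        math.cos(2 * math.pi * t / 12),
    ])

print(len(design), len(design[0]))          # 48 rows, 6 columns
print(sum(row[2] for row in design))        # 24 post-implementation months flagged
```

Over the full four-year window each harmonic column sums to roughly zero, so the seasonal terms adjust the fit without shifting the estimated level and trend changes.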

[Diagram: DUR ITS framework. DUR types: prospective (pre-dispensing review), concurrent (during-treatment monitoring), retrospective (post-treatment analysis). Data sources: prescription claims data, electronic medical records, administrative databases. Outcomes measured: appropriate use, safety indicators, cost outcomes.]

Drug Utilization Review ITS Framework

Healthcare Program Implementation

ITS designs are widely used to evaluate health programs at the hospital, health system, or population level [10]. Programs represent the most common intervention type evaluated using ITS (35% of healthcare ITS studies), followed by policies (28%) [10].

Key Considerations for Program Evaluation:

  • Intervention Specification: Clearly define when the program was implemented and whether there was a transition or ramp-up period (considered in only 17% of studies) [10].
  • Fidelity Assessment: Monitor program implementation fidelity across sites and over time.
  • Contextual Factors: Document concurrent events or policies that might confound program effects.

Methodological Considerations and Reporting Standards

Sample Size and Power Considerations

Determining adequate sample size (number of time points) in ITS designs remains challenging, with only 6% of healthcare ITS studies reporting any sample size calculation [10]. While traditional rules of thumb suggest a minimum of 50 observations, requirements depend on multiple factors:

  • Data Variability: More variable data require more time points
  • Effect Size: Smaller detectable effects require more observations
  • Seasonal Patterns: Seasonal models require additional degrees of freedom (m-1 for seasonal period m)
  • Model Complexity: ARIMA models with multiple parameters require more data than simple segmented regression

Simulation approaches are recommended for power analysis, particularly for complex models like GAM where effective degrees of freedom vary by smooth term [9].
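A minimal Monte Carlo power sketch along these lines is shown below. To stay dependency-free it uses an empirical cutoff from null simulations instead of model-based standard errors, and all counts, effect sizes, and noise levels are illustrative assumptions rather than recommendations.

```python
import random

def ols(X, y):
    # Normal-equation least squares (Gaussian elimination) for a 4-column design
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for j in range(k):
        piv = max(range(j, k), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            A[r] = [a - f * b for a, b in zip(A[r], A[j])]
            c[r] -= f * c[j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][q] * beta[q] for q in range(r + 1, k))) / A[r][r]
    return beta

def level_change_estimate(n_pre, n_post, effect, sd, rng):
    """Simulate one ITS series and return the estimated level change (β2)."""
    T_I = n_pre
    X, y = [], []
    for t in range(1, n_pre + n_post + 1):
        d = 1 if t > T_I else 0
        X.append([1.0, float(t), float(d), float((t - T_I) * d)])
        y.append(10.0 + 0.2 * t + effect * d + rng.gauss(0, sd))
    return ols(X, y)[2]

rng = random.Random(11)
sims = 300
# Empirical two-sided 5% cutoff from the null (no-effect) sampling distribution
null = sorted(abs(level_change_estimate(24, 24, 0.0, 2.0, rng)) for _ in range(sims))
crit = null[int(0.95 * sims)]
# Power: share of simulated studies with a true level change of 3.0 exceeding the cutoff
power = sum(abs(level_change_estimate(24, 24, 3.0, 2.0, rng)) > crit
            for _ in range(sims)) / sims
print(round(power, 2))
```

Rerunning the loop over a grid of pre/post lengths, effect sizes, and noise levels maps how many time points a planned study needs to reach a target power.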

Addressing Common Methodological Challenges

Table 3: Methodological Challenges in ITS Analysis

Challenge | Description | Recommended Approaches
Autocorrelation | Correlation between sequential observations | Use Durbin-Watson test; employ ARIMA or correlated-error models [9] [10]
Seasonality | Regular periodic fluctuations | Include seasonal terms; use seasonal ARIMA; apply seasonal adjustment [9]
Non-stationarity | Changing mean or variance over time | Apply differencing; use integrated (I) component in ARIMA [9]
Multiple Interventions | Concurrent or sequential interventions | Include multiple intervention terms; model complex intervention patterns [10]
Missing Data | Gaps in time series | Use appropriate imputation; model the missing-data mechanism [10]

Current Reporting Practices and Gaps

Recent methodological reviews reveal significant reporting gaps in healthcare ITS studies [10]:

  • Only 57% report analytical methods in abstracts
  • Just 29% specify number of pre-intervention points in abstracts
  • Only 28% specify number of post-intervention points in abstracts
  • Seasonality is considered in only 24% of studies
  • Non-stationarity is addressed in only 8% of studies
  • Sensitivity analyses are conducted in only 17% of studies

The Scientist's Toolkit: Essential Research Reagents

Table 4: Essential Methodological Tools for ITS Analysis

Tool Category | Specific Solutions | Function/Application
Statistical Software | R (package: forecast), SAS PROC ARIMA, Stata itsa | Model estimation, hypothesis testing, forecasting
Primary Analysis Methods | Segmented regression, ARIMA models, Generalized Additive Models (GAM) | Quantifying intervention effects, handling autocorrelation [9]
Autocorrelation Diagnostics | Durbin-Watson test, ACF/PACF plots, Ljung-Box test | Detecting and quantifying autocorrelation in residuals [10]
Data Visualization | Time series plots with intervention points, ACF plots, residual plots | Visual assessment of trends, intervention effects, model adequacy
Sample Size Planning | Simulation-based power analysis, heuristic approaches | Determining adequate number of time points [9]

[Flowchart: ITS analytical method selection. Starting from the design question, determine data frequency and number of points. A clear trend with few covariates points to segmented regression; complex autocorrelation or seasonality points to ARIMA modeling; non-linear patterns of unknown form point to a generalized additive model (GAM). All paths proceed through model validation to reporting effects and uncertainty.]

ITS Analytical Method Selection

Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of large-scale health interventions when randomized controlled trials are not feasible or ethical [13]. The analysis of data from such designs hinges on a clear understanding of core time series concepts. Autocorrelation, seasonality, stationarity, and trend are not merely statistical properties; they are fundamental characteristics that, if unaccounted for, can severely bias the estimation of an intervention's effect [14] [13]. This document provides application notes and protocols for researchers and drug development professionals, framing these concepts within the practical context of ITS implementation research, such as evaluating a new drug's rollout or a policy change affecting prescribing practices.

Core Terminology and Implications for ITS

Conceptual Definitions

  • Trend: A trend represents a long-term increase or decrease in the data, which does not necessarily have to be linear [15]. In a pharmaceutical context, a gradual, nationwide increase in the use of a particular drug class preceding an intervention would constitute a trend. Failing to control for this underlying trend can lead to misattributing the pre-existing growth to the intervention effect.

  • Seasonality: Seasonality refers to patterns that repeat themselves over a fixed and known period (e.g., time of year or day of the week) [15]. This is common in health data due to factors like weather patterns (e.g., higher antibiotic prescriptions in winter) or administrative processes (e.g., increased medicine dispensings at the end of a financial year) [13]. In ITS, unmodeled seasonality can create the illusion of an intervention effect when the observed change is merely part of a predictable, recurring cycle.

  • Autocorrelation (Serial Correlation): Autocorrelation describes the phenomenon whereby successive values in a time series are correlated with earlier values of the same series [16]. In simpler terms, an observation at one time point (e.g., drug sales this month) is often a good predictor of the observation at the next time point (drug sales next month). This violates the standard statistical assumption of independent errors. Unaddressed autocorrelation can severely bias inference, typically by underestimating standard errors and producing overconfidence in the significance of the intervention effect [14].

  • Stationarity: A time series is stationary if its statistical properties—such as mean, variance, and covariance—are constant over time [13]. Many time series models, including those based on Autoregressive Integrated Moving Average (ARIMA), require the data to be stationary. An interruption itself, like a policy change, is a structural break that induces non-stationarity. Therefore, the goal is often to achieve stationarity in the data before the intervention to build a reliable model, which is then used to assess the impact of the interruption [17] [18].

Interrelationships and Impact on ITS Analysis

These four concepts are deeply intertwined. A time series can exhibit a trend, upon which seasonal patterns are superimposed, and the deviations from these patterns (residuals) may be autocorrelated. In ITS analysis, the primary risk is that these inherent patterns can be confounded with the intervention effect. For instance, a sharp change following an intervention might be part of a seasonal cycle, or a pre-existing downward trend might make an intervention appear more effective than it truly is. Proper ITS modeling requires isolating the intervention effect from these other components.

Table 1: Core Terminology and ITS Implications

Term | Core Definition | Primary Risk in ITS Analysis | Common Remedial Actions
Trend | Long-term, non-random, directional movement in the data [15] | Confounding the intervention effect with a pre-existing slope | Detrending via differencing; including a time variable in segmented regression
Seasonality | Fixed-frequency, recurring patterns (e.g., yearly, quarterly) [15] | Misinterpreting a predictable, recurrent change as an intervention effect | Seasonal differencing; including seasonal dummy variables; using Seasonal ARIMA (SARIMA)
Autocorrelation | Correlation of a time series with its own lagged values [16] | Biased standard errors, leading to overestimation of the intervention's significance [14] | Using models that explicitly model the error structure (e.g., ARIMA, GLS)
Stationarity | Constant statistical properties (mean, variance) over time [13] | Invalid model parameters and spurious regression results | Differencing (regular and seasonal); transformations (e.g., log); explicit trend modeling

Experimental Protocols for Analysis

This section outlines a standard workflow for preparing and analyzing an ITS dataset, focusing on diagnosing and managing trend, seasonality, and autocorrelation.

Data Preparation and Visualization Protocol

Objective: To visually inspect the raw time series data for initial patterns and prepare it for formal analysis.

  • Data Loading and Formatting: Load the data (e.g., a CSV file with monthly counts of a drug's dispensings) into a statistical software environment (e.g., R, Python, SAS). Ensure the time variable is correctly parsed as a date-time object and set as the series index. Python Code Snippet:
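A minimal pandas sketch; the file name monthly_dispensings.csv and its columns are hypothetical, so synthetic data stand in for a real extract here:

```python
import numpy as np
import pandas as pd

# In practice you would load a real extract, e.g. (hypothetical file/column names):
# df = pd.read_csv("monthly_dispensings.csv", parse_dates=["month"], index_col="month")
# Here, 48 months of synthetic dispensing counts stand in for that file.
rng = np.random.default_rng(42)
months = pd.date_range("2019-01-01", periods=48, freq="MS")  # month-start timestamps
dispensings = (
    1000 + 5 * np.arange(48)                       # underlying upward trend
    + 50 * np.sin(2 * np.pi * np.arange(48) / 12)  # yearly seasonal cycle
    + rng.normal(0, 20, 48)                        # random noise
)
df = pd.DataFrame({"dispensings": dispensings}, index=months)
df.index.name = "month"
print(df.head())
```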

  • Initial Visualization: Plot the raw time series data against time. This is the first and most crucial step for identifying obvious trends, seasonal cycles, and potential structural breaks at the intervention point. Python Code Snippet:
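A matplotlib sketch on synthetic data; the intervention month is hypothetical. The Agg backend and savefig call let the snippet run headlessly; interactively you would call plt.show() instead:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; omit when working interactively
import matplotlib.pyplot as plt

# Synthetic monthly series with a (hypothetical) intervention at month 24.
months = pd.date_range("2019-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
series = pd.Series(1000 + 5 * np.arange(48) + rng.normal(0, 20, 48), index=months)

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(series.index, series.values, marker="o", markersize=3)
ax.axvline(months[24], color="red", linestyle="--", label="Intervention")  # mark the interruption
ax.set_xlabel("Month")
ax.set_ylabel("Dispensings")
ax.legend()
fig.savefig("its_raw_series.png", dpi=150)  # plt.show() when interactive
```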

Protocol for Decomposition and Stationarity Testing

Objective: To formally decompose the time series into its constituent parts and test for stationarity.

  • Time Series Decomposition: Decompose the series into trend, seasonal, and residual components. This helps visualize the contribution of each. Python Code Snippet (using statsmodels):

  • Test for Stationarity: Apply statistical tests to check the null hypothesis of non-stationarity.

    • Augmented Dickey-Fuller (ADF) Test: Null hypothesis: the series has a unit root (non-stationary).
    • KPSS Test: Null hypothesis: the series is trend-stationary. Python Code Snippet:

    Interpretation: A low p-value (e.g., <0.05) in the ADF test allows rejection of the unit-root null, suggesting stationarity. The KPSS test works in the opposite direction: a low p-value rejects the null of stationarity, suggesting non-stationarity. Concordant results from both tests strengthen the conclusion.

  • Addressing Non-Stationarity: If the series is non-stationary, apply differencing.

    • First Differencing: Subtract the previous value from the current value (Y_t - Y_{t-1}) to remove a linear trend.
    • Seasonal Differencing: For seasonal data, subtract the value from the same point in the previous season (e.g., Y_t - Y_{t-12} for monthly data) [19]. Python Code Snippet:
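A pandas sketch of both differencing operations on a synthetic monthly series:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with trend and yearly seasonality (hypothetical values).
months = pd.date_range("2019-01-01", periods=48, freq="MS")
rng = np.random.default_rng(3)
y = pd.Series(1000 + 5 * np.arange(48)
              + 50 * np.sin(2 * np.pi * np.arange(48) / 12)
              + rng.normal(0, 10, 48), index=months)

first_diff = y.diff().dropna()        # Y_t - Y_{t-1}: removes the linear trend
seasonal_diff = y.diff(12).dropna()   # Y_t - Y_{t-12}: removes the yearly pattern
both = y.diff().diff(12).dropna()     # combined regular + seasonal differencing
print(len(first_diff), len(seasonal_diff), len(both))
```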

    Re-run the stationarity tests on the differenced data.

Protocol for Autocorrelation Analysis and Model Fitting

Objective: To quantify and model the autocorrelation structure and fit an appropriate ITS model.

  • Plot Autocorrelation Functions:

    • Autocorrelation Function (ACF): Plots the correlation between the series and its lags.
    • Partial Autocorrelation Function (PACF): Plots the partial correlation between the series and its lags, controlling for correlations at shorter lags. Python Code Snippet:

  • Model Selection and Fitting:

    • Segmented Regression: Can be used if the autocorrelation is minimal or modeled explicitly. It directly includes terms for time, intervention, and time after intervention.
    • ARIMA/SARIMA Models: A robust alternative when autocorrelation and seasonality are present. The model is defined by (p,d,q) for non-seasonal and (P,D,Q,m) for seasonal components [13]. Python Code Snippet for SARIMA:

  • Model Diagnostics: Check the residuals of the final model. They should resemble white noise (i.e., no significant autocorrelations). Plot the ACF/PACF of the residuals and perform a Ljung-Box test.

The following workflow diagram illustrates the logical relationship and sequence of these analytical steps.

[Workflow diagram: raw ITS data → visualize data (plot) → decompose series (trend, seasonal, residual) → test for stationarity (ADF, KPSS tests) → if non-stationary, apply regular and seasonal differencing → analyze ACF/PACF plots → fit ITS model (segmented regression or ARIMA/SARIMA) → diagnose residuals (should be white noise) → interpret intervention effect.]

For researchers implementing ITS analyses, the following "research reagents" are essential computational tools and statistical constructs.

Table 2: Key Research Reagents for ITS Analysis

Tool/Reagent | Type | Function in ITS Analysis
Statistical Software (R/Python/SAS) | Software Platform | Provides the computational environment for data manipulation, modeling, and visualization
Augmented Dickey-Fuller (ADF) Test | Statistical Test | Formally tests the null hypothesis that a time series has a unit root (is non-stationary) [19]
Autocorrelation Function (ACF) Plot | Diagnostic Plot | Visualizes the correlation between a time series and its lagged values, helping to identify AR/MA processes and seasonality [13]
Partial Autocorrelation Function (PACF) Plot | Diagnostic Plot | Displays the partial correlation of a time series with its own lagged values, controlling for intermediate lags; crucial for identifying the order of autoregressive (AR) processes
Seasonal Decomposition (e.g., STL) | Analytical Method | Separates a time series into trend, seasonal, and residual components, allowing for a clear inspection of each [20] [19]
SARIMA Model | Statistical Model | Extends ARIMA to explicitly model seasonal autocorrelation patterns, defined by parameters (p,d,q)(P,D,Q,m) [13]
Differencing (Y_t - Y_{t-1}) | Data Transformation | Removes trend and achieves stationarity by computing the changes between consecutive observations [13]

The rigorous application of ITS design is particularly relevant in the context of contemporary drug development. The industry is increasingly characterized by the use of artificial intelligence for drug discovery, a growth in personalized and precision medicine, and the adoption of virtual clinical trials and digital health technologies [21] [22]. These trends generate complex, longitudinal data perfect for ITS evaluation. For instance, an ITS could be used to assess the impact of an AI-driven diagnostic tool on time-to-patient-identification for a rare disease trial, or to evaluate how a new policy on personalized medicine reimbursement affected the uptake of a targeted therapy. In all these scenarios, controlling for autocorrelation, seasonality, and underlying trends is essential for deriving valid, actionable insights that can inform regulatory and commercial decisions.

Implementing ITS Analysis: Methodological Choices from Segmented Regression to ARIMA and GAM

The interrupted time series (ITS) design is a powerful quasi-experimental methodology used to evaluate the effects of interventions when randomized controlled trials are not feasible, ethical, or practical [23] [24]. Within this framework, segmented regression has emerged as the most prevalent and recommended statistical technique for analyzing time series data before and after a well-defined intervention [24] [25]. Also known as piecewise or broken-stick regression, this method allows researchers to quantify whether an intervention causes a significant change in the level or trend of an outcome of interest, beyond any pre-existing secular trends [23] [26].

The fundamental strength of segmented regression in ITS analysis lies in its ability to distinguish intervention effects from underlying trends that would have occurred regardless of the intervention [24]. This addresses a critical limitation of simple before-and-after comparisons, which may wrongly attribute secular trends to the intervention itself [23]. In implementation research across healthcare, public policy, and pharmaceutical development, segmented regression provides a robust analytical foundation for causal inference about real-world interventions [23] [27].

Theoretical Framework and Key Concepts

Core Model Specifications

Segmented regression models for ITS partition a series of observations into pre- and post-intervention segments, fitting a separate regression line to each interval [26]. The most common parameterization for a continuous outcome variable involves four key parameters [23] [28]:

Table 1: Core Coefficients in Segmented Regression for ITS Analysis

Parameter | Interpretation | Causal Inference Question
β₀ | Baseline level of outcome at time zero | What was the starting level?
β₁ | Pre-intervention slope (secular trend) | What was the underlying trend before intervention?
β₂ | Immediate level change following intervention | Did the intervention cause an immediate shift?
β₃ | Change in slope from pre- to post-intervention | Did the intervention alter the ongoing trend?

The basic segmented regression model is expressed as:

Yt = β₀ + β₁ × time + β₂ × intervention + β₃ × post-time + εt

Where Yt is the outcome at time t, "time" indicates elapsed time since study start, "intervention" is a dummy variable (0 pre-intervention, 1 post-intervention), and "post-time" indicates time since intervention started (0 before intervention, 1,2,3... after) [25].

Critical Parameterization Distinction

Research has identified two common parameterizations of segmented regression with important interpretive differences [28] [29]. The approach by Wagner et al. uses (T-Ti) in the interaction term, where Ti is the intervention start time, making β₂ directly interpretable as the immediate level change [28]. In contrast, the approach by Bernal et al. uses simple time (T) in the interaction term, where β₂ represents the difference in intercepts at time zero rather than the immediate effect at intervention time [28] [29].

This distinction is crucial because applying the wrong interpretation can lead to erroneous conclusions about intervention effects [29]. Under the Bernal parameterization, the immediate level change at intervention time Ti must be calculated as β₂ + β₃ × Ti, whereas under the Wagner parameterization it is estimated directly as β₂ [28].

Fundamental Assumptions of Segmented Regression

For valid causal inference using segmented regression in ITS designs, several key assumptions must be met:

  • Linearity: The relationship between time and outcome within each segment is linear [23] [24].
  • Independent Errors: Residuals are independent of each other (often violated in time series data) [24].
  • Homoscedasticity: Constant variance of errors over time [23].
  • Correct Model Specification: The intervention time point is correctly identified, and the functional form is appropriate [23] [25].
  • No Omitted Confounders: No unmeasured variables simultaneously affect the intervention assignment and outcome trends [23].

The assumption of independent errors is frequently violated in time series data due to autocorrelation, where consecutive measurements are correlated [24]. When autocorrelation exists but is ignored, standard errors may be underestimated, increasing Type I error rates [24]. Appropriate statistical techniques, such as including autoregressive terms or using generalized estimating equations (GEE), should be employed to address this issue [27].

Application Protocols for Segmented Regression Analysis

Standard Two-Segment Protocol

For evaluating an intervention implemented at a single, well-defined time point:

Data Preparation

  • Collect equally spaced observations before and after intervention (minimum 8-12 points per segment recommended)
  • Create three key variables:
    • Time: continuous variable indicating time from study start
    • Intervention: dummy variable (0=pre, 1=post)
    • Post-time: time since intervention start (0 before, 1,2,3... after)

Model Specification

  • Select appropriate regression type based on outcome variable:
    • Linear regression for continuous outcomes
    • Logistic regression for binary outcomes
    • Poisson regression for count data [23]
  • Specify model: Outcome = β₀ + β₁ × time + β₂ × intervention + β₃ × post-time
  • Check residual autocorrelation using Durbin-Watson or related tests
  • If autocorrelation present, use appropriate correction (e.g., autoregressive terms, Newey-West standard errors)

Interpretation

  • Test significance of β₂ (immediate level change) and β₃ (slope change)
  • Apply multiplicity adjustment if testing both level and slope changes [23]
  • Calculate confidence intervals for intervention effects

[Workflow diagram: Segmented regression analysis for interrupted time series. Start → data preparation (collect time series data; create time, intervention dummy, and post-time variables) → model specification (select regression type based on outcome; specify segmented regression equation) → assumption checking (test for autocorrelation, check linearity, verify homoscedasticity; return to data preparation if assumptions are violated) → model fitting and validation → interpretation and inference → final results and reporting.]

Advanced Modeling Protocols

Three-Segment Model for Transition Periods

When interventions are gradually implemented or effects manifest over a transition period [25]:

  • Identify transition period start (T₀) and end (T₂) points
  • Create a continuous function F(t) representing cumulative distribution of intervention effect during transition
  • Specify optimized segmented regression model: Yt = β₀ + β₁ × time + β₂ × F(t) × intervention + β₃ × F(t) × post-time + εt
  • Common distribution patterns for F(t) include uniform, normal, log-normal, or log-normal flip distributions [25]

Plateau Model with Estimated Breakpoint

When the intervention timing or breakpoint is unknown and must be estimated from data [30]:

  • Specify nonlinear model with continuity and smoothness constraints at breakpoint
  • Use iterative procedures (e.g., PROC NLIN in SAS) to estimate breakpoint location
  • For quadratic-plateau model: pre-breakpoint (quadratic), post-breakpoint (constant)
  • Apply constraints: f(x₀) = g(x₀) and f'(x₀) = g'(x₀) at breakpoint x₀
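A Python sketch of the quadratic-plateau idea; the source describes SAS PROC NLIN, so scipy.optimize.curve_fit is used here instead, with the continuity and smoothness constraints built into the parameterization. Data and parameter values are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def quad_plateau(x, plateau, c, x0):
    # Quadratic rising to a plateau at breakpoint x0. Writing the pre-breakpoint
    # branch as plateau - c*(x0 - x)^2 builds in f(x0) = g(x0) and f'(x0) = g'(x0) = 0.
    return np.where(x < x0, plateau - c * (x0 - x) ** 2, plateau)

# Hypothetical data generated from the model itself.
rng = np.random.default_rng(8)
x = np.linspace(0, 20, 60)
y = quad_plateau(x, 100.0, 0.8, 12.0) + rng.normal(0, 2, x.size)

popt, _ = curve_fit(quad_plateau, x, y, p0=[90.0, 1.0, 10.0])  # initial guesses matter
plateau_hat, c_hat, x0_hat = popt
print(f"plateau={plateau_hat:.1f}, curvature={c_hat:.2f}, breakpoint={x0_hat:.1f}")
```

As with PROC NLIN, the iterative fit is sensitive to starting values, so reasonable initial guesses for the breakpoint are important.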

Quantitative Data Synthesis

Table 2: Comparison of Segmented Regression Approaches for ITS Designs

Model Type | Key Features | Indications | Statistical Considerations
Classic Two-Segment | Single breakpoint at known intervention time; estimates level and slope changes | Sharp, immediate interventions with known implementation timing | Autocorrelation adjustment; sufficient data points per segment (≥8)
Three-Segment with Transition | Models gradual implementation; accounts for transition period using CDFs | Interventions phased over time; training periods; gradual effect manifestation | Selection of appropriate distribution pattern for transition; sensitivity analysis for transition length
Plateau with Estimated Breakpoint | Estimates breakpoint from data; continuity constraints | Unknown intervention timing; natural thresholds; effect saturation | Nonlinear estimation; initial parameter guesses; more complex implementation
Multivariate Segmented | Multiple independent variables with potential breakpoints | Complex interventions; multiple simultaneous components | Increased model complexity; potential for overfitting

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Methodological Tools for Segmented Regression Analysis

Tool/Technique | Function/Purpose | Implementation Examples
Segmented Package (R) | Fitting segmented regression models with estimated breakpoints | segmented() function for breakpoint detection and piecewise terms
PROC NLIN (SAS) | Nonlinear regression for complex segmented models with constraints | Plateau models with smoothness constraints at estimated breakpoints
Generalized Estimating Equations (GEE) | Accounting for autocorrelation in correlated time series data | proc genmod in SAS; geeglm in R for panel ITS data
Cumulative Distribution Functions (CDFs) | Modeling transition periods in optimized segmented regression | Uniform, normal, log-normal distributions for gradual effects
Durbin-Watson Test | Detecting autocorrelation in regression residuals | Statistical testing for serial correlation in time series errors

Advanced Considerations and Methodological Refinements

Addressing Common Analytical Errors

A frequent error in segmented regression parameterization involves incorrect specification of the interaction term [29]. Researchers must use the product between the intervention variable and time elapsed since intervention start (T-Ti), rather than time since study beginning (T), to obtain valid estimates of the immediate level change [29]. Simulation studies demonstrate that using the incorrect parameterization can produce substantially biased estimates of the intervention's immediate effect [29].

Handling Complex Intervention Scenarios

Multiple Intervention Components

For complex interventions with several components introduced at different times:

  • Add multiple interruption points to the time series
  • Ensure sufficient data points between interventions for independent effect estimation
  • Consider structured interrupted time series to isolate component effects [24]

Multi-site ITS Designs

When data comes from multiple implementation sites:

  • Perform segmented regression separately for each site, then meta-analyze results
  • Use hierarchical models with random effects for sites
  • Account for potential heterogeneity in intervention effects across sites [24] [27]

[Decision framework: If the intervention time point is known and the effect is immediate, use classic two-segment regression; if the effect is gradual, use a three-segment model with a transition period; if the time point is unknown, use an estimated-breakpoint model. With multiple sites, proceed to a hierarchical segmented model.]

Segmented regression remains the gold standard analytical method for interrupted time series designs in implementation research, providing robust causal inference about intervention effects while accounting for underlying secular trends [23] [24]. When properly specified with attention to key assumptions—particularly regarding parameterization, autocorrelation, and intervention timing—it offers researchers across scientific domains a powerful tool for evaluating real-world interventions [28] [29]. The continued development of optimized segmented regression approaches, particularly for handling transition periods and complex intervention scenarios, further enhances its applicability to contemporary implementation research challenges [25] [27].

Interrupted Time Series (ITS) analysis is a powerful quasi-experimental design for evaluating the population-level impact of health policy interventions, pharmaceutical regulations, and public health initiatives when randomized controlled trials are not feasible [13] [31]. Within this framework, Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) models provide sophisticated analytical approaches that account for complex temporal structures, including autocorrelation, trends, and seasonal patterns, which simpler segmented regression models may inadequately capture [13] [32]. For researchers and drug development professionals, these models offer a robust methodology for determining whether an intervention—such as a new drug policy, vaccination campaign, or market approval—creates a significant deviation from pre-existing trends in outcomes like prescribing rates, disease incidence, or product demand [13] [32] [33].

The core strength of ARIMA/SARIMA models lies in their ability to model the outcome variable based on its own past values and previous forecast errors, while explicitly accounting for temporal dependencies that violate the independence assumption of standard statistical tests [13] [34]. This is particularly valuable in pharmaceutical and public health research, where data often exhibit seasonal fluctuations (e.g., annual influenza patterns, quarterly reporting cycles) and serial correlation that must be controlled to accurately isolate intervention effects [13] [32]. By properly addressing these temporal structures, ARIMA/SARIMA models reduce biased estimation of intervention impacts and provide more valid causal inference in observational settings [35].

Theoretical Foundations of ARIMA and SARIMA Models

Components of ARIMA Models

ARIMA models combine three primary components to describe and forecast time series data. The model is characterized by three parameters: (p, d, q), where:

  • AR (Autoregressive component - p): This component models the current value of the time series as a linear combination of its previous values [13] [34]. An autoregressive model of order p (AR(p)) can be expressed as:

    Y_t = c + φ₁Y_{t-1} + φ₂Y_{t-2} + … + φ_pY_{t-p} + ε_t

    where Y_t is the value at time t, c is a constant, φ₁, …, φ_p are the autoregressive parameters, and ε_t represents the error term [13].

  • I (Integrated component - d): This component involves differencing the time series to make it stationary—removing trends and achieving constant statistical properties over time [13] [34] [36]. The order of differencing (d) indicates how many times differencing is applied. First-order differencing is expressed as:

    Y_t' = Y_t - Y_{t-1}

    Stationarity is crucial for ARIMA modeling as it ensures stable parameters over time, typically verified using tests like the Augmented Dickey-Fuller (ADF) test [37] [36].

  • MA (Moving Average component - q): This component models the current value based on the residual errors from previous time points [13] [34]. A moving average model of order q (MA(q)) is expressed as:

    Y_t = c + ε_t + θ₁ε_{t-1} + θ₂ε_{t-2} + … + θ_qε_{t-q}

    where θ₁, …, θ_q are the moving average parameters [13].

The complete ARIMA(p,d,q) model combines these elements to regress the time series on its own lagged values and lagged forecast errors [34] [36].

Extension to Seasonal ARIMA (SARIMA) Models

For time series with seasonal patterns, the SARIMA model extends ARIMA by incorporating seasonal components. A SARIMA model is denoted as (p, d, q)(P, D, Q)_m, where [38] [34]:

  • (p, d, q): Non-seasonal orders (as in ARIMA)
  • (P, D, Q): Seasonal orders for the autoregressive, differencing, and moving average components, respectively
  • m: Number of periods in each seasonal cycle (e.g., 12 for monthly data with yearly seasonality, 4 for quarterly data)

The seasonal component models patterns that repeat at fixed intervals, addressing regular fluctuations such as increased prescribing of certain medications in winter months or quarterly reporting cycles in pharmaceutical sales [32] [38]. Seasonal differencing (D) removes seasonal trends, for instance, by computing Y_t - Y_{t-m} [34].

Model Implementation Protocol for ITS Analysis

Pre-Modeling Data Preparation and Stationarity Assessment

Step 1: Data Preprocessing

  • Address missing values through appropriate imputation methods to maintain series continuity [37].
  • Detect and normalize anomalies or outliers that may skew model estimation [37].
  • Ensure consistent sampling frequency and regular time intervals between observations [37].

Step 2: Stationarity Testing and Differencing

  • Test for stationarity using the Augmented Dickey-Fuller (ADF) test, where the null hypothesis assumes non-stationarity [37] [36]. A p-value below 0.05 typically indicates stationarity.
  • Apply differencing (d) if non-stationary: Start with first-order differencing ((Yt - Y{t-1})), and proceed to higher orders if necessary [34] [36].
  • Avoid over-differencing: An over-differenced series may still be stationary but can lead to unnecessary complexity and large standard errors [36].
  • For seasonal data, apply seasonal differencing (D) with period m (e.g., \( Y_t - Y_{t-m} \) for monthly data with m=12) [34].

Table 1: Stationarity Assessment and Differencing Guidelines

| Scenario | ADF Test Result | Recommended Action | Target d/D |
| --- | --- | --- | --- |
| No trend, constant variance | p < 0.05 | No differencing needed | d = 0 |
| Linear trend, constant variance | p > 0.05 | First-order differencing | d = 1 |
| Nonlinear trend, changing variance | p > 0.05 | Second-order differencing or transformation | d = 2 |
| Seasonal pattern present | p > 0.05 at seasonal lags | Seasonal differencing | D = 1 with appropriate m |

Model Identification and Parameter Selection

Step 3: Determine AR and MA Orders Using ACF and PACF

  • Plot and analyze the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of the differenced series [13] [38] [36].
  • ACF helps identify the order of MA(q): Significant spikes at lag q suggest MA terms [37] [36].
  • PACF helps identify the order of AR(p): Significant spikes at lag p suggest AR terms [37] [36].
  • For SARIMA, examine ACF/PACF at seasonal lags (multiples of m) to identify P and Q [38].

Step 4: Model Selection and Validation

  • Fit multiple candidate models with different (p,d,q)(P,D,Q)_m parameters.
  • Compare models using information criteria such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), where lower values indicate better fit [34].
  • Perform residual analysis: Ensure residuals resemble white noise (no significant autocorrelations) and are normally distributed [13] [36].
  • Validate model stability using out-of-sample forecasting or cross-validation techniques [38].

Table 2: Interpretation of ACF and PACF Patterns for Model Identification

| Pattern | ACF Behavior | PACF Behavior | Suggested Model |
| --- | --- | --- | --- |
| AR(p) | Decays exponentially or sinusoidal | Significant spikes at lag p, then cuts off | AR(p) with order p |
| MA(q) | Significant spikes at lag q, then cuts off | Decays exponentially or sinusoidal | MA(q) with order q |
| ARMA(p,q) | Decays after lag q | Decays after lag p | ARMA(p,q) |
| Seasonal AR | Decays at seasonal lags | Significant spikes at seasonal lags | Seasonal AR(P) |
| Seasonal MA | Significant spikes at seasonal lags | Decays at seasonal lags | Seasonal MA(Q) |

Intervention Effect Quantification in ITS

Step 5: Modeling Intervention Effects

  • Incorporate intervention variables into the selected ARIMA/SARIMA model to test hypotheses about level and trend changes [13] [31].
  • Common intervention specifications include [31]:
    • Step change (abrupt permanent): Binary variable (0 pre-intervention, 1 post-intervention)
    • Pulse change (abrupt temporary): Binary variable for immediate, temporary effect
    • Ramp change (gradual): Continuous variable counting time since intervention
  • Estimate model parameters and test significance of intervention terms [13] [31].
  • Compute effect sizes with confidence intervals to quantify intervention magnitude [35].

Application in Pharmaceutical and Public Health Research

ARIMA/SARIMA models have demonstrated substantial utility across various pharmaceutical and public health research contexts:

Policy Impact Evaluation

In a study evaluating Australia's policy change restricting quetiapine prescriptions, ARIMA modeling quantified a significant reduction in inappropriate prescribing following the intervention, demonstrating its value for pharmaceutical policy analysis [13]. Similarly, research on COVID-19's impact on routine immunization in Kenya employed SARIMA to account for seasonal patterns in vaccine coverage, revealing immediate decreases in pentavalent and measles/rubella vaccine doses following pandemic onset, with recovery within approximately four months [32].

Pharmaceutical Sales and Demand Forecasting

ARIMA/SARIMA models provide critical forecasting capabilities for pharmaceutical supply chain management. Studies comparing forecasting approaches found that time series models effectively predict drug demand, enabling optimized production, inventory management, and market responsiveness [33]. Accurate forecasting is particularly valuable for pharmaceutical companies where prediction errors can significantly impact operational efficiency and resource allocation [39] [33].

Infectious Disease Surveillance and Intervention Assessment

In infectious disease research, SARIMA models help quantify the impact of public health interventions by accounting for both seasonal patterns and underlying trends [32] [31]. For example, studies have evaluated antibiotic stewardship programs, vaccination campaigns, and pandemic control measures while controlling for autocorrelation and seasonal variation in disease incidence [32] [31].

Research Reagent Solutions

Table 3: Essential Computational Tools for ARIMA/SARIMA Implementation

| Tool/Software | Primary Function | Application Context |
| --- | --- | --- |
| R statistical software (forecast and tseries packages) | Model fitting and diagnostics | Comprehensive time series analysis [13] [38] |
| Python (statsmodels, pmdarima) | Automated parameter selection and forecasting | Flexible implementation with machine learning integration [37] [36] |
| Augmented Dickey-Fuller test | Stationarity testing | Determining differencing order (d) [37] [36] |
| ACF/PACF plots | Model order identification | Visual guidance for p, q, P, Q selection [13] [36] |
| AIC/BIC criteria | Model comparison | Selecting optimal parameter combinations [34] |

Workflow Visualization

Workflow: Time Series Data → Data Preprocessing (missing values, outliers) → Stationarity Testing (ADF test); if p > 0.05, Apply Differencing (determine d, D) → ACF/PACF Analysis (identify p, q, P, Q) → Model Fitting (ARIMA/SARIMA) → Model Diagnostics (residual analysis; return to ACF/PACF analysis if fit is poor) → once residuals approximate white noise, Add Intervention Terms (step, pulse, ramp) → Effect Interpretation & Forecasting → Research Conclusions.

ARIMA/SARIMA Model Implementation Workflow for ITS Studies

Statistical Power Considerations

When designing ITS studies using ARIMA/SARIMA models, statistical power depends on several factors. Simulation studies indicate that [35]:

  • Power increases with the number of pre- and post-intervention time points
  • Power decreases as autocorrelation increases
  • Power increases with larger effect sizes
  • At least 24 time points pre- and post-intervention are typically needed to detect effect sizes of 1.0 with 80% power, depending on autocorrelation structure [35]

Smaller effect sizes (<0.5) or fewer time points may yield inadequate power, potentially leading to false conclusions about intervention effectiveness [35].

ARIMA and SARIMA models provide robust analytical frameworks for evaluating interventions in interrupted time series designs, particularly when data exhibit complex temporal structures including trends, autocorrelation, and seasonal patterns. By properly accounting for these features, researchers in pharmaceutical development and public health can obtain more valid estimates of intervention effects, leading to better-informed policy decisions and resource allocation. The structured protocol outlined in this document offers a systematic approach to model identification, estimation, and interpretation, supporting rigorous evaluation of health interventions in observational settings where randomized trials are not feasible.

Generalized Additive Models (GAMs) represent a powerful extension of Generalized Linear Models (GLMs) that replace the linear relationship between predictors and outcome with flexible smooth functions, enabling the modeling of complex, non-linear patterns without requiring prior specification of the relationship's form [40] [41]. In the context of interrupted time series (ITS) design implementation research, this flexibility is particularly valuable for evaluating policy interventions or treatment effects where the underlying trends may follow non-linear patterns that traditional segmented regression cannot adequately capture [9] [42].

The fundamental equation for a GAM can be expressed as:

\[ g(\mu) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) \]

where \( g(\mu) \) is the link function, \( \beta_0 \) is the intercept, and \( f_j(x_j) \) are smooth functions of the covariates [40] [43]. This structure maintains the additivity of GLMs while allowing for non-linear relationships through the smooth functions, striking a balance between interpretability and flexibility [44] [40].

Compared to traditional linear models, GAMs offer distinct advantages for ITS research. While linear models assume a straight-line relationship between predictors and outcome, GAMs can capture complex nonlinear trends common in real-world time series data [45] [42]. Unlike polynomial regression, which can produce wild extrapolations at the endpoints (Runge's phenomenon), GAMs use smoothing splines that provide more stable behavior at data boundaries [46]. Additionally, compared to complex machine learning approaches, GAMs maintain interpretability through their additive structure, allowing researchers to understand and communicate the effect of individual variables [44] [40].
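As a minimal sketch of this idea, the following fits a smooth time trend with statsmodels' `GLMGam`/`BSplines` (pyGAM and R's mgcv, cited in the tools table below, are alternatives). The data, basis dimension (df=10), and penalty (alpha=1.0) are all illustrative assumptions:

```python
import numpy as np
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(5)
n = 200
t = np.linspace(0, 10, n)
# Nonlinear trend that a straight-line (segmented linear) fit would miss
y = np.sin(t) + 0.1 * t + rng.normal(0, 0.2, n)

# Cubic B-spline basis for a single smooth term f(time)
splines = BSplines(t.reshape(-1, 1), df=[10], degree=[3])

# Gaussian GAM with identity link: E[y] = beta0 + f(time)
gam = GLMGam(y, exog=np.ones((n, 1)), smoother=splines, alpha=1.0)
res = gam.fit()
resid_sd = np.std(y - res.fittedvalues)
print("Residual SD:", round(resid_sd, 3))  # far below the SD of y when f(time) is captured
```

In an ITS application, intervention indicators would enter as additional columns of `exog` alongside the smooth time term.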

Comparative Analysis of Modeling Approaches for ITS

Table 1: Comparison of Statistical Approaches for Interrupted Time Series Analysis

| Model Type | Key Characteristics | Assumptions | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Segmented Linear Regression | Assumes linear trends before and after intervention | Linear relationships, independent errors | Simple interpretation, widely understood [42] | Poor performance with nonlinear trends [42] |
| ARIMA Models | Accounts for autocorrelation, trend, and seasonality | Stationarity after differencing | Handles complex autocorrelation structures [9] | Complex specification, less intuitive [9] |
| Generalized Additive Models (GAMs) | Flexible smooth functions capture nonlinear patterns | Additivity, smooth relationships | Captures nonlinear trends automatically, robust to model misspecification [9] [42] | Computational intensity, smoothing parameter selection [45] |

Table 2: Performance Comparison of GAMs vs. Alternative Methods in Simulation Studies

| Study Context | Comparison Model | Key Finding | Performance Metric |
| --- | --- | --- | --- |
| Policy Intervention Evaluation [42] | Segmented Linear Regression | GAMs showed better performance with nonlinear trends, similar performance with linear trends | Lower MSE and MPE for nonlinear data |
| Health Policy Analysis [9] | ARIMA | GAMs more robust to model misspecification; ARIMA more consistent with different effect sizes | Model accuracy under varying conditions |
| Clinical Prediction Rules [43] | Traditional Categorization | GAM-based categorization performed similarly to continuous predictors | No significant differences in AUC values |

GAM Implementation Protocol for ITS Research

Data Preparation and Preprocessing

  • Temporal Alignment: Ensure time series data are equally spaced with consistent intervals between observations [42].
  • Missing Data Handling: Address missing values through appropriate imputation techniques or modeling strategies, as GAMs can accommodate missing data points [45].
  • Covariate Encoding: Encode categorical predictors (e.g., seasonality indicators) into numeric format using techniques like one-hot encoding [45].
  • Outcome Transformation: Transform the outcome variable if necessary to meet distributional assumptions (e.g., log transformation for count data) [47] [48].

Model Specification and Fitting

  • Select Smoothing Functions: Choose appropriate basis functions (e.g., thin plate regression splines, cubic regression splines) for the smooth terms [40] [42].
  • Define Intervention Effects: Specify appropriate terms to model intervention effects:
    • Immediate level changes: Indicator variable for pre/post intervention
    • Changes in trends: Interaction between time and intervention indicator
    • Lagged effects: Appropriately lagged terms based on content knowledge [9]
  • Account for Autocorrelation: Incorporate correlation structures or autoregressive terms when residuals show temporal dependence [9] [42].
  • Control for Seasonality: Include seasonal components using cyclic smooths or Fourier terms [9] [42].
  • Set Basis Complexity: Choose appropriate basis dimensions (knots) for smooth functions, balancing flexibility and overfitting [40].

Model Diagnostics and Validation

  • Residual Analysis: Check residual plots for patterns, heteroscedasticity, and autocorrelation [40] [48].
  • Basis Dimension Check: Verify that basis dimensions (k) for smooth terms are sufficient using gam.check() or similar functionality [40].
  • Model Comparison: Compare competing models using AIC, BIC, or cross-validation techniques [45] [43].
  • Temporal Validation: Validate model performance on holdout time periods to assess forecasting accuracy [42].

Effect Estimation and Interpretation

  • Intervention Effect Quantification: Calculate immediate and cumulative effects of the intervention by comparing predictions with and without the intervention [42].
  • Visualization: Generate partial dependence plots to visualize the relationship between predictors and outcome [45] [48].
  • Uncertainty Estimation: Compute confidence intervals for smooth functions and intervention effects using Bayesian posterior simulation or bootstrap methods [48].

Workflow: Data Preparation (time alignment, missing data, encoding) → Model Specification (smoothing functions, intervention terms) → Model Fitting (parameter estimation, smoothing selection) → Model Diagnostics (residual checks, basis dimension verification) → Model Validation (temporal validation, performance assessment) → Effect Interpretation (intervention effects, visualization) → Results Reporting (effect sizes, uncertainty, visualizations).

GAM Implementation Workflow for ITS Studies

GAM Components and Analytical Framework

Structure: predictor variables are passed through smoothing splines (basis functions plus penalties), combined additively, and mapped to the expected outcome through a link function. In the GAM equation g(μ) = β₀ + f₁(x₁) + f₂(x₂) + ... + fₚ(xₚ), g(·) is the link function (e.g., log, logit), μ is the expected value of the outcome, β₀ is the model intercept, and fₚ(xₚ) are smooth functions of the predictors.

GAM Mathematical Components and Structure

Research Reagent Solutions: Software and Computational Tools

Table 3: Essential Software Tools for Implementing GAMs in ITS Research

| Tool Name | Type | Primary Function | Key Features for ITS |
| --- | --- | --- | --- |
| mgcv (R Package) [48] [42] | Software Library | GAM estimation and inference | Automatic smoothing parameter selection, various basis functions, AR1 error structures |
| pyGAM (Python Package) [46] | Software Library | GAM implementation in Python | Multiple regression types, grid search for optimization, custom loss functions |
| marginaleffects (R Package) [48] | Analysis Tool | Post-estimation interpretation | Conditional and marginal effects, predictions, hypothesis testing for GAMs |
| gam.check (mgcv) [40] | Diagnostic Tool | Model validation | Basis dimension checks, residual diagnostics, QQ-plots |
| Thin Plate Regression Splines [42] | Smoothing Method | Default smoother in mgcv | Optimal smoothing given basis dimension, no knot placement required |

Application in Drug Development and Healthcare Research

GAMs offer particular utility in pharmaceutical and healthcare research where ITS designs are commonly employed to evaluate the impact of policy changes, treatment guidelines, or safety interventions. The non-linear modeling capability of GAMs allows researchers to detect and quantify intervention effects that may follow complex temporal patterns not captured by traditional methods [47] [42].

In a study evaluating the impact of Spain's 2012 cost-sharing reform on pharmaceutical prescriptions, GAMs revealed non-linear trends that would have been missed by segmented linear regression, providing more accurate estimates of the policy's cumulative effect [42]. Similarly, in clinical prediction research, GAMs have been used to develop optimal categorization schemes for continuous clinical variables, preserving critical prognostic information while creating clinically practical decision rules [43].

For drug safety monitoring, GAMs can model complex seasonal patterns in adverse event reports while evaluating the impact of safety warnings, and in clinical biomarker studies, GAMs have elucidated non-linear relationships between alcohol consumption and inflammatory markers like IL-6, revealing risk patterns that would be obscured by dichotomization or linear assumptions [47].

The flexibility of GAMs to accommodate non-normal distributions through appropriate link functions makes them particularly suitable for healthcare outcomes, which often include counts (e.g., hospitalizations), binary outcomes (e.g., mortality), or skewed continuous measures (e.g., healthcare costs) [41]. This distributional flexibility, combined with the ability to capture non-linear temporal patterns, positions GAMs as a powerful analytical tool for the complex longitudinal data common in pharmaceutical and health services research.

The Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the longitudinal effects of interventions implemented at a population level, such as new health policies, system changes, or drug utilization interventions [49] [50]. Within implementation research, ITS analysis enables researchers to determine whether an intervention has produced a significant effect beyond underlying trends by analyzing data points collected at regular intervals before and after an intervention point [51]. Despite its growing popularity in drug development and health services research, methodological challenges in its application persist, particularly regarding sample size determination, data aggregation strategies, and pre-specification of intervention effects [49] [51].

Recent evidence indicates substantial methodological gaps in current ITS practice. A cross-sectional survey of 153 drug utilization studies using ITS design found that only 28.1% clearly explained the rationale for using ITS, just 13.7% clarified the rationale for their chosen model structure, and only 20.8% of studies using aggregated data justified the number of time points selected [51]. These shortcomings highlight the critical need for standardized protocols in ITS design and analysis.

This application note addresses three fundamental design considerations—sample size calculation, data aggregation principles, and pre-specification of intervention effects—to enhance the methodological rigor of ITS studies in implementation research. We provide detailed protocols and practical tools to help researchers navigate these complex methodological decisions within the context of drug development and healthcare policy evaluation.

Sample Size Calculation in ITS Designs

The Challenge of Power Calculation in Time Series

Determining adequate statistical power and sample size in ITS studies presents unique challenges compared to traditional experimental designs. The "sample size" in ITS refers to the number of time points observed before and after an intervention, while power depends on multiple factors including the magnitude of intervention effects, underlying variance, autocorrelation, and the model structure itself [49]. Unlike standard power calculations for clinical trials, ITS power analysis must account for temporal dependencies in the data, often requiring specialized simulation-based approaches.

A recent methodological review of ITS studies in drug utilization research revealed that only 20.8% of studies using aggregated data provided justification for their selected number of time points, indicating a substantial gap in current reporting practices [51]. This omission is critical because underpowered ITS studies may fail to detect clinically significant intervention effects, while overly long series may waste resources and potentially introduce confounding from external factors.

Simulation-Based Power Analysis

Simulation-based approaches represent the current best practice for power calculation in ITS designs [49] [52]. These methods involve generating multiple synthetic datasets with known effect sizes under various assumptions about the data structure, then analyzing each dataset to estimate the probability of detecting the specified effects (statistical power).

Table 1: Key Parameters for Simulation-Based Power Analysis in ITS Studies

| Parameter Category | Specific Parameters | Considerations |
| --- | --- | --- |
| Effect Size | Level change (β₂), Slope change (β₄) | Based on clinically meaningful difference or policy-relevant threshold |
| Time Series Structure | Number of pre- and post-intervention points, Total series length | Balance between statistical power and practical feasibility |
| Statistical Properties | Autocorrelation (ρ), Variance (σ²), Seasonality patterns | Estimate from preliminary data or literature review |
| Model Specifications | Regression type (linear, Poisson, negative binomial), Inclusion of controls | Match to outcome data type and study design |

For continuous outcomes, the segmented autoregressive model can be specified as:

Yₜ = β₀ + β₁Tₜ + β₂Xₜ + β₃(Tₜ - t₁)Xₜ + εₜ

where εₜ = ρεₜ₋₁ + uₜ, uₜ ~ N(0, σ²) [49]

For count outcomes, which are common in drug utilization research (e.g., monthly prescriptions, adverse events), observation-driven models such as Poisson or negative binomial regression with lagged terms on the conditional mean are appropriate [52]. The power to detect the same magnitude of parameters varies considerably depending on whether testing focuses on level changes, trend changes, or both, necessitating careful pre-specification of primary hypotheses [52].

Practical Protocol for Sample Size Determination

Protocol 1: Simulation-Based Power Calculation for ITS Studies

  • Define Primary Hypothesis: Clearly specify whether the expected intervention effect manifests as an immediate level change, a slope change, or both. This determines the key parameters (β₂, β₄, or both) for power calculation.

  • Estimate Baseline Parameters:

    • Collect preliminary data or use literature estimates to determine baseline values (β₀, β₁)
    • Estimate autocorrelation (ρ) and variance (σ²) from historical data
    • For count outcomes, estimate the dispersion parameter
  • Specify Effect Sizes: Define clinically or policy-relevant effect sizes for level changes (β₂) and/or slope changes (β₄). Conduct power calculations across a plausible range of values.

  • Implement Simulation Code:

    • Develop software to generate multiple datasets (typically 1000+ replications) under specified parameters
    • For each dataset, fit the proposed ITS model and record whether the null hypothesis is rejected
    • Calculate power as the proportion of replications with statistically significant results
  • Vary Key Parameters Systematically: Execute simulations across different combinations of:

    • Number of pre- and post-intervention time points
    • Autocorrelation values (from -0.9 to 0.9)
    • Effect sizes of interest
  • Create Power Curves: Generate graphical representations showing statistical power as a function of the number of time points for different effect sizes and autocorrelation values.

  • Select Final Design: Choose the number of time points that provides adequate power (typically 80% or higher) for clinically relevant effect sizes while considering practical constraints.

Table 2: Exemplary Power Analysis Results for Different ITS Scenarios

| Scenario | Pre/Post Points | Autocorrelation (ρ) | Level Change | Slope Change | Achieved Power |
| --- | --- | --- | --- | --- | --- |
| Base Case | 24/24 | 0.2 | 0.5 SD | 0.1 SD/t | 82% |
| High AC | 24/24 | 0.6 | 0.5 SD | 0.1 SD/t | 64% |
| Longer Series | 36/36 | 0.2 | 0.5 SD | 0.1 SD/t | 92% |
| Larger Effect | 24/24 | 0.2 | 0.8 SD | 0.15 SD/t | 96% |
| Count Outcome | 24/24 | 0.2 | IRR=1.5 | OR=1.2/t | 85% |

Data Aggregation Strategies

Principles of Data Aggregation in ITS

Data aggregation refers to the process of combining individual-level data into meaningful temporal units for analysis. Appropriate aggregation is critical in ITS designs as it directly affects the interpretation of intervention effects and statistical properties of the time series. Most ITS studies (97.4% in recent surveys) use aggregated data as the unit of analysis, with monthly intervals being the most common approach (73.8%) [51].

The choice of aggregation level involves balancing statistical precision, methodological requirements, and clinical relevance. Longer intervals (e.g., monthly or quarterly) typically reduce variability and mitigate autocorrelation issues but may obscure brief intervention effects or precise timing of changes. Shorter intervals (e.g., daily or weekly) offer finer temporal resolution but often exhibit higher variability and stronger autocorrelation.

Aggregation Protocol for Drug Utilization Research

Protocol 2: Systematic Approach to Data Aggregation in ITS Studies

  • Define Temporal Unit Based on Intervention Mechanism:

    • For interventions with immediate effects (e.g., new prescribing restrictions), consider shorter intervals (weekly)
    • For interventions with gradual implementation (e.g., educational campaigns), longer intervals (monthly) may be appropriate
    • Align aggregation with natural reporting cycles (e.g., monthly prescription data)
  • Ensure Consistency in Pre- and Post-Intervention Periods: Maintain identical aggregation units throughout the entire study period to avoid artificial discontinuities.

  • Address Missing Data Proactively:

    • Establish protocols for handling missing aggregated values before analysis
    • Consider multiple imputation techniques specifically designed for time series data
    • Document all missing data and imputation approaches thoroughly
  • Validate Aggregation Level:

    • Conduct sensitivity analyses using different aggregation levels when feasible
    • Assess autocorrelation at chosen interval; consider adjustment if high autocorrelation is present
    • Ensure sufficient data points for statistical power (typically 12+ pre- and post-intervention points)
  • Account for Seasonal Patterns:

    • For data with suspected seasonality, ensure aggregation level captures seasonal cycles
    • Consider using matching periods (e.g., January-December comparisons) to control for seasonal effects
  • Document Rationale Explicitly: Justify the chosen aggregation level based on intervention characteristics, data availability, and statistical considerations.
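The aggregation step itself is straightforward with pandas; the sketch below rolls simulated individual-level dispensing records (hypothetical dates and defined daily doses) up to a monthly series with consistent intervals:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
# Individual-level records: one row per dispensing event over two years
dates = pd.to_datetime("2020-01-01") + pd.to_timedelta(
    rng.integers(0, 730, size=5000), unit="D")
records = pd.DataFrame({"date": dates, "ddd": rng.uniform(0.5, 2.0, size=5000)})

# Aggregate to a monthly series (month-start bins, identical across the study period)
monthly = records.set_index("date").resample("MS")["ddd"].agg(["size", "sum"])
monthly.columns = ["n_events", "total_ddd"]

print(monthly.head(3))
print("Number of monthly time points:", len(monthly))
```

Switching the rule to `"W"` or `"QS"` gives the weekly or quarterly alternatives discussed above, which makes the sensitivity analysis in Step 4 cheap to run.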

Decision pathway: Individual-Level Data → Step 1: Assess Intervention Mechanism & Timing → Step 2: Evaluate Data Availability & Quality → Step 3: Consider Seasonal Patterns & Cycles → Step 4: Select Preliminary Aggregation Level (weekly for high resolution, monthly as a balanced approach, quarterly for low variability) → Step 5: Validate Statistical Properties → Step 6: Conduct Sensitivity Analysis (adjust aggregation if needed) → Final Aggregation Decision.

Data Aggregation Decision Pathway

The flowchart above illustrates a systematic approach to selecting appropriate data aggregation levels in ITS studies, balancing methodological requirements with practical considerations.

Pre-Specifying Intervention Effects

The Importance of A Priori Hypothesis Specification

Pre-specification of hypothesized intervention effects represents a critical safeguard against Type I errors and data-driven conclusions in ITS analyses. Recent evidence indicates that only 13.7% of ITS studies in drug utilization research provide clear justification for their selected model structure [51]. Furthermore, approximately 15 studies provided incorrect interpretation of level change parameters due to improper time parameterization, highlighting the need for greater precision in model specification [51].

Pre-specification involves explicitly stating the expected nature, timing, direction, and magnitude of intervention effects before data collection or analysis. This practice enhances research transparency, minimizes analytical flexibility, and strengthens causal inferences drawn from ITS designs.

Structured Approach to Intervention Effect Specification

Protocol 3: Pre-Specification Framework for Intervention Effects

  • Define Effect Mechanism:

    • Immediate Level Change: Abrupt, permanent shift in outcome level post-intervention
    • Slope Change: Gradual change in outcome trajectory post-intervention
    • Combined Effects: Both immediate and gradual changes
    • Transitional/Phase-in Effects: For multi-phase interventions with ramp-up periods [49]
  • Specify Timing Relationships:

    • Define precise intervention time point(s)
    • For delayed effects, specify expected lag period between implementation and effect
    • For multi-component interventions, specify sequence of implementation and expected effect timing for each component
  • Quantify Expected Effect Size:

    • Specify magnitude of level changes in clinically interpretable units
    • Define slope changes in units per time period
    • Justify effect sizes based on prior evidence or clinical significance
  • Document Functional Form:

    • Specify complete regression model with all parameters
    • Clearly define time metrics and parameterization approach
    • For three-phase ITS designs, extend model accordingly [49]:

Yₜ = β₀ + β₁Tₜ + β₂Xₜ⁽¹⁾ + β₃Xₜ⁽²⁾ + β₄(Tₜ - t₁)Xₜ⁽¹⁾ + β₅(Tₜ - t₂)Xₜ⁽²⁾ + εₜ

  • Address Methodological Complexities:
    • Pre-specify approach to handling autocorrelation
    • Define strategy for seasonality adjustment if needed
    • For studies with hierarchical data, specify multilevel structure and random effects

Workflow: Intervention Characterization → Define Effect Mechanism (immediate level change, slope change, combined level and slope change, or phased/transitional effects) → Specify Timing Relationships → Quantify Expected Effect Size → Document Functional Form & Model → Address Methodological Complexities → Pre-Specification Complete.

Intervention Effect Pre-Specification Workflow

The diagram above outlines a systematic workflow for pre-specifying intervention effects in ITS studies, ensuring transparent and methodologically sound hypothesis development.

Three-Phase ITS Designs for Complex Interventions

For interventions with phased implementation or ramp-up periods, the standard two-phase ITS model may be insufficient. In such cases, three-phase ITS designs more accurately capture the intervention's temporal structure [49]. These designs are particularly relevant for:

  • Interventions requiring implementation period: When full implementation occurs gradually after initial introduction
  • Multi-component interventions: When different intervention components are introduced sequentially

In three-phase ITS designs, the model includes two change points (t₁ and t₂) representing transitions between pre-implementation, ramp-up/partial implementation, and full implementation phases [49]. The coefficients β₂ and β₃ represent immediate level changes following the first and second transitions, while β₄ and β₅ represent slope changes during the second and third phases.

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Methodological Tools for ITS Implementation Research

| Tool Category | Specific Tool/Technique | Function/Purpose |
| --- | --- | --- |
| Statistical Software | R packages (tseries, forecast, segmented) | Conduct time series analysis, including autocorrelation testing and segmented regression |
| Power Analysis Tools | Simulation-based power calculation scripts [49] [52] | Determine required number of time points for adequate statistical power |
| Data Management Platforms | Electronic Data Capture (EDC) systems, interactive erroneous-data platforms [53] | Identify potential outliers, perform quality checks, and manage temporal data |
| Visualization Tools | Interactive Tables, Listings, and Figures (TLFs) [53] | Monitor data patterns, identify trends, and communicate findings effectively |
| Model Specification Aids | Segmented regression templates, three-phase ITS code [49] | Implement correct model parameterization and avoid interpretation errors |
| Bias Assessment Tools | Autocorrelation tests (Durbin-Watson, Ljung-Box), seasonality diagnostics | Identify and address threats to validity in time series analysis |

Integrated Experimental Protocol

Comprehensive Protocol 4: Implementing a Rigorous ITS Study

  • Pre-Study Planning Phase:

    • Clearly articulate primary research question and causal hypothesis
    • Conduct systematic review of existing evidence to inform effect size estimates
    • Pre-specify primary analysis model including all parameters
    • Perform simulation-based power analysis to determine required time points
    • Document all pre-specification decisions in analysis plan
  • Data Collection and Aggregation:

    • Establish prospective data collection system with consistent temporal intervals
    • Implement quality control procedures for data aggregation
    • Monitor data completeness and address missing data proactively
    • For hierarchical data, document cluster structure and sample sizes at each level
  • Analytical Phase:

    • Conduct exploratory analysis to assess autocorrelation, stationarity, and seasonality
    • Execute pre-specified primary analysis without deviation
    • Perform sensitivity analyses addressing methodological assumptions
    • For three-phase designs, correctly interpret level and slope change parameters [49]
  • Reporting and Interpretation:

    • Report complete regression model with all parameter estimates
    • Clearly distinguish pre-specified primary analyses from exploratory analyses
    • Interpret findings in context of pre-specified hypotheses and effect sizes
    • Discuss limitations, particularly regarding power and generalizability

Methodologically rigorous ITS designs require careful attention to sample size determination, data aggregation strategies, and pre-specification of intervention effects. The protocols and tools provided in this application note address critical gaps identified in current practice, particularly the underutilization of power analysis, inadequate justification of aggregation levels, and insufficient model specification. By adopting these structured approaches, researchers in drug development and implementation science can enhance the validity, transparency, and interpretability of their ITS studies, ultimately contributing to more robust evidence for healthcare decision-making.

Future methodological development should focus on standardized power calculation tools for complex ITS designs, improved handling of hierarchical structures in time series data, and best practice guidelines for reporting ITS analyses in implementation research contexts.

Interrupted Time Series (ITS) design is a powerful quasi-experimental method used to evaluate the effects of interventions introduced at a specific point in time. In drug utilization research, these interventions can range from new clinical guidelines and drug pricing policies to prescription restrictions and public health campaigns [51]. The core strength of ITS analysis lies in its ability to model longitudinal data, using pre-intervention trends to forecast a counterfactual—what would have happened in the absence of the intervention—and then comparing this forecast to the observed post-intervention data [5]. This makes ITS particularly valuable when randomized controlled trials (RCTs) are impractical, unethical, or too costly, such as when evaluating population-level health policies [51] [5].

ITS design offers a significant advantage over simple before-and-after studies by using multiple data points before and after the intervention. This allows researchers to account for underlying secular trends and natural fluctuations in the data, thereby providing a more robust estimate of the intervention effect [5]. A well-executed ITS can estimate two primary effects: an immediate level change following the intervention and a sustained slope change in the trend over time [5].

Study Design and Data Preparation

Key Design Considerations

Before commencing data analysis, several critical design elements must be addressed to ensure the validity of the ITS study. The foundational assumption of any ITS is that, in the absence of the intervention, the pre-intervention trend would have continued unchanged into the post-intervention period [5]. Violations of this assumption lead to biased results.

  • Rationale and Definition: Clearly articulate why ITS is the appropriate design for the research question. Pre-specify the intervention's precise start date and provide a clear rationale for the chosen ITS model structure. A survey of ITS studies found that only 28.1% explained the rationale for using ITS, and merely 13.7% clarified the rationale for their model specification [51].
  • Data Source Selection: Drug utilization data often comes from hospital records, insurance claims, or other administrative databases [51]. The unit of analysis is typically aggregated (e.g., monthly prescription counts) rather than individual-level.
  • Time Points and Sample Size: Determine the number and frequency of time points (e.g., monthly, quarterly). A review of recent ITS studies found a median of 48 total time points [51]. Only 20.8% of studies provided a rationale for this number, underscoring the need for a priori sample size considerations where possible.
  • Control for Confounding: Be aware of time-varying confounders, such as changes in the patient population's characteristics or simultaneous interventions that could affect the outcome. Only 15.7% of studies in the survey considered time-varying participant characteristics [51]. Using a control series (e.g., a region without the policy change) can significantly strengthen the study design, though it was employed in only 12.4% of surveyed studies [51].

Data Structure and Workflow

A properly structured dataset and a systematic workflow are prerequisites for a successful ITS analysis. The diagram below outlines the key stages from data preparation to interpretation.

[Workflow] Start: raw dataset (individual-level records) → 1. Data aggregation (aggregate by time unit, e.g., month) → 2. Variable creation (create time and intervention variables) → 3. Data quality checks (handle missing data, check for stationarity) → 4. Model specification and fitting (segmented regression, account for autocorrelation) → 5. Model validation (check residuals and assumptions) → 6. Effect estimation and interpretation (level and slope changes, confidence intervals) → End: final report and conclusions.

Statistical Analysis Protocol

Core Segmented Regression Model

Segmented regression is the most frequently used method for analyzing ITS data [51] [5]. It models the outcome as a function of time and the intervention, allowing for separate intercepts and slopes before and after the intervention. The primary statistical model can be formulated as:

Yₜ = β₀ + β₁ × Tₜ + β₂ × Xₜ + β₃ × (Tₜ - T_intervention) × Xₜ + εₜ

Table 1: Variables and parameters in the core ITS segmented regression model.

| Variable/Parameter | Symbol | Description |
| --- | --- | --- |
| Outcome | Yₜ | The drug utilization measure (e.g., consumption rate) at time t. |
| Baseline Level | β₀ | The starting level of the outcome at the beginning of the time series. |
| Time | Tₜ | The time elapsed since the start of the observation period (e.g., 1, 2, 3...). |
| Pre-Intervention Trend | β₁ | The slope (trend) of the outcome during the pre-intervention period. |
| Intervention Indicator | Xₜ | A dummy variable: 0 for pre-intervention points, 1 for post-intervention. |
| Immediate Level Change | β₂ | The estimated immediate change in the outcome level following the intervention. |
| Time Post-Intervention | (Tₜ - T_intervention) | Time since the intervention began (0 at the interruption point, then 1, 2...). |
| Slope Change | β₃ | The estimated difference between the pre- and post-intervention slopes. |
| Error Term | εₜ | The random, unexplained variability at time t; must be checked for autocorrelation. |
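As an illustration, the model can be fitted by ordinary least squares on a simulated series. The sketch below uses hypothetical numbers and plain NumPy, deliberately omitting the autocorrelation corrections discussed later, and recovers the level change β₂ and the post-intervention slope β₁ + β₃:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_star = 48, 24                         # 48 monthly points, interruption at month 24
T = np.arange(1, n + 1)
X = (T >= t_star).astype(float)            # 0 pre-intervention, 1 from the interruption
post_time = (T - t_star) * X               # 0 at the interruption point, then 1, 2, ...

D = np.column_stack([np.ones(n), T, X, post_time])
beta_true = np.array([100.0, 0.5, -8.0, -0.4])   # b0, b1, b2, b3 (hypothetical)
y = D @ beta_true + rng.normal(0, 1.0, n)

b0, b1, b2, b3 = np.linalg.lstsq(D, y, rcond=None)[0]
post_slope = b1 + b3                       # post-intervention trend, true value 0.1
```

The same design matrix carries over directly to GLS or HAC-corrected fits once autocorrelation has been assessed.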

Addressing Key Methodological Issues

Failure to account for complex statistical properties of time series data is a common source of bias. The following table summarizes critical issues and recommended actions, based on common deficiencies found in recent literature [51].

Table 2: Key methodological issues and analytical responses in ITS analysis.

| Methodological Issue | Analytical Consideration | Recommended Action |
| --- | --- | --- |
| Autocorrelation | Successive observations are correlated; use Durbin-Watson or Ljung-Box tests to detect it. | Employ Prais-Winsten regression, ARIMA, or generalized least squares (GLS) to correct for it. |
| Seasonality | Periodic, predictable fluctuations (e.g., monthly). | Include seasonal terms (e.g., sine/cosine functions) or dummy variables in the model. |
| Non-Stationarity | The mean or variance of the series changes over time. | Use the Dickey-Fuller test; if non-stationary, de-trend or difference the data. |
| Model Specification | Incorrectly interpreting parameters. | Ensure β₂ is interpreted as the immediate level change at the intervention point, not a change at the end of the series [51]. |
| Hierarchical Data | Data clustered within hospitals or regions. | Use mixed-effects (multilevel) models to account for within- and between-cluster heterogeneity. |
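For the seasonality row, a common adjustment is to add sine/cosine (Fourier) terms to the design matrix. The sketch below, on simulated monthly data with hypothetical values, shows the residual variability shrinking toward the true noise level once the seasonal terms are included:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, t_star = 96, 12, 48
T = np.arange(1, n + 1)
X = (T >= t_star).astype(float)
season = 6 * np.sin(2 * np.pi * T / m)     # simulated annual cycle in monthly data
y = 50 + 0.2 * T - 5 * X + season + rng.normal(0, 0.5, n)

base = np.column_stack([np.ones(n), T, X, (T - t_star) * X])
fourier = np.column_stack([np.sin(2 * np.pi * T / m), np.cos(2 * np.pi * T / m)])

def resid_sd(D):
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return np.std(y - D @ beta)

sd_without = resid_sd(base)                        # seasonal signal left in residuals
sd_with = resid_sd(np.hstack([base, fourier]))     # close to the true noise SD (0.5)
```

Monthly dummy variables would serve the same purpose at the cost of more parameters.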

Interpretation and Reporting of Results

Estimating and Interpreting Intervention Effects

The coefficients from the segmented regression model provide the estimates for the core intervention effects. The immediate effect of the intervention is given directly by β₂. The change in the trend is given by β₃. The post-intervention slope is calculated as (β₁ + β₃). It is crucial to report both the point estimates and their confidence intervals to convey the precision of these estimates.

A common pitfall in interpretation involves the parameterization of time. In the model presented in Table 1, β₂ represents the change in level that occurs immediately after the intervention, comparing the observed value to the value predicted by the pre-intervention trend. Some studies have incorrectly specified the model, leading to a misinterpretation of this parameter [51].

The Scientist's Toolkit: ITS Research Reagents

Table 3: Essential "research reagents" and resources for implementing an ITS study in drug utilization research.

| Item / Concept | Function / Purpose in ITS Analysis |
| --- | --- |
| Segmented Regression | The primary statistical model used to estimate level and slope changes associated with an intervention. |
| Autocorrelation Function (ACF) Plot | A diagnostic plot used to visualize and identify autocorrelation in the time series residuals. |
| Durbin-Watson Statistic | A statistical test used to detect the presence of autocorrelation in the residuals from a regression analysis. |
| ARIMA Model (Autoregressive Integrated Moving Average) | An alternative to segmented regression for complex time series, often better at modeling autocorrelation and seasonality. |
| Control Series | A parallel time series not exposed to the intervention, used to strengthen causal inference by accounting for external trends. |
| Sensitivity Analysis | A set of additional analyses (e.g., using different model specifications or excluding specific time points) to test the robustness of the primary findings. |

Overcoming Common ITS Pitfalls: A Guide to Troubleshooting and Optimizing Your Analysis

In Interrupted Time Series (ITS) implementation research, standard regression models assume that observations are independent. However, data collected sequentially over time often violate this assumption due to serial correlation between adjacent measurements, a phenomenon known as autocorrelation [5] [54]. Positive autocorrelation, where consecutive values are more similar than distant ones, is most common and leads to underestimated standard errors, inflated Type I error rates, and potentially spurious conclusions about intervention effects [54] [55]. Addressing autocorrelation is therefore not merely a statistical formality but a critical step for ensuring the validity of causal inferences in ITS studies, particularly in drug development and public health intervention research where these designs are frequently employed when randomized trials are infeasible [5] [6]. This protocol details robust methods for detecting and correcting for autocorrelation to safeguard the integrity of research findings.

Detection of Autocorrelation

Analytical Testing: The Durbin-Watson Test

The Durbin-Watson (DW) test is a widely used statistical test for detecting lag-1 autocorrelation [54].

  • Purpose: To test the null hypothesis that the autocorrelation of the residuals is zero.
  • Test Statistic: The DW statistic (d) ranges from 0 to 4. A value of approximately 2 suggests no autocorrelation; a value toward 0 indicates positive autocorrelation, and a value toward 4 indicates negative autocorrelation.
  • Interpretation: The calculated statistic is compared to critical values from the Durbin-Watson table based on the sample size and number of predictors.
  • Limitations: Simulation studies indicate that the DW test performs poorly in detecting autocorrelation except in long time series or when autocorrelation is large. It should not be relied upon exclusively, especially in short series [54].
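The DW statistic is simple enough to compute directly. The small sketch below (independent of any package implementation) contrasts white-noise residuals, where d sits near 2, with AR(1) residuals with ρ = 0.8, where d sits near 2(1 − ρ) = 0.4:

```python
import numpy as np

def durbin_watson(resid):
    """d = sum of squared successive differences / sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)
white = rng.normal(size=500)               # independent residuals -> d around 2

rho, ar1 = 0.8, np.zeros(500)              # AR(1) residuals -> d around 2*(1-rho)
for t in range(1, 500):
    ar1[t] = rho * ar1[t - 1] + rng.normal()
```

In practice the statistic would be computed from the OLS residuals of the fitted segmented regression and compared against tabulated critical values.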

Visual Inspection: Residual Plots

Visual analysis of the residuals from a preliminary ordinary least squares (OLS) regression model is a fundamental and highly recommended step.

  • Procedure: Plot the model residuals against time.
  • Interpretation:
    • No Autocorrelation: Residuals fluctuate randomly around zero, with no discernible pattern.
    • Positive Autocorrelation: Residuals form a smooth, wave-like pattern, where successive residuals are close in value (a "runs" pattern).
    • Negative Autocorrelation: Residuals oscillate frequently, with successive residuals tending to be on opposite sides of zero.

Model Correction Methods

When autocorrelation is detected, several statistical methods can be employed to obtain unbiased parameter estimates and valid standard errors. The choice of method depends on the length of the time series and the magnitude of the autocorrelation.

Feasible Generalized Least Squares (FGLS)

FGLS methods, such as Prais-Winsten (PW) and Cochrane-Orcutt (CO), are common approaches that use an iterative process to estimate the autocorrelation parameter (ρ) and transform the data to remove the correlation structure [54] [6].

  • Prais-Winsten (PW): This method transforms all data points, including the first observation, making it more efficient for small samples [54].
  • Process:
    1. Fit an initial OLS model and obtain the residuals.
    2. Estimate ρ from the residuals.
    3. Transform the dependent and independent variables using the estimated ρ.
    4. Fit an OLS model to the transformed data.
    5. Iterate steps 2-4 until the estimate of ρ converges.
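The iterative procedure above can be sketched as a simplified Prais-Winsten loop (illustrative only; in practice the packaged prais routines in Stata or R would be used):

```python
import numpy as np

def prais_winsten(y, X, tol=1e-6, max_iter=100):
    """Iterative FGLS: estimate rho from residuals, transform, refit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # step 1: initial OLS
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ beta
        rho_new = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])   # step 2: estimate rho
        w = np.sqrt(1.0 - rho_new ** 2)                  # PW keeps the first obs
        y_s = np.r_[w * y[0], y[1:] - rho_new * y[:-1]]  # step 3: transform
        X_s = np.vstack([w * X[0], X[1:] - rho_new * X[:-1]])
        beta = np.linalg.lstsq(X_s, y_s, rcond=None)[0]  # step 4: refit
        converged = abs(rho_new - rho) < tol             # step 5: check convergence
        rho = rho_new
        if converged:
            break
    return beta, rho

# Hypothetical demonstration: linear trend with AR(1) errors, rho = 0.6
rng = np.random.default_rng(5)
n = 200
T = np.arange(1, n + 1)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 2.0 + 0.5 * T + e
beta_hat, rho_hat = prais_winsten(y, np.column_stack([np.ones(n), T]))
```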

Maximum Likelihood Estimation

Restricted Maximum Likelihood (REML) is a preferred method for longer time series as it reduces bias in the estimation of variance components, including the autocorrelation parameter [54] [6].

  • Advantage: Less biased estimates of autocorrelation compared to other methods, leading to more accurate standard errors and confidence intervals [54].
  • Performance: Empirical evaluations show that REML performs well, especially in series with 12 or more time points, and is a recommended method for obtaining reliable results [54] [6].

Heteroskedasticity and Autocorrelation Consistent (HAC) Standard Errors

The Newey-West (NW) estimator corrects the OLS model's standard errors to account for both autocorrelation and heteroskedasticity, without changing the point estimates of the regression coefficients [54] [55] [6].

  • Advantage: Simple implementation, as the original regression parameters are retained.
  • Disadvantage: May be less efficient than FGLS or REML, particularly with high levels of autocorrelation [55].
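A minimal Newey-West sketch with Bartlett-kernel weights (illustrative; in practice a packaged HAC estimator such as statsmodels' cov_type='HAC' would be used) shows the corrected standard error of the slope exceeding the naive OLS standard error when errors are positively autocorrelated:

```python
import numpy as np

def ols_and_nw_se(y, X, maxlags):
    """OLS point estimates with naive and Newey-West (Bartlett) standard errors."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    naive_se = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    Xe = X * e[:, None]
    S = Xe.T @ Xe                                  # lag-0 "meat" matrix
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1.0)            # Bartlett weight
        G = Xe[lag:].T @ Xe[:-lag]
        S += w * (G + G.T)
    nw_se = np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))
    return beta, naive_se, nw_se

# Hypothetical series: trend with strongly autocorrelated AR(1) errors
rng = np.random.default_rng(6)
n = 300
T = np.arange(1, n + 1)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1.0 + 0.3 * T + e
X = np.column_stack([np.ones(n), T])
beta, naive_se, nw_se = ols_and_nw_se(y, X, maxlags=10)
```

Note that the point estimates are identical to OLS; only the standard errors change.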

Autoregressive Integrated Moving Average (ARIMA) Models

ARIMA models explicitly model the autocorrelation structure within the data-generating process itself [5] [6]. An ARIMA(p,d,q) model can be combined with independent variables (ARIMAX) to assess intervention effects.

  • Application: This method directly models the correlation of the outcome variable with its own past values (autoregressive component) and past error terms (moving average component).
  • Complexity: Requires more statistical expertise for model identification and fitting compared to segmented regression approaches.

The table below provides a comparative summary of these correction methods.

Table 1: Comparison of Statistical Methods for Correcting Autocorrelation in ITS Studies

| Method | Key Principle | Advantages | Disadvantages/Limitations |
| --- | --- | --- | --- |
| Prais-Winsten (PW) | FGLS estimation with data transformation. | More efficient for small samples than CO as it uses all data points. | Performance can be poor in very short series. |
| Restricted Maximum Likelihood (REML) | Maximizes a likelihood function separated from fixed effects. | Less biased estimate of autocorrelation; good performance in longer series. | Computationally more intensive. |
| Newey-West (NW) | Corrects OLS standard errors for autocorrelation. | Simple; preserves original OLS coefficient estimates. | Can be inefficient with high autocorrelation [55]. |
| ARIMA | Explicitly models the autocorrelation structure. | Very flexible for complex time series patterns. | Complex model identification and fitting. |

Application Protocol: A Step-by-Step Workflow

The following diagram illustrates the logical workflow for addressing autocorrelation in an ITS analysis.

[Workflow] Fit initial OLS model → detect autocorrelation (Durbin-Watson test and visual check of residuals) → if autocorrelation is not significant, report the final model; if significant, choose a correction method based on series length: with fewer than 12 points, prefer OLS with caution; with 12 or more points, prefer REML or Prais-Winsten, or use Newey-West when the original OLS coefficients must be preserved → report the final model.

Figure 1. A logical workflow for detecting and correcting for autocorrelation in Interrupted Time Series (ITS) analysis.

Step-by-Step Instructions

  • Fit an Initial Model: Begin by fitting a segmented regression model using Ordinary Least Squares (OLS) to estimate the initial level, trend, and intervention effects [5] [6].
  • Test for Autocorrelation: Extract the residuals from the OLS model. Perform the Durbin-Watson test and create a plot of residuals over time to visually inspect for patterns [54].
  • Evaluate the Need for Correction: If both the statistical test and visual inspection indicate negligible autocorrelation, the OLS model may be sufficient. Proceed with caution, as the DW test has low power in short series.
  • Select and Apply a Correction Method: If significant autocorrelation is present, select an appropriate correction method based on the length of your time series and research objectives, as guided by the workflow in Figure 1 and the comparisons in Table 1.
  • Report the Final Analysis: Clearly document the entire process in research outputs: specify the initial detection method, the rationale for the chosen correction technique, and the final parameter estimates with corrected standard errors [6].

The Scientist's Toolkit: Essential Reagents and Materials

Table 2: Key Research Reagent Solutions for ITS Analysis

| Item Name | Function/Application in ITS Research |
| --- | --- |
| Statistical Software (R/Stata) | Platform for implementing segmented regression, conducting autocorrelation tests (e.g., Durbin-Watson), and applying correction methods (e.g., Prais-Winsten, REML, Newey-West). |
| Segmented Regression Code | Pre-written scripts (e.g., in R or Stata) to specify the ITS model, which includes terms for baseline level, pre-intervention slope, change in level, and change in slope [5] [6]. |
| Durbin-Watson Test Function | A standardized statistical function within software packages used to formally test the null hypothesis of no first-order autocorrelation in the model residuals [54]. |
| Longitudinal Dataset | The primary data reagent. A structured dataset collected at regular intervals over time, with a sufficient number of pre- and post-interruption points (recommended minimum of 3 each, but more are preferred) [5]. |
| Data Extraction Tool (e.g., WebPlotDigitizer) | Software used to extract raw numerical data from published graphs in literature reviews when original datasets are unavailable, enabling re-analysis or meta-analysis [6]. |

In Interrupted Time Series (ITS) design, the accurate estimation of an intervention's effect depends entirely on the proper modeling of the underlying pre-existing trends and seasonal patterns [56]. A stationary time series is one whose statistical properties—such as mean, variance, and autocorrelation—do not change over time [57]. Conversely, non-stationarity manifests through trends (long-term increase or decrease), seasonality (periodic fluctuations), changing variance, or structural breaks [58] [59]. In public health and drug development research, failure to address these components can lead to severely biased estimates of intervention effectiveness. For instance, a gradual, pre-existing improvement in a health outcome could be mistakenly attributed to a new policy or treatment [56]. This article provides application notes and protocols for handling non-stationarity and seasonality within ITS studies, ensuring robust and interpretable results.

Core Concepts and Definitions

Non-Stationarity: Forms and Implications

Non-stationarity presents a fundamental challenge for ITS analysis because it violates the assumption that the pre-intervention trend would have remained stable in the absence of the intervention [56].

  • Trend: A long-term upward or downward drift in the data. In ITS, this is modeled directly via the time index (T_t) [56].
  • Seasonality: Regular, periodic fluctuations that occur at fixed intervals (e.g., daily, weekly, or yearly cycles). Drug sales data often exhibit strong seasonal patterns [57] [59].
  • Changing Variance (Heteroscedasticity): A non-constant spread of data points around the mean over time [59].
  • Structural Breaks: Sudden shifts in the time series properties, which can be confused with or mask an intervention effect if they occur concurrently [56].

The Role of Differencing and Decomposition

Differencing and decomposition are two primary techniques used to transform a non-stationary series into a stationary one or to isolate and remove nuisance components [57] [59].

  • Differencing works by computing the changes between consecutive observations, effectively removing trends and reducing autocorrelation [57].
  • Decomposition separates the time series into constituent components—typically trend, seasonal, and residual (remainder)—allowing analysts to model or remove each part systematically [60].

Table 1: Summary of Non-Stationarity Components and Their Treatments

| Component | Description | Primary Handling Techniques |
| --- | --- | --- |
| Trend | Long-term upward or downward movement | First-differencing, detrending with regression [59] |
| Seasonality | Regular, repeating patterns | Seasonal differencing, seasonal decomposition (STL/MSTL) [57] [60] |
| Changing Variance | Non-constant spread of data over time | Logarithmic, Box-Cox, or Yeo-Johnson transformations [58] [59] |
| Autocorrelation | Correlation between consecutive observations | Differencing, autoregressive (AR) models [59] |

Detection and Diagnostic Protocols

Visual Inspection Protocol

Objective: To identify obvious trends, seasonal patterns, and structural breaks through graphical analysis.

  • Procedure:
    • Create a time plot of the raw data.
    • Visually inspect for a sustained upward or downward slope (trend).
    • Look for repeating peaks and troughs at fixed intervals (seasonality).
    • Check for periods where the variability of the data appears to change dramatically.
    • In ITS, superimpose a vertical line at the intervention time (T*) to visually assess level and slope changes [56].
  • Interpretation: While visualization is subjective, it provides the initial hypothesis about the nature of non-stationarity, which should be confirmed with statistical tests [58].

Statistical Testing for Stationarity

Objective: To formally test the null hypothesis of non-stationarity (or stationarity) using statistical tests.

Table 2: Statistical Tests for Stationarity

| Test | Null Hypothesis (H₀) | Alternative Hypothesis (H₁) | Interpretation & Use in ITS |
| --- | --- | --- | --- |
| Augmented Dickey-Fuller (ADF) | The series has a unit root (non-stationary) [59]. | The series is stationary [59]. | A small p-value (e.g., <0.05) leads to rejecting H₀, suggesting stationarity. Used to confirm if differencing is needed. |
| KPSS | The series is trend-stationary [57] [59]. | The series has a unit root (non-stationary) [57] [59]. | A large test statistic (p-value <0.05) leads to rejecting H₀, suggesting differencing is required. Often used in conjunction with ADF [57]. |

Protocol for Determining Differencing:

  • Apply the KPSS test to the original series.
  • If the null hypothesis is rejected (p-value < 0.05), the series is non-stationary; apply first differencing.
  • Re-apply the KPSS test to the differenced series. If H₀ is still rejected, consider second-order differencing (rarely needed in practice) [57].
  • The ndiffs() function in R automates this process [57].

Protocol for Determining Seasonal Differencing:

  • Visually inspect the data and ACF plot for strong seasonal cycles.
  • Use the nsdiffs() function in R, which uses a measure of seasonal strength. A result of 1 suggests one round of seasonal differencing is required [57].
  • For complex data with multiple seasonal patterns (e.g., hourly), MSTL decomposition can be more effective than simple differencing [60].

Application Techniques and Protocols

Differencing Techniques

Objective: To remove trend and seasonality by computing changes between observations.

Protocol 1: First Differencing

  • Calculation: For a time series y_t, the first difference is y'_t = y_t - y_{t-1} [57].
  • Interpretation: The new series y' represents the period-to-period change. In a random walk model, if the differenced series is white noise, the original model is y_t = y_{t-1} + ε_t [57].
  • Application in ITS: Can be applied to the pre-intervention data to achieve stationarity before fitting models like ARIMA.

Protocol 2: Seasonal Differencing

  • Calculation: For a series with seasonal period m, the seasonal difference is y'_t = y_t - y_{t-m} [57]. For monthly data with yearly seasonality, m=12.
  • Interpretation: This computes the difference between an observation and the corresponding observation from the previous season.
  • Application: If a seasonally differenced series appears to be white noise, a simple model for the original data is y_t = y_{t-m} + ε_t, which underpins seasonal naïve forecasts [57].

Protocol 3: Combined Differencing

  • Procedure: When both trend and seasonality are present, apply seasonal differencing first. If a trend remains, apply first differencing to the seasonally differenced series [57].
  • Equation: The combined difference is y''_t = (y_t - y_{t-m}) - (y_{t-1} - y_{t-m-1}) [57].
  • Order: It makes no mathematical difference which is done first, but applying seasonal differencing first is often more interpretable [57].
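The order-invariance of the combined difference is easy to verify numerically on simulated monthly data (m = 12; all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 120, 12
t = np.arange(n)
y = 0.3 * t + 5 * np.sin(2 * np.pi * t / m) + rng.normal(0, 0.5, n)

d_seasonal = y[m:] - y[:-m]                # seasonal difference first ...
d_both = np.diff(d_seasonal)               # ... then first difference
d_first = np.diff(y)                       # first difference first ...
d_both_rev = d_first[m:] - d_first[:-m]    # ... then seasonal difference
```

Both orderings yield the same doubly differenced series, and the seasonal difference alone already removes the sinusoidal pattern and reduces the trend to a constant.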

Decomposition Techniques

Objective: To separate a time series into trend, seasonal, and residual components.

Protocol 4: STL Decomposition (Single Seasonality)

  • Method: STL (Seasonal-Trend decomposition using Loess) is a robust method for decomposing a series with one seasonal period [60].
  • Procedure:
    • In Python, import the class: from statsmodels.tsa.seasonal import STL.
    • Instantiate the model, specifying the seasonal smoothing parameter, which must be an odd integer (e.g., stl = STL(series, seasonal=13) for yearly seasonality in monthly data) [59].
    • Fit the model: res = stl.fit().
    • Access the components as res.trend, res.seasonal, and res.resid.
  • Application: The seasonally adjusted series is calculated as ts_seasonal_adj = ts - res.seasonal and can be used for further modeling [59].

Protocol 5: MSTL Decomposition (Multiple Seasonality)

  • Method: MSTL extends STL to handle multiple seasonal patterns (e.g., hourly data with daily, weekly, and yearly seasonality) [60].
  • Procedure:
    • In Python, import the class: from statsmodels.tsa.seasonal import MSTL.
    • Instantiate the model with the periods parameter set to a tuple of all seasonal cycles, e.g., mstl = MSTL(series, periods=(24, 24*7)) for daily and weekly patterns in hourly data.
    • Fit the model: results = mstl.fit(). The result object contains a DataFrame with multiple seasonal components.
  • Interpretation: The decomposition follows an additive model: Observation = Trend + Seasonal_24 + Seasonal_168 + Residual [60].

[Workflow] Original non-stationary time series → visual inspection and statistical tests (ADF/KPSS) → if stationary, proceed to ITS analysis; if not, identify the component(s) (trend, seasonality, multiple seasonality) → apply differencing (first, seasonal, or combined) or decomposition (STL, MSTL) → model the stationary residuals → proceed to ITS analysis.

Figure 1: Workflow for handling non-stationarity and seasonality in ITS analysis.

Advanced and Emerging Techniques

Addressing the Over-Stationarization Problem

Recent research highlights a critical pitfall: over-aggressive stationarization can remove valuable predictive information. Sudden, event-based changes (e.g., a spike in drug sales due to an outbreak) are inherently non-stationary but are crucial for accurate forecasting [61]. Over-stationarization occurs when smoothing processes strip out these important non-stationary properties, limiting the model's ability to react to real-world shocks [61].

Advanced frameworks like NSPLformer have been proposed to balance this contradiction. They employ:

  • Series Stationarization: A normalization step to improve predictability.
  • De-stationary Attention: A mechanism that restores intrinsic non-stationary information to the model's attention, preventing the loss of critical event-based patterns [61].

Dual-Branch Modeling for Temporal and Spectral Non-Stationarity

Cutting-edge approaches like the DTAF framework address non-stationarity in both temporal and frequency domains simultaneously [62].

  • Temporal Stabilizing Fusion (TFS): Uses a mixture of experts (MOE) to disentangle and filter heterogeneous non-stationary temporal patterns [62].
  • Frequency Wave Modeling (FWM): Applies frequency differencing to dynamically highlight components with significant spectral shifts, adapting to changes in periodicity and cycles [62].

This dual-branch approach demonstrates that jointly modeling non-stationarity from both domains can yield significant improvements in forecasting accuracy under complex, real-world conditions [62].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Analytical Tools for Non-Stationary Time Series Analysis

| Tool / Reagent | Function / Purpose | Example Use Case / Note |
| --- | --- | --- |
| Augmented Dickey-Fuller (ADF) Test | Statistical test for a unit root (non-stationarity) [59] | Determining if first differencing is required |
| KPSS Test | Statistical test for trend-stationarity [57] [59] | Used with ADF for a robust stationarity assessment |
| STL Decomposition | Decomposes a series into Trend, Seasonal, and Residual components [59] | Handling series with a single, dominant seasonal pattern |
| MSTL Decomposition | Decomposes series with multiple seasonal patterns [60] | Ideal for high-frequency data (e.g., hourly energy data with daily, weekly, and yearly cycles) |
| Unobserved Components Model (UCM) | Model-based decomposition and forecasting [60] | Does not require pre-differencing; models components as latent states |
| Box-Cox / Yeo-Johnson Transform | Stabilizes variance in a time series [58] [59] | Applied before differencing or decomposition to address heteroscedasticity |
| Prophet | Forecasting procedure that handles multiple seasonality and trends automatically [60] | Useful for rapid prototyping and for series with strong, predictable seasonal patterns |

[Diagram: the original time series is passed to a decomposition method (STL/MSTL), yielding a trend component, one or more seasonal components (S1, S2, ...), and a stationary residual component.]

Figure 2: Logical relationship of time series decomposition.

Mastering differencing and decomposition is not merely a statistical exercise but a foundational requirement for producing valid, reliable, and actionable results from Interrupted Time Series studies. The protocols outlined provide a clear pathway for researchers to diagnose and treat non-stationarity and seasonality, from basic visual inspections to advanced multi-seasonal decomposition. As the field evolves, techniques that avoid over-stationarization and jointly model temporal and spectral dynamics offer promising avenues for more robust analysis, ultimately ensuring that evaluations of interventions in drug development and public health are built on a solid analytical foundation.

Interrupted Time Series (ITS) design is a quasi-experimental approach widely employed in public health and drug development research to evaluate the impact of interventions, such as policy changes or the introduction of new therapies, when randomized controlled trials are not feasible [9] [13]. A critical challenge in ITS analysis is model misspecification, which occurs when the statistical model does not correctly represent the underlying data structure, potentially leading to biased conclusions about an intervention's effect. Two of the most frequently recommended analytical approaches for ITS analysis are the Autoregressive Integrated Moving Average (ARIMA) model and the Generalized Additive Model (GAM) [63] [9]. The choice between these methods is consequential: empirical evidence demonstrates that the statistical method applied can substantially alter conclusions about intervention impacts, with statistical-significance disagreements ranging from 4% to 25% across different methods applied to the same datasets [6]. This application note provides a structured framework for selecting between ARIMA and GAM to ensure robust inferences in ITS studies, particularly within drug development and public health implementation research.

Comparative Analysis of ARIMA and GAM

Fundamental Model Characteristics

ARIMA and GAM approach time series analysis from fundamentally different perspectives, leading to distinct strengths and vulnerabilities regarding model misspecification.

ARIMA models are characterized by their focus on the autocorrelation structure within the data. They express the current value of a time series as a linear function of its past values (autoregressive component) and past forecast errors (moving average component), often requiring differencing to achieve stationarity [13]. This approach excels at capturing temporal dependence patterns where present observations depend on past values [64]. However, ARIMA requires that time series be stationary (constant mean and variance over time) and assumes a specific parametric form for the autocorrelation structure [64] [13]. Violations of these assumptions constitute common misspecification scenarios.

GAMs, in contrast, take a more flexible, non-parametric approach to trend modeling. They model time series as a sum of smooth functions of time, allowing data to determine the functional form of trends rather than imposing a predetermined structure [64]. This flexibility makes GAMs particularly adept at capturing complex, non-linear trends without requiring explicit specification of the functional form [64] [9]. The GAM framework can be expressed as: g(μ) = β₀ + f₁(x₁) + f₂(x₂) + ... + fₙ(xₙ), where g() is the link function, μ is the expected value of the outcome, and fₙ() are smooth functions of predictors [65].

Performance Under Misspecification Scenarios

Table 1: Performance Comparison of ARIMA and GAM Under Different Misspecification Conditions

| Misspecification Type | ARIMA Performance | GAM Performance | Key Evidence |
| --- | --- | --- | --- |
| Incorrect functional form | Vulnerable; assumes a specific linear correlation structure | Highly robust; flexible smooth functions adapt to the data's shape | GAMs use non-parametric fitting with relaxed assumptions about relationship shapes [65] [64] |
| Autocorrelation structure misspecification | Vulnerable; relies on correct AR/MA order identification | Moderately robust; can accommodate residual autocorrelation | Incorrect ARIMA order selection leads to biased estimates; GAMs can use the GAMM extension with correlation structures [66] [13] |
| Seasonality misspecification | Consistent performance with proper seasonal parameters | Variable performance depending on basis dimension specification | ARIMA explicitly models seasonality with seasonal parameters; GAM seasonal accuracy depends on the Fourier series specification [64] [9] |
| Intervention effect shape misspecification | Vulnerable; requires pre-specification of the impact shape | More robust; can adapt to varying intervention effect patterns | GAMs are more robust when policy variables are misspecified [63] [9] |
| Non-stationarity handling | Requires differencing to achieve stationarity | Directly models non-linear trends without differencing | ARIMA requires a stationary series; GAMs can model non-linear trends directly [64] [13] |

Simulation studies directly comparing ARIMA and GAM for ITS analysis have revealed important performance differences. ARIMA exhibits more consistent results across different policy effect sizes and in the presence of seasonality, while GAM demonstrates superior robustness when the model is misspecified, particularly regarding the shape of intervention effects [63] [9]. This suggests that when researchers lack certainty about the precise functional form of the intervention effect, GAM may provide more reliable inference.

Experimental Protocols for Model Implementation

ARIMA Model Implementation Protocol

Protocol Objective: To specify a robust ARIMA modeling procedure for ITS analysis that minimizes misspecification risk.

Step-by-Step Procedure:

  • Stationarity Assessment: Test for stationarity using Augmented Dickey-Fuller (ADF) or Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. A non-stationary series (p > 0.05 for ADF) requires differencing [13].

  • Differencing Application: Apply first-order differencing (d=1) to remove trend: ΔYₜ = Yₜ − Yₜ₋₁. For seasonal data, apply seasonal differencing: ΔₛYₜ = Yₜ − Yₜ₋ₛ, where s is the seasonal period (e.g., 12 for monthly data) [13].

  • Model Identification: Examine Autocorrelation Function (ACF) and Partial ACF (PACF) plots of the differenced series to identify potential AR (p) and MA (q) orders [13].

    • AR signature: PACF cuts off, ACF decays
    • MA signature: ACF cuts off, PACF decays
    • Seasonal patterns: Significant spikes at seasonal lags
  • Model Fitting: Fit multiple candidate ARIMA (p,d,q)(P,D,Q)ₛ models and compare using Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), selecting the model with the lowest value [65] [13].

  • Intervention Effect Incorporation: Include intervention terms in the final ARIMA model to estimate level and slope changes:

    • Immediate level change: Add a step function variable (0 pre-intervention, 1 post-intervention)
    • Slope change: Add a continuous variable counting time since intervention [13]
  • Model Validation: Check residuals for white noise using Ljung-Box test (p > 0.05 indicates adequate fit) and ensure no significant patterns remain in ACF/PACF of residuals [13].

GAM Implementation Protocol

Protocol Objective: To implement a flexible GAM for ITS analysis that captures complex temporal patterns while properly accounting for intervention effects.

Step-by-Step Procedure:

  • Basis Function Specification: Select appropriate basis functions for smooth terms. For seasonal patterns, Fourier series are recommended with basis dimension (k) determined by generalized cross-validation (GCV) [64].

  • Model Structure Definition: Specify the GAM structure incorporating multiple trend components [64]:

    • Overall trend: Smooth function of time (s(time))
    • Seasonal patterns: Cyclical smooth functions (e.g., s(dayofyear) or s(week))
    • Intervention effects: Separate smooth functions for pre- and post-intervention periods or parametric intervention terms
  • Model Fitting: Fit the GAM using a backfitting algorithm, which iteratively estimates each smooth function while adjusting for others, minimizing prediction errors through successive iterations [64].

  • Intervention Effect Modeling: Model intervention effects using two approaches [64] [9]:

    • Parametric approach: Include step function for level change and slope change term
    • Flexible approach: Separate smooth functions for pre- and post-intervention periods
  • Basis Dimension Checking: Verify that basis dimensions (k) are sufficiently large to capture true patterns without overfitting using k-index and p-values for smooth terms [64].

  • Model Validation: Use simulated historical forecasts with time-based cross-validation, holding out the most recent 20% of data, to assess forecasting accuracy and model calibration [64].
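As a simplified stand-in for step 1's Fourier-basis idea (a full GAM with penalized smooths and GCV would typically be fit with R's mgcv), the sketch below represents a seasonal pattern with sine/cosine pairs and fits it by ordinary least squares; K = 3 harmonics is an illustrative choice and the data are simulated:

```python
# Fourier seasonal terms fit by least squares -- a parametric sketch of
# the basis-function idea, not a penalized GAM.
import numpy as np

rng = np.random.default_rng(1)
n, period, K = 144, 12, 3                    # 12 years of monthly data
t = np.arange(n)
y = 20 + 0.1 * t + 4 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, n)

cols = [np.ones(n), t]                       # intercept + linear trend
for k in range(1, K + 1):
    cols.append(np.sin(2 * np.pi * k * t / period))
    cols.append(np.cos(2 * np.pi * k * t / period))
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sd = np.std(y - X @ beta)
print(f"residual sd = {resid_sd:.2f} (noise sd was 1)")
```

The basis-dimension check in step 5 corresponds here to choosing K large enough that the residuals show no remaining seasonal structure, but no larger.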

Decision Framework and Visualization

Model Selection Workflow

The following diagram illustrates the systematic decision process for choosing between ARIMA and GAM based on data characteristics and research context:

[Decision flowchart: starting from the ITS analysis design, ask whether the underlying data-generating processes are well understood (Yes → select ARIMA). If not, ask whether the functional form of the intervention effects is uncertain (Yes → select GAM); whether the series exhibits complex non-linear trends or multiple seasonal patterns (Yes → GAM); whether the seasonal pattern is consistent and well defined (Yes → ARIMA); and whether interpretability of the intervention effect is paramount (Yes → ARIMA; No → consider a hybrid approach).]

Research Reagent Solutions

Table 2: Essential Computational Tools for ITS Analysis Implementation

| Research Reagent | Function | Implementation Examples |
| --- | --- | --- |
| Stationarity Tests | Determine whether a time series has constant mean and variance, guiding the need for differencing | Augmented Dickey-Fuller (ADF), Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests [13] |
| Autocorrelation Diagnostics | Identify temporal dependence patterns to specify AR/MA orders in ARIMA | Autocorrelation Function (ACF), Partial ACF (PACF) plots [13] |
| Information Criteria | Compare model fit while penalizing complexity to prevent overfitting | Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [65] |
| Backfitting Algorithm | Iteratively estimates smooth functions in a GAM by minimizing prediction errors | Prophet package (Python/R), mgcv package (R) [64] |
| Time Series Cross-Validation | Assesses model forecasting performance while respecting temporal structure | Simulated historical forecasts with expanding/rolling windows [64] |

Robust ITS analysis in drug development and public health implementation research requires careful consideration of model selection between ARIMA and GAM approaches. ARIMA models provide more consistent results when underlying processes are well-understood and seasonal patterns are consistent, while GAM offers superior flexibility and robustness to misspecification of intervention effects and functional forms. Researchers should systematically evaluate their data characteristics, intervention properties, and analytical priorities using the provided decision framework to select the most appropriate methodology. In cases of uncertainty, applying both models and comparing results, or considering hybrid approaches, may provide the most robust inference for critical policy and clinical decisions.

Accounting for Lagged Intervention Effects and Transition Periods

Interrupted Time Series (ITS) design is a powerful quasi-experimental method used extensively in healthcare research to evaluate the impact of interventions, policies, or programs when randomized controlled trials are not feasible [67]. A fundamental assumption in classic segmented regression (CSR) analysis is that interventions have an immediate and permanent effect starting precisely at a known point in time. However, this assumption is frequently violated in real-world applications, where interventions often exhibit lagged effects and transition periods during which the full effect unfolds gradually [68] [69].

Understanding and appropriately modeling these transition phases is crucial for generating valid estimates of intervention effectiveness. This application note provides researchers with advanced methodologies to account for lagged intervention effects and transition periods in ITS analyses, with specific applications in pharmaceutical and clinical development settings.

Statistical Methods for Modeling Transition Periods

Limitations of Classic Segmented Regression

The classic segmented regression model for ITS is specified as:

Yₜ = β₀ + β₁ × time + β₂ × intervention + β₃ × post-time + εₜ [68]

Where:

  • Yₜ represents the outcome at time t
  • β₀ represents the baseline level at t=0
  • β₁ denotes the underlying pre-intervention trend
  • β₂ represents the immediate level change following intervention
  • β₃ represents the change in slope following intervention

This model restricts the interruption effect to a single predetermined time point, assuming an instantaneous intervention effect [68]. In practice, however, interventions often require an adjustment period or are implemented gradually, creating a transition phase where effects are distributed over time rather than immediate.

Table 1: Comparison of ITS Approaches for Intervention Effects

| Model Type | Intervention Effect Assumption | Key Strengths | Key Limitations |
| --- | --- | --- | --- |
| Classic Segmented Regression | Instantaneous and permanent at a fixed point | Simple implementation; easy interpretation | Cannot model gradual effects; misleading if a transition period exists |
| Optimized Segmented Regression (OSR) | Distributed during a transition period | Models realistic implementation patterns; superior model fit | Requires specification of transition length and distribution pattern |
| ARIMA with Distributed Lag Terms | Unclear timing with distributed effects | Accounts for autocorrelation; flexible effect distribution | Complex implementation; requires a larger sample size |

Optimized Segmented Regression (OSR) Models

To address the limitations of CSR, Optimized Segmented Regression (OSR) models incorporate a transition period using cumulative distribution functions (CDFs). The OSR model is specified as:

Yₜ = β₀ + β₁ × time + β₂ × F(t) × intervention + β₃ × F(t) × post-time + εₜ [68]

Where F(t) is a piecewise function that models the transition period using CDFs:

F(t) = { CDF(t - T₀), T₀ < t ≤ T₂; 1, T₂ < t } [68]

The transition period extends from T₀ (nominal intervention start) to T₂ (full implementation), with L = T₂ - T₀ representing the transition length. The CDF can take various forms depending on the expected distribution of the intervention effect during the transition period:

  • Uniform distribution: Constant effect throughout transition
  • Normal distribution: Symmetrically increasing effect peaking at midpoint
  • Log-normal distribution: Right-skewed effect (slow start, rapid buildup)
  • Log-normal flip distribution: Left-skewed effect (rapid start, slow stabilization) [68]
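The piecewise F(t) above can be sketched numerically; the uniform ramp and the normal CDF centred mid-transition below are illustrative parameterizations (T₀, L, and the normal's spread are assumed values):

```python
# Numerical sketch of F(t) over a transition starting at T0 with length
# L; T0 = 10, L = 6, and sd = L/6 are illustrative assumptions.
import numpy as np
from math import erf, sqrt

def f_uniform(t, T0, L):
    # Linear ramp: 0 before T0, 1 after T0 + L (the uniform CDF)
    return np.clip((np.asarray(t, float) - T0) / L, 0.0, 1.0)

def f_normal(t, T0, L):
    # Normal CDF centred mid-transition; sd = L/6 keeps ~99.7% of the
    # ramp inside (T0, T0 + L)
    t = np.asarray(t, float)
    mid, sd = T0 + L / 2.0, L / 6.0
    cdf = np.array([0.5 * (1.0 + erf((ti - mid) / (sd * sqrt(2.0)))) for ti in t])
    return np.where(t <= T0, 0.0, np.where(t > T0 + L, 1.0, cdf))

t = np.arange(30)
Fu = f_uniform(t, T0=10, L=6)
Fn = f_normal(t, T0=10, L=6)
```

Multiplying either ramp into the intervention and post-time terms, as in the OSR equation above, spreads the estimated effect over the transition window rather than imposing it at a single instant.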

ARIMA with Distributed Lag (ARIMAITS-DL) Models

For situations with unclear intervention timing or complex autocorrelation structures, the ARIMAITS-DL model combines ARIMA methodology with distributed lag functional terms:

(1-∑δᵢBⁱ)Yₜ = ∑wₖFₜ₋ₖ + Xₜβ + (1-∑θᵢBⁱ)εₜ [69]

Where:

  • Fₜ₋ₖ represents distributed lag functional terms
  • l₁ controls duration before effect appears
  • l₂ controls duration of the effect
  • wₖ represents weights for lagged effects [69]

This approach allows the intervention effect to be distributed over a specified interval [T₀-l₁, T₀+l₂], accommodating situations where the actual intervention timing is ambiguous or the effect manifests gradually.
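One hedged way to operationalize this is to build a cumulative distributed-lag regressor, the fraction of the total effect realized by time t under the chosen weights, which can then be supplied as an exogenous term to an ARIMAX fit; the uniform weights and window lengths below are illustrative choices, not values from the cited model:

```python
# Sketch: cumulative distributed-lag regressor for an effect spread
# over the interval [T0 - l1, T0 + l2].
import numpy as np

n, T0, l1, l2 = 60, 30, 2, 6
t = np.arange(n)
window = np.arange(T0 - l1, T0 + l2 + 1)        # times receiving weight
w = np.full(window.size, 1.0 / window.size)     # uniform lag weights w_k

# Fraction of the total intervention effect realised by each time t:
# 0 before the window opens, 1 once the window has fully elapsed.
cum_effect = np.array([w[window <= ti].sum() for ti in t])
```

Non-uniform weight vectors (e.g., front- or back-loaded) substitute directly for `w` to encode slow-start or fast-start effect patterns.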

Experimental Protocols for ITS with Transition Periods

Protocol 1: Optimized Segmented Regression Analysis

Purpose: To evaluate intervention effects when a transition period between pre- and post-intervention phases is expected.

Materials Required:

  • Time series data with sufficient pre- and post-intervention observations
  • Statistical software with regression capabilities (R, SAS, Stata, Python)
  • Clear documentation of intervention nominal timing

Procedure:

  • Data Preparation: Assemble time series data with regular intervals, ensuring at least 3 observations per segment (pre-intervention, transition, post-intervention)
  • Exploratory Analysis: Plot the complete time series to visually identify potential transition patterns
  • Transition Length Specification:
    • Use theoretical knowledge about intervention implementation
    • Employ data-driven approach using Mean Squared Error (MSE) comparison
    • Test multiple plausible transition lengths (L) if uncertain
  • Distribution Selection: Choose appropriate CDF based on expected effect distribution pattern
  • Model Fitting: Implement OSR model using nonlinear regression techniques
  • Model Validation: Compare MSE with CSR model to assess improvement in fit
  • Sensitivity Analysis: Examine how different transition lengths and distribution patterns affect long-term impact estimates [68]

Interpretation Guidelines:

  • Significant β₂ indicates change in level during transition period
  • Significant β₃ indicates change in trend during transition period
  • Compare AIC/BIC values across models to select optimal specification
  • Report transition length (L) and distribution pattern with effect estimates

Protocol 2: ARIMA with Distributed Lags Analysis

Purpose: To model intervention effects when timing is unclear or effects are distributed over time with autocorrelation present.

Materials Required:

  • Extended time series data (≥20 observations pre- and post-intervention)
  • Statistical software with ARIMA capabilities
  • Knowledge of expected minimum and maximum effect lag times

Procedure:

  • Stationarity Assessment: Check stationarity using Augmented Dickey-Fuller test; difference if necessary
  • ARIMA Structure Identification: Determine appropriate (p,d,q) orders using ACF/PACF examination
  • Distributed Lag Specification:
    • Define plausible range for l₁ (effect start timing) and l₂ (effect duration)
    • Select appropriate distribution for effect weights (uniform, normal, etc.)
  • Model Fitting: Estimate ARIMAX model with distributed lag terms
  • Residual Diagnostics: Check for remaining autocorrelation in residuals
  • Model Comparison: Compare with standard ARIMA and segmented regression models using information criteria [69]

Interpretation Guidelines:

  • Cumulative effect size is interpreted as the total impact across the distributed lags
  • Significance of distributed lag terms indicates gradual rather than immediate effects
  • Plot cumulative effect over time to visualize effect distribution pattern

Protocol 3: Transition Period Length Determination

Purpose: To empirically determine optimal transition period length when theoretical guidance is unavailable.

Procedure:

  • Specify plausible range of transition lengths (L) based on context
  • Fit OSR models across the range of L values
  • Calculate goodness-of-fit statistics (MSE, AIC, BIC) for each model
  • Identify L value that optimizes fit statistics
  • Validate selected L through sensitivity analysis and theoretical plausibility [68]
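The steps above can be sketched as a grid search over candidate L values, refitting an OSR-style model for each and keeping the L that minimizes MSE; the uniform-ramp F(t) and simulated data (true transition length of 8 periods) are illustrative:

```python
# Data-driven transition-length selection by MSE grid search.
import numpy as np

rng = np.random.default_rng(3)
n, T0, true_L = 72, 36, 8
t = np.arange(n, dtype=float)

def ramp(t, T0, L):
    # Uniform-CDF transition function F(t)
    return np.clip((t - T0) / L, 0.0, 1.0)

y = 5 + 0.1 * t + 4 * ramp(t, T0, true_L) + rng.normal(0, 0.5, n)

def mse_for(L):
    # OSR-style design: intercept, pre-intervention trend, transition ramp
    X = np.column_stack([np.ones(n), t, ramp(t, T0, L)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

candidates = range(2, 17)
best_L = min(candidates, key=mse_for)
print(f"best-fitting transition length L = {best_L}")
```

In line with step 5, the selected L should still be checked against theoretical plausibility and sensitivity analyses rather than accepted on fit alone.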

Table 2: Data Requirements for Different ITS Approaches

| Method | Minimum Pre-Intervention Observations | Minimum Post-Intervention Observations | Total Series Length Recommendation | Handling of Autocorrelation |
| --- | --- | --- | --- | --- |
| Classic Segmented Regression | 3 | 3 | 8+ time points | Requires additional tests and corrections |
| Optimized Segmented Regression | 3 | 3 (excluding transition) | 12+ time points | Requires additional tests and corrections |
| ARIMA with Distributed Lags | 20 | 20 | 40+ time points | Explicitly models autocorrelation structure |

Visualization of Analytical Approaches

[Decision flowchart, "Interrupted Time Series Analysis Decision Framework": after data collection and preparation, ask whether the intervention timing is clearly defined (No → ARIMA with distributed lags, ARIMAITS-DL). If yes, ask whether the effect is expected to be immediate (Yes → classic segmented regression, CSR). If not, ask whether the transition period length is known (Yes → optimized segmented regression, OSR; No → data-driven transition length determination, then OSR). If CSR or OSR residuals show significant autocorrelation, switch to ARIMAITS-DL. All paths end in model comparison, validation, and effect interpretation and reporting.]

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Analytical Tools for Advanced ITS Analysis

| Tool Category | Specific Software/Packages | Key Functionality | Implementation Considerations |
| --- | --- | --- | --- |
| Statistical Software | R: stats, forecast, dlm packages | ARIMA modeling; distributed lag structures; nonlinear optimization | Steeper learning curve but maximum flexibility |
| Statistical Software | SAS: PROC ARIMA, PROC AUTOREG | Automated model selection; comprehensive diagnostics | Enterprise-level stability and documentation |
| Statistical Software | Stata: arima, newey commands | Panel data extensions; robust standard errors | Balanced approach for applied researchers |
| Data Extraction Tools | WebPlotDigitizer | Digital extraction from published graphs | Essential for replication and meta-analysis |
| Visualization Tools | ggplot2 (R), matplotlib (Python) | Creation of publication-quality time series plots | Critical for communicating transition effects |
| Model Diagnostics | ACF/PACF plots, Ljung-Box test | Detection of residual autocorrelation | Required for validating model assumptions |

Application in Pharmaceutical and Clinical Contexts

The methodologies described above have particular relevance in pharmaceutical and clinical development settings:

Clinical Guideline Implementation: When new treatment guidelines are introduced, their adoption typically follows a gradual pattern as clinicians require time to adjust practice behaviors. OSR models can capture this transition period more accurately than traditional methods [67].

Drug Policy Evaluations: Changes in drug formularies or reimbursement policies often have phased implementation, where effects distribute over time rather than occurring instantaneously [68].

Pharmacovigilance Studies: Safety-related interventions (e.g., black box warnings) may have delayed effects as awareness disseminates through the healthcare system, making distributed lag models particularly appropriate [69].

Quality Improvement Initiatives: Hospital quality programs often include training periods followed by gradual implementation, creating natural transition periods that should be accounted for in evaluation [68].

When applying these methods in regulatory contexts, researchers should pre-specify the analytical approach, justify the selected transition length based on theoretical or empirical grounds, and conduct sensitivity analyses to demonstrate the robustness of findings to alternative model specifications.

Managing Hierarchical Data and Time-Varying Confounding Factors

Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the effects of interventions or exposures that are introduced at a specific point in time. However, the validity of ITS research is often threatened by two significant methodological challenges: the presence of hierarchical data structures (e.g., repeated measurements within patients, patients clustered within clinics) and time-varying confounding. Time-varying confounding occurs when covariates that influence both subsequent treatment and the outcome evolve over time, creating complex feedback loops that standard regression methods fail to address adequately [70]. This application note provides researchers, scientists, and drug development professionals with structured protocols and analytical frameworks to manage these challenges within ITS studies, ensuring more robust and causally interpretable findings.

Methodological Foundations

The Challenge of Time-Varying Confounding

In observational studies, treatments or exposures are often adjusted over time in response to a patient's changing condition. This creates a scenario where time-dependent covariates are affected by previous treatment and subsequently influence future treatment decisions and outcomes. A classic example is in HIV management, where antiretroviral therapy (ART) influences CD4 counts, and these evolving CD4 counts subsequently influence future ART decisions and clinical outcomes [70]. Standard regression-based adjustment analyses condition on these time-varying confounders affected by prior treatment, which can introduce bias and lead to misleading conclusions [70].

Table 1: Key Identifiability Assumptions for Causal Inference with Time-Varying Treatments

| Assumption | Description | Application in ITS |
| --- | --- | --- |
| Consistency | The observed outcome for a given treatment history corresponds to the potential outcome under that history | Ensure clear definition of the intervention and outcome measurement in the ITS |
| Positivity | For any combination of covariates and treatment history, there is a non-zero probability of receiving each treatment level | Verify that at each time point, all treatment options are possible for some individuals |
| Exchangeability | No unmeasured confounding; treatment assignment is independent of potential outcomes given covariates and treatment history | Measure and adjust for all common causes of treatment and outcome at each time point |
| Non-Interference | One unit's treatment does not affect another unit's outcome | Consider this assumption in clustered or hierarchical ITS designs |

Analytical Framework for Hierarchical Data

Hierarchical data structures in ITS designs require specialized analytical approaches that account for correlations within clusters. Multilevel models (also known as mixed-effects or hierarchical models) provide a flexible framework for analyzing such data, allowing for the incorporation of random effects to capture cluster-specific variations and the proper estimation of standard errors.

Application Protocols

Protocol for Addressing Time-Varying Confounding

Objective: To estimate the causal effect of a time-varying treatment or exposure in an ITS design while adequately adjusting for time-varying confounders.

Materials:

  • Longitudinal dataset with repeated measurements of treatment, outcome, and potential confounders
  • Statistical software with capabilities for implementing g-methods (e.g., R, Python, Stata)

Procedure:

  • Study Design Phase:

    • Define the time-varying treatment/exposure of interest and its measurement intervals
    • Identify potential time-varying confounders based on subject matter knowledge and prior literature
    • Construct a directed acyclic graph (DAG) to visualize the relationships between variables over time
  • Data Preparation Phase:

    • Structure data in a long format with one row per measurement time per subject
    • Create variables representing the history of treatment and confounders up to each time point
    • Assess and document the extent of missing data
  • Model Specification Phase:

    • Select an appropriate causal inference method based on study characteristics (see Table 2)
    • For Marginal Structural Models (MSMs):
      • Estimate stabilized inverse probability of treatment weights (IPTW) using a model for treatment assignment
      • Assess weight distribution and consider truncation if extreme weights are present
      • Fit a weighted regression model for the outcome, including the treatment and time variables
  • Sensitivity Analysis Phase:

    • Conduct analyses under different model specifications
    • Perform sensitivity analyses for unmeasured confounding
    • Implement analyses to assess the impact of missing data

Table 2: Comparison of Methods for Addressing Time-Varying Confounding

| Method | Key Principle | Strengths | Limitations | Software Implementation |
| --- | --- | --- | --- | --- |
| Inverse Probability of Treatment Weighting (IPTW) | Creates a pseudo-population where treatment is independent of time-varying confounders through weighting [70] | Intuitive; directly addresses time-varying confounding; resembles a randomized trial | Susceptible to bias from extreme weights; requires correct treatment model specification | R: ipw package; Stata: teffects |
| G-Computation | Models the outcome process and simulates potential outcomes under different treatment regimes [70] | Efficient when the outcome model is correctly specified | Prone to bias from outcome model misspecification; the parametric g-formula requires correct model specification | R: gfoRmula package |
| Targeted Maximum Likelihood Estimation (TMLE) | Doubly robust method combining an initial outcome prediction with a targeting step to optimize treatment effect estimation [70] | Robust to misspecification of either the outcome or treatment model; efficient; allows machine learning | Computational intensity; more complex implementation | R: tmle package |
| G-Estimation of Structural Nested Models | Directly models the causal effect of treatment while accounting for time-varying confounding [70] | Direct modeling of the treatment effect; handles effect modification | Complex implementation; less intuitive for applied researchers | R: SNM package |

Protocol for Analyzing Hierarchical ITS Data

Objective: To appropriately analyze ITS data with hierarchical structure while accounting for within-cluster correlations.

Procedure:

  • Exploratory Analysis:

    • Visualize outcome trends over time separately for each cluster
    • Calculate intraclass correlation coefficient (ICC) to quantify cluster dependence
    • Examine between-cluster variation in pre-intervention trends
  • Model Specification:

    • Specify a multilevel model with random intercepts for clusters
    • Consider random slopes for time trends if supported by data and theory
    • Include fixed effects for time, intervention, and their interaction
    • Adjust for relevant patient-level and cluster-level covariates
  • Model Estimation and Checking:

    • Estimate model parameters using restricted maximum likelihood (REML) or Bayesian methods
    • Assess model assumptions (normality of residuals, homoscedasticity)
    • Evaluate model fit using information criteria (AIC, BIC)
  • Interpretation:

    • Calculate immediate level change and slope change following intervention
    • Present cluster-specific and population-average estimates as appropriate
    • Quantify between-cluster variance components
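
The ICC computed in the exploratory step can be sketched with the standard one-way ANOVA estimator. This is a minimal NumPy illustration (the function name and the unbalanced-design correction term are ours, not from any particular package):

```python
import numpy as np

def icc_oneway(values, clusters):
    """One-way ANOVA estimate of the intraclass correlation coefficient.

    values:   1-D array of outcome measurements
    clusters: 1-D array of cluster labels, same length
    (Minimal sketch; ignores covariates and time structure.)
    """
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    k, n = len(labels), len(values)
    grand_mean = values.mean()
    # Between- and within-cluster sums of squares
    ss_between = sum(len(values[clusters == g])
                     * (values[clusters == g].mean() - grand_mean) ** 2
                     for g in labels)
    ss_within = sum(((values[clusters == g] - values[clusters == g].mean()) ** 2).sum()
                    for g in labels)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # Effective cluster size (handles unbalanced clusters)
    n0 = (n - sum(len(values[clusters == g]) ** 2 for g in labels) / n) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

An ICC near 1 indicates that cluster membership explains most of the outcome variance (multilevel modeling is essential); values near 0 suggest weak clustering.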

Visualization of Methodological Relationships

Prior Treatment → Time-Varying Confounder; Prior Treatment → Clinical Outcome; Time-Varying Confounder → Subsequent Treatment; Time-Varying Confounder → Clinical Outcome; Subsequent Treatment → Clinical Outcome; Subsequent Treatment → Feedback Loop → Time-Varying Confounder

Time-Varying Confounding Feedback Mechanism

Study Design → Develop DAG → Select Method → IPTW (singly robust) / TMLE (doubly robust) / G-Computation (outcome model) → Causal Estimate

Analytical Method Selection Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Methodological Tools for Hierarchical Data and Time-Varying Confounding Analysis

Tool Category Specific Solution Function Implementation Considerations
Statistical Software R with specialized packages Provides open-source environment with comprehensive causal inference packages Use ipw for IPTW, tmle for TMLE, lme4 for hierarchical models [71]
Statistical Software Stata Offers robust implementation of g-methods through commands like xtset and teffects Particularly strong for panel data and econometric approaches [71]
Statistical Software SAS Handles complex longitudinal data structures and advanced statistical procedures PROC GENMOD for GEE, PROC MIXED for multilevel models [71]
Causal Inference Methods Inverse Probability of Treatment Weighting (IPTW) Creates a pseudo-population where treatment is independent of confounders [70] Requires correct specification of treatment model; check for extreme weights
Causal Inference Methods Targeted Maximum Likelihood Estimation (TMLE) Doubly robust method that protects against model misspecification [70] Can incorporate machine learning for model fitting; more computationally intensive
Causal Inference Methods G-Computation Estimates causal effects by simulating potential outcomes under different treatment regimes [70] Requires correct specification of outcome model; parametric g-formula may be sensitive to model misspecification
Missing Data Handling Multiple Imputation Accounts for uncertainty in missing data by creating and analyzing multiple complete datasets Preferable to complete case analysis when data are missing at random [70]
Sensitivity Analysis Quantitative bias analysis Assesses robustness of results to potential unmeasured confounding or selection bias Particularly important when claiming causal effects from observational data

Experimental Protocols for Method Validation

Protocol for Simulation Studies

Objective: To evaluate the performance of different methods for addressing time-varying confounding in hierarchical ITS data under controlled conditions.

Experimental Setup:

  • Data Generation:

    • Simulate hierarchical longitudinal data with known data-generating mechanisms
    • Incorporate time-varying confounding with varying strength of relationships
    • Introduce different magnitudes of treatment effects
    • Generate data under different missing data mechanisms
  • Method Implementation:

    • Apply standard regression adjustment (as a comparator)
    • Implement IPTW with Marginal Structural Models
    • Apply TMLE with Super Learner for model fitting
    • Implement g-computation
  • Performance Metrics:

    • Bias: Difference between estimated and true treatment effect
    • Coverage: Proportion of 95% confidence intervals containing the true effect
    • Mean squared error: Combination of bias and variance
    • Type I error and power for hypothesis testing
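
Once per-replicate estimates and confidence intervals are collected, the performance metrics above reduce to a few array operations. A minimal sketch (the `performance_metrics` helper is illustrative, not taken from a specific package):

```python
import numpy as np

def performance_metrics(estimates, ci_lowers, ci_uppers, true_effect):
    """Summarize a simulation study: bias, 95% CI coverage, and MSE.

    estimates, ci_lowers, ci_uppers: per-replicate results.
    (Sketch only; replicate generation and model fitting happen upstream.)
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_effect
    coverage = np.mean((np.asarray(ci_lowers) <= true_effect)
                       & (true_effect <= np.asarray(ci_uppers)))
    mse = np.mean((estimates - true_effect) ** 2)  # equals bias^2 + variance
    return {"bias": bias, "coverage": coverage, "mse": mse}
```

Coverage near the nominal 95% and small bias/MSE across scenarios indicate a well-calibrated method.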
Protocol for Empirical Applications

Objective: To demonstrate the application of methods for time-varying confounding in real-world hierarchical ITS studies.

Case Study: Evaluation of a new drug therapy in electronic health record data with monthly measurements over two years, with patients nested within clinics.

Procedure:

  • Data Preparation:

    • Extract longitudinal data on drug exposure, clinical outcomes, and potential confounders
    • Restructure data into person-time format
    • Document missing data patterns and extent
  • Primary Analysis:

    • Specify a DAG representing hypothesized causal relationships
    • Implement IPTW with stabilized weights, truncating extreme weights if necessary
    • Fit a weighted generalized estimating equation (GEE) to account for clustering
  • Secondary Analysis:

    • Implement TMLE with machine learning algorithms for outcome and treatment models
    • Compare results with primary IPTW analysis
    • Conduct sensitivity analyses for unmeasured confounding
  • Missing Data Handling:

    • Implement multiple imputation for missing confounder data
    • Compare results with complete case analysis
    • Document assumptions about missing data mechanisms
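
The stabilized-weight construction with truncation from the primary analysis can be sketched as follows. Propensity-score estimation (e.g., a fitted logistic model) is assumed to have happened upstream, and the percentile truncation bounds are illustrative defaults:

```python
import numpy as np

def stabilized_weights(treated, propensity, truncate=(0.01, 0.99)):
    """Stabilized IPTW weights with percentile truncation.

    treated:    0/1 treatment indicator
    propensity: estimated P(treatment = 1 | confounders); how it was
                estimated (e.g., logistic regression) is not shown here.
    Weight = marginal treatment probability / propensity for treated units,
    and (1 - marginal) / (1 - propensity) for untreated units.
    """
    treated = np.asarray(treated, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    p_marginal = treated.mean()
    w = np.where(treated == 1,
                 p_marginal / propensity,
                 (1 - p_marginal) / (1 - propensity))
    # Truncate extreme weights at the chosen percentiles
    lo, hi = np.quantile(w, truncate[0]), np.quantile(w, truncate[1])
    return np.clip(w, lo, hi)
```

The resulting weights would then be passed to a weighted GEE fit to complete the primary analysis; stabilization keeps the mean weight near 1 and reduces variance relative to unstabilized weights.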

Effectively managing hierarchical data and time-varying confounding is essential for deriving valid causal inferences from Interrupted Time Series studies in drug development and clinical research. The protocols and frameworks presented in this application note provide researchers with practical guidance for addressing these methodological challenges. Current evidence suggests that while singly robust methods like IPTW remain prevalent in epidemiological studies, doubly robust approaches such as TMLE offer superior protection against model misspecification and should be more widely adopted [70]. Furthermore, the inadequate handling of missing data in many applied studies highlights the need for greater methodological rigor and transparency in reporting. By implementing these advanced causal inference methods while maintaining careful attention to hierarchical data structures, researchers can strengthen the evidentiary foundation for therapeutic decisions and health policy interventions.

Validating Your Model and Comparing Methodological Performance

Interrupted Time Series (ITS) design is a powerful quasi-experimental methodology increasingly employed in public health, epidemiology, and pharmaceutical outcome research to evaluate the effects of interventions, policy changes, or exposures when randomized controlled trials are infeasible or unethical [72]. This design involves collecting data at multiple timepoints before and after a defined interruption, enabling researchers to analyze whether the intervention caused a significant deviation from the pre-existing trend [73]. The validity of causal inferences drawn from ITS studies hinges entirely on proper model specification and rigorous validation. Residual analysis and forecast accuracy assessment together provide a comprehensive framework for verifying model assumptions, detecting specification issues, and quantifying predictive performance, thereby ensuring the reliability of effect estimates in thesis research and professional drug development contexts.

Theoretical Foundations of Residual Analysis

Residuals versus Errors in Statistical Modeling

In time series analysis, residuals represent the differences between observed values and the values fitted by the model to the training data: ( r = y - ŷ ) [74]. It is critical to distinguish residuals from forecast errors, which are calculated on genuinely new, out-of-sample test data [75]. Residuals serve as proxies for the unobservable true model errors and contain valuable information about model adequacy. They play an indispensable diagnostic role by revealing patterns that indicate violations of key regression assumptions, including linearity, independence, constant variance, and normality [76] [77]. For ITS designs specifically, where the goal is to estimate causal effects by comparing pre- and post-intervention trends, any systematic pattern in the residuals suggests the model has not fully captured the underlying data structure, potentially biasing the intervention effect estimate.

Core Assumptions in Regression Residuals

The validity of any regression model, including ITS models, depends on several assumptions about the residuals. Violations of these assumptions can lead to inefficient estimates, incorrect standard errors, and invalid inferences [76] [78].

  • Independence: Residuals should be statistically independent of each other, with no autocorrelation structure. In time series data, this is frequently violated due to temporal dependencies [77].
  • Constant Variance (Homoscedasticity): The variance of residuals should remain constant across all levels of the predictor variables and fitted values. Non-constant variance (heteroscedasticity) appears as funnel-shaped patterns in residual plots [76] [79].
  • Normality: For purposes of hypothesis testing and confidence interval construction, residuals should be approximately normally distributed [77] [78].
  • Zero Mean: The expected value of the residuals should be zero. A mean significantly different from zero indicates model bias [76].

Table 1: Core Regression Assumptions and Violation Consequences

Assumption Description Consequence of Violation
Independence Errors are uncorrelated with each other Standard errors underestimated, test statistics inflated
Homoscedasticity Constant error variance across observations Inefficient parameter estimates, invalid standard errors
Normality Errors follow a normal distribution Invalid confidence intervals and p-values
Linearity Relationship between predictors and outcome is linear Biased and inconsistent parameter estimates

Diagnostic Procedures for Residual Analysis

Graphical Residual Diagnostics

Visual inspection of residual plots provides the most intuitive method for detecting model misspecification and assumption violations. Researchers should generate and systematically examine a series of plots.

Start: Fit Initial Model → Residuals vs. Fitted Plot → Normal Q-Q Plot → Scale-Location Plot → Residuals vs. Order Plot → Identify Patterns/Violations → Apply Remedial Measures (re-diagnose as needed) → Re-validate Model

The primary graphical tool is the residuals versus fitted values plot. A well-specified model will display residuals randomly scattered around the horizontal line at zero, with constant variance and no discernible patterns [76] [79] [77]. Specific patterns indicate different problems:

  • Funneling pattern: Indicates heteroscedasticity (non-constant variance), where the spread of residuals increases or decreases with fitted values [76] [79].
  • Curved or systematic pattern: Suggests a non-linear relationship not captured by the model, indicating potential missing higher-order terms or incorrect functional form [79].
  • Shifted or skewed pattern: May indicate the presence of outliers or influential observations distorting the regression line [76].

The Normal Quantile-Quantile (Q-Q) plot assesses the normality assumption by plotting residual quantiles against theoretical quantiles from a normal distribution. Points should closely follow the 45-degree reference line. Substantial deviations indicate non-normality, which can affect the validity of statistical inferences [76] [79] [77].

The scale-location plot (plotting the square root of the absolute standardized residuals against fitted values) provides another view for detecting heteroscedasticity. A horizontal line with randomly scattered points indicates constant variance, while an increasing or decreasing trend indicates heteroscedasticity [76].

For time series data, the residuals versus observation order plot is crucial for detecting autocorrelation. A random scatter indicates independence, while sequences of positive or negative residuals or cyclical patterns suggest temporal correlation that should be accounted for in the model [77] [78].

Identifying Outliers and Influential Points

Outliers and influential observations can disproportionately affect the fitted model in ITS analyses, potentially biasing intervention effect estimates. Several diagnostic statistics help identify these points:

  • Studentized Residuals: Residuals standardized by their standard deviation. Observations with absolute studentized residuals exceeding 3 are potential outliers requiring investigation [76] [77].
  • Leverage: Measures how extreme an observation is in the predictor space, determined by its distance from the mean of the predictors. High leverage points can exert undue influence on the parameter estimates [77].
  • Cook's Distance: Measures the overall influence of an observation on the entire set of regression coefficients. Values greater than 1.0 or that stand out noticeably from the rest indicate influential points that may warrant further examination [76] [77].

Table 2: Outlier and Influence Diagnostics

Diagnostic Measure Purpose Interpretation Threshold
Studentized Residuals Identify outliers relative to model fit Value > 3 suggests potential outlier
Leverage (h) Identify extreme points in predictor space h > 2p/n (p = parameters, n = sample size)
Cook's Distance (D) Measure overall influence on coefficients D > 1.0 or visually distinct values
DFBETAS Measure influence on specific coefficients |DFBETAS| > 2/√n

Protocol for Conducting Residual Analysis

  • Compute Residuals: After fitting the initial ITS model, calculate residuals as ( r = y - ŷ ) for all observations in the dataset [74].
  • Generate Diagnostic Plots: Create (a) residuals vs. fitted values, (b) Normal Q-Q plot, (c) scale-location plot, and (d) residuals vs. time order plot [76] [77].
  • Calculate Influence Statistics: Compute studentized residuals, leverage values, and Cook's Distance for each observation [77].
  • Interpret Patterns: Systematically examine plots for patterns indicating assumption violations (funneling, curvature, autocorrelation) [79].
  • Identify Influential Points: Flag observations that exceed thresholds for studentized residuals, leverage, or Cook's Distance [76].
  • Implement Remedial Measures: For violations, consider data transformations, adding nonlinear terms, using weighted regression, or incorporating autocorrelation structure [79].
  • Re-fit and Re-check: Apply remedies, re-fit the model, and repeat the residual analysis until no serious violations remain.
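
Steps 1 and 3 can be computed directly from the hat matrix. A minimal NumPy sketch, adequate for the modest series lengths typical of ITS and using internally studentized residuals (the helper name is ours):

```python
import numpy as np

def influence_diagnostics(X, y):
    """Leverage, studentized residuals, and Cook's distance for an OLS fit.

    X: (n, p) design matrix including an intercept column; y: (n,) outcome.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    h = np.diag(H)                                 # leverage values
    resid = y - H @ y                              # residuals r = y - y_hat
    s2 = resid @ resid / (n - p)                   # residual variance estimate
    student = resid / np.sqrt(s2 * (1 - h))        # internally studentized residuals
    cooks = student ** 2 * h / ((1 - h) * p)       # Cook's distance
    return h, student, cooks
```

Observations with leverage above 2p/n, |studentized residual| above 3, or Cook's distance near 1 would then be flagged for investigation per the thresholds in Table 2.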

Forecast Accuracy Metrics for Model Validation

Foundational Concepts in Forecast Evaluation

While residual analysis assesses model fit, forecast accuracy evaluation tests the model's predictive performance on new data. This distinction is crucial for ITS studies aiming to make causal inferences, as a model that overfits the training data will perform poorly in prediction and provide unreliable effect estimates [75]. The gold standard approach involves splitting the data into training and test sets, where the training set is used for model fitting and the test set is reserved for evaluating genuine forecasts [75]. For ITS designs, careful consideration must be given to this split to ensure both pre- and post-intervention patterns are appropriately represented.

Key Forecast Accuracy Metrics

Different accuracy metrics provide complementary perspectives on forecast performance, each with distinct advantages and limitations.

  • Mean Absolute Error (MAE): ( \text{MAE} = \text{mean}(|e_{t}|) ) measures the average magnitude of forecast errors in the original data units. It is robust to occasional large errors but cannot distinguish between over- and under-prediction [75].
  • Root Mean Squared Error (RMSE): ( \text{RMSE} = \sqrt{\text{mean}(e_{t}^2)} ) gives more weight to large errors due to the squaring operation. It is more sensitive to outliers than MAE and is appropriate when large errors are particularly undesirable [75].
  • Mean Absolute Percentage Error (MAPE): ( \text{MAPE} = \text{mean}(|p_{t}|) ), where ( p_{t} = 100 e_{t}/y_{t} ), expresses accuracy as a percentage, making it unit-free and easily interpretable. However, it is undefined when actual values are zero and can be biased when dealing with low-volume data [75].
  • Mean Absolute Scaled Error (MASE): MASE scales the forecast errors by the in-sample MAE of a naive (or seasonal naive) forecast, making it unit-free, robust to outliers, and applicable across different time series [75].
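
The four metrics reduce to a few lines of array arithmetic. A minimal NumPy sketch for non-seasonal data, scaling MASE by the in-sample MAE of the one-step naive forecast (the helper name is ours):

```python
import numpy as np

def forecast_metrics(y_true, y_pred, y_train):
    """MAE, RMSE, MAPE, and MASE on a held-out test set.

    y_train is the in-sample series used to scale MASE by the MAE of a
    one-step naive forecast (non-seasonal case).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    e = y_true - y_pred                               # forecast errors
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    mape = np.mean(np.abs(100 * e / y_true))          # undefined if y_true has zeros
    naive_mae = np.mean(np.abs(np.diff(y_train)))     # in-sample naive-forecast MAE
    mase = mae / naive_mae
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mase}
```

A MASE below 1 indicates the model outperforms the naive benchmark on average.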

Table 3: Forecast Accuracy Metrics Comparison

Metric Formula Advantages Disadvantages
MAE ( \text{mean}(|e_{t}|) ) Easy to interpret, robust to outliers Does not penalize large errors heavily
RMSE ( \sqrt{\text{mean}(e_{t}^2)} ) Sensitive to large errors, widely used Not robust to outliers, scale-dependent
MAPE ( \text{mean}(|100 e_{t}/y_{t}|) ) Unit-free, intuitive interpretation Undefined for zero values, biased for low volumes
MASE ( \text{mean}(|e_{t}/\text{Naive MAE}|) ) Unit-free, robust, comparable across series Less intuitive than percentage error

Protocol for Forecast Accuracy Assessment

  • Data Partitioning: Split the time series data into training and test sets, ensuring the test set covers at least the maximum forecast horizon of interest and preserves the intervention structure in ITS designs [75].
  • Model Training: Fit the candidate ITS models using only the training data.
  • Generate Forecasts: Produce forecasts for the test set period without re-estimating model parameters.
  • Calculate Forecast Errors: Compute the difference between observed test values and forecasts: ( e_{T+h} = y_{T+h} - \hat{y}_{T+h|T} ) [75].
  • Compute Multiple Metrics: Calculate MAE, RMSE, MAPE, and MASE to gain complementary insights into forecast performance [75].
  • Compare Against Benchmarks: Compare model performance against simple benchmarks (e.g., naive, seasonal naive, or mean forecasts) to assess value added by complex modeling [75].
  • Select Optimal Model: Choose the model that demonstrates the best balance of accuracy across metrics, with particular attention to MASE for its favorable properties.
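
Steps 1–6 can be compressed into a small harness that pits a candidate forecaster against the naive last-value benchmark; `forecast_fn` here is a placeholder for whatever fitted ITS model is under evaluation:

```python
import numpy as np

def beats_naive(series, horizon, forecast_fn):
    """Compare a candidate forecaster against the naive last-value benchmark.

    series:      full time series; the last `horizon` points form the test set
    forecast_fn: maps the training array to `horizon` forecasts (a stand-in
                 for any fitted model; no re-estimation on the test set)
    Returns (model_mae, naive_mae); the model adds value when model_mae < naive_mae.
    """
    series = np.asarray(series, dtype=float)
    train, test = series[:-horizon], series[-horizon:]
    model_mae = np.mean(np.abs(test - np.asarray(forecast_fn(train), dtype=float)))
    naive_mae = np.mean(np.abs(test - train[-1]))    # flat naive benchmark
    return model_mae, naive_mae
```

In an ITS setting the split must preserve the intervention structure, e.g., by validating the counterfactual model on held-out pre-intervention data.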

Advanced Applications in Interrupted Time Series Research

Machine Learning Integration in ITS Analysis

Recent methodological advances have integrated machine learning algorithms with traditional ITS frameworks to enhance model flexibility and accuracy. For example, a two-stage ITS framework can compare traditional ARIMA models with machine learning approaches like Neural Network Autoregression (NNETAR) and Prophet-Extreme Gradient Boosting (Prophet-XGBoost) [73]. In a case study evaluating the health effects of the 2018 wildfire smoke event in San Francisco County, the Prophet-XGBoost hybrid demonstrated superior performance in estimating excess respiratory hospitalizations, highlighting the potential of these methods for complex ITS analyses in public health and pharmaceutical outcomes research [73].

Special Considerations for ITS Designs

ITS analyses present unique validation challenges that require specialized approaches:

  • Counterfactual Prediction: The primary goal in ITS is often to predict the counterfactual trend (what would have happened without the intervention). Forecast accuracy should therefore be assessed on the pre-intervention period, while the post-intervention comparison focuses on the difference between predicted counterfactual and observed values [73].
  • Autocorrelation Management: Time series data inherently violates the independence assumption, necessitating models that explicitly account for autocorrelation (e.g., ARIMA models) or using robust variance estimators [77] [72].
  • Intervention Effect Specification: Residual patterns may suggest incorrect specification of the intervention effect (e.g., immediate level change vs. gradual slope change), requiring model refinement to accurately capture the intervention's true nature.

ITS Dataset Collection → Split into Training/Test Sets → Develop Candidate Models (ARIMA, ML, etc.) → Comprehensive Residual Analysis (refine based on diagnostics) → Forecast Accuracy Assessment (select best performer) → Compare Model Performance → Select Final Validated Model → Draw Causal Inferences

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Analytical Tools for ITS Model Validation

Tool Category Specific Solution Primary Application in Research
Statistical Software R with forecast, tsoutliers packages Comprehensive time series modeling and residual diagnostics
Statistical Software Python with statsmodels, scikit-learn Flexible machine learning integration with traditional ITS
Specialized ITS Algorithms Two-stage ITS with Prophet-XGBoost [73] Enhanced counterfactual prediction in complex scenarios
Data Extraction Tools WebPlotDigitizer [72] Accurate data extraction from published ITS graphs
Reference Datasets Curated ITS repository (430 datasets) [72] Methodological benchmarking and teaching examples
Influence Diagnostics Cook's Distance, DFBETAS [76] [77] Identification of overly influential observations
Accuracy Metrics MASE, RMSE, MAPE [75] Comparative forecast performance assessment

Robust model validation through residual analysis and forecast accuracy assessment is fundamental to producing reliable, reproducible ITS research. These techniques provide complementary perspectives on model performance: residual analysis ensures the theoretical assumptions underlying causal inference are met, while forecast accuracy evaluation tests the model's predictive validity in practical scenarios. For researchers and drug development professionals implementing ITS designs, the integrated protocol presented here—combining graphical diagnostics, influence assessment, and multiple accuracy metrics—provides a comprehensive framework for validating models before drawing causal conclusions about interventions, policies, or treatments. As ITS methodologies continue evolving with machine learning integration, these validation principles will remain essential for maintaining scientific rigor in observational studies across public health, epidemiology, and pharmaceutical sciences.

Interrupted Time Series (ITS) analysis is a powerful quasi-experimental design for evaluating the impact of large-scale health interventions, such as policy changes or new drug implementations, when randomized controlled trials are not feasible [9] [13]. The selection of an appropriate statistical model is paramount for generating valid inferences about intervention effects. Two commonly employed analytical approaches for ITS are the Autoregressive Integrated Moving Average (ARIMA) model and Generalized Additive Models (GAM), each with distinct theoretical foundations and performance characteristics [9].

This application note provides a structured comparison of ARIMA and GAM performance within the context of ITS designs, offering drug development professionals and public health researchers evidence-based guidance for model selection. We synthesize recent empirical findings, present quantitative performance metrics, and provide detailed experimental protocols to facilitate robust implementation in health intervention research.

Theoretical Foundations and Model Characteristics

Autoregressive Integrated Moving Average (ARIMA) Models

ARIMA models express a time series variable based on its own past values (autoregressive component), past forecast errors (moving average component), and differencing to achieve stationarity [13]. The model is characterized by three parameters: AR order (p), degree of differencing (d), and MA order (q), denoted as ARIMA(p,d,q). For seasonal data, this extends to seasonal ARIMA (SARIMA) that incorporates seasonal periodicities [80].

Key assumptions include:

  • Stationarity: Requires constant statistical properties over time, often achieved through differencing [13] [81]
  • Linearity: Assumes linear relationships between past and future values [80] [82]
  • Data continuity: Sensitive to outliers and missing values [82]

Generalized Additive Models (GAM)

GAMs are an extension of generalized linear models that use smooth functions to model predictor variables, allowing for flexible modeling of non-linear relationships without specifying the functional form a priori [65] [9]. The model takes the form:

[ g(\mu_{i}) = \beta_{0} + f_{1}(x_{1i}) + f_{2}(x_{2i}) + \cdots + f_{p}(x_{pi}) ]

where ( g(\cdot) ) is the link function, ( \mu_{i} = E(y_{i}) ), and ( f_{j}(\cdot) ) are smooth functions of the predictor variables [65].

Comparative Performance Analysis

Quantitative Performance Metrics

Table 1: Model Performance in Forecasting COPD Patient Numbers in China

Model R² MAE MAPE RMSE AIC/BIC Data Features
GAM Highest Lowest Lowest Lowest Optimal [65] Non-linear trends, complex patterns
ARIMA(0,1,2) Intermediate Intermediate Intermediate Intermediate Intermediate [65] Linear trends, stationary data
Curve Fitting Lowest Highest Highest Highest Suboptimal [65] Simple trends, limited data points

Table 2: Performance Under Different Structural Break Scenarios (Depression Incidence Forecasting)

Scenario Best Performing Model SMAPE Key Advantage
No significant structural breaks Multivariate TFT 11.6% Captures complex relationships [83]
Stable burden level TFT 11.6-13.2% Robustness to temporary interruptions [83]
Incidence surge remaining high VARIMA/ARIMA 14.8-16.4% Better captures persistent level shifts [83]
Fluctuating outbreaks TFT Lower than ARIMA Handles sharp, temporary interruptions [83]

Table 3: Simulation Performance in Interrupted Time Series Analysis

Condition Better Performing Model Rationale
Different policy effect sizes ARIMA More consistent results [9]
Presence of seasonality ARIMA Explicit seasonal modeling capability [9]
Model misspecification GAM Greater robustness to specification errors [9]

Scenario-Based Performance Recommendations

Scenarios Favoring ARIMA
  • Seasonal data with clear patterns: ARIMA models explicitly model seasonality through seasonal differencing and seasonal AR/MA components [9] [13]
  • Linear relationships: When the relationship between past and future values is predominantly linear [80]
  • Large policy effect sizes: ARIMA demonstrates more consistent performance with substantial intervention effects [9]
  • Short-term forecasting: ARIMA typically outperforms for near-term predictions due to reduced error accumulation [80]
Scenarios Favoring GAM
  • Non-linear trends: GAMs excel at capturing complex, non-linear patterns without pre-specification of functional forms [65] [9]
  • Model misspecification concerns: GAMs are more robust when the true data-generating process is unknown [9]
  • Temporary interruptions: GAMs handle sharp, temporary disruptions more effectively than ARIMA [83]
  • Complex relationships with multiple predictors: GAMs flexibly incorporate multiple smooth terms without requiring linearity assumptions [65]

Experimental Protocols

Protocol for ARIMA Model Implementation in ITS

Objective: To implement an ARIMA model for evaluating the impact of a health policy intervention on a continuous outcome measure.

Materials and Software Requirements:

  • R statistical software with forecast, tseries packages, or Python with statsmodels
  • Time series data with sufficient pre- and post-intervention observations (minimum 50 points recommended [9])

Procedure:

  • Data Preparation: Organize data into equally spaced time intervals (e.g., monthly, quarterly)
  • Stationarity Assessment: Test for stationarity using Augmented Dickey-Fuller (ADF) test [80]
  • Differencing if Necessary: Apply differencing (d) until stationarity is achieved [13]
  • Parameter Identification:
    • Examine ACF and PACF plots to identify potential AR (p) and MA (q) orders [13]
    • Use automated selection (e.g., auto.arima) minimizing AIC/BIC [80]
  • Model Fitting: Estimate parameters using maximum likelihood estimation
  • Diagnostic Checking:
    • Analyze residuals for independence (Ljung-Box test) [13]
    • Ensure no significant autocorrelation in residuals
  • Intervention Modeling: Incorporate intervention variables as step functions or pulse functions at the intervention point
  • Validation: Use time series cross-validation with rolling windows [80]

Interpretation: The intervention effect is assessed by the significance and magnitude of the intervention coefficients, indicating immediate level changes or trend alterations post-intervention.
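
The intervention variables of step 7 can be sketched as follows; note that plain least squares stands in for the full ARIMA machinery here (the ARMA error structure is deliberately omitted, so this is illustrative only, and the function name is ours):

```python
import numpy as np

def fit_its_step_pulse(y, t0):
    """Fit level/trend/step/pulse terms to a series interrupted at index t0.

    Columns: intercept, time, step (0 before t0, 1 from t0 on), and a pulse
    that is 1 only at t0. A real analysis would also model the ARMA error
    structure; ordinary least squares is used here purely for illustration.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    time = np.arange(n)
    step = (time >= t0).astype(float)       # sustained level change
    pulse = (time == t0).astype(float)      # transient spike at the interruption
    X = np.column_stack([np.ones(n), time, step, pulse])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "trend", "step", "pulse"], beta))
```

The `step` coefficient estimates the immediate level change; the `pulse` coefficient captures a one-period transient effect at the interruption.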

Protocol for GAM Implementation in ITS

Objective: To implement a GAM for evaluating health intervention effects while accommodating non-linear trends.

Materials and Software Requirements:

  • R with mgcv package or Python with pygam
  • Time series data with pre- and post-intervention periods

Procedure:

  • Data Preparation: Structure data with time as a continuous predictor
  • Base Model Specification:
    • Define core model structure: gam(outcome ~ s(time) + intervention + post_intervention_time, data)
    • Select appropriate family based on outcome distribution [65]
  • Smooth Function Selection:
    • Choose basis dimension (k) using generalized cross-validation [65]
    • Consider thin plate regression splines or cubic regression splines
  • Model Fitting: Use penalized likelihood maximization to estimate smooth terms [65]
  • Model Checking:
    • Examine residual plots for patterns
    • Check concurvity (non-linear multicollinearity) [9]
  • Intervention Effects: Assess significance of intervention terms for level and slope changes
  • Model Comparison: Compare with nested models using AIC or likelihood ratio tests

Interpretation: The intervention effect is evaluated through the parametric components representing immediate level changes and altered trends, while the smooth function captures underlying non-linear patterns.
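
The model structure in step 2 can be illustrated with a hand-rolled penalized spline. Unlike mgcv, this sketch uses a truncated cubic power basis and a fixed ridge penalty `lam` rather than GCV/REML-selected smoothing, so treat it as a schematic, not a substitute for `gam()`:

```python
import numpy as np

def fit_gam_its(y, t0, n_knots=5, lam=1.0):
    """Penalized-spline sketch of: outcome ~ s(time) + step + post-slope.

    Truncated cubic power basis with a ridge penalty on the spline
    coefficients only; mgcv would instead choose the penalty by GCV/REML.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    time = np.arange(n, dtype=float)
    u = time / (n - 1)                              # rescale time to [0, 1]
    step = (time >= t0).astype(float)               # immediate level change
    post = np.maximum(u - t0 / (n - 1), 0.0)        # post-intervention slope change
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    spline = np.maximum(u[:, None] - knots[None, :], 0.0) ** 3
    X = np.column_stack([np.ones(n), u, step, post, spline])
    D = np.zeros(X.shape[1])
    D[-n_knots:] = 1.0                              # penalize spline terms only
    beta = np.linalg.solve(X.T @ X + lam * np.diag(D), X.T @ y)
    return {"level_change": beta[2], "slope_change": beta[3], "fitted": X @ beta}
```

The parametric `step` and `post` terms carry the intervention effects, while the penalized spline absorbs the underlying non-linear secular trend, mirroring the `gam(outcome ~ s(time) + intervention + post_intervention_time)` specification above.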

Visual Decision Framework

Start: ITS Model Selection → Q1: Does the data exhibit strong seasonality or clear linear trends? (Yes → Q2; No → Q3.) Q2: Are relationships between variables predominantly linear? (Yes → recommend ARIMA; No → Q4.) Q3: Are there concerns about model misspecification? (Yes → recommend GAM; No → Q4.) Q4: Does the data contain complex non-linear patterns? (Yes → recommend GAM; Uncertain → consider both models and compare performance.)

Diagram 1: Model Selection Decision Tree for ITS Analysis

The Scientist's Toolkit: Essential Research Reagents

Table 4: Key Analytical Tools for ITS Implementation

Tool/Software Function Implementation Notes
R forecast package Automated ARIMA modeling Provides auto.arima() for parameter selection [80]
R mgcv package GAM implementation Handles smooth function estimation with automatic smoothing parameter selection [65]
Python statsmodels Time series analysis Implements ARIMA and seasonal ARIMA models [83]
Augmented Dickey-Fuller test Stationarity testing Determines need for differencing in ARIMA [80]
AIC/BIC criteria Model selection Penalized likelihood measures for comparing model fit [65] [84]
Time series cross-validation Model validation Maintains temporal ordering during validation [80]

Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of interventions or exposures in public health, clinical science, and policy research. Unlike randomized controlled trials, ITS studies investigate interventions that occur at specific, known time points across entire populations or systems, making them particularly valuable for assessing real-world policy changes, healthcare interventions, and public health initiatives. The fundamental strength of ITS design lies in its ability to model underlying secular trends from pre-interruption data, creating a counterfactual for what would have occurred without the intervention. However, this analytical complexity also introduces unique challenges for transparency and reproducibility. Without comprehensive reporting of methodological choices, data handling procedures, and analytical techniques, the validity and interpretability of ITS findings can be compromised, potentially leading to erroneous conclusions about intervention effectiveness.

The growing emphasis on research transparency reflects a broader movement within the scientific community to address what many researchers perceive as a reproducibility crisis. In response, governmental bodies, research organizations, and scientific communities have developed frameworks to establish "gold standards" for scientific practice. These standards emphasize that research must be reproducible, transparent, communicative of error and uncertainty, collaborative, and skeptical of its own findings [85]. For ITS studies specifically, transparent reporting is essential because the analytical approach involves multiple consequential decisions about model specification, handling of autocorrelation, and interpretation of complex temporal patterns. This document provides comprehensive guidance on applying these emerging standards specifically to ITS research, with detailed protocols designed to enhance the transparency, reproducibility, and overall scientific rigor of interrupted time series studies.

Establishing Reporting Standards for ITS Studies

Core Principles for Transparent ITS Reporting

The foundation of transparent ITS research rests on several core principles that align with broader scientific integrity policies while addressing the specific methodological challenges of time series analysis. First, researchers must prioritize complete methodological disclosure, ensuring that all analytical choices are explicitly documented and justified rather than left for readers to infer. This principle echoes the fundamental reporting ethic articulated by Douglas Altman, who famously stated that "readers should not have to infer what was probably done; they should be told explicitly" [86]. Second, ITS reporting should embrace analytical transparency by making available the data, code, and models used to generate findings, allowing for independent verification of results. Third, researchers must communicate uncertainties inherent in both the data and models, including confidence intervals around effect estimates and discussions of model limitations.

These principles align with the "Gold Standard Science" framework articulated in recent governmental policy, which emphasizes that federally funded research must be "reproducible, transparent, communicative of error and uncertainty, collaborative and interdisciplinary, skeptical of its findings and assumptions, structured for falsifiability of hypotheses, subject to unbiased peer review, accepting of negative results as positive outcomes, and without conflicts of interest" [85]. For ITS studies specifically, this means documenting not just what analyses were conducted, but why specific statistical approaches were selected, how missing data were handled, what sensitivity analyses were performed, and how potential confounding factors were addressed.

Adapting General Reporting Guidelines for ITS Design

While no reporting guidelines have been developed specifically for ITS studies, researchers can adapt relevant elements from established frameworks for clinical trials and observational studies. The SPIRIT 2013 statement (Standard Protocol Items: Recommendations for Interventional Trials) and its 2025 update provide valuable guidance for documenting study protocols, while the CONSORT statement offers complementary guidance for reporting completed studies [86] [87]. Although developed for randomized trials, many SPIRIT and CONSORT elements have direct relevance to ITS studies, particularly those evaluating health interventions.

Table: Adapted SPIRIT/CONSORT Items Relevant to ITS Study Reporting

Reporting Element | Application to ITS Studies | SPIRIT 2025 Section
Protocol availability | Publicly sharing the detailed statistical analysis plan before conducting analysis | Open Science section [86]
Data sharing | Making de-identified time series data available in trusted repositories | Open Science section [86]
Analysis code transparency | Sharing code used for statistical modeling and visualization | Open Science section [86]
Outcome definitions | Precisely defining how and when outcomes were measured | Methods section [88]
Statistical methods | Detailing approaches to handle autocorrelation and model selection | Methods section [88]

The recently updated SPIRIT 2025 statement introduces a new open science section that emphasizes the importance of trial registration, protocol and statistical analysis plan availability, data sharing policies, and disclosure of conflicts of interest [86] [88]. These elements are equally crucial for ITS studies, particularly when research informs clinical or policy decisions. Similarly, the Transparency and Openness Promotion (TOP) Guidelines provide a structured framework for implementing open science practices across seven research practices: study registration, study protocol, analysis plan, materials transparency, analysis code transparency, data transparency, and reporting transparency [89]. The TOP Guidelines implement three increasing levels of transparency—from simple disclosure to independent certification—giving researchers clear benchmarks for improving the transparency of their ITS studies.

Quantitative Analysis of Methodological Choices in ITS

Empirical Comparison of Statistical Methods

The choice of statistical method in ITS analysis can substantially influence study conclusions, making transparent reporting of analytical approaches particularly important. Empirical research comparing six statistical methods for analyzing ITS data has demonstrated that methodological choices can meaningfully affect point estimates, standard errors, confidence intervals, and p-values [6]. This comprehensive evaluation applied different statistical approaches to 190 published time series, providing robust evidence about how method selection impacts findings in real-world research contexts.

The six methods evaluated were: (1) ordinary least squares regression (OLS), which provides no adjustment for autocorrelation; (2) OLS with Newey-West standard errors (NW), which adjusts standard errors for autocorrelation; (3) Prais-Winsten (PW), a generalized least squares method that accounts for autocorrelation; (4) restricted maximum likelihood (REML); (5) REML with the Satterthwaite approximation; and (6) autoregressive integrated moving average (ARIMA), which explicitly models lagged dependencies [6]. Each method approaches the fundamental segmented regression model differently, particularly in how it handles the error term (εₜ) and accounts for autocorrelation.

Table: Comparison of Statistical Methods for ITS Analysis Based on 190 Published Series

Statistical Method | Handling of Autocorrelation | Impact on Significance Findings | Considerations
Ordinary Least Squares (OLS) | No adjustment | 4-25% disagreement with other methods | Underestimates SE with positive autocorrelation [6]
Newey-West (NW) | Adjusts standard errors | Moderate impact on significance | Robust to unknown autocorrelation form [6]
Prais-Winsten (PW) | Models error structure | Substantial impact on significance | Generalizes OLS for correlated errors [6]
REML | Models error structure | Substantial impact on significance | Reduces bias in variance estimates [6]
ARIMA | Explicitly models lags | Substantial impact on significance | Flexible for complex time patterns [6]

The empirical findings revealed that the choice of statistical method can lead to meaningfully different conclusions about intervention effects. Across the 190 analyzed series, statistical significance (assessed at the 5% level) differed between methods in 4% to 25% of pairwise comparisons, depending on which methods were being contrasted [6]. This variation underscores the critical importance of pre-specifying and transparently reporting the statistical approach in ITS studies. Researchers should avoid drawing naive conclusions based solely on statistical significance without considering how methodological choices might influence results.

Accounting for Autocorrelation in ITS Analysis

A fundamental challenge in ITS analysis is the presence of autocorrelation (serial correlation), where data points close in time tend to be more similar than those further apart. Positive autocorrelation, which commonly occurs in time series data, inflates Type I error rates when unaccounted for, potentially leading to false conclusions about intervention effects. Different statistical methods address autocorrelation in distinct ways, which explains their varying performance characteristics.

The empirical evaluation found that estimates of autocorrelation differed substantially depending on the method used and the length of the time series [6]. This variation has important implications for statistical inference, as accurate characterization of autocorrelation structure affects both parameter estimates and their precision. Longer time series generally provide more reliable estimates of autocorrelation, but also present greater risk of structural changes unrelated to the intervention being studied. The research highlights that pre-specification of the statistical method is essential, and that sensitivity analyses using different approaches should be conducted to assess the robustness of findings to methodological choices.
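
A quick residual check along these lines can be sketched as follows; the AR(1) coefficient of 0.6 used to generate the illustrative residuals is purely an assumption:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
# Residual series with built-in positive lag-1 autocorrelation
resid = np.zeros(100)
for i in range(1, 100):
    resid[i] = 0.6 * resid[i - 1] + rng.normal()

# Lag-1 autocorrelation estimate (rho) and Durbin-Watson statistic;
# DW is approximately 2(1 - rho), so positive autocorrelation pulls DW below 2
rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
dw = durbin_watson(resid)
print(round(rho, 2), round(dw, 2))
```
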

Experimental Protocols for ITS Analysis

Pre-Analysis Protocol Planning

Comprehensive protocol development before conducting ITS analysis is essential for minimizing selective reporting and ensuring analytical rigor. The following protocol provides a structured approach for designing and conducting transparent ITS studies:

Protocol: ITS Study Design and Analysis

  • Objective: To establish a standardized approach for designing, conducting, and reporting Interrupted Time Series studies that aligns with transparency and reproducibility standards.
  • Primary Applications: Evaluation of policy interventions, healthcare quality improvements, public health initiatives, and clinical interventions where randomization is impractical or unethical.
  • Materials: Time series data collected at multiple time points before and after a defined interruption; statistical software capable of time series analysis (R, Python, Stata, SAS); pre-registration platform (e.g., ClinicalTrials.gov, OSF).

Procedural Framework:

  • Pre-registration: Document the study hypothesis, primary and secondary outcomes, analytical approach, and handling of potential confounding factors before data collection or analysis begins.
  • Data Collection:
    • Clearly define the interruption point and justify its selection.
    • Document all data sources and methods for variable construction.
    • Establish procedures for handling missing data and outliers.
  • Statistical Analysis Plan:
    • Pre-specify the primary statistical method and justification for its selection.
    • Define the approach for identifying and accounting for autocorrelation.
    • Plan sensitivity analyses using alternative statistical methods.
    • Determine approaches for handling seasonality and other time-varying confounders.
  • Reporting Framework:
    • Prepare to share de-identified data and analytical code.
    • Document all deviations from the pre-registered analysis plan.
    • Report effect estimates with confidence intervals rather than relying solely on p-values.

Step-by-Step Analytical Workflow

The following workflow provides a detailed, practical guide for conducting ITS analysis in a manner that promotes transparency and reproducibility:

Start ITS Analysis → Data Preparation and Validation → Exploratory Analysis (visualize trends and patterns) → Pre-specify Statistical Model → Fit Primary Model (estimate parameters) → Model Diagnostics (check residuals and autocorrelation). If issues are found, conduct Sensitivity Analysis (alternative methods) before interpretation; if the fit is adequate, proceed directly to Interpret Results (in context of limitations) → Comprehensive Reporting (data, code, methods) → Complete Analysis.

ITS Analytical Workflow

  • Data Preparation and Validation

    • Compile time series data with consistent intervals before and after the interruption
    • Document data sources, measurement methods, and any transformations
    • Address missing data using pre-specified methods (e.g., interpolation, imputation)
    • Create variables for time, interruption point, and post-interruption period
  • Exploratory Data Analysis

    • Visualize the complete time series to identify trends, seasonality, and outliers
    • Conduct preliminary tests for autocorrelation (e.g., Durbin-Watson statistic, ACF plots)
    • Assess stationarity and consider transformations if needed
  • Model Specification

    • Implement the core segmented regression model: Yₜ = β₀ + β₁t + β₂Dₜ + β₃[t-TI]Dₜ + εₜ
      • Yₜ represents the outcome at time t
      • β₀ represents the baseline level
      • β₁ represents the pre-interruption slope
      • β₂ represents the immediate level change after interruption
      • β₃ represents the slope change following interruption
      • Dₜ is the interruption indicator (0 before, 1 after)
      • TI is the interruption time point
    • Select primary approach for handling autocorrelation based on pre-specified criteria
  • Model Fitting and Diagnostics

    • Fit the primary statistical model to the data
    • Examine residuals for patterns and conduct tests for remaining autocorrelation
    • Assess model fit using appropriate goodness-of-fit statistics
    • If model inadequacies are identified, consider pre-specified alternative approaches
  • Sensitivity Analysis

    • Re-analyze data using alternative statistical methods (e.g., OLS, NW, PW, REML, ARIMA)
    • Compare point estimates, confidence intervals, and conclusions across methods
    • Report any substantive differences in findings based on methodological choices
  • Interpretation and Reporting

    • Interpret level and slope changes in the context of the intervention being evaluated
    • Acknowledge limitations, including potential confounding and methodological constraints
    • Report both statistical significance and clinical/practical significance of findings

Research Reagent Solutions for ITS Studies

Essential Methodological Tools

The following table details key methodological "reagents" essential for conducting rigorous ITS studies. These tools represent the conceptual and practical resources needed to implement transparent and reproducible interrupted time series analyses.

Table: Essential Methodological Tools for ITS Research

Tool Category | Specific Examples | Application in ITS Studies
Statistical Software | R (with packages like forecast, tseries, nlme), Python (statsmodels), Stata, SAS | Implementation of segmented regression models with appropriate error structures [6]
Study Design Frameworks | SPIRIT 2013/2025, CONSORT, TOP Guidelines | Protocol development and reporting standards adaptation [86] [89]
Analytical Methods | OLS, Newey-West, Prais-Winsten, REML, ARIMA | Statistical modeling approaches with differing treatments of autocorrelation [6]
Data Visualization Tools | Line charts, bar charts, combination charts | Visual representation of time trends and intervention effects [90]
Transparency Infrastructure | ClinicalTrials.gov, OSF, GitHub, Dataverse | Study registration, data sharing, and code dissemination [89]

Implementation Guidelines

Successfully implementing these research tools requires careful planning and consistent application throughout the research process. For statistical software, researchers should select programs that offer multiple approaches to time series analysis and explicitly document the specific packages, functions, and version numbers used in their analyses. This level of detail is essential for computational reproducibility. When applying study design frameworks like SPIRIT 2025, researchers should focus particularly on the open science elements, including trial registration, protocol availability, statistical analysis plan, and data sharing arrangements [86]. These elements, though developed for clinical trials, provide a robust framework for establishing the transparency of ITS studies.

For analytical methods, the empirical evidence strongly suggests that researchers should avoid relying exclusively on a single statistical approach [6]. Instead, studies should pre-specify a primary method while planning sensitivity analyses using alternative approaches. This methodological triangulation helps assess whether substantive conclusions are robust to different analytical assumptions. Data visualization should include both the complete time series with fitted models and diagnostic plots showing residual patterns, as these visualizations help readers understand both the primary findings and the adequacy of the statistical model. Finally, transparency infrastructure should be utilized to make de-identified data, analytical code, and study materials publicly available whenever possible, following the TOP Guidelines' recommendations for citation and sharing practices [89].

The credibility and utility of interrupted time series research depends fundamentally on the transparency and reproducibility of its methods and reporting. As empirical evidence demonstrates, the choice of analytical method in ITS studies can substantially influence conclusions about intervention effectiveness, making comprehensive methodological reporting essential for proper interpretation of findings. By adopting the standards and protocols outlined in this document, researchers can enhance the rigor, transparency, and reproducibility of their ITS studies, contributing to more reliable evidence for healthcare, policy, and scientific decision-making.

The movement toward "Gold Standard Science" emphasizes that rigorous research must be not only methodologically sound but also transparent in its execution and reporting [85]. For interrupted time series studies, this means pre-specifying analytical plans, comprehensively reporting methodological choices, acknowledging uncertainties, and sharing data and code to enable verification and extension of findings. As reporting standards continue to evolve, researchers conducting ITS studies should monitor updates to frameworks like SPIRIT, CONSORT, and TOP Guidelines, adapting their practices to incorporate emerging best practices in transparent and reproducible research.

Incorporating Control Series and Sensitivity Analyses to Strengthen Causal Inference

Application Note: Foundational Concepts for ITS

The Role of Control Series in Causal Identification

Interrupted Time Series (ITS) designs analyze changes in level and trend following an intervention to infer causal effects. Incorporation of a control series—a group not receiving the intervention—transforms a single-arm ITS into a much more robust quasi-experimental design. This approach mitigates biases from concurrent events (history threats) and other secular trends that could be mistaken for an intervention effect. The core causal question is: "How would the outcome have trended in the intervention group had the intervention not occurred?" A well-chosen control series provides the data-driven counterfactual necessary to answer this [91] [92].

The potential outcomes framework, or Rubin Causal Model, formalizes this logic. For each unit i at time t, we define:

  • Y1it: The potential outcome with the intervention.
  • Y0it: The potential outcome without the intervention.

The causal effect is τ = Y1it - Y0it. The fundamental problem of causal inference is that we can only observe one of these potential outcomes for any i,t pair [93] [94]. The control series provides an estimate of the missing Y0it for the intervention group post-intervention, enabling the estimation of τ.

Sensitivity Analysis as a Pillar of Causal Rigor

Sensitivity analysis tests the robustness of causal conclusions to violations of the study's underlying assumptions [95]. In observational ITS studies, key assumptions—such as the absence of unmeasured confounding or the correct specification of the control series—are untestable with the data at hand. Sensitivity analysis quantifies how strong an unmeasured confounder would need to be to explain away the observed effect, or how much the effect estimate might vary under different plausible model specifications [95] [96].

This process is crucial for building trust in causal claims, especially for regulatory decisions where randomized trials are not feasible. Historical examples, like the analysis linking smoking to lung cancer, used sensitivity analysis to convincingly rule out alternative explanations, thereby strengthening the causal argument [95].

Protocol: Designing an ITS Study with a Control Series

Selection and Validation of a Control Series

Objective: To identify a control group that represents the counterfactual trend of the intervention group.

Procedure:

  • Define Candidate Pools: Identify potential control units (e.g., regions, patient populations, product lines) that did not receive the intervention.
  • Assess Pre-Intervention Parallelism: Graphically and statistically test whether the trends in the outcome variable for the intervention and control candidates are parallel during the pre-intervention period. This is a critical assumption for validity [91].
  • Evaluate Covariate Balance: Check that the intervention and control series are similar with respect to key observed covariates that may predict the outcome.
  • Justify Relevance: Document the rationale for why the chosen control series is a plausible counterfactual, based on subject-matter knowledge.

Validation Table:

Validation Metric | Target Threshold | Statistical Test/Method
Pre-intervention Trend Parallelism | p > 0.05 for group-time interaction term | Linear regression with time-by-group interaction term
Covariate Balance | Standardized Mean Difference < 0.1 | Two-sample t-test, Chi-square test
Model Fit (Pre-period) | R² > 0.8, MAPE < 10% | Segmented regression, forecasting models
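
The trend-parallelism check can be sketched with the statsmodels formula interface; the simulated pre-period data below (two groups constructed with identical slopes) are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(21)
# Pre-intervention data only: two groups with (by construction) parallel trends
time = np.tile(np.arange(24), 2)
group = np.repeat([0, 1], 24)                  # 0 = control, 1 = intervention
y = 10 + 2 * group + 0.5 * time + rng.normal(scale=0.3, size=48)
pre = pd.DataFrame({"y": y, "time": time, "group": group})

# The group-by-time interaction tests whether pre-intervention slopes differ;
# a non-significant interaction is consistent with the parallelism assumption
res = smf.ols("y ~ time * group", data=pre).fit()
interaction_p = res.pvalues["time:group"]
print(f"interaction p = {interaction_p:.3f}")
```
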

Core Statistical Analysis Model

Objective: To estimate the level and trend change in the intervention group relative to the control series.

A segmented regression model for an ITS with a control series can be specified as follows:

Yₜ = β₀ + β₁Tₜ + β₂Xₜ + β₃TₜXₜ + β₄Zₜ + β₅ZₜTₜ + β₆ZₜXₜ + β₇ZₜTₜXₜ + εₜ

Where:

  • Yₜ: Outcome at time t
  • Tₜ: Time since the start of the study (continuous, incremented by 1 per observation period)
  • Xₜ: Indicator for the post-intervention period (0 before, 1 after the intervention)
  • Zₜ: Group indicator: control (0) vs. intervention (1)
  • εₜ: Error term (account for autocorrelation, e.g., with ARIMA error models)

Key Coefficients:

  • β₆: Immediate level change following the intervention in the intervention group, relative to the control.
  • β₇: Change in the ongoing trend (slope) following the intervention in the intervention group, relative to the control.

Table: Interpretation of Segmented Regression Coefficients

Coefficient | Interpretation
β₀ | Starting level of the outcome in the control group.
β₁ | Pre-intervention trend in the control group (baseline trend).
β₂ | Immediate level change in the control group at the intervention timepoint.
β₃ | Trend change in the control group after the intervention.
β₄ | Difference in starting level between intervention and control groups.
β₅ | Difference in pre-intervention trends between intervention and control groups.
β₆ | Causal Effect: Immediate level change attributable to the intervention.
β₇ | Causal Effect: Sustained trend change attributable to the intervention.
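
A hedged sketch of fitting this eight-parameter model with the statsmodels formula interface follows; the simulated data and true coefficient values are illustrative, and the formula Y ~ T * X * Z expands to exactly the main effects and interactions of the model above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n, ti = 48, 24
t = np.tile(np.arange(n), 2)
Z = np.repeat([0, 1], n)             # 0 = control, 1 = intervention
X = (t >= ti).astype(float)          # post-intervention indicator

# True coefficients (b6 = level effect and b7 = trend effect are the causal targets)
b = dict(b0=10, b1=0.2, b2=0.5, b3=0.0, b4=1.0, b5=0.0, b6=3.0, b7=0.1)
Y = (b["b0"] + b["b1"] * t + b["b2"] * X + b["b3"] * t * X
     + b["b4"] * Z + b["b5"] * Z * t + b["b6"] * Z * X + b["b7"] * Z * t * X
     + rng.normal(scale=0.2, size=2 * n))

df = pd.DataFrame({"Y": Y, "T": t, "X": X, "Z": Z})
# "T * X * Z" expands to all main effects and two- and three-way interactions
res = smf.ols("Y ~ T * X * Z", data=df).fit()
print(res.params[["X:Z", "T:X:Z"]].round(2))   # estimates of b6 and b7
```

As in the single-series sketch, plain OLS is used here only for brevity; the error term should be handled with the pre-specified autocorrelation approach.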

Protocol: Implementing Sensitivity Analyses

Sensitivity to Unmeasured Confounding

Objective: To assess how an unmeasured confounder could alter the causal conclusion.

Procedure (Using an E-Value Approach):

  • Estimate the Adjusted Effect: From your primary ITS model, obtain the risk ratio (RR) or hazard ratio (HR) that represents the causal effect. If the point estimate is not significant, use the confidence interval limit closest to the null.
  • Calculate the E-Value: The E-value is the minimum strength of association that an unmeasured confounder would need to have with both the intervention and the outcome, conditional on the measured covariates, to fully explain away the observed effect [95].
    • Formula for the point estimate (valid for RR ≥ 1; take the reciprocal of RR first if RR < 1): E-value = RR + √(RR × (RR − 1))
    • Formula for the confidence interval: apply the same calculation to the confidence limit closest to the null.
  • Interpret the E-Value: A large E-value (e.g., > 1.5 or 2) suggests that an unmeasured confounder would need to be strongly associated with both the intervention and the outcome to nullify the effect, thus strengthening confidence in the result. A small E-value indicates fragility.
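
The E-value calculation above is simple enough to sketch directly; the reciprocal handling for risk ratios below 1 follows the standard convention:

```python
import math

def e_value(rr: float) -> float:
    """Minimum strength of association (on the risk-ratio scale) that an
    unmeasured confounder would need with both intervention and outcome
    to explain away an observed risk ratio. RR < 1 is inverted first."""
    rr = 1 / rr if rr < 1 else rr
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 2))   # → 3.41
```

For an observed RR of 2.0, a confounder would need to be associated with both intervention and outcome by a risk ratio of about 3.41 each to fully explain the effect.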

Sensitivity to Control Series Construction

Objective: To test if the causal effect estimate is robust to different methods of constructing the counterfactual.

Procedure:

  • Synthetic Control Method: Instead of a single control unit, construct a weighted combination of control units that best mimics the pre-intervention profile of the intervention unit. Re-estimate the effect using this synthetic control [96].
  • Placebo Tests:
    • Falsification in Time: Re-analyze the data pretending the intervention occurred at a different, earlier time point in the pre-intervention period. A significant "effect" at this false time suggests underlying trends are not fully controlled.
    • Placebo Intervention Groups: Apply the analysis to a group that definitively did not receive the intervention but is otherwise similar. A significant "effect" here suggests bias.
  • Vary Model Specifications: Re-run the primary model with different:
    • Functional forms for time (e.g., linear, quadratic).
    • Lengths of the pre-intervention period.
    • Methods for handling autocorrelation.

Sensitivity Analysis Summary Table:

Method | Question Answered | Interpretation of Robust Result
E-Value | How strong must an unmeasured confounder be? | Large E-value; a confounder stronger than known risks is unlikely.
Synthetic Control | Is the effect consistent with a more data-driven counterfactual? | Effect estimate remains statistically and substantively significant.
Placebo Test (Time) | Does a spurious "effect" appear before the intervention? | Null effect is found at all false intervention timepoints.
Model Specification | Does the effect depend on specific modeling choices? | Effect estimate is stable across plausible alternative models.

The Scientist's Toolkit: Research Reagent Solutions

Table: Essential Methodological Tools for Causal ITS Analysis

Tool / Reagent | Function in Causal Inference | Example Software/Package
Segmented Regression Framework | Estimates level and trend changes in outcome associated with an intervention, controlling for pre-existing trends. | Base R (lm), Python (statsmodels), SAS (PROC AUTOREG)
Autocorrelation Handling | Corrects for correlation between successive time points, which violates independence assumptions and biases standard errors. | R (forecast::auto.arima), Stata (xtregar)
Synthetic Control Method | Creates a weighted combination of control units to build a more comparable counterfactual for a single treated unit. | R (Synth), Python (SyntheticControlMethods)
Propensity Score Matching | Balances observed covariates between intervention and control series in the pre-period to improve comparability. | R (MatchIt), Stata (teffects psmatch)
Sensitivity Analysis (E-Value) | Quantifies robustness of a causal conclusion to potential unmeasured confounding. | R (EValue), online E-value calculators

Causal Inference Workflow and Logical Relationships

Define Causal Question → Design ITS Study → Control Series Selection and Validation (identify the counterfactual; pre-check parallel trends) → Primary Analysis: Segmented Regression. From the primary point estimate, three sensitivity analyses proceed in parallel: Unmeasured Confounding (E-value and bounds), Control Series Construction (placebo tests and synthetic controls), and Model Specification (stability checks). All three inform the final step: Interpret Causal Effect.

Causal Inference Workflow

Conceptual Diagram of a Controlled ITS Design

ITS Design with Control Series

Drug utilization research provides critical insights into the real-world use, safety, and effectiveness of pharmaceutical products, informing healthcare policy and clinical practice. Within this field, the interrupted time series (ITS) design has emerged as a powerful quasi-experimental method for evaluating the causal effects of interventions when randomized controlled trials (RCTs) are impractical, unethical, or prohibitively expensive [97]. This review assesses the current reporting quality of ITS studies in drug utilization research, identifying common methodological strengths and deficiencies while providing structured guidance to enhance the rigor, transparency, and reproducibility of future research.

The ITS design is considered one of the strongest quasi-experimental approaches because it uses longitudinal data collected at multiple, equally spaced time points before and after a well-defined intervention to evaluate whether the intervention caused a significant change in outcomes [10]. By accounting for pre-existing trends and autocorrelation, ITS analyses can distinguish true intervention effects from underlying secular trends, making them particularly valuable for studying the impact of policy changes, new guidelines, or quality improvement initiatives on drug utilization patterns [97].

Methodological Framework of Interrupted Time Series Design

Core Components and Analytical Principles

The ITS design operates on the fundamental principle of comparing observed post-intervention data points against a counterfactual—what would have occurred had the pre-intervention trend continued unchanged [97]. This comparison relies on three essential analytical components that must be clearly defined and reported in any ITS study:

  • Pre-intervention slope: The underlying trend in the outcome variable before the intervention
  • Level change: An immediate shift in the outcome following the intervention
  • Slope change: An alteration in the trajectory of the outcome during the post-intervention period

Segmented regression represents the most common analytical approach for ITS designs, used in approximately 78% of healthcare ITS studies [10]. This method models the outcome variable as a function of time, intervention status, and time since intervention, formally testing whether the intervention was associated with statistically significant changes in level or trend.

Critical Methodological Considerations

Several methodological factors must be addressed to ensure valid ITS analysis and interpretation. The table below summarizes key considerations and their implications for study validity:

Table 1: Critical Methodological Considerations in ITS Designs

Consideration | Description | Impact on Validity
Autocorrelation | Correlation between successive measurements | Can lead to underestimated standard errors and inflated Type I errors if unaddressed [10]
Seasonality | Periodic, predictable patterns in data | May confound intervention effects if not accounted for in models
Non-stationarity | Systematic changes in outcome unrelated to intervention | Can create spurious associations if misinterpreted as intervention effects
Outliers | Extreme values disproportionate to overall pattern | May disproportionately influence model estimates and conclusions
Sample size | Number of pre- and post-intervention observations | Affects statistical power to detect meaningful intervention effects

Despite their importance, these methodological elements are frequently underreported. A systematic assessment of 116 healthcare ITS studies found that only 55% considered autocorrelation, with just 63% of those reporting formal statistical tests [10]. Similarly, seasonality was addressed in only 24% of studies, and non-stationarity in a mere 8% [10].

Current Reporting Quality in Drug Utilization Research

Systematic Assessment of Reporting Practices

A methodological review of ITS designs across healthcare settings revealed significant gaps in reporting quality that limit the reliability and interpretability of findings [10]. The assessment examined 116 ITS studies published in 2015, with many focusing on drug utilization outcomes. Key findings regarding reporting completeness include:

Table 2: Reporting Completeness in Healthcare ITS Studies (n=116)

| Reporting Element | Studies Including Element | Percentage |
| --- | --- | --- |
| Clear intervention definition | 110 | 95% |
| Analysis method described | 115 | 99% |
| Number of pre-intervention points stated | 34 | 29% |
| Number of post-intervention points stated | 32 | 28% |
| Autocorrelation considered | 63 | 55% |
| Seasonality addressed | 28 | 24% |
| Sample size justification | 7 | 6% |
| Primary outcome specified | 30 | 26% |

This assessment identified particular deficiencies in reporting time point justification, with less than 30% of studies specifying the number of pre- and post-intervention data points [10]. This omission is critical because sufficient data points are necessary to establish pre-intervention trends and detect post-intervention changes with adequate statistical power.

Terminology Inconsistency and Design Clarity

Inconsistent terminology represents another challenge in the ITS literature. Among the 74 studies (64%) that provided some definition of their design, significant variation existed in how authors characterized the ITS approach [10]. Definitions included "interrupted time series," "quasi-experimental," "time series," "observational," "cohort," and "cross-sectional study," with only two studies consistently using the same terminology throughout their manuscript [10]. This terminology inconsistency creates confusion for readers and may reflect underlying methodological misunderstandings among authors.

Application Notes and Protocols for Robust ITS Implementation

Minimum Reporting Standards Protocol

To address identified reporting deficiencies, researchers should adhere to the following minimum reporting standards when publishing ITS studies in drug utilization research:

  • Intervention specification: Clearly define the intervention, including its precise start date and implementation details
  • Theoretical rationale: Provide justification for why an ITS design was selected over alternative approaches
  • Data collection details: Specify the frequency of data collection and number of pre- and post-intervention observations
  • Outcome definitions: Explicitly define primary and secondary outcomes with operational definitions
  • Analytical approach: Describe the statistical model used, including how autocorrelation, seasonality, and secular trends were addressed
  • Sensitivity analyses: Report any sensitivity analyses conducted to test model assumptions and robustness of findings

Analytical Protocol for Segmented Regression Analysis

For researchers implementing ITS designs, the following step-by-step protocol ensures methodological rigor:

Step 1: Visual Data Exploration

  • Create a scatter plot of the outcome variable over time
  • Mark the intervention point clearly
  • Visually assess pre-intervention trends, potential outliers, and seasonal patterns
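This exploratory step can be sketched as follows on simulated monthly prescribing data (the counts, noise level, and a 24-month pre/post split are all hypothetical):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe for scripted use
import matplotlib.pyplot as plt

# Simulated monthly prescribing rate: 24 pre- and 24 post-intervention points
rng = np.random.default_rng(42)
n_pre, n_post = 24, 24
t = np.arange(n_pre + n_post)
y = 50 + 0.3 * t + rng.normal(0, 2, t.size)  # rising baseline trend plus noise
y[n_pre:] -= 8                               # simulated level drop after intervention

fig, ax = plt.subplots()
ax.scatter(t, y, s=15)
ax.axvline(n_pre - 0.5, color="red", linestyle="--", label="Intervention")
ax.set_xlabel("Month")
ax.set_ylabel("Prescriptions per 1,000 patients")
ax.legend()
fig.savefig("its_exploration.png")
```

Plotting before modeling makes gross features visible at a glance: here the level drop at month 24 is obvious, while a subtler slope change might only emerge in the fitted model.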

Step 2: Model Specification

  • Develop a segmented regression model including terms for baseline trend, level change, and slope change
  • The basic model takes the form: Y_t = β0 + β1 × time_t + β2 × intervention_t + β3 × time_after_intervention_t + e_t
  • where Y_t is the outcome at time t, time_t is the time elapsed since the start of the study, intervention_t is an indicator variable equal to 0 before and 1 after the intervention, time_after_intervention_t is the time elapsed since the intervention (0 beforehand), and e_t is the error term

Step 3: Autocorrelation Assessment

  • Examine residuals from the initial model using autocorrelation and partial autocorrelation functions
  • If autocorrelation is present, incorporate appropriate autoregressive integrated moving average (ARIMA) structures or use robust standard errors

Step 4: Model Refinement

  • Add terms to address seasonality if present in residual plots
  • Consider control variables for external factors that might confound the intervention effect
  • Evaluate potential non-linearity in trends using polynomial terms or splines

Step 5: Intervention Effect Estimation

  • Calculate the level change (β2) and slope change (β3) with confidence intervals
  • Generate model-predicted values for the counterfactual scenario (no intervention)
  • Quantify the overall intervention effect as the difference between observed and predicted values at key post-intervention time points

Step 6: Validation and Sensitivity Analysis

  • Conduct sensitivity analyses using different model specifications
  • Test robustness to exclusion of potential outliers
  • If multiple sites are available, consider hierarchical models to account for clustering
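A minimal sketch of the multi-site case using a random intercept per site in `statsmodels` MixedLM (six hypothetical sites, all effect sizes simulated):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for site in range(6):                  # six hypothetical sites
    site_effect = rng.normal(0, 2)     # site-level baseline variation
    for t in range(1, 25):
        post = int(t > 12)
        yval = 50 + site_effect + 0.3 * t - 5 * post + rng.normal(0, 1)
        rows.append({"site": site, "time": t, "post": post,
                     "t_after": max(0, t - 12), "y": yval})
df = pd.DataFrame(rows)

# Random intercept per site accounts for clustering of observations
m = smf.mixedlm("y ~ time + post + t_after", df, groups=df["site"]).fit()
print(m.params["post"])
```

Ignoring the site grouping and pooling all observations into one OLS fit would understate the standard errors, for the same reason unmodeled autocorrelation does.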

Sample Size Consideration Protocol

Although formal sample size calculations for ITS designs are complex and infrequently reported (only 6% of studies) [10], researchers should consider the following guidelines:

  • Absolute minimum: At least 3 pre-intervention and 3 post-intervention observations
  • Recommended minimum: 8-12 observations in each period for adequate statistical power
  • Ideal: 12 or more observations in each period to confidently establish trends and detect changes
  • Formal power calculations: Use specialized software (e.g., powerpropack.software.informer.com) or simulation-based approaches based on anticipated effect sizes and autocorrelation structure

Visualization Framework for ITS Analysis

Effective visualization is critical for both analyzing ITS data and communicating findings. The following standardized approach ensures clarity and consistency in graphical representation.

Core ITS Workflow Visualization

[Workflow diagram] Define Research Question → Data Collection (Time Series) → Exploratory Analysis (Visual Inspection) → Model Specification (Segmented Regression) → Check Assumptions (Autocorrelation) → Model Refinement (if needed) → Effect Estimation → Interpretation & Reporting

Visualization 1: Analytical workflow for interrupted time series studies

ITS Design Components Visualization

[Diagram] Pre-Intervention Period, with its Pre-Intervention Trend (β1), leads to the Intervention Point; at that point the model distinguishes the Level Change (β2) and Slope Change (β3), and the Post-Intervention Period contrasts the Actual Observed Trend with the Counterfactual Trend (No Intervention)

Visualization 2: Key components and parameters in ITS analysis

The Scientist's Toolkit: Essential Research Reagents for ITS Studies

Successful implementation of ITS designs in drug utilization research requires both methodological expertise and appropriate analytical tools. The following table details essential "research reagents" for conducting robust ITS analyses:

Table 3: Essential Research Reagents for ITS Studies in Drug Utilization Research

| Tool Category | Specific Examples | Function in ITS Analysis |
| --- | --- | --- |
| Statistical Software | R (package: its.analysis), SAS (PROC AUTOREG), Stata (itsa), Python (statsmodels) | Implement segmented regression models, account for autocorrelation, estimate intervention effects [97] |
| Data Visualization Tools | ggplot2 (R), matplotlib (Python), Stata graphics | Create time series plots showing pre/post-intervention trends, highlight intervention points, display model fits |
| Autocorrelation Diagnostics | Durbin-Watson test, Ljung-Box test, ACF/PACF plots | Detect and quantify correlation between successive observations, inform model selection [10] |
| Sample Size Calculators | Power calculations for ITS designs (simulation-based) | Determine adequate number of pre/post observations to detect clinically meaningful effect sizes |
| Reporting Guidelines | CONSORT extension for ITS, RECORD statement | Ensure comprehensive reporting of methods, results, and limitations [10] |

The current state of reporting quality in drug utilization research using ITS designs shows significant room for improvement. While the method itself represents a powerful quasi-experimental approach for evaluating interventions, deficiencies in reporting analytical methods, handling autocorrelation, justifying sample sizes, and using consistent terminology limit the reliability and interpretability of findings.

Moving forward, the field would benefit from developing and adopting formal reporting guidelines specific to ITS designs, similar to CONSORT for randomized trials or STROBE for observational studies. Such guidelines would address the unique methodological features of ITS analyses and promote more transparent, reproducible research practices. Additionally, increased attention to statistical fundamentals—including autocorrelation assessment, seasonality adjustment, and appropriate sample size considerations—would enhance the validity of causal inferences drawn from ITS studies in drug utilization research.

As pharmaceutical interventions and policy changes continue to evolve rapidly, rigorous application of ITS designs will remain essential for generating timely evidence about the real-world impacts of these changes on drug utilization patterns and patient outcomes.

Conclusion

Interrupted Time Series design is an indispensable tool for evaluating population-level interventions in healthcare and drug development, offering a rigorous framework for causal inference when randomized trials are not feasible. Successful implementation hinges on a deep understanding of its foundational principles, careful selection and application of analytical methods like segmented regression, ARIMA, or GAM, and diligent attention to common pitfalls such as autocorrelation and seasonality. As methodological research advances, future efforts should focus on the adoption of formal reporting guidelines, improved handling of hierarchical data structures, and the development of more accessible sample size estimation techniques. By mastering these elements, researchers can leverage ITS to generate robust, actionable evidence that directly informs clinical practice and public health policy.

References