This article provides a comprehensive guide to Interrupted Time Series (ITS) design, a powerful quasi-experimental method for evaluating the impact of interventions in healthcare and drug development. Tailored for researchers and professionals, it covers foundational concepts, advanced methodological approaches including segmented regression and ARIMA models, and common analytical pitfalls. Drawing on current literature and empirical findings, the guide offers practical solutions for troubleshooting issues like autocorrelation and model specification, compares the performance of different analytical techniques, and outlines best practices for validation and reporting to ensure rigorous and reliable study outcomes.
In health services and policy research, the Interrupted Time Series (ITS) design has emerged as a powerful quasi-experimental method for evaluating the effects of interventions when randomized controlled trials (RCTs) are infeasible, unethical, or impractical [1]. ITS analyzes data collected at multiple time points before and after a well-defined interruption—such as the implementation of a new policy, drug approval, or clinical guideline—to assess whether the intervention caused a significant change in the level or trend of the outcome of interest [2] [3]. This design is particularly valuable for evaluating the real-world effectiveness of large-scale health interventions and is increasingly employed in pharmacoepidemiology and health services research.
The fundamental strength of ITS lies in its ability to establish a pre-intervention trend and compare it with post-intervention data, creating a counterfactual framework that strengthens causal inference beyond simpler pre-post designs [1]. By accounting for underlying secular trends, ITS can distinguish between changes that would have occurred naturally and those truly attributable to the intervention. This methodological rigor, combined with its practical applicability, makes ITS an indispensable tool for researchers and drug development professionals seeking robust evidence for regulatory and policy decisions.
The standard ITS model can be represented mathematically as [3]:
Yₜ = β₀ + β₁T + β₂Xₜ + β₃TXₜ + εₜ
Where:
- Yₜ is the outcome at time t
- T is the time elapsed since the start of the study
- Xₜ is an indicator variable for the intervention (0 pre-intervention, 1 post-intervention)
- TXₜ is the interaction between time and the intervention indicator
- εₜ is the random error term

In this parameterization, β₁ captures the pre-intervention trend, while β₂ and β₃ capture the change in level and the change in slope associated with the intervention [3].
Table 1: Comparison of Quasi-Experimental Methods in Health Research
| Method | Data Requirements | Key Assumptions | Strengths | Limitations |
|---|---|---|---|---|
| Interrupted Time Series | Multiple time points before & after intervention | Underlying trend would continue without intervention; no concurrent interventions | Controls for secular trends; no control group needed | Requires sufficient data points; vulnerable to autocorrelation |
| Difference-in-Differences | Pre/post data for treatment and control groups | Parallel trends assumption | Controls for time-invariant confounders | Violation of parallel trends can bias estimates |
| Synthetic Control Method | Time-series data for treated unit & multiple control units | Combination of control units approximates treatment unit | Flexible approach for single-unit interventions | Complex implementation; limited statistical inference |
| Pre-Post Design | Single point before & after intervention | No other factors changed between measurements | Simple implementation and analysis | Highly vulnerable to confounding and secular trends |
Among these approaches, ITS performs particularly well when data for a sufficiently long pre-intervention period are available and the underlying model is correctly specified [1]. When all included units have been exposed to treatment (single-group designs), ITS provides a robust framework for impact evaluation without requiring identification of control groups [1].
Table 2: Step-by-Step ITS Implementation Protocol for Drug Development Research
| Phase | Key Activities | Methodological Considerations | Quality Checks |
|---|---|---|---|
| Protocol Development | Define research question; identify intervention point; select outcome measures; determine sample size | Ensure clinical relevance of outcomes; specify primary and secondary analyses | Protocol registration; ethical approvals; statistical review |
| Data Collection | Extract time-series data; ensure consistent measurement; document data sources | Adequate pre- and post-intervention periods; consistent frequency; manage missing data | Data quality audit; verification of intervention timing; outlier assessment |
| Model Specification | Select statistical model; account for autocorrelation; control for covariates | Check stationarity; identify seasonal patterns; select appropriate correlation structure | Residual diagnostics; goodness-of-fit tests; variance inflation factors |
| Analysis Execution | Parameter estimation; hypothesis testing; effect size calculation | Adjust for autocorrelation; consider segmented regression; model level and slope changes | Sensitivity analyses; validation of model assumptions; robustness checks |
| Interpretation & Reporting | Estimate intervention effects; contextualize findings; discuss limitations | Differentiate statistical vs. clinical significance; consider confounding factors | Comparison with prior evidence; assessment of publication bias |
For complex interventions, researchers may employ two-stage ITS designs to evaluate multiple intervention components simultaneously [4]. The statistical code provided illustrates how to model two sequential interventions while controlling for covariates using PROC MIXED for continuous outcomes and Poisson regression for count outcomes [4]. This approach is particularly relevant in drug development when evaluating phased implementation or combination therapies.
When implementing ITS analyses, researchers must address autocorrelation (correlation between successive observations), which violates the independence assumption of standard regression models [2]. Appropriate techniques include using autoregressive integrated moving average (ARIMA) models or including correlation structures in generalized estimating equations [2].
Effective visualization is crucial for interpreting and communicating ITS findings. The following standards ensure accurate representation of time series data and intervention effects:
Recent assessments of ITS graphs in published literature found that only 33% allowed accurate data extraction, highlighting the need for greater adherence to visualization standards [2]. Common deficiencies included unclear data points, missing trend lines, and poorly defined interruption points.
ITS Analytical Workflow
Evidence from methodological comparisons demonstrates that ITS provides reliable effect estimates when its assumptions are met. A simulation study comparing quasi-experimental methods found that ITS performs very well when all included units have been exposed to treatment and data for a sufficiently long pre-intervention period are available [1]. The key advantage of ITS over simpler pre-post designs is its ability to account for and separate underlying secular trends from intervention effects.
In an empirical comparison of methods evaluating the introduction of activity-based funding in Irish hospitals, ITS produced statistically significant results that differed in interpretation from control-group methods like Difference-in-Differences and Synthetic Control [3]. This highlights the importance of method selection based on the specific research context and available data structure.
Table 3: Essential Analytical Tools for ITS Implementation
| Tool Category | Specific Software/Solutions | Primary Application in ITS | Key Functions |
|---|---|---|---|
| Statistical Software | SAS (PROC AUTOREG, PROC MIXED) | Model estimation and inference | Time series analysis; autocorrelation correction; parameter estimation |
| Statistical Packages | R (its.analysis, forecast packages) | Flexible model specification | Segmented regression; ARIMA modeling; visualization |
| Data Visualization | Stata, ggplot2, specialized ITS graphing tools | Creating standard-compliant graphs | Raw data plotting; trend line fitting; counterfactual display |
| Quality Assurance | Statistical diagnostic tools | Validation of model assumptions | Residual analysis; autocorrelation tests; goodness-of-fit assessment |
Specialized statistical software is essential for proper ITS implementation, as standard analytical packages may not adequately address autocorrelation or enable appropriate counterfactual modeling [4]. The provided SAS code illustrates a comprehensive approach to ITS analysis, including model specification, estimation of key parameters, and generation of appropriate visualizations [4].
Interrupted Time Series design represents a methodologically rigorous approach for evaluating intervention effects when RCTs are not feasible. By properly implementing ITS protocols—including appropriate model specification, accounting for autocorrelation, and adhering to visualization standards—researchers can generate robust evidence to inform drug development, health policy, and clinical practice. The structured protocols and analytical frameworks presented in this document provide researchers with practical guidance for applying ITS methods across diverse healthcare contexts.
The Interrupted Time Series (ITS) design is a robust quasi-experimental methodology used to evaluate the impact of interventions or exposures when randomized controlled trials (RCTs) are impractical due to high costs, ethical concerns, or the population-level nature of the intervention [5]. This design is characterized by the collection of data at multiple time points both before and after a clearly defined interruption. By modeling the pre-interruption trend, researchers can establish a counterfactual—what would have likely occurred without the intervention—and compare it to the observed post-interruption data [5] [6]. This allows for the estimation of both immediate and long-term intervention effects, accounting for underlying secular trends. ITS designs are particularly valuable in implementation science and public health for assessing the effects of policy changes, health system interventions, and large-scale quality improvement initiatives [5] [7].
The analysis of ITS data typically employs segmented regression models to quantify intervention effects. These effects are conceptualized through two primary components: level changes and slope changes. The standard segmented regression model for a single interruption can be parameterized as follows [6]:
Y_t = β₀ + β₁*t + β₂*D_t + β₃*(t - T_I)*D_t + ε_t
Where:
- Y_t is the outcome at time t
- t is the time elapsed since the start of the series
- D_t is an indicator variable equal to 0 before and 1 after the interruption
- T_I is the time point of the interruption
- ε_t is the random error term

The core components derived from this model are summarized in the table below.
Table 1: Core Components of the Interrupted Time Series Model
| Component | Statistical Parameter | Interpretation | Visual Representation |
|---|---|---|---|
| Level Change | β₂ | Represents the immediate effect of the intervention. It is the change in the outcome's level that occurs immediately following the interruption, measured as the difference between the observed value just after the intervention and the value predicted by the pre-interruption trend. | A vertical shift in the time series line at the point of interruption. |
| Slope Change | β₃ | Represents the long-term effect of the intervention. It quantifies the change in the trajectory (steepness) of the time series after the interruption compared to the pre-interruption trend. | A change in the angle of the time series line after the interruption. |
| Pre-Interruption Slope | β₁ | Describes the underlying secular trend of the outcome before the intervention was implemented. | The direction and steepness of the line in the pre-interruption segment. |
| Baseline Level | β₀ | Represents the starting level of the outcome at time zero. | The Y-intercept of the time series. |
The primary assumption underpinning a causal interpretation of ITS results is that the pre-interruption trend would have persisted unchanged into the post-interruption period in the absence of the intervention [5]. This assumption cannot be empirically proven and relies on the researcher's contextual knowledge and methodological rigor to ensure that no other concurrent events or changes (confounders) could plausibly explain the observed deviation in the time series. Violations of this assumption threaten the validity of the study's conclusions.
A critical consideration in ITS analysis is accounting for autocorrelation (serial correlation), where data points close in time are more similar than those further apart. Failure to account for positive autocorrelation can lead to underestimated standard errors, inflated test statistics, and an increased risk of Type I errors [6]. Several statistical methods are available, each handling autocorrelation differently. An empirical evaluation of 190 published ITS studies found that the choice of method can lead to substantially different conclusions, with statistical significance (at the 5% level) differing in 4% to 25% of pairwise comparisons between methods [6].
Table 2: Comparison of Statistical Methods for Analyzing Interrupted Time Series Data
| Statistical Method | Description | Handling of Autocorrelation | Key Considerations |
|---|---|---|---|
| Ordinary Least Squares (OLS) | Most basic method; fits model via standard linear regression. | Does not account for autocorrelation. Standard errors are likely biased if autocorrelation is present. | Simple to implement but not recommended for ITS due to high risk of biased inference [6]. |
| OLS with Newey-West Standard Errors (NW) | Uses OLS for parameter estimates but adjusts the standard errors to account for autocorrelation and heteroscedasticity. | Corrects the standard errors post-estimation, providing more robust confidence intervals and p-values. | A pragmatic improvement over OLS; provides some protection against autocorrelation [6]. |
| Prais-Winsten (PW) | A generalized least squares (GLS) method that transforms the data to account for first-order autocorrelation. | Directly models the autocorrelation in the error term (AR1 process) and uses this to improve estimation. | Often more statistically efficient than OLS/NW when autocorrelation is correctly specified [6]. |
| Restricted Maximum Likelihood (REML) | A likelihood-based method that reduces bias in the estimation of variance components. | Can model autocorrelation directly. The Satterthwaite approximation (Satt) can be used for small samples. | Provides less biased variance estimates, which is beneficial for shorter time series [6]. |
| Autoregressive Integrated Moving Average (ARIMA) | A flexible class of models that can capture complex patterns, including autocorrelation, trends, and seasonality. | Explicitly models the dependency structure using lagged values of the series and the error terms. | Highly flexible but requires more expertise to specify the correct model order [5] [6]. |
The following protocol provides a step-by-step methodology for designing, conducting, and analyzing an ITS study.
Protocol 1: ITS Design and Analysis Workflow
- Define the intervention and the precise time point (T_I) at which it is implemented.
- Report the estimated effects (β₂ for level change, β₃ for slope change) along with their confidence intervals and p-values. The level change indicates the immediate effect, while the slope change indicates the sustained, long-term effect on the trend.

The following diagrams, generated using Graphviz, illustrate the core model of an ITS and the recommended analytical workflow.
Table 3: Key Research Reagent Solutions for Interrupted Time Series Analysis
| Tool / Reagent | Function / Application | Example / Note |
|---|---|---|
| Statistical Software (R/Stata/SAS) | Platform for executing segmented regression and advanced time series analyses. | R with packages like nlme, forecast, lmtest, and sandwich is widely used for its flexibility and comprehensive time series capabilities. |
| Segmented Regression Code | The script specifying the statistical model to estimate level and slope changes. | Pre-written code templates for methods like Prais-Winsten or Newey-West prevent errors and ensure reproducibility. |
| Autocorrelation Diagnostic Tests | Statistical tests to detect the presence and structure of autocorrelation in model residuals. | The Durbin-Watson test for first-order autocorrelation and the Ljung-Box test for higher-order autocorrelation are essential diagnostics. |
| WebPlotDigitizer | A tool for digitally extracting aggregated data points from published graphs in systematic reviews or meta-analyses. | Critical for including data from published ITS studies when raw data are not otherwise available [6]. |
| Implementation Science Frameworks (e.g., CFIR) | Conceptual tools to guide the understanding of context and determinants influencing the intervention being evaluated. | The Consolidated Framework for Implementation Research (CFIR) helps systematically identify potential confounders and facilitators affecting the outcome [8]. |
| Pre-Analysis Plan | A formal document pre-specifying the research question, model, primary analysis method, and outcomes. | Registering a pre-analysis plan reduces bias and enhances the credibility of reported ITS findings [6]. |
Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of interventions when randomized controlled trials (RCTs) are not feasible, ethical, or practical [9] [10]. ITS analyses involve collecting data at multiple, equally spaced time points before and after a defined intervention to determine whether the intervention has caused a significant change in the level or trend of the outcome of interest [9]. This design is particularly valuable in health services research, where policymakers and researchers need to understand the real-world effects of interventions implemented at the population or health system level.
Within the traditional translational research pipeline, ITS designs occupy a crucial space in dissemination and implementation research [11]. They help answer how evidence-based clinical and preventive interventions can be successfully adopted, scaled up, and sustained within community or service delivery systems after efficacy and effectiveness have been established [11]. The strength of ITS lies in its ability to control for underlying trends and secular changes, providing a robust counterfactual for what would have happened without the intervention.
Table 1: Key Characteristics of Interrupted Time Series Design
| Characteristic | Description | Importance in Health Evaluation |
|---|---|---|
| Pre-intervention Data Points | Multiple measurements before intervention | Establishes baseline trend and pattern |
| Post-intervention Data Points | Multiple measurements after intervention | Captures intervention effects over time |
| Known Intervention Time | Exact timing of intervention is specified | Allows precise modeling of intervention effects |
| Autocorrelation Consideration | Accounting for correlation between consecutive measurements | Ensures proper statistical inference |
| Seasonality Adjustment | Controlling for periodic fluctuations | Isolates intervention effects from seasonal patterns |
Purpose: To quantify intervention effects using segmented regression, the most commonly applied method in healthcare ITS studies [10].
Methodology:
Interpretation: The coefficients for level and slope changes represent the intervention's impact, adjusted for pre-existing trends.
Purpose: To model ITS data with complex autocorrelation patterns, seasonality, or non-stationarity [9].
Methodology:
Application Notes: ARIMA demonstrates consistent performance across different policy effect sizes and seasonal patterns, while GAMs show greater robustness to model misspecification [9].
ITS Analysis with ARIMA Modeling
ITS designs are exceptionally well-suited for evaluating population-level health policies because they can detect both immediate and gradual effects of policy implementation [9]. The strength of ITS in policy analysis lies in its ability to account for pre-existing trends, which is crucial when policies are implemented in dynamic healthcare environments.
Exemplar Applications:
Table 2: Health Policy Interventions Evaluated with ITS Designs
| Policy Type | Exemplar Study Focus | Typical Outcomes Measured | Data Collection Frequency |
|---|---|---|---|
| Legislative Policies | Bans on alcohol marketing [9] | Consumption rates, Mortality | Monthly/Quarterly |
| Fiscal Policies | Taxation changes [9] | Sales data, Hospitalizations | Monthly |
| Regulatory Policies | Prescription restrictions [11] | Prescribing rates, Adverse events | Monthly |
| Coverage Policies | Insurance expansion [11] | Utilization rates, Health outcomes | Quarterly/Annual |
Drug Utilization Review (DUR) represents a prime application for ITS designs in pharmaceutical research and regulation [12]. ITS methods can evaluate the impact of prospective, concurrent, and retrospective DUR programs on prescribing patterns, medication safety, and healthcare utilization.
Implementation Framework:
Sample Protocol for Drug Policy Evaluation:
Drug Utilization Review ITS Framework
ITS designs are widely used to evaluate health programs at the hospital, health system, or population level [10]. Programs represent the most common intervention type evaluated using ITS (35% of healthcare ITS studies), followed by policies (28%) [10].
Key Considerations for Program Evaluation:
Determining adequate sample size (number of time points) in ITS designs remains challenging, with only 6% of healthcare ITS studies reporting any sample size calculation [10]. While traditional rules of thumb suggest a minimum of 50 observations, requirements depend on multiple factors:
Simulation approaches are recommended for power analysis, particularly for complex models like GAM where effective degrees of freedom vary by smooth term [9].
Table 3: Methodological Challenges in ITS Analysis
| Challenge | Description | Recommended Approaches |
|---|---|---|
| Autocorrelation | Correlation between sequential observations | Use Durbin-Watson test; Employ ARIMA or correlated error models [9] [10] |
| Seasonality | Regular periodic fluctuations | Include seasonal terms; Use seasonal ARIMA; Apply seasonal adjustment [9] |
| Non-stationarity | Changing mean or variance over time | Apply differencing; Use integrated (I) component in ARIMA [9] |
| Multiple Interventions | Concurrent or sequential interventions | Include multiple intervention terms; Model complex intervention patterns [10] |
| Missing Data | Gaps in time series | Use appropriate imputation; Model missing data mechanism [10] |
Recent methodological reviews reveal significant reporting gaps in healthcare ITS studies [10]:
Table 4: Essential Methodological Tools for ITS Analysis
| Tool Category | Specific Solutions | Function/Application |
|---|---|---|
| Statistical Software | R (package: `forecast`), SAS PROC ARIMA, Stata `itsa` | Model estimation, hypothesis testing, forecasting |
| Primary Analysis Methods | Segmented regression, ARIMA models, Generalized Additive Models (GAM) | Quantifying intervention effects, handling autocorrelation [9] |
| Autocorrelation Diagnostics | Durbin-Watson test, ACF/PACF plots, Ljung-Box test | Detecting and quantifying autocorrelation in residuals [10] |
| Data Visualization | Time series plots with intervention points, ACF plots, residual plots | Visual assessment of trends, intervention effects, model adequacy |
| Sample Size Planning | Simulation-based power analysis, heuristic approaches | Determining adequate number of time points [9] |
ITS Analytical Method Selection
Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of large-scale health interventions when randomized controlled trials are not feasible or ethical [13]. The analysis of data from such designs hinges on a clear understanding of core time series concepts. Autocorrelation, seasonality, stationarity, and trend are not merely statistical properties; they are fundamental characteristics that, if unaccounted for, can severely bias the estimation of an intervention's effect [14] [13]. This document provides application notes and protocols for researchers and drug development professionals, framing these concepts within the practical context of ITS implementation research, such as evaluating a new drug's rollout or a policy change affecting prescribing practices.
Trend: A trend represents a long-term increase or decrease in the data, which does not necessarily have to be linear [15]. In a pharmaceutical context, a gradual, nationwide increase in the use of a particular drug class preceding an intervention would constitute a trend. Failing to control for this underlying trend can lead to misattributing the pre-existing growth to the intervention effect.
Seasonality: Seasonality refers to patterns that repeat themselves over a fixed and known period (e.g., time of year or day of the week) [15]. This is common in health data due to factors like weather patterns (e.g., higher antibiotic prescriptions in winter) or administrative processes (e.g., increased medicine dispensings at the end of a financial year) [13]. In ITS, unmodeled seasonality can create the illusion of an intervention effect when the observed change is merely part of a predictable, recurring cycle.
Autocorrelation (Serial Correlation): Autocorrelation describes the phenomenon where successive values in a time series are correlated with themselves over time [16]. In simpler terms, an observation at one time point (e.g., drug sales this month) is often a good predictor of the observation at the next time point (drug sales next month). This violates the standard statistical assumption of independent errors. The presence of autocorrelation can severely bias inferences, leading to underestimation of standard errors and overconfidence in the significance of the intervention effect [14].
Stationarity: A time series is stationary if its statistical properties—such as mean, variance, and covariance—are constant over time [13]. Many time series models, including those based on Autoregressive Integrated Moving Average (ARIMA), require the data to be stationary. An interruption itself, like a policy change, is a structural break that induces non-stationarity. Therefore, the goal is often to achieve stationarity in the data before the intervention to build a reliable model, which is then used to assess the impact of the interruption [17] [18].
These four concepts are deeply intertwined. A time series can exhibit a trend, upon which seasonal patterns are superimposed, and the deviations from these patterns (residuals) may be autocorrelated. In ITS analysis, the primary risk is that these inherent patterns can be confounded with the intervention effect. For instance, a sharp change following an intervention might be part of a seasonal cycle, or a pre-existing downward trend might make an intervention appear more effective than it truly is. Proper ITS modeling requires isolating the intervention effect from these other components.
Table 1: Core Terminology and ITS Implications
| Term | Core Definition | Primary Risk in ITS Analysis | Common Remedial Actions |
|---|---|---|---|
| Trend | Long-term, non-random, directional movement in the data [15]. | Confounding the intervention effect with a pre-existing slope. | Detrending via differencing; including a time variable in segmented regression. |
| Seasonality | Fixed-frequency, recurring patterns (e.g., yearly, quarterly) [15]. | Misinterpreting a predictable, recurrent change as an intervention effect. | Seasonal differencing; including seasonal dummy variables; using Seasonal ARIMA (SARIMA). |
| Autocorrelation | Correlation of a time series with its own lagged values [16]. | Biased standard errors, leading to overestimation of the intervention's significance [14]. | Using models that explicitly model the error structure (e.g., ARIMA, GLS). |
| Stationarity | Constant statistical properties (mean, variance) over time [13]. | Invalid model parameters and spurious regression results. | Differencing (regular and seasonal); transformations (e.g., log); explicit trend modeling. |
This section outlines a standard workflow for preparing and analyzing an ITS dataset, focusing on diagnosing and managing trend, seasonality, and autocorrelation.
Objective: To visually inspect the raw time series data for initial patterns and prepare it for formal analysis.
Data Loading and Formatting: Load the data (e.g., a CSV file with monthly counts of a drug's dispensings) into a statistical software environment (e.g., R, Python, SAS). Ensure the time variable is correctly parsed as a date-time object and set as the series index. Python Code Snippet:
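For example, a self-contained sketch of this step (the file contents and the column names `month` and `dispensings` are hypothetical stand-ins for a real extract; the CSV is embedded as a string so the snippet runs as-is):

```python
import io
import pandas as pd

# Stand-in for a CSV file of monthly dispensing counts (columns are hypothetical).
csv_data = io.StringIO(
    "month,dispensings\n"
    "2022-01,1200\n"
    "2022-02,1150\n"
    "2022-03,1310\n"
    "2022-04,1275\n"
)

# Parse the time variable as dates and set it as the series index.
df = pd.read_csv(csv_data, parse_dates=["month"], index_col="month")
series = df["dispensings"].asfreq("MS")   # enforce a regular monthly frequency
print(series.head())
```

Enforcing an explicit frequency early makes gaps in the series visible as missing values rather than silently irregular spacing.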
Initial Visualization: Plot the raw time series data against time. This is the first and most crucial step for identifying obvious trends, seasonal cycles, and potential structural breaks at the intervention point. Python Code Snippet:
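A minimal sketch of this plot using synthetic data (the series values and intervention date are illustrative):

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend for scripted use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the loaded data.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
series = pd.Series(1000 + 5 * np.arange(36) + rng.normal(0, 20, 36), index=idx)
intervention_date = pd.Timestamp("2022-01-01")   # hypothetical interruption point

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(series.index, series.values, marker="o", linestyle="-")
ax.axvline(intervention_date, color="red", linestyle="--", label="Intervention")
ax.set_xlabel("Month")
ax.set_ylabel("Dispensings")
ax.legend()
fig.savefig("raw_series.png")
```

Marking the interruption point on the raw plot is one of the visualization standards noted earlier: the reader should be able to see the pre-trend, the interruption, and the post-trend at a glance.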
Objective: To formally decompose the time series into its constituent parts and test for stationarity.
Time Series Decomposition: Decompose the series into trend, seasonal, and residual components. This helps visualize the contribution of each. Python Code Snippet (using statsmodels):
Test for Stationarity: Apply statistical tests to check the null hypothesis of non-stationarity.
Interpretation: A low p-value (e.g., <0.05) in the ADF test allows rejection of the null, suggesting stationarity. The converse is generally true for the KPSS test.
Addressing Non-Stationarity: If the series is non-stationary, apply differencing.
- First-order differencing (Y_t - Y_{t-1}) to remove a linear trend.
- Seasonal differencing (Y_t - Y_{t-12} for monthly data) to remove recurring seasonal patterns [19].
Python Code Snippet:
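A sketch of both differencing operations on a synthetic trending, seasonal series (amplitudes and slope are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Monthly series with a linear trend and yearly seasonality.
n = 48
idx = pd.date_range("2019-01-01", periods=n, freq="MS")
y = pd.Series(
    50 + 0.8 * np.arange(n) + 12 * np.sin(2 * np.pi * np.arange(n) / 12)
    + rng.normal(0, 1, n),
    index=idx,
)

first_diff = y.diff(1).dropna()       # removes the linear trend
seasonal_diff = y.diff(12).dropna()   # removes the repeating yearly pattern
print(first_diff.var(), seasonal_diff.var())
```

Note that each differencing pass shortens the series (by 1 and 12 observations here), which matters when the pre-intervention segment is already short.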
Re-run the stationarity tests on the differenced data.
Objective: To quantify and model the autocorrelation structure and fit an appropriate ITS model.
Plot Autocorrelation Functions:
Model Selection and Fitting: Fit a segmented regression model that includes terms for time, intervention, and time after intervention.
Model Diagnostics: Check the residuals of the final model. They should resemble white noise (i.e., no significant autocorrelations). Plot the ACF/PACF of the residuals and perform a Ljung-Box test.
The following workflow diagram illustrates the logical relationship and sequence of these analytical steps.
For researchers implementing ITS analyses, the following "research reagents" are essential computational tools and statistical constructs.
Table 2: Key Research Reagents for ITS Analysis
| Tool/Reagent | Type | Function in ITS Analysis |
|---|---|---|
| Statistical Software (R/Python/SAS) | Software Platform | Provides the computational environment for data manipulation, modeling, and visualization. |
| Augmented Dickey-Fuller (ADF) Test | Statistical Test | Formally tests the null hypothesis that a time series has a unit root (is non-stationary) [19]. |
| Autocorrelation Function (ACF) Plot | Diagnostic Plot | Visualizes the correlation between a time series and its lagged values, helping to identify AR/MA processes and seasonality [13]. |
| Partial Autocorrelation Function (PACF) Plot | Diagnostic Plot | Displays the partial correlation of a time series with its own lagged values, controlling for intermediate lags; crucial for identifying the order of autoregressive (AR) processes. |
| Seasonal Decomposition (e.g., STL) | Analytical Method | Separates a time series into trend, seasonal, and residual components, allowing for a clear inspection of each [20] [19]. |
| SARIMA Model | Statistical Model | Extends ARIMA to explicitly model seasonal autocorrelation patterns, defined by parameters (p,d,q)(P,D,Q,m) [13]. |
| Differencing (Y_t - Y_{t-1}) | Data Transformation | A method to remove trend and achieve stationarity by computing the changes between consecutive observations [13]. |
The rigorous application of ITS design is particularly relevant in the context of contemporary drug development. The industry is increasingly characterized by the use of artificial intelligence for drug discovery, a growth in personalized and precision medicine, and the adoption of virtual clinical trials and digital health technologies [21] [22]. These trends generate complex, longitudinal data perfect for ITS evaluation. For instance, an ITS could be used to assess the impact of an AI-driven diagnostic tool on time-to-patient-identification for a rare disease trial, or to evaluate how a new policy on personalized medicine reimbursement affected the uptake of a targeted therapy. In all these scenarios, controlling for autocorrelation, seasonality, and underlying trends is essential for deriving valid, actionable insights that can inform regulatory and commercial decisions.
The interrupted time series (ITS) design is a powerful quasi-experimental methodology used to evaluate the effects of interventions when randomized controlled trials are not feasible, ethical, or practical [23] [24]. Within this framework, segmented regression has emerged as the most prevalent and recommended statistical technique for analyzing time series data before and after a well-defined intervention [24] [25]. Also known as piecewise or broken-stick regression, this method allows researchers to quantify whether an intervention causes a significant change in the level or trend of an outcome of interest, beyond any pre-existing secular trends [23] [26].
The fundamental strength of segmented regression in ITS analysis lies in its ability to distinguish intervention effects from underlying trends that would have occurred regardless of the intervention [24]. This addresses a critical limitation of simple before-and-after comparisons, which may wrongly attribute secular trends to the intervention itself [23]. In implementation research across healthcare, public policy, and pharmaceutical development, segmented regression provides a robust analytical foundation for causal inference about real-world interventions [23] [27].
A segmented regression model for ITS partitions the series of observations into pre- and post-intervention segments, fitting a separate regression line to each interval [26]. The most common parameterization for a continuous outcome variable involves four key parameters [23] [28]:
Table 1: Core Coefficients in Segmented Regression for ITS Analysis
| Parameter | Interpretation | Causal Inference Question |
|---|---|---|
| β₀ | Baseline level of outcome at time zero | What was the starting level? |
| β₁ | Pre-intervention slope (secular trend) | What was the underlying trend before intervention? |
| β₂ | Immediate level change following intervention | Did the intervention cause an immediate shift? |
| β₃ | Change in slope from pre- to post-intervention | Did the intervention alter the ongoing trend? |
The basic segmented regression model is expressed as:
Yt = β₀ + β₁ × time + β₂ × intervention + β₃ × post-time + εt
Where Yt is the outcome at time t, "time" indicates elapsed time since study start, "intervention" is a dummy variable (0 pre-intervention, 1 post-intervention), and "post-time" indicates time since intervention started (0 before intervention, 1,2,3... after) [25].
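As a concrete illustration, the four-parameter model above can be fitted by ordinary least squares. The sketch below uses simulated data with illustrative parameter values (none of the numbers come from a cited study) and builds the design matrix directly in numpy:

```python
import numpy as np

# Simulated monthly series: 24 pre- and 24 post-intervention points.
rng = np.random.default_rng(42)
n_pre, n_post = 24, 24
n = n_pre + n_post

time = np.arange(1, n + 1)                                   # elapsed time since study start
intervention = (time > n_pre).astype(float)                  # 0 pre, 1 post
post_time = np.where(intervention == 1, time - n_pre, 0.0)   # 0,...,0,1,2,3,...

# Hypothetical true parameters: baseline 10, pre-slope 0.5,
# immediate level change +4, slope change -0.3
y = 10 + 0.5 * time + 4.0 * intervention - 0.3 * post_time + rng.normal(0, 1, n)

# Design matrix matching Yt = b0 + b1*time + b2*intervention + b3*post_time + et
X = np.column_stack([np.ones(n), time, intervention, post_time])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta
```

Note that plain OLS standard errors are only valid here because the simulated errors are independent; with real time series data the autocorrelation adjustments discussed below would be needed.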
Research has identified two common parameterizations of segmented regression with important interpretive differences [28] [29]. The approach by Wagner et al. uses (T-Ti) in the interaction term, where Ti is the intervention start time, making β₂ directly interpretable as the immediate level change [28]. In contrast, the approach by Bernal et al. uses simple time (T) in the interaction term, where β₂ represents the difference in intercepts at time zero rather than the immediate effect at intervention time [28] [29].
This distinction is crucial because using the incorrect interpretation can lead to erroneous conclusions about intervention effects [29]. Regardless of parameterization, the immediate level change at intervention time Ti is calculated as β₂ + β₃ × Ti in the Bernal parameterization, but is directly estimated as β₂ in the Wagner parameterization [28].
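The algebraic relationship between the two parameterizations can be verified numerically. This sketch (simulated data, illustrative names) fits both codings to the same series and confirms that β₂ + β₃ × Ti under the Bernal coding equals β₂ under the Wagner coding:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Ti = 48, 24                          # Ti = intervention start time
t = np.arange(1, n + 1)
D = (t > Ti).astype(float)              # intervention dummy
y = 5 + 0.4 * t + 3.0 * D + 0.2 * D * (t - Ti) + rng.normal(0, 0.5, n)

# Wagner-style: interaction uses time since intervention, D*(t - Ti)
Xw = np.column_stack([np.ones(n), t, D, D * (t - Ti)])
bw, *_ = np.linalg.lstsq(Xw, y, rcond=None)

# Bernal-style: interaction uses time since study start, D*t
Xb = np.column_stack([np.ones(n), t, D, D * t])
bb, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Immediate level change: bw[2] directly (Wagner) vs bb[2] + bb[3]*Ti (Bernal)
wagner_level = bw[2]
bernal_level = bb[2] + bb[3] * Ti
```

Because the two design matrices span the same column space, the fitted curves are identical and the two expressions for the immediate level change agree exactly; only the raw β₂ coefficients differ in meaning.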
For valid causal inference using segmented regression in ITS designs, several key assumptions must be met:
The assumption of independent errors is frequently violated in time series data due to autocorrelation, where consecutive measurements are correlated [24]. When autocorrelation exists but is ignored, standard errors may be underestimated, increasing Type I error rates [24]. Appropriate statistical techniques, such as including autoregressive terms or using generalized estimating equations (GEE), should be employed to address this issue [27].
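A quick screen for first-order autocorrelation is the Durbin-Watson statistic, which is near 2 for independent residuals and roughly 2(1 − ρ) under AR(1) errors. A minimal sketch on simulated residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 for independent errors,
    below 2 for positive autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
n = 500

# Independent errors -> statistic near 2
dw_iid = durbin_watson(rng.normal(size=n))

# AR(1) errors with rho = 0.6 -> statistic near 2 * (1 - 0.6) = 0.8
rho = 0.6
e = np.zeros(n)
u = rng.normal(size=n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
dw_ar = durbin_watson(e)
```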
For evaluating an intervention implemented at a single, well-defined time point:
Data Preparation
Model Specification
Interpretation
Three-Segment Model for Transition Periods
When interventions are gradually implemented or effects manifest over a transition period [25]:
Plateau Model with Estimated Breakpoint
When the intervention timing or breakpoint is unknown and must be estimated from data [30]:
Table 2: Comparison of Segmented Regression Approaches for ITS Designs
| Model Type | Key Features | Indications | Statistical Considerations |
|---|---|---|---|
| Classic Two-Segment | Single breakpoint at known intervention time; estimates level and slope changes | Sharp, immediate interventions with known implementation timing | Autocorrelation adjustment; sufficient data points per segment (≥8) |
| Three-Segment with Transition | Models gradual implementation; accounts for transition period using CDFs | Interventions phased over time; training periods; gradual effect manifestation | Selection of appropriate distribution pattern for transition; sensitivity analysis for transition length |
| Plateau with Estimated Breakpoint | Estimates breakpoint from data; continuity constraints | Unknown intervention timing; natural thresholds; effect saturation | Nonlinear estimation; initial parameter guesses; more complex implementation |
| Multivariate Segmented | Multiple independent variables with potential breakpoints | Complex interventions; multiple simultaneous components | Increased model complexity; potential for overfitting |
Table 3: Essential Methodological Tools for Segmented Regression Analysis
| Tool/Technique | Function/Purpose | Implementation Examples |
|---|---|---|
| Segmented Package (R) | Fitting segmented regression models with estimated breakpoints | segmented() function for breakpoint detection and piecewise terms |
| PROC NLIN (SAS) | Nonlinear regression for complex segmented models with constraints | Plateau models with smoothness constraints at estimated breakpoints |
| Generalized Estimating Equations (GEE) | Accounting for autocorrelation in correlated time series data | proc genmod in SAS; geeglm in R for panel ITS data |
| Cumulative Distribution Functions (CDFs) | Modeling transition periods in optimized segmented regression | Uniform, normal, log-normal distributions for gradual effects |
| Durbin-Watson Test | Detecting autocorrelation in regression residuals | Statistical testing for serial correlation in time series errors |
A frequent error in segmented regression parameterization involves incorrect specification of the interaction term [29]. Researchers must use the product between the intervention variable and time elapsed since intervention start (T-Ti), rather than time since study beginning (T), to obtain valid estimates of the immediate level change [29]. Simulation studies demonstrate that using the incorrect parameterization can produce substantially biased estimates of the intervention's immediate effect [29].
Multiple Intervention Components
For complex interventions with several components introduced at different times:
Multi-site ITS Designs
When data come from multiple implementation sites:
Segmented regression remains the gold standard analytical method for interrupted time series designs in implementation research, providing robust causal inference about intervention effects while accounting for underlying secular trends [23] [24]. When properly specified with attention to key assumptions—particularly regarding parameterization, autocorrelation, and intervention timing—it offers researchers across scientific domains a powerful tool for evaluating real-world interventions [28] [29]. The continued development of optimized segmented regression approaches, particularly for handling transition periods and complex intervention scenarios, further enhances its applicability to contemporary implementation research challenges [25] [27].
Interrupted Time Series (ITS) analysis is a powerful quasi-experimental design for evaluating the population-level impact of health policy interventions, pharmaceutical regulations, and public health initiatives when randomized controlled trials are not feasible [13] [31]. Within this framework, Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) models provide sophisticated analytical approaches that account for complex temporal structures, including autocorrelation, trends, and seasonal patterns, which simpler segmented regression models may inadequately capture [13] [32]. For researchers and drug development professionals, these models offer a robust methodology for determining whether an intervention—such as a new drug policy, vaccination campaign, or market approval—creates a significant deviation from pre-existing trends in outcomes like prescribing rates, disease incidence, or product demand [13] [32] [33].
The core strength of ARIMA/SARIMA models lies in their ability to model the outcome variable based on its own past values and previous forecast errors, while explicitly accounting for temporal dependencies that violate the independence assumption of standard statistical tests [13] [34]. This is particularly valuable in pharmaceutical and public health research, where data often exhibit seasonal fluctuations (e.g., annual influenza patterns, quarterly reporting cycles) and serial correlation that must be controlled to accurately isolate intervention effects [13] [32]. By properly addressing these temporal structures, ARIMA/SARIMA models reduce biased estimation of intervention impacts and provide more valid causal inference in observational settings [35].
ARIMA models combine three primary components to describe and forecast time series data. The model is characterized by three parameters: (p, d, q), where:
AR (Autoregressive component - p): This component models the current value of the time series as a linear combination of its previous values [13] [34]. An autoregressive model of order p (AR(p)) can be expressed as:
Y_t = c + φ_1 Y_{t-1} + φ_2 Y_{t-2} + … + φ_p Y_{t-p} + ε_t
where Y_t is the value at time t, c is a constant, φ_1, …, φ_p are the autoregressive parameters, and ε_t represents the error term [13].
I (Integrated component - d): This component involves differencing the time series to make it stationary—removing trends and achieving constant statistical properties over time [13] [34] [36]. The order of differencing (d) indicates how many times differencing is applied. First-order differencing is expressed as:
Y_t' = Y_t - Y_{t-1}
Stationarity is crucial for ARIMA modeling as it ensures stable parameters over time, typically verified using tests like the Augmented Dickey-Fuller (ADF) test [37] [36].
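To see why differencing helps, the sketch below (simulated data; in practice one would run a formal test such as statsmodels' adfuller) compares the drift in the mean of a trending series before and after first-order differencing:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
t = np.arange(n)

# A trending series: linear trend plus noise (non-stationary in mean)
y = 2.0 + 0.5 * t + rng.normal(0, 1, n)

# First-order differencing: y'_t = y_t - y_{t-1}
dy = np.diff(y)

# The raw series drifts strongly between its first and second halves;
# the differenced series fluctuates around the slope (0.5) in both halves.
drift_raw = abs(y[n // 2:].mean() - y[:n // 2].mean())
drift_diff = abs(dy[len(dy) // 2:].mean() - dy[:len(dy) // 2].mean())
```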
MA (Moving Average component - q): This component models the current value based on the residual errors from previous time points [13] [34]. A moving average model of order q (MA(q)) is expressed as:
Y_t = c + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + … + θ_q ε_{t-q}
where θ_1, …, θ_q are the moving average parameters [13].
The complete ARIMA(p,d,q) model combines these elements to regress the time series on its own lagged values and lagged forecast errors [34] [36].
For time series with seasonal patterns, the SARIMA model extends ARIMA by incorporating seasonal components. A SARIMA model is denoted as (p, d, q)(P, D, Q)_m, where [38] [34]:
The seasonal component models patterns that repeat at fixed intervals, addressing regular fluctuations such as increased prescribing of certain medications in winter months or quarterly reporting cycles in pharmaceutical sales [32] [38]. Seasonal differencing (D) removes seasonal trends, for instance, by computing Y_t - Y_{t-m} [34].
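Seasonal differencing can be illustrated in a few lines. Here a simulated monthly series with an annual cycle (m = 12) is differenced at lag m, which collapses the seasonal variance:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 12                                   # monthly data, annual seasonality
n = m * 10
t = np.arange(n)

# Series with a strong repeating seasonal cycle plus noise
seasonal = 5 * np.sin(2 * np.pi * t / m)
y = 20 + seasonal + rng.normal(0, 0.5, n)

# Seasonal differencing at lag m: Y_t - Y_{t-m} cancels the repeating pattern
dy_seasonal = y[m:] - y[:-m]

# Variance drops sharply once the seasonal component is removed
var_before = y.var()
var_after = dy_seasonal.var()
```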
Step 1: Data Preprocessing
Step 2: Stationarity Testing and Differencing
Table 1: Stationarity Assessment and Differencing Guidelines
| Scenario | ADF Test Result | Recommended Action | Target d/D |
|---|---|---|---|
| No trend, constant variance | p < 0.05 | No differencing needed | d = 0 |
| Linear trend, constant variance | p > 0.05 | First-order differencing | d = 1 |
| Nonlinear trend, changing variance | p > 0.05 | Second-order differencing or transformation | d = 2 |
| Seasonal pattern present | p > 0.05 at seasonal lags | Seasonal differencing | D = 1 with appropriate m |
Step 3: Determine AR and MA Orders Using ACF and PACF
Step 4: Model Selection and Validation
Table 2: Interpretation of ACF and PACF Patterns for Model Identification
| Pattern | ACF Behavior | PACF Behavior | Suggested Model |
|---|---|---|---|
| AR(p) | Decays exponentially or sinusoidally | Significant spikes through lag p, then cuts off | AR(p) with order p |
| MA(q) | Significant spikes through lag q, then cuts off | Decays exponentially or sinusoidally | MA(q) with order q |
| ARMA(p,q) | Decays after lag q | Decays after lag p | ARMA(p,q) |
| Seasonal AR | Decays at seasonal lags | Significant spikes at seasonal lags | Seasonal AR(P) |
| Seasonal MA | Significant spikes at seasonal lags | Decays at seasonal lags | Seasonal MA(Q) |
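The ACF/PACF signatures in Table 2 can be reproduced numerically. The sketch below implements the sample ACF and a regression-based PACF from first principles (production analyses would typically use statsmodels' acf/pacf functions) and verifies the AR(1) pattern: geometric ACF decay with a single PACF spike at lag 1:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function up to nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([1.0] + [np.sum(x[k:] * x[:-k]) / denom
                             for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Sample PACF: the lag-k value is the last coefficient of an AR(k) OLS fit."""
    x = np.asarray(x, float) - np.mean(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Regress x[t] on x[t-1], ..., x[t-k]
        X = np.column_stack([x[k - j - 1: len(x) - j - 1] for j in range(k)])
        b, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(b[-1])
    return np.array(out)

# Simulate an AR(1) process with phi = 0.7
rng = np.random.default_rng(11)
n, phi = 2000, 0.7
x = np.zeros(n)
u = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + u[t]

a = acf(x, 5)    # expect geometric decay: roughly 0.7, 0.49, 0.34, ...
p = pacf(x, 5)   # expect a single spike near 0.7 at lag 1, then near 0
```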
Step 5: Modeling Intervention Effects
ARIMA/SARIMA models have demonstrated substantial utility across various pharmaceutical and public health research contexts:
In a study evaluating Australia's policy change restricting quetiapine prescriptions, ARIMA modeling quantified a significant reduction in inappropriate prescribing following the intervention, demonstrating its value for pharmaceutical policy analysis [13]. Similarly, research on COVID-19's impact on routine immunization in Kenya employed SARIMA to account for seasonal patterns in vaccine coverage, revealing immediate decreases in pentavalent and measles/rubella vaccine doses following pandemic onset, with recovery within approximately four months [32].
ARIMA/SARIMA models provide critical forecasting capabilities for pharmaceutical supply chain management. Studies comparing forecasting approaches found that time series models effectively predict drug demand, enabling optimized production, inventory management, and market responsiveness [33]. Accurate forecasting is particularly valuable for pharmaceutical companies where prediction errors can significantly impact operational efficiency and resource allocation [39] [33].
In infectious disease research, SARIMA models help quantify the impact of public health interventions by accounting for both seasonal patterns and underlying trends [32] [31]. For example, studies have evaluated antibiotic stewardship programs, vaccination campaigns, and pandemic control measures while controlling for autocorrelation and seasonal variation in disease incidence [32] [31].
Table 3: Essential Computational Tools for ARIMA/SARIMA Implementation
| Tool/Software | Primary Function | Application Context |
|---|---|---|
| R statistical software (forecast, tseries packages) | Model fitting and diagnostics | Comprehensive time series analysis [13] [38] |
| Python (statsmodels, pmdarima) | Automated parameter selection and forecasting | Flexible implementation with machine learning integration [37] [36] |
| Augmented Dickey-Fuller test | Stationarity testing | Determining differencing order (d) [37] [36] |
| ACF/PACF plots | Model order identification | Visual guidance for p, q, P, Q selection [13] [36] |
| AIC/BIC criteria | Model comparison | Selecting optimal parameter combinations [34] |
ARIMA/SARIMA Model Implementation Workflow for ITS Studies
When designing ITS studies using ARIMA/SARIMA models, statistical power depends on several factors. Simulation studies indicate that smaller effect sizes (<0.5) or fewer time points may yield inadequate power, potentially leading to false conclusions about intervention effectiveness [35].
ARIMA and SARIMA models provide robust analytical frameworks for evaluating interventions in interrupted time series designs, particularly when data exhibit complex temporal structures including trends, autocorrelation, and seasonal patterns. By properly accounting for these features, researchers in pharmaceutical development and public health can obtain more valid estimates of intervention effects, leading to better-informed policy decisions and resource allocation. The structured protocol outlined in this document offers a systematic approach to model identification, estimation, and interpretation, supporting rigorous evaluation of health interventions in observational settings where randomized trials are not feasible.
Generalized Additive Models (GAMs) represent a powerful extension of Generalized Linear Models (GLMs) that replace the linear relationship between predictors and outcome with flexible smooth functions, enabling the modeling of complex, non-linear patterns without requiring prior specification of the relationship's form [40] [41]. In the context of interrupted time series (ITS) design implementation research, this flexibility is particularly valuable for evaluating policy interventions or treatment effects where the underlying trends may follow non-linear patterns that traditional segmented regression cannot adequately capture [9] [42].
The fundamental equation for a GAM can be expressed as:
g(μ) = β₀ + f₁(x₁) + f₂(x₂) + … + fₚ(xₚ)
where g(μ) is the link function, β₀ is the intercept, and fⱼ(xⱼ) are smooth functions of the covariates [40] [43]. This structure maintains the additivity of GLMs while allowing for non-linear relationships through the smooth functions, striking a balance between interpretability and flexibility [44] [40].
Compared to traditional linear models, GAMs offer distinct advantages for ITS research. While linear models assume a straight-line relationship between predictors and outcome, GAMs can capture complex nonlinear trends common in real-world time series data [45] [42]. Unlike polynomial regression, which can produce wild extrapolations at the endpoints (Runge's phenomenon), GAMs use smoothing splines that provide more stable behavior at data boundaries [46]. Additionally, compared to complex machine learning approaches, GAMs maintain interpretability through their additive structure, allowing researchers to understand and communicate the effect of individual variables [44] [40].
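As a toy illustration of the idea (not mgcv's or pyGAM's actual algorithm), the sketch below fits a penalized truncated-power spline — a crude stand-in for a GAM smooth, under assumed knot placement and smoothing parameter — and compares it with a straight-line fit on a non-linear signal:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.2, n)    # a clearly non-linear signal

# GAM-style smooth via a penalized truncated-power cubic spline basis
z = x / 10.0                             # rescale to [0, 1] for numerical stability
knots = np.linspace(0.05, 0.95, 15)
B = np.column_stack([np.ones(n), z, z**2, z**3] +
                    [np.clip(z - k, 0, None) ** 3 for k in knots])

lam = 1e-4                               # smoothing parameter (ridge-type penalty)
P = np.eye(B.shape[1])
P[:4, :4] = 0                            # leave the polynomial part unpenalized
beta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
fit_smooth = B @ beta

# Straight-line fit for comparison
X = np.column_stack([np.ones(n), x])
bl, *_ = np.linalg.lstsq(X, y, rcond=None)
fit_line = X @ bl

# Error of each fit against the true underlying signal
mse_smooth = np.mean((fit_smooth - np.sin(x)) ** 2)
mse_line = np.mean((fit_line - np.sin(x)) ** 2)
```

Real GAM software additionally selects the smoothing parameter automatically (e.g., by generalized cross-validation or REML), which this sketch does not attempt.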
Table 1: Comparison of Statistical Approaches for Interrupted Time Series Analysis
| Model Type | Key Characteristics | Assumptions | Strengths | Limitations |
|---|---|---|---|---|
| Segmented Linear Regression | Assumes linear trends before and after intervention | Linear relationships, independent errors | Simple interpretation, widely understood [42] | Poor performance with nonlinear trends [42] |
| ARIMA Models | Accounts for autocorrelation, trend, and seasonality | Stationarity after differencing | Handles complex autocorrelation structures [9] | Complex specification, less intuitive [9] |
| Generalized Additive Models (GAMs) | Flexible smooth functions capture nonlinear patterns | Additivity, smooth relationships | Captures nonlinear trends automatically, robust to model misspecification [9] [42] | Computational intensity, smoothing parameter selection [45] |
Table 2: Performance Comparison of GAMs vs. Alternative Methods in Simulation Studies
| Study Context | Comparison Model | Key Finding | Performance Metric |
|---|---|---|---|
| Policy Intervention Evaluation [42] | Segmented Linear Regression | GAMs showed better performance with nonlinear trends, similar performance with linear trends | Lower MSE and MPE for nonlinear data |
| Health Policy Analysis [9] | ARIMA | GAMs more robust to model misspecification; ARIMA more consistent with different effect sizes | Model accuracy under varying conditions |
| Clinical Prediction Rules [43] | Traditional Categorization | GAM-based categorization performed similarly to continuous predictors | No significant differences in AUC values |
Model adequacy should be verified with diagnostic checks of basis dimension and residual behavior, for example via gam.check() or similar functionality [40].
GAM Implementation Workflow for ITS Studies
GAM Mathematical Components and Structure
Table 3: Essential Software Tools for Implementing GAMs in ITS Research
| Tool Name | Type | Primary Function | Key Features for ITS |
|---|---|---|---|
| mgcv (R Package) [48] [42] | Software Library | GAM estimation and inference | Automatic smoothing parameter selection, various basis functions, AR1 error structures |
| pyGAM (Python Package) [46] | Software Library | GAM implementation in Python | Multiple regression types, grid search for optimization, custom loss functions |
| marginaleffects (R Package) [48] | Analysis Tool | Post-estimation interpretation | Conditional and marginal effects, predictions, hypothesis testing for GAMs |
| gam.check (mgcv) [40] | Diagnostic Tool | Model validation | Basis dimension checks, residual diagnostics, QQ-plots |
| Thin Plate Regression Splines [42] | Smoothing Method | Default smoother in mgcv | Optimal smoothing given basis dimension, no knot placement required |
GAMs offer particular utility in pharmaceutical and healthcare research where ITS designs are commonly employed to evaluate the impact of policy changes, treatment guidelines, or safety interventions. The non-linear modeling capability of GAMs allows researchers to detect and quantify intervention effects that may follow complex temporal patterns not captured by traditional methods [47] [42].
In a study evaluating the impact of Spain's 2012 cost-sharing reform on pharmaceutical prescriptions, GAMs revealed non-linear trends that would have been missed by segmented linear regression, providing more accurate estimates of the policy's cumulative effect [42]. Similarly, in clinical prediction research, GAMs have been used to develop optimal categorization schemes for continuous clinical variables, preserving critical prognostic information while creating clinically practical decision rules [43].
For drug safety monitoring, GAMs can model complex seasonal patterns in adverse event reports while evaluating the impact of safety warnings, and in clinical biomarker studies, GAMs have elucidated non-linear relationships between alcohol consumption and inflammatory markers like IL-6, revealing risk patterns that would be obscured by dichotomization or linear assumptions [47].
The flexibility of GAMs to accommodate non-normal distributions through appropriate link functions makes them particularly suitable for healthcare outcomes, which often include counts (e.g., hospitalizations), binary outcomes (e.g., mortality), or skewed continuous measures (e.g., healthcare costs) [41]. This distributional flexibility, combined with the ability to capture non-linear temporal patterns, positions GAMs as a powerful analytical tool for the complex longitudinal data common in pharmaceutical and health services research.
The Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the longitudinal effects of interventions implemented at a population level, such as new health policies, system changes, or drug utilization interventions [49] [50]. Within implementation research, ITS analysis enables researchers to determine whether an intervention has produced a significant effect beyond underlying trends by analyzing data points collected at regular intervals before and after an intervention point [51]. Despite its growing popularity in drug development and health services research, methodological challenges in its application persist, particularly regarding sample size determination, data aggregation strategies, and pre-specification of intervention effects [49] [51].
Recent evidence indicates substantial methodological gaps in current ITS practice. A cross-sectional survey of 153 drug utilization studies using ITS design found that only 28.1% clearly explained the rationale for using ITS, only 13.7% clarified the rationale for their chosen model structure, and only 20.8% of studies using aggregated data justified the number of time points selected [51]. These shortcomings highlight the critical need for standardized protocols in ITS design and analysis.
This application note addresses three fundamental design considerations—sample size calculation, data aggregation principles, and pre-specification of intervention effects—to enhance the methodological rigor of ITS studies in implementation research. We provide detailed protocols and practical tools to help researchers navigate these complex methodological decisions within the context of drug development and healthcare policy evaluation.
Determining adequate statistical power and sample size in ITS studies presents unique challenges compared to traditional experimental designs. The "sample size" in ITS refers to the number of time points observed before and after an intervention, while power depends on multiple factors including the magnitude of intervention effects, underlying variance, autocorrelation, and the model structure itself [49]. Unlike standard power calculations for clinical trials, ITS power analysis must account for temporal dependencies in the data, often requiring specialized simulation-based approaches.
A recent methodological review of ITS studies in drug utilization research revealed that only 20.8% of studies using aggregated data provided justification for their selected number of time points, indicating a substantial gap in current reporting practices [51]. This omission is critical because underpowered ITS studies may fail to detect clinically significant intervention effects, while overly long series may waste resources and potentially introduce confounding from external factors.
Simulation-based approaches represent the current best practice for power calculation in ITS designs [49] [52]. These methods involve generating multiple synthetic datasets with known effect sizes under various assumptions about the data structure, then analyzing each dataset to estimate the probability of detecting the specified effects (statistical power).
Table 1: Key Parameters for Simulation-Based Power Analysis in ITS Studies
| Parameter Category | Specific Parameters | Considerations |
|---|---|---|
| Effect Size | Level change (β₂), Slope change (β₄) | Based on clinically meaningful difference or policy-relevant threshold |
| Time Series Structure | Number of pre- and post-intervention points, Total series length | Balance between statistical power and practical feasibility |
| Statistical Properties | Autocorrelation (ρ), Variance (σ²), Seasonality patterns | Estimate from preliminary data or literature review |
| Model Specifications | Regression type (linear, Poisson, negative binomial), Inclusion of controls | Match to outcome data type and study design |
For continuous outcomes, the segmented autoregressive model can be specified as:
Yₜ = β₀ + β₁Tₜ + β₂Xₜ + β₃(Tₜ - t₁)Xₜ + εₜ
where εₜ = ρεₜ₋₁ + uₜ, uₜ ~ N(0, σ²) [49]
For count outcomes, which are common in drug utilization research (e.g., monthly prescriptions, adverse events), observation-driven models such as Poisson or negative binomial regression with lagged terms on the conditional mean are appropriate [52]. The power to detect the same magnitude of parameters varies considerably depending on whether testing focuses on level changes, trend changes, or both, necessitating careful pre-specification of primary hypotheses [52].
Protocol 1: Simulation-Based Power Calculation for ITS Studies
Define Primary Hypothesis: Clearly specify whether the expected intervention effect manifests as an immediate level change, a slope change, or both. This determines the key parameters (β₂, β₄, or both) for power calculation.
Estimate Baseline Parameters:
Specify Effect Sizes: Define clinically or policy-relevant effect sizes for level changes (β₂) and/or slope changes (β₄). Conduct power calculations across a plausible range of values.
Implement Simulation Code:
Vary Key Parameters Systematically: Execute simulations across different combinations of:
Create Power Curves: Generate graphical representations showing statistical power as a function of the number of time points for different effect sizes and autocorrelation values.
Select Final Design: Choose the number of time points that provides adequate power (typically 80% or higher) for clinically relevant effect sizes while considering practical constraints.
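The steps above can be sketched as a single simulation loop. The function below uses illustrative names and a naive normal-approximation test whose standard errors ignore the autocorrelation in the errors (so Type I error is somewhat inflated); a real analysis would use GLS, Newey-West, or autoregressive error terms. It estimates power for the level-change term β₂:

```python
import numpy as np

def simulate_power(n_pre=24, n_post=24, level=2.0, slope=0.0,
                   rho=0.2, sigma=1.0, n_sims=500, seed=0):
    """Estimate power to detect the level change (beta2) by simulation."""
    rng = np.random.default_rng(seed)
    n = n_pre + n_post
    t = np.arange(1, n + 1)
    D = (t > n_pre).astype(float)
    pt = np.where(D == 1, t - n_pre, 0.0)
    X = np.column_stack([np.ones(n), t, D, pt])
    XtX_inv = np.linalg.inv(X.T @ X)
    hits = 0
    for _ in range(n_sims):
        # AR(1) errors: e_t = rho * e_{t-1} + u_t
        u = rng.normal(0, sigma, n)
        e = np.zeros(n)
        for i in range(1, n):
            e[i] = rho * e[i - 1] + u[i]
        y = level * D + slope * pt + e
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        s2 = resid @ resid / (n - 4)
        se = np.sqrt(s2 * XtX_inv[2, 2])
        # Crude normal-approximation test on beta2 (level change)
        if abs(beta[2] / se) > 1.96:
            hits += 1
    return hits / n_sims

power_big = simulate_power(level=2.0)    # sizable level change: high power
power_null = simulate_power(level=0.0)   # no effect: rejection rate near alpha
```

Sweeping `n_pre`/`n_post`, `level`, and `rho` over a grid of values produces the power curves described in step 6.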
Table 2: Exemplary Power Analysis Results for Different ITS Scenarios
| Scenario | Pre/Post Points | Autocorrelation (ρ) | Level Change | Slope Change | Achieved Power |
|---|---|---|---|---|---|
| Base Case | 24/24 | 0.2 | 0.5 SD | 0.1 SD/t | 82% |
| High AC | 24/24 | 0.6 | 0.5 SD | 0.1 SD/t | 64% |
| Longer Series | 36/36 | 0.2 | 0.5 SD | 0.1 SD/t | 92% |
| Larger Effect | 24/24 | 0.2 | 0.8 SD | 0.15 SD/t | 96% |
| Count Outcome | 24/24 | 0.2 | IRR=1.5 | OR=1.2/t | 85% |
Data aggregation refers to the process of combining individual-level data into meaningful temporal units for analysis. Appropriate aggregation is critical in ITS designs as it directly affects the interpretation of intervention effects and statistical properties of the time series. Most ITS studies (97.4% in recent surveys) use aggregated data as the unit of analysis, with monthly intervals being the most common approach (73.8%) [51].
The choice of aggregation level involves balancing statistical precision, methodological requirements, and clinical relevance. Longer intervals (e.g., monthly or quarterly) typically reduce variability and mitigate autocorrelation issues but may obscure brief intervention effects or precise timing of changes. Shorter intervals (e.g., daily or weekly) offer finer temporal resolution but often exhibit higher variability and stronger autocorrelation.
Protocol 2: Systematic Approach to Data Aggregation in ITS Studies
Define Temporal Unit Based on Intervention Mechanism:
Ensure Consistency in Pre- and Post-Intervention Periods: Maintain identical aggregation units throughout the entire study period to avoid artificial discontinuities.
Address Missing Data Proactively:
Validate Aggregation Level:
Account for Seasonal Patterns:
Document Rationale Explicitly: Justify the chosen aggregation level based on intervention characteristics, data availability, and statistical considerations.
Data Aggregation Decision Pathway
The flowchart above illustrates a systematic approach to selecting appropriate data aggregation levels in ITS studies, balancing methodological requirements with practical considerations.
Pre-specification of hypothesized intervention effects represents a critical safeguard against Type I errors and data-driven conclusions in ITS analyses. Recent evidence indicates that only 13.7% of ITS studies in drug utilization research provide clear justification for their selected model structure [51]. Furthermore, approximately 15 studies provided incorrect interpretation of level change parameters due to improper time parameterization, highlighting the need for greater precision in model specification [51].
Pre-specification involves explicitly stating the expected nature, timing, direction, and magnitude of intervention effects before data collection or analysis. This practice enhances research transparency, minimizes analytical flexibility, and strengthens causal inferences drawn from ITS designs.
Protocol 3: Pre-Specification Framework for Intervention Effects
Define Effect Mechanism:
Specify Timing Relationships:
Quantify Expected Effect Size:
Document Functional Form:
Yₜ = β₀ + β₁Tₜ + β₂Xₜ⁽¹⁾ + β₃Xₜ⁽²⁾ + β₄(Tₜ - t₁)Xₜ⁽¹⁾ + β₅(Tₜ - t₂)Xₜ⁽²⁾ + εₜ
Intervention Effect Pre-Specification Workflow
The diagram above outlines a systematic workflow for pre-specifying intervention effects in ITS studies, ensuring transparent and methodologically sound hypothesis development.
For interventions with phased implementation or ramp-up periods, the standard two-phase ITS model may be insufficient. In such cases, three-phase ITS designs more accurately capture the intervention's temporal structure [49]. These designs are particularly relevant for:
In three-phase ITS designs, the model includes two change points (t₁ and t₂) representing transitions between pre-implementation, ramp-up/partial implementation, and full implementation phases [49]. The coefficients β₂ and β₃ represent immediate level changes following the first and second transitions, while β₄ and β₅ represent slope changes during the second and third phases.
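Under one common coding — phase indicators that switch on at their change point and stay on thereafter, an assumption not spelled out in the text — the three-phase design matrix can be constructed as follows (change-point values are purely illustrative):

```python
import numpy as np

# Hypothetical monthly series with change points at t1 = 12 (ramp-up start)
# and t2 = 24 (full implementation), 36 time points total
n, t1, t2 = 36, 12, 24
t = np.arange(1, n + 1)

X1 = (t > t1).astype(float)    # on from phase 2 (ramp-up) onward
X2 = (t > t2).astype(float)    # on from phase 3 (full implementation) onward

# Columns follow the three-phase model:
# Y_t = b0 + b1*t + b2*X1 + b3*X2 + b4*(t - t1)*X1 + b5*(t - t2)*X2 + e_t
X = np.column_stack([
    np.ones(n), t, X1, X2,
    (t - t1) * X1,
    (t - t2) * X2,
])
```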
Table 3: Essential Methodological Tools for ITS Implementation Research
| Tool Category | Specific Tool/Technique | Function/Purpose |
|---|---|---|
| Statistical Software | R packages (tseries, forecast, segmented) | Conduct time series analysis, including autocorrelation testing and segmented regression |
| Power Analysis Tools | Simulation-based power calculation scripts [49] [52] | Determine required number of time points for adequate statistical power |
| Data Management Platforms | Electronic Data Capture (EDC) systems, interactive data-quality platforms [53] | Identify potential outliers, perform quality checks, and manage temporal data |
| Visualization Tools | Interactive Tables, Listings, and Figures (TLFs) [53] | Monitor data patterns, identify trends, and communicate findings effectively |
| Model Specification Aids | Segmented regression templates, Three-phase ITS code [49] | Implement correct model parameterization and avoid interpretation errors |
| Bias Assessment Tools | Autocorrelation tests (Durbin-Watson, Ljung-Box), Seasonality diagnostics | Identify and address threats to validity in time series analysis |
Comprehensive Protocol 4: Implementing a Rigorous ITS Study
Pre-Study Planning Phase:
Data Collection and Aggregation:
Analytical Phase:
Reporting and Interpretation:
Methodologically rigorous ITS designs require careful attention to sample size determination, data aggregation strategies, and pre-specification of intervention effects. The protocols and tools provided in this application note address critical gaps identified in current practice, particularly the underutilization of power analysis, inadequate justification of aggregation levels, and insufficient model specification. By adopting these structured approaches, researchers in drug development and implementation science can enhance the validity, transparency, and interpretability of their ITS studies, ultimately contributing to more robust evidence for healthcare decision-making.
Future methodological development should focus on standardized power calculation tools for complex ITS designs, improved handling of hierarchical structures in time series data, and best practice guidelines for reporting ITS analyses in implementation research contexts.
Interrupted Time Series (ITS) design is a powerful quasi-experimental method used to evaluate the effects of interventions introduced at a specific point in time. In drug utilization research, these interventions can range from new clinical guidelines and drug pricing policies to prescription restrictions and public health campaigns [51]. The core strength of ITS analysis lies in its ability to model longitudinal data, using pre-intervention trends to forecast a counterfactual—what would have happened in the absence of the intervention—and then comparing this forecast to the observed post-intervention data [5]. This makes ITS particularly valuable when randomized controlled trials (RCTs) are impractical, unethical, or too costly, such as when evaluating population-level health policies [51] [5].
ITS design offers a significant advantage over simple before-and-after studies by using multiple data points before and after the intervention. This allows researchers to account for underlying secular trends and natural fluctuations in the data, thereby providing a more robust estimate of the intervention effect [5]. A well-executed ITS can estimate two primary effects: an immediate level change following the intervention and a sustained slope change in the trend over time [5].
Before commencing data analysis, several critical design elements must be addressed to ensure the validity of the ITS study. The foundational assumption of any ITS is that, in the absence of the intervention, the pre-intervention trend would have continued unchanged into the post-intervention period [5]. Violations of this assumption lead to biased results.
A properly structured dataset and a systematic workflow are prerequisites for a successful ITS analysis. The diagram below outlines the key stages from data preparation to interpretation.
Segmented regression is the most frequently used method for analyzing ITS data [51] [5]. It models the outcome as a function of time and the intervention, allowing for separate intercepts and slopes before and after the intervention. The primary statistical model can be formulated as:
Yₜ = β₀ + β₁ × Tₜ + β₂ × Xₜ + β₃ × (Tₜ - T_intervention) × Xₜ + εₜ
Table 1: Variables and parameters in the core ITS segmented regression model.
| Variable/Parameter | Symbol | Description |
|---|---|---|
| Outcome | Yₜ | The drug utilization measure (e.g., consumption rate) at time t. |
| Baseline Level | β₀ | The starting level of the outcome at the beginning of the time series. |
| Time | Tₜ | The time elapsed since the start of the observation period (e.g., 1, 2, 3...). |
| Pre-Intervention Trend | β₁ | The slope (trend) of the outcome during the pre-intervention period. |
| Intervention Indicator | Xₜ | A dummy variable: 0 for pre-intervention points, 1 for post-intervention. |
| Immediate Level Change | β₂ | The estimated immediate change in the outcome level following the intervention. |
| Time Post-Intervention | (Tₜ - T_intervention) | Time since the intervention began (0 at the interruption point, then 1, 2...). |
| Slope Change | β₃ | The estimated difference between the pre- and post-intervention slopes. |
| Error Term | εₜ | The random, unexplained variability at time t; must be checked for autocorrelation. |
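The model in Table 1 can be fit directly with OLS. The series below is simulated, and the intervention point (month 24) is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated monthly drug-utilization series; intervention at month 24 (assumed).
rng = np.random.default_rng(1)
n, t_int = 48, 24
df = pd.DataFrame({"T": np.arange(1, n + 1)})
df["X"] = (df["T"] > t_int).astype(int)        # intervention indicator
df["T_post"] = (df["T"] - t_int) * df["X"]     # time since intervention
df["Y"] = (50 + 0.3 * df["T"] - 4 * df["X"]
           - 0.2 * df["T_post"] + rng.normal(0, 1, n))

fit = smf.ols("Y ~ T + X + T_post", data=df).fit()
print(fit.params)      # beta0, beta1 (pre-trend), beta2 (level), beta3 (slope change)
print(fit.conf_int())  # confidence intervals for each estimate
# Post-intervention slope = beta1 + beta3, with its own CI:
print(fit.t_test("T + T_post = 0"))
```

The `t_test` call reports the linear combination β₁ + β₃, which is the post-intervention slope discussed later in this section, together with its confidence interval.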
Failure to account for complex statistical properties of time series data is a common source of bias. The following table summarizes critical issues and recommended actions, based on common deficiencies found in recent literature [51].
Table 2: Key methodological issues and analytical responses in ITS analysis.
| Methodological Issue | Analytical Consideration | Recommended Action |
|---|---|---|
| Autocorrelation | Successive observations are correlated. | Use Durbin-Watson or Ljung-Box tests to detect it. Employ Prais-Winsten regression, ARIMA, or Generalized Least Squares (GLS) to correct for it. |
| Seasonality | Periodic, predictable fluctuations (e.g., monthly). | Include seasonal terms (e.g., sine/cosine functions) or dummy variables in the model. |
| Non-Stationarity | The mean or variance of the series changes over time. | Use Dickey-Fuller test. If non-stationary, de-trend or difference the data. |
| Model Specification | Incorrectly interpreting parameters. | Ensure β₂ is correctly interpreted as the immediate level change at the intervention point, not a change at the end of the series [51]. |
| Hierarchical Data | Data clustered within hospitals or regions. | Use mixed-effects (multilevel) models to account for within-cluster and between-cluster heterogeneity. |
The coefficients from the segmented regression model provide the estimates for the core intervention effects. The immediate effect of the intervention is given directly by β₂. The change in the trend is given by β₃. The post-intervention slope is calculated as (β₁ + β₃). It is crucial to report both the point estimates and their confidence intervals to convey the precision of these estimates.
A common pitfall in interpretation involves the parameterization of time. In the model presented in Table 1, β₂ represents the change in level that occurs immediately after the intervention, comparing the observed value to the value predicted by the pre-intervention trend. Some studies have incorrectly specified the model, leading to a misinterpretation of this parameter [51].
Table 3: Essential "research reagents" and resources for implementing an ITS study in drug utilization research.
| Item / Concept | Function / Purpose in ITS Analysis |
|---|---|
| Segmented Regression | The primary statistical model used to estimate level and slope changes associated with an intervention. |
| Autocorrelation Function (ACF) Plot | A diagnostic plot used to visualize and identify autocorrelation in the time series residuals. |
| Durbin-Watson Statistic | A statistical test used to detect the presence of autocorrelation in the residuals from a regression analysis. |
| ARIMA Model | (Autoregressive Integrated Moving Average) An alternative to segmented regression for complex time series, often better at modeling autocorrelation and seasonality. |
| Control Series | A parallel time series not exposed to the intervention, used to strengthen causal inference by accounting for external trends. |
| Sensitivity Analysis | A set of additional analyses (e.g., using different model specifications or excluding specific time points) to test the robustness of the primary findings. |
In implementation research using the Interrupted Time Series (ITS) design, standard regression models assume that observations are independent. However, data collected sequentially over time often violate this assumption due to serial correlation between adjacent measurements, a phenomenon known as autocorrelation [5] [54]. Positive autocorrelation, where consecutive values are more similar than distant ones, is most common and leads to underestimated standard errors, inflated Type I error rates, and potentially spurious conclusions about intervention effects [54] [55]. Addressing autocorrelation is therefore not merely a statistical formality but a critical step for ensuring the validity of causal inferences in ITS studies, particularly in drug development and public health intervention research, where these designs are frequently employed when randomized trials are infeasible [5] [6]. This protocol details robust methods for detecting and correcting for autocorrelation to safeguard the integrity of research findings.
The Durbin-Watson (DW) test is a widely used statistical test for detecting lag-1 autocorrelation [54].
Visual analysis of the residuals from a preliminary ordinary least squares (OLS) regression model is a fundamental and highly recommended step.
When autocorrelation is detected, several statistical methods can be employed to obtain unbiased parameter estimates and valid standard errors. The choice of method depends on the length of the time series and the magnitude of the autocorrelation.
FGLS methods, such as Prais-Winsten (PW) and Cochrane-Orcutt (CO), are common approaches that use an iterative process to estimate the autocorrelation parameter (ρ) and transform the data to remove the correlation structure [54] [6].
Restricted Maximum Likelihood (REML) is a preferred method for longer time series as it reduces bias in the estimation of variance components, including the autocorrelation parameter [54] [6].
The Newey-West (NW) estimator corrects the OLS model's standard errors to account for both autocorrelation and heteroskedasticity, without changing the point estimates of the regression coefficients [54] [55] [6].
ARIMA models explicitly model the autocorrelation structure within the data-generating process itself [5] [6]. An ARIMA(p,d,q) model can be combined with independent variables (ARIMAX) to assess intervention effects.
The table below provides a comparative summary of these correction methods.
Table 1: Comparison of Statistical Methods for Correcting Autocorrelation in ITS Studies
| Method | Key Principle | Advantages | Disadvantages/Limitations |
|---|---|---|---|
| Prais-Winsten (PW) | FGLS estimation with data transformation. | More efficient for small samples than CO as it uses all data. | Performance can be poor in very short series. |
| Restricted Maximum Likelihood (REML) | Maximizes a likelihood function separated from fixed effects. | Less biased estimate of autocorrelation; good performance in longer series. | Computationally more intensive. |
| Newey-West (NW) | Corrects OLS standard errors for autocorrelation. | Simple; preserves original OLS coefficient estimates. | Can be inefficient with high autocorrelation [55]. |
| ARIMA | Explicitly models the autocorrelation structure. | Very flexible for complex time series patterns. | Complex model identification and fitting. |
The following diagram illustrates the logical workflow for addressing autocorrelation in an ITS analysis.
Figure 1. A logical workflow for detecting and correcting for autocorrelation in Interrupted Time Series (ITS) analysis.
Table 2: Key Research Reagent Solutions for ITS Analysis
| Item Name | Function/Application in ITS Research |
|---|---|
| Statistical Software (R/Stata) | Platform for implementing segmented regression, conducting autocorrelation tests (e.g., Durbin-Watson), and applying correction methods (e.g., Prais-Winsten, REML, Newey-West). |
| Segmented Regression Code | Pre-written scripts (e.g., in R or Stata) to specify the ITS model, which includes terms for baseline level, pre-intervention slope, change in level, and change in slope [5] [6]. |
| Durbin-Watson Test Function | A standardized statistical function within software packages used to formally test the null hypothesis of no first-order autocorrelation in the model residuals [54]. |
| Longitudinal Dataset | The primary data reagent. A structured dataset collected at regular intervals over time, with a sufficient number of pre- and post-interruption points (recommended minimum of 3 each, but more are preferred) [5]. |
| Data Extraction Tool (e.g., WebPlotDigitizer) | Software used to extract raw numerical data from published graphs in literature reviews when original datasets are unavailable, enabling re-analysis or meta-analysis [6]. |
In Interrupted Time Series (ITS) design, the accurate estimation of an intervention's effect depends entirely on the proper modeling of the underlying pre-existing trends and seasonal patterns [56]. A stationary time series is one whose statistical properties—such as mean, variance, and autocorrelation—do not change over time [57]. Conversely, non-stationarity manifests through trends (long-term increase or decrease), seasonality (periodic fluctuations), changing variance, or structural breaks [58] [59]. In public health and drug development research, failure to address these components can lead to severely biased estimates of intervention effectiveness. For instance, a gradual, pre-existing improvement in a health outcome could be mistakenly attributed to a new policy or treatment [56]. This article provides application notes and protocols for handling non-stationarity and seasonality within ITS studies, ensuring robust and interpretable results.
Non-stationarity presents a fundamental challenge for ITS analysis because it violates the assumption that the pre-intervention trend would have remained stable in the absence of the intervention [56].
Differencing and decomposition are two primary techniques used to transform a non-stationary series into a stationary one or to isolate and remove nuisance components [57] [59].
Table 1: Summary of Non-Stationarity Components and Their Treatments
| Component | Description | Primary Handling Techniques |
|---|---|---|
| Trend | Long-term upward or downward movement | First-differencing, detrending with regression [59] |
| Seasonality | Regular, repeating patterns | Seasonal differencing, seasonal decomposition (STL/MSTL) [57] [60] |
| Changing Variance | Non-constant spread of data over time | Logarithmic, Box-Cox, or Yeo-Johnson transformations [58] [59] |
| Autocorrelation | Correlation between consecutive observations | Differencing, Autoregressive (AR) models [59] |
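The trend and seasonality treatments listed in Table 1 can be illustrated with pandas differencing. The series below is simulated, with an assumed seasonal period of m = 12.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n, m = 48, 12  # four years of monthly data; seasonal period m = 12 (assumed)
t = np.arange(n)
y = pd.Series(0.3 * t + 5 * np.sin(2 * np.pi * t / m) + rng.normal(0, 0.5, n))

first_diff = y.diff()        # y'_t = y_t - y_{t-1}; turns the linear trend into a constant
seasonal_diff = y.diff(m)    # y'_t = y_t - y_{t-m}; cancels the repeating seasonal pattern
combined = y.diff(m).diff()  # y''_t: seasonal and first differences applied together
print(combined.dropna().std())  # residual variation after both differences
```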
Objective: To identify obvious trends, seasonal patterns, and structural breaks through graphical analysis.
Plot the series with a vertical reference line at the intervention point (T*) to visually assess level and slope changes [56].
Objective: To formally test the null hypothesis of non-stationarity (or stationarity) using statistical tests.
Table 2: Statistical Tests for Stationarity
| Test | Null Hypothesis (H₀) | Alternative Hypothesis (H₁) | Interpretation & Use in ITS |
|---|---|---|---|
| Augmented Dickey-Fuller (ADF) | The series has a unit root (non-stationary) [59]. | The series is stationary [59]. | A small p-value (e.g., <0.05) leads to rejecting H₀, suggesting stationarity. Used to confirm if differencing is needed. |
| KPSS Test | The series is trend-stationary [57] [59]. | The series has a unit root (non-stationary) [57] [59]. | A large test statistic (p-value <0.05) leads to rejecting H₀, suggesting differencing is required. Often used in conjunction with ADF [57]. |
Protocol for Determining Differencing:
The ndiffs() function in R automates this process [57].
Protocol for Determining Seasonal Differencing:
Use the nsdiffs() function in R, which relies on a measure of seasonal strength; a result of 1 indicates that one round of seasonal differencing is required [57].
Objective: To remove trend and seasonality by computing changes between observations.
Protocol 1: First Differencing
For a series y_t, the first difference is y'_t = y_t - y_{t-1} [57]. The differenced series y' represents the period-to-period change; if y' is white noise, the original series follows a random walk, y_t = y_{t-1} + ε_t [57].
Protocol 2: Seasonal Differencing
For a series with seasonal period m, the seasonal difference is y'_t = y_t - y_{t-m} [57]; for monthly data with yearly seasonality, m = 12. If the seasonally differenced series is white noise, the underlying model is y_t = y_{t-m} + ε_t, which underpins seasonal naïve forecasts [57].
Protocol 3: Combined Differencing
Applying both seasonal and first differencing yields y''_t = (y_t - y_{t-m}) - (y_{t-1} - y_{t-m-1}) [57].
Objective: To separate a time series into trend, seasonal, and residual components.
Protocol 4: STL Decomposition (Single Seasonality)
a. Import the class: from statsmodels.tsa.seasonal import STL.
b. Specify the seasonal parameter, which must be an odd integer (e.g., 13 for yearly seasonality in monthly data) [59].
c. Fit the model: res = stl.fit().
d. Access components: res.trend, res.seasonal, res.resid.
The seasonally adjusted series is ts_seasonal_adj = ts - res.seasonal and can be used for further modeling [59].
Protocol 5: MSTL Decomposition (Multiple Seasonality)
a. Import the class: from statsmodels.tsa.seasonal import MSTL.
b. Specify the periods parameter with a tuple of all seasonal cycles (e.g., periods=(24, 24*7) for daily and weekly patterns in hourly data).
c. Fit the model: results = mstl.fit(). The result object contains a DataFrame with multiple seasonal components.
The decomposition follows: Observation = Trend + Seasonal_24 + Seasonal_168 + Residual [60].
Figure 1: Workflow for handling non-stationarity and seasonality in ITS analysis.
Recent research highlights a critical pitfall: over-aggressive stationarization can remove valuable predictive information. Sudden, event-based changes (e.g., a spike in drug sales due to an outbreak) are inherently non-stationary but are crucial for accurate forecasting [61]. Over-stationarization occurs when smoothing processes strip out these important non-stationary properties, limiting the model's ability to react to real-world shocks [61].
Advanced frameworks such as NSPLformer have been proposed to balance this trade-off, aiming to retain informative, event-driven signal while still stabilizing the series for modeling.
Cutting-edge approaches like the DTAF framework address non-stationarity in both temporal and frequency domains simultaneously [62].
Table 3: Essential Analytical Tools for Non-Stationary Time Series Analysis
| Tool / Reagent | Function / Purpose | Example Use Case / Note |
|---|---|---|
| Augmented Dickey-Fuller (ADF) Test | Statistical test for unit root (non-stationarity) [59]. | Determining if first differencing is required. |
| KPSS Test | Statistical test for trend-stationarity [57] [59]. | Used with ADF for a robust stationarity assessment. |
| STL Decomposition | Decomposes series into Trend, Seasonal, and Residual components [59]. | Handling series with a single, dominant seasonal pattern. |
| MSTL Decomposition | Decomposes series with multiple seasonal patterns [60]. | Ideal for high-frequency data (e.g., hourly energy data with daily, weekly, yearly cycles). |
| Unobserved Components Model (UCM) | Model-based decomposition and forecasting [60]. | Does not require pre-differencing; models components as latent states. |
| Box-Cox / Yeo-Johnson Transform | Stabilizes variance in a time series [58] [59]. | Applied before differencing or decomposition to address heteroscedasticity. |
| Prophet | Forecasting procedure that handles multiple seasonality and trends automatically [60]. | Useful for rapid prototyping and for series with strong, predictable seasonal patterns. |
Figure 2: Logical relationship of time series decomposition.
Mastering differencing and decomposition is not merely a statistical exercise but a foundational requirement for producing valid, reliable, and actionable results from Interrupted Time Series studies. The protocols outlined provide a clear pathway for researchers to diagnose and treat non-stationarity and seasonality, from basic visual inspections to advanced multi-seasonal decomposition. As the field evolves, techniques that avoid over-stationarization and jointly model temporal and spectral dynamics offer promising avenues for more robust analysis, ultimately ensuring that evaluations of interventions in drug development and public health are built on a solid analytical foundation.
Interrupted Time Series (ITS) design is a quasi-experimental approach widely employed in public health and drug development research to evaluate the impact of interventions, such as policy changes or new therapy introductions, when randomized controlled trials are not feasible [9] [13]. A critical challenge in ITS analysis is model misspecification, which occurs when the statistical model does not correctly represent the underlying data structure, potentially leading to biased conclusions about an intervention's effect. Two of the most often recommended analytical approaches for ITS analysis are the Autoregressive Integrated Moving Average (ARIMA) model and Generalized Additive Models (GAM) [63] [9]. The selection between these methods is paramount, as empirical evidence demonstrates that the choice of statistical method can substantially alter conclusions about intervention impacts, with statistical significance disagreements ranging from 4% to 25% across different methods applied to the same datasets [6]. This application note provides a structured framework for selecting between ARIMA and GAM to ensure robust inferences in ITS studies, particularly within drug development and public health implementation research.
ARIMA and GAM approach time series analysis from fundamentally different perspectives, leading to distinct strengths and vulnerabilities regarding model misspecification.
ARIMA models are characterized by their focus on the autocorrelation structure within the data. They express the current value of a time series as a linear function of its past values (autoregressive component) and past forecast errors (moving average component), often requiring differencing to achieve stationarity [13]. This approach excels at capturing temporal dependence patterns where present observations depend on past values [64]. However, ARIMA requires that time series be stationary (constant mean and variance over time) and assumes a specific parametric form for the autocorrelation structure [64] [13]. Violations of these assumptions constitute common misspecification scenarios.
GAMs, in contrast, take a more flexible, non-parametric approach to trend modeling. They model time series as a sum of smooth functions of time, allowing data to determine the functional form of trends rather than imposing a predetermined structure [64]. This flexibility makes GAMs particularly adept at capturing complex, non-linear trends without requiring explicit specification of the functional form [64] [9]. The GAM framework can be expressed as: g(μ) = β₀ + f₁(x₁) + f₂(x₂) + ... + fₙ(xₙ), where g() is the link function, μ is the expected value of the outcome, and fₙ() are smooth functions of predictors [65].
Table 1: Performance Comparison of ARIMA and GAM Under Different Misspecification Conditions
| Misspecification Type | ARIMA Performance | GAM Performance | Key Evidence |
|---|---|---|---|
| Incorrect functional form | Vulnerable; assumes specific linear correlation structure | Highly robust; flexible smooth functions adapt to data shape | GAMs utilize non-parametric fitting with relaxed assumptions about relationship shapes [65] [64] |
| Autocorrelation structure misspecification | Vulnerable; relies on correct AR/MA order identification | Moderately robust; can accommodate residual autocorrelation | Incorrect ARIMA order selection leads to biased estimates; GAMs can use GAMM extension with correlation structures [66] [13] |
| Seasonality misspecification | Consistent performance with proper seasonal parameters | Variable performance depending on basis dimension specification | ARIMA explicitly models seasonality with seasonal parameters; GAM seasonal accuracy depends on Fourier series specification [64] [9] |
| Intervention effect shape misspecification | Vulnerable; requires pre-specification of impact shape | More robust; can adapt to varying intervention effect patterns | GAMs more robust when policy variables are misspecified [63] [9] |
| Non-stationarity handling | Requires differencing to achieve stationarity | Directly models non-linear trends without differencing | ARIMA requires stationary series; GAMs can model non-linear trends directly [64] [13] |
Simulation studies directly comparing ARIMA and GAM for ITS analysis have revealed important performance differences. ARIMA exhibits more consistent results across different policy effect sizes and in the presence of seasonality, while GAM demonstrates superior robustness when the model is misspecified, particularly regarding the shape of intervention effects [63] [9]. This suggests that when researchers lack certainty about the precise functional form of the intervention effect, GAM may provide more reliable inference.
Protocol Objective: To specify a robust ARIMA modeling procedure for ITS analysis that minimizes misspecification risk.
Step-by-Step Procedure:
Stationarity Assessment: Test for stationarity using Augmented Dickey-Fuller (ADF) or Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. A non-stationary series (p > 0.05 for ADF) requires differencing [13].
Differencing Application: Apply first-order differencing (d=1) to remove trend: Y′ₜ = Yₜ - Yₜ₋₁. For seasonal data, apply seasonal differencing: Y′ₜ = Yₜ - Yₜ₋ₛ, where s is the seasonal period (e.g., 12 for monthly data) [13].
Model Identification: Examine Autocorrelation Function (ACF) and Partial ACF (PACF) plots of the differenced series to identify potential AR (p) and MA (q) orders [13].
Model Fitting: Fit multiple candidate ARIMA (p,d,q)(P,D,Q)ₛ models and compare using Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), selecting the model with the lowest value [65] [13].
Intervention Effect Incorporation: Include intervention terms in the final ARIMA model to estimate level and slope changes:
Model Validation: Check residuals for white noise using Ljung-Box test (p > 0.05 indicates adequate fit) and ensure no significant patterns remain in ACF/PACF of residuals [13].
Protocol Objective: To implement a flexible GAM for ITS analysis that captures complex temporal patterns while properly accounting for intervention effects.
Step-by-Step Procedure:
Basis Function Specification: Select appropriate basis functions for smooth terms. For seasonal patterns, Fourier series are recommended with basis dimension (k) determined by generalized cross-validation (GCV) [64].
Model Structure Definition: Specify the GAM structure incorporating multiple trend components [64]:
Model Fitting: Fit the GAM using a backfitting algorithm, which iteratively estimates each smooth function while adjusting for others, minimizing prediction errors through successive iterations [64].
Intervention Effect Modeling: Model intervention effects using two approaches [64] [9]:
Basis Dimension Checking: Verify that basis dimensions (k) are sufficiently large to capture true patterns without overfitting using k-index and p-values for smooth terms [64].
Model Validation: Use simulated historical forecasts with time-based cross-validation, holding out the most recent 20% of data, to assess forecasting accuracy and model calibration [64].
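A sketch of the GAM structure using statsmodels' GLMGam: a B-spline smooth of time plus a parametric intervention term. The basis dimension, penalty `alpha`, and data are assumptions; the R package mgcv offers the analogous workflow with GCV-based smoothing selection.

```python
import numpy as np
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(10)
n, t_int = 120, 60
t = np.arange(1, n + 1, dtype=float)
x_int = (t > t_int).astype(float)              # intervention indicator
y = np.sin(t / 15) + 0.5 * x_int + rng.normal(0, 0.2, n)

# Smooth function of time (B-spline basis) plus a parametric intervention term;
# the basis dimension df is illustrative and should be checked (e.g., via GCV).
bs = BSplines(t[:, None], df=[10], degree=[3])
gam = GLMGam(y, exog=np.column_stack([np.ones(n), x_int]),
             smoother=bs, alpha=1.0).fit()
print(gam.params[:2])  # intercept and intervention-level estimate
```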
The following diagram illustrates the systematic decision process for choosing between ARIMA and GAM based on data characteristics and research context:
Table 2: Essential Computational Tools for ITS Analysis Implementation
| Research Reagent | Function | Implementation Examples |
|---|---|---|
| Stationarity Tests | Determines if time series has constant mean and variance, guiding need for differencing | Augmented Dickey-Fuller (ADF), Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests [13] |
| Autocorrelation Diagnostics | Identifies temporal dependence patterns to specify AR/MA orders in ARIMA | Autocorrelation Function (ACF), Partial ACF (PACF) plots [13] |
| Information Criteria | Compares model fit while penalizing complexity to prevent overfitting | Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [65] |
| Backfitting Algorithm | Iteratively estimates smooth functions in GAM by minimizing prediction errors | Prophet package (Python/R), mgcv package (R) [64] |
| Time Series Cross-Validation | Assesses model forecasting performance while respecting temporal structure | Simulated historical forecasts with expanding/rolling windows [64] |
Robust ITS analysis in drug development and public health implementation research requires careful consideration of model selection between ARIMA and GAM approaches. ARIMA models provide more consistent results when underlying processes are well-understood and seasonal patterns are consistent, while GAM offers superior flexibility and robustness to misspecification of intervention effects and functional forms. Researchers should systematically evaluate their data characteristics, intervention properties, and analytical priorities using the provided decision framework to select the most appropriate methodology. In cases of uncertainty, applying both models and comparing results, or considering hybrid approaches, may provide the most robust inference for critical policy and clinical decisions.
Interrupted Time Series (ITS) design is a powerful quasi-experimental method used extensively in healthcare research to evaluate the impact of interventions, policies, or programs when randomized controlled trials are not feasible [67]. A fundamental assumption in classic segmented regression (CSR) analysis is that interventions have an immediate and permanent effect starting precisely at a known point in time. However, this assumption is frequently violated in real-world applications, where interventions often exhibit lagged effects and transition periods during which the full effect unfolds gradually [68] [69].
Understanding and appropriately modeling these transition phases is crucial for generating valid estimates of intervention effectiveness. This application note provides researchers with advanced methodologies to account for lagged intervention effects and transition periods in ITS analyses, with specific applications in pharmaceutical and clinical development settings.
The classic segmented regression model for ITS is specified as:
Yₜ = β₀ + β₁ × time + β₂ × intervention + β₃ × post-time + εₜ [68]
Where time is the elapsed time since the start of the series, intervention is an indicator equal to 0 before and 1 after the interruption, post-time is the time elapsed since the intervention (0 beforehand), and εₜ is the random error term.
This model restricts the interruption effect to a single predetermined time point, assuming an instantaneous intervention effect [68]. In practice, however, interventions often require an adjustment period or are implemented gradually, creating a transition phase where effects are distributed over time rather than immediate.
Table 1: Comparison of ITS Approaches for Intervention Effects
| Model Type | Intervention Effect Assumption | Key Strengths | Key Limitations |
|---|---|---|---|
| Classic Segmented Regression | Instantaneous and permanent at fixed point | Simple implementation; Easy interpretation | Cannot model gradual effects; Misleading if transition period exists |
| Optimized Segmented Regression (OSR) | Distributed during transition period | Models realistic implementation patterns; Superior model fit | Requires specification of transition length and distribution pattern |
| ARIMA with Distributed Lag Terms | Unclear timing with distributed effects | Accounts for autocorrelation; Flexible effect distribution | Complex implementation; Requires larger sample size |
To address the limitations of CSR, Optimized Segmented Regression (OSR) models incorporate a transition period using cumulative distribution functions (CDFs). The OSR model is specified as:
Yₜ = β₀ + β₁ × time + β₂ × F(t) × intervention + β₃ × F(t) × post-time + εₜ [68]
Where F(t) is a piecewise function that models the transition period using CDFs:
F(t) = { CDF(t - T₀), T₀ < t ≤ T₂; 1, T₂ < t } [68]
The transition period extends from T₀ (nominal intervention start) to T₂ (full implementation), with L = T₂ - T₀ representing the transition length. The CDF can take various forms depending on the expected distribution of the intervention effect during the transition period.
For situations with unclear intervention timing or complex autocorrelation structures, the ARIMAITS-DL model combines ARIMA methodology with distributed lag functional terms:
(1-∑δᵢBⁱ)Yₜ = ∑wₖFₜ₋ₖ + Xₜβ + (1-∑θᵢBⁱ)εₜ [69]
Where:
- B is the backshift operator (BYₜ = Yₜ₋₁);
- δᵢ and θᵢ are the autoregressive and moving-average coefficients, respectively;
- wₖ are the distributed lag weights applied to the intervention function Fₜ₋ₖ;
- Xₜβ represents the effects of additional covariates; and
- εₜ is the error term.
This approach allows the intervention effect to be distributed over a specified interval [T₀-l₁, T₀+l₂], accommodating situations where the actual intervention timing is ambiguous or the effect manifests gradually.
Purpose: To evaluate intervention effects when a transition period between pre- and post-intervention phases is expected.
Materials Required:
Procedure:
Interpretation Guidelines:
Purpose: To model intervention effects when timing is unclear or effects are distributed over time with autocorrelation present.
Materials Required:
Procedure:
Interpretation Guidelines:
Purpose: To empirically determine optimal transition period length when theoretical guidance is unavailable.
Procedure:
Table 2: Data Requirements for Different ITS Approaches
| Method | Minimum Pre-Intervention Observations | Minimum Post-Intervention Observations | Total Series Length Recommendation | Handling of Autocorrelation |
|---|---|---|---|---|
| Classic Segmented Regression | 3 | 3 | 8+ time points | Requires additional tests and corrections |
| Optimized Segmented Regression | 3 | 3 (excluding transition) | 12+ time points | Requires additional tests and corrections |
| ARIMA with Distributed Lags | 20 | 20 | 40+ time points | Explicitly models autocorrelation structure |
Table 3: Essential Analytical Tools for Advanced ITS Analysis
| Tool Category | Specific Software/Packages | Key Functionality | Implementation Considerations |
|---|---|---|---|
| Statistical Software | R: stats, forecast, dlm packages | ARIMA modeling, distributed lag structures, nonlinear optimization | Steeper learning curve but maximum flexibility |
| Statistical Software | SAS: PROC ARIMA, PROC AUTOREG | Automated model selection, comprehensive diagnostics | Enterprise-level stability and documentation |
| Statistical Software | Stata: arima, newey commands | Panel data extensions, robust standard errors | Balanced approach for applied researchers |
| Data Extraction Tools | WebPlotDigitizer | Digital extraction from published graphs | Essential for replication and meta-analysis |
| Visualization Tools | ggplot2 (R), matplotlib (Python) | Creation of publication-quality time series plots | Critical for communicating transition effects |
| Model Diagnostics | ACF/PACF plots, Ljung-Box test | Detection of residual autocorrelation | Required for validating model assumptions |
The methodologies described above have particular relevance in pharmaceutical and clinical development settings:
Clinical Guideline Implementation: When new treatment guidelines are introduced, their adoption typically follows a gradual pattern as clinicians require time to adjust practice behaviors. OSR models can capture this transition period more accurately than traditional methods [67].
Drug Policy Evaluations: Changes in drug formularies or reimbursement policies often have phased implementation, where effects distribute over time rather than occurring instantaneously [68].
Pharmacovigilance Studies: Safety-related interventions (e.g., black box warnings) may have delayed effects as awareness disseminates through the healthcare system, making distributed lag models particularly appropriate [69].
Quality Improvement Initiatives: Hospital quality programs often include training periods followed by gradual implementation, creating natural transition periods that should be accounted for in evaluation [68].
When applying these methods in regulatory contexts, researchers should pre-specify the analytical approach, justify the selected transition length based on theoretical or empirical grounds, and conduct sensitivity analyses to demonstrate the robustness of findings to alternative model specifications.
Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the effects of interventions or exposures that are introduced at a specific point in time. However, the validity of ITS research is often threatened by two significant methodological challenges: the presence of hierarchical data structures (e.g., repeated measurements within patients, patients clustered within clinics) and time-varying confounding. Time-varying confounding occurs when covariates that influence both subsequent treatment and the outcome evolve over time, creating complex feedback loops that standard regression methods fail to address adequately [70]. This application note provides researchers, scientists, and drug development professionals with structured protocols and analytical frameworks to manage these challenges within ITS studies, ensuring more robust and causally interpretable findings.
In observational studies, treatments or exposures are often adjusted over time in response to a patient's changing condition. This creates a scenario where time-dependent covariates are affected by previous treatment and subsequently influence future treatment decisions and outcomes. A classic example is in HIV management, where antiretroviral therapy (ART) influences CD4 counts, and these evolving CD4 counts subsequently influence future ART decisions and clinical outcomes [70]. Standard regression-based adjustment analyses condition on these time-varying confounders affected by prior treatment, which can introduce bias and lead to misleading conclusions [70].
Table 1: Key Identifiability Assumptions for Causal Inference with Time-Varying Treatments
| Assumption | Description | Application in ITS |
|---|---|---|
| Consistency | The observed outcome for a given treatment history corresponds to the potential outcome under that history. | Ensure clear definition of intervention and outcome measurement in ITS. |
| Positivity | For any combination of covariates and treatment history, there is a non-zero probability of receiving each treatment level. | Verify that at each time point, all treatment options are possible for some individuals. |
| Exchangeability | No unmeasured confounding; the treatment assignment is independent of potential outcomes given covariates and treatment history. | Measure and adjust for all common causes of treatment and outcome at each time point. |
| Non-Interference | One unit's treatment does not affect another unit's outcome. | Consider this assumption in clustered or hierarchical ITS designs. |
Hierarchical data structures in ITS designs require specialized analytical approaches that account for correlations within clusters. Multilevel models (also known as mixed-effects or hierarchical models) provide a flexible framework for analyzing such data, allowing for the incorporation of random effects to capture cluster-specific variations and the proper estimation of standard errors.
Objective: To estimate the causal effect of a time-varying treatment or exposure in an ITS design while adequately adjusting for time-varying confounders.
Materials:
Procedure:
Study Design Phase:
Data Preparation Phase:
Model Specification Phase:
Sensitivity Analysis Phase:
Table 2: Comparison of Methods for Addressing Time-Varying Confounding
| Method | Key Principle | Strengths | Limitations | Software Implementation |
|---|---|---|---|---|
| Inverse Probability of Treatment Weighting (IPTW) | Creates a pseudo-population where treatment is independent of time-varying confounders through weighting [70] | Intuitive; directly addresses time-varying confounding; resembles randomized trial | Susceptible to bias from extreme weights; requires correct treatment model specification | R: ipw package; Stata: teffects |
| G-Computation | Models the outcome process and simulates potential outcomes under different treatment regimes [70] | Efficient when outcome model is correctly specified | Prone to bias from outcome model misspecification; parametric g-formula requires correct model specification | R: gfoRmula package |
| Targeted Maximum Likelihood Estimation (TMLE) | Doubly robust method that combines initial outcome prediction with a targeting step to optimize treatment effect estimation [70] | Robust to misspecification of either outcome or treatment model; efficient; allows machine learning | Computational intensity; more complex implementation | R: tmle package |
| G-Estimation of Structural Nested Models | Directly models the causal effect of treatment while accounting for time-varying confounding [70] | Direct modeling of treatment effect; handles effect modification | Complex implementation; less intuitive for applied researchers | R: SNM package |
Objective: To appropriately analyze ITS data with hierarchical structure while accounting for within-cluster correlations.
Procedure:
Exploratory Analysis:
Model Specification:
Model Estimation and Checking:
Interpretation:
Time-Varying Confounding Feedback Mechanism
Analytical Method Selection Workflow
Table 3: Essential Methodological Tools for Hierarchical Data and Time-Varying Confounding Analysis
| Tool Category | Specific Solution | Function | Implementation Considerations |
|---|---|---|---|
| Statistical Software | R with specialized packages | Provides open-source environment with comprehensive causal inference packages | Use ipw for IPTW, tmle for TMLE, lme4 for hierarchical models [71] |
| Statistical Software | Stata | Offers robust implementation of g-methods through commands like xtset and teffects | Particularly strong for panel data and econometric approaches [71] |
| Statistical Software | SAS | Handles complex longitudinal data structures and advanced statistical procedures | PROC GENMOD for GEE, PROC MIXED for multilevel models [71] |
| Causal Inference Methods | Inverse Probability of Treatment Weighting (IPTW) | Creates a pseudo-population where treatment is independent of confounders [70] | Requires correct specification of treatment model; check for extreme weights |
| Causal Inference Methods | Targeted Maximum Likelihood Estimation (TMLE) | Doubly robust method that protects against model misspecification [70] | Can incorporate machine learning for model fitting; more computationally intensive |
| Causal Inference Methods | G-Computation | Estimates causal effects by simulating potential outcomes under different treatment regimes [70] | Requires correct specification of outcome model; parametric g-formula may be sensitive to model misspecification |
| Missing Data Handling | Multiple Imputation | Accounts for uncertainty in missing data by creating and analyzing multiple complete datasets | Preferable to complete case analysis when data are missing at random [70] |
| Sensitivity Analysis | Quantitative bias analysis | Assesses robustness of results to potential unmeasured confounding or selection bias | Particularly important when claiming causal effects from observational data |
Objective: To evaluate the performance of different methods for addressing time-varying confounding in hierarchical ITS data under controlled conditions.
Experimental Setup:
Data Generation:
Method Implementation:
Performance Metrics:
Objective: To demonstrate the application of methods for time-varying confounding in real-world hierarchical ITS studies.
Case Study: Evaluation of a new drug therapy in electronic health record data with monthly measurements over two years, with patients nested within clinics.
Procedure:
Data Preparation:
Primary Analysis:
Secondary Analysis:
Missing Data Handling:
Effectively managing hierarchical data and time-varying confounding is essential for deriving valid causal inferences from Interrupted Time Series studies in drug development and clinical research. The protocols and frameworks presented in this application note provide researchers with practical guidance for addressing these methodological challenges. Current evidence suggests that while singly robust methods like IPTW remain prevalent in epidemiological studies, doubly robust approaches such as TMLE offer superior protection against model misspecification and should be more widely adopted [70]. Furthermore, the inadequate handling of missing data in many applied studies highlights the need for greater methodological rigor and transparency in reporting. By implementing these advanced causal inference methods while maintaining careful attention to hierarchical data structures, researchers can strengthen the evidentiary foundation for therapeutic decisions and health policy interventions.
Interrupted Time Series (ITS) design is a powerful quasi-experimental methodology increasingly employed in public health, epidemiology, and pharmaceutical outcome research to evaluate the effects of interventions, policy changes, or exposures when randomized controlled trials are infeasible or unethical [72]. This design involves collecting data at multiple timepoints before and after a defined interruption, enabling researchers to analyze whether the intervention caused a significant deviation from the pre-existing trend [73]. The validity of causal inferences drawn from ITS studies hinges entirely on proper model specification and rigorous validation. Residual analysis and forecast accuracy assessment together provide a comprehensive framework for verifying model assumptions, detecting specification issues, and quantifying predictive performance, thereby ensuring the reliability of effect estimates in thesis research and professional drug development contexts.
In time series analysis, residuals represent the differences between observed values and the values fitted by the model to the training data: rₜ = yₜ - ŷₜ [74]. It is critical to distinguish residuals from forecast errors, which are calculated on genuinely new, out-of-sample test data [75]. Residuals serve as proxies for the unobservable true model errors and contain valuable information about model adequacy. They play an indispensable diagnostic role by revealing patterns that indicate violations of key regression assumptions, including linearity, independence, constant variance, and normality [76] [77]. For ITS designs specifically, where the goal is to estimate causal effects by comparing pre- and post-intervention trends, any systematic pattern in the residuals suggests the model has not fully captured the underlying data structure, potentially biasing the intervention effect estimate.
The validity of any regression model, including ITS models, depends on several assumptions about the residuals. Violations of these assumptions can lead to inefficient estimates, incorrect standard errors, and invalid inferences [76] [78].
Table 1: Core Regression Assumptions and Violation Consequences
| Assumption | Description | Consequence of Violation |
|---|---|---|
| Independence | Errors are uncorrelated with each other | Standard errors underestimated, test statistics inflated |
| Homoscedasticity | Constant error variance across observations | Inefficient parameter estimates, invalid standard errors |
| Normality | Errors follow a normal distribution | Invalid confidence intervals and p-values |
| Linearity | Relationship between predictors and outcome is linear | Biased and inconsistent parameter estimates |
Visual inspection of residual plots provides the most intuitive method for detecting model misspecification and assumption violations. Researchers should generate and systematically examine a series of plots.
The primary graphical tool is the residuals versus fitted values plot. A well-specified model will display residuals randomly scattered around the horizontal line at zero, with constant variance and no discernible patterns [76] [79] [77]. Specific patterns indicate different problems: a funnel shape that widens with the fitted values suggests heteroscedasticity, systematic curvature suggests an omitted non-linear relationship, and runs of consecutive same-signed residuals suggest autocorrelation.
The Normal Quantile-Quantile (Q-Q) plot assesses the normality assumption by plotting residual quantiles against theoretical quantiles from a normal distribution. Points should closely follow the 45-degree reference line. Substantial deviations indicate non-normality, which can affect the validity of statistical inferences [76] [79] [77].
The scale-location plot (plotting the square root of the absolute standardized residuals against fitted values) provides another view for detecting heteroscedasticity. A horizontal line with randomly scattered points indicates constant variance, while an increasing or decreasing trend indicates heteroscedasticity [76].
For time series data, the residuals versus observation order plot is crucial for detecting autocorrelation. A random scatter indicates independence, while sequences of positive or negative residuals or cyclical patterns suggest temporal correlation that should be accounted for in the model [77] [78].
Outliers and influential observations can disproportionately affect the fitted model in ITS analyses, potentially biasing intervention effect estimates. Several diagnostic statistics help identify these points:
Table 2: Outlier and Influence Diagnostics
| Diagnostic Measure | Purpose | Interpretation Threshold |
|---|---|---|
| Studentized Residuals | Identify outliers relative to model fit | \|value\| > 3 suggests potential outlier |
| Leverage (h) | Identify extreme points in predictor space | h > 2p/n (p = parameters, n = sample size) |
| Cook's Distance (D) | Measure overall influence on coefficients | D > 1.0 or visually distinct values |
| DFBETAS | Measure influence on specific coefficients | \|DFBETAS\| > 2/√n |
While residual analysis assesses model fit, forecast accuracy evaluation tests the model's predictive performance on new data. This distinction is crucial for ITS studies aiming to make causal inferences, as a model that overfits the training data will perform poorly in prediction and provide unreliable effect estimates [75]. The gold standard approach involves splitting the data into training and test sets, where the training set is used for model fitting and the test set is reserved for evaluating genuine forecasts [75]. For ITS designs, careful consideration must be given to this split to ensure both pre- and post-intervention patterns are appropriately represented.
Different accuracy metrics provide complementary perspectives on forecast performance, each with distinct advantages and limitations.
Table 3: Forecast Accuracy Metrics Comparison
| Metric | Formula | Advantages | Disadvantages |
|---|---|---|---|
| MAE | mean(\|eₜ\|) | Easy to interpret, robust to outliers | Does not penalize large errors heavily |
| RMSE | √mean(eₜ²) | Sensitive to large errors, widely used | Not robust to outliers, scale-dependent |
| MAPE | mean(\|100eₜ/yₜ\|) | Unit-free, intuitive interpretation | Undefined for zero values, biased for low volumes |
| MASE | mean(\|eₜ\|) / MAE of naive forecast | Unit-free, robust, comparable across series | Less intuitive than percentage error |
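The four metrics take only a few lines to implement. The toy split below holds out the last four points and scores a flat last-value forecast (the numbers are chosen so the arithmetic is easy to verify by hand):

```python
import numpy as np

def mae(e):  return np.mean(np.abs(e))
def rmse(e): return np.sqrt(np.mean(e ** 2))
def mape(y, yhat): return np.mean(np.abs(100 * (y - yhat) / y))
def mase(y_train, y_test, yhat):
    # Scale test-set MAE by the in-sample MAE of the one-step naive forecast.
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(y_test - yhat)) / naive_mae

# Toy check: last 4 points held out, forecast = last training value (13).
y = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0])
y_train, y_test = y[:4], y[4:]
yhat = np.repeat(y_train[-1], 4)

e = y_test - yhat   # errors: [-1, 1, 0, 2]
```

Here MAE(e) = 1.0 and MASE = 0.6, i.e., the naive forecast beats its own in-sample one-step benchmark; MASE < 1 is the usual "better than naive" criterion.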
Recent methodological advances have integrated machine learning algorithms with traditional ITS frameworks to enhance model flexibility and accuracy. For example, a two-stage ITS framework can compare traditional ARIMA models with machine learning approaches like Neural Network Autoregression (NNETAR) and Prophet-Extreme Gradient Boosting (Prophet-XGBoost) [73]. In a case study evaluating the health effects of the 2018 wildfire smoke event in San Francisco County, the Prophet-XGBoost hybrid demonstrated superior performance in estimating excess respiratory hospitalizations, highlighting the potential of these methods for complex ITS analyses in public health and pharmaceutical outcomes research [73].
ITS analyses present unique validation challenges that require specialized approaches.
Table 4: Essential Analytical Tools for ITS Model Validation
| Tool Category | Specific Solution | Primary Application in Research |
|---|---|---|
| Statistical Software | R with forecast, tsoutliers packages | Comprehensive time series modeling and residual diagnostics |
| Statistical Software | Python with statsmodels, scikit-learn | Flexible machine learning integration with traditional ITS |
| Specialized ITS Algorithms | Two-stage ITS with Prophet-XGBoost [73] | Enhanced counterfactual prediction in complex scenarios |
| Data Extraction Tools | WebPlotDigitizer [72] | Accurate data extraction from published ITS graphs |
| Reference Datasets | Curated ITS repository (430 datasets) [72] | Methodological benchmarking and teaching examples |
| Influence Diagnostics | Cook's Distance, DFBETAS [76] [77] | Identification of overly influential observations |
| Accuracy Metrics | MASE, RMSE, MAPE [75] | Comparative forecast performance assessment |
Robust model validation through residual analysis and forecast accuracy assessment is fundamental to producing reliable, reproducible ITS research. These techniques provide complementary perspectives on model performance: residual analysis ensures the theoretical assumptions underlying causal inference are met, while forecast accuracy evaluation tests the model's predictive validity in practical scenarios. For researchers and drug development professionals implementing ITS designs, the integrated protocol presented here—combining graphical diagnostics, influence assessment, and multiple accuracy metrics—provides a comprehensive framework for validating models before drawing causal conclusions about interventions, policies, or treatments. As ITS methodologies continue evolving with machine learning integration, these validation principles will remain essential for maintaining scientific rigor in observational studies across public health, epidemiology, and pharmaceutical sciences.
Interrupted Time Series (ITS) analysis is a powerful quasi-experimental design for evaluating the impact of large-scale health interventions, such as policy changes or new drug implementations, when randomized controlled trials are not feasible [9] [13]. The selection of an appropriate statistical model is paramount for generating valid inferences about intervention effects. Two commonly employed analytical approaches for ITS are the Autoregressive Integrated Moving Average (ARIMA) model and Generalized Additive Models (GAM), each with distinct theoretical foundations and performance characteristics [9].
This application note provides a structured comparison of ARIMA and GAM performance within the context of ITS designs, offering drug development professionals and public health researchers evidence-based guidance for model selection. We synthesize recent empirical findings, present quantitative performance metrics, and provide detailed experimental protocols to facilitate robust implementation in health intervention research.
ARIMA models express a time series variable based on its own past values (autoregressive component), past forecast errors (moving average component), and differencing to achieve stationarity [13]. The model is characterized by three parameters: AR order (p), degree of differencing (d), and MA order (q), denoted as ARIMA(p,d,q). For seasonal data, this extends to seasonal ARIMA (SARIMA) that incorporates seasonal periodicities [80].
Key assumptions include:
- Stationarity: after differencing, the series has a stable mean and variance over time
- White-noise residuals: no autocorrelation should remain once the model is fitted
- Parameter stability: the autoregressive and moving average structure is constant over the modeled period
GAMs are an extension of generalized linear models that use smooth functions to model predictor variables, allowing for flexible modeling of non-linear relationships without specifying the functional form a priori [65] [9]. The model takes the form:
g(μᵢ) = β₀ + f₁(x₁ᵢ) + f₂(x₂ᵢ) + ... + fₚ(xₚᵢ)

where g(·) is the link function, μᵢ = E(yᵢ), and the fⱼ(·) are smooth functions of the predictor variables [65].
Table 1: Model Performance in Forecasting COPD Patient Numbers in China
| Model | R² | MAE | MAPE | RMSE | AIC/BIC | Data Features |
|---|---|---|---|---|---|---|
| GAM | Highest | Lowest | Lowest | Lowest | Optimal [65] | Non-linear trends, complex patterns |
| ARIMA(0,1,2) | Intermediate | Intermediate | Intermediate | Intermediate | Intermediate [65] | Linear trends, stationary data |
| Curve Fitting | Lowest | Highest | Highest | Highest | Suboptimal [65] | Simple trends, limited data points |
Table 2: Performance Under Different Structural Break Scenarios (Depression Incidence Forecasting)
| Scenario | Best Performing Model | SMAPE | Key Advantage |
|---|---|---|---|
| No significant structural breaks | Multivariate TFT | 11.6% | Captures complex relationships [83] |
| Stable burden level | TFT | 11.6-13.2% | Robustness to temporary interruptions [83] |
| Incidence surge remaining high | VARIMA/ARIMA | 14.8-16.4% | Better captures persistent level shifts [83] |
| Fluctuating outbreaks | TFT | Lower than ARIMA | Handles sharp, temporary interruptions [83] |
Table 3: Simulation Performance in Interrupted Time Series Analysis
| Condition | Better Performing Model | Rationale |
|---|---|---|
| Different policy effect sizes | ARIMA | More consistent results [9] |
| Presence of seasonality | ARIMA | Explicit seasonal modeling capability [9] |
| Model misspecification | GAM | Greater robustness to specification errors [9] |
Objective: To implement an ARIMA model for evaluating the impact of a health policy intervention on a continuous outcome measure.
Materials and Software Requirements:
- R with the forecast and tseries packages, or Python with statsmodels

Procedure:
Interpretation: The intervention effect is assessed by the significance and magnitude of the intervention coefficients, indicating immediate level changes or trend alterations post-intervention.
Objective: To implement a GAM for evaluating health intervention effects while accommodating non-linear trends.
Materials and Software Requirements:
- R with the mgcv package, or Python with pygam

Procedure:
gam(outcome ~ s(time) + intervention + post_intervention_time, data)Interpretation: The intervention effect is evaluated through the parametric components representing immediate level changes and altered trends, while the smooth function captures underlying non-linear patterns.
Diagram 1: Model Selection Decision Tree for ITS Analysis
Table 4: Key Analytical Tools for ITS Implementation
| Tool/Software | Function | Implementation Notes |
|---|---|---|
| R forecast package | Automated ARIMA modeling | Provides auto.arima() for parameter selection [80] |
| R mgcv package | GAM implementation | Handles smooth function estimation with automatic smoothing parameter selection [65] |
| Python statsmodels | Time series analysis | Implements ARIMA and seasonal ARIMA models [83] |
| Augmented Dickey-Fuller test | Stationarity testing | Determines need for differencing in ARIMA [80] |
| AIC/BIC criteria | Model selection | Penalized likelihood measures for comparing model fit [65] [84] |
| Time series cross-validation | Model validation | Maintains temporal ordering during validation [80] |
Interrupted Time Series (ITS) design is a powerful quasi-experimental method for evaluating the impact of interventions or exposures in public health, clinical science, and policy research. Unlike randomized controlled trials, ITS studies investigate interventions that occur at specific, known time points across entire populations or systems, making them particularly valuable for assessing real-world policy changes, healthcare interventions, and public health initiatives. The fundamental strength of ITS design lies in its ability to model underlying secular trends from pre-interruption data, creating a counterfactual for what would have occurred without the intervention. However, this analytical complexity also introduces unique challenges for transparency and reproducibility. Without comprehensive reporting of methodological choices, data handling procedures, and analytical techniques, the validity and interpretability of ITS findings can be compromised, potentially leading to erroneous conclusions about intervention effectiveness.
The growing emphasis on research transparency reflects a broader movement within the scientific community to address what many researchers perceive as a reproducibility crisis. In response, governmental bodies, research organizations, and scientific communities have developed frameworks to establish "gold standards" for scientific practice. These standards emphasize that research must be reproducible, transparent, communicative of error and uncertainty, collaborative, and skeptical of its own findings [85]. For ITS studies specifically, transparent reporting is essential because the analytical approach involves multiple consequential decisions about model specification, handling of autocorrelation, and interpretation of complex temporal patterns. This document provides comprehensive guidance on applying these emerging standards specifically to ITS research, with detailed protocols designed to enhance the transparency, reproducibility, and overall scientific rigor of interrupted time series studies.
The foundation of transparent ITS research rests on several core principles that align with broader scientific integrity policies while addressing the specific methodological challenges of time series analysis. First, researchers must prioritize complete methodological disclosure, ensuring that all analytical choices are explicitly documented and justified rather than left for readers to infer. This principle echoes the fundamental reporting ethic articulated by Douglas Altman, who famously stated that "readers should not have to infer what was probably done; they should be told explicitly" [86]. Second, ITS reporting should embrace analytical transparency by making available the data, code, and models used to generate findings, allowing for independent verification of results. Third, researchers must communicate uncertainties inherent in both the data and models, including confidence intervals around effect estimates and discussions of model limitations.
These principles align with the "Gold Standard Science" framework articulated in recent governmental policy, which emphasizes that federally funded research must be "reproducible, transparent, communicative of error and uncertainty, collaborative and interdisciplinary, skeptical of its findings and assumptions, structured for falsifiability of hypotheses, subject to unbiased peer review, accepting of negative results as positive outcomes, and without conflicts of interest" [85]. For ITS studies specifically, this means documenting not just what analyses were conducted, but why specific statistical approaches were selected, how missing data were handled, what sensitivity analyses were performed, and how potential confounding factors were addressed.
While no reporting guidelines have been developed specifically for ITS studies, researchers can adapt relevant elements from established frameworks for clinical trials and observational studies. The SPIRIT 2013 statement (Standard Protocol Items: Recommendations for Interventional Trials) and its 2025 update provide valuable guidance for documenting study protocols, while the CONSORT statement offers complementary guidance for reporting completed studies [86] [87]. Although developed for randomized trials, many SPIRIT and CONSORT elements have direct relevance to ITS studies, particularly those evaluating health interventions.
Table: Adapted SPIRIT/CONSORT Items Relevant to ITS Study Reporting
| Reporting Element | Application to ITS Studies | SPIRIT 2025 Section |
|---|---|---|
| Protocol availability | Publicly sharing the detailed statistical analysis plan before conducting analysis | Open Science section [86] |
| Data sharing | Making de-identified time series data available in trusted repositories | Open Science section [86] |
| Analysis code transparency | Sharing code used for statistical modeling and visualization | Open Science section [86] |
| Outcome definitions | Precisely defining how and when outcomes were measured | Methods section [88] |
| Statistical methods | Detailing approaches to handle autocorrelation and model selection | Methods section [88] |
The recently updated SPIRIT 2025 statement introduces a new open science section that emphasizes the importance of trial registration, protocol and statistical analysis plan availability, data sharing policies, and disclosure of conflicts of interest [86] [88]. These elements are equally crucial for ITS studies, particularly when research informs clinical or policy decisions. Similarly, the Transparency and Openness Promotion (TOP) Guidelines provide a structured framework for implementing open science practices across seven research practices: study registration, study protocol, analysis plan, materials transparency, analysis code transparency, data transparency, and reporting transparency [89]. The TOP Guidelines implement three increasing levels of transparency—from simple disclosure to independent certification—giving researchers clear benchmarks for improving the transparency of their ITS studies.
The choice of statistical method in ITS analysis can substantially influence study conclusions, making transparent reporting of analytical approaches particularly important. Empirical research comparing six statistical methods for analyzing ITS data has demonstrated that methodological choices can meaningfully affect point estimates, standard errors, confidence intervals, and p-values [6]. This comprehensive evaluation applied different statistical approaches to 190 published time series, providing robust evidence about how method selection impacts findings in real-world research contexts.
The six methods evaluated were: (1) ordinary least squares regression (OLS), which makes no adjustment for autocorrelation; (2) OLS with Newey-West standard errors (NW), which adjusts standard errors for autocorrelation; (3) Prais-Winsten (PW), a generalized least squares method that accounts for autocorrelation; (4) restricted maximum likelihood (REML); (5) REML with the Satterthwaite small-sample approximation; and (6) autoregressive integrated moving average (ARIMA), which explicitly models lagged dependencies [6]. Each method approaches the fundamental segmented regression model differently, particularly in how it handles the error term (εₜ) and accounts for autocorrelation.
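As a concrete baseline, the following is a minimal, standard-library-only Python sketch of the plain OLS variant of the segmented regression model (an immediate level change and a trend change at the interruption). The simulated series and interruption point are illustrative assumptions, and, as noted above, no autocorrelation adjustment is applied.

```python
# Plain OLS fit of the single-group segmented regression model
# y_t = b0 + b1*t + b2*x_t + b3*(t - t0)*x_t, where x_t = 1 for t >= t0.
# Standard library only; data are a noise-free illustration.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_its(y, t0):
    """OLS estimates (b0, b1, b2, b3) via the 4x4 normal equations."""
    rows = [[1.0, float(t), float(t >= t0), (t - t0) * float(t >= t0)]
            for t in range(len(y))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(XtX, Xty)

# Noise-free series: level 10, slope 0.5, level jump +3 and slope change +0.2 at t0 = 12.
t0 = 12
y = [10 + 0.5 * t + (3 + 0.2 * (t - t0) if t >= t0 else 0) for t in range(24)]
b0, b1, b2, b3 = fit_its(y, t0)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))  # near 10.0 0.5 3.0 0.2
```

On noise-free data the fit recovers the generating coefficients exactly; with real data the point estimates stay unbiased under this model, but the OLS standard errors would be too small in the presence of positive autocorrelation.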
Table: Comparison of Statistical Methods for ITS Analysis Based on 190 Published Series
| Statistical Method | Handling of Autocorrelation | Impact on Significance Findings | Considerations |
|---|---|---|---|
| Ordinary Least Squares (OLS) | No adjustment | 4-25% disagreement with other methods | Underestimates SE with positive autocorrelation [6] |
| Newey-West (NW) | Adjusts standard errors | Moderate impact on significance | Robust to unknown autocorrelation form [6] |
| Prais-Winsten (PW) | Models error structure | Substantial impact on significance | Generalizes OLS for correlated errors [6] |
| REML | Models error structure | Substantial impact on significance | Reduces bias in variance estimates [6] |
| ARIMA | Explicitly models lags | Substantial impact on significance | Flexible for complex time patterns [6] |
The empirical findings revealed that the choice of statistical method can lead to meaningfully different conclusions about intervention effects. Across the 190 analyzed series, statistical significance (assessed at the 5% level) differed between methods in 4% to 25% of pairwise comparisons, depending on which methods were contrasted [6]. This variation underscores the importance of pre-specifying and transparently reporting the statistical approach in ITS studies. Researchers should avoid drawing conclusions based solely on statistical significance without considering how methodological choices might influence results.
A fundamental challenge in ITS analysis is the presence of autocorrelation (serial correlation), where data points close in time tend to be more similar than those further apart. Positive autocorrelation, which commonly occurs in time series data, inflates Type I error rates when unaccounted for, potentially leading to false conclusions about intervention effects. Different statistical methods address autocorrelation in distinct ways, which explains their varying performance characteristics.
The empirical evaluation found that estimates of autocorrelation differed substantially depending on the method used and the length of the time series [6]. This variation has important implications for statistical inference, as accurate characterization of autocorrelation structure affects both parameter estimates and their precision. Longer time series generally provide more reliable estimates of autocorrelation, but also present greater risk of structural changes unrelated to the intervention being studied. The research highlights that pre-specification of the statistical method is essential, and that sensitivity analyses using different approaches should be conducted to assess the robustness of findings to methodological choices.
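The diagnostics involved here are straightforward to compute. The following standard-library Python sketch calculates the Durbin-Watson statistic and the lag-1 sample autocorrelation for a residual series; the simulated AR(1)-style residuals (rho = 0.6) are an illustrative assumption.

```python
# Two quick autocorrelation diagnostics for regression residuals.
import random

def durbin_watson(resid):
    """DW statistic, roughly 2*(1 - r1); values well below 2 flag positive autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)

def lag1_autocorrelation(resid):
    """Lag-1 sample autocorrelation of a residual series."""
    m = sum(resid) / len(resid)
    d = [e - m for e in resid]
    return sum(d[i] * d[i - 1] for i in range(1, len(d))) / sum(e * e for e in d)

# Illustrative AR(1)-style residuals (rho = 0.6), seeded for reproducibility.
rng = random.Random(42)
resid, prev = [], 0.0
for _ in range(100):
    prev = 0.6 * prev + rng.gauss(0, 1)
    resid.append(prev)

print(round(durbin_watson(resid), 2), round(lag1_autocorrelation(resid), 2))
```

For positively autocorrelated residuals the Durbin-Watson statistic falls well below 2, signalling that unadjusted OLS standard errors would be too small.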
Comprehensive protocol development before conducting ITS analysis is essential for minimizing selective reporting and ensuring analytical rigor. The following protocol provides a structured approach for designing and conducting transparent ITS studies:
Protocol: ITS Study Design and Analysis
Procedural Framework:
The following workflow provides a detailed, practical guide for conducting ITS analysis in a manner that promotes transparency and reproducibility:
ITS Analytical Workflow
1. Data Preparation and Validation
2. Exploratory Data Analysis
3. Model Specification
4. Model Fitting and Diagnostics
5. Sensitivity Analysis
6. Interpretation and Reporting
The following table details key methodological "reagents" essential for conducting rigorous ITS studies. These tools represent the conceptual and practical resources needed to implement transparent and reproducible interrupted time series analyses.
Table: Essential Methodological Tools for ITS Research
| Tool Category | Specific Examples | Application in ITS Studies |
|---|---|---|
| Statistical Software | R (with packages like forecast, tseries, nlme), Python (statsmodels), Stata, SAS | Implementation of segmented regression models with appropriate error structures [6] |
| Study Design Frameworks | SPIRIT 2013/2025, CONSORT, TOP Guidelines | Protocol development and reporting standards adaptation [86] [89] |
| Analytical Methods | OLS, Newey-West, Prais-Winsten, REML, ARIMA | Statistical modeling techniques that differ in their handling of autocorrelation [6] |
| Data Visualization Tools | Line charts, bar charts, combination charts | Visual representation of time trends and intervention effects [90] |
| Transparency Infrastructure | ClinicalTrials.gov, OSF, GitHub, Dataverse | Study registration, data sharing, and code dissemination [89] |
Successfully implementing these research tools requires careful planning and consistent application throughout the research process. For statistical software, researchers should select programs that offer multiple approaches to time series analysis and explicitly document the specific packages, functions, and version numbers used in their analyses. This level of detail is essential for computational reproducibility. When applying study design frameworks like SPIRIT 2025, researchers should focus particularly on the open science elements, including trial registration, protocol availability, statistical analysis plan, and data sharing arrangements [86]. These elements, though developed for clinical trials, provide a robust framework for establishing the transparency of ITS studies.
For analytical methods, the empirical evidence strongly suggests that researchers should avoid relying exclusively on a single statistical approach [6]. Instead, studies should pre-specify a primary method while planning sensitivity analyses using alternative approaches. This methodological triangulation helps assess whether substantive conclusions are robust to different analytical assumptions. Data visualization should include both the complete time series with fitted models and diagnostic plots showing residual patterns, as these visualizations help readers understand both the primary findings and the adequacy of the statistical model. Finally, transparency infrastructure should be utilized to make de-identified data, analytical code, and study materials publicly available whenever possible, following the TOP Guidelines' recommendations for citation and sharing practices [89].
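One way to operationalize this triangulation is to fit the same series with two methods and compare the estimates. The sketch below contrasts plain OLS with a single Cochrane-Orcutt step (a simplified member of the Prais-Winsten family); the simulated AR(1) errors and the decision to drop the first observation in the quasi-differenced fit are assumptions for illustration.

```python
# Methodological triangulation sketch: OLS vs. one Cochrane-Orcutt step.
import random

def ols2(x1, x2, y):
    """Two-parameter least squares y ~ b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

rng = random.Random(7)
n, rho, true_slope = 60, 0.5, 0.3
err, e = [], 0.0
for _ in range(n):                       # AR(1) errors with rho = 0.5
    e = rho * e + rng.gauss(0, 1)
    err.append(e)
t = [float(i) for i in range(n)]
y = [5 + true_slope * ti + ei for ti, ei in zip(t, err)]

# Method 1: plain OLS (ignores autocorrelation).
a_ols, b_ols = ols2([1.0] * n, t, y)

# Method 2: estimate rho from OLS residuals, refit on quasi-differenced data
# (first observation dropped, as in Cochrane-Orcutt).
resid = [yi - a_ols - b_ols * ti for yi, ti in zip(y, t)]
rho_hat = (sum(resid[i] * resid[i - 1] for i in range(1, n))
           / sum(r * r for r in resid))
y_star = [y[i] - rho_hat * y[i - 1] for i in range(1, n)]
c_star = [1.0 - rho_hat] * (n - 1)
t_star = [t[i] - rho_hat * t[i - 1] for i in range(1, n)]
_, b_co = ols2(c_star, t_star, y_star)

print(round(b_ols, 3), round(b_co, 3), round(rho_hat, 2))
```

Both slope estimates land near the true value here; the point of the exercise is that agreement across methods with different autocorrelation assumptions strengthens confidence in the substantive conclusion.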
The credibility and utility of interrupted time series research depends fundamentally on the transparency and reproducibility of its methods and reporting. As empirical evidence demonstrates, the choice of analytical method in ITS studies can substantially influence conclusions about intervention effectiveness, making comprehensive methodological reporting essential for proper interpretation of findings. By adopting the standards and protocols outlined in this document, researchers can enhance the rigor, transparency, and reproducibility of their ITS studies, contributing to more reliable evidence for healthcare, policy, and scientific decision-making.
The movement toward "Gold Standard Science" emphasizes that rigorous research must be not only methodologically sound but also transparent in its execution and reporting [85]. For interrupted time series studies, this means pre-specifying analytical plans, comprehensively reporting methodological choices, acknowledging uncertainties, and sharing data and code to enable verification and extension of findings. As reporting standards continue to evolve, researchers conducting ITS studies should monitor updates to frameworks like SPIRIT, CONSORT, and TOP Guidelines, adapting their practices to incorporate emerging best practices in transparent and reproducible research.
Interrupted Time Series (ITS) designs analyze changes in level and trend following an intervention to infer causal effects. Incorporation of a control series—a group not receiving the intervention—transforms a single-arm ITS into a much more robust quasi-experimental design. This approach mitigates biases from concurrent events (history threats) and other secular trends that could be mistaken for an intervention effect. The core causal question is: "How would the outcome have trended in the intervention group had the intervention not occurred?" A well-chosen control series provides the data-driven counterfactual necessary to answer this [91] [92].
The potential outcomes framework, or Rubin Causal Model, formalizes this logic. For each unit i at time t, we define two potential outcomes: Y1it, the outcome that would be observed if the unit received the intervention, and Y0it, the outcome that would be observed if it did not.
The causal effect is τ = Y1it - Y0it. The fundamental problem of causal inference is that only one of these potential outcomes can ever be observed for any i,t pair [93] [94]. The control series provides an estimate of the missing Y0it for the intervention group post-intervention, enabling the estimation of τ.
Sensitivity analysis tests the robustness of causal conclusions to violations of the study's underlying assumptions [95]. In observational ITS studies, key assumptions—such as the absence of unmeasured confounding or the correct specification of the control series—are untestable with the data at hand. Sensitivity analysis quantifies how strong an unmeasured confounder would need to be to explain away the observed effect, or how much the effect estimate might vary under different plausible model specifications [95] [96].
This process is crucial for building trust in causal claims, especially for regulatory decisions where randomized trials are not feasible. Historical examples, like the analysis linking smoking to lung cancer, used sensitivity analysis to convincingly rule out alternative explanations, thereby strengthening the causal argument [95].
Objective: To identify a control group that represents the counterfactual trend of the intervention group.
Procedure:
Validation Table:
| Validation Metric | Target Threshold | Statistical Test/Method |
|---|---|---|
| Pre-intervention Trend Parallelism | p > 0.05 for group-time interaction term | Linear regression with time-by-group interaction term |
| Covariate Balance | Standardized Mean Difference < 0.1 | Two-sample t-test, Chi-square test |
| Model Fit (Pre-period) | R² > 0.8, MAPE < 10% | Segmented regression, forecasting models |
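The pre-period fit metrics in the validation table are easy to compute directly. A small standard-library sketch, with illustrative observed and predicted values:

```python
# R-squared and MAPE for checking pre-period model fit against the
# thresholds in the validation table (R^2 > 0.8, MAPE < 10%).

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mape(obs, pred):
    """Mean absolute percentage error; assumes no zero observations."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)

# Illustrative pre-period values (assumed data, not from the source).
obs = [100, 104, 109, 113, 118, 121]
pred = [101, 105, 108, 114, 117, 122]
print(round(r_squared(obs, pred), 3), round(mape(obs, pred), 2))
```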
Objective: To estimate the level and trend change in the intervention group relative to the control series.
A segmented regression model for an ITS with a control series can be specified as follows:
Yt = β0 + β1Tt + β2Xt + β3TtXt + β4Zt + β5ZtTt + β6ZtXt + β7ZtTtXt + εt
Where:
- Yt is the outcome at time t
- Tt is the time elapsed since the start of the study
- Xt is an indicator equal to 0 before and 1 after the intervention timepoint
- Zt is an indicator equal to 1 for the intervention group and 0 for the control group
- εt is the random error term
Key Coefficients:
Table: Interpretation of Segmented Regression Coefficients
| Coefficient | Interpretation |
|---|---|
| β0 | Starting level of the outcome in the control group. |
| β1 | Pre-intervention trend in the control group (baseline trend). |
| β2 | Immediate level change in the control group at the intervention timepoint. |
| β3 | Trend change in the control group after the intervention. |
| β4 | Difference in starting level between intervention and control groups. |
| β5 | Difference in pre-intervention trends between intervention and control groups. |
| β6 | Causal Effect: Immediate level change attributable to the intervention. |
| β7 | Causal Effect: Sustained trend change attributable to the intervention. |
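To make the coding of this model explicit, the sketch below builds the eight-term design row [1, Tt, Xt, TtXt, Zt, ZtTt, ZtXt, ZtTtXt] for a single observation. The conventions (time counted from zero, Xt switching at the interruption, Zt = 1 for the intervention group) are assumptions consistent with the coefficient table above.

```python
# Design row for the eight-parameter control-series segmented regression model.

def design_row(t, t0, group):
    """Return [1, T, X, T*X, Z, Z*T, Z*X, Z*T*X] for one observation."""
    T = float(t)                                  # time since series start
    X = 1.0 if t >= t0 else 0.0                   # post-intervention indicator
    Z = 1.0 if group == "intervention" else 0.0   # group indicator
    return [1.0, T, X, T * X, Z, Z * T, Z * X, Z * T * X]

# A control-group observation after the interruption: the Z terms drop out,
# so only the control-group level/trend coefficients (beta0..beta3) apply.
print(design_row(15, 12, "control"))       # [1.0, 15.0, 1.0, 15.0, 0.0, 0.0, 0.0, 0.0]
# An intervention-group observation after the interruption: all terms active,
# including the causal-effect terms multiplying beta6 and beta7.
print(design_row(15, 12, "intervention"))  # [1.0, 15.0, 1.0, 15.0, 1.0, 15.0, 1.0, 15.0]
```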
Objective: To assess how an unmeasured confounder could alter the causal conclusion.
Procedure (Using an E-Value Approach):
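The core computation behind this approach is VanderWeele and Ding's E-value formula, E = RR + sqrt(RR × (RR − 1)), applied on the risk-ratio scale. A minimal sketch (the example risk ratios are illustrative):

```python
# E-value: the minimum strength of association an unmeasured confounder would
# need with both the intervention and the outcome to fully explain away an
# observed risk ratio.
import math

def e_value(rr):
    """E-value for a risk-ratio point estimate; reciprocal taken if RR < 1."""
    rr = 1.0 / rr if rr < 1.0 else rr
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(2.0), 3))  # 3.414: a confounder would need RR >= 3.41 with both
print(round(e_value(0.5), 3))  # 3.414: protective estimates give the same value by symmetry
```

Effect estimates on other scales (e.g., differences in levels) first need to be converted to an approximate risk ratio before applying the formula.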
Objective: To test if the causal effect estimate is robust to different methods of constructing the counterfactual.
Procedure:
Sensitivity Analysis Summary Table:
| Method | Question Answered | Interpretation of Robust Result |
|---|---|---|
| E-Value | How strong must an unmeasured confounder be? | Large E-value; confounder stronger than known risks is unlikely. |
| Synthetic Control | Is the effect consistent with a more data-driven counterfactual? | Effect estimate remains statistically and substantively significant. |
| Placebo Test (Time) | Does a spurious "effect" appear before the intervention? | Null effect is found at all false intervention timepoints. |
| Model Specification | Does the effect depend on specific modeling choices? | Effect estimate is stable across plausible alternative models. |
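The placebo-in-time test in the table can be sketched directly: refit the segmented model at false interruption points inside the pre-intervention window and check that the estimated level and trend changes are null there. The series, placebo points, and plain OLS fit below are illustrative assumptions.

```python
# Placebo-in-time test on a clean pre-intervention series: any "effect"
# detected at a false interruption point would be spurious.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_its(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*x_t + b3*(t - t0)*x_t, x_t = 1 for t >= t0."""
    rows = [[1.0, float(t), float(t >= t0), (t - t0) * float(t >= t0)]
            for t in range(len(y))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(XtX, Xty)

# Pre-intervention data only: a linear trend with no interruption anywhere.
pre = [20 + 0.4 * t for t in range(30)]
for placebo_t0 in (10, 15, 20):
    _, _, level_change, trend_change = fit_its(pre, placebo_t0)
    print(placebo_t0, round(level_change, 6), round(trend_change, 6))
```

A robust study finds estimates indistinguishable from zero at every placebo point; a non-null placebo "effect" suggests model misspecification or a confounding secular change.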
Table: Essential Methodological Tools for Causal ITS Analysis
| Tool / Reagent | Function in Causal Inference | Example Software/Package |
|---|---|---|
| Segmented Regression Framework | Estimates level and trend changes in outcome associated with an intervention, controlling for pre-existing trends. | Base R (lm), Python (statsmodels), SAS (PROC AUTOREG) |
| Autocorrelation Handling | Corrects for correlation between successive time points, which violates the independence assumption and biases standard errors. | R (forecast::auto.arima), Stata (xtregar) |
| Synthetic Control Method | Creates a weighted combination of control units to build a more comparable counterfactual for a single treated unit. | R (Synth), Python (SyntheticControlMethods) |
| Propensity Score Matching | Balances observed covariates between intervention and control series in the pre-period to improve comparability. | R (MatchIt), Stata (teffects psmatch) |
| Sensitivity Analysis (E-Value) | Quantifies robustness of a causal conclusion to potential unmeasured confounding. | R (EValue), Online E-Value calculators |
Causal Inference Workflow
ITS Design with Control Series
Drug utilization research provides critical insights into the real-world use, safety, and effectiveness of pharmaceutical products, informing healthcare policy and clinical practice. Within this field, the interrupted time series (ITS) design has emerged as a powerful quasi-experimental method for evaluating the causal effects of interventions when randomized controlled trials (RCTs) are impractical, unethical, or prohibitively expensive [97]. This review assesses the current reporting quality of ITS studies in drug utilization research, identifying common methodological strengths and deficiencies while providing structured guidance to enhance the rigor, transparency, and reproducibility of future research.
The ITS design is considered one of the strongest quasi-experimental approaches because it uses longitudinal data collected at multiple, equally spaced time points before and after a well-defined intervention to evaluate whether the intervention caused a significant change in outcomes [10]. By accounting for pre-existing trends and autocorrelation, ITS analyses can distinguish true intervention effects from underlying secular trends, making them particularly valuable for studying the impact of policy changes, new guidelines, or quality improvement initiatives on drug utilization patterns [97].
The ITS design operates on the fundamental principle of comparing observed post-intervention data points against a counterfactual—what would have occurred had the pre-intervention trend continued unchanged [97]. This comparison relies on three essential analytical components that must be clearly defined and reported in any ITS study:
Segmented regression represents the most common analytical approach for ITS designs, used in approximately 78% of healthcare ITS studies [10]. This method models the outcome variable as a function of time, intervention status, and time since intervention, formally testing whether the intervention was associated with statistically significant changes in level or trend.
Several methodological factors must be addressed to ensure valid ITS analysis and interpretation. The table below summarizes key considerations and their implications for study validity:
Table 1: Critical Methodological Considerations in ITS Designs
| Consideration | Description | Impact on Validity |
|---|---|---|
| Autocorrelation | Correlation between successive measurements | Can lead to underestimated standard errors and inflated Type I errors if unaddressed [10] |
| Seasonality | Periodic, predictable patterns in data | May confound intervention effects if not accounted for in models |
| Non-stationarity | Systematic changes in outcome unrelated to intervention | Can create spurious associations if misinterpreted as intervention effects |
| Outliers | Extreme values disproportionate to overall pattern | May disproportionately influence model estimates and conclusions |
| Sample size | Number of pre- and post-intervention observations | Affects statistical power to detect meaningful intervention effects |
Despite their importance, these methodological elements are frequently underreported. A systematic assessment of 116 healthcare ITS studies found that only 55% considered autocorrelation, with just 63% of those reporting formal statistical tests [10]. Similarly, seasonality was addressed in only 24% of studies, and non-stationarity in a mere 8% [10].
A methodological review of ITS designs across healthcare settings revealed significant gaps in reporting quality that limit the reliability and interpretability of findings [10]. The assessment examined 116 ITS studies published in 2015, with many focusing on drug utilization outcomes. Key findings regarding reporting completeness include:
Table 2: Reporting Completeness in Healthcare ITS Studies (n=116)
| Reporting Element | Studies Including Element | Percentage |
|---|---|---|
| Clear intervention definition | 110 | 95% |
| Analysis method described | 115 | 99% |
| Number of pre-intervention points stated | 34 | 29% |
| Number of post-intervention points stated | 32 | 28% |
| Autocorrelation considered | 63 | 55% |
| Seasonality addressed | 28 | 24% |
| Sample size justification | 7 | 6% |
| Primary outcome specified | 30 | 26% |
This assessment identified particular deficiencies in reporting time point justification, with less than 30% of studies specifying the number of pre- and post-intervention data points [10]. This omission is critical because sufficient data points are necessary to establish pre-intervention trends and detect post-intervention changes with adequate statistical power.
Inconsistent terminology represents another challenge in the ITS literature. Among the 74 studies (64%) that provided some definition of their design, significant variation existed in how authors characterized the ITS approach [10]. Definitions included "interrupted time series," "quasi-experimental," "time series," "observational," "cohort," and "cross-sectional study," with only two studies consistently using the same terminology throughout their manuscript [10]. This terminology inconsistency creates confusion for readers and may reflect underlying methodological misunderstandings among authors.
To address identified reporting deficiencies, researchers should adhere to the following minimum reporting standards when publishing ITS studies in drug utilization research:
For researchers implementing ITS designs, the following step-by-step protocol ensures methodological rigor:
Step 1: Visual Data Exploration
Step 2: Model Specification
Step 3: Autocorrelation Assessment
Step 4: Model Refinement
Step 5: Intervention Effect Estimation
Step 6: Validation and Sensitivity Analysis
Although formal sample size calculations for ITS designs are complex and infrequently reported (only 6% of studies) [10], researchers should, at a minimum, plan for enough pre- and post-intervention data points to estimate both trends reliably and should justify that number explicitly, since statistical power depends on the number of time points, the variability of the outcome, and the degree of autocorrelation.
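A simulation-based power check can be sketched with an empirical null distribution rather than analytic standard errors. The series length, noise level, and target effect size below are illustrative assumptions.

```python
# Monte Carlo power sketch for detecting a level change in an ITS design:
# (1) simulate null series to get an empirical critical value for |level change|,
# (2) simulate series with a true jump and count how often it is exceeded.
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_its(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*x_t + b3*(t - t0)*x_t, x_t = 1 for t >= t0."""
    rows = [[1.0, float(t), float(t >= t0), (t - t0) * float(t >= t0)]
            for t in range(len(y))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(XtX, Xty)

def simulate(rng, n_pre, n_post, jump, sigma):
    """Synthetic series: linear trend plus an optional level jump at t0 = n_pre."""
    return [10 + 0.3 * t + (jump if t >= n_pre else 0.0) + rng.gauss(0, sigma)
            for t in range(n_pre + n_post)]

rng = random.Random(123)
n_pre = n_post = 12
sims = 300

# Empirical 95% critical value of |level change| under the null (no jump).
null_jumps = sorted(abs(fit_its(simulate(rng, n_pre, n_post, 0.0, 1.0), n_pre)[2])
                    for _ in range(sims))
crit = null_jumps[int(0.95 * sims)]

# Power: how often a true jump of 4 units exceeds that critical value.
hits = sum(abs(fit_its(simulate(rng, n_pre, n_post, 4.0, 1.0), n_pre)[2]) > crit
           for _ in range(sims))
power = hits / sims
print(round(crit, 2), round(power, 2))
```

Repeating the calculation over a grid of effect sizes and series lengths shows how many time points are needed before power reaches an acceptable level.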
Effective visualization is critical for both analyzing ITS data and communicating findings. The following standardized approach ensures clarity and consistency in graphical representation.
Visualization 1: Analytical workflow for interrupted time series studies
Visualization 2: Key components and parameters in ITS analysis
Successful implementation of ITS designs in drug utilization research requires both methodological expertise and appropriate analytical tools. The following table details essential "research reagents" for conducting robust ITS analyses:
Table 3: Essential Research Reagents for ITS Studies in Drug Utilization Research
| Tool Category | Specific Examples | Function in ITS Analysis |
|---|---|---|
| Statistical Software | R (package: its.analysis), SAS (PROC AUTOREG), Stata (itsa), Python (statsmodels) | Implement segmented regression models, account for autocorrelation, estimate intervention effects [97] |
| Data Visualization Tools | ggplot2 (R), matplotlib (Python), Stata graphics | Create time series plots showing pre/post-intervention trends, highlight intervention points, display model fits |
| Autocorrelation Diagnostics | Durbin-Watson test, Ljung-Box test, ACF/PACF plots | Detect and quantify correlation between successive observations, inform model selection [10] |
| Sample Size Calculators | Power calculations for ITS designs (simulation-based) | Determine adequate number of pre/post observations to detect clinically meaningful effect sizes |
| Reporting Guidelines | CONSORT extension for ITS, RECORD statement | Ensure comprehensive reporting of methods, results, and limitations [10] |
The current state of reporting quality in drug utilization research using ITS designs shows significant room for improvement. While the method itself represents a powerful quasi-experimental approach for evaluating interventions, deficiencies in reporting analytical methods, handling autocorrelation, justifying sample sizes, and using consistent terminology limit the reliability and interpretability of findings.
Moving forward, the field would benefit from developing and adopting formal reporting guidelines specific to ITS designs, similar to CONSORT for randomized trials or STROBE for observational studies. Such guidelines would address the unique methodological features of ITS analyses and promote more transparent, reproducible research practices. Additionally, increased attention to statistical fundamentals—including autocorrelation assessment, seasonality adjustment, and appropriate sample size considerations—would enhance the validity of causal inferences drawn from ITS studies in drug utilization research.
As pharmaceutical interventions and policy changes continue to evolve rapidly, rigorous application of ITS designs will remain essential for generating timely evidence about the real-world impacts of these changes on drug utilization patterns and patient outcomes.
Interrupted Time Series design is an indispensable tool for evaluating population-level interventions in healthcare and drug development, offering a rigorous framework for causal inference when randomized trials are not feasible. Successful implementation hinges on a deep understanding of its foundational principles, careful selection and application of analytical methods like segmented regression, ARIMA, or GAM, and diligent attention to common pitfalls such as autocorrelation and seasonality. As methodological research advances, future efforts should focus on the adoption of formal reporting guidelines, improved handling of hierarchical data structures, and the development of more accessible sample size estimation techniques. By mastering these elements, researchers can leverage ITS to generate robust, actionable evidence that directly informs clinical practice and public health policy.