Post-acute Sequelae of COVID-19 (PASC), also known as Long COVID, is a broad grouping of long-term symptoms that follow acute COVID-19 infection. Understanding which characteristics predict future PASC is valuable, as it can inform the identification of high-risk individuals and future preventative efforts. However, current knowledge regarding PASC risk factors is limited.
Estimating the marginally adjusted dose-response curve for continuous treatments is a longstanding statistical challenge that is critical across multiple fields. In the context of parametric models, mis-specification may result in substantial bias, hindering the accurate discernment of the true data generating distribution and the associated dose-response curve. In contrast, non-parametric models face difficulties as the dose-response curve is not pathwise differentiable, and then there is no n...
In environmental epidemiology, identifying subpopulations vulnerable to chemical exposures and those who may benefit differently from exposure-reducing policies is essential. For instance, sex-specific vulnerabilities, age, and pregnancy are critical factors for policymakers when setting regulatory guidelines. However, current semi-parametric methods for heterogeneous treatment effects are often limited to binary exposures and function as black boxes, lacking clear, interpretable rules for subpopulation-specific policy interventions. This study introduces a novel method using cross-...
The validity of medical studies based on real-world clinical data, such as observational studies, depends on critical assumptions necessary for drawing causal conclusions about medical interventions. Many published studies are flawed because they violate these assumptions and entail biases such as residual confounding, selection bias, and misalignment between treatment and measurement times. Although researchers are aware of these pitfalls, they continue to occur because anticipating and addressing them in the context of a specific study can be challenging without a large, often unwieldy,...
While there is growing consensus that real-world data should play a larger role in generating causal evidence for health care, it is less clear whether and how AI can help. Current approaches to AI-driven analysis of health data are ill-equipped to account for the many threats to causal validity. However, the current human-reliant pipeline for causal analysis also falls short: analyses are complex, require multidisciplinary expertise, and are slow, labor-intensive and error-prone. Here, we speculate how a “human-in-the-loop” AI-based system could help relieve bottlenecks to high-...
This manuscript explores the intersection of surrogate outcomes and adaptive designs in statistical research. While surrogate outcomes have long been studied for their potential to substitute for long-term primary outcomes, current surrogate evaluation methods do not directly account for the potential benefits of using surrogate outcomes to adapt randomization probabilities in adaptive randomized trials that aim to learn and respond to treatment effect heterogeneity. In this context, surrogate outcomes can benefit participants in the trial directly (i.e., improve expected outcome of...
Understanding treatment effects on health-related outcomes using real-world data requires defining a causal parameter and imposing relevant identification assumptions to translate it into a statistical estimand. Semiparametric methods, like the targeted maximum likelihood estimator (TMLE), have been developed to construct asymptotically linear estimators of these parameters. To further establish the asymptotic efficiency of these estimators, two conditions must be met: 1) the relevant components of the data likelihood must fall within a Donsker class, and 2) the estimates of nuisance...