
Meet the Stellarus Speakers
Yun Hu
Bio: Yun Hu is a Data Scientist Consultant specializing in healthcare analytics, applying machine learning and advanced data techniques to improve maternal and population health outcomes. Yun has developed predictive models and risk stratification frameworks, including work on early pregnancy identification and preterm birth risk prediction initiatives. With a strong background in data analysis and health data systems, Yun has partnered with cross-functional teams to translate complex data into actionable insights that support care management and policy goals. Yun’s work emphasizes data-driven solutions that enable earlier interventions, improve quality of care, and reduce healthcare costs. Yun’s expertise spans predictive modeling, data engineering, and healthcare analytics, and Yun is passionate about using data to drive meaningful impact in public health.
Talk Title: "Early Prenatal Initiative: Risk Stratification Prototype"
Rupali Roy
Bio: Rupali Roy is a Data Scientist at Stellarus, where she focuses on applying advanced machine learning and generative AI technologies in the healthcare domain. She previously worked across the aerospace and oil & gas industries, developing machine learning solutions for workflow optimization, recommendation systems, and anomaly detection. Rupali earned her Master’s degree in Data Science and Machine Learning from the University of Texas at Austin in 2021. Her current research interests lie in leveraging large language models (LLMs) and advanced NLP techniques to improve healthcare analytics and reduce manual review efforts. Her recent work centers on analyzing unstructured clinical notes to extract critical information related to HEDIS quality measures and risk adjustment. She has been actively developing LLM-based systems to automate the extraction of these essential data elements, significantly improving efficiency over traditional manual chart review processes.
Talk Title: "A Two-Stage LLM Pipeline for Classifying Non-Network Request Authorization Notes"
Abstract: Health plans must classify non-network requests into regulatory out-of-network reason categories — a process that is manual, time-consuming, and inconsistent. We address this by building a two-stage LLM pipeline that first classifies authorization notes using a domain-enriched prompt with GPT-4o, then re-classifies ambiguous "Other"-labeled records using GPT-5.2. Injecting domain knowledge, disambiguation rules, and structured metadata into the prompts significantly improved classification accuracy over a baseline approach. A three-phase evaluation combining expert labels, LLM-as-Judge, and pairwise comparison validated the gains while also revealing that automated LLM-based evaluation alone is unreliable for domain-specific tasks.
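The two-stage routing described in the abstract can be sketched as follows. This is a minimal illustration only: the `classify` stub, the keyword heuristic, the model names, and the category list are all assumptions standing in for the talk's actual prompts, models, and regulatory categories.

```python
# Hypothetical category set; the real regulatory out-of-network reason
# categories and prompt contents are not specified here.
CATEGORIES = ["No network provider", "Emergency", "Continuity of care", "Other"]

def classify(note: str, model: str) -> str:
    """Stub standing in for an LLM call with a domain-enriched prompt.

    A real pipeline would send the note plus domain knowledge,
    disambiguation rules, and structured metadata to `model` and parse
    the returned label. Here a toy keyword heuristic keeps the routing
    logic below runnable.
    """
    lowered = note.lower()
    if "emergency" in lowered:
        return "Emergency"
    if "no in-network" in lowered:
        return "No network provider"
    return "Other"

def two_stage_classify(note: str) -> str:
    # Stage 1: classify every authorization note with the primary model.
    label = classify(note, model="stage-1-model")
    # Stage 2: only ambiguous "Other"-labeled records are re-classified
    # by a second model, as in the abstract's pipeline.
    if label == "Other":
        label = classify(note, model="stage-2-model")
    return label
```

The key design point is the escalation path: cheap, confident labels are accepted at stage one, and only the ambiguous residue incurs a second model call.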
Roanne Toretsky
Bio: Roanne Toretsky is a Data Scientist specializing in the design and evaluation of care management programs, currently at Stellarus in Oakland, CA. Her experience applying advanced statistical methods to real-world data spans the healthcare and clinical research industries. She holds an MPH from Boston University and a BSE from the University of Pennsylvania.
Talk Title: "Operationalizing Targeted Maximum Likelihood Estimation in Practice"
Abstract: Health plans invest heavily in care management and population health initiatives to improve outcomes and control costs for insured members. Accurately estimating the causal impact of these programs is critical for guiding investment decisions, reporting outcomes, and continuously improving care delivery. As demand grew for scaled analytic capabilities to measure care management program impact, limitations of the prior analyst‑dependent workflow became increasingly apparent, including bias‑prone methodologies, data engineering bottlenecks, and redundant analytic effort. These challenges highlighted the need to modernize not only how impact was measured, but also how results were delivered and consumed by the business. We present an end‑to‑end modernization journey from ad hoc, analyst‑driven workflows to a platform‑based causal inference pipeline centered on Targeted Maximum Likelihood Estimation (TMLE). The pipeline is built on standardized features, deployed through CI/CD pipelines, and governed by enterprise SDLC controls. By integrating flexible machine‑learning estimation and standardized reporting, this approach enables rigorous impact measurement and timely, decision‑ready insights for business stakeholders.