◆ Portfolio Projects
Three projects from the MS Applied Data Science program, each representing a different domain, technique set, and stakeholder context.
Using historical inspection patterns to help restaurant owners prepare proactively for future inspections, shifting the NYC DOHMH from a reactive scheduling model to a risk-based, data-driven approach.
The NYC Department of Health and Mental Hygiene currently conducts restaurant inspections on a fixed schedule, typically once per year for all establishments regardless of their compliance history. This one-size-fits-all approach means limited inspector resources are distributed evenly even when risk is not.
This project built a machine learning system that identifies restaurants at high risk of receiving poor inspection grades (B or C) before their scheduled inspections occur. By shifting inspection scheduling from reactive to proactive, inspectors can focus where they are most needed, potentially preventing food safety incidents before they happen.
The model analyzed multiple signals: restaurant characteristics such as cuisine type, borough, and establishment age; historical inspection patterns including prior grades and violation counts; temporal trends in violation frequency; and neighborhood-level factors that correlate with compliance patterns.
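The historical-pattern signals above can be sketched as a small feature-engineering step. This is an illustrative example only: the column names (`camis`, `grade`, `violation_count`, `inspection_date`) and the toy data are hypothetical stand-ins, not the project's actual schema.

```python
import pandas as pd

# Hypothetical inspection history: one row per inspection; columns are illustrative.
inspections = pd.DataFrame({
    "camis": [1, 1, 2, 2, 2],  # restaurant identifier
    "grade": ["A", "B", "A", "C", "B"],
    "violation_count": [2, 6, 1, 9, 5],
    "inspection_date": pd.to_datetime(
        ["2021-03-01", "2022-04-10", "2021-05-02", "2022-06-12", "2023-07-01"]),
})

# Collapse each restaurant's history into per-restaurant model features:
features = (
    inspections.sort_values("inspection_date")
    .groupby("camis")
    .agg(
        prior_inspections=("grade", "size"),
        prior_bc_grades=("grade", lambda g: (g != "A").sum()),
        mean_violations=("violation_count", "mean"),
        last_violations=("violation_count", "last"),
    )
    .reset_index()
)
print(features)
```

Features like "count of prior B/C grades" and "violations at the most recent inspection" keep the temporal signal while remaining easy to explain.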
A key design principle was interpretability. Unlike black-box models that maximize accuracy at the expense of explainability, this system produces predictions that can be explained to restaurant owners and justified to the public — making the choice of model architecture both a technical and ethical decision.
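One way to see why interpretability shapes the architecture choice is a linear scorecard, where every term in the prediction can be read off and justified. The weights and feature names below are hypothetical placeholders, not fitted values from the project.

```python
# Illustrative scorecard; weights are made-up for demonstration, not fitted.
WEIGHTS = {
    "prior_bc_grades": 1.5,
    "mean_violations": 0.4,
    "years_since_last_inspection": 0.8,
}

def risk_score(restaurant: dict) -> float:
    """Linear risk score: a weighted sum of interpretable features."""
    return sum(w * restaurant.get(k, 0.0) for k, w in WEIGHTS.items())

def explain(restaurant: dict) -> list[str]:
    """Per-feature contributions, readable by an inspector or owner."""
    return [
        f"{k}: {w} x {restaurant.get(k, 0.0)} = {w * restaurant.get(k, 0.0):.2f}"
        for k, w in WEIGHTS.items()
    ]

r = {"prior_bc_grades": 2, "mean_violations": 5.0, "years_since_last_inspection": 1.2}
print(risk_score(r))   # 3.0 + 2.0 + 0.96 = 5.96
print(explain(r))
```

A black-box model might score higher on accuracy, but it could not produce the itemized explanation above, which is what makes the prediction defensible to owners and the public.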
While prior research has explored predictive inspection models, this project incorporated a richer feature set including temporal patterns and neighborhood characteristics beyond what earlier studies have used. The focus on NYC's A/B/C grading system made predictions directly actionable for restaurant owners who understand exactly what each grade means for their business.
The model was designed with multiple stakeholder audiences in mind: health inspectors need prioritized risk queues; restaurant owners need understandable risk signals; the public needs confidence the system is fair. These competing needs shaped every design decision.
Analyzing New York State's Tuition Assistance Program from 2000 to present to surface inequities in aid distribution across income levels, age groups, and institution types.
New York's Tuition Assistance Program is the state's largest financial aid initiative, supporting eligible residents in paying tuition at in-state colleges. But as tuition costs have climbed and demographics have shifted, the question of whether TAP has kept pace — and whether it serves all groups equitably — is both a policy and a data question.
This project used publicly available annual records of TAP recipient counts and total award amounts, categorized by income group, age group, and program type, to trace how the program has evolved since 2000. The dataset contains both numerical and categorical variables, making it suitable for time-series analysis and comparative breakdowns.
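The kind of derived measure the analysis rests on can be sketched in a few lines. The figures and column names here are invented for illustration; the real dashboard was built in Tableau from the published TAP records.

```python
import pandas as pd

# Hypothetical TAP-style records: annual recipients and awards by income group.
tap = pd.DataFrame({
    "year": [2000, 2000, 2020, 2020],
    "income_group": ["<$20K", "$20-40K", "<$20K", "$20-40K"],
    "recipients": [120_000, 90_000, 95_000, 60_000],
    "total_awards": [300e6, 180e6, 310e6, 150e6],
})

# Average award per recipient, by year and income group — a measure that
# speaks to affordability rather than raw award totals:
tap["avg_award"] = tap["total_awards"] / tap["recipients"]
trend = tap.pivot(index="year", columns="income_group", values="avg_award")
print(trend.round(0))
```

Pivoting the derived measure by year and income group is what turns a flat record file into the trend-then-drill-down narrative described below.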
The primary deliverable was a Tableau dashboard built for a non-technical audience: policymakers, students, and administrators who need to understand funding patterns without statistical expertise. The design focused on clarity and narrative, leading with the big-picture trend, then allowing users to drill into demographic breakdowns.
A key design decision was to frame the data around access and affordability rather than raw award totals — turning a dataset into an argument about equity.
The analysis revealed meaningful shifts in how TAP dollars are distributed across income brackets and age groups over two decades. Certain program types saw significant changes in recipient volume that do not align with broader enrollment trends, suggesting structural shifts in eligibility or program design worth further investigation.
Financial aid access is a gateway to educational opportunity, and data can make visible the patterns that policy debates often treat as abstract.
Investigating 114,000+ chemical disclosure records from California's Safe Cosmetics Program to identify usage trends, company behavior, and the relationship between chemical complexity and product discontinuation.
California's Safe Cosmetics Program requires manufacturers to report cosmetic ingredients known or suspected to cause cancer, birth defects, or reproductive harm. The result is a rich, policy-relevant public dataset spanning 13 years and over 114,000 product-chemical combinations.
This project examined how chemicals are distributed across cosmetic products, companies, and product categories, and whether chemical count correlates with product discontinuation rates.
The raw dataset contained 114,635 rows across 22 columns. Through systematic deduplication, removal of records missing critical fields, and feature engineering, the cleaned dataset was reduced to approximately 41,000 high-quality observations across 13 key analytical variables.
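The cleaning steps named above can be sketched with pandas. The toy rows and column names (`product_name`, `company`, `chemical`, `discontinued`) are hypothetical stand-ins for the actual disclosure schema.

```python
import pandas as pd

# Hypothetical excerpt of raw disclosure records; columns are illustrative.
raw = pd.DataFrame({
    "product_name": ["Lip Gloss A", "Lip Gloss A", "Shampoo B", "Cream C"],
    "company": ["Acme", "Acme", "Beta", None],
    "chemical": ["Titanium dioxide", "Titanium dioxide", "Cocamide DEA", "Talc"],
    "discontinued": [False, False, True, False],
})

cleaned = (
    raw.drop_duplicates()                       # exact duplicate disclosures
       .dropna(subset=["company", "chemical"])  # records missing critical fields
       .reset_index(drop=True)
)
print(len(cleaned))  # 4 raw rows -> 2 cleaned rows in this toy example
```

Each step encodes an assumption about what counts as a valid observation, which is why documenting the pipeline was treated as part of the analysis itself.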
Every decision about what to keep and remove reflects assumptions about what matters — and documenting those decisions transparently was a core part of the project.
Products containing higher chemical counts were significantly more likely to be discontinued, suggesting that product complexity is a marker of both regulatory risk and consumer safety concern. Titanium dioxide appeared in 31,989 records, making it by far the most commonly reported ingredient in the dataset.
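The chemical-count/discontinuation relationship can be checked with a simple split-and-compare, sketched here on invented toy data (the real analysis used the full 41,000-row cleaned dataset).

```python
import pandas as pd

# Toy product-level data: chemicals per product and a discontinuation flag.
products = pd.DataFrame({
    "chemical_count": [1, 1, 2, 3, 5, 6, 8],
    "discontinued":   [0, 0, 0, 1, 0, 1, 1],
})

# Compare discontinuation rates for products above vs. at/below the
# median chemical count:
median = products["chemical_count"].median()
rates = products.groupby(products["chemical_count"] > median)["discontinued"].mean()
print(rates)  # index False = at/below median, True = above median
```

A gap between the two group rates is the descriptive version of the finding; the project's claim of significance would rest on a formal test over the full dataset.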
Reframing this project for portfolio presentation means centering the human story: chemical transparency in consumer products has measurable consequences for what stays on shelves. This is a public health story, not just a technical one.