Project

Parking Availability Data Pipeline

UC San Diego · AWS · Data Engineering

UC San Diego's parking occupancy data comes from multiple sensor systems (including Indect, Park Assist, Mistall, and ParkingLogix), each supplied by a different vendor with its own platform, naming conventions, and data formats. The goal of this project was to consolidate these occupancy data sources into a single, consistent dataset that could support both live availability (for signage and the campus app) and historical analytics (for planning, reporting, and policy decisions).

Completed as a Student Application & Integrations Engineer at IT Services.

Before this project, generating reports on parking occupancy at UC San Diego was a hassle. Analysts first had to visit each sensor system's dashboard, select date ranges month by month, and export the data as CSV files. The data then had to be cleaned in Excel and, if comparisons were required, stitched together with exports from other sensor systems. The finished workbook was sent to the Business Intelligence team to create a snapshot dashboard. This process was time-consuming, manual, and difficult to scale.

To add to the complexity, parking occupancy sensor systems were not always accurate in how they categorized parking space types. UC San Diego operates under a nuanced parking eligibility model, and allocations frequently change due to policy updates, construction, or operational needs. These changes were not consistently reflected across vendor systems, which led to unreliable internal reports and, in some cases, inaccurate live availability shown on digital signage and the campus mobile app.

To address this, I led efforts to build a centralized data pipeline that serves as a consistent backbone for both live and historical parking data. Built on AWS Lambda serverless functions, S3 buckets, and RDS, the pipeline regularly pulls data from vendor APIs, normalizes it, and validates it against an internal inventory dataset that defines parking location, level, and eligibility.
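As a rough illustration of the normalize-and-validate step, the sketch below maps vendor-specific field names onto one shared record shape and then checks each record against an inventory table. It is not the production code: the vendor keys, field names, inventory entry, and the choice to clamp counts to capacity are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    location: str
    level: str
    eligibility: str
    capacity: int


# Hypothetical inventory, keyed by (location, level, eligibility).
INVENTORY = {
    ("Example Structure", "L1", "Visitor"): InventoryEntry(
        "Example Structure", "L1", "Visitor", 40
    ),
}

# Hypothetical per-vendor field names; real vendor payloads will differ.
FIELD_MAPS = {
    "vendor_a": {
        "location": "facilityName",
        "level": "floor",
        "eligibility": "spaceType",
        "occupied": "occupiedCount",
    },
    "vendor_b": {
        "location": "site",
        "level": "zone",
        "eligibility": "permit",
        "occupied": "in_use",
    },
}


def normalize(vendor, raw):
    """Translate one vendor payload into the shared record shape."""
    fields = FIELD_MAPS[vendor]
    return {
        "location": str(raw[fields["location"]]).strip(),
        "level": str(raw[fields["level"]]).strip(),
        "eligibility": str(raw[fields["eligibility"]]).strip(),
        "occupied": int(raw[fields["occupied"]]),
    }


def validate(record):
    """Return the record enriched with capacity, or None if it is not in the inventory."""
    key = (record["location"], record["level"], record["eligibility"])
    entry = INVENTORY.get(key)
    if entry is None:
        return None  # unknown location/level/eligibility: flag for review
    # Assumption: counts outside 0..capacity are clamped rather than rejected.
    occupied = min(max(record["occupied"], 0), entry.capacity)
    return {
        **record,
        "occupied": occupied,
        "capacity": entry.capacity,
        "available": entry.capacity - occupied,
    }
```

In this sketch the inventory, not the vendor, is treated as the source of truth: a record whose location, level, or eligibility does not match an inventory entry is dropped from the output so that a stale vendor categorization cannot reach signage or reports unnoticed.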

Processed data is stored in PostgreSQL schemas designed for time-based analysis, with historical snapshots captured at five-minute intervals. In parallel, cleaned live feeds are published as structured JSON and XML outputs that power digital signage and the campus mobile app.
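The two outputs described above can be sketched in a few lines, under stated assumptions: a timestamp is floored to its five-minute bucket for the historical snapshot, and the same cleaned records are serialized to JSON and XML for the live feeds. The function names, the `lots`/`lot`/`available` element names, and the record fields are hypothetical; the actual schemas and feed formats are not shown in this write-up.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def floor_to_five_minutes(ts):
    """Round a timestamp down to the start of its five-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def to_json(records):
    """Serialize cleaned records as a JSON document (hypothetical layout)."""
    return json.dumps({"lots": records}, sort_keys=True)


def to_xml(records):
    """Serialize the same records as XML (hypothetical element names)."""
    root = ET.Element("lots")
    for record in records:
        lot = ET.SubElement(
            root, "lot", location=record["location"], level=record["level"]
        )
        ET.SubElement(lot, "available").text = str(record["available"])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    records = [{"location": "Example Structure", "level": "L1", "available": 28}]
    snapshot_time = floor_to_five_minutes(datetime.now(timezone.utc))
    print(snapshot_time.isoformat())
    print(to_json(records))
    print(to_xml(records))
```

Flooring to a fixed bucket means snapshots from different vendors share the same timestamp even if their API calls complete a few seconds apart, which keeps time-based comparisons straightforward.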

To verify data correctness, I validated the results both in the field and remotely, the latter by reviewing imagery captured by occupancy sensors. These checks helped identify mismatches between reported availability and on-the-ground conditions, ensuring that the pipeline reflected how parking spaces were actually being used.

Overall, this system significantly reduced manual reporting effort, improved trust in parking availability data, and created a scalable foundation for future analytics and planning as additional vendors and inventory data are integrated.