Air Quality Data Structure and Standardization for Road Transport Emission Monitoring
- Server: Preprints.org
- DOI: 10.20944/preprints202602.1163.v1
Air Quality (AQ) plays a critical role in public health and urban sustainability, but drawing insights from AQ data remains challenging due to fragmented sources, inconsistent formats, and differing measurement standards and devices. This paper explores the architecture and standardization of AQ datasets from major global monitoring systems, specifically the U.S. EPA’s Air Quality System (AQS) and the European Environment Agency (EEA) networks, emphasizing discrepancies in pollutant units, reporting frequencies, and metadata quality. It outlines the key pollutants arising from road transport emissions and how they are measured using a range of technologies, from fixed regulatory stations to low-cost and satellite-based sensors. Inconsistent schema design and the lack of interoperability across datasets hinder the scalability of machine learning (ML) pipelines, which rely on clean, harmonized inputs. To address this, the paper introduces an application named “Data Manager Tool” that ingests, transforms, and standardizes heterogeneous AQ data into a centralized PostgreSQL database using a star schema, enabling more efficient querying, integration, and modeling. The paper discusses practical applications of this system and how it paves the way for scalable ML-based analysis of pollution trends. Future work will focus on more advanced ML approaches, the integration of mobile sensor data, and extending the framework to support predictive modeling and optimization using meteorological and transport datasets.
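The harmonization step described above can be illustrated with a minimal Python sketch. It is not the Data Manager Tool's implementation. The source field names (e.g. `sample_measurement`, `units`, `station`), the table names (e.g. `dim_station`, `fact_measurement`), and the station identifiers are all hypothetical. Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs without a database server. The sketch converts gas concentrations reported in ppb to µg/m³ with µg/m³ = ppb × M / V_m, assuming a molar volume V_m of 24.45 L/mol (25 °C, 1 atm), and then loads the harmonized records into a small star schema with one fact table and three dimension tables.

```python
import sqlite3
from dataclasses import dataclass

# Molar masses (g/mol) of gases that monitoring networks often report in ppb.
MOLAR_MASS = {"NO2": 46.0055, "O3": 47.9982, "SO2": 64.066}

# Ideal-gas molar volume (L/mol) at 25 °C and 1 atm. Reference conditions differ
# between regulatory frameworks, so this is an assumption to configure per source.
MOLAR_VOLUME_L = 24.45


@dataclass
class Measurement:
    """Harmonized record: one pollutant value in µg/m³ for one station and hour."""
    station_id: str
    pollutant: str
    timestamp: str  # ISO 8601 string; hourly grain is an assumed target
    value_ugm3: float


def to_ugm3(value: float, unit: str, pollutant: str,
            molar_volume: float = MOLAR_VOLUME_L) -> float:
    """Convert a concentration to µg/m³; ppb uses value * M / V_m."""
    if unit == "ug/m3":
        return value
    if unit == "ppb":
        return value * MOLAR_MASS[pollutant] / molar_volume
    raise ValueError(f"unsupported unit: {unit}")


def harmonize(raw: dict) -> Measurement:
    """Map a source-specific record (hypothetical field names) onto one schema."""
    if raw["source"] == "AQS":
        return Measurement(raw["site"], raw["parameter"], raw["datetime"],
                           to_ugm3(raw["sample_measurement"], raw["units"], raw["parameter"]))
    if raw["source"] == "EEA":
        return Measurement(raw["station"], raw["pollutant"], raw["start"],
                           to_ugm3(raw["value"], raw["unit"], raw["pollutant"]))
    raise ValueError(f"unknown source: {raw['source']}")


# Star schema: one fact table of measurements keyed to three dimension tables.
SCHEMA = """
CREATE TABLE dim_station   (station_key INTEGER PRIMARY KEY, station_id TEXT UNIQUE);
CREATE TABLE dim_pollutant (pollutant_key INTEGER PRIMARY KEY, code TEXT UNIQUE);
CREATE TABLE dim_time      (time_key INTEGER PRIMARY KEY, ts TEXT UNIQUE);
CREATE TABLE fact_measurement (
    station_key   INTEGER REFERENCES dim_station,
    pollutant_key INTEGER REFERENCES dim_pollutant,
    time_key      INTEGER REFERENCES dim_time,
    value_ugm3    REAL
);
"""


def dim_key(conn: sqlite3.Connection, table: str, key_col: str, col: str, value: str) -> int:
    """Return the surrogate key for a dimension value, inserting the row if new."""
    # Table and column names come only from the fixed calls in load().
    conn.execute(f"INSERT OR IGNORE INTO {table} ({col}) VALUES (?)", (value,))
    return conn.execute(f"SELECT {key_col} FROM {table} WHERE {col} = ?", (value,)).fetchone()[0]


def load(conn: sqlite3.Connection, m: Measurement) -> None:
    """Insert one harmonized measurement as a fact row."""
    conn.execute(
        "INSERT INTO fact_measurement VALUES (?, ?, ?, ?)",
        (dim_key(conn, "dim_station", "station_key", "station_id", m.station_id),
         dim_key(conn, "dim_pollutant", "pollutant_key", "code", m.pollutant),
         dim_key(conn, "dim_time", "time_key", "ts", m.timestamp),
         m.value_ugm3),
    )


conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
conn.executescript(SCHEMA)
raw_records = [  # hypothetical records in two source-specific layouts
    {"source": "AQS", "site": "US-SITE-1", "parameter": "NO2",
     "datetime": "2024-01-01T00:00", "sample_measurement": 40.0, "units": "ppb"},
    {"source": "EEA", "station": "EU-STATION-1", "pollutant": "NO2",
     "start": "2024-01-01T00:00", "value": 52.0, "unit": "ug/m3"},
]
for raw in raw_records:
    load(conn, harmonize(raw))

rows = conn.execute("""
    SELECT s.station_id, p.code, f.value_ugm3
    FROM fact_measurement AS f
    JOIN dim_station AS s USING (station_key)
    JOIN dim_pollutant AS p USING (pollutant_key)
    ORDER BY s.station_id
""").fetchall()
for row in rows:
    print(row)
```

Both NO2 values end up in one unit and share a single pollutant and time dimension row, so cross-network queries only join the fact table to the dimensions they filter on. Against PostgreSQL, the `INSERT OR IGNORE` idiom would typically become `INSERT INTO ... ON CONFLICT DO NOTHING`, and the surrogate keys would usually be identity columns.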