Measurements from low-cost sensors (LCS) are noisy, which limits their large-scale adoption in air quality monitoring. Calibration is generally used to obtain reliable air quality estimates from LCS. To this end, LCS are typically co-located with reference stations for some duration, and a calibration model is then developed to map the LCS measurements to the reference station measurements. Existing works formulate LCS calibration as an optimization problem in which a model is trained on data obtained from real-time deployments; the trained model is later employed to estimate the air quality measurements at that location. However, this approach is sensor-specific and location-specific and needs frequent re-calibration. Re-calibration, like the initial calibration, also requires a large amount of data, which is cumbersome in practical scenarios. To overcome these limitations, we propose Sens-BERT, a BERT-inspired learning approach that calibrates LCS in two phases: self-supervised pre-training and supervised fine-tuning. In the pre-training phase, we train Sens-BERT with only LCS data (without reference station observations) to learn the distributional features of the data and produce corresponding embeddings. In the fine-tuning phase, we then use the Sens-BERT embeddings to learn a calibration model. The proposed approach has several advantages over previous works. Since Sens-BERT learns the behaviour of the LCS, it can be transferred to any sensor with the same sensing principle without explicitly training on that sensor. It requires only LCS measurements in pre-training to learn the characteristics of LCS, thus enabling calibration even with a small amount of paired data in fine-tuning. We have extensively tested our approach on the Community Air Sensor Network (CAIRSENSE) data set, an open repository for LCS.
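As an illustration of the self-supervised pre-training idea above, the following minimal sketch prepares training examples from unlabeled LCS readings in the usual BERT style: part of each input window is hidden, and the hidden values become the reconstruction target. The abstract does not state the masking scheme used by Sens-BERT, so the function name `make_masked_windows`, the window length, the mask ratio, and the zero mask value are all assumptions made for illustration, and the readings are synthetic.

```python
import numpy as np


def make_masked_windows(series, window=8, mask_ratio=0.25, mask_value=0.0, seed=0):
    """Cut a 1-D LCS series into non-overlapping windows and hide random timesteps.

    Hypothetical helper: returns (masked_inputs, mask, targets), each shaped
    (n_windows, window). `mask` is True where a reading was hidden.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) // window
    if n_windows == 0:
        raise ValueError("series is shorter than one window")

    # Reconstruction targets are the original, unmasked readings.
    targets = series[: n_windows * window].reshape(n_windows, window)

    rng = np.random.default_rng(seed)
    n_masked = max(1, int(round(mask_ratio * window)))
    mask = np.zeros(targets.shape, dtype=bool)
    for row in range(n_windows):
        hidden = rng.choice(window, size=n_masked, replace=False)
        mask[row, hidden] = True

    # Replace hidden readings with the mask value; keep the rest unchanged.
    masked_inputs = np.where(mask, mask_value, targets)
    return masked_inputs, mask, targets


# Synthetic, unlabeled LCS readings (no reference station data is needed here).
lcs_readings = np.linspace(5.0, 36.0, 32)
masked_inputs, mask, targets = make_masked_windows(lcs_readings)
```

A model pre-trained to recover `targets[mask]` from `masked_inputs` never sees reference station data, which is why only a small paired data set would be needed later for fine-tuning.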
Low-cost sensors (LCS) are affordable, compact, and often portable devices designed to measure various environmental parameters, including air quality. These sensors are intended to provide accessible and cost-effective solutions for monitoring pollution levels in different settings, such as indoors, outdoors, and in moving vehicles. However, the data produced by LCS are prone to various sources of error that can affect accuracy. Calibration is a well-known procedure for improving the reliability of LCS data, and several calibration approaches have been developed. This work proposes a novel Estimated Error Augmented Two-phase Calibration (\textit{EEATC}) approach to calibrate LCS in stationary and mobile deployments. In contrast to existing approaches, \textit{EEATC} calibrates the LCS in two phases: the error estimated in the first-phase calibration is appended to the input of the second phase, which helps the second phase learn the distributional features better and produce more accurate results. We show that \textit{EEATC} outperforms well-known single-phase calibration models, namely linear regression models (single-variable linear regression (SLR) and multiple-variable linear regression (MLR)) and random forest (RF), in stationary and mobile deployments. To test \textit{EEATC} in stationary deployments, we use the Community Air Sensor Network (CAIRSENSE) data set approved by the United States Environmental Protection Agency (USEPA); mobile deployments are tested with real-time data obtained from SensurAir, an LCS device developed and deployed on a moving vehicle in Chennai, India.
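The two-phase idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the models used in each phase or how the error is estimated, so here phase 1 is an ordinary least-squares fit, the estimated error is taken as the phase-1 estimate minus the raw LCS reading, and phase 2 is a least-squares fit on the inputs, the estimated error, and their squares. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def _design(features):
    """Prepend an intercept column to a 2-D feature array."""
    return np.column_stack([np.ones(len(features)), features])


def fit_least_squares(features, target):
    """Ordinary least squares; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(_design(features), target, rcond=None)
    return coef


def predict(coef, features):
    return _design(features) @ coef


def augment(features, phase1_coef):
    """Append the phase-1 estimated error, then squared terms (assumed form)."""
    estimate = predict(phase1_coef, features)
    est_error = estimate - features[:, 0]  # column 0 holds the raw LCS reading
    stacked = np.column_stack([features, est_error])
    return np.column_stack([stacked, stacked ** 2])


def fit_two_phase(features, reference):
    """Phase 1 on the raw inputs, phase 2 on the error-augmented inputs."""
    phase1 = fit_least_squares(features, reference)
    phase2 = fit_least_squares(augment(features, phase1), reference)
    return phase1, phase2


def predict_two_phase(model, features):
    phase1, phase2 = model
    return predict(phase2, augment(features, phase1))


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic co-location data: the raw reading drifts non-linearly with humidity.
rng = np.random.default_rng(0)
true_pm = rng.uniform(5, 60, 400)
humidity = rng.uniform(20, 90, 400)
raw_lcs = true_pm * (1 + 0.00012 * humidity ** 2) + rng.normal(0, 1.5, 400)
features = np.column_stack([raw_lcs, humidity])

model = fit_two_phase(features, true_pm)
rmse_phase1 = rmse(predict(model[0], features), true_pm)
rmse_phase2 = rmse(predict_two_phase(model, features), true_pm)
```

Because the estimated error is computed only from the sensor inputs and the fitted phase-1 model, it is available at deployment time, when no reference measurement exists. In this sketch the second phase cannot fit the training data worse than the first, since the phase-1 estimate lies in the span of the phase-2 inputs; whether that translates to held-out data depends on the models and data used.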