Comparison of outlier detection approaches in a Smart Cities sensor data context

Open Access | Feb 2024

Figures & Tables

Figure 1:

Map of the PurpleAir network of sensors in Athens, Greece.

Figure 2:

Outliers by IQR method on daily data with (a) extremely high temperature values on a sensor, (b) continuous malfunction of a temperature sensor, (c) continuous malfunction of PM10.0 (μg/m3) on a sensor, (d) continuous malfunction of a temperature sensor. IQR, interquartile range; PM, particulate matter.
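
The IQR method named in this caption can be illustrated with a minimal sketch. It assumes the conventional Tukey fences (1.5 times the interquartile range beyond the first and third quartiles); the multiplier, the quartile interpolation, and the sample readings are assumptions for illustration and are not taken from the article.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier, assumed here rather than
    confirmed by the article.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = [v < lower or v > upper for v in values]
    return lower, upper, flags


# Hypothetical daily mean temperatures (°C) from one sensor;
# 85.0 mimics a malfunctioning reading.
daily_temp_c = [21.5, 22.0, 22.4, 23.1, 22.8, 21.9, 85.0, 22.6, 23.0, 22.2]

lower, upper, flags = iqr_outliers(daily_temp_c)
outliers = [v for v, flagged in zip(daily_temp_c, flags) if flagged]
print(f"fences: [{lower:.2f}, {upper:.2f}]  outliers: {outliers}")
```

Because the fences are built from quartiles, a single extreme reading barely moves them, which is why a malfunctioning sensor stands out clearly under this rule.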

Figure 3:

(a) Daily PM10.0 (μg/m3) outliers by IQR method with extremely high values due to sensor malfunction, (b) hourly PM10.0 (μg/m3) outliers by GESD method. GESD, generalized extreme studentized deviate; IQR, interquartile range; PM, particulate matter.

Figure 4:

Outliers/observations (%) before and after filter for (a) IQR method on daily data, (b) GESD method on daily data, (c) IQR method on hourly data, and (d) GESD method on hourly data. GESD, generalized extreme studentized deviate; IQR, interquartile range; PM, particulate matter.

Figure 5:

OK of hourly temperature data on 2019-05-24 00:00:00 UTC (a) with an extreme value, (b) without outliers. OK, ordinary kriging.

PurpleAir sensor data: Primary and Secondary data sets of Channels A and B. Gray cells represent the selected parameters of the study (PurpleAir, 2022).

PRIMARY

| Field | CHANNEL A | CHANNEL B |
|---|---|---|
| Field 1 | PM1.0 (CF = 1) μg/m3 | PM1.0 (CF = 1) μg/m3 |
| Field 2 | PM2.5 (CF = 1) μg/m3 | PM2.5 (CF = 1) μg/m3 |
| Field 3 | PM10.0 (CF = 1) μg/m3 | PM10.0 (CF = 1) μg/m3 |
| Field 4 | Uptime (min) | Free HEAP memory |
| Field 5 | RSSI (WiFi signal strength) | ADC0 (analog input) voltage |
| Field 6 | Temperature (°F) | FIRMWARE 2.5 and up: atmospheric pressure |
| Field 7 | Humidity (%) | FIRMWARE 4.10 and up: Bosch BSEC IAQ when BME680 gas sensor is present |
| Field 8 | PM2.5 (CF = ATM) μg/m3 | PM2.5 (CF = ATM) μg/m3 |

SECONDARY

| Field | CHANNEL A | CHANNEL B |
|---|---|---|
| Field 1 | 0.3 μm particles/dL | 0.3 μm particles/dL |
| Field 2 | 0.5 μm particles/dL | 0.5 μm particles/dL |
| Field 3 | 1.0 μm particles/dL | 1.0 μm particles/dL |
| Field 4 | 2.5 μm particles/dL | 2.5 μm particles/dL |
| Field 5 | 5.0 μm particles/dL | 5.0 μm particles/dL |
| Field 6 | 10.0 μm particles/dL | 10.0 μm particles/dL |
| Field 7 | PM1.0 (CF = ATM) μg/m3 | PM1.0 (CF = ATM) μg/m3 |
| Field 8 | PM10 (CF = ATM) μg/m3 | PM10 (CF = ATM) μg/m3 |

OK RMSE of hourly temperature data on 2019-05-24 00:00:00 UTC, before and after outlier filter for 10 repetitions

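| Repetition | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Before filter | 4,083.351 | 8,997.641 | 4,102.043 | 4,080.238 | 544.752 | 4,141.303 | 426.213 | 4,087.449 | 8,030.859 | 3,272.878 |
| After filter | 0.209 | 0.540 | 0.501 | 0.204 | 0.155 | 0.507 | 0.285 | 0.312 | 0.503 | 0.245 |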
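
The RMSE values in this table can be read against the usual definition, the square root of the mean squared difference between predicted and observed values. The sketch below is illustrative only: the `rmse` helper and the four-point validation sample are hypothetical, while the two lists of ten values are copied from the table. It shows how one extreme reading can dominate the error and then averages the reported repetitions.

```python
import math
from statistics import mean


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical validation points (°C): one prediction is distorted by an
# extreme sensor value, as in panel (a) of the kriging figure.
observed = [20.0, 21.0, 19.5, 22.0]
predicted_clean = [20.2, 20.7, 19.9, 21.8]
predicted_contaminated = [20.2, 20.7, 19.9, 4021.8]

rmse_clean = rmse(predicted_clean, observed)
rmse_contaminated = rmse(predicted_contaminated, observed)

# RMSE values copied from the table, one per repetition.
before_filter = [4083.351, 8997.641, 4102.043, 4080.238, 544.752,
                 4141.303, 426.213, 4087.449, 8030.859, 3272.878]
after_filter = [0.209, 0.540, 0.501, 0.204, 0.155,
                0.507, 0.285, 0.312, 0.503, 0.245]

mean_before = mean(before_filter)
mean_after = mean(after_filter)
print(f"clean: {rmse_clean:.3f}  contaminated: {rmse_contaminated:.1f}")
print(f"mean RMSE before: {mean_before:.3f}  after: {mean_after:.4f}")
```

Since errors are squared before averaging, a single extreme value dominates the result, which is consistent with the gap of several orders of magnitude between the two rows.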

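IQR and GESD outliers on daily data without duplicates, for temperature (°C), humidity (%), and PM1.0, PM2.5, and PM10.0 (μg/m3), before and after filter application

BEFORE FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 29,040 | 380 | 735 | 380 | 1.3 | 2.5 | 1.3 |
| Humidity (%) | 29,040 | 94 | 234 | 94 | 0.3 | 0.8 | 0.3 |
| PM1.0 cf_1 (μg/m3) | 29,437 | 665 | 1,302 | 665 | 2.3 | 4.4 | 2.3 |
| PM2.5 cf_1 (μg/m3) | 29,437 | 716 | 1,494 | 708 | 2.4 | 5.1 | 2.4 |
| PM10.0 cf_1 (μg/m3) | 29,437 | 751 | 1,552 | 751 | 2.6 | 5.3 | 2.6 |
| PM1.0 cf_atm (μg/m3) | 29,435 | 560 | 835 | 553 | 1.9 | 2.8 | 1.9 |
| PM2.5 cf_atm (μg/m3) | 29,437 | 596 | 926 | 596 | 2.0 | 3.1 | 2.0 |
| PM10.0 cf_atm (μg/m3) | 29,435 | 608 | 1,051 | 606 | 2.1 | 3.6 | 2.1 |

AFTER FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 28,552 | 221 | 579 | 221 | 0.8 | 2.0 | 0.8 |
| Humidity (%) | 29,040 | 94 | 234 | 94 | 0.3 | 0.8 | 0.3 |
| PM1.0 cf_1 (μg/m3) | 29,316 | 554 | 1,188 | 554 | 1.9 | 4.1 | 1.9 |
| PM2.5 cf_1 (μg/m3) | 29,316 | 592 | 1,360 | 584 | 2.0 | 4.6 | 2.0 |
| PM10.0 cf_1 (μg/m3) | 29,316 | 624 | 1,417 | 624 | 2.1 | 4.8 | 2.1 |
| PM1.0 cf_atm (μg/m3) | 29,316 | 443 | 713 | 436 | 1.5 | 2.4 | 1.5 |
| PM2.5 cf_atm (μg/m3) | 29,318 | 485 | 807 | 485 | 1.7 | 2.8 | 1.7 |
| PM10.0 cf_atm (μg/m3) | 29,316 | 495 | 930 | 493 | 1.7 | 3.2 | 1.7 |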
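
The columns of this table can be read as simple set and ratio operations: the "both methods" count is the intersection of the observations flagged by each method, and each percentage is a count divided by the number of observations. The sketch below assumes that reading; the `summarize_outliers` helper and the index sets are hypothetical, and only the temperature counts in the final check come from the table.

```python
def summarize_outliers(n_obs, iqr_idx, gesd_idx):
    """Count outliers per method, their overlap, and shares of all observations.

    iqr_idx and gesd_idx are sets of observation indices flagged by each
    method (hypothetical inputs for illustration).
    """
    both = iqr_idx & gesd_idx

    def pct(count):
        return round(100 * count / n_obs, 1)

    return {
        "iqr_n": len(iqr_idx),
        "gesd_n": len(gesd_idx),
        "both_n": len(both),
        "iqr_pct": pct(len(iqr_idx)),
        "gesd_pct": pct(len(gesd_idx)),
        "both_pct": pct(len(both)),
    }


# Hypothetical example: 200 observations, GESD flags a superset of IQR.
summary = summarize_outliers(
    n_obs=200,
    iqr_idx={3, 17, 42},
    gesd_idx={3, 17, 42, 88, 120},
)
print(summary)

# Consistency check against the temperature row before filtering:
# 380 IQR and 735 GESD outliers out of 29,040 observations.
temp_iqr_pct = round(100 * 380 / 29040, 1)
temp_gesd_pct = round(100 * 735 / 29040, 1)
print(temp_iqr_pct, temp_gesd_pct)
```

Under this reading, a "both methods" count equal to the IQR count means every IQR outlier in that row was also flagged by GESD.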

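Outliers of IQR and GESD methods on daily data for temperature (°C), humidity (%), and PM1.0, PM2.5, and PM10.0 (μg/m3), before and after filter application

BEFORE FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 45,740 | 1,094 | 1,932 | 1,034 | 2.4 | 4.2 | 2.3 |
| Humidity (%) | 45,740 | 260 | 556 | 260 | 0.6 | 1.2 | 0.6 |
| PM1.0 cf_1 (μg/m3) | 46,305 | 1,655 | 2,745 | 1,655 | 3.6 | 5.9 | 3.6 |
| PM2.5 cf_1 (μg/m3) | 46,305 | 1,822 | 3,042 | 1,815 | 3.9 | 6.6 | 3.9 |
| PM10.0 cf_1 (μg/m3) | 46,305 | 1,869 | 3,146 | 1,862 | 4.0 | 6.8 | 4.0 |
| PM1.0 cf_atm (μg/m3) | 46,299 | 1,498 | 2,019 | 1,488 | 3.2 | 4.4 | 3.2 |
| PM2.5 cf_atm (μg/m3) | 46,305 | 1,632 | 2,193 | 1,537 | 3.5 | 4.7 | 3.3 |
| PM10.0 cf_atm (μg/m3) | 46,299 | 1,762 | 2,558 | 1,626 | 3.8 | 5.5 | 3.5 |

AFTER FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 44,928 | 624 | 1,470 | 624 | 1.4 | 3.3 | 1.4 |
| Humidity (%) | 45,740 | 260 | 556 | 260 | 0.6 | 1.2 | 0.6 |
| PM1.0 cf_1 (μg/m3) | 46,091 | 1,386 | 2,449 | 1,378 | 3.0 | 5.3 | 3.0 |
| PM2.5 cf_1 (μg/m3) | 46,091 | 1,549 | 2,738 | 1,545 | 3.4 | 5.9 | 3.4 |
| PM10.0 cf_1 (μg/m3) | 46,091 | 1,598 | 2,854 | 1,593 | 3.5 | 6.2 | 3.5 |
| PM1.0 cf_atm (μg/m3) | 46,089 | 1,232 | 1,741 | 1,225 | 2.7 | 3.8 | 2.7 |
| PM2.5 cf_atm (μg/m3) | 46,095 | 1,376 | 1,897 | 1,282 | 3.0 | 4.1 | 2.8 |
| PM10.0 cf_atm (μg/m3) | 46,089 | 1,475 | 2,231 | 1,340 | 3.2 | 4.8 | 2.9 |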

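Outliers of IQR and GESD methods on hourly data for temperature (°C), humidity (%), and PM1.0, PM2.5, and PM10.0 (μg/m3), before and after filter application

BEFORE FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 1,074,342 | 5,643 | 7,471 | 4,272 | 0.4 | 0.7 | 0.4 |
| Humidity (%) | 1,074,342 | 6,373 | 7,196 | 6,026 | 0.6 | 0.7 | 0.6 |
| PM1.0 cf_1 (μg/m3) | 1,087,434 | 49,742 | 70,944 | 48,046 | 4.4 | 6.5 | 4.4 |
| PM2.5 cf_1 (μg/m3) | 1,087,434 | 52,848 | 73,647 | 51,091 | 4.7 | 6.8 | 4.7 |
| PM10.0 cf_1 (μg/m3) | 1,087,434 | 54,936 | 75,768 | 53,141 | 4.9 | 7.0 | 4.9 |
| PM1.0 cf_atm (μg/m3) | 1,087,362 | 37,216 | 46,946 | 34,170 | 3.4 | 4.3 | 3.1 |
| PM2.5 cf_atm (μg/m3) | 1,087,434 | 38,954 | 46,011 | 34,936 | 3.6 | 4.2 | 3.2 |
| PM10.0 cf_atm (μg/m3) | 1,087,362 | 49,344 | 67,686 | 45,595 | 4.5 | 6.2 | 4.2 |

AFTER FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 1,056,463 | 2,984 | 4,682 | 2,812 | 0.3 | 0.4 | 0.3 |
| Humidity (%) | 1,074,342 | 6,373 | 7,196 | 6,026 | 0.6 | 0.7 | 0.6 |
| PM1.0 cf_1 (μg/m3) | 1,082,638 | 46,121 | 67,057 | 44,444 | 4.3 | 6.2 | 4.1 |
| PM2.5 cf_1 (μg/m3) | 1,082,631 | 49,387 | 69,968 | 47,650 | 4.6 | 6.5 | 4.4 |
| PM10.0 cf_1 (μg/m3) | 1,082,619 | 50,824 | 71,449 | 49,052 | 4.7 | 6.6 | 4.5 |
| PM1.0 cf_atm (μg/m3) | 1,082,576 | 33,896 | 43,637 | 30,887 | 3.1 | 4.0 | 2.9 |
| PM2.5 cf_atm (μg/m3) | 1,082,646 | 35,257 | 42,116 | 31,249 | 3.3 | 3.9 | 2.9 |
| PM10.0 cf_atm (μg/m3) | 1,082,573 | 45,929 | 63,842 | 42,188 | 4.2 | 5.9 | 3.9 |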

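IQR and GESD outliers on hourly data without duplicates, for temperature (°C), humidity (%), and PM1.0, PM2.5, and PM10.0 (μg/m3), before and after filter application

BEFORE FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 682,028 | 3,533 | 4,763 | 3,531 | 0.5 | 0.7 | 0.5 |
| Humidity (%) | 682,028 | 2,685 | 3,817 | 2,685 | 0.4 | 0.6 | 0.4 |
| PM1.0 cf_1 (μg/m3) | 691,210 | 28,161 | 40,473 | 28,161 | 4.1 | 5.9 | 4.1 |
| PM2.5 cf_1 (μg/m3) | 691,210 | 29,515 | 42,624 | 29,515 | 4.3 | 6.2 | 4.3 |
| PM10.0 cf_1 (μg/m3) | 691,210 | 30,364 | 43,831 | 30,364 | 4.4 | 6.3 | 4.4 |
| PM1.0 cf_atm (μg/m3) | 691,159 | 18,076 | 22,099 | 18,074 | 2.6 | 3.2 | 2.6 |
| PM2.5 cf_atm (μg/m3) | 691,210 | 18,874 | 23,095 | 18,866 | 2.7 | 3.3 | 2.7 |
| PM10.0 cf_atm (μg/m3) | 691,159 | 22,396 | 33,156 | 22,020 | 3.2 | 4.8 | 3.2 |

AFTER FILTER

| Parameter | Observations (n) | IQR outliers (n) | GESD outliers (n) | Outlier observations in both methods (n) | IQR outliers/observations (%) | GESD outliers/observations (%) | Both methods/observations (%) |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | 671,277 | 2,068 | 3,486 | 2,066 | 0.3 | 0.5 | 0.3 |
| Humidity (%) | 682,028 | 2,685 | 3,817 | 2,685 | 0.4 | 0.6 | 0.4 |
| PM1.0 cf_1 (μg/m3) | 688,544 | 26,450 | 38,706 | 26,450 | 3.8 | 5.6 | 3.8 |
| PM2.5 cf_1 (μg/m3) | 688,537 | 27,692 | 40,662 | 27,692 | 4.0 | 5.9 | 4.0 |
| PM10.0 cf_1 (μg/m3) | 688,527 | 28,398 | 41,690 | 28,398 | 4.1 | 6.1 | 4.1 |
| PM1.0 cf_atm (μg/m3) | 688,500 | 16,234 | 20,235 | 16,232 | 2.4 | 2.9 | 2.4 |
| PM2.5 cf_atm (μg/m3) | 688,549 | 17,050 | 21,219 | 17,042 | 2.5 | 3.1 | 2.5 |
| PM10.0 cf_atm (μg/m3) | 688,497 | 20,508 | 31,115 | 20,132 | 3.0 | 4.5 | 2.9 |
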
Language: English
Submitted on: Sep 6, 2023
Published on: Feb 14, 2024
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Sofia Zafeirelli, Dimitris Kavroudakis, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.