A Deep-Learning Method for the Prediction of Socio-Economic Indicators from Street-View Imagery Using a Case Study from Brazil

Open Access | Feb 2022

Figures & Tables

Figure 1

The 30 municipalities of the Vale do Ribeira in southeastern Brazil. The boundary between the States within which the Vale do Ribeira lies is shown by a yellow line. Smaller divisions within each municipality are ‘census sectors’, each containing about 300 households. These census sectors are the finest subdivision available in IBGE publications (https://ibge.gov.br).

Figure 2

Workflow diagram showing the four main steps: 1) dataset preparation, 2) feature extraction, 3) training, and 4) validation. Street image samples were obtained from Google Street View.

Table 1

Statistics of census sectors, municipalities and states in the Vale do Ribeira.

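| State | Municipality | Num. census sectors | State | Municipality | Num. census sectors |
|---|---|---|---|---|---|
| PR | Adrianópolis | 21 | SP | Itariri | 61 |
| SP | Apiaí | 55 | SP | Itaóca | 9 |
| SP | Barra Do Chapéu | 11 | SP | Jacupiranga | 26 |
| SP | Barra Do Turvo | 14 | SP | Juquitiba | 54 |
| PR | Bocaiúva Do Sul | 23 | SP | Juquiá | 36 |
| SP | Cajati | 38 | SP | Miracatu | 48 |
| SP | Cananéia | 27 | SP | Pariquera-açu | 27 |
| PR | Cerro Azul | 42 | SP | Pedro De Toledo | 17 |
| PR | Doutor Ulysses | 13 | SP | Registro | 69 |
| SP | Eldorado | 30 | SP | Ribeira | 8 |
| SP | Iguape | 60 | PR | Rio Branco Do Sul | 58 |
| SP | Ilha Comprida | 28 | SP | Sete Barras | 26 |
| SP | Iporanga | 19 | SP | São Lourenço Da Serra | 27 |
| PR | Itaperuçu | 34 | SP | Tapiraí | 15 |
| SP | Itapirapuã Paulista | 9 | PR | Tunas Do Paraná | 12 |
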
Table 2

The Income Score and income range for each HDI income class calculated on a monthly basis. The details of the source data are found in section 6.

| Income score | HDI–income value | Absolute income reference | USD (6 April 2021) |
|---|---|---|---|
| 1 | HDI [0.00–0.20] | R$ 8.00–R$ 813.00 | $1.41–$143.52 |
| 2 | HDI [0.20–0.40] | R$ 813.00–R$ 1618.00 | $143.52–$285.63 |
| 3 | HDI [0.40–0.60] | R$ 1618.00–R$ 2423.00 | $285.63–$427.74 |
| 4 | HDI [0.60–0.80] | R$ 2423.00–R$ 3228.00 | $427.74–$569.85 |
| 5 | HDI [0.80–1.00] | R$ 3228.00–R$ 4033.00 | $569.85–$711.97 |

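
The class boundaries in Table 2 are evenly spaced between R$ 8.00 and R$ 4033.00, which suggests a simple lookup from monthly income to income score. The following is a minimal sketch of such a lookup, not the authors' code. The function names, the linear rescaling to the HDI income value, the clamping of out-of-range incomes, and the choice to assign a value lying exactly on a shared boundary to the higher class are all assumptions made for illustration.

```python
from bisect import bisect_right

# Class boundaries in R$ per month, copied from Table 2.
EDGES_BRL = [8.0, 813.0, 1618.0, 2423.0, 3228.0, 4033.0]


def hdi_income_value(monthly_income_brl):
    """Rescale a monthly income linearly to [0, 1] (assumed from the even bins)."""
    lo, hi = EDGES_BRL[0], EDGES_BRL[-1]
    clamped = min(max(monthly_income_brl, lo), hi)  # assumption: clamp outliers
    return (clamped - lo) / (hi - lo)


def income_score(monthly_income_brl):
    """Return the income score (1 to 5) for a monthly income in R$."""
    lo, hi = EDGES_BRL[0], EDGES_BRL[-1]
    clamped = min(max(monthly_income_brl, lo), hi)
    # Only the interior edges separate classes; a value equal to an
    # interior edge is placed in the higher class (assumption).
    return bisect_right(EDGES_BRL[1:-1], clamped) + 1


if __name__ == "__main__":
    for income in (8.0, 500.0, 813.0, 2000.0, 4033.0):
        print(income, round(hdi_income_value(income), 3), income_score(income))
```

The table lists each boundary value in two adjacent classes, so the tie-breaking rule above is one of two equally plausible readings.
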
Figure 3

The sampling regime, illustrated for the Juquitiba municipality. The red dots represent the predefined geolocated points searched by the crawler algorithm.

Figure 4

Geolocations at which Google Street View (GSV) images were available. Census sectors with GSV coverage are shown in blue; sectors for which no GSV imagery was available are shown in white.

Figure 5

Two randomly selected panoramic images for each income score from the Vale do Ribeira. Each image sample was captured at four angles.

Figure 6

The distribution of images sampled (112,368 in total) for each income score for both the training and the testing datasets.

Figure 7

The distribution of the modal income score for each census sector, shown for (a) the observed (true) income score, and (b) the predicted income score. The scale ranges from 1 to 5, corresponding to the lowest to highest income score.
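
Taking the mode per census sector can be sketched as follows. This is an illustrative example, not the authors' implementation: the function name, the input layout (one sector identifier and one score per image), and the tie-breaking rule (the lowest score wins when several scores are equally frequent) are assumptions.

```python
from collections import Counter, defaultdict


def sector_mode(sector_ids, scores):
    """Return {sector_id: most frequent income score} over per-image scores."""
    by_sector = defaultdict(list)
    for sector, score in zip(sector_ids, scores):
        by_sector[sector].append(score)

    modes = {}
    for sector, values in by_sector.items():
        counts = Counter(values)
        top = max(counts.values())
        # Assumption: break ties in favour of the lowest score.
        modes[sector] = min(v for v, c in counts.items() if c == top)
    return modes


if __name__ == "__main__":
    ids = ["A", "A", "A", "B", "B"]
    predicted = [2, 2, 3, 5, 4]
    print(sector_mode(ids, predicted))
```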

Figure 8

Differences between the real and predicted labels for each census sector of the Vale do Ribeira. The grey scale shows the difference between the real and predicted labels (Eq. 3), with d = 3 being a large difference and d = 0 a perfect match. The red dots represent geolocations for which street images were available. Census sectors for which street imagery was not available are shaded green.
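
Eq. 3 is not reproduced in this section. The sketch below assumes it is the absolute difference between the real and predicted income score of a sector, which is consistent with d = 0 denoting a perfect match. The function name and the dictionary inputs are hypothetical, and sectors without a prediction (those with no street imagery) are simply skipped.

```python
def sector_differences(real, predicted):
    """Return {sector_id: |real - predicted|} for sectors that have a prediction.

    Assumption: the difference d is the absolute difference of the two labels.
    """
    return {
        sector: abs(real[sector] - predicted[sector])
        for sector in real
        if sector in predicted  # sectors without imagery have no prediction
    }


if __name__ == "__main__":
    real = {"A": 2, "B": 5, "C": 1}
    predicted = {"A": 2, "B": 3}  # no imagery for sector "C"
    print(sector_differences(real, predicted))
```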

Figure 9

Distribution of model predictions, showing for each census sector (points) the prediction difference of the income score and the number of images. Each income score category includes the 500 census sectors available in the Vale do Ribeira. The scale bar shows the difference from 0 (exact match) to 3 (poor match) between the real label and the predicted label (Eq. 3).

Table 3

Prediction results on the test set for each fold. Each column reports a different metric; from left to right: the percentage of correctly predicted classes within an error margin of ±0 and ±1, the mean absolute error (MAE), Pearson’s correlation coefficient (r), and Kendall’s Tau rank correlation coefficient (τ).

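| Fold | Accuracy, error margin ±0 (%) | Accuracy, error margin ±1 (%) | MAE | r | τ |
|---|---|---|---|---|---|
| Fold 0 | 53 | 78 | 0.21 | 0.74 | 0.30 |
| Fold 1 | 55 | 80 | 0.21 | 0.72 | 0.32 |
| Fold 2 | 56 | 80 | 0.20 | 0.71 | 0.32 |
| Fold 3 | 57 | 81 | 0.23 | 0.67 | 0.34 |
| Fold 4 | 56 | 81 | 0.22 | 0.69 | 0.33 |
| Average | 55 | 80 | 0.21 | 0.71 | 0.32 |
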
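
The metrics reported in Table 3 can be computed from paired lists of observed and predicted income scores, as in the sketch below. It is a plain-Python illustration under stated assumptions, not the authors' evaluation code. The function names are hypothetical, the tau-b variant of Kendall's coefficient is assumed because the scores contain many ties, and the source does not say whether the MAE in the table is normalised, so the values produced here are not expected to match it.

```python
from math import sqrt


def accuracy_within(y_true, y_pred, margin):
    """Percentage of predictions whose absolute error is at most `margin`."""
    hits = sum(abs(t - p) <= margin for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean of |true - predicted| on the raw score scale."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected), computed over all pairs in O(n^2)."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


if __name__ == "__main__":
    observed = [1, 2, 3, 4, 5]
    predicted = [1, 2, 4, 4, 3]
    print(accuracy_within(observed, predicted, 0))
    print(accuracy_within(observed, predicted, 1))
    print(mean_absolute_error(observed, predicted))
    print(pearson_r(observed, predicted))
    print(kendall_tau_b(observed, predicted))
```
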
Figure 10

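Confusion matrix between observed and predicted income scores. Each cell gives the percentage of predictions for that pair of observed and predicted scores; cells on the diagonal are correct predictions. The best case would be a matrix with all predictions on the diagonal, indicating perfect accuracy.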
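
A confusion matrix expressed in percentages can be built as in the following sketch. The function name is hypothetical, and the row-wise normalisation (each observed class sums to 100%) is an assumption, since the caption does not state how the percentages are normalised.

```python
def confusion_matrix_pct(y_true, y_pred, classes=(1, 2, 3, 4, 5)):
    """Return a matrix where rows are observed and columns are predicted scores.

    Each row is normalised to percentages (assumption: row-wise normalisation).
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for classes that never occur are left as zeros.
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    observed = [1, 1, 2, 2, 3]
    predicted = [1, 2, 2, 2, 3]
    for row in confusion_matrix_pct(observed, predicted, classes=(1, 2, 3)):
        print(row)
```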

Language: English
Submitted on: Apr 22, 2021
Accepted on: Jan 29, 2022
Published on: Feb 11, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Jeaneth Machicao, Alison Specht, Danton Vellenich, Leandro Meneguzzi, Romain David, Shelley Stall, Katia Ferraz, Laurence Mabile, Margaret O’Brien, Pedro Corrêa, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.