Table 1
City-Data Corpus thread titles, labels, sample counts, date ranges, and content summaries per forum.
| THREAD TOPIC TITLE | LABEL | SAMPLES | DATE RANGE | SUMMARY |
|---|---|---|---|---|
| How’s everyone doing amongst the Coronavirus shut down? (home, movies) (Advameg, Inc., 2020) | coronavirus | 481 | 2020/03/16 – 2020/07/27 | Posts discuss experiences with public shutdowns in the opening months of the COVID-19 pandemic in Philadelphia. Content dwells more on governmental interventions and the status of public services than on the etiology or effects of COVID-19. |
| Official Greater Philadelphia Area Crime Thread (Advameg, Inc., 2013) | crime | 2,402 | 2013/04/11 – 2020/01/15 | Posts discuss and share information about crime in Philadelphia. |
| Official Philadelphia Metro Crime Thread (York, Chester: apartment complexes, houses, unemployment) (Advameg, Inc., 2012a) | crime | 1,284 | 2012/01/12 – 2013/03/11 | Posts discuss and share information about crime in the Philadelphia metro area. |
| Philadelphia 2035 (Houston: foreclosure, neighborhoods, wage) (Advameg, Inc., 2011) | plan | 6,796 | 2011/06/14 – 2020/01/14 | Posts discuss Philadelphia’s 2035 civic renovation plan, authored by the Philadelphia City Planning Commission (2023). |
| Retail coming to Philadelphia (Penn, Burlington: real estate, house, buying) (Advameg, Inc., 2012b) | retail | 4,045 | 2012/11/27 – 2020/01/20 | Posts discuss new retail business developments in Philadelphia. |
Table 2
Tabular data model for City-Data.com forum posts.
| POST_ID | POST_BODY | POST | DATETIME | QUOTE_ID | QUOTE_BODY | QUOTE | FORUM |
|---|---|---|---|---|---|---|---|
| Numerical post identifier | Post HTML | Post text | YYYY-MM-DD HH:MM:SS | Numerical post identifier | Quoted reply HTML | Quoted reply text | Forum title label |
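
The data model in Table 2 can be illustrated as a typed record. The following is a minimal sketch, not the authors' implementation: the class name `ForumPost`, the field name `posted_at`, the `from_row` helper, and the choice to make the three quote fields optional (on the assumption that some posts do not quote another post) are all hypothetical. The example row uses placeholder text and a made-up timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Matches the YYYY-MM-DD HH:MM:SS pattern given for the DATETIME column.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass(frozen=True)
class ForumPost:
    """One row of the tabular model; names here are illustrative only."""
    post_id: int                # POST_ID
    post_body: str              # POST_BODY (post HTML)
    post: str                   # POST (post text)
    posted_at: datetime         # DATETIME
    quote_id: Optional[int]     # QUOTE_ID (assumed empty when nothing is quoted)
    quote_body: Optional[str]   # QUOTE_BODY (quoted reply HTML)
    quote: Optional[str]        # QUOTE (quoted reply text)
    forum: str                  # FORUM (forum title label)

    @classmethod
    def from_row(cls, row):
        """Build a record from a dict keyed by the Table 2 column names."""
        quote_id = row.get("QUOTE_ID")
        return cls(
            post_id=int(row["POST_ID"]),
            post_body=row["POST_BODY"],
            post=row["POST"],
            posted_at=datetime.strptime(row["DATETIME"], DATETIME_FORMAT),
            quote_id=int(quote_id) if quote_id else None,
            quote_body=row.get("QUOTE_BODY") or None,
            quote=row.get("QUOTE") or None,
            forum=row["FORUM"],
        )


# Hypothetical row: the text and timestamp are placeholders, not corpus data.
example = ForumPost.from_row({
    "POST_ID": "1",
    "POST_BODY": "<div>placeholder</div>",
    "POST": "placeholder",
    "DATETIME": "2020-03-16 12:00:00",
    "QUOTE_ID": "",
    "QUOTE_BODY": "",
    "QUOTE": "",
    "FORUM": "coronavirus",
})
```

Keeping the HTML and plain-text versions side by side, as the table does, lets downstream steps work on clean text while retaining the markup needed to recover quoted replies.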
Table 3
SE-Topics Coherence and Diversity Scores.
| TOPIC MODEL TYPE | UMASS | PUW | JD | WE-CD |
|---|---|---|---|---|
| SE-Topics post | –4.10 | 0.74 | 0.85 | 0.06 |
| SE-Topics guided topic titles | –2.76 | 0.54 | 0.63 | 0.06 |
| SE-Topics guided initial posts | –2.56 | 0.52 | 0.62 | 0.07 |
| SE-Topics guided high degree | –2.53 | 0.54 | 0.63 | 0.06 |
| SE-Topics threads | –5.46 | 0.74 | 0.86 | 0.12 |
| SE-Topics guided threads topic titles | –4.84 | 0.69 | 0.79 | 0.06 |
| SE-Topics guided threads initial posts | –4.57 | 0.67 | 0.78 | 0.21 |
| SE-Topics guided threads high degree | –4.26 | 0.67 | 0.78 | 0.21 |
| MEAN | –3.88 | 0.63 | 0.74 | 0.11 |
| MAX | –2.53 | 0.74 | 0.85 | 0.21 |
| Q3 | –2.66 | 0.71 | 0.82 | 0.17 |
| MEDIAN | –4.18 | 0.67 | 0.78 | 0.07 |
| Q1 | –4.70 | 0.54 | 0.62 | 0.06 |
| MIN | –5.46 | 0.52 | 0.62 | 0.06 |
Table 4
LDA Topic Modeling Coherence and Diversity Scores.
| TOPIC MODEL TYPE | UMASS | PUW | JD | WE-CD |
|---|---|---|---|---|
| LDA posts | –2.59 | 1.000 | 0.95 | 0.30 |
| guided LDA (topic titles) | –2.63 | 1.000 | 0.95 | 0.27 |
| guided LDA (high degree) | –2.10 | 0.950 | 0.95 | 0.29 |
| guided LDA (initial posts) | –2.36 | 0.975 | 0.96 | 0.26 |
| LDA threads | –2.10 | 0.950 | 0.95 | 0.30 |
| LDA threads (high degree) | –2.15 | 0.950 | 0.95 | 0.31 |
| LDA threads (topic titles) | –2.32 | 0.950 | 0.95 | 0.30 |
| LDA threads (initial post) | –2.22 | 0.900 | 0.94 | 0.32 |
| MEAN | –2.31 | 0.95 | 0.95 | 0.29 |
| MAX | –2.10 | 1.00 | 0.96 | 0.32 |
| Q3 | –2.12 | 0.98 | 0.95 | 0.30 |
| MEDIAN | –2.27 | 0.95 | 0.95 | 0.30 |
| Q1 | –2.47 | 0.95 | 0.95 | 0.28 |
| MIN | –2.63 | 0.90 | 0.94 | 0.26 |
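
The summary rows of Table 4 can be recomputed from the per-model rows. The sketch below is a hedged illustration: the function name `summarize` is hypothetical, and the quartile convention (Q1 and Q3 taken as the medians of the lower and upper halves of the sorted scores) is an inference from the reported UMASS values rather than something the source states. Because the inputs are the rounded values printed in the table, results can differ in the last digit from statistics computed on unrounded scores.

```python
from statistics import mean, median


def summarize(scores):
    """Return mean, extrema, median, and median-of-halves quartiles."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]                  # values below the median position
    upper = ordered[len(ordered) - half:]   # values above the median position
    return {
        "MEAN": mean(ordered),
        "MAX": ordered[-1],
        "Q3": median(upper),
        "MEDIAN": median(ordered),
        "Q1": median(lower),
        "MIN": ordered[0],
    }


# UMASS coherence scores of the eight LDA models, as printed in Table 4.
lda_umass = [-2.59, -2.63, -2.10, -2.36, -2.10, -2.15, -2.32, -2.22]
lda_umass_summary = summarize(lda_umass)

for name, value in lda_umass_summary.items():
    print(f"{name}: {value:.3f}")
```

Under this convention the UMASS quartiles come out at about –2.475 and –2.125, in line with the –2.47 and –2.12 reported in the table.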

Figure 1
Boxplots of SE-Topics and LDA Coherence Scores.

Figure 2
Boxplots of SE-Topics and LDA Diversity Scores.
Table 5
SE-Topics Guided Post (high degree nodes).
| TOPIC | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | people | white | crime | black | time | year | murder | think | city | police |
| 1 | city | philly | philadelphia | people | street | area | year | think | center | neighborhood |
| 2 | building | city | street | project | tower | think | center | market | development | broad |
| 3 | store | retail | city | mall | retailer | market | walnut | center | think | location |
Table 6
LDA (High Degree) Topic Model.
| TOPIC | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | money | state | local | people | issue | care | neighborhood | income | help | white |
| 1 | store | retail | mall | shopping | retailer | location | center | shop | gallery | also |
| 2 | city | think | philadelphia | philly | people | year | time | even | area | much |
| 3 | street | market | building | walnut | space | chestnut | center | east | block | south |
Table 7
Initial post IDs per City-Data.com Corpus forum.
| FORUM | POST ID |
|---|---|
| coronavirus | 758124 |
| crime | 908813 |
| metro | 251839 |
| plan | 958364 |
| retail | 710692 |
Table 8
High degree City-Data.com Corpus forum posts used for guided topic modeling.
| FORUM | POST ID | DEGREE |
|---|---|---|
| coronavirus | 57681616 | 5 |
| crime | 50175634 | 5 |
| metro | 22518391 | 5 |
| plan | 38332691 | 5 |
| retail | 38055575 | 5 |
Table 9
BERTopic Coherence and Diversity Scores for City-Data.com Corpus.
| TOPIC MODEL TYPE | UMASS | PUW | JD | WE-CD |
|---|---|---|---|---|
| BERTopic posts | –5.12 | 0.52 | 0.78 | 0.12 |
| BERTopic threads | –5.59 | 0.49 | 0.79 | 0.28 |
Table 10
BERTopic Post Topic Model. Note that Topic –1 indicates outliers.
| TOPIC | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| –1 | city | like | people | just | philly | philadelphia | think | dont | street | center |
| 0 | city | like | think | new | just | people | building | philadelphia | dont | philly |
| 1 | white | people | city | dont | like | crime | im | year | just | murders |
| 2 | like | inga | news | just | dont | people | article | think | does | thread |
| 3 | bau | hello | nasty | bart | fancy | update | haha | awful | ok | finally |
Table 11
BERTopic Thread Level Topic Model. Topic –1 indicates outliers.
| TOPIC | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| –1 | city | like | people | philly | just | think | dont | philadelphia | new | street |
| 0 | city | like | new | store | think | just | philadelphia | retail | stores | people |
| 1 | people | crime | city | dont | just | like | white | im | know | year |
| 2 | hmmm | hello | fancy | update | haha | | | | | |
| 3 | inga | like | just | news | article | dont | read | writing | people | does |
