
Figure 1
An example of question data (body text) and metadata (marked as XML elements).

Figure 2
Development of member population.

Figure 3
Geographic distribution.

Figure 4
Working sectors (basic assumptions: students are counted under academia, and European Union institutions under international organisations).

Figure 5
Academic background.

Figure 6
Modified processing steps and tools.
Table 1
Sampling overview.
| Q/Q+AKE (PERIOD) | PERIOD FROM-TILL (MONTH/YEAR) | TOTAL NUMBER OF Q (N) | SAMPLE SIZE (S) | PERCENTAGE OF THE TOTAL NUMBER OF Q THAT THE SAMPLE REPRESENTS (%) |
|---|---|---|---|---|
| IZ’ (17th) | 10/2015–6/2019 | 32,910 | 500 | 1.5 |
| ΙΣT’ (16th) | 2/2015–8/2015 | 4,605 | 500 | 10.9 |
| IE’ (15th) | 6/2012–12/2014 | 27,377 | 500 | 1.8 |
| ΙΓ’ (13th) | 10/2009–4/2012 | 35,103 | 500 | 1.4 |
| Total | 10/2009–6/2019 | 99,995 | 2,000 | 2.0 |
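The percentage column in Table 1 follows from dividing each sample size by the period's total number of questions. A minimal sketch of that arithmetic, using only the figures in the table; the names `SAMPLING` and `sample_share` are illustrative and do not come from the source:

```python
# Figures copied from Table 1: period -> (total questions N, sample size S).
SAMPLING = {
    "17th": (32_910, 500),
    "16th": (4_605, 500),
    "15th": (27_377, 500),
    "13th": (35_103, 500),
}


def sample_share(total, sample):
    """Return the sample as a percentage of the total, rounded to one decimal."""
    return round(100 * sample / total, 1)


# Per-period shares, plus the totals reported in the last row of the table.
shares = {period: sample_share(n, s) for period, (n, s) in SAMPLING.items()}
total_n = sum(n for n, _ in SAMPLING.values())
total_s = sum(s for _, s in SAMPLING.values())
overall_share = sample_share(total_n, total_s)
```

Because every period contributes the same 500 questions, the share varies only with the period's total, which is why the short 16th period is sampled far more densely than the others.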
Table 2
The corpus size and some basic stylometric indices.
| TEXT FILE(S) | TOKENS | TYPES | STANDARDISED TTR | MEAN WORD LENGTH (IN CHARACTERS) |
|---|---|---|---|---|
| ΙZ sample data final.txt | 162,874 | 21,409 | 50.10 | 5.71 |
| ΙΣT sample text final.txt | 181,147 | 21,668 | 49.08 | 5.67 |
| ΙE sample final.txt | 173,741 | 21,378 | 49.35 | 5.65 |
| ΙΓ sample final.txt | 121,103 | 16,238 | 47.16 | 5.65 |
| All four periods | 638,865 | 43,025 | 49.05 | 5.67 |
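The standardised TTR column in Table 2 differs from a plain type/token ratio, which falls as a text gets longer. The sketch below shows one common way to compute a standardised TTR: average the plain TTR over consecutive fixed-size chunks. The source does not name the tool or the chunk size used, so the 1,000-token default, the dropping of a trailing partial chunk, and the function names are all assumptions made for illustration:

```python
def ttr(tokens):
    """Plain type/token ratio as a percentage."""
    return 100 * len(set(tokens)) / len(tokens)


def standardised_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    Assumption: a trailing partial chunk is ignored; if the text is shorter
    than one chunk, the plain TTR is returned instead.
    """
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


# Toy input: 8 tokens, 2 distinct types, split into two chunks of 4.
toy = ["a", "b", "a", "b", "a", "b", "a", "b"]
plain = ttr(toy)
standardised = standardised_ttr(toy, chunk_size=4)
```

On the toy input the plain ratio is lower than the standardised one because the same two types keep recurring as the text grows. The same effect shows up in Table 2: the whole corpus has 43,025 types over 638,865 tokens, a plain ratio of about 6.7%, against a standardised TTR of 49.05.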
Table 4
Corpus attributes.
| ATTRIBUTE | VALUE |
|---|---|
| Text format | Plain text (txt) |
| Encoding | UTF-8 |
| Data format | |
| Creation date | April–June 2019 |
| Publication date | 10 May 2021 |
| Language | Greek |
| Licence | CC BY-NC 2.0 |
| Repository | Zenodo: https://zenodo.org/record/4748989 |
| DOI | 10.5281/zenodo.4748989 |
