Spatial heterogeneity of population distribution in Poland: a geographically weighted regression approach to Zipf's law

Andrzej Cieślik

doi:10.2478/mgrsd-2025-0047

Introduction

With the collapse of communism in the late 1980s, Poland reoriented its international economic and political relations from the East to the West and started its integration with the European Union (EU). The most important part of the liberalisation process was the so-called Europe Agreement signed on 16 December 1991, which entered into force on 1 February 1994. The commercial part of the Europe Agreement, which allowed for a limited trade liberalisation, was implemented on 1 March 1992 under the so-called Interim Agreement. Gradual dismantling of barriers to international trade and foreign direct investment (FDI) was an important part of the integration process with the West, which culminated in Poland's accession to the EU on 1 May 2004. Therefore, international mobility of goods, services, and factors of production between Poland and the other EU member states substantially increased (Cieślik 2017, 2018; Cieślik & Tarsalewska 2023).

Trade liberalisation with the EU was complemented by the agreement with the European Free Trade Association (EFTA) member countries. This agreement was modelled on the Europe Agreement and was signed on December 10, 1992. It entered into force on November 15, 1993. In addition to economic integration with West Poland, the Czech and Slovak Federative Republic and Hungary, who signed on December 21, 1992. This agreement established the Central European Free Trade Area (CEFTA), which entered into force on March 1, 1993. The main aim of the founding countries was their full integration and membership in the EU. Therefore, from the beginning, trade integration under the CEFTA agreement was treated only as a supplement to the Europe Agreement covering countries in the region that were not members of the EU at that time (Cieślik 2007; Cieślik & Hagemejer 2011).

However, while there was a removal of barriers to trade in goods and services and mobility of factors of production with the West and the South, Poland's eastern frontiers remained relatively closed. This asymmetry in trade liberalisation and inflows of FDI from the West might have affected the distribution of economic activity and population within the country (Cieślik 2005a, b; 2013; Cieślik & Ryan 2005; Chidlow et al. 2009). People tend to migrate to the largest Polish cities. From the point of view of innovativeness and productivity, the regions with the largest cities are in the most favourable situation (Arendt & Grabowski 2019). In addition to inflows of people to the largest cities, the higher attractiveness of powiats located in the western part of Poland compared to the powiats located in the eastern part of Poland is reflected by the higher number of foreign workers (Górny & Śleszyński 2019).

Although international economic integration is often perceived as being beneficial at the country level, an uneven benefit distribution across different areas within integrating countries may occur (Krugman 1996; Baldwin et al. 2011). As a result, Poland may experience increasing spatial inequalities that would constitute a cohesion policy problem that absorbs large amounts of EU structural funds (Cieślik & Rokicki 2017; Misiak 2022; Cieślik & Misiak 2022, 2024; Kijek & Jóźwik 2023). Therefore, the main aim of this study is to examine the spatial heterogeneity of Zipf's law and explore its evolution over time using a geographically weighted regression (GWR) on 380 Polish powiats for 1995, 2004, and 2020.

Our contribution to the literature is two-fold. First, we analysed the temporal dimension of Zipf's law, which is a field in which there has been relatively little research. We have also explored this in the context of the new member states (NMS) after they acceded to the EU. Second, the prior literature generally tends not to consider the spatial heterogeneity issue. To date, most empirical studies have estimated unique Pareto exponents by assuming that they are spatially invariant. However, this is unlikely to be the case because the locations may differ from each other in terms of their economic structures, industrial variety, and other characteristics. Therefore, different values of the Pareto exponent may be observed in different areas. The use of GWR allows us to address this issue by estimating local Pareto exponents in a spatially varying manner.

The paper is organised as follows. Next, we have summarised the relevant literature. Then, we described the dataset and methods. Subsequently, we reported and interpreted our empirical results. Finally, we have summarised the main findings, discussed the limitations of the approach used, and concluded with policy recommendations and directions for further studies.

Literature Review

Zipf's law is often used as an analytical tool to investigate the Pareto optimality of population distributions (Auerbach 1913; Singer 1936; Zipf 1949; Eaton & Eckstein 1997; Gabaix 1999; Eeckhout 2004; Arshad et al. 2018). Deviations from this optimality may bring excessive agglomeration of population in certain areas within a country (Mutlu 1989; Bertinelli & Trobl 2007; Moomaw & Alvosabi 2007). In turn, this might lead to various economic and demographic problems, such as unequal distribution of income, energy shortages, pollution, and inadequacy of social and physical infrastructure facilities (Henderson 2003; Nitsch 2005, 2006; Deliktas et al. 2013; Lu & Wan 2014; Arshad et al. 2018; Duran & Cieślik 2021). The agglomeration processes might be self-reinforcing and lead to the formation of centre-periphery patterns and industry clusters in different parts of a country (Krugman 1991; Fujita & Thisse 1996; Ellison et al. 2010).

There is extensive literature investigating the applicability of Zipf's law in the cross-country context (Rosen & Rosnick 1980; Soo 2005). Most studies adopt a linear rank-size rule to estimate the Pareto exponent that indicates whether the city sizes are optimally, that is, proportionally distributed (Auerbach 1913; Singer 1936; Zipf 1949; Eaton & Eckstein 1997; Gabaix 1999; Arshad et al. 2018). Although empirical findings are relatively diverse, there is a general tendency to reject Pareto optimality (Rosen & Resnick 1980; Gabaix & Ioannides 2004; Soo 2005; Duran & Cieślik 2021).

In one of the earliest studies, Rosen and Resnick (1980) estimated the Pareto exponent by ordinary least squares (OLS) for 44 countries and found that it was, on average, equal to 1.136. Soo (2005), who studied Zipf's law for 73 countries using both OLS and Hill (1975) estimators, found that Pareto exponents were on average equal to 1.183 (OLS) and 1.160 (Hill), respectively. The results in his study for Poland were 1.18 (OLS) and 1.09 (Hill), respectively. Therefore, these results suggest that populations of Polish cities were more evenly distributed than predicted by Zipf's law.

There are only a few studies, to the best of our knowledge, that are devoted specifically to the NMS after they accede to the EU. The exception for Poland includes the study by Cieślik and Teresiński (2016), who used the sample of 908 observations at the gmina level. ⁽¹⁾ They studied both 306 urban communities and municipal parts of all 908 communities. In all the estimated specifications, they found that Zipf's law was not supported as Pareto coefficients were bigger than unity. This meant that the populations were more evenly distributed than predicted by Zipf's law. However, their analysis was limited to one year (2011), and they did not study the evolution of the population distribution over time in the post-accession period.

The available empirical evidence shows that Zipf's law is generally not supported in many countries, including Poland, and prior studies do not yield clear-cut conclusions. This indicates a need for more extensive and detailed studies using alternative research methods and datasets. Therefore, in contrast to the earlier studies detailed in this paper, we have accounted for spatial heterogeneities and investigated the evolution of Zipf's law over time using the recent dataset at the powiat level that still has not been used in the prior literature on the spatial distribution of population in Poland.

Our approach differs from a typical application of Zipf's law, which makes use of city populations. Powiats can be compared to counties that may include cities and towns and their agricultural hinterlands. Hence, powiats being administrative regions, we might expect that the central government would divide the powiats so that they are more equal in size than the distribution of city sizes.

Data and Methods

The statistical data are obtained from Statistics Poland (2022) and cover 380 Polish powiats between 1995 and 2020. To study the evolution of the spatial distribution of population in Poland, the following three-step research methodology was used.

Descriptive analysis

Firstly, we started with a visual inspection of the data and plotted on maps the populations of powiats for the initial year of our sample (1995), when Poland became a member of the World Trade Organisation (WTO), the year in which Poland accessed the EU (2004), and the more recent year (2020). Next, to determine the degree of spatiality, we applied a Global Moran's I test in a natural logarithmic form for these three years. Global Moran's I statistics can provide information on whether the population is distributed in a spatially correlated manner or randomly (Moran, 1950). The results are shown on Moran scatterplots that illustrate the strength of spatiality by visualising the relationship between the population of powiats against the corresponding values of neighbours.

Zipf's Law under spatially invariant Pareto exponent

Subsequently, we estimated the baseline equation postulated by Zipf's law, assuming that the Pareto exponent is invariant. Zipf's law, known as the rank-size rule, is specified as follows: (1) $R_{i} = A P_{i}^{- β}$ {R_i} = AP_i^{ - \beta } where R denotes city ranks, A is a constant term, P represents population sizes, and β is the Pareto exponent that needs to be estimated. When β=1, the distribution of city sizes is optimal. In contrast, when β>1 or β<1, deviations from the optimality are present.

The estimation of Zipf's law relies on the following cross-sectional regressions derived from (1) by taking natural logs: (2) $ln (R_{i}) = \partial + β l n (P_{i}) + W ρ \ln (P_{i}) + θ, where : θ_{i} = W λ θ_{j}$ \ln ({R_i}) = \partial + \beta ln({P_i}) + W\rho \,\ln ({P_i}) + \theta ,\,\,\,\,{\rm{where}}:\,{\theta _i} = W\lambda {\theta _j} and W is the spatial weights matrix defined as a 380 × 380 inverse distance matrix.

The model simplifies to the standard OLS model when spatial parameters satisfy the following condition: ρ=λ=0. However, if ρ≠0 and λ=0, we obtain the Spatial Autoregressive Model (SAR) in which spatial relatedness is incorporated into the dependent variable. In contrast, if ρ=0 and λ≠0, the model becomes a Spatial Error Model (SEM) in which the spatial dependence is assumed across the residuals. In our analysis, we focused on the level and significance of β, which is assumed to be unique throughout the country and spatially homogenous (invariant) as shown in equation (2).

Zipf's law under spatially varying Pareto exponent

We used GWR to relax the homogeneity of the Pareto exponent and estimate powiat-specific level coefficients. In this way, it is possible to illustrate spatial heterogeneities and related geographical patterns (Fotheringham et al. 2002; Paez et al. 2011; Bergs 2021). Hence, equation (2) needs to be transformed in the following way: (3) $ln (r_{i}) = \partial + β_{i} ln (p_{i}) + e_{i}$ \ln ({r_i}) = \partial + {\beta _i}\ln ({p_i}) + {e_i} where β_i denotes the local degree of the Pareto exponent. These regressions are estimated for the three years to investigate the potential spatial effects of Poland's integration with the EU.

Results

Descriptive analysis

Figure 1 shows the plot of the powiat populations. Darker (lighter) areas represent higher (lower) populations. In all the years of our sampling, we observed no spatial pattern. Populations tended to be lower in the northeastern parts of the country. Highly populated powiats are scattered but mostly observed in the southern parts. This pattern does not change substantially between 1995 and 2020.

Next, Figure 2 shows the Moran scatterplots. The calculated values of Moran's I statistics are 0.121, 0.129, and 0.152 for 1995, 2004 and 2020, respectively. Hence, all the panels indicate a low level of positive but increasing spatial dependence over time.

Zipf's Law results under spatially invariant Pareto exponent

Our baseline estimation results are shown in Table 1. Focusing on estimations for 1995, the Pareto coefficients are significant at 1% and in the range between −1.68 (OLS, SAR) and −1.69 (SEM). For 2004, the corresponding coefficients are also significant at 1% and equal −1.65 (OLS, SAR) and −1.66 (SEM). For 2020, the coefficients remain significant at 1% and equal −1.60 for all regressions. Looking at spatial parameters, rho is negative but not significant, while lambda is positive and significant. This means that SEM outperforms SAR.

Table 1.

Zipf coefficient estimation

	1995			2004			2020
	OLS	SAR	SEM	OLS	SAR	SEM	OLS	SAR	SEM
Constant	23.94***	24.12***	24.03***	23.59***	23.75***	23.67***	23.00***	23.01***	23.03***
Ln(P)	−1.68***	−1.68***	−1.69***	−1.65***	−1.65***	−1.66***	−1.60***	−1.60***	−1.60***
Rho		−0.026			−0.022			−0.001
Lambda			0.18**			0.17**			0.16**
N	380	380	380	380	380	380	380	380	380
R-Squared	0.95	0.95	0.95	0.95	0.95	0.95	0.94	0.94	0.94

Source: own estimations based on data from: Statistic Poland 2022

The main results can be summarised as follows. First, Zipf's law generally fails in Poland at the powiat level, and Pareto optimality is not supported, which is in line with the results of prior studies. Second, the distribution of population deviates from optimality since all the estimated values are above unity. This suggests that populations are more evenly distributed compared to the optimal proportional distribution. Given the use of powiat-level data, the finding that the size distribution of powiats is more equal than the Zipf's law prediction is in line with the expectations. Third, beta coefficients tend to decline slightly over time but remain higher than unity. Therefore, these results suggest that spatial inequality and dependence in Poland have been increasing over time.

Zipf's law results under spatially varying Pareto exponent

Figure 3 shows the GWR results obtained assuming a spatially varying Pareto exponent. In 1995, the β coefficient ranged between −1.85 and −1.59, in 2004, it ranged between −1.81 and −1.56, and in 2020, it ranged between −1.75 and −1.51. These results show that there is a substantial degree of spatial heterogeneity in this coefficient, so the assumption that it is homogeneous is rejected.

Clear geographical patterns can be observed. Pareto coefficients are the highest in the southern powiats, where deviations from the Pareto optimality are the largest. The absolute values of the coefficient are the lowest in the northeastern powiats. Hence, there is considerable spatial heterogeneity across Polish powiats. This means that different areas show different degrees of deviation from the Pareto optimum. Northeastern powiats are the closest to optimality, whereas southern powiats are the most distant.

These results have key implications, and the results obtained under the assumption of constant β might be misleading. According to Table 1, the Pareto exponent for 1995 is around −1.68. This means that it is underestimated for southern powiats as GWR estimation yields −1.85. As for the evolution of results over time, the general picture does not change substantially, meaning that spatial inequalities seem to be increasing over time.

Conclusions

In this study, we investigated the spatial heterogeneity of Zipf's law and its tendency over time using a class of regressions (OLS, SAR, SEM and GWR) on 380 Polish powiats for three years, that is, 1995, 2004, and 2020. We reached two main conclusions. First, Zipf's law is rejected since populations are distributed more evenly compared to the optimal case. However, this result is not homogeneous across the country. The estimated coefficients vary considerably across different areas and exhibit substantial heterogeneity. Therefore, it is necessary to incorporate this heterogeneity in the Zipf's law analysis, as excluding it might lead to biased results and policy recommendations. Second, although the results are not that different over time, that is, the population changes slowly, there are still some changes. The results likely suggest that spatial inequality and dependence are slightly increasing. This means that regional disparities within Poland will continue to exist and attempts to reduce them will exert considerable pressure on both the national and the EU budgets. However, this issue requires additional tests in future studies.

However, our study has some limitations that need to be addressed in further research. Population data from the Statistics Poland database used in this study have some drawbacks. The population of the largest Polish cities might be underestimated, as Statistics Poland data does not consider foreign workers. Foreign workers and immigrants dominate in the largest cities and some smaller towns located in the western part of Poland (Górny & Śleszyński 2019). Regions and powiats located in the western part of Poland were characterised by a substantially faster decrease in the unemployment rate, a more rapid increase in employment and a considerably larger number of job offers compared to the eastern regions (Grabowski & Korczak 2023). Therefore, in future studies, it would be useful to estimate the spatial regression model in which the population is replaced by the employment or unemployment rate.

Our study offers several other potential directions for future research. One possible innovation could be the examination of structural breaks in Pareto exponents during the transition process. Looking from a similar perspective, since population sizes might be affected by migration waves and economic cycles, time series methods developed to handle temporary movements could be used to avoid possible inaccuracies. The effects of inflows of foreign workers to Polish regions could be studied. A dynamic analysis of Zipf's law should also be implemented for other NMS that so far have received relatively little attention in the literature on the effects of European integration.

There are several studies devoted to validating Zipf's law in Central and East European countries before their accession to the EU. However, summarizing these studies goes beyond the scope of this paper. Deichmann and Henderson (2000) analyzed five major agglomerations and twenty six municipalities for 6 specific years between 1950 and 1995. The estimated Pareto coefficient ranged between 0.74 (1950) and 0.9 (1995). This meant that the distribution of cities in Poland was more uneven than that predicted by Zipf's law. Jażdżewska (2017) used data for major Polish cities from six population censuses (1950, 1960, 1970, 1978, 1988 and 2002) to study the dynamics in changes in their ranks. She found that the largest changes in the ranks of the urban settlement network in Poland occurred in the first post-war decade. Josic and Bašić (2018) tested Zipf's law using data for settlements and cities in Croatia culled from the Census of Population Survey for the year 2011. Their results supported Zipf's law in Croatia for most of the settlement sizes.

Spatial heterogeneity of population distribution in Poland: a geographically weighted regression approach to Zipf's law

Full Article

Paradigm

My account