Sources and patterns of uncertainty in construction MSMEs: A machine learning study in southwestern Colombia

Cristian David Tobar Montilla; Mariela Muñoz-Añasco; Adriana M. Nieto-Muñoz; Elvia Ruiz-Beltran

doi:10.2478/otmcj-2026-0005

Introduction

Uncertainty remains a persistent challenge in construction project management (PM), particularly in contexts characterised by complex operational, environmental and institutional conditions. Here, uncertainty refers to the perceived unpredictability of project dimensions that are key to construction planning, such as completion time, overall cost and resource stability. It stems from internal factors (e.g., coordination and estimation errors by PM teams) and external factors (e.g., climate variability, market dynamics and sociopolitical conditions) that may disrupt construction workflows and compromise project outcomes (Hazır and Ulusoy 2020). For micro, small and medium-sized enterprises (MSMEs), these conditions are compounded by limited financial resources, low levels of digitalisation and restricted institutional support, factors that directly affect the accuracy of cost estimation, scheduling and project execution (Mahboob et al. 2024; Chen et al. 2025).

Recent studies identifying the primary sources of uncertainty in construction projects have focused predominantly on large-scale infrastructure and on risk frameworks based on static assessment tools. For example, Ali et al. (2018) and Castañeda et al. (2025) identified uncertainty and delay factors using expert-based methods such as the relative importance index (RII) and the fuzzy analytic hierarchy process (FAHP). Shabani et al. (2023) and Erol et al. (2022), in contrast, emphasised the relevance of linking uncertainty to strategic decision-makers or project complexity. Other frameworks, such as the Decision Making Trial and Evaluation Laboratory (DEMATEL), the analytic network process (ANP), structural modelling and structured prioritisation approaches, have been used to explore cause-effect relationships and rank risk drivers. However, these approaches have largely been applied to public or high-capacity projects and rely on expert judgement, which limits their applicability to smaller firms operating in data-scarce, volatile and under-institutionalised regions.

Although uncertainty in large-scale construction projects has been extensively analysed, the specific challenges faced by MSMEs operating under fragile socio-institutional conditions remain insufficiently understood. To our knowledge, Ulupui et al. (2024) provide the only study exploring internal and external risk factors in MSMEs, in this case Indonesian firms. However, the applicability of their findings to the construction industry is limited because the authors do not specify the operational sector of these firms. There is therefore still a lack of knowledge about how construction MSMEs perceive and identify internal sources of uncertainty (e.g., organisational dynamics and activity durations) and external sources (e.g., regional logistics and environmental instability), and about how these sources influence their projects. Moreover, existing data-driven decision support models for uncertainty management often have a limited scope. For instance, some models focus exclusively on specific sources of uncertainty, such as safety issues on construction sites (Forteza et al. 2023), and are not readily applicable to firms with restricted data availability, as is typically the case for MSMEs.

Southwestern Colombia exemplifies a region under fragile socio-institutional conditions. It is marked by logistical constraints, sociopolitical volatility and climatic unpredictability, which exacerbate uncertainty in construction operations. In addition, the scarcity of reliable data and regional challenges hinder the study of these dynamics. Consequently, most approaches to identifying risk factors in construction projects rely on expert assessments, as in the works of Ovalle et al. (2024) and Castañeda et al. (2025).

In this study, we addressed these gaps by applying machine learning techniques, specifically random forests (RFs) and classification trees (CTs), to identify patterns of perceived uncertainty based on company characteristics and operational signals. These methods are well suited to low-data, high-noise environments and enable the detection of non-linear relationships and stable decision rules (CAMACOL and SENA 2021). The methodology draws on finite population sampling theory and leverages the robustness of ensemble learning on small datasets. As noted by Luo et al. (2025) and Luan et al. (2020), such models are recommended when data acquisition is constrained but exploratory insights are still needed to guide practical decision-making and policy development.

Our empirical analysis draws on survey data from 25 construction MSMEs affiliated with the Colombian Chamber of Construction (CAMACOL). Despite the modest sample size, the survey achieved a response rate of 54.23% among the contacted CAMACOL-affiliated firms. We operationalised 10 sources of internal and external uncertainty, each with associated factors and early warning signals, and examined both their perceived frequency and magnitude using Likert-scale items.

It is important to note that our framework addresses uncertainty at a strategic and managerial level, not at an operational level. This distinction helps position our contributions within the field of project uncertainty management.

1.1

Contributions of our approach

This study makes several contributions to the understanding and management of uncertainty in MSMEs. First, it proposes a validated framework for assessing internal and external uncertainty in MSMEs based on frequency, magnitude and contextual signals. Second, it applies bootstrapped machine learning models to identify the most influential predictors and generate stable classification rules associated with high perceived uncertainty. Third, it demonstrates how firm-level features such as operational maturity, geographic presence and diversification relate to perceived uncertainty under real-world constraints.

In practical terms, this study delivers interpretable, data-informed tools that construction MSMEs can use to anticipate high-uncertainty scenarios and adjust planning strategies. It also offers a replicable approach that public agencies and development organisations can adapt to support resilient decision-making among small-scale builders in fragile contexts.

Materials and methods

2.1

Uncertainty source classification

Construction projects are dynamic undertakings that are influenced by internal and external conditions throughout all stages of the project lifecycle. Uncertainty arises from multiple sources that can affect the achievement of project goals. Internally generated uncertainties are related to the organisation’s systems, project resources and decision-making processes. These factors include organisational dynamics, activity durations, resource utilisation, changes in requirements, quality issues and resource availability. These sources are considered manageable by the organisation. In contrast, externally generated uncertainties arise from project circumstances that are beyond the organisation’s control. These include logistical aspects, environmental variability, sociopolitical factors, market conditions and technological changes (Hazır and Ulusoy 2020).

Categorising uncertainties by source is important for designing focused mitigation strategies. In this study, we adopted the uncertainty classification framework proposed by Hazır and Ulusoy (2020), which divides uncertainties into internal and external categories, each with its respective sources. This structured taxonomy (Figure 1) guided the design of the survey instrument and the analytical strategy. In addition, the study by Kuchta and Zabor (2022) inspired our approach to assessing the frequency and magnitude of each uncertainty source. Positive or negative changes arising from a source of uncertainty may not occur immediately; they are often preceded by signals, such as social conditions at the construction site or changes in client requirements. Recognising these early indicators therefore allows for better-informed and more proactive project planning.

2.2

Survey design and variables

The survey instrument was designed in three sections. The first captured general company characteristics, the second assessed the perceived frequency and magnitude of the uncertainty sources and the third gathered information on the perceived signals associated with each uncertainty source.

2.2.1

Company characteristics

Company-specific variables aimed to capture key structural and operational differences across MSMEs. These included geographical origin, operational reach, experience, company size and portfolio of economic activities.

2.2.2

Uncertainty sources and signals

The survey questions in this section assessed participants’ self-reported perceptions of the frequency and magnitude of each uncertainty source. Frequency and magnitude were measured on 6-point and 4-point Likert scales, respectively. For instance, the frequency question on expert use was: ‘How often are experts consulted for decision-making?’ (1 = ‘Never,’ 6 = ‘Always’). The corresponding magnitude question was: ‘How significantly does the involvement of experts increase throughout the project?’ (1 = ‘Very small,’ 4 = ‘Significant’). To capture signals of uncertainty, a multiple-choice item was used for each source of internal and external uncertainty.

Lastly, the uncertainty signals were defined based on insights from the literature review. For example, participants were asked: ‘In your opinion, what is the most common cause of uncertainty due to expert use?’ (a) subjective information from the expert, (b) linguistic variability of the expert, (c) variability of data among a set of experts and (d) performance in resolution by experts.

In total, 51 questions were developed following this structure to cover all categories of uncertainty.

2.3

Survey application

The survey was administered to construction companies affiliated with CAMACOL across the southwestern states of Cauca, Nariño, Huila, Putumayo and Valle del Cauca. CAMACOL acts as a central industry association that aggregates regional construction companies and ensures a minimum level of operational quality among its members.

To be affiliated with CAMACOL, companies must meet the following eligibility criteria: (i) a minimum of 1 year of legal operation, demonstrated by active registration in the respective state Chamber of Commerce; (ii) verified business activity through documentation of ongoing or completed projects, services, or products offered; (iii) demonstrable technical capacity, including internal engineering and management teams and qualified external collaborators in structural design, geotechnical studies, project oversight and supervision and (iv) financial transparency through income statements, balance sheets, or income certifications provided by their accounting departments (CAMACOL 2022).

These criteria ensured that the surveyed firms had relevant construction experience, met basic performance standards and had operational profitability within the industry. The online survey was distributed in April 2024 via institutional email contacts publicly listed in CAMACOL’s online directories. Of the 59 affiliated companies contacted, 32 returned valid responses, a 54.23% response rate, which is considered acceptable for studies involving organisational representatives in emerging economies. This sample size is comparable to that of recent research in the Colombian context that prioritised factors in construction projects (Ovalle et al. 2024).

2.4

Data processing and modelling

This study followed a structured modelling pipeline (see Figure 2) based on the CRISP-DM methodology (Schröer et al. 2021). The pipeline comprised five main stages: (i) data preparation and cleaning; (ii) target variable construction; (iii) synthetic data augmentation; (iv) bootstrapped feature importance analysis and decision rule extraction; and (v) visualisation of the CTs and ranking of feature importances. Data processing and the machine learning techniques were implemented in Python using the scikit-learn library.

2.4.1

Data preparation and cleaning

Initial preprocessing excluded five respondents with fewer than 6 months of operation to ensure maturity comparability (>1 year of operation) across firms, leaving a final sample of 25 companies. Categorical variables (e.g., state of origin, economic activity) were converted to binary format using one-hot encoding, and all numeric variables were standardised to eliminate scale-related bias in model training.
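The encoding and scaling step can be sketched with pandas and scikit-learn’s `StandardScaler`, as below. This is a minimal illustration, not the study’s code; the column names (`origin_state`, `months_in_service`, `n_states`, `n_isic_activities`) and the four example firms are hypothetical placeholders.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical column names used for illustration only.
CATEGORICAL = ["origin_state"]
NUMERIC = ["months_in_service", "n_states", "n_isic_activities"]


def encode_and_scale(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode categorical columns and standardise numeric ones."""
    # Each category becomes a 0/1 indicator column, e.g. origin_state_Cauca.
    dummies = pd.get_dummies(df[CATEGORICAL], dtype=int)
    # Rescale numeric columns to zero mean and unit (population) variance.
    scaled = pd.DataFrame(
        StandardScaler().fit_transform(df[NUMERIC]),
        columns=NUMERIC,
        index=df.index,
    )
    return pd.concat([scaled, dummies], axis=1)


firms = pd.DataFrame({
    "origin_state": ["Cauca", "Nariño", "Valle del Cauca", "Cauca"],
    "months_in_service": [15, 60, 120, 363],
    "n_states": [1, 2, 1, 4],
    "n_isic_activities": [1, 2, 3, 4],
})
X = encode_and_scale(firms)
print(X.columns.tolist())
```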

For each uncertainty source, the Likert-scale responses on frequency and on magnitude were averaged. For instance, the overall perceived frequency (or magnitude) of organisational uncertainty is the average of the ratings for the inherent complexity of the construction project, ambiguity in selection processes, expert use and decision makers’ attitudes. The same procedure was applied to logistical, environmental, sociopolitical and market uncertainties, yielding aggregate frequency and magnitude scores for model training.

2.4.1.1

Target variable construction

To train the classification models, we transformed the original 4-point Likert scale measuring uncertainty magnitude (1 = ‘Very small’ to 4 = ‘Significant’) into a binary classification label. Specifically, responses with an average magnitude score of 2 or lower were coded as ‘0 = Lower increase in uncertainty’, while responses with a score above 2 were labelled as ‘1 = Higher increase in uncertainty’.

For example, if a company rated the four items related to organisational uncertainty with scores of 2, 2, 3 and 3, the average magnitude would be 2.5. As this average exceeds the threshold of 2, the company would be labelled as 1 (Higher increase in uncertainty) for this uncertainty source.
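The averaging and thresholding steps can be sketched with pandas as follows. The item column names are hypothetical; the threshold of 2 and the strict "above 2" rule are taken from the text.

```python
import pandas as pd

# Hypothetical names for the four organisational magnitude items (1-4 scale).
ORG_ITEMS = ["complexity", "selection_ambiguity", "expert_use", "risk_attitude"]


def label_source(responses: pd.DataFrame, items: list, threshold: float = 2.0) -> pd.DataFrame:
    """Average a source's Likert items per firm and binarise the mean.

    label = 1 ('Higher increase in uncertainty') if mean > threshold, else 0.
    """
    mean_score = responses[items].mean(axis=1)
    return pd.DataFrame({
        "mean_magnitude": mean_score,
        "label": (mean_score > threshold).astype(int),
    })


# Firm A reproduces the worked example (2, 2, 3, 3); firm B sits exactly on the cut-off.
responses = pd.DataFrame(
    [[2, 2, 3, 3], [1, 2, 2, 3]], columns=ORG_ITEMS, index=["firm_A", "firm_B"]
)
print(label_source(responses, ORG_ITEMS))
```

Firm B illustrates the boundary case: an average of exactly 2 falls into class 0, because only averages above 2 are labelled 1.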

2.4.1.2

Synthetic data augmentation for small-sample modelling

Given the small number of observations in our original dataset (n = 25), we implemented a data augmentation strategy based on resampling with replacement, following Wang et al. (2022), to reach a reference sample size and improve the robustness of the machine learning models without altering the original class balance.

We preserved the original class distribution in the augmented dataset. The target number of augmented observations was established using the finite population sample size formula with a 95% confidence level, a 5% margin of error and an assumed maximum variability (p = 0.5) (Cochran 1977). With a reference population of 141 registered construction companies in the region (CAMACOL 2022), this resulted in a recommended sample size of 103 observations.
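The figure of 103 can be checked arithmetically with Cochran’s infinite-population size, n0 = z^2 p (1 - p) / e^2, followed by the finite population correction n = n0 / (1 + (n0 - 1) / N). The sketch below assumes z = 1.96 for 95% confidence. Rounding to the nearest integer reproduces the reported value; rounding up would instead give 104.

```python
def cochran_sample_size(N: int, z: float = 1.96, p: float = 0.5, e: float = 0.05) -> float:
    """Cochran's sample size with finite population correction.

    n0 = z^2 * p * (1 - p) / e^2   (infinite-population size)
    n  = n0 / (1 + (n0 - 1) / N)   (corrected for a population of N units)
    """
    n0 = z**2 * p * (1 - p) / e**2
    return n0 / (1 + (n0 - 1) / N)


n = cochran_sample_size(N=141)  # 141 registered construction companies
print(f"{n:.2f} -> {round(n)}")  # 103.34 -> 103
```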

The class proportions from the original dataset were used to guide the augmentation. Specifically, if the proportion of class 0 was p₀ and class 1 was p₁, the augmented dataset was constructed to have approximately p₀ × 103 and p₁ × 103 observations, respectively. Each class subset was resampled with replacement until reaching its corresponding target size.
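The class-proportional resampling can be sketched with pandas as below. The function name and seed are illustrative. Two details are our assumptions: each class target is rounded to the nearest integer, and any rounding remainder goes to the smallest class. For a binary label with n = 25, these choices reproduce the augmented counts in Table 1; for example, 2 and 23 original observations become 8 and 95.

```python
import pandas as pd


def augment_by_class(df: pd.DataFrame, label_col: str, target_n: int = 103,
                     seed: int = 0) -> pd.DataFrame:
    """Resample each class with replacement, keeping the original class shares."""
    shares = df[label_col].value_counts(normalize=True)  # largest class first
    counts = (shares * target_n).round().astype(int)
    # Give any rounding remainder to the last (smallest) class so totals match.
    counts.iloc[-1] = target_n - counts.iloc[:-1].sum()
    parts = [
        df[df[label_col] == cls].sample(n=int(n_cls), replace=True, random_state=seed)
        for cls, n_cls in counts.items()
    ]
    return pd.concat(parts, ignore_index=True)


# Example: 2 lower-uncertainty and 23 higher-uncertainty firms, as for the
# organisational source; the feature values here are made up.
original = pd.DataFrame({
    "months_in_service": range(25),
    "label": [0] * 2 + [1] * 23,
})
augmented = augment_by_class(original, "label")
print(augmented["label"].value_counts().to_dict())
```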

To validate the representativeness of the resampled dataset, we performed a non-parametric hypothesis test. Specifically, the Mann–Whitney U Test was applied to each input feature, comparing the distributions of the original and augmented samples within each class. Two tests were performed per variable: (i) original vs. augmented for class 0 and (ii) original vs. augmented for class 1. The null hypothesis stated that both groups came from the same distribution. The significance threshold was set at α = 0.05.
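A minimal version of this check, using SciPy’s `mannwhitneyu` with a two-sided alternative, is sketched below. For each class, it returns the lowest p-value across the compared features, which is the quantity reported in Table 1. The function name, column names and toy data are illustrative.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def lowest_pvalue_per_class(original: pd.DataFrame, augmented: pd.DataFrame,
                            features: list, label_col: str) -> dict:
    """Compare original vs. augmented distributions feature by feature, per class."""
    lowest = {}
    for cls in sorted(original[label_col].unique()):
        orig_cls = original[original[label_col] == cls]
        aug_cls = augmented[augmented[label_col] == cls]
        pvalues = [
            mannwhitneyu(orig_cls[f], aug_cls[f], alternative="two-sided").pvalue
            for f in features
        ]
        # H0: both samples come from the same distribution; reject if p < 0.05.
        lowest[cls] = min(pvalues)
    return lowest


# Toy check: the "augmented" data are the original rows repeated three times.
original = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8], "label": [0, 0, 0, 0, 1, 1, 1, 1]})
augmented = pd.concat([original] * 3, ignore_index=True)
print(lowest_pvalue_per_class(original, augmented, ["x"], "label"))
```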

Table 1 shows that for all input variables, p-values exceeded 0.05, indicating no statistically significant differences between the original and augmented distributions within each class. These findings support the validity of the data augmentation strategy.

Tab. 1:

Validity of controlled data expansion strategy across internal and external sources of uncertainty.

Uncertainty source	Lower-uncertainty class: original n	Lower-uncertainty class: augmented n	Lower-uncertainty class: lowest p-value across all variables	Higher-uncertainty class: original n	Higher-uncertainty class: augmented n	Higher-uncertainty class: lowest p-value across all variables	Total observations
Organisational	2	8	0.62	23	95	0.45	103
Activity durations	5	21	0.29	20	82	0.68	103
Resource use	7	29	0.69	18	74	0.52	103
Requirement changes and quality issues	13	54	0.54	12	49	0.56	103
Resource availability	8	33	0.47	17	70	0.40	103
Logistics	2	8	0.62	23	95	0.29	103
Environmental	4	16	0.32	21	87	0.52	103
Sociopolitical	5	21	0.29	20	82	0.47	103
Market	1	4	1.00	24	99	0.56	103
Technological	4	16	0.37	21	87	0.27	103

2.4.2

Bootstrap-enhanced feature and rule stability modelling

To improve the reliability of both the feature importance rankings and the decision rules in our classification models, we implemented a bootstrap-based modelling approach with 1,000 iterations. This allowed us to assess the stability of the results under repeated random sampling and to reduce their sensitivity to the small dataset.

2.4.2.1

Feature importance stability with RF

We trained 1,000 RF models, each on a resampled version of the original dataset. Each RF classifier was used to determine the relative importance of the input variables, including company characteristics and reported uncertainty signals; this importance quantifies each variable’s contribution to the classification task (Megantara and Ahmad 2020). RF was selected for its ability to model complex, non-linear interactions and its effectiveness with relatively small datasets. An RF is an ensemble of CTs built from bootstrap samples of the training data, and its classification output is the majority vote of these trees (Breiman 2001).

For every iteration, we computed the feature importances (mean decrease in impurity [MDI]), recorded their frequency (i.e., how often a feature had non-zero importance) and calculated their mean and standard deviation across iterations.

This resulted in a robust estimation of the most influential features, represented by:

Mean importance: Average contribution to node splitting across trees.
Frequency: Number of bootstrap iterations in which the feature was considered relevant (importance > 0).
Standard deviation: Variation in importance across models, indicating confidence in the ranking.
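The bootstrap loop described above can be sketched with scikit-learn as follows. The stratified resampling, the number of trees per forest and the default hyperparameters are assumptions, not the settings used in the study. `feature_importances_` is scikit-learn’s impurity-based (MDI) importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample


def bootstrap_rf_importance(X: pd.DataFrame, y: pd.Series, n_iter: int = 1000,
                            n_estimators: int = 100, seed: int = 0) -> pd.DataFrame:
    """Summarise MDI importances over n_iter bootstrap-trained random forests."""
    rng = np.random.RandomState(seed)
    importances = np.zeros((n_iter, X.shape[1]))
    for i in range(n_iter):
        # Stratified bootstrap sample, so both classes appear in every iteration.
        Xb, yb = resample(X, y, replace=True, stratify=y, random_state=rng)
        forest = RandomForestClassifier(n_estimators=n_estimators, random_state=rng)
        importances[i] = forest.fit(Xb, yb).feature_importances_
    summary = pd.DataFrame({
        "mean_importance": importances.mean(axis=0),
        "std_importance": importances.std(axis=0),
        # Number of iterations in which the feature had non-zero importance.
        "frequency": (importances > 0).sum(axis=0),
    }, index=X.columns)
    return summary.sort_values("mean_importance", ascending=False)
```

The returned table has one row per feature. It maps directly onto the bar plots described below: `mean_importance` gives the bar length, `std_importance` the error bars and `frequency` the colour.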

These aggregated results were visualised using horizontal bar plots with error bars and a colour gradient that reflected the bootstrap frequency (see Figures 3 and 5). This enabled the identification of both statistically and practically stable predictors for each uncertainty source.

2.4.2.2

Identification of stable classification rules via decision trees

In parallel, CTs were used to derive interpretable decision rules for classifying companies that report a higher increase in each source of uncertainty. CTs make visible how each classification is derived from the data, which helps project managers follow every step of the decision logic.

We generated 1,000 bootstrapped CTs, applying cost complexity pruning with cross-validation in each iteration; for every tree, the best-performing pruning parameter (alpha) was selected to reduce overfitting. The trees were then translated into text-based decision rule sets, and the frequency of each unique rule combination was recorded.

The most recurrent tree structure across all iterations was selected as the final model (see Figures 4 and 6). This approach allowed us to:

Identify repeatable decision logic across resampling.
Provide managers with interpretable and robust rules for classifying uncertainty levels.
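One way to implement this procedure with scikit-learn is sketched below. Candidate alphas come from `cost_complexity_pruning_path`, and `GridSearchCV` selects the best `ccp_alpha` by cross-validated accuracy. The text rendered by `export_text` then serves as the key for counting identical trees. Using that text as the rule signature, the stratified bootstrap and the fold count are our assumptions, not details reported by the study.

```python
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.utils import resample


def most_frequent_tree(X: pd.DataFrame, y: pd.Series, n_iter: int = 1000,
                       n_splits: int = 5, seed: int = 0):
    """Fit a CCP-pruned tree per bootstrap sample; return (rules, count) of the modal tree."""
    counts = Counter()
    for i in range(n_iter):
        Xb, yb = resample(X, y, replace=True, stratify=y, random_state=seed + i)
        # Effective alphas at which the unpruned tree would lose subtrees.
        path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(Xb, yb)
        alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))  # guard tiny negatives
        search = GridSearchCV(
            DecisionTreeClassifier(random_state=0),
            param_grid={"ccp_alpha": alphas},
            cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed + i),
        )
        search.fit(Xb, yb)
        # The text rendering of the refitted best tree acts as its rule-set signature.
        rules = export_text(search.best_estimator_, feature_names=list(X.columns))
        counts[rules] += 1
    return counts.most_common(1)[0]
```

Calling `most_frequent_tree(X, y, n_iter=1000)` returns the modal rule set together with the number of iterations in which it appeared. This is the kind of recurrence count reported in the Results, for example 200 of 1,000 trees for organisational uncertainty.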

Results

3.1

Company characteristics and descriptive insights

As seen in Table 2, the surveyed construction firms, all located in southwestern Colombia, are predominantly microenterprises (92%), with an average operational age of 92.3 months (~7.7 years). Geographically, most are based in Valle del Cauca (35%) and Cauca (35%), followed by Nariño (19%) and Huila (12%). On average, companies operate in 1.62 states and engage in two ISIC-classified activities, most frequently civil engineering works (42%), residential buildings (35%) and public utility infrastructure (23%).

Tab. 2:

Descriptive statistics of the surveyed companies.

Variable meaning	Sub-variables	Mean	SD	Min	Max
Origin state of the company	Cauca	0.35	0.49	0	1
	Nariño	0.19	0.40	0	1
	Valle del Cauca	0.35	0.49	0	1
	Huila	0.12	0.33	0	1
Number of states where the company has projects	-	1.62	0.98	1	4
Number of months since the commercial registration date of the company	-	92.31	81.7	15	363
Size of the company	Micro	0.92	0.27	0	1
	Small	0.04	0.20	0	1
	Medium	0.04	0.20	0	1
Number of ISIC activities the company executes	-	2.00	0.94	1	4
Economic activities carried out by companies	Construction of residential buildings	0.35	0.49	0	1
	Construction of non-residential buildings	0.23	0.43	0	1
	Construction of roads and railways	0.12	0.33	0	1
	Construction of utility projects	0.23	0.43	0	1
	Construction of other civil engineering works	0.42	0.50	0	1
	Other specialised activities	0.27	0.45	0	1
	Real estate activities	0.04	0.20	0	1
	Architectural activities	0.15	0.37	0	1
	Technical consultancy	0.19	0.40	0	1

ISIC, international standard industrial classification of all economic activities; SD, standard deviation.

3.2

Perceived frequency and magnitude of uncertainty sources

Respondents reported relatively high frequency and magnitude ratings across multiple uncertainty sources. According to Table 3, the most frequently encountered internal issues were inherent project complexity (mean = 4.62), expert consultation (4.31) and deviations in activity durations (4.12), highlighting the prevalence of operational and managerial challenges. Externally, market conditions (4.27) and inconsistent weather (4.00) were perceived as the most frequent disruptors.

Tab. 3:

Perceived frequency of the sources of uncertainty.

Source of uncertainty	Variable	Mean*	SD
Organisational	Inherent complexity of the construction project	4.62	1.30
	Ambiguity in selection criteria	3.85	1.59
	Experts’ consultation	4.31	1.26
	Risk-taking willingness of decision makers	3.96	1.18
Activity durations	Activity duration differing from actual duration	4.12	1.58
Resource use	Inaccurate resource estimation	3.81	1.52
Requirement changes and quality issues	Changes in project requirements	3.77	1.27
Resource availability	Inflexible resource availability	3.54	1.36
Logistics	Safety issues	3.77	1.21
	Site access conditions	2.88	1.18
	Supply availability fluctuations	3.58	1.21
Environmental	Inconsistent weather	4.00	1.41
Environmental	Adverse geographic conditions	3.65	1.60
Sociopolitical	Policies and regulations	3.50	1.39
Sociopolitical	Social conditions	3.27	1.48
Market	Market conditions	4.27	1.56
Technological	Equipment reliability and construction methods	3.42	1.33

Likert scale from 1–6: 1 – never, 2 – very rarely, 3 – rarely, 4 – often, 5 – very often, 6 – always.

SD, standard deviation.

Magnitude ratings followed a similar pattern (see Table 4). Market conditions again topped the list (mean = 3.35), followed by safety issues (3.15), supply availability fluctuations (3.12) and inconsistent weather (3.08). Internally, inherent project complexity (3.15) and inaccurate resource estimation (2.92) were seen as key contributors to project instability.

Tab. 4:

Perceived magnitude of the sources of uncertainty (Likert scale: 1–4).

Source of uncertainty	Variable	Mean*	SD
Organisational	Inherent complexity of the construction project	3.15	0.78
	Ambiguity in selection criteria	2.77	0.95
	Experts’ consultation	2.69	0.88
	Risk-taking willingness of decision makers	2.81	1.06
Activity durations	Activity duration differing from actual duration	2.92	0.80
Resource use	Inaccurate resource estimation	2.92	1.06
Requirement changes and quality issues	Changes in project requirements	2.38	1.02
Resource availability	Inflexible resource availability	2.85	0.97
Logistics	Safety issues	3.15	0.88
	Site access conditions	2.19	0.98
	Supply availability fluctuations	3.12	0.95
Environmental	Inconsistent weather	3.08	0.74
Environmental	Adverse geographic conditions	2.92	0.98
Sociopolitical	Policies and regulations	2.85	0.88
Sociopolitical	Social conditions	2.81	1.02
Market	Market conditions	3.35	0.75
Technological	Equipment reliability and construction methods	3.04	0.82

Likert scale 1–4: 1 – very small, 2 – minor, 3 – moderate, 4 – significant.

SD, standard deviation.

3.3

Internal uncertainty sources: Feature importance and classification rules

RF models revealed that months in service was the most influential predictor for most internal uncertainty categories (four of the five), suggesting that operational maturity may increase a firm’s ability to recognise uncertainty. Feature importances and CT rules were computed using 1,000 bootstrapped models to ensure stability.

3.3.1

Organisational uncertainty

According to Figure 3a, the number of activities (MDI = 0.216) and subjective expert information (0.185) were the most influential features. The most frequent classification path, appearing in 200 of the 1,000 trees (Figure 4a), indicated that 70.7% of companies with less than 327.5 months of operation and no reliance on subjective expert input were classified as experiencing higher uncertainty. The recurrence of this rule across bootstrap samples suggests that, in MSMEs, the absence of structured mechanisms for integrating expert knowledge may contribute to persistent internal ambiguity, regardless of the firm’s maturity and portfolio diversity.

3.3.2

Activity duration uncertainty

As shown in Figure 3b, the most important predictor was months in service (0.388), followed by origin in Valle del Cauca (0.187). The most frequent CT (230/1,000; Figure 4b) indicated that 50% of the companies based outside Huila and Valle del Cauca were classified as experiencing higher uncertainty. This suggests that firms based in other states, such as Cauca or Nariño, may face greater scheduling challenges, possibly owing to less favourable logistical conditions and difficult climatic or topographic environments.

3.3.3

Resource use uncertainty

Months in service (0.341) and number of ISIC activities (0.157) were the dominant features (Figure 3c). The most recurrent classification path (42/1,000; Figure 4c) classified 43.9% of firms with more than 60 months of operation, fewer than 3.5 activities and headquarters outside Cauca as highly uncertain. Although this rule recurred in relatively few trees, it suggests that even firms with limited service portfolios, often assumed to be simpler to manage, face persistent resource challenges, possibly owing to inflexible cost or cash-flow estimation.

3.3.4

Requirement changes and quality uncertainty

Figure 3d shows that months in service (0.379) and number of activities (0.224) were the most influential features. In the most frequent classification path (139/1,000; Figure 4d), 31.7% of companies with only one activity and no equipment changes were classified as experiencing higher uncertainty. This suggests that firms with limited operational scope and minimal design adjustments may still struggle with unstable requirements, potentially due to limited client engagement or insufficient technical capacity to adapt to the project’s evolving specifications.

3.3.5

Resource availability

In Figure 3e, months in service (0.315) and the number of states in which the firms operate (0.175) were the most influential features. The classification rule observed in 35/1,000 trees (Figure 4e) revealed that 37.8% of companies with more than 141 months of operation, more than 3.5 projects and a location outside Valle del Cauca experienced high uncertainty. Despite its relatively low recurrence, this rule suggests that regional expansion, even when backed by extensive experience, may make it harder to secure adequate resources, especially in areas where labour scarcity and fragmented access to renewable and non-renewable inputs remain persistent constraints.

3.4

External uncertainty sources: Feature importance and classification rules

3.4.1

Logistic uncertainty

Material acquisition (0.212) and months in service (0.156) were the top features (Figure 5a). In the most frequent tree (238/1,000; Figure 6a), 57.3% of companies that did not report material acquisition or supply chain structure as uncertainty signals were nevertheless classified as experiencing higher uncertainty. This suggests the presence of latent logistical inefficiencies, such as inventory management problems and delays in resource availability, that go unreported yet may affect project stability.

3.4.2

Environmental uncertainty

Figure 5b indicates that months in service (0.406) and number of activities (0.231) were the most relevant features. In the dominant classification path (83/1,000; Figure 6b), 42.7% of companies with no more than 81 months of experience and limited diversification (≤2 activities) that reported heavy rains as an uncertainty signal were classified as experiencing higher uncertainty. This pattern suggests that early-stage firms with narrow operational scopes are especially vulnerable to climate-related disruptions, underscoring the need for adaptive strategies that account for regional rainfall intensity and topographic variability in the early phases of business consolidation.

3.4.3

Sociopolitical uncertainty

As shown in Figure 5c, months in service (0.202) and geographic spread (0.152) were the top features. In 71/1,000 trees (Figure 6c), 62.2% of firms with no more than two projects and no reported issues related to worker discontent or non-working days were classified as experiencing higher uncertainty. This suggests that sociopolitical uncertainty is not always linked to overt labour or institutional disruptions but may instead reflect latent or region-specific governance tensions that affect even firms operating under seemingly stable internal conditions.

3.4.4

Market uncertainty

Months in service (0.414) and geographic diversification (0.231) were the top predictors (Figure 5d). The most frequent classification path (626/1,000; Figure 6d) found that 82.9% of firms with more than 29 months of experience faced high market uncertainty. The high recurrence of this rule suggests that economic instability is systemic and increasingly recognised by experienced firms.

3.4.5

Technological uncertainty

Figure 5e shows that the number of states where the company has projects (0.314) and the number of activities (0.234) were the strongest predictors. In 241/1,000 trees (Figure 6e), 74.4% of companies operating in up to three states and with two or fewer activities experienced high technological uncertainty. This pattern suggests that technological uncertainty is particularly salient among firms with limited territorial reach and operational diversification, possibly owing to inefficient resource use and reduced access to technical assistance and digital infrastructure in less connected construction environments, such as Nariño and Huila.

Discussion

This study examines how different sources of uncertainty influence construction PM in MSMEs across southwestern Colombia. Using descriptive statistics, feature importance analysis and CT rules, we identified operational and contextual patterns shaping perceptions of uncertainty. The discussion covers four themes: experience and diversification, regional context, overlooked variables and systemic market uncertainty. These patterns are examined against the frequency and magnitude ratings reported by the surveyed companies.

To support this interpretation, Table 5 summarises the most influential features and the most frequent classification rules that led to the identification of higher levels of uncertainty in each source. This table combines information from feature importance metrics (MDI) and decision tree thresholds to highlight which company characteristics, signals and regional attributes are associated with perceived uncertainty. By reviewing these statistical parameters across internal and external sources, the table offers a consolidated view of how factors such as operational experience, diversification, geographic origin and latent signals interact to shape a firm’s uncertainty exposure.

Tab. 5:

Summary of the most influential features and dominant classification rules associated with higher perceived uncertainty across domains.

Uncertainty source	Most important feature	MDI	Most important signal	MDI	CT rule: number of activities	CT rule: number of state projects	CT rule: months in service	CT rule: origin	CT rule: signal	Highest share of companies classified in a leaf node (%)
Organisational	Number of activities	0.217	Subjective expert information	0.185		-	≤327	-	Subjective expert information	70.7
Activity durations	Months in service	0.388	Leader decision timing	0.044	-	-	-	Outside of Valle del Cauca and Huila	-	50
Resource use	Months in service	0.341	Inflexible cost estimation	0.087	≤3	-	≥60	Outside of Cauca		43.9
Requirement changes and quality issues	Months in service	0.379	Design changes	0.042	≥1	-	≥67 and ≤276	-	-	34.1
Resource availability	Months in service	0.315	Limited availability of capable workers in the area	0.066	≤3	-	≤141	Outside of Valle del Cauca	-	37.8
Logistics	Material acquisition	0.212	Material acquisition	0.212	-	-			Signals other than material acquisition and supply chain structure	57.3
Environmental	Months in service	0.406	Heavy rains	0.030	≤2	-	≤327	-	Heavy rains	42.7
Sociopolitical	Months in service	0.202	Worker social discontent	0.127	≤2	-		-	Signals other than worker social discontent and non-working days granted	62.2
Market	Months in service	0.414	Supply prices	0.082	-	-	≥29	-	-	82.9
Technological	Number of states where the company has projects	0.314	Renewable resource efficiency	0.048	≤2	≤3		-	-	74.4

Rules indicate conditions leading to the leaf node with the highest percentage of companies classified as experiencing high uncertainty.

CT, classification tree; MDI, mean decrease in impurity.
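Table 5 ranks features by mean decrease in impurity (MDI). To make that quantity concrete, the Python sketch below computes MDI by hand, assuming Gini impurity: each split credits its feature with the impurity decrease weighted by the number of samples reaching the node (relative to the root), each tree's credits are normalised to sum to 1, and the forest averages the trees. This mirrors the commonly used definition (for example, the impurity-based `feature_importances_` of scikit-learn's random forests), but the flat list-of-splits tree format, the function names and the counts are hypothetical, and the authors' exact implementation is not described in this section.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One split = (feature name, class counts at parent, left child, right child).
Split = Tuple[str, Sequence[int], Sequence[int], Sequence[int]]


def gini(counts: Sequence[int]) -> float:
    """Gini impurity of a node from its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def tree_mdi(splits: List[Split]) -> Dict[str, float]:
    """Mean decrease in impurity for one tree, normalised to sum to 1."""
    n_root = sum(splits[0][1])  # assumes the first split listed is the root
    importance: Dict[str, float] = defaultdict(float)
    for feature, parent, left, right in splits:
        weighted_decrease = (
            sum(parent) * gini(parent) - sum(left) * gini(left) - sum(right) * gini(right)
        ) / n_root
        importance[feature] += weighted_decrease
    total = sum(importance.values())
    if total == 0:
        return {f: 0.0 for f in importance}
    return {f: v / total for f, v in importance.items()}


def forest_mdi(trees: List[List[Split]]) -> Dict[str, float]:
    """Average the per-tree normalised importances across the forest."""
    summed: Dict[str, float] = defaultdict(float)
    for splits in trees:
        for feature, value in tree_mdi(splits).items():
            summed[feature] += value
    return {f: v / len(trees) for f, v in summed.items()}


# Toy forest with made-up class counts [low, high], for illustration only.
tree_a: List[Split] = [
    ("months_in_service", [10, 10], [8, 2], [2, 8]),  # root split
    ("n_activities", [8, 2], [8, 0], [0, 2]),         # split of the left child
]
tree_b: List[Split] = [("months_in_service", [6, 6], [6, 0], [0, 6])]

importances = forest_mdi([tree_a, tree_b])
for feature, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {value:.3f}")
```

Because MDI is computed on the training data, it can favour features with many possible split points (such as months in service); permutation importance is a common cross-check.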

Although each uncertainty source was classified independently following the taxonomy proposed by Hazr and Ulusoy (2020), these factors are interdependent in real-world construction projects: changes in one domain (e.g., internal uncertainty in resources) may propagate to others (e.g., schedule or cost). Future work could therefore model these interdependencies explicitly.

4.1

Operational maturity increases awareness but not immunity

In seven of the ten uncertainty sources, months in service emerged as the most influential variable (Table 5). However, greater experience was not associated with lower uncertainty. On the contrary, firms with more than 60 or 80 months of operation were frequently classified as highly uncertain, especially in the domains of market, resource use, environmental conditions and activity scheduling.

This suggests that operational maturity enhances firms’ ability to recognise and interpret uncertainty but does not necessarily shield them from its effects. Similar findings have been reported by Araque González et al. (2019), who emphasised the importance of structured organisational management, often lacking in younger firms. Martinsuo et al. (2024) further noted that organisational memory supports uncertainty awareness, even when mitigation capacity remains limited.

The descriptive data support this observation: for example, companies reported higher magnitudes of uncertainty in organisational complexity (mean = 3.15) and resource use (mean = 2.92), which aligns with the classification models in which months in service was a primary splitter in both domains. This suggests that seasoned companies are more attuned to instability signals but may still face structural constraints in mitigating them.

4.2

Narrow portfolios intensify vulnerability

Firms with limited service diversification or regional presence were more likely to experience high uncertainty, particularly in technological, organisational and environmental domains. While the average number of ISIC-classified activities per firm was only two, even this modest diversification appeared to increase complexity.

This is consistent with Khalife et al. (2024), who argued that broader portfolios increase exposure to stakeholder pressures, regulatory variability and coordination challenges. Li et al. (2021) similarly showed that multi-activity portfolios can lead to job burnout and diminished planning precision. In the Colombian context, Cuadros et al. (2019) highlighted how activity variability raises the likelihood of failing to meet contractual obligations.

Our results indicate that firms with only one activity were frequently classified as highly uncertain in domains such as quality requirements, whereas firms with broader portfolios faced challenges in resource estimation.

In short, expanding services beyond a narrow operational core may increase uncertainty if firms lack the capacity to manage added complexity, especially among microenterprises.

It is essential to note that the number of activities, although used as a proxy for project complexity, does not necessarily capture the true structural complexity of the project. A smaller project with tightly coupled tasks and scarce resources may experience higher uncertainty than one with many loosely connected activities. Therefore, the interpretation of ‘number of activities’ in this study should be understood as an indicative, not definitive, measure of complexity.

4.3

Regional asymmetries shape exposure to uncertainty

Geographic origin was a decisive factor in several uncertainty domains. Firms located outside Valle del Cauca, especially in Cauca, Nariño and Huila, were more likely to report high uncertainty in resource availability and activity scheduling.

This pattern aligns with prior research on institutional fragmentation and labour shortages in these regions (Nepomuceno and Elafi 2024). Even experienced firms with broad project portfolios were affected, suggesting that regional constraints can override internal capacity. Environmental uncertainty followed a similar trend: firms located in rain-prone areas were still classified as having a high level of uncertainty despite acknowledging these conditions in the survey. Prior work by Cardona-Almeida et al. (2022), Charles (2024) and Mattiace and Alberti (2024) indicates that current environmental assessments often fail to capture the cumulative and residual effects that impact construction planning.

Overall, these findings show that even well-prepared firms are vulnerable to infrastructure deficits and climatic variability in less-developed regions, highlighting the role of territorial asymmetries in shaping uncertainty exposure.

4.4

Latent signals and unmeasured stressors

Interestingly, several firms were classified as highly uncertain even when they did not report typical warning signals. This was especially evident in the domains of logistics and sociopolitical uncertainty.

For example, 57.3% of firms that did not cite material acquisition issues or supply chain disruptions still showed high logistical uncertainty. Similarly, firms with no reported labour unrest or political conflict were frequently classified as uncertain in the sociopolitical domain. These patterns suggest that latent stressors, such as site-level coordination problems, informal labour dynamics or subtle governance failures, may be shaping uncertainty without being explicitly recognised.

This supports the argument that formal surveys often miss tacit or ambient uncertainty drivers, which machine learning models may be better equipped to detect. In regions such as Cauca, characterised by prolonged social conflicts (Charles 2024; Mattiace and Alberti 2024), unreported tensions may significantly affect project stability.

In essence, many MSMEs may underestimate or underreport the influence of embedded sociopolitical or logistical uncertainty, which nonetheless affects their operations in measurable ways.

4.5

Market as a structural source of uncertainty

Among all sources, market uncertainty stood out as systemic. Of note, 82.9% of firms with more than 29 months of experience were classified as highly uncertain in this domain. Inflation, fuel and material price fluctuations and unstable tax and credit policies were cited as persistent disruptors. These results align with Fernández (2022) and Soto-Ferrari and Chams-Anturi (2023), who highlighted how criminal extortion, macroeconomic volatility and weak financial systems shape market unpredictability in Colombia. Unlike other sources, this uncertainty appears to accumulate with experience, suggesting that prolonged exposure leads to deeper awareness of systemic fragility.

Consistent with this, market conditions had the highest mean frequency (4.27) and magnitude (3.35) among external uncertainties in the survey. In sum, market uncertainty operates as a background condition that even experienced firms cannot escape. Addressing it will likely require institutional reforms and macro-level interventions beyond the scope of individual project managers.

4.6

Comparison with previous approaches

To contextualise our methodological contribution, Table 6 summarises representative approaches from recent literature addressing the identification and prioritisation of uncertainty sources in construction projects, based on inputs from domain experts or organisational representatives.

Tab. 6:

Comparison with previous approaches to uncertainty assessment in construction projects and this study’s contribution.

Study	Approach/method	Project scale	Data nature	Limitation	Study contribution
Ali et al. (2018)	Expert-based RII	Public infrastructure	Five-item Likert scale from domain experts	Subjective weighting; limited empirical validation	Provides baseline prioritisation for public-sector risk budgeting; relies on expert weighting rather than empirical inference
Shabani et al. (2023)	Narrative search and semi-structured expert interviews	Public road projects	Expert-informed categorisation	Subjective categorisation; limited replicability across contexts	Enhanced understanding of contextual, operational and strategic uncertainty through expert narratives
Erol et al. (2022)	ANP model with a two-round Delphi process	Mega construction projects	Domain-expert weighting	Subjective weighting and limited applicability to MSMEs	Risk quantification model for mega construction projects
Ulupui et al. (2024)	Partial Least Squares and RA for ARI	Multisector MSMEs from Indonesia	Five-item Likert scale from MSME representatives	Applied to MSMEs, but not the construction industry explicitly	Framework for quantifying the interactions of technological, organisational and environmental risk dimensions among MSMEs
Our approach	RF feature importance and CTs	Construction MSMEs from Colombia	Empirical survey data combined with class-preserving synthetic augmentation for small-sample modelling	Strategic-level focus; does not capture operational dynamics	First interpretable machine learning framework modelling ten internal and external uncertainty sources in construction MSMEs

ANP, analytic network process; ARI, adoption risk identification; CTs, classification trees; MSMEs, micro, small and medium-sized enterprises; RA, relative advantage; RF, random forest; RII, relative importance index.

The studies were analysed across five methodological and contextual dimensions: (1) the analytical approach or method used to identify and prioritise uncertainty sources, (2) the scale and type of projects examined (e.g., public, private and MSME), (3) the nature of the data employed (expert-based, qualitative or empirical survey data), (4) the main limitation acknowledged in each study and (5) the scope of their contribution.

Most previous studies have focused on large-scale public or infrastructure projects, relying primarily on expert judgement or qualitative assessments. Together, these studies illustrate an evolution in uncertainty research from subjective prioritisation methods towards increasingly data-driven frameworks. This study represents a pioneering and scalable approach that aligns more closely with the strategic realities of MSMEs. By applying interpretable machine learning (RF feature importance rankings and the graphical decision rules of CTs) to survey data collected directly from company managers, the framework strengthens the empirical foundation for understanding how MSMEs perceive uncertainty and provides actionable insights for managers and policy-makers seeking to improve decision-making under volatile conditions.

Conclusions and practical implications

This study provides an integrated view of how internal and external sources of uncertainty are perceived by construction MSMEs in southwestern Colombia, a region marked by institutional fragility, logistical constraints and environmental variability. By combining survey-based evidence with interpretable machine learning techniques (RFs and CTs), we identified strategic and contextual patterns that shape perceived uncertainty exposure across 10 key domains.

A central finding is that experience enhances awareness of uncertainty rather than immunity to it. Firms with longer market presence were frequently classified as highly uncertain, notably in the market and resource use domains, while firms with narrow portfolios and limited territorial reach were more exposed in the technological and environmental domains. This implies that maturity in MSMEs amplifies perception but cannot substitute for structural resilience. Additionally, the detection of firms classified as highly uncertain despite not reporting typical warning signals suggests the presence of latent stressors, such as informal coordination, weak local governance or infrastructure gaps, that influence project stability.

In comparison with previous studies, this work marks a shift from expert-based and qualitative assessments towards data-driven and interpretable modelling. By applying machine learning to MSME-level data, it extends uncertainty research beyond large-scale projects and provides a replicable framework that complements traditional approaches while enhancing managerial decision-making.

From a managerial perspective, the proposed framework enables construction MSMEs to transform perceptions of uncertainty into actionable insights. Feature importance rankings and rule-based classifications can serve as practical instruments for internal diagnostics, early warnings and strategic planning. Managers can use these insights to anticipate disruptions and adapt planning practices to project conditions.
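As an illustration of the internal-diagnostics use described above, the dominant-leaf rules in Table 5 can be written as a simple checklist. The Python sketch below encodes three of them (market, technological and resource use) with thresholds transcribed from Table 5; the `Firm` fields, the `RULES` dictionary and `flag_domains` are hypothetical names. Each rule only describes the leaf with the highest share of high-uncertainty firms in models fitted to a small regional sample, so a match should prompt closer review rather than be read as a prediction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Firm:
    months_in_service: int
    n_activities: int   # number of ISIC-classified activities
    n_states: int       # number of states (departments) where the firm has projects
    origin: str         # department of origin, e.g. "Cauca"


# Dominant-leaf conditions transcribed from Table 5 (three of the ten domains).
# Illustrative only: these are not validated decision rules.
RULES: Dict[str, Callable[[Firm], bool]] = {
    "market": lambda f: f.months_in_service >= 29,
    "technological": lambda f: f.n_activities <= 2 and f.n_states <= 3,
    "resource use": lambda f: (
        f.n_activities <= 3 and f.months_in_service >= 60 and f.origin != "Cauca"
    ),
}


def flag_domains(firm: Firm) -> List[str]:
    """Return the domains whose dominant high-uncertainty rule the firm satisfies."""
    return [domain for domain, rule in RULES.items() if rule(firm)]


print(flag_domains(Firm(months_in_service=72, n_activities=2, n_states=1, origin="Huila")))
# prints ['market', 'technological', 'resource use']
print(flag_domains(Firm(months_in_service=12, n_activities=4, n_states=5, origin="Cauca")))
# prints []
```

Keeping the rules in a plain dictionary makes it easy to add the remaining domains or to update thresholds if the models are refitted on new data.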

5.1

Limitations and future directions

This study is limited by its regional scope and small sample size. The sample was restricted to 25 firms, which may affect generalisability despite the use of bootstrapped models and data augmentation. The geographic scope was limited to four departments in southwestern Colombia; results may differ in other institutional or climatic contexts. Additionally, the survey captured perceptions at a single point in time; future research should adopt longitudinal designs to examine how uncertainty evolves across project stages. Self-reported data may also introduce bias, underscoring the need to integrate objective project records or sensor data where feasible. Finally, this study focused on strategic-level uncertainty perception rather than operational uncertainty management, and future work should extend the framework to operational construction datasets.
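The limitations above mention bootstrapped models and data augmentation, and Table 6 describes the augmentation as class-preserving. The generator is not specified in this section, so the Python sketch below is hypothetical: it uses multiplicative jitter of the numeric features of each labelled row (one simple class-preserving scheme, not necessarily the authors') together with standard bootstrap resampling via `random.Random.choices`. The function names, the `scale` parameter and the toy firms are assumptions.

```python
import random
from collections import Counter
from typing import List, Tuple

Row = Tuple[Tuple[float, ...], int]  # (numeric features, uncertainty class label)


def bootstrap_sample(rows: List[Row], rng: random.Random) -> List[Row]:
    """One bootstrap replicate: len(rows) draws with replacement."""
    return rng.choices(rows, k=len(rows))


def augment_within_class(rows: List[Row], copies: int, rng: random.Random,
                         scale: float = 0.05) -> List[Row]:
    """Hypothetical class-preserving augmentation: each synthetic row is a
    jittered copy (each feature scaled by a factor in [1 - scale, 1 + scale])
    of a real row and keeps that row's label."""
    augmented = list(rows)
    for features, label in rows:
        for _ in range(copies):
            jittered = tuple(x * (1.0 + rng.uniform(-scale, scale)) for x in features)
            augmented.append((jittered, label))
    return augmented


rng = random.Random(42)
# Toy firms: (months in service, number of activities, number of states), label.
firms: List[Row] = [((60.0, 2.0, 1.0), 1), ((24.0, 1.0, 1.0), 0), ((150.0, 3.0, 2.0), 1)]
augmented = augment_within_class(firms, copies=3, rng=rng)
replicate = bootstrap_sample(augmented, rng)
print(len(augmented), Counter(label for _, label in augmented))
# prints 12 Counter({1: 8, 0: 4})
```

Jittering also turns count variables such as the number of activities into non-integers; a real pipeline would likely round them or use a generator designed for mixed data types. Synthetic rows should also be kept out of any held-out evaluation data.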

From a methodological perspective, this study employed RFs and CTs for their interpretability and suitability for small datasets. Future work could compare these results with other machine learning approaches, such as XGBoost or rule-based ensembles, to assess model robustness and scalability, and could model the interdependencies between internal and external uncertainty sources explicitly.
