Expert-Based Risk Assessment for Flight Safety Using Kendall's W and Pearson's Chi-Square

Petuhovs, Igors

INTRODUCTION

Studies examining patterns and trends in aviation accidents (AA) and incidents indicate that in-flight technical malfunctions of aviation equipment occur about four times as frequently as errors committed by flight crews. Yet, when such incidents escalate into catastrophic events, human error is found to be roughly four times more likely to be the cause than technical failure. Most accidents and incidents in which an airworthy aircraft collides with the ground during controlled flight result primarily from violations of flight procedures and shortcomings in the professional performance of aviation personnel at various levels of the air transport system. This issue has become sufficiently serious and widespread that in various countries, airlines, along with international aviation organizations and professional associations, have intensified efforts to address it.

CURRENT APPROACH TO THE PROBLEM

The primary objective of flight safety (FS) management in air transport is to develop measures that prevent the ongoing trend toward hazardous situations in civil aviation by establishing a continuously operating FS monitoring and control system. This system should be based on the principles of international quality standards (ISO 9000) and the ICAO Safety Management System (SMS), particularly emphasizing the process-oriented approach to aviation enterprise operations. Such a system must be equipped with tools for technical and economic analysis and make use of scientific advances in the field of goal-oriented management of complex systems that ensure flight safety [1].

Key Data from 2024

In 2024, there were 14 fatal airline accidents worldwide, resulting in 304 fatalities. In Europe, three fatal accidents occurred in commercial air transport, causing three deaths. In general aviation, 27 fatal accidents involving non-complex airplanes led to 44 fatalities. These figures are consistent with historical data [2].

This also highlights the relevance of the topic under investigation. The process of flight safety monitoring based on the assessment of risk levels enables the identification of adverse factors occurring during flight and the prediction of their potential consequences.

The nature of flight crew errors in flight remains insufficiently studied, and the numerous unsystematic preventive measures currently in use have proven to be of limited effectiveness. Consequently, reliable methods for preventing aircraft accidents caused by crew errors have yet to be developed. One of the main reasons for this persistent problem lies in the inadequate approach taken by airline management to address it. While significant time and resources are spent analyzing detected errors, flight operations specialists often lack the ability to assess risk levels effectively—an assessment that would allow them to:

identify potentially hazardous situations;
evaluate the probability of danger occurrence;
select alternative measures to reduce the level of risk; and
assess the effectiveness of the implemented solutions.

Therefore, to improve flight safety management, it is essential to develop new methods that enable flight services to operate efficiently according to the assessed risk levels of adverse factors. These methods should be integrated into an Automated Control System (ACS) capable of collecting, storing, and analyzing relevant operational data and processing events across all hierarchical levels through a unified algorithmic framework.

Data and management

The proposed approach complies with the ICAO requirements set out in the Safety Management Manual (SMM). According to these guidelines, introducing the concept of an acceptable level of flight safety (ALoS) requires not only adherence to the standard safety principles and requirements already in place, but also the application of an approach based on measurable flight safety indicators.

An acceptable level of flight safety represents the goals defined by supervisory authorities, operators, and customers, which must be achieved and maintained within the field of flight safety. This level serves as a benchmark against which regulatory bodies can assess flight safety performance. When determining the acceptable level of safety, several factors must be considered: the existing level of operational risk, the cost–benefit ratio of improving the risk assessment system, and public expectations regarding safety in the aviation sector.

One possible method of doing so involves assessing the hazard coefficient of adverse factors based on the number of recorded incidents. The key principle in monitoring flight safety levels is therefore the evaluation of hazardous in-flight events.

In practice, airlines most often measure flight safety performance using data on severe adverse events. By continually reducing the number of such incidents, specialists can in turn decrease the overall frequency of aviation accidents. This relationship can be expressed as: $\Delta f_{\text{possible},\min} \le \Delta f_{\text{fact}} = \Delta f_{\text{possible},\max}$, where Δf represents the frequency of events.

A major limitation of this method, however, lies in its reliance on “high-severity” events (accidents and serious incidents) when analyzing risk levels and identifying adverse in-flight factors. This approach tends to be coarse and lacks precision in defining intermediate levels of risk. To minimize these shortcomings, the author proposes expanding the analysis to include not only dangerous occurrences and deviations, but also “other negative events” (see Fig. 2) [3].

Following the ICAO recommendations, “other negative events” refer to less critical cases that pose potential safety threats. Although these events may seem minor, they can serve as early indicators of latent problems in flight safety. Ignoring such underlying issues may lead to an increased number of more serious incidents. Recurrent events are particularly significant, as the data they provide are valuable for statistical evaluation.

The main difficulty of the current method lies in defining the weighting factor for each negative event, due to inherent inequalities [1]. Therefore, it is crucial to develop a system for the quantitative assessment of these weighting factors, based on the theory of risk evaluation.

It can thus be concluded that unifying the theoretical foundations of flight safety within the existing ICAO risk-based models remains incomplete for several reasons:

The concept and modelling of risk vary across disciplines (finance, ecology, engineering, etc.), and sector-specific features are often incorrectly taken as a foundation for new risk models.
From the general standpoint of mathematical formalization and the theory of random processes, only two models or formulas of risk can realistically be used: those based on the accident rate or the uncertainty of the studied phenomena. Moreover, within the broader theoretical framework, no significant difficulties arise in defining risk or assessing system safety based on risk interactions across various systems.
The direct transfer of methods from reliability theory to the evaluation of hazards caused by system failures does not yield satisfactory or unambiguous results. In particular, it fails to explain adequately the causes of accidents as rare and improbable events.

Method for Quantitative Evaluation of the Hazard Level of Adverse In-Flight Factors

Risk assessment makes it possible to classify identified events into groups of similar occurrences according to decreasing levels of risk. The resulting quantitative values can then be used to establish a priority order for implementing flight safety measures.

To determine risk levels based on operational monitoring data, the evaluation follows the rules of flight airworthiness that regulate the probabilities of special in-flight situations, as illustrated in Figure 3.

Here, CFC refers to control flight configuration, DS means dangerous or difficult situations, E stands for expected incident, and CS for catastrophic situations; P_x(O) is the probability of a special situation caused by a functional failure, and P_x(∑) is the total probability of special situations caused by functional failures.

In this case, the total risk assessment is expressed as the sum: $P_{x} + P_{CS} + P_{E} + P_{DS} + P_{CFC}.$

For the purpose of achieving a higher level of flight safety, particular attention should be paid to seemingly minor events with limited immediate consequences. The risk level of such events is best assessed through expert evaluation, since the use of purely mathematical methods is often not feasible for this purpose. The main challenges in developing mathematical rules for the quantitative evaluation of adverse in-flight factors include:

difficulties in ranking such negative events;
difficulties in defining their potential consequences;
challenges in analyzing flight development within the entire chain of adverse factors;
difficulties in determining flight outcomes involving several event ranges within short time intervals; and
difficulties in assessing how one adverse factor may trigger others and contribute to the cascading development of events.

A graphical representation of the method for the quantitative evaluation of negative event frequency was shown above in Fig. 2, according to which the rule 1:10:30:600 (a conditional ratio of the recurrence of negative events) and the rule 1:10:1000:10000(:>10000) (a conditional ratio of the recurrence of special in-flight situations) can be applied. Formally, the first relationship is expressed as: $q_{A} : q_{F} : q_{SI} : q_{I} = 1 : 10 : 30 : 600,$ where q_A is the number of accidents, q_F the number of failures, q_SI the number of serious aviation incidents, and q_I the number of aviation incidents.

In the updated model of the flight conditions pyramid, the relationship between event categories is expressed as: $q_{CS} : q_{E} : q_{DS} : q_{CFC} : q_{WCFC} = 1 : 10 : 10^{3} : 10^{4} : (> 10^{4}),$ where q_CS is the number of catastrophic situations, q_E the number of emergencies, q_DS the number of difficult situations, q_CFC the number of situations with complication of flight conditions, and q_WCFC the number of situations without complication of flight conditions.
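As a rough illustration of how the conditional ratio 1 : 10 : 1000 : 10000 could be used, the sketch below scales the pyramid from one observed category to the others. The function name `expected_counts` and the input of 50 difficult situations are hypothetical, and treating the ratio as a strict proportionality is an assumption made only for this example.

```python
# Conditional recurrence ratio of special in-flight situations,
# 1 : 10 : 10^3 : 10^4, as quoted in the text. The open-ended WCFC class
# (> 10^4) is left out because it has no fixed value.
PYRAMID_RATIO = {"CS": 1, "E": 10, "DS": 10**3, "CFC": 10**4}


def expected_counts(observed_type: str, observed_count: float) -> dict:
    """Scale the pyramid so that `observed_type` matches `observed_count`."""
    scale = observed_count / PYRAMID_RATIO[observed_type]
    return {kind: ratio * scale for kind, ratio in PYRAMID_RATIO.items()}


# Hypothetical input: 50 recorded difficult situations (DS).
counts = expected_counts("DS", 50)
for kind, value in counts.items():
    print(f"{kind}: {value:g}")
```

Under these assumptions, the observed DS count fixes the scale of the whole pyramid, which makes it easy to see whether the recorded counts of the other categories are above or below the conditional ratio.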

This classification method is straightforward to apply and allows for continuous monitoring of the current risk level during flight safety management. The assessment is conducted in accordance with the risk classification presented in Table 1 and Table 2.

Table 1.

Probability of an accident for different event types.

i Index of event type	Event type (special situation in flight)	Q_i Accident probability	q_i Number of controllable events of type i	T
1	WCFC (Situation without complication of flight conditions)	Q₁=10⁻⁵	q₁ – Number of controllable events of WCFC type	Flight hours during the flight safety monitoring
2	CFC (Situation with complication of flight conditions)	Q₂=10⁻⁴	q₂ – Number of controllable events of CFC type
3	DS (Difficult situation)	Q₃=10⁻³	q₃ – Number of controllable events of DS type
4	E (Emergency)	Q₄=10⁻¹	q₄ – Number of controllable events of E type
5	CS (Catastrophic situation)	Q₅=10⁰	q₅ – Number of controllable events of CS type

Table 2.

Risk classification.

		Rank of consequences
		Insignificant (WCFC)	Insignificant (CFC)	Significant (DS)	Dangerous (E)	Catastrophic (CS)
Probability of event	Frequent 10⁻³ < Q ≤ 10⁰	Subject to analysis	*Unacceptable*	*Unacceptable*	*Unacceptable*	*Unacceptable*
	Quite probable Q ≤ 10⁻³	Subject to analysis	Subject to analysis	*Unacceptable*	*Unacceptable*	*Unacceptable*
	Probable Q ≤ 10⁻⁵	Acceptable	Subject to analysis	Subject to analysis	*Unacceptable*	*Unacceptable*
	Improbable Q ≤ 10⁻⁶	Acceptable	Acceptable	Subject to analysis	Subject to analysis	*Unacceptable*
	Extremely improbable Q ≤ 10⁻⁷	Acceptable	Acceptable	Subject to analysis	Subject to analysis	Subject to analysis
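The lookup in Table 2 can be sketched as a small function. The names `classify`, `BANDS` and `CONSEQUENCES` are hypothetical, and reading each probability bound as the upper edge of a band (so that a value is assigned to the rarest band it fits) is an assumption, since the table only states the upper limits.

```python
# Consequence ranks in the column order of Table 2.
CONSEQUENCES = ["WCFC", "CFC", "DS", "E", "CS"]

OK, REVIEW, STOP = "Acceptable", "Subject to analysis", "Unacceptable"

# (upper bound of Q, row of Table 2), listed from the rarest band upwards.
# Assumption: a probability falls into the first band whose bound it does not exceed.
BANDS = [
    (1e-7, [OK, OK, REVIEW, REVIEW, REVIEW]),      # extremely improbable
    (1e-6, [OK, OK, REVIEW, REVIEW, STOP]),        # improbable
    (1e-5, [OK, REVIEW, REVIEW, STOP, STOP]),      # probable
    (1e-3, [REVIEW, REVIEW, STOP, STOP, STOP]),    # quite probable
    (1.0, [REVIEW, STOP, STOP, STOP, STOP]),       # frequent
]


def classify(q: float, consequence: str) -> str:
    """Return the Table 2 risk class for probability q and a consequence rank."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability between 0 and 1")
    column = CONSEQUENCES.index(consequence)
    for upper, row in BANDS:
        if q <= upper:
            return row[column]
    raise AssertionError("unreachable: q <= 1.0 always matches the last band")


print(classify(1e-4, "DS"))    # quite probable, significant consequences
print(classify(1e-8, "CS"))    # extremely improbable, catastrophic consequences
```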

It is necessary to use expert evaluation to assign an event class, since other mathematical methods are not applicable. The manifestation of an adverse factor, the crew's actions to mitigate its consequences, and the resulting flight outcome are random events. Therefore, as an objective, integral measure for assessing flight safety, we adopt the probability of an unsuccessful flight outcome (failure or accident). This indicator is hereafter referred to as the flight risk level, Q. The mitigation-complexity factors together with the manifested adverse factors form a likelihood matrix: (1) $\{\alpha_{ij}\} = \left| \begin{matrix} \alpha_{11} & \dots & \alpha_{1j} & \dots & \alpha_{1n} \\ \dots & \dots & \dots & \dots & \dots \\ \alpha_{i1} & \dots & \alpha_{ij} & \dots & \alpha_{in} \\ \dots & \dots & \dots & \dots & \dots \\ \alpha_{m1} & \dots & \alpha_{mj} & \dots & \alpha_{mn} \end{matrix} \right|$ where n denotes the number of flight stages and m denotes the number of adverse factors.

The outcome hazard factor of a special situation is defined as β_ij, the probability of an aviation incident resulting from an unmitigated i-type adverse factor occurring at the j-th stage of flight. The outcome hazard factors of special situations in flight form a likelihood matrix: (2) $\{\beta_{ij}\} = \left| \begin{matrix} \beta_{11} & \dots & \beta_{1j} & \dots & \beta_{1n} \\ \dots & \dots & \dots & \dots & \dots \\ \beta_{i1} & \dots & \beta_{ij} & \dots & \beta_{in} \\ \dots & \dots & \dots & \dots & \dots \\ \beta_{m1} & \dots & \beta_{mj} & \dots & \beta_{mn} \end{matrix} \right|$

Introducing the concepts of the complexity factor for mitigating an adverse factor and the outcome hazard factor of a special flight situation makes it possible to define the overall danger factor of a special situation caused by the occurrence of an adverse factor, expressed as: (3) $\{OP_{ij}\} = \{\alpha_{ij}\} \otimes \{\beta_{ij}\}$ where the symbol ⊗ denotes element-wise matrix multiplication. Accordingly, the value OP_ij represents the degree of hazard associated with an adverse factor. Given the known a priori probability of occurrence q_ij, the flight risk level Q_ij can be evaluated as: (4) $Q_{ij} = OP_{ij} \times q_{ij}$
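Equations (3) and (4) can be illustrated with a minimal sketch in plain Python. All numbers below are hypothetical, as is the helper name `elementwise`; rows are taken to be adverse factors and columns flight stages, which is an assumption about the layout of the matrices.

```python
def elementwise(a, b):
    """Element-wise product of two equally shaped matrices given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical values: 2 adverse factors (rows) by 3 flight stages (columns).
alpha = [[0.2, 0.5, 0.1],
         [0.4, 0.3, 0.6]]       # complexity of mitigating the factor
beta = [[0.1, 0.2, 0.5],
        [0.5, 0.1, 0.2]]        # outcome hazard if the factor is not mitigated
q = [[1e-3, 1e-4, 1e-3],
     [1e-4, 1e-3, 1e-4]]        # a priori probability of occurrence

op = elementwise(alpha, beta)   # equation (3): OP_ij = alpha_ij * beta_ij
risk = elementwise(op, q)       # equation (4): Q_ij = OP_ij * q_ij

for row in risk:
    print(["%.2e" % value for value in row])
```

The largest entry of `risk` would then point to the combination of adverse factor and flight stage that deserves attention first.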

By using the quantitative values of α and β, it becomes possible to determine OP based on the level of danger of the flight's special situation. This, in turn, enables the development of a rational strategy for crew actions to mitigate the consequences of adverse factors, the formulation of requirements for flight safety systems, and the establishment of appropriate training standards for flight crews.

This approach allows for the evaluation of the current level of flight safety during aircraft operation and the forecasting of the effectiveness of planned preventive measures.

RISK FACTOR AND MODELLING

The first flight operations risk category to be analyzed here is Controlled Flight into Terrain (CFIT). To identify the contributory risk factors and their interrelationships, human experts from various domains of flight operations were interviewed. Each general risk factor was subsequently decomposed into its component elements [4].

This process of decomposition and mapping of interrelationships is referred to as the risk structure. The structured knowledge elicitation method employed here is loosely based on the Analytical Hierarchy Process (AHP) [4].

To ensure that the experts' opinions were consistent and not random, the coefficient of concordance (W) was used as a coordination criterion. Kendall's coefficient of concordance (W) is a statistical measure used to assess the degree of agreement among multiple experts or judges evaluating the same set of items. It is a non-parametric statistic, particularly suitable when data do not meet the assumptions of parametric tests such as normality. The coefficient ranges from 0 to 1, where 0 indicates no agreement among experts and 1 represents perfect agreement [5].

Note that the W statistic should only be applied when measuring concordance between variables that evaluate the same general property of objects. If both positive and negative correlations are considered equally meaningful, this test would not be appropriate.

The ranking of adverse events was performed according to the classification presented in Table 3.

Table 3.

Ranking of adverse events by independent experts.

	1	2	i	m
1	c_11	c_21	c_i1	c_m1
2	c_12	c_22	c_i2	c_m2
…
j	c_1j	c_2j	c_ij	c_mj
…
n	c_1n	c_2n	c_in	c_mn

The number of columns (m) in the table corresponds to the number of experts who participated in the survey, while the number of rows (n) corresponds to the number of adverse events evaluated by those experts. At the intersection of the i-th column and the j-th row is the element c_ij, representing the rank (or position) assigned by the i-th expert to the j-th event.

Based on the data obtained from the expert survey table, both the hazard indicators of the evaluated events and the degree of agreement among expert judgments are assessed. These results make it possible to develop a hazard assessment scale, that is, to determine which type of situation (E, DS, CS, CFC, or WCFC) a given adverse risk factor is likely to produce.

First, the events are grouped into five categories: CS (catastrophic situation), E (emergency), DS (difficult situation), CFC (complication of flight conditions), and WCFC (without complication of flight conditions). The first group includes the most dangerous events, the second group somewhat less dangerous ones, and so on, with the fifth group containing the least dangerous events.

Within each group, events are ranked according to their degree of hazard – the most dangerous event occupies the first position, the next less dangerous event the second, and so forth. In this way, all events are systematically ranked by hazard level. In addition to these five main groups, two supplementary categories may also be identified: IC (indifferent condition) – events in which no danger arises, and RRC (risk-reducing condition) – events that may partially mitigate the likelihood of hazardous situations.

Kendall's coefficient of concordance (W) was then applied to the observations from all experts within each category independently. Kendall's W is calculated as follows: (5) $W = \frac{12 S}{m^{2} (n^{3} - n) - mT}$ where S is the sum of squares computed from the row sums of ranks R_i, n is the number of objects, m is the number of experts (observers), and T is a correction factor accounting for tied ranks.

For each row of ranks assigned by a given expert, the sum of ranks is calculated. This sum represents a random variable that, in the general case, provides an estimate of the variance relative to the maximum possible variance of ranks, effectively yielding a measure of rank correlation. (6) $D_{x} = \frac{1}{n - 1} \sum \sum_{i = 1}^{n} (R_{i} - \bar{R})^{2} = \frac{1}{n - 1} S$ (7) $D_{\max} = \frac{m^{2} (n^{3} - n)}{12 (n - 1)}$ (8) $\bar{R} = \frac{1}{n} \sum_{i = 1}^{n} R_{i}$ (9) $S = \sum_{j = 1}^{m} \left( \sum_{i = 1}^{n} (R_{ij} - \bar{R})^{2} \right)$ (10) $T = \sum_{g = 1}^{q} (t_{g}^{3} - t_{g})$ where D_x and D_max denote the calculated and the maximum possible variance, R_i and R̄ represent the rank sum and its mean, q is the number of groups of tied ranks, and t_g is the number of tied ranks in the g-th of the q groups.
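As a consistency check, and assuming that W is read as the ratio of the calculated variance (6) to the maximum possible variance (7), dividing one by the other recovers equation (5) for the case without tied ranks (T = 0):

```latex
W = \frac{D_{x}}{D_{\max}}
  = \frac{S}{n - 1} \cdot \frac{12 (n - 1)}{m^{2} (n^{3} - n)}
  = \frac{12 S}{m^{2} (n^{3} - n)}
```

The factor (n − 1) cancels, so the correction term mT in (5) is the only difference between the tied and the untied case.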

The obtained value is tested for significance using the Pearson chi-square (χ²) criterion: the coefficient is multiplied by the number of experts m and by the number of degrees of freedom (n − 1), where n is the number of ranked objects. The resulting criterion is then compared with the corresponding tabular value. If the calculated χ² exceeds the latter, the concordance coefficient under study is considered statistically significant. (11) $\chi^{2} = m (n - 1) W$

The calculated χ² values are compared with the tabulated (critical) values $\chi_{T}^{2}$ for the corresponding number of degrees of freedom. This comparison determines the probability that the observed (calculated) value exceeds the tabular value, that is: (12) $P(\chi^{2} > \chi_{v}^{2}) = \alpha$
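The comparison rule can be sketched as follows. The statistic is formed as described in words (coefficient × number of experts × degrees of freedom), with the degrees of freedom taken as the number of ranked objects minus one, which is the usual convention for Kendall's W. The critical values are copied from the 0.05 column of Table 8; choosing that column, the function names, and the input W = 0.5 are all assumptions made for illustration.

```python
# Critical chi-square values for alpha = 0.05, copied from Table 8
# (degrees of freedom 1 to 9 only).
CRITICAL_005 = {1: 3.8, 2: 6.0, 3: 7.8, 4: 9.5, 5: 11.1,
                6: 12.6, 7: 14.1, 8: 15.5, 9: 16.9}


def concordance_chi2(w: float, experts: int, objects: int):
    """Return (chi-square statistic, degrees of freedom) for a concordance W."""
    dof = objects - 1
    return experts * dof * w, dof


def exceeds_critical(chi2: float, dof: int, table=CRITICAL_005) -> bool:
    """True when the calculated chi-square exceeds the tabulated value."""
    return chi2 > table[dof]


# Hypothetical survey: W = 0.5 from 6 experts ranking 7 objects.
chi2, dof = concordance_chi2(0.5, experts=6, objects=7)
print(chi2, dof, exceeds_critical(chi2, dof))
```

A dictionary lookup keeps the sketch free of third-party dependencies; for degrees of freedom outside the tabulated range, a statistics library would be the more practical choice.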

If the obtained χ² values are statistically significant at a high confidence level (α > 0.95), this indicates that the concordance among the m experts is not due to chance. The developed mathematical model of integrated flight risk evaluation can therefore be applied to obtain quantitative estimates of risk levels for special flight situations using the expert evaluation method.

Critical χ² distribution values can be found in Table 8 of the Pearson distribution reference [6]. Thus, the coefficient of concordance (W) reflects the degree of agreement among multiple experts: the closer its value is to 1 (and the further from 0), the greater the consistency of expert opinions.

NUMERICAL EXAMPLE FOR ASSESSING CONSISTENCY

Experts were asked to express their opinions through a questionnaire and to rank the most significant factors contributing to serious flight events, assigning each a rank from 1 to 7. A higher number indicates a greater perceived significance of the factor.

Tables 4–5 present the results of this ranking exercise. The highest rank corresponds to the factor with the largest relative weight. Consequently, when selecting elements of a Safety Management System (SMS) for developing measures to enhance flight safety, special attention should be paid to the factor receiving the highest weight, as it plays the most influential role. The relative weights of all factors are shown in the last row of Table 5.

Table 4.

Factors affecting flight safety and assessment of their significance based on expert survey.

	Experts ↓	Weather conditions	Radio nav. System failures	Birds and foreign obstacle	Collisions on the ground	Financial conditions and economy	Psychological factor	Organization and Control
Factors →		1	2	3	4	5	6	7
Technical staff	1	7	3	2	4	5	6	1
Captain	2	3	5	4	5	3	6	2
Pilot 2	3	2	6	1	3	7	4	5
Dispatcher TC	4	4	6	5	3	3	5	2
Quality Manager	5	4	3	4	5	4	6	7
Airport services	6	4	2	3	5	6	5	7

Table 5.

Weight of the factors.

	1	2	3	4	5	6	7	ΣΣ X_ij = 177
1	7	3	2	4	5	6	1
2	3	5	4	5	3	6	2
3	2	6	1	3	7	4	5
4	4	6	5	3	3	5	2
5	4	3	4	5	4	6	7
6	4	2	3	5	6	5	7
Column sum	24	25	19	25	28	32	24
Weight	0.136	0.141	0.107	0.141	0.158	0.181	0.136

To calculate the weights, the sums of all expert rankings across columns and rows were obtained. The total sum of all rankings is: $7 + 3 + 2 + 4 + 4 + 4 + \dots + 1 + 2 + 5 + 2 + 7 + 7 = 177.$

Next, the values for each column were summed to determine their relative weights. For example, in the first column the weight is 24/177, in the second column 25/177, and so forth. Based on these calculations, it was found that the psychological factor has the greatest weight, indicating that, according to the experts, it is the most significant contributor to flight safety risk.
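The weighting step can be reproduced with a few lines of Python. The rankings are copied from Table 4; the variable names are hypothetical.

```python
# Expert rankings from Table 4: one row per expert, one column per factor.
RANKS = [
    [7, 3, 2, 4, 5, 6, 1],   # expert 1
    [3, 5, 4, 5, 3, 6, 2],   # expert 2
    [2, 6, 1, 3, 7, 4, 5],   # expert 3
    [4, 6, 5, 3, 3, 5, 2],   # expert 4
    [4, 3, 4, 5, 4, 6, 7],   # expert 5
    [4, 2, 3, 5, 6, 5, 7],   # expert 6
]

# Sum each column (factor) over all experts, then normalise by the grand total.
column_sums = [sum(column) for column in zip(*RANKS)]
total = sum(column_sums)
weights = [s / total for s in column_sums]

# Factor numbers start at 1, list indices at 0.
top_factor = weights.index(max(weights)) + 1

print(column_sums, total)
print([round(w, 3) for w in weights])
print("highest weight: factor", top_factor)
```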

The next step is to verify the consistency of expert opinions using Kendall's coefficient of concordance (W). If all ranks are identical – that is, if the experts are in complete agreement – then W = 1. In practice, however, the coefficient usually falls within the range 0 ≤ W ≤ 1. When W = 0 or takes a very small value, it indicates that there is little or no agreement among the experts.

Let us now calculate Kendall's W for the given data and test the degree of concordance among the experts. $S = \Sigma S_{i} = 27.5$

We first calculate the sum of the event ranks for each expert: in the first row, 7 + 3 + 2 + 4 + 5 + 6 + 1 = 28; in the second row, 3 + 5 + 4 + 5 + 3 + 6 + 2 = 28; and so on. The mean of these rank sums is R̄ = 29.5.

It should also be noted that the sum of ranks in each row was checked to ensure that the dataset contains properly ranked data. Since there are seven subjects (events), the sum of rankings in each row should equal 1 + 2 + ⋯ + 7 = 7 × (7 + 1) / 2 = 28. This holds for Experts 1–4, whereas the rows of Experts 5 and 6 sum to 33 and 32, respectively (see Table 6). When there are multiple tied ranks, a revised definition of Kendall's W is used. For each rater j, define t_g, where each g represents a group of tied ranks for that rater, and t_g is the number of tied ranks in that group. An example of the distribution of tied ranks is presented in Table 7.

Table 6.

Rank of the experts.

	1	2	3	4	5	6	7	R_i	R_i − R̄	S_i
1	7	3	2	4	5	6	1	28	−1.5	2.25
2	3	5	4	5	3	6	2	28	−1.5	2.25
3	2	6	1	3	7	4	5	28	−1.5	2.25
4	4	6	5	3	3	5	2	28	−1.5	2.25
5	4	3	4	5	4	6	7	33	3.5	12.25
6	4	2	3	5	6	5	7	32	2.5	6.25
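The last three columns of Table 6 can be recomputed from the rank sums alone. The sketch below takes the R_i column as given and uses hypothetical variable names.

```python
# Rank sums R_i per expert, copied from Table 6.
rank_sums = [28, 28, 28, 28, 33, 32]

mean_rank_sum = sum(rank_sums) / len(rank_sums)          # mean of the R_i column
deviations = [r - mean_rank_sum for r in rank_sums]      # R_i minus the mean
squared = [d * d for d in deviations]                    # S_i column
s_total = sum(squared)                                   # S, the sum of all S_i

print(mean_rank_sum)
print(deviations)
print(squared)
print(s_total)
```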

For Expert 1 in the example, there are no tied ranks, and therefore T₁ = 0. Similarly, T₃ = 0. For Expert 6, there is one group of two tied ranks, and so T₆ = 2³ – 2 = 6. For Expert 2, there are two groups of two tied ranks, giving T₂ = (2³ – 2) + (2³ – 2) = 6 + 6 = 12. Similarly, T₄ = 12. For Expert 5, there is one group of three tied ranks (the same rank assigned to Factors 1, 3, and 5), giving T₅ = 3³ – 3 = 24. Hence, the total correction factor for ties is T = 0 + 12 + 0 + 12 + 24 + 6 = 54 (see Table 7).

Table 7.

Ties in expert rankings.

	1	2	3	4	5	6	T_i
1	0	0	0	0	0	0	0
2	1	2	0	2	1	0	12
3	0	0	0	0	0	0	0
4	0	0	1	2	2	1	12
5	1	0	1	0	1	0	24
6	0	0	0	1	0	1	6
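The tie correction of equation (10) can be checked mechanically by counting repeated ranks in each expert's row. The helper name `tie_correction` is hypothetical; the rankings are those of Table 4.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of t**3 - t over every group of t equal ranks given by one expert."""
    return sum(t**3 - t for t in Counter(ranks).values() if t > 1)


# Expert rankings from Table 4: one row per expert.
RANKS = [
    [7, 3, 2, 4, 5, 6, 1],
    [3, 5, 4, 5, 3, 6, 2],
    [2, 6, 1, 3, 7, 4, 5],
    [4, 6, 5, 3, 3, 5, 2],
    [4, 3, 4, 5, 4, 6, 7],
    [4, 2, 3, 5, 6, 5, 7],
]

per_expert = [tie_correction(row) for row in RANKS]   # T_i column of Table 7
total_correction = sum(per_expert)                    # T used in equation (5)

print(per_expert, total_correction)
```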

The coefficient of concordance is W = 12 × 27.5 / (6² × (7³ – 7) – 54 × 6) = 0.028. The resulting value W = 0.028 indicates a non-zero but very low level of agreement among the experts. To determine whether this result is still statistically acceptable, and to verify whether Factor 6 can indeed be considered the most significant, we test its significance using Pearson's chi-square criterion, multiplying W by the number of experts (6) and by the number of degrees of freedom (7 – 1 = 6 for the seven factors): χ² = 6 × (7 – 1) × 0.028 ≈ 1.01.
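The arithmetic behind the value of W can be expanded step by step, using S = 27.5, m = 6, n = 7 and T = 54 from the text:

```latex
W = \frac{12 \times 27.5}{6^{2} \times (7^{3} - 7) - 6 \times 54}
  = \frac{330}{36 \times 336 - 324}
  = \frac{330}{12096 - 324}
  = \frac{330}{11772}
  \approx 0.028
```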

At a significance level of 98%, the calculated coefficient of concordance can be considered statistically significant. This means that, overall, the experts' assessments show a low but non-random level of agreement. Although general consistency among the experts is weak, the importance of one particular factor – identified as the most influential – remains statistically meaningful and should not be excluded when evaluating the performance of the flight safety management system (see Table 8).

Table 8.

Chi-square (χ²) distribution reference table listing critical values [6].

Degree of freedom	Level of significance α
Degree of freedom	0.99	0.975	0.95	0.9	0.1	0.05	0.025	0.01
1	------	0.001	0.004	0.02	2.7	3.8	5	6.6
2	0.02	0.051	0.103	0.211	4.6	6	7.4	9.2
3	0.115	0.216	0.352	0.584	6.25	7.8	9.4	11.3
4	0.297	0.484	0.711	1.064	7.78	9.5	11.1	13.3
5	0.554	0.831	1.15	1.61	9.24	11.1	12.8	15.1
6	0.872	1.24	1.64	2.2	10.65	12.6	14.4	16.8
7	1.24	1.69	2.17	2.83	12.02	14.1	16	18.5
8	1.65	2.18	2.73	3.49	13.36	15.5	17.5	20.1
9	2.09	2.7	3.33	4.17	14.68	16.9	19	21.7

CONCLUSIONS

The example presented herein demonstrates that this method of calculation is straightforward and accessible to anyone familiar with the basics of mathematics. To simplify the process, electronic spreadsheet templates can be used for the computations.

The obtained value of the risk level (Q) characterizes the degree of hazard associated with the operation of an aviation system over a given period and allows for the determination of its safety rating. Similar calculations can be performed for individual airlines, aircraft types, or other operational categories. The results can be widely applied for monitoring and assessing flight safety status, and consequently, for managing flight safety more effectively.

Based on the results of the presented example, Factor 6 was identified as the most significant among those evaluated in the expert survey. This conclusion follows directly from the weighting results. Most experts agreed that the psychological factor has the greatest influence and can, in many cases, lead to catastrophic situations. However, although expert opinions indicate this factor's importance, Kendall's coefficient (W) showed a low level of agreement, suggesting a lack of strong consensus. The Pearson chi-square test confirmed the statistical significance of the factor but also revealed divergence among expert judgments.

This method is particularly recommended in cases where expert responses show moderate similarity, enabling the identification of key influences in complex control systems. Currently, probability estimates of specific events have not been included, as they are not yet required.

The W statistic should only be used to assess consistency among variables that measure the same general properties of objects or events. For instance, if both positive and negative correlations carry equal meaning within the same dataset, Kendall's W would not be appropriate. In flight safety analyses, such cases may occur when one parameter increases while another decreases within the same time frame, creating clear but opposite correlations.

If W = 0, this means that experts ranked the list of factors entirely differently or at random. Conversely, W = 1 indicates that all experts ranked the list identically, following a predetermined order. However, this leads to a limitation: the method does not assess probabilities or absolute parameter values but only examines the structure of expert responses, specifically whether their rankings coincide. If any participant imitates another's responses, the reliability of the results is compromised.

When resolving expert disagreements or when probability estimates are required, more advanced statistical techniques such as Cohen's Kappa or Spearman's rank correlation coefficient should be applied. These methods, however, were beyond the scope of the present study. For practical applications that require simplicity and rapid evaluation of expert survey results, the approach presented here remains the most suitable and efficient.
