Maintainability by design is an important quality characteristic of a product. A high level of maintainability means that repairs are quick, easy and economical [1]. An in-depth study of maintainability can support the development, maintenance and support of equipment, providing a solid material foundation for combat effectiveness. An important aspect of maintainability studies is maintainability estimation, which in many cases is an important part of the operational suitability assessment in the testing and evaluation (T&E) of current equipment [2]. Maintainability (a probability measure), mean time to repair (MTTR), and maximum time to repair [3] are commonly used metrics for maintainability evaluation. According to Military Handbook 470A (MIL-HDBK-470A), the sample size for maintainability estimation should be at least 30 to ensure a high degree of reliability of the estimates [4]. However, it is practically impossible to obtain sufficient data for maintainability estimation in certain phases of equipment testing. This is partly due to the considerable cost of testing and partly due to the fact that mapping the performance of the equipment is no longer the primary purpose of testing. As a result, insufficient fault samples may be available for maintenance operations.
Thanks to advances in data acquisition and storage technologies, it is possible to collect data from other test phases of the equipment, expert knowledge, and data from similar equipment in addition to the current field test data. If we want to integrate this data, Bayesian theory is a natural choice, and the prior distribution specification is a key issue in the Bayesian framework. Zellner et al. [5] introduced a data quality factor to measure the quality of prior information and field test data, which in turn leads to more accurate prior distributions. Ibrahim et al. [6] investigated the power prior distribution and applied it to regression estimation with good results. Zhou C et al. [7] proposed a demonstration method using mixture prior distributions for the problem of small-sample maintainability demonstration with multiple sources of prior information, using credibility weighting based on quality factors to integrate multiple prior distributions into a single one.
In some phases of T&E, such as operational testing (OT), one faces not only the problem that the sample size of maintenance times is small, but probably also the problem of under-representation of maintenance operations. Due to limitations in test time and conditions, some failure modes that take longer to uncover may not show up, while others may lack excitation conditions, resulting in a lack of the types of operations needed to fix them. When Bayesian theory is used to integrate prior information, this missing information about maintenance operations can be accounted for at the level of the entire system. Consequently, there are two levels of prior information to consider: from a microscopic viewpoint, historical maintenance data of the same types of maintenance operations as in the field test; and, from a macroscopic viewpoint, historical maintenance data of the entire system that contains these operations. In fact, many studies in the field of reliability assessment have focused on the integration of different levels of prior information. Guo J et al. [8] effectively integrated various expert knowledge and data sources at subsystem and system levels using the Bayesian melding method (BMM) in analyzing system reliability. Yang L et al. [9] integrated multilevel prior information using an improved BMM, which flexibly balanced the contributions of the prior distributions involved in the integration by setting the weighting factors as hyperparameters. From the perspective of system theory, the integration of different levels of information reflects the dialectical relationship between the whole and its parts, which helps to obtain a more reasonable prior distribution. However, no corresponding studies have yet been conducted in the field of maintainability assessment.
When determining the prior distribution, the addition of inappropriate prior information can lead to a poor posterior distribution, which can have a negative impact on the maintainability estimation. It is therefore also necessary to check whether there is any significant conflict between the prior information and the field test data. In previous studies, such a conflict is generally detected with a consistency test. Typical methods include the graph comparison method [10] and the Bayesian credible interval method [11]. Zhang Z [12], Zhu Z [13], and Wang J [14] all screened prior information from the perspective of data consistency: they used parametric or non-parametric methods to check whether the prior information and the field data were statistically drawn from the same population. However, the consistency test is performed at a certain significance level. If this level is exceeded, the prior distribution is discarded entirely, a "black or white" approach that results in the loss of useful prior information.
To solve the aforementioned problems, this paper proposes a maintainability estimation method that considers the integration of two different levels of prior information with the field data. The four main contributions of this work can be summarized as follows:
In response to the problems of insufficient maintenance data and under-representation of maintenance operation types in some test phases, a BMM-based maintainability estimation method is proposed that considers prior information at both the maintenance operation level and the system level.
In view of possible conflicts between a prior and the field data, a prior-data conflict check is given for maintenance times following the exponential, normal, and lognormal distributions, respectively, together with a method for avoiding such conflicts based on mixture priors.
Adaptive sampling importance resampling (ASIR) is flexibly applied at several points of the proposed method to overcome computational difficulties.
An interesting perspective is provided to gain a deeper understanding of the maintenance time distribution for complex equipment.
The rest of this paper is organized as follows: Section 2 presents the motivation for the research in this paper. Section 3 discusses the BMM-based maintainability estimation framework in detail. Sections 4 and 5 present a numerical case and a practical test case to validate and illustrate the benefits, respectively. The last section briefly summarizes this paper and draws three conclusions.
The maintenance time is a random variable that is usually considered to follow an exponential, normal or lognormal distribution [15]. However, it follows a mixture distribution when there is more than one time-dependent factor in the maintenance process, and its probability density curve then takes on a multi-peaked shape [16]. References [17] and [18] point out that the maintenance time sample set for complex equipment is a combination of samples from different types of maintenance operations, i.e., a typical heterogeneous dataset. If there are significant differences in the expected times of these different types of maintenance operations, the histogram of the sample set will show multiple peaks, the typical sign of a mixture distribution. This representation is not incorrect, but it needs to be explained in more detail.
In general, simple or basic maintenance operations follow a normal distribution, maintenance operations that can be completed after a short adjustment time or a quick replacement of parts follow an exponential distribution, and the lognormal distribution is suitable for describing the maintenance time of all types of complex equipment [19]. The maintenance of complex equipment consists of many different types of maintenance operations, with the types that occur less frequently taking longer, and vice versa. This mechanism leads to a positively skewed distribution of maintenance time, with its mode on the left and a long tail to the right. Ideally, a single lognormal distribution should model the maintenance time of complex equipment. In practice, however, this is not the case in some test phases. For example, for equipment intended for a specific combat mission, the mission-critical subsystems are only some of its components; together with factors such as the limitations of test conditions and test time, this means that certain types of maintenance operations are not performed. Some "small populations" are then missing from the lognormal "large population" that should be present, creating the phenomenon of "multiple peaks". In other words, the lognormal distribution is the "intrinsic nature" of the maintenance time of complex equipment, while the multi-peak phenomenon is its "extrinsic manifestation" under certain conditions.
The proposed method is described in detail in this section; its flowchart is shown in Fig. 1. First, the natural prior distributions for the system and for each type of maintenance operation are determined based on historical data, and the induced system prior is evaluated based on the structural model and the natural priors of all maintenance operations. Then the BMM is used to obtain an updated prior for each maintenance operation, while a prior-data conflict check is used to determine a non-informative prior for each maintenance operation. The non-informative prior and the updated prior are combined into a mixture prior, from which the posterior for each maintenance operation is obtained. Finally, the system maintainability is estimated based on the structural model and the posteriors of all maintenance operations.

The flowchart of the proposed method.
The system maintainability Ms(t) at a given time t is a weighted mean of n probabilities [20]:
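As a numerical sketch of this weighted mean (the weights and per-operation probabilities below are illustrative values, not taken from the paper):

```python
import numpy as np

def system_maintainability(weights, op_maintainabilities):
    """Weighted mean of the per-operation maintainability probabilities,
    Ms(t) = sum_i w_i * M_i(t); the weights w_i are the relative
    frequencies of the maintenance operation types."""
    w = np.asarray(weights, dtype=float)
    m = np.asarray(op_maintainabilities, dtype=float)
    assert np.isclose(w.sum(), 1.0), "weights must sum to one"
    return float(w @ m)

# e.g. three operation types with illustrative weights and probabilities
ms = system_maintainability([0.5, 0.3, 0.2], [0.9, 0.7, 0.6])  # -> 0.78
```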
The parameters θ1, θ2, θ3, θ4, θ5 in (2) are all random variables in the Bayesian context, whose prior distributions can be assumed to take the form of conjugate priors; probability density functions (p.d.f.s) of the following form can be derived [21]:
The hyperparameters α1 and β1 of π1(θ1) can be determined using the moment method if historical test data are available [22]. If the maintenance time X follows an exponential distribution, the expected value μm and the variance σm² of X can be expressed in terms of θ1. After replacing μm and σm² with their sample estimates, the hyperparameters α1 and β1 can be solved for.
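The moment method can be sketched as follows; this is a generic moment-matching scheme (equating the mean and variance of the Gamma prior to sample moments of historical rate estimates), given for illustration and not necessarily identical to the paper's equations:

```python
import numpy as np

def gamma_hyperparams_by_moments(theta_hat):
    """Moment-match a Gamma(alpha1, beta1) prior to historical estimates
    of the exponential rate theta1: the Gamma mean alpha1/beta1 and
    variance alpha1/beta1**2 are equated to the sample mean and variance."""
    m = np.mean(theta_hat)
    v = np.var(theta_hat, ddof=1)
    beta1 = m / v          # from alpha1/beta1**2 = v with alpha1 = m*beta1
    alpha1 = m * beta1
    return alpha1, beta1
```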
Since π2(θ2, θ3²), π3(θ4, θ5²) and π(ϕ1, ϕ2²) are all N-IG densities, their hyperparameters are determined by the same approach. Taking π(ϕ1, ϕ2²) as an example, the hyperparameters μ, k, r and λ are estimated as follows [23]:
In the Bayesian context, M1, M2 and M3 in (2) can also be regarded as functions of θ1, (θ2, θ3) and (θ4, θ5), respectively. Then the mapping relationship from the input vector θ = (θ1, θ2, θ3, θ4, θ5) to the output variable Ms can be established, denoted as W: θ → Ms, and the model can be further defined as Ms = W(θ), which is exactly the system maintainability model shown in (1) and (2). Since one Ms can be mapped by more than one θ, the model is irreversible. Take the distribution obtained from expert knowledge or historical experimental data as the natural prior and let q1(θ) be the natural prior of θ. In (3), M′s (t) is strictly monotonically decreasing with respect to ϕ1. Therefore, the following inverse function can be easily determined:
The integral in (12) does not yield a closed-form expression and can be approximated by numerical integration. The structural model Ms = W(θ) corresponds to a transformation of θ. Consequently, a distribution of Ms, called the induced prior distribution, can be obtained.
The foregoing details show that based on the deterministic structural model and natural priors, the BMM integrates the prior information of each maintenance operation to the system level and further transfers the pooled system information to the maintenance operation level. The information integration and updating for different levels is thus realized. qθ(θ) integrates the prior information of Ms and θ, and also integrates the structural information of the model Ms = W(θ).
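The induced system prior described above can be approximated with simple Monte Carlo: draw θ from the natural prior, push the draws through the structural model, and smooth the transformed sample with a KDE. The function below is an illustrative sketch (`W` and the θ sample are user-supplied):

```python
import numpy as np
from scipy.stats import gaussian_kde

def induced_system_prior(theta_sample, W, grid):
    """Induce the prior of Ms from prior draws of theta: transform each
    draw through Ms = W(theta) and estimate the density of the
    transformed sample with a Gaussian KDE evaluated on `grid`."""
    ms = np.array([W(th) for th in theta_sample])
    return gaussian_kde(ms)(grid)
```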
If there is a significant discrepancy between the system-level prior information and the field test data, a prior-data conflict may remain after the integration in (14), even though the structural model Ms = W(θ) mitigates the discrepancy to some extent. It is necessary to avoid such a conflict while preserving the useful information in the prior.
In the Bayesian framework, a prior-data conflict arises whenever the prior concentrates most of its mass in the low-density region of the likelihood [25], which is then measured by P(s0) in the following equation:
For a sample x = (x1, …, xn) of the maintenance time X, S = x̄⁻¹ is a minimal sufficient statistic (MSS) for θ1 in (2) if X follows an exponential distribution. Since x̄⁻¹ ∼ IG(n, nθ1) [26], where IG denotes the Inverse-Gamma distribution, we have:
Since the prior distribution of θ1 is Gamma(α1, β1), the posterior distribution is then given by θ1|x ∼ Gamma(α1 + n, β1 + nx̄) [27]. Consequently, the prior predictive density of x̄−1 can be derived according to Bayes' theorem:
In fact, the following equation follows directly from the form of π3(θ4, θ5²) in (4):
The posterior distribution of (θ4, θ5²) is then given as:
The joint prior predictive density of (x̄, v²) can be derived according to Bayes' theorem as:
As shown above, P(s0) in (15) does not admit a closed-form expression; a Monte Carlo estimate is therefore obtained with the following procedure:
- ➢
Step 1.
For given α1 and β1 [μ3, k3, r3 and λ3], generate θ1 [(θ4, θ5²)] from π1(θ1) [π3(θ4, θ5²)].
- ➢
Step 2.
For the θ1 [(θ4, θ5²)] generated in step 1, generate x̄⁻¹ [(x̄, v²)] from p(x̄⁻¹|θ1) [p(x̄, v²|θ4, θ5²)], and evaluate m(x̄⁻¹) [m(x̄, v²)] according to (17) [(18)].
- ➢
Step 3.
Repeat step 1 and step 2 many times and record the proportion of runs with m(x̄⁻¹) ≤ m(x̄₀⁻¹) [m(x̄, v²) ≤ m(x̄₀, v₀²)], where x̄₀⁻¹ [(x̄₀, v₀²)] is the observation of x̄⁻¹ [(x̄, v²)] based on the sample x = (x1, …, xn). This proportion is a Monte Carlo estimate of P(s0).
The expressions "[⋅]" in the above three steps correspond to the case where X follows a normal distribution. If X follows a lognormal distribution, it suffices to set Y = log(X) and then proceed as in the normal case.
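For the exponential case, the three steps can be sketched in code; here the prior predictive density m(·) is itself estimated by averaging p(s | θ1) over prior draws rather than using a closed form, and the function names and sample sizes are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def m_density(s, alpha1, beta1, n, n_mc=4000):
    """Prior predictive density of S = 1/x-bar, estimated by averaging
    p(s | theta1) over Gamma(alpha1, beta1) prior draws; given theta1,
    S follows an Inverse-Gamma(n, n*theta1) distribution."""
    theta1 = rng.gamma(alpha1, 1.0 / beta1, size=n_mc)  # numpy uses scale
    return stats.invgamma.pdf(s, a=n, scale=n * theta1).mean()

def p_s0_exponential(x, alpha1, beta1, n_rep=500):
    """Monte Carlo estimate of P(s0) for exponential maintenance times:
    small values indicate a prior-data conflict."""
    n = len(x)
    m_obs = m_density(1.0 / np.mean(x), alpha1, beta1, n)
    hits = 0
    for _ in range(n_rep):
        theta1 = rng.gamma(alpha1, 1.0 / beta1)                        # step 1
        s = stats.invgamma.rvs(n, scale=n * theta1, random_state=rng)  # step 2
        hits += m_density(s, alpha1, beta1, n) <= m_obs                # step 3
    return hits / n_rep
```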
To avoid prior-data conflict, a mixture prior distribution π(θ|Dh, ψ) with the following form is used in this paper:
In this paper, prior-data conflict checks are required for each of the three maintenance operations. The informative prior π(θ|Dh) in (22) is replaced in turn by the corresponding component of the updated prior qθ(θ), i.e., q(θ1), q(θ2, θ3) or q(θ4, θ5). As for the choice of the non-informative prior π(θ), the impossibility of any prior-data conflict should be regarded as a necessary property of a non-informative prior for our purposes, as suggested in [25]. The choice of π(θ) is not unique and typically requires some trial and error; in practice its variance should be much larger than that of π(θ|Dh). If the maintenance time X follows an exponential distribution, π(θ) should be chosen such that μm ≈ x̄ in (6). If X follows a normal distribution, the corresponding marginal distribution is a generalized Student's t distribution, i.e., [23]:
According to Bayes' theorem, the posterior distribution can then be obtained as follows:
Sampling importance resampling (SIR), originally developed by Rubin [28], is used here for posterior inference to overcome the difficulty of sampling from complex priors and likelihood functions. The basic principle of SIR is to draw a large number of samples from a known distribution, reweight them, and finally resample according to the weights to approximate the target distribution. However, a small number of very large importance weights can dominate the resampling process, which may cause SIR to perform poorly. Liang [29] extended the pruned-enriched Rosenbluth method by setting upper bounds on the weights to limit the overuse of certain samples; the enrichment step explores a broader range of samples by splitting large weights into smaller ones. In this paper, ASIR is used in three contexts: the first is the updated prior qθ(θ) in (14), the second is the mixture prior π(θ|Dh, ψ) in (22), and the third is the posterior distribution.
- ➢
Step 1.
Generate initial random samples. That is, generate a set of samples {θj, j = 1, …, m} from q1(θ).
- ➢
Step 2.
Calculate resampling weights.
(a) Generate a set of samples {Msj, j = 1, …, m} using the mapping relationship W: θ → Ms, where W(θj) = Msj.
(b) Calculate the resampling weight cj as follows:
cj = [q2(W(θj)) / q1*(W(θj))]^(1−α), (26)
where q1*[W(θj)] is obtained with KDE based on {Msj, j = 1, …, m}, and q2[W(θj)] is calculated according to (12).
(c) Choose d weights randomly from {cj, j = 1, …, m} and sort them in descending order to obtain {c′j, j = 1, …, d}. Define a set {yk, k = 1, …, d} with yk = c′1 + ⋯ + c′k, and assign the element y⌊γ×d⌋ as the threshold value, where γ is an empirical threshold percentage, e.g., γ = 80 %, and ⌊γ × d⌋ is the floor value of γ × d.
(d) Split any weight cb > y⌊γ×d⌋ in {cj, j = 1, …, m} into gb = ⌊cb/y⌊γ×d⌋ + 1⌋ weights of value cb/gb, used for resampling the corresponding θb.
Repeat (b)-(d) until the predefined condition, such as a certain standard deviation of the sampling weights, is met.
- ➢
Step 3.
Resample from {θj, j = 1, …, m} with the importance weights derived in step 2.
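A single pass of the threshold-and-split procedure above can be sketched as follows; this is a literal, simplified rendering with illustrative names, and here all weights are used for the threshold rather than a random subset of size d:

```python
import numpy as np

def split_large_weights(theta, c, gamma=0.8, rng=None):
    """One pass of the adaptive weight-splitting scheme: a threshold is
    taken from the cumulative sums of the descending-sorted weights, and
    any weight above it is split into g_b = floor(c_b/thr + 1) copies of
    size c_b/g_b so that no single draw dominates the resampling."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    c = np.asarray(c, dtype=float)
    d = len(c)
    c_desc = np.sort(c)[::-1]            # descending order
    y = np.cumsum(c_desc)                # y_k = c'_1 + ... + c'_k
    k = max(int(np.floor(gamma * d)), 1)
    thr = y[k - 1]                       # threshold y_{floor(gamma*d)}
    new_theta, new_c = [], []
    for t, cb in zip(theta, c):
        g = int(np.floor(cb / thr + 1)) if cb > thr else 1
        new_theta += [t] * g
        new_c += [cb / g] * g            # splitting preserves total weight
    return np.array(new_theta), np.array(new_c)
```

The split samples can then be resampled with probabilities proportional to the new weights, e.g. via `rng.choice`.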
The ASIR procedure for π(θ|Dh, ψ) is similar to the above procedure, where the initial samples are generated from π(θ) and π(θ|Dh) and the initial resampling weights take only two values: ψ for the samples from π(θ) and (1 − ψ) for the samples from π(θ|Dh). The distribution of the new samples is π(θ|Dh, ψ). Furthermore, these new samples are resampled with their likelihood function values in (25) as weights, resulting in samples from the posterior distribution.
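The mixture-prior initialization described above can be sketched as follows (an illustrative helper for scalar θ samples):

```python
import numpy as np

def sample_mixture_prior(draws_noninf, draws_inf, psi, m, rng=None):
    """Draw m samples from the mixture prior by pooling draws from the
    non-informative prior pi(theta) and the informative prior pi(theta|Dh)
    and resampling them with initial weights psi and (1 - psi)."""
    rng = np.random.default_rng() if rng is None else rng
    pool = np.concatenate([draws_noninf, draws_inf])
    w = np.concatenate([np.full(len(draws_noninf), psi / len(draws_noninf)),
                        np.full(len(draws_inf), (1.0 - psi) / len(draws_inf))])
    idx = rng.choice(len(pool), size=m, p=w / w.sum())
    return pool[idx]
```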
In this section, a synthetic dataset is provided for demonstration purposes. Suppose there are 9 different types of maintenance operations for a system whose times follow normal distributions with different parameters. A set of randomly generated time samples from each of the 9 normal distributions is combined into a large sample set. The parameters are set as shown in Table 1. Obviously, the sample size in Table 1 is also the number of failures corresponding to the maintenance operation, and therefore can be used to calculate the weight wi in (1).
Parameter settings for maintenance operations.
| Number of maintenance operation type | Mean parameter | Variance parameter | Sample size |
|---|---|---|---|
| 1 | 33.49 | 15.07 | 132 |
| 2 | 75.35 | 64.59 | 24 |
| 3 | 17.88 | 8.33 | 187 |
| 4 | 44.45 | 23.13 | 96 |
| 5 | 97.05 | 160.39 | 12 |
| 6 | 25.29 | 10.81 | 172 |
| 7 | 129.12 | 942.91 | 4 |
| 8 | 11.21 | 8.98 | 154 |
| 9 | 58.00 | 39.06 | 53 |
Three datasets are generated for demonstration. Dataset 1 is generated using the settings in Table 1 and contains 834 maintenance time samples. The data in dataset 1 belonging to maintenance operations 1, 2 and 4 are filtered out and combined into dataset 2. The time samples for maintenance operations 1, 2 and 4 are generated again to form dataset 3, with sample sizes of 20, 4 and 14, respectively. These three datasets are treated as the historical data of the system, the historical data of maintenance operations 1, 2 and 4, and the field test data, respectively; their distribution characteristics are illustrated with histograms, KDE curves and probability plots in Fig. 2.
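Assuming the "Variance parameter" column in Table 1 lists variances (so the normal scale parameter is its square root), dataset 1 can be reproduced as follows (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Parameter settings from Table 1, one entry per maintenance operation type
means = [33.49, 75.35, 17.88, 44.45, 97.05, 25.29, 129.12, 11.21, 58.00]
variances = [15.07, 64.59, 8.33, 23.13, 160.39, 10.81, 942.91, 8.98, 39.06]
sizes = [132, 24, 187, 96, 12, 172, 4, 154, 53]

# Combine the 9 normal samples into one heterogeneous sample set
dataset1 = np.concatenate([rng.normal(m, np.sqrt(v), n)
                           for m, v, n in zip(means, variances, sizes)])
weights = np.array(sizes) / np.sum(sizes)   # the w_i in (1)
```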

Illustration of the distribution characteristics of datasets 1, 2 and 3; (a)-(c) Histograms and KDE curves for dataset 1, 2 and 3; (d) Probability plot for dataset 1.
As can be seen in Fig. 2, dataset 1 shows a clear positive skewness and fits well with a lognormal distribution. The other datasets are multimodal. Based on datasets 1 and 2, the natural prior distributions of the system and the three maintenance operations with N-IG forms can be determined using (9). All hyperparameter estimates are listed in Table 2.
Hyperparameter estimates for the natural prior distributions.
| Hyperparameter source | μ̂ | k̂ | r̂ | λ̂ |
|---|---|---|---|---|
| System | 3.22 | 1 | 13.06 | 1.97 |
| Maintenance operation 1 | 33.55 | 1 | 8.50 | 49.87 |
| Maintenance operation 2 | 74.62 | 1 | 4.21 | 75.17 |
| Maintenance operation 4 | 44.90 | 1 | 6.45 | 45.98 |
For a given time t = 25, the induced system prior can be derived based on the structural model Ms = W(θ) and the natural prior distributions of maintenance operations 1, 2, and 4. Subsequently, the updated priors of the three operations can be evaluated according to (14), which are also treated as informative priors in Section 3.4.2. In addition, the non-informative priors of the maintenance operations can also be determined according to the method described in Section 3.4.2, whose hyperparameter settings are listed in Table 3. The p.d.f. plots for the three maintenance operations with informative and non-informative priors are shown in Fig. 3, where the informative priors were determined using KDE.
Hyperparameter settings for the non-informative priors.
| Number of maintenance operation type | μ̂ | k̂ | r̂ | λ̂ |
|---|---|---|---|---|
| 1 | 32.32 | 0.8 | 1.3 | 41 |
| 2 | 72.42 | 0.4 | 1.2 | 50 |
| 4 | 43.10 | 0.2 | 0.1 | 8 |

The p.d.f. plots for the prior distributions: (a), (c) and (e) Informative priors for maintenance operations 1, 2 and 4, respectively; (b), (d) and (f) Non-informative priors for maintenance operations 1, 2 and 4, respectively.
As can be seen in Fig. 3, the regions of the heat maps in (b), (d) and (f) are more dispersed and flatter, which is a characteristic of non-informative priors. Using the methods described in Section 3.4.2, the mixture posteriors can be evaluated on the basis of non-informative and informative priors. In addition, the system maintainability at t = 25 can be estimated based on the methods described in Section 3.5. Similarly, by changing the value of t, the system maintainability at other times can also be estimated. For comparison purposes, system maintainability is also estimated using the following three methods:
- ➢
Method 1:
Dataset 1 (which is treated as the historical data of the system) is fitted with a lognormal distribution whose parameters are estimated using maximum likelihood estimation (MLE), and the values of its c.d.f. are treated as the estimates of system maintainability.
- ➢
Method 2:
The time samples of each maintenance operation in dataset 3 (which is treated as field test data) are fitted with a normal distribution whose parameters are estimated using MLE. According to (1), the weighted mean of all normal distribution c.d.f values is used as an estimate of the system maintainability.
- ➢
Method 3:
In the proposed method, the system-level prior information is ignored, i.e., dataset 1 is not used.
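Method 2 can be sketched as follows (illustrative function; the weights are taken proportional to the sample sizes, as in (1)):

```python
import numpy as np
from scipy import stats

def method2_maintainability(samples_by_op, t):
    """Fit a normal distribution to each operation's field samples by MLE
    (sample mean and standard deviation) and return the weighted mean of
    the c.d.f. values at time t, with weights proportional to sample sizes."""
    ns = np.array([len(s) for s in samples_by_op], dtype=float)
    w = ns / ns.sum()
    cdfs = np.array([stats.norm.cdf(t, loc=np.mean(s), scale=np.std(s))
                     for s in samples_by_op])
    return float(w @ cdfs)
```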
The maintainability estimates obtained with these three methods and with the proposed method are shown in Fig. 4.

Maintainability curves for four estimation methods based on the synthetic data.
As can be seen from Fig. 4, the estimates of Methods 2 and 3 are not far apart, as the same three normal components were used to generate datasets 2 and 3. Compared to Method 2, the estimates of Method 3 are slightly closer to those of Method 1 because Method 3 uses more time samples than Method 2 and should therefore provide more accurate estimates. The estimates of the proposed method are closer to those of Method 1 than those of Methods 2 and 3 because, unlike Method 3, it takes the system-level prior information into account. Method 1 uses 834 time samples; if the differences in test conditions between current and historical tests are ignored, its estimates can be assumed to be closest to the true values. In practice, however, there are always differences between tests, some of them considerable, and the proposed method accounts for this explicitly. It therefore provides the most reasonable results.
The data used in this case come from the performance tests (PT) and the OT of a specific type of wheeled armored vehicle (WAV) in 2023. There are several differences between the PT and the OT, such as those related to vehicle drivers, road conditions and individual task durations; moreover, the OT covered a much shorter period of time than the PT. In addition to several WAVs of this type, some accompanying test equipment was used to form a tactical system to complete the planned combat missions. For the purpose of the subsequent demonstrations and to meet the requirements of proprietary information protection, the actual data used in this case were modified accordingly. When analyzing the maintenance operations in the tests, it was found that some maintenance operations are similar and can be grouped into the same categories. Overall, the 28 maintenance operations that occurred during the OT are grouped into three categories, whose maintenance times are found to follow exponential, normal and lognormal distributions, respectively. In addition, prior information on the three maintenance operations, comprising 58 maintenance time samples, was screened from the PT. As in Section 4, the characteristics of the data are shown in Fig. 5, and the maintainability estimates of the three comparison methods and the proposed method are shown in Fig. 6.

Illustration for the distribution characteristics of practical data; (a)-(c) Histograms and KDE curves for system level data from PT, the three maintenance operation data from PT, and the data from OT; (d) Probability plot for OT data.

Maintainability curves for four estimation methods based on practical data.
As can be seen from Fig. 5 and Fig. 6, the characteristics of the datasets are similar to those in Section 4, and the result of comparing the maintainability estimates is also similar, although the practical maintenance operation times contain not only a normal distribution but also an exponential distribution and a lognormal distribution.
In some test phases of equipment, limited test time and conditions mean that the data available for the maintainability evaluation suffer not only from a small sample size but also from insufficient representativeness of the maintenance operations. As a result, maintenance time data that should follow a lognormal distribution can instead exhibit multiple peaks, which challenges maintainability estimation based on Bayesian information fusion. To overcome this challenge, two levels of prior information, the system level and the maintenance operation level, are integrated with the field test data via the BMM. Mixture priors are used to avoid prior-data conflicts, and the resulting Bayesian posterior distributions are used to estimate system maintainability. The main conclusions are as follows:
BMM can consider prior information at different levels, expanding the channels for data sources.
The mixture priors provide a balance between the non-informative prior and the informative prior, avoiding prior-data conflicts in the Bayesian framework.
Compared with other traditional methods, the proposed method can provide more reasonable estimates for maintainability.
In addition to maintainability, MTTR and maximum repair time are also important metrics for maintainability estimation. The next main task will be to investigate the estimation of other metrics.