Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

Included records by year
| Year | Included records | Unique aircraft | Unique make-model | Region codes |
|---|---|---|---|---|
| 2021 | 1,500 | 811 | 185 | 9 |
| 2022 | 1,500 | 831 | 164 | 9 |
| 2023 | 1,430 | 848 | 176 | 8 |
| Total | 4,430 | 2,204 | 290 | 10 |
Split counts and held-out station/fleet composition
Panel A. Temporal split (chronological 70/10/20 by difficulty date).
| Metric | All data | Train | Calibration | Test |
|---|---|---|---|---|
| Difficulty Date range | 01 Jan 2021 – 22 Dec 2023 | 01 Jan 2021 – 14 Jan 2023 | 15 Jan 2023 – 24 Apr 2023 | 24 Apr 2023 – 22 Dec 2023 |
| Events (unique SDRs) | 4,430 | 3,101 (70.0%) | 443 (10.0%) | 886 (20.0%) |
| Stations (Receiving Region Code), n | 10 | 10 | 8 | 8 |
| Fleets (make-model), n | 290 | 245 | 79 | 143 |
| Aircraft (RegistryNumber), n | 2,204 | 1,605 | 263 | 552 |
| Operators (Operator Designator), n | 97 | 90 | 34 | 41 |
| JASC classes (JASC Code), n | 306 | 279 | 95 | 160 |
Panel B. Station hold-out split (region generalization test).
| Metric | Train stations (non-held-out) | Held-out stations (S1-S2) |
|---|---|---|
| Stations (ReceivingRegionCode), n | 8 | 2 |
| Events (unique SDRs) | 1,766 | 2,664 (60.14%) |
| Fleets (make-model), n | 177 | 192 |
| Fleet overlap with training (n) | – | 79 |
| Fleets exclusive to held-out (OOD), n | – | 113 |
| Aircraft (RegistryNumber), n | 935 | 1,312 |
| Operators (Operator Designator), n | 64 | 49 |
| ATA/JASC classes (JASC Code), n | 249 | 226 |
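
The fleet rows in Panel B are set arithmetic over make-model identifiers: the 192 held-out fleets split into 79 shared with the training stations and 113 exclusive to the held-out stations. The following is a minimal sketch of that bookkeeping, assuming each record is a dict with the hypothetical keys `region` and `fleet`; the region codes and fleet names in the toy data are illustrative and not taken from the corpus.

```python
def station_holdout_summary(records, held_out_regions):
    """Split records by region and compare fleet sets across the two sides."""
    train = [r for r in records if r["region"] not in held_out_regions]
    held = [r for r in records if r["region"] in held_out_regions]
    train_fleets = {r["fleet"] for r in train}
    held_fleets = {r["fleet"] for r in held}
    return {
        "train_events": len(train),
        "held_out_events": len(held),
        # Fleets seen on both sides of the split.
        "fleet_overlap": len(held_fleets & train_fleets),
        # Fleets that appear only at held-out stations (OOD fleets).
        "fleets_exclusive_to_held_out": len(held_fleets - train_fleets),
    }

# Toy records (hypothetical values, for illustration only).
records = [
    {"region": "R1", "fleet": "A"},
    {"region": "R1", "fleet": "B"},
    {"region": "S1", "fleet": "B"},
    {"region": "S2", "fleet": "C"},
    {"region": "S2", "fleet": "C"},
]
summary = station_holdout_summary(records, held_out_regions={"S1", "S2"})
print(summary)

# Consistency check on the reported Panel B counts.
reported_overlap, reported_exclusive, reported_held_out_fleets = 79, 113, 192
print(reported_overlap + reported_exclusive == reported_held_out_fleets)
```
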
Missingness summary for the core modeling schema fields defined in Table 3, in the analytic corpus (n = 4,430)
| Field | Missing (n) | Missing (%) |
|---|---|---|
| Difficulty Date | 0 | 0 |
| Submission Date | 0 | 0 |
| Receiving Region Code | 0 | 0 |
| Registry Number | 12 | 0.271 |
| Aircraft Make | 3 | 0.068 |
| Aircraft Model | 5 | 0.113 |
| Discrepancy | 0 | 0 |
| JASC Code | 0 | 0 |
Overall dataset summary (SDRS exports, 2021-2023)
| Item | Value |
|---|---|
| Study period (Difficulty Date) | 01 Jan 2021 – 22 Dec 2023 |
| Records extracted (raw rows across files) | 4,929 |
| Records included after cleaning (unique SDRs with required fields) | 4,430 |
| Excluded (duplicates / missing critical fields / missing narrative) | 499 |
| Unique SDR identifiers (Operator Control Number) | 4,430 |
| Unique aircraft (RegistryNumber) | 2,204 |
| Unique operators (OperatorDesignator) | 97 |
| Location/station proxy available | Receiving Region Code |
| Unique region codes (Receiving Region Code) | 10 |
| Narrative availability (Discrepancy present) | 4,430 / 4,430 (100%) |
| Label availability (JASC Code present) | 4,430 / 4,430 (100%) |
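
The exclusion row names three criteria: duplicates, missing critical fields, and missing narrative. A cleaning pass of that kind might look like the sketch below. The column names follow the SDRS columns listed in the field-availability table; which fields count as critical, and the choice to keep the first occurrence of a duplicated control number, are assumptions made for illustration.

```python
# Fields treated as required here (an assumption, not the authors' stated list).
REQUIRED = ("OperatorControlNumber", "DifficultyDate", "Discrepancy", "JASCCode")

def clean(rows):
    """Drop rows with blank required fields, then de-duplicate by control number."""
    seen, kept = set(), []
    for row in rows:
        if any(not str(row.get(field) or "").strip() for field in REQUIRED):
            continue  # missing critical field or missing narrative
        key = row["OperatorControlNumber"]
        if key in seen:
            continue  # duplicate SDR; the first occurrence is kept
        seen.add(key)
        kept.append(row)
    return kept

# Toy rows (hypothetical values, for illustration only).
rows = [
    {"OperatorControlNumber": "X1", "DifficultyDate": "2021-01-01",
     "Discrepancy": "Exit light inoperative", "JASCCode": "3350"},
    {"OperatorControlNumber": "X1", "DifficultyDate": "2021-01-01",
     "Discrepancy": "Exit light inoperative", "JASCCode": "3350"},
    {"OperatorControlNumber": "X2", "DifficultyDate": "2021-02-01",
     "Discrepancy": "   ", "JASCCode": "3350"},
    {"OperatorControlNumber": "X3", "DifficultyDate": "2021-03-01",
     "Discrepancy": "Hydraulic leak", "JASCCode": None},
]
included = clean(rows)
print(len(rows), len(included), len(rows) - len(included))

# The reported counts are internally consistent: 4,929 raw - 499 excluded = 4,430.
print(4929 - 499 == 4430)
```
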
Robustness under dataset shift on the station-comparable temporal test subset: top-1 accuracy with approximate 95% confidence intervals, and descriptive F1 metrics for non-held-out stations versus station-held-out regions
| Split | n | Top-1 accuracy (95% CI) | Macro-F1 | Weighted-F1 |
|---|---|---|---|---|
| Time-held-out (temporal test, non-held-out stations) | 337 | 0.504 (0.451-0.557) | 0.263 | 0.468 |
| Station-held-out (regions S1 and S2) | 473 | 0.452 (0.408-0.497) | 0.185 | 0.421 |
| Temporal test (all stations, label-in-distribution; station-comparable subset) | 810 | 0.474 (0.440-0.508) | 0.209 | 0.437 |
| Absolute difference (non-held-out - station-held-out) | – | 0.052 (-0.018 to 0.122) | 0.078 | 0.047 |
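
The caption describes the intervals only as approximate. One common choice is the normal-approximation (Wald) interval, p ± 1.96·sqrt(p(1 − p)/n), with the difference interval obtained by summing the two variances. The sketch below assumes that method; it is not confirmed by the source, although it agrees with the reported bounds to within rounding of the three-decimal inputs.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal

def wald_ci(p, n, z=Z95):
    """Normal-approximation interval for a proportion p observed on n cases."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def diff_ci(p1, n1, p2, n2, z=Z95):
    """Interval for p1 - p2, treating the two groups as independent."""
    diff = p1 - p2
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - half, diff + half

# Inputs are the rounded accuracies and group sizes from the table above.
time_ci = wald_ci(0.504, 337)
station_ci = wald_ci(0.452, 473)
diff, diff_lo, diff_hi = diff_ci(0.504, 337, 0.452, 473)

print("time-held-out:    %.3f to %.3f" % time_ci)
print("station-held-out: %.3f to %.3f" % station_ci)
print("difference: %.3f (%.3f to %.3f)" % (diff, diff_lo, diff_hi))
```

Under this reading the difference interval spans zero, so the 0.052 gap between the two groups is not distinguishable from no gap at the 95% level.
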
Core data fields, operational meaning, and modeling role in the recommended SDRS schema
| Field | Example / Format | Role in analysis | Notes |
|---|---|---|---|
| Aircraft/Tail (hashed) | a3f9… | Entity linkage | Enables repeat-defect and within-aircraft history without revealing identity |
| Timestamp | ISO datetime; optional bins (e.g., day/week/shift) | Temporal ordering, shift tests | Binning used when required for privacy or stability |
| Station | ICAO/IATA code | Site effects, station-shift evaluation | Supports cross-station robustness testing |
| Write-up text | Free text | Primary NLP input | Defect narrative; cleaned/normalized for abbreviations where feasible |
| Action taken text (if available) | Free text | Evidence/outcome context | Useful for retrieval, resolution summarization, and traceability |
| JASC code (supervised label) | ATA chapter/subchapter | Supervised label | Target for triage classification when present and reliable |
| Deferral indicators (if present) | MEL/CDL flag; deferral code | Operational constraint marker | Captures deferral behavior relevant to dispatch and risk |
| Closure status/time (if present) | Open/closed; minutes/hours | Outcome/efficiency metric | Supports time-to-close variability and operational impact analyses |
Example of the proposed decision-support output for a de-identified temporal test case
| Current write-up | Top-3 JASC candidates | Retrieved precedents | Conformal output | Recommended workflow action | Final human disposition |
|---|---|---|---|---|---|
| Emergency exit light inoperative at R1 door; removed and replaced emergency exit battery pack M1675/STA 322R with a serviceable unit in accordance with B737-800 AMM 33-51-06; test satisfactory. | 1. JASC 3350 | Case 1: R1 door emergency exit light inoperative; battery pack replaced; operational check satisfactory. | Prediction-set size = 72; status = review required. | Review required – the shortlist and retrieved precedents are coherent, but predictive uncertainty remains too high for silent autosuggestion. | Human reviewer confirms JASC 3350 and records battery-pack replacement with satisfactory operational test. |
Dataset scale, JASC label imbalance, and split composition (SDRS, 2021-2023)
| Quantity | All data | Train | Calibration | Test |
|---|---|---|---|---|
| Study period (Difficulty Date) | 01 Jan 2021 – 22 Dec 2023 | 01 Jan 2021 – 14 Jan 2023 | 15 Jan 2023 – 24 Apr 2023 | 24 Apr 2023 – 22 Dec 2023 |
| Events (unique SDRs) | 4,430 | 3,101 (70.0%) | 443 (10.0%) | 886 (20.0%) |
| Unique aircraft (Registry Number) | 2,204 | 1,605 | 263 | 552 |
| Unique operators (Operator Designator) | 97 | 90 | 34 | 41 |
| Unique fleets (make-model) (both fields present) | 290 | 245 | 79 | 143 |
| Stations (Receiving Region Code) | 10 | 10 | 8 | 8 |
| Distinct JASC classes (JASC Code) | 306 | 279 | 95 | 160 |
| Majority class count | 582 | 421 | 53 | 108 |
| Majority class share (%) | 13.14% | 13.58% | 11.96% | 12.19% |
| Top-10 classes cumulative share (%) | 43.68% | 43.28% | 53.72% | 44.58% |
| Minority class count (min frequency) | 1 | 1 | 1 | 1 |
| Majority/minority ratio (max/min) | 582:1 | 421:1 | 53:1 | 108:1 |
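
The split sizes and imbalance rows above can be reproduced from a list of dated, labeled records. The sketch below assumes a count-based chronological cut after sorting by difficulty date and uses hypothetical `date` and `label` keys; whether the authors cut by count or by calendar date is not stated. A count-based cut can place events from the same date on both sides of a boundary, which is one possible reading of 24 Apr 2023 appearing in both the calibration and test ranges.

```python
from collections import Counter
from datetime import date, timedelta

def chronological_split(records, train_frac=0.70, cal_frac=0.10):
    """Sort by date, then cut into train / calibration / test by record count."""
    ordered = sorted(records, key=lambda r: r["date"])
    n = len(ordered)
    n_train = round(train_frac * n)
    n_cal = round(cal_frac * n)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_cal],
            ordered[n_train + n_cal:])

def imbalance(labels):
    """Class-imbalance summary matching the rows of the table above."""
    counts = Counter(labels).most_common()  # sorted by count, descending
    total = sum(c for _, c in counts)
    return {
        "classes": len(counts),
        "majority_share": counts[0][1] / total,
        "top10_share": sum(c for _, c in counts[:10]) / total,
        "max_min_ratio": counts[0][1] / counts[-1][1],
    }

# Toy data: 20 records on consecutive days (hypothetical labels).
labels = ["A"] * 10 + ["B"] * 6 + ["C"] * 3 + ["D"]
records = [{"date": date(2021, 1, 1) + timedelta(days=i), "label": lab}
           for i, lab in enumerate(labels)]
train, cal, test = chronological_split(list(reversed(records)))
stats = imbalance(labels)
print(len(train), len(cal), len(test))
print(stats)

# With n = 4,430 the same rounding gives the reported 3,101 / 443 / 886 split.
sizes = (round(0.70 * 4430), round(0.10 * 4430))
print(sizes, 4430 - sum(sizes))
```
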
Ablation summary showing the incremental contribution of each pipeline module on the temporal test set
| Configuration | Primary metric | Value |
|---|---|---|
| Classifier only | Top-1 accuracy | 0.509 |
| Classifier + shortlist | Top-3 accuracy | 0.683 |
| Retrieval evidence | Recall@10 | 0.688 |
| Conformal + abstention | Coverage@90% | 0.919 |
Prediction performance on the temporal test set after label-based in-distribution filtering (labels observed in training)
| Model | Evaluated temporal test subset | Top-1 accuracy | Top-3 accuracy | Macro-F1 | Weighted-F1 | Excluded test cases with OOD labels (n) |
|---|---|---|---|---|---|---|
| TF-IDF + Linear SVM (baseline) | 848/886 (95.7%) | 0.5094 | 0.6828 | 0.2386 | 0.4768 | 38 |
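
The evaluated subset excludes test cases whose label never appears in training (38 of 886, leaving 848), and top-k accuracy counts a case as correct when the true label is among the k highest-ranked candidates. A minimal sketch of both steps follows, assuming each test case carries a hypothetical `label` and a `ranked` list of candidate codes ordered from most to least likely; the codes in the toy data are illustrative.

```python
def filter_in_distribution(test_cases, train_labels):
    """Keep test cases whose label was observed in training; count the rest."""
    known = set(train_labels)
    kept = [c for c in test_cases if c["label"] in known]
    return kept, len(test_cases) - len(kept)

def top_k_accuracy(cases, k):
    """Share of cases whose true label is among the first k ranked candidates."""
    hits = sum(c["label"] in c["ranked"][:k] for c in cases)
    return hits / len(cases)

# Toy data (hypothetical codes and rankings, for illustration only).
train_labels = ["3350", "2100", "7200"]
test_cases = [
    {"label": "3350", "ranked": ["3350", "2100", "7200"]},  # top-1 hit
    {"label": "2100", "ranked": ["7200", "3350", "2100"]},  # top-3 hit only
    {"label": "7200", "ranked": ["3350", "2100", "7200"]},  # top-3 hit only
    {"label": "5300", "ranked": ["3350", "2100", "7200"]},  # label unseen in training
]
evaluated, n_excluded = filter_in_distribution(test_cases, train_labels)
top1 = top_k_accuracy(evaluated, k=1)
top3 = top_k_accuracy(evaluated, k=3)
print(len(evaluated), n_excluded, round(top1, 3), round(top3, 3))

# Reported counts: 886 test cases - 38 with OOD labels = 848 evaluated.
print(886 - 38 == 848)
```

Because the excluded cases cannot be predicted correctly by a closed-set classifier, accuracy on the full 886 cases would be lower than the filtered figures.
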
Fleet composition (top 10 make-model by count)
| Rank | Fleet (Aircraft Make + Aircraft Model) | Count |
|---|---|---|
| 1 | BOEING 7377H4 | 226 |
| 2 | AIRBUS A320232 | 209 |
| 3 | DOUG MD11F | 166 |
| 4 | EMB ERJ170200LR | 151 |
| 5 | AIRBUS A321231 | 134 |
| 6 | CNDAIR CL6002D24 | 128 |
| 7 | CNDAIR CL6002C10 | 124 |
| 8 | BOEING 737823 | 120 |
| 9 | BOEING 737890 | 109 |
| 10 | BOEING 737 | 106 |
Field/label availability (after cleaning)
| Field | SDRS column | Availability in the included set |
|---|---|---|
| Unique control # (dedup/audit trail) | OperatorControlNumber | 100% |
| Event/occurrence date | DifficultyDate | 100% |
| Report/submission date | SubmissionDate | 100% |
| Aircraft ID | RegistryNumber | 99.73% (12 missing; used for unique count) |
| Fleet composition | AircraftMake, AircraftModel | 99.93% (make), 99.89% (model) |
| Location/station proxy | ReceivingRegionCode | 100% |
| Defect narrative (NLP input) | Discrepancy | 100% |
| System label (JASC code) | JASCCode | 100% |
Conformal prediction set efficiency on the temporal test set: empirical coverage and prediction-set size summary across target coverage levels (1 − α)
| Target coverage | α | q̂ | Empirical coverage | Average set size | Median set size | 25th percentile set size | 75th percentile set size |
|---|---|---|---|---|---|---|---|
| 0.80 | 0.20 | 0.993285 | 0.828033 | 9.016490 | 8 | 4 | 13 |
| 0.85 | 0.15 | 0.995652 | 0.870436 | 15.93168 | 12 | 6 | 24 |
| 0.90 | 0.10 | 0.997414 | 0.923439 | 40.87986 | 23 | 10 | 57 |
| 0.92 | 0.08 | 0.997839 | 0.943463 | 60.44405 | 30 | 13 | 85 |
| 0.95 | 0.05 | 0.998647 | 0.971731 | 122.2839 | 83 | 22 | 245 |
| 0.98 | 0.02 | 0.999292 | 0.990577 | 196.5524 | 266 | 91 | 277 |
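
For readers unfamiliar with how a threshold q̂ turns into prediction sets, the sketch below shows one standard split-conformal recipe. It assumes the nonconformity score is 1 minus the probability assigned to the true class, computed on the calibration split; the source does not state which score was used, and a linear SVM would need calibrated probabilities for this to apply. All names and toy values are hypothetical.

```python
import math
from statistics import mean, median

def conformal_quantile(cal_scores, alpha):
    """Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: every label is included
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, q_hat):
    """All labels whose score 1 - p does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= q_hat]

# Toy calibration scores (1 - probability of the true class).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
q_hat = conformal_quantile(cal_scores, alpha=0.25)

# Toy test cases: class probabilities plus the true label.
test_cases = [
    ({"3350": 0.70, "3340": 0.25, "2400": 0.05}, "3350"),
    ({"3350": 0.40, "3340": 0.35, "2400": 0.25}, "2400"),
]
sets = [prediction_set(probs, q_hat) for probs, _ in test_cases]
sizes = [len(s) for s in sets]
coverage = mean(true in s for s, (_, true) in zip(sets, test_cases))
print(q_hat, sets, mean(sizes), median(sizes), coverage)
```

Raising the target coverage pushes q̂ toward 1, which admits more low-probability labels into each set. That is the trade-off visible in the table, where average set size grows from about 9 at 80% target coverage to about 197 at 98%.
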
Dataset-shift diagnostics for robustness evaluation: narrative length and similarity-to-training for time-held-out versus station-held-out groups
| Group | n | Median tokens in narrative | Median max similarity to training |
|---|---|---|---|
| Time-held-out (non-held-out stations) | 337 | 36 | 0.325201 |
| Station-held-out (S1 and S2) | 473 | 29 | 0.275161 |