Baseline bedside scores show moderate discrimination in heart failure with preserved ejection fraction (HFpEF) and vary by setting and endpoint.
Dynamic risk states outperform static risk labels. Discharge lung ultrasound B-lines quantify residual congestion and predict early post-discharge events.
Low-cost immuno-nutritional indices and patient-reported health add independent prognostic information, especially when tracked serially.
Diagnostic frameworks can stratify risk when complete components are available, but transportability and calibration remain constraints.
Heart failure with preserved ejection fraction (HFpEF) represents at least half of all heart failure cases and is increasingly regarded as a major clinical and public health concern [1]. The syndrome extends beyond the preserved left ventricular ejection fraction to encompass interacting myocardial, vascular, and extracardiac pro-inflammatory processes that yield a notably heterogeneous clinical phenotype [2]. Patients frequently present with multiple comorbidities, such as hypertension, diabetes mellitus, obesity, or chronic kidney disease, that complicate both diagnosis and management [3]. Despite its high prevalence, HFpEF remains an area of significant uncertainty, with high rates of morbidity and mortality that continue to challenge routine care [4].
Accurate prognostic assessment is likely to support a personalized management approach [5]. High-dimensional prognostic models, including machine-learning approaches, can achieve excellent discrimination in controlled datasets, but their bedside use is limited by data requirements [6]. By contrast, clinical risk scores rely on information clinicians already collect, so they are fast, transparent, cost-efficient, and reproducible.
This narrative review was developed to address a persistent gap in HFpEF care. Numerous prognostic scores have been published, but their clinical meaning is often difficult to translate across settings because the evidence base is heterogeneous: cohorts differ in phenotype mix and disease severity, and endpoints are variably defined.
We therefore focused on instruments that clinicians can realistically compute in routine practice. Scores are treated as complementary perspectives on risk, not as candidates for a single ranking. We also analyze diagnostic frameworks that are commonly used to inform prognosis, even though they were developed for diagnostic evaluation. This pragmatic repurposing appears increasingly supported by observational data [7][8], but its performance remains sensitive to context and component availability. By comparing areas of agreement, contradiction, and uncertainty, we aim to clarify what this literature implies for clinical decisions.
This integrative narrative review aims to summarize clinician-usable prognostic scoring systems evaluated in HFpEF across care settings, highlight head-to-head comparisons where available and report key performance metrics (e.g., C-statistic/AUC and risk gradients), and synthesize practical implications for bedside risk stratification.
This narrative review assesses prognostic risk scores applied to patients with heart failure with preserved ejection fraction (HFpEF). We focused on instruments intended for bedside use rather than complex prognostic models or machine-learning approaches.
A focused search was performed in PubMed for human studies in adults (≥18 years), restricted to English-language publications from the last 10 years. We used a score-oriented strategy combining HFpEF terminology with both generic score terms and commonly cited score names. We excluded studies that were non-HFpEF-specific, not bedside-applicable, or based on biomarker-only or high-dimensional prediction models.
From each included study, we extracted the study setting and population, the HFpEF definition used, the inputs and scoring method for the score, the endpoints assessed, the follow-up duration when available, and the reported measures of prognostic performance. Endpoints of interest included all-cause death, cardiovascular death, heart-failure hospitalization, and prespecified composite cardiovascular outcomes.
Because the included studies differed in cohort characteristics, endpoint definitions, and follow-up, we did not pool estimates. Instead, we used a structured narrative synthesis to compare instruments across settings and endpoints. Discrimination was captured using AUC or concordance statistics (C-statistic/c-index). Strength of association was summarized using HRs or ORs, retaining the original scaling. Calibration was recorded when reported.
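To make the discrimination metric concrete, the following is a minimal sketch of how a concordance statistic (Harrell's C) can be computed for right-censored follow-up data. The function name and the five-patient dataset are hypothetical illustrations, not data from any included study, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times:       follow-up time per patient
    events:      1 if the endpoint occurred, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an event.
        # Pairs with identical times are skipped (simplifying assumption).
        if times[a] == times[b] or events[a] != 1:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event carried the higher score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: 5 patients, follow-up in months.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
scores = [9, 4, 6, 5, 1]
print(harrell_c(times, events, scores))  # 6 of 8 comparable pairs concordant
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is why values around 0.64 to 0.74 are described in this review as moderate discrimination.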
Findings were interpreted by score domain, study design (derivation vs validation), cohort setting, endpoint/time horizon, and feasibility.
To support methodological transparency and narrative quality, reporting was guided by the Scale for the Assessment of Narrative Review Articles (SANRA).
ChatGPT (OpenAI) was used as a language-support tool for English translation, grammar correction, and refinement of academic phrasing. Study selection, data extraction, and interpretation were conducted by the authors, who assume full responsibility for the accuracy and scientific integrity of the final manuscript.
Before detailing individual instruments, Table 1 maps the clinician-usable instruments included in this review. It summarizes each tool’s conceptual target and core components, alongside the clinical context in which it has been evaluated in HFpEF, such as acute admission triage, early post-discharge vulnerability, longitudinal ambulatory follow-up, or phenotype-specific pathways. The table also flags an important distinction for interpretation: several widely used scores were derived in mixed heart-failure cohorts and only later tested in HFpEF subsets, whereas other tools were developed or validated more directly in HFpEF-oriented populations. This difference has implications for external validity when results are applied across settings.
Table 1. Main characteristics of prognostic scores
| Score (acronym) | Expanded name | Core components (concise) | Conceptual domain | Typical setting / use |
|---|---|---|---|---|
| ARIC AD-HFpEF [11] | ARIC Acute Decompensated HFpEF score | Age; systolic BP; BUN; sodium; hypoxia; heart rate; natriuretic peptides; anemia; underweight | Acute clinical burden / triage | Acute HFpEF admission; 28-day/1-year risk estimate |
| C2HEST [20] | Coronary artery disease; COPD; Hypertension; Elderly; Systolic HF; Thyroid disease | CAD; COPD; hypertension; age ≥75; prior systolic HF; thyroid disease | Comorbidity / systemic burden | Ambulatory HFpEF (TOPCAT) for background risk enrichment |
| CONUT [21][23][26][27][28] | Controlling Nutritional Status | Albumin; total cholesterol; lymphocyte count | Nutrition–inflammation | Older / hospitalized or post-discharge HFpEF |
| GNRI [22][24][27][28] | Geriatric Nutritional Risk Index | Albumin; weight-to-ideal-weight term | Nutrition–inflammation / frailty | Older/frail; hospitalized or ambulatory HFpEF |
| GWTG-HF [10] | Get With The Guidelines–Heart Failure risk score | Age; SBP; BUN; sodium; heart rate; COPD; race | Global clinical risk (mixed-EF derivation) | In-hospital acute HF; post-discharge risk stratification |
| H2FPEF [13][14][15][16][17] | Heavy; Hypertensive; Atrial fibrillation; Pulmonary hypertension; Elder; Filling pressure | BMI ≥30; ≥2 antihypertensives; AF; PASP/PH; age >60; E/e′ | Diagnostic framework repurposed for prognosis | Ambulatory/inpatient HFpEF; useful when stress testing unavailable |
| HFA-PEFF [14][15][17] | Heart Failure Association Pre-test assessment, Echocardiography & NP, Functional testing, Final aetiology | Echo + NP domains; functional testing (step-3) when available | Diagnostic framework repurposed for prognosis | Suspected/confirmed HFpEF; prognostic value highest with complete work-up |
| KCCQ [33] | Kansas City Cardiomyopathy Questionnaire (incl. KCCQ-12) | Patient-reported symptoms, function, quality of life | Patient-reported health status | Ambulatory/chronic HF (incl. HFpEF subsets) for outcome prediction and communication |
| LUS B-lines [19] | Lung ultrasound B-line count | B-line burden at discharge (residual pulmonary congestion) | Congestion physiology | Discharge risk stratification after acute HF (including HFpEF) |
| MAGGIC [9][17] | Meta-Analysis Global Group in Chronic HF risk score | Demographics; clinical status; comorbidities; therapies (standard MAGGIC variables) | Global clinical risk (mixed-EF derivation) | Chronic HF; validated in HFpEF subsets; longitudinal risk |
| MEDIA [18] | MEDIA echocardiographic score | PASP >40; IVC collapsibility <50%; average E/e′ >9; lateral s′ <7 | Echocardiography-only hemodynamic burden | Acute HFpEF and stable outpatient HFpEF |
| mGPS [29] | Modified Glasgow Prognostic Score | C-reactive protein; albumin | Inflammation–nutrition | Ambulatory HFpEF; 12-month outcomes |
| NRS-2002 [30] | Nutritional Risk Screening 2002 | BMI; weight loss; intake reduction; disease severity | Bedside nutrition screening | Acute HFpEF admissions; in-hospital risk (sex-specific effects) |
| SHFM [9] | Seattle Heart Failure Model | Clinical variables; labs; therapies; device therapy (model inputs) | Global clinical risk (mixed-EF derivation) | Chronic HF; comparator in HFpEF validation work |
| TRI [12] | TIMI Risk Index | Heart rate × (age/10)^2 ÷ systolic BP | Ultra-parsimonious acute triage | Acute HFpEF admission; in-hospital mortality triage |
| HALO [31] | HFpEF survivAL hOspitalization (HALO) score | Clinical severity; echocardiographic burden; natriuretic peptide load; prior HF hospitalization count | Multimodal HFpEF event risk (survival + recurrent admissions) | Recently hospitalized HFpEF; post-discharge risk stratification for survival and future admission burden |
| WATCH-DM [32] | WATCH-DM risk score | Age; BMI; BP; fasting glucose; creatinine; HDL-C; QRS; prior MI/CABG | Phenotype-specific (diabetes) | T2DM + HFpEF at discharge; ~1-year mortality |
Abbreviations: AF - atrial fibrillation; BMI - body mass index; BNP - B-type natriuretic peptide; BUN - blood urea nitrogen; CONUT - Controlling Nutritional Status; CRP - C-reactive protein; GNRI - Geriatric Nutritional Risk Index; HALO - HFpEF survivAL hOspitalization; HFpEF - heart failure with preserved ejection fraction; IVC - inferior vena cava; KCCQ - Kansas City Cardiomyopathy Questionnaire; LUS - lung ultrasound; LVEF - left ventricular ejection fraction; mGPS - modified Glasgow Prognostic Score; MLHFQ - Minnesota Living with Heart Failure Questionnaire; NT-proBNP - N-terminal pro–B-type natriuretic peptide; PASP - pulmonary artery systolic pressure; PNI - Prognostic Nutritional Index; TRI - TIMI Risk Index.
Because HFpEF case definitions varied across included cohorts (e.g., LVEF thresholds, use of natriuretic peptides, and diagnostic frameworks), these definitions are summarized in Supplementary Table 1 to aid interpretation of cross-study comparisons.
Supplementary Tables 2 and 3 summarize outcome-oriented performance (mortality and hospitalization, respectively), whereas Table 2 synthesizes comparative applicability across clinical settings and phenotypes, highlighting when a tool is most informative and which inputs constrain bedside feasibility.
Table 2. Practical applicability of clinician-usable HFpEF prognostic tools across settings and phenotypes
| Clinical context | Most feasible instruments | What they primarily capture | When they tend to add value | Key limitations / cautions |
|---|---|---|---|---|
| Acute admission triage (HFpEF) | ARIC AD-HFpEF; GWTG-HF; TRI [10][11][12] | Short-term mortality risk using readily available vitals and labs | Early triage and discharge planning; prioritizing follow-up intensity | GWTG-HF derived in mixed-EF populations; TRI trades physiologic specificity for speed |
| Discharge vulnerability after acute HF | LUS B-lines (± BNP/echo) [19] | Residual pulmonary congestion | Near-term readmission/death risk; decongestion targets before discharge | Requires operator familiarity; cut-points may vary across protocols |
| Ambulatory / longitudinal risk | MAGGIC (± BNP); KCCQ [9][33] | Global clinical burden and patient-reported status | Longitudinal counselling and shared decisions; complements event-focused tools | MAGGIC derived in mixed-EF cohorts; KCCQ evidence often from mixed HF samples |
| Echocardiography-only hemodynamic burden | MEDIA; H2FPEF (resting echo) [16][18] | Filling pressure, pulmonary pressure, venous congestion, longitudinal function | When natriuretic peptides are equivocal or stress testing unavailable | Generalizability depends on echo quality and cohort characteristics (e.g., AF exclusion in MEDIA ambulatory validation) |
| Older / frail or recently hospitalized HFpEF | GNRI; CONUT; mGPS; NRS-2002 [21][22][23][24][29][30] | Frailty-like vulnerability, malnutrition, inflammation | Risk refinement beyond BMI; identifies patients who may benefit from nutritional evaluation and closer follow-up | Definitions and thresholds differ; readmission associations may be less consistent than mortality |
| Phenotype-specific pathways | WATCH-DM (T2DM); C2HEST (comorbidity burden); nutrition indices for rhythm pathway [20][25][32] | Diabetes burden, systemic comorbidity, vulnerability affecting rhythm and recovery | Discharge decisions and surveillance tailored to phenotype | Derived/validated in specific cohorts; may require local adaptation to preserve performance |
| Multimodal HFpEF event risk | HALO; prior HF hospitalization count [31] | Structural/hemodynamic load plus prior events | Higher-risk recurrent admission phenotype | Data requirements (echo + biomarkers + history); external validation still limited |
Abbreviations: AF - atrial fibrillation; HFH - heart failure hospitalization; LUS - lung ultrasound; NP - natriuretic peptides.
MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure) and GWTG-HF (Get With The Guidelines–Heart Failure) remain common entry points because they can be calculated quickly from variables that are usually available early in care. Both tools were derived in broader, mixed–ejection fraction heart-failure populations, so most HFpEF evidence reflects subgroup validation, not HFpEF-only derivation. This may partly explain why discrimination in HFpEF tends to fall in the moderate range, with calibration and risk separation varying across settings and endpoints (see Supplementary Table 2 and Supplementary Table 3) [9][10]. MAGGIC appears to retain prognostic value in HFpEF, particularly once natriuretic peptide information is incorporated, although the incremental benefit may depend on cohort severity mix and endpoint definition (all-cause vs HF-specific hospitalization; first vs recurrent events) [9]. Where head-to-head data are available, MAGGIC performs similarly to other established global models in HFpEF. In the Rich et al. [9] validation cohort, the C-statistic for all-cause death was 0.74 (95% CI 0.68–0.80) for MAGGIC versus 0.72 (95% CI 0.67–0.78) for SHFM, with BNP alone performing comparably (0.76; 95% CI 0.70–0.81).
GWTG-HF, developed for in-hospital risk stratification in acute heart failure, appears to carry forward clinically meaningful stratification after discharge in HFpEF, but reported performance varies with acuity at index admission, comorbidity burden, and local discharge practices [10].
For acute triage, the ARIC AD-HFpEF (Atherosclerosis Risk in Communities Acute Decompensated HFpEF) score is conceptually distinct because it was designed around HFpEF admissions and short-to-intermediate follow-up. It formalizes what many clinicians already use implicitly at presentation, such as blood pressure, renal indices, oxygenation, natriuretic peptides, and markers of frailty [11]. The TIMI Risk Index (TRI) pushes simplicity even further. It uses only heart rate, age, and systolic blood pressure and still appears to identify patients at higher in-hospital mortality risk in acute HFpEF, making it attractive when time and data are limited [12]. The trade-off is physiologic specificity: TRI is best suited for front-end triage, not longitudinal risk assessment (Supplementary Table 2) [12].
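Because TRI is a single arithmetic expression, it can be illustrated directly. The sketch below implements the formula as given in Table 1 (heart rate × (age/10)² ÷ systolic blood pressure); the function name and the example patient are hypothetical, and no risk threshold is implied.

```python
def timi_risk_index(heart_rate_bpm, age_years, systolic_bp_mmhg):
    """TIMI Risk Index: heart rate x (age / 10)^2 / systolic blood pressure."""
    if systolic_bp_mmhg <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate_bpm * (age_years / 10) ** 2 / systolic_bp_mmhg


# Hypothetical patient: 80 years old, heart rate 90 bpm, systolic BP 120 mmHg.
# 90 x (80 / 10)^2 / 120 = 90 x 64 / 120 = 48.0
print(timi_risk_index(90, 80, 120))
```

The quadratic age term means that age dominates the index in older HFpEF populations, which is one reason the result is better read as a triage signal than as a physiologic profile.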
Diagnostic frameworks in HFpEF are increasingly used to estimate prognosis, although they were designed for case identification rather than outcome prediction. H2FPEF and HFA-PEFF are primarily diagnostic decision-support tools. Therefore, their prognostic use in established HFpEF should be considered off-label and interpreted with caution. This repurposing is still understandable. These scores capture pathophysiology linked to HFpEF events and can be calculated from routine clinical data collected during evaluation. Across ambulatory and inpatient cohorts, H2FPEF (Heavy, Hypertensive, Atrial Fibrillation, Pulmonary Hypertension, Elder, Filling pressure) generally stratifies risk of death and heart-failure hospitalization, and higher scores align with worse physiology, including exercise limitation and adverse hemodynamics [13][14]. However, cohorts differ in atrial fibrillation prevalence, obesity distribution, and diagnostic work-up. As a result, score distributions and observed risk gradients may shift even when the same cut-points are applied.
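For orientation, the H2FPEF components listed in Table 1 can be combined into a point total. The sketch below is illustrative only. The point weights and the PASP and E/e′ thresholds are the values commonly cited from the original derivation and are assumptions here, since this review does not restate them; they should be checked against the source publication before any use. The function and parameter names are hypothetical.

```python
def h2fpef_points(bmi, n_antihypertensives, atrial_fibrillation,
                  pasp_mmhg, age_years, e_over_e_prime):
    """Categorical H2FPEF total (0-9) under assumed weights and thresholds."""
    points = 0
    points += 2 if bmi >= 30 else 0                 # Heavy (Table 1: BMI >=30); weight assumed
    points += 1 if n_antihypertensives >= 2 else 0  # Hypertensive (>=2 drugs)
    points += 3 if atrial_fibrillation else 0       # atrial Fibrillation; weight assumed
    points += 1 if pasp_mmhg > 35 else 0            # Pulmonary hypertension; threshold assumed
    points += 1 if age_years > 60 else 0            # Elder (age >60)
    points += 1 if e_over_e_prime > 9 else 0        # Filling pressure; threshold assumed
    return points


# Hypothetical patient with every component present.
print(h2fpef_points(bmi=33, n_antihypertensives=2, atrial_fibrillation=True,
                    pasp_mmhg=40, age_years=72, e_over_e_prime=11))
```

Under these assumed weights, atrial fibrillation and obesity contribute more than half of the maximum total, which illustrates why cohorts with different AF prevalence or obesity distribution can show shifted score distributions at the same cut-points.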
Head-to-head comparisons often favor H2FPEF over the HFA-PEFF algorithm when step-3 (stress) testing is unavailable, likely because H2FPEF can be computed from routine clinical data and resting echocardiography alone [14][15]. HFA-PEFF comprises Heart Failure Association-led pre-test assessment, echocardiography and natriuretic peptides, functional testing, and final etiologic classification. As summarized in Table 2, the apparent prognostic separation of these diagnostic frameworks is strongly conditioned by feasibility. Still, HFA-PEFF has prognostic value when natriuretic peptide data and stress-echocardiography components are obtained. Practical barriers and variable discrimination across settings may limit its use as a stand-alone tool [15]. In a hospitalized HFpEF cohort, the H2FPEF score correlated inversely with peak left-atrial strain and identified a higher risk of death or HF hospitalization, suggesting that H2FPEF continues to track atrial remodeling and carries prognostic information when stress testing is not feasible [16]. Przewłocka-Kośmala et al. [17] studied patients with exertional dyspnea and suspected HFpEF. In that head-to-head comparison, discrimination was similar for MAGGIC, H2FPEF, and step-2 HFA-PEFF (Harrell’s C = 0.637, 0.644, and 0.638, respectively), whereas incorporating step-3 exercise data improved discrimination (Harrell’s C = 0.715) [17]. This finding is setting-dependent (suspected HFpEF physiology cohort) and constrained by eligibility, because only patients in sinus rhythm were included (see Supplementary Table 2 and Supplementary Table 3).
Echo-anchored tools are often appealing in HFpEF because they target filling pressures, pulmonary hypertension, and systemic congestion, which are processes that are tightly linked to decompensation and subsequent hospital admission. The MEDIA score, based on pulmonary artery systolic pressure, inferior vena cava collapsibility, E/e′, and lateral s′, offers a compact way to translate a multiparametric echocardiogram into a structured risk estimate [18]. Its reported performance differs by setting. In acute HFpEF, higher scores are more strongly associated with short-term mortality. In stable outpatients, the signal shifts toward future heart-failure hospitalization (Supplementary Table 2 and Supplementary Table 3) [18]. This pattern suggests setting-dependent prognostic meaning.
Because HFpEF events are closely linked to congestion, markers of residual hemodynamic burden at discharge carry prognostic weight. Lung ultrasound (LUS) B-lines quantify pulmonary congestion. In acute HFpEF, a high B-line burden at discharge predicts 6-month rehospitalization or death and, in several analyses, outperforms admission measures [19]. These results come primarily from early post-discharge windows and discharge-based protocols, so they should not be compared directly with admission-based scores or longer follow-up cohorts. Combining discharge LUS with BNP (B-type natriuretic peptide) and diastolic parameters further improves classification [19]. A simple B-line count (dichotomized at a Youden-derived cut-point of 22 B-lines) can function as an imaging-based risk metric that complements global scores (e.g., GWTG-HF, MAGGIC) and HFpEF-specific instruments (MEDIA), with particular utility in the vulnerable post-discharge window. LUS behaves more like a modifiable risk state than a static label, which may explain its incremental value when measured at discharge rather than at presentation (Table 2).
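A Youden-derived cut-point is the threshold that maximizes J = sensitivity + specificity − 1. The sketch below shows the selection procedure on a small synthetic dataset; the B-line counts and outcomes are invented for illustration and are not the data behind the 22 B-line threshold reported in [19]. The function name is hypothetical.

```python
def youden_cutpoint(values, outcomes):
    """Return (cut, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified test-positive when value >= cut.
    outcomes: 1 if the event occurred, 0 otherwise.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v >= cut and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v < cut and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # keep the lowest cut if several share the maximum
            best_cut, best_j = cut, j
    return best_cut, best_j


# Synthetic discharge B-line counts and 6-month events (illustrative only).
b_lines = [4, 7, 10, 14, 16, 19, 21, 24, 28, 33]
events = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
print(youden_cutpoint(b_lines, events))
```

Because the optimum depends on the sample's outcome distribution and scanning protocol, a cut-point chosen this way in one cohort may not transfer unchanged to another, consistent with the caution in Table 2 that cut-points may vary across protocols.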
Some instruments enter HFpEF practice from adjacent clinical problems. The C2HEST score, initially developed to estimate incident atrial fibrillation, illustrates how a simple comorbidity-weighted framework may still enrich baseline risk in HFpEF, with associations extending beyond rhythm outcomes to death and hospitalization [20]. This should not be overinterpreted as a superior HFpEF prognostic model. Instead, it may be more appropriately interpreted as a low-friction baseline risk layer that reflects age and systemic comorbidity burden, factors that remain relevant across HFpEF phenotypes regardless of the dominant mechanism (Supplementary Table 3) [20].
A recurring finding across contemporary cohorts is that malnutrition and systemic inflammation carry independent prognostic weight in HFpEF and can be captured with low-cost indices derived from routine laboratory testing. GNRI (Geriatric Nutritional Risk Index), CONUT (Controlling Nutritional Status), and PNI (Prognostic Nutritional Index) are each associated with mortality and with heart-failure readmission across several datasets (Supplementary Tables 2 and 3) [21][22][23]. Among these measures, CONUT and GNRI tend to be most informative in older, hospitalized, or recently discharged cohorts [22][23]. Outpatient cohorts often show weaker absolute gradients, and follow-up horizons vary, limiting direct comparisons (Table 2).
In TOPCAT-Americas (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist-American cohort), roughly one-third of participants met GNRI criteria for nutritional risk, and lower GNRI values independently tracked higher cardiovascular events and mortality [24]. This is trial-derived evidence in a selected population, not a universal HFpEF registry estimate. Nutritional risk quantified by CONUT, PNI, and NRI identified patients at higher risk of adverse rhythm-related outcomes in an HFpEF cohort undergoing rhythm control, reinforcing the broader prognostic role of malnutrition beyond mortality and HF readmission [25]. That cohort differs by pathway (rhythm management), endpoint mix, and comorbidity profile, so results should be interpreted within that context.
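As a worked illustration of the GNRI components in Table 1 (albumin and a weight-to-ideal-weight term), the sketch below uses the commonly cited formula, GNRI = 1.489 × albumin (g/L) + 41.7 × (weight / ideal weight), with the weight ratio capped at 1. The formula is an assumption in the sense that this review does not restate it, and ideal weight is taken as an input because its definition varies across studies. The function name and example values are hypothetical.

```python
def gnri(albumin_g_per_l, weight_kg, ideal_weight_kg):
    """Geriatric Nutritional Risk Index under the commonly cited formula."""
    if ideal_weight_kg <= 0:
        raise ValueError("ideal weight must be positive")
    # The weight ratio is capped at 1 when actual weight exceeds ideal weight.
    ratio = min(weight_kg / ideal_weight_kg, 1.0)
    return 1.489 * albumin_g_per_l + 41.7 * ratio


# Hypothetical patient: albumin 35 g/L, weight 54 kg, ideal weight 60 kg.
# 1.489 x 35 + 41.7 x 0.9 = 52.115 + 37.53 = 89.645
print(round(gnri(35, 54, 60), 3))
```

Lower values indicate greater nutritional risk. Because the weight ratio is capped, fluid-related weight gain above ideal weight does not raise the index, although it can still mask true weight loss below that cap.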
The temporal element appears equally important. Worsening nutritional trajectories after discharge or during the index hospitalization (e.g., rising CONUT or falling GNRI) are associated with a higher subsequent risk, whereas improvement in these indices is associated with fewer later events (Supplementary Table 3) [26][27][28].
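Serial tracking can be sketched as a score computed at two time points and then differenced. The CONUT bands below are the commonly cited published categories (albumin in g/dL, lymphocytes per µL, total cholesterol in mg/dL) and are assumptions here, since this review lists only the components; the function names and patient values are hypothetical.

```python
def conut_score(albumin_g_dl, lymphocytes_per_ul, total_cholesterol_mg_dl):
    """CONUT total (0-12) under the commonly cited scoring bands (assumed)."""
    if albumin_g_dl >= 3.5:
        albumin_pts = 0
    elif albumin_g_dl >= 3.0:
        albumin_pts = 2
    elif albumin_g_dl >= 2.5:
        albumin_pts = 4
    else:
        albumin_pts = 6

    if lymphocytes_per_ul >= 1600:
        lymph_pts = 0
    elif lymphocytes_per_ul >= 1200:
        lymph_pts = 1
    elif lymphocytes_per_ul >= 800:
        lymph_pts = 2
    else:
        lymph_pts = 3

    if total_cholesterol_mg_dl >= 180:
        chol_pts = 0
    elif total_cholesterol_mg_dl >= 140:
        chol_pts = 1
    elif total_cholesterol_mg_dl >= 100:
        chol_pts = 2
    else:
        chol_pts = 3

    return albumin_pts + lymph_pts + chol_pts


def conut_change(baseline, follow_up):
    """Positive change = rising CONUT = worsening nutritional status."""
    return conut_score(*follow_up) - conut_score(*baseline)


# Hypothetical patient: (albumin, lymphocytes, cholesterol) at two visits.
discharge = (3.6, 1700, 185)
day_90 = (3.1, 1100, 150)
print(conut_change(discharge, day_90))  # 0 at discharge, 5 at follow-up
```

Reporting the change alongside the absolute value keeps the direction of travel visible, which is the information the trajectory studies [26][27][28] suggest a single baseline score would miss.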
The modified Glasgow Prognostic Score (mGPS), which integrates CRP (C-reactive protein) and albumin, appears to add value over NT-proBNP for predicting 12-month death or HF hospitalization in ambulatory patients with HFpEF (Supplementary Tables 2 and 3), thereby linking inflammation to clinically relevant outcomes [29]. Bedside instruments such as NRS-2002 (Nutritional Risk Screening 2002) can further identify high-risk inpatients, particularly men. This finding suggests a sex-specific vulnerability, and the screening itself can be integrated into routine inpatient practice without additional cost [30]. From an implementation perspective, these indices are inexpensive and reproducible.
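The two-variable structure of the mGPS can be shown in a few lines. The cut-offs used below (CRP > 10 mg/L and albumin < 35 g/L) follow the conventional definition of the score and are assumptions here, since this review does not restate them; the function name is hypothetical.

```python
def modified_glasgow_prognostic_score(crp_mg_per_l, albumin_g_per_l):
    """mGPS (0-2) under conventional cut-offs (assumed, not from this review)."""
    # Without elevated CRP the score is 0, whatever the albumin value.
    if crp_mg_per_l <= 10:
        return 0
    # Elevated CRP alone scores 1; elevated CRP with low albumin scores 2.
    return 2 if albumin_g_per_l < 35 else 1


print(modified_glasgow_prognostic_score(15, 32))
```

In this formulation low albumin contributes only when inflammation is also present, which is the sense in which the score links inflammation, rather than isolated hypoalbuminemia, to outcomes.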
Scores that combine biomarkers, cardiac structure, and prior clinical events reflect the idea that HFpEF risk has multiple drivers, so focusing on a single type of information is unlikely to capture the full risk profile. HALO is a clear example. By integrating clinical severity, echocardiographic burden, natriuretic peptide load, and prior hospitalizations, it predicts survival and future admission burden, treating recurrent decompensation as part of the disease trajectory rather than a downstream complication [31]. WATCH-DM, oriented to the diabetic phenotype, uses routinely available discharge variables and appears to stratify 1-year mortality after HFpEF hospitalization, with performance comparable to MAGGIC, while being more explicitly tailored to a cardiometabolic risk profile [32]. Both tools are most interpretable when applied to the populations they were tested in (post-hospitalization cohorts; diabetes enrichment for WATCH-DM). They are less suited to uniform application across all HFpEF presentations (see comparative notes in the tables) [31][32].
Patient-reported health adds a complementary dimension. The Kansas City Cardiomyopathy Questionnaire (KCCQ or KCCQ-12) predicts death and hospitalization (Supplementary Tables 2 and 3). In a direct comparison, KCCQ showed higher discrimination than the Minnesota Living With Heart Failure Questionnaire (MLHFQ) (C-index 0.702 [95% CI 0.666–0.738] vs 0.658 [95% CI 0.621–0.695]) and a stronger risk gradient per 5-point change in the expected direction (adjusted HR 0.894 per 5-point KCCQ increase vs 1.077 per 5-point MLHFQ increase) [33]. Higher KCCQ reflects better health status, whereas higher MLHFQ reflects worse status. However, studies vary by instrument version, follow-up, and endpoint definition, which should be considered when comparing effect sizes across cohorts.
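Because the two questionnaires run in opposite directions, the hazard ratios above are easier to compare after rescaling. Under the usual log-linear assumption of a Cox model, a hazard ratio reported per step of one size can be converted to another step size by exponentiation. The sketch below applies this to the values quoted from [33]; the function name is hypothetical, and the rescaled numbers are arithmetic consequences of that assumption, not additional reported results.

```python
def rescale_hazard_ratio(hr, reported_step, target_step):
    """Convert an HR per `reported_step` units into an HR per `target_step` units.

    Assumes a log-linear effect, so HR_target = HR ** (target_step / reported_step).
    A negative target_step expresses the effect of a decrease.
    """
    if hr <= 0 or reported_step == 0:
        raise ValueError("hr must be positive and reported_step non-zero")
    return hr ** (target_step / reported_step)


# KCCQ: HR 0.894 per 5-point increase, expressed per 10-point increase.
print(round(rescale_hazard_ratio(0.894, 5, 10), 3))   # 0.894 squared

# MLHFQ: HR 1.077 per 5-point increase, expressed per 5-point decrease
# so that both instruments describe the effect of better health status.
print(round(rescale_hazard_ratio(1.077, 5, -5), 3))   # reciprocal of 1.077
```

Expressed in the same direction, the comparison is roughly 0.894 (KCCQ) versus about 0.93 (MLHFQ) per 5 points of improvement, which is what the stronger KCCQ gradient refers to.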
What happens over time tends to matter more than a single snapshot. Worsening CONUT or declining GNRI over months or even within a single hospitalization independently signals a higher subsequent risk, whereas stabilization or improvement appears to attenuate it [26][27][28]. Natriuretic peptide trajectories show similar dynamics and are partly captured by HALO via prior admission count [31]. The practical implication is to move beyond a one-time score: reevaluate with low-cost indices (CONUT, GNRI), global tools (MAGGIC, GWTG-HF), and congestion metrics (LUS B-lines) at clinically meaningful checkpoints, such as admission, discharge, 30–90 days, and after any rehospitalization [19][28]. Such serial assessments are straightforward and likely to catch turning points that a baseline score misses.
Building on this emphasis on reassessment, a layered workflow can align tools with the decision point. At admission or discharge, a global score can provide baseline context for triage and early planning (e.g., MAGGIC or GWTG-HF, and ARIC/TRI in acute presentations) [9][10][11][12]. At discharge, adding a congestion-focused measure such as LUS B-lines helps quantify residual risk that baseline clinical variables may not capture [19]. During early follow-up (around 30–90 days) and after any rehospitalization, repeating low-cost nutrition–inflammation indices (CONUT, GNRI) and updating biomarker burden can track whether risk is stable or undergoing meaningful change (see Fig 1) [26][27][28]. Across cohorts, congestion- and trajectory-sensitive measures tend to outperform baseline-only tools for near-term events.
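The layered workflow described above can be written down as a simple lookup from decision point to candidate tools. This is a minimal sketch of one possible encoding; the checkpoint labels, the dictionary name, and the helper function are hypothetical, and the mapping merely restates the paragraph rather than a validated protocol.

```python
# Decision point -> tools suggested in the text for that point (illustrative).
LAYERED_WORKFLOW = {
    "admission": ["MAGGIC or GWTG-HF", "ARIC AD-HFpEF or TRI (acute presentations)"],
    "discharge": ["MAGGIC or GWTG-HF", "LUS B-lines"],
    "30-90 days": ["CONUT", "GNRI", "natriuretic peptides"],
    "after rehospitalization": ["CONUT", "GNRI", "natriuretic peptides"],
}


def tools_for(checkpoint):
    """Return the tools listed for a checkpoint, or raise if it is unknown."""
    try:
        return LAYERED_WORKFLOW[checkpoint]
    except KeyError:
        raise ValueError(f"unknown checkpoint: {checkpoint!r}") from None


for checkpoint in LAYERED_WORKFLOW:
    print(checkpoint, "->", ", ".join(tools_for(checkpoint)))
```

Keeping the mapping explicit makes it easy to adapt locally, for example by adding a phenotype-specific tool at discharge where that is appropriate.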

Fig 1. Layered risk reassessment aligned to clinical decision points
The choice and sequencing of layers then become phenotype dependent. In older or frail patients, nutrition-based scores (GNRI, CONUT, mGPS) often outperform anthropometrics such as BMI, which is fluid-sensitive and may miss sarcopenia. These findings suggest that nutritional frailty may be a key driver of events in this population [21][22][23][29]. In phenotype-specific settings, nutritional indices also inform prognosis: higher CONUT and lower PNI/NRI were associated with a greater risk of adverse rhythm outcomes in HFpEF patients undergoing rhythm control, supporting the use of nutrition screening to guide follow-up frequency and intervention thresholds [25]. In the diabetic phenotype, WATCH-DM provides a straightforward, diabetes-tailored stratification tool at discharge. Used alongside NT-proBNP (N-terminal pro–B-type natriuretic peptide) and, where feasible, KCCQ, it combines biological signals with patient-reported outcomes [32][33]. Sex differences warrant attention as well. Bedside screening with NRS-2002 identifies a pronounced mortality signal in men during acute admissions, arguing for sex-aware escalation of nutritional support [30]. Finally, generalizability is not guaranteed. When phenotype distributions differ from derivation cohorts, local recalibration is preferable [13].
This review is narrative and integrative rather than a preregistered systematic review, and we did not pool estimates. The main limitation is heterogeneity across the underlying studies. Cohorts differed by setting (hospitalized, post-discharge, outpatient, trial), HFpEF definitions and diagnostic work-up, phenotype distributions, and endpoints. As a result, numerical performance metrics are not fully comparable, and transportability across contexts is constrained.
Reporting was inconsistent. Discrimination was frequently provided, whereas calibration was variably assessed and rarely standardized. Effect measures were reported on different scales, and robust external validation and within-cohort head-to-head comparisons were limited.
In addition, several tools discussed as prognostic tools (e.g., H2FPEF and HFA-PEFF) were originally designed for diagnostic evaluation; therefore, prognostic use represents off-label repurposing and may be sensitive to missing components and local phenotype mix.
Finally, the focus on clinician-usable scores may underrepresent higher-dimensional models that could improve prediction but require infrastructure and recalibration before routine implementation.
Baseline clinical scores in HFpEF offer only moderate discrimination. Performance appears to improve when assessment reflects what drives events, such as residual congestion, nutritional or inflammatory state, and how these change over time. A layered workflow seems most practical: start with an implementable clinical or diagnostic score (MAGGIC, GWTG-HF, H2FPEF/HFA-PEFF), add natriuretic peptides, check discharge lung ultrasound for B-lines, and follow simple nutrition indices plus KCCQ longitudinally. This combination is quick and inexpensive, and it may support admission triage, early post-discharge planning, and follow-up. Utility is likely phenotype-dependent: nutrition-centric indices are particularly informative in older or frail patients, while WATCH-DM provides diabetes-specific stratification.