Clinical Prognostic Scoring Systems in Heart Failure with Preserved Ejection Fraction: An Integrative Review of Risk Prediction Models

Anamaria Draghici; Gheorghe-Andrei Dan

doi:10.2478/rjim-2026-0002

What is new? What is important?

Baseline bedside scores show moderate discrimination in heart failure with preserved ejection fraction (HFpEF) and vary by setting and endpoint.
Dynamic risk states outperform static risk labels. Discharge lung ultrasound B-lines quantify residual congestion and predict early post-discharge events.
Low-cost immuno-nutritional indices and patient-reported health add independent prognostic information, especially when tracked serially.
Diagnostic frameworks can stratify risk when complete components are available, but transportability and calibration remain constraints.

INTRODUCTION

Heart failure with preserved ejection fraction (HFpEF) represents at least half of all heart failure cases and is increasingly regarded as a major clinical and public health concern [1]. The syndrome extends beyond the preserved left ventricular ejection fraction to encompass interacting myocardial, vascular, and extracardiac pro-inflammatory processes that yield a notably heterogeneous clinical phenotype [2]. Patients frequently present with multiple comorbidities, such as hypertension, diabetes mellitus, obesity, or chronic kidney disease, that complicate both diagnosis and management [3]. Despite its high prevalence, HFpEF remains an area of significant uncertainty, with high rates of morbidity and mortality that continue to challenge routine care [4].

Accurate prognostic assessment is likely to support a personalized management approach [5]. High-dimensional prognostic models, including machine-learning approaches, can achieve excellent discrimination in controlled datasets, but their bedside use is limited by data requirements [6]. By contrast, clinical risk scores rely on information clinicians already collect, so they are fast, transparent, cost-efficient, and reproducible.

This narrative review was developed to address a persistent gap in HFpEF care. Numerous prognostic scores have been published. Their clinical meaning, however, is often difficult to translate across settings, and the evidence base is heterogeneous. Also, cohorts differ in phenotype mix and disease severity, and endpoints are variably defined.

We therefore focused on instruments that clinicians can realistically compute in routine practice. Scores are treated as complementary perspectives on risk, not as candidates for a single ranking. We also analyze diagnostic frameworks that are commonly used to inform prognosis, even though they were developed for diagnostic evaluation. This pragmatic repurposing appears increasingly supported by observational data [7][8], but its performance remains sensitive to context and component availability. By comparing areas of agreement, contradiction, and uncertainty, we aim to clarify what this literature implies for clinical decisions.

This integrative narrative review aims to summarize clinician-usable prognostic scoring systems evaluated in HFpEF across care settings, highlight head-to-head comparisons where available and report key performance metrics (e.g., C-statistic/AUC and risk gradients), and synthesize practical implications for bedside risk stratification.

METHODOLOGY

This narrative review assesses prognostic risk scores applied to patients with heart failure with preserved ejection fraction (HFpEF). We focused on instruments intended for bedside use rather than complex prognostic models or machine-learning approaches.

Search strategy and study identification

A focused search was performed in PubMed for human studies in adults (≥18 years), restricted to English-language publications from the last 10 years. We used a score-oriented strategy combining HFpEF terminology with both generic score terms and commonly cited score names. We excluded studies that were non-HFpEF-specific, not bedside-applicable, or based on biomarker-only or high-dimensional prediction models.

Data extraction and outcomes of interest

From each included study, we extracted the study setting and population, the HFpEF definition used, the inputs and scoring method for the score, the endpoints assessed, the follow-up duration when available, and the reported measures of prognostic performance. Endpoints of interest included all-cause death, cardiovascular death, heart-failure hospitalization, and prespecified composite cardiovascular outcomes.

Comparative framework and statistical metrics

Because the included studies differed in cohort characteristics, endpoint definitions, and follow-up, we did not pool estimates. Instead, we used a structured narrative synthesis to compare instruments across settings and endpoints. Discrimination was captured using AUC or concordance statistics (C-statistic/c-index). Strength of association was summarized using HRs or ORs, retaining the original scaling. Calibration was recorded when reported.
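As an illustration of the discrimination metric referred to above, the following minimal sketch computes a C-statistic for a binary outcome as pairwise concordance, which is equivalent to the AUC. It is not the method of any included study. It ignores censoring, so it does not reproduce Harrell's C for time-to-event data, and the function name and toy data are hypothetical.

```python
from itertools import product


def c_statistic(risk_scores, events):
    """Pairwise concordance for a binary outcome (equivalent to AUC).

    For every (event, non-event) pair, count 1 if the patient with the
    event has the higher score, 0.5 for a tie, and 0 otherwise, then
    average over all such pairs.
    """
    cases = [s for s, e in zip(risk_scores, events) if e == 1]
    controls = [s for s, e in zip(risk_scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data, not drawn from any cited cohort.
scores = [2, 5, 3, 8, 6, 1]
died = [0, 1, 0, 1, 0, 0]
print(c_statistic(scores, died))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the moderate discrimination reported for most bedside scores should be read.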

Findings were interpreted by score domain, study design (derivation vs validation), cohort setting, endpoint/time horizon, and feasibility.

Reporting quality guidance and use of generative AI tools

To support methodological transparency and narrative quality, reporting was guided by the Scale for the Assessment of Narrative Review Articles (SANRA).

ChatGPT (OpenAI) was used as a language-support tool for English translation, grammar correction, and refinement of academic phrasing. Study selection, data extraction, and interpretation were conducted by the authors, who assume full responsibility for the accuracy and scientific integrity of the final manuscript.

Overview of included scores and component domains

Before detailing individual instruments, Table 1 maps the clinician-usable instruments included in this review. It summarizes each tool’s conceptual target and core components, alongside the clinical context in which it has been evaluated in HFpEF, such as acute admission triage, early post-discharge vulnerability, longitudinal ambulatory follow-up, or phenotype-specific pathways. The table also flags an important distinction for interpretation: several widely used scores were derived in mixed heart-failure cohorts and only later tested in HFpEF subsets, whereas other tools were developed or validated more directly in HFpEF-oriented populations. This difference has implications for external validity when results are applied across settings.

Table 1.

Main characteristics of prognostic scores

Score (acronym)	Expanded name	Core components (concise)	Conceptual domain	Typical setting / use
ARIC AD-HFpEF [11]	ARIC Acute Decompensated HFpEF score	Age; systolic BP; BUN; sodium; hypoxia; heart rate; natriuretic peptides; anemia; underweight	Acute clinical burden / triage	Acute HFpEF admission; 28-day/1-year risk estimate
C2HEST [20]	Coronary artery disease; COPD; Hypertension; Elderly; Systolic HF; Thyroid disease	CAD; COPD; hypertension; age ≥75; prior systolic HF; thyroid disease	Comorbidity / systemic burden	Ambulatory HFpEF (TOPCAT) for background risk enrichment
CONUT [21][23][26][27][28]	Controlling Nutritional Status	Albumin; total cholesterol; lymphocyte count	Nutrition–inflammation	Older / hospitalized or post-discharge HFpEF
GNRI [22][24][27][28]	Geriatric Nutritional Risk Index	Albumin; weight-to-ideal-weight term	Nutrition–inflammation / frailty	Older/frail; hospitalized or ambulatory HFpEF
GWTG-HF [10]	Get With The Guidelines–Heart Failure risk score	Age; SBP; BUN; sodium; heart rate; COPD; race	Global clinical risk (mixed-EF derivation)	In-hospital acute HF; post-discharge risk stratification
H2FPEF [13][14][15][16][17]	Heavy; Hypertensive; Atrial fibrillation; Pulmonary hypertension; Elder; Filling pressure	BMI ≥30; ≥2 antihypertensives; AF; PASP/PH; age >60; E/e′	Diagnostic framework repurposed for prognosis	Ambulatory/inpatient HFpEF; useful when stress testing unavailable
HFA-PEFF [14][15][17]	Heart Failure Association Pre-test assessment, Echocardiography & NP, Functional testing, Final aetiology	Echo + NP domains; functional testing (step-3) when available	Diagnostic framework repurposed for prognosis	Suspected/confirmed HFpEF; prognostic value highest with complete work-up
KCCQ [33]	Kansas City Cardiomyopathy Questionnaire (incl. KCCQ-12)	Patient-reported symptoms, function, quality of life	Patient-reported health status	Ambulatory/chronic HF (incl. HFpEF subsets) for outcome prediction and communication
LUS B-lines [19]	Lung ultrasound B-line count	B-line burden at discharge (residual pulmonary congestion)	Congestion physiology	Discharge risk stratification after acute HF (including HFpEF)
MAGGIC [9][17]	Meta-Analysis Global Group in Chronic HF risk score	Demographics; clinical status; comorbidities; therapies (standard MAGGIC variables)	Global clinical risk (mixed-EF derivation)	Chronic HF; validated in HFpEF subsets; longitudinal risk
MEDIA [18]	MEDIA echocardiographic score	PASP >40; IVC collapsibility <50%; average E/e′ >9; lateral s′ <7	Echocardiography-only hemodynamic burden	Acute HFpEF and stable outpatient HFpEF
mGPS [29]	Modified Glasgow Prognostic Score	C-reactive protein; albumin	Inflammation–nutrition	Ambulatory HFpEF; 12-month outcomes
NRS-2002 [30]	Nutritional Risk Screening 2002	BMI; weight loss; intake reduction; disease severity	Bedside nutrition screening	Acute HFpEF admissions; in-hospital risk (sex-specific effects)
SHFM [9]	Seattle Heart Failure Model	Clinical variables; labs; therapies; device therapy (model inputs)	Global clinical risk (mixed-EF derivation)	Chronic HF; comparator in HFpEF validation work
TRI [12]	TIMI Risk Index	Heart rate × (age/10)^2 ÷ systolic BP	Ultra-parsimonious acute triage	Acute HFpEF admission; in-hospital mortality triage
HALO [31]	HFpEF survivAL hOspitalization (HALO) score	Clinical severity; echocardiographic burden; natriuretic peptide load; prior HF hospitalization count	Multimodal HFpEF event risk (survival + recurrent admissions)	Recently hospitalized HFpEF; post-discharge risk stratification for survival and future admission burden
WATCH-DM [32]	WATCH-DM risk score	Age; BMI; BP; fasting glucose; creatinine; HDL-C; QRS; prior MI/CABG	Phenotype-specific (diabetes)	T2DM + HFpEF at discharge; ~1-year mortality

Abbreviations: AF - atrial fibrillation; BMI - body mass index; BNP - B-type natriuretic peptide; BUN - blood urea nitrogen; CONUT - Controlling Nutritional Status; CRP - C-reactive protein; GNRI - Geriatric Nutritional Risk Index; HALO - HFpEF survivAL hOspitalization; HFpEF - heart failure with preserved ejection fraction; IVC - inferior vena cava; KCCQ - Kansas City Cardiomyopathy Questionnaire; LUS - lung ultrasound; LVEF - left ventricular ejection fraction; mGPS - modified Glasgow Prognostic Score; MLHFQ - Minnesota Living with Heart Failure Questionnaire; NT-proBNP - N-terminal pro–B-type natriuretic peptide; PASP - pulmonary artery systolic pressure; PNI - Prognostic Nutritional Index; TRI - TIMI Risk Index.

Because HFpEF case definitions varied across included cohorts (e.g., LVEF thresholds, use of natriuretic peptides, and diagnostic frameworks), these definitions are summarized in Supplementary Table 1 to aid interpretation of cross-study comparisons.

Supplementary Tables 2 and 3 summarize outcome-oriented performance (mortality and hospitalization, respectively), whereas Table 2 synthesizes comparative applicability across clinical settings and phenotypes, highlighting when a tool is most informative and which inputs constrain bedside feasibility.

Table 2.

Practical applicability of clinician-usable HFpEF prognostic tools across settings and phenotypes

Clinical context	Most feasible instruments	What they primarily capture	When they tend to add value	Key limitations / cautions
Acute admission triage (HFpEF)	ARIC AD-HFpEF; GWTG-HF; TRI [10][11][12]	Short-term mortality risk using readily available vitals and labs	Early triage and discharge planning; prioritizing follow-up intensity	GWTG-HF derived in mixed-EF populations; TRI trades physiologic specificity for speed
Discharge vulnerability after acute HF	LUS B-lines (± BNP/echo) [19]	Residual pulmonary congestion	Near-term readmission/death risk; decongestion targets before discharge	Requires operator familiarity; cut-points may vary across protocols
Ambulatory / longitudinal risk	MAGGIC (± BNP); KCCQ [9][33]	Global clinical burden and patient-reported status	Longitudinal counselling and shared decisions; complements event-focused tools	MAGGIC derived in mixed-EF cohorts; KCCQ evidence often from mixed HF samples
Echocardiography-only hemodynamic burden	MEDIA; H2FPEF (resting echo) [16][18]	Filling pressure, pulmonary pressure, venous congestion, longitudinal function	When natriuretic peptides are equivocal or stress testing unavailable	Generalizability depends on echo quality and cohort characteristics (e.g., AF exclusion in MEDIA ambulatory validation)
Older / frail or recently hospitalized HFpEF	GNRI; CONUT; mGPS; NRS-2002 [21][22][23][24][29][30]	Frailty-like vulnerability, malnutrition, inflammation	Risk refinement beyond BMI; identifies patients who may benefit from nutritional evaluation and closer follow-up	Definitions and thresholds differ; readmission associations may be less consistent than mortality
Phenotype-specific pathways	WATCH-DM (T2DM); C2HEST (comorbidity burden); nutrition indices for rhythm pathway [20][25][32]	Diabetes burden, systemic comorbidity, vulnerability affecting rhythm and recovery	Discharge decisions and surveillance tailored to phenotype	Derived/validated in specific cohorts; may require local adaptation to preserve performance
Multimodal HFpEF event risk	HALO; prior HF hospitalization count [31]	Structural/hemodynamic load plus prior events	Higher-risk recurrent admission phenotype	Data requirements (echo + biomarkers + history); external validation still limited

Abbreviations: AF - atrial fibrillation; HFH - heart failure hospitalization; LUS - lung ultrasound; NP - natriuretic peptides.

GLOBAL CLINICAL RISK SCORES USABLE AT THE BEDSIDE

MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure) and GWTG-HF (Get With The Guidelines–Heart Failure) remain common entry points because they can be calculated quickly, using variables that are usually available early in care. Both tools were derived in broader, mixed–ejection fraction heart-failure populations. Most HFpEF evidence therefore reflects subgroup validation, not HFpEF-only derivation. This may partly explain why discrimination in HFpEF tends to land in the moderate range, with calibration and risk separation varying across settings and endpoints (see Supplementary Table 2 and Supplementary Table 3) [9][10]. MAGGIC appears to retain prognostic value in HFpEF, particularly once natriuretic peptide information is incorporated, although the incremental benefit may depend on cohort severity mix and endpoint definition (all-cause vs HF-specific hospitalization; first vs recurrent events) [9]. Where head-to-head data are available, MAGGIC performs similarly to other established global models in HFpEF. In the Rich et al. [9] validation cohort, discrimination for all-cause death was MAGGIC C-statistic 0.74 (95% CI 0.68–0.80) versus SHFM 0.72 (95% CI 0.67–0.78), with BNP alone performing comparably (0.76; 95% CI 0.70–0.81).

GWTG-HF, developed for in-hospital risk stratification in acute heart failure, appears to carry forward clinically meaningful stratification after discharge in HFpEF, but reported performance varies with acuity at index admission, comorbidity burden, and local discharge practices [10].

For acute triage, the ARIC AD-HFpEF (Atherosclerosis Risk in Communities Acute Decompensated HFpEF) score is conceptually distinct because it was designed around HFpEF admissions and short-to-intermediate follow-up. It formalizes what many clinicians already use implicitly at presentation, such as blood pressure, renal indices, oxygenation, natriuretic peptides, and markers of frailty [11]. The TIMI Risk Index (TRI) pushes simplicity even further. It uses only heart rate, age, and systolic blood pressure and still appears to identify a higher in-hospital mortality risk in acute HFpEF, making it attractive when time and data are limited [12]. The trade-off is physiologic specificity: TRI is best suited for front-end triage, not for longitudinal risk estimation (Supplementary Table 2) [12].
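The TRI formula listed in Table 1, heart rate × (age/10)^2 ÷ systolic blood pressure, can be written as a one-line calculation. The sketch below is illustrative only; the function name, the units, and the example patient are assumptions, and no risk threshold is implied.

```python
def timi_risk_index(heart_rate_bpm, age_years, systolic_bp_mmhg):
    """TIMI Risk Index = heart rate x (age / 10)^2 / systolic BP."""
    if systolic_bp_mmhg <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate_bpm * (age_years / 10) ** 2 / systolic_bp_mmhg


# Hypothetical patient: 80 years old, heart rate 90 bpm, systolic BP 120 mmHg.
print(timi_risk_index(90, 80, 120))
```

Because age enters as a squared term, the index rises steeply in older patients for the same heart rate and blood pressure.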

DIAGNOSTIC SCORES REPURPOSED FOR PROGNOSIS

Diagnostic frameworks in HFpEF are increasingly used to estimate prognosis, although they were designed for case identification rather than outcome prediction. H2FPEF and HFA-PEFF are primarily diagnostic decision-support tools. Therefore, their prognostic use in established HFpEF should be considered off-label and interpreted with caution. This repurposing is still understandable. These scores capture pathophysiology linked to HFpEF events and can be calculated from routine clinical data collected during evaluation. Across ambulatory and inpatient cohorts, H2FPEF (Heavy, Hypertensive, Atrial Fibrillation, Pulmonary Hypertension, Elder, Filling pressure) generally stratifies risk of death and heart-failure hospitalization, and higher scores align with worse physiology, including exercise limitation and adverse hemodynamics [13][14]. However, cohorts differ in atrial fibrillation prevalence, obesity distribution, and diagnostic work-up. As a result, score distributions and observed risk gradients may shift even when the same cut-points are applied.
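To make the structure of H2FPEF concrete, the following sketch sums the six components listed in Table 1 into a point total. The point weights (2 for obesity, 3 for atrial fibrillation, 1 for each other item) and the numeric thresholds (PASP >35 mmHg, E/e′ >9, BMI >30 kg/m²) are not stated in this review; they are assumptions based on the commonly reported categorical version of the score and should be checked against the original derivation before any use. Boundary handling at exactly 30 kg/m² is likewise an assumption.

```python
def h2fpef_points(bmi, n_antihypertensives, atrial_fibrillation,
                  pasp_mmhg, age_years, e_over_e_prime):
    """Categorical H2FPEF point total (0-9) under assumed weights."""
    points = 0
    points += 2 if bmi > 30 else 0                  # Heavy
    points += 1 if n_antihypertensives >= 2 else 0  # Hypertensive
    points += 3 if atrial_fibrillation else 0       # atrial Fibrillation
    points += 1 if pasp_mmhg > 35 else 0            # Pulmonary hypertension
    points += 1 if age_years > 60 else 0            # Elder
    points += 1 if e_over_e_prime > 9 else 0        # Filling pressure
    return points


# Hypothetical patient meeting every criterion.
print(h2fpef_points(33, 2, True, 40, 72, 11))
```

The heavy weighting of atrial fibrillation and obesity in this version shows why score distributions shift when cohorts differ in the prevalence of either.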

Head-to-head comparisons often favor H2FPEF over the HFA-PEFF algorithm when step-3 (stress) testing is unavailable, likely because H2FPEF can be computed from routine clinical data and resting echocardiography alone [14][15]. HFA-PEFF comprises Heart Failure Association-led pre-test assessment, echocardiography and natriuretic peptides, functional testing, and final etiologic classification. As summarized in Table 2, the apparent prognostic separation of these diagnostic frameworks is strongly conditioned by feasibility. Still, HFA-PEFF has prognostic value when natriuretic peptide data and stress-echocardiography components are obtained. Practical barriers and variable discrimination across settings may limit its use as a stand-alone tool [15]. In a hospitalized HFpEF cohort, the H2FPEF score correlated inversely with peak left-atrial strain and identified a higher risk of death or HF hospitalization, suggesting that H2FPEF continues to track atrial remodeling and carries prognostic information when stress testing is not feasible [16]. Przewłocka-Kośmala et al. [17] studied patients with exertional dyspnea and suspected HFpEF. In that head-to-head comparison, discrimination was similar for MAGGIC, H2FPEF, and step-2 HFA-PEFF (Harrell’s C = 0.637, 0.644, and 0.638, respectively), whereas incorporating step-3 exercise data improved discrimination (Harrell’s C = 0.715) [17]. This finding is setting-dependent (suspected HFpEF physiology cohort) and constrained by eligibility, because only patients in sinus rhythm were included (see Supplementary Table 2 and Supplementary Table 3).

IMAGING AND FUNCTIONAL CONGESTION METRICS

Echo-anchored tools are often appealing in HFpEF because they target filling pressures, pulmonary hypertension, and systemic congestion, which are processes that are tightly linked to decompensation and subsequent hospital admission. The MEDIA score, based on pulmonary artery systolic pressure, inferior vena cava collapsibility, E/e′, and lateral s′, offers a compact way to translate a multiparametric echocardiogram into a structured risk estimate [18]. Its reported performance differs by setting. In acute HFpEF, higher scores are more strongly associated with short-term mortality. In stable outpatients, the signal shifts toward future heart-failure hospitalization (Supplementary Table 2 and Supplementary Table 3) [18]. This pattern suggests setting-dependent prognostic meaning.

Because HFpEF events are closely linked to congestion, markers of residual hemodynamic burden at discharge carry prognostic weight. Lung ultrasound (LUS) B-lines quantify pulmonary congestion. In acute HFpEF, a high B-line burden at discharge predicts 6-month rehospitalization or death and, in several analyses, outperforms admission measures [19]. These results come primarily from early post-discharge windows and discharge-based protocols, so they should not be compared directly with admission-based scores or longer follow-up cohorts. Combining discharge LUS with BNP (B-type natriuretic peptide) and diastolic parameters further improves classification [19]. A simple B-line count (using a Youden cut-point of 22 B-lines) can function as an imaging-based risk metric that complements global scores (e.g., GWTG-HF, MAGGIC) and HFpEF-specific instruments (MEDIA), with utility in the vulnerable post-discharge window. LUS behaves more like a modifiable risk state than a static label, which may explain its incremental value when measured at discharge rather than at presentation (Table 2).
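A Youden cut-point is the threshold that maximizes J = sensitivity + specificity − 1. The sketch below shows how such a cut-point could be derived from a list of B-line counts and observed events. It is a generic illustration, not the analysis used in the cited study; the data are hypothetical and the rule "positive when the count is at or above the cut-point" is an assumption.

```python
def youden_cut_point(values, events):
    """Return (cut_point, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as test-positive when value >= cut_point.
    Ties in J are resolved in favor of the lowest cut-point.
    """
    n_events = sum(events)
    n_non_events = len(events) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if v >= cut and e == 1)
        true_neg = sum(1 for v, e in zip(values, events) if v < cut and e == 0)
        j = true_pos / n_events + true_neg / n_non_events - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical discharge B-line counts and 6-month events (1 = event).
b_lines = [4, 9, 13, 17, 21, 26, 31, 38]
events = [0, 0, 1, 0, 0, 1, 1, 1]
print(youden_cut_point(b_lines, events))
```

Because the optimum depends on the sample and the scanning protocol, a cut-point derived this way in one cohort would need confirmation before being applied in another.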

PRAGMATIC RISK TOOLS OUTSIDE HFPEF-SPECIFIC SCORES

Some instruments enter HFpEF practice from adjacent clinical problems. The C2HEST score, initially developed to estimate incident atrial fibrillation, illustrates how a simple comorbidity-weighted framework may still enrich baseline risk in HFpEF, with associations extending beyond rhythm outcomes to death and hospitalization [20]. This should not be overinterpreted as a superior HFpEF prognostic model. Instead, it may be more appropriately interpreted as a low-friction baseline risk layer that reflects age and systemic comorbidity burden, factors that remain relevant across HFpEF phenotypes regardless of the dominant mechanism (Supplementary Table 3) [20].

IMMUNO-NUTRITIONAL SCORES FOR PROGNOSIS

A recurring theme across contemporary cohorts is that malnutrition and systemic inflammation carry independent prognostic weight in HFpEF and can be captured with low-cost indices derived from routine laboratory testing. GNRI (Geriatric Nutritional Risk Index), CONUT (Controlling Nutritional Status), and PNI (Prognostic Nutritional Index) each associate with mortality and with heart-failure readmission across several datasets (Supplementary Tables 2 and 3) [21][22][23]. Among these measures, CONUT and GNRI tend to be most informative in older, hospitalized, or recently discharged cohorts [22][23]. Outpatient cohorts often show weaker absolute gradients, and follow-up horizons vary, limiting direct comparisons (Table 2).
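The three indices named above can each be computed from a handful of routine values. The sketch below uses the commonly published formulas and cut-offs, which are not given in this review and are therefore assumptions to be verified against the original index publications: GNRI = 1.489 × albumin (g/L) + 41.7 × (weight / ideal weight), with the weight ratio capped at 1; PNI = 10 × albumin (g/dL) + 0.005 × lymphocyte count (per µL); and CONUT as a 0 to 12 point sum across albumin, lymphocyte, and total cholesterol bands. Function names and example values are hypothetical.

```python
def gnri(albumin_g_l, weight_kg, ideal_weight_kg):
    """Geriatric Nutritional Risk Index; lower values indicate higher risk."""
    ratio = min(weight_kg / ideal_weight_kg, 1.0)  # ratio capped at 1
    return 1.489 * albumin_g_l + 41.7 * ratio


def pni(albumin_g_dl, lymphocytes_per_ul):
    """Prognostic Nutritional Index; lower values indicate higher risk."""
    return 10 * albumin_g_dl + 0.005 * lymphocytes_per_ul


def _band(value, cutoffs, points):
    """Return the points of the first band whose lower cutoff is met."""
    for cutoff, pts in zip(cutoffs, points):
        if value >= cutoff:
            return pts
    return points[-1]


def conut(albumin_g_dl, lymphocytes_per_ul, total_cholesterol_mg_dl):
    """CONUT score (0-12); higher values indicate worse nutritional status."""
    albumin_pts = _band(albumin_g_dl, [3.5, 3.0, 2.5], [0, 2, 4, 6])
    lymph_pts = _band(lymphocytes_per_ul, [1600, 1200, 800], [0, 1, 2, 3])
    chol_pts = _band(total_cholesterol_mg_dl, [180, 140, 100], [0, 1, 2, 3])
    return albumin_pts + lymph_pts + chol_pts


# Hypothetical patient: albumin 3.2 g/dL (32 g/L), lymphocytes 1000/uL,
# total cholesterol 150 mg/dL, weight 60 kg, ideal weight 65 kg.
print(round(gnri(32, 60, 65), 1), round(pni(3.2, 1000), 1), conut(3.2, 1000, 150))
```

Note the opposite directions: higher CONUT indicates worse status, whereas lower GNRI and lower PNI indicate worse status, which matters when the indices are compared or tracked together.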

In TOPCAT-Americas (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist-American cohort), roughly one-third of participants met GNRI criteria for nutritional risk, and lower GNRI values independently tracked higher cardiovascular events and mortality [24]. This is trial-derived evidence in a selected population, not a universal HFpEF registry estimate. Nutritional risk quantified by CONUT, PNI, and NRI identified patients at higher risk of adverse rhythm-related outcomes in an HFpEF cohort undergoing rhythm control, reinforcing the broader prognostic role of malnutrition beyond mortality and HF readmission [25]. That cohort differs by pathway (rhythm management), endpoint mix, and comorbidity profile, so results should be interpreted within that context.

The temporal element appears equally important. Nutritional trajectories that worsen during the index hospitalization or after discharge (e.g., rising CONUT or falling GNRI) are associated with higher subsequent risk, whereas improvement in these indices is associated with fewer later events (Supplementary Table 3) [26][27][28].

The modified Glasgow Prognostic Score (mGPS), which integrates CRP (C-reactive protein) and albumin, appears to add value over NT-proBNP for predicting 12-month death or HF hospitalization in ambulatory patients with HFpEF (Supplementary Tables 2 and 3), thereby linking inflammation to clinically relevant outcomes [29]. Bedside nutritional screening instruments such as NRS-2002 (Nutritional Risk Screening 2002) can further identify high-risk inpatients, particularly men. This finding suggests a sex-specific vulnerability, and the screening itself can be integrated into routine inpatient practice without additional cost [30]. From an implementation perspective, these indices are inexpensive and reproducible.
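The mGPS is a three-level score built from the two laboratory values named above. The sketch below uses the conventional cut-offs (CRP >10 mg/L and albumin <35 g/L), which are not stated in this review and are an assumption here; the function name is hypothetical.

```python
def mgps(crp_mg_l, albumin_g_l):
    """Modified Glasgow Prognostic Score (0, 1, or 2).

    0: CRP not elevated (albumin is then ignored)
    1: CRP elevated, albumin preserved
    2: CRP elevated and albumin low
    """
    if crp_mg_l <= 10:
        return 0
    return 1 if albumin_g_l >= 35 else 2


# Hypothetical values: CRP 18 mg/L with albumin 32 g/L.
print(mgps(18, 32))
```

In this formulation low albumin alone does not raise the score, so the index is weighted toward inflammation rather than toward nutritional status.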

BIOMARKER-ENRICHED AND MULTIMODAL HFPEF-SPECIFIC TOOLS

Scores that combine biomarkers, cardiac structure, and prior clinical events reflect the idea that HFpEF risk has multiple drivers, so focusing on a single type of information is unlikely to capture the full risk profile. HALO is a clear example. By integrating clinical severity, echocardiographic burden, natriuretic peptide load, and prior hospitalizations, it predicts survival and future admission burden, treating recurrent decompensation as part of the disease trajectory rather than a downstream complication [31]. WATCH-DM, oriented to the diabetic phenotype, uses routinely available discharge variables and appears to stratify 1-year mortality after HFpEF hospitalization, with performance comparable to MAGGIC, while being more explicitly tailored to a cardiometabolic risk profile [32]. Both tools are most interpretable when applied to the populations they were tested in (post-hospitalization cohorts; diabetes enrichment for WATCH-DM). They are less suited to uniform application across all HFpEF presentations (see comparative notes in the tables) [31][32].

Patient-reported health adds a complementary dimension. Kansas City Cardiomyopathy Questionnaire (KCCQ or KCCQ-12) predicts death and hospitalization (Supplementary Table 2 and 3). In a direct comparison, KCCQ showed higher discrimination than the Minnesota Living With Heart Failure Questionnaire (MLHFQ) (C-index 0.702 [95% CI 0.666–0.738] vs 0.658 [95% CI 0.621–0.695]) and a stronger risk gradient per 5-point change in the expected direction (adjusted HR 0.894 per 5-point KCCQ increase vs 1.077 per 5-point MLHFQ increase) [33]. Higher KCCQ reflects better health status, whereas higher MLHFQ reflects worse status. However, studies vary by instrument version, follow-up, and endpoint definition, which should be considered when comparing effect sizes across cohorts.
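Because the two hazard ratios above point in opposite directions, it can help to put them on a common footing. The sketch below rescales a hazard ratio to a different point change under the assumption of a log-linear (constant per-point) effect; that assumption, and the function name, are ours and not the cited study's. The two questionnaires also use different scales, so one point is not the same quantity on each.

```python
def rescale_hazard_ratio(hr, from_points, to_points):
    """Rescale a hazard ratio to a different point change.

    Assumes the log-hazard is linear in the score, so that
    HR(to_points) = HR(from_points) ** (to_points / from_points).
    """
    return hr ** (to_points / from_points)


# Reported adjusted HRs per 5-point increase [33].
kccq_per_10 = rescale_hazard_ratio(0.894, 5, 10)           # 10-point KCCQ increase
mlhfq_per_5_decrease = rescale_hazard_ratio(1.077, 5, -5)  # 5-point MLHFQ decrease
print(round(kccq_per_10, 3), round(mlhfq_per_5_decrease, 3))
```

Expressed per 5 points in the direction of better health, the KCCQ estimate (0.894) lies further from 1 than the inverted MLHFQ estimate (about 0.93), consistent with the stronger gradient described above.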


TEMPORAL TRAJECTORIES AND PHENOTYPE-AWARE APPLICATION

What happens over time tends to matter more than a single snapshot. Worsening CONUT or declining GNRI over months or even within a single hospitalization independently signals a higher subsequent risk, whereas stabilization or improvement appears to attenuate it [26][27][28]. Natriuretic peptide trajectories show similar dynamics and are partly captured by HALO via prior admission count [31]. The practical implication is to move beyond a one-time score: reevaluate with low-cost indices (CONUT, GNRI), global tools (MAGGIC, GWTG-HF), and congestion metrics (LUS B-lines) at clinically meaningful checkpoints, such as admission, discharge, 30–90 days, and after any rehospitalization [19][28]. Such serial assessments are straightforward and likely to catch turning points that a baseline score misses.
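Serial reassessment can be reduced to a simple comparison between two checkpoints, provided the direction of each index is respected. The sketch below labels a change as worsening, improving, or stable from the sign of the difference alone. No minimal clinically important change is specified in this review, so none is applied; that simplification, the function name, and the example values are assumptions.

```python
# Direction of risk for each index: rising CONUT is worse, falling GNRI is worse.
HIGHER_IS_WORSE = {"CONUT": True, "GNRI": False}


def trajectory(index_name, earlier, later):
    """Label the change in an index between two checkpoints."""
    if index_name not in HIGHER_IS_WORSE:
        raise KeyError(f"direction not defined for {index_name}")
    change = later - earlier
    if change == 0:
        return "stable"
    worsening = (change > 0) == HIGHER_IS_WORSE[index_name]
    return "worsening" if worsening else "improving"


# Hypothetical discharge and 90-day values.
print(trajectory("CONUT", 3, 6), trajectory("GNRI", 88, 95))
```

In practice a tolerance band around zero would probably be needed so that measurement noise is not read as a change in risk state.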

Building on this emphasis on reassessment, a layered workflow can align tools with the decision point. At admission or discharge, a global score can provide baseline context for triage and early planning (e.g., MAGGIC or GWTG-HF, and ARIC/TRI in acute presentations) [9][10][11][12]. At discharge, adding a congestion-focused measure such as LUS B-lines helps quantify residual risk that baseline clinical variables may not capture [19]. During early follow-up (around 30–90 days) and after any rehospitalization, repeating low-cost nutrition–inflammation indices (CONUT, GNRI) and updating biomarker burden can track whether risk is stable or undergoing meaningful change (see Fig 1) [26][27][28]. Across cohorts, congestion- and trajectory-sensitive measures tend to outperform baseline-only tools for near-term events.

The choice and sequencing of layers then become phenotype dependent. In older or frail patients, nutrition-based scores (GNRI, CONUT, mGPS) often outperform anthropometrics such as BMI, which is fluid-sensitive and may miss sarcopenia. These findings suggest that nutritional frailty is likely a key driver of events in this population [21][22][23][29]. In phenotype-specific settings, nutritional indices also inform prognosis: higher CONUT and lower PNI/NRI were associated with a greater risk of adverse rhythm outcomes in HFpEF patients undergoing rhythm control, supporting the use of nutrition screening to guide follow-up frequency and intervention thresholds [25]. In the diabetic phenotype, WATCH-DM provides a straightforward, diabetes-tailored stratification tool at discharge. Used alongside NT-proBNP (N-terminal pro–B-type natriuretic peptide) and, where feasible, KCCQ, it combines biological signals with patient-reported outcomes [32][33]. Sex differences warrant attention as well. Bedside screening with NRS-2002 shows a pronounced association with mortality in men during acute admissions, arguing for sex-aware escalation of nutritional support [30]. Finally, generalizability is not guaranteed. When phenotype distributions differ from derivation cohorts, local recalibration is preferable [13].

LIMITATIONS

This review is narrative and integrative rather than a preregistered systematic review, and we did not pool estimates. The main limitation is heterogeneity across the underlying studies. Cohorts differed by setting (hospitalized, post-discharge, outpatient, trial), HFpEF definitions and diagnostic work-up, phenotype distributions, and endpoints. As a result, numerical performance metrics are not fully comparable, and transportability across contexts is constrained.

Reporting was inconsistent. Discrimination was frequently provided, whereas calibration was variably assessed and rarely standardized. Effect measures were reported on different scales, and robust external validation and within-cohort head-to-head comparisons were limited.

In addition, several tools discussed as prognostic tools (e.g., H2FPEF and HFA-PEFF) were originally designed for diagnostic evaluation; therefore, prognostic use represents off-label repurposing and may be sensitive to missing components and local phenotype mix.

Finally, the focus on clinician-usable scores may underrepresent higher-dimensional models that could improve prediction but require infrastructure and recalibration before routine implementation.

CONCLUSIONS

Baseline clinical scores in HFpEF offer only moderate discrimination. Performance appears to improve when assessment reflects what drives events, such as residual congestion, nutritional or inflammatory state, and how these change over time. A layered workflow seems most practical: start with an implementable clinical or diagnostic score (MAGGIC, GWTG-HF, H2FPEF/HFA-PEFF), add natriuretic peptides, check discharge lung ultrasound for B-lines, and follow simple nutrition indices plus KCCQ longitudinally. This combination is quick and inexpensive. It may support admission triage, early post-discharge planning, and follow-up. Utility is likely phenotype-dependent: nutrition-centric indices are particularly informative in older or frail patients, while WATCH-DM provides diabetes-specific stratification.
