Skip to main content
Have a personal or library account? Click to login
Quasiprobabilistic Imaginary-Time Evolution on Quantum Computers Cover

Quasiprobabilistic Imaginary-Time Evolution on Quantum Computers

Open Access
|Jun 2026

Full Article

1.
Introduction

Quantum computers are expected to have wide-ranging applications in various domains [14] but perhaps most significantly in the simulation of quantum physical systems. Important problems in this arena concern the estimation of ground-state and thermal equilibrium properties of quantum many-body systems, problems which have applications in chemistry and material science [2,57]. Consequently, there has been an enormous amount of research [811] directed towards the development of quantum algorithms for these problems. Although computational complexity results limit the possibility for dramatic quantum speed-ups for these problems [1214], there is also theoretical evidence that quantum computers potentially provide an advantage over classical algorithms [15].

The idea of imaginary-time evolution plays a key role in many algorithms, both classical and quantum, for simulating ground-state and equilibrium properties [1624]. Given a Hamiltonian H and an inverse temperature β ≥ 0, imaginary-time evolution (ITE) under H is described by the exponential operator eβH. Performing ITE with sufficiently high β will drive a quantum system to its ground state, making it a useful primitive in algorithms for ground state preparation. ITE is also used as a subroutine in quantum algorithms for preparing thermal equilibrium or Gibbs states [25], which find application in solving optimization problems. Performing ITE of quantum Hamiltonians using classical methods can be computationally prohibitive due to the exponential increase in Hilbert space dimension and the sign problem [26,27]. Many existing quantum algorithms such as those based on block-encoding [2832] require fault-tolerant quantum computers and are beyond the reach of current hardware [33]. Several recent works have therefore proposed quantum algorithms that are potentially more suitable for near-term to intermediate quantum devices [3440].

In this work, we add to the quantum toolkit for imaginary-time evolution and develop a new near-term friendly algorithm using ideas from quantum error mitigation (QEM), a collection of classical post-processing techniques that aim to mitigate the effects of noise without requiring the full machinery of quantum error correction and faulttolerance [41,42]. These techniques are usually not scalable, but have been shown to significantly improve the performance of current devices [42]. In particular, our work is inspired by the error-mitigation technique of probabilistic error cancellation (PEC) [41,4345]. The original motivation of PEC as a technique is to invert a noise channel on noisy quantum devices [46]. Probabilistic error cancellation simulates the action of an ideal, unitary operation from a linear combination of native channels implementable on a quantum device aided by post-processing. The basic idea underlying PEC is as follows. Given a quantum device with a set of known native (and possibly noisy) gates, one decomposes a target unitary U into a weighted sum of the native gates. The coefficients in the linear combination will be real numbers that sum to 1, although they may or may not be positive and hence form a quasi-probability distribution (QPD). Nevertheless, the action of U can be effected by a protocol that probabilistically samples a native gate from the linear combination. With a sufficient number of samples and simple classical post-processing, the expectation value of an observable after the action of the target unitary U can be reconstructed. Recovering the ideal expectation value comes at an increased cost related to the negativity of the quasi-probability distribution. The success of the PEC method relies on an accurate characterization of the native gates in the quantum device. This can be a very challenging task in practice, although recent work has shown that tailoring the device noise to Pauli noise (via randomized compiling [47]) allows efficient and accurate noise characterization [48]. We note that related QPD ideas have also been used in other quantum information contexts, such as classical simulation of quantum circuits [4953], circuit cutting [5458], and for mitigating noise in non-Clifford gates [59].

Here, we design a quasi-probabilistic algorithm inspired by PEC to perform imaginary-time evolution for a given Hamiltonian H. The starting point of our algorithm is the observation that we can implement non-unitary and even non-trace-preserving operations as a linear combination of native gates and measurement channels. We use our algorithm to estimate the expectation value of a given observable for imaginary-time evolved states, and we analyze its cost, specifically for the case of Trotterized imaginary time evolution. In addition, we show that our algorithm is suitable for preparing thermal pure quantum (TPQ) states, pure states which can be used to estimate the thermal expectation value of observables; see Figure 1. We demonstrate the efficacy of our approach through classical simulations and demonstrations on quantum hardware.

Figure 1.

Thermal expectation value estimation using ITE. For each data point, we average the error in the thermal expectation value |〈ψi|Oj|ψi〉 – tr(|ψβ〉〈ψβ|Oj)| over 30 randomly chosen Pauli observables Oj and 10 random TPQ states |ψi〉 = e−0.02HUi|0〉n, where H is the 1D Heisenberg Hamiltonian on n qubits and Ui is a randomly chosen Clifford unitary. The blue and orange points show the values for exact and simulated ITE, respectively, where the simulation implements Algorithm 1.

Our method of simulating ITE in expectation using quasiprobability decompositions has a few features that make it amenable for implementation on near-term devices where qubits are few, and multi-qubit entangling gates are noisier than single-qubit gates. First, our method does not need auxiliary qubits beyond those of the system described by a given Hamiltonian. Second, we avoid the problem of implementing multi-qubit-controlled operations between auxiliary and data qubits [25,60], which can require large SWAP networks for implementation on devices that only permit entangling operations between nearest-neighbors (such as present-day superconducting devices).

The remainder of the paper is organized as follows. In Section 2, we review the quasiprobability decomposition of linear maps and discuss our extension to nonlinear maps, including our algorithm for estimating rescaled expectation values. Following this, in Section 3, we use our algorithm for the specific case of imaginary time evolution and combine the protocol with various choices of initial states, focusing on the thermal pure quantum state. In this section, we also present the results of our demonstrations conducted using IBM’s quantum computing hardware. Finally, we provide a summary and discuss future work in Section 5.

2.
Implementing Linear Maps via Quasiprobability Decompositions

We consider the problem of implementing Hermiticity-preserving linear maps—which may not be trace-preserving in general—on a quantum device with a (basis) set of completely positive (CP) native operations. In Section 2.1, we present our algorithm for estimating rescaled expectation values of observables with respect to quantum states evolving under the linear map of interest. In Section 2.2, we analyze the sampling cost of our algorithm. Our method requires, as input, a decomposition of the linear map of interest into a linear combination of the set of available operations. In Section 2.4, we discuss the conditions under which a linear map can be decomposed as such, and we briefly review the method to obtain the linear decomposition. And in Section 2.3, we generalize our algorithm to the case of implementing a sequence of linear maps.

2.1.
Procedure for Estimating Rescaled Expectation Values of Observables

Suppose we have a linear map 𝒯 over the 2n-dimensional Hilbert space, and a set {Bi} of native operations on a quantum device such that for some real coefficients qi, 1T(ρ)=iqiBi(ρ){\cal T}(\rho ) = \sum\limits_i {{q_i}} {{\cal B}_i}(\rho ) for all input states ρ. Using Eq. (1), we can define parameters γ and pi such that 2γ:=i| qi |,\gamma : = \sum\limits_i {\left| {{q_i}} \right|} , 3pi:=| qi |γ.{p_i}: = {{\left| {{q_i}} \right|} \over \gamma }.

We refer to such a linear combination as a quasiprobability decomposition (QPD). Note that since the qi are real, γ ≥ 0 and the pi form a probability distribution. Moreover, Eq. (1) can be rewritten as 4T(ρ)=γisgn(qi)·pi·Bi(ρ).{\cal T}(\rho ) = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{{\cal B}_i}(\rho ).

This immediately suggests a way to probabilistically build the map 𝒯 by sampling from the set {Bi} according to the probability distribution pi (see Figure 2).

Figure 2.

A schematic of a linear map 𝒯 implemented by sampling from its quasiprobability distribution (see Eq. (4), Algorithm 1). Given a quasiprobability decomposition of the map 𝒯, circuits are randomly sampled according to the distribution |qi|/γ, and their measurement outcomes are weighted by the factors sgn(qi)γ.

In particular, we consider the task of estimating the expectation value of some observable A (having operator norm ∥A∥ ≤ 1) with respect to the state 𝒯(ρ). Since 𝒯 may be a non-trace-preserving map, the state 𝒯(ρ) is unnormalized in general. Thus, the rescaled expectation value of the observable A is 5AT(ρ):=tr[AT(ρ)]tr[T(ρ)].{\langle A\rangle _{{\cal T}(\rho )}}: = {{{\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )]} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}.

In order to correctly estimate the rescaled expectation value of A, we require estimates of the numerator and denominator of Eq. (5) which are obtained using Eq. (4) 6tr[AT(ρ)]=γisgn(qi)·pi·tr[ ABi(ρ) ],{\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )] = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {A{{\cal B}_i}(\rho )} \right], 7tr[T(ρ)]=γisgn(qi)·pi·tr[ Bi(ρ) ].{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right].

Denoting the trace-preserving and non-trace-preserving operations in {Bi} as BTP and BNTP, respectively, the denominator in Eq. (5) can be rewritten as 8tr[T(ρ)]=γ(BiBTPsgn(qi)·pi+BiBNTPsgn(qi)·pi·tr[ Bi(ρ) ]).{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] = \gamma \left( {\sum\limits_{{{\cal B}_i} \in {{\cal B}_{{\rm{TP}}}}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i} + \sum\limits_{{{\cal B}_i} \in {{\cal B}_{{\rm{NTP}}}}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right]} \right).

In this work, we consider BNTP to comprise rank-1 projectors Π|ψ = |ψ〉 〈ψ| on specific quantum states |ψ〉 (see Table A1 for example). Typically, such non-trace-preserving operations would be implemented via a POVM given by {|ψ〉 〈ψ|, |ψ〉 〈ψ|} and post-selecting on the outcome |ψ〉. Thus, tr [Bi(ρ)] is the probability of post-selecting the outcome corresponding to the projector for Bi.

If we randomly sample N circuits from the probability distribution pi in Eq. (2), the random variable Wj := γ sgn(qi)Ij corresponding to the jth circuit is an estimator (see Eq. (9)) for tr [𝒯(ρ)], where j ∈ {1,2,…,N}. Here, the indicator variable Ij is 0 when the jth circuit implements an operation Bi ∈ BNTP whose associated POVM outcome is discarded; otherwise Ij = 1. 9E[ Wj ]=E[ γsgn(qi)Ij ]=γi,Ijsgn(qi)Ij·| qi |γ·Pr[ Ij ]\left[ {{W_j}} \right] = \left[ {\gamma {\mathop{\rm sgn}} \left( {{q_i}} \right){I_j}} \right] = \gamma \sum\limits_{i,{I_j}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right){I_j}\cdot{{\left| {{q_i}} \right|} \over \gamma }\cdot\Pr \left[ {{I_j}} \right] 10=γisgn(qi)·pi·tr[ Bi(ρ) ] = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right] 11=tr[T(ρ)] = {\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]

Similarly, we show in Appendix D that the random variable Mj = γ sgn(qi)Aj, where Aj is the outcome of measuring the observable A for the jth circuit, is an estimator of tr [A𝒯(ρ)]. We remark that the measurement outcome Aj is necessarily zero when the indicator variable Ij is zero, i.e., when the jth sample is discarded. 12E[ Mj ]=tr[AT(ρ)]\left[ {{M_j}} \right] = {\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )]

Algorithm 1 Estimate re-scaled expectations

Input:

    QPD of 𝒯 as 𝒯(·) = γi sgn(qi)piBi(·)

    Initial state ρ

    Number of random samples N

    Observable A

Output: Estimate of the re-scaled expectation value 〈A𝒯

 1: for j = 1 to N do

 2:   Sample i from the distribution pi

 3:   if Bi is TP then

 4:     apply Bi directly to ρ

 5:     Ij ← 1                                        ⊳ Indicator variable

 6:   else

 7:     Measure in the basis corresponding to Bi

 8:     Store the outcome as Ij ∈ {0, 1}

 9:   end if

 10:   Wjγ · sgn(qi) · Ij                                        ⊳ Weighted outcome

 11:   Measure A

 12:   Store the outcome as Aj

 13:   Mjγ · sgn(qi) · Aj

 14: end for

 15: tr[T(ρ)˜]jWj/N{\mathop{\rm tr}\nolimits} [\widetilde {{\cal T}(\rho )}] \leftarrow \sum\nolimits_j {{W_j}} /N        ⊳ Empirical estimate of tr [𝒯(ρ)]

 16: tr[AT˜(ρ)]jMj/N{\mathop{\rm tr}\nolimits} [\widetilde {A{\cal T}}(\rho )] \leftarrow \sum\nolimits_j {{M_j}} /N      ⊳ Empirical estimate of tr [A𝒯(ρ)]

 17: return tr[AT˜(ρ)]/tr[T(ρ)˜]{\mathop{\rm tr}\nolimits} [\widetilde {A{\cal T}}(\rho )]/{\mathop{\rm tr}\nolimits} [\widetilde {{\cal T}(\rho )}]

Together, the two estimators Wj, Mj allow us to estimate the rescaled expectation value in Eq. (5). We formalize the procedure described above in Algorithm 1.

2.2.
Sampling Cost of the Algorithm

The crux of Algorithm 1 lies in the estimation of tr [ A𝒯(ρ)] and tr [𝒯(ρ)] separately, in order to estimate 〈A𝒯(ρ) via their ratio as per Eq. (5). We use standard concentration inequalities to bound the number of samples N needed to estimate tr [A𝒯(ρ)] and tr [𝒯(ρ)] each within some precision ϵ, which allows us to bound the error in the estimate of 〈A𝒯(ρ) as per Lemma 1.

Lemma 1 (Algorithm 1 cost).

Given a decomposition of a linear map 𝒯 into a set of operations as in Eq. (1) and a Hermitian observable A with ∥A∥ ≥ 1, Algorithm 1 uses N=O(γ2ϵ2)N = {\cal O}\left( {{{{\gamma ^2}} \over {{^2}}}} \right) random circuits drawn from the distribution pi in Eq. (2), to return an estimate of the re-scaled expectation valueA𝒯 within error 132ϵ|tr[T(ρ)]|(1+ϵ|tr[T(ρ)]|).{{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}} \right).

Proof sketch of Lemma 1.

We note that for an observable A with operator norm ∥A∥ ≤ 1, the random variables Wj, Mj are such that 14γWj,Mjγ. - \gamma \le {W_j},{M_j} \le \gamma .

For N circuits (enumerated by the index j) sampled independently from pi, we can define the following: 15W¯=jWjN,\bar W = {{\sum\nolimits_j {{W_j}} } \over N}, 16M¯=jMjN\bar M = {{\sum\nolimits_j {{M_j}} } \over N}

For some ϵ ≥ 0, Hoeffding’s inequality implies 17Pr[ | W¯E[ Wj ] |ϵ ]exp(Nϵ22γ2)\Pr \left[ {\left| {\bar W - \left[ {{W_j}} \right]} \right| \ge } \right] \le \exp \left( { - {{N{^2}} \over {2{\gamma ^2}}}} \right) 18Pr[ | M¯E[ Mj ] |ϵ ]exp(Nϵ22γ2).\Pr \left[ {\left| {\bar M - \left[ {{M_j}} \right]} \right| \ge } \right] \le \exp \left( { - {{N{^2}} \over {2{\gamma ^2}}}} \right).

We can bound the RHS of Eq. (17) and (18) by 0 ≤ δ < 1 by choosing N=2γ2ϵ2N = {{2{\gamma ^2}} \over {{^2}}} log(1/δ). Then with probability at least 1 – δ, we have 19| M¯W¯E[ Mj ]E[ Wj ] || E[ Wj ](M¯E[ Mj ])W¯E[ Wj ] |+| E[ Mj ](E[ Wj ]W¯)W¯E[ Wj ] |\left| {{{\bar M} \over {\bar W}} - {{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}}} \right| \le \left| {{{\left[ {{W_j}} \right]\left( {\bar M - \left[ {{M_j}} \right]} \right)} \over {\bar W\left[ {{W_j}} \right]}}} \right| + \left| {{{\left[ {{M_j}} \right]\left( {\left[ {{W_j}} \right] - \bar W} \right)} \over {\bar W\left[ {{W_j}} \right]}}} \right| 20ϵ|W¯|(1+| E[ Mj ]E[ Wj ] |) \le { \over {|\bar W|}}\left( {1 + \left| {{{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}}} \right|} \right) 212ϵ|W¯| \le {{2} \over {|\bar W|}}

In the above, we have used Eqs (9) and (12), and that E[ Mj ]E[ Wj ]1{{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}} \le 1. Additionally, we have 221|W¯|1| | E[ Wj ] |ϵ |1| E[ Wj ] |(1+ϵ| E[ Wj ] |).{1 \over {|\bar W|}} \le {1 \over {\left| {\left| {\left[ {{W_j}} \right]} \right| - } \right|}} \le {1 \over {\left| {\left[ {{W_j}} \right]} \right|}}\left( {1 + { \over {\left| {\left[ {{W_j}} \right]} \right|}}} \right).

Using Eqs (21) and (22), and recalling Eq. (5), we see that for the chosen N, with probability at least (1 – δ), we have 23| M¯W¯AT |2ϵ|tr[T(ρ)]|(1+ϵ|tr[T(ρ)]|).\left| {{{\bar M} \over {\bar W}} - {{\langle A\rangle }_{\cal T}}} \right| \le {{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}} \right).

Algorithm 1 requires a decomposition of the form Eq. (1) as input. Given a basis, such a decomposition can be obtained classically by a linear program [41,61]. The cost of finding the QPD scales polynomially with the Hilbert-space dimension and is thus tractable only for operations on k-qubit subsystems with kn. Implementing an n-qubit operation therefore requires us to first approximate it as a sequence of k-qubit operations, and then use the QPD method for each operation in the sequence. This protocol and an analysis of its cost are the focus of the next section. We defer a detailed discussion of how to find a suitable quasi-probability decomposition to Section 2.4. There we will also discuss some possible choices of QPD basis, and their effect on the cost of our algorithms through the factor γ.

2.3.
Implementing a Sequence of Operations

We now explain how to implement a sequence of linear maps 𝒯1, 𝒯2, …, 𝒯R where each map 𝒯i acts non-trivially on a constant k number of qubits (see Figure 3). Later, we will seek to implement an imaginary-time evolution by approximating it through a Trotter-Suzuki formula that gives rise to a sequence of k-local maps. We restrict ourselves to the case where each 𝒯i has the form 𝒯⊗I for some k-qubit map 𝒯. That is to say, 𝒯i acts as 𝒯 on a k-qubit subsystem and as the identity on all other qubits; note that each 𝒯i may act non-trivially on a different set of k qubits. We emphasize that our algorithm and its analysis do not require the 𝒯i’s to have this form. We make this assumption because it helps simplify notation and is sufficient for implementing imaginary-time evolution of k-local Hamiltonians such as the Heisenberg model. We present the protocol for applying a sequence of 𝒯 maps in Algorithm 2, and in Lemma 2, we analyze its cost.

Figure 3.

Example of an operation 𝒯′ = 𝒯1 𝒯2 𝒯3 that we implement. Here 𝒯1, 𝒯2, 𝒯3 are 2-qubit operations applied on nearest-neighbor qubits at different locations. In the analysis of Lemma 2, we assume for simplicity that 𝒯1 = 𝒯2 = 𝒯3 = 𝒯.

Algorithm 2 Estimate re-scaled expectations for repeated applications of 𝒯

Input:

    Operator ℰ consisting of R applications of 𝒯

    QPD of 𝒯 as 𝒯(·) = γi sgn( qi)piBi(·)

    Initial state ρ

    Number of random samples N

    Observable A

Output: Estimate of the re-scaled expectation value 〈Aϵ

 1: for j = 1 to N do

 2:   Sample (i1,…,iR) from the joint (independent) distribution piRp_i^R

 3:     for k = 1 to R do

 4:       if Bik is TP then

 5:         Apply Bik

 6:         Ijk ← 1                                                  ⊳ Indicator variable

 7:     else

 8:       Measure in the basis corresponding to Bik

 9:       Store the outcome as Ijk ∈ {0,1}

 10:     end if

 11:     Wjkγ · sgn(qik) · Ijk                                                  ⊳ Weighted outcome

 12:     end for

 13:     Measure A

 14:     Store the outcome as Aj

 15:     MjγR · sgn(qj1)…sgn(qjR) · Aj

 16: end for

 17: tr[ε(ρ)]˜j(ΠkWjk)/N{\mathop{\rm tr}\nolimits} \widetilde {[{\cal E}(\rho )]} \leftarrow \sum\nolimits_j {\left( {{\Pi _k}{W_{{j_k}}}} \right)} /N        ⊳ Empirical estimate of tr [ε(ρ)]

 18: tr[Aε(ρ)]˜jMj/N{\mathop{\rm tr}\nolimits} \widetilde {[A{\cal E}(\rho )]} \leftarrow \sum\nolimits_j {{M_j}} /N        ⊳ Empirical estimate of tr [(ρ)]

 19: return tr[Aε(ρ)]˜/tr[ε(ρ)]˜{\mathop{\rm tr}\nolimits} \widetilde {[A{\cal E}(\rho )]}/{\mathop{\rm tr}\nolimits} \widetilde {[{\cal E}(\rho )]}

Lemma 2 (Algorithm 2 cost).

Suppose we are given the QPD of a linear map 𝒯, acting non-trivially on a constant number of qubits. Define the n-qubit map ε as the composition of R applications of 𝒯, where each 𝒯 acts non-trivially on at most k-qubits (see Figure 3). Then, for a Hermitian observable A withA∥ ≥ 1, Algorithm 2 uses N=O(γ2Rϵ2)N = {\cal O}\left( {{{{\gamma ^{2R}}} \over {{^2}}}} \right) random circuits drawn from the distribution pi in Eq. (2), to return an estimate of the rescaled expectation valueAε within error 242ϵ|tr[ε(ρ)]|(1+ϵ|tr[ε(ρ)]|).{{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal E}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal E}(\rho )]|}}} \right).

The proof of Lemma 2 is similar to that of Lemma 1 and can be found in Appendix D.

2.4.
Quasiprobability decomposition of linear maps

We now consider the task of obtaining the linear decomposition in Eq. (1), which is required as an input to Algorithms 1 and 2. Given a k-qubit map 𝒯 and a complete basis (consisting of 16k linearly independent operators), the coefficients qi in Eq. (1) can be directly solved for via a system of 16k independent equations. The tensor product of the Endo-Benjamin-Li (EBL) basis [43] is an example of a complete basis for all linear maps. It is composed of single-qubit Clifford operations (TP) and projectors (non-TP); see Table A1. It may be possible for a basis with fewer elements than 16k to be used to obtain a QPD for a given map 𝒯. We note that a basis containing only trace-preserving (TP) elements can only be used to express maps 𝒯 that are scalar multiples of TP operations. However, operations such as the ITE operator are not simply scalar multiples of TP operations,1 and therefore require the QPD basis to have at least one non-TP operation.

Different choices of basis will affect the values of the coefficients qi and hence the value of γ. From the analysis of our algorithm in Section 2.2, we know that the number of samples required for estimating expectation values scales as 𝒪(γ2). Consequently, the sampling cost of our method can be reduced by minimizing the value of γ in the QPD of the map of interest. In the context of PEC, the QPD of a k-qubit map 𝒯 over a basis {Bi} can be obtained by solving a linear program [41,61] that minimizes γ. It is known that a lower bound of γ for a HPTP map 𝒯 is given by its diamond norm ∥𝒯∥ [62,63].2

As an example of the impact of the choice of basis, we consider the ITE operator eβH of the 2-qubit Heisenberg Hamiltonian H = –XXγγZZ + II.3 In Figure 4, we show the trends of γ as a function of β for two different choices of basis. We find that the ITE operator for the 2-qubit Heisenberg Hamiltonian can be decomposed into the Takagi basis (Table A2) with a lower value of γ than the EBL basis, where the Takagi basis value is closer to the lower bound given by the diamond norm (see Figure 4). This reduced value of γ can be attributed to the presence of entangling operations in the Takagi basis. Since the EBL basis for 2-qubits is simply the (uncorrelated) Cartesian product of two single-qubit basis sets, it has a high sampling overhead for simulating entangling operations in expectation.

Figure 4.

Scaling of the QPD cost γ with inverse temperature β for the decomposition of eβH using different basis sets, where H is the 2-qubit Heisenberg Hamiltonian H = –XXγγZZ + II. The Takagi basis (orange) outperforms the EBL basis (blue), illustrating the utility of including entangling gates in the QPD basis set. The lower bound, obtained using the diamond norm [62,63], is shown in teal. (See Appendix C for the analysis of optimal sampling cost and the effect of adding identity to the Hamiltonian for ITE.)

This is apparent in the case of the CNOT operation [43,64], which has γCNOT = 9 for the 2-qubit EBL basis but γCNOT = 1 when decomposed into the Takagi basis. We remark that although the Takagi basis permits a decomposition of the ITE operator in this case, it is not a complete basis for general HP maps on a 2-qubit system. The Takagi basis is only known to be complete for the set of all 2-qubit CPTP maps [64]. A more detailed analysis of QPD and choices of basis can be found in Appendix A.

The problem of obtaining a QPD for a given map requires careful consideration of two important factors: the choice of basis set used for the decomposition and the value of γ obtained from the QPD. In general, the QPD corresponding to the optimal value of γ could be a linear combination of entangling operations or general quantum instruments that need to dilate to a bigger Hilbert space, which may be expensive (in gate depth) to implement directly on a quantum device. This tradeoff between the generality of the basis and the optimality of the decomposition [43, 64,65] has been well studied for the specific case of QPDs for CPTP maps, with several results on optimizing [45,61] the choice of basis for different use-cases. However, the solution to this problem for general HP maps, including NTP maps such as ITE which commonly occur in physics, is yet (to the best of our knowledge) unknown. We provide some discussions in Appendix A. Additionally, the problem of finding a sample-efficient yet hardware-friendly basis for decomposing maps acting on systems larger than 2-qubits quickly becomes intractable (since the size of the associated classical optimization problem grows polynomially with the dimension of the Hilbert space) and remains a challenge even for the case of decomposing CPTP maps.

3.
QPD Implementation of Imaginary-Time Evolution

We now use the algorithms developed in Section 2 to implement imaginary time evolution eβH for a local Hamiltonian H 25H=l=1LHl,H = \sum\limits_{l = 1}^L {{H_l}} , presented as a sum of L = O(poly(n)) local terms Hl acting non-trivially on at most a constant k number of qubits. We may assume without loss of generality that H ≥ 0 and ∥H∥ ≤ 1, where ∥·∥ denotes the operator norm. Our goal is to implement the operator eβH for real β > 0. More precisely, for a given initial state |ψ〉, we want to estimate expectation values of a given observable A with respect to the quantum state |ψβ〉 ∝ eβH|ψ〉. Some basics of ITE and the optimal decomposition of ITE are presented in Appendix B and Appendix C, respectively.

3.1.
Warm-Up Example: 2-Qubit Heisenberg Model

We first consider the simple case of implementing the ITE of a 2-qubit Heisenberg model 26H=XXYYZZH = - XX - YY - ZZ

Since the basis {Bi} in Table A1 is known to be complete for all linear maps on single-qubits, we use the 256-element Cartesian product set {Bi} × {Bj} as the basis for decomposing the 2-qubit ITE operator eβH into a QPD. In Figure 5, we simulate Algorithm 1 for estimating the energy of the 2-qubit Heisenberg model when it is initialized in the state |00〉 and undergoes ITE for increasing values of β = 0.01t, where t is the Trotter step. We note that the simulation is noiseless and there is no Trotter error because H commutes with itself. Therefore, the only error in the simulation is due to the QPD sampling.

Additionally, we implement the 2-qubit ITE operator eβH using Algorithm 1 on a 5-qubit superconducting processor (ibm_manila). In this case, we use the noisy native gates on the device as the basis for QPD using noise characterization information. In particular, we take the noisy basis { Bi }\left\{ {{\cal B}_i^\prime } \right\} to be {𝒩 ⚬ Bi} where 𝒩 is the noise channel used to model the noise in the device, based on calibration data provided by IBM [66].

Figure 5.

Energy estimation using ITE. We estimate the energy 〈ψt|H|ψt〉/〈ψt|ψt〉, where H is the 2-qubit Heisenberg Hamiltonian and |ψt〉 = (e−0.01H)t|00〉 is the imaginary-time evolved state with t Trotter steps. The blue, orange, and teal points show the values for exact, simulated, and hardware ITE, respectively, where the simulation (resp., demonstration) implements Algorithm 2 on classical (resp., quantum) hardware.

In the simulation and demonstration on hardware, we used 400, 800, 3200, and 25600 QPD samples for Trotter steps 1, 2, 3, and 4, respectively. For each sample, 512 shots of the corresponding circuit were performed (or simulated). For each Trotter step, this process was repeated 10 times; the data in Figure 5 show the mean estimate and the standard deviation. It is instructive to compare the number of samples we used with the cost estimate from 1 of 26000, 50000, 79000, and 114000 circuits for Trotter steps 1, 2, 3, and 4, respectively (for a target error of 1%). These numbers suggest that our analytical bound is not tight.

The agreement between the demonstration on noisy hardware and the noiseless (classical) simulation indicates that our method can be made noise-resilient when information about the noise in a device is available. We note that the demonstration data for the third Trotter step is significantly worse than for the other Trotter steps. A possible explanation for this is that the device noise drifts over time, and so the noisy basis { Bi }\left\{ {{\cal B}_i^\prime } \right\} may become inaccurate, leading to bias in the results. This is a particular issue for demonstrations such as ours, where we ran many different circuits, each with a small number of shots, which is much slower than running a small number of circuits with many shots each. The results could also possibly be improved by using more QPD samples (compare the size of the (simulated data) error bars of the third and fourth Trotter steps in 5).

3.2.
QPD Implementation of Trotterized Imaginary Time Evolution

We now define 𝒯(·) to be the imaginary-time evolution for the n-qubit Hamiltonian in Eq. (28) 27T(·)=eβH(·)eβH.{\cal T}(\cdot) = {e^{ - \beta H}}(\cdot){e^{ - \beta H}}.

We cannot apply Algorithm 1 directly to 𝒯(·) since the task of obtaining the quasiprobability decomposition over an n-qubit basis set (|{Bi}| = 𝒪(16n)) by solving a linear program can quickly become intractable for large n. Accordingly, we first decompose the n-qubit operation eβH to a sequence of k-qubit operations by using the 1st-order Trotter approximation: 28eβH(eβH1/reβH2/reβHL/r)r.{e^{ - \beta H}} \approx {\left( {{e^{ - \beta {H_1}/r}}{e^{ - \beta {H_2}/r}} \ldots {e^{ - \beta {H_L}/r}}} \right)^r}.

Since each Hl is k-local, we obtain the quasiprobability decomposition of each local ITE operation eβ/rHl separately and implement a Trotterized ITE using Algorithm 2. Let 𝒯l(·) be the ITE operation for the k-local Hamiltonian Hl 29Tl(·)=eβHl/r(·)eβHl/r{{\cal T}_l}(\cdot) = {e^{ - \beta {H_l}/r}}(\cdot){e^{ - \beta {H_l}/r}}

We require the QPDs of each of the linear maps 𝒯1, …, 𝒯L into a set of operations {Bi} as in Eq. (1). Suppose that we have computed these decompositions with the associated parameters γ1, …, γL respectively. Then, we define the n-qubit map 𝒫 as the composition of the maps (𝒯1, …, 𝒯L) corresponding to the 1st-order Trotter approximation in Eq. (28). For a Hermitian observable A with ∥A∥ ≤ 1, it follows from Lemma 2 that implementing 𝒫 using Algorithm 2 returns an estimate of the re-scaled expectation value 〈A𝒫 within additive error ϵ · (tr{[𝒫(ρ)]})−1 using O(γ12rγL2rϵ2){\cal O}\left( {\gamma _1^{2r} \ldots \gamma _L^{2r}{^{ - 2}}} \right) circuits.

In our implementation, we only consider a translation-invariant Hamiltonian, i.e., each local interaction term in H is the same. As a result, each of the exponentials eβHl/r has the same decomposition in a given basis with an associated parameter γ. Now, the Trotter approximation causes the expectation value 〈A𝒫 to differ from the target imaginary-time expectation 〈A𝒯, but this difference can be bounded using known results [67]. We show in Appendix E that for any observable A, 30|tr[AT(ρ)]tr[A(ρ)]|=O(β2L2reβL/r)|{\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )] - {\mathop{\rm tr}\nolimits} [A{\cal P}(\rho )]| = {\cal O}\left( {{{{\beta ^2}{L^2}} \over r}{e^{\beta L/r}}} \right) 31|tr[T(ρ)]tr[(ρ)]|=O(β2L2reβL/r)|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] - {\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]| = {\cal O}\left( {{{{\beta ^2}{L^2}} \over r}{e^{\beta L/r}}} \right) again assuming ∥A∥ ≤ 1. Note that we can choose 32r=O(β2L2ϵ)r = {\cal O}\left( {{{{\beta ^2}{L^2}} \over }} \right) to make the error on the RHS at most O(ϵ). We know from Lemma 2 that Algorithm 2 can estimate the expectation value A=tr(A(ρ))tr((ρ)){\langle A\rangle _{\cal P}} = {{{\mathop{\rm tr}\nolimits} (A{\cal P}(\rho ))} \over {{\mathop{\rm tr}\nolimits} ({\cal P}(\rho ))}} to within error 332ϵ|tr[(ρ)]|(1+ϵ|tr[(ρ)]|){{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]|}}} \right) using 34O(γ2Lrϵ2){\cal O}\left( {{{{\gamma ^{2Lr}}} \over {{^2}}}} \right) circuits where each circuit consists of at most 𝒪(Lr) gates. Now, the algorithm is suitable only when the normalization tr (𝒫(ρ)) is much larger than the target precision ϵ. In that case, for the choice of r in Eq. (32), it follows that 35| ATA |O(ϵ),\left| {{{\langle A\rangle }_{\cal T}} - {{\langle A\rangle }_{\cal P}}} \right| \le {\cal O}(), and moreover, the error in estimating 〈A𝒫 via Eq. (33) can be upper-bounded by 364ϵtr[T(ρ)](1+4ϵtr[T(ρ)]).{{4} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}\left( {1 + {{4} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}} \right).

Even though the cost as stated scales as γ2Lr, we note that the factor γ depends on the local exponential eβHi/r and is therefore likely to decrease as we increase the Trotter number r. Moreover, our error bounds above can likely be further tightened, in particular because we appeal to standard additive-error bounds for Trotter formula, which may not be well-suited for non-trace-preserving imaginary-time evolution. It may also be possible to improve the bounds by using multiplicative-error estimates for Trotter-Suzuki formula [67,68]. We leave a detailed analysis of the complexity of our algorithm to future work.

3.2.1.
Estimating Thermal Expectation Values

We now show how our QPD implementation of imaginary-time evolution can be used to compute expectation values of observables in the Gibbs state. Recall that the Gibbs state is defined to be 37ρβ=eβHZβ,{\rho _\beta } = {{{e^{ - \beta H}}} \over {{{\cal Z}_\beta }}}, where H is the Hamiltonian of the quantum system, β is the inverse temperature, and Zβ=tr(eβH){{\cal Z}_\beta } = {\mathop{\rm tr}\nolimits} \left( {{e^{ - \beta H}}} \right) is the partition function. Preparing the Gibbs state and estimating thermal expectation values for general local Hamiltonians are known to be difficult tasks, and therefore unlikely to admit efficient algorithms in general [14].

Instead of directly preparing the (mixed) Gibbs state, we seek to prepare a thermal pure quantum state (TPQ) [69]. The latter is a suitably constructed n-qubit pure state |ψ〉 which, with high probability, can be used to estimate the Gibbs state expectation values for a set of observables, up to a tunable precision ϵ. Formally, the TPQ state for a given set of observables {O^}\{ \hat O\} , is defined as a pure state |ψ〉 that satisfies 38Pr[ ψ|O^|ψtr(ρβO^)ϵ ]f(N,ϵ),\Pr \left[ {\langle \psi |\hat O|\psi \rangle - {\mathop{\rm tr}\nolimits} \left( {{\rho _\beta }\hat O} \right) \ge } \right] \le f(N,), where limN→∞ f(N, ϵ) = 0. [70,71] It is known that applying imaginary-time evolution to a state chosen uniformly at random from the Haar measure gives a TPQ state [7173]. Since Haar-random states cannot be efficiently prepared on a quantum computer, we follow the approach of Ref. [74] where the initial Haar-random state is replaced by a random 3-design. Specifically, the authors in Ref. [74] apply a random Clifford operator to a computational basis state followed by imaginary-time evolution, preparing the state, 39|ψβ=eβH/2U|00|U+eβHU|0,\left| {{\psi _\beta }} \right\rangle = {{{e^{ - \beta H/2}}U|0\rangle } \over {\sqrt {\langle 0|{U^ + }{e^{ - \beta H}}U|0\rangle } }}, where U is the random Clifford operator, and the state |ψβ〉 is in normalized form. Note that a random n-qubit Clifford operator can be implemented using 𝒪(n2) one- and two-qubit gates [75], a significant gain over the exponential number of such gates required to generate a Haar-random unitary. Leveraging the 3-design property of the Clifford group, it can be shown [74,76] that the above state (Eq (39)) is a valid TPQ state satisfying the definition in Eq. (38).

Ref. [74] used algorithms based on quantum signal processing to implement the ITE operation to prepare the Clifford-random TPQ. Here, we implement ITE using our quasi-probabilistic approach based on applying Algorithm 2 to a Trotterized approximation of ITE. We note that preparing TPQ states is a highly suitable application of Algorithm 2 because the TPQ state error rapidly approaches zero as n becomes large [74]. In addition, for a k-local n-qubit Hamiltonian, all that is needed is the QPDs of the local terms, which are much more tractable to compute than the QPD of the entire n-qubit Hamiltonian. The method we outline here can be used to estimate Gibbs-state expectation values for any k-local Hamiltonian at a given temperature.

Numerical Simulations

We test the performance of our technique in this case by simulating the preparation of TPQ states using Algorithm 2 and comparing their performance with that of TPQ states prepared using ideal ITE. Specifically, we consider TPQ states of the form |ψi〉 = e−0.02HUi|0〉n, where H is the 1D Heisenberg Hamiltonian on n ∈ {4,…,8} qubits, and Ui is a randomly chosen Clifford unitary. We generate 10 such random states and average the error in the thermal expectation value, |〈ψi|Oj|ψi〉 – tr(|ψβ〉〈ψβ|Oj)|, over 30 randomly chosen Pauli observables, Oj. For each simulated TPQ state, we use N ∈ {1024,9400,51200,409600,1638400} QPD samples for n ∈ {4,5,6,7,8} qubits, respectively. The results are shown in Figure 1 and show good agreement between our simulation and states prepared using ideal ITE.

4.
Comparison with Prior Work

As highlighted in the Introduction, there has been extensive work on quantum algorithms for imaginary-time evolution and thermal state preparation. Here we summarize relevant prior work and highlight the salient features of our algorithm that distinguish it from those.

4.1.
Imaginary-Time Evolution

Algorithms based on techniques such as block-encoding and quantum signal processing [28,30,32,60,77] have rigorous performance guarantees but usually require complicated controlled-unitaries and a large number of ancilla qubits. The need for ancilla qubits in these algorithms arises because the non-unitary ITE operation has to be embedded into a unitary quantum circuit. The well-known QITE algorithm of Ref. [34] takes a different path to dealing with this nonunitarity. QITE approximates the ITE as a sequence of local non-unitary operations, exp(βH)Πj=1JTj\exp ( - \beta H) \approx \Pi _{j = 1}^J{T_j}, and then embeds each local operation Tj into a unitary Uj that acts on a slightly larger subsystem than the Tj it embeds. As a result, the QITE algorithm does not require any ancilla qubits, but determining the unitary Uj requires performing intermediate measurements on the state prepared at the (j – 1)th step. Like QITE, our algorithm also requires no ancilla qubits and proceeds by approximating exp(–βH) as a product of local operators. The key difference is that we implement the local operations Tj by decomposing them into linear combination of operations with the same locality. In addition, our algorithm is nonadaptive and can be implemented without intermediate measurements. Unlike the block-encoding algorithms mentioned above, we do not require complicated controlled-unitary operations. We note that a number of alternative ways of effecting imaginary-time evolution with quantum circuits have been proposed in the literature [38,39,78], which also share similar benefits of requiring few to no ancilla qubits. We leave a detailed comparison with these algorithms to future work.

4.2.
Thermal State Expectation Values

Our proposed method for evaluating thermal expectation values enjoys the same feature as our ITE algorithm of requiring no ancillary qubits. In contrast, quantum algorithms for Gibbs-state preparation based on block-encoding techniques require at least n ancilla qubits to prepare the Gibbs state of an n-qubit Hamiltonian [25,28,30,77]. Ref. [74] used the TPQ idea with block-encoding algorithms to reduce the number of ancilla qubits from n to a constant for the purpose of estimating thermal expectation values. Using our quasiprobabilistic algorithm for performing ITE instead of block-encoding eliminates the need for any ancillary qubits. We note that a number of recent works have investigated reducing quantum resource requirements for Gibbs-state preparation and partition function evaluation. These include the Gibbs sampling algorithm of Ref. [79], which also requires no ancilla qubits. Other algorithms have combined randomized ideas with block-encoding techniques to reduce ancilla count [37,80].

There has been a large amount of work in recent years on quantum algorithms that prepare Gibbs states by simulating Lindbladian evolution [8185]. These methods appear promising as they have been shown to prepare certain Gibbs states in polynomial time [8688]. However, with few exceptions such as Refs. [83,84], these generally require complex subroutines and are thus more suitable for fault-tolerant quantum computers.

Among feasible near-term implementations, there are a number of variational algorithms for Gibbs-state preparation [8992], which minimize a cost function over parametrized quantum circuits to prepare the Gibbs state. Even though the relatively simple circuits make them attractive for experimentation, the evaluation of cost functions often require a large number of measurements. Moreover, convergence can be hampered by phenomenon such as barren plateaus [93].

We also emphasize that our method only gives access to expectation values of observables in the Gibbs state, and not the Gibbs state itself. Most of the algorithms we mentioned above are perhaps more powerful in that they give access to the state itself. However, our method can prove adequate for most physics applications where the goal would be to extract observables of interest.

5.
Conclusions

In this work, we propose using quasiprobability decompositions to implement imaginary time evolution, a useful primitive in many quantum algorithms. This “algorithmic” use of quasiprobability decompositions inherits the error mitigation properties of related techniques such as probabilistic error cancellation. Our method also extends to implementing linear Hermiticity-preserving maps, of which imaginary time evolution is one example. We provide algorithms for estimating the expectation values of observables with respect to states evolved under one or more applications of a given map. Using our algorithms, we can estimate the expectation value of an observable for imaginary-time evolved states. This has many applications including preparing thermal pure quantum states, which can be used to estimate the Gibbs state expectation value of a set of observables. We simulate the performance of our method for this task, and find good agreement with the ideal imaginary time evolution operation. Our method is well-suited to near-term hardware, as it requires only 1- and 2-qubit gates and measurement, and the quasiprobability decomposition can be performed over a device’s native (noisy) gateset. As a proof-of-principle demonstration, we implement imaginary time evolution using a superconducting qubit processor (provided by IBM), and use this to estimate the ground state energy of the 2-qubit Heisenberg Hamiltonian.

A natural next step will be to extend our proof-of-principle demonstration on hardware to Hamiltonians on more than two qubits. Although this will entail increased sampling costs, when one considers a more specific use-case, namely, of thermal pure quantum states to estimate (thermal) expectation values, the error in the expectation value decays rapidly with the number of qubits, a promising sign for application of our method on near-term devices. There are two hardware-level optimizations that would make the methods presented in this work more efficient. First, in our demonstration using IBM hardware, if a mid-circuit measurement gives the incorrect result, i.e., post-selection is unsuccessful, then we still execute the remainder of the circuit because of the limitations of the measurement-based feedforward. If instead one has the ability to apply quantum circuits based on the results of mid-circuit measurements, then this waste of resources can easily be avoided. Second, sampling all of the required circuits in software and then loading each circuit individually into hardware is highly inefficient. If instead the sampling is implemented at the level of the field-programmable gate-array control system (as was recently demonstrated for randomized compiling [94]), then we expect that the overall runtime of the circuits will be significantly reduced.

DOI: https://doi.org/10.2478/qic-2026-0005 | Journal eISSN: 3106-0544 | Journal ISSN: 1533-7146
Language: English
Page range: 89 - 113
Submitted on: Jun 2, 2025
Accepted on: Dec 25, 2025
Published on: Jun 4, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Annie Ray, Esha Swaroop, Ningping Cao, Michael Vasmer, Anirban Chowdhury, published by Cerebration Science Publishing Co., Limited
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.