Quasiprobabilistic Imaginary-Time Evolution on Quantum Computers

Annie Ray; Esha Swaroop; Ningping Cao; Michael Vasmer; Anirban Chowdhury

doi:10.2478/qic-2026-0005

Introduction

Quantum computers are expected to have wide-ranging applications in various domains [1–4] but perhaps most significantly in the simulation of quantum physical systems. Important problems in this arena concern the estimation of ground-state and thermal equilibrium properties of quantum many-body systems, problems which have applications in chemistry and material science [2,5–7]. Consequently, there has been an enormous amount of research [8–11] directed towards the development of quantum algorithms for these problems. Although computational complexity results limit the possibility for dramatic quantum speed-ups for these problems [12–14], there is also theoretical evidence that quantum computers potentially provide an advantage over classical algorithms [15].

The idea of imaginary-time evolution plays a key role in many algorithms, both classical and quantum, for simulating ground-state and equilibrium properties [16–24]. Given a Hamiltonian H and an inverse temperature β ≥ 0, imaginary-time evolution (ITE) under H is described by the exponential operator e^−βH. Performing ITE with sufficiently high β will drive a quantum system to its ground state, making it a useful primitive in algorithms for ground state preparation. ITE is also used as a subroutine in quantum algorithms for preparing thermal equilibrium or Gibbs states [25], which find application in solving optimization problems. Performing ITE of quantum Hamiltonians using classical methods can be computationally prohibitive due to the exponential increase in Hilbert space dimension and the sign problem [26,27]. Many existing quantum algorithms such as those based on block-encoding [28–32] require fault-tolerant quantum computers and are beyond the reach of current hardware [33]. Several recent works have therefore proposed quantum algorithms that are potentially more suitable for near-term to intermediate quantum devices [34–40].

In this work, we add to the quantum toolkit for imaginary-time evolution and develop a new near-term friendly algorithm using ideas from quantum error mitigation (QEM), a collection of classical post-processing techniques that aim to mitigate the effects of noise without requiring the full machinery of quantum error correction and faulttolerance [41,42]. These techniques are usually not scalable, but have been shown to significantly improve the performance of current devices [42]. In particular, our work is inspired by the error-mitigation technique of probabilistic error cancellation (PEC) [41,43–45]. The original motivation of PEC as a technique is to invert a noise channel on noisy quantum devices [46]. Probabilistic error cancellation simulates the action of an ideal, unitary operation from a linear combination of native channels implementable on a quantum device aided by post-processing. The basic idea underlying PEC is as follows. Given a quantum device with a set of known native (and possibly noisy) gates, one decomposes a target unitary U into a weighted sum of the native gates. The coefficients in the linear combination will be real numbers that sum to 1, although they may or may not be positive and hence form a quasi-probability distribution (QPD). Nevertheless, the action of U can be effected by a protocol that probabilistically samples a native gate from the linear combination. With a sufficient number of samples and simple classical post-processing, the expectation value of an observable after the action of the target unitary U can be reconstructed. Recovering the ideal expectation value comes at an increased cost related to the negativity of the quasi-probability distribution. The success of the PEC method relies on an accurate characterization of the native gates in the quantum device. This can be a very challenging task in practice, although recent work has shown that tailoring the device noise to Pauli noise (via randomized compiling [47]) allows efficient and accurate noise characterization [48]. We note that related QPD ideas have also been used in other quantum information contexts, such as classical simulation of quantum circuits [49–53], circuit cutting [54–58], and for mitigating noise in non-Clifford gates [59].

Here, we design a quasi-probabilistic algorithm inspired by PEC to perform imaginary-time evolution for a given Hamiltonian H. The starting point of our algorithm is the observation that we can implement non-unitary and even non-trace-preserving operations as a linear combination of native gates and measurement channels. We use our algorithm to estimate the expectation value of a given observable for imaginary-time evolved states, and we analyze its cost, specifically for the case of Trotterized imaginary time evolution. In addition, we show that our algorithm is suitable for preparing thermal pure quantum (TPQ) states, pure states which can be used to estimate the thermal expectation value of observables; see Figure 1. We demonstrate the efficacy of our approach through classical simulations and demonstrations on quantum hardware.

Our method of simulating ITE in expectation using quasiprobability decompositions has a few features that make it amenable for implementation on near-term devices where qubits are few, and multi-qubit entangling gates are noisier than single-qubit gates. First, our method does not need auxiliary qubits beyond those of the system described by a given Hamiltonian. Second, we avoid the problem of implementing multi-qubit-controlled operations between auxiliary and data qubits [25,60], which can require large SWAP networks for implementation on devices that only permit entangling operations between nearest-neighbors (such as present-day superconducting devices).

The remainder of the paper is organized as follows. In Section 2, we review the quasiprobability decomposition of linear maps and discuss our extension to nonlinear maps, including our algorithm for estimating rescaled expectation values. Following this, in Section 3, we use our algorithm for the specific case of imaginary time evolution and combine the protocol with various choices of initial states, focusing on the thermal pure quantum state. In this section, we also present the results of our demonstrations conducted using IBM’s quantum computing hardware. Finally, we provide a summary and discuss future work in Section 5.

Implementing Linear Maps via Quasiprobability Decompositions

We consider the problem of implementing Hermiticity-preserving linear maps—which may not be trace-preserving in general—on a quantum device with a (basis) set of completely positive (CP) native operations. In Section 2.1, we present our algorithm for estimating rescaled expectation values of observables with respect to quantum states evolving under the linear map of interest. In Section 2.2, we analyze the sampling cost of our algorithm. Our method requires, as input, a decomposition of the linear map of interest into a linear combination of the set of available operations. In Section 2.4, we discuss the conditions under which a linear map can be decomposed as such, and we briefly review the method to obtain the linear decomposition. And in Section 2.3, we generalize our algorithm to the case of implementing a sequence of linear maps.

2.1.

Procedure for Estimating Rescaled Expectation Values of Observables

Suppose we have a linear map 𝒯 over the 2ⁿ-dimensional Hilbert space, and a set {B_i} of native operations on a quantum device such that for some real coefficients q_i, 1 $T (ρ) = \sum_{i} q_{i} B_{i} (ρ)$ {\cal T}(\rho ) = \sum\limits_i {{q_i}} {{\cal B}_i}(\rho ) for all input states ρ. Using Eq. (1), we can define parameters γ and p_i such that 2 $γ : = \sum_{i} | q_{i} |,$ \gamma : = \sum\limits_i {\left| {{q_i}} \right|} , 3 $p_{i} : = \frac{| q_{i} |}{γ} .$ {p_i}: = {{\left| {{q_i}} \right|} \over \gamma }.

We refer to such a linear combination as a quasiprobability decomposition (QPD). Note that since the q_i are real, γ ≥ 0 and the p_i form a probability distribution. Moreover, Eq. (1) can be rewritten as 4 $T (ρ) = γ \sum_{i} sgn (q_{i}) \cdot p_{i} \cdot B_{i} (ρ) .$ {\cal T}(\rho ) = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{{\cal B}_i}(\rho ).

This immediately suggests a way to probabilistically build the map 𝒯 by sampling from the set {B_i} according to the probability distribution p_i (see Figure 2).

In particular, we consider the task of estimating the expectation value of some observable A (having operator norm ∥A∥ ≤ 1) with respect to the state 𝒯(ρ). Since 𝒯 may be a non-trace-preserving map, the state 𝒯(ρ) is unnormalized in general. Thus, the rescaled expectation value of the observable A is 5 $〈 A 〉 T (ρ) : = \frac{tr [A T (ρ)]}{tr [T (ρ)]} .$ {\langle A\rangle _{{\cal T}(\rho )}}: = {{{\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )]} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}.

In order to correctly estimate the rescaled expectation value of A, we require estimates of the numerator and denominator of Eq. (5) which are obtained using Eq. (4) 6 $tr [A T (ρ)] = γ \sum_{i} sgn (q_{i}) \cdot p_{i} \cdot tr [A B_{i} (ρ)],$ {\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )] = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {A{{\cal B}_i}(\rho )} \right], 7 $tr [T (ρ)] = γ \sum_{i} sgn (q_{i}) \cdot p_{i} \cdot tr [B_{i} (ρ)] .$ {\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right].

Denoting the trace-preserving and non-trace-preserving operations in {B_i} as B_TP and B_NTP, respectively, the denominator in Eq. (5) can be rewritten as 8 $tr [T (ρ)] = γ (\sum_{B_{i} \in B_{TP}} sgn (q_{i}) \cdot p_{i} + \sum_{B_{i} \in B_{NTP}} sgn (q_{i}) \cdot p_{i} \cdot tr [B_{i} (ρ)]) .$ {\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] = \gamma \left( {\sum\limits_{{{\cal B}_i} \in {{\cal B}_{{\rm{TP}}}}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i} + \sum\limits_{{{\cal B}_i} \in {{\cal B}_{{\rm{NTP}}}}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right]} \right).

In this work, we consider B_NTP to comprise rank-1 projectors Π_|ψ〉 = |ψ〉〈ψ| on specific quantum states |ψ〉 (see Table A1 for example). Typically, such non-trace-preserving operations would be implemented via a POVM given by {|ψ〉〈ψ|, |ψ^⊥〉〈ψ^⊥|} and post-selecting on the outcome |ψ〉. Thus, tr [B_i(ρ)] is the probability of post-selecting the outcome corresponding to the projector for B_i.

If we randomly sample N circuits from the probability distribution p_i in Eq. (2), the random variable W_j := γ sgn(q_i)I_j corresponding to the j^th circuit is an estimator (see Eq. (9)) for tr [𝒯(ρ)], where j ∈ {1,2,…,N}. Here, the indicator variable I_j is 0 when the j^th circuit implements an operation B_i ∈ B_NTP whose associated POVM outcome is discarded; otherwise I_j = 1. 9 $E [W_{j}] = E [γ sgn (q_{i}) I_{j}] = γ \sum_{i, I_{j}} sgn (q_{i}) I_{j} \cdot \frac{| q_{i} |}{γ} \cdot \Pr [I_{j}]$ \left[ {{W_j}} \right] = \left[ {\gamma {\mathop{\rm sgn}} \left( {{q_i}} \right){I_j}} \right] = \gamma \sum\limits_{i,{I_j}} {{\mathop{\rm sgn}} } \left( {{q_i}} \right){I_j}\cdot{{\left| {{q_i}} \right|} \over \gamma }\cdot\Pr \left[ {{I_j}} \right] 10 $= γ \sum_{i} sgn (q_{i}) \cdot p_{i} \cdot tr [B_{i} (ρ)]$ = \gamma \sum\limits_i {{\mathop{\rm sgn}} } \left( {{q_i}} \right)\cdot{p_i}\cdot{\mathop{\rm tr}\nolimits} \left[ {{{\cal B}_i}(\rho )} \right] 11 $= tr [T (ρ)]$ = {\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]

Similarly, we show in Appendix D that the random variable M_j = γ sgn(q_i)A_j, where A_j is the outcome of measuring the observable A for the j^th circuit, is an estimator of tr [A𝒯(ρ)]. We remark that the measurement outcome A_j is necessarily zero when the indicator variable I_j is zero, i.e., when the j^th sample is discarded. 12 $E [M_{j}] = tr [A T (ρ)]$ \left[ {{M_j}} \right] = {\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )]

Algorithm 1 Estimate re-scaled expectations

Input:

QPD of 𝒯 as 𝒯(·) = γ∑_i sgn(q_i)p_iB_i(·)

Initial state ρ

Number of random samples N

Observable A

Output: Estimate of the re-scaled expectation value 〈A〉_𝒯

1: for j = 1 to N do

2: Sample i from the distribution p_i

3: if B_i is TP then

4: apply B_i directly to ρ

5: I_j ← 1 ⊳ Indicator variable

6: else

7: Measure in the basis corresponding to B_i

8: Store the outcome as I_j ∈ {0, 1}

9: end if

10: W_j ← γ · sgn(q_i) · I_j ⊳ Weighted outcome

11: Measure A

12: Store the outcome as A_j

13: M_j ← γ · sgn(q_i) · A_j

14: end for

15: $tr [\tilde{T (ρ)}] \leftarrow \sum_{j} W_{j} / N$ {\mathop{\rm tr}\nolimits} [\widetilde {{\cal T}(\rho )}] \leftarrow \sum\nolimits_j {{W_j}} /N ⊳ Empirical estimate of tr [𝒯(ρ)]

16: $tr [\tilde{A T} (ρ)] \leftarrow \sum_{j} M_{j} / N$ {\mathop{\rm tr}\nolimits} [\widetilde {A{\cal T}}(\rho )] \leftarrow \sum\nolimits_j {{M_j}} /N ⊳ Empirical estimate of tr [A𝒯(ρ)]

17: return $tr [\tilde{A T} (ρ)] / tr [\tilde{T (ρ)}]$ {\mathop{\rm tr}\nolimits} [\widetilde {A{\cal T}}(\rho )]/{\mathop{\rm tr}\nolimits} [\widetilde {{\cal T}(\rho )}]

Together, the two estimators W_j, M_j allow us to estimate the rescaled expectation value in Eq. (5). We formalize the procedure described above in Algorithm 1.

2.2.

Sampling Cost of the Algorithm

The crux of Algorithm 1 lies in the estimation of tr [ A𝒯(ρ)] and tr [𝒯(ρ)] separately, in order to estimate 〈A〉_𝒯(ρ) via their ratio as per Eq. (5). We use standard concentration inequalities to bound the number of samples N needed to estimate tr [A𝒯(ρ)] and tr [𝒯(ρ)] each within some precision ϵ, which allows us to bound the error in the estimate of 〈A〉_𝒯(ρ) as per Lemma 1.

Lemma 1 (Algorithm 1 cost).

Given a decomposition of a linear map 𝒯 into a set of operations as in Eq. (1) and a Hermitian observable A with ∥A∥ ≥ 1, Algorithm 1 uses $N = O (\frac{γ^{2}}{ϵ^{2}})$ N = {\cal O}\left( {{{{\gamma ^2}} \over {{^2}}}} \right) random circuits drawn from the distribution p_i in Eq. (2), to return an estimate of the re-scaled expectation value 〈A〉_𝒯 within error 13 $\frac{2 ϵ}{| tr [T (ρ)] |} (1 + \frac{ϵ}{| tr [T (ρ)] |}) .$ {{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}} \right).

Proof sketch of Lemma 1.

We note that for an observable A with operator norm ∥A∥ ≤ 1, the random variables W_j, M_j are such that 14 $- γ \leq W_{j}, M_{j} \leq γ .$ - \gamma \le {W_j},{M_j} \le \gamma .

For N circuits (enumerated by the index j) sampled independently from p_i, we can define the following: 15 $\bar{W} = \frac{\sum_{j} W_{j}}{N},$ \bar W = {{\sum\nolimits_j {{W_j}} } \over N}, 16 $\bar{M} = \frac{\sum_{j} M_{j}}{N}$ \bar M = {{\sum\nolimits_j {{M_j}} } \over N}

For some ϵ ≥ 0, Hoeffding’s inequality implies 17 $\Pr [| \bar{W} - E [W_{j}] | \geq ϵ] \leq \exp (- \frac{N ϵ^{2}}{2 γ^{2}})$ \Pr \left[ {\left| {\bar W - \left[ {{W_j}} \right]} \right| \ge } \right] \le \exp \left( { - {{N{^2}} \over {2{\gamma ^2}}}} \right) 18 $\Pr [| \bar{M} - E [M_{j}] | \geq ϵ] \leq \exp (- \frac{N ϵ^{2}}{2 γ^{2}}) .$ \Pr \left[ {\left| {\bar M - \left[ {{M_j}} \right]} \right| \ge } \right] \le \exp \left( { - {{N{^2}} \over {2{\gamma ^2}}}} \right).

We can bound the RHS of Eq. (17) and (18) by 0 ≤ δ < 1 by choosing $N = \frac{2 γ^{2}}{ϵ^{2}}$ N = {{2{\gamma ^2}} \over {{^2}}} log(1/δ). Then with probability at least 1 – δ, we have 19 $| \frac{\bar{M}}{\bar{W}} - \frac{E [M_{j}]}{E [W_{j}]} | \leq | \frac{E [W_{j}] (\bar{M} - E [M_{j}])}{\bar{W} E [W_{j}]} | + | \frac{E [M_{j}] (E [W_{j}] - \bar{W})}{\bar{W} E [W_{j}]} |$ \left| {{{\bar M} \over {\bar W}} - {{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}}} \right| \le \left| {{{\left[ {{W_j}} \right]\left( {\bar M - \left[ {{M_j}} \right]} \right)} \over {\bar W\left[ {{W_j}} \right]}}} \right| + \left| {{{\left[ {{M_j}} \right]\left( {\left[ {{W_j}} \right] - \bar W} \right)} \over {\bar W\left[ {{W_j}} \right]}}} \right| 20 $\leq \frac{ϵ}{| \bar{W} |} (1 + | \frac{E [M_{j}]}{E [W_{j}]} |)$ \le { \over {|\bar W|}}\left( {1 + \left| {{{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}}} \right|} \right) 21 $\leq \frac{2 ϵ}{| \bar{W} |}$ \le {{2} \over {|\bar W|}}

In the above, we have used Eqs (9) and (12), and that $\frac{E [M_{j}]}{E [W_{j}]} \leq 1$ {{\left[ {{M_j}} \right]} \over {\left[ {{W_j}} \right]}} \le 1. Additionally, we have 22 $\frac{1}{| \bar{W} |} \leq \frac{1}{| | E [W_{j}] | - ϵ |} \leq \frac{1}{| E [W_{j}] |} (1 + \frac{ϵ}{| E [W_{j}] |}) .$ {1 \over {|\bar W|}} \le {1 \over {\left| {\left| {\left[ {{W_j}} \right]} \right| - } \right|}} \le {1 \over {\left| {\left[ {{W_j}} \right]} \right|}}\left( {1 + { \over {\left| {\left[ {{W_j}} \right]} \right|}}} \right).

Using Eqs (21) and (22), and recalling Eq. (5), we see that for the chosen N, with probability at least (1 – δ), we have 23 $| \frac{\bar{M}}{\bar{W}} - {〈 A 〉}_{T} | \leq \frac{2 ϵ}{| tr [T (ρ)] |} (1 + \frac{ϵ}{| tr [T (ρ)] |}) .$ \left| {{{\bar M} \over {\bar W}} - {{\langle A\rangle }_{\cal T}}} \right| \le {{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]|}}} \right).

Algorithm 1 requires a decomposition of the form Eq. (1) as input. Given a basis, such a decomposition can be obtained classically by a linear program [41,61]. The cost of finding the QPD scales polynomially with the Hilbert-space dimension and is thus tractable only for operations on k-qubit subsystems with k ≪ n. Implementing an n-qubit operation therefore requires us to first approximate it as a sequence of k-qubit operations, and then use the QPD method for each operation in the sequence. This protocol and an analysis of its cost are the focus of the next section. We defer a detailed discussion of how to find a suitable quasi-probability decomposition to Section 2.4. There we will also discuss some possible choices of QPD basis, and their effect on the cost of our algorithms through the factor γ.

2.3.

Implementing a Sequence of Operations

We now explain how to implement a sequence of linear maps 𝒯₁, 𝒯₂, …, 𝒯_R where each map 𝒯_i acts non-trivially on a constant k number of qubits (see Figure 3). Later, we will seek to implement an imaginary-time evolution by approximating it through a Trotter-Suzuki formula that gives rise to a sequence of k-local maps. We restrict ourselves to the case where each 𝒯_i has the form 𝒯⊗I for some k-qubit map 𝒯. That is to say, 𝒯_i acts as 𝒯 on a k-qubit subsystem and as the identity on all other qubits; note that each 𝒯_i may act non-trivially on a different set of k qubits. We emphasize that our algorithm and its analysis do not require the 𝒯_i’s to have this form. We make this assumption because it helps simplify notation and is sufficient for implementing imaginary-time evolution of k-local Hamiltonians such as the Heisenberg model. We present the protocol for applying a sequence of 𝒯 maps in Algorithm 2, and in Lemma 2, we analyze its cost.

Algorithm 2 Estimate re-scaled expectations for repeated applications of 𝒯

Input:

Operator ℰ consisting of R applications of 𝒯

QPD of 𝒯 as 𝒯(·) = γ∑_i sgn( q_i)p_iB_i(·)

Initial state ρ

Number of random samples N

Observable A

Output: Estimate of the re-scaled expectation value 〈A〉_ϵ

1: for j = 1 to N do

2: Sample (i₁,…,i_R) from the joint (independent) distribution $p_{i}^{R}$ p_i^R

3: for k = 1 to R do

4: if B_{i_k} is TP then

5: Apply B_{i_k}

6: I_{j_k} ← 1 ⊳ Indicator variable

7: else

8: Measure in the basis corresponding to B_{i_k}

9: Store the outcome as I_{j_k} ∈ {0,1}

10: end if

11: W_{j_k} ← γ · sgn(q_{i_k}) · I_{j_k} ⊳ Weighted outcome

12: end for

13: Measure A

14: Store the outcome as A_j

15: M_j → γ^R · sgn(q_j₁)…sgn(q_jR) · A_j

16: end for

17: $tr \tilde{[ε (ρ)]} \leftarrow \sum_{j} (Π_{k} W_{j_{k}}) / N$ {\mathop{\rm tr}\nolimits} \widetilde {[{\cal E}(\rho )]} \leftarrow \sum\nolimits_j {\left( {{\Pi _k}{W_{{j_k}}}} \right)} /N ⊳ Empirical estimate of tr [ε(ρ)]

18: $tr \tilde{[A ε (ρ)]} \leftarrow \sum_{j} M_{j} / N$ {\mathop{\rm tr}\nolimits} \widetilde {[A{\cal E}(\rho )]} \leftarrow \sum\nolimits_j {{M_j}} /N ⊳ Empirical estimate of tr [Aε(ρ)]

19: return $tr \tilde{[A ε (ρ)]} / tr \tilde{[ε (ρ)]}$ {\mathop{\rm tr}\nolimits} \widetilde {[A{\cal E}(\rho )]}/{\mathop{\rm tr}\nolimits} \widetilde {[{\cal E}(\rho )]}

Lemma 2 (Algorithm 2 cost).

Suppose we are given the QPD of a linear map 𝒯, acting non-trivially on a constant number of qubits. Define the n-qubit map ε as the composition of R applications of 𝒯, where each 𝒯 acts non-trivially on at most k-qubits (see Figure 3). Then, for a Hermitian observable A with ∥A∥ ≥ 1, Algorithm 2 uses $N = O (\frac{γ^{2 R}}{ϵ^{2}})$ N = {\cal O}\left( {{{{\gamma ^{2R}}} \over {{^2}}}} \right) random circuits drawn from the distribution p_i in Eq. (2), to return an estimate of the rescaled expectation value 〈A〉_ε within error 24 $\frac{2 ϵ}{| tr [ε (ρ)] |} (1 + \frac{ϵ}{| tr [ε (ρ)] |}) .$ {{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal E}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal E}(\rho )]|}}} \right).

The proof of Lemma 2 is similar to that of Lemma 1 and can be found in Appendix D.

2.4.

Quasiprobability decomposition of linear maps

We now consider the task of obtaining the linear decomposition in Eq. (1), which is required as an input to Algorithms 1 and 2. Given a k-qubit map 𝒯 and a complete basis (consisting of 16^k linearly independent operators), the coefficients q_i in Eq. (1) can be directly solved for via a system of 16^k independent equations. The tensor product of the Endo-Benjamin-Li (EBL) basis [43] is an example of a complete basis for all linear maps. It is composed of single-qubit Clifford operations (TP) and projectors (non-TP); see Table A1. It may be possible for a basis with fewer elements than 16^k to be used to obtain a QPD for a given map 𝒯. We note that a basis containing only trace-preserving (TP) elements can only be used to express maps 𝒯 that are scalar multiples of TP operations. However, operations such as the ITE operator are not simply scalar multiples of TP operations,¹ and therefore require the QPD basis to have at least one non-TP operation.

Different choices of basis will affect the values of the coefficients q_i and hence the value of γ. From the analysis of our algorithm in Section 2.2, we know that the number of samples required for estimating expectation values scales as 𝒪(γ²). Consequently, the sampling cost of our method can be reduced by minimizing the value of γ in the QPD of the map of interest. In the context of PEC, the QPD of a k-qubit map 𝒯 over a basis {B_i} can be obtained by solving a linear program [41,61] that minimizes γ. It is known that a lower bound of γ for a HPTP map 𝒯 is given by its diamond norm ∥𝒯∥_⋄ [62,63].²

As an example of the impact of the choice of basis, we consider the ITE operator e^−βH of the 2-qubit Heisenberg Hamiltonian H = –XX – γγ – ZZ + II.³ In Figure 4, we show the trends of γ as a function of β for two different choices of basis. We find that the ITE operator for the 2-qubit Heisenberg Hamiltonian can be decomposed into the Takagi basis (Table A2) with a lower value of γ than the EBL basis, where the Takagi basis value is closer to the lower bound given by the diamond norm (see Figure 4). This reduced value of γ can be attributed to the presence of entangling operations in the Takagi basis. Since the EBL basis for 2-qubits is simply the (uncorrelated) Cartesian product of two single-qubit basis sets, it has a high sampling overhead for simulating entangling operations in expectation.

This is apparent in the case of the CNOT operation [43,64], which has γ_CNOT = 9 for the 2-qubit EBL basis but γ_CNOT = 1 when decomposed into the Takagi basis. We remark that although the Takagi basis permits a decomposition of the ITE operator in this case, it is not a complete basis for general HP maps on a 2-qubit system. The Takagi basis is only known to be complete for the set of all 2-qubit CPTP maps [64]. A more detailed analysis of QPD and choices of basis can be found in Appendix A.

The problem of obtaining a QPD for a given map requires careful consideration of two important factors: the choice of basis set used for the decomposition and the value of γ obtained from the QPD. In general, the QPD corresponding to the optimal value of γ could be a linear combination of entangling operations or general quantum instruments that need to dilate to a bigger Hilbert space, which may be expensive (in gate depth) to implement directly on a quantum device. This tradeoff between the generality of the basis and the optimality of the decomposition [43, 64,65] has been well studied for the specific case of QPDs for CPTP maps, with several results on optimizing [45,61] the choice of basis for different use-cases. However, the solution to this problem for general HP maps, including NTP maps such as ITE which commonly occur in physics, is yet (to the best of our knowledge) unknown. We provide some discussions in Appendix A. Additionally, the problem of finding a sample-efficient yet hardware-friendly basis for decomposing maps acting on systems larger than 2-qubits quickly becomes intractable (since the size of the associated classical optimization problem grows polynomially with the dimension of the Hilbert space) and remains a challenge even for the case of decomposing CPTP maps.

QPD Implementation of Imaginary-Time Evolution

We now use the algorithms developed in Section 2 to implement imaginary time evolution e^−βH for a local Hamiltonian H 25 $H = \sum_{l = 1}^{L} H_{l},$ H = \sum\limits_{l = 1}^L {{H_l}} , presented as a sum of L = O(poly(n)) local terms H_l acting non-trivially on at most a constant k number of qubits. We may assume without loss of generality that H ≥ 0 and ∥H∥ ≤ 1, where ∥·∥ denotes the operator norm. Our goal is to implement the operator e^−βH for real β > 0. More precisely, for a given initial state |ψ〉, we want to estimate expectation values of a given observable A with respect to the quantum state |ψ_β〉 ∝ e^−βH|ψ〉. Some basics of ITE and the optimal decomposition of ITE are presented in Appendix B and Appendix C, respectively.

3.1.

Warm-Up Example: 2-Qubit Heisenberg Model

We first consider the simple case of implementing the ITE of a 2-qubit Heisenberg model 26 $H = - X X - Y Y - Z Z$ H = - XX - YY - ZZ

Since the basis {B_i} in Table A1 is known to be complete for all linear maps on single-qubits, we use the 256-element Cartesian product set {B_i} × {B_j} as the basis for decomposing the 2-qubit ITE operator e^−βH into a QPD. In Figure 5, we simulate Algorithm 1 for estimating the energy of the 2-qubit Heisenberg model when it is initialized in the state |00〉 and undergoes ITE for increasing values of β = 0.01t, where t is the Trotter step. We note that the simulation is noiseless and there is no Trotter error because H commutes with itself. Therefore, the only error in the simulation is due to the QPD sampling.

Additionally, we implement the 2-qubit ITE operator e^−βH using Algorithm 1 on a 5-qubit superconducting processor (ibm_manila). In this case, we use the noisy native gates on the device as the basis for QPD using noise characterization information. In particular, we take the noisy basis ${B_{i}^{'}}$ \left\{ {{\cal B}_i^\prime } \right\} to be {𝒩 ⚬ B_i} where 𝒩 is the noise channel used to model the noise in the device, based on calibration data provided by IBM [66].

In the simulation and demonstration on hardware, we used 400, 800, 3200, and 25600 QPD samples for Trotter steps 1, 2, 3, and 4, respectively. For each sample, 512 shots of the corresponding circuit were performed (or simulated). For each Trotter step, this process was repeated 10 times; the data in Figure 5 show the mean estimate and the standard deviation. It is instructive to compare the number of samples we used with the cost estimate from 1 of 26000, 50000, 79000, and 114000 circuits for Trotter steps 1, 2, 3, and 4, respectively (for a target error of 1%). These numbers suggest that our analytical bound is not tight.

The agreement between the demonstration on noisy hardware and the noiseless (classical) simulation indicates that our method can be made noise-resilient when information about the noise in a device is available. We note that the demonstration data for the third Trotter step is significantly worse than for the other Trotter steps. A possible explanation for this is that the device noise drifts over time, and so the noisy basis ${B_{i}^{'}}$ \left\{ {{\cal B}_i^\prime } \right\} may become inaccurate, leading to bias in the results. This is a particular issue for demonstrations such as ours, where we ran many different circuits, each with a small number of shots, which is much slower than running a small number of circuits with many shots each. The results could also possibly be improved by using more QPD samples (compare the size of the (simulated data) error bars of the third and fourth Trotter steps in 5).

3.2.

QPD Implementation of Trotterized Imaginary Time Evolution

We now define 𝒯(·) to be the imaginary-time evolution for the n-qubit Hamiltonian in Eq. (28) 27 $T (\cdot) = e^{- β H} (\cdot) e^{- β H} .$ {\cal T}(\cdot) = {e^{ - \beta H}}(\cdot){e^{ - \beta H}}.

We cannot apply Algorithm 1 directly to 𝒯(·) since the task of obtaining the quasiprobability decomposition over an n-qubit basis set (|{B_i}| = 𝒪(16ⁿ)) by solving a linear program can quickly become intractable for large n. Accordingly, we first decompose the n-qubit operation e^−βH to a sequence of k-qubit operations by using the 1^st-order Trotter approximation: 28 $e^{- β H} \approx {(e^{- β H_{1} / r} e^{- β H_{2} / r} \dots e^{- β H_{L} / r})}^{r} .$ {e^{ - \beta H}} \approx {\left( {{e^{ - \beta {H_1}/r}}{e^{ - \beta {H_2}/r}} \ldots {e^{ - \beta {H_L}/r}}} \right)^r}.

Since each H_l is k-local, we obtain the quasiprobability decomposition of each local ITE operation e^−β/rH_l separately and implement a Trotterized ITE using Algorithm 2. Let 𝒯_l(·) be the ITE operation for the k-local Hamiltonian H_l 29 $T_{l} (\cdot) = e^{- β H_{l} / r} (\cdot) e^{- β H_{l} / r}$ {{\cal T}_l}(\cdot) = {e^{ - \beta {H_l}/r}}(\cdot){e^{ - \beta {H_l}/r}}

We require the QPDs of each of the linear maps 𝒯₁, …, 𝒯_L into a set of operations {B_i} as in Eq. (1). Suppose that we have computed these decompositions with the associated parameters γ₁, …, γ_L respectively. Then, we define the n-qubit map 𝒫 as the composition of the maps (𝒯₁, …, 𝒯_L) corresponding to the 1^st-order Trotter approximation in Eq. (28). For a Hermitian observable A with ∥A∥ ≤ 1, it follows from Lemma 2 that implementing 𝒫 using Algorithm 2 returns an estimate of the re-scaled expectation value 〈A〉_𝒫 within additive error ϵ · (tr{[𝒫(ρ)]})⁻¹ using $O (γ_{1}^{2 r} \dots γ_{L}^{2 r} ϵ^{- 2})$ {\cal O}\left( {\gamma _1^{2r} \ldots \gamma _L^{2r}{^{ - 2}}} \right) circuits.

In our implementation, we only consider a translation-invariant Hamiltonian, i.e., each local interaction term in H is the same. As a result, each of the exponentials e^−βH_l/r has the same decomposition in a given basis with an associated parameter γ. Now, the Trotter approximation causes the expectation value 〈A〉_𝒫 to differ from the target imaginary-time expectation 〈A〉_𝒯, but this difference can be bounded using known results [67]. We show in Appendix E that for any observable A, 30 $| tr [A T (ρ)] - tr [A (ρ)] | = O (\frac{β^{2} L^{2}}{r} e^{β L / r})$ |{\mathop{\rm tr}\nolimits} [A{\cal T}(\rho )] - {\mathop{\rm tr}\nolimits} [A{\cal P}(\rho )]| = {\cal O}\left( {{{{\beta ^2}{L^2}} \over r}{e^{\beta L/r}}} \right) 31 $| tr [T (ρ)] - tr [(ρ)] | = O (\frac{β^{2} L^{2}}{r} e^{β L / r})$ |{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )] - {\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]| = {\cal O}\left( {{{{\beta ^2}{L^2}} \over r}{e^{\beta L/r}}} \right) again assuming ∥A∥ ≤ 1. Note that we can choose 32 $r = O (\frac{β^{2} L^{2}}{ϵ})$ r = {\cal O}\left( {{{{\beta ^2}{L^2}} \over }} \right) to make the error on the RHS at most O(ϵ). We know from Lemma 2 that Algorithm 2 can estimate the expectation value $〈 A 〉 = \frac{tr (A (ρ))}{tr ((ρ))}$ {\langle A\rangle _{\cal P}} = {{{\mathop{\rm tr}\nolimits} (A{\cal P}(\rho ))} \over {{\mathop{\rm tr}\nolimits} ({\cal P}(\rho ))}} to within error 33 $\frac{2 ϵ}{| tr [(ρ)] |} (1 + \frac{ϵ}{| tr [(ρ)] |})$ {{2} \over {|{\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]|}}\left( {1 + { \over {|{\mathop{\rm tr}\nolimits} [{\cal P}(\rho )]|}}} \right) using 34 $O (\frac{γ^{2 L r}}{ϵ^{2}})$ {\cal O}\left( {{{{\gamma ^{2Lr}}} \over {{^2}}}} \right) circuits where each circuit consists of at most 𝒪(Lr) gates. Now, the algorithm is suitable only when the normalization tr (𝒫(ρ)) is much larger than the target precision ϵ. In that case, for the choice of r in Eq. (32), it follows that 35 $| {〈 A 〉}_{T} - 〈 A 〉 | \leq O (ϵ),$ \left| {{{\langle A\rangle }_{\cal T}} - {{\langle A\rangle }_{\cal P}}} \right| \le {\cal O}(), and moreover, the error in estimating 〈A〉_𝒫 via Eq. (33) can be upper-bounded by 36 $\frac{4 ϵ}{tr [T (ρ)]} (1 + \frac{4 ϵ}{tr [T (ρ)]}) .$ {{4} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}\left( {1 + {{4} \over {{\mathop{\rm tr}\nolimits} [{\cal T}(\rho )]}}} \right).

Even though the cost as stated scales as γ^2Lr, we note that the factor γ depends on the local exponential e^−βH_i/r and is therefore likely to decrease as we increase the Trotter number r. Moreover, our error bounds above can likely be further tightened, in particular because we appeal to standard additive-error bounds for Trotter formula, which may not be well-suited for non-trace-preserving imaginary-time evolution. It may also be possible to improve the bounds by using multiplicative-error estimates for Trotter-Suzuki formula [67,68]. We leave a detailed analysis of the complexity of our algorithm to future work.

3.2.1.

Estimating Thermal Expectation Values

We now show how our QPD implementation of imaginary-time evolution can be used to compute expectation values of observables in the Gibbs state. Recall that the Gibbs state is defined to be 37 $ρ_{β} = \frac{e^{- β H}}{Z_{β}},$ {\rho _\beta } = {{{e^{ - \beta H}}} \over {{{\cal Z}_\beta }}}, where H is the Hamiltonian of the quantum system, β is the inverse temperature, and $Z_{β} = tr (e^{- β H})$ {{\cal Z}_\beta } = {\mathop{\rm tr}\nolimits} \left( {{e^{ - \beta H}}} \right) is the partition function. Preparing the Gibbs state and estimating thermal expectation values for general local Hamiltonians are known to be difficult tasks, and therefore unlikely to admit efficient algorithms in general [14].

Instead of directly preparing the (mixed) Gibbs state, we seek to prepare a thermal pure quantum state (TPQ) [69]. The latter is a suitably constructed n-qubit pure state |ψ〉 which, with high probability, can be used to estimate the Gibbs state expectation values for a set of observables, up to a tunable precision ϵ. Formally, the TPQ state for a given set of observables ${\hat{O}}$ \{ \hat O\} , is defined as a pure state |ψ〉 that satisfies 38 $\Pr [〈 ψ | \hat{O} | ψ 〉 - tr (ρ_{β} \hat{O}) \geq ϵ] \leq f (N, ϵ),$ \Pr \left[ {\langle \psi |\hat O|\psi \rangle - {\mathop{\rm tr}\nolimits} \left( {{\rho _\beta }\hat O} \right) \ge } \right] \le f(N,), where lim_N→∞ f(N, ϵ) = 0. [70,71] It is known that applying imaginary-time evolution to a state chosen uniformly at random from the Haar measure gives a TPQ state [71–73]. Since Haar-random states cannot be efficiently prepared on a quantum computer, we follow the approach of Ref. [74] where the initial Haar-random state is replaced by a random 3-design. Specifically, the authors in Ref. [74] apply a random Clifford operator to a computational basis state followed by imaginary-time evolution, preparing the state, 39 $| ψ_{β} 〉 = \frac{e^{- β H / 2} U | 0 〉}{\sqrt{〈 0 | U^{+} e^{- β H} U | 0 〉}},$ \left| {{\psi _\beta }} \right\rangle = {{{e^{ - \beta H/2}}U|0\rangle } \over {\sqrt {\langle 0|{U^ + }{e^{ - \beta H}}U|0\rangle } }}, where U is the random Clifford operator, and the state |ψ_β〉 is in normalized form. Note that a random n-qubit Clifford operator can be implemented using 𝒪(n²) one- and two-qubit gates [75], a significant gain over the exponential number of such gates required to generate a Haar-random unitary. Leveraging the 3-design property of the Clifford group, it can be shown [74,76] that the above state (Eq (39)) is a valid TPQ state satisfying the definition in Eq. (38).

Ref. [74] used algorithms based on quantum signal processing to implement the ITE operation to prepare the Clifford-random TPQ. Here, we implement ITE using our quasi-probabilistic approach based on applying Algorithm 2 to a Trotterized approximation of ITE. We note that preparing TPQ states is a highly suitable application of Algorithm 2 because the TPQ state error rapidly approaches zero as n becomes large [74]. In addition, for a k-local n-qubit Hamiltonian, all that is needed is the QPDs of the local terms, which are much more tractable to compute than the QPD of the entire n-qubit Hamiltonian. The method we outline here can be used to estimate Gibbs-state expectation values for any k-local Hamiltonian at a given temperature.

Numerical Simulations

We test the performance of our technique in this case by simulating the preparation of TPQ states using Algorithm 2 and comparing their performance with that of TPQ states prepared using ideal ITE. Specifically, we consider TPQ states of the form |ψ_i〉 = e^−0.02HU_i|0〉^⊗n, where H is the 1D Heisenberg Hamiltonian on n ∈ {4,…,8} qubits, and U_i is a randomly chosen Clifford unitary. We generate 10 such random states and average the error in the thermal expectation value, |〈ψ_i|O_j|ψ_i〉 – tr(|ψ_β〉〈ψ_β|O_j)|, over 30 randomly chosen Pauli observables, O_j. For each simulated TPQ state, we use N ∈ {1024,9400,51200,409600,1638400} QPD samples for n ∈ {4,5,6,7,8} qubits, respectively. The results are shown in Figure 1 and show good agreement between our simulation and states prepared using ideal ITE.

Comparison with Prior Work

As highlighted in the Introduction, there has been extensive work on quantum algorithms for imaginary-time evolution and thermal state preparation. Here we summarize relevant prior work and highlight the salient features of our algorithm that distinguish it from those.

4.1.

Imaginary-Time Evolution

Algorithms based on techniques such as block-encoding and quantum signal processing [28,30,32,60,77] have rigorous performance guarantees but usually require complicated controlled-unitaries and a large number of ancilla qubits. The need for ancilla qubits in these algorithms arises because the non-unitary ITE operation has to be embedded into a unitary quantum circuit. The well-known QITE algorithm of Ref. [34] takes a different path to dealing with this nonunitarity. QITE approximates the ITE as a sequence of local non-unitary operations, $\exp (- β H) \approx Π_{j = 1}^{J} T_{j}$ \exp ( - \beta H) \approx \Pi _{j = 1}^J{T_j}, and then embeds each local operation T_j into a unitary U_j that acts on a slightly larger subsystem than the T_j it embeds. As a result, the QITE algorithm does not require any ancilla qubits, but determining the unitary U_j requires performing intermediate measurements on the state prepared at the (j – 1)th step. Like QITE, our algorithm also requires no ancilla qubits and proceeds by approximating exp(–βH) as a product of local operators. The key difference is that we implement the local operations Tj by decomposing them into linear combination of operations with the same locality. In addition, our algorithm is nonadaptive and can be implemented without intermediate measurements. Unlike the block-encoding algorithms mentioned above, we do not require complicated controlled-unitary operations. We note that a number of alternative ways of effecting imaginary-time evolution with quantum circuits have been proposed in the literature [38,39,78], which also share similar benefits of requiring few to no ancilla qubits. We leave a detailed comparison with these algorithms to future work.

4.2.

Thermal State Expectation Values

Our proposed method for evaluating thermal expectation values enjoys the same feature as our ITE algorithm of requiring no ancillary qubits. In contrast, quantum algorithms for Gibbs-state preparation based on block-encoding techniques require at least n ancilla qubits to prepare the Gibbs state of an n-qubit Hamiltonian [25,28,30,77]. Ref. [74] used the TPQ idea with block-encoding algorithms to reduce the number of ancilla qubits from n to a constant for the purpose of estimating thermal expectation values. Using our quasiprobabilistic algorithm for performing ITE instead of block-encoding eliminates the need for any ancillary qubits. We note that a number of recent works have investigated reducing quantum resource requirements for Gibbs-state preparation and partition function evaluation. These include the Gibbs sampling algorithm of Ref. [79], which also requires no ancilla qubits. Other algorithms have combined randomized ideas with block-encoding techniques to reduce ancilla count [37,80].

There has been a large amount of work in recent years on quantum algorithms that prepare Gibbs states by simulating Lindbladian evolution [81–85]. These methods appear promising as they have been shown to prepare certain Gibbs states in polynomial time [86–88]. However, with few exceptions such as Refs. [83,84], these generally require complex subroutines and are thus more suitable for fault-tolerant quantum computers.

Among feasible near-term implementations, there are a number of variational algorithms for Gibbs-state preparation [89–92], which minimize a cost function over parametrized quantum circuits to prepare the Gibbs state. Even though the relatively simple circuits make them attractive for experimentation, the evaluation of cost functions often require a large number of measurements. Moreover, convergence can be hampered by phenomenon such as barren plateaus [93].

We also emphasize that our method only gives access to expectation values of observables in the Gibbs state, and not the Gibbs state itself. Most of the algorithms we mentioned above are perhaps more powerful in that they give access to the state itself. However, our method can prove adequate for most physics applications where the goal would be to extract observables of interest.

Conclusions

In this work, we propose using quasiprobability decompositions to implement imaginary time evolution, a useful primitive in many quantum algorithms. This “algorithmic” use of quasiprobability decompositions inherits the error mitigation properties of related techniques such as probabilistic error cancellation. Our method also extends to implementing linear Hermiticity-preserving maps, of which imaginary time evolution is one example. We provide algorithms for estimating the expectation values of observables with respect to states evolved under one or more applications of a given map. Using our algorithms, we can estimate the expectation value of an observable for imaginary-time evolved states. This has many applications including preparing thermal pure quantum states, which can be used to estimate the Gibbs state expectation value of a set of observables. We simulate the performance of our method for this task, and find good agreement with the ideal imaginary time evolution operation. Our method is well-suited to near-term hardware, as it requires only 1- and 2-qubit gates and measurement, and the quasiprobability decomposition can be performed over a device’s native (noisy) gateset. As a proof-of-principle demonstration, we implement imaginary time evolution using a superconducting qubit processor (provided by IBM), and use this to estimate the ground state energy of the 2-qubit Heisenberg Hamiltonian.

A natural next step will be to extend our proof-of-principle demonstration on hardware to Hamiltonians on more than two qubits. Although this will entail increased sampling costs, when one considers a more specific use-case, namely, of thermal pure quantum states to estimate (thermal) expectation values, the error in the expectation value decays rapidly with the number of qubits, a promising sign for application of our method on near-term devices. There are two hardware-level optimizations that would make the methods presented in this work more efficient. First, in our demonstration using IBM hardware, if a mid-circuit measurement gives the incorrect result, i.e., post-selection is unsuccessful, then we still execute the remainder of the circuit because of the limitations of the measurement-based feedforward. If instead one has the ability to apply quantum circuits based on the results of mid-circuit measurements, then this waste of resources can easily be avoided. Second, sampling all of the required circuits in software and then loading each circuit individually into hardware is highly inefficient. If instead the sampling is implemented at the level of the field-programmable gate-array control system (as was recently demonstrated for randomized compiling [94]), then we expect that the overall runtime of the circuits will be significantly reduced.

Quasiprobabilistic Imaginary-Time Evolution on Quantum Computers

Full Article

Paradigm

My account