An Efficient Quantum Algorithm for the Traveling Salesman Problem

Sharma, Anant; Deshpande, Nupur; Ghosh, Sanchita; Das, Sreetama; Roy, Shibdas

doi:10.2478/qic-2025-0004

Full Article

1.

Introduction

Logistics and complex supply chain-related problems that require optimization are challenging to solve. The Traveling Salesman Problem (TSP) is the most commonly explored use case of combinatorial optimization. The problem appears simple: find the shortest path in a graph that visits each node exactly once and returns to its origin. It is an N P -hard problem, where N P stands for nondeterministic polynomial-time [1]. The hardest of all problems in N P complexity class is N P -complete, while problems that are at least as hard as N P -complete problems and can lie outside N P are N P -hard. The real-world applications of TSP extend to domains, such as transportation, manufacturing, and network design.

A path in a weighted graph that visits each node, representing a city, exactly once, through edges weighted with distances (say, in kilometers) between the cities, before returning to the starting node, is called a Hamiltonian cycle. Clearly, there can be multiple distinct Hamiltonian cycles in a given graph, with different sums of the weights of the constituent edges, and the problem being considered here is then to find the Hamiltonian cycle with the smallest sum of the weights of the constituent edges. Besides undirected graphs, we mainly consider directed graphs here with only one edge between any two nodes in the same direction, since some or all roads in a city-network can be one-way, but the given graph itself need not be fully connected.

Classically, the problem has been tackled by exact as well as heuristic algorithms. Notably, seminal work in linear programming in Ref. [2] introduced cutting planes, laying the groundwork for branch and cut methods [3, 4,5], and branch and bound algorithms [ 6,7]. In particular, Ref. [8] discussed an implementation of the method from Ref. [2], suitable for TSP instances having a million or more cities. There are other approaches to solve TSP in the literature, such as a thermodynamic approach to find approximate solutions using a Monte Carlo algorithm [9].

With the advent of quantum computing and the possibility of solving combinatorial optimization problems faster than classical methods, TSP became a test bed for ample Noisy Intermediate Scale Quantum (NISQ)-era algorithms. Both gate-based approaches and annealer-based approaches have been tested extensively for TSP. Refs. [10, 11,12] use annealer-based approaches for the problem. For example, while Ref. [10] explored the use of Quadratic Unconstrained Binary Optimization (QUBO) models in solving TSP through quantum annealing algorithms and Graph Neural Networks (GNN), Ref. [11] proposed a path-integral Monte Carlo quantum annealing scheme. By contrast, gate-based approaches include the use of Quantum Approximate Optimization Algorithm (QAOA) [13] and Variational Quantum Eigensolvers (VQE) [14]. Fault-Tolerant Quantum Computing (FTQC) algorithms use approaches based on Grover’s search to solve the problem with a quadratic speedup [15,16, 17,18]. In Ref. [16], the eigenstates corresponding to Hamiltonian cycles in the graph are treated as given, but in practice, finding all the Hamiltonian cycles in a graph itself is an N P -complete problem.

In this work, we develop a gate-based FTQC BQP algorithm that solves TSP with exponential speedup. We achieved this by using a novel quantum circuit involving controlled swap gates to yield all candidate Hamiltonian cycles of a given graph of N nodes, quantum phase estimation to capture the sums of edges of these Hamiltonian cycles, and density matrix exponentiation to find the shortest Hamiltonian cycle. The algorithm complexity is O(N ³ log(N)κ/ϵ + 1/ϵ³), that is polynomial in N, when the errors ϵ are polynomially but not exponentially small in N, as required for the overall error probability of our algorithm to be less than or equal to 1/3, and κ is the condition number of the matrix encoding all Hamiltonian cycles of the graph.

2.

Method

Consider that an arbitrary directed graph is given, with N vertices that are connected by edges. Then, there can be a maximum of 2 × ^N C₂ = 2 × N !/(2(N − 2)!) = N (N − 1) number of edges in the graph. A Hamiltonian cycle, if one exists in the graph, would have exactly N vertices and N edges. We here want to find a quantum algorithm that would take time, polynomial in N, to solve the problem of finding out the shortest Hamiltonian cycle, if one exists, in the graph.

We start with log L := ⌈log N ⌉ number of qubits. For example, if we have 4 vertices in a graph, we would use 2 qubits that can have 4 levels: |0〉 = |00〉, |1〉 = |01〉, |2〉 = |10〉, |3〉 = |11〉. We create a unitary operator U, that encodes the distances of all the edges into the phase factors of their eigenstates: (1) $U = \sum_{j, k = 0}^{L - 1} e^{i ϕ_{j k}} | j k 〉〈 j k |,$ U = \sum\limits_{j,k = 0}^{L - 1} {e^{i{\phi _{jk}}}}|jk\rangle \langle jk|, where ϕ_jk is the distance between city j to city k. The phase factor ϕ_jk is equal to the phase factor ϕ_jk in case of an undirected graph. For example, if we have N = 4 (= L), i.e, cities represented by index values j = 0, 1, 2, 3, then we create the following diagonal unitary matrix: (2) $\begin{array}{l} U & = e^{i ϕ_{00}} | 00 〉〈 00 | + e^{i ϕ_{01}} | 01 〉〈 01 | + e^{i ϕ_{02}} | 02 〉〈 02 | + e^{i ϕ_{03}} | 03 〉〈 03 | \\ + e^{i ϕ_{10}} | 10 〉〈 10 | + e^{i ϕ_{11}} | 11 〉〈 11 | + e^{i ϕ_{12}} | 12 〉〈 12 | + e^{i ϕ_{13}} | 13 〉〈 13 | \\ + e^{i ϕ_{20}} | 20 〉〈 20 | + e^{i ϕ_{21}} | 21 〉〈 21 | + e^{i ϕ_{22}} | 22 〉〈 22 | + e^{i ϕ_{23}} | 23 〉〈 23 | \\ + e^{i ϕ_{30}} | 30 〉〈 30 | + e^{i ϕ_{31}} | 31 〉〈 31 | + e^{i ϕ_{32}} | 32 〉〈 32 | + e^{i ϕ_{33}} | 33 〉〈 33 | . \end{array}$ \matrix{ U \hfill & { = {e^{i{\phi _{00}}}}|00\rangle \langle 00| + {e^{i{\phi _{01}}}}|01\rangle \langle 01| + {e^{i{\phi _{02}}}}|02\rangle \langle 02| + {e^{i{\phi _{03}}}}|03\rangle \langle 03|} \hfill \cr {} \hfill & { + {e^{i{\phi _{10}}}}|10\rangle \langle 10| + {e^{i{\phi _{11}}}}|11\rangle \langle 11| + {e^{i{\phi _{12}}}}|12\rangle \langle 12| + {e^{i{\phi _{13}}}}|13\rangle \langle 13|} \hfill \cr {} \hfill & { + {e^{i{\phi _{20}}}}|20\rangle \langle 20| + {e^{i{\phi _{21}}}}|21\rangle \langle 21| + {e^{i{\phi _{22}}}}|22\rangle \langle 22| + {e^{i{\phi _{23}}}}|23\rangle \langle 23|} \hfill \cr {} \hfill & { + {e^{i{\phi _{30}}}}|30\rangle \langle 30| + {e^{i{\phi _{31}}}}|31\rangle \langle 31| + {e^{i{\phi _{32}}}}|32\rangle \langle 32| + {e^{i{\phi _{33}}}}|33\rangle \langle 33|.} \hfill \cr }

However, some edges from among all the possible 2 × ^N C₂ number of edges may not exist in the actual given graph. We would, then, precompute the sum of all the up to 2 × ^N C₂ number of edges in the graph. Call this sum s, computing which is cheap and efficient even classically. Then, we would encode the phase factor ϕ_jk ∀j and k < N for each edge that does not exist in the graph as equal to s, and ϕ_jk = 0, j or k ≥ N in the unitary U . We will assume L = N for simplicity, without loss of generality, in the rest of the paper.

If we now have V = U ^⊗N, then there are N ! number of eigenstates, that are possible Hamiltonian cycles from among a total of N ^2N (L^2N if L > N) number of eigenstates of V, of which (N − 1)! Hamiltonian cycles are unique. For example, for N = 4, there are a total of 4⁸ number of eigenstates of V, but we have the following (4 − 1)! = 6 eigenstates representing unique Hamiltonian cycles starting from city 0: (3) $\begin{array}{l} | ν 〉 = | 01122330 〉 & : φ_{ν} = ϕ_{01} + ϕ_{12} + ϕ_{23} + ϕ_{30} : 0 \to 1 \to 2 \to 3 \to 0, \\ | ν 〉 = | 03322110 〉 & : φ_{ν} = ϕ_{03} + ϕ_{32} + ϕ_{21} + ϕ_{10} : 0 \to 3 \to 2 \to 1 \to 0, \\ | ν 〉 = | 02211330 〉 & : φ_{ν} = ϕ_{02} + ϕ_{21} + ϕ_{13} + ϕ_{30} : 0 \to 2 \to 1 \to 3 \to 0, \\ | ν 〉 = | 03311220 〉 & : φ_{ν} = ϕ_{03} + ϕ_{31} + ϕ_{12} + ϕ_{20} : 0 \to 3 \to 1 \to 2 \to 0, \\ | ν 〉 = | 01133220 〉 & : φ_{ν} = ϕ_{01} + ϕ_{13} + ϕ_{32} + ϕ_{20} : 0 \to 1 \to 3 \to 2 \to 0, \\ | ν 〉 = | 02233110 〉 & : φ_{ν} = ϕ_{02} + ϕ_{23} + ϕ_{31} + ϕ_{10} : 0 \to 2 \to 3 \to 1 \to 0. \end{array}$ \matrix{ {|\nu \rangle = |01122330\rangle } \hfill & {:{\varphi _\nu } = {\phi _{01}} + {\phi _{12}} + {\phi _{23}} + {\phi _{30}}:0 \to 1 \to 2 \to 3 \to 0,} \hfill \cr {|\nu \rangle = |03322110\rangle } \hfill & {:{\varphi _\nu } = {\phi _{03}} + {\phi _{32}} + {\phi _{21}} + {\phi _{10}}:0 \to 3 \to 2 \to 1 \to 0,} \hfill \cr {|\nu \rangle = |02211330\rangle } \hfill & {:{\varphi _\nu } = {\phi _{02}} + {\phi _{21}} + {\phi _{13}} + {\phi _{30}}:0 \to 2 \to 1 \to 3 \to 0,} \hfill \cr {|\nu \rangle = |03311220\rangle } \hfill & {:{\varphi _\nu } = {\phi _{03}} + {\phi _{31}} + {\phi _{12}} + {\phi _{20}}:0 \to 3 \to 1 \to 2 \to 0,} \hfill \cr {|\nu \rangle = |01133220\rangle } \hfill & {:{\varphi _\nu } = {\phi _{01}} + {\phi _{13}} + {\phi _{32}} + {\phi _{20}}:0 \to 1 \to 3 \to 2 \to 0,} \hfill \cr {|\nu \rangle = |02233110\rangle } \hfill & {:{\varphi _\nu } = {\phi _{02}} + {\phi _{23}} + {\phi _{31}} + {\phi _{10}}:0 \to 2 \to 3 \to 1 \to 0.} \hfill \cr }

Notice that there are ^N P_N = N ! = 4! = 24 number of possible permutations of four different vertices |0〉, |1〉, |2〉, |3〉. For example, the eigenstates |01122330〉 and |23300112〉 represent the same Hamiltonian cycle, but the starting points are different, i.e., cities 0 and 2, respectively. So, there are N number of same Hamiltonian cycles, but simply rotated with respect to each other, for each of the (N − 1)! unique Hamiltonian cycles, amounting to a total of N ! Hamiltonian cycle eigenstates of V for a fully connected directed graph.

Now, we initialize N number of data registers, each of ⌈log N ⌉ qubits, to |0〉, |1〉, …, |N − 1〉, respectively. Then, we create all possible permutations of 0, 1, …, N − 1 by using ^N C₂ number of single-qubit ancilla registers, each initialized to state $| + 〉 = \frac{1}{\sqrt{2}} (| 0 〉 + | 1 〉)$ | + \rangle = {1 \over {\sqrt 2 }}\left( {|0\rangle + |1\rangle } \right) , and applying ⌈log N ⌉ number of controlled swap gates on each of every combination of two of the N data registers with one ancilla register as the control qubit. See the circuit in Figure 1.

We will then have the following state in the data registers, upon tracing out the ancilla registers, with $\sum_{μ} β_{μ}^{2} {= 2}^{^{N} C_{2}}$ \sum\nolimits_\mu \beta _\mu ^2{ = 2^{^N{C_2}}} : (4) $ρ = \frac{1}{2^{^{N} C_{2}}} \sum_{μ} β_{μ}^{2} | μ 〉〈 μ | .$ \rho = {1 \over {{2^{^N{C_2}}}}}\sum\limits_\mu \beta _\mu ^2|\mu \rangle \langle \mu |.

We, then, expand the state ρ from an N -register state to a 2N -register state. For example, for N = 4, we augment a state |0123 〉 to the state |01122330〉, so that it forms a valid eigenstate for the unitary V . We can augment the N - register state by adding N new registers, each of ⌈ log N ⌉ qubits initialised to |0〉, at appropriate places, and applying CN OT gates to create adjacent copies of every register in ρ. This way, we will now have terms like |01122330〉 replacing |0123〉 (for N = 4) in ρ. See Figure 2. Call this new state σ, so that we have, with $\sum_{ν} β_{ν}^{2} {= 2}^{^{N} C_{2}}$ \sum\nolimits_\nu \beta _\nu ^2{ = 2^{^N{C_2}}} : (5) $σ = \frac{1}{2^{^{N} C_{2}}} \sum_{ν} β_{ν}^{2} | ν 〉〈 ν | .$ \sigma = {1 \over {{2^{^N{C_2}}}}}\sum\limits_\nu \beta _\nu ^2|\nu \rangle \langle \nu |.

We use this state σ as the input eigenstates to perform quantum phase estimation of the unitary V, to obtain the following state at the output: (6) $γ = \frac{1}{2^{^{N} C_{2}}} \sum_{ν} β_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | .$ \gamma = {1 \over {{2^{^N{C_2}}}}}\sum\limits_\nu \beta _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu |.

The state γ then has all the valid Hamiltonian cycle eigenstates |ν〉, along with the estimates φ̃_ν of their corresponding (normalized-) sums φ_ν of weights of constituent edges ϕ_jk.

Notice that we use the improved quantum phase estimation method from Ref. [19] in the phase estimation of V, and so, we would have the time variable t = O(ηκ/ɛ) (the factor ηκ arises as a result of using C = ηκ in the controlled rotation later) when simulating the unitary U in the beginning by exponentiating a diagonal matrix Φ, such that U = eⁱ^Φt. The matrix Φ to exponentiate to obtain unitary U of (2) is: $\begin{array}{l} Φ & = ϕ_{00} | 00 〉〈 00 | + ϕ_{01} | 01 〉〈 01 | + ϕ_{02} | 02 〉〈 02 | + ϕ_{03} | 03 〉〈 03 | \\ + ϕ_{10} | 10 〉〈 10 | + ϕ_{11} | 11 〉〈 11 | + ϕ_{12} | 12 〉〈 12 | + ϕ_{13} | 13 〉〈 13 | \\ + ϕ_{20} | 20 〉〈 20 | + ϕ_{21} | 21 〉〈 21 | + ϕ_{22} | 22 〉〈 22 | + ϕ_{23} | 23 〉〈 23 | \\ + ϕ_{30} | 30 〉〈 30 | + ϕ_{31} | 31 〉〈 31 | + ϕ_{32} | 32 〉〈 32 | + ϕ_{33} | 33 〉〈 33 |, \end{array}$ \matrix{ \Phi \hfill & { = {\phi _{00}}|00\rangle \langle 00| + {\phi _{01}}|01\rangle \langle 01| + {\phi _{02}}|02\rangle \langle 02| + {\phi _{03}}|03\rangle \langle 03|} \hfill \cr {} \hfill & { + {\phi _{10}}|10\rangle \langle 10| + {\phi _{11}}|11\rangle \langle 11| + {\phi _{12}}|12\rangle \langle 12| + {\phi _{13}}|13\rangle \langle 13|} \hfill \cr {} \hfill & { + {\phi _{20}}|20\rangle \langle 20| + {\phi _{21}}|21\rangle \langle 21| + {\phi _{22}}|22\rangle \langle 22| + {\phi _{23}}|23\rangle \langle 23|} \hfill \cr {} \hfill & { + {\phi _{30}}|30\rangle \langle 30| + {\phi _{31}}|31\rangle \langle 31| + {\phi _{32}}|32\rangle \langle 32| + {\phi _{33}}|33\rangle \langle 33|,} \hfill \cr } which, being diagonal, and so, sparse, U can be simulated efficiently [19,20]. Here, ɛ/2 is the estimation precision error in trace distance, and so, the maximum probability of estimation error. We use ɛ = O(1/poly(N)) to let time t to simulate U be polynomial, and not exponential, in N, as long as the error probability of our overall algorithm is below 1/3.

In order to find the shortest Hamiltonian cycle, we add to γ an ancilla qubit, initialized in the state (|0〉〈0| + |1〉〈1|)/2, and rotate it, conditioned on |φ̃_ν〉, to get, with $α_{ν}^{2} : = β_{ν}^{2} {/ 2}^{^{N} C_{2}}$ \alpha _\nu ^2: = \beta _\nu ^2{/2^{^N{C_2}}} : (7) $ξ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes (1 - C^{2} {\tilde{φ}}_{ν}^{2}) | 0 〉〈 0 | + C^{2} {\tilde{φ}}_{ν}^{2} | 1 〉〈 1 |] .$ \xi = \sum\limits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes \left[ {(1 - {C^2}\tilde \varphi _\nu ^2)|0\rangle \langle 0| + {C^2}\tilde \varphi _\nu ^2|1\rangle \langle 1|} \right]. We ensure that all the eigenphases φ̃_ν are normalized to be between 0 and 1 by initially dividing all edges ϕ_jk by φ̂_max, that is taken as the sum of the N largest edges as an estimate of φ_max. If φ̂_max is equal to φ_max and if the phase estimation was perfect, then we have the minimum φ̃_ν, which is φ̃_min, equal to the inverse of the not-too-large condition number (of the matrix of which φ_ν ’s are the eigenvalues of only the valid Hamiltonian cycle eigenstates of V), 1/κ = φ_min/φ_max. Then, the ancilla qubit above will be rotated from |0〉 only for φ̃_min that is equal to 1/κ, if C = κ, since the probability $C^{2} {\tilde{φ}}_{ν}^{2}$ {C^2}\tilde \varphi _\nu ^2 , attached to |1〉, will be equal to 1 only for φ̃_min. That is, the ancilla qubit will not be rotated at all for any φ̃_ν, other than φ̃_min. However, since φ̂_max would almost always be an overestimate of φ_max, we would have 1/κ rather larger than φ̃_min, more so when the variance in the given edges is large, or there are missing edges initially assigned a value of s. This is why we use an extra multiplicative factor η, that we discuss later, to ensure that only φ̃_min is likely going to be less than or equal to 1/(ηκ). So, in order to find and separate out φ̃_min, we choose above C = ηκ, so that the ancilla qubit is rotated from |0〉 only for φ̃_min that is less than or equal to 1/(ηκ). We will discuss shortly how to guess the value of ηκ to use.

We then need to find the Hamiltonian cycle |ν〉 that corresponds to φ̃_min. Note that if Cφ̃_min = O(1), the probability of obtaining 1 as the outcome of measuring the ancilla qubit above can be as small as O(1/(N − 1)!) ≥ O(1/2^NC₂) = O(1/2^O(N²)), which can be more than exponentially small in N . So, we cannot efficiently perform a postselection on the outcome being 1 of measuring the ancilla like in Ref. [19], to obtain the below desired state from the state ξ in (7): (8) $ζ = \sum_{{\tilde{φ}}_{\min}} \frac{α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}}{λ_{\min}} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν |,$ \zeta = \sum\limits_{{{\tilde \varphi }_{\min }}} {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{\lambda _{\min }}}}|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu |, where if only φ̃_min is less than 1/(ηκ) = 1/C, then we would have N terms in the summation, all of which are the same Hamiltonian cycles (with the same sum of weights of edges), just rotated with respect to each other, so that $λ_{\min} : = \sum_{{\tilde{φ}}_{\min}} α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}$ {\lambda _{\min }}: = \sum\nolimits_{{{\tilde \varphi }_{\min }}} \alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2 . We would not use amplitude amplification either, as used in Ref. [19], since it gives only a quadratic speedup, while we want to get exponential speedup.

Instead, in order to get the above state ζ in (8), we first exponentiate the ancilla qubit (call it ϱ) from (7) to get the unitary Y = e^iϱτ, and perform phase estimation on Y for eigenstate |1〉. We create Y by repeatedly applying the below to state ϱ of ancilla “anc” and another state ς [21,22]: (9) ${Tr}_{anc} e^{i S Δ τ} (ϱ \otimes ς) e^{- i S Δ τ}] = ς - i Δ τ [ϱ, ς] + O (Δ τ^{2}) .$ {\rm{Tr}_{{\rm{anc}}}}\left[ {{e^{i{\cal S}\Delta \tau }}(\varrho \otimes \varsigma ){e^{ - i{\cal S}\Delta \tau }}} \right] = \varsigma - i\Delta \tau [\varrho ,\varsigma ] + O(\Delta {\tau ^2}).

Here, 𝒮 is the swap operator, which is sparse and so, e^i𝒮^Δτ can be performed efficiently [19,20]. Also, here, τ = nΔτ, where n = O(τ²/ϵ) is the required number of copies of ϱ, and so, the required number of times (9) needs to be repeated to simulate Y with an error of ϵ. The phase estimation process requires controlled-e^iϱτ operations $\sum_{ι = 0}^{T - 1} | ι 〉〈 ι | \otimes e^{i ϱ τ ι / T}$ \sum\nolimits_{\iota = 0}^{T - 1} |\iota \rangle \langle \iota | \otimes {e^{i\varrho \tau \iota /T}} (for some T), that is done by acting the conditional swap operator |ι〉〈ι| ⊗ e^i𝒮τ^ι/^T on |ι〉〈ι| ⊗ ϱ ⊗ ς instead, and tracing out ϱ. Note that the eigenvalues of 𝒮 are ±1, so that W^1/T = W ^2¹/T = W ^2²/T = W ^2³/T …, where W = e^i𝒮τ, if we use say τ = 2π.

The phase estimate so obtained would be λ̃_min, using which we further apply a rotation to the ancilla qubit in (7), to get: (10) $χ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes (1 - \frac{C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}}) | 0 〉〈 0 | + \frac{C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}} | 1 〉〈 1 |],$ \chi = \sum\limits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes \left[ {\left( {1 - {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}} \right)|0\rangle \langle 0| + {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}|1\rangle \langle 1|} \right], where since the eigenvalue of the eigenstate |1〉 of the ancilla qubit is evidently $\sum_{ν} \frac{α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}} = 1$ \sum\nolimits_\nu {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}} = 1 , we get the eigenvalue of the eigenstate |0〉 of the ancilla qubit to be $\sum_{ν} α_{ν}^{2} (1 - \frac{C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}}) = \sum_{ν} α_{ν}^{2} - \sum_{{\tilde{φ}}_{\min}} \frac{α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}} = 1 - 1 = 0$ \sum\nolimits_\nu \alpha _\nu ^2\left( {1 - {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}} \right) = \sum\nolimits_\nu \alpha _\nu ^2 - \sum\nolimits_{{{\tilde \varphi }_{\min }}} {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}} = 1 - 1 = 0 . Thus, the above state, upon tracing out the ancilla qubit, is just the desired state ζ from (8), where every φ̃_ν is equal to φ̃_min, so that measuring the two registers yields the outputs, φ̃_min and an eigenstate ν, corresponding to the desired shortest Hamiltonian cycle in the graph. Clearly, if φ̃_min so obtained is larger than s/φ̂_max, the eigenstate ν so obtained would not be a Hamiltonian cycle, since it would have at least one missing edge assigned a normalized value of s/φ̂_max. So, we decide that there is no Hamiltonian cycle in the graph. Otherwise, we output the obtained ν and φ̃_min × φ̂_max as the shortest Hamiltonian cycle and the sum of its edges, respectively.

Note that the controlled rotation in (7) is achieved by considering two ancilla qubits, denoted by A and B, initialized in the state |0_A0_B〉, and then rotating this state, conditioned on phase estimates |φ̃_ν〉, and ignoring one of the ancilla qubits: (11) $ξ = {Tr}_{B} \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes | Φ_{A B} 〉〈 Φ_{A B} |\},$ \xi = {\rm{T}}{{\rm{r}}_B}\left\{ {\sum\limits_\nu \alpha _\nu ^2|{{\tilde \varphi }_\nu }\rangle \langle {{\tilde \varphi }_\nu }| \otimes |\nu \rangle \langle \nu | \otimes |{\Phi _{AB}}\rangle \langle {\Phi _{AB}}|} \right\}, where $| Φ_{A B} 〉 : = \sqrt{1 - C^{2} {\tilde{φ}}_{ν}^{2}} {| 0}_{A} 0_{B} 〉 + C {\tilde{φ}}_{ν} {| 1}_{A} 1_{B} 〉$ |{\Phi _{AB}}\rangle : = \sqrt {1 - {C^2}\tilde \varphi _\nu ^2} {|0_A}{0_B}\rangle + C{\tilde \varphi _\nu }{|1_A}{1_B}\rangle . In order to simulate the unitary Y, we perform density matrix exponentiation of the state ϱ of the ancilla qubit (as pointed out earlier), that requires n = O(τ²/ϵ) copies of ϱ. Clearly, we get two copies of ϱ from above, one by ignoring the qubit B, and the other by ignoring the qubit A, but undoing the action of ϱ^iϛτ on ϱ in between. Thus, we get n copies of ϱ, required for simulation of Y, by considering n ancilla qubits in the state |0^⊗n〉, and then rotating this state, conditioned on |φ̃_ν〉: (12) $ϑ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes | ψ 〉〈 ψ |,$ \vartheta = \sum\limits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\psi \rangle \langle \psi |, where $| ψ 〉 : = \sqrt{1 - C^{2} {\tilde{φ}}_{ν}^{2}} {| 0}^{\otimes n} 〉 + C {\tilde{φ}}_{ν} {| 1}^{\otimes n} 〉$ |\psi \rangle : = \sqrt {1 - {C^2}\tilde \varphi _\nu ^2} {|0^{ \otimes n}}\rangle + C{\tilde \varphi _\nu }{|1^{ \otimes n}}\rangle . With n ancilla qubits, the state corresponding to χ of (10) is: (13) $Θ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes | Ψ 〉〈 Ψ |,$ \Theta = \sum\limits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\Psi \rangle \langle \Psi |, where $| Ψ 〉 : = \sqrt{1 - \frac{C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}}} {| 0}^{\otimes n} 〉 + \frac{C {\tilde{φ}}_{ν}}{\sqrt{{\tilde{λ}}_{\min}}} {| 1}^{\otimes n} 〉$ |\Psi \rangle : = \sqrt {1 - {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}} {|0^{ \otimes n}}\rangle + {{C{{\tilde \varphi }_\nu }} \over {\sqrt {{{\tilde \lambda }_{\min }}} }}{|1^{ \otimes n}}\rangle .

We discuss an empirical way to guess C = ηκ. The mid-range of the means (of N edges) of the Hamiltonian cycles is normally close or equal to the mean M of all edges. We use an extra factor ω/(2N) to offset any deviation of 2NM below φ̂_max, to get: (φ_min/N + φ̂_max/N)/2 = ωM/(2N), with ω = max(2N, (e! = 0)θ), where θ = ^NC₂ for undirected graphs, θ/2 = ^N C₂ for directed graphs, e is number of too large (at least ten times the smallest edge) or missing edges. So, a guess of 1/κ is 1/κ̂ = ωM/φ̂_max −1. We use η = Ω/ϖ, if ω > 2N, η = (Ω/ϖ) · (ω/(2N)), if ω > 4N, else η = 1, if ω = 2N, where Ω is the sum of all edges including missing edges, ϖ ≤ s is the sum of all edges excluding missing or too large edges. Usually, 1/C = 1/(ηκ̂) would be below, yet close to, normalized φ̃_min, so that we repeat our algorithm a few times, slightly raising 1/C each time till we capture φ̃_min only. We find that this empirical method rarely fails [23].

3.

Algorithm

We create the unitary U = eⁱ^Φt encoding all weights of edges ϕ_jk, where $Φ = \sum_{j, k = 0}^{L - 1} \frac{ϕ_{j k}}{{\hat{φ}}_{\max}} | j k 〉〈 j k |$ \Phi = \sum\limits_{j,k = 0}^{L - 1} {{{\phi _{jk}}} \over {{{\hat \varphi }_{\max }}}}|jk\rangle \langle jk| , φ̂_max is sum of N largest edges upon replacing missing edges ∀_{j, k} < N by s, s is sum of all given edges, log L = ⌈ log N ⌉. We create N copies of U for V = U^⊗N, since the number of edges in every Hamiltonian cycle is exactly N .
We initialize N number of data registers of ⌈ log N ⌉ qubits each to |0〉, |1〉, |2〉, …, |N − 1〉, respectively. We generate all permutations of 0, 1, 2, …, N − 1 using ^N C₂ number of single-qubit ancilla registers, each initialized to $| + 〉 = \frac{1}{\sqrt{2}} (| 0 〉 + | 1 〉)$ | + \rangle = {1 \over {\sqrt 2 }}\left( {|0\rangle + |1\rangle } \right) , and applying ⌈ log N ⌉ number of controlled swap gates on each combination of two data registers with one ancilla as control qubit.
The first register has the state $ρ = \frac{1}{2^{^{N} C_{2}}} \sum_{μ} β_{μ}^{2} | μ 〉〈 μ |$ \rho = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\mu \beta _\mu ^2|\mu \rangle \langle \mu | , where $\sum_{ν} β_{μ}^{2} {= 2}^{^{N} C_{2}}$ \sum\nolimits_\nu \beta _\mu ^2{ = 2^{^N{C_2}}} . We augment this N - register state by adding N ancilla registers each of ⌈ log(N) ⌉ qubits in state |0〉 and acting CN OT gates to get a 2N -register state $σ = \frac{1}{2^{^{N} C_{2}}} \sum_{ν} β_{ν}^{2} | ν 〉〈 ν |$ \sigma = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\nu \beta _\nu ^2|\nu \rangle \langle \nu | , where $\sum_{ν} β_{ν}^{2} {= 2}^{^{N} C_{2}}$ \sum\nolimits_\nu \beta _\nu ^2{ = 2^{^N{C_2}}} .
We perform quantum phase estimation [19] on V using σ as input to get $γ = \frac{1}{2^{^{N} C_{2}}} \sum_{ν} β_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν |$ \gamma = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\nu \beta _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | .
We add to the state γ an ancilla register of n qubits, initialized in the state |0^⊗n〉, and rotate it, conditioned on |φ̃_ν〉, to get the state $ϑ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes | ψ 〉〈 ψ |$ \vartheta = \sum\nolimits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\psi \rangle \langle \psi | , where $α_{ν}^{2} : = β_{ν}^{2} {/ 2}^{^{N} C_{2}}$ \alpha _\nu ^2: = \beta _\nu ^2{/2^{^N{C_2}}} , $| ψ 〉 : = \sqrt{1 - C^{2} {\tilde{φ}}_{ν}^{2}} {| 0}^{\otimes n} 〉 + C {\tilde{φ}}_{ν} {| 1}^{\otimes n} 〉]$ |\psi \rangle : = \left[ {\sqrt {1 - {C^2}\tilde \varphi _\nu ^2} {{|0}^{ \otimes n}}\rangle + C{{\tilde \varphi }_\nu }{{|1}^{ \otimes n}}\rangle } \right] and C is set to ηκ.
We exponentiate the effective state ϱ of each ancilla qubit, using n copies of ϱ from n ancilla qubits, to get a unitary Y := e^iϱτ, and perform phase estimation on Y for the eigenstate |1〉. If the phase estimate for |1〉 is 0, we repeat step 5, with a slightly smaller value of C, and then repeat this step, until we can capture φ̃_min in step 5 and get λ̃_min as the phase estimate in this step.
We further apply a rotation to the ancilla register in the state ϑ from step 5, using λ̃_min obtained in step 6, to get $Θ = \sum_{ν} α_{ν}^{2} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν | \otimes | Ψ 〉〈 Ψ |$ \Theta = \sum\nolimits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\Psi \rangle \langle \Psi | , where $| Ψ 〉 : = \sqrt{1 - \frac{C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}}} {| 0}^{\otimes n} 〉 + \frac{C {\tilde{φ}}_{ν}}{\sqrt{{\tilde{λ}}_{\min}}} {| 1}^{\otimes n} 〉]$ |\Psi \rangle : = \left[ {\sqrt {1 - {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}} {{|0}^{ \otimes n}}\rangle + {{C{{\tilde \varphi }_\nu }} \over {\sqrt {{{\tilde \lambda }_{\min }}} }}{{|1}^{ \otimes n}}\rangle } \right] . So, upon tracing out the ancilla register | Ψ 〉 from Θ, we get $ζ = \sum_{{\tilde{φ}}_{\min}} \frac{α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}}{λ_{\min}} | {\tilde{φ}}_{ν} 〉〈 {\tilde{φ}}_{ν} | \otimes | ν 〉〈 ν |$ \zeta = \sum\nolimits_{{{\tilde \varphi }_{\min }}} {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{\lambda _{\min }}}}|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | , as $\sum_{ν} \frac{α_{ν}^{2} C^{2} {\tilde{φ}}_{ν}^{2}}{{\tilde{λ}}_{\min}} = 1$ \sum\nolimits_\nu {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}} = 1 .
Measure the two registers in the state ζ from step 7 to get φ̃_min and a ν. If φ̃_min is larger than s/φ̃_max, output the decision that there is no Hamiltonian cycle in the graph. Otherwise, output the obtained values of ν and φ̃_min ×φ̂_max as the desired shortest Hamiltonian cycle of the graph, and the sum of its edges, respectively.

The improved quantum phase estimation method from Ref. [19], that we use above, is as follows. We start with an initial state |Λ₀〉|u_j〉, where |u_j〉 is the j-th eigenstate of the Hermitian matrix Γ, that we exponentiate, and $| Λ_{0} 〉 : = \sqrt{\frac{2}{T}} \sum_{ι = 0}^{T - 1} \sin \frac{π (ι + \frac{1}{2})}{T} | ι 〉$ |{\Lambda _0}\rangle : = \sqrt {{2 \over T}} \sum\nolimits_{\iota = 0}^{T - 1} \sin {{\pi (\iota + {1 \over 2})} \over T}|\iota \rangle for some large T . The state |Λ₀〉 can be prepared up to some error ϵ_Λ in time poly log(T /ϵ_Λ) (see Ref. [19]). We apply the conditional Hamiltonian evolution $\sum_{ι = 0}^{T - 1} | ι 〉〈 ι | \otimes e^{i Γ ι t / T}$ \sum\nolimits_{\iota = 0}^{T - 1} |\iota \rangle \langle \iota | \otimes {e^{i\Gamma \iota t/T}} on the initial state in both registers, and then apply inverse quantum Fourier transform on the first register to get the state $\sum_{q = 0}^{T - 1} υ_{q | j} | q 〉 | u_{j} 〉$ \sum\nolimits_{q = 0}^{T - 1} {\upsilon _{q|j}}|q\rangle |{u_j}\rangle . Defining the estimate r̃_q of the q-th eigenvalue r_q of Γ as ${\tilde{r}}_{q} : = \frac{2 π q}{t}$ {\tilde r_q}: = {{2\pi q} \over t} , we relabel the Fourier basis states |q〉 to get $\sum_{q = 0}^{T - 1} υ_{q | j} | {\tilde{r}}_{q} 〉 | u_{j} 〉$ \sum\nolimits_{q = 0}^{T - 1} {\upsilon _{q|j}}|{\tilde r_q}\rangle |{u_j}\rangle . If the phase estimation is perfect, we have υ_q_|j = 1 if r̃_q = r_j, and 0 otherwise. So, we get the state |r̃_j〉|u_j〉, from which we obtain the estimate of r_j upon measuring the first register.

4.

Algorithm Complexity

In Figure 1 for step 2, the swap gates on ⌈ log N ⌉ number of qubits of each data register are applied parallelly. Since there are ^N C₂ number of such sets of swap gates, the complexity of this step is O(N ²). The complexity is independent of N in Figure 2 for step 3, since all the CN OT gates can be applied in parallel. In step 1, creating each copy of U has a complexity of O(2 log(N)t) [19,20], assuming L = N, where each eigenstate has 2 log L qubits. Further, N copies of U, required for V, can be created in parallel. The use of improved quantum phase estimation from Ref. [19] in step 4 along with the controlled rotation with C = ηκ in step 5 require the time variable t in step 1 to be O(ηκ/ε) (see Ref. [19]). In step 5, the complexity is polynomial for the controlled rotation of the ancilla register state, as in Ref. [19]. The circuit depth of the density matrix exponentiation in step 6 of the (single) ancilla qubit is O(log(2)n) = O(τ²/ϵ) = O(1/ϵ³) [21,22], where ϵ is the simulation error for Y, as also the precision error of improved phase estimation of Y for which τ = O(1/ϵ). Here, ϵ or ϵ/2 is an error in trace distance, and so, determines the maximum error probability in simulation or estimation of Y [24]. The eigenvalue of ϱ for eigenstate |1〉 can be as low as O(1/(N − 1)!), that is indistinguishable from 0 with less than O(log((N − 1)!)) phase estimate qubits, that otherwise leads to τ ≥ O((N − 1)!) in usual phase estimation [1]. Since ϵ/2 is maximum error probability, we use ϵ = O(1/poly(N)), not ϵ = O(1/(N − 1)!), but with T = O((N − 1)!) ≤ O(2^O(N²)) in improved phase estimation, so that τ = O(poly(N)), as long as the cumulative error probability of our algorithm is 1/3 or lower. The complexities of steps 7 and 8 can be ignored. So, the overall complexity of our algorithm is O(2 log(N)t), which yields O(log(N)ηκ/ϵ) ≤ O(N ³ log(N)κ/ϵ), taking ɛ = ϵ for simplicity, with ϵ = O(1/poly(N)), and η ≤ O(N ³) (since Ω/ϖ ≤ O(N ²s/s) = O(N ²) and ω/(2N) ≤ O(N ²/N) = O(N)). This is because the complexity of step 1 is dominant amongst all steps, if it suffices to have ϵ ≥ O(1/N). Also, κ = O(poly(N)), since it is geometrically unlikely, if not impossible, for a Hamiltonian cycle of N edges to be exponentially larger than another Hamiltonian cycle of N edges in a city-network graph, even with too-large edges, as each Hamiltonian cycle must visit all cities, including too-far ones. When there are e ≤ O(N²) missing edges, we have κ̂ = φ̂_max/((e + 1)s − φ̂_max), where φ̂_max < ps + O(10(N − p)/θ)s with p = min(N, e) (since an edge at least 10 times the smallest edge is treated too-large), so that with θ = O(N²), we get κ̂ < O((N²p + 10(N − p))/(N²(e + 1 − p) − 10(N − p))) = O(poly(N)). Note that we need to classically precompute φ̂_max by summing the N largest edges, for which we sort the data, that is expensive. If we use merge sort to sort the ^N C₂ number of edges, the complexity is O(^N C₂ log(^N C₂)) = O(N ² log(N)), which is less than O(N ³ log(N)κ/ϵ). But if ϵ ≤ O(1/N ²), the complexity of step 6 is dominant, so the overall complexity of our algorithm is at least O(1/ϵ³) = O(N ⁶). This is an exponential speedup over the brute-force complexity of O(N ^N) for undirected [1], or O(N ^2N) for directed graphs.

5.

Discussion

Note that if there is a city that is very far from the rest of the cities in the network, every Hamiltonian cycle will need to traverse to that far-off city and back, since a Hamiltonian cycle must visit all cities. This is why it is geometrically unlikely for the ratio κ of the longest Hamiltonian cycle to the shortest Hamiltonian cycle to be exponential in N . However, it is not absolutely impossible. It is possible that one of the Hamiltonian cycles has a path to that far-off city via a highly zig-zag route or requires having to cross a mountain, for example. In this case, it is indeed possible for that Hamiltonian cycle to be exponentially larger than the shortest Hamiltonian cycle. Nonetheless, this is practically irrelevant, even if not absolutely unlikely, since the salesman is anyway not going to be crossing the mountain, as he would normally be aware of such an unusually long path that exists and would naturally avoid it anyway. Under these unusual or unlikely circumstances, our empirical method to guess C would yield a very poor estimate of C, and the algorithm complexity can become exponential in N .

We also mentioned that our empirical method rarely fails. See Ref. [23], where we indicate in red the rare cases that fail from among the many examples we consider, while in most cases, indicated in green, it succeeds. These rare cases are when there can be many eigenvalues φ̃_ν, and not just the smallest one φ̃_min, that are lower than 1/C, and therefore, can get picked up by our controlled rotation. Usually, such a situation may arise in cases of highly skewed data. Otherwise, as can be seen from the many successful cases, indicated in green, our empirical method usually yields an estimate of 1/C that is lower than, but very close to, the φ̃_min we are after. This means that if we are unable to capture φ̃_min, we need to repeat our algorithm upon slightly raising the estimated value of 1/C, until we are able to capture φ̃_min. Since our empirical method normally yields an estimate of 1/C very close to φ̃_min, it would usually take only a few repetitions of our algorithm, until the value of 1/C used exceeds φ̃_min, so that we are able to capture it. Evidently, the number of repetitions required would depend on the size of the chosen increment in the value of 1/C in every iteration, compared to the value of φ̃_min. The size of the increment needs to be chosen carefully, so that in trying to minimize the number of repetitions, we do not end up raising 1/C above multiple (unequal) φ̃_ν’s and not just φ̃_min, in which case the algorithm would not be able to capture the shortest Hamiltonian cycle only, and may, therefore, not yield the right answer in the end.

Appendix A gives an example of our entire algorithm.

6.

Conclusions

We presented a bounded-error quantum polynomial-time (BQP) algorithm for solving the N P -hard traveling salesman problem. The worst-case brute-force time complexity for this problem is O(N ^N) for undirected graphs, and O(N ^2N) for directed graphs, where N is the number of vertices in the graph. The quantum speedup known to be achievable for this problem is quadratic only, using Grover’s search, yielding a complexity of O(N ^N/2) for undirected graphs or O(N ^N) for directed graphs, which are still exponential in N. On the contrary, our algorithm presented here provides with an exponential quantum speedup, by solving the problem in quantum polynomial time. The overall time complexity of our algorithm is O(N ³ log(N) κ/ϵ + 1/ ϵ³), which is indeed polynomial, and not exponential, in N, where the errors ϵ need not be exponentially small, and can be polynomially small, as long as the overall cumulative probability of error of our algorithm does not exceed 1/3.

An Efficient Quantum Algorithm for the Traveling Salesman Problem

Full Article

Paradigm

My account