Logistics and complex supply chain-related problems that require optimization are challenging to solve. The Traveling Salesman Problem (TSP) is the most commonly explored use case of combinatorial optimization. The problem appears simple: find the shortest path in a graph that visits each node exactly once and returns to its origin. It is an N P -hard problem, where N P stands for nondeterministic polynomial-time [1]. The hardest of all problems in N P complexity class is N P -complete, while problems that are at least as hard as N P -complete problems and can lie outside N P are N P -hard. The real-world applications of TSP extend to domains, such as transportation, manufacturing, and network design.
A path in a weighted graph that visits each node, representing a city, exactly once, through edges weighted with distances (say, in kilometers) between the cities, before returning to the starting node, is called a Hamiltonian cycle. Clearly, there can be multiple distinct Hamiltonian cycles in a given graph, with different sums of the weights of the constituent edges, and the problem being considered here is then to find the Hamiltonian cycle with the smallest sum of the weights of the constituent edges. Besides undirected graphs, we mainly consider directed graphs here with only one edge between any two nodes in the same direction, since some or all roads in a city-network can be one-way, but the given graph itself need not be fully connected.
Classically, the problem has been tackled by exact as well as heuristic algorithms. Notably, seminal work in linear programming in Ref. [2] introduced cutting planes, laying the groundwork for branch and cut methods [3, 4,5], and branch and bound algorithms [ 6,7]. In particular, Ref. [8] discussed an implementation of the method from Ref. [2], suitable for TSP instances having a million or more cities. There are other approaches to solve TSP in the literature, such as a thermodynamic approach to find approximate solutions using a Monte Carlo algorithm [9].
With the advent of quantum computing and the possibility of solving combinatorial optimization problems faster than classical methods, TSP became a test bed for ample Noisy Intermediate Scale Quantum (NISQ)-era algorithms. Both gate-based approaches and annealer-based approaches have been tested extensively for TSP. Refs. [10, 11,12] use annealer-based approaches for the problem. For example, while Ref. [10] explored the use of Quadratic Unconstrained Binary Optimization (QUBO) models in solving TSP through quantum annealing algorithms and Graph Neural Networks (GNN), Ref. [11] proposed a path-integral Monte Carlo quantum annealing scheme. By contrast, gate-based approaches include the use of Quantum Approximate Optimization Algorithm (QAOA) [13] and Variational Quantum Eigensolvers (VQE) [14]. Fault-Tolerant Quantum Computing (FTQC) algorithms use approaches based on Grover’s search to solve the problem with a quadratic speedup [15,16, 17,18]. In Ref. [16], the eigenstates corresponding to Hamiltonian cycles in the graph are treated as given, but in practice, finding all the Hamiltonian cycles in a graph itself is an N P -complete problem.
In this work, we develop a gate-based FTQC BQP algorithm that solves TSP with exponential speedup. We achieved this by using a novel quantum circuit involving controlled swap gates to yield all candidate Hamiltonian cycles of a given graph of N nodes, quantum phase estimation to capture the sums of edges of these Hamiltonian cycles, and density matrix exponentiation to find the shortest Hamiltonian cycle. The algorithm complexity is O(N 3 log(N)κ/ϵ + 1/ϵ3), that is polynomial in N, when the errors ϵ are polynomially but not exponentially small in N, as required for the overall error probability of our algorithm to be less than or equal to 1/3, and κ is the condition number of the matrix encoding all Hamiltonian cycles of the graph.
Consider that an arbitrary directed graph is given, with N vertices that are connected by edges. Then, there can be a maximum of 2 × N C2 = 2 × N !/(2(N − 2)!) = N (N − 1) number of edges in the graph. A Hamiltonian cycle, if one exists in the graph, would have exactly N vertices and N edges. We here want to find a quantum algorithm that would take time, polynomial in N, to solve the problem of finding out the shortest Hamiltonian cycle, if one exists, in the graph.
We start with log L := ⌈log N ⌉ number of qubits. For example, if we have 4 vertices in a graph, we would use 2 qubits that can have 4 levels: |0〉 = |00〉, |1〉 = |01〉, |2〉 = |10〉, |3〉 = |11〉. We create a unitary operator U, that encodes the distances of all the edges into the phase factors of their eigenstates:
However, some edges from among all the possible 2 × N C2 number of edges may not exist in the actual given graph. We would, then, precompute the sum of all the up to 2 × N C2 number of edges in the graph. Call this sum s, computing which is cheap and efficient even classically. Then, we would encode the phase factor ϕjk ∀j and k < N for each edge that does not exist in the graph as equal to s, and ϕjk = 0, j or k ≥ N in the unitary U . We will assume L = N for simplicity, without loss of generality, in the rest of the paper.
If we now have V = U ⊗N, then there are N ! number of eigenstates, that are possible Hamiltonian cycles from among a total of N 2N (L2N if L > N) number of eigenstates of V, of which (N − 1)! Hamiltonian cycles are unique. For example, for N = 4, there are a total of 48 number of eigenstates of V, but we have the following (4 − 1)! = 6 eigenstates representing unique Hamiltonian cycles starting from city 0:
Notice that there are N PN = N ! = 4! = 24 number of possible permutations of four different vertices |0〉, |1〉, |2〉, |3〉. For example, the eigenstates |01122330〉 and |23300112〉 represent the same Hamiltonian cycle, but the starting points are different, i.e., cities 0 and 2, respectively. So, there are N number of same Hamiltonian cycles, but simply rotated with respect to each other, for each of the (N − 1)! unique Hamiltonian cycles, amounting to a total of N ! Hamiltonian cycle eigenstates of V for a fully connected directed graph.
Now, we initialize N number of data registers, each of ⌈log N ⌉ qubits, to |0〉, |1〉, …, |N − 1〉, respectively. Then, we create all possible permutations of 0, 1, …, N − 1 by using N C2 number of single-qubit ancilla registers, each initialized to state

Quantum circuit to create a superposition of all permutations of 0, 1, 2, …, N − 1 for N = 4.
We will then have the following state in the data registers, upon tracing out the ancilla registers, with
We, then, expand the state ρ from an N -register state to a 2N -register state. For example, for N = 4, we augment a state |0123 〉 to the state |01122330〉, so that it forms a valid eigenstate for the unitary V . We can augment the N - register state by adding N new registers, each of ⌈ log N ⌉ qubits initialised to |0〉, at appropriate places, and applying CN OT gates to create adjacent copies of every register in ρ. This way, we will now have terms like |01122330〉 replacing |0123〉 (for N = 4) in ρ. See Figure 2. Call this new state σ, so that we have, with

Quantum circuit to augment terms like |0123〉 in ρ to |01122330〉 in σ for N = 4.
We use this state σ as the input eigenstates to perform quantum phase estimation of the unitary V, to obtain the following state at the output:
The state γ then has all the valid Hamiltonian cycle eigenstates |ν〉, along with the estimates φ̃ν of their corresponding (normalized-) sums φν of weights of constituent edges ϕjk.
Notice that we use the improved quantum phase estimation method from Ref. [19] in the phase estimation of V, and so, we would have the time variable t = O(ηκ/ɛ) (the factor ηκ arises as a result of using C = ηκ in the controlled rotation later) when simulating the unitary U in the beginning by exponentiating a diagonal matrix Φ, such that U = eiΦt. The matrix Φ to exponentiate to obtain unitary U of (2) is:
In order to find the shortest Hamiltonian cycle, we add to γ an ancilla qubit, initialized in the state (|0〉 〈0| + |1〉 〈1|)/2, and rotate it, conditioned on |φ̃ν〉, to get, with
We then need to find the Hamiltonian cycle |ν〉 that corresponds to φ̃min. Note that if Cφ̃min = O(1), the probability of obtaining 1 as the outcome of measuring the ancilla qubit above can be as small as O(1/(N − 1)!) ≥ O(1/2NC2) = O(1/2O(N2)), which can be more than exponentially small in N . So, we cannot efficiently perform a postselection on the outcome being 1 of measuring the ancilla like in Ref. [19], to obtain the below desired state from the state ξ in (7):
Instead, in order to get the above state ζ in (8), we first exponentiate the ancilla qubit (call it ϱ) from (7) to get the unitary Y = eiϱτ, and perform phase estimation on Y for eigenstate |1〉. We create Y by repeatedly applying the below to state ϱ of ancilla “anc” and another state ς [21,22]:
Here, 𝒮 is the swap operator, which is sparse and so, ei𝒮Δτ can be performed efficiently [19,20]. Also, here, τ = nΔτ, where n = O(τ2/ϵ) is the required number of copies of ϱ, and so, the required number of times (9) needs to be repeated to simulate Y with an error of ϵ. The phase estimation process requires controlled-eiϱτ operations
The phase estimate so obtained would be λ̃min, using which we further apply a rotation to the ancilla qubit in (7), to get:
Note that the controlled rotation in (7) is achieved by considering two ancilla qubits, denoted by A and B, initialized in the state |0A0B〉, and then rotating this state, conditioned on phase estimates |φ̃ν〉, and ignoring one of the ancilla qubits:
We discuss an empirical way to guess C = ηκ. The mid-range of the means (of N edges) of the Hamiltonian cycles is normally close or equal to the mean M of all edges. We use an extra factor ω/(2N) to offset any deviation of 2NM below φ̂max, to get: (φmin/N + φ̂max/N)/2 = ωM/(2N), with ω = max(2N, (e! = 0)θ), where θ = NC2 for undirected graphs, θ/2 = N C2 for directed graphs, e is number of too large (at least ten times the smallest edge) or missing edges. So, a guess of 1/κ is 1/κ̂ = ωM/φ̂max −1. We use η = Ω/ϖ, if ω > 2N, η = (Ω/ϖ) · (ω/(2N)), if ω > 4N, else η = 1, if ω = 2N, where Ω is the sum of all edges including missing edges, ϖ ≤ s is the sum of all edges excluding missing or too large edges. Usually, 1/C = 1/(ηκ̂) would be below, yet close to, normalized φ̃min, so that we repeat our algorithm a few times, slightly raising 1/C each time till we capture φ̃min only. We find that this empirical method rarely fails [23].
We create the unitary U = eiΦt encoding all weights of edges ϕjk, where
, φ̂max is sum of N largest edges upon replacing missing edges ∀j, k < N by s, s is sum of all given edges, log L = ⌈ log N ⌉. We create N copies of U for V = U⊗N, since the number of edges in every Hamiltonian cycle is exactly N .\Phi = \sum\limits_{j,k = 0}^{L - 1} {{{\phi _{jk}}} \over {{{\hat \varphi }_{\max }}}}|jk\rangle \langle jk| We initialize N number of data registers of ⌈ log N ⌉ qubits each to |0〉, |1〉, |2〉, …, |N − 1〉, respectively. We generate all permutations of 0, 1, 2, …, N − 1 using N C2 number of single-qubit ancilla registers, each initialized to
, and applying ⌈ log N ⌉ number of controlled swap gates on each combination of two data registers with one ancilla as control qubit.| + \rangle = {1 \over {\sqrt 2 }}\left( {|0\rangle + |1\rangle } \right) The first register has the state
, where\rho = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\mu \beta _\mu ^2|\mu \rangle \langle \mu | . We augment this N - register state by adding N ancilla registers each of ⌈ log(N) ⌉ qubits in state |0〉 and acting CN OT gates to get a 2N -register state\sum\nolimits_\nu \beta _\mu ^2{ = 2^{^N{C_2}}} , where\sigma = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\nu \beta _\nu ^2|\nu \rangle \langle \nu | .\sum\nolimits_\nu \beta _\nu ^2{ = 2^{^N{C_2}}} We perform quantum phase estimation [19] on V using σ as input to get
.\gamma = {1 \over {{2^{^N{C_2}}}}}\sum\nolimits_\nu \beta _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | We add to the state γ an ancilla register of n qubits, initialized in the state |0⊗n〉, and rotate it, conditioned on |φ̃ν〉, to get the state
, where\vartheta = \sum\nolimits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\psi \rangle \langle \psi | ,\alpha _\nu ^2: = \beta _\nu ^2{/2^{^N{C_2}}} and C is set to ηκ.|\psi \rangle : = \left[ {\sqrt {1 - {C^2}\tilde \varphi _\nu ^2} {{|0}^{ \otimes n}}\rangle + C{{\tilde \varphi }_\nu }{{|1}^{ \otimes n}}\rangle } \right] We exponentiate the effective state ϱ of each ancilla qubit, using n copies of ϱ from n ancilla qubits, to get a unitary Y := eiϱτ, and perform phase estimation on Y for the eigenstate |1〉. If the phase estimate for |1〉 is 0, we repeat step 5, with a slightly smaller value of C, and then repeat this step, until we can capture φ̃min in step 5 and get λ̃min as the phase estimate in this step.
We further apply a rotation to the ancilla register in the state ϑ from step 5, using λ̃min obtained in step 6, to get
, where\Theta = \sum\nolimits_\nu \alpha _\nu ^2|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | \otimes |\Psi \rangle \langle \Psi | . So, upon tracing out the ancilla register | Ψ 〉 from Θ, we get|\Psi \rangle : = \left[ {\sqrt {1 - {{{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}}} {{|0}^{ \otimes n}}\rangle + {{C{{\tilde \varphi }_\nu }} \over {\sqrt {{{\tilde \lambda }_{\min }}} }}{{|1}^{ \otimes n}}\rangle } \right] , as\zeta = \sum\nolimits_{{{\tilde \varphi }_{\min }}} {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{\lambda _{\min }}}}|{\tilde \varphi _\nu }\rangle \langle {\tilde \varphi _\nu }| \otimes |\nu \rangle \langle \nu | .\sum\nolimits_\nu {{\alpha _\nu ^2{C^2}\tilde \varphi _\nu ^2} \over {{{\tilde \lambda }_{\min }}}} = 1 Measure the two registers in the state ζ from step 7 to get φ̃min and a ν. If φ̃min is larger than s/φ̃max, output the decision that there is no Hamiltonian cycle in the graph. Otherwise, output the obtained values of ν and φ̃min ×φ̂max as the desired shortest Hamiltonian cycle of the graph, and the sum of its edges, respectively.
The improved quantum phase estimation method from Ref. [19], that we use above, is as follows. We start with an initial state |Λ0〉|uj〉, where |uj〉 is the j-th eigenstate of the Hermitian matrix Γ, that we exponentiate, and
In Figure 1 for step 2, the swap gates on ⌈ log N ⌉ number of qubits of each data register are applied parallelly. Since there are N C2 number of such sets of swap gates, the complexity of this step is O(N 2). The complexity is independent of N in Figure 2 for step 3, since all the CN OT gates can be applied in parallel. In step 1, creating each copy of U has a complexity of O(2 log(N)t) [19,20], assuming L = N, where each eigenstate has 2 log L qubits. Further, N copies of U, required for V, can be created in parallel. The use of improved quantum phase estimation from Ref. [19] in step 4 along with the controlled rotation with C = ηκ in step 5 require the time variable t in step 1 to be O(ηκ/ε) (see Ref. [19]). In step 5, the complexity is polynomial for the controlled rotation of the ancilla register state, as in Ref. [19]. The circuit depth of the density matrix exponentiation in step 6 of the (single) ancilla qubit is O(log(2)n) = O(τ2/ϵ) = O(1/ϵ3) [21,22], where ϵ is the simulation error for Y, as also the precision error of improved phase estimation of Y for which τ = O(1/ϵ). Here, ϵ or ϵ/2 is an error in trace distance, and so, determines the maximum error probability in simulation or estimation of Y [24]. The eigenvalue of ϱ for eigenstate |1〉 can be as low as O(1/(N − 1)!), that is indistinguishable from 0 with less than O(log((N − 1)!)) phase estimate qubits, that otherwise leads to τ ≥ O((N − 1)!) in usual phase estimation [1]. Since ϵ/2 is maximum error probability, we use ϵ = O(1/poly(N)), not ϵ = O(1/(N − 1)!), but with T = O((N − 1)!) ≤ O(2O(N2)) in improved phase estimation, so that τ = O(poly(N)), as long as the cumulative error probability of our algorithm is 1/3 or lower. The complexities of steps 7 and 8 can be ignored. So, the overall complexity of our algorithm is O(2 log(N)t), which yields O(log(N)ηκ/ϵ) ≤ O(N 3 log(N)κ/ϵ), taking ɛ = ϵ for simplicity, with ϵ = O(1/poly(N)), and η ≤ O(N 3) (since Ω/ϖ ≤ O(N 2s/s) = O(N 2) and ω/(2N) ≤ O(N 2/N) = O(N)). This is because the complexity of step 1 is dominant amongst all steps, if it suffices to have ϵ ≥ O(1/N). Also, κ = O(poly(N)), since it is geometrically unlikely, if not impossible, for a Hamiltonian cycle of N edges to be exponentially larger than another Hamiltonian cycle of N edges in a city-network graph, even with too-large edges, as each Hamiltonian cycle must visit all cities, including too-far ones. When there are e ≤ O(N2) missing edges, we have κ̂ = φ̂max/((e + 1)s − φ̂max), where φ̂max < ps + O(10(N − p)/θ)s with p = min(N, e) (since an edge at least 10 times the smallest edge is treated too-large), so that with θ = O(N2), we get κ̂ < O((N2p + 10(N − p))/(N2(e + 1 − p) − 10(N − p))) = O(poly(N)). Note that we need to classically precompute φ̂max by summing the N largest edges, for which we sort the data, that is expensive. If we use merge sort to sort the N C2 number of edges, the complexity is O(N C2 log(N C2)) = O(N 2 log(N)), which is less than O(N 3 log(N)κ/ϵ). But if ϵ ≤ O(1/N 2), the complexity of step 6 is dominant, so the overall complexity of our algorithm is at least O(1/ϵ3) = O(N 6). This is an exponential speedup over the brute-force complexity of O(N N) for undirected [1], or O(N 2N) for directed graphs.
Note that if there is a city that is very far from the rest of the cities in the network, every Hamiltonian cycle will need to traverse to that far-off city and back, since a Hamiltonian cycle must visit all cities. This is why it is geometrically unlikely for the ratio κ of the longest Hamiltonian cycle to the shortest Hamiltonian cycle to be exponential in N . However, it is not absolutely impossible. It is possible that one of the Hamiltonian cycles has a path to that far-off city via a highly zig-zag route or requires having to cross a mountain, for example. In this case, it is indeed possible for that Hamiltonian cycle to be exponentially larger than the shortest Hamiltonian cycle. Nonetheless, this is practically irrelevant, even if not absolutely unlikely, since the salesman is anyway not going to be crossing the mountain, as he would normally be aware of such an unusually long path that exists and would naturally avoid it anyway. Under these unusual or unlikely circumstances, our empirical method to guess C would yield a very poor estimate of C, and the algorithm complexity can become exponential in N .
We also mentioned that our empirical method rarely fails. See Ref. [23], where we indicate in red the rare cases that fail from among the many examples we consider, while in most cases, indicated in green, it succeeds. These rare cases are when there can be many eigenvalues φ̃ν, and not just the smallest one φ̃min, that are lower than 1/C, and therefore, can get picked up by our controlled rotation. Usually, such a situation may arise in cases of highly skewed data. Otherwise, as can be seen from the many successful cases, indicated in green, our empirical method usually yields an estimate of 1/C that is lower than, but very close to, the φ̃min we are after. This means that if we are unable to capture φ̃min, we need to repeat our algorithm upon slightly raising the estimated value of 1/C, until we are able to capture φ̃min. Since our empirical method normally yields an estimate of 1/C very close to φ̃min, it would usually take only a few repetitions of our algorithm, until the value of 1/C used exceeds φ̃min, so that we are able to capture it. Evidently, the number of repetitions required would depend on the size of the chosen increment in the value of 1/C in every iteration, compared to the value of φ̃min. The size of the increment needs to be chosen carefully, so that in trying to minimize the number of repetitions, we do not end up raising 1/C above multiple (unequal) φ̃ν’s and not just φ̃min, in which case the algorithm would not be able to capture the shortest Hamiltonian cycle only, and may, therefore, not yield the right answer in the end.
Appendix A gives an example of our entire algorithm.
We presented a bounded-error quantum polynomial-time (BQP) algorithm for solving the N P -hard traveling salesman problem. The worst-case brute-force time complexity for this problem is O(N N) for undirected graphs, and O(N 2N) for directed graphs, where N is the number of vertices in the graph. The quantum speedup known to be achievable for this problem is quadratic only, using Grover’s search, yielding a complexity of O(N N/2) for undirected graphs or O(N N) for directed graphs, which are still exponential in N. On the contrary, our algorithm presented here provides with an exponential quantum speedup, by solving the problem in quantum polynomial time. The overall time complexity of our algorithm is O(N 3 log(N) κ/ϵ + 1/ ϵ3), which is indeed polynomial, and not exponential, in N, where the errors ϵ need not be exponentially small, and can be polynomially small, as long as the overall cumulative probability of error of our algorithm does not exceed 1/3.