Grover’s quantum search algorithm is renowned for providing a quadratic speedup in solving unstructured search problems [1,2]. A notable application of this speedup is in addressing the string-matching problem [3,4], where the goal is to find a pattern P of length M within a given string L of length N. This problem is fundamental in computer science due to its wide-ranging applications [5], including information retrieval [6], plagiarism detection [7], text mining [8], intrusion detection [9,10], language translation [11], and DNA sequence analysis [12–15], among others.
A significant challenge in implementing the quantum string matching (QSM) algorithm is the requirement for fault-tolerant quantum computation due to the inherent fragility of quantum information. In fault-tolerant quantum computations, circuits are constructed using a universal and fault-tolerant gate set. The Clifford + T gate set is commonly employed in various leading fault-tolerant schemes [16,17]. Within these approaches, Clifford gates are relatively straightforward to implement, often through transversal methods. In contrast, T gates are resource-intensive and require techniques like magic state distillation, which significantly increase the implementation costs [18–21]. Consequently, optimizing the number of T gates (T-count) is crucial for the efficient execution of large-scale quantum algorithms, including QSM.
Several studies have developed QSM algorithms [3,4,22], with the most notable contribution being the work of Niroula and Nam [4]. Their work is particularly significant because it provides an explicit quantum circuit implementation of the QSM algorithm. In their circuit design, they introduce a component called the “cyclic shift operator,” which plays an essential role in constructing the Grover’s operator specific to the QSM. The construction of these operators in the QSM circuit is the primary factor that increases the T-count within the QSM algorithm.
In this paper, we propose a method to reduce the T-count for the QSM circuit, based on the circuit design proposed by Niroula and Nam [4]. The QSM circuit in Niroula and Nam’s work requires approximately 14N3/2 log2 N T gates, the majority of which arise from the synthesis of controlled-SWAP (Fredkin) gates in cyclic shift operators. They synthesized each Fredkin gate into Clifford + T gates using seven T gates (see Figure 1a). As part of our strategy to reduce the T-count, we introduce a new concept: the relative-phase Fredkin gate, inspired by the relative-phase Toffoli gate [23]. We synthesize a relative-phase Fredkin gate that contains only four T gates (see Figure 1b) and prove that each Fredkin gate in the QSM circuit can be replaced with any relative-phase Fredkin gate. As a result, we reduce the leading-order term of the T-count for QSM to 8N3/2 log2 N. Additionally, we demonstrate that our approach provides advantages in terms of reducing other circuit costs, such as the depth of T gates (T-depth) and the number of CNOT gates (CNOT-count).

Decomposition of the Fredkin gate and the relative-phase Fredkin gate into the Clifford + T gate set. (a) Decomposition of the Fredkin gate used in Ref. [4], which requires seven T gates. (b) Decomposition of the relative-phase Fredkin gate used in this study, which requires four T gates. Implementing the QSM algorithm with the relative-phase Fredkin gate significantly reduces the T-count, a critical factor in minimizing space-time costs in fault-tolerant quantum computing.
The circuit cost reduction presented in this work is achieved by incorporating relative-phase encoding into the quantum states of the QSM algorithm proposed by Niroula and Nam [4]. This approach is feasible because the relevant information about the string and pattern is encoded in basis states rather than amplitudes, preserving measurement outcomes performed in the computational basis. Furthermore, our relative-phase-based optimization strategy effectively reduces circuit costs not only for the QSM algorithm but also for a broader class of quantum algorithms, particularly those employing controlled quantum gates such as Fredkin or similar controlled permutations.
Grover’s algorithm is a quantum algorithm that offers a quadratic speedup over classical search algorithms in unstructured search problems [1,2]. This section briefly explains Grover’s algorithm using general quantum amplitude amplification [2].
Consider a Boolean function f : {0,1,…, 2n – 1} → {0,1}. Within an n-qubit system, two subspaces, G and B, are defined as follows: G ≡ span({|x〉 | f (x) = 1}) and B = span({|x〉 | f (x) = 0}). Note that In = PG + PB, where PG and PB are projectors onto G and B, respectively. Given a unitary operator A acting on the n-qubit system, A|0〉n = |ψ〉 can be expressed as a linear combination of PG|ψ〉 and PB|ψ〉 as shown in Eq. (1). Here, the states |ϕg〉 and |ϕb〉 are normalized states of PG|ψ〉 and PB|ψ〉, respectively.
In this study, we call the operator A the initialization operator.
The primary goal of Grover’s algorithm is to find (or measure) the state in G with high probability. The operators utilized in Grover’s algorithm are defined in Eqs. (2) to (5):
We refer to operator Q as Grover’s operator. The process of Grover’s algorithm can be delineated as follows: First, an n-qubit is set to state |0〉n, after which the operator A is applied to the state |0〉n. Next, Qk is applied to the state A|0〉n = |ψ〉, where k = ⌊π/4θ⌋. Finally, a measurement is conducted on the n-qubit state. The post measurement state becomes a state in G with a probability higher than max(1 – sin2 θ, sin2 θ) [2].
This section describes the QSM algorithm and its circuit representation described in Ref. [4]. String matching aims to locate a pattern P of length M in a given string L of length N. For simplicity of analysis, we assume P and L to be binary digit strings and N = 2n throughout this study.
The core subcircuit of the algorithm is the cyclic shift operator C. The cyclic shift operator C on the (n + N)-qubit system is defined as Eq. (6), where |k〉n is a computational basis state, the additions are modulo N additions, and each xi is a binary number 0 or 1.
To construct the circuit of this operator, the cyclic shift operator C is divided into operators Ck, which are defined as follows.
Constructing a C2j operator requires a maximum of N – 1 controlled-SWAP (Fredkin) gates. The process of constructing the C2j operator is divided into n stages. In the first stage, N/2 digits are moved to the right place using N/2 Fredkin gates. In the second stage, N/4 digits are moved to the right place using N/4 Fredkin gates, and so on. The total number of Fredkin gates used to construct the C2j operator is at most N/2 + N/4 + … + 1 = N – 1 (see Figure 2). Optionally, the Fredkin gates at each stage of constructing a C2j operator can be parallelized using N/2 – 1 ancilla qubits and fan-out operations with N – 2 CNOT gates. The exact number of Fredkin gates required to construct the cyclic shift operator C can be found in Appendix A.

Construction of a C20 operator. In this figure, we show a C20 circuit when N = 8.
To implement Grover’s algorithm, the initialization operator A such that A|0〉(n+N+M) = |ψ〉 needs to be defined. The process of applying A to state |0〉(n+N+M) is as follows (see Figure 3a):
- (i)
Prepare three quantum registers with n, N, and M qubits. We call the n-qubit register the first register, N-qubit register the second register, and M-qubit register the third register. All the states were initialized to state |0〉.
- (ii)
Apply H⊗n to the first register and apply X gates to encode L and P in the second and third registers, respectively. Then, the entire state becomes state |ψ1 in Eq. (9). |L〉N and |P〉M in Eq. (9) are defined in Eqs. (10) and (11), respectively, where each lj and pj is a binary number.
9 \left| {{\psi _1}} \right\rangle = {1 \over {\sqrt N }}\sum\limits_{k = 0}^{N - 1} {|k{\rangle _n}|L{\rangle _N}|P{\rangle _M}} 10 |L{\rangle _N} = \left| {{l_0}{l_1}{l_2} \ldots {l_{N - 1}}} \right\rangle 11 |P{\rangle _M} = \left| {{p_0}{p_1}{p_2} \ldots {p_{M - 1}}} \right\rangle - (iii)
Apply cyclic shift operator C to the first and second registers. Subsequently, the state evolves into state |ψ2〉. in Eq. (12), where the additions are modulo N additions.
12 \left| {{\psi _2}} \right\rangle = {1 \over {\sqrt N }}\sum\limits_{k = 0}^{N - 1} {|k{\rangle _n}\left| {{l_{0 + k}}{l_{1 + k}}{l_{2 + k}} \ldots {l_{N - 1 + k}}} \right\rangle \left| {{p_0}{p_1}{p_2} \ldots {p_{M - 1}}} \right\rangle } \; - (iv)
Apply M CNOT gates to the first M qubits in the second register and to all qubits in the third register. For the CNOT operations, each qubit in the second register serves as the control qubit, and each qubit in the third register serves as the target qubit (see Figure 3b). Subsequently, the state evolves into state |ψ〉 as described in Eq. (13).
13 \matrix{ {|\psi \rangle } \hfill & { = {1 \over {\sqrt N }}\sum\limits_{k = 0}^{N - 1} | k{\rangle _n}\left| {{l_{0 + k}}{l_{1 + k}} \ldots {l_{N + k - 1}}} \right\rangle } \hfill \cr {} \hfill & {\left| {\left( {{p_0} \oplus {l_{0 + k}}} \right)\left( {{p_1} \oplus {l_{1 + k}}} \right) \ldots \left( {{p_{M - 1 + k}} \oplus {l_{M - 1 + k}}} \right)} \right\rangle } \hfill \cr }

Construction of Grover’s operator Q = AR0A−1Rg for QSM algorithm. (a) Structure of the initialization operator A. XL and XP represent X gates to encode L and P, respectively. C represents the cyclic shift operator. XOR transforms state |ψ2〉 in Eq. (12) into state |ψ〉 in Eq. (13). (b) Example of constructing an XOR operation when N = 4 and M = 2. (c) Structure of Grover’s operator Q = AR0A−1Rg for the QSM algorithm. Operators R0 and Rg can be constructed using multi-qubit controlled-Z gates and single-qubit gates. Repeated application of Grover’s operator amplifies the amplitude of the desired matching states.
In the remainder of this study, we use the notation in Eqs. (14) and (15).
We define the state in G as the state in which the third register is in state |0〉M.
As described earlier, Grover’s operator Q is given by AR0A−1Rg (see Figure 3c). If amplitude amplification is carried out using Grover’s operator Q = AR0A−1Rg on the n + N + M qubits, with operator A defined as in Figure 3a, the pattern P can be identified within the given string L. R0 and Rg can be constructed using X gates and a multi-qubit controlled-Z gate. A k-qubit controlled-Z gate is decomposed using two H gates and a k-qubit Toffoli gate. In Ref. [4], multi-qubit Toffoli gates are decomposed using relative-phase Toffoli gates in Ref. [23]. The calculations regarding the size and cost of R0, which is not explicitly presented in Ref. [4], can be found in Appendix B.
First, we define the relative-phase Fredkin gate analogously to the relative-phase Toffoli gate [23]. The Fredkin gate, also known as the controlled-SWAP gate, has a matrix representation on a computational basis given by
A Fredkin gate can be constructed using a Toffoli gate and two CNOT gates (see Figure 4a) [24]. Similarly, a relative-phase Fredkin gate can be constructed using a relative-phase Toffoli gate and two CNOT gates (see Figure 4b). This construction can be easily demonstrated through a straightforward matrix multiplication:

Decomposition of Fredkin gate and relative-phase Fredkin gate. (a) Construction of a Fredkin gate using a Toffoli gate and two CNOT gates reported in Ref. [24]. (b) Construction of a relative-phase Fredkin gate using a relative-phase Toffoli gate and two CNOT gates.
Using this method, as shown in Figure 1b, we constructed a relative-phase Fredkin gate from the relative-phase Toffoli gate described in Ref. [23]. The relative-phase Fredkin gate in Figure 1b has a matrix representation of
Next, we prove that any relative-phase Fredkin gate can replace the Fredkin gates used to construct the cyclic shift operator C in the QSM algorithm.
Any quantum circuit Xr composed solely of relative-phase Fredkin gates can be represented by Eq. (17), where X is a quantum circuit in which all relative-phase Fredkin gates in Xr are replaced with Fredkin gates, and Dr and Dl are circuits with diagonal matrix representation.
Proof. It is sufficient to verify Eq. (18), where RFred represents an arbitrary relative-phase Fredkin gate, Fred represents a Fredkin gate, and D1 and D2 represent circuits with a diagonal matrix representation.
Basic matrix multiplication can prove Eq. (18):
Even if arbitrary relative-phase Fredkin gates replace the Fredkin gates in constructing the cyclic shift operator C in the QSM algorithm described in Ref. [4], the algorithm yields the same results.
Proof. Assume that we replace all Fredkin gates in cyclic shift operator C with relative-phase Fredkin gates. We express operators A and C with relative-phase Fredkin gates as A′ and C′. By Lemma 1, C′ can be written as C · D, where D is an (n + N)-qubit operator with a diagonal matrix representation. Subsequently, A|0〉n |0〉N|0〉M and A′|0〉n|0〉N|0〉M evolve into the states in Eqs. (20) and (21), where each zk is a complex number with norm 1.
Note that θ = θ′ because states |k〉n|Lk)N|Pk)M with different k values are orthonormal. Therefore, Grover’s algorithm using A′ gives the same results as the case using A.
Based on Theorem 1, we can replace every Fredkin gate in the QSM algorithm with the relative-phase Fredkin gate of Figure 1b. Seven T gates are required to synthesize a Fredkin gate into Clifford + T gates (see Figure 1a). However, only four T gates are required to synthesize the relative-phase Fredkin gate of Figure 1b. Therefore, with this replacement, the cyclic shift operator C′ is decomposed into Clifford + T gates using 4(N log2 N – N + 1) T gates. This results in reducing the leading-order term of the T-count in the QSM algorithm from 14N3/2 log2 N to 8N3/2 log2 N, assuming that N1/2 Grover iterations (the number of executions of the Grover operator) are implemented.
To validate the proposed QSM circuit, we simulated the QSM algorithm using the quantum simulator in Qiskit [25]. As shown in Figure 5, the proof-of-principle results exhibit excellent agreement with the expected outcomes of Grover’s search algorithm. Our designed circuit targeted the pattern “11” within the data sequence “00110000”. We varied the number of Grover iterations from 0 to 9 and compared the probabilities of finding the correct answer between the theoretically ideal Grover’s search algorithm and the QSM algorithm implemented with the relative-phase Fredkin gate. Each QSM simulation was executed 10,000 times to estimate the probabilities.

Comparison of the theoretical probabilities of Grover’s search and the results of the QSM simulation with relative-phase Fredkin gates. Each QSM simulation was executed 10,000 times to estimate the probabilities. The designed circuit was intended to identify the pattern “11” within the data sequence “00110000”. The x-axis shows the number of Grover iterations, and the y-axis displays the corresponding probabilities. The red line plot with “x” markers represents the theoretical probabilities of Grover’s search algorithm functioning correctly, while the blue bar graph indicates the simulation results of the QSM algorithm with relative-phase Fredkin gates. The simulation data aligns closely with the theoretical expectations, demonstrating the correctness of the QSM algorithm with relative-phase Fredkin gates.
In this section, we present the T-count required for the implementation of the QSM algorithm with relative-phase Fredkin gates. When Grover’s operator is repeatedly implemented N1/2 times, the leading order term of the total T-count for the QSM algorithm is reduced from 14N3/2 log2 N to 8N3/2 log2 N. Additionally, we observe further T-count reductions in each iteration of A′R0(A′)−1, where the primed operators refer to those using the relative-phase Fredkin gates. To construct C′, we sequentially combine

Circuit of A′R0(A′)−1. The gates within the dashed green box are cancelled out, resulting in a reduction of 2N T-count.
The introduction of the proposed relative-phase Fredkin gate for the QSM algorithm also benefits the T-depth by allowing the parallelization of the T gates without additional ancilla qubits for the construction of the cyclic shift operator. The QSM algorithm described in Ref. [4] parallelizes Fredkin gates in the cyclic shift operator by using N/2 – 1 ancilla qubits and fan-out operations with CNOT gates, which leads to the parallelization of the T gates. In contrast, the use of relative-phase Fredkin gates allows for automatic parallelization of T gates without the need for fan-out operations or additional ancilla qubits, as depicted in Figure 7. Compared to the Fredkin gate in Figure 1a with a T-depth of five, the relative-phase Fredkin gate in Figure 1b has a T-depth of four. Consequently, the T-depth required for the cyclic shift operator is reduced to 4/5 of the result in Ref. [4] without additional ancilla qubits. The use of relative-phase Fredkin gates also offers advantages in terms of other circuit costs, such as the number of CNOT gates. In Table 1, the circuit cost of the QSM algorithms proposed in this paper is compared against those previously reported.

Automatic parallelization of T gates in constructing the cyclic shift operator when using the relative-phase Fredkin gate shown in Figure 1b.
The table presents the required resources for the QSM algorithms, including T-count (the number of T gates), T-depth (the depth of T gates), CNOT-count (the number of CNOT gates), and Qubit-count (the number of qubits). Here, N represents the size of the database to be searched, and M represents the size of the pattern. For the comparison, we assume that N1/2 Grover iterations are implemented.
| Source | Ref. [4] | This Work |
|---|---|---|
| T-count | 14N3/2 log2 N – 14N3/2 + 7N log2 N – 7N + 8N1/2 log2 N + N1/2(8M – 20) + 7 = 14N3/2 log2 N – O(N3/2) | 8N3/2 log2 N – 10N3/2 + 4N log2 N – 4N + 8N1/2 log2 N + N1/2(8M – 26) + 1 = 8N3/2 log2 N – O(N3/2) |
| T-depth | ||
| CONT-count | 16N3/2 log2 N – 14N3/2 + 7N log2 N – 7N + 10N1/2 log2 N + N1/2(8M – 10) + M + 7 = 16N3/2 log2 N – O(N3/2) | 10N3/2 log2 N – 10N3/2 + 5N log2 N – 5N + 6N1/2 log2 N + N1/2(8M– 14) + M + 5 = 10N3/2 log2 N – O(N3/2) |
| Qubit-count | N + log2 N + M |
One might argue that the presence of an O(N3/2) term in the second dominant component of the T-count, appearing in both our result and Ref. [4], could diminish the significance of the improvement achieved in the leading coefficient. However, the difference in T-count between the two approaches is given by N3/2(6 log2 N – 4) + O(N log N), which remains positive even for relatively small values of N and continues to grow as the problem size increases. As illustrated in Figure 8, the T-count difference between our proposed approach and the one in Ref. [4] becomes increasingly significant with larger values of N. Given that the QSM algorithm is expected to exhibit quantum advantage primarily for large-scale instances, this trend indicates that the proposed method offers progressively greater benefits as the input size grows.

Comparison of the T-counts for the QSM algorithm. The red dashed line indicates the T-count derived in this study, while the blue solid line corresponds to the T-count reported in Ref. [4]. For consistency in comparison, the pattern size M is fixed at 100.
The string-matching problem is a widespread challenge in computer science. As the scale of the problem increases, it becomes essential to enhance the efficiency of algorithms by reducing computational costs. In this paper, we have reduced the T-count of the QSM algorithm from 14N3/2 log2 N – O(N3/2) to 8N3/2 log2 N – O(N3/2). Additionally, our approach reduces other circuit costs of the QSM algorithm, including T-depth. Given that T gates dominate the cost of implementing fault-tolerant quantum circuits, our results may expedite the utilization of quantum advantages in solving string-matching problems, potentially impacting fields such as bioinformatics, text processing, and cryptography.
Our primary strategy for reducing circuit costs is to incorporate relative-phase encoding into the quantum states used in the QSM algorithm presented in Ref. [4]. This approach is feasible because the QSM algorithm from Ref. [4] stores information about the string and pattern in basis states rather than in quantum amplitudes. We anticipate that this circuit cost reduction strategy could be broadly applicable to various quantum algorithms. A practical example involving quantum image processing is provided in Appendix C.
To facilitate relative-phase encoding, we introduced the concept of the relative-phase Fredkin gate, inspired by the relative-phase Toffoli gate [23]. Given the extensive use of relative-phase Toffoli gates in optimizing quantum circuits [23,26–28], we expect relative-phase Fredkin gates to have similarly broad applications in quantum circuit optimization. An illustrative example of this application is presented in Appendix D.
By employing the Toffoli gate implementation method presented in Ref. [29] and the Fredkin gate synthesis method presented in Ref. [24], it is possible to execute a Fredkin gate using four T gates. This method can achieve a T-count reduction comparable to our approach. However, this Fredkin gate implementation method necessitates measurement, classically controlled gates, an ancilla qubit, and additional Clifford gates. In contrast, our method, which utilizes the relative-phase Fredkin gate shown in Figure 1b, employs a straightforward circuit synthesis approach. This enables us to reduce the T-count without incurring additional circuit costs. One might also argue that using the method from Ref. [26] allows a Fredkin gate to be implemented with a single T-depth, thereby further reducing the T-depth of the QSM algorithm. However, while this method can create a single Fredkin gate with a lower T-depth, the T gates of different Fredkin gates are not parallelized, ultimately resulting in a higher T-depth compared to our method.
Recently, K. Prousalis et al. proposed a quantum string-matching algorithm based on quantum image processing techniques, where string-matching is performed using a quantum-generated dot-matrix image followed by quantum median filtering to detect similarity patterns [30]. Their approach achieves a time complexity of O(N2) and a space complexity of O(log N) for a string of length N, demonstrating significant improvements in memory efficiency compared to previous quantum methods. They utilized the NEQR (novel enhanced quantum representation) framework [31], which encodes image information in basis states rather than in quantum amplitudes. Therefore, our approach, which incorporates relative-phase encoding in quantum states, may also be applicable to the NEQR-based QSM algorithm proposed by Prousalis et al. [30], potentially reducing quantum circuit complexity in their framework as well. Exploring this possibility would be an interesting direction for future research.