Where are we with Shor’s algorithm?

was proposed by Peter Shor in a seminal paper [1] in 1995 as an algorithm for factoring large numbers using quantum computing. In 2025, i.e. 30 years later, early quantum processors are produced and made available to the public, making it possible to test the algorithm for real. Can we successfully run Shor’s algorithm on large numbers? The answer is well-known to be no, because this would require quantum processors with thousands of qubits and very low quantum noise, and we are not there yet. But what about small numbers? And, by the way, how do we implement Shor’s algorithm concretely?
In this post we give a guide to the implementation of Shor’s algorithm, with a special emphasis on the realisation of the order-finding quantum circuit and the modular arithmetic computations that are at the core of the algorithm. We link a git repository with the full implementation in Qiskit. Finally we run some simulations and provide the results of actual computations on IBM quantum hardware, as of 2025.
In order to follow this post, it would be ideal if you know some basics of quantum computing, namely qubit manipulations and standard unitary gate operations, and a bit of linear algebra.
Disclaimer 1: Although I aim at a scientific level of rigorousness, this is not an academic paper and I do not pretend to be an expert in the field.
Disclaimer 2: No artificial intelligence was used to produce this post.
A breakthrough algorithm for number factoring
Shor’s algorithm is an algorithm which aims at finding a non-trivial factor of an integer N. While it is easy to find a factor for a small enough N, by trying possible values up to N1/2 for instance, the problem becomes impractical when N is really big, like N ~ 21000, simply because the search space becomes too big. A human lifetime would not suffice to find a factor of such a large integer, using a classical computer.
If we denote by n = ⌈log2(N)⌉ the number of digits to write N in binary notation, the best known classical algorithm has time complexity exp(c n1/3 (log n)2/3), for some constant c, i.e. exponential time complexity. By “classical” algorithm one means an algorithm which can be run on a traditional computer, namely an algorithm which is made of arithmetic operations on numbers. This is basically all algorithms, except quantum algorithms.
Shor’s algorithm is a quantum algorithm, which means that it involves operations on a quantum system, in this case on qubits. Moreover, it is a probabilistic algorithm, which means that it is guaranteed to work with high probability, but occasionally it can fail.
Shor’s algorithm is able to find a factor of N in time complexity O(n2 log n) with high probability, in its most efficient implementation, i.e. in polynomial time complexity. This is probably the most impressive example of a quantum algorithm beating classical algorithms, so far.
Moreover, it happens that the problem of factoring large integers is at the heart of public key cryptography and the RSA algorithm, which is widely used today to secure Internet communications. Shor’s algorithm, if successfully run on a quantum computer holding a few thousand qubits, could break this encryption scheme. This makes it one of the most popular applications in quantum computing.
While we are still somewhat far from having a quantum computer with the capacity to run Shor’s algorithm on large numbers, the current progress makes it possible to try it on small numbers. So it is about time to take a deeper look at Shor’s algorithm, see how it works, how we can actually implement it and run it on quantum hardware.
Taking advantage of the IBM quantum platform, which allows for a limited amount of free access to real quantum processors (QPUs), I have implemented the full algorithm with the Qiskit SDK and tried it on small numbers. The full implementation is available in this qiskit-shor repository.
Let’s dive now into the algorithm.
The algorithm
Introductions to Shor’s algorithm are numerous on the web (see e.g. Wikipedia). We will only write down the algorithm itself and explain the ideas involved, but we will not explain why it works, nor give a proof of every statement.
Consider a composite integer N, i.e. an integer which admits non-trivial prime factors. Also require that N is odd and is not the power of a prime number (in those cases, we can easily find a non-trivial factor).
Then, the algorithm steps are:
- Choose a random integer 1 < A < N and compute gcd(A, N). If gcd(A, N) > 1, then gcd(A, N) is a non-trivial factor of N and we are done (lucky case). Otherwise A and N are coprime (typical case).
- Find the order of A in ZN, i.e. the smallest integer 1 < r < N such that Ar = 1 mod N, using the phase estimation quantum algorithm (see below).
- If r is odd, go back to step 1 and choose another A. Otherwise, compute d = gcd(Ar/2 − 1, N). If d > 1, then d is a non-trivial factor of N. If not, start again in step 1.
Peter Shor proposed this algorithm in a seminal paper [1] and proved that it finds a factor of N in polynomial time, with high probability.
Among the steps listed above only the order finding step requires a quantum operation. The other steps are classical operations, which are easily performed on a classical computer.
The quantum algorithm consists in applying unitary operations on a set of qubits representing numbers in binary notation. The integer
[ x = sum_{i=0}^{m-1} x_i 2^i, quad x_i in {0,1} ,,]
is represented by the m-qubit state
[ |xrangle = |x_0rangle |x_1rangle …|x_{m-1}rangle ]
For instance, the number 13 = 20 +22 + 23 in represented by the “quantum integer” (4-qubit state) |13⟩ = |1⟩|0⟩|1⟩|1⟩.
To find the order r of the integer A in ZN, the idea of Shor’s algorithm is to leverage the fact that the unitary operator
[ U_A : x rightarrow Ax ,, text{mod} ,, N ]
admits eigenvalues of the form exp(2πi j/r), with 0 ≤ j < r, and that the vector |1⟩ (the “quantum integer 1”) can be expressed as a linear combination of the corresponding eigenvectors. The algorithm tries to estimate at least one of the fraction j/r. This is referred to as phase estimation, as we estimate the phase Φ = 2π j/r of an eigenvalue exp(iΦ) of the operator UA.
We will provide the actual quantum circuit that needs to be run in the next section. The quantum algorithm consists in running this circuit and measuring its m output bits, forming a number k in binary notation. This number k is such that k/2m should be close to j/r for some 0 ≤ j < r , with high probability. Unless we end up in some unlucky case (like j=0), we can extract the value of r by finding the fraction of the form a/b, with b < N, closest to k/2m and identifying r=b (there are libraries doing this efficiently). Typically, one would run the order finding circuit and measure its outcomes many times to find r with very large probability. The expectation is that the distribution of outputs k/2m should have peaks (of same magnitude) at all the values j/r, for 0 ≤ j < r.
All the classical operations are easy to implement and they can be run in log N time, so we only need to worry about the realisation and running of the order-finding quantum circuit. Let’s see how it looks.
Order-finding quantum circuit
The order-finding circuit is depicted in the following figure. Several modifications can be brought to the circuit to improve its efficiency, but let’s leave this for later.
There are two groups of qubits: one group of n qubits initialised to the quantum integer state |1⟩, i.e. |x⟩ with x=1, called the target qubits (lower qubits in the figure), and another group of 2n qubits, called the control qubits (upper qubits in the figure). The control qubits are initialised to the superposition of all integer states, using Hadamard gates,
[ (H|0rangle)^{otimes 2n} = left( frac{|0rangle + |1rangle}{sqrt 2}right)^{otimes 2n} = frac{1}{2^n} sum_{x =0}^{2^{2n}-1} |xrangle ]
and are then used as control qubits for unitary operations UD acting on the target qubits sequentially,
[
CU_{D}|brangle|xrangle = left{
begin{align}
& |0rangle , |xrangle & text{if $b = 0$} \
& |1rangle , |Dx, text{mod}, Nrangle & text{if $b = 1$}
end{align}
right.
,,
]
with D taking values
[ D = A^{2^i} , text{mod} , N ,, quad i = 0, …, 2n-1, ]
Finally the control qubits are transformed by an inverse QFT gate and are all measured (indicated by circles with “m” in the figure).
One could replace the number of control qubits 2n by another number of control qubits m. The more control qubits, the better will be the estimated phase k/2m of the UA operator. The choice m=2n is a compromise between precision and circuit size.
Why this circuit performs phase estimation is the topic of many introductions to Shor’s algorithm. The IBM lecture is a very good place to study this topic.
In the above circuit, all ingredients are easily implemented in the Qiskit SDK, except for the unitary operator UA (and its controlled version CUA). This is where all the difficulty stands. The majority of academic research on Shor’s algorithm revolves around optimising the implementation of UA. The above depiction of the order finding circuit suggests that 3n qubits are enough to implement the quantum algorithm, but the UA gate itself actually requires n + c extra qubits, termed ancillary or ancilla qubits (the exact constant c depends on the chosen implementation).
Before discussing the circuit implementation of UA, let us mention that there is an important simplification of the above circuit. The 2n control qubits can be replaced by a single qubit (!), which undergoes a series of unitary gate actions, measurements and reset, leading to a large reduction in the number of qubits required to run Shor’s algorithm, but at the price of introducing control flow operations, namely gate actions depending on intermediate measurements. While this may look like a cheap price to pay in theory, in practice control flow gates are operationally more complex to handle than standard unitary operations. The “one-control-qubit” circuit is depicted in the following figure.

where mi ∈ {0, 1} are successive measurement values of the unique control qubit, and Xm is the X gate (= NOT gate) if m=1, or the identity gate if m=0. Applying Xm resets the control qubit to |0⟩. The Ri are phase gates (= RZ rotation gates) with phase parameters depending on all the previous measurement values. The details and the explanations for this simplification can be found, for instance, in [3].
The UA gate and quantum modular arithmetics
Note: This section is more technical. Some readers might want to skip this part on a first read and go directly to the next section.
Let us now focus on the circuit implementing the unitary action
[ U_A : x rightarrow Ax ,, text{mod} ,, N ]
The insightful reader might ask whether this is really a unitary operation, as the operation does not look very bijective, due to the modulo N operation. This is quite an important point, as, using only unitary gates, we can only implement unitary operations. The fact is that UA, acting on the space of integers in [0, N), is a bijective function, if and only if A and N are coprime integers, which is true for the values considered in Shor’s algorithm. The inverse operation is UB, with B the inverse of A in ZN, i.e. B is an integer such that BA = 1 mod N. The fact that A and N are coprime guarantees the existence of B.
So it is theoretically possible to build a unitary operator UA acting on the space S spanned by the vectors |x⟩, with 0 ≤ x < N. In the order-finding circuit, a sequence of (controlled) UD operations, with various values of D, are applied to the initial state |1⟩, which is in S. The action of UA on states |x⟩, with x ≥ N, does not matter, as this case will not occur. We can ignore how our UA gate circuit acts on those larger integer states.
To keep the presentation short enough, we will only present the most important features of the implementation of UA, leaving some pointers to relevant papers to the interested reader.
The important building block that needs to be implemented is the modular addition:
[ text{add}(Y,N): quad |xrangle rightarrow |(x+Y) , text{mod} , N rangle ]
with 0 ≤ Y < N a “classical” integer, i.e. a given parameter, and 0 ≤ x < N a “quantum” integer, i.e. an integer represented by the quantum state |x⟩, as described in previous sections. To implement this operation, we need at least n = ⌈log2(N)⌉ qubits to represent quantum integers modulo N, so we will assume that n is the size of the quantum register we work with, i.e. the number of qubits holding x. This means we can represent quantum integers between 0 and 2n-1.
There are two “schools of thought” for implementing this operation. The “Clifford+T” approach uses only the gates NOT, H, S, S-1, CNOT and Toffoli, while the “Quantum Fourier Transform” (QFT) approach performs addition via parametrised phase gates P(λ) on the Fourier space representation of the input integer (more on this below).
The Clifford+T approach essentially uses a somewhat straightforward “school-boy” procedure, adding the bits of two quantum numbers written in binary notation and keeping track of carry-over units in ancilla qubits. In the implementation [6], the whole procedure requires around 10n gates and one ancilla qubit, and has depth roughly 2n (these notions are discussed in the section on algorithm complexity below). This approach can be adapted to adding a classical and a quantum number.
The QFT approach was proposed in the work of Draper [4]. It starts by acting on |x⟩ with a QFT gate, which is composed of H and P(π/2j) gates, producing a superposition of states
[ QFT|xrangle = frac{1}{2^{n/2}} sum_{y=0}^{2^n-1} e^{2ipi frac{xy}{2^n}} |yrangle ]
In this representation, the addition of a classical number Y can be implemented with n phase rotation gates acting on the n qubits of the register
[ prod_{i=0}^{n-1} Pleft(2pi Yfrac{2^i}{2^n}right) QFT|xrangle = QFT|(x + Y) , text{mod} , 2^n rangle ]
The strategy is thus to sandwich phase rotations between a QFT and a QFT-1 to implement the addition
[ |(x + Y), text{mod}, 2^n rangle = QFT^{-1} prod_{i=0}^{n-1} Pleft(2pi Yfrac{2^i}{2^n}right) QFT |xrangle ]
The implementation of controlled addition is obtained by replacing phase gates by controlled phase gates.
While the number of elementary gates in the QFT and QFT-1 operations is large, O(n2) actually, the subsequent addition in Fourier space requires only n phase gates which can operate in parallel. Several additions, or controlled additions, can be implemented sequentially, along the execution of a quantum circuit, while the quantum integer x is represented in Fourier space, via a sequence of (controlled) phase rotations. Moreover, the whole QFT-adder gate does not require ancilla qubits. This makes the QFT approach a preferred choice in many projects requiring quantum adder circuits.
A recent review by S. Wang et al [5] compares the different state-of-the-art algorithms for quantum arithmetics and provides an extensive bibliography on the topic.
It is important to note that the addition operation described above is always modulo 2n. This is as expected, since we can only represent integers in the range [0, 2n) with n qubits. So the operation that is implemented so far is
[ text{add}(Y): quad |xrangle_n rightarrow |x+Yrangle_n := |(x + Y), text{mod}, 2^n rangle_n ]
where the subscript n indicates the number of qubits in the quantum register.
The next step is to implement the “modulo N” part. This is substantially harder and requires a series of (controlled) additions and (controlled) subtractions. The basic ideas have been described in the work of Beauregard [2].
The steps proposed by Beauregard, with 0 ≤ Y < N and 0 ≤ x < N, are:
- With n = ⌈log2(N)⌉, consider the n+1 qubit state |x⟩n+1. This is one more qubit than necessary to hold x, i.e. there is one “overflow” qubit initialised to |0⟩.
- Add Y-N, using the add(Y-N) operation. This yields |x+Y-N⟩n+1 if x+Y-N≥ 0, or |2n+1-x+N)⟩ if x+Y-N < 0 .
- Compute the sign of x+Y-N in an ancilla qubit |a⟩. This is done via a CNOT gate acting on |a⟩ and controlled by the most significant bit (i.e. the overflow bit) of the n+1-qubit register. After this step |a⟩ = |0⟩ if x+Y ≥ N and |a⟩ = |1⟩ if x+Y < N.
- Add N back to n+1-qubit register, controlled on |a⟩, using a controlled add(N) operation. The resulting state is |(x + Y) mod N⟩|a⟩.
The following steps allow to reset the ancilla qubit to its initial state |0⟩:
- Add -Y using an add(-Y) operation. After this step the n+1-register is in the state |x⟩ if x + Y < N, or in the state |2n+1+x-N)⟩ if x+Y ≥ N.
- Reverse the ancilla bit controlled on the most significant bit of the n+1-qubit register with a CNOT gate. After this step the ancilla bit is always in the state |a⟩ = |1⟩. We reset it to |0⟩ by acting with a NOT gate: |a⟩ = NOT|1⟩ = |0⟩.
- Add Y back using an add(Y) operation. This leads to the final state |(x + Y) mod N⟩|0⟩.
This may look a little involved, but this is arguably the simplest quantum algorithm found so far to perform modular addition.
The last three steps, reverting the ancilla qubit |a⟩ back to its initial state |0⟩, are important, if the ancilla qubit is to be re-used in subsequent modular addition operations, which is the case in Shor’s algorithm. Alternatively, one could use new ancilla qubits for each modular addition, reducing the depths and gate counts of the full order-finding circuit, at the cost of increasing the number of qubits. Minimising the number of qubits is usually the primary objective, as current quantum processors have a limited number of qubits (more on this below).
The implementation of the controlled modular addition is obtained by replacing each addition/subtraction of a factor Y, by a controlled addition/subtraction.
The next step is to implement the operation
[ |xrangle |yrangle -> |xrangle|(y+Ax) ,text{mod}, Nrangle ]
where 0 ≤ y < N is another quantum integer. This is done by decomposing the modular addition y ↦ (y + Ax) mod N into a sequence of modular additions y ↦ (y + 2iA) mod N, implemented as add(2iA, N)|y⟩ = |(y + 2iA) mod N⟩, controlled on the i-th bit |xi⟩ of x = Σi xi 2i. As we saw, the modular addition requires an overflow qubit and an ancilla qubit, so the true operation we implement with this prescription is
[ V_A: |xrangle_n |yrangle_{n+1} |0rangle rightarrow |xrangle_n |(y+Ax) , text{mod}, Nrangle_{n+1} |0rangle ]
with subscripts indicating the number of qubits.
Finally the UA operation is realised with
[
begin{align}
U_A , : quad
|xrangle_n |0rangle_{n+1} |0rangle &rightarrow |xrangle_n |Ax,text{mod}, Nrangle_{n+1} |0rangle \
&rightarrow |Ax,text{mod}, Nrangle_n |xrangle_{n+1} |0rangle \
&rightarrow |Ax,text{mod}, Nrangle_n |0rangle_{n+1} |0rangle
end{align}
]
The first step is the VA operation. The second step uses SWAP gates to exchange the two sets of n qubits. Note that the overflow qubit of the middle state is not swapped (it is in the |0> state at this stage). The third step is the V-B operation with B = A-1, the inverse of A in ZN, using the fact that (x – BAx) mod N = (x – x) mod N = 0. This is the first and only place where the condition that A and N are coprime integers is used (otherwise B does not exist).
Counting the number of qubits used, we see that the full UA operation requires 2n+2 qubits.
Quantum complexity analysis
The “complexity” of a quantum algorithm is usually expressed in terms of three quantities: the number of qubits used, the number of gates used and the depth of the circuit.
The number of qubits used is a clear notion. It is important as quantum processors still have a limited number of qubits to work with (typically in the hundreds as of 2025).
The number of gates is a more fuzzy notion and often refers to the number of single-qubit and two-qubit gates such as H, NOT, P, SWAP, CNOT. This can be deceptive as the physical implementation of those gates may require several “physical” gates, representing qubit operations implemented at the hardware level. Typically two-qubit gates, such as SWAP and CNOT, between distant qubits require a chain of physical two-qubit gates acting on neighbouring qubits. The number of physical gates depends on the “connectivity” of the quantum processors, namely which two-qubit operations are allowed physically. Those correspond to operations on nearby qubits. Even single-qubit gates, such as the Hadamard gate, may require several physical single-qubit gates.
Since the set of physical gates and the hardware connectivity vary from one provider to the other, most of academic research simply focuses on counting the number of “basic” single-qubit and two-qubit gates.
The depth of the quantum circuit refers to the number of sequential operations on the qubits necessary to reach the final measurement step. The more parallelised operations on qubits there are, the lower is the depth. The depth can be roughly visualised as the length of the circuit in a pictorial representation. It is important to minimise the depth of a quantum circuit because the quantum coherence of qubits diminishes with the depth. The more sequential operations there are, the longer it takes to run the circuit before measuring its outcomes and the more time there is for qubits to interact with the environment and loose their coherence.
If we take a look at our implementation of UA , we see that it requires (at least) 2n+2 qubits. The number of gates we used is O(n3) (a factor n2 comes from using the QFT gate) and the depth is O(n2) (a factor n comes from using the QFT gate).
The total order-finding circuit requires one more ancilla qubit, if we use the 1-control qubit simplification, and performs O(n) sequential UA operations, so the total number of qubits required is 2n+3 (4n+2 if we use the circuit with 2n control qubits), the total number of gates is O(n4) and the depth is O(n3).
There is one more standard simplification, which is to use an approximate QFT transformation using only O(n log n) gates, instead of O(n2). This makes the evaluation of the order-finding fraction k/22n a bit less precise, but still good enough for running Shor’s algorithm with high success probability. This brings down the number of gates to O(n3 log n).
Many research papers have come with small improvements on these numbers, but no major breakthrough to my knowledge.
Overall the cost of running the algorithm, both in terms of number of qubits and number of operations is polynomial in n, which was the promise of Shor’s algorithm, showing a quantum advantage on the factoring task for very large integers N.
Simulations and runs on IBM quantum hardware
So far everything we discussed was theoretical, and known 20 years ago. Today several companies are trying to develop quantum processors which could in principle run Shor’s algorithm. In particular the IBM Quantum Platform offers the possibility to perform a limited of number of quantum computations for free on quantum processors (QPUs) with 128 qubits.
The Qiskit SDK provides a convenient interface to implement quantum circuits, perform simulations and run the circuits on IBM quantum hardware. The learning platform offers an introduction to Qiskit for newcomers.
$ pip install qiskit qiskit-ibm-runtime qiskit-aer qiskit[visualization]
To run Shor’s algorithm, we use the open-source qiskit-shor library, implemented by the author of this post. We invite any reader interested in the exact implementation to take a look at the linked repository.
The qiskit-shor API has two functions: find_order(A, N) running the order-finding circuit and returning the order r of A in ZN , together with the distribution of measured outputs, and find_factor(N) which runs Shor’s algorithm in full and returns a factor of N, if one was found.
To increase the chances of success, we run the order-finding circuit many times, between 100 and 1000 times, and observe a distribution of measured outputs. From this distribution, we consider the top 10 most frequent values to try and extract the order.
The library implements both the order-finding circuit with 2n control qubits and the optimised version with one control qubit. However, we mostly use the less optimal version with 2n control qubits, as the IBM platform has stricter usage limitations for circuits using control flow operations.
Simulations
After cloning the qiskit-shor repository, we can start with running simulations, using the qiskit Aer simulator. We run simulations without noise, to test the code. We will work only with small integers as simulating a quantum state of many qubits on a classical computer is computationally intensive and we only want to illustrate the above presentation with simple examples.
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit.visualization import plot_distribution
from qiskit_aer import AerSimulator
from qiskit_aer.primitives import SamplerV2 as AerSampler
from qiskit_shor.shor import find_order
aer_sim = AerSimulator()
pm = generate_preset_pass_manager(backend=aer_sim, optimization_level=1)
sampler = AerSampler()
# Find the order of 7 in Z_15.
r, dist = find_order(A=7, N=15, sampler=sampler, pass_manager=pm, num_shots=100)
plot_distribution(dist)
Start search for the order of 7 in Z_15
Found value 4 for order of 7 in Z_15.

The function returns the order 4, which is correct, 74 = 1 mod 15. The output distribution shows four values for the returned integer k: 00000000, 01000000, 1000000, 11000000, corresponding to the binary notation of the integers 0, 26, 27 and 26+27 respectively. The output has 2n=8 qubits, with n=⌈log2(15)⌉=4. The distribution of approximate fractions k/28 has values 0, 1/4, 1/2, 3/4, with similar number of occurrences. These correspond to the expected phases estimates j/4, j=0, 1, 2, 3, of the operator U7. The common denominator 4 of these fractions gives us the order of 7 in Z15 .
Let us look at a second example and search for the order of 5 in Z21.
r, dist = find_order(A=5, N=21, sampler=sampler, pass_manager=pm, num_shots=500)
plot_distribution(dist)
Start search for the order of 5 in Z_21
Found value 6 for the order of 5 in Z_21.

The quantum search returns the correct value 6. The distribution of outputs has more outcomes and importantly shows some peaks. Replacing the histogram bin labels by the corresponding fractions k/210 (rounded to 4 decimals) the histogram becomes

We see that there are peaks at 0, 1/6, 1/3, 1/2, 2/3 and 5/6, which are all the values of the form j/6, j =0, 1, 2, 3, 4, 5 (the peaks do not appear regularly spaced in the figure, due to bins with zero outcomes dropping). The values j/6 correspond to the eigenvalues of the unitary operator U5, with 6 being the order of 5 in Z21, as expected.
We can simulate the full algorithm for completeness and extract non-trivial factors for N=15 and N=21. The following code runs Shor’s algorithm with a maximum of 3 trial values for A, stopping if a factor of N is found. Each trial consists in running 100 times the order-finding circuit.
from qiskit_shor.shor import find_factor
f = find_factor(
N=15, sampler=sampler, pass_manager=pm, num_tries=3, num_shots_per_trial=100
)
Start search for the order of 4 in Z_15
Found value 2 for the order of 4 in Z_15
Factor found: 3
The value A=4 was randomly selected, then the order 2 was found, finally the factor 3 of N=15 was returned.
f = find_factor(
N=21, sampler=sampler, pass_manager=pm, num_tries=3, num_shots_per_trial=100
)
Start search for the order of 17 in Z_21
Found value 6 for the order of 17 in Z_21
Start search for the order of 19 in Z_21
Found value 6 for the order of 19 in Z_21
Factor found: 3
The value A=17 was tried first, getting the order 6, but gcd(Ar/2 – 1, N) = gcd(4912, 21) = 1, so this does not give a factor of 21. The value 19 was tried next, getting the order 6 again. This time gcd(Ar/2 – 1, N) = gcd(6858, 21) = 3, giving the factor 3 of 21.
Quantum runs
Now that we are convinced that the order-finding circuit is correctly implemented, we can try to run it on actual QPUs.
from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit_ibm_runtime import SamplerV2 as Sampler
# For this to run, you need to setup an IBM Cloud account and generate
# an API token. See
# Load default saved credentials
service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False, min_num_qubits=127)
print(f"backend: {backend.name}")
pm = generate_preset_pass_manager(target=backend.target, optimization_level=1)
sampler = Sampler(mode=backend)
backend: ibm_brisbane
Let us now try to find the order of 7 in Z15, as in the simulation above.
r, dist = find_order(A=7, N=15, sampler=sampler, pass_manager=pm, num_shots=1000)
plot_distribution(dist)
Start search for the order of 7 in Z_15
Found value 4 for order of 7 in Z_15

The correct order 4 of 7 in Z15 is returned, but the distribution of outcomes is far from the one in the simulation, which had only 4 observed values. Here almost all values between 0 and 28-1 appear, making the plot rather unreadable. There are some small peaks, but the result is largely dominated by noise. It is somewhat miraculous that from the 10 most frequent values, the algorithm is able to extract the correct order.
Other runs for different values of A, for N=15 or N=21, produce similar noisy data. This is because the quantum hardware is subject to quantum noise interfering with unitary gate operations. The more gates there are, the more noise there is. In the N=15 order finding circuit, our implementation has already 2482 gates. This is way to much for the current quantum computation capacities.
from qiskit_shor.shor import order_finding_circuit
qc = order_finding_circuit(A=7, N=15)
print(f"Number of qubits: {qc.num_qubits}") # 4n+2 qubits, with n = ceil(log2(N)) = 4
print(f"Number of gates: {qc.size()}")
print(f"Circuit depth: {qc.depth()}")
Number of qubits: 18
Number of gates: 2482
Circuit depth: 1632
We can try a case with fewer qubits, by searching for the order of 5 in Z6, using the one-control qubit optimised circuit.
from qiskit_shor.shor import order_finding_circuit_one_control
qc = order_finding_circuit_one_control(A=5, N=6)
print(f"Number of qubits: {qc.num_qubits}") # 2n+3 qubits, with n = ceil(log2(N)) = 3
print(f"Number of gates: {qc.size()}")
print(f"Circuit depth: {qc.depth()}")
Number of qubits: 9
Number of gates: 1246
Circuit depth: 861
r, dist = find_order(A=5, N=6, sampler=sampler, pass_manager=pm, num_shots=1000, one_control_circuit=True)
plot_distribution(dist)
Start search for the order of 5 in Z_6
Found value 2 for the order of 5 in Z_6

The correct order 2 is returned, but the distribution is still dominated by the quantum noise.
Concluding thoughts
We have given a tentatively complete presentation of the implementation of Shor’s algorithm, including a somewhat detailed description of the quantum modular arithmetic operations involved. Using the implementation of the qiskit-shor module, we have run some simulations and some actual quantum computations on the IBM Quantum platform. The quantum computations turned out to be dominated by quantum noise, making it impossible, for the time being, to run Shor’s algorithm and extract meaningful results.
The implementation we used was naive, in the sense that it did not include state-of-the-art optimisation. But the observed noisy outputs of this naive implementation when running Shor’s algorithm for very small values of N, indicates that similar results would be obtained, even with extra circuit optimisations.
In the future it might become more realistic to run Shor’s algorithm successfully. The enormous progress in quantum hardware capacities in the recent years leave us with decent hopes to see this happen in a few years time.
References
[1] Original paper: Shor, P. W. (1999). Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2), 303-332.
[2] S. Beauregard, Circuit for Shor’s algorithm using 2n+ 3 qubits. arxiv: quant-ph/0205095.
[3] Parker, S., & Plenio, M. B. (2000). Efficient factorization with a single pure qubit and log N mixed qubits. Physical Review Letters, 85(14), 3049. arxiv: quant-ph/0001066.
[4] T. Draper, Addition on a Quantum Computer, arxiv preprint: quant-ph/0008033
[5] S. Wang et al, A Comprehensive Study of Quantum Arithmetic Circuits, arxiv: quant-ph/2406.03867.
[6] S. Cuccaro et al, A new quantum ripple-carry addition circuit, arxiv: quant-ph/0410184.
Unless otherwise noted, all images are by the author.



