Quantum Feature Space
The repository for this project is still under construction but is located here:
https://github.com/Using-Namespace-System/qFeatureSpace
In that earlier project we showed how the angular distance between the high-dimensional feature vectors of terms in a term-document matrix captures the syntactic relationship between words (words that make sense close together). This angular distance can also be expressed as a number between 0 and 1: in our calculation, values approaching one correspond to angles that are close together, and zero corresponds to a separation of 90 degrees or more. This is referred to as cosine similarity. In that earlier project the similarities all landed between 0.0 and 0.1.
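For reference, this is just the normalized dot product of the two feature vectors $a$ and $b$ (the same quantity the classical check at the end of this post computes):
$$\mathrm{sim}(a,b) = \frac{a \cdot b}{\lVert a\rVert\,\lVert b\rVert}$$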
You can take a look at the previous project here:
https://using-namespace-system.github.io/natural-language-processing-1-term-document-matrix.html
In quantum systems a single qubit can store an immense amount of complex probabilistic information. In this project we are exploring storing our feature-space in a quantum system using amplitude embedding. The amount of real information that can be stored in an N-qubit system grows exponentially with the number of qubits:
$$2^N$$
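As a quick illustration (a minimal sketch using Qiskit's stock Statevector rather than the custom helpers used later): any unit-norm vector of length $2^{16}$ loads directly as a 16-qubit state.
import numpy as np
from qiskit.quantum_info import Statevector
num_qubits = 16
# any unit-norm vector of length 2^16 is a valid 16-qubit statevector
vec = np.random.rand(2**num_qubits)
vec = vec / np.linalg.norm(vec)
print(Statevector(vec).num_qubits)  # 16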
From our previous project's dataset we have picked the terms court and murder to encode and compare.
import numpy as np
import scipy as sp
import pickle
from qiskit import QuantumCircuit, transpile, Aer
from onehotq import plot_state_qsphere,cpu_eig
import math as m
num_qubits = 16
with open('./sparse_word_doc_matrix.obj', 'rb') as file:
    sparse_word_doc_matrix = pickle.load(file)
#court is row 692 of the term-document matrix
first_word = sparse_word_doc_matrix[[692],:]
#qsphere visualization for court
#reversed indices of the nonzero features
dense_points = np.array(first_word.nonzero()[1][::-1])
sparse_eigenvecs,eigvals_diag,eigvals_diag_matrix = cpu_eig(dense_points,num_qubits)
#normalize the court vector (elementwise square via .power keeps it sparse-safe)
res_state = first_word/np.sqrt(np.sum(first_word.power(2)))
display(plot_state_qsphere(sparse_eigenvecs,eigvals_diag,res_state,num_qubits))
#murder is row 4428 of the term-document matrix
second_word = sparse_word_doc_matrix[[4428],:]
#qsphere visualization for murder
#reversed indices of the nonzero features
dense_points = np.array(second_word.nonzero()[1][::-1])
sparse_eigenvecs,eigvals_diag,eigvals_diag_matrix = cpu_eig(dense_points,num_qubits)
#normalize the murder vector (elementwise square via .power keeps it sparse-safe)
res_state = second_word/np.sqrt(np.sum(second_word.power(2)))
display(plot_state_qsphere(sparse_eigenvecs,eigvals_diag,res_state,num_qubits))
The horrifying spaghetti monsters above are one representation of how a word's features (court on top, murder on bottom) can exist in a quantum system in superposition. In the last project I included an equally unwieldy graphic of the entire sparse feature-space. Data in this many dimensions is not exactly human readable; however, if cleaned up, a three-dimensional spherical representation of such a large feature vector might be useful somewhere.
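For a small state, stock Qiskit can draw the same kind of qsphere picture without the onehotq helpers used above (a minimal sketch with a made-up 3-qubit state):
import numpy as np
from qiskit.quantum_info import Statevector
from qiskit.visualization import plot_state_qsphere
# a made-up 3-qubit state with unequal amplitudes
toy = np.array([3.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0])
toy = toy / np.linalg.norm(toy)
display(plot_state_qsphere(Statevector(toy)))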
So it's possible to encode our entire feature vector into just 16 qubits ($2^{16} = 65{,}536$ amplitudes). Now what?
Using 17 qubits we can embed the amplitudes of court alongside the state $|0⟩$ and the amplitudes of murder alongside the state $|1⟩$ of an added label qubit.
An ancillary qubit is added and prepared in $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, attached to the embedded register:
$$|-\rangle \otimes \frac{1}{\sqrt{2}}\big(|\mathrm{court}\rangle\,|0\rangle + |\mathrm{murder}\rangle\,|1\rangle\big)$$
where $|\mathrm{court}\rangle$ and $|\mathrm{murder}\rangle$ denote the amplitude-encoded feature vectors. At this point the ancilla is still in a product state with the embedded system; it is the controlled swap below that entangles the qubits so they hold information about each other.
One more qubit is added for control and measurement. The control qubit is initialized to zero and prepared with a Hadamard gate before the swap. The control qubit then drives a controlled swap operation between the ancillary qubit and the encoded features. Kickback onto the control of the swap gives us a measurement whose difference of probabilities is equivalent to the cosine similarity.
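As a quick sanity check of what the measurement statistics should look like, here is a small sketch assuming the control's outcome probabilities are $P(0) = (1+s)/2$ and $P(1) = (1-s)/2$ for cosine similarity $s$ (the counts reported further down are consistent with this):
def expected_counts(similarity, shots):
    # the difference of probabilities P(0) - P(1) equals the similarity
    p0 = (1 + similarity) / 2
    return {'0': round(shots * p0), '1': round(shots * (1 - p0))}
# the classically computed court/murder similarity from further down
print(expected_counts(0.0791, 10000000))
# about {'0': 5395500, '1': 4604500}, close to the simulated counts below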
first_words = first_word.copy()
#pad with empty features so the vector length is a power of two
first_words.resize(1,2**num_qubits)
#normalize court vector (elementwise square via .power keeps it sparse-safe)
first_words = first_words/np.sqrt(np.sum(first_words.power(2)))
second_words = second_word.copy()
#pad with empty features so the vector length is a power of two
second_words.resize(1,2**num_qubits)
#normalize murder vector
second_words = second_words/np.sqrt(np.sum(second_words.power(2)))
#stack court on top of murder and transpose
embed = sp.sparse.vstack((first_words,second_words)).T
#flattening as pairs {(court_1,murder_1),(court_2,murder_2),...(court_n,murder_n)}
#amplitudes of court on the left of each pair is embedded to |0⟩
#and amplitudes of murder on the right of each pair is embedded |1⟩
#amplitudes of (1,0) = |0⟩ and (0,1) = |1⟩
#the squared amplitudes give the probability of any feature belonging to court |0⟩ or murder |1⟩
#all of this information is superimposed within this 17 qubit system
embed = embed.reshape(1,(embed.shape[0]*embed.shape[1]))
#normalize embeddings
embed = embed/np.sqrt(np.sum(embed.power(2)))
#attach the ancillary qubit in |−⟩, preparing |−⟩ ⊗ (court|0⟩ + murder|1⟩)
embedded = sp.sparse.hstack((1/m.sqrt(2)*embed,-1/m.sqrt(2)*embed))
# the control qubit is initialized in the zero state: |0⟩ ⊗ |−⟩ ⊗ (court|0⟩ + murder|1⟩)
embedded = sp.sparse.hstack((1*embedded,0*embedded))
#sum of squares is one
print(np.sum(embedded.power(2)))
#shape is a power of two
print((embedded.shape,2**19))
1.0000000000000424
((1, 524288), 524288)
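To see the interleaving concretely, here is a toy version of the same construction with two made-up 2-feature vectors (one feature qubit plus the label, ancillary, and control qubits):
import numpy as np
from scipy import sparse
import math as m
# two made-up 2-feature vectors, normalized
a = sparse.csr_matrix([[3.0, 4.0]])
b = sparse.csr_matrix([[1.0, 0.0]])
a = a / np.sqrt(np.sum(a.power(2)))
b = b / np.sqrt(np.sum(b.power(2)))
# stack, transpose, and flatten as (a_i, b_i) pairs: [a_0, b_0, a_1, b_1]
pairs = sparse.vstack((a, b)).T.reshape(1, 4)
pairs = pairs / np.sqrt(np.sum(pairs.power(2)))
# ancillary qubit in |−⟩, then the control qubit in |0⟩
state = sparse.hstack((1/m.sqrt(2)*pairs, -1/m.sqrt(2)*pairs))
state = sparse.hstack((1*state, 0*state))
# 16 amplitudes: a 4-qubit state (feature, label, ancillary, control)
print(state.toarray().round(3))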
Getting results from this project would have been much harder without NVIDIA's cuQuantum Appliance.
More information here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuquantum-appliance
For a VS Code devcontainer built from the cuQuantum Appliance image, check out this template:
https://github.com/Using-Namespace-System/QuantumSimulationEnv
#prepare circuit with embeddings
cos = QuantumCircuit(19,1)
simulator = Aer.get_backend('aer_simulator_statevector')
#set_state_simple loads the prepared statevector directly into the simulator
cos.set_state_simple(embedded.toarray())
#prepare the control qubit with a hadamard gate
cos.h(0)
#perform a controlled swap between the ancillary qubit and the embedded data, driven by the control qubit
cos.cswap(0, 1, 2)
#measure the control qubit
cos.measure(0,0)
#run simulation
compiled = transpile(cos, simulator)
shots = 10000000
job = simulator.run(compiled,shots=shots)
result = job.result()
counts = result.get_counts(cos)
#postprocessing: find the difference between the probabilities
print(counts)
quantum_similarity = (counts['0']-counts['1'])/(shots)
print(quantum_similarity)
#classical cosine similarity with numpy
one = first_words.toarray()[0]
two = second_words.toarray()[0]
cosine_similarity = one@two/(np.sqrt(one@one)*np.sqrt(two@two))
print(cosine_similarity)
{'0': 5395777, '1': 4604223}
0.0791554
0.07911342342408463
The first two lines are the counts and similarity from our quantum simulation; the last line is the result of our classical computation.
The results were consistent across various pairs of vectors with varying levels of similarity. Examples below.
In this experiment, accuracy beyond three significant figures seems unreliable. Going beyond 10,000,000 shots might improve it, but the computation time was unreasonable.
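This matches simple shot noise: estimating $s = P(0) - P(1)$ from $n$ shots carries a standard error of about
$$\sigma_{\hat{s}} = \sqrt{\frac{1-s^2}{n}} \approx \frac{1}{\sqrt{10^7}} \approx 3.2\times10^{-4},$$
so roughly three decimal places are stable, and each additional digit costs a 100x increase in shots.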
I am still working on how best to explain why we get these results from simple amplitude embedding, a Hadamard gate, and a controlled swap operation. More information may be included later.
The logarithmically smaller storage and time complexity achieved within certain quantum algorithms paint a bright future for advancements in computing. Keep learning and experimenting.
The papers below were core in informing my circuit for computing cosine similarity.
Davide Pastorello and Enrico Blanzieri, "A quantum binary classifier based on cosine similarity," 2021.
Enrico Zardini, Enrico Blanzieri and Davide Pastorello, "A quantum k-nearest neighbors algorithm based on the Euclidean distance estimation," 2023.
Police (9152) vs Man (20986): counts {'0': 5438183, '1': 4561817}, quantum 0.0876366, classical 0.08758761591564795
Police (9152) vs Court (692): counts {'0': 5103823, '1': 4896177}, quantum 0.0207646, classical 0.020903931944773215
Australia (31170) vs Country (1488): counts {'0': 5036718, '1': 4963282}, quantum 0.0073436, classical 0.007576233170634154
Police (9152) vs Jailed (21313): counts {'0': 5032254, '1': 4967746}, quantum 0.0064508, classical 0.006255897007314822
Search (17193) vs Country (1488): counts {'0': 5013614, '1': 4986386}, quantum 0.0027228, classical 0.0029232269282506566