This year I’m participating in the Google Summer of Code again. Just like last year, I’m working with the ML4SCI organization. This year’s project is on Quantum Generative Adversarial Networks.
GANs
Generative Adversarial Networks (GANs) are a class of unsupervised machine learning models proposed in (Goodfellow et al., 2014). GANs aim to train a generator $G(z,\Theta_g)$ with a latent space $z$ and parameters $\Theta_g$ to replicate a reference probability distribution when sampling from the latent space $z$.
A GAN consists of two networks, the generator $G$ and a discriminator $D$. The networks are trained by playing a zero-sum game, where the generator tries to generate samples which are as realistic as possible, while the discriminator tries to classify real data samples as real and tag samples generated by the generator as fake.
A schematic sketch of a GAN is shown below.
Both the generator and the discriminator are trained separately, based on the classification results of the discriminator. If the generator $G(z,\Theta_g)$ is a neural network which maps from the latent space to the space $\Omega$, then the discriminator $D:\Omega\to[0,1]$ classifies the data with $D=1$ corresponding to the discriminator tagging a sample as real and $D=0$ as fake. The objective function $\mathcal{L}(\Theta_g, \Theta_d)$ can then be written as $$\mathcal{L}(\Theta_g, \Theta_d) = E_{x\sim\mu_{train}}[\ln D(x)]+E_{z\sim\mu_z}[\ln(1-D(G(z)))],$$ and the training as a min-max optimization of the form $$\min_{\Theta_g}\max_{\Theta_d}\mathcal{L}(\Theta_g,\Theta_d).$$ The expectation values $E$ run over the training data and the latent space distribution, respectively.
This min-max loss can be recast into a different form with two distinct loss functions, one for the discriminator $\mathcal{L}_D$ and one for the generator $\mathcal{L}_G$: $$\mathcal{L}_D(\Theta_g,\Theta_d) = E_{z\sim\mu_z}[\ln D(G(z))]+E_{x\sim\mu_{train}}[\ln(1-D(x))],$$ $$\mathcal{L}_G(\Theta_g,\Theta_d) = -E_{z\sim\mu_z}[\ln D(G(z))].$$ Both of these are minimized with respect to their own parameters, while freezing the parameters of the opponent network: $$\min_{\Theta_d} \mathcal{L}_D(\Theta_g,\Theta_d),$$ $$\min_{\Theta_g} \mathcal{L}_G(\Theta_g,\Theta_d).$$
To check that this makes sense, we can think about the terms in the loss function: the discriminator wants to tag the generator’s samples as fake ($D(G(z))=0$) and data samples as real ($D(x)=1$). Inserting these values in $\mathcal{L}_D$ would minimize both terms, since both logarithms then tend to $-\infty$.
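As a quick numerical sanity check of the two loss functions, the following sketch evaluates $\mathcal{L}_D$ and $\mathcal{L}_G$ for hand-picked discriminator outputs. The function names and the sample values are made up for illustration, the generator loss is assumed to have the form $-E[\ln D(G(z))]$, and no actual networks are involved.

```python
import math

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_fake, d_real):
    """L_D = E[ln D(G(z))] + E[ln(1 - D(x))], estimated from samples."""
    fake_term = mean([math.log(d) for d in d_fake])
    real_term = mean([math.log(1.0 - d) for d in d_real])
    return fake_term + real_term

def generator_loss(d_fake):
    """L_G = -E[ln D(G(z))], estimated from samples."""
    return -mean([math.log(d) for d in d_fake])

# Hypothetical discriminator outputs in (0, 1); not taken from a trained model.
good_d = {"fake": [0.05, 0.10], "real": [0.90, 0.95]}  # discriminator mostly right
bad_d = {"fake": [0.60, 0.70], "real": [0.40, 0.30]}   # discriminator mostly wrong

loss_d_good = discriminator_loss(good_d["fake"], good_d["real"])
loss_d_bad = discriminator_loss(bad_d["fake"], bad_d["real"])
loss_g_caught = generator_loss(good_d["fake"])  # generator samples tagged as fake
loss_g_fooled = generator_loss(bad_d["fake"])   # generator samples tagged as real

print(f"L_D (good discriminator): {loss_d_good:.3f}")
print(f"L_D (bad discriminator):  {loss_d_bad:.3f}")
print(f"L_G (generator caught):   {loss_g_caught:.3f}")
print(f"L_G (generator fooling):  {loss_g_fooled:.3f}")
```

A discriminator that classifies correctly gets the lower $\mathcal{L}_D$, and a generator that fools the discriminator gets the lower $\mathcal{L}_G$, which is the behaviour the two minimizations are meant to reward.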
GANs in HEP
There has been a vast amount of work on generative models applied to high energy physics tasks, e.g. (Oliveira et al., 2017; Butter et al., 2019; Hariri et al., 2021). The main incentive is to speed up the simulation of particle physics processes by training a GAN, which can then be sampled from cheaply.
In the typical analysis pipeline of a high energy physics (HEP) experiment, one of the computationally most demanding steps is the generation of expected reference data from our assumed theory (the Standard Model of particle physics). Classical event generators rely on Monte Carlo techniques to sample from the respective event distributions, which is computationally expensive. A GAN can in principle learn the structure of even complex events once and then generate events more efficiently.
QGAN
Building on the success of classical GANs in generative tasks, similar models have been proposed to perform generative tasks on quantum computers (Lloyd & Weedbrook, 2018; Dallaire-Demers & Killoran, 2018). There are different motivations for a Quantum Generative Adversarial Network (QGAN). What I find particularly interesting is:
- The measurement of a quantum system can, under certain assumptions, generate classical data which cannot be generated efficiently by a classical model (based on a classical random number generator) (Preskill, 2018), which implies a quantum advantage in generative tasks for such distributions.
- The concept of a QRAM (Giovannetti et al., 2007) aims to represent a large data vector of size $N$ in $\log N$ qubits. Together with the ability of quantum computers to perform manipulations of sparse and low-rank $N\times N$ matrices with a scaling of $\mathcal{O}(\text{poly}(\log N))$, this implies that there is a potential advantage in the scaling of sampling in QGANs (Lloyd & Weedbrook, 2018).
- From a practical point of view, I think QGANs offer interesting applications, e.g. for state preparation: learning a shallower circuit that loads an approximation of a probability distribution, instead of using a deeper exact circuit.
QGAN circuit
There are many different proposals for QGANs, including both fully quantum architectures and hybrid models with a classical discriminator network; see e.g. (Romero & Aspuru-Guzik, 2019; Tian et al., 2022) for reviews.
The simplest version of a fully quantum QGAN can be built with a SWAP test as discriminator.
In this case the discriminator does not have parameters, and therefore we do not have an adversarial min-max training. Also, we do not have a latent space $z$ to sample from. Instead we just want to train the generator unitary $G(\Theta_g)$ to produce a state which is a superposition of the input states $\sigma_i$ $$G(\Theta_g)\ket{0} = \sum_i P_i \sigma_i.$$
The generator parameters can be trained by maximizing the fidelity $F(\sigma,G(\Theta))$ between the generator state $G(\Theta)$ and the input data $\sigma$ $$F(\sigma,G(\Theta))=\left|\braket{\sigma|G(\Theta)}\right|^2.$$ In practice I use the following loss function for minimization $$\mathcal{L}(\Theta_g) = -\log\left(F(\sigma,G(\Theta_g))+\epsilon\right),$$ with some small regularization $\epsilon$.
Measuring the generator circuit would then correspond to sampling from the data distribution. While I perform this simple training on a noiseless simulator, Niu et al. (2021) show that adding parameters $\Theta_d$ to the SWAP test, making the fidelity loss “imperfect”, can make the training more robust to device noise.
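To see why the SWAP test can stand in for the fidelity, here is a minimal state-vector sketch in plain Python. It assumes single-qubit registers instead of the four-qubit registers used in this post, and all function names are made up for illustration. For a product input $\ket{0}\ket{\psi}\ket{\phi}$, the ancilla is measured in $0$ with probability $\tfrac{1}{2}+\tfrac{1}{2}|\braket{\psi|\phi}|^2$, so the expectation value of $Z$ on the ancilla equals the fidelity.

```python
import math

def hadamard_on_ancilla(amp):
    """Apply a Hadamard gate to the ancilla (first index of amp[a][b][c])."""
    s = 1 / math.sqrt(2)
    plus = [[s * (amp[0][b][c] + amp[1][b][c]) for c in range(2)] for b in range(2)]
    minus = [[s * (amp[0][b][c] - amp[1][b][c]) for c in range(2)] for b in range(2)]
    return [plus, minus]

def cswap(amp):
    """Swap the two registers (indices b and c) when the ancilla is 1."""
    swapped = [[amp[1][c][b] for c in range(2)] for b in range(2)]
    return [amp[0], swapped]

def swap_test_expval(psi, phi):
    """Return <Z> of the ancilla after H, CSWAP, H applied to |0>|psi>|phi>."""
    amp = [[[psi[b] * phi[c] for c in range(2)] for b in range(2)],
           [[0.0 for c in range(2)] for b in range(2)]]
    amp = hadamard_on_ancilla(cswap(hadamard_on_ancilla(amp)))
    p0 = sum(abs(x) ** 2 for row in amp[0] for x in row)
    p1 = sum(abs(x) ** 2 for row in amp[1] for x in row)
    return p0 - p1

def fidelity(psi, phi):
    """Return |<psi|phi>|^2 for two normalized single-qubit states."""
    overlap = sum(complex(a).conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

theta = 0.8  # arbitrary rotation angle, chosen only for this example
psi = [math.cos(theta / 2), math.sin(theta / 2)]
phi = [1.0, 0.0]
print(swap_test_expval(psi, phi), fidelity(psi, phi))
```

Both printed numbers should agree up to floating-point error, which is why the circuit output can be fed directly into the logarithmic loss.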
Implementing a simple QGAN (toy data)
As a simple toy example I want to train a QGAN to load a Gaussian peak. So I start off by generating a toy dataset, drawing $N=120$ integers between $0$ and $15$ from a normal distribution with $\mu=7$ and $\sigma=1.5$.
To load the data, I convert the integers to 4-bit values and encode them in quantum states of a four-qubit quantum register. To implement the quantum circuits and the optimization I use the PennyLane library. For the generator circuit, I use strongly entangling layers. The code for the training circuit is given below.
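One possible way to generate such a toy dataset is sketched below. The rounding and clipping to the range $[0, 15]$, the fixed seed, and the helper name `draw_toy_sample` are my assumptions for this example; the post does not specify how the integers were obtained from the normal distribution.

```python
import random

random.seed(0)  # fixed seed so the toy dataset is reproducible

N = 120               # number of samples
mu, sigma = 7, 1.5    # mean and standard deviation of the Gaussian peak

def draw_toy_sample():
    """Draw one integer in [0, 15] from a rounded, clipped normal distribution."""
    value = round(random.gauss(mu, sigma))
    return min(15, max(0, value))

data = [draw_toy_sample() for _ in range(N)]
print(data[:10])
```

The training loop further down accesses `data.shape`, so a plain list like this would need to be wrapped in an array (for example with `np.array(data)`) before it is used there.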
```python
import pennylane as qml
from pennylane import numpy as np

# Wire 0: SWAP test ancilla, wires 1-4: real data, wires 5-8: generator
dev = qml.device('lightning.qubit', wires=9)

def num_circuit(num, wires):
    # Cast the number to a zero-padded 4-bit binary string
    bin_str = format(int(num), '#06b')[2:]
    # Apply X to the wires whose bit is 1
    for i, c in enumerate(bin_str):
        if c == '1':
            qml.PauliX(wires=wires[i])

def generator(params_g, qubits):
    qml.StronglyEntanglingLayers(weights=params_g, wires=qubits)

@qml.qnode(dev)
def training_circ(data, params_g):
    # Real data
    num_circuit(data, [1, 2, 3, 4])
    # Generator circuit
    generator(params_g, [5, 6, 7, 8])
    # SWAP test
    qml.Hadamard(wires=0)
    qml.CSWAP(wires=[0, 1, 5])
    qml.CSWAP(wires=[0, 2, 6])
    qml.CSWAP(wires=[0, 3, 7])
    qml.CSWAP(wires=[0, 4, 8])
    qml.Hadamard(wires=0)
    # <Z> on the ancilla equals the fidelity between data and generator state
    return qml.expval(qml.PauliZ(0))
```
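The bit encoding used in `num_circuit` can be checked without any quantum library. The sketch below uses helper names of my own (`to_bits`, `from_bits`, `flipped_wires`) and verifies that every integer from $0$ to $15$ survives the round trip from integer to 4-bit string and back, which is the same conversion that is applied to the measured samples later on.

```python
def to_bits(num, n_bits=4):
    """Return the zero-padded binary string of num, most significant bit first."""
    return format(num, f'0{n_bits}b')

def from_bits(bits):
    """Convert a sequence of 0/1 values (or characters) back to an integer."""
    return int(''.join(str(b) for b in bits), 2)

def flipped_wires(num, wires):
    """List the wires that would receive an X gate for the given integer."""
    return [w for w, c in zip(wires, to_bits(num, len(wires))) if c == '1']

for num in range(16):
    bits = to_bits(num)
    # Same string as the slicing expression used in num_circuit
    assert bits == format(num, '#06b')[2:]
    assert from_bits(bits) == num

print(to_bits(7))                       # '0111'
print(flipped_wires(7, [1, 2, 3, 4]))   # [2, 3, 4]
```

Since the most significant bit sits on the first wire of the register, the integer $7$ flips the last three data wires.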
To perform the training, I loop over the data samples and optimize the parameters using Adam.
```python
# The first dimension corresponds to the number of layers in the generator
# Since we need some expressiveness a higher number will give better results
params_g = np.random.uniform(0, np.pi, size=(18, 4, 3), requires_grad=True)
epochs = 110
batch_size = 16
learning_rate = 0.01
# Adam optimizer for the generator parameters
optg = qml.AdamOptimizer(stepsize=learning_rate)

def iterate_minibatches(data, batch_size):
    # Yield consecutive batches; an incomplete last batch is dropped
    for start_idx in range(0, data.shape[0] - batch_size + 1, batch_size):
        idxs = slice(start_idx, start_idx + batch_size)
        yield data[idxs]

def cost_batch(paramsg, batch, reg=0.000001):
    # Negative log-fidelity, averaged over the batch
    loss = 0.0
    for i in batch:
        f = training_circ(i, paramsg) + reg
        loss += -np.log(f)
    return loss / len(batch)

# Training loop
for it in range(epochs):
    for j, Xbatch in enumerate(iterate_minibatches(data, batch_size=batch_size)):
        cost_fn = lambda p: cost_batch(p, Xbatch)
        params_g = optg.step(cost_fn, params_g)
        print(j, end="\r")
    loss = cost_batch(params_g, data)
    print(f"Epoch: {it} | Loss: {loss:.3} | ")
    print("____")
```
Performing the optimization until convergence sets the generator parameters. We can then define a circuit which only contains the generator and sample in the computational basis.
```python
sample_dev = qml.device('lightning.qubit', wires=4, shots=N)

@qml.qnode(sample_dev)
def sample_test():
    generator(params_g, [0, 1, 2, 3])
    return qml.sample()

# Convert each measured bit string back to an integer
testresult = [int(''.join(str(i) for i in a), 2) for a in sample_test()]
```
If we convert the computational basis results back to integers, we can check how well the generator approximates our data distribution.
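One simple way to quantify this comparison is the total variation distance between the two empirical histograms. The sketch below is only an illustration: the metric is my choice rather than something used in this post, the function names are made up, and the two short sample lists are placeholders standing in for `data` and `testresult`.

```python
from collections import Counter

def empirical_distribution(samples, n_values=16):
    """Return the relative frequency of each integer 0..n_values-1."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(v, 0) / total for v in range(n_values)]

def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Placeholder samples for illustration only; not results from the trained model.
example_data = [6, 7, 7, 8, 7, 6, 9, 5]
example_generated = [7, 7, 6, 8, 8, 7, 10, 5]

p_data = empirical_distribution(example_data)
p_gen = empirical_distribution(example_generated)
tvd = total_variation_distance(p_data, p_gen)
print(f"Total variation distance: {tvd:.3f}")
```

A value of $0$ means the two histograms are identical and $1$ means they do not overlap at all. With only $N=120$ shots, some distance remains from sampling noise alone.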
Outlook
For further work, I’m interested in a couple of things. First, I want to see how complex the distributions can be that I can train in this manner. Then I want to implement the same model with a classical discriminator and compare the training with the fully quantum one. Finally, I want to extend the model to a continuous-value QGAN by using an embedding for the continuous input data, and add a latent space $z$ to the generator. In this way it should be possible to sample from a continuous distribution, by taking an expectation value over a number of shots when drawing from the generator.
References
- Butter, A., Plehn, T. & Winterhalder, R. (2019). How to GAN LHC Events. SciPost Phys. 7, 075 (2019). https://doi.org/10.21468/SciPostPhys.7.6.075
- Dallaire-Demers, P. & Killoran, N. (2018). Quantum generative adversarial networks. Phys. Rev. A 98, 012324 (2018). https://doi.org/10.1103/PhysRevA.98.012324
- Giovannetti, V., Lloyd, S. & Maccone, L. (2007). Quantum random access memory. Phys. Rev. Lett. 100, 160501 (2008). https://doi.org/10.1103/PhysRevLett.100.160501
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. (2014). Generative Adversarial Networks. Retrieved from http://arxiv.org/abs/1406.2661
- Hariri, A., Dyachkova, D. & Gleyzer, S. (2021). Graph Generative Models for Fast Detector Simulations in High Energy Physics. Retrieved from http://arxiv.org/abs/2104.01725
- Lloyd, S. & Weedbrook, C. (2018). Quantum generative adversarial learning. Phys. Rev. Lett. 121, 040502 (2018). https://doi.org/10.1103/PhysRevLett.121.040502
- Niu, M., Zlokapa, A., Broughton, M., Boixo, S., Mohseni, M., Smelyanskyi, V. & Neven, H. (2021). Entangling Quantum Generative Adversarial Networks. Retrieved from http://arxiv.org/abs/2105.00080
- Oliveira, L., Paganini, M. & Nachman, B. (2017). Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis. Comput Softw Big Sci (2017) 1: 4. https://doi.org/10.1007/s41781-017-0004-6
- Preskill, J. (2018). Quantum Computing in the NISQ era and beyond. Quantum 2, 79 (2018). https://doi.org/10.22331/q-2018-08-06-79
- Romero, J. & Aspuru-Guzik, A. (2019). Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions. Retrieved from http://arxiv.org/abs/1901.00848
- Tian, J., Sun, X., Du, Y., Zhao, S., Liu, Q., Zhang, K., Yi, W., Huang, W., Wang, C., Wu, X., Hsieh, M., Liu, T., Yang, W. & Tao, D. (2022). Recent Advances for Quantum Neural Networks in Generative Learning. Retrieved from http://arxiv.org/abs/2206.03066