In this lecture, we derive the adjoint variable method for computing the gradient of the figure of merit of a photonic device. We derive this gradient from first principles, starting from the solution of Maxwell’s equations in the frequency domain. Using a simple example of focusing electromagnetic field intensity at a single position, we derive the relevant terms in the gradient and give physical interpretation. We discuss how this method provides gradient information with only two simulations, regardless of the number of design parameters, and how this enables inverse design optimization.
This lecture introduces the mathematics of the adjoint method and implements a simple adjoint gradient in Tidy3D, which is compared against a gradient found with finite difference.
We start by importing the packages we need, including Tidy3D.
import matplotlib.pylab as plt
import numpy as np
import tidy3d as td
import tidy3d.web as web
First we set up our toy problem.
We assume there is a dielectric recangle illuminated by a point dipole from the left.
We'll also add a monitor to measure the field at a point to the right of the box.
Ourn design objective will be to maximize the electric intensity at this point.
wavelength = 2.0 # um
freq0 = td.C_0 / wavelength
fwidth = freq0/10
# size of simulation domain
Lx = 2.5 * wavelength
Ly = 2.5 * wavelength
# size of rectangle
lx = 1.0 * wavelength
ly = 1.0 * wavelength
# grid size
dl = wavelength / 50
# relative permittivity of box
eps_box = 2.0
# x positions of the source and monitor
pos_src = - 0.4 * (Lx/2 + lx/2)
pos_mnt = + 0.6 * (Lx/2 + lx/2)
# the dielectric box
box = td.Structure(
geometry=td.Box(size=(lx, ly, td.inf)),
medium=td.Medium(permittivity=eps_box),
)
# point dipole source
src = td.PointDipole(
center=(pos_src, -0.5, 0),
polarization='Ez',
source_time=td.GaussianPulse(freq0=freq0, fwidth=fwidth),
)
# monitor to visualize fields in the entire domain
mnt = td.FieldMonitor(
size=(td.inf, td.inf, 0),
freqs=[freq0],
name='field',
fields=('Ez',)
)
# monitor to specifically measure the fields at the location of the objective
mnt_objective = td.FieldMonitor(
size=(0,0,0),
center=(pos_mnt, 0.5, 0),
freqs=[freq0],
fields=('Ez',),
name='measure',
)
# put everything together into a simulation
sim = td.Simulation(
size=(Lx, Ly, 0),
structures=[box],
sources=[src],
monitors=[mnt, mnt_objective],
boundary_spec=td.BoundarySpec.pml(x=True, y=True, z=False),
run_time=100.0/fwidth,
grid_spec=td.GridSpec.uniform(dl=dl)
)
# plot the domain
ax = sim.plot(z=0, monitor_alpha=0.1)
plt.savefig('img/sim_orig_plot.png')
plt.show()
# run the simulation
sim_data = web.run(sim, task_name='orig', path='data/sim_data_orig.hdf5')
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-f1f71106-032b-43be-ac97-c9a53344819 av1'.
Output()
Output()
Output()
Output()
Output()
# visualize the fields
ax = sim_data.plot_field(field_monitor_name='field', field_name='Ez', val='real')
plt.savefig('img/sim_orig_fields.png')
plt.show()
Next, we will introduce our objective function, which computes the field intensity at the point to the right of the box.
For convenience, we'll package this as a couple functions that accept the simulation data.
This makes things cleaner for when we want to re-compute this later.
def measure_amplitude(sim_data: td.SimulationData) -> complex:
"""How to post process a SimulationData object to give the intensity at the measurement point."""
field_data = sim_data['measure']
ez = field_data.Ez
return dl**3 * complex(ez)
def measure_intensity(sim_data: td.SimulationData) -> float:
"""How to post process a SimulationData object to give our objective function value."""
amplitude = measure_amplitude(sim_data)
return abs(amplitude)**2
print(f'intensity = {measure_intensity(sim_data):.5f}')
intensity = 0.01088
As mentioned, our adjoint simulation uses a source that is given by the partial derivative of our objective funciton with respect to the electric fields.
Our objective function involves measuring Ez at the mnt_pos
and then squaring the absolute value.
If we write the measurement position as vector $m$ and the Ez
as $e$, then our objective can be written as
Taking the derivative of $J$ with respect to a design parameter $p$ and respecting the complex-valued nature of $e$ by taking the regular and complex conjugate parts separately, we get:
$$ \frac{dJ}{dp} = \frac{\partial J}{\partial e} \frac{de}{dp} + \frac{\partial J}{\partial e^*} \frac{de^*}{dp} = 2 \mathcal{R} \{ \frac{\partial J}{\partial e}\frac{de}{dp} \} = 2 \mathcal{R} \{ m^T (m^T e)^{*} \frac{de}{dp} \} $$We identify that our adjoint source $\frac{\partial J}{\partial e}$ is given by $m^T (m^T e)^{*}$.
Let's set up the adjoint simulation now, using our SimulationData
from the forward pass to compute the adjoint source.
# get the "e" from our math before
forward_amp = measure_amplitude(sim_data)
# implement a phase corresponding to -1j * conj(forward_amp)
adj_phase = 3 * np.pi / 2 - np.angle(forward_amp)
# and amplitude corresponding to |forward_amp| / mu0 / omega0 * dl^2 (extra terms needed for proper normalization)
omega0 = 2 * np.pi * freq0
adj_amp = dl * dl * forward_amp / (td.MU_0 * omega0)
# source is locaed where the measurement monitor was, we apply conj(m^T) to the source time dependence
src_adj = td.PointDipole(
center=mnt_objective.center,
polarization="Ez",
source_time=td.GaussianPulse(
freq0=freq0,
fwidth=fwidth,
amplitude=abs(adj_amp),
phase=adj_phase)
)
# adjoint simulation is the same as original, but only adjoint source, and remove the measurement monitor
sim_adj = sim.updated_copy(
sources=[src_adj],
monitors=[mnt],
)
sim_data_adj = web.run(sim_adj, task_name='sim_adj', path='data/sim_data_adj.hdf5')
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-1553efcd-8e29-4d80-8c93-852fdfafeba 1v1'.
Output()
Output()
Output()
Output()
Output()
ax = sim_data_adj.plot_field(field_monitor_name='field', field_name='Ez', val='real')
plt.savefig('img/sim_adj_fields.png')
plt.show()
Now that we have our forward and adjoint fields, we can multiply them together to get a map of how the change in permittivity at each point will affect our objective function.
e_fwd = sim_data['field'].Ez
e_adj = sim_data_adj['field'].Ez
k0 = 2 * np.pi / wavelength
eps_map = 4 * dl * dl * k0 ** 2 * (e_fwd * e_adj).real
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3), tight_layout=True)
sim_data.plot_field(field_monitor_name='field', field_name='Ez', val='real', ax=ax1)
sim_data_adj.plot_field(field_monitor_name='field', field_name='Ez', val='real', ax=ax2)
eps_map.squeeze().plot.pcolormesh(x='x', y='y', ax=ax3, cmap="RdYlGn", vmin=-0.05, vmax=0.05)
ax1.set_title('forward Ez(r)')
ax2.set_title('adjoint Ez(r)')
ax3.set_title('dJ / d$\epsilon$(r)')
plt.savefig('img/three_fields.png')
plt.show()
To test our implementation, let's imagine our box was made up of four quadrants, each with their own individual permittivities $\epsilon_i$.
We will see how the adjoint gradient compares to one computed with finite difference.
Let's first set up our boxes.
boxes = []
for x_center in (-lx/4, +lx/4):
for y_center in (-ly/4, +ly/4):
box = td.Structure(
geometry=td.Box(
center=(x_center, y_center, 0),
size=(lx/2, ly/2, td.inf)
),
medium=td.Medium(permittivity=eps_box),
)
boxes.append(box)
sim_num = sim.updated_copy(structures=boxes)
ax = sim_num.plot(z=0, monitor_alpha=0.1)
As a reminder, the numerical derivative with respect to parameter p
is approximated by
Here, we will compute this for each $p$ in our design variables.
We must loop over each of our box quadrants, perturb the permittivity in that quadrant by a small amount in both plus and minus directions, compute the objective function through a simulation, and take the difference.
This will require 2 N = 8 simulations total, which we'll perform in a batch since they don't depend on each other.
# step size
delta = 5e-3
# set up all simulations in the batch with perturbed permittivities
num_grad_sims = {}
for i_box in range(len(boxes)):
for pm_str, delta_eps in zip('+-', (+delta, -delta)):
task_name = f'box_{i_box+1}_{pm_str}'
new_boxes = list(boxes)
new_medium = boxes[i_box].medium
new_medium = new_medium.updated_copy(permittivity=new_medium.permittivity + delta_eps)
new_boxes[i_box] = boxes[i_box].updated_copy(medium=new_medium)
print(f'task_name = "{task_name}": box epsilons = {[b.medium.permittivity for b in new_boxes]}')
new_sim_i = sim.updated_copy(structures=new_boxes)
num_grad_sims[task_name] = new_sim_i
task_name = "box_1_+": box epsilons = [2.005, 2.0, 2.0, 2.0] task_name = "box_1_-": box epsilons = [1.995, 2.0, 2.0, 2.0] task_name = "box_2_+": box epsilons = [2.0, 2.005, 2.0, 2.0] task_name = "box_2_-": box epsilons = [2.0, 1.995, 2.0, 2.0] task_name = "box_3_+": box epsilons = [2.0, 2.0, 2.005, 2.0] task_name = "box_3_-": box epsilons = [2.0, 2.0, 1.995, 2.0] task_name = "box_4_+": box epsilons = [2.0, 2.0, 2.0, 2.005] task_name = "box_4_-": box epsilons = [2.0, 2.0, 2.0, 1.995]
# inspect each of the forward perturbed sims to get a feeling for what is being simulated
_, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, tight_layout=True, figsize=(6, 6))
axes = [ax1, ax2, ax3, ax4]
for ax, box_num in zip(axes, '1234'):
ax = num_grad_sims[f'box_{box_num}_+'].plot(z=0, monitor_alpha=0.0, ax=ax)
ax.set_title(f'sim perturbing box #{box_num}')
# run all simulations in batch
batch_data = web.run_async(simulations=num_grad_sims, path_dir='data')
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-a3cbce76-60ed-4767-afc7-62252703262 av1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-3ef1a853-e0fb-4df7-92e1-e4279a1a4bd av1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-ff5c4e18-c8e9-4937-aa79-ffada6801a0 9v1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-2be47ebc-3fb9-48eb-a204-8b44d2d1fab cv1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-482e525d-a36e-4ffa-9b8d-2f4ddca4f54 bv1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-c33e587e-a7e4-41ae-a67c-110f97c84dc 4v1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-8c06ba75-5423-4675-99f7-fd23d57b56c 0v1'.
Output()
View task using web UI at webapi.py:189 'https://tidy3d.simulation.cloud/workbench?taskId=fdve-49918a3f-410f-4099-9e92-ec2c94e1a27 9v1'.
Output()
[11:25:10] Started working on Batch. container.py:457
Output()
[11:26:19] Batch complete. container.py:497
# post process batch results using finite difference derivative equation to compute gradient
grad_num = np.zeros((len(boxes)))
for task_name, sim_data in batch_data.items():
_, box_i_plus1, pm_str = task_name.split('_')
box_i = int(box_i_plus1) - 1
constant = 1/(2 * delta) if pm_str == "+" else -1/(2 * delta)
J_i = measure_intensity(sim_data)
grad_num[box_i] += constant * J_i
Output()
Output()
Output()
Output()
Output()
Output()
Output()
Output()
For the adjoint gradient, we simply sum our permittivity derivative map over each of the domains associated with the permittivities.
Note, the neat thing about this is that we only require 2 simulations to compute the gradient, no matter how many design parameters we have.
The permittivity derivative matp contains all the information we need to compute the gradient w.r.t. a set of arbitrary parameters.
grad_adj = []
num_pts = 100
# loop over quadrants and integrate using trapezoidal rule
for x_center in (-lx/4, +lx/4):
dx = lx/2/num_pts
x_bounds = (x_center - lx/4, x_center + lx/4)
xpts = np.linspace(x_bounds[0] + dx/2, x_bounds[1] - dx/2, num_pts)
for y_center in (-ly/4, +ly/4):
dy = ly/2/num_pts
y_bounds = (y_center - ly/4, y_center + ly/4)
ypts = np.linspace(y_bounds[0] + dy/2, y_bounds[1] - dy/2, num_pts)
grad_adj_i = np.sum(eps_map.interp(x=xpts, y=ypts)) * dx * dy
grad_adj.append(grad_adj_i)
grad_adj = np.array(grad_adj)
print('adjoint gradient = ', grad_adj)
print('numerical gradient = ', grad_num)
adjoint gradient = [ 0.00441784 -0.00192454 0.00454513 -0.00041604] numerical gradient = [ 0.00440619 -0.00193145 0.00454112 -0.00042815]
We compare the two gradients and see that match quite well! to 0.15% RMS error.
rms_unnormalized = np.linalg.norm(grad_adj - grad_num) / np.linalg.norm(grad_num)
print(f'rms unnormalized = {(rms_unnormalized * 100):.4f} %')
rms unnormalized = 0.2806 %
For the purposes of optimization, we often care more about the direction of the gradient vector than it's overall magnitude.
This is because most optimization algorithms require the user to specify a step size for iteration anyway.
In this spirit, comparing the adjoint and numerical gradient vectors normalized as unit vectors, the error gets even smaller.
rms_normalized = np.linalg.norm(grad_adj / np.linalg.norm(grad_adj) - grad_num / np.linalg.norm(grad_num))
print(f'rms normalized = {(rms_normalized * 100):.4f} %')
rms normalized = 0.2552 %
This is a typical approach towards checking that your adjoint code is working properly.
In the following notebooks, we will make use of the adjoint
plugin in tidy3d to compute these gradients without all of this extra work, and use it to do some interesting optimizations.
src_adj
PointDipole(type='PointDipole', center=(2.1, 0.5, 0.0), size=(0, 0, 0), source_time=GaussianPulse(amplitude=1.4102874969705147e-07, phase=6.9936180213814865, type='GaussianPulse', freq0=149896229000000.0, fwidth=14989622900000.0, offset=5.0), name=None, polarization='Ez', interpolate=True)
np.angle(forward_amp)
-2.2812290409967972
3 * np.pi / 2 - np.angle(forward_amp)
6.9936180213814865
PointDipole(type='PointDipole', center=(2.1, 0.5, 0.0), size=(0, 0, 0), source_time=GaussianPulse(amplitude=1.1282299534577141e-08, phase=2.431159933347032, type='GaussianPulse', freq0=149896229000000.0, fwidth=14989622900000.0, offset=5.0), name=None, polarization='Ez', interpolate=True)
Inverse Design: Lecture 2
This is the second lecture on inverse design in photonics where will we go into the mathematical details of the adjoint variable method.
As a brief review of the last lecture:
In an optimization, one would like to adjust the parameters of a device such that it eventually operates in a desired fashion. For example, in the design of a metasurface, one may want to adjust the geometric parameters of each of the metasurface elements until an incident plane wave is focused onto a point on the other side. In this case, one can define a cost function that quantifies the device performance as the intensity of light at the focal region.
Then, the goal of the computational design is to maximize the intensity of light at the focal point. To accomplish this, you may compute the derivative of this intensity at the focal point with respect to your design parameters. For example, the gradient of the focused intensity with respect to the radius of the cylindrical meta atoms in the diagram above.
Once you know this gradient, then you may move in the parameter space along the gradient direction until you reach an optimum of the device. As discussed, the key step in the in this optimization process is computing the gradient.
In the last lecture, we talked about a naive way to compute the gradient through finite difference derives, where the number of computation that you need to do scales linearly with the number of parameters that you need to consider. Today we're going to talk about the adjoint method, which is a remarkable method in which the amount of computation needed to compute the gradient is practically independent of the number of parameters that you are trying to adjust.
Let’s start by reviewing how to perform an EM simulation in general. We will present it in the frequency domain as it is easier to understand. Imagine that you are solving Maxwell’s equations at a single frequency.
You can write Maxwell’s equations as a linear system of equations where “E(r)” is the electric field, “epsilon_r(r)” is a permittivity distribution, and “P(r)” here denotes the current source. Thus, this is the equation takes a particular primitivistic distribution as well as a source, and generates the resulting electric field.
After discretizing, this can be written as a linear system of equations where the term in the bracket is a matrix “A”, operating on the electric field vector “e”, to give a vector “b” representing the current sources. This linear system can be solved in a variety of different ways. Formally you can write a solution in terms of the inverse of “A” times the right hand side “b”. In reality, very rarely want to explicitly form the inverse of the matrix so we use this notation only to denote that we are solving this linear system in some way.
As an example, let's say you your system contains a square dielectric a point source on the left hand side as indicated by the point “b” in the diagram. Then if you set up and solve this system of equations, you would get the electric field distribution as indicated by the bottom diagram. From this plot, you can see where the point source is and how the field spreads out into the rest of the cell, as expected.
Within this framework, you would then like to construct some objective function that depends on the design parameters of your system through the solution of Maxwell’s equations. As a simple example, let’s say the design problem requires us to maximize the intensity at a given point in the system “m”. We can write down an objective function that represents the electric intensity evaluated at the measurement point “m”.
In general, the objective function is written as a function of the electric field and perhaps it's a complex conjugate. For example, to form the intensity, one needs to take the product of both the electric field and its complex conjugate.
For our example, you can write the objective function as the total electric field projected onto the vector “m” that is constructed to have a single non-zero only at the measurement point, and then multiply that quantity by its complex conjugate.
Once you have this objective function, then you would like to compute the derivative of it with respect to respect to some parameter that you would like to tune. For example, in our case, this parameter could be the permittivity at certain point inside the square region. As a starting point, you may just apply the chain rule to your objective function.
For example, to compute the derivative of the objective function with respect to a parameter “p”, we know the objective function depends on the electric field “e”, which itself depends on “p”. So to use the chain rule, we first form the derivative of the objective with respect to the electric field and then multiply that by the derivative of the electric field with respect to the parameter.
We must take care of this for the electric field itself as well as its complex conjugate, adding them together and making use of the fact that both the objective function and the parameter are real-valued quantities, we obtain an equation for the derivative.
This equation has two parts:
Next we will consider evaluating each of these parts separately.
First, let's consider the first term, containing the derivative of the objective function with respect to the electric field. As discussed, in our example, this is just a simple product of the electric field and its complex conjugate.
So if you take the derivative with respect to only the electric field, you end up with only the complex conjugate of the field corresponding to the original simulation evaluated at the position “m” and with a spatial dependence that is only non-zero at position “m”.
Next, one can compute the derivative of the electric field with respect to the design parameter. For this, we take our original system of equations. We take derivative with respect to the parameter on both sides of the equation.
For simplicity, we assume that the excitation source (“b”) doesn't depend on these parameters, so the derivative and right hand side is zero. The left hand side is a product so we use the product rule to get two terms.
We need the de/dp term for our gradient, so we solve for that term and can simply write it using the inverse of the system matrix times another quantity.
We plug this into our derivative equation and recover a more general formula for the derivative. Next we will discuss each term one by one to develop an efficient way to evaluate the gradient.
Inside our derivative information there are four terms.
Now we want to evaluate this product efficiently. The essence of the adjoint variable method is to realize that one can solve the first two terms first using a single linear system solution. If we define the product of the first two terms as a an “adjoint” electric field “e_adj”, you can write this vector as the solution to a linear system where the matrix is the same as your original system but the source is now given by the derivative of your objective function with respect to the electric field.
As we mentioned, this happens to just be the complex conjugate of the electric field at the monitor position. Therefore, solving for this adjoint field can be done in the exact same way as the original solution, just with a difference source that depends on your forward solution and the objective function.
As seen in the diagram, the adjoint fields emanate from the point “m” and have a phase and amplitude dependent on the original solution “e”.
Now that we have the adjoint field, the gradient calculation become simply a field overlap calculation. You take the original field “e” and the adjoint field “e_adj” and you form an overlap calculation over the matrix that is the derivative of the system matrix with respect to parameter “p”.
We note that the two fields themselves have no “p” dependence as it is all contained in this matrix. In the next slides we will talk about the interpretation of this term.
To interpret this matrix dA/dp, let’s consider the definition of matrix A from Maxwell’s equations. The matrix has a term containing curl operations minus a term that involves the relative permittivity along the diagonal. As the parameters only affect the system through the relative permitivity distribution, the derivative dA/dp only depends on the second term and we can ignore the curl term.
We can further rearrange terms and see that the dA/dp matrix is proportional to a diagonal matrix where each diagonal element describes how the relative permittivity at that point depends on parameter “p”.
As an example, let's say that your parameter “p” described the relative permittivity within the entire central box. As the relative permittivity distribution only depends on parameter “p” within the box, the derivative is non-zero only within box regions. We can therefore write this diagonal matrix as a delta function.
Plugging this into our derivative equation, the we see that the gradient can be finally obtained by summing the field overlap over the box region.
Therefore, the general procedure for the adjoint variable method consists of three main steps:
To see the power of this method, consider performing this gradient computation now for several parameters. The forward and adjoint simulation are both completely independent of the design parameters, so they may be computed once and saved for later. The dA/dp term used in the overlap integral is the only term that depends on parameters. Therefore, using the stored original and adjoint fields one may just perform the overlap integral step for each parameter of interest to compute the full gradient.
For a concrete example, consider four parameters a, b, c, d corresponding to the relative permittivity of each quadrant of the box. The gradient with respect to the permittivity of each quadrant can be found by computing original and adjoint fields once, and then summing the overlap of these quantities over each quadrant independently. The expensive part of this process is the solving of the system of equations, which only needs to be done twice, no matter how many parameters we extend this to. In fact, this is what enables inverse design where you may have thousands or maybe millions of parameters.
Using the adjoint method, the gradient of the objective function with respect to the parameters can be solved in roughly the same amount of time as the original simulation. This concludes a brief introduction to the mathematics of the adjoint variable method. Next time we will show how to apply this to an actual optimization problem.