Building intuition for electromagnetism's symmetric origins
Introduction and Warnings
I’ve recently been digging deeper into electromagnetism (see, e.g., A rational person’s guide to EMFs), and along the way, I’ve been increasingly amazed by just how crazy the concept even is. Magnets? Crazy. Charged particles attracting or repelling each other at a distance? Crazy. Visible light being, uh, kinda the same thing as radio waves and electricity? Crazy.
It’s one of those ideas we take for granted. But upon reflection, it sounds like science fiction.
I realized that I didn’t have much of an intuition for what electromagnetism is at a fundamental scientific level, or how it is described mathematically and how its behavior is predicted. I was familiar with the basics like Ohm’s Law, Kirchhoff’s Laws, and Maxwell’s equations, but I didn’t really know where those came from.
The resource that really made it click was this exceptional video: Electromagnetism as a Gauge Theory from Richard Behiel.
This post is my attempt to summarize the key ideas from that video, along with my intuition for how they fit together and give rise to one another. The post is reasonably math-y (although not as much as the video — I skip over most of the step-by-step derivations), and assumes a decent amount of baseline knowledge about classical electromagnetism, quantum physics, and vector calculus.
If I’m being honest, this article isn’t really meant for anyone other than myself. It’s an attempt to check my knowledge and intuition about the topic (“if you can’t explain it, you probably don’t understand it”). As such, it’s probably not a particularly good read for most people (sorry!).
A philosophical detour
With that said, doing the math prompted some more general, more meaningful — and probably more accessible — thoughts about maps, territories, and symmetries.
The equations we’re about to work through, while powerful and beautiful, are a way of describing the world. But they are not the world themselves. They’re the map, not the territory. A very, very good map, but a map nonetheless.
And that has caused me to take a step back and think about the process of building such a map. How is it that we’re able to predict the behavior of parts unknown (e.g. electromagnetism) from the areas we’ve already explored? Why is it that maintaining symmetry in an abstract sense is so important to our ability to do so? What is the underlying structure of the territory — the universe itself?
If you’re interested in some philosophizing about all this and want to skip the math, check out the On Existence and Symmetry section at the bottom.
And if you want to see the math, read on! (Or skim.)
A final caveat: I haven’t taken a physics class since high school, and while I studied math in undergrad, I did drop out after my sophomore year. So I would not be surprised if this post contains errors, misunderstandings, or poor use of terminology. If you see any, please let me know!
Dirac and his Free Lagrangian
In the mid-1920s, physicists were trying to figure out how to describe the behavior of fermions (particles with half-integer spin, like electrons). We had empirically observed these particles, but we didn’t yet have math to describe their behavior.
At the time, we had the Klein-Gordon equation, which worked well for describing scalar (spin-0) particles. But it didn’t work for particles with half-integer spin.
So we needed a new equation. Paul Dirac wanted to develop a relativistic wave equation for these particles that was first order in time (the Klein-Gordon equation is a second-order differential equation).
So he (at age… 25, I think?!) developed the Dirac equation by “factorizing” the Klein-Gordon operator. It turned out to be very handy, merging the understood principles of quantum mechanics with special relativity. It also surprisingly predicted particles with negative energies, also known as “antiparticles” — and these were experimentally discovered shortly thereafter. So we felt pretty good about the correctness of the Dirac equation.
Here’s the Dirac Equation:
$$i\hbar\gamma^\mu\partial_\mu\psi - mc\,\psi = 0$$
Where:
- i is the imaginary unit
- ℏ is the reduced Planck constant (h/2π)
- $\partial_\mu$ is the partial derivative with respect to spacetime, where $\mu \in \{0,1,2,3\}$ are the spacetime indices (0 is time; 1, 2, and 3 are the x, y, and z dimensions)
- $\gamma^\mu$ are the contravariant gamma matrices. (Note that $\gamma^\mu\partial_\mu$ uses the Einstein notation convention, where repeated indices are summed over. So it’s really: $\gamma^\mu\partial_\mu = \gamma^0\partial_0 + \gamma^1\partial_1 + \gamma^2\partial_2 + \gamma^3\partial_3$.)
- m is the mass of the particle
- ψ is a bispinor at each point in spacetime that describes the particle’s wavefunction
- and c is the speed of light.
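The “factorizing” trick works because the gamma matrices anticommute in a very specific way: $\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu}I$. That’s a standard textbook fact rather than something from the video, so here’s a minimal sketch that checks it numerically. The Dirac representation of the matrices and the mostly-minus metric are my assumptions, and every name in the code is made up for illustration.

```python
# Dirac-representation gamma matrices as nested lists (an assumed, conventional choice).
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [
    [[0, 1], [1, 0]],      # Pauli sigma_x
    [[0, -1j], [1j, 0]],   # Pauli sigma_y
    [[1, 0], [0, -1]],     # Pauli sigma_z
]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks laid out as [[a, b], [c, d]]."""
    top = [a[i] + b[i] for i in range(2)]
    bottom = [c[i] + d[i] for i in range(2)]
    return top + bottom

def neg(m):
    """Negate every entry of a matrix."""
    return [[-x for x in row] for row in m]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    """Entrywise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# gamma^0 = diag(I, -I); gamma^i = [[0, sigma_i], [-sigma_i, 0]]
GAMMA = [block(I2, Z2, Z2, neg(I2))] + [block(Z2, s, neg(s), Z2) for s in SIGMA]
ETA_DIAG = [1, -1, -1, -1]  # mostly-minus metric (assumption)

def clifford_holds(tol=1e-12):
    """Check gamma^mu gamma^nu + gamma^nu gamma^mu == 2 eta^{mu nu} I for all index pairs."""
    for mu in range(4):
        for nu in range(4):
            anti = add(matmul(GAMMA[mu], GAMMA[nu]), matmul(GAMMA[nu], GAMMA[mu]))
            diag_value = 2 * ETA_DIAG[mu] if mu == nu else 0
            for i in range(4):
                for j in range(4):
                    expected = diag_value if i == j else 0
                    if abs(anti[i][j] - expected) > tol:
                        return False
    return True

print(clifford_holds())
```

If the check passes, applying the Dirac operator twice leaves only second derivatives contracted with the metric, which is the Klein-Gordon operator.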
What does that equation mean?
- if you take a spin-1/2 particle like an electron or positron
- and that particle’s wave function evolves through space and time
- the amount it changes
- is precisely offset by the particle’s rest mass (mc)
Not bad. Pretty simple.
In the video, Richard goes into working through some simple cases to show how the equation describes electrons and positrons, and what that branching means. And he has some really great graphics for visualizing spinors. But I’m going to skip that here, in the interest of focusing on the higher-order intuition.
So, okay, great. We have the Dirac equation. What’s next?
Well, we’d want to put together the Lagrangian that describes this Dirac field, as that would encode the behavior of those particles. And that Lagrangian ends up being:
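$$\mathcal{L}_{\text{free}} = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi$$
Where $\bar\psi = \psi^\dagger\gamma^0$ is the Dirac adjoint of ψ. We choose this form because requiring the action to be stationary (via the Euler-Lagrange equations) gives back the Dirac equation.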
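As a quick sanity check (my own sketch, not a step from the video), we can treat $\bar\psi$ as an independent field. I’m assuming the free Lagrangian is written with $i\hbar c$ in front of the derivative term, which is the form that matches the $mc^2$ mass term. Since the Lagrangian contains no derivatives of $\bar\psi$, the Euler-Lagrange equation for $\bar\psi$ is a one-liner:

```latex
\begin{aligned}
\mathcal{L}_{\text{free}} &= \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi \\
\frac{\partial \mathcal{L}_{\text{free}}}{\partial \bar\psi}
  - \partial_\mu\!\left(\frac{\partial \mathcal{L}_{\text{free}}}{\partial(\partial_\mu\bar\psi)}\right)
  &= (i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - 0 = 0 \\
\Rightarrow\quad i\hbar\,\gamma^\mu\partial_\mu\psi - mc\,\psi &= 0
  \qquad \text{(after dividing by } c\text{)}
\end{aligned}
```

The last line is the Dirac equation from earlier, so the Lagrangian and the equation encode the same thing.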
But this is where things get interesting, and we begin to head down the path of developing a model for electromagnetism almost accidentally.
That Lagrangian — what we’re going to start calling the “Free Lagrangian,” because it describes particles free of any interaction — is invariant under a global U(1) symmetry, which means that the Lagrangian doesn’t change when we apply a global phase transformation to the fields.
But, critically: it is not invariant under a local U(1) symmetry! What do we mean by that, and how do we know it?
A Theta-shaped wrench in the machine
A local phase transformation looks something like:
$$\psi' = e^{i\theta}\psi$$
Where θ is a phase angle that depends on the spacetime point. Basically: take ψ, and multiply it by a unit-length complex number that depends on where you are in space and time ($e^{i\theta} \in U(1)$), and you get a new, transformed wavefunction. In essence, a local phase transformation tweaks the “angle” of the particle’s wavefunction at each point in space and time.
In the video, Richard uses the difference in a local phase transformation’s impact on electrons versus positrons to visually illustrate that the Lagrangian is not invariant. But let’s do the math instead. We need to take the free Lagrangian we noted above, and substitute in ψ′ to see how the Lagrangian will change after a local phase transformation.
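$$\mathcal{L}' = \bar\psi'\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi'$$
$$\mathcal{L}' = e^{-i\theta}\bar\psi\,\big(i\hbar c\,\gamma^\mu\partial_\mu[e^{i\theta}\psi] - mc^2 e^{i\theta}\psi\big)$$
Then we do some calculus (as I said, I’ll skip the derivations) and end up simplifying this to:
$$\mathcal{L}' = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - \hbar c\,\bar\psi\gamma^\mu\psi\,\partial_\mu\theta$$
But wait! That whole first term is just our original free Lagrangian! So we can rewrite this as:
$$\mathcal{L}' = \mathcal{L}_{\text{free}} - \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
Okay. This is very bad (well, good for the laws of the universe working as we’d expect. But bad inasmuch as it screws with the math being simple).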
What’s going on? Well, we applied a local phase transformation to our original free Dirac Lagrangian. And if all we had gotten out of that was that same original free Dirac Lagrangian, we’d be in the clear for it being invariant under local phase symmetry. But instead, we got out the same original free Dirac Lagrangian, plus this other weird term starting with ℏc.
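For the record, the calculus I skipped is really just one product rule. Here’s a sketch of it, assuming θ is a smooth real function of spacetime and that the free Lagrangian carries $i\hbar c$ in front of the derivative:

```latex
\begin{aligned}
\partial_\mu\!\left(e^{i\theta}\psi\right)
  &= e^{i\theta}\left(\partial_\mu\psi + i(\partial_\mu\theta)\,\psi\right) \\
\bar\psi'\,i\hbar c\,\gamma^\mu\partial_\mu\psi'
  &= e^{-i\theta}\bar\psi\;i\hbar c\,\gamma^\mu\,e^{i\theta}\left(\partial_\mu\psi + i(\partial_\mu\theta)\,\psi\right) \\
  &= \bar\psi\,i\hbar c\,\gamma^\mu\partial_\mu\psi \;-\; \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi \\
\bar\psi'\,mc^2\,\psi' &= \bar\psi\,mc^2\,\psi
\end{aligned}
```

The phases cancel everywhere except where the derivative lands on $e^{i\theta}$, and the $i \cdot i = -1$ from that piece is what produces the extra term. For a global transformation, $\partial_\mu\theta = 0$ and the extra term vanishes, which is why the global symmetry was never a problem.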
This most certainly does not mesh with our understanding of reality. We need the Lagrangian to be invariant under local phase transformations, because we want the laws of physics to be invariant under local phase transformations. The Lagrangian, in some sense, encodes the laws of physics for a given theory. Based on what we understand about the behavior of the universe, local phase transformations should not change these mechanics.
So what do we do? We need to add something to the Lagrangian that will cancel out this extra term. And that something ends up being… electromagnetism, basically. (Cool!)
In order to get there, we need to modify the Lagrangian to perfectly cancel out that new term. It’s pretty simple, actually: just multiply it by negative one. So our Lagrangian should be:
$$\mathcal{L} = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi + \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
This works, because if there’s no phase transformation (θ=0), then that second term disappears. But if theta is nonzero, the first term expands out as above, spitting out the mirror image of the second term. The two then beautifully cancel, and we get a Lagrangian that is, as we desire, invariant under local phase transformations.
Are we cheating?
We could wipe our hands and say we’re done here. But… it does feel like it’s cheating. I really struggled with this point. Why are we allowed to just edit this Lagrangian arbitrarily?
The answer, I think, at some deep level, is that we aren’t creating or even discovering the laws of the universe. We are creating and discovering mathematical abstractions that match our observations of the universe.
There’s not some universal codebase somewhere running the Dirac equation, or checking against the Lagrangian. What seems to happen, over and over again, is that we build equations on top of equations, with some known or unknown assumptions. Those equations do a good job describing our observations of the universe. But then we run into a weird case where they don’t, and we need to either constrain those equations to a more explicit subset of situations, or we need to extend them.
And this is a case where we’re extending. What we’ve learned is that the original Dirac Lagrangian does a good job describing and predicting our observation of free particles — meaning those without interaction. But where there is interaction, it fails. And its failure is shown to us in stark contrast mathematically: because when we apply a local phase transformation to the free Dirac Lagrangian, it turns out to be — just mathematically — variant.
But you can now ask: why is that a problem? Couldn’t it just be so?
And the answer is… not really. What that would suggest is that a particle’s location in spacetime is allowed to affect the laws of physics that are governing it. We (generally) don’t like that concept. And so we — humans — have applied a filter to our mathematical/physical models of the universe that (generally, I’m simplifying a bit here) they need to be invariant under local phase transformations. We’re not okay with our models describing different laws of physics in different places or times.
If this all feels made-up (certainly how it feels to me), I think that’s fair. Because it is made up. We’re doing our best to build comprehensible (to PhDs, at least!), consistent, abstract models to describe the happenings of the universe. That’s quite an endeavor. And you may say it feels like a house of cards. But how could we even communicate about these concepts without abstractions like language and numbers and so on? And it turns out that in some senses, the universe behaves in quite complicated ways, and so we must build these increasingly complicated models to describe it.
Anyway. All that to say: we can say “our original Lagrangian was fine for free particles (i.e. those that are not interacting). But it doesn’t work for interacting particles. So let’s adjust it with a term that will make it consistent with our observations of the universe. And then we’ll see what happens from there.”
So we’ll do so, and march on with the math. At this point in the journey, I felt a little bummed. It felt inelegant. But just wait: it will tie up very nicely in the end.
Let’s just accept that we’ve added this term to the Lagrangian. And let’s look at it:
$$\hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
And let’s break it up into two pieces: $\hbar c\,\partial_\mu\theta$ and $\bar\psi\gamma^\mu\psi$.
We’ll call the first one $a_\mu$ for now. It’s gotta be some sort of four-vector that couples to the second term.
You may recognize the second one, as it’s a pretty common term in physics: the probability four-current, or $j^\mu/c$.
(And if you’re familiar with electromagnetism, those descriptions might make you suspect the electromagnetic four-potential…)
Now, in order to make the following math a little easier, we do a tiny trick: defining aμ to be a different four-vector (Aμ), times a scalar constant −q.
$$a_\mu = -qA_\mu$$
This doesn’t actually change anything. It’s just pulling a constant −q out of aμ, but it will make things easier in a moment (letting us adjust the units of Aμ). So now our mystery Lagrangian term is:
$$-qA_\mu\,\bar\psi\gamma^\mu\psi$$
Okay, but, there is now one more issue. As currently defined, Aμ’s existence depends on $\partial_\mu\theta \neq 0$, meaning it only exists when we are applying a local phase transformation. But that doesn’t really make sense: if we’re seeking to say that Aμ is a real thing that exists in the universe, it can’t only exist when that is the case.
So we can do something interesting. Rather than just saying that Aμ exists as defined without other constraints, we take a step further, and we say that when a local phase shift occurs, rather than that just being a part of the definition of Aμ, we instead suggest that:
$$A_\mu \to A_\mu - \frac{\hbar c}{q}\,\partial_\mu\theta$$
Now, Aμ exists, and when a phase transformation is applied, it spits out the exact term we need (same as before). But now, Aμ continues to exist independently of that term. And that is the key to ensuring that Aμ makes physical sense.
So again, if we go back up to our new mystery Lagrangian term:
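$$-qA_\mu\,\bar\psi\gamma^\mu\psi$$
And we apply a local phase transformation, we get:
$$-q\left(A_\mu - \frac{\hbar c}{q}\,\partial_\mu\theta\right)\bar\psi\gamma^\mu\psi$$
Distribute that through and we get:
$$-qA_\mu\,\bar\psi\gamma^\mu\psi + \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
And now if we go back to the first part of our free Dirac Lagrangian (before we added the mystery term) and apply the local phase transformation:
$$\mathcal{L}' = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
See anything interesting? Yeah, whoa. The second terms cancel perfectly, and we get right back to where we started. We’ve made a Dirac Lagrangian invariant under local phase symmetry.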
If we put all those pieces together into one equation…
The new Lagrangian:
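$$\mathcal{L} = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - qA_\mu\,\bar\psi\gamma^\mu\psi$$
Apply a local phase transformation:
$$\mathcal{L}' = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi - qA_\mu\,\bar\psi\gamma^\mu\psi + \hbar c\,(\partial_\mu\theta)\,\bar\psi\gamma^\mu\psi$$
The two $\hbar c\,(\partial_\mu\theta)$ terms cancel and we get right back to where we started:
$$\mathcal{L}' = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - qA_\mu\,\bar\psi\gamma^\mu\psi$$
Boom! Gauge invariance! Really cool.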
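To convince myself the cancellation isn’t a typo-level accident, here’s a toy numerical check. It’s a sketch under heavy assumptions: one spatial dimension, a single complex function standing in for the spinor (so no gamma matrices), made-up units, and made-up functions for ψ, A, and θ. It only tests the bookkeeping of the phase cancellation, not real QED.

```python
import cmath
import math

HBAR_C = 1.0  # toy units (assumption)
MC2 = 0.5     # toy rest energy (assumption)
Q = -1.0      # toy charge (assumption)

def derivative(f, x, h=1e-5):
    """Central finite difference of a real- or complex-valued function."""
    return (f(x + h) - f(x - h)) / (2 * h)

def psi(x):
    """Hypothetical wave packet standing in for the Dirac field."""
    return cmath.exp(1j * 2.0 * x) * math.exp(-x * x)

def A(x):
    """Hypothetical gauge field component."""
    return 0.3 * math.sin(x)

def theta(x):
    """Hypothetical local phase: it varies from point to point."""
    return 0.7 * math.sin(3.0 * x) + 0.2 * x * x

def lagrangian(psi_fn, a_fn, x):
    """Toy 1D density: psi* (i hbar c d/dx - m c^2) psi - q A |psi|^2."""
    p = psi_fn(x)
    kinetic = p.conjugate() * 1j * HBAR_C * derivative(psi_fn, x)
    return kinetic - MC2 * abs(p) ** 2 - Q * a_fn(x) * abs(p) ** 2

def psi_shifted(x):
    """psi' = exp(i theta) psi."""
    return cmath.exp(1j * theta(x)) * psi(x)

def A_shifted(x):
    """A' = A - (hbar c / q) d(theta)/dx."""
    return A(x) - (HBAR_C / Q) * derivative(theta, x)

x0 = 0.4
original = lagrangian(psi, A, x0)
psi_only = lagrangian(psi_shifted, A, x0)       # transform psi but not A
both = lagrangian(psi_shifted, A_shifted, x0)   # transform psi and A together

print(abs(psi_only - original))  # clearly nonzero: the symmetry is broken
print(abs(both - original))      # tiny: only finite-difference error remains
```

Transforming ψ alone changes the density by the extra $\hbar c\,\theta'\,|\psi|^2$ piece, while transforming ψ and A together leaves it unchanged up to numerical error.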
Now, we’re not done yet. We’ve adjusted the Lagrangian, but there’s a new problem. We’ve created this mystery term and made it fit the puzzle. But that mystery term… we’ve also said it’s describing a real thing in the world, somehow. And real things have energy. But, you know, energy is conserved. And so if that “thing” emits energy, where would it go?
Well, as Richard elegantly puts it in the video, how about where these question marks are:
$$\mathcal{L} = \bar\psi\,(i\hbar c\,\gamma^\mu\partial_\mu - mc^2)\,\psi - qA_\mu\,\bar\psi\gamma^\mu\psi + \;???$$
There’s probably a term here that allows Aμ’s energy to be conserved. But in defining it, we’ll have to be really careful, because if that term is at all affected by local phase transformations, everything would fall apart and we’d lose our beautiful invariance. We’ll come back to this in a bit.
But first, let’s try to figure out what this Aμ field actually is. That’ll probably help us.
What is this thing?
Here’s the weird thing, though: whatever we put in that ??? slot cannot be affected by θ (by local phase transformations). That would break the symmetry we’ve developed. And Aμ itself does shift under a local phase transformation, so we can’t build the new term directly out of Aμ.
And as a natural consequence of that, Aμ can’t have mass (a local phase transformation that was “scrunchy” would change the amount of mass energy required to bring in for Aμ to do its job). (Note that technically local gauge invariance can coexist with massive vector fields if there is spontaneous symmetry breaking. That’s not the case for electromagnetism, but it’s possible in non-Abelian gauge theories, like the Standard Model with the Higgs mechanism.) And at first blush, we would think that spacetime derivatives of Aμ can’t contain energy either, for similar reasons.
So, again, we’re trying to figure out how to physically describe Aμ, and figure out what components we have at our disposal to put in that ??? term (which must be unaffected by gauge transformations). And we’ve just lost Aμ itself as well as its derivatives.
Well, actually, the individual derivatives of Aμ can’t contribute to L. But what if we could find combinations of two derivatives of Aμ whose θ-dependence, when taken together, cancels out? That way the combined term would not be affected at all by θ, in aggregate.
It turns out there is a form that allows for this — any term that looks like:
$$\partial_\mu A_\nu - \partial_\nu A_\mu$$
I’ll save you the math showing it, but (where $\mu,\nu \in \{0,1,2,3\}$, the spacetime dimensions) terms of the above form are totally unaffected by gauge transformations. That means we can use them in our Lagrangian!
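Actually, the math here is short enough to sketch. Using the transformation rule $A_\mu \to A_\mu - \frac{\hbar c}{q}\partial_\mu\theta$ from earlier, and assuming θ is smooth enough that its mixed partial derivatives commute:

```latex
\begin{aligned}
\partial_\mu A'_\nu - \partial_\nu A'_\mu
  &= \partial_\mu\!\left(A_\nu - \tfrac{\hbar c}{q}\,\partial_\nu\theta\right)
   - \partial_\nu\!\left(A_\mu - \tfrac{\hbar c}{q}\,\partial_\mu\theta\right) \\
  &= \partial_\mu A_\nu - \partial_\nu A_\mu
   - \tfrac{\hbar c}{q}\left(\partial_\mu\partial_\nu\theta - \partial_\nu\partial_\mu\theta\right) \\
  &= \partial_\mu A_\nu - \partial_\nu A_\mu
\end{aligned}
```

The antisymmetric combination kills the θ-dependence because $\partial_\mu\partial_\nu\theta = \partial_\nu\partial_\mu\theta$. A single derivative like $\partial_\mu A_\nu$ on its own would pick up a leftover $\partial_\mu\partial_\nu\theta$ and break the invariance.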
Now we’re getting to the good stuff.
How many terms in that form are there? Six, it turns out. Good ol’ combinatorics: “4 choose 2” (because the order does not matter).
$$\binom{4}{2} = \frac{4!}{2!\,(4-2)!} = 6$$
Remember {0,1,2,3} are the spacetime dimensions, also known as {ct,x,y,z} (ct instead of t to put them all in the same units). Said another way: we can combine time with a space dimension in three different ways (time with x, time with y, time with z), and we can combine space with space in three different ways (x with y, x with z, y with z).
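The counting is easy to spell out explicitly. This is just a small sketch that enumerates the unordered index pairs and splits them into the time-space and space-space groups; the variable names are mine.

```python
from itertools import combinations
from math import comb

AXES = ["ct", "x", "y", "z"]  # labels for spacetime indices 0, 1, 2, 3

# Every unordered pair (mu, nu) with mu < nu.
pairs = list(combinations(range(4), 2))

# Pairs involving the time index versus pairs of two space indices.
time_space = [(AXES[m], AXES[n]) for m, n in pairs if m == 0]
space_space = [(AXES[m], AXES[n]) for m, n in pairs if m != 0]

print(len(pairs), comb(4, 2))
print(time_space)
print(space_space)
```

Three pairs involve time and three are purely spatial, which is the split that shows up next.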
Remember Aμ is a four-vector in spacetime. And so if you take the derivative with respect to one dimension μ of component Aν and subtract the derivative with respect to ν of Aμ, you’ll get a scalar field in spacetime that can contain energy.
And now I guess we have six of them. Six scalar fields with energy. (I think this is a fair casual characterization, although ultimately a massless photon in quantum theory has only two polarization states. This is because gauge invariance plus the equations of motion reduce the naive ‘6’ to just ‘2’ physical degrees of freedom. But the ‘6’ is a little clearer for the purposes we’re about to use it. So I refer to “six scalar fields with energy,” although it’s really “six independent components.”)
Let’s first look at the ones that have to do a time dimension and a space dimension (as opposed to the ones that are two space dimensions). So we’re looking at the ones of the form:
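$$\partial_0 A_\nu - \partial_\nu A_0$$
Given that $A_\mu = [V, -\mathbf{A}]$, we can rewrite this as:
$$\partial_0 A_\nu - \partial_\nu A_0 = -\frac{1}{c}\frac{\partial A_\nu}{\partial t} - \frac{\partial V}{\partial \nu}$$
(Here ν runs over the space dimensions x, y, z, and on the right-hand side $A_\nu$ means the corresponding component of the ordinary vector potential $\mathbf{A}$; the minus sign from $A_\mu = [V, -\mathbf{A}]$ has already been pulled out front.) And now if we compile these three fields (time with each of x, y, z) into one vector field, we get:
$$\left[-\frac{1}{c}\frac{\partial A_x}{\partial t} - \frac{\partial V}{\partial x},\; -\frac{1}{c}\frac{\partial A_y}{\partial t} - \frac{\partial V}{\partial y},\; -\frac{1}{c}\frac{\partial A_z}{\partial t} - \frac{\partial V}{\partial z}\right] = -\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t} - \nabla V$$
And, wow: that happens to be the series of symbols we know as the electric field!
$$\mathbf{E} = [E_x, E_y, E_z] = -\frac{1}{c}\frac{\partial \mathbf{A}}{\partial t} - \nabla V$$
These equations we followed step-by-step end up outputting something that exactly matches the properties and description of the electric field.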
I had a few jaw-dropping moments watching Richard’s video, and this was maybe the biggest one. It’s easy to get lost in all the numbers and symbols and terminology, so I want to take a moment to outline what happened here.
We are all familiar with the concept of “the electric field.” It’s experimentally extremely well-established. We rely on it to make predictions about electrical behavior. We use it to do electrical engineering work in the real world.
But — until now — I had simply taken its existence for granted. I had implicitly assumed that scientists had taken a whole bunch of observations and measurements about the universe, and used those to develop a mathematical model of the electric field directly, and that was the entire foundation for its existence.
But this sequence we just walked through revealed something different. The electric field emerged as a straightforward, logical byproduct of taking the Dirac equation and insisting that it be gauge invariant.
In order to develop a “theory of the electric field,” we don’t need to accept anything other than the existence and correctness of the Dirac equation, and that it must be symmetric — invariant — under a local phase transformation. Once we do that, the electric field emerges from the math that follows.
So again, in summary:
- We started with the Dirac equation, which describes the behavior of fermions
- We developed a Dirac Lagrangian that described particles free of interaction
- We found that this Lagrangian was invariant under a global U(1) symmetry
- We then found that this Lagrangian was not invariant under a local U(1) symmetry, which conflicted with our observations of the universe
- We then added a term to the Lagrangian that would make it invariant under local U(1) symmetry, satisfying our understanding
- We then explored this term, trying to understand what it could represent and what the constraints on it were
- We discovered that the derivatives of this term, when combined, defined what we have always known as “the electric field”
Pretty amazing.
And to take it one step further, perhaps unsurprisingly, we can use the other three terms (space dimension and another space dimension) to output the magnetic field.
Skipping some of the detailed math, we take those remaining ∂μAν−∂νAμ terms, and we can compile them into a pseudo-vector field:
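$$\left[-\frac{\partial A_y}{\partial z} + \frac{\partial A_z}{\partial y},\; -\frac{\partial A_z}{\partial x} + \frac{\partial A_x}{\partial z},\; -\frac{\partial A_x}{\partial y} + \frac{\partial A_y}{\partial x}\right] = \nabla \times \mathbf{A} = \mathbf{B}$$
And that’s the magnetic field.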
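To see these two formulas do something recognizable, here’s a small numerical sketch that builds E and B from potentials using central differences. The test potentials are my own choices (a Coulomb potential in Gaussian units and the vector potential of a uniform field along z), and all the function names are hypothetical.

```python
import math

C = 1.0  # speed of light in toy units (assumption)

def partial(f, point, axis, h=1e-5):
    """Central difference of f(t, x, y, z) along one axis (0=t, 1=x, 2=y, 3=z)."""
    up = list(point)
    down = list(point)
    up[axis] += h
    down[axis] -= h
    return (f(*up) - f(*down)) / (2 * h)

def electric_field(V, A, point):
    """E = -(1/c) dA/dt - grad V, where A is a tuple of three component functions."""
    return [-(1.0 / C) * partial(A[i], point, 0) - partial(V, point, i + 1)
            for i in range(3)]

def magnetic_field(A, point):
    """B = curl A."""
    Ax, Ay, Az = A
    return [
        partial(Az, point, 2) - partial(Ay, point, 3),  # dAz/dy - dAy/dz
        partial(Ax, point, 3) - partial(Az, point, 1),  # dAx/dz - dAz/dx
        partial(Ay, point, 1) - partial(Ax, point, 2),  # dAy/dx - dAx/dy
    ]

CHARGE = 2.0  # hypothetical point charge at the origin
B0 = 0.5      # hypothetical uniform field strength along z

def V(t, x, y, z):
    """Coulomb potential of the point charge (Gaussian units)."""
    return CHARGE / math.sqrt(x * x + y * y + z * z)

# Vector potential of a uniform magnetic field B0 along z (symmetric gauge).
A = (
    lambda t, x, y, z: -0.5 * B0 * y,
    lambda t, x, y, z: 0.5 * B0 * x,
    lambda t, x, y, z: 0.0,
)

point = (0.0, 1.0, 2.0, 2.0)  # (t, x, y, z), at distance r = 3 from the charge
E = electric_field(V, A, point)
B = magnetic_field(A, point)
print(E)
print(B)
```

With these inputs, E should come out close to the Coulomb field (charge times position over r cubed) and B close to a uniform field of strength B0 along z.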
So, again, we should take a step back here, because this is — for me — the climax of this whole journey.
I’m going to partially restate and reframe what I just said above. (Why? Because I had to talk myself through this several times to really get how powerful it is.)
In the late 1920s, Dirac sought to describe particles with half-integer spin (like electrons). He developed an equation that did so, which incorporated both quantum mechanics and special relativity.
On top of that equation, we developed a Lagrangian — a model of the laws of physics — for the particles described by that equation.
That model of the laws of physics was invariant under a global U(1) symmetry, meaning that if we shifted the phase of the wavefunction in the same way at every point in spacetime, the model of those laws did not change.
But it was not invariant under a local U(1) symmetry, meaning that if we shifted the phase differently at different points in spacetime, those developed laws of physics changed.
This does not make sense. The laws of physics, as we understood and observed them, were consistent regardless of the phase shift.
Said another way: the universe’s laws of physics should be symmetric with respect to phase shifts, both globally (if the whole universe changes at once) and locally (if parts of the universe change differently).
So we added a term to our model of physics that would fix this asymmetry. It was — at this point — a totally abstract thing. A mathematical puzzle piece that fit perfectly in the hole.
But that term was doing something in the universe, clearly — it was contributing to the Lagrangian, our model of the laws of physics. And we realized that in order for it to maintain its perfect puzzle piece shape (that is, to be invariant under a local phase transformation θ), it was constrained in how it could behave.
And when we dug into what those constraints were, we realized that there were only six ways for it to behave (six degrees of freedom) that kept the puzzle piece shape intact. (As in this entire post, we’re assuming simple quantum electrodynamics: a Dirac field and an Abelian U(1) gauge field. Otherwise I think we could get higher-dimensional operators (typically suppressed at low energy). I’m not deep enough into this to fully grasp it, but I think that could implicate things like massive photons under the Higgs mechanism, or non-Abelian generalizations (like if we were doing this under SU(2)×U(1) instead of U(1)), which I believe leads to things like the Standard Model’s weak and strong interactions. But I’m out of my depth here.)
And those six ways for the term to behave (plus the interaction between the term and the Dirac field) give rise to electromagnetism as we understand it.
The electric and magnetic fields we’ve long understood are little more than mathematical ways of encapsulating how this puzzle piece is allowed to behave on its own.
And this all comes from a (warranted) insistence on symmetry. The symmetry gives rise to a logical, mathematical sequence that results in a series of mathematical symbols and numbers that happen to match our empirically-observed understanding of electromagnetism.
Crazy. Right?
Polishing up Quantum Electrodynamics
At this point, if you’re following along with Richard’s video, you’re about halfway through (roughly 1.5 hours in).
He then goes on to (very clearly) derive Maxwell’s equations and the Lorentz force law from what we just discovered. But I’m not going to do that in this post (as I said, I’m skipping most of the step-by-step math and tedious derivations).
(One interesting note: he shows how the homogeneous Maxwell’s equations (Gauss’s Law for Magnetism and Faraday’s Law of Induction) both follow extremely straightforwardly from the gauge invariance of the Dirac Lagrangian. They come directly from the pure geometry of the gauge potential. They’re almost not necessary as Laws once you have the overarching concept we’ve been developing.
The inhomogeneous ones (Gauss’s Law for Electricity and Ampere’s Law) are trickier to derive, and probably are still useful on their own. They do follow from the gauge invariance, but they require some variational calculus and thoughtful reasoning. Unlike the homogeneous ones, they are actual equations of motion that come from setting the variation of the action with respect to Aμ to zero.)
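To make the “pure geometry” remark about the homogeneous equations concrete, here’s a short sketch in the same Gaussian units used above. It relies only on the definitions $\mathbf{E} = -\frac{1}{c}\frac{\partial\mathbf{A}}{\partial t} - \nabla V$ and $\mathbf{B} = \nabla\times\mathbf{A}$, plus two vector-calculus identities (the divergence of a curl is zero, and the curl of a gradient is zero):

```latex
\begin{aligned}
\nabla\cdot\mathbf{B} &= \nabla\cdot(\nabla\times\mathbf{A}) = 0 \\
\nabla\times\mathbf{E}
  &= \nabla\times\left(-\frac{1}{c}\frac{\partial\mathbf{A}}{\partial t} - \nabla V\right)
   = -\frac{1}{c}\frac{\partial}{\partial t}\left(\nabla\times\mathbf{A}\right) - \nabla\times\nabla V
   = -\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t}
\end{aligned}
```

The first line is Gauss’s Law for Magnetism and the second is Faraday’s Law of Induction. Neither needed any dynamics, only the fact that E and B are built from derivatives of a potential.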
Instead, I want to hit on just a couple more intuition-building points about this whole thing, and build out the proper Lagrangian of quantum electrodynamics (“QED”).
First, we want to develop what is called the Faraday tensor. Let’s build a four-by-four tensor, called $F_{\mu\nu}$, where both the row and the column indices are the dimensions of spacetime {0,1,2,3}:
$$F_{\mu\nu} = \begin{pmatrix} F_{00} & F_{01} & F_{02} & F_{03} \\ F_{10} & F_{11} & F_{12} & F_{13} \\ F_{20} & F_{21} & F_{22} & F_{23} \\ F_{30} & F_{31} & F_{32} & F_{33} \end{pmatrix}$$
(I’m writing this post in Cursor and wow is it nice. I write the first term of the matrix and Cursor correctly suggests tab-completing the whole thing. Writing math before this was so tedious. Imagine Dirac writing the next matrix up by hand originally!)
Let’s define each of those terms as the relevant combination of partial derivatives of A like we used above to represent A’s degrees of freedom:
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$$
So:
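$$F_{\mu\nu} = \begin{pmatrix} \partial_0 A_0 - \partial_0 A_0 & \partial_0 A_1 - \partial_1 A_0 & \partial_0 A_2 - \partial_2 A_0 & \partial_0 A_3 - \partial_3 A_0 \\ \partial_1 A_0 - \partial_0 A_1 & \partial_1 A_1 - \partial_1 A_1 & \partial_1 A_2 - \partial_2 A_1 & \partial_1 A_3 - \partial_3 A_1 \\ \partial_2 A_0 - \partial_0 A_2 & \partial_2 A_1 - \partial_1 A_2 & \partial_2 A_2 - \partial_2 A_2 & \partial_2 A_3 - \partial_3 A_2 \\ \partial_3 A_0 - \partial_0 A_3 & \partial_3 A_1 - \partial_1 A_3 & \partial_3 A_2 - \partial_2 A_3 & \partial_3 A_3 - \partial_3 A_3 \end{pmatrix}$$
Now, that’s a mess. But we can see that the terms on the upper-left to lower-right diagonal all cancel to zero:
$$F_{\mu\nu} = \begin{pmatrix} 0 & \partial_0 A_1 - \partial_1 A_0 & \partial_0 A_2 - \partial_2 A_0 & \partial_0 A_3 - \partial_3 A_0 \\ \partial_1 A_0 - \partial_0 A_1 & 0 & \partial_1 A_2 - \partial_2 A_1 & \partial_1 A_3 - \partial_3 A_1 \\ \partial_2 A_0 - \partial_0 A_2 & \partial_2 A_1 - \partial_1 A_2 & 0 & \partial_2 A_3 - \partial_3 A_2 \\ \partial_3 A_0 - \partial_0 A_3 & \partial_3 A_1 - \partial_1 A_3 & \partial_3 A_2 - \partial_2 A_3 & 0 \end{pmatrix}$$
And, moreover, we can now see that the upper and lower triangles, as mirrored across that diagonal, are the same but multiplied by negative one (for example, $F_{01} = -F_{10}$ and $F_{23} = -F_{32}$). We’ll use that insight in a moment.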
Now let’s focus on the top row.
The top row is — if you recognize it from earlier — the components of the electric field (E)! And that means that the first column (because of the mirroring across the diagonal) is the negative of the components of the electric field!
$$F_{\mu\nu} = \begin{pmatrix} 0 & E_x & E_y & E_z \\ -E_x & 0 & \partial_1 A_2 - \partial_2 A_1 & \partial_1 A_3 - \partial_3 A_1 \\ -E_y & \partial_2 A_1 - \partial_1 A_2 & 0 & \partial_2 A_3 - \partial_3 A_2 \\ -E_z & \partial_3 A_1 - \partial_1 A_3 & \partial_3 A_2 - \partial_2 A_3 & 0 \end{pmatrix}$$
Now what about those other three terms in each triangle? They are, unsurprisingly, the terms of the magnetic field (B)! There are some mixed up minus signs, but you end up seeing the matrix below. (It’s beyond the depth of this post to go into the minus sign detail and where to apply it, but it suffices to say that the negative doesn’t actually change the intuition. It’s mainly just convention. And Richard happens to use the mostly-minus convention, so here we are.)
$$F_{\mu\nu} = \begin{pmatrix} 0 & E_x & E_y & E_z \\ -E_x & 0 & -B_z & B_y \\ -E_y & B_z & 0 & -B_x \\ -E_z & -B_y & B_x & 0 \end{pmatrix}$$
Okay so: this so-called “Faraday tensor” is a four-by-four matrix that contains all the information about the electric and magnetic fields. This is handy for writing simple forms of the equations we’re about to arrive at.
It’s also, in some sense, a simpler idea than the electric and magnetic fields themselves.
We did a whole section above about breaking these six degrees of freedom of A into two 3-degree chunks, which ended up corresponding to the electric and magnetic fields. But this wasn’t really necessary. We could have just moved right into the Faraday tensor, and said that the tensor represented electromagnetism’s fields (it does).
We only split it into electric and magnetic because… that’s what most of us have always known. It was important to do so to illustrate this idea that mathematical descriptions of electric and magnetic fields that we’ve always been familiar with arise from the gauge invariance of the Dirac Lagrangian, and to give ourselves the “a-ha” moment.
But really, at least from a theoretic perspective, it would be simpler and easier to ignore the electric and magnetic fields entirely and just talk about this Faraday tensor object. So that’s mostly what we’ll do from here on out.
One more quick mathematical note: we’ve been developing the “downstairs” Faraday tensor, $F_{\mu\nu}$. But we’ll also need the “upstairs” Faraday tensor, $F^{\mu\nu}$. The only difference here is that we flip the signs of the first row and column. I’ll skip the derivation of this, but it can be seen starting from $F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu$.
$$F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}$$
Okay. So what next? Let’s look at the scalar quantity $F_{\mu\nu}F^{\mu\nu}$. This is in some sense the magnitude squared of the Faraday tensor, and it’s about to be important to us. I’ll save you the breakdown of the terms when you expand it out, but it ends up pretty trivially being:
$$F_{\mu\nu}F^{\mu\nu} = 2(B^2 - E^2)$$
An important point here: electric and magnetic fields are not Lorentz invariant quantities. Different inertial observers will see different values for them (intuitively: you can think of how moving charges, where “moving” is defined by your frame of reference, create magnetic fields). But $2(B^2 - E^2)$ is a scalar quantity, and so it is invariant under Lorentz transformations. This makes it useful for the Lagrangian!
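Since I skipped the expansion, here’s a minimal numerical sketch of it. It fills in the downstairs tensor with the mostly-minus sign layout (first row E, the spatial block built from B), raises both indices with the metric diag(1, −1, −1, −1), and contracts. The sample field values and the function names are arbitrary choices of mine.

```python
# Mostly-minus Minkowski metric, diag(1, -1, -1, -1).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def faraday_lower(E, B):
    """Downstairs Faraday tensor F_{mu nu} for given E and B components."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, Ex, Ey, Ez],
        [-Ex, 0.0, -Bz, By],
        [-Ey, Bz, 0.0, -Bx],
        [-Ez, -By, Bx, 0.0],
    ]

def raise_indices(F_lower):
    """Upstairs tensor: F^{mu nu} = eta^{mu a} eta^{nu b} F_{a b}."""
    return [
        [sum(ETA[m][a] * ETA[n][b] * F_lower[a][b] for a in range(4) for b in range(4))
         for n in range(4)]
        for m in range(4)
    ]

def contraction(F_lower, F_upper):
    """Full contraction F_{mu nu} F^{mu nu}, summed over both indices."""
    return sum(F_lower[m][n] * F_upper[m][n] for m in range(4) for n in range(4))

E = (1.0, -2.0, 0.5)   # arbitrary sample electric field
B = (0.25, 3.0, -1.5)  # arbitrary sample magnetic field

F_lo = faraday_lower(E, B)
F_up = raise_indices(F_lo)
invariant = contraction(F_lo, F_up)

E2 = sum(e * e for e in E)
B2 = sum(b * b for b in B)
print(invariant, 2 * (B2 - E2))
```

The two printed numbers should agree. Raising the indices flips the sign of the first row and column only, which is why the E terms enter the contraction with a minus sign and the B terms with a plus sign.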
Now, here we get to what I consider — on two fronts — to be the least satisfyingly intuitive part of this video (which I get the impression Richard agrees with). Up until now we have been following what feels like a pre-ordained path, where each step is required by the prior steps. But now, we exercise some discretion to help us formulate the final Lagrangian.
First, we’re going to multiply our $F_{\mu\nu}F^{\mu\nu}$ term by a scalar factor of 1/16π. We can do this because, if you remember way back earlier, we pulled out a constant q from our added Lagrangian term, explicitly for the purpose of normalizing by some scalar factor to make our eventual equation fit observed conditions.
And a scalar is, of course, invariant under the types of transformations we’re looking at. So it has no chance of screwing up our symmetry.
Basically: we pulled a q scalar out of Aμ way back at the beginning, and that has flowed all the way through to our definition of Fμν. And we’re now leveraging that q to make it such that the units of Aμ and Fμν are defined in a way that makes 1/16π the scale factor.
I wish we could say that 1/16π being the scale factor follows from our steps thus far, but I think (someone please correct me if I’m wrong) that it doesn’t. Rather, we make this choice so that when we derive Maxwell’s equations from this work, the units there work out the way that we’ve always understood them. And part of the reason it ends up being 1/16π is that we’re mostly using Gaussian (CGS) units — we’d get a different (but equally-post-hoc-rationalized) scale factor if we were using SI units.
That is, the scale factor isn’t a requirement of the assumptions we’ve made, but it works with those assumptions, and is a requirement to make the equation fit perfectly with work further down the road.
I don’t love this, but I’m not sure that there’s a way to get to this scale factor otherwise. If you’re aware of one, please reach out!
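For comparison, here is how the magnitude of the coefficient on $F_{\mu\nu}F^{\mu\nu}$ looks in a few common unit systems. These are the standard textbook normalizations as I understand them, not something derived in this post, and each assumes the field tensor is defined in that system's own units:

```latex
\begin{aligned}
\text{Gaussian (CGS):} \quad & \frac{1}{16\pi} \\
\text{Heaviside-Lorentz:} \quad & \frac{1}{4} \\
\text{SI:} \quad & \frac{1}{4\mu_0}
\end{aligned}
```

The structure of the term is the same in every case; only the bookkeeping constant changes, which is consistent with the scale factor being a units choice rather than a consequence of the symmetry.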
Aside from that magnitude of scale factor, we’re also going to multiply through by negative one to preserve our mostly-minus convention, so we end up with:
$$-\frac{1}{16\pi}F_{\mu\nu}F^{\mu\nu} = \frac{1}{8\pi}E^2 - \frac{1}{8\pi}B^2$$
Now, we’re going to add this term to our Lagrangian. Again: it’s scalar, and thus invariant under local phase transformations. If you go all the way back up to where we put ??? in the earlier equations… that’s where this is going, since it fits all our criteria. So we have:
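$$\mathcal{L} = \textcolor{orange}{\bar{\psi}(i\gamma^\mu\partial_\mu - mc^2)\psi} - \textcolor{blue}{qA_\mu\bar{\psi}\gamma^\mu\psi} - \textcolor{green}{\frac{1}{16\pi}F_{\mu\nu}F^{\mu\nu}}$$
And if we look at each of those terms: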
- the orange first term is the free Dirac term, representing the particles’ kinetic and mass energy
- the blue second term is the interaction energy, where ψ couples to the photon field
- the green third term is the kinetic energy of the photon field
This is the Lagrangian of Quantum Electrodynamics, or the “QED Lagrangian.”
But now I want to re-outline the two “unintuitive fronts” I mentioned earlier and discuss them a little more.
- Why is the scale constant on the third term 1/16π? I understand we can make it that, thanks to the q we pulled out, but why is it that?
- Why is the (unscaled) third term $F_{\mu\nu}F^{\mu\nu}$? Yes, it fits what we’re looking for, but why is that the thing to put there?
We already covered #1. So let’s look at #2.
We do need to use the Faraday tensor, because it fully encapsulates all the degrees of freedom for A. And further, in order to satisfy the constraints of local phase symmetry and those of special relativity, the term must be built entirely out of $F_{\mu\nu}F^{\mu\nu}$. So that’s fine.
But why not, say, $(F_{\mu\nu}F^{\mu\nu})^2$ or something similar?
As Richard walks through in the video, this form, unlike everything we’ve done so far, is not required by our previous steps. Something like the square of that quantity is not ruled out by our U(1) symmetry approach.
It turns out that when we use this “simplest” form of the quantity, it ends up resulting in Maxwell’s equations, which we have empirically verified. So it ends up being right. But I find this unsatisfying, as opposed to our work so far, where everything has been necessarily right based on our assumptions.
What we’re basically doing here is saying “$F_{\mu\nu}F^{\mu\nu}$ is the simplest Lorentz invariant quantity we can make out of $F_{\mu\nu}$, so let’s use it and see where we go.” And it turns out to work. This is fine, but I don’t love it. If anyone has thoughts on this, please do reach out! I’d really like there to be a more satisfying explanation or rationale under just the constraints of local phase symmetry and special relativity.
(I’ve done some further research on this, and it seems like another reason is that renormalizable interactions in 3+1D must be built from operators of mass dimension ≤4, which is what we’re looking at here. So the $F_{\mu\nu}F^{\mu\nu}$ term is the simplest possible gauge-invariant operator of mass dimension 4 or less that we can build. In theory the higher-dimensional operators could be used, but they’d be suppressed at accessible / typical energies. This makes me feel a little better, although I still wonder if there are other eligible (mass dimension ≤4 / renormalizable in 3+1D, gauge-invariant) operators that we could have used.)
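The mass-dimension counting mentioned above can be sketched in a few lines. This assumes natural units ($\hbar = c = 1$) in 3+1 dimensions with a dimensionless coupling q, writes $[X]$ for the mass dimension of $X$, and introduces $\Lambda$ as a hypothetical high-energy scale that is not part of the original discussion:

```latex
\begin{aligned}
[S] = 0,\quad [d^4x] = -4 \;&\Rightarrow\; [\mathcal{L}] = 4 \\
[\partial_\mu] = 1,\quad [A_\mu] = 1 \;&\Rightarrow\; [F_{\mu\nu}] = 2 \\
[F_{\mu\nu}F^{\mu\nu}] &= 4 \quad \text{(dimensionless coefficient)} \\
[(F_{\mu\nu}F^{\mu\nu})^2] &= 8 \quad \text{(coefficient} \sim 1/\Lambda^4\text{)}
\end{aligned}
```

Read this way, the squared term is not forbidden by the symmetry, but its coefficient has to carry four inverse powers of some large energy scale, which is why it would be suppressed at everyday energies.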
So anyway, all that aside: we got the QED Lagrangian. (Well, technically, this is the “Lagrangian density.” We can integrate it over space to get the proper Lagrangian, and over spacetime to get the action. But most people just call this the Lagrangian, as I understand it.) Here it is:
$$\mathcal{L} = \bar{\psi}(i\gamma^\mu\partial_\mu - mc^2)\psi - qA_\mu\bar{\psi}\gamma^\mu\psi - \frac{1}{16\pi}F_{\mu\nu}F^{\mu\nu}$$
This is, as Richard refers to it, “the seed that grows into the electromagnetic phenomena.”
It is, at some level, the description of the universe that mathematically gives rise to everything from visible light to x-rays to electrical transmission lines; from magnets to wi-fi to preventing objects from simply passing through each other; from ensuring our brains function to causing our muscles to contract. All of that comes out of this equation.
It all comes from the simple Dirac equation, that formula from way up higher in this post ($i\hbar\gamma^\mu\partial_\mu\psi - mc\psi = 0$), and finding the free Dirac Lagrangian for that equation, and then making the obvious demand of local U(1) symmetry.
And aside from a couple post-hoc rationalizations (of the 1/16π scale factor and the choice of plain ol’ $F_{\mu\nu}F^{\mu\nu}$), you don’t need to make a single further assumption or presume the existence of anything else, electrical and magnetic fields included. The mathematical path you are led down gives rise to electromagnetism by itself.
There is, of course, a lot more to this story. We could go down the path of deriving Maxwell’s equations and the Lorentz force law. We could dig into the principle of least action and how the universe seems to conspire to minimize action, and how that principle can take a Lagrangian like the one we’ve developed and output governing equations for the broader quantum electrodynamic system. We could explore more complex theories that make different sets of assumptions about the universe (massive photons, non-Abelian gauge fields, or the like).
But instead, let’s wrap with a meditation on existence and symmetry.
On Existence and Symmetry
Early on (about 6:10 in), Richard asks what he characterizes as the foundational question for the entire video:
Why is electromagnetism a thing?
And he answers:
Because the Dirac field has local U(1) symmetry.
Now, I’m a total novice here compared to him. But I don’t think that answer is quite right. I think I understand why he said it. But I don’t agree, or at least I find it a little misleading as stated.
When I think of “electromagnetism” being “a thing,” I’m thinking about the behavior of the universe. Electrons repelling one another. Protons attracting electrons. That sort of behavior.
And I don’t believe that behavior is a result of the Dirac field’s local U(1) symmetry. The behavior just… is something we observe. It isn’t defined by the math. I don’t believe there’s some computer somewhere running code that takes the Dirac field, enforces local symmetry on it, and outputs electromagnetism as a consequence, thus loading up particles which otherwise would not have it with that logic.
In fact, the Dirac field itself is a human construct. I don’t believe the Dirac field exists any more than the letter “a” exists as a standalone “thing” in the universe.
The letter “a” turns out to be a fabulously useful construct for us, allowing for all sorts of useful ideas and accurate descriptors of the universe to be built on top of it. But “a” is not some fundamental item that exists: it is one we created.
And I find the idea of the Dirac field — and the math we just did to “output” electromagnetism — much the same. The Dirac field happens to be a truly excellent descriptor of behavior we observe in the universe. But it is not a prescriptive thing that is causing the behavior.
When we take that Dirac field and recognize its necessary local U(1) symmetry, we eventually arrive at a mathematical description of electromagnetic behavior, in the form of that Faraday tensor. But that mathematical description of electromagnetic behavior is not electromagnetism itself.
The map is not the territory, as they say.
All of this that we did was exploration of the map — and an awfully good map it is! The Dirac field, its symmetry, the QED Lagrangian, Maxwell’s equations, the Faraday tensor — these seem to be highly, highly accurate maps, as tested across uncountable experimental setups and predictions-proved-true.
But the territory is something else entirely. Electromagnetism — as a territory — is not a “thing” because of the Dirac field. It is simply something we observe from our vantage point and our chosen ways of looking.
Now, I imagine when Richard asked the rhetorical question “why is electromagnetism a thing?” he may have been referring to the map, not the territory. He may have been saying “where do these equations we call electromagnetism come from?”
But I think it’s interesting to look a level deeper. To recognize that while we may be building mighty-accurate maps, the existence of the territory is a different concept entirely.
At the same time, that should not stop us from acknowledging the wonder and beauty of a map that is so unbelievably accurate and simple and consistent, and that can be built step-by-step with little taken for granted.
This is Richard’s powerful point. We arrive at the map of electromagnetism by — again — not “importing” any of our existing understanding of electricity or magnetism. But rather by taking a simple equation — Dirac’s — and assuming naught but symmetry.
And despite its fancy symbols, Dirac’s equation is truly very simple:
$$i\hbar\gamma^\mu\partial_\mu\psi - mc\psi = 0$$
Like we said above: it suggests that the universe is such that:
- if you take a spin-1/2 particle like an electron or positron
- and that particle’s wave function evolves through space and time
- the amount it changes
- is precisely offset by the particle’s rest mass (mc)
So we take that minimal starting point, and then we insist on… symmetry.
What is symmetry?
We have a visual sense of symmetry. When you reflect a butterfly’s wings over its midline, the butterfly appears unchanged. When you fold a snowflake over one of its axes, the snowflake is the same. When you rotate a square 90 degrees, it’s an identical square.
In a more general sense, symmetry is a property of an object or system coupled with a transformation, where all observable properties of the object or system are unchanged by the transformation.
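To make that definition concrete, here is a tiny sketch using the square from above. The object is a set of corner points, the transformation is a 90-degree rotation about the origin, and the "observable property" is simply which points are occupied. The names and coordinates are made up for illustration.

```python
def rotate90(point):
    """Rotate a 2D point 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)

def is_symmetric_under(shape, transform):
    """A shape is symmetric under a transform if it maps onto itself."""
    return {transform(p) for p in shape} == shape

# Corner points of a square and of a non-square rectangle.
square = {(1, 1), (-1, 1), (-1, -1), (1, -1)}
rectangle = {(2, 1), (-2, 1), (-2, -1), (2, -1)}

square_is_symmetric = is_symmetric_under(square, rotate90)
rectangle_is_symmetric = is_symmetric_under(rectangle, rotate90)

# Applying the rotation twice gives a 180-degree turn.
rectangle_half_turn = is_symmetric_under(
    rectangle, lambda p: rotate90(rotate90(p))
)

assert square_is_symmetric
assert not rectangle_is_symmetric
assert rectangle_half_turn
```

The same rotation is a symmetry of the square but not of the rectangle, which is why symmetry is best thought of as a property of the object and the transformation together.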
Symmetry is beautiful to us humans, and perhaps the universe itself, however the territory came to be. I love Michael Edward Johnson’s Symmetry Theory of Valence on this point, which suggests:
The Symmetry Theory of Valence (STV): the symmetry of an information geometry of mind corresponds with how pleasant it is to be that experience.
He quotes a beautiful passage from Nobel laureate Frank Wilczek:
[…] the idea that there is symmetry at the root of Nature has come to dominate our understanding of physical reality. We are led to a small number of special structures from purely mathematical considerations—considerations of symmetry—and put them forward to Nature, as candidate elements for her design. […] In modern physics we have taken this lesson to heart. We have learned to work from symmetry toward truth. Instead of using experiments to infer equations, and then finding (to our delight and astonishment) that the equations have a lot of symmetry, we propose equations with enormous symmetry and then check to see whether Nature uses them. It has been an amazingly successful strategy.
This is, at some level, exactly what we have done here.
Insisting on local phase symmetry (or U(1) symmetry, or gauge invariance) of the Dirac field is just this sort of step. We say:
- we accept this description of how these particles behave in the universe
- but we need to be able to apply local phase transformations — shifting their complex phase in ways that vary depending on where they are in spacetime
- and we insist that doing so not change the laws of the universe
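The three steps above can be sketched numerically. This is a toy version under loud assumptions: a single-component complex function of one variable stands in for the Dirac spinor, the covariant derivative is taken as ∂ + iqA (chosen to match the sign of the interaction term in the Lagrangian above), the transformation is ψ → e^{iθ}ψ with A → A − ∂θ/q, and the specific functions and the value of Q are invented for illustration.

```python
import cmath
import math

Q = 1.0  # hypothetical coupling constant

def psi(x):
    """Toy wave function: a plane wave under a Gaussian envelope."""
    return cmath.exp(1j * 2.0 * x) * math.exp(-x * x)

def A(x):
    """Toy gauge field (one component, one dimension)."""
    return 0.3 * x

def theta(x):
    """A phase shift that varies from point to point."""
    return math.sin(x)

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

def psi_shifted(x):
    """Wave function after the local phase transformation."""
    return cmath.exp(1j * theta(x)) * psi(x)

def A_shifted(x):
    """Gauge field after the compensating shift A -> A - (d theta/dx)/Q."""
    return A(x) - derivative(theta, x) / Q

def covariant_derivative(f, a, x):
    """(d/dx + i Q A) acting on f at the point x."""
    return derivative(f, x) + 1j * Q * a(x) * f(x)

x0 = 0.4
phase = cmath.exp(1j * theta(x0))

# The probability density does not notice the phase shift.
density_change = abs(abs(psi_shifted(x0)) ** 2 - abs(psi(x0)) ** 2)

# The ordinary derivative does notice it, so its magnitude changes.
plain_change = abs(abs(derivative(psi_shifted, x0)) - abs(derivative(psi, x0)))

# The covariant derivative only picks up the same overall phase.
covariant_mismatch = abs(
    covariant_derivative(psi_shifted, A_shifted, x0)
    - phase * covariant_derivative(psi, A, x0)
)

assert density_change < 1e-12
assert plain_change > 1e-3
assert covariant_mismatch < 1e-6
```

The ordinary derivative breaks under a position-dependent phase, and introducing a field that shifts in step with the phase is what repairs it. That compensating field is the seed of the photon field in the full derivation.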
In much the same way that visually folding a butterfly or a snowflake has a comforting consistency to it, so too does this process. And by doing the math that naturally follows, a map for electromagnetism — one of but four fundamental forces in the universe, and the one arguably most relevant to us as humans — emerges.
I don’t know about you, but I quite like that.
And I'd love to hear from you directly: andy@andybromberg.com