
Quantum Evolution – Part 2

Given the general relationships for quantum time evolution in Part 1 of these posts, it is natural to ask how to express these relationships in a basis that is better suited for computation and physical understanding. That can be done by taking the general relationship for time development

\[ \left| \psi (t_2) \right> = U(t_2, t_1) \left| \psi (t_1) \right> \]

and then projecting this relation onto the position basis $$\left| \vec r \right>$$, with the definition that the traditional Schrodinger wave function is given by

\[ \left< \vec r | \psi (t) \right> = \psi(\vec r, t) \; .\]

The rest of the computation proceeds by a strategic placement of the closure relation for the identity operator, $$Id$$,

\[ Id = \int d^3 r_1 \left| \vec r_1 \right>\left< \vec r_1 \right| \]

in the position basis, between $$U(t_2,t_1)$$ and $$\left| \psi(t_1) \right>$$ when $$U(t_2,t_1) \left| \psi(t_1) \right>$$ is substituted for $$\left| \psi(t_2) \right>$$

\[ \left< \vec r_2 | \psi(t_2) \right> = \left< \vec r_2 \right| U(t_2,t_1) \left| \psi(t_1) \right> = \\ \int d^3r_1 \left<\vec r_2 \right| U(t_2,t_1) \left| \vec r_1 \right> \left< \vec r_1 \right| \left. \psi(t_1) \right> \; .\]

Recognizing the form of the Schrodinger wave function in both the left- and right-hand sides, the equation becomes

\[ \psi(\vec r_2, t_2) = \int d^3r_1 \left<\vec r_2 \right| U(t_2,t_1) \left| \vec r_1 \right> \psi(\vec r_1, t_1) \; .\]

If the matrix element of the evolution operator between $$\vec r_2$$ and $$\vec r_1$$ is defined as

\[ \left<\vec r_2 \right| U(t_2,t_1) \left| \vec r_1 \right> \equiv K(\vec r_2, t_2; \vec r_1, t_1) \; , \]

then the structure of the equation is now


\[ \psi(\vec r_2, t_2) = \int d^3r_1 K(\vec r_2, t_2; \vec r_1, t_1) \psi(\vec r_1, t_1) \; .\]

What meaning can be attached to this equation, which, for convenience, will be referred to as the boxed equation? Well, it turns out that the usual textbooks on Quantum Mechanics are not particularly illuminating on this front. For example, Cohen-Tannoudji et al., usually very good in their pedagogy, have a presentation in Complement $$J_{III}$$ that jumps immediately from the boxed equation to the idea that $$K(\vec r_2, t_2; \vec r_1, t_1)$$ is a Green's function. While this idea is extremely important, it would be worthwhile to slow down the development and discuss the interpretation of the boxed equation both mathematically and physically.

Let’s start with the mathematical aspects. The easiest way to understand the meaning of the boxed equation is to start with a familiar example from classical mechanics – the simple harmonic oscillator.

The differential equation for the position, $$x(t)$$, of the simple harmonic oscillator is given by

\[ \frac{d^2}{dt^2} x(t) + \omega^2_0 x(t) = 0 \; ,\]

where $$\omega^2_0 = k/m$$ and where $$k$$ and $$m$$ are the spring constant and mass of the oscillator. The general solution of this equation is the well-known form

\[ x(t) = x_0 \cos(\omega_0 (t-t_0)) + \frac{v_0}{\omega_0} \sin(\omega_0 (t-t_0)) \, \]

with $$x_0$$ and $$v_0$$ being the initial position and velocity at $$t_0$$, respectively. To translate this system into a more ‘quantum’ form, the second-order differential equation needs to be rewritten in state-space form, where the state, $$\bar S$$, captures the dynamical variables (here the position and velocity)

\[ \bar S = \left[ \begin{array}{c} x \\ v \end{array} \right] \; ,\]

(the time dependence is understood) and the corresponding differential equation is written in the form

\[ \frac{d}{dt} {\bar S} = {\bar f}\left( \bar S,t\right) \; .\]

For the simple harmonic oscillator, the state-space form is explicitly

\[ \frac{d}{dt} \left[ \begin{array}{c} x \\ v\end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ -\omega^2_0 & 0 \end{array} \right] \left[ \begin{array}{c} x \\ v\end{array} \right] \; , \]

with solutions of the form

\[ \left[ \begin{array}{c} x(t) \\ v(t) \end{array} \right] = \left[ \begin{array}{cc} \cos(\omega_0 (t-t_0)) & \frac{1}{\omega_0} \sin(\omega_0 (t-t_0)) \\ -\omega_0 \sin(\omega_0 (t-t_0)) & \cos(\omega_0 (t-t_0)) \end{array} \right] \left[ \begin{array}{c} x_0 \\ v_0 \end{array} \right] \\ \equiv M(t-t_0)\left[ \begin{array}{c} x_0 \\ v_0 \end{array} \right] \; .\]

The matrix $$M(t-t_0)$$ plays the role of the evolution operator (also known as the state transition matrix by engineers and the fundamental matrix by mathematicians), moving solutions forward or backward in time as needed because the theory is deterministic.
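
Before moving on, here is a small Maxima check (my own, not part of the original derivation) that the matrix above really does the job; taking $$t_0 = 0$$ for brevity, it verifies that $$M$$ satisfies the state-space equation and the identity initial condition:

/* w0 is the oscillator frequency; A is the state-space matrix from the previous equation */
A : matrix([0, 1], [-w0^2, 0]);
M(t) := matrix([ cos(w0*t),     sin(w0*t)/w0 ],
               [-w0*sin(w0*t),  cos(w0*t)    ]);
ratsimp(diff(M(t), t) - A . M(t));   /* -> zero matrix, i.e. dM/dt = A . M */
M(0);                                /* -> identity matrix */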

If the dynamical variables are denoted collectively by $$q_i(t)$$ where the index $$i=1, 2$$ labels the variable in place of the explicit names $$x(t)$$ and $$v(t)$$, then the state-space evolution equation can be written compactly as

\[ q_i(t) = \sum_{j} M_{ij}(t-t_0) q^{0}_j \;, \]

where $$q^{0}$$ is the collection of initial conditions for each variable (i.e. $$q^{0}_1 = x_0$$, $$q^{0}_2 = v_0$$). As written, this compact form can be generalized to an arbitrary number of dynamical variables by allowing the indices $$i$$ and $$j$$ to increase their range appropriately.

The final step is then to imagine that the number of dynamical variables goes to infinity in such a way that there is a degree of freedom associated with each point in space. This is the typical model used in generalizing a discrete dynamical system, such as a long chain of coupled oscillators, to a continuum system that describes waves on a string. In this case, the indices $$i$$ and $$j$$ are now replaced by labels indicating the position ($$x$$ and $$x'$$), the sum is replaced by an integral, and we have

\[ q(x,t) = \int dx' \, M(t-t_0;x,x') \, q(x',t_0) \; ,\]

which except for the obvious minor differences in notation is the same form as the boxed equation.

Thus we arrive at the mathematical meaning of the boxed equation. The kernel $$K(\vec r_2, t_2; \vec r_1, t_1)$$ takes all of the dynamical values of the system at a given time $$t_1$$ and evolves them up to time $$t_2$$. The time $$t_1$$ is arbitrary, since the evolution is deterministic, so any particular configuration can be regarded as the initial conditions for the ones that follow. Each point in space is considered a dynamical degree of freedom, and all points at the earlier time contribute to the motion through the matrix multiplication involved in doing the integral. That is why the boxed equation involves an integration over space rather than over time.
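
As a quick consistency check (not spelled out here, but it follows directly from the composition rule for $$U(t_2,t_1)$$ given in Part 1 together with one more insertion of the closure relation), the kernel obeys the same composition law as the evolution operator,

\[ K(\vec r_3, t_3; \vec r_1, t_1) = \left< \vec r_3 \right| U(t_3,t_2) U(t_2,t_1) \left| \vec r_1 \right> = \int d^3 r_2 \, K(\vec r_3, t_3; \vec r_2, t_2) K(\vec r_2, t_2; \vec r_1, t_1) \; , \]

which is the continuum analog of the statement that two state transition matrices multiply to give a third, $$M(t_3-t_2) M(t_2-t_1) = M(t_3-t_1)$$.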

The final step is to interpret the kernel physically. From its definition as the matrix element of the evolution operator between $$\vec r_2$$ and $$\vec r_1$$, the kernel is the probability amplitude that a particle moves from $$\vec r_1$$ to $$\vec r_2$$ during the time span $$[t_1,t_2]$$. In other words, the conditional probability density that a particle is found at $$\vec r_2$$ at time $$t_2$$, given that it started at position $$\vec r_1$$ at time $$t_1$$, is
\[ Prob(\vec r_2,t_2 | \vec r_1, t_1 ) = \left| K(\vec r_2,t_2; \vec r_1, t_1) \right|^2 \; . \]

Next week, I'll show how a slight modification of the kernel can be interpreted as a Green's function.

Quantum Evolution – Part 1

This post will be the beginning of my extended effort to organize material on the time evolution operator, quantum propagators, and Green's functions. The aim is to put into a self-consistent and self-contained set of posts the background needed to gnaw away at a recurring confusion I have had about these objects, stemming from how the literature presents their names, definitions, and uses; in particular, the use of the Schrodinger, Heisenberg, and Interaction pictures.

Once this organization is done, I hope to use these methods as a springboard for research into applying quantum mechanical techniques to classical dynamical systems; in particular, the use of the Picard iteration (aka the Dyson expansion) for time-varying Hamiltonians.

The references that I will be using are:

[1] Quantum Mechanics – Volume 1, Claude Cohen-Tannoudji, Bernard Diu, and Franck Laloë
[2] Quantum Mechanics, Leonard Isaac Schiff
[3] Principles of Quantum Mechanics, R. Shankar
[4] Modern Quantum Mechanics, J.J. Sakurai

Starting simply, in this post I will be reviewing the definition and properties of the evolution operator.

Adapting the material in [1] (p. 236, 308-311), the Schrodinger equation in a representation-free form is:

\[ i \hbar \frac{d}{dt} \left| \psi(t) \right> = H(t) \left| \psi(t)\right>\]

From the structure of the Schrodinger equation, the evolution of the state $$\left|\psi(t)\right>$$ is entirely deterministic, being subject to the standard, well-known theorems about the existence and uniqueness of the solution. For the skeptic concerned that $$\left|\psi(t)\right>$$ can be infinite-dimensional, I don't have much in the way of justification except to say three things. First, the Schrodinger equation in finite dimensions (e.g., two-state systems) maps directly to the coupled linear systems dealt with in introductory classes on differential equations. Second, it is common practice for infinite-dimensional systems (i.e., PDEs) to be discretized for numerical analysis, so the resulting structure is again a finite-dimensional linear system, although of arbitrarily large size. That is to say, the practitioner can refine the mesh arbitrarily until either his patience or his computer gives out. It isn't clear that such a process necessarily converges, but the fact that there isn't a hue and cry of warnings in the community suggests that convergence isn't a problem. Finally, for those cases where the system is truly infinite-dimensional, with no approximations allowed, there are theorems about the Cauchy problem that describe how to propagate forward in time from initial data and why the resulting solutions are deterministic. How to match up an evolution operator formalism to these types of problems (e.g., heat conduction) may be the subject of a future post.

One last note: I am unaware of a single physical system involving time evolution that can't be manipulated (especially for numerical work) into the form $$\frac{d}{dt} \bar S = \bar f(\bar S; t)$$, where $$\bar S$$ is the abstract state and $$\bar f$$ is a vector field that is a function of the state and time. The Schrodinger equation is then an example where $$\bar f(\bar S;t)$$ is a linear operation.
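
To make the first point concrete (my own illustration, with $$c_i(t) \equiv \left< i | \psi(t) \right>$$ and $$H_{ij} \equiv \left< i \right| H \left| j \right>$$ in a two-state basis $$\left|1\right>, \left|2\right>$$), the Schrodinger equation for a two-state system is just a pair of coupled, linear, first-order differential equations

\[ i \hbar \frac{d}{dt} \left[ \begin{array}{c} c_1(t) \\ c_2(t) \end{array} \right] = \left[ \begin{array}{cc} H_{11}(t) & H_{12}(t) \\ H_{21}(t) & H_{22}(t) \end{array} \right] \left[ \begin{array}{c} c_1(t) \\ c_2(t) \end{array} \right] \; , \]

exactly the type of system treated in an introductory course on differential equations.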

From the theory of linear systems, the state at some initial time $$\left|\psi(t_0)\right>$$ is related to the state at the time $$t$$ by

\[ \left|\psi(t)\right> = U(t,t_0) \left|\psi(t_0)\right>\]

To determine what equation $$U(t,t_0)$$ obeys, simply substitute the above expression into the Schrodinger equation to yield
\[i \hbar \frac{d}{dt}\left[ U(t,t_0) \left|\psi(t_0)\right> \right] = H \; U(t,t_0) \left|\psi(t_0)\right> \; ,\]

and since $$\left|\psi(t_0)\right>$$ is arbitrary, the equation of motion or time development equation for the evolution operator is
\[i \hbar \frac{d}{dt} U(t,t_0) = H \; U(t,t_0) \; .\]

The required boundary condition is
\[ U(t_0,t_0) = Id \; ,\]
where $$Id$$ is the identity operator of the correct size to be consistent with the state dimension. That is to say, $$Id$$ is a finite-dimensional identity matrix whose dimension matches that of the state $$\left| \psi(t) \right>$$, or it is infinite dimensional.

Some obvious properties can be deduced without an explicit expression for $$U(t,t_0)$$ by regarding $$t_0$$ as a variable. Assume that $$t_0$$ takes on a particular value $$t'$$; then the evolution operator relates the state at that time to the state at some later time $$t''$$ as
\[ \left| \psi(t'')\right> = U(t'',t') \left| \psi(t')\right> \; .\]
Now let $$t_0$$ take on the value $$t''$$ and connect the state at this time to the state at some other time $$t$$ by
\[ \left| \psi(t)\right> = U(t,t'') \left| \psi(t'')\right> \; .\]
By composing these two expressions, the state at $$t'$$ can be related to the state at $$t$$ with a stop-off at the intermediate time $$t''$$, resulting in the general composition relation
\[ U(t,t') = U(t,t'') U(t'',t') \; .\]
Using the same type of arguments the inverse of the evolution operator can be seen to be
\[U^{-1}(t,t_0) = U(t_0,t) \; \]
which can also be expressed as
\[ U(t,t_0) U(t_0,t) = U(t,t) = Id \; .\]

The formal solution of the equation of motion for the evolution operator is
\[ U(t,t_0) = Id - \frac{i}{\hbar} \int_{t_0}^{t} dt' H(t') U(t',t_0) \; , \]
which can be verified using the Leibniz rule for differentiation under the integral sign.

The Leibniz rule says that if the integral $$I(t)$$ is defined as
\[ I(t) = \int_{a(t)}^{b(t)} dx \, f(t,x) \; , \]
then its derivative with respect to $$t$$ is
\[ \frac{d}{dt} I(t) = \int_{a(t)}^{b(t)} dx \, \frac{\partial}{\partial t} f(t,x) + f(t,b(t)) \frac{d}{dt}b(t) - f(t,a(t)) \frac{d}{dt}a(t) \; . \]
Applying this to the formal solution for the evolution operator gives
\[ \frac{d}{dt} U(t,t_0) = \int_{t_0}^{t} dt' \frac{\partial}{\partial t} \left( H(t') U(t',t_0) \right) + H(t) U(t,t_0) \frac{d}{dt} t = H(t) U(t,t_0) \; ,\]
since the integrand has no explicit dependence on $$t$$ (so the integral term vanishes) and the lower limit is fixed, leaving only the upper-limit boundary term.

There are three cases to be examined (based on the material in [4] pages 72-3).

1. The Hamiltonian is not time dependent, $$H \neq H(t)$$. In this case, the evolution operator has an immediate closed-form solution (a quick Maxima check of this case is sketched just after this list) given by
\[ U(t,t_0) = e^{-\frac{i H (t-t_0)}{\hbar} } \; .\]

2. The Hamiltonian is time dependent but it commutes with itself at different times, $$H = H(t)$$ and $$\left[ H(t),H(t') \right] = 0$$. This case also possesses an immediate closed-form solution but with a slight modification
\[ U(t,t_0) = e^{-\frac{i}{\hbar}\int_{t_0}^{t} dt' H(t')} \; . \]

3. The Hamiltonian is time dependent and it does not commute with itself at different times, $$H = H(t)$$ and $$\left[H(t),H(t')\right] \neq 0$$. In this case, the solution that exists is written in the self-iterated form
\[ U(t,t_0) = Id + \\ \sum_{n=1}^{\infty} \left(-\frac{i}{\hbar}\right)^n \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2 \cdots \int_{t_0}^{t_{n-1}} dt_n \, H(t_1) H(t_2) \cdots H(t_n) \; .\]
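
As the check promised in case 1, here is a minimal Maxima sketch (my own illustration, not taken from [4]) for a constant, diagonal two-level Hamiltonian, where the exponential can be written out by hand:

/* Case 1 sketch: constant, diagonal two-level Hamiltonian H = hbar*diag(w1, w2),
   for which exp(-i*H*(t-t0)/hbar) is just the diagonal matrix U(t, t0) below. */
H : hbar * matrix([w1, 0], [0, w2]);
U(t, t0) := matrix([exp(-%i*w1*(t-t0)), 0],
                   [0, exp(-%i*w2*(t-t0))]);
ratsimp(%i*hbar*diff(U(t, t0), t) - H . U(t, t0));   /* -> zero matrix (equation of motion) */
U(t0, t0);                                           /* -> identity matrix (boundary condition) */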

The structure of the iterated integrals in case 3 is formally identical to the Picard iteration, a technique used in a variety of disciplines to construct solutions to initial value problems, at least over a limited time span. I am not aware of any formal proof that convergence in case 3 is guaranteed in the most general setting, where $$H(t)$$ and $$U(t,t_0)$$ are infinite dimensional, but the iterated solution is used routinely in quantum scattering, and so the method is worth studying.
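
Here is a minimal Maxima sketch of the Picard iteration itself (again my own illustration), applied to a constant $$2 \times 2$$ Hamiltonian so that the iterates can be recognized as partial sums of the exponential series from case 1:

/* Picard/Dyson iteration: U_n(t) = Id - (i/hbar) integrate(H(t1) . U_{n-1}(t1), t1, 0, t).
   With the constant choice H = hbar*w*sigma_x the iterates build up, order by order,
   the series for exp(-i*w*t*sigma_x) = cos(w*t)*Id - i*sin(w*t)*sigma_x. */
sigma_x : matrix([0, 1], [1, 0]);
H(t) := hbar * w * sigma_x;
U : ident(2);
for n : 1 thru 4 do
    U : ident(2) - (%i/hbar) *
        matrixmap(lambda([elem], integrate(elem, t1, 0, t)), H(t1) . subst(t1, t, U));
expand(U);   /* partial sums of cos(w*t) on the diagonal and of -i*sin(w*t) off it */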

Next week, I’ll be exploring the behavior of a related object called the quantum propagator.

A New Direction on Vectors

This post grew out of a discussion I had with my son who is now learning physics at the high school level.  He was reading his text book with the old, time-honored discussion of vectors being things with directions and magnitudes.  This is certainly true for physical vectors that describe things like displacement, velocity, force, torque, and electric and magnetic fields – all the typical menagerie that introductory physics students encounter.

But I couldn’t resist talking to him about vector spaces in general.  It has long baffled me why we teach kids from the beginning the dumbed-down version of things then make them unlearn it later on.  From what I can tell, the really creative thinkers of history were ones who didn’t succumb to this mind-numbing nonsense.  They were people who didn’t let the accepted way of doing things constrain them.  If they had, we would never have heard of them.

Thus I resolved to talk to him about vectors in general.  I introduced him to the rules for defining a vector space.  A short aside is warranted on this point.  Depending on which textbook you read, there are subtle differences in what is presented.  Some books enumerate only eight properties, some only seven, and the ones that do it best list 10.  All of them functionally amount to the same thing but standardization seems to elude the community at large, much to the confusion of students.

I touched on all the abstract definitions and properties just to show that there was a firm structure we could build on, but I emphasized that the prototype example of a vector that one could almost always use and never be led astray was a list. I emphasized that what made the list useful was that it was a vector space in its own right and that almost all other vector spaces could be put into correspondence with a list. I also emphasized that, while the list was helpful, it is often only a representation of the vector in question, and that we shouldn't be seduced into thinking it was the vector itself. A velocity vector is no more a list of numbers than Bernhard Riemann is the list of pixels in the two-dimensional array of light and shadow below.

[Image: portrait of Bernhard Riemann]

We finished our discussion by touching on infinite-dimensional vector spaces, and I could see that what I really needed was a concrete example that he could play with.  After about an hour of back-and-forth I finally came up with something that he seemed to grasp.  I offer it here in the hopes that others will be able to use it as well.

What I wanted to create was a model where four properties were met:

  • The basic units are lists
  • These lists can be added component-wise
  • The lists can be infinite
  • The lists can be directly translated into a picture that visualizes magnitude and direction

A note is needed on the point that the lists can be infinite. By this I mean that the length of the list is limited only by our patience and aesthetics or by the physical memory of the machine. Much as in the usual idea of infinity, I wanted a system where the student could always reach into the bag and pull out another dimension (being assured that the student would tire before the computer would).

The model that I created is expressed in the computer algebra system wxMaxima 11.08.0 running Maxima version 5.25.0. I like Maxima for several reasons, chief amongst them being that it is very capable and it is free.

Being based on LISP, Maxima is quite comfortable handling the first property. For the second and third properties, I wanted a way to add lists of different lengths, so that the student would never run into the restriction that the two lists must have the same length. A different way of expressing that is to say that if there are two lists
\[ M = \left[ m_1, m_2, m_3 \right]\]
\[ N = \left[ n_1, n_2, n_3, n_4, n_5 \right]\]
they are not actually different in length, since we can always imagine that they list only the portions of themselves beyond which there are only zeros (i.e. $$m_3$$ and $$n_5$$ are the last non-zero entries in their respective lists), so that they actually look like
\[ M = \left[ m_1, m_2, m_3, 0, 0, 0, \ldots \right]\]
\[ N = \left[ n_1, n_2, n_3, n_4, n_5, 0, \ldots \right]\]
To do this I wrote a small Maxima function called add_lists that pads the shorter of the two lists with zeros and then returns the sum. The code for it is

/* pad the shorter list with zeros so the element-wise sum is always defined */
add_lists(a, b) := block([Na, Nb, delta, pad],
                      Na    : length(a),
                      Nb    : length(b),
                      delta : abs(Na - Nb),
                      pad   : makelist(0, i, 1, delta),
                      if Na > Nb
                         then a + append(b, pad)
                         else append(a, pad) + b);
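
A quick example of the padding behavior (the returned value is easy to check by hand):

add_lists([1, 2, 3], [10, 20, 30, 40, 50]);   /* -> [11, 22, 33, 40, 50] */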

For the final property, I decided that a plot of a Fourier series in the interval $$[0,L]$$ would correspond to the visual addition of the vectors.

The particular choice of Fourier series is the cosine series given by
\[ f(x;L) = \sum_{n=1}^{\infty} a_n \cos \left( \frac{ 2 \pi n x}{L} \right) \; , \]
where the constant offset term $$a_0$$ has been set equal to zero and the presence of the parameter $$L$$ is there to remind us of the interval over which the function is defined. Traditional treatments in advanced mathematics and quantum texts drive home the point that the sine and cosine functions are orthogonal,
\[ \int_0^{2 L} dx \cos \left( \frac{ 2 \pi n x}{L} \right) \cos \left( \frac{ 2 \pi p x}{L} \right) = \delta_{np} L \]
\[ \int_0^{2 L} dx \sin \left( \frac{ 2 \pi n x}{L} \right) \sin \left( \frac{ 2 \pi p x}{L} \right) = \delta_{np} L \; \; (n \neq 0)\]
\[ \int_0^{2 L} dx \cos \left( \frac{ 2 \pi n x}{L} \right) \sin \left( \frac{ 2 \pi p x}{L} \right) = 0 \]
in a Sturm-Liouville sense, and thus are basis vectors in some abstract space. The coefficients $$a_n$$ are then the components of the general vector (function) in this space. While important to the theoretical underpinnings, these relationships obscure a rather simple concept: two sets of coefficients can be added together as a shorthand for adding the two trigonometric series together, and thus lists of these coefficients form a vector space in their own right.
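
For completeness, although it is not needed for the addition game below, the orthogonality relations above give the standard recipe for extracting the components of a given function:

\[ a_n = \frac{1}{L} \int_0^{2L} dx \, f(x;L) \cos \left( \frac{2 \pi n x}{L} \right) \; , \quad n \geq 1 \; . \]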

Thus the list $$A = \left[a_1, a_2, \ldots \right]$$ is the representation of the vector (in component or list form), and the function fourier_series given by

/* build the cosine series whose i-th coefficient is lst[i] */
fourier_series( lst, L ) := block([expr],
                                  expr : 0,
                                  for i : 1 thru length(lst) do
                                      expr : expr + lst[i]*cos(2*%pi*i*x/L),
                                  expr);

is what delivers the corresponding expression that is plotted.
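
For instance, a short list and its corresponding expression:

fourier_series([1, 1/2], L);   /* -> cos(2*%pi*x/L) + cos(4*%pi*x/L)/2 */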

The final step is to take two lists, plot them separately to give the ‘direction and magnitude’ of each, and then add them together plot-wise (that is to say, plot the sum of the expressions) and show that the resulting vector is the same as the one obtained if the two lists of coefficients are added together first and then plotted.

For a concrete example take the lists
\[A = \left[-1,\frac{1}{2},-\frac{1}{3},\frac{1}{4},-\frac{1}{5},\frac{1}{6},-\frac{1}{7}\right] \]
and
\[B = \left[1,\frac{4}{5},\frac{3}{5},\frac{2}{5},\frac{1}{5}\right] \; .\]

Lists $$A$$ and $$B$$ produce the plots of $$f_A(x;L)$$ and $$f_B(x;L)$$ over the interval $$x \in \left[0,L\right]$$ shown below.

[Figure: plots of $$f_A(x;L)$$ and $$f_B(x;L)$$ over the interval $$[0,L]$$]

The corresponding expansions, evaluated with $$L = 10$$, are fourier_series(A,L)
$$f_A(x;L) = -{{\cos \left({{7\,\pi\,x}\over{5}}\right)}\over{7}}+{{\cos \left({{6\,\pi\,x}\over{5}}\right)}\over{6}}-{{\cos \left(\pi\,x\right)}\over{5}}+{{\cos\left({{4\,\pi\,x}\over{5}}\right)}\over{4}}-{{\cos \left({{3\,\pi\,x}\over{5}}\right)}\over{3}}+{{\cos \left({{2\,\pi\,x}\over{5}}\right)}\over{2}}-\cos \left({{\pi\,x}\over{5}}\right)$$

and fourier_series(B,L)

$$f_B(x;L) = {{\cos \left(\pi\,x\right)}\over{5}}+{{2\,\cos \left({{4\,\pi\,x}\over{5}}\right)}\over{5}}+{{3\,\cos \left({{3\,\pi\,x}\over{5}}\right)}\over{5}}+{{4\,\cos\left({{2\,\pi\,x}\over{5}}\right)}\over{5}}+\cos \left({{\pi\,x}\over{5}}\right)$$

The ‘vector sum’ gives the expression

$$f_A(x;L) + f_B(x;L) = -{{\cos \left({{7\,\pi\,x}\over{5}}\right)}\over{7}}+{{\cos \left({{6\,\pi\,x}\over{5}}\right)}\over{6}}+{{13\,\cos \left({{4\,\pi\,x}\over{5}}\right)}\over{20}}+{{4\,\cos \left({{3\,\pi\,x}\over{5}}\right)}\over{15}}+{{13\,\cos \left({{2\,\pi\,x}\over{5}}\right)}\over{10}}$$

and the corresponding plot

[Figure: plot of $$f_A(x;L) + f_B(x;L)$$]

The student can then confirm this in terms of adding the list representations together by executing C : add_lists(A,B), which yields

\[ C = \left[0,\frac{13}{10},\frac{4}{15},\frac{13}{20},0,\frac{1}{6},-\frac{1}{7}\right] \; .\]

 

Evaluating $$f_C(x;L)$$ = fourier_series(C,L) then yields the identical expression

$$f_C(x;L) = -{{\cos \left({{7\,\pi\,x}\over{5}}\right)}\over{7}}+{{\cos \left({{6\,\pi\,x}\over{5}}\right)}\over{6}}+{{13\,\cos \left({{4\,\pi\,x}\over{5}}\right)}\over{20}}+{{4\,\cos \left({{3\,\pi\,x}\over{5}}\right)}\over{15}}+{{13\,\cos \left({{2\,\pi\,x}\over{5}}\right)}\over{10}}$$

and the identical plot

[Figure: plot of $$f_C(x;L)$$, identical to the plot of $$f_A(x;L) + f_B(x;L)$$]
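
For anyone who wants to reproduce the whole exercise in one sitting, here is a sketch of the complete Maxima session (it assumes add_lists and fourier_series have been defined as above; the expansions shown earlier correspond to the choice L : 10):

L : 10;                                    /* the interval parameter used in the plots */
A : [-1, 1/2, -1/3, 1/4, -1/5, 1/6, -1/7];
B : [1, 4/5, 3/5, 2/5, 1/5];
C : add_lists(A, B);                       /* -> [0, 13/10, 4/15, 13/20, 0, 1/6, -1/7] */
ratsimp(fourier_series(C, L) - (fourier_series(A, L) + fourier_series(B, L)));   /* -> 0 */
plot2d([fourier_series(A, L), fourier_series(B, L), fourier_series(C, L)], [x, 0, L]);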

Why Don’t We Teach the Helmholtz Theorem?

In an earlier post, I outlined a derivation of the Helmholtz Theorem starting from the identity

\[ \nabla^2 \left( \frac{1}{|\vec r - \vec r \, {}'|} \right) = -4 \pi \delta( \vec r - \vec r \, {}' ) \, .\]

It seems hard for me to believe, but it was many years after I had studied E&M in graduate school that I came across this theorem and came to appreciate its power. The reason I say it is hard to believe is that almost none of the traditional texts talk about it or even have an index entry for it.

The traditional way we teach electricity and magnetism is to trace through the historical development by examining a host of 18th and 19th century experiments. The usual course is to introduce integral forms of the laws and then show how these lead to Maxwell's equations (e.g., from Coulomb's law to Gauss's law). The road here is long and tortuous, starting with static fields, then layering on the time dependence, and then finally sneaking the displacement current into Ampere's law. This approach obscures the unity of Maxwell's equations. It also leaves the student bored, confused, and overwhelmed, and incapable of appreciating what follows.

To such a student is lost the wonder of realizing that the fields can take on a life of their own, independent of the things that created them.  Lost is the realization that these fields can radiate outwards; that they can reflect and refract (i.e., optics as an inherently electromagnetic phenomenon); and that they can be generated and controlled at will to form the communication network we all use on a daily basis.

A better approach starts with Maxwell’s equations in their full form and then uses the Helmholtz theorem to ‘interrogate’ them to derive the time-honored static field results of Coulomb and Biot-Savart. I believe this approach, which is nearly impossible to find in the usual textbooks, offers a clearer view into the unity of electricity and magnetism, at the cost of some slightly more mature vector calculus.  The fields are introduced early, and the equations they satisfy are complete. There is no unlearning facts later on.  For example, the traditional textbook results for Coulomb’s law emphasize that the electric field is conservative and that its curl is identically zero.  Weeks or months go by with that concept firmly emphasized and entrenched and then, and only then, is the student informed that the result doesn’t hold in general.

There is also a fundamental flaw in the pedagogy of deriving Maxwell’s equations from the integral forms.  Nowhere along the line is there any explanation as to why knowing the divergence and curl of a vector field is all that is needed to uniquely specify the field. After all, why can’t a uniform field be added as a constant of integration?

At the heart of the traditional approach is the idea of a force field as a physically real object and not just a useful mathematical construct. The prototype example is the electric field, which comes from the experimental expression for Coulomb's law, stating that the force on a charge $$q_2$$ due to a charge $$q_1$$ is given by (note that all equations are expressed in SI units):

\[ \vec F_{21} = \frac{1}{4 \pi \epsilon_0} \frac{ q_1 q_2 \left( \vec r_2 - \vec r_1 \right)}{|\vec r_2 - \vec r_1|^3} \, .\]

The usual practice is then to assume that one of the charges is a test charge, that the other is smeared into an arbitrary charge distribution within a volume $$V$$, and that the resulting electric field is

\[ \vec E(\vec r) = \frac{1}{4 \pi \epsilon_0} \int_V d^3 r' \, \frac{ \rho(\vec r \;') \left( \vec r - \vec r \;' \right)}{|\vec r - \vec r \;'|^3} \, . \]

The last step in the traditional approach involves introducing the divergence and curl of a vector field, the associated theorems they obey, and applying the whole lot to the electric flux to get the first of the Maxwell equations

\[ \nabla \cdot \vec E (\vec r) = \rho(\vec r) / \epsilon_0 \, .\]

As the traditional program proceeds, magnetostatics follows with the introduction of the Biot-Savart law

\[ \vec B(\vec r) = \frac{\mu_0}{4 \pi} \int_V d^3 r' \frac{ \vec J(\vec r \;') \times (\vec r - \vec r \;')}{|\vec r - \vec r \;'|^3} \, , \]

as the experimental observation for the generation of a magnetic field by a given current density within a volume $$V$$. This time the vanishing of the divergence is used to find the vector potential, and the curl is related to the current density via Ampere's law

\[ \nabla \times \vec B(\vec r) = \mu_0 \vec J(\vec r) \,.\]

The traditional approach finally gets to time-varying fields when taking up Faraday's law, requiring the student to unlearn $$\nabla \times \vec E = 0$$ and then finally re-learn Ampere's equation with the introduction of the displacement current. By this time the full Maxwell equations are on display, but the linkage between the different facets of each field is highly obscured, and the basic underpinning of the theory, that the divergence and curl tell all there is to know about a field, is not to be found. The pedagogy seems to suffer from too many unconnected facts with no common framework by which to relate them.

Using the Helmholtz Theorem in conjunction with an upfront statement of the Maxwell equations offers several advantages in teaching electromagnetism. I will content myself here with just the derivation of Coulomb's and Biot-Savart's laws. Additional information can be found in the paper and presentation I recently wrote for the Fall meeting of the Chesapeake Section of AAPT.

Start by considering the Maxwell equations, presented here in vacuum, as

\[ \nabla \cdot \vec E(\vec r,t) = \rho(\vec r,t) / \epsilon_0 \, ,\]

\[ \nabla \cdot \vec B(\vec r,t) = 0 \, , \]

\[ \nabla \times \vec E(\vec r,t) = -\frac{\partial \vec B (\vec r,t)}{\partial t} \, , \]

and

\[ \nabla \times \vec B(\vec r,t) = \mu_0 \vec J (\vec r,t) + \epsilon_0 \mu_0 \frac{\partial \vec E(\vec r,t)}{\partial t} \, .\]

Since the Coulomb and Biot-Savart laws are in the domain of the electro- and magnetostatics, all terms in Maxwell’s equations involving time derivatives are set equal to zero and all $$t$$’s are eliminated to yield
\[ \nabla \cdot \vec E(\vec r) = \rho(\vec r) / \epsilon_0 \, ,\]
\[ \nabla \cdot \vec B(\vec r) = 0 \, , \]
\[ \nabla \times \vec E(\vec r) = 0 \, , \]

and
\[ \nabla \times \vec B(\vec r) = \mu_0 \vec J (\vec r) \, .\]

Now substituting the electric and magnetic field divergences and curls into the $$U(\vec r)$$ and $$\vec W(\vec r)$$ expressions in Helmholtz’s theorem yields the usual scalar potential

\[U(\vec r) = \frac{1}{4 \pi \epsilon_0}\int_V d^3 r' \frac{\rho(\vec r \;')}{|\vec r - \vec r \;'|} \]

for the electric field and the usual vector potential

\[\vec W(\vec r) = \frac{\mu_0}{4\pi} \int_{V} d^3r' \frac{ \vec J(\vec r\;') }{|\vec r - \vec r\;'|}\]

for the magnetic field.
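
To close the loop (a step left implicit above), taking the gradient of $$U(\vec r)$$ and using $$\nabla_{\vec r} \left( 1/|\vec r - \vec r\,'| \right) = -(\vec r - \vec r\,')/|\vec r - \vec r\,'|^3$$ recovers the Coulomb field quoted earlier,

\[ \vec E(\vec r) = -\nabla U(\vec r) = \frac{1}{4 \pi \epsilon_0} \int_V d^3 r' \frac{\rho(\vec r\,') \left( \vec r - \vec r\,' \right)}{|\vec r - \vec r\,'|^3} \; , \]

and, in the same way, $$\vec B(\vec r) = \nabla \times \vec W(\vec r)$$ reproduces the Biot-Savart law.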

On the whole, I think this approach enhances the physical understanding of Maxwell’s equations in ways the traditional approach can’t.  It’s not without its downside, but none of the problems present much difficulty.  Further details can be found in my paper.

 

Deriving the Helmholtz Theorem

To derive the Helmholtz theorem, start first with one representation of the delta function in three dimensions,

\[ \nabla^2 \left( \frac{1}{|\vec r - \vec r \, {}'|} \right) = -4 \pi \delta( \vec r - \vec r \, {}' ) \, .\]

Next, use the identity
\[ \vec F(\vec r) = \int_{V} d^3 r' \, \delta(\vec r - \vec r \,{}') \vec F(\vec r \,{}') \]
for an arbitrary vector field $$\vec F(\vec r)$$ over a given volume $$V$$. Note that time will not be involved in this derivation. Also note that there is an ongoing discussion in the literature about the correct way to extend this theorem to time-varying fields. That will be discussed in a future post.

Using the explicit representation of the delta-function stated above and factoring out the derivatives with respect to the field point $$\vec r$$ yields

\[ \vec F(\vec r) = \frac{ -\nabla^2_{\vec r} }{4 \pi} \int_V d^3 r' \frac{\vec F(\vec r\,{}')}{|\vec r - \vec r\,{}'|} \, . \]

Now apply the vector identity $$\nabla^2 = \nabla( \nabla \cdot ) - \nabla \times (\nabla \times)$$. Doing so allows the expression for $$\vec F(\vec r)$$ to take the form
\[ \vec F (\vec r) = \frac{1}{4 \pi} \nabla_{\vec r} \times \vec I_{vector} - \frac{1}{4 \pi} \nabla_{\vec r} I_{scalar} \; , \]
where the integrals are
\[ \vec I_{vector} = \nabla_{\vec r} \times \int_V d^3 r' \, \frac{\vec F (\vec r\,')}{|\vec r - \vec r\,'|} \]
and
\[ I_{scalar} = \nabla_{\vec r} \cdot \int_V d^3 r' \, \frac{\vec F (\vec r\,')}{|\vec r - \vec r\,'|} \; . \]

The strategy for handling these terms is to

  1. bring the derivative operator with respect to $$\vec r$$ inside the integral,
  2. switch the derivative from $$\vec r$$ to $$\vec r\,'$$ at the cost of a minus sign,
  3. integrate by parts, and
  4. apply the appropriate boundary conditions and the boundary-integral version of the divergence theorem to the total-derivative piece.

Application of this strategy to the vector (first) integral gives
\[ \vec I_{vector} = \int_V d^3 r' \frac{ \nabla_{\vec r\,'} \times \vec F (\vec r \,')}{|\vec r - \vec r\,'|} - \int_{\partial V} dS \frac{\hat n \times \vec F(\vec r\,')}{|\vec r - \vec r\,'|} \; .\]

Likewise, the same strategy applied to the scalar (second) integral gives
\[ I_{scalar} = \int_V d^3 r' \frac{ \nabla_{\vec r\,'} \cdot \vec F ( \vec r \,')}{|\vec r - \vec r\,'|} - \int_{\partial V} dS \frac{\hat n \cdot \vec F ( \vec r \,')}{|\vec r - \vec r\,'|} \; . \]

Now the usual case of interest takes the bounding volume to be all space, which requires that the field drop off faster than $$r^{-1}$$. If this condition is met, then the surface integrals vanish and the original field can be written as
\[ \vec F (\vec r) = -\nabla U(\vec r) + \nabla \times \vec W(\vec r) \]
where
\[ U(\vec r) = \frac{1}{4\pi} \int_V d^3 r' \frac{ \nabla' \cdot \vec F ( \vec r \,')}{|\vec r - \vec r\,'|} \]
and
\[ \vec W(\vec r) = \frac{1}{4\pi} \int_V d^3 r' \frac{ \nabla' \times \vec F ( \vec r \,')}{|\vec r - \vec r\,'|} \; .\]

At this point it is a snap to derive Coulomb’s and Biot-Savart’s laws from the Maxwell equations but that is a post for another time.