Latest Posts

Laplace Transforms – Part 2: Convergence

Having established some attractive properties of the Laplace Transform that make solving linear differential equations relatively easy, the next logical question to ask is, under what conditions does the Laplace transform actually exist? In answering this, I will follow closely the individual arguments found in Modern Control Engineering by Katsuhiko Ogata, but I’ll be modifying their context to aim at the larger question of the relation between the Laplace and Fourier Transforms.

The first step into understanding what functions are Laplace transformable is to look at the basic definition

\[ {\mathcal L}[f(t)] = \int_0^{\infty} dt \, e^{-st} f(t) \; .\]

The Laplace Transform will only exist when the integral on the right-hand side is convergent. Note that the integral restricts itself to the portion of $$f(t)$$ that falls in the range $$[0,\infty)$$. Since the Laplace Transform doesn’t ask or care about the behavior of $$f(t)$$ for values of $$t < 0$$, the common prescription is to define $$f(t) = 0$$ for $$t < 0$$.

It is a fair question to ask why this requirement is imposed if the Laplace Transform doesn’t ‘see’ the values of $$f(t)$$ for $$t<0$$. Later on in this column, a relationship between the Fourier and Laplace Transforms presents a plausible justification. For now, let’s just say that there are matters of causality that say a signal didn’t exist in perpetuity and, thus, at some point in the past it had to turn on. That is what is meant by $$t=0$$.

Of course, there are deep philosophical problems with this point of view, but they don’t touch on the day-to-day workings of the physicist or controls engineer.

So, the operative question is, what type of function, so restricted, has a convergent integral? The class of functions that allow the integral to exist are said to be of exponential order. Operationally, this means that

\[ \lim_{t\rightarrow\infty} \, e^{-\gamma t} |f(t)| = 0 \]

for some positive real number $$\gamma$$ ($$s = \gamma + i \omega$$). If the limit is such that it tends to $$0$$ for $$\gamma > \sigma_c$$ and tends to $$\infty$$ for $$\gamma < \sigma_c$$, the real number $$\sigma_c$$ takes on the name the abscissa of convergence. What this is telling us is that the Laplace Transform only converges when the real part of $$s$$, $$\Re(s) = \gamma$$, is greater than $$\sigma_c$$.

As we will discuss in the example below, this means that $$\gamma$$ must be greater than the right-most pole of the transform $$F(s) = \int_0^{\infty} dt \, e^{-st} f(t)$$.
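
This is easy to experiment with: sympy’s `laplace_transform` returns the transform together with its abscissa of convergence. A quick sketch (the test function $$f(t) = e^{2t}$$ is my choice):

```python
import sympy as sp

t, s = sp.symbols('t s')

# L[exp(2t)] = 1/(s-2), which converges only for Re(s) > 2,
# i.e. to the right of the pole of F(s) at s = 2
F, abscissa, cond = sp.laplace_transform(sp.exp(2 * t), t, s)
assert sp.simplify(F - 1 / (s - 2)) == 0
assert abscissa == 2
```

The second element of the returned tuple is precisely $$\sigma_c$$, which here coincides with the location of the pole.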

Some additional words are in order. Functions that increase faster than an exponential do not have Laplace Transforms. The most oft-cited counterexample is the function $$e^{t^2} \; \forall t > 0$$. Note that $$e^{t^2}$$ is a perfectly fine signal if it is defined with compact support, such that it is zero for times $$t>t_{max}$$.

There is an obvious concern that arises at this point. Since the purpose of the Laplace Transform is to turn differential equations into algebraic equations, there is no easy way to restrict the resulting solution to that algebraic equation to have one simple pole. For example, the differential equation $$\ddot x(t) + 3 \dot x(t) + 2 x(t) = f(t)$$ has a Laplace-Transformed left-hand-side of $$s^2 + 3s + 2 = (s+2)(s+1)$$ times $${\mathcal L}[x(t)] \equiv X(s)$$. Assuming the Laplace Transform of $$x(t)$$ and $$f(t)$$ exists, then the following algebraic equation corresponds to the original differential equation with homogeneous initial conditions:

\[ (s+2)(s+1)X(s) = F(s)\; . \]

Solving this for $$X(s)$$ sets before us the task of finding an inverse Laplace Transform for

\[ X(s) = \frac{F(s)}{(s+2)(s+1)} \; .\]

Even assuming that $$F(s)$$ has no poles of its own, the above analysis would suggest that the minimum value of $$\Re(s)$$ would be $$\Re(s)=-1+\epsilon$$, where $$\epsilon$$ is a small positive number. This condition restricts $$s$$ to a portion of the complex plane that doesn’t include the other pole. But the $$s$$ value at the other pole corresponds to physically allowed frequencies, so, how does one reconcile this?

To answer this question, I quote a passage from Ogata’s book:

we must resort to the theory of complex variables [where]…there is a theorem…that states that, if two analytic functions are equal for a finite length along any arc in a region in which both are analytic, then they are equal everywhere in the region. … although we require the real part of $$s$$ to be greater than the abscissa of convergence to make the $$\int_0^{\infty} dt \, e^{-st} f(t)$$ absolutely convergent, once the Laplace transform $$F(s)$$ is obtained, $$F(s)$$ can be considered valid throughout the entire $$s$$ plane except at the poles of $$F(s)$$.

– Katsuhiko Ogata.

So, since the theory is now well supported, let’s return to the question of whether or not the restriction $$f(t) = 0 \; \forall t < 0$$ is needed. To try to justify it, bring in the Fourier Transform (this argument is based on the ideas of P.C. Chau).

The basic form of the Fourier Transform is

\[ {\mathcal F}[f(t)] = \int_{-\infty}^{\infty} dt \, f(t) e^{-i \omega t} \equiv F(\omega) \]

with the inverse transform being

\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d\omega \, F(\omega) e^{i\omega t} \; .\]

Now consider the Fourier Transform of the special form of

\[ f(t) = e^{-\gamma t} g(t) \; .\]

The identity $$f(t) = {\mathcal F}^{-1}\left[{\mathcal F}[f(t)] \right]$$ becomes

\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d \omega \, \left[ \int_{-\infty}^{\infty} d \tau \, f(\tau) e^{-i \omega \tau} \right] e^{i\omega t} \]

and

\[ e^{-\gamma t} g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d \omega \, \int_{-\infty}^{\infty} d \tau \, e^{-\gamma \tau} g(\tau) e^{-i \omega \tau} e^{i \omega t} \; .\]

Bringing the $$e^{-\gamma t}$$ over to the right-hand side and rearranging gives

\[ g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d \omega \, e^{(i\omega + \gamma)t} \int_{-\infty}^{\infty} d \tau \, e^{-(i\omega + \gamma) \tau} g(\tau) .\]

Now define the Laplace variable $$s = \gamma + i \omega$$ from which $$d\omega = ds/i$$ and substitute back in to change the outer integral to get the final form

\[ g(t) = \frac{1}{2\pi} \int_{\gamma - i\infty}^{\gamma + i \infty} d s \, e^{s t} \int_{-\infty}^{\infty} d \tau \, e^{-s \tau} g(\tau) .\]

The inner integral will not converge unless $$\tau$$ is restricted so that it is always positive. So, here is a plausible, well-supported argument for always defining $$f(t)$$ in a piece-wise fashion so that it is zero in the range $$t<0$$.

A couple of final notes. The contour implied in the final integral is known as the Bromwich contour. It is rarely used in practice to compute an inverse Laplace Transform with the usual technique being a combination of a clever use of already tabulated transforms and the use of partial fraction decomposition. Another technique for obtaining the inverse Laplace Transform also exists that does not depend on either of these approaches and I’ll devote some future space to examining it.
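
As a sketch of that usual workflow, plus a purely numerical cross-check, here is the $$X(s)$$ from the example above (with an assumed unit-impulse drive, $$F(s) = 1$$) handled by sympy, with mpmath’s `invertlaplace` (a Talbot-style contour method, not necessarily the technique I have in mind for that future column) confirming the result:

```python
import sympy as sp
from mpmath import invertlaplace, mp

s, t = sp.symbols('s t', positive=True)

# X(s) = 1/((s+2)(s+1)) splits into 1/(s+1) - 1/(s+2) by partial fractions
X = 1 / ((s + 2) * (s + 1))
x = sp.inverse_laplace_transform(X, s, t)   # exp(-t) - exp(-2t) for t > 0

# numerical inversion along a deformed Bromwich contour (Talbot's method)
mp.dps = 15
x_num = invertlaplace(lambda p: 1 / ((p + 2) * (p + 1)), 1.0, method='talbot')
assert abs(float(x.subs(t, 1)) - float(x_num)) < 1e-10
```

The symbolic route is exactly the tabulated-transform-plus-partial-fractions recipe; the numerical route never leaves the $$s$$-plane data.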

Next week, I’ll be taking a look at some additional properties of the Laplace Transform that make convenient the separation of transient and steady-state effects.

Laplace Transforms – Part 1: Basic Definitions

As regular readers know, time evolution of classical and quantum systems is a popular topic on this blog, forming a major thread of the columns to date. But the majority of column space dealing with the initial value problem has either been devoted to time domain analysis or, less frequently, with frequency domain techniques that fall under the Fourier Transform heading. This week starts what will be a lazy meandering through the Laplace Transform.

The Laplace Transform is a favorite of engineers, specifically ones who have to work with feedback control, and it would be worth exploring simply for that reason alone. However, it is also curious that it is almost wholly absent from the standard training of physicists. Exactly why the latter tend to use the Fourier transform at the expense of all else can be confidently guessed at (the Fourier Transform seems to be built into Quantum Mechanics), but a deeper understanding will result simply by comparing and contrasting the two methods.

Well before that last goal can be reached, the Laplace Transform must be examined on its own merits.

The Laplace Transform is a functional that maps input functions to output functions via

\[ {\mathcal L}[f(t)] = \int_0^{\infty} dt \, e^{-st} f(t) \; .\]

It is often written in a shorthand way as

\[ {\mathcal L}[f(t)] \equiv F(s) \; , \]

since this notation makes for convenient manipulation.

The properties of the Laplace Transform that make it so useful can be summarized as:

  1. it is a linear transformation,
  2. it turns operations of differentiation and integration into algebraic operations,
  3. incorporates the initial conditions seamlessly, and
  4. is general, thus affording a single approach applicable to an extraordinarily wide range of situations.

One caveat is needed on point 4. The Laplace Transform can only be applied when the system obeys a set of linear evolution equations. In this regard, it is unable to tackle the range and scope of problems that the Lie Series can deal with, but its convenience for linear (or linearized) systems makes it a favorite of the controls engineer.

Let’s look at the first 3 points in more detail. For an operation $${\mathcal O}$$ to be linear, the following relationship must hold

\[ {\mathcal O}[a F + b G] = a {\mathcal O}[F] + b {\mathcal O}[G] \; ,\]

where $$a,b$$ are constants and $$F,G$$ are functions. The properties of the integral ensure that the Laplace Transform is linear as follows

\[ {\mathcal L}[a f(t) + b g(t) ] = \int_0^{\infty} dt \, e^{-st} \left( a f(t) + b g(t) \right) \\ = a \int_0^{\infty} dt \, e^{-st} f(t) + b \int_0^{\infty} dt \, e^{-st} g(t) \\ = a {\mathcal L}[f(t)] + b {\mathcal L}[g(t)] \; .\]

The Laplace Transform is a machine which takes an input function and maps it to an output function. Differentiation is also a machine that maps input functions to output functions. It’s logical to ask how these two operations interact.

To figure this out, feed the derivative of a function $$f'(t)$$ into the Laplace Transform.

\[ {\mathcal L}[f'(t)] = \int_0^{\infty} dt \, e^{-st} f'(t) \]

and integrate by parts to move the derivative off of $$f(t)$$ and onto the exponential. This gives

\[ {\mathcal L}[f'(t)] = \left. e^{-st} f(t) \right|_0^{\infty} - \int_0^{\infty} dt \, \left( \frac{d}{dt} e^{-st} \right) f(t) \; , \]

which simplifies to

\[ {\mathcal L}[f'(t)] = -f(0) + \int_0^{\infty} dt \, s e^{-st} f(t) = s F(s) - f(0) \; .\]

The same steps can be followed for higher-order derivatives, but a more useful way is to simply use an induction argument. Define the second derivative to be the derivative of the first derivative and apply the theorem twice.

\[{\mathcal L}[f^{\prime\prime}(t)] = {\mathcal L}[(f'(t))'] = s {\mathcal L}[f'(t)] - f'(0) = s^2 F(s) - s f(0) - f'(0) \; .\]

This process can be repeated as often as desired yielding the general form

\[ {\mathcal L}\left[ \frac{d^n}{dt^n} f(t) \right] = s^n F(s) - s^{n-1} f(0) - s^{n-2} f^{(1)}(0) - s^{n-3} f^{(2)}(0) - \ldots \; ,\]

where

\[ f^{(n)}(0) = \left. \frac{d^n}{dt^n} f(t) \right|_0 \; .\]
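
A quick symbolic sanity check of the derivative rule; the concrete test function $$f(t) = \sin t$$ is an assumption of the sketch:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.sin(t)   # a concrete test function of exponential order

# L[f'] should equal s F(s) - f(0)
lhs = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)
F = sp.laplace_transform(f, t, s, noconds=True)
assert sp.simplify(lhs - (s * F - f.subs(t, 0))) == 0
```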

Likewise, the Laplace Transform takes integrals of functions to algebraic relations as well. As in the derivative case, start with the most basic form of

\[{\mathcal L}\left[ \int dt' \, f(t') \right] = \int_0^{\infty} dt \; e^{-st} \int dt' \, f(t') \; . \]

An application of an integration-by-parts is next employed with

\[ dV = e^{-st} \, dt \implies V = - \frac{1}{s} e^{-st} \]

and

\[ U = \int dt' \, f(t') \implies \frac{dU}{dt} = f(t) \; , \]

yielding

\[ \int_0^{\infty} dt \, e^{-st} \int dt' \, f(t') = \left.-\frac{1}{s} e^{-st} \int dt' \, f(t') \right|_0^{\infty} - \int_0^{\infty} dt \, \left(-\frac{1}{s} \right) e^{-st} f(t) \; . \]

This expression simplifies to

\[ {\mathcal L}\left[ \int dt' \, f(t') \right] = \frac{f^{(-1)}(0)}{s} + \frac{1}{s} F(s) \; ,\]

where $$f^{(-1)}(0) = \left. \int dt' \, f(t') \right|_{t=0}$$ is a convenient short-hand.

Repeated application yields the generalization

\[ {\mathcal L}\left[ \int dt' \, \int dt'' \, \ldots \int dt^{(n)} \, f(t^{(n)})\right] = \frac{f^{(-n)}(0)}{s} + \frac{f^{(-n+1)}(0)}{s^2} \\ + \ldots + \frac{F(s)}{s^n} \; .\]
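
The same kind of sympy spot check works for the integration rule; here the integral is taken definite from $$0$$, so the $$f^{(-1)}(0)$$ term vanishes and only $$F(s)/s$$ should survive (the choice $$f = \sin$$ is mine):

```python
import sympy as sp

t, tp, s = sp.symbols('t tp s', positive=True)

# int_0^t sin(t') dt' = 1 - cos(t); its transform should be F(s)/s
lhs = sp.laplace_transform(sp.integrate(sp.sin(tp), (tp, 0, t)), t, s, noconds=True)
F = sp.laplace_transform(sp.sin(t), t, s, noconds=True)
assert sp.simplify(lhs - F / s) == 0
```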

In contrast, the Laplace Transform of a product or quotient with powers of $$t$$ results in derivatives and integrations in $$s$$-space.

First consider multiplication of $$f(t)$$ by some power of $$t$$:

\[ {\mathcal L}[t f(t)] = \int_0^{\infty} dt \, e^{-st} t f(t) \; .\]

Recognize that

\[ \frac{d}{ds} \left( e^{-st} \right) = -t e^{-st} \;.\]

Thus

\[ {\mathcal L}[t f(t)] = -\int_0^{\infty} dt \, \frac{d}{ds} \left( e^{-st} \right) f(t) = -\frac{d}{ds} \int_0^{\infty} dt \, e^{-st} f(t) = -\frac{d}{ds} F(s) \; .\]

This is easily generalized to any arbitrary power to yield

\[ {\mathcal L}[t^n f(t) ] = (-1)^n \frac{d^n}{ds^n} {\mathcal L}[f(t)] = (-1)^n \frac{d^n}{ds^n} F(s) \; .\]
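
Again, this is easy to verify symbolically for a concrete case; $$f(t) = \sin t$$, with $$F(s) = 1/(s^2+1)$$, gives $${\mathcal L}[t \sin t] = 2s/(s^2+1)^2$$:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(sp.sin(t), t, s, noconds=True)       # 1/(s**2 + 1)
lhs = sp.laplace_transform(t * sp.sin(t), t, s, noconds=True)
assert sp.simplify(lhs + sp.diff(F, s)) == 0                  # L[t f] = -dF/ds
```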

As the last piece, let’s look at division by a power of $$t$$, starting with the simplest case

\[ {\mathcal L}[f(t)/t] = \int_0^{\infty} dt \, e^{-st} f(t)/t \; .\]

Recognizing that

\[ \int_s^{\infty} ds' \, e^{-s' t} = e^{-st}/t \]

allows for the Laplace Transform to become

\[{\mathcal L}[f(t)/t] = \int_0^{\infty} dt \, f(t) \int_s^{\infty} ds' \, e^{-s' t} = \int_s^{\infty} ds' \, F(s') \; .\]

This is also easily generalized to arbitrary powers to yield

\[{\mathcal L}[f(t)/t^n] = \int_s^{\infty} ds' \, \int_{s'}^{\infty} ds'' \ldots \int_{s^{(n-1)}}^{\infty} ds^{(n)} \, F(s^{(n)}) \;.\]
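
And the division rule can be spot checked the same way with $$f(t) = \sin t$$ (a sketch; since sympy may return the two sides in different but equivalent forms, the comparison is made through the derivative plus a sample point):

```python
import sympy as sp

t, s, u = sp.symbols('t s u', positive=True)

F = sp.laplace_transform(sp.sin(t), t, s, noconds=True)
lhs = sp.laplace_transform(sp.sin(t) / t, t, s, noconds=True)
rhs = sp.integrate(F.subs(s, u), (u, s, sp.oo))   # integrate F(s') from s to infinity

assert sp.simplify(sp.diff(lhs - rhs, s)) == 0
assert abs(float((lhs - rhs).subs(s, 1))) < 1e-12
```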

Next week, I’ll take a look at what functions possess a Laplace Transform and how the Laplace Transform relates to the Fourier Transform.

Lie Series for Non-autonomous Equations

One of the topics touched on in some of the recent posts about Lie series was the idea that a non-autonomous equation can be handled relatively easily in the Lie series framework by enlarging the state space from $$2N$$-dimensions to $$2N+1$$-dimensions. I thought it would be beneficial to see how this is done in an explicit example where the actual equations of motion can be solved analytically.

The candidate for this treatment is an otherwise free particle, moving in one dimension, subjected to a time-varying force. The reason for this is that the second-order Newtonian, first-order state space, and Green’s function methods (in both time and frequency space) all provide the same answer with relatively little work.

The Newton Method

The Newtonian equation of motion is

\[ \frac{d^2}{dt^2} x = F(t) \; .\]

To get a solution, integrate twice with respect to time to arrive at

\[ x(t) = x_0 + v_0 (t - t_0) + \int_{t_0}^{t} dt' \int_{t_0}^{t'} dt'' \, F(t'') \; , \]

where $$x_0$$ and $$v_0$$ are the initial position and velocity at time $$t_0$$.

This solution isn’t particularly useful, and a much better form results after an integration-by-parts. The usual form of integration-by-parts, $$ \int V \, dU = U V - \int U \, dV $$, is carried out by identifying:

\[ dU = dt' \]

and

\[ V(t') = \int_{t_0}^{t'} dt'' \, F(t'') \; \]

Plugging these in yields

\[ x(t) = x_0 + v_0 (t - t_0) + (t-t_0) \int_{t_0}^{t} dt'' \, F(t'') - \int_{t_0}^{t} dt' \, (t'-t_0) F(t') \; \]

or, upon combining the last two terms

\[ x(t) = x_0 + v_0 (t - t_0) + \int_{t_0}^{t} dt' \, (t-t') F(t') \; .\]

State Space Method

An alternative way to arrive at the same solution is in the state-space picture. Here the equation of motion is given by

\[ \frac{d}{dt} \left[ \begin{array}{c} x \\ v \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} x \\ v \end{array} \right] + \left[ \begin{array}{c} 0 \\ F(t) \end{array} \right] . \]

The form of the process matrix makes the construction of the state transition matrix particularly easy. To wit

\[ A = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \; ,\]

\[ A^2 = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right]\; ,\]

and so

\[ \Phi(t,t_0) = \exp((t-t_0)A) = \left[ \begin{array}{cc} 1 & (t-t_0) \\ 0 & 1 \end{array} \right] \; . \]
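
This is one of the rare matrix exponentials that can be checked in one line, since the nilpotence of $$A$$ kills the series after the linear term. A sympy sketch:

```python
import sympy as sp

tau = sp.symbols('tau')   # tau = t - t0
A = sp.Matrix([[0, 1], [0, 0]])

assert A**2 == sp.zeros(2, 2)               # nilpotent, so the series truncates
Phi = (A * tau).exp()                       # matrix exponential exp(tau A)
assert Phi == sp.Matrix([[1, tau], [0, 1]])
```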

The general solution is then

\[ \left[ \begin{array}{c} x \\v \end{array} \right] = \Phi(t,t_0) \left[ \begin{array}{c} x_0 \\ v_0 \end{array} \right] + \int_{t_0}^t dt' \, \Phi(t,t') \left[ \begin{array}{c} 0 \\ F(t') \end{array} \right] \; .\]

Using the form of $$\Phi$$ and multiplying the terms out gives

\[ \left[ \begin{array}{c} x \\ v \end{array} \right] = \left[ \begin{array}{c} x_0 + v_0 (t - t_0) + \int_{t_0}^t dt' \, (t-t') F(t') \\ v_0 + \int_{t_0}^t dt' \, F(t') \end{array} \right] \; , \]

which is exactly the same as the Newton method.

Green’s Function Method

The details for this method have actually been handled in earlier installments of Under the Hood. It suffices to note that the one-sided Green’s function in the time domain is obtained from the Wronskian approach, and the resulting Green’s function for the operator

\[ D^2 = \frac{d^2}{dt^2} \]

is

\[ G(t,t’) = (t-t’) \; , \]

which leads to the exact same expression.

Lie Series

With that preliminary work under our belt, let’s turn to the application of the Lie series. Since the equation is non-autonomous, the dimensionality of the system has to be expanded. The resulting structure is

\[ \frac{d}{dt} \left[ \begin{array}{c} x \\ v \\ t \end{array} \right] = \left[ \begin{array}{c} v \\ F(t) \\ 1 \end{array} \right] \; , \]

with the initial conditions of $$x_0$$, $$v_0$$, and $$t_0$$.

We can then read off the $$D$$ operator as

\[ D = v_0 \partial_{x_0} + F(t_0) \partial_{v_0} + \partial_{t_0} \; , \]

noting that the driving function must be evaluated at $$t_0$$.

The Lie series is then obtained by applying

\[ \exp( (t-t_0) D ) \]

to the initial condition $$x_0$$.

As usual, it is easier to do this in steps by stacking successive applications of $$D$$:

\[ D[x_0] = v_0 \; , \]
\[ D^2[x_0] = D[v_0] = F(t_0) \; ,\]
\[ D^3[x_0] = D[F(t_0)] = F'(t_0) \; ,\]
\[ D^4[x_0] = D[F'(t_0)] = F''(t_0) \; ,\]
and
\[ D^n[x_0] = D[F^{(n-3)}(t_0)] = F^{(n-2)}(t_0) \; .\]

Putting the pieces together gives

\[ x(t) = L[x_0] = x_0 + v_0 (t - t_0) + F(t_0) \frac{ (t-t_0)^2 }{2} \\ + F'(t_0) \frac{ (t-t_0)^3 }{3!} + \ldots + F^{(n-2)}(t_0) \frac{ (t-t_0)^n }{n!} + \ldots \; ,\]

for the Lie series.

A Taylor’s series expansion of the integral found in the exact solution gives

\[ x(t) = x_0 + v_0 (t - t_0) + F(t_0) \frac{ (t-t_0)^2 }{2} \\ + F'(t_0) \frac{ (t-t_0)^3 }{3!} + \ldots + F^{(n-2)}(t_0) \frac{ (t-t_0)^n }{n!} + \ldots \; ,\]

which happily gives the same answer.
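
The agreement can be pushed to higher order by machine. The sketch below builds the Lie series symbolically in sympy and compares it, order by order, against the Taylor expansion of the exact solution; the concrete driving force $$F(t) = \cos t$$ is my arbitrary choice:

```python
import sympy as sp

x0, v0, t0, h, tp = sp.symbols('x0 v0 t0 h tp')
F = sp.cos   # assumed concrete driving force, F(t) = cos(t)

def D(expr):
    # D = v0 d/dx0 + F(t0) d/dv0 + d/dt0
    return (v0 * sp.diff(expr, x0) + F(t0) * sp.diff(expr, v0)
            + sp.diff(expr, t0))

# Lie series for x(t) through order N in h = t - t0
N = 6
term, lie = x0, x0
for n in range(1, N + 1):
    term = D(term)
    lie += h**n / sp.factorial(n) * term

# exact solution x(t) = x0 + v0 (t - t0) + int_{t0}^{t} (t - t') F(t') dt'
exact = x0 + v0 * h + sp.integrate((t0 + h - tp) * F(tp), (tp, t0, t0 + h))

# the two must agree through order N in h
residual = sp.series(sp.expand(exact - lie), h, 0, N + 1).removeO()
assert sp.simplify(residual) == 0
```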

Numerical Solutions of the Pendulum

Last week’s column examined the use of the Lie series on the problem of the pendulum. As a model in classical mechanics, the pendulum is interesting since it is essentially nonlinear with a well-known analytic solution in terms of elliptic integrals. By essentially nonlinear, I mean that the problem doesn’t possess a nonlinear transformation that permits the differential equation to be recast as a linear problem, as is possible in the Kepler problem.

The comparison of the ‘exact’ solution in terms of elliptic integrals to the Lie series is very difficult. Expansion of the exact solution in a power series is very complex due to both the Jacobi $$sn$$ function and the $$\sin^{-1}$$ in the analytic form. In contrast, the Lie series, which essentially generates the Taylor series in a compact, self-iterative way, seems to have no easily identified correspondence.

As a result, it seemed prudent to look at the numerical solutions of both approaches: exact and Lie series. Numerical integration using a Runge-Kutta scheme acts as a stand-in for the exact elliptic-integral solution. wxMaxima served as the tool to tackle this comparison, as it does a fairly competent job of mixing symbolic and numerical processing (although it would be nice to try out a mix of numpy, scipy, and sympy in a Jupyter shell, perhaps sometime in the future).
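
For the curious, a rough Python sketch of that same pipeline (scipy’s `solve_ivp` standing in for Maxima’s `rk`, sympy building the Lie series; the tolerances and test point are my choices):

```python
import numpy as np
import sympy as sp
from scipy.integrate import solve_ivp

g = 10.0
theta0, omega0 = 2.0, 0.0

# 'exact' reference via Runge-Kutta integration
def rhs(t, y):
    theta, omega = y
    return [omega, -g * np.sin(theta)]

ts = np.linspace(0.0, 1.0, 11)
sol = solve_ivp(rhs, (0.0, 1.0), [theta0, omega0],
                t_eval=ts, rtol=1e-10, atol=1e-12)

# Lie series theta(t) = exp(t D) theta0, built symbolically
T, w, t = sp.symbols('T w t')

def D_pen(expr):
    return w * sp.diff(expr, T) - g * sp.sin(T) * sp.diff(expr, w)

def lie_theta(order):
    term, total = T, T
    for n in range(1, order + 1):
        term = D_pen(term)
        total += t**n / sp.factorial(n) * term
    return sp.lambdify(t, total.subs({T: theta0, w: omega0}))

lie10 = lie_theta(10)
# near t = 0 the truncated series should track the integrator closely
assert abs(lie10(0.1) - sol.y[0][1]) < 1e-3
```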

To start, the operator $$D = \omega_0 \partial_{\theta_0} - g \sin(\theta_0) \partial_{\omega_0}$$ was defined as

D_pen(arg) := w*diff(arg,T) - g*sin(T)*diff(arg,w);

Two things are worth noting. First, the operator $$D$$ is given a decoration in the program so that, during testing, other functional forms could be used without clash. In particular, the use of $$D = \omega_0 \partial_{\theta_0} - g \theta_0 \partial_{\omega_0}$$ was especially helpful, since the Lie expansion of the simple harmonic operator corresponds to the regular expansions of $$\sin$$ and $$\cos$$, making it easy to check that the algorithm is correct. Second, note that the subscript reminding us that the operator acts upon the initial conditions has been suppressed. In some sense, it is not needed except as a reminder of that particular fact and, with that notion understood, it can be dropped.

The next order of business is to define a function that allows the formation of the Lie series based on $$D$$ to order $$n$$. This is accomplished using

L(order,D) := block([ret,temp],
                     ret  : T,
                     temp : T,
                     for n : 1 step 1 thru order do
                         block([],
                               temp : D(temp),
                               ret  : ret + t^n/n!*temp),
                     ret)$

As a test, L(4,D_pen) yields

\[ {{t^4\,\left(g^2\,\cos T\,\sin T+g\,w^2\,\sin T\right)}\over{24}}-{{g\,t^2\,\sin T}\over{2}}-{{g\,t^3\,w\,\cos T}\over{6}}+T+t\,w \; ,\]

which, up to a cosmetic change in notation, is seen to be the same expression as was derived last week.

The final function needed to complete the analysis is one that converts the symbolic form to numeric form for easy plotting and numerical comparisons. That function is defined as

Q(order,D,parms) := block([Lie],
                  Lie : L(order,D),
                  Lie : ev(Lie,parms));

What $$Q$$ does is first retrieve the expression of the Lie series for a user-specified order and $$D$$ operator and then evaluate that expression numerically using the $$ev$$ function built into Maxima. The list $$parms$$ provides a convenient way to piggyback the appropriate parameters that match the problem at hand. These parameters include the numerical values of the initial conditions, any physical constants (e.g. $$g$$ in the problem of the pendulum), and the specific time at which the evaluation is to take place.

The next step is to solve the pendulum problem using a Runge-Kutta integrator and to assume that the resulting solution is close enough to the exact solution to be of use. The steps for this are fairly straightforward, but a disciplined approach helps in remembering the order of the arguments in the syntax.

The first step is to load the dynamics package using:

load("dynamics")

The second step is to define the physical constant for $$g$$

G : 10

(note that the symbol $$G$$ is chosen for the numerical value of $$g$$ to allow the latter to stay as a symbolic quantity).

Next define the differential equations to be solved

dT_dt   : w;
dw_dt   : -G*sin(T);
my_odes : [dT_dt, dw_dt];

the names of the dynamic variables and their initial conditions

var_names : [T,w];
T0  : 2;
w0  : 0;
ICs : [T0,w0];

and the time mesh used for the integration

t_var   : t;
t_start : 0;
t_stop  : 3;
t_step  : 0.1;
t_domain: [t_var, t_start, t_stop, t_step];

Structured in this way, the call to the Runge Kutta integrator is nice and clean

Z : rk(my_odes, var_names, ICs, t_domain)$

The data structure $$Z$$ is a list of lists. Each of the sub-lists has three elements corresponding to the values of $$[t,\theta,\omega]$$, and the outside list has $$n$$ elements, where $$n$$ is the number of time points in the mesh defined above.

To make plots of the data requires massaging the results into a list of lists with only two elements (either $$[t,\theta]$$ or $$[t,\omega]$$) for use in a discrete plot. In addition, for proper comparison, similar lists are needed for the simple harmonic oscillator (as a linear-problem baseline) and for different orders of the Lie series.

The built-in command makelist is the workhorse here with the desired list of lists being specified as

pen   : makelist( [Z[n][1],Z[n][2]],n,1,31)$
SHO   : makelist([(n-1)*0.1,2*cos(sqrt(10.0)*(n-1)*0.1)],n,1,31)$
Lie5  : makelist([(n-1)*0.1,Q(5,D_pen,[T=2.0,w=0.0,g=10.0,t=(n-1)*0.1])],n,1,31)$
Lie10 : makelist([(n-1)*0.1,Q(10,D_pen,[T=2.0,w=0.0,g=10.0,t=(n-1)*0.1])],n,1,31)$

These lists are then displayed using

wxplot2d([[discrete,pen],[discrete,SHO],[discrete,Lie5],[discrete,Lie10]],[x,0,1],[y,-2,2],[legend,"Pendulum","SHO","Lie5","Lie10"],[xlabel,"time"],[ylabel,"theta"]);

The resulting plot is

[Figure: theta versus time for the Runge-Kutta pendulum solution, the SHO baseline, and the 5th- and 10th-order Lie series]

Several observations follow immediately from a careful inspection of this plot. First, the nonlinearity of the pendulum is quite obvious, as the motion (blue trace) quickly departs from the corresponding motion of the simple harmonic oscillator (red trace). The Lie series capture this departure reasonably well, as both the 5th- and 10th-order behaviors (green and purple traces) sit on top of the exact solution for times less than 0.4. After this time, the 5th order begins diverging. The 10th-order Lie series stays relatively close until a time of 0.7, where it too begins to diverge. Neither one makes it even half a period, which is a disappointing observation. Clearly, while the Lie series is universal in its applicability, it also seems to be universal in its limited range of convergence about the exact solution.

Lie Series and the Pendulum

For this go around with the Lie series, I thought it would be interesting to experiment with a problem quite unfamiliar to me – namely the full nonlinear pendulum equation. The approach here is two-fold. First, the classical computation for this key problem is presented. For this part, I use Lowenstein’s discussion in the book ‘Essentials of Hamiltonian Dynamics’ as a model. In the second part, I apply the Lie series to the system to see how well the generated expansion matches the classical result. The classical result requires a lot of familiarity with special functions and some clever and unmotivated substitutions. The hope here is that the brute force and straightforward method provided by the Lie series will be accurate enough when compared with the classical result to make it attractive.

Now to set the notation and conventions, consider the following figure of the pendulum.

[Figure: the pendulum, with angle theta measured from the vertical]

The angle $$\theta$$ from the vertical is clearly the only relevant degree of freedom. The pendulum bob of mass $$m$$ possesses a kinetic energy of

\[ K = \frac{1}{2} m R^2 \dot \theta^2 \; ,\]

where $$R$$ is the length of the pendulum. As denoted in the figure, the zero of the potential energy is taken when the bob is horizontal so that $$\theta = \frac{\pi}{2}$$. The potential energy function is then

\[ U = - m g R \cos \theta \; .\]

For convenience in what follows, set $$m = 1$$ and $$R = 1$$. The Lagrangian of the system is then

\[ L(\theta,\dot \theta) = \frac{1}{2} \dot \theta ^2 + g \cos \theta \]

with the corresponding Euler-Lagrange equations of

\[ \ddot \theta + g \sin \theta = 0 \; .\]

The conjugate momentum

\[ p_{\theta} = \frac{\partial L}{\partial \dot \theta} = \dot \theta \]

becomes the new independent variable in the system when we switch to the Hamiltonian

\[ H(\theta,p_{\theta}) = \frac{p_{\theta}^2}{2} - g \cos \theta \; .\]

Since $$H(\theta,p_{\theta})$$ has no explicit time dependence,

\[ \frac{d H}{dt} = \frac{\partial H}{\partial t} = 0 \; ,\]

it is a conserved quantity that is the total energy.

With the conservation firmly in hand, we can eliminate the conjugate momentum in favor of the generalized velocity to get the energy equation

\[ \frac{1}{2} \dot \theta ^2 - g \cos \theta = E \; .\]

At this point, the special function gymnastics begin with the substitutions

\[ z = \frac{1}{k} \sin \left( \frac{\theta}{2} \right) \]
and
\[ k = \sqrt{ \frac{E + g}{2 g} } \; ,\]

which bring the energy equation into the form

\[ \dot z ^2 = g(1-k^2 z^2) (1 – z^2) \; .\]

Solving this for $$dt$$ gives

\[ dt = \frac{dz}{ \sqrt{ g(1-k^2 z^2) (1 – z^2) } } \]

which integrates to

\[t - t_0 = \frac{1}{\sqrt{g}} \int_0^z \frac{d \zeta}{\sqrt{ (1-k^2 \zeta^2) (1 - \zeta^2)}} \; . \]

The integral on the right-hand side is important enough to have its own name: it is an elliptic integral of the first kind, defined as

\[ F(\sin^{-1} z \, | \, k^2) \equiv \int_0^z \frac{d \zeta}{\sqrt{ (1-k^2 \zeta^2) (1 - \zeta^2) }} \; . \]

By construction, $$k^2 \in [0,1)$$ corresponds to the librating situation where the bob has only enough energy to move to and fro without spinning about its point of rotation. The classical solution for $$z$$ is compactly written as

\[ z = sn(\sqrt{g}(t-t_0),k^2) \]

where $$sn$$ is a Jacobi elliptic function. Inverting the defining relation between $$z$$ and $$\theta$$ gives the final solution as

\[ \theta(t) = 2 \sin^{-1}\left(k \, sn(\sqrt{g}(t-t_0),k^2) \right) \; .\]

Understanding how to mine information from these expressions requires a fairly sophisticated familiarity with elliptic integrals, which, while important for this key problem, doesn’t translate universally.
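
That said, simply evaluating the closed-form solution numerically is painless with scipy’s `ellipj`. A sketch (the energy $$E=-5$$ with $$g=10$$ is an arbitrary librating choice, and this form of the solution starts the clock with the bob at the bottom, $$\theta(t_0)=0$$); the finite-difference check confirms that $$\theta(t)$$ satisfies $$\ddot \theta = -g \sin \theta$$:

```python
import numpy as np
from scipy.special import ellipj

g = 10.0
E = -5.0                     # librating case requires -g < E < g
m = (E + g) / (2 * g)        # scipy's parameter m is k^2
k = np.sqrt(m)

def theta(t, t0=0.0):
    sn, cn, dn, ph = ellipj(np.sqrt(g) * (t - t0), m)
    return 2.0 * np.arcsin(k * sn)

# sanity check: theta should satisfy theta'' = -g sin(theta)
t, h = 0.3, 1e-4
acc = (theta(t + h) - 2.0 * theta(t) + theta(t - h)) / h**2
assert abs(acc + g * np.sin(theta(t))) < 1e-4
```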

To apply the Lie series, rewrite the equations of motion in state space form

\[ \frac{d}{dt} \left[ \begin{array}{c} \theta \\ \omega \end{array} \right] = \left[ \begin{array}{c} \omega \\ – g \sin \theta \end{array} \right] \]

from which is read the differential operator

\[ D = \omega_0 \partial_{\theta_0} - g \sin \theta_0 \partial_{\omega_0} \; .\]

The solution to the problem is then given by
\[ L[\theta_0] = e^{(t-t_0)D} \theta_0 \; .\]

As is usually the case, the Lie series is tractable to hand computations for the first few terms. The relevant operations with $$D$$ are:

\[ D \theta_0 = \omega_0 \; ,\]

\[ D^2 \theta_0 = - g \sin \theta_0 \; , \]

\[ D^3 \theta_0 = - g \omega_0 \cos \theta_0 \; , \]

and

\[ D^4 \theta_0 = g^2 \cos \theta_0 \sin \theta_0 + g \omega_0^2 \sin \theta_0 \; .\]

The resulting Lie series is quite unwieldy, even at fourth order:

\[ L[\theta_0] = \theta_0 - g \frac{(t-t_0)^2}{2!} \sin \theta_0 + \left( (t-t_0) - g \frac{(t-t_0)^3}{3!} \cos \theta_0 \right) \omega_0 \\ + \frac{(t-t_0)^4}{4!} \left( g^2 \cos \theta_0 \sin \theta_0 + g \omega_0^2 \sin \theta_0 \right) \; .\]

Unlike the Kepler case, the resulting Lie series for the pendulum doesn’t neatly separate into two pieces linear in the initial conditions. In addition, it isn’t at all clear how this expansion relates to the classical solution in terms of elliptic integrals. So it seems that in spite of its universal applicability, the Lie series is not a silver bullet.

Next week, I’ll take a look how closely the two approaches match numerically.

Lie Series – The Kepler Files

This week the Lie series faces a stiff challenge in the form of the Kepler problem. Unlike some of the cases previously examined in this blog (harmonic oscillator, quantum wave propagation, etc.), the Kepler problem is nonlinear when written in the usual fashion in terms of Cartesian coordinates. There is a standard ‘clever’ substitution that makes everything look linear ($$u = 1/r$$), but this won’t be employed.

In terms of the Cartesian coordinates, the orbital state for the Kepler problem is

\[ \bar s = \left[ \begin{array}{c} \vec r \\ \vec v \end{array} \right] \; ,\]

where $$\vec r$$ and $$\vec v$$ are the position and velocity of a reduced mass moving in a central potential $$V(r) = -\mu/r$$.

The equations of motion in state space notation are

\[ \frac{d}{dt} \bar s = \left[ \begin{array}{c} \vec v \\ -\frac{\mu \vec r}{r^3} \end{array} \right] \; ,\]

with initial conditions $$\vec r(t_0) = \vec \rho$$ and $$\vec v(t_0) = \vec \nu$$.

The corresponding Lie operator, which is written in terms of the initial conditions, is

\[ D = \sum_i \nu_i \frac{\partial}{\partial \rho_i} - \frac{\mu \rho_i} {\rho^3} \frac{\partial}{\partial \nu_i } \; \]

or more compactly (with some slight abuse of notation)

\[ D = \vec \nu \cdot \partial_{\vec \rho} - \epsilon \vec \rho \cdot \partial_{\vec \nu} \; ,\]

where $$\epsilon = \mu/\rho^3$$ has been defined for future convenience.

The full solution is then given by the application of the Lie series operator $$L$$ to the initial position $$\vec \rho$$

\[ \vec r(t) = L[ \vec \rho ] \; .\]

With the coefficients of $$D$$ being nonlinear functions of the state, the expansion implicit in this solution is very difficult to evaluate to all orders, so a truncation of the expansion to order $$(t-t_0)^3$$

\[ L[\vec \rho] = e^{(t-t_0) D} \vec \rho \approx \left[ 1 + (t-t_0) D + \frac{ (t-t_0)^2}{2!} D^2 + \frac{(t-t_0)^3}{3!} D^3 \right] \vec \rho \]

will be employed with an eye for what patterns emerge.

A little bit of organization usually serves well in these types of expansions and, with that in mind, we note the following evaluations

\[ D \vec \rho = \vec \nu \; ,\]

\[ D^2 \vec \rho = D \vec \nu = - \frac{ \mu }{\rho^3} \vec \rho = - \epsilon \vec \rho \; ,\]

and

\[ D^3 \vec \rho = D( -\epsilon \vec \rho) = -(\vec \nu \cdot \partial_{\vec \rho} \epsilon) \, \vec \rho - \epsilon \vec \nu \; . \]

The third order expansion takes the form

\[ L[\vec \rho] \approx \vec \rho + (t-t_0) \vec \nu - \frac{ (t-t_0)^2 }{2!} \epsilon \vec \rho - \frac{ (t-t_0)^3 }{3!} \left( (\vec \nu \cdot \partial_{\vec \rho} \epsilon) \, \vec \rho + \epsilon \vec \nu \right) \; .\]

The last remaining partial derivative evaluates to

\[ \vec \nu \cdot \partial_{\vec \rho} \epsilon = \vec \nu \cdot \partial_{\vec \rho} \frac{\mu}{\rho^3} = -\frac{3 \mu \, \vec \nu \cdot \vec \rho}{\rho^5} \; . \]

Defining one last ‘greek’ shorthand (the reason for this will emerge shortly)

\[ \lambda = \frac{\vec \nu \cdot \vec \rho}{\rho^2} \]

allows the expansion to take the form

\[ L[\vec \rho] \approx \vec \rho + (t-t_0) \vec \nu - \frac{ (t-t_0)^2 }{2!} \epsilon \vec \rho + \frac{ (t-t_0)^3 }{3!} \left( 3 \epsilon \lambda \vec \rho - \epsilon \vec \nu \right) \; .\]

What to make of this expansion? Compare this against the Taylor’s series expansion of the position (see ‘An Introduction to the Mathematics and Methods of Astrodynamics, Revised Edition’ by Battin, pp 110-112)

\[ \vec r(t) = \vec r_0 + (t-t_0) \left . \frac{d \vec r}{dt} \right|_{t=t_0} + \frac{(t-t_0)^2}{2!} \left. \frac{d^2 \vec r}{d t^2} \right|_{t=t_0} \\ + \frac{(t-t_0)^3}{3!} \left. \frac{d^3 \vec r}{d t^3} \right|_{t=t_0} + \cdots \; .\]

Apparently, Lagrange defined the following ‘invariants’ (there are three of them but I’m only using the first two)

\[ \epsilon = \frac{\mu}{r^3} \]

and

\[ \lambda = \frac{ \vec r \cdot \vec v }{ r^2 } \; \]

In terms of these classical invariants, the first three derivatives in the Taylor series become

\[ \frac{d \vec r}{dt} = \vec v \; , \]

\[ \frac{d^2 \vec r}{dt^2} = -\epsilon \vec r \; ,\]

and

\[ \frac{d^3 \vec r}{dt^3} = 3 \epsilon \lambda \vec r - \epsilon \vec v \; .\]

The series expansion thus agrees with the Lie series method or, in other words, the Lie series has reproduced the classical Lagrange f-g series

\[ \vec r(t) = f(t) \vec \rho + g(t) \vec \nu \; .\]
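The agreement can be spot-checked numerically. The sketch below evaluates the third-order truncated $$f$$-$$g$$ series for a circular orbit with $$\mu = 1$$, where the exact solution is uniform motion around the unit circle; the initial conditions are my own illustrative choice, not part of the original discussion.

```python
import math

def fg_series(rho, nu, mu, dt):
    """Third-order f-g series r(t) ~ f*rho + g*nu built from the
    Lagrange invariants epsilon and lambda."""
    r2 = sum(x * x for x in rho)
    r = math.sqrt(r2)
    eps = mu / r ** 3                                  # epsilon invariant
    lam = sum(p * n for p, n in zip(rho, nu)) / r2     # lambda invariant
    f = 1 - dt ** 2 / 2 * eps + dt ** 3 / 6 * 3 * eps * lam
    g = dt - dt ** 3 / 6 * eps
    return [f * p + g * n for p, n in zip(rho, nu)]

# Circular orbit with mu = 1: the exact motion is (cos t, sin t, 0).
dt = 0.01
approx = fg_series([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 1.0, dt)
exact = [math.cos(dt), math.sin(dt), 0.0]
```

For this circular case $$\lambda = 0$$, so the $$f$$ and $$g$$ functions reduce to the leading terms of $$\cos t$$ and $$\sin t$$, as expected.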

While the Lie series really provides no new content here, it does offer a few advantages. The first is that the notation is much more compact than the traditional approach using the Lagrange invariants. Second, the Lie series clearly extends to other, more complex problems, and in such a fashion that it seems to provide theoretical insight.

I’ll try to flesh this idea out next week when I apply the Lie series to the pendulum problem.

Lie Series Basics

This column, which is a relatively short one as I am still grappling with the details, is the introductory piece for what I intend to be a thorough exploration of the use of Lie Series as a method for solving differential equations. This particular entry is heavily influenced by the NASA Technical Note TN D-4460 entitled ‘On the Application of Lie-Series to the Problems of Celestial Mechanics’ by Karl Stumpff, dated June 1968.

Stumpff starts his note with the following interesting passage:

Recently, a new method has been proposed by Wolfgang Grobner. He uses the “Lie-Series” in solving a restricted class of differential equations. It is not clear that this new method is better than others; it has not been sufficiently used.

– Karl Stumpff

That passage is interesting mostly due to what the meaning of ‘Recently’ is. In his references, Stumpff lists only one work by Grobner, entitled “Die Lie-Reihen und ihre Anwendungen”, dated 1960. Sophus Lie, from whom the Lie series derives its name, died in 1899, and it seems that it took 61 years for his work to really penetrate celestial mechanics.

The central idea of the Lie series is the production of an operator that makes the solution of certain differential equations manifest and their expansion in terms of a power series ‘automatic’. In presenting the prescription, I follow Stumpff’s ordering closely but with modifications of the notation to suit my tastes and needs.

Assume that there are $$n$$ functions

\[ F_i(z) = F_i(z_1, z_2, \dots, z_n) \]

in $$n$$ complex variables $$z_i$$ that are analytic in the neighborhood of

\[ \zeta = (\zeta_1, \zeta_2, \dots, \zeta_n) \; . \]

The concept of a Lie series depends on first defining the operator

\[ D = F_1(z) \frac{\partial}{\partial z_1} + F_2(z) \frac{\partial}{\partial z_2} + \dots + F_n(z) \frac{\partial}{\partial z_n} \; .\]

The operator $$D$$ is a derivation. That is to say it obeys the following properties.

It is linear

\[ D[f(z) + g(z) ] = D[f(z)] + D[g(z)] \; ,\]

it commutes with multiplication by a constant

\[ D[ c f(z) ] = c D f(z) \; , \]

and it obeys the Leibniz rule

\[ D[f(z) g(z) ] = f(z) \cdot Dg(z) + Df(z) \cdot g(z) \; .\]

Repeated application of the $$D$$ operator on the product of two functions results in the usual binomial-style (generalized Leibniz) expansion

\[ D^n [ f(z) g(z) ] = \sum_{\nu = 0}^{n} \left( \begin{array}{c} n \\ \nu \end{array} \right) D^{\nu} f(z) D^{n-\nu} g(z) \; .\]

Now the Lie series operator $$L$$ is defined as

\[ L(z,t) f(z) = e^{t D} f(z) \; , \]

where the exponential notation is short-hand for the series

\[ L(z,t) f(z) = \sum_{n=0}^{\infty} \frac{t^n}{n!} D^n f(z) = \left[1 + t D + \frac{t^2}{2!} D^2 + \dots \right] f(z) \; .\]

Note that often, the arguments $$(z,t)$$ are omitted.

Now an interesting and useful property of the Lie series operator for the product of two functions is

\[ L[f(z)g(z)] = L[f(z)] \cdot L[g(z)] \; ,\]

or in words, that the Lie series of a product is the product of the Lie series. This relationship is called the interchange relation.

Since we imagine that all the functions of interest are analytic, this simple relation ‘bootstraps’ to the general relation

\[ L[ Q(z)] = Q(L[z]) \; ,\]

for $$Q(z)$$ analytic.

As an example of this bootstrap relation, consider $$D = \frac{d}{dz}$$, and note that

\[ D[z] = 1 \quad \text{and} \quad D^n[z] = 0 \; \text{for} \; n \geq 2 \]

and therefore

\[ L[z] = e^{t D}z = (1+ tD)z = z + t \; .\]

So the bootstrap relation gives first

\[ L[ Q(z)] = \sum_{n=0}^{\infty} \frac{t^n}{n!} \frac{d^n}{dz^n} Q(z) \; ,\]

which from elementary calculus is recognized to be $$Q(z+t)$$ expanded about $$t=0$$, and second

\[ Q(L[z]) = Q(z+t) \; . \]
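This translation property is easy to verify numerically. The sketch below takes $$Q(z) = e^z$$, for which every derivative is again $$e^z$$, sums the Lie series directly, and compares against $$Q(z+t)$$; the truncation depth and the sample point are arbitrary choices of mine.

```python
import math

def lie_series_exp(z, t, terms=30):
    """Sum t^n/n! D^n Q(z) for Q(z) = e^z with D = d/dz, where
    every derivative D^n Q is again e^z."""
    q = math.exp(z)
    total, coeff = 0.0, 1.0
    for n in range(terms):
        total += coeff * q     # coeff = t^n / n!
        coeff *= t / (n + 1)
    return total

# The Lie series with D = d/dz simply translates the argument.
left = lie_series_exp(0.3, 0.7)
right = math.exp(0.3 + 0.7)    # Q(z + t)
```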

Now the utility of the Lie series is not that it provides a compact way of representing a Taylor’s series (as convenient as that is) but rather in the fact that when the $$F_i$$ are not trivial functions it encodes solutions to a coupled set of differential equations. To see how this works, assume a system of differential equations given by

\[ \frac{d z_i}{dt} = F_i(z) \]

with initial conditions

\[ z_i(0) = \zeta_i \; .\]

Then the solution of this system of equations is

\[ z_i(t) = e^{t D} \zeta_i \; . \]

To see this, take the derivative of the solution to get

\[ \frac{d z_i(t)}{d t} = \frac{d}{dt} e^{tD} \zeta_i = D e^{t D} \zeta_i \; . \]

But, by definition, $$ e^{t D} \zeta_i = z_i(t) $$, so

\[ \frac{d z_i(t)}{d t} = D z_i = F_i \; . \]

In some sense, this is our old friend the propagator or state transition matrix written in a new form. However, this new encoding works for non-linear systems as well, a point that makes it an improvement on those approaches.
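As a concrete check of this claim, take the harmonic oscillator system $$\dot z_1 = z_2$$, $$\dot z_2 = -z_1$$, whose operator $$D = z_2 \partial_{z_1} - z_1 \partial_{z_2}$$ cycles $$z_1 \rightarrow z_2 \rightarrow -z_1 \rightarrow -z_2 \rightarrow z_1$$ under repeated application. A truncated Lie series should then reproduce the familiar $$z_1(t) = z_1(0) \cos t + z_2(0) \sin t$$. This is a sketch with an arbitrary truncation depth:

```python
import math

def lie_oscillator(z1, z2, t, terms=40):
    """Lie series solution of dz1/dt = z2, dz2/dt = -z1.
    D = z2 d/dz1 - z1 d/dz2 cycles the derivatives of z1 through
    z1 -> z2 -> -z1 -> -z2 -> z1 -> ..."""
    derivs = [z1, z2, -z1, -z2]
    total, coeff = 0.0, 1.0
    for n in range(terms):
        total += coeff * derivs[n % 4]   # coeff = t^n / n!
        coeff *= t / (n + 1)
    return total

# Compare against the known solution z1(t) = z1(0) cos t + z2(0) sin t.
approx = lie_oscillator(1.0, 0.5, 2.0)
exact = math.cos(2.0) + 0.5 * math.sin(2.0)
```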

One last note. So far, the system of differential equations was assumed to be autonomous. In the event the system isn’t, a simple ‘trick’ can be used to make it look autonomous. Define a new variable $$z_0 = t$$ and then augment the definition of $$D$$ to be

\[ D = \frac{\partial}{\partial z_0} + \sum_{i=1}^{n} F_i(z) \frac{\partial}{\partial z_i} \; .\]

This manipulation has the advantage of making a non-autonomous system look formally autonomous. The only disadvantage is that all notion of equilibrium points is lost, since the right-hand side of the equations can never have well-defined critical points.

Next time, I’ll apply the Lie series formalism to the Kepler problem.

Poynting the Way

Last week, we discussed how a clever interrogation of Maxwell’s equations yields a variety of relations, including the derivation of Poynting’s theorem. This week, I thought it would be interesting to look a little more at how Poynting’s theorem works for a set of special cases.

The first case looks at the propagation of an electromagnetic wave. Here, common intuition as to the direction of the energy flow matches the mathematics and all looks right with the world. The second case involves a uniform current put through a resistive wire. This common situation ends up with an unusual interpretation of the Poynting flux. In the third case, which involves running AC current through a capacitor in a circuit, common intuition about the direction of $$\vec S$$ turns out to be quite wrong and the result is both puzzling and instructive.

One note before the dissection of the three cases begins: this post is heavily influenced by Chapter 10 in Electromagnetic Theory by Frankl and by the discussion in Chapter 27 of the Feynman Lectures on Physics.

Case 1 – Propagation of an Electromagnetic Wave

One of the most celebrated results of Maxwell’s synthesis of the equations that bear his name and which follows from and, in some sense, motivated his inclusion of the displacement term in Ampere’s law is the prediction of electromagnetic waves. From the basic equations, a plane electromagnetic wave of frequency $$\omega$$ propagating in the $$k$$-direction (which can point anywhere) is described by

\[ \vec E(\vec r,t) = \vec E_0 e^{i(\vec k \cdot \vec r - \omega t)} \]

for the electric field and by

\[ \vec B(\vec r,t) = \vec B_0 e^{i(\vec k \cdot \vec r - \omega t)} \]

for the magnetic field.

Note that this form assumes that the wave propagates in free space and, therefore, doesn’t interact with matter. Applying Coulomb’s law ($$div(\vec E) = 0$$) gives

\[ \nabla \cdot \vec E = 0 \Rightarrow \vec E_0 \cdot \vec k = 0 \; .\]

Likewise, the application of the Monopole law leads to $$\vec B_0 \cdot \vec k = 0$$, and Faraday’s law further fixes $$\vec B_0 = (\vec k \times \vec E_0)/\omega$$. So the electric and magnetic fields are perpendicular to each other and to the direction of propagation, and since $$\vec S = \vec E \times \vec B / \mu_0$$, the Poynting flux is parallel to $$\vec k$$, just as common intuition indicates. The energy flow is along the direction of propagation.

Note that a careful computation will show that the magnitude of the energy flow is just what elementary wave mechanics would predict.
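The geometry of this case can be checked with a few lines of arithmetic. The sketch below works in units with $$c = \mu_0 = 1$$ and uses an arbitrary transverse field amplitude; the only claim is that $$\vec S = \vec E \times \vec B$$ comes out along $$\hat k$$.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Wave along z, E along x; Faraday's law gives B = k_hat x E
# in units with c = 1.
k_hat = (0.0, 0.0, 1.0)
E = (2.0, 0.0, 0.0)
B = cross(k_hat, E)

# Poynting vector E x B (mu_0 = 1): it should point along k_hat.
S = cross(E, B)
```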

Case 2 – Current Through a Resistive Wire

In this case, a long straight wire with conductivity $$g$$ carries a current $$I$$ uniformly distributed across it. The relevant geometry is shown in the figure.

[figure: resistive wire geometry]

The current density is

\[ \vec J = \frac{I}{\pi a^2} \hat z \; .\]

The electric field needed to drive this current, written in terms of the resistance per unit length $$R = 1/\pi a^2 g$$, is

\[ \vec E = I R \hat z = I \frac{1}{\pi a^2 g} \hat z \]

and the resulting magnetic field set up by the current (by applying Ampere’s law) is

\[\vec H = \left\{ \begin{array}{c} \frac{I \rho}{2 \pi a^2 } \hat \phi \; \; \text{for} \; \; \rho < a \\ \frac{I}{2 \pi \rho} \hat \phi \; \; \text{for} \; \; \rho >a \end{array} \right. \; , \]

where $$\hat \phi$$ is the unit vector pointing along the azimuthal angle direction and $$\hat \rho$$ points along the radial direction in cylindrical coordinates (i.e. both are confined to the $$x$$-$$y$$ plane in cylindrical coordinates). The Poynting vector evaluates to

\[\vec S = \left\{ \begin{array}{c} -\frac{I^2 R}{2 \pi} \frac{\rho}{a^2} \hat \rho \; \; \text{for} \; \; \rho < a \\ - \frac{I^2 R}{2 \pi} \frac{1}{\rho} \hat \rho \; \; \text{for} \; \; \rho > a \end{array} \right. \; .\]

Note the peculiar nature of the direction of the energy flow. It isn’t along the wire but radially in from the outside.

Now a concern may be raised that there is an error somewhere hidden in the manipulations up to this point. To assuage these concerns and build confidence, compute the divergence of the Poynting vector. The result should be related to the energy dissipation due to the resistance. The divergence is (recall that, in cylindrical coordinates, $$\nabla \cdot (A_{\rho} \hat \rho) = \frac{1}{\rho} \partial_{\rho} (\rho A_{\rho})$$):

\[\nabla \cdot \vec S = \left\{ \begin{array}{c} -\frac{I^2 R}{\pi a^2} \; \; \text{for} \; \; \rho < a \\ 0 \; \; \text{for} \; \; \rho > a\end{array} \right. \; . \]

The dissipation within the wire is given by

\[ \vec E \cdot \vec J = \frac{I^2 R}{\pi a^2} \; ,\]

and the Poynting theorem is

\[ \nabla \cdot \vec S + \partial_t u = - \vec E \cdot \vec J \; .\]

Since the situation is static, all partial time derivatives are zero and we should see that $$\nabla \cdot \vec S = - \vec E \cdot \vec J$$, which happily is the result. So, despite what may seem to fly in the face of intuition, the theory holds together.
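The bookkeeping above can also be confirmed numerically by differencing the radial Poynting component inside the wire and comparing against $$-\vec E \cdot \vec J$$. The wire parameters below are illustrative assumptions, not values from the original discussion.

```python
import math

# Illustrative wire parameters (assumed values).
I, a, g = 2.0, 0.01, 5.8e7                # current, radius, conductivity
R = 1.0 / (math.pi * a ** 2 * g)          # resistance per unit length

def S_rho(rho):
    """Radial Poynting component inside the wire (rho < a)."""
    return -I ** 2 * R / (2 * math.pi) * rho / a ** 2

def div_S(rho, h=1e-7):
    """Cylindrical divergence (1/rho) d/drho (rho S_rho) by
    central differences."""
    return ((rho + h) * S_rho(rho + h)
            - (rho - h) * S_rho(rho - h)) / (2 * h * rho)

dissipation = I ** 2 * R / (math.pi * a ** 2)   # E . J per unit volume
lhs = div_S(0.005)                              # a point inside the wire
```

The divergence of $$\vec S$$ matches $$-\vec E \cdot \vec J$$ at any interior point, as the theorem requires for a static configuration.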

Case 3 – Run AC Current Through a Capacitor

This case, in which a low-frequency AC current is driven through a capacitor, is intermediate between the previous two: it does not involve static fields, but it does confine the action to a local circuit element. The relevant geometry is shown in the following figure.

[figure: charging capacitor geometry]

Assuming that the frequency is small enough to ignore the energy in the magnetic field, the total energy contained between the plates is

\[ U_{tot} = u \cdot Vol = (\frac{1}{2} \epsilon_0 E^2)(\pi a^2 h) \]

and it changes as function of time as

\[ \partial_t U_{tot} = \epsilon_0 \pi a^2 h E \partial_t E \; .\]

An application of Ampere’s law to the region between the capacitor plates, converting the surface integral over $$curl(\vec B)$$ to a line integral along the boundary and using $$\mu_0 \epsilon_0 = 1/c^2$$, yields

\[ 2 \pi a c^2 B = \partial_t E \, \pi a^2 \; ,\]

which, solved for $$B$$, gives

\[ B = \frac{a}{2 c^2} \partial_t E \; .\]

The direction of $$\vec E$$ is along the axial direction of the capacitor and the direction of $$\vec B$$ is azimuthal. From these facts, it is obvious that the Poynting vector must be pointing radially inward, as shown in the figure.

The magnitude of the Poynting vector is (note the simplification since the electric and magnetic field are perpendicular)

\[ |\vec S| = \frac{1}{\mu_0} |\vec E| |\vec B| = \frac{1}{\mu_0} E \frac{a}{2 c^2} \partial_t E = \frac{1}{2} \epsilon_0 a E \partial_t E \; . \]

The total rate of energy flow in is given by the product of the magnitude of the Poynting vector and the lateral surface area

\[ \partial_t U = |\vec S| \cdot Area = \left( \frac{1}{2} \epsilon_0 a E \partial_t E \right) (2 \pi a h) = \epsilon_0 \pi a^2 h E \partial_t E \; , \]

which is happily the same result we got above.

So once again, the flow of the energy seems counterintuitive but the results that follow agree with what is expected.
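The energy balance for this case reduces to a one-line identity, which the sketch below checks numerically; the plate dimensions and field values are assumed, illustrative numbers.

```python
import math

eps0 = 8.854e-12                # permittivity of free space
a, h = 0.05, 0.002              # plate radius and gap (assumed)
E, dE_dt = 1.0e4, 2.0e6         # field and its rate of change (assumed)

# Rate of change of the energy stored between the plates.
dU_dt = eps0 * math.pi * a ** 2 * h * E * dE_dt

# Energy inflow: |S| integrated over the cylindrical side surface.
S_mag = 0.5 * eps0 * a * E * dE_dt
inflow = S_mag * (2 * math.pi * a * h)
```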

Interrogating Maxwell

One of the more fun things to do with physical models is to mine general results without actually solving specific problems. It’s fun because it is relatively easy to do and gives some global insight into the problem that is not easily obtainable from examining individual cases. Usually, this type of data mining comes from considerations of symmetry or general principles and takes the form of clever manipulations of the underlying equations. Like any good riddle, the joy is in seeing the solution – the frustration is in trying to find it for the first time.

Of course, to find these nuggets of general fact, one needs to know, or at least firmly believe, they are there. Conservation laws for energy, momentum, and angular momentum are cherished principles that took many decades to be ferreted out of many specific cases.

Nonetheless, we can all stand on the shoulders of giants and enjoy the view and that is what this post is all about. Standing on the shoulders of Maxwell, one can mine the equations that bear his name for all sorts of nuggets.

The version of Maxwell’s equations that I’ll use describes the behavior of the macroscopic fields with the appropriate constitutive equations. These are:
\[ \nabla \times \vec H = \vec J + \frac{\partial \vec D}{\partial t} \; , \]
\[ \nabla \times \vec E = -\frac{\partial \vec B}{\partial t} \; ,\]
\[ \nabla \cdot \vec D = \rho \; ,\]
and
\[ \nabla \cdot \vec B = 0 \; , \]
as the differential expressions for Ampere’s, Faraday’s, Coulomb’s and the Monopole’s laws.

Since dimensional analysis will be important below, note that fields and sources have units of

  • $$\vec E$$ – $$[N/C]$$ or $$[V/m]$$
  • $$\vec D$$ – $$[C/m^2]$$
  • $$\vec B$$ – $$[Ns/Cm]$$ or $$[kg/Cs]$$
  • $$\vec H$$ – $$[A/m]$$
  • $$\vec J$$ – $$[C/m^2/s]$$
  • $$\rho$$ – $$[C/m^3]$$
  • $$\epsilon$$ – $$[C^2/N m^2]$$
  • $$\mu$$ – $$[N/A^2]$$

There are three very nice interrogations of these laws that yield: 1) charge conservation, 2) energy and momentum conservation (in the absence of external forces), and 3) the guarantee that a solution to the initial value problem (Coulomb and monopole equations) is preserved by the propagation of the system forward in time. This latter property will be called constraint preservation.

Charge Conservation

As they appear in their usual form, the Maxwell equations don’t immediately show that they honor the conservation of charge. That they actually do is seen by starting with Ampere’s law and taking the divergence of both sides to get

\[ \nabla \cdot \left( \nabla \times \vec H = \vec J + \frac{\partial \vec D}{\partial t} \right ) \Rightarrow \nabla \cdot \nabla \times \vec H = \nabla \cdot \vec J + \nabla \cdot \frac{\partial \vec D}{\partial t} \; . \]

Since the divergence of a curl is zero, the left-hand side disappears. Next, interchange the divergence operator with the partial derivative with respect to time, $$\partial_t$$, to get

\[ 0 = \nabla \cdot \vec J + \frac{\partial ( \nabla \cdot \vec D )}{\partial t} \; . \]

Now the last term on the right-hand side can be put into familiar form by using Coulomb’s law and, with one more minor manipulation, yields the familiar form of the charge conservation law

\[ \nabla \cdot \vec J + \frac{\partial \rho}{\partial t} = 0 \; .\]
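The whole argument hangs on the identity $$\nabla \cdot (\nabla \times \vec H) = 0$$, which can be spot-checked with central differences on an arbitrary smooth field; the test field below is my own choice, and with symmetric differences the mixed partials cancel down to rounding error.

```python
import math

def H(x, y, z):
    """An arbitrary smooth test field."""
    return (math.sin(y * z), math.cos(x + z), x * y * math.exp(-z * z))

def curl(f, x, y, z, h=1e-4):
    def d(i, j):
        """Central difference of component i along axis j."""
        pp, pm = [x, y, z], [x, y, z]
        pp[j] += h
        pm[j] -= h
        return (f(*pp)[i] - f(*pm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def div(f, x, y, z, h=1e-4):
    total = 0.0
    for j in range(3):
        pp, pm = [x, y, z], [x, y, z]
        pp[j] += h
        pm[j] -= h
        total += (f(*pp)[j] - f(*pm)[j]) / (2 * h)
    return total

# div(curl H) should vanish to rounding error at any point.
residual = div(lambda x, y, z: curl(H, x, y, z), 0.3, -0.2, 0.5)
```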

Energy and Momentum Conservation

The starting point for these conservation laws is the proof of the Poynting theorem. Form the dot product between $$\vec H$$ and both sides of Faraday’s law to get

\[ \vec H \cdot \nabla \times \vec E = - \vec H \cdot \partial_t \vec B \; . \]

Rewrite the left-hand side using the vector identity

\[ \nabla \cdot (\vec A \times \vec B) = (\nabla \times \vec A) \cdot \vec B - \vec A \cdot (\nabla \times \vec B) \; , \]

giving

\[ \nabla \cdot ( \vec E \times \vec H ) + \vec E \cdot (\nabla \times \vec H) = - \vec H \cdot \partial_t \vec B \; . \]

Now use Ampere’s law to substitute for $$\nabla \times \vec H $$, yielding

\[ \nabla \cdot ( \vec E \times \vec H ) + \vec E \cdot (\vec J + \partial_t \vec D) = - \vec H \cdot \partial_t \vec B \; , \]

which, when rearranged, gives the Poynting theorem

\[ \nabla \cdot (\vec E \times \vec H ) + \vec H \cdot \partial_t \vec B + \vec E \cdot \partial_t \vec D = - \vec E \cdot \vec J \; . \]

Make the following identifications

\[ \vec S = \vec E \times \vec H \]

and

\[ u = \frac{1}{2} \left( \vec E \cdot \vec D + \vec H \cdot \vec B \right) \]

to get the common form of

\[ \nabla \cdot \vec S + \partial_t u = - \vec E \cdot \vec J \]

for Poynting’s theorem.

Now for some dimensional analysis. The quantities $$u$$, $$\vec S$$, $$\vec E \cdot \vec J$$, $$\nabla \cdot \vec S$$, and $$\partial_t u$$ have units of:

  • $$u$$ – $$[N/C][C/m^2] + [Ns/Cm][C/sm]=[N/m^2] = [J/m^3]$$
  • $$\vec S$$ – $$[N/C][C/sm] = [N/ms] = [Nm/m^2s] = [J/m^2s]$$
  • $$\vec E \cdot \vec J$$ – $$[N/C][C/m^2s]=[Nm/m^3s] = [J/m^3s]$$
  • $$\nabla \cdot \vec S $$ – $$[N/m^2s] = [Nm/m^3s] = [J/m^3s]$$
  • $$\partial_t u$$ – $$[J/m^3][1/s] = [J/m^3s]$$

The obvious interpretation is that $$u$$ is an energy density, $$\vec S$$ is the energy flow (flux), and $$\vec E \cdot \vec J$$ is the work done on the charges by the electric field per unit volume (power density).

There is a way to connect the Poynting theorem to momentum since photons obey $$E^2 = c^2 \vec p^2$$. For simplicity, the expression of the Poynting vector in free space and in terms of the magnetic permeability $$\mu_0$$ and the magnetic flux density $$\vec B$$ is

\[ \vec S = \frac{1}{\mu_0} \vec E \times \vec B \; .\]

A check of the Poynting vector units from the definition finds $$\underbrace{ [A^2/N] }_{1/\mu_0} \underbrace{ [N/C] }_{\vec E} \underbrace{ [Ns/Cm] }_{\vec B} = [N/ms]$$. These units can be put into a more suggestive form either as $$[N/ms] = [Nm/m^3][m/s] = [J/m^3][m/s]$$, which suggests interpreting the Poynting vector as an energy density times a speed, or as $$[N/ms] = [Nm][1/m^2/s] = [J/m^2/s] = [J/s/m^2] = [W/m^2]$$, which represents a power flux.

So dividing by the speed of light, in the appropriate form, gives

\[ \frac{1}{c} \vec S = \sqrt{ \frac{\epsilon_0}{\mu_0} } \vec E \times \vec B = \sqrt{ \epsilon_0 \mu_0 } \vec E \times \vec H \]

which represents either a momentum flux or a momentum density times speed. Since $$\vec S/c$$ has units of momentum density times speed one may guess that $$\vec S/c^2$$ is the field momentum density. This is true even in the presence of matter allowing the general definition of

\[ \frac{1}{c^2} \vec S = \underbrace{ \mu \epsilon \vec E \times \vec H}_{\text{field momentum density}} \; .\]

Frankl, in his Electromagnetic Theory (pp. 203-4), presents a nice argument, based on the Lorentz force law, to relate the field momentum to the particle momentum to show that

\[ \vec F_{total} = \frac{d}{dt}\left[ \vec p_{particles} + \int d^3 r \, \mu \epsilon \vec E \times \vec H \right] \; .\]

To see this, start with the Lorentz force law ($$\vec F = q (\vec E + \vec v \times \vec B)$$) expressed in terms of particle and field densities using Ampere’s and Coulomb’s laws

\[ \vec f = \rho \vec E + \vec J \times \vec B = (\nabla \cdot \vec D) \vec E - \vec B \times (\nabla \times \vec H) - \partial_t \vec D \times \vec B \; .\]

Using the Faraday and Monopole laws gives

\[ 0 = - \vec D \times (\nabla \times \vec E) - \vec D \times \partial_t \vec B \]

and

\[ 0 = \vec H \, \nabla \cdot \vec B \; .\]

Adding the above relations to the Lorentz force law relation gives

\[ \vec f = \epsilon ( \vec E \, \nabla \cdot \vec E - \vec E \times (\nabla \times \vec E) ) + \mu ( \vec H \, \nabla \cdot \vec H - \vec H \times ( \nabla \times \vec H ) ) - \mu \epsilon \, \partial_t ( \vec E \times \vec H ) \; .\]

Integrate over all space using the identity

\[ \int d^3r \, (\vec G \, \nabla \cdot \vec G - \vec G \times (\nabla \times \vec G)) = \int d^2S \left( \vec G (\vec G \cdot \hat n) - \frac{1}{2} G^2 \hat n \right) \]

that allows two of the volume integrals to become surface integrals, denoted $$I_{\vec E}$$ and $$I_{\vec H}$$ for the electric and magnetic pieces, resulting in

\[ \underbrace{ \frac{d}{dt} \vec p_{particles} }_{\vec F_{particles}} = I_{\vec E} + I_{\vec H} - \frac{d}{dt} \int d^3r \, \mu \epsilon \vec E \times \vec H \; .\]

Finally, Frankl attaches meaning to the surface integrals by first looking at the situation where the fields are static to pick out the total force as

\[ \frac{d}{dt} \vec p_{particles} = \underbrace{ I_{\vec E} + I_{\vec H} }_{\text{total force}} \; .\]

Since the instantaneous configuration of the fields and the particles cannot ‘know’ the future, $$I_{\vec E} + I_{\vec H}$$ must always be the total force. Bringing the field piece over to the left-hand side gives

\[ \frac{d}{dt} \vec p_{particles} + \frac{d}{dt} \int d^3r \, \mu \epsilon \vec E \times \vec H = \underbrace{ I_{\vec E} + I_{\vec H} }_{\text{total force}} \; , \]

which suggests the identification of the field momentum as

\[ \vec p_{field} = \int d^3r \, \mu \epsilon \vec E \times \vec H \; ,\]

which is exactly the interpretation obtained above for the Poynting vector divided by the speed of light squared.

Now if the system is defined in such a way to be closed so that the electric and magnetic fields are zero on the boundary, then momentum and energy cannot flow out of the system and thus are conserved. Energy and momentum do flow between the particles and the fields, with one component gaining or losing as the other loses or gains, but never does any of the energy or momentum vanish.

Constraint Preservation

The two Maxwell equations involving the curl operator are dynamical; that is to say, their spatial derivatives are tied to derivatives with respect to time. The two Maxwell equations involving the divergence operator involve no time derivatives. These latter two are interpreted as constraint equations that the field must satisfy at each instant. The operative question is whether, given fields that satisfy these constraints at some initial time, the subsequent propagation of the fields forward in time, described by the dynamical equations, really honors the constraints; that is to say, whether the constraints continue to be satisfied. This is a very important point, as any theory that fails to keep the constraints satisfied is physically untenable. To check whether Maxwell’s equations do preserve these constraints, start by taking the divergence of Ampere’s equation as in the charge conservation case above

\[ \nabla \cdot \vec J + \partial_{t} (\nabla \cdot \vec D ) = 0 \; .\]

This time, instead of hunting for charge conservation, we are going to use charge conservation to eliminate $$div(\vec J)$$ in favor of $$\partial_t \rho$$ to get

\[ -\partial_t \rho + \partial_t (\nabla \cdot \vec D) = 0 \Rightarrow \partial_t (\nabla \cdot \vec D – \rho ) = 0 \; .\]

Note that a logical purist may object that this constraint and charge conservation are really not independent of each other and that this ‘proof’ is merely a tautology. This is actually true but is not a flaw. It simply means that charge conservation and the preservation of the Coulomb constraint go hand-in-hand.

Likewise, taking the divergence of Faraday’s law gives

\[ \nabla \cdot (\nabla \times \vec E + \partial_t \vec B = 0 ) \; , \]

which, because the divergence of a curl is zero, becomes

\[ \partial_t (\nabla \cdot \vec B) = 0 \; ,\]

assuming that $$\partial_t$$ commutes with $$\nabla \cdot$$.

Thus, once both Coulomb’s Law and the Monopole Law are satisfied at the initial time, they stay satisfied for all time.

Mass-Varying Systems

Last week, the problem of the conveyor belt accumulating some material at a steady rate $$r$$ was examined. The motor driving the belt kept it moving at a constant velocity. The power expended by the motor was double the power needed to accelerate the material when it hits the belt. The natural question is, why is it exactly double?

This question provoked a lot of discussion in ‘The Physics Teacher’, starting with an article by Wu, which tried to give a straightforward answer to why the value is exactly $$2$$ and also to where the other half of the energy goes. For the most part, the argument by Wu, while physically intuitive, is lacking in logical rigor.

The best argument was given by Marcelo Alonso who starts with the varying-mass form of Newton’s law

\[ m \dot {\vec v} + {\dot m} (\vec v - \vec u) = \vec F_{ext} \; . \]

In this equation, $$m$$ is the system mass, which, in the case of the conveyor problem, is the mass of the belt and the material on it at any given time. The vector $$\vec v$$ is the velocity of the system. The quantities $$\dot m = r$$ and $$\vec u$$ are the rate of change of the system mass (more on this later) and the velocity of the mass that either enters or leaves the system, respectively. Justification for this equation will be given below.

The power imparted by the external force $$\vec F_{ext}$$ is

\[ {\mathcal P} = \vec F_{ext} \cdot \vec v = m \, \dot {\vec v} \cdot \vec v + \dot m (\vec v \cdot \vec v - \vec u \cdot \vec v) \; .\]

For the conveyor belt problem, the material enters with a very small velocity $$\vec u$$ that is perpendicular to the system velocity $$\vec v$$ so the last term vanishes. Since the motor keeps $$\vec v$$ constant it can be moved into and out of any total time derivative. The motor power is now

\[ {\mathcal P} = \dot m (\vec v \cdot \vec v) = \frac{d}{dt} \left( m \, \vec v \cdot \vec v \right) = \frac{d}{dt} ( 2 KE ) \; .\]

Viewed in this light, the factor of two is not so much mysterious as accidental. It is a consequence of the special circumstances that $$\vec u \cdot \vec v $$ is zero and $$\dot {\vec v} = 0$$. If the material were introduced to the belt in some other fashion, for example, thrown horizontally on the belt, the ratio of the power provided to the time rate of change of the material’s kinetic energy would have been different and its value wouldn’t attract special attention.
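A tiny discrete-time simulation makes the factor of two explicit: the motor must supply $$F v = \dot m v^2$$, while the accumulated material gains kinetic energy at only half that rate. The belt speed and accumulation rate below are assumed values.

```python
# Belt held at constant speed v while material lands at rate r.
v = 2.0      # belt speed, m/s (assumed)
r = 0.5      # mass accumulation rate, kg/s (assumed)
dt = 1e-3
steps = 1000

motor_work = 0.0
for _ in range(steps):
    # The belt must supply force F = r v to hold v constant,
    # so the motor delivers power P = F v = r v^2.
    motor_work += (r * v) * v * dt

mass_added = r * steps * dt
ke_gained = 0.5 * mass_added * v ** 2
ratio = motor_work / ke_gained   # should come out to 2
```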

All told, this is a mostly satisfactory explanation. The only problem is that the terse response Alonso provided in the form of a letter to the editor didn’t give enough detail on how the varying-mass form of Newton’s law is derived.

Surprisingly, these details are difficult to find in most textbooks. Modern presentations don’t seem to care too much about varying mass situations although they are important in many applications. Systems in motion that suffer a mass change are almost everywhere in modern life. Examples range from cars that consume gasoline as they move, to ice accumulation on or water evaporation from industrial surfaces, to rocket motion and jet propulsion.

So in the interest of completeness, I derive the varying-mass form of Newton’s equations here.

There are two cases to consider: the system mass increases as material is accreted and the system mass decreases as material is ejected. The specific mechanisms of accretion or ejection are not important for what follows.

I will start with the accreting case just before a bit of matter $$\delta m$$ is introduced into the system mass. The initial momentum of the system and lump is

\[ \vec p_i = m \vec v + \delta m \, \vec u \; .\]

Just after this small lump has combined with the system, the final momentum is

\[ \vec p_f = ( m + \delta m)(\vec v + \delta \vec v) \; .\]

The change in momentum is the difference between these two quantities

\[ \Delta \vec p = \vec p_f - \vec p_i = ( m + \delta m)(\vec v + \delta \vec v) - m \vec v - \delta m \, \vec u \; ,\]

which, upon dropping the second-order term $$\delta m \, \delta \vec v$$, simplifies to

\[ \Delta \vec p = m \, \delta \vec v + \delta m \, \vec v - \delta m \, \vec u \; .\]

This change in momentum is caused by the impulse due to the external force $$\vec F_{ext}$$

\[ \Delta \vec p = \vec F_{ext} \, \delta t \]

acting over the short time $$\delta t$$.

Equating the terms gives

\[ m \, \delta \vec v + \delta m \, \vec v – \delta m \, \vec u = \vec F_{ext} \, \delta t \; , \]

which becomes

\[ m \frac{\delta \vec v}{\delta t} + \frac{\delta m}{\delta t} ( \vec v - \vec u) = \vec F_{ext} \]

when dividing by the small time $$\delta t$$. Before this equation is taken to the limit and derivatives materialize out of deltas, let’s look at the ejection case.

For ejected mass, the system starts with the momentum

\[ \vec p_i = m \vec v \; .\]

After the lump with mass $$\delta m$$ is ejected with velocity $$\vec u$$, the final momentum takes the form

\[ \vec p_f = ( m - \delta m)(\vec v + \delta \vec v) + \delta m \, \vec u \;. \]

Following the same process above, the analogous equation

\[ m \frac {\delta \vec v}{\delta t} - \frac{\delta m}{\delta t} ( \vec v - \vec u) = \vec F_{ext} \]

results.
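The same style of one-dimensional spot-check, again with arbitrary trial values, verifies the ejection algebra; here the leftover second-order piece carries the opposite sign, $$-\delta m \, \delta \vec v$$:

```python
import random

# One-dimensional spot-check of the ejection algebra:
#   p_f - p_i = (m - dm)(v + dv) + dm u - m v
# should equal m dv - dm (v - u), up to the second-order piece -dm dv.
random.seed(1)
for _ in range(5):
    m, dm, v, dv, u = (random.uniform(0.1, 10.0) for _ in range(5))
    exact = (m - dm) * (v + dv) + dm * u - m * v
    first_order = m * dv - dm * (v - u)
    # the leftover is exactly -dm * dv
    assert abs(exact - first_order + dm * dv) < 1e-9
```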

The two equations look similar, with only a change of sign in front of the $$\frac{\delta m}{\delta t}$$ term to hint at the physical difference: mass flows in for the first and out for the second.

Sign differences can always be problematic, all the more so when it comes to ‘in’ versus ‘out’, and confusion easily inserts itself in such cases. As an example of where careful tracking of sign conventions is important, consider the presentation of mass-varying systems in Halliday and Resnick. They present the mass-ejection case as a preliminary to a discussion of rocket propulsion, but they arrive at an equation with the opposite sign from the one derived here. They get this difference by explicitly setting the sign of $$dm/dt$$ opposite to that of $$\delta m/\delta t$$.

A small change in one’s frame of mind and careful attention to the difference between $$\delta m/\delta t$$ and $$dm/dt$$ is all that is needed to reconcile these differences. To go from simple ratios of deltas to actual derivatives, note that by the initial construction $$\delta m$$ is a positive quantity, a small bit of matter. The time increment $$\delta t$$ is also positive, so the ratio of the two is positive as well. But the rate of change of the system mass $$m$$ can be either positive or negative. Despite the notational similarity between $$\delta m/\delta t$$ and $$dm/dt$$, the mass being addressed is not the same: the mass in the derivative expression is the system mass, which can grow or shrink, and so $$dm$$ can be positive or negative.

For the accreted mass case, $$dm/dt = r > 0$$ and the appropriate limit is

\[ m \frac{d \vec v}{d t} + \frac{d m}{dt} ( \vec v - \vec u ) = \vec F_{ext} \; ,\]

or, with an eye towards combining the two cases,

\[ m \frac{d \vec v}{d t} + \left| \frac{d m}{dt} \right| ( \vec v - \vec u ) = \vec F_{ext} \; .\]

For the ejected mass case, $$dm/dt = r < 0$$ and the appropriate limit is

\[ m \frac{d \vec v}{d t} - \left| \frac{d m}{dt} \right| ( \vec v - \vec u ) = \vec F_{ext} \; ,\]

with the sign on the second term now changing due to the inclusion of the absolute value.
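As a final sanity check on the ejected-mass equation, consider the textbook force-free rocket, where the exhaust leaves at a fixed speed $$v_e$$ relative to the vehicle, so that $$\vec v - \vec u$$ has constant magnitude $$v_e$$. The Python sketch below (with assumed values for the exhaust speed, masses, and ejection rate) integrates the equation forward in time and compares the result against the closed-form Tsiolkovsky relation $$\Delta v = v_e \ln(m_0/m_f)$$:

```python
import math

# Force-free ejection case: m dv/dt - |dm/dt| (v - u) = 0.
# With the exhaust speed v_e fixed relative to the rocket, v - u = v_e,
# so dv/dt = |dm/dt| v_e / m.  (All numerical values below are assumed.)
v_e = 3000.0            # exhaust speed relative to rocket, m/s
m0, mf = 1000.0, 400.0  # initial and final system mass, kg
rate = 5.0              # |dm/dt|, mass ejection rate, kg/s

m, v, dt = m0, 0.0, 1e-3
while m > mf:
    v += rate * v_e / m * dt  # dv from the varying-mass equation
    m -= rate * dt            # system mass decreases

exact = v_e * math.log(m0 / mf)  # Tsiolkovsky rocket equation
print(f"numerical dv = {v:.1f} m/s, exact dv = {exact:.1f} m/s")
assert abs(v - exact) / exact < 1e-2
```

The simple Euler step agrees with the closed-form answer to well under a percent, which is enough to confirm the sign and structure of the ejection equation.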