Latest Posts

Dynamical Systems and Constraints: Part 1 – General Theory

Constraints in dynamical systems are important in physics for two reasons. First, they show up in a variety of real-world mechanical systems with practical applications. Second, the notion of a constraint, usually of a more abstract variety, is an essential ingredient in modern physical theories like quantum field theory and general relativity. This brief post is meant to cover the basic theory of constraints in classical physics as a precursor to some future posts on specific systems. The approach and the notation are strongly influenced by Theoretical Mechanics by Murray Spiegel, although the arrangement, presentation, and logical flow are my own.

The proper setting for understanding constraints in classical mechanics is the Lagrangian point-of-view and the primary idea for carrying out this program is the notion of generalized coordinates. By expressing the work-energy theorem in generalized coordinates, we will not only arrive at the Euler-Lagrange equations but will find the proper place in which to put the constraints into the formalism through the use of Lagrange multipliers.

The kinetic energy for a multi-particle system is expressed in Cartesian coordinates as

\[ T = \sum_n \frac{1}{2} m_n {\dot {\vec r}}_n \cdot {\dot {\vec r}}_n \; , \]

where the index $$n$$ labels each particle.

The partial derivatives of $$T$$ with respect to a generalized coordinate and its corresponding velocity are:

\[ \frac{\partial T}{\partial q_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial q_{\alpha}} \]

and

\[ \frac{\partial T}{\partial {\dot q}_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial {\dot q}_{\alpha}} \; .\]

Lagrange’s cleverness comes in recognizing two somewhat non-obvious steps. First is that one can simplify the last expression by noting that

\[ {\dot {\vec r}}_n = \sum_\alpha \frac{\partial {\vec r}_n}{\partial q_{\alpha}} {\dot q}_{\alpha} + \frac{\partial {\vec r}_n}{\partial t} \; , \]

which leads immediately to the cancellation-of-dots that says

\[ \frac{\partial {\dot {\vec r}}_n}{\partial {\dot q}_{\alpha}} = \frac{\partial {{\vec r}}_n}{\partial {q}_{\alpha}} \; .\]

This result then simplifies the velocity derivative to

\[ \frac{\partial T}{\partial {\dot q}_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {{\vec r}}_n}{\partial {q}_{\alpha}} \; .\]
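
The cancellation-of-dots is easy to check symbolically. The following sketch (my own, not from Spiegel) uses sympy and a hypothetical rotating-rod transformation $$x(q,t) = q \cos \omega t$$:

```python
import sympy as sp

t, omega = sp.symbols('t omega')
q = sp.Function('q')(t)
qdot = sp.Symbol('qdot')

# Hypothetical transformation: a bead at distance q along a rod rotating
# at rate omega, so the Cartesian coordinate is x(q, t) = q*cos(omega*t).
x = q * sp.cos(omega * t)

# Total time derivative of x, with dq/dt renamed to the independent symbol qdot
xdot = sp.diff(x, t).subs(sp.Derivative(q, t), qdot)

# Cancellation-of-dots: d(xdot)/d(qdot) equals dx/dq
assert sp.simplify(sp.diff(xdot, qdot) - sp.diff(x, q)) == 0
```

The same check goes through for any point transformation $$x(q,t)$$, since the velocity is always linear in $$\dot q$$.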

The second of Lagrange’s steps is to recognize that taking the total time-derivative of that same term yields

\[ \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} + \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial q_{\alpha}} \; .\]

The second part of this expression nicely cancels the derivative of the kinetic energy with respect to the generalized coordinate and so we arrive at the identity

\[ \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) – \frac{\partial T}{\partial q_{\alpha}} = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \; . \]

Defining the left-hand side compactly as

\[ Y_{\alpha} \equiv \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) – \frac{\partial T}{\partial q_{\alpha}} \; \]

gives

\[ Y_{\alpha} = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \; . \]

Like Lagrange, we can conclude that the presence of the accelerations in the right-hand side is related to the sum of the forces. However, it would be incorrect to conclude that each individual force can be teased out of the right-hand side, as will be discussed below.

Before trying to make sense of the $$Y_{\alpha}$$’s, examine the forces in generalized coordinates via the differential work, which is given by

\[ dW = \sum_n {\vec F}_n \cdot d {\vec r}_n = \sum_n {\vec F}_n \cdot \sum_{\alpha} \frac{\partial {\vec r}_n}{\partial q_\alpha} dq_{\alpha} \; . \]

Regrouping this last term gives the final expression

\[ dW = \sum_\alpha \left( \sum_n {\vec F}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha \equiv \sum_\alpha \Phi_\alpha d q_\alpha \; . \]

Forming the sum

\[ \sum_{\alpha} Y_{\alpha} dq_{\alpha} = \sum_{\alpha} \left( \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha \; ,\]

yields an expression for the differential work as well, since from the Work-Energy theorem $$dW = dT$$.

Subtracting these two expressions gives

\[ \sum_{\alpha} \left( Y_{\alpha} - \Phi_{\alpha} \right) dq_\alpha = \sum_{\alpha} \left( \sum_n \left[ m_n {\ddot {\vec r}}_n - {\vec F}_n \right] \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha = 0 \; .\]

One might be tempted to conclude that $$Y_\alpha = \Phi_\alpha$$ but this would be erroneous. All that can be concluded is that the projections of the two terms along the constraint surface are the same. The following figure shows a simple geometric case where two vectors (blue and green, denoting the generalized accelerations and the generalized forces, respectively) are both perpendicular to the red vector (denoting a displacement of the system consistent with the constraint) without being equal. If, and only if, there are no constraints in the system does $$Y_\alpha = \Phi_\alpha$$.

If the constraints are integrable, that is to say they can be expressed in such a way that one generalized coordinate can be eliminated by expressing it in terms of the others, then a straightforward substitution eliminates the constraint and no additional steps are needed. If that step can’t be carried out, then the use of Lagrange multipliers is required, as follows.

Suppose that there are $$m$$ constraints of the form

\[ \sum_\alpha A_\alpha \, dq_\alpha + A \, dt = 0 \; ; \; \sum_\alpha B_\alpha \, dq_\alpha + B \, dt = 0 \; ; \; \sum_\alpha C_\alpha \, dq_\alpha + C \, dt = 0 \; ; \; \ldots \]

and that, at any instant, the constraints are satisfied so that the $$dt$$ terms can be set to zero. Then these constraint equations can be added together in an as-yet-undetermined linear combination

\[ \sum_\alpha ( \lambda_1 A_\alpha + \lambda_2 B_\alpha + \lambda_3 C_\alpha + \cdots ) dq_\alpha = 0 \; .\]

Adding this linear combination to the original equation yields the augmented form

\[ \sum_{\alpha} (Y_\alpha - \Phi_\alpha - \lambda_1 A_\alpha - \lambda_2 B_\alpha - \lambda_3 C_\alpha - \cdots ) dq_\alpha = 0 \; .\]

The sum naturally splits into two pieces. The $$m$$ multipliers are chosen precisely so that the coefficients of $$m$$ dependent differentials (say $$dq_1,\ldots,dq_m$$) vanish identically; this condition is a linear system that can be solved for the Lagrange multipliers. With the dependent terms thus eliminated, the augmented form reduces to

\[ \sum_{\alpha'} (Y_{\alpha'} - \Phi_{\alpha'} - \lambda_1 A_{\alpha'} - \lambda_2 B_{\alpha'} - \lambda_3 C_{\alpha'} - \cdots ) dq_{\alpha'} = 0 \; ,\]

where the prime indicates that the sum is restricted to the independent terms, which allows us to immediately conclude that

\[ Y_\alpha - \Phi_\alpha - \lambda_1 A_\alpha - \lambda_2 B_\alpha - \lambda_3 C_\alpha - \cdots = 0 \; \]

for all $$\alpha$$, both the dependent and independent set.
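
To see how these final equations close the system, consider a quick preview with the simplest textbook example I know, the Atwood machine (this sketch is my own, not taken from Spiegel): two masses hang from a string over a pulley, the constraint is $$dq_1 + dq_2 = 0$$ (so $$A_1 = A_2 = 1$$), and with $$Y_\alpha = m_\alpha {\ddot q}_\alpha$$ and $$\Phi_\alpha = m_\alpha g$$ (taking downward as positive), the multiplier equations can be solved directly:

```python
import sympy as sp

m1, m2, g = sp.symbols('m1 m2 g', positive=True)
a, lam = sp.symbols('a lambda')   # common acceleration and the multiplier

# Constraint dq1 + dq2 = 0 means A_1 = A_2 = 1 and qddot_2 = -qddot_1 = -a.
# Y_alpha - Phi_alpha - lambda*A_alpha = 0, with Y_alpha = m_alpha*qddot_alpha
# and Phi_alpha = m_alpha*g (gravity as the generalized force):
eqs = [sp.Eq(m1*a - m1*g - lam, 0),
       sp.Eq(-m2*a - m2*g - lam, 0)]
sol = sp.solve(eqs, [a, lam])

# Recovers the familiar results a = (m1 - m2) g / (m1 + m2), with the
# multiplier playing the role of (minus) the string tension.
```

Here the multiplier is not just a bookkeeping device: it carries the physics of the constraint force.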

Next post, I’ll explore this framework with some instructive examples taken from the classical mechanics literature.

Energy and Hamiltonians: Part 3 – Examples

This post, building on the work of the previous two posts, investigates the six possible cases in which $$E=h$$ or $$E\neq h$$ and in which one, the other, neither, or both are conserved. The examples are generally taken from simple mechanical systems in an attempt to illustrate the underlying physics without bringing in an unwieldy set of mathematics.

As a reminder, the energy $$E$$ is always defined as the sum of kinetic and potential energies and the function $$h$$ reflects not only the underlying physics (i.e., $$E$$) but also the generalized coordinate system in which it is described. The power of the generalized coordinates will allow us, in certain circumstances, to find conserved energy-like quantities that are not the energy proper. This is the power and complexity of generalized coordinates and it is this feature that makes it fun.

Case 1 – $$E \neq h$$, both conserved: Free Particle with Co-moving Transformation

Consider the free particle. The Lagrangian for the system is given by

\[ L = \frac{1}{2} m {\dot x}^2 \; .\]

The conjugate momentum and the generalized force are

\[ p_x = m {\dot x} \]

and

\[ \Phi_x = \frac{\partial L}{\partial x} = 0 \; , \]

respectively. The equation of motion is the familiar

\[ m \ddot x = 0 \; ,\]

leading to the usual solution

\[ x(t) = x_0 + v_0 t \; .\]

The conserved quantity $$h_x$$ is the energy, which is entirely kinetic, given by:

\[ h_x = \dot x p_x – L = \frac{1}{2} m v_0^2 = E\; .\]

Now suppose that we define the generalized coordinate $$q$$ through the transformation $$x = q + v_0 t$$ and $$\dot x = \dot q + v_0$$. The corresponding Lagrangian is

\[ L = \frac{1}{2} m \left(\dot q + v_0 \right)^2 \]

with the corresponding equation of motion

\[ m \ddot q = 0 \; . \]

The conserved quantity $$h_q$$ is given by

\[ h_q = \dot q p_q – L = m \dot q (\dot q + v_0) – \frac{1}{2} m (\dot q + v_0)^2 = \frac{1}{2} m \dot q^2 – \frac{1}{2} m v_0^2 \; .\]

From the transformation equation, $$\dot q = \dot x - v_0 = 0$$ and so $$h_q = -\frac{1}{2} m v_0^2$$, a constant. The physical interpretation is that, by transforming (via a Galilean transformation) into a frame co-moving with the free particle, the conserved quantity takes a value different from the energy in the original frame.

Case 2 – $$E \neq h$$, $$E$$ conserved: Free Particle with Accelerating Transformation

Building off of the last case, again look at the free particle and consider the transformation into an accelerating frame given by

\[ x = q + \frac{1}{2} a t^2 \; ,\]

with

\[ \dot x = \dot q + a t \; .\]

The transformed Lagrangian is

\[ L = \frac{1}{2} m \left( \dot q + a t \right) ^2 \; . \]

The corresponding $$h_q$$ is

\[ h_q = m \dot q (\dot q + a t) – \frac{1}{2} m \left(\dot q + a t \right)^2 = \frac{1}{2} m \dot q^2 – \frac{1}{2} m a^2 t^2 \; , \]

which, since it is time varying, is not conserved.

Case 3 – $$E = h$$, both conserved: Planar SHO

An example where the value of the energy $$E$$ and the function $$h_q$$ can take on dramatically different forms and yet yield the same value is supplied by the planar simple harmonic oscillator. In this case, a mass moves in a cylindrically symmetric harmonic potential $$V = 1/2 \, k(x^2 + y^2)$$.

The corresponding Lagrangian is given by

\[ L = \frac{1}{2} m \left( {\dot x}^2 + {\dot y}^2 \right) – \frac{1}{2} k \left( x^2 + y^2 \right) \; , \]

with the equation of motion being

\[ m \ddot {\vec r} = -k \vec r \; .\]

This equation has well-known analytical solutions, but for a change of pace, I implemented numerical solutions of these equations in a Jupyter notebook. The state-space equation took the form

\[ \frac{d}{dt} \left[ \begin{array}{c} x \\ \dot x \\ y \\ \dot y \end{array} \right] = \left[ \begin{array}{c} \dot x \\ -(k/m) \, x \\ \dot y \\ -(k/m) \, y \end{array} \right] \; . \]

Under the point transformation to plane polar coordinates

\[ x = r \cos \theta \]

and

\[ y = r \sin \theta \; ,\]

the Lagrangian becomes

\[ L = \frac{1}{2} m \left( {\dot r}^2 + r^2 {\dot \theta}^2 \right) – \frac{1}{2} k r^2 \; .\]

The resulting Euler-Lagrange equations are

\[ m \ddot r + k r = 0 \]

and

\[ \frac{d}{dt} \left( m r^2 \dot \theta \right) = 0 \; , \]

which are interpreted as the equation of motion of a one-dimensional radial oscillator and the conservation of the angular momentum in the $$x-y$$ plane. The corresponding numerical form is

\[ \frac{d}{dt} \left[ \begin{array}{c} r \\ \dot r \\ \theta \\ \dot \theta \end{array} \right] = \left[ \begin{array}{c} \dot r \\ – k/m \, r + r \dot \theta^2 \\ \dot \theta \\ -2 \dot r \dot \theta /r \end{array} \right] \; . \]

The initial conditions for the simulation, chosen to give a circular orbit of radius $$R$$ (with $$\omega_0 = \sqrt{k/m}$$), were

\[ {\bar S}_0 = \left[ \begin{array}{c} R \\ 0 \\ 0\\ R \omega_0 \end{array} \right] \]

and

\[ {\bar S}_0 = \left[ \begin{array}{c} R \\ 0 \\ 0\\ \omega_0 \end{array} \right] \; ,\]

for Cartesian and polar forms, respectively.

After the trajectories had been computed, the ephemerides were fed into functions that computed the energy in Cartesian form

\[ E = \frac{1}{2} m \left( {\dot x}^2 + {\dot y}^2 \right) + \frac{1}{2} k \left( x^2 + y^2 \right) \]

and in polar form

\[ E = \frac{1}{2} m \left( {\dot r}^2 + r^2 {\dot \theta}^2 \right) + \frac{1}{2} k r^2 \; ,\]

as appropriate. As expected, the values for the energy, arrived at in two independent ways, are identical.
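
A minimal sketch of the sort of notebook cell that does this comparison (parameter values, tolerances, and variable names here are my own choices, not those of the original notebook):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k, R = 1.0, 1.0, 1.0
w0 = np.sqrt(k / m)

def cart(t, s):              # s = [x, xdot, y, ydot]
    x, xd, y, yd = s
    return [xd, -(k/m)*x, yd, -(k/m)*y]

def polar(t, s):             # s = [r, rdot, theta, thetadot]
    r, rd, th, thd = s
    return [rd, -(k/m)*r + r*thd**2, thd, -2*rd*thd/r]

# Circular-orbit initial conditions in both representations
sol_c = solve_ivp(cart,  (0, 10), [R, 0, 0, R*w0], rtol=1e-10, atol=1e-12)
sol_p = solve_ivp(polar, (0, 10), [R, 0, 0, w0],   rtol=1e-10, atol=1e-12)

# Energy E = T + V evaluated along each trajectory
E_c = 0.5*m*(sol_c.y[1]**2 + sol_c.y[3]**2) + 0.5*k*(sol_c.y[0]**2 + sol_c.y[2]**2)
E_p = 0.5*m*(sol_p.y[1]**2 + sol_p.y[0]**2*sol_p.y[3]**2) + 0.5*k*sol_p.y[0]**2
```

Both arrays come out flat and equal to each other, as the Work-Energy bookkeeping demands.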

Case 4 – $$E \neq h$$, $$h$$ conserved: Constrained Planar SHO

In contrast to Case 3, the constrained planar simple harmonic oscillator executes its motion subject to the constraint

\[ {\dot \theta} = \omega \; ,\]

where $$\omega$$ is the driving frequency. The mechanical analog would be something like the following picture

where a mass on a spring is forced by a motor that keeps the oscillator rotating at a constant angular rate.

The Lagrangian in this case is given by

\[ L = \frac{1}{2} m \left( {\dot r}^2 + r^2 \omega^2 \right) - \frac{1}{2} k (r-\ell_0)^2 \; . \]

Note that there is no easy way to express the Lagrangian in Cartesian coordinates – another example of the power of generalized coordinates.

The energy of the system is

\[ E = T + V = \frac{1}{2} m \left( {\dot r}^2 + r^2 \omega^2 \right) + \frac{1}{2} k (r-\ell_0)^2 \; ,\]

but

\[ h_r = \frac{1}{2} m \left( {\dot r}^2 - r^2 \omega^2 \right) + \frac{1}{2} k (r-\ell_0)^2 \; . \]

The minus sign makes all the difference here. Since there is a motor in the problem, the expectation is that the energy should not be conserved, even though the structure of Lagrangian mechanics guarantees the conservation of $$h_r$$ (since the Lagrangian is not explicitly time-dependent).

While there are analytic solutions to these equations, I again opted for a numerical simulation. The equation of motion is

\[ {\ddot r} = (\omega^2 - \omega_0^2)r + \omega_0^2 \ell_0 \; , \]

where $$\omega_0^2 = k/m$$.

The resulting dynamics show the radial motion oscillating sinusoidally about an equilibrium value determined by

\[ r = \frac{\omega_0^2 \ell_0}{\omega_0^2 - \omega^2} \; . \]

The graph of the energy and $$h_r$$ shows clearly that the energy is not conserved, even though $$h_r$$, as expected, is.
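
A sketch of the corresponding simulation (assuming the potential $$\frac{1}{2} k (r - \ell_0)^2$$ and $$\omega_0^2 = k/m$$; parameter values are my own choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k, l0, w = 1.0, 1.0, 1.0, 0.5        # w is the driving frequency
w0 = np.sqrt(k / m)

def rhs(t, s):                           # s = [r, rdot]
    r, rd = s
    return [rd, (w**2 - w0**2)*r + w0**2*l0]

sol = solve_ivp(rhs, (0, 20), [1.0, 0.0], rtol=1e-10, atol=1e-12)
r, rd = sol.y

# Energy versus the h function: they differ only in the sign of the r^2 w^2 term
E  = 0.5*m*(rd**2 + r**2*w**2) + 0.5*k*(r - l0)**2
hr = 0.5*m*(rd**2 - r**2*w**2) + 0.5*k*(r - l0)**2

# hr stays flat while E oscillates as the motor does work on the oscillator
```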

Case 5 – $$E \neq h$$, neither conserved: Damped SHO

The next example is the damped simple harmonic oscillator. The Lagrangian is given by

\[ L = e^{\gamma t/m} \left[\frac{1}{2} m {\dot x}^2 – \frac{1}{2} k x^2 \right] \; , \]

which results in the equation of motion

\[ m {\ddot x} + \gamma {\dot x} + k x = 0 \; .\]

The energy,

\[ E = \frac{1}{2} m {\dot x}^2 + \frac{1}{2} k x^2 \; ,\]

is clearly not conserved and the $$h$$ function,

\[ h = {\dot x} \frac{\partial L}{\partial \dot x} – L = e^{t \gamma/m} E \; ,\]

is also not conserved since it depends on time.
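
The quoted equation of motion follows directly from this Lagrangian; a short sympy check (my own sketch, using only the definitions in this post):

```python
import sympy as sp

t = sp.Symbol('t')
m, k, g = sp.symbols('m k gamma', positive=True)
x = sp.Function('x')(t)

# The exponentially weighted damped-oscillator Lagrangian
L = sp.exp(g*t/m) * (m*sp.diff(x, t)**2/2 - k*x**2/2)

# Euler-Lagrange: d/dt(dL/dxdot) - dL/dx, then strip the overall exponential
eom = sp.simplify((sp.diff(sp.diff(L, sp.diff(x, t)), t) - sp.diff(L, x))
                  * sp.exp(-g*t/m))

# eom is m x'' + gamma x' + k x, the damped-oscillator equation
```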

Case 6 – $$E = h$$, neither conserved

This last case is something of a contrivance. As may have been guessed from the above examples, finding a case where $$E=h$$ and yet neither is conserved is not particularly easy to do. Rotating drivers that impose a dynamical condition like $$\dot \theta = constant \equiv \omega$$ lead to a conserved $$h$$ since the transformation to a rotating frame leaves the system in a co-moving configuration, despite the time-varying nature of the transformation. Equally frustrating is trying to incorporate dissipation into the problem. Dissipation generally leads to a different form for $$h$$ compared with $$E$$, as Case 5 demonstrates.

One example that does work is to define the potential energy as

\[ V = x F(t) \]

and the Lagrangian as

\[ L = \frac{1}{2} m {\dot x}^2 – x F(t) \]

resulting in the equation of motion

\[ m {\ddot x} = F(t) \; .\]

Both the energy and $$h$$ are given by

\[ \frac{1}{2} m {\dot x}^2 + x F(t) \; , \]

which is clearly not conserved.

Energy and Hamiltonians: Part 2 – Conservation Considerations

Last month, I covered the basic groundwork that established under what conditions generalized coordinates can be used in the Lagrangian formulation. Generalized coordinates enable vast simplifications of the analysis of mechanical systems and provide deep insight into the possible motion. The analysis found that the permissible transformations are the point transformations

\[q^i = q^i(s^j,t) \; ,\]

since these leave the Euler-Lagrange equations of motion invariant.

The next step is to examine what these transformations do to the expression of the energy. Note that no transformation can ‘destroy’ energy conservation – if the energy is conserved in one set of coordinates it is conserved in all coordinate systems. But it is possible that a poor transformation will hide the conservation of energy and that a clever transformation will reveal the conservation of a related quantity (the function $$h$$ to be discussed shortly) even when the energy is not conserved.

In Newtonian mechanics, the energy is given by the sum of the kinetic and potential energy (each possibly depending on $$q^i$$, $${\dot q}^i$$, and $$t$$)

\[ E = T + V \; .\]

The corresponding quantity in Lagrangian dynamics is the $$h$$ function defined as

\[ h = \frac{\partial L}{\partial {\dot q}^i} {\dot q}^i – L(q^j,{\dot q^j};t) \; ,\]

with $$L = T – V$$.

At first glance, it isn’t at all obvious why $$h$$ should ever be conserved or why it can be identified as the energy in certain circumstances. To see the conservation properties of $$h$$, examine its total time derivative, given by

\[ \frac{d h}{d t} = \frac{d}{d t} \left(\frac{\partial L}{\partial {\dot q}^i} \right) {\dot q}^i + \frac{\partial L}{\partial {\dot q}^i} {\ddot q}^i – \frac{\partial L}{\partial q^i} {\dot q}^i – \frac{\partial L}{\partial {\dot q}^i} {\ddot q}^i – \frac{\partial L}{\partial t} \; . \]

The second and fourth terms cancel and the first and third terms can be grouped so that the expression now reads

\[ \frac{d h}{d t} = \left[ \frac{d}{d t} \left(\frac{\partial L}{\partial {\dot q}^i} \right) – \frac{\partial L}{\partial q^i} \right] {\dot q}^i – \frac{\partial L}{\partial t} \; . \]

Physical motion requires the first term, which contains the Euler-Lagrange equations, to vanish, leaving the very nice expression

\[ \frac{d h}{dt} = -\frac{\partial L}{\partial t} \; . \]

Thus, under some very general circumstances, $$h$$ is conserved if the Lagrangian, in some coordinate system, doesn’t have any explicit time dependence. Since the allowed point transformations can be time-dependent, it is possible for $$h$$ to be conserved in certain generalized coordinate systems and not in others. This is the genesis of the confusion between $$E$$ and $$h$$ discussed in the previous post.

Having answered the first question about the conservation properties of $$h$$, we now turn to when $$h$$ can be identified as the energy $$E$$. In order not to obscure the physical points, for the bulk of this post 1-$$d$$ systems will be examined with the generalized coordinates being denoted as $$x$$ (starting Cartesian coordinate) and $$q$$ (arbitrary generalized coordinate). At the end, this restriction will be relaxed.

The kinetic energy, given by

\[ T = \frac{1}{2} m {\dot x}^2 \; ,\]

will be the key parameter, with the potential energy just coming along for the ride.

Expanding on Morin (http://www.people.fas.harvard.edu/~djmorin/chap15.pdf), we examine three cases: 1) the limited point transformation $$x = x(q)$$, 2) the full point transformation $$x = x(q,t)$$, and 3) the ‘forbidden’ phase-space transformation $$x=x(q,{\dot q})$$. In each of these cases, we exploit the fact that the transformations are invertible and so either $$x$$ or $$q$$ can be regarded as the base coordinate. Assuming the Cartesian $$x$$ as a function of the generalized coordinate $$q$$ is simply a convenient choice.

Case 1: $$x = x(q)$$

This case is commonly encountered when the geometry of the system better matches cylindrical, spherical, or some other curvilinear coordinate system. The velocities are related by

\[ {\dot x} = \frac{\partial x}{\partial q} {\dot q} \; , \]

leading to the kinetic energy transforming as

\[ T = \frac{1}{2} m {\dot x}^2 \rightarrow \frac{1}{2} m \left( \frac{\partial x}{\partial q} \right)^2 {\dot q}^2 \equiv \frac{1}{2} \mu_0(q) {\dot q}^2 \; .\]

Defining

\[ p_q = \frac{\partial L}{\partial {\dot q}} = \mu_0(q) {\dot q} \; , \]

the expression for $$h$$ becomes

\[ h = p_q {\dot q} - L = \mu_0(q) {\dot q}^2 - \frac{1}{2} \mu_0(q) {\dot q}^2 + V(q) \; . \]

Simplifying yields

\[ h = \frac{1}{2} \mu_0(q) {\dot q}^2 + V(q) = T + V = E \; . \]

So in the case where the point transformation is limited to exclude time dependence, $$h = E$$. Since the motion is derivable from a time-independent potential, we can immediately conclude that the energy is conserved.
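
This bookkeeping can be verified symbolically. The sketch below (my own) uses a hypothetical transformation $$x = q^3$$ and a generic potential $$V(q)$$:

```python
import sympy as sp

t = sp.Symbol('t')
m = sp.Symbol('m', positive=True)
q = sp.Function('q')(t)
qd = sp.Symbol('qdot')

x = q**3                     # a hypothetical time-independent point transformation
V = sp.Function('V')(q)

# Transformed velocity, with dq/dt renamed to the independent symbol qdot
xd = sp.diff(x, t).subs(sp.Derivative(q, t), qd)
L = m*xd**2/2 - V

p = sp.diff(L, qd)           # generalized momentum mu_0(q) * qdot
h = sp.simplify(p*qd - L)
E = m*xd**2/2 + V            # T + V expressed in the new coordinate

assert sp.simplify(h - E) == 0   # h coincides with the energy
```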

Case 2: $$ x = x(q,t)$$

This case is commonly encountered in rotating systems where a time-dependent transformation puts the viewpoint coincident with a co-rotating observer. This is the trickiest of all the physical situations since the energy may or may not be conserved independently of what $$h$$ is doing. These cases will be examined in detail in future posts.

The velocities are related by

\[ {\dot x} = \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial t} \; , \]

leading to the kinetic energy transforming as

\[ T = \frac{1}{2} m {\dot x}^2 \rightarrow \frac{1}{2} m \left( \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial t} \right)^2 \]

Expanding and simplifying yields the kinetic energy in generalized coordinates as

\[ T = \frac{1}{2} m \left[ \left(\frac{\partial x}{\partial q}\right)^2 {\dot q}^2 + 2 \frac{\partial x}{\partial q} \frac{\partial x}{\partial t} {\dot q} + \left(\frac{\partial x}{\partial t} \right)^2 \right] \; \]

or, more compactly,

\[ T = \frac{1}{2} \mu_0(q,t) {\dot q}^2 + \mu_1(q,t) {\dot q} + \frac{1}{2} \mu_t(q,t) \; .\]

Once again defining

\[ p_q = \frac{\partial L}{\partial {\dot q}} = \mu_0(q,t) {\dot q} + \mu_1(q,t) \; , \]

the expression for

\[ h = p_q {\dot q} – L = \mu_0(q,t) {\dot q}^2 + \mu_1(q,t) {\dot q} – \frac{1}{2} \mu_0(q,t){\dot q}^2 \\ – \mu_1(q,t) {\dot q} – \frac{1}{2} \mu_t(q,t) + V(q,t) \; . \]

Simplifying gives

\[ h = \frac{1}{2} \mu_0(q,t){\dot q}^2 – \frac{1}{2} \mu_t(q,t) + V(q,t) \neq T + V \; .\]

So, in this case, $$h$$ is not identified with the energy. Whether $$h$$ or $$E$$ are conserved is a matter for case-by-case analysis.
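
The same symbolic bookkeeping shows the failure in this case. The sketch below (my own) uses the co-moving transformation $$x = q + v_0 t$$ for a free particle:

```python
import sympy as sp

t = sp.Symbol('t')
m, v0 = sp.symbols('m v0', positive=True)
q = sp.Function('q')(t)
qd = sp.Symbol('qdot')

x = q + v0*t                 # time-dependent point transformation (co-moving frame)
xd = sp.diff(x, t).subs(sp.Derivative(q, t), qd)   # qdot + v0

L = m*xd**2/2                # free particle, V = 0
h = sp.expand(sp.diff(L, qd)*qd - L)
T = sp.expand(m*xd**2/2)

# h = m*qdot**2/2 - m*v0**2/2, which is not T = m*(qdot + v0)**2/2:
# the linear mu_1 term has dropped out and mu_t/2 enters with flipped sign
```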

Case 3: $$x = x(q,{\dot q})$$

This final case is more a curiosity than a serious exploration. The case is ‘forbidden’ by last month’s analysis since it doesn’t preserve the transformation properties of the Euler-Lagrange equations. Nonetheless, it is interesting to see what would happen to $$h$$ should this transformation be allowed.

The velocities are related by

\[ {\dot x} = \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial {\dot q}} {\ddot q} \]

leading to the kinetic energy becoming

\[ T = \frac{1}{2} \mu_0(q,{\dot q}) {\dot q}^2 + \mu_3(q,{\dot q}) {\dot q}{\ddot q} + \frac{1}{2} \mu_4(q,{\dot q}) {\ddot q}^2 \; .\]

Following the same process above yields

\[ h = \frac{1}{2} \mu_0(q,{\dot q}) {\dot q}^2 – \frac{1}{2} \mu_4(q,{\dot q}) {\ddot q}^2 + V(q,{\dot q}) \neq T + V \; .\]

Finally, a note on relaxing the restriction to 1-$$d$$. In multiple dimensions, the velocities in the most general case transform as

\[ {\dot s}^j = \frac{\partial s^j}{\partial q^i} {\dot q}^i + \frac{\partial s^j}{\partial t} \; .\]

Following the same method as above, the kinetic energy will take the form

\[ T = \frac{1}{2} \mu_0^i ({\dot q}^i)^2 + \mu^{ij} {\dot q}^i {\dot q}^j + \mu_1^i {\dot q}^i + \frac{1}{2} \mu_t^i \; ,\]

using the same shorthand notation as above.

The only new feature (beyond the trivial decoration of the $$q$$s with an index and the implied summation) is the cross-term $$\mu^{ij} {\dot q}^i {\dot q}^j$$. Writing the full quadratic part of the kinetic energy as $$\frac{1}{2} a_{ij} {\dot q}^i {\dot q}^j$$ with $$a_{ij}$$ symmetric, the conjugate momenta are

\[ p_{q^i} = a_{ij} {\dot q}^j + \mu_1^i \; , \]

so that

\[ p_{q^i} {\dot q}^i = a_{ij} {\dot q}^i {\dot q}^j + \mu_1^i {\dot q}^i \; . \]

Subtracting $$L$$ cancels the linear terms entirely and half of the quadratic ones, leaving

\[ h = \frac{1}{2} a_{ij} {\dot q}^i {\dot q}^j - \frac{1}{2} \mu_t^i + V \; . \]

So the cross-terms survive in $$h$$ on the same footing as the diagonal quadratic terms, the linear terms drop out, and the $$\partial x/\partial t$$ piece flips sign. Everything else follows through as in the 1-$$d$$ case.

Energy and Hamiltonians: Part 1 – Generalized Coordinates

One of the most confusing aspects of Lagrangian/Hamiltonian mechanics centers on the energy $$E$$ and its relationship to the function $$h$$ (or to the Hamiltonian $$H$$, to which it is closely related). The key questions are to what extent these two quantities can be identified with each other and under which circumstances each is conserved. The answers have deep implications for the analysis of the possible motions the system supports. The opportunity for confusion arises because there are many different possibilities that show up in practice.

The aim of this and the following columns is to delve into these cases in sufficient detail to flesh out the six logically possible cases that result from $$E$$ being either conserved or not, $$h$$ being either conserved or not, both subject to the possible constraint that $$E=h$$.

Before going into specific analysis of this case or that, it is important to get an overview as to how these various cases arise in practice. Central to the variety of possibilities is the notion of generalized coordinates and the application of dynamical constraints.

As discussed in great detail in Classical Mechanics by Goldstein (especially Chapter 1), generalized coordinates are what provide the Lagrangian method with its power by enabling it to rise above Cartesian coordinates and into a frame better adapted to the geometry of the dynamics. For example, central force motion is best described in plane-polar coordinates, which not only make the computations easier but also provide a simple method of imposing constraints like the requirement that the motion take place at a fixed radius (e.g., a bead on a loop of wire). The number of generalized coordinates can always match the number of degrees of freedom, a condition that usually can’t be fulfilled in Cartesian coordinates if there are constraints. Generalized coordinates also serve as the cornerstone in understanding rigid body motion – the three Euler angles mapping directly to the three degrees of freedom such a body has to move about its center of mass.

The price for such freedom is that the universe of possible conserved quantities must be enlarged from the obvious ones evident in elementary mechanics to a broader set. This expansion in scope leads to the need to distinguish between the energy $$E$$ and the function $$h$$. One of the most important and famous examples of this is the conservation of the Jacobi constant in the circular-restricted three-body problem.

But, before launching into an analysis that compares and contrasts $$E$$ with $$h$$, it is important to establish some basic results of the Lagrangian method.

The first result is that the Lagrangian equations of motion are invariant under what Goldstein calls a point transformation

\[ q^i = q^i(s^j,t) \; , \]

relating one set of generalized coordinates $$q^i$$ to another $$s^j$$ (note that Cartesian coordinates are a subset of generalized coordinates).

Transformation of the Lagrangian $$L(q^i,{\dot q}^i,t)$$ to $$L'(s^j,{\dot s}^j,t)$$ starts with determining the form of the generalized velocities

\[ {\dot q}^i = \frac{\partial q^i}{\partial s^j} {\dot s}^j + \frac{\partial q^i}{\partial t} \; , \]

($$\dot f \equiv \frac{d}{dt} f$$).

From this relation comes the somewhat counterintuitive ‘cancellation-of-the-dots’ that states

\[ \frac{\partial {\dot q}^i}{\partial {\dot s}^j} = \frac{\partial q^i}{\partial s^j} \; .\]

A needed corollary of the cancellation-of-the-dots results from taking the total time derivative of $$\partial q^i / \partial s^j$$, which expands as

\[ \frac{d}{dt} \left( \frac{\partial q^i}{\partial s^j} \right) = \frac{\partial^2 q^i}{\partial s^k \partial s^j} {\dot s}^k + \frac{\partial^2 q^i}{\partial t \partial s^j} \; .\]

Switching the order of partial derivatives (always allowed for physical functions) and regrouping terms (noting along the way that $$\partial {\dot s}^k / \partial s^j = 0 $$) turns the right-hand side into

\[ \frac{\partial}{\partial s^j} \left( \frac{\partial q^i}{\partial s^k} {\dot s}^k + \frac{\partial q^i}{\partial t} \right) = \frac{\partial}{\partial s^j} \frac{d q^i}{d t} \; . \]

The next step is to grind through the equations of motion, which is straightforward if a bit tedious. Start by labeling the Lagrangian that results after the substitution of the point transformation as

\[ L’ (s^j,{\dot s}^j,t ) = L (q^i(s^j),{\dot q}^i(s^j,{\dot s}^j),t ) \; .\]

The coordinate-derivative piece of the equations of motion gives

\[ \frac{\partial L’}{\partial s^j} = \frac{\partial L}{\partial q^i} \frac{\partial q^i}{\partial s^j} + \frac{\partial L}{\partial {\dot q}^i} \frac{\partial {\dot q}^i}{\partial s^j} \; . \]

The velocity derivative piece of the equations of motion gives

\[ \frac{\partial L’}{\partial {\dot s}^j} = \frac{\partial L}{\partial {\dot q^i}} \frac{\partial {\dot q}^i}{\partial {\dot s}^j} \; . \]

Combining these pieces into the Euler-Lagrange equations gives

\[ \frac{d}{dt} \left( \frac{\partial L’}{\partial {\dot s}^j } \right) – \frac{\partial L’}{\partial s^j} = \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) \frac{\partial {\dot q}^i}{\partial {\dot s}^j} + \frac{\partial L}{\partial {\dot q}^i} \frac{d}{dt} \left( \frac{\partial {\dot q}^i}{\partial {\dot s}^j} \right) \\ – \frac{\partial L}{\partial q^i} \frac{\partial q^i}{\partial s^j} – \frac{\partial L}{\partial {\dot q}^i} \frac{\partial {\dot q}^i}{\partial s^j} \; .\]

Cancellation-of-the-dots and its corollary tells us that the second and fourth terms on the right-hand side cancel, leaving

\[ \frac{d}{dt} \left( \frac{\partial L’}{\partial {\dot s}^j } \right) – \frac{\partial L’}{\partial s^j} = \left[ \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) – \frac{\partial L}{\partial q^i} \right] \frac{\partial q^i}{\partial s^j} \; . \]

Since the original equations of motion are zero and the Jacobian of the point transformation is not, we conclude that

\[ \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) – \frac{\partial L}{\partial q^i} = 0 \; , \]

which informs us that the Euler-Lagrange equations are invariant under point transformations. This result is our hunting license to use any coordinates related to Cartesian coordinates by a point transformation to describe a system’s degrees of freedom.

The next result is that the equations of motion are invariant with respect to the addition of a total time derivative to the Lagrangian. In other words, the same equations of motion result from a Lagrangian $$L$$ as from one defined as

\[ L’ = L + \frac{d F(q^i,t)}{dt} \; .\]

The proof is as follows. Applying the Euler-Lagrange operator to both sides of the definition of $$L’$$ gives

\[ \frac{d}{dt} \left( \frac{\partial L’}{ \partial {\dot q}^i } \right) – \frac{\partial L’}{\partial q^i} = \frac{d}{dt} \left( \frac{\partial L}{ \partial {\dot q}^i } \right) – \frac{\partial L}{\partial q^i} + \frac{d}{dt} \left( \frac{\partial}{\partial {\dot q}^i} \frac{dF}{dt} \right) – \frac{\partial}{\partial q^i} \frac{dF}{dt} \;. \]

Invariance of the equations of motion requires the last two terms to cancel. The easiest way to see that they do is to first concentrate on the expansion of the total time derivative of $$F$$ as

\[ \frac{d F}{dt} = \frac{\partial F}{\partial q^j} {\dot q}^j + \frac{\partial F}{\partial t} \; . \]

The expression within the third term above then becomes

\[ \frac{\partial}{\partial {\dot q}^i} \left( \frac{d F}{dt} \right) = \frac{\partial F}{\partial q^i} \; . \]

Cancellation of the last two terms now rests on the recognition that

\[ \frac{d}{dt} \left( \frac{\partial F}{\partial q^i} \right) = \frac{\partial}{\partial q^i} \left(\frac{dF}{dt} \right) \; ,\]

a fact easily established by realizing that the total time derivative is a specific sum of partial derivatives and that partial derivatives commute with each other.

This result allows us to play with the form of the Lagrangian in an attempt to find the simplest or most compact way of describing the motion of the physical system.
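This invariance is easy to confirm symbolically. The sketch below (in Python with sympy) uses the one-dimensional harmonic oscillator and the function $$F = t q^2$$ as purely illustrative choices; it applies the Euler-Lagrange operator to both $$L$$ and $$L + dF/dt$$ and checks that the resulting equations of motion agree.

```python
import sympy as sp

# Verify: adding dF/dt to a Lagrangian leaves the Euler-Lagrange equations unchanged.
t, m, k = sp.symbols('t m k', positive=True)
q = sp.Function('q')(t)
qd = q.diff(t)

def euler_lagrange(L):
    """d/dt(dL/dq') - dL/dq, treating q and q' as independent symbols."""
    Q, Qd = sp.symbols('Q Qd')
    Ls = L.subs(qd, Qd).subs(q, Q)          # substitute the derivative first
    dL_dqd = Ls.diff(Qd).subs(Q, q).subs(Qd, qd)
    dL_dq = Ls.diff(Q).subs(Q, q).subs(Qd, qd)
    return sp.expand(dL_dqd.diff(t) - dL_dq)

L1 = m*qd**2/2 - k*q**2/2                   # harmonic oscillator
F = t*q**2                                  # an arbitrary F(q, t)
L2 = L1 + F.diff(t)                         # Lagrangian shifted by a total time derivative

assert sp.simplify(euler_lagrange(L1) - euler_lagrange(L2)) == 0
assert sp.simplify(euler_lagrange(L1) - (m*q.diff(t, 2) + k*q)) == 0
```

Any differentiable $$F(q,t)$$ works here; the cancellation mechanism is exactly the commutation of partial derivatives discussed above.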

Next month’s column will use the general structure of this formalism as a guide to the connection between the energy $$E$$ and $$h$$. The columns in the following months will apply the two results derived here to various physical systems in order to illustrate the various cases for $$E$$ and $$h$$ outlined above.

Action and Reaction

Action and reaction; Newton’s 3rd law tells us that for every action there is an equal and opposite reaction. Nature, in effect, is a banker whose accounts are carefully balanced according to the principle of double-entry bookkeeping. Modern theories emphasize this point of view by saying that at the fundamental level all interactions are mediated by the exchange of ‘force quanta’ that transfer energy, momentum, and angular momentum. So all is well – or is it?

Textbooks often muddy the waters of this ‘simple’ situation by engaging in nuanced discussions and citing exceptions without emphasizing two points. The first is that action-reaction is always honored. The second is that it may require careful logic, supported by exact experiments, to rescue new cases that seem to be exceptions.

Now, these nuances and exceptions, which roughly fall into three prototypical classes, are real and need to be addressed. Each is interesting in its own right, helping to illustrate and explore the richness of the action-reaction concept, but the ‘answer’ for each of these cases is always the same – there are degrees of freedom being ignored that balance the action-reaction.

Case 1 – Dissipative forces

Friction provides one of the earliest examples in which action-reaction seems to be violated. Energy is bled out of the system and the motion eventually comes to a halt. The prototype example is the damped harmonic oscillator. Its equation of motion is given by

\[ m {\ddot x} + \Gamma {\dot x} + k x = 0 \; , \]

with the general solution

\[ x(t) = A e^{-\gamma t} \cos( \omega t + \phi ) \; , \]

with $$\gamma = \Gamma/2m$$ and $$\omega = \sqrt{k/m – \Gamma^2/4 m^2}$$ (taking the underdamped case, where the radicand is positive).
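As a numerical sanity check, one can integrate the equation of motion directly and compare against the closed-form underdamped solution. The parameter values below ($$m = 1$$, $$\Gamma = 0.2$$, $$k = 1$$) are illustrative choices, not tied to any particular physical system.

```python
import math

# Damped harmonic oscillator m x'' + Gamma x' + k x = 0, underdamped regime.
m, Gamma, k = 1.0, 0.2, 1.0
gamma = Gamma/(2*m)
omega = math.sqrt(k/m - gamma**2)       # real for the underdamped case

def analytic(t):
    # x(t) = A exp(-gamma t) cos(omega t + phi) with A = 1, phi = 0
    return math.exp(-gamma*t)*math.cos(omega*t)

def integrate(t_end, dt=1e-3):
    # RK4 on the state (x, v); x(0) = 1, v(0) = -gamma matches the analytic form
    x, v = 1.0, -gamma
    def f(x, v):
        return v, -(Gamma*v + k*x)/m
    for _ in range(round(t_end/dt)):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5*dt*k1x, v + 0.5*dt*k1v)
        k3x, k3v = f(x + 0.5*dt*k2x, v + 0.5*dt*k2v)
        k4x, k4v = f(x + dt*k3x, v + dt*k3v)
        x += dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        v += dt*(k1v + 2*k2v + 2*k3v + k4v)/6
    return x

assert abs(integrate(10.0) - analytic(10.0)) < 1e-8
```

The integrator and the closed form agree to high precision, confirming the stated solution.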

It is natural to ask, where does the energy go? In this model, the energy disappears with no account as to where and how. Of course, the accepted answer is straightforward enough. The energy goes into heat; into degrees of freedom not directly tracked by this model. There is balance on the action-reaction front. There is a reaction force of the oscillator on the environment, and it is this force that transfers the energy to the surroundings.

While this accepted answer is known to virtually everybody who’s taken a physics class, it is worth pausing to consider that it wasn’t necessarily obvious to those who first grappled with this problem, nor is it obvious to the beginner who is introduced to it. In fact, it is very difficult to find a detailed argument that traces through all of the evidence in a logical fashion. It is simply common knowledge supported by our belief in the balance of action and reaction.

Case 2 – Driving forces

The second case, which is actually closely related to the first, involves a driving force that maintains some aspect of the system. Energy, momentum, and/or angular momentum enter and leave the system without being explicitly tracked. The best example of such a system is the circular restricted three-body problem. There are numerous applications of this model to spaceflight, with many missions exploiting the $$L_1$$ and $$L_2$$ Lagrange points.

In the circular-restricted three-body problem, two massive bodies are in orbit around their common barycenter. This motion honors action-reaction perfectly with the gravitational force between the bodies being equal and opposite. The ‘forced motion’ which breaks action-reaction, takes place in the idealization where a test particle is introduced into the system. The test particle is thought to be so small that its back-reaction on the motion of the two bodies can be ignored. The equation of motion is

\[ \ddot {\vec r} = G m_1 \frac{\vec r_1 – \vec r}{|\vec r_1 – \vec r|^3} + G m_2 \frac{\vec r_2 – \vec r}{|\vec r_2 – \vec r|^3} \; ,\]

with the motion of the primaries given by

\[ \vec r_1 = \frac{m_2 L}{m_1 + m_2} \left( \cos(\omega t) \hat \imath + \sin(\omega t) \hat \jmath \right) \]

and

\[ \vec r_2 = -\frac{m_1 L}{m_1 + m_2} \left( \cos(\omega t) \hat \imath + \sin(\omega t) \hat \jmath \right) \]

and with

\[ \omega = \sqrt{ \frac{G(m_1 + m_2)}{L^3} } \; . \]

That the motions of $$m_1$$ and $$m_2$$ are set by their interaction with no influence from the test particle is the source of the driving force. So, by construction, the model violates Newton’s third law.
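The internal consistency of the primaries’ circular motion can be checked numerically. The sketch below works in units with $$G = 1$$; the barycentric radii $$m_2 L/(m_1+m_2)$$ and $$m_1 L/(m_1+m_2)$$ and the sample masses are illustrative assumptions.

```python
import math

# Sanity check (units with G = 1): the circular two-body motion of the primaries
# satisfies Newton's second law, with barycentric radii
# r1 = m2 L/(m1 + m2) and r2 = m1 L/(m1 + m2).
m1, m2, L = 3.0, 1.0, 2.0
omega = math.sqrt((m1 + m2)/L**3)

r1 = m2*L/(m1 + m2)
r2 = m1*L/(m1 + m2)
t = 0.7                                   # an arbitrary sample time
x1, y1 = r1*math.cos(omega*t), r1*math.sin(omega*t)
x2, y2 = -r2*math.cos(omega*t), -r2*math.sin(omega*t)

# gravitational acceleration of body 1 toward body 2
dx, dy = x2 - x1, y2 - y1
d3 = (dx*dx + dy*dy)**1.5
ax, ay = m2*dx/d3, m2*dy/d3

# circular motion demands the centripetal acceleration -omega^2 r
assert abs(ax + omega**2 * x1) < 1e-12
assert abs(ay + omega**2 * y1) < 1e-12
assert abs(m1*(x1 - x2)/d3 + omega**2 * x2) < 1e-12   # same check for body 2
```

The forces on the two primaries are equal and opposite by construction; only the test particle sits outside the bookkeeping.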

But treatments of the circular-restricted three-body problem often confuse this simple point by dwelling on the conservation of the Jacobi constant. That conservation is clearly important, as it helps in understanding the global properties of the solutions. Nonetheless, discussions of its importance rarely emphasize that while the model violates action-reaction, real physical situations do not. The Jacobi constant is a useful tool, but its conservation does not actually occur in nature; only a close approximation results. While this distinction may seem subtle in theory, it is quite important in practice – real physical situations cannot be expected to exactly conserve the Jacobi constant.

Case 3 – Velocity-Dependent Forces

The case of velocity-dependent forces is perhaps the most confusing member of this list. There are two reasons for this. The first is that the physics literature usually employs this term to mean only one kind of velocity-dependent force: the force of magnetism. Second, the force of magnetism cannot be studied without direct analysis of the fields that act as intermediaries. This latter point should be one of clarity in the community when in fact it is often one of confusion.

A case in point is the discussion in Chapter 1 of Goldstein’s Classical Mechanics concerning the weak and strong forms of action-reaction pairs. Goldstein has this interesting point to make about action and reaction:

…[T]he Biot-Savart law indeed may violate both forms of the action and reaction law.*

He goes on to say that:

Usually it is then possible to find some generalization of $${\mathbf P}$$ or $${\mathbf L}$$ that is conserved. Thus, in an isolated system of moving charges it is the sum of the mechanical angular momentum and the electromagnetic “angular momentum” of the field that is conserved.

Of course, this is exactly the well-known conservation law discussed in a previous post on Maxwell’s equations. There is no reason to erode the reader’s confidence by the inclusion of quotes around the field angular momentum.

The situation is somewhat relieved by a footnote that attempts to give concrete scenarios. There are two of them and this post will deal with them in turn.

The first part states

If two charges are moving uniformly with parallel velocity vectors that are not perpendicular to the line joining the charges, then the mutual forces are equal and opposite but do not lie along the vector between the charges

which, diagrammatically, looks like

Since there are only two particles, the most economical description has their motion take place in a plane. The parallel velocity vector is

\[ \hat v = \cos(\alpha) \hat \imath + \sin(\alpha) \hat \jmath \]

and the separation vector between them is

\[ \vec r_{12} = \vec r_{1} – \vec r_{2} = \rho \hat \jmath \; .\]

The combined electrostatic and magnetic force on particle 1 due to particle 2 is

\[ \vec F_{12} = \frac{q^2}{4 \pi \rho^2} \left[ \mu_0 v^2 \, \hat v \times ( \hat v \times \hat \jmath ) + \frac{1}{\epsilon_0} \hat \jmath \right] \; .\]

Substituting in the form of $$\hat v$$ and simplifying gives

\[ \vec F_{12} = \frac{q^2}{4 \pi \rho^2} \left[ \mu_0 v^2 \left( \sin(\alpha) \cos(\alpha) \hat \imath – \cos^2(\alpha) \hat \jmath \right) + \frac{1}{\epsilon_0} \hat \jmath \right] \; . \]

Since the only thing that changes in computing the force on particle 2 due to particle 1 is the sign on the relative position vector it is easy to see that

\[ \vec F_{21} = -\vec F_{12} \]

and action-reaction is honored. However, this is action-reaction in its weak form, where the forces don’t lie along the vector joining the two particles. This conclusion is best supported by computing the $$x$$-component of the force

\[ \vec F_{12} \cdot \hat \imath = \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \sin(\alpha) \cos(\alpha) \; , \]

which is zero for only a few isolated angles.
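The directional content of this scenario is easy to check numerically. In the sketch below, all physical constants are folded into an overall positive prefactor (set to 1), so only the magnetic direction $$\hat v \times (\hat v \times \hat \jmath)$$ is tested; the sample angle is arbitrary.

```python
import numpy as np

# Parallel-velocity scenario: the magnetic part of the force on particle 1 is
# proportional to v_hat x (v_hat x j_hat). Reversing the separation (j -> -j)
# gives the force on particle 2, so the pair is equal and opposite.
alpha = 0.6                                    # an arbitrary angle
v_hat = np.array([np.cos(alpha), np.sin(alpha), 0.0])
j_hat = np.array([0.0, 1.0, 0.0])

F12_dir = np.cross(v_hat, np.cross(v_hat, j_hat))
F21_dir = np.cross(v_hat, np.cross(v_hat, -j_hat))

assert np.allclose(F12_dir, -F21_dir)                        # weak action-reaction
assert np.isclose(F12_dir[0], np.sin(alpha)*np.cos(alpha))   # nonzero x-component
```

The nonzero $$x$$-component confirms that the force pair does not lie along the separation vector.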

The second part of the footnote states

Consider, further, two charges moving (instantaneously) so as to “cross the T”, i.e., one charge moving directly at the other, which in turn is moving at right angles to the first. Then the second charge exerts a nonvanishing force on the first, without experiencing any reaction force at all.

The scenario here can be pictured as

Now the statement in this footnote is clearly gibberish; it is corrected in the 3rd edition, which changes ‘nonvanishing force’ to ‘nonvanishing magnetic force’, so we’ll focus solely on the magnetic force.

From the Biot-Savart law the magnetic force on particle 1 due to particle 2 is

\[ \vec F_{12} = -\frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \imath \times ( \hat \jmath \times \hat \jmath ) = 0 \]

while the force on 2 due to 1 is

\[ \vec F_{21} = \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \jmath \times ( \hat \imath \times \hat \jmath ) = \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \imath \neq 0 \; .\]

Clearly, action-reaction seems violated, but strictly speaking this is not true. What should be emphasized is that there is a ‘third body’ in the mix, the electromagnetic field, and that action-reaction can be restored (although not quite point-wise) by looking for the additional degrees of freedom being ignored.
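The asymmetry of the two magnetic forces in the ‘crossing the T’ configuration can be verified directly. In the sketch below the common prefactor $$\mu_0 q^2 v^2 / 4\pi\rho^2$$ is set to 1, so only the vector structure is tested.

```python
import numpy as np

# 'Crossing the T': particle 1 moves along i_hat toward particle 2, which moves
# along j_hat; the separation also lies along j_hat. With the common prefactor
# set to 1, the two magnetic forces are:
i_hat = np.array([1.0, 0.0, 0.0])
j_hat = np.array([0.0, 1.0, 0.0])

F12 = -np.cross(i_hat, np.cross(j_hat, j_hat))   # force on 1 from 2: identically zero
F21 = np.cross(j_hat, np.cross(i_hat, j_hat))    # force on 2 from 1: along i_hat

assert np.allclose(F12, np.zeros(3))
assert np.allclose(F21, i_hat)
```

One force vanishes while the other does not, which is exactly the apparent violation the field degrees of freedom must repair.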

Moving Frames and the Twisted Cubic

The last column presented the general theory of the method of moving frames and the concept of a minimal frame, in which the time derivatives of the basis vectors, when expressed in terms of themselves, require only two parameters rather than the usual three. Building on that work, this column has a two-fold aim. The first is to apply these techniques to the often-studied space curve known as the twisted cubic. The second is to touch base with earlier columns on moving frames, relating the nature of the moving frame (whether minimal or not) to the traditional concept of angular velocity and, in the process, shining light on the concept of a minimal frame from a new angle.

The twisted cubic, which is the star in this month’s drama, is defined as a curve in three dimensions given by:

\[ \vec r(t) = t \hat e_x + t^2 \hat e_y + t^3 \hat e_z \; , \]

where the parameter $$t$$ can be regarded as time.

Since they will be needed later, the velocity and acceleration of the twisted cubic are given by

\[ \vec v(t) = \hat e_x + 2 t \hat e_y + 3 t^2 \hat e_z \; \]

and

\[ \vec a(t) = 2 \hat e_y + 6 t \hat e_z \; , \]

respectively.
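These derivatives are simple enough to confirm symbolically (a small sympy sketch):

```python
import sympy as sp

# Symbolic check of the twisted cubic's velocity and acceleration.
t = sp.symbols('t')
r = sp.Matrix([t, t**2, t**3])
v = r.diff(t)
a = v.diff(t)

assert v == sp.Matrix([1, 2*t, 3*t**2])
assert a == sp.Matrix([0, 2, 6*t])
```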

The twisted cubic, shown in the following figure for the range $$t \in [-2,2]$$, has an unusual shape that is best revealed by projecting it to the three coordinate planes. In the $$x-y$$ plane, the curve’s projection is a simple parabola (i.e. $$y(x) = x^2$$). In the $$x-z$$ plane, the curve’s projection is a simple cubic (i.e. $$z(x) = x^3$$). Nothing amazing here. The interest lies in the projection into the $$y-z$$ plane, where a ‘cusp’ appears when $$t$$ passes through the value of 0.

Twisted Cubic - 3D View

To apply the method of moving frames, the VBN and RIC coordinate frames will be used. VBN is not minimal and RIC is, so there is a nice opportunity to compare and contrast.

While both frames were discussed in the previous column, it is convenient to reiterate their definitions and to present their derivatives in detail. For the latter, some new results will be presented.

The VBN frame is defined as

\[ \hat V = \frac{\vec v}{|\vec v|} \; ,\]

\[ \hat N = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \;, \]

and

\[ \hat B = \hat V \times \hat N \; .\]

Using the general results derived last time, the derivative of $$\hat V$$ is expressed as

\[ \frac{d \hat V}{dt} = \frac{\vec a}{|\vec v|} – \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^3} \right) \; .\]

It will be convenient to define $$\vec L = \vec r \times \vec v$$. Doing so allows for the time derivative of $$\hat N$$ to be compactly written as

\[ \frac{d \hat N}{dt} = \frac{\vec r \times \vec a}{|\vec L|} – \vec L \left( \frac{\vec L \cdot \dot{\vec L}}{|\vec L|^3} \right) \; ,\]

where $$\dot{\vec L} = \vec r \times \vec a$$.

The time derivative of $$\hat B$$ is then given in terms of $$\hat V$$ and $$\hat N$$ and their derivatives as

\[ \frac{d \hat B}{d t} = \frac{d \hat V}{d t} \times \hat N + \hat V \times \frac{d \hat N}{d t} \; .\]
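The formula for the derivative of $$\hat V$$ can be checked against a finite difference along the twisted cubic (an illustrative numerical sketch; the sample time $$t_0 = 0.8$$ is arbitrary):

```python
import numpy as np

# Finite-difference check of dV_hat/dt = a/|v| - v (v.a)/|v|^3 on the twisted cubic.
def v(t): return np.array([1.0, 2*t, 3*t**2])
def a(t): return np.array([0.0, 2.0, 6*t])

def V_hat(t):
    vt = v(t)
    return vt/np.linalg.norm(vt)

t0, h = 0.8, 1e-6
numeric = (V_hat(t0 + h) - V_hat(t0 - h))/(2*h)     # central difference
vt, at = v(t0), a(t0)
analytic = at/np.linalg.norm(vt) - vt*(vt @ at)/np.linalg.norm(vt)**3

assert np.allclose(numeric, analytic, atol=1e-6)
assert abs(analytic @ V_hat(t0)) < 1e-12            # dV_hat/dt is perpendicular to V_hat
```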

The RIC frame is defined by

\[ \hat R = \frac{\vec r}{|\vec r|} \; ,\]

\[ \hat C = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \; ,\]

and

\[ \hat I = \hat C \times \hat R \; .\]

The time derivative of $$\hat R$$ is

\[ \frac{d \hat R}{dt} = \frac{\vec v}{|\vec r|} – \vec r \left( \frac{\vec r \cdot \vec v}{|\vec r|^3} \right) \; ,\]

which is a formula analogous to the one for $$\hat V$$.

Since RIC’s $$\hat C$$ is the same as VBN’s $$\hat N$$, its time derivative is identical with a minor change in symbols.

The time derivative of $$\hat I$$ is given by

\[ \frac{d \hat I}{d t} = \frac{d \hat C}{dt} \times \hat R + \hat C \times \frac{d \hat R}{dt} \; .\]

As discussed in Cross Products and Matrices, the moving frame’s motion can be expressed in terms of a classical angular velocity, with the mapping $$\alpha \rightarrow -\omega_z$$, $$\beta \rightarrow \omega_y$$, and $$\gamma \rightarrow – \omega_x$$.

With this mapping in hand, one can see that minimal frames are special in that they have only two non-zero components of their angular velocity. For the twisted cubic, the three components of the VBN angular velocity are

Twisted Cubic - VBN Angular Velocity

and the three components of the RIC angular velocity are

Twisted Cubic - RIC Angular Velocity

Note that, in the VBN frame, all three components of the angular velocity are generally non-zero, whereas in the RIC, $$\omega_y$$ is always zero.

Also interesting is the time evolution of the magnitude of the angular velocity, $$|\mathbf{\omega}|$$, in the two frames.

Twisted Cubic - Mag Angular Velocity

The VBN frame generally ‘rotates’ more slowly than the RIC frame away from the ‘origin’ but quickly rises above it in the vicinity of $$t = 0$$.

To close, it is important to note that minimal frames are not intrinsically better than non-minimal frames. Minimal frames may be more convenient in some contexts and more inconvenient in others, but it is likely that their best use is in analytic models since the amount of work is reduced by a third.

Minimal Frames

One of the most remarkable things about vectors is also one of the most non-intuitive – at least judging by how hard it is to teach; namely, that the rate of change of a set of vectors can be expressed in terms of the vectors themselves. I think it is hard to relate to because it is typically applied in situations where a set of unit vectors, spanning a space, is moving with respect to a set of fixed vectors. The notion that is hard to accept is that the time derivatives of these vectors can be written as linear combinations of the very same vectors, even though they are changing. On the surface, it seems like a contradiction in which something moving and fluid is expressed in terms of something that is also moving – a sort of self-referential recipe for confusion. And yet this is precisely what the method of moving frames does, and it has proven very successful.

In a nutshell, the method of moving frames states that the motion of a set of unit vectors $$\left\{\hat e_x, \hat e_y, \hat e_z\right\}$$, which, for example, may be the body axes for a rotating object, can be written as

\[ \frac{d}{dt} \left[ \begin{array}{c} \hat e_x \\ \hat e_y \\ \hat e_z \end{array} \right] = \left[ \begin{array}{ccc} 0 & \alpha & \beta \\ -\alpha & 0 & \gamma \\ -\beta & -\gamma & 0 \end{array} \right] \left[ \begin{array}{c} \hat e_x \\ \hat e_y \\ \hat e_z \end{array} \right] \; .\]

The antisymmetry of the pre-multiplying matrix is a consequence of the unit length of each member of the set and their mutual orthogonality. In other words, the orthonormality,

\[ \hat e_i \cdot \hat e_j = \delta_{ij} \; ,\]

of the set leads to the relation

\[ \frac{d}{dt} \left( \hat e_i \cdot \hat e_j \right) = 0 \;. \]

The above relation has two very different outcomes depending on what the indices $$i, j$$ are. If they are identical, then the zero on the right-hand side reflects the unit length of each of the basis vectors. When they are different, the zero reflects the fact that they are perpendicular.

To be concrete, consider the case where $$i=j=x$$. Expanding the derivative in this case leads to the relation that the time-rate-of-change of each unit vector is perpendicular to the unit vector itself:

\[ \frac{d}{dt} \left(\hat e_x \cdot \hat e_x \right) = \frac{d \hat e_x}{dt} \cdot \hat e_x + \hat e_x \cdot \frac{d \hat e_x}{dt} = 2 \hat e_x \cdot \frac{d \hat e_x}{dt} = 0\; .\]

A consequence of this relationship, and of the fact that the set spans the space, is that the time derivative of $$\hat e_x$$ must be expressible as

\[\frac{d \hat e_x}{dt} = \alpha \hat e_y + \beta \hat e_z \; , \]

where $$\alpha$$ and $$\beta$$ can be determined as shown below.

Next consider the case where $$i=x$$ and $$j=y$$. A simple expansion yields

\[\frac{d}{dt}\left( \hat e_x \cdot \hat e_y \right) = \frac{d \hat e_x}{dt} \cdot \hat e_y + \hat e_x \cdot \frac{d \hat e_y}{dt} = 0 \; ,\]

from which one can conclude that

\[ \frac{d \hat e_x}{dt} \cdot \hat e_y = – \hat e_x \cdot \frac{d \hat e_y}{dt} \; .\]

The antisymmetry of the matrix follows immediately.
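As a concrete illustration (a numerical sketch; the uniform rotation about the z-axis and the rate $$\Omega$$ are illustrative choices), one can build a rotating frame, difference it in time, and verify that the matrix of expansion coefficients is antisymmetric:

```python
import numpy as np

# A frame rotating about the z-axis at rate Omega. The matrix
# M[i, j] = (d e_i/dt) . e_j of expansion coefficients is antisymmetric.
Omega = 0.7
def frame(t):
    c, s = np.cos(Omega*t), np.sin(Omega*t)
    return np.array([[c, s, 0.0],     # rows are e_x, e_y, e_z
                     [-s, c, 0.0],
                     [0.0, 0.0, 1.0]])

t0, h = 0.3, 1e-6
dE = (frame(t0 + h) - frame(t0 - h))/(2*h)   # central-difference derivative of each row
M = dE @ frame(t0).T                         # project onto the frame itself

assert np.allclose(M, -M.T, atol=1e-8)       # antisymmetric, as claimed
assert np.isclose(M[0, 1], Omega, atol=1e-8) # here alpha = Omega, beta = gamma = 0
```

For this simple rotation only one of the three functions is non-zero, a first hint of the minimal frames discussed below.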

On the surface of it, the above analysis suggests that three functions, $$\alpha, \beta, \gamma$$ are needed to fully specify the motion of a frame. A deeper analysis shows that there are cases where one of these functions can be set identically to zero. I refer to frames of this sort as being minimal frames.

A minimal frame may arise in two ways, either the particular choice of motion for the frame results in a simplification, or the frame is intrinsically minimal, regardless of how the basis vectors twist and turn.

The interest in this column is the latter case (although the former will be touched upon).

The scenario that will be analyzed is one where the basis vectors are defined locally on a curve, typically in terms of the position, $$\vec r(t)$$, and the velocity, $$\vec v(t)$$. Since normalization plays an important role, a wise approach is to work out the derivatives of functions of vector norms in generic terms and then apply them widely and fruitfully.

The prototype function is the norm itself, expressed, in terms of an arbitrary vector $$\vec A$$, as

\[ |\vec A | = \sqrt{ A_x^2 + A_y^2 + A_z^2 } \; .\]

The partial derivative of the norm with respect to one of the components is given by

\[ \frac{\partial |\vec A|}{\partial A_i} = \frac{1}{2} \left( A_x^2 + A_y^2 + A_z^2 \right)^{-1/2} \frac{\partial}{\partial A_i} \left( A_x^2 + A_y^2 + A_z^2 \right) = \frac{A_i}{|\vec A|} \; .\]

Unitizing a vector involves dividing by the norm, so the corresponding derivative,

\[ \frac{\partial}{\partial A_i} \frac{1}{|\vec A|} = -\frac{A_i}{|\vec A|^3} \; ,\]

is also handy to have lying around.

From these basic pieces, total time derivatives are constructed from the chain rule as

\[ \frac{d}{dt} |\vec A| = \frac{\partial |\vec A|}{\partial A_i} \frac{d A_i}{dt} = \frac{A_i {\dot A}_i}{|\vec A|} \]

and

\[ \frac{d}{dt} \frac{1}{|\vec A|} = – \frac{A_i {\dot A}_i}{|\vec A|^3} \; .\]
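Both derivative formulas are easy to verify symbolically (a small sympy sketch):

```python
import sympy as sp

# Check the derivative formulas for the norm and the inverse norm.
Ax, Ay, Az = sp.symbols('A_x A_y A_z', positive=True)
norm = sp.sqrt(Ax**2 + Ay**2 + Az**2)

assert sp.simplify(sp.diff(norm, Ax) - Ax/norm) == 0        # d|A|/dA_i = A_i/|A|
assert sp.simplify(sp.diff(1/norm, Ax) + Ax/norm**3) == 0   # d(1/|A|)/dA_i = -A_i/|A|^3
```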

All the needed tools are at our fingertips so let’s dig in.

The most famous minimal frame is the Frenet-Serret, defined by the set

\[ \hat T = \frac{\vec v}{|\vec v|} \; ,\]

\[ \hat N = \frac{d \hat T}{d t} / \left| \frac{d \hat T}{d t} \right | \; , \]

and

\[ \hat B = \hat T \times \hat N \; . \]

By definition, the derivative of $$\hat T$$ is expressed solely in terms of $$\hat N$$, and the Frenet-Serret frame is minimal by construction. It is conventional to rescale the derivatives to express them in terms of arc-length with

\[ \frac{d}{dt} \hat T = \frac{d \hat T}{d s} \frac{ds}{dt} = v \frac{d \hat T}{d s} \]

as the key formula. Once the conversion has been performed, the definition of $$\hat N$$ takes a particularly simple form

\[ \frac{d}{d s} \hat T \equiv \kappa \hat N \; . \]

The other derivatives follow suit, giving the well-known Frenet-Serret relations

\[ \frac{d}{d s} \hat N = – \kappa \hat T + \tau \hat B \]

and

\[ \frac{d}{d s} \hat B = -\tau \hat N \; .\]

The Frenet-Serret frame is not the only useful moving frame. Within the field of astrodynamics, there are several moving frames that help in understanding the motion of heavenly and man-made satellites. Two of the most useful are the VBN and RIC frames.

The VBN frame is defined as

\[ \hat V = \frac{\vec v}{|\vec v|} \; ,\]

\[ \hat N = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \;, \]

and

\[ \hat B = \hat V \times \hat N \; .\]

Patterned in concept after the Frenet-Serret frame, the VBN frame’s $$\hat V$$ vector is the same as $$\hat T$$. However, VBN differs in one essential way: its $$\hat N$$ points along the instantaneous angular momentum vector of the trajectory. This means that there is always a roll angle about $$\hat T \equiv \hat V$$ needed to bring the two frames into alignment. As a result, VBN is not a minimal frame.

The proof of this result starts from the computation of the time derivative of $$\hat V$$ given by

\[ \frac{d \hat V}{dt} = \frac{d}{dt} \frac{\vec v}{|\vec v|} = \frac{\vec a}{|\vec v|} – \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^3} \right) \; ,\]

using the formulas derived above.

As a check that the computation was carried through correctly, note that time derivative of $$\hat V$$ is perpendicular to $$\hat V$$ itself:

\[ \frac{d \hat V}{dt} \cdot \hat V = \frac{\vec a \cdot \vec v}{|\vec v|^2} – \vec v \cdot \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^4} \right) = 0 \; .\]

The time derivatives of $$\hat N$$ and $$\hat B$$ are straightforward but tedious and are not needed to demonstrate that the frame is non-minimal. All that is needed is to take the scalar product of the time derivative of $$\hat V$$ with these vectors and then to use the antisymmetry of the matrix.

Doing so with $$\hat N$$ yields

\[ \frac{d \hat V}{dt} \cdot \hat N = \frac{\vec a \cdot (\vec r \times \vec v) }{|\vec v||\vec r \times \vec v|} – \vec v \cdot (\vec r \times \vec v) \left( \frac{\vec v \cdot \vec a}{|\vec v|^3 |\vec r \times \vec v|} \right) \; . \]

The second term vanishes by the cyclic property of the triple-scalar product and the first can be transformed using the same rule into

\[ \frac{d \hat V}{dt} \cdot \hat N = \frac{\vec v \cdot (\vec a \times \vec r) }{|\vec v||\vec r \times \vec v|} \; . \]

Only in the special case of central forces, where $$\vec a$$ is parallel to $$\vec r$$, is this derivative zero; generically it is not.

The final step is to replace $$\hat N$$ with $$\hat B$$ in the scalar product. This yields

\[ \frac{d \hat V}{dt} \cdot \hat B = \frac{d \hat V}{dt} \cdot (\hat V \times \hat N) = -\hat V \cdot \left(\frac{d \hat V}{dt} \times \hat N \right) \; .\]

Expanding the right-hand side of this relation gives

\[ \frac{d \hat V}{dt} \cdot \hat B = -\frac{ \vec v \cdot [\vec a \times (\vec r \times \vec v)] }{ |\vec v|^2 |\vec r \times \vec v |} + \frac{ (\vec v \cdot \vec a) \, \vec v \cdot [\vec v \times (\vec r \times \vec v)]}{|\vec v|^4 |\vec r \times \vec v|} \; .\]

The second term is identically zero, as can be seen by expanding using the BAC-CAB rule

\[ \vec v \cdot [\vec v \times (\vec r \times \vec v)] = \vec v \cdot \left[ \vec r (\vec v \cdot \vec v) – \vec v (\vec v \cdot \vec r) \right ] = 0 \; .\]

Using the same technique, one concludes that the first term is generally not zero

\[ \vec v \cdot [\vec a \times (\vec r \times \vec v)] = \vec v \cdot \left[ \vec r (\vec v \cdot \vec a) – \vec v (\vec a \cdot \vec r) \right ] \; ,\]

since, generally, the acceleration need not cause a cancellation. (For a central force, where $$\vec a$$ is parallel to $$\vec r$$, it is instead the scalar product with $$\hat N$$ computed above that vanishes, and VBN becomes a minimal frame.)
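The non-minimality is easy to exhibit numerically on a sample trajectory. The sketch below uses the arbitrary curve $$\vec r(t) = (t, t^2, t^3)$$ and an arbitrary sample time as illustrative choices; it evaluates the triple-product form of $$d\hat V/dt \cdot \hat N$$ and shows it does not vanish.

```python
import numpy as np

# On the sample curve r(t) = (t, t^2, t^3), the product
# dV_hat/dt . N_hat = v.(a x r)/(|v||r x v|) is nonzero: VBN is not minimal here.
def r(t): return np.array([t, t**2, t**3])
def v(t): return np.array([1.0, 2*t, 3*t**2])
def a(t): return np.array([0.0, 2.0, 6*t])

t0 = 0.8                                   # an arbitrary sample time
rt, vt, at = r(t0), v(t0), a(t0)
L = np.cross(rt, vt)
N_hat = L/np.linalg.norm(L)
dV = at/np.linalg.norm(vt) - vt*(vt @ at)/np.linalg.norm(vt)**3

triple = vt @ np.cross(at, rt)/(np.linalg.norm(vt)*np.linalg.norm(L))
assert np.isclose(dV @ N_hat, triple)      # matches the triple-product form
assert abs(dV @ N_hat) > 1e-3              # and it does not vanish
```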

The other, commonly-used astrodynamics frame is the RIC frame, defined by

\[ \hat R = \frac{\vec r}{|\vec r|} \; ,\]

\[ \hat C = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \; ,\]

and

\[ \hat I = \hat C \times \hat R \; .\]

Note that $$\hat C$$ is the same as VBN’s $$\hat N$$. The RIC frame is minimal, as can be seen by computing the time derivative of $$\hat C$$,

\[ \frac{d}{dt} \hat C = \frac{ \vec r \times \vec a}{|\vec r \times \vec v|} – \frac{ \vec r \times \vec v}{|\vec r \times \vec v|^3} \left[ (\vec r \times \vec v) \cdot (\vec r \times \vec a) \right] \; , \]

and then forming its scalar product with $$\hat R$$,

\[ \frac{d \hat C}{d t} \cdot \hat R = A (\vec r \times \vec a ) \cdot \vec r + B (\vec r \times \vec v) \cdot \vec r \; ,\]

where $$A$$ and $$B$$ are scalar functions whose form is irrelevant for this discussion.

Exploiting the cyclic property of the triple scalar product gives

\[ \frac{d \hat C}{d t} \cdot \hat R = 0 \; .\]

This means that the time derivative of $$\hat C$$ is proportional only to $$\hat I$$ thus proving that the RIC frame is minimal.
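The vanishing of this scalar product can also be confirmed numerically on a sample trajectory. The curve $$\vec r(t) = (t, t^2, t^3)$$ and the sample time below are illustrative choices.

```python
import numpy as np

# Finite-difference check that dC_hat/dt has no component along R_hat
# (or along C_hat itself): the RIC frame is minimal.
def r(t): return np.array([t, t**2, t**3])
def v(t): return np.array([1.0, 2*t, 3*t**2])

def C_hat(t):
    L = np.cross(r(t), v(t))
    return L/np.linalg.norm(L)

t0, h = 0.8, 1e-6
dC = (C_hat(t0 + h) - C_hat(t0 - h))/(2*h)   # central-difference derivative
R_hat = r(t0)/np.linalg.norm(r(t0))

assert abs(dC @ R_hat) < 1e-6                # no R_hat component
assert abs(dC @ C_hat(t0)) < 1e-6            # unit length preserved
```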

Relative Motion and Waves Redux

Last month’s column dealt with the transformation of the wave equation under a Galilean transformation. Under this transformation, a wave moving with speed $$c$$ in a medium (e.g. a string or slinky or whatever) itself moving with speed $$v$$ with respect to some observer will be seen by that observer to be moving with speed $$c \pm v$$ depending on direction. To get this result, some clever manipulations had to be made that related a set of mixed partial derivatives in one frame with respect to the other.

It is natural enough to ask if the wave equation admits other transformations and, if so, whether any of them represent anything physical. The idea lurking around is special relativity and the famous null result of the Michelson-Morley interferometer. But for the sake of this post, we will assume only simple physics, dictated by symmetry, and will discover that the Lorentz transformation falls out of the mathematical constraints imposed by the wave equation.

The situation is the same as in the previous post, with an observer $${\mathcal O}’$$ seeing the wave generated in the frame of $${\mathcal O}$$.

This time the transformation used will be symmetric in space and time, with the form

\[ x’ = \gamma( x + vt ) \]

and

\[ t’ = \gamma( t + \alpha x) \; .\]

The term $$\gamma$$ is a dimensionless quantity (whose value, at this point, may be unity) while $$\alpha$$ has units of an inverse velocity. Both of them will be determined from the transformation and some simple physical principles.

Assume, as before, that the wave equation as described by $${\mathcal O}$$ is given by

\[ c^2 \frac{\partial^2 f}{\partial x^2} – \frac{\partial^2 f}{\partial t^2} = 0 \; .\]

The transformation of the partial derivative with respect to the spatial variable in the unprimed to the primed frame is obtained through the chain rule

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial x’} \frac{\partial x’}{\partial x} + \frac{\partial f}{\partial t’} \frac{\partial t’}{\partial x} \; , \]

with an analogous relation for the partial derivatives with respect to time.

Using the transformation above gives

\[ \frac{\partial f}{\partial x} = \gamma \frac{\partial f}{\partial x’} + \gamma \alpha \frac{\partial f}{\partial t’} \]

and

\[ \frac{\partial f}{\partial t} = \gamma v \frac{\partial f}{\partial x’} + \gamma \frac{\partial f}{\partial t’} \; .\]

The second partial derivatives are obtained in the same way, but because of the transformation, four terms result from the expansion of

\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x} \left( \gamma \frac{\partial f}{\partial x’} + \gamma \alpha \frac{\partial f}{\partial t’} \right) \]

rather than 2 or 3 in the Galilean case considered before.

Assuming that the mixed partials commute, the middle two terms combine to give

\[ \frac{\partial^2 f}{\partial x^2} = \gamma^2 \frac{\partial^2 f}{\partial x’^2} + 2 \gamma^2 \alpha \frac{\partial^2 f}{\partial x’ \partial t’} + \gamma^2 \alpha^2 \frac{\partial^2 f}{\partial t’^2} \; .\]

Likewise, the second partial derivative with respect to time

\[ \frac{\partial^2 f}{\partial t^2} = \frac{\partial}{\partial t} \left( \gamma v \frac{\partial f}{\partial x’} + \gamma \frac{\partial f}{\partial t’} \right) \]

becomes

\[ \frac{\partial^2 f}{\partial t^2} = \gamma^2 v^2 \frac{\partial^2 f}{\partial x’^2} + 2 \gamma^2 v \frac{\partial^2 f}{\partial t’ \partial x’} + \gamma^2 \frac{\partial^2 f}{\partial t’^2} \; .\]

Substituting these results into the original wave equation leads to a wave equation expressed in $${\mathcal O}’$$’s frame of

\[ c^2 \frac{\partial^2 f}{\partial x^2} – \frac{\partial^2 f}{\partial t^2} = \gamma^2(c^2 – v^2) \frac{\partial^2 f}{\partial x’^2} – \gamma^2 \left(1 – \alpha^2 c^2 \right) \frac{\partial^2 f}{\partial t’^2} \\ + 2 \gamma^2(\alpha c^2 – v) \frac{\partial^2 f}{\partial t’ \partial x’} = 0 \; .\]

There is no need to coax the mixed partial derivative into a different form (as was required in the Galilean case) since a simple selection of

\[ \alpha c^2 – v = 0 \]

eliminates the term entirely. Solving this equation determines the unknown term $$\alpha$$ as

\[ \alpha = \frac{v}{c^2} \; ,\]

leaving the wave equation to take the invariant form of

\[ c^2 \frac{\partial^2 f}{\partial x’^2} – \frac{\partial^2 f}{\partial t’^2} = 0 \; .\]

In other words, the speed of the wave is constant regardless of how either the source or the observer move with respect to each other.

The only unknown term, $$\gamma$$, can be determined by the requirement that the transformation be invertible and that the determinant of the matrix be unity. The reason for the first requirement rests with the fact that there is no preferred frame – that the physics described on one frame can be related to the other no matter which frame is picked as the starting point. Failure to impose the second requirement would mean that the physics in one frame could be changed simply by transforming to another frame and then transforming back.

Since the transformation can be written in matrix form as

\[ \left[ \begin{array}{c} x’ \\ t’ \end{array} \right] = \left[ \begin{array}{cc} \gamma & \gamma v \\ \gamma \frac{v}{c^2} & \gamma \end{array} \right] \left[ \begin{array}{c} x \\ t \end{array} \right] \]

then the requirement of a unit determinant means

\[ \gamma^2 – \gamma^2 \frac{v^2}{c^2} = 1 \; .\]

Solving for $$\gamma$$ gives the famous Lorentz factor

\[ \gamma = \frac{1}{\sqrt{1-v^2/c^2}} \; .\]

Note that in the limit as $$v \rightarrow 0$$, $$\gamma \rightarrow 1$$ and $$\alpha \rightarrow 0$$, thus restoring the Galilean transformation.
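The whole construction can be verified symbolically. The sketch below checks the unit determinant and then confirms that a sample wave moving at speed $$c$$ in the primed frame (the sine profile is an illustrative choice) still satisfies the unprimed wave equation.

```python
import sympy as sp

x, t, v, c = sp.symbols('x t v c', positive=True)
gamma = 1/sp.sqrt(1 - v**2/c**2)
xp = gamma*(x + v*t)             # x' = gamma (x + v t)
tp = gamma*(t + v*x/c**2)        # t' = gamma (t + alpha x), alpha = v/c^2

# The transformation matrix has unit determinant...
M = sp.Matrix([[gamma, gamma*v], [gamma*v/c**2, gamma]])
assert sp.simplify(M.det()) == 1

# ...and a wave moving at speed c in the primed frame, rewritten in the
# unprimed coordinates, still satisfies the unprimed wave equation.
f = sp.sin(xp - c*tp)
assert sp.simplify(c**2*sp.diff(f, x, 2) - sp.diff(f, t, 2)) == 0
```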

Thus, starting only with a symmetry principle that treats space and time on an equal footing and some simple physical requirements on the wave equation, we find a transformation that predicts the speed of the wave to be the same for all observers. This is exactly what is seen in experiments, and the resulting transformation is the gateway into the full machinery of special relativity.

Relative Motion and Waves

Wave motion is ubiquitous: from the sound waves we hear, to waves on a string instrument, from water waves in the ocean, to light waves from the nearest lamp. The universe presents countless examples of wave phenomena. And yet wave motion is often difficult to understand. This month’s column discusses an interesting aspect of waves that I had not fully appreciated until I was casually reading through Electricity and Magnetism for Mathematicians: A Guided Path from Maxwell’s Equations to Yang-Mills by Thomas A. Garrity.

Bought on a whim, I thought I might get some insight into how mathematicians see electricity and magnetism, specifically, and physics, in general, from the book. Their perspective is generally quite different from that of physicists, since their focus is mostly on rigor and the machinery (how a conclusion is obtained) rather than the particular physical basis for that conclusion. In the past, I have uncovered some useful nuggets on physics from similar texts.

Well, I didn’t have to wait long to uncover one from Garrity. In his chapter 3 on electromagnetic waves, I found a derivation of the behavior of the wave equation under Galilean transformations that I hadn’t seen or, perhaps, hadn’t appreciated before.

Physics textbook discussions of the Galilean transformation of the wave equation carry through the transformation of the partial derivatives from one frame to the other. In doing so, a mixed second-order term emerges, and the derivations I’ve seen stop there with the statement that Galilean transformations ruin the form of the wave equation. But, as Garrity shows, there is an additional step that one can take, and that one step reveals a lot of structure.

To start, consider the wave equation in one dimension

\[ c^2 \frac{\partial^2 f}{\partial x^2} – \frac{\partial^2 f}{\partial t^2} = 0 \; .\]

As is well known, a function of the form $$f(x \pm ct)$$ is a solution to the wave equation. This is most easily seen by defining $$p_{\pm} = x \pm c t$$ and using the chain rule where each of the space or time derivatives is mapped to derivatives with respect to $$p_{\pm}$$.

The first derivatives are

\[ \frac{\partial f}{\partial x} = \frac{d f}{d p_{\pm}}\frac{\partial p_{\pm}}{\partial x} = \frac{d f}{d p_{\pm}} \]

and

\[ \frac{\partial f}{\partial t} = \frac{d f}{d p_{\pm}}\frac{\partial p_{\pm}}{\partial t} = \pm c \frac{d f}{d p_{\pm}} \; .\]

The second derivatives are

\[ \frac{\partial^2 f}{\partial x^2} = \frac{d}{d p_\pm} \left( \frac{d f}{d p_{\pm}} \right)\frac{\partial p_{\pm}}{\partial x} = \frac{d^2 f}{d p_{\pm}^2} \]

and

\[ \frac{\partial^2 f}{\partial t^2} = \frac{d}{d p_{\pm}} \left( \pm c \frac{d f}{d p_{\pm}} \right)\frac{\partial p_{\pm}}{\partial t} = c^2 \frac{d^2 f}{d p_{\pm}^2} \; .\]

Using this construction, arbitrarily complex waveforms can be handled without the bother of decomposing them into their Fourier components. For example, the Gaussian pulse

\[ f(x-ct) = e^{-(x-ct)^2} \]

is an exact solution of the wave equation, describing a pulse that propagates to the right with speed $$c$$.

pulse_propagation
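That the Gaussian pulse really does solve the wave equation can be confirmed numerically. A minimal Python sketch (the evaluation points and finite-difference step are arbitrary choices, in units where $$c = 1$$):

```python
import math

def f(x, t, c=1.0):
    """The right-moving Gaussian pulse f(x - ct)."""
    return math.exp(-(x - c * t) ** 2)

def wave_residual(x, t, c=1.0, h=1e-4):
    """c^2 f_xx - f_tt via central finite differences; ~0 for a true solution."""
    fxx = (f(x + h, t, c) - 2 * f(x, t, c) + f(x - h, t, c)) / h**2
    ftt = (f(x, t + h, c) - 2 * f(x, t, c) + f(x, t - h, c)) / h**2
    return c**2 * fxx - ftt

# the residual is tiny at any sample point (limited only by the step size h)
print(abs(wave_residual(0.3, 0.1)) < 1e-5)   # True
print(abs(wave_residual(-1.2, 0.7)) < 1e-5)  # True
```

The same check works for any smooth profile $$f(x \pm ct)$$, not just a Gaussian, which is the whole point of the chain-rule argument above.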

Now suppose that this pulse propagates along a long string at rest in the frame of an observer $${\mathcal O}$$ and that it persists for a long period of time (i.e. $$c \, t \ll L$$, where $$L$$ is the length of the string), so that it can be properly observed. Further, suppose that the string is aboard a moving train that travels with velocity $$\pm v$$ with respect to an observer $${\mathcal O}'$$ who watches from the ground. The ground-based observer $${\mathcal O}'$$ should be able to conclude that the pulse propagates at speed $$c \pm v$$, without knowing anything about strings, waves, or calculus. The operative question thus becomes whether or not the wave equation supports that conclusion.

moving_observers

To decide, the behavior of the wave equation under a Galilean transformation is required.

The transformations linking observations made by $${\mathcal O}’$$ to those made by $${\mathcal O}$$ are

\[ x’ = v t + x \]

and

\[ t’ = t \; ,\]

with the inverse transformations given by

\[ x = x’ – v t’ \]

and

\[ t = t’ \; .\]

The strategy consists of using the chain rule to express the derivatives in observer $${\mathcal O}$$’s frame in terms of those in the frame of $${\mathcal O}’$$. The first derivatives are:

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial x’}\frac{\partial x’}{\partial x} + \frac{\partial f}{\partial t’}\frac{\partial t’}{\partial x} = \frac{\partial f}{\partial x’} \]

and

\[ \frac{\partial f}{\partial t} = \frac{\partial f}{\partial x’}\frac{\partial x’}{\partial t} + \frac{\partial f}{\partial t’}\frac{\partial t’}{\partial t} = v \frac{\partial f}{\partial x’} + \frac{\partial f}{\partial t’} \; . \]

The second spatial derivative follows immediately as

\[ \frac{\partial^2 f}{\partial x^2} = \frac{ \partial^2 f}{\partial x’^2} \; .\]

The second time derivative is a bit more complicated and is best handled in steps. Substitute in for the first derivative to get

\[ \frac{\partial^2 f}{\partial t^2} = \frac{\partial }{\partial t} \left( v \frac{\partial f}{\partial x’} + \frac{\partial f}{\partial t’} \right) \; .\]

Treating the term in parentheses as a new function, the rule used above applies again; a subsequent expansion and collection of terms arrives at

\[ \frac{\partial^2 f}{\partial t^2} = v^2 \frac{\partial^2 f}{\partial x’^2} + 2 v \frac{\partial^2 f}{\partial x’ \partial t’} + \frac{\partial^2 f}{\partial t’^2} \; .\]

Combining the two derivatives gives the wave equation in the frame of $${\mathcal O}’$$ as

\[ (c^2 – v^2) \frac{\partial^2 f}{\partial x’^2} – 2 v \frac{\partial ^2 f} {\partial x’ \partial t’} – \frac{\partial^2 f}{\partial t’^2} = 0 \; . \]

Note the mixed second-order term; this is exactly where most textbooks stop.
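The transformed equation itself can be sanity-checked numerically before going further. The Python sketch below (the frame speed $$v$$ and the evaluation point are illustrative choices, in units where $$c = 1$$) writes the right-moving Gaussian pulse in the primed coordinates and verifies that it satisfies the equation above, mixed term and all:

```python
import math

c, v, h = 1.0, 0.3, 1e-4  # wave speed, frame speed (illustrative), FD step

def F(xp, tp):
    """The right-moving pulse f(x - ct), rewritten in O' coordinates
    via the inverse transformation x = x' - v t', t = t'."""
    x, t = xp - v * tp, tp
    return math.exp(-(x - c * t) ** 2)

def Fxx(xp, tp):
    return (F(xp + h, tp) - 2 * F(xp, tp) + F(xp - h, tp)) / h**2

def Ftt(xp, tp):
    return (F(xp, tp + h) - 2 * F(xp, tp) + F(xp, tp - h)) / h**2

def Fxt(xp, tp):
    return (F(xp + h, tp + h) - F(xp + h, tp - h)
            - F(xp - h, tp + h) + F(xp - h, tp - h)) / (4 * h**2)

# residual of (c^2 - v^2) F_x'x' - 2 v F_x't' - F_t't' = 0;
# it vanishes up to finite-difference error
xp, tp = 0.2, 0.1
residual = (c**2 - v**2) * Fxx(xp, tp) - 2 * v * Fxt(xp, tp) - Ftt(xp, tp)
print(abs(residual) < 1e-5)  # True
```

So the transformed equation is still a perfectly good equation for the pulse; it just isn't manifestly a wave equation yet.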

But for a right moving pulse, $$f(x-c\,t)$$, the first derivatives are related by

\[ \frac{\partial f}{\partial x} = \frac{d f}{d p_-} \; ,\]

\[ \frac{\partial f}{\partial t} = -c \frac{d f}{d p_-} \; ,\]

and

\[ c \frac{\partial}{\partial x} = -\frac{\partial}{\partial t} \; .\]

Using this relation eliminates that mixed term. First, using the rules derived above, eliminate the partial derivatives in the unprimed frame in favor of those in the primed frame to get

\[ c \frac{\partial}{\partial x’} = – v \frac{\partial}{\partial x’} – \frac{\partial}{\partial t’} \; .\]

Lastly, eliminate the time derivative in the second-order mixed term in favor of the space derivative to yield

\[ (c^2 – v^2) \frac{\partial^2 f}{\partial x’^2} + 2(c+v)v \frac{\partial^2 f}{\partial x’^2} – \frac{\partial^2 f}{\partial t’^2} = 0 \; .\]

The two terms multiplying the second-order space derivative can be combined

\[ c^2 – v^2 + 2cv + 2 v^2 = c^2 + 2cv + v^2 = (c+v)^2 \]

and the wave equation becomes

\[ (c+v)^2 \frac{\partial^2 f}{\partial x’^2} – \frac{\partial^2 f}{\partial t’^2} = 0 \; .\]

From basic notions of relative motion, observer $${\mathcal O}’$$ would expect a right-moving wave moving with speed $$c$$ in $${\mathcal O}$$’s frame to move at speed $$c+v$$ and that is exactly what the wave equation asserts.

For a left-moving pulse ($$f = f(x+ct)$$), the signs change a bit:

\[ \frac{\partial f}{\partial x} = \frac{d f}{d p_+} \; ,\]

\[ \frac{\partial f}{\partial t} = + c \frac{d f}{d p_+} \; , \]

and

\[ c \frac{\partial}{\partial x} = \frac{\partial}{\partial t} \; .\]

Similar manipulations then lead to the corresponding relation in $${\mathcal O}’$$

\[ \frac{\partial }{\partial t’} = (c-v) \frac{\partial}{\partial x’} \]

and the analogous wave equation

\[ (c-v)^2 \frac{\partial^2 f}{\partial x’^2} – \frac{\partial^2 f}{\partial t’^2} = 0 \; .\]

Again, basic notions of relative motion demand that observer $${\mathcal O}’$$ see a left-moving wave with speed $$c$$ in $${\mathcal O}$$’s frame move at speed $$c-v$$, and that is exactly what the wave equation gives.

Plasma Oscillations

This month’s installment focuses on the aspect of plasma dynamics that is strongly analogous to mechanical vibration: plasma oscillations.  Plasma oscillations were first predicted by Irving Langmuir in the 1920s and later observed in vacuum tubes of the day.

These oscillations manifest themselves as cooperative, sinusoidal movements in a plasma’s charge distribution.  The harmonic nature of these deviations from perfect neutrality is due to an electrostatic restoring force that, for small amplitudes around equilibrium, is well approximated as linear, giving rise to a quadratic, or harmonic, potential.  The analogy with mechanical oscillations is completed by identifying the kinetic energy of the electron and ion centers-of-mass with the macroscopic kinetic energy of the mass on the spring.  Plasma oscillations provide strong support for the basic assumptions of electromagnetic theory and a nice example of how mechanics and electrodynamics meet in a physically relevant situation.

The aim here is to derive the formulae for plasma oscillations in two related cases and then to interpret the results in terms of mechanical vibration using the familiar spring-mass system.

Imagine that there is a slab of plasma of length $$\ell_0$$, with cross-sectional area $$A$$ and an initial number density of ions and electrons of $$n_0$$.  Since the condition of quasi-neutrality is in effect, there is no net electric field in any part of the plasma.  Now imagine that, due to an outside agency, all of the electrons are displaced by a very small amount $$\Delta x$$ to the left, leaving a net positive charge on the right and a net negative charge on the left, with the thickness of each of these two ‘non-neutral’ regions being $$\Delta x$$.

plasma-oscillations

How does the plasma respond to this out-of-equilibrium condition?  The first approximation one can make is that the electrons are free to move but the ions are fixed in space.  This corresponds to the physical approximation that the ions have an infinite mass – an approximation that is well supported by the ~2000:1 ratio between the mass of a proton and that of an electron.

The electrons will feel a strong restoring force (red arrow) and will move accordingly.  The ions feel an equal but opposite force (blue arrow), but since they are fixed, they cannot respond.

The electric field that the electrons and ions feel, which is most easily obtained from Gauss’s law, is non-zero between the strips of positive and negative charge, in a fashion exactly analogous to the textbook problem of the parallel plate capacitor.  The density of ions in the red region is $$n_0$$ and the volume is $$A \Delta x$$.  Since $$\Delta x$$ is small compared to the other dimensions, the electric field is closely approximated as being entirely in the $$x$$-direction, and the total flux through a Gaussian surface whose boundaries coincide with the ion region is due strictly to the left and right faces.  The right face flux is exactly zero since the flux from the electrons exactly cancels the flux from the ions.  The left face flux, which is equal to the total flux, is then

\[ \Phi_E = E A \; , \]

which is proportional to the total charge enclosed

\[ \Phi_E = \frac{e n_0 A \Delta x}{\epsilon_0} \; .\]

Equating gives

\[ E= \frac{e n_0 }{\epsilon_0 } \Delta x \; ,\]

which is the usual result for the field between a parallel plate capacitor ($$n_0 \Delta x$$ is the surface charge density).

As the electrons were all displaced uniformly and are experiencing a uniform force, they will move together as a unit, that is to say cooperatively.  The mass of the unit is simply the mass density $$m_e n_0$$ times the volume occupied, $$V = A \ell_0$$.  The center of mass of the slab is at $$\ell_0/2 – \Delta x$$ with the corresponding acceleration $$\Delta {\ddot x}$$.  The force on the slab is

\[ F = q_{tot} E = -e n_0 V \frac{e n_0}{ \epsilon_0} \Delta x\]

and the equation of motion of the entire slab is given by:

\[ m_e n_0 V \frac{d^2}{dt^2} \Delta x = -e n_0 V \frac{e n_0} {\epsilon_0} \Delta x \; .\]

Dividing out the common factors gives

\[ \frac{d^2}{dt^2} \Delta x = -\frac{e^2 n_0}{m_e \epsilon_0} \Delta x \; .\]

From this equation we would conclude that, for small displacements, the electron slab would oscillate at the angular frequency

\[ \omega_{pe} = \sqrt{ \frac{e^2 n_0}{m_e \epsilon_0} } \; , \]

which is called the electron plasma frequency (note it is really an angular frequency).
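For a sense of scale, the electron plasma frequency is easy to evaluate numerically. A small Python sketch (the density is an illustrative laboratory-plasma value; the physical constants are standard SI values hard-coded here):

```python
import math

e    = 1.602176634e-19   # elementary charge [C]
me   = 9.1093837015e-31  # electron mass [kg]
eps0 = 8.8541878128e-12  # vacuum permittivity [F/m]

def omega_pe(n0):
    """Electron plasma (angular) frequency for number density n0 [m^-3]."""
    return math.sqrt(n0 * e**2 / (me * eps0))

w = omega_pe(1e18)       # illustrative laboratory-plasma density
print(f"{w:.2e} rad/s")
```

For $$n_0 = 10^{18} \, \mathrm{m}^{-3}$$ this gives $$\omega_{pe} \approx 5.6 \times 10^{10}$$ rad/s, corresponding to an ordinary frequency of roughly 9 GHz.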

Relaxing the assumption that the ions don’t move makes the analysis somewhat more complicated but only changes the result in a small but crucial way.  The location of both electron and ion centers-of-mass now matter and must be tracked separately.  The picture of their motion is now

plasma-oscillations_2

Since both species are free to move, the thickness of the exposed electron or ion regions (red and blue) is not determined solely by the movement of their respective centers-of-mass (the ion region pictured above is much larger than $$\Delta x_i$$).  The total thickness of the exposed regions depends on the relative motion between the two centers-of-mass

\[ t = \Delta x_i – \Delta x_e \; .\]

An initial analysis may suggest that there are 8 separate cases: positive or negative displacements for each species (2×2), as well as whether the ions are displaced more than the electrons (×2).  In fact, there are only two cases, based on whether $$t$$ is positive (ion strip is on the right) or negative (ion strip is on the left).  The following table shows the 8 variations for $$\ell = 1$$.

ion_electron_slab_movement

The parameters with the ‘min’ subscript designate the leftmost extent of the electron slab ($$e_{min}$$) and the ion slab ($$i_{min}$$); the parameter $$t_{low} = i_{min} – e_{min}$$.  A negative value, such as in case 1, means that the ion slab has shifted more to the left than the electron slab, so that the exposed ion region (red) is on the left while the exposed electron region (blue) is on the right.  A positive value of $$t_{low}$$ means the converse.  The corresponding parameters at the rightmost side are defined in an analogous fashion.  The color coding assists in visualizing which species is in which location.  Note that the entire table can be summarized by $$\Delta x_i – \Delta x_e$$, whose magnitude and sign match $$t_{low}$$ and $$t_{high}$$.  Also note that the displacements in the table are purely geometric and that the actual dynamic displacements would not shift the total center-of-mass, since the only forces in the problem are internal.

The electric field in between the two uncovered charge regions can also be obtained from Gauss’s law, but it is instructive to use the expression for the electric field due to a sheet of charge together with superposition.  The electric field due to the ions will have a magnitude of

\[ E_i = \frac{e n_0}{2 \epsilon_0} \left(\Delta x_i – \Delta x_e \right) \]

using the standard result from elementary E&M.

Likewise, the electric field due to the electrons will have the same magnitude, $$E_e = E_i$$.  The two fields point in the same direction in the region between the strips (the neutral plasma) and in opposite directions outside, so a non-zero field with magnitude $$2 E_i$$ results only in the middle region.  Which direction the field points depends on which strip is on the right versus the left, but this does not matter for the dynamical analysis.

The equation of motion for each slab is then given by

\[ m_i \frac{d^2}{dt^2} \Delta x_i = -\frac{e^2 n_0}{\epsilon_0} \left(\Delta x_i – \Delta x_e\right) \]

and

\[ m_e \frac{d^2}{dt^2} \Delta x_e = \frac{e^2 n_0}{\epsilon_0} \left(\Delta x_i – \Delta x_e\right) \; . \]

Combining these two equations gives

\[ \frac{d^2}{dt^2} \left( \Delta x_i – \Delta x_e \right) = -\frac{e^2 n_0}{\epsilon_0}\left(\frac{1}{m_e} + \frac{1}{m_i} \right) \left(\Delta x_i – \Delta x_e\right) \]

from which we conclude that the frequency is now

\[ \omega_p^2 = \frac{e^2 n_0}{\epsilon_0}\left(\frac{1}{m_e} + \frac{1}{m_i} \right) = \frac{e^2 n_0}{\epsilon_0 m_e} + \frac{e^2 n_0}{\epsilon_0 m_i} \equiv \omega_{pe}^2 + \omega_{pi}^2 \; .\]
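A quick numerical look (a Python sketch; the ions are taken to be protons and the density is an illustrative value) shows just how small the ion correction is:

```python
import math

e, eps0 = 1.602176634e-19, 8.8541878128e-12   # SI values
me, mi = 9.1093837015e-31, 1.67262192369e-27  # electron and proton masses [kg]
n0 = 1e18                                     # illustrative density [m^-3]

w_pe2 = n0 * e**2 / (eps0 * me)  # electron contribution, omega_pe^2
w_pi2 = n0 * e**2 / (eps0 * mi)  # ion contribution, omega_pi^2
w_p = math.sqrt(w_pe2 + w_pi2)

print(round(math.sqrt(w_pi2 / w_pe2), 4))  # 0.0233  (= sqrt(me/mi))
print(round(w_p / math.sqrt(w_pe2), 6))    # 1.000272
```

The ratio $$\omega_{pi}/\omega_{pe} = \sqrt{m_e/m_i} \approx 1/43$$ for protons, so the combined frequency exceeds the electron plasma frequency by only about 0.03 percent.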

The above analysis has a direct analog in a mechanical system of two unequal masses $$m_1$$ and $$M_2$$ connected by a spring of spring constant $$k$$ and equilibrium length $$\ell_0$$.

coupled_oscillators

Defining the deviations of each mass from the equilibrium positions as

\[ \Delta x_1 = x_1 – x_1^{(0)} \]

and

\[ \Delta x_2 = x_2 – x_2^{(0)} \]

the distance between them takes the form

\[ x_2 – x_1 = x_2^{(0)} + \Delta x_2 – x_1^{(0)} – \Delta x_1 \equiv \ell \; .\]

Combining these two definitions gives

\[ x_2 – x_1 = \Delta x_2 – \Delta x_1 + \ell_0 \equiv \delta \ell + \ell_0 \]

from which the potential follows as

\[ V = \frac{1}{2} k (\ell – \ell_0)^2 = \frac{1}{2} k (\Delta x_2 – \Delta x_1)^2 \; .\]

The Lagrangian takes the form

\[ L = \frac{1}{2} m_1 \Delta \dot x_1^2 + \frac{1}{2} M_2 \Delta \dot x_2^2 – \frac{1}{2} k (\Delta x_2 – \Delta x_1)^2 \; , \]

with the equations of motion becoming

\[ \frac{d}{dt} \frac{\partial L}{\partial \Delta \dot x_1} – \frac{\partial L}{\partial \Delta x_1} = m_1 \Delta \ddot x_1 – k(\Delta x_2 – \Delta x_1 ) = 0 \]

and

\[ \frac{d}{dt} \frac{\partial L}{\partial \Delta \dot x_2} – \frac{\partial L}{\partial \Delta x_2} = M_2 \Delta \ddot x_2 + k(\Delta x_2 – \Delta x_1 ) = 0 \; . \]

These equations combine nicely into one expression for the dynamics of the change in the separation between the two masses

\[ \delta \ddot \ell = -\left( \frac{k}{m_1} + \frac{k}{M_2} \right) \delta \ell \; , \]

which immediately tells us that the frequency of the oscillations is given by

\[ \omega^2 = k \left(\frac{1}{m_1} + \frac{1}{M_2} \right) = \omega_1^2 + \omega_2^2 \; .\]

In the limit as $$ M_2 \rightarrow \infty $$, the equation of motion simplifies to

\[ \delta \ddot \ell + \frac{k}{m_1} \delta \ell = 0 \; ,\]

with frequency of

\[ \omega^2 = \frac{k}{m_1} = \omega_1^2 \; .\]
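The frequency formula can also be confirmed by direct simulation. The Python sketch below (masses and spring constant are illustrative values) integrates the two equations of motion with the velocity-Verlet method and recovers the oscillation period from zero crossings of $$\delta \ell$$:

```python
import math

m1, M2, k = 1.0, 3.0, 4.0             # illustrative masses [kg], spring [N/m]
omega = math.sqrt(k * (1/m1 + 1/M2))  # predicted angular frequency

def acc(x1, x2):
    """Accelerations of each mass from the spring force k (dx2 - dx1)."""
    s = k * (x2 - x1)
    return s / m1, -s / M2

# velocity-Verlet integration from a stretched, at-rest initial state
x1, x2, v1, v2 = -0.1, 0.1, 0.0, 0.0
dt, t = 1e-4, 0.0
a1, a2 = acc(x1, x2)
prev, crossings = x2 - x1, []
while len(crossings) < 3:             # collect 3 zero crossings of dl
    x1 += v1 * dt + 0.5 * a1 * dt**2
    x2 += v2 * dt + 0.5 * a2 * dt**2
    na1, na2 = acc(x1, x2)
    v1 += 0.5 * (a1 + na1) * dt
    v2 += 0.5 * (a2 + na2) * dt
    a1, a2 = na1, na2
    t += dt
    dl = x2 - x1
    if prev < 0 <= dl or prev > 0 >= dl:
        crossings.append(t)
    prev = dl

period = crossings[-1] - crossings[0]  # two half-periods between 3 crossings
print(abs(2 * math.pi / period - omega) / omega < 1e-3)  # True
```

The measured frequency agrees with $$\omega^2 = k(1/m_1 + 1/M_2)$$ to better than a tenth of a percent, limited only by the step size and the crossing-detection resolution.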

Thus there is a perfect analogy between the two pictures and the cooperative motion of the electrons and ions within the plasma can be interpreted in strictly mechanical terms.