Latest Posts

Dynamical Systems and Constraints: Part 3 - Simple Pendulum

In the spirit of simplicity embraced by the last post, this month’s analysis also deals with motion in one dimension. As shown in that analysis, the naive application of Newton's laws to motion of a bead along a curved wire failed except in the simplest of cases – that of motion along a straight line (i.e., essentially down an inclined plane). Instead, Lagrange’s method was used where one of the dynamical degrees of freedom was eliminated by direct substitution of the constraint into the Lagrangian. This post examines another (deceptively) simple problem, the motion of a bob at the end of an ideal pendulum, and compares and contrasts direct substitution of the constraint with the use of Lagrange multipliers. The idea is to examine how Lagrange multipliers work and what information they provide in a problem where one can easily understand all the ‘moving parts’. The content of the post is inspired by the fine write-up on Lagrange multipliers by Ted Jacobson for a course on classical mechanics. I've taken Jacobson's basic discussion on Lagrange multipliers and extended it in several directions; nonetheless, the insights on that particular topic are due to him.

To begin, assume that the pendulum bob can move in the plane subject to the forces of gravity and of an inextensible rod of length R to which the bob is attached.

The Lagrangian for arbitrary two-dimensional motion, subjected to gravity along the x-direction, expressed in plane polar coordinates is

 L = \frac{1}{2} m \left( \dot \rho^2 + \rho^2 \dot \phi^2 \right) + m g \rho \cos \phi \; ,

where the zero of the potential energy is taken at the height of the pivot and where the constraint \rho = R has not yet been used.

Before digging into the Lagrangian method, it is worth taking a step back to see how this system is described in the Newtonian picture. The forces on the bob at any given point of the motion in the plane lead to a vector equation built from the well-known form of the acceleration in terms of the radial and azimuthal unit vectors

 \ddot {\vec r} = (\ddot \rho - \rho \dot \phi^2) \hat e_{\rho} + (2 \dot \rho \dot \phi + \rho \ddot \phi) \hat e_{\phi} \; .

Applying Newton's laws to the force vectors shown in the figure gives

 m g \cos \phi - T = m (\ddot \rho - \rho \dot \phi^2 ) \;


 -m g \sin \phi = m ( 2 \dot \rho \dot \phi + \rho \ddot \phi )\; .

Because the radius is fixed at \rho = R, the kinematics of the pendulum require \dot \rho = 0 and \ddot \rho = 0. The first equation provides a mechanism for calculating T as

 T = m g \cos \phi + m R \dot \phi^2 \; .

The second equation gives the classic equation of motion for the pendulum

 R \ddot \phi + g \sin \phi = 0 \; .

These simple equations will help in interpreting the Lagrangian analysis.
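As a quick numerical sanity check of these two equations (a sketch of my own, not part of the original post; the parameter values and initial angle are illustrative), one can integrate the equation of motion and evaluate the tension and the energy along the trajectory:

```python
import numpy as np
from scipy.integrate import odeint

g, R, m = 9.81, 1.0, 1.0          # illustrative values

def pendulum(state, t):
    phi, phidot = state
    return [phidot, -(g/R)*np.sin(phi)]

t = np.linspace(0.0, 10.0, 2001)
sol = odeint(pendulum, [0.5, 0.0], t, rtol=1e-10, atol=1e-12)
phi, phidot = sol[:, 0], sol[:, 1]

# tension from the radial equation: T = m g cos(phi) + m R phidot^2
tension = m*g*np.cos(phi) + m*R*phidot**2

# total energy, which should be constant along the motion
energy = 0.5*m*R**2*phidot**2 - m*g*R*np.cos(phi)
```

For this small-amplitude release the tension stays positive throughout, and the energy is flat to within the integrator tolerance.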

Directly Imposing the Constraint

Since the constraint is holonomic, it can be directly imposed on the Lagrangian, allowing for the radial degree of freedom to be eliminated, yielding

 L = \frac{1}{2} m R^2 \dot \phi^2 + m g R \cos \phi \; .

The conjugate momentum is

 p_{\phi} = m R^2 \dot \phi \; ,

the generalized force is

 Q_{\phi} =  - m g R \sin \phi \; ,

and the resulting equation of motion is

 R \ddot \phi + g \sin \phi = 0 \; ,

exactly the result obtained from Newton's laws, which required computing \hat e_{\rho} and \hat e_{\phi} and slogging through a vector set of equations.

It is worth noting how much easier it was to get to these equations than the path through Newton's laws.

Using Lagrange Multipliers

The alternative way of handling the constraint is by leaving both degrees of freedom (radial and azimuthal) in the Lagrangian, augmenting it with the dynamical constraint. The key is in determining how this augmentation is to be done.

Jacobson argues from analogy with multi-variable calculus. In order to optimize a function f(x,y,z) subject to a constraint C(x,y,z) = 0, one must limit the gradient of the function so that

 df = - \lambda d C \; .

Since C=0,

 \lambda d C = \lambda d C + C d \lambda = d( \lambda C) \; .

This manipulation allows the condition d(f + \lambda C) = 0 to represent an unconstrained optimization problem. In the event there are multiple constraints, each constraint gets a corresponding Lagrange multiplier and is added to the function.

The generalization to variational problems comes by identifying f as the action S. The interesting part is the generalization required to include the constraints. Since the constraint is active at each point of the motion, there is an infinity of constraints, one for each time, and they are included in the action by integration. The resulting unconstrained variational principle is

 \int dt (L + \lambda(t) C(t)) \; .

The new Lagrangian is

 L' = L + \lambda C = \frac{1}{2} m (\dot \rho^2 + \rho^2 \dot \phi^2) + m g \rho \cos \phi + \lambda (\rho - R) \; .

The corresponding radial equation is

 m \ddot \rho = m \rho \dot \phi^2 + m g \cos \phi + \lambda \;

and the azimuthal equation

 m \rho^2 \ddot \phi + 2 m \rho \dot \rho \dot \phi + m g \rho \sin \phi = 0 \; .

To these equations must be added the constraint \rho = R (so that \dot \rho = \ddot \rho = 0), which yields

 \lambda = - m R \dot \phi^2 - m g \cos \phi \;

and

 m R^2 \ddot \phi + m g R \sin \phi = 0 \; ,

the latter reducing to the familiar R \ddot \phi + g \sin \phi = 0. Comparing the expression for \lambda with the Newtonian result for the tension shows that \lambda = -T: the multiplier is the radial component of the force of constraint, the minus sign indicating that the rod pulls the bob inward, along -\hat e_{\rho}.

This is a general feature of Lagrange multipliers. They incorporate the constraint into the variation; we are free to ignore \lambda if we want only the dynamics, or we can use it to determine the force of constraint when that is explicitly desired.
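The augmented-Lagrangian manipulation above can be checked symbolically. A sympy sketch of my own (not part of the original post) forms the radial Euler-Lagrange equation, imposes the constraint, and solves for the multiplier:

```python
import sympy as sp

t = sp.symbols('t')
m, g, R = sp.symbols('m g R', positive=True)
rho = sp.Function('rho')(t)
phi = sp.Function('phi')(t)
lam = sp.Function('lambda')(t)

# augmented Lagrangian L' = L + lambda*(rho - R)
L = (sp.Rational(1, 2)*m*(sp.diff(rho, t)**2 + rho**2*sp.diff(phi, t)**2)
     + m*g*rho*sp.cos(phi) + lam*(rho - R))

def euler_lagrange(L, q):
    return sp.diff(sp.diff(L, sp.diff(q, t)), t) - sp.diff(L, q)

eq_rho = euler_lagrange(L, rho)

# impose the constraint rho = R (so rho' = rho'' = 0) and solve for lambda
eq_c = eq_rho.subs(sp.diff(rho, t, 2), 0).subs(sp.diff(rho, t), 0).subs(rho, R)
lam_sol = sp.solve(eq_c, lam)[0]
# lam_sol works out to -(m R phi'^2 + m g cos(phi)),
# i.e. minus the Newtonian expression for the tension
```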

Dynamical Systems and Constraints: Part 2 - Bead on a Wire

This post begins a multi-part exploration of the role of constraints in mechanical systems. It builds on the general framework discussed in the last post but does so by examining specific examples. This month the focus will be on the simplest example of a constraint: a bead sliding on a wire.

Despite its trivial appearance, this problem is surprisingly complex due to the presence of the constraint and, as will be shown below, provides significant obstacles to obtaining an accurate numerical solution.

The following figure shows a portion of the wire, the position, \vec r, and the velocity, \vec v, of the bead, and the two forces acting upon it; namely the force of gravity {\vec F}_g and the normal force {\vec F}_N. With this information, it is, in principle, easy to solve for the motion of the bead via Newton's laws. There are several equivalent ways of carrying this out but I'll be focusing on the most geometric of these.

The position of a particle in a plane is given by

 \vec r = \left[ \begin{array}{c} x \\ y \end{array} \right] \; .

Since the bead must remain on the wire, only those values of y satisfying y = f(x) are permissible and the position can now be written as

 \vec r = \left[ \begin{array}{c} x \\ f(x) \end{array} \right ] \; .

The velocity is then given by

 \vec v = \left[ \begin{array}{c} 1 \\ f'(x) \end{array} \right] \dot x \; .

The normal to the curve, \hat N, is obtained from the requirement that motion consistent with the constraint must always be perpendicular to the normal to the wire at the point in question:

 \hat N \cdot d\vec r = N_x dx + N_y dy = 0 \;.

Using the constraint equation y - f(x) = 0 and then taking the differential yields the component form

 dy - f'(x) dx = 0 \; ,

from which it is easy to read off the components of the normal (the sign being chosen so that \vec F_N is in opposition to \vec F_g). Subsequent normalization gives

 \hat N = \frac{1}{\sqrt{1+f'^2}} \left[ \begin{array}{c} -f'(x) \\ 1 \end{array} \right] \; .

The forces on the bead are due to gravity

 \vec F_g = - m g \hat y

and the normal force

 \vec F_N = -(\vec F_g \cdot \hat N) \hat N = \frac{m g}{1+f'^2} \left[ \begin{array}{c} -f' \\ 1 \end{array} \right]\;

For notational convenience in what follows, define

 \tau = \frac{1}{1+f'^2} \; .

Summing the forces and inserting them into Newton's second law leads to the following equations (eliminating m from both sides)

 \left[ \begin{array}{c} \ddot x \\ \ddot y \end{array} \right] = \left[ \begin{array}{c} -g \tau f' \\ g (\tau -1 ) \end{array} \right] = \left[ \begin{array}{c} -g \tau f' \\ - g \tau f'^2 \end{array} \right] \; .

Expressing Newton's equations in state-space form then yields

 \frac{d}{dt} \left[ \begin{array}{c} x \\ y\\ \dot x \\ \dot y \end{array} \right] = \left[ \begin{array}{c} \dot x \\ \dot y \\ - g \tau f' \\ - g \tau f'^2 \end{array} \right] \; .

These equations are easy to code and couple to a standard solver. I chose to implement this system in Python and SciPy as follows:

import numpy as np

def roller_coaster(state, t, geometry, g):
    #unpack the state
    x, y, vx, vy = state
    #evaluate the geometry: f(x), f'(x), f''(x)
    f, df, df2 = geometry(x)
    tau = 1.0/(1.0 + df**2)
    #return the state derivative
    return np.array([vx, vy, -g*tau*df, -g*tau*df**2])

where the function geometry can be any curve in the x-y plane. For example, for a bead sliding down a parabolic-shaped wire y = x^2, the corresponding function is

def parabola(x):
    return x**2, 2*x, 2

and the assignment

  geometry = parabola

provides this shape to the solver.
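A self-contained sketch of running this solver on the parabola (my own illustrative initial conditions; the two functions are repeated so the snippet runs on its own), monitoring how well the constraint y = f(x) is maintained:

```python
import numpy as np
from scipy.integrate import odeint

g = 9.81

def parabola(x):
    return x**2, 2*x, 2

def roller_coaster(state, t, geometry, g):
    x, y, vx, vy = state
    f, df, df2 = geometry(x)
    tau = 1.0/(1.0 + df**2)
    return np.array([vx, vy, -g*tau*df, -g*tau*df**2])

t = np.linspace(0.0, 5.0, 1001)
s0 = [1.0, 1.0, 0.0, 0.0]          # released at rest on the wire: y = f(1) = 1
traj = odeint(roller_coaster, s0, t, args=(parabola, g))

# constraint violation y - f(x) along the trajectory
violation = np.abs(traj[:, 1] - parabola(traj[:, 0])[0])
```

The bead starts exactly on the wire, yet the violation grows steadily for the curved wire.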

The first case studied with this formalism was where the wire shape is a simple straight segment, parallel to the edge of an inclined plane and referred to as such for the sake of simplicity. The resulting motion was consistent with the shape

and the energy

was very well conserved.

These good results failed to hold when the shape of the wire was made parabolic. The resulting motion stayed on the parabola for a short time but eventually departed from the constraint by dropping below

while simultaneously showing a large deviation

from the initial energy.

The situation was no different for other geometries tried. The problem lies in the construction of the normal force: projecting out gravity balances only the static load, while on a curved wire the normal force must also supply a centripetal piece proportional to f'' \dot x^2. Lacking that term, the normal force cannot respond as the constraint begins to be violated; the wire is unable to conjure additional force to enforce it. The Newtonian method, while attractive for its simplicity, is simply inadequate for working with all but the most trivial of constraints.

The usual way of addressing this problem is to eliminate one degree of dynamical freedom. Either x or y has to go and the conventional approaches seem to favor x as the parameter that stays. Again, there are several ways to effect this change but in this case both are worth examining.

In the first method, Newton's equations are used to eliminate y in favor of x by combining the equations of motion, first expressing

 -\frac{\ddot x}{g f'} = \tau

and then by inserting the left-hand side into the y equation yielding

 \ddot y + g  = -\frac{\ddot x}{f'} \; .

Solving this gives

 \ddot x = -f'(\ddot y + g)

and then using

 \dot y = \frac{df}{dx} \dot x = f' \dot x


 \ddot y = f' \ddot x + f'' \dot x^2 \; .

The final equation, after substitution for \ddot y has been made and \ddot x has been isolated, is

 \ddot x = - f' \tau (g + f'' \dot x^2 ) \; .

In the second method, the constraint is put into the Lagrangian before forming the Euler-Lagrange equations. The resulting Lagrangian becomes

 L = \frac{1}{2} m (1 + f'^2) \dot x^2 - m g f \; .

The conjugate momentum is

 p_x = m (1 + f'^2) \dot x \;

and the resulting Euler-Lagrange equation is

 m (1+f'^2) \ddot x + 2 m f' f'' \dot x^2 - m f' f'' \dot x^2 + m g f' = 0 \; .

Simplifying a bit yields

 \ddot x = - f' \tau (g + f'' \dot x^2 ) \; ,

which happily is the same equation as obtained from Newton's laws.
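For a concrete check of this algebra, a sympy sketch of my own with the parabola f(x) = x^2, forming the Euler-Lagrange equation and solving for \ddot x:

```python
import sympy as sp

t, g, m = sp.symbols('t g m', positive=True)
x = sp.Function('x')(t)
xdot = sp.diff(x, t)

# concrete wire shape f(x) = x^2, so f' = 2x and f'' = 2
f = x**2
L = sp.Rational(1, 2)*m*(1 + sp.diff(f, x)**2)*xdot**2 - m*g*f

# Euler-Lagrange equation, solved for the acceleration
EL = sp.diff(sp.diff(L, xdot), t) - sp.diff(L, x)
xdd = sp.solve(EL, sp.diff(x, t, 2))[0]

# the claimed form: xddot = -f' tau (g + f'' xdot^2)
predicted = -sp.diff(f, x)*(g + sp.diff(f, x, 2)*xdot**2)/(1 + sp.diff(f, x)**2)
```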

The last step is to solve this differential equation. Rewriting this equation in state-space form gives a simple system to solve numerically:

 \frac{d}{dt} \left[ \begin{array}{c} x \\ \dot x \end{array} \right] = \left[ \begin{array}{c} \dot x \\ -f' \tau(g + f'' \dot x ^2) \end{array} \right] \; .

Like the original Newtonian system, this one was easily coded up into Python.

import numpy as np

def roller_coaster_Lagrange(state, t, geometry, g):
    #unpack the state
    x, vx = state
    #evaluate the geometry
    f, df, df2 = geometry(x)
    tau = 1.0/(1.0 + df**2)
    return np.array([vx, -df*tau*(g + df2*vx**2)])

The constraint is always honored since y is obtained after the fact from f(x).

In addition, the conservation of energy

shows that there wasn't any error in either the theory or the numerical implementation.
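A self-contained sketch of that energy check (my own reconstruction, repeating the functions above; the mass is set to one so E is energy per unit mass):

```python
import numpy as np
from scipy.integrate import odeint

g = 9.81

def parabola(x):
    return x**2, 2*x, 2

def roller_coaster_Lagrange(state, t, geometry, g):
    x, vx = state
    f, df, df2 = geometry(x)
    tau = 1.0/(1.0 + df**2)
    return np.array([vx, -df*tau*(g + df2*vx**2)])

t = np.linspace(0.0, 5.0, 1001)
traj = odeint(roller_coaster_Lagrange, [1.0, 0.0], t, args=(parabola, g),
              rtol=1e-10, atol=1e-12)
x, vx = traj[:, 0], traj[:, 1]

f, df, _ = parabola(x)
# energy per unit mass: (1/2)(1 + f'^2) xdot^2 + g f
E = 0.5*(1.0 + df**2)*vx**2 + g*f
```

Here the energy stays constant to within the integrator tolerance, in contrast to the Newtonian formulation.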

However, the key question is, does the solution accurately capture the motion by correctly placing the bead at (x,f(x)) at the correct time? There is no easy way to answer this question. The plot of the x component as a function of time

certainly suggests that the Lagrange implementation is correctly capturing not just the dynamics but also the kinematics. That said, a more complete justification will have to wait for another time and another post.

Dynamical Systems and Constraints: Part 1 - General Theory

Constraints in dynamical systems are important in physics for two reasons. First, they show up in a variety of real-world mechanical systems with practical applications. In addition, the notion of a constraint, usually of a more abstract variety, is an essential ingredient in modern physical theories like quantum field theory and general relativity. This brief post is meant to cover the basic theory of constraints in classical physics as a precursor to some future posts on specific systems. The approach and the notation are strongly influenced by Theoretical Mechanics by Murray Spiegel, although the arrangement, presentation, and logical flow are my own.

The proper setting for understanding constraints in classical mechanics is the Lagrangian point-of-view and the primary idea for carrying out this program is the notion of generalized coordinates. By expressing the work-energy theorem in generalized coordinates, we will not only arrive at the Euler-Lagrange equations but will find the proper place in which to put the constraints into the formalism through the use of Lagrange multipliers.

The kinetic energy for a multi-particle system is expressed in Cartesian coordinates as

 T = \sum_n \frac{1}{2} m_n {\dot {\vec r}}_n \cdot {\dot {\vec r}}_n \; ,

where the index n labels each particle.

The partial derivatives of T with respect to the generalized coordinate and velocity are:

 \frac{\partial T}{\partial q_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial q_{\alpha}}


 \frac{\partial T}{\partial {\dot q}_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial {\dot q}_{\alpha}} \; .

Lagrange's cleverness comes in recognizing two somewhat non-obvious steps. The first is that one can simplify the last expression by noting that

 {\dot {\vec r}}_n = \sum_\alpha \frac{\partial {\vec r}_n}{\partial q_{\alpha}} {\dot q}_{\alpha} + \frac{\partial {\vec r}_n}{\partial t} \; ,

which leads immediately to the cancellation-of-dots that says

 \frac{\partial {\dot {\vec r}}_n}{\partial {\dot q}_{\alpha}} = \frac{\partial {{\vec r}}_n}{\partial {q}_{\alpha}} \; .

This result then simplifies the velocity derivative to

 \frac{\partial T}{\partial {\dot q}_{\alpha}} = \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {{\vec r}}_n}{\partial {q}_{\alpha}} \; .

The second of Lagrange's steps is to recognize that taking the total time-derivative of that same term yields

 \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} + \sum_n m_n {\dot {\vec r}}_n \cdot \frac{\partial {\dot {\vec r}}_n}{\partial q_{\alpha}} \; .

The second part of this expression nicely cancels the derivative of the kinetic energy with respect to the generalized coordinate and so we arrive at the identity

 \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) - \frac{\partial T}{\partial q_{\alpha}} = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \; .

Defining the left-hand side compactly as

 Y_{\alpha} \equiv \frac{d}{dt} \left( \frac{\partial T}{\partial {\dot q}_{\alpha}} \right) - \frac{\partial T}{\partial q_{\alpha}} \; ,

the identity becomes

 Y_{\alpha} = \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \; .

Like Lagrange, we can conclude that the presence of the accelerations in the right-hand side is related to the sum of the forces. However, it would be incorrect to conclude that each individual force can be teased out of the right-hand side, as will be discussed below.
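The identity above can be checked symbolically. A sympy sketch of my own, using plane polar coordinates for a single particle in the plane:

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
r = sp.Function('r')(t)
th = sp.Function('theta')(t)

# a single particle in the plane with generalized coordinates (r, theta)
x = r*sp.cos(th)
y = r*sp.sin(th)
T = sp.simplify(sp.Rational(1, 2)*m*(sp.diff(x, t)**2 + sp.diff(y, t)**2))

def Y(q):
    # d/dt (dT/dqdot) - dT/dq
    return sp.diff(sp.diff(T, sp.diff(q, t)), t) - sp.diff(T, q)

def rhs(q):
    # m rddot . dr/dq for the single particle (components x and y)
    return m*(sp.diff(x, t, 2)*sp.diff(x, q) + sp.diff(y, t, 2)*sp.diff(y, q))

# Y(r) == rhs(r) and Y(theta) == rhs(theta), verifying the identity
```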

Before trying to make sense of the Y_{\alpha}'s, examine the forces in generalized coordinates via the differential work, which is given by

 dW = \sum_n {\vec F}_n \cdot  d {\vec r}_n = \sum_n {\vec F}_n \cdot \sum_{\alpha} \frac{\partial {\vec r}_n}{\partial q_\alpha} dq_{\alpha} \; .

Regrouping this last term gives the final expression

 dW = \sum_\alpha \left( \sum_n {\vec F}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha \equiv \sum_\alpha \Phi_\alpha d q_\alpha \; .

Forming the sum

 \sum_{\alpha} Y_{\alpha} dq_{\alpha} = \sum_{\alpha} \left( \sum_n m_n {\ddot {\vec r}}_n \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha \; ,

yields an expression for the differential work as well, since from the Work-Energy theorem dW = dT.

Subtracting these two expressions gives

 \sum_{\alpha} \left( Y_{\alpha} - \Phi_{\alpha} \right) dq_\alpha = \sum_{\alpha} \left( \sum_n \left[ m_n {\ddot {\vec r}}_n - {\vec F}_n \right] \cdot \frac{\partial {\vec r}_n}{\partial q_\alpha} \right) dq_\alpha = 0 \; .

One might be tempted to conclude that Y_\alpha = \Phi_\alpha but this would be erroneous. All that can be concluded is that the projections of the two terms along the constrained surface are the same. The following figure shows a simple geometric case where two vectors (blue and green, denoting the generalized accelerations and the generalized forces) are both perpendicular to the red vector (denoting the motion of the system consistent with the constraint) without being equal. If, and only if, there are no constraints in the system does Y_\alpha = \Phi_\alpha.

If the constraints are integrable, that is to say they can be expressed in such a way that one generalized coordinate can be eliminated by expressing it in terms of the others, then a straight substitution will eliminate the constraint and no additional steps are needed. If that step can't be carried out, then the use of Lagrange multipliers is required, as follows.

Suppose that there are m constraints of the form

 \sum_\alpha A_\alpha \, dq_\alpha + A \, dt = 0 \; ; \quad \sum_\alpha B_\alpha \, dq_\alpha + B \, dt = 0 \; ; \quad \sum_\alpha C_\alpha \, dq_\alpha + C \, dt = 0 \; ; \; \ldots \;

and, at any instant, the constraints are satisfied so that the dt term can be set to zero. Then these constraint equations can be added together in an as-yet-undetermined linear combination

 \sum_\alpha ( \lambda_1 A_\alpha + \lambda_2 B_\alpha + \lambda_3 C_\alpha + \cdots ) dq_\alpha = 0 \; .

These m equations augment the original equation to yield the augmented form

 \sum_{\alpha} (Y_\alpha - \Phi_\alpha - \lambda_1 A_\alpha - \lambda_2 B_\alpha - \lambda_3 C_\alpha - \cdots ) dq_\alpha = 0 \; .

These equations, when regrouped, form two systems of equations. Since the Lagrange multipliers are at our disposal, they can be chosen so that the coefficients of m dependent differentials (say dq_1,\ldots,dq_m) vanish identically; solving that system determines the multipliers. With the coefficients of the dependent terms set to zero, the augmented form reduces to

 \sum_{\alpha'} (Y_{\alpha'} - \Phi_{\alpha'} - \lambda_1 A_{\alpha'} - \lambda_2 B_{\alpha'} - \lambda_3 C_{\alpha'} - \cdots ) dq_{\alpha'} = 0 \; ,

where the prime indicates that the sum is restricted to the independent terms, which allows us to immediately conclude that

 Y_\alpha - \Phi_\alpha - \lambda_1 A_\alpha - \lambda_2 B_\alpha - \lambda_3 C_\alpha - \cdots  = 0 \;

for all \alpha, both the dependent and independent set.

Next post, I'll explore this framework with some instructive examples taken from the classical mechanics literature.

Energy and Hamiltonians: Part 3 - Examples

This post, building on the work of the previous two posts, investigates the six possible cases in which E=h or E\neq h and in which one, the other, neither, or both are conserved. The examples are generally taken from simple mechanical systems in an attempt to illustrate the underlying physics without bringing in an unwieldy set of mathematics.

As a reminder, the energy E is always defined as the sum of kinetic and potential energies, while the function h reflects not only the underlying physics (i.e., E) but also the generalized coordinate system in which it is described. Generalized coordinates will allow us, in certain circumstances, to find conserved energy-like quantities that are not the energy proper. This is the power and complexity of generalized coordinates, and it is this feature that makes them fun.

Case 1 - E \neq h, both conserved: Free Particle with Co-moving Transformation

Consider the free particle. The Lagrangian for the system is given by

 L = \frac{1}{2} m {\dot x}^2 \; .

The conjugate momentum and the generalized force are

 p_x = m {\dot x}


 \Phi_x = \frac{\partial L}{\partial x} = 0 \; ,

respectively. The equation of motion is the familiar

 m \ddot x = 0 \; ,

leading to the usual solution

 x(t) = x_0 + v_0 t \; .

The conserved quantity h_x is the energy, which is entirely kinetic, given by:

 h_x = \dot x p_x - L = \frac{1}{2} m v_0^2 = E\; .

Now suppose that we define the generalized coordinate q through the transformation x = q + v_0 t and \dot x = \dot q + v_0. The corresponding Lagrangian is

 L = \frac{1}{2} m \left(\dot q + v_0 \right)^2

with the corresponding equation of motion

 m \ddot q = 0 \; .

The conserved quantity h_q is given by

 h_q = \dot q p_q - L = m \dot q (\dot q + v_0) - \frac{1}{2} m (\dot q + v_0)^2 = \frac{1}{2} m \dot q^2  - \frac{1}{2} m v_0^2 \; .

From the transformation equation, \dot q = 0 and so h_q = -\frac{1}{2} m v_0^2. The physical interpretation is that, by transforming (via a Galilean transformation) into a frame co-moving with the free particle, the conserved quantity takes a constant value different from the energy in the original frame.
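The algebra for h_q can be checked with a short sympy sketch (my own, not from the original post):

```python
import sympy as sp

t, m, v0 = sp.symbols('t m v_0', positive=True)
q = sp.Function('q')(t)
qdot = sp.diff(q, t)

# Lagrangian in the co-moving coordinate q, with x = q + v0*t
L = sp.Rational(1, 2)*m*(qdot + v0)**2

p = sp.diff(L, qdot)
h = sp.expand(p*qdot - L)
# h = (1/2) m qdot^2 - (1/2) m v0^2, as claimed
```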

Case 2 - E \neq h, E conserved: Free Particle with Accelerating Transformation

Building off of the last case, again look at the free particle and consider the transformation into an accelerating frame given by

 x = q + \frac{1}{2} a t^2 \; ,


 \dot x = \dot q + a t \; .

The transformed Lagrangian is

 L = \frac{1}{2} m \left( \dot q + a t \right) ^2 \; .

The corresponding h_q is

 h_q = m \dot q (\dot q + a t) - \frac{1}{2} m \left(\dot q + a t \right)^2  = \frac{1}{2} m \dot q^2 - \frac{1}{2} m a^2 t^2  \; ,

which, since it is time varying, is not conserved.

Case 3 - E = h, both conserved: Planar SHO

An example where the value of the energy E and the function h_q can take on dramatically different forms and yet yield the same value is supplied by the planar simple harmonic oscillator. In this case, a mass moves in a cylindrically symmetric harmonic potential V = 1/2 \, k(x^2 + y^2).

The corresponding Lagrangian is given by

 L = \frac{1}{2} m \left( {\dot x}^2 + {\dot y}^2 \right) - \frac{1}{2} k \left( x^2 + y^2 \right) \; ,

with the equation of motion being

  m \ddot {\vec r} = -k \vec r \; .

This equation has well-known analytical solutions but, for a change of pace, I implemented numerical solutions of these equations in a Jupyter notebook. The state-space equation took the form

  \frac{d}{dt} \left[ \begin{array}{c} x \\ \dot x \\ y \\ \dot y \end{array} \right] = \left[ \begin{array}{c} \dot x \\ -(k/m) \, x \\ \dot y \\ -(k/m) \, y \end{array} \right] \; .

Under the point transformation to plane polar coordinates

 x = r \cos \theta


 y = r \sin \theta \; ,

the Lagrangian becomes

 L = \frac{1}{2} m \left( {\dot r}^2 + r^2 {\dot \theta}^2 \right) - \frac{1}{2} k r^2 \; .

The resulting Euler-Lagrange equations are

 m \ddot r - m r {\dot \theta}^2 + k r = 0


 \frac{d}{dt} \left( m r^2 \dot \theta \right) = 0 \; ,

which are interpreted as the equation of motion of a one-dimensional radial oscillator (with centrifugal term) and the conservation of the angular momentum in the x-y plane. The corresponding numerical form is

 \frac{d}{dt} \left[ \begin{array}{c} r \\ \dot r \\ \theta \\ \dot \theta \end{array} \right] = \left[ \begin{array}{c} \dot r \\ - k/m \, r + r \dot \theta^2  \\ \dot \theta \\ -2 \dot r \dot \theta /r \end{array} \right] \; .

The initial conditions for the simulation, with \omega_0 = \sqrt{k/m}, were

 {\bar S}_0 = \left[ \begin{array}{c} R \\ 0 \\ 0\\ R \omega_0 \end{array} \right]


 {\bar S}_0 = \left[ \begin{array}{c} R \\ 0 \\ 0\\ \omega_0 \end{array} \right] \; ,

for Cartesian and polar forms, respectively.

After the trajectories had been computed, the ephemerides were fed into functions that computed the energy in Cartesian form

 E = \frac{1}{2} m \left( {\dot x}^2 + {\dot y}^2 \right) + \frac{1}{2} k \left( x^2 + y^2 \right)

or in polar form

 E = \frac{1}{2} m \left( {\dot r}^2 + r^2 {\dot \theta}^2 \right) + \frac{1}{2} k r^2 \; ,

as appropriate. As expected, the values for the energy, arrived at in two independent ways, are identical.
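A minimal sketch of this comparison (my own reconstruction, not the original notebook; parameter values and a circular-orbit initial condition are illustrative):

```python
import numpy as np
from scipy.integrate import odeint

m, k, R = 1.0, 4.0, 1.0            # illustrative values
w0 = np.sqrt(k/m)

def cartesian(s, t):
    x, vx, y, vy = s
    return [vx, -(k/m)*x, vy, -(k/m)*y]

def polar(s, t):
    r, vr, th, w = s
    return [vr, -(k/m)*r + r*w**2, w, -2.0*vr*w/r]

t = np.linspace(0.0, 10.0, 2001)
cart = odeint(cartesian, [R, 0.0, 0.0, R*w0], t, rtol=1e-10, atol=1e-12)
pol = odeint(polar, [R, 0.0, 0.0, w0], t, rtol=1e-10, atol=1e-12)

# total energy computed independently in each coordinate system
E_cart = 0.5*m*(cart[:, 1]**2 + cart[:, 3]**2) + 0.5*k*(cart[:, 0]**2 + cart[:, 2]**2)
E_pol = 0.5*m*(pol[:, 1]**2 + pol[:, 0]**2*pol[:, 3]**2) + 0.5*k*pol[:, 0]**2
```

Both energies come out constant and equal to each other, as the text describes.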

Case 4 - E \neq h, h conserved: Constrained Planar SHO

In contrast to Case 3, the constrained planar simple harmonic oscillator executes its motion subject to the constraint

 {\dot \theta} = \omega \; ,

where \omega is the driving frequency. The mechanical analog would be something like the following picture

where a mass on a spring is forced by a motor that keeps the oscillator rotating at a constant angular rate.

The Lagrangian in this case is given by

 L = \frac{1}{2} m \left( {\dot r}^2  + r^2 \omega^2 \right) - \frac{1}{2} k (r-\ell_0)^2 \; .

Note that there is no easy way to express the Lagrangian in Cartesian coordinates - another example of the power of generalized coordinates.

The energy of the system is

 E = T + U = \frac{1}{2} m \left( {\dot r}^2  + r^2 \omega^2 \right) + \frac{1}{2} k (r-\ell_0)^2 \; ,

while the h function is

 h_r = \frac{1}{2} m \left( {\dot r}^2  - r^2 \omega^2 \right) + \frac{1}{2} k (r-\ell_0)^2 \; .

The minus sign makes all the difference here. Since there is a motor in the problem, the expectation is that the energy should not be conserved, even though the structure of Lagrangian mechanics guarantees the conservation of h (since the Lagrangian is not explicitly time-dependent).

While there are analytic solutions to these equations, I again opted for a numerical simulation. The equation of motion is

 {\ddot r} = (\omega^2 - \omega_0^2)r + \omega_0^2 \ell_0 \; ,

where \omega_0^2 = k/m.

The resulting dynamics show the radial motion oscillating sinusoidally

about an equilibrium value determined by

 r = \frac{\omega_0^2 \ell_0}{\omega_0^2 - \omega^2} \; .

The graph of the energy and h_r shows clearly that the energy is not conserved, even though h_r, as expected, is
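A numerical sketch of this case (my own, with illustrative parameter values; the radial equation follows from the Lagrangian with potential \frac{1}{2} k (r-\ell_0)^2):

```python
import numpy as np
from scipy.integrate import odeint

m, k, l0, w = 1.0, 9.0, 1.0, 1.0   # illustrative values; w is the drive rate
w0 = np.sqrt(k/m)

def driven_radial(s, t):
    r, vr = s
    return [vr, (w**2 - w0**2)*r + w0**2*l0]

t = np.linspace(0.0, 10.0, 2001)
sol = odeint(driven_radial, [l0, 0.0], t, rtol=1e-10, atol=1e-12)
r, vr = sol[:, 0], sol[:, 1]

# energy (not conserved) and h (conserved); note the sign of the r^2 w^2 term
E = 0.5*m*(vr**2 + r**2*w**2) + 0.5*k*(r - l0)**2
h = 0.5*m*(vr**2 - r**2*w**2) + 0.5*k*(r - l0)**2
```

Along the integrated trajectory h stays flat to integrator tolerance while E oscillates, just as the plot described in the text shows.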

Case 5 - E \neq h, neither conserved: Damped SHO

The final example is the damped simple harmonic oscillator. The Lagrangian is given by

 L = e^{\gamma t/m} \left[\frac{1}{2} m {\dot x}^2 - \frac{1}{2} k x^2 \right] \; ,

which results in the equation of motion

 m {\ddot x} + \gamma {\dot x} + k x = 0 \; .

The energy,

 E = \frac{1}{2} m {\dot x}^2 + \frac{1}{2} k x^2 \; ,

is clearly not conserved and the h function,

 h = {\dot x} \frac{\partial L}{\partial \dot x} - L = e^{t \gamma/m} E \; ,

is also not conserved since it depends on time.

Case 6 - E = h, neither conserved

This last case is something of a contrivance. As may have been guessed from the above examples, finding a case where E=h and yet neither is conserved is not particularly easy to do. Rotating drivers that impose a dynamical condition like \dot \theta = constant \equiv \omega lead to a conserved h, since the transformation to a rotating frame leaves the system in a co-moving configuration despite the time-varying nature of the transformation. Equally frustrating is trying to incorporate dissipation into the problem: dissipation generally leads to a different form for h compared with E, as Case 5 demonstrates.

The one example that does work is to define the potential energy as

 V = x F(t)

and the Lagrangian as

 L = \frac{1}{2} m {\dot x}^2 - x F(t)

resulting in the equation of motion

 m {\ddot x} = F(t) \; .

Both the energy and h are given by

 \frac{1}{2} m {\dot x}^2 + x F(t) \; ,

which is clearly not conserved.

Energy and Hamiltonians: Part 2 - Conservation Considerations

Last month, I covered the basic groundwork that established under what conditions generalized coordinates can be used in the Lagrangian formulation. Generalized coordinates enable vast simplifications of the analysis of mechanical systems and provide for deep insight into the possible motion. The analysis found that the permissible transformations are the point transformations

q^i = q^i(s^j,t) \; ,

since these leave the Euler-Lagrange equations of motion invariant.

The next step is to examine what these transformations do to the expression of the energy. Note that no transformation can 'destroy' energy conservation - if the energy is conserved in one set of coordinates it is conserved in all coordinate systems. But it is possible that a poor transformation will hide the conservation of energy and that a clever transformation will reveal the conservation of a related quantity (the function h to be discussed shortly) even when the energy is not conserved.

In Newtonian mechanics, the energy is given by the sum of the kinetic and potential energy (each possibly depending on q^i, {\dot q}^i, and t)

 E = T + V \; .

The corresponding quantity in Lagrangian dynamics is the h function defined as

 h = \frac{\partial L}{\partial {\dot q}^i} {\dot q}^i - L(q^j,{\dot q^j};t) \; ,

with L = T - V.

At first glance, it isn't at all obvious why h should ever be conserved and why it can be identified as the energy, in certain circumstances. To see the conservation properties of h examine its total time derivative given by

 \frac{d h}{d t} = \frac{d}{d t} \left(\frac{\partial L}{\partial {\dot q}^i} \right) {\dot q}^i + \frac{\partial L}{\partial {\dot q}^i} {\ddot q}^i - \frac{\partial L}{\partial q^i} {\dot q}^i - \frac{\partial L}{\partial {\dot q}^i} {\ddot q}^i - \frac{\partial L}{\partial t} \; .

The second and fourth terms cancel and the first and third terms can be grouped so that the expression now reads

 \frac{d h}{d t} = \left[ \frac{d}{d t} \left(\frac{\partial L}{\partial {\dot q}^i} \right) - \frac{\partial L}{\partial q^i} \right] {\dot q}^i - \frac{\partial L}{\partial t} \; .

Physical motion requires the bracketed term, which is just the Euler-Lagrange equations, to be zero, leaving the very nice expression that

 \frac{d h}{dt} = -\frac{\partial L}{\partial t} \; .

Thus, under some very general circumstances, h is conserved if the Lagrangian, in some coordinate system, doesn't have any time dependence. Since the allowed point transformations can be time-dependent, it is possible for h to be conserved in certain generalized coordinate systems and not in others. This is the genesis of the confusion between E and h discussed in the previous post.

Having answered the first question about the conservation properties of h, we now turn to when h can be identified as the energy E. In order not to obscure the physical points, for the bulk of this post 1-d systems will be examined with the generalized coordinates being denoted as x (starting Cartesian coordinate) and q (arbitrary generalized coordinate). At the end, this restriction will be relaxed.

The kinetic energy, given by

 T = \frac{1}{2} m {\dot x}^2 \; ,

will be the key parameter, with the potential energy just coming along for the ride.

Expanding on Morin, we examine three cases: 1) the limited point transformation x = x(q), 2) the full point transformation x = x(q,t), and 3) the 'forbidden' phase-space transformation x=x(q,{\dot q}). In each of these cases, we exploit the fact that the transformations are invertible and so either x or q can be regarded as the base coordinate. Assuming the Cartesian x as a function of the generalized coordinate q is simply a convenient choice.

Case 1: x = x(q)

This case is commonly encountered when the geometry of the system better matches cylindrical, spherical, or some other curvilinear coordinate system. The velocities are related by

 {\dot x} =  \frac{\partial x}{\partial q} {\dot q} \; ,

leading to the kinetic energy transforming as

 T = \frac{1}{2} m {\dot x}^2 \rightarrow \frac{1}{2} m \left( \frac{\partial x}{\partial q} \right)^2 {\dot q}^2 \equiv \frac{1}{2} \mu_0(q) {\dot q}^2 \; .

Defining the conjugate momentum as

 p_q = \frac{\partial L}{\partial {\dot q}} = \mu_0(q) {\dot q} \; ,

the expression for h becomes

 h = p_q {\dot q} - L = \mu_0(q) {\dot q}^2 - \frac{1}{2} \mu_0(q) {\dot q}^2 + V(q) \; .

Simplifying yields

 h = \frac{1}{2} \mu_0(q) {\dot q}^2 + V(q) = T + V = E \; .

So in the case where the point transformation is limited to exclude time dependence, h = E. Since the motion is derivable from a potential, we can immediately conclude that the energy is conserved.
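Though not part of the original derivation, a few lines of sympy make this result concrete. The transformation x = q**3 is a hypothetical example, chosen only because it is invertible:

```python
import sympy as sp

# Hypothetical limited point transformation x = q**3 (no explicit time dependence)
t, qd = sp.symbols('t qdot')
m = sp.symbols('m', positive=True)
q = sp.Function('q')(t)
V = sp.Function('V')

x = q**3
xdot = sp.diff(x, t).subs(sp.Derivative(q, t), qd)   # (dx/dq) * qdot

T = sp.Rational(1, 2)*m*xdot**2
L = T - V(q)

p_q = sp.diff(L, qd)                 # conjugate momentum mu_0(q) * qdot
h = sp.simplify(p_q*qd - L)

# h reduces to T + V, i.e. the energy E
assert sp.simplify(h - (T + V(q))) == 0
print("h = T + V = E for the time-independent transformation x = q**3")
```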

Case 2:  x = x(q,t)

This case is commonly encountered in rotating systems where a time-dependent transformation puts the viewpoint coincident with a co-rotating observer. This is the trickiest of all the physical situations since the energy may or may not be conserved independently of what h is doing. These cases will be examined in detail in future posts.

The velocities are related by

 {\dot x} =  \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial t} \; ,

leading to the kinetic energy transforming as

 T =  \frac{1}{2} m {\dot x}^2 \rightarrow \frac{1}{2} m \left( \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial t} \right)^2

Expanding and simplifying yields the kinetic energy in generalized coordinates as

 T = \frac{1}{2} m \left[ \left(\frac{\partial x}{\partial q}\right)^2 {\dot q}^2 + 2 \frac{\partial x}{\partial q} \frac{\partial x}{\partial t} {\dot q} + \left(\frac{\partial x}{\partial t} \right)^2 \right] \;

or, more compactly as,

 T = \frac{1}{2} \mu_0(q,t) {\dot q}^2 + \mu_1(q,t) {\dot q} + \frac{1}{2} \mu_t(q,t) \; .

Once again defining

 p_q = \frac{\partial L}{\partial {\dot q}} = \mu_0(q,t) {\dot q} + \mu_1(q,t) \; ,

the expression for h becomes

 h = p_q {\dot q} - L = \mu_0(q,t) {\dot q}^2 + \mu_1(q,t) {\dot q} - \frac{1}{2} \mu_0(q,t){\dot q}^2 - \mu_1(q,t) {\dot q} - \frac{1}{2} \mu_t(q,t) + V(q,t) \; .

Simplifying gives

 h =  \frac{1}{2} \mu_0(q,t){\dot q}^2 - \frac{1}{2} \mu_t(q,t) + V(q,t) \neq T + V \; .

So, in this case, h is not identified with the energy. Whether h or E is conserved is a matter for case-by-case analysis.
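A similar sketch for a hypothetical time-dependent transformation, x = q + s(t) (the view from a frame translating with arbitrary motion s(t)), confirms the Case 2 result:

```python
import sympy as sp

# Hypothetical full point transformation x = q + s(t)
t, qd = sp.symbols('t qdot')
m = sp.symbols('m', positive=True)
q = sp.Function('q')(t)
s = sp.Function('s')(t)
V = sp.Function('V')

x = q + s
xdot = sp.diff(x, t).subs(sp.Derivative(q, t), qd)

T = sp.Rational(1, 2)*m*xdot**2
L = T - V(q, t)
p_q = sp.diff(L, qd)
h = sp.expand(p_q*qd - L)

# h matches (1/2) mu_0 qdot^2 - (1/2) mu_t + V, with mu_0 = m and mu_t = m*sdot^2
sdot = sp.diff(s, t)
h_formula = sp.Rational(1, 2)*m*qd**2 - sp.Rational(1, 2)*m*sdot**2 + V(q, t)
assert sp.expand(h - h_formula) == 0

# ...but differs from T + V by frame-motion terms, so h is not the energy
assert sp.expand(h - (T + V(q, t))) != 0
print("Case 2: h is not T + V")
```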

Case 3: x = x(q,{\dot q})

This final case is more of a curiosity rather than a serious exploration. The case is 'forbidden' by last month's analysis since it doesn't preserve the transformation properties of the Euler-Lagrange equations. Nonetheless, it is interesting to see what would happen to h should this transformation be allowed.

The velocities are related by

 {\dot x} =  \frac{\partial x}{\partial q} {\dot q} + \frac{\partial x}{\partial {\dot q}} {\ddot q}

leading to the kinetic energy becoming

 T = \frac{1}{2} \mu_0(q,{\dot q}) {\dot q}^2 + \mu_3(q,{\dot q}) {\dot q}{\ddot q} + \frac{1}{2} \mu_4(q,{\dot q}) {\ddot q}^2 \; .

Following the same process above yields

 h = \frac{1}{2} \mu_0(q,{\dot q}) {\dot q}^2 - \frac{1}{2} \mu_4(q,{\dot q}) {\ddot q}^2 + V(q,{\dot q}) \neq T + V \; .

Finally, a note on relaxing the restriction to 1-d. In multiple dimensions, the velocities in the most general case transform as

 {\dot s}^j = \frac{\partial s^j}{\partial q^i} {\dot q}^i + \frac{\partial s^j}{\partial t} \; .

Following the same method as above, the kinetic energy will take the form

 T = \frac{1}{2} \mu_0^i ({\dot q}^i)^2 + \mu^{ij} {\dot q}^i {\dot q}^j + \mu_1^i {\dot q}^i + \frac{1}{2} \mu_t^i \; ,

using the same shorthand notation as above.

The only new feature (beyond the trivial decoration of the qs with an index and the implied summation) is the cross-term \mu^{ij} {\dot q}^i {\dot q}^j. Contrary to what a quick glance might suggest, these terms do not disappear from h; only the terms linear in the {\dot q}s do. With the implied summation and a symmetric \mu^{ij}, the cross-term contributes

 p_{q^i} = 2 \mu^{ij} {\dot q}^j

to the conjugate momenta, which yields the term

 p_{q^i} {\dot q}^i = 2 \mu^{ij} {\dot q}^i {\dot q}^j \; ,

only half of which is cancelled by the term coming from -L. The cross-terms thus survive in h alongside the diagonal quadratic terms, consistent with Euler's theorem for homogeneous functions: the part of T quadratic in the velocities appears in h with a plus sign, the terms linear in the velocities cancel entirely, and the velocity-independent term flips sign. Everything else follows through as in the 1-d case.
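As a sanity check on the multi-dimensional bookkeeping, the familiar plane polar coordinates give a quick sympy confirmation that a time-independent, multi-dimensional point transformation still yields h = T + V:

```python
import sympy as sp

# Plane polar coordinates as a 2-d, time-independent point transformation
t, rd, phid = sp.symbols('t rhodot phidot')
m = sp.symbols('m', positive=True)
rho = sp.Function('rho')(t)
phi = sp.Function('phi')(t)
V = sp.Function('V')

x = rho*sp.cos(phi)
y = rho*sp.sin(phi)
sub = {sp.Derivative(rho, t): rd, sp.Derivative(phi, t): phid}
xd = sp.diff(x, t).subs(sub)
yd = sp.diff(y, t).subs(sub)

# T reduces to (1/2) m (rhodot^2 + rho^2 phidot^2)
T = sp.simplify(sp.Rational(1, 2)*m*(xd**2 + yd**2))
L = T - V(rho, phi)

h = sp.simplify(sp.diff(L, rd)*rd + sp.diff(L, phid)*phid - L)
assert sp.simplify(h - (T + V(rho, phi))) == 0
print("h = T + V = E in plane polar coordinates")
```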

Energy and Hamiltonians: Part 1 - Generalized Coordinates

One of the most confusing aspects of Lagrangian/Hamiltonian mechanics centers on the energy E and its relationship to the function h (or to the Hamiltonian H, to which it is closely related). The key questions are to what extent these two quantities can be identified with each other and under which circumstances each of them is conserved. The answers have deep implications for the analysis of the possible motions the system supports. The opportunity for confusion arises because there are many different possibilities that show up in practice.

The aim of this and the following columns is to delve into these cases in sufficient detail to flesh out the six logically possible cases that result from E being either conserved or not and h being either conserved or not, subject to the possible additional condition that E=h.

Before going into specific analysis of this case or that, it is important to get an overview as to how these various cases result in practice. Central to the variety of possibilities is the notion of generalized coordinates and the application of dynamical constraints.

As discussed in great detail in Classical Mechanics by Goldstein (especially Chapter 1), generalized coordinates are what provide the Lagrangian method with its power by enabling it to rise above Cartesian coordinates and into a frame better adapted to the geometry of the dynamics. For example, central force motion is best described in plane-polar coordinates, which not only make the computations easier but also provide a simple method of imposing constraints (e.g., that the motion must take place at a fixed radius, as for a bead on a loop of wire). The number of generalized coordinates can always match the number of degrees of freedom, a condition that usually can't be fulfilled in Cartesian coordinates if there are constraints. Generalized coordinates serve as the cornerstone in understanding rigid body motion - the three Euler angles mapping directly to the three degrees of freedom such a body has to move about its center of mass.

The price for such freedom is that the universe of possible conserved quantities must be enlarged from the obvious ones evident in elementary mechanics to a broader set. This expansion in scope leads to the need to distinguish between the energy E and the function h. One of the most important and famous examples of this is the conservation of the Jacobi constant in the circular-restricted three-body problem.

But, before launching into an analysis that compares and contrasts E with h, it is important to establish some basic results of the Lagrangian method.

The first result is that the Lagrangian equations of motion are invariant under what Goldstein calls a point transformation

 q^i = q^i(s^j,t) \; ,

relating one set of generalized coordinates q^i to another s^j (note that Cartesian coordinates are a subset of generalized coordinates).

Transformation of the Lagrangian L(q^i,t) to L'(s^j,t) starts with determining the form of the generalized velocities

 {\dot q}^i = \frac{\partial q^i}{\partial s^j} {\dot s}^j + \frac{\partial q^i}{\partial t} \; ,

(\dot f \equiv \frac{d}{dt} f).

From this relation comes the somewhat counterintuitive 'cancellation-of-the-dots' that states

 \frac{\partial {\dot q}^i}{\partial {\dot s}^j} = \frac{\partial q^i}{\partial s^j} \; .

A needed corollary of the cancellation-of-the-dots results from applying the total time derivative to both sides of the above relation. The right-hand side expands as

 \frac{d}{dt} \left( \frac{\partial q^i}{\partial s^j} \right) = \frac{\partial^2 q^i}{\partial s^k \partial s^j} {\dot s}^k + \frac{\partial^2 q^i}{\partial t \partial s^j} \; .

Switching the order of partial derivatives (always allowed for physical functions) and regrouping terms (noting along the way that \partial {\dot s}^k / \partial s^j = 0 ) recasts this as

 \frac{\partial}{\partial s^j} \left( \frac{\partial q^i}{\partial s^k} {\dot s}^k + \frac{\partial q^i}{\partial t} \right) = \frac{\partial}{\partial s^j} \frac{d q^i}{d t} \; .

The next step is to grind through the equations of motion, which is straightforward if a bit tedious. Start by labeling the Lagrangian that results after the substitution of the point transformation as

 L' (s^j,{\dot s}^j,t ) = L (q^i(s^j,t),{\dot q}^i(s^j,{\dot s}^j,t),t ) \; .

The coordinate-derivative piece of the equations of motion gives

 \frac{\partial L'}{\partial s^j} = \frac{\partial L}{\partial q^i} \frac{\partial q^i}{\partial s^j} + \frac{\partial L}{\partial {\dot q}^i} \frac{\partial {\dot q}^i}{\partial s^j} \; .

The velocity derivative piece of the equations of motion gives

 \frac{\partial L'}{\partial {\dot s}^j} = \frac{\partial L}{\partial {\dot q^i}} \frac{\partial {\dot q}^i}{\partial {\dot s}^j} \; .

Combining these pieces into the Euler-Lagrange equations gives

 \frac{d}{dt} \left( \frac{\partial L'}{\partial {\dot s}^j } \right) - \frac{\partial L'}{\partial s^j} = \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) \frac{\partial {\dot q}^i}{\partial {\dot s}^j} + \frac{\partial L}{\partial {\dot q}^i} \frac{d}{dt} \left( \frac{\partial {\dot q}^i}{\partial {\dot s}^j} \right) - \frac{\partial L}{\partial q^i} \frac{\partial q^i}{\partial s^j} - \frac{\partial L}{\partial {\dot q}^i} \frac{\partial {\dot q}^i}{\partial s^j} \; .

Cancellation-of-the-dots and its corollary tells us that the second and fourth terms on the right-hand side cancel, leaving

 \frac{d}{dt} \left( \frac{\partial L'}{\partial {\dot s}^j } \right) - \frac{\partial L'}{\partial s^j} = \left[ \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) - \frac{\partial L}{\partial q^i} \right] \frac{\partial q^i}{\partial s^j} \; .

Since the original equations of motion are zero and the Jacobian of the point transformation is not, we conclude that

 \frac{d}{d t} \left( \frac{\partial L}{\partial {\dot q}^i} \right) - \frac{\partial L}{\partial q^i} = 0 \; ,

which informs us that the Euler-Lagrange equations are invariant under point transformations. This result is our hunting license to use any coordinates related to Cartesian coordinates by a point transformation to describe a system's degrees of freedom.
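A concrete check of this invariance, using the hypothetical point transformation x = q + q**3 and a harmonic potential, can be sketched in sympy:

```python
import sympy as sp

# Hypothetical point transformation x = q + q**3 (invertible since dx/dq > 0)
# applied to L = (1/2) m xdot^2 - (1/2) k x^2
t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)

x = q + q**3
L = sp.Rational(1, 2)*m*sp.diff(x, t)**2 - sp.Rational(1, 2)*k*x**2

qdot = sp.Derivative(q, t)
EL_q = (sp.diff(L, qdot).diff(t) - sp.diff(L, q)).doit()

# The x-coordinate equation of motion, evaluated along x(q), times dx/dq
EL_x = m*sp.diff(x, t, 2) + k*x
jac = sp.diff(x, q)

assert sp.simplify(EL_q - jac*EL_x) == 0
print("The Euler-Lagrange expressions transform with the Jacobian dx/dq")
```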

The next result is that the equations of motion are invariant with respect to the addition of a total time derivative to the Lagrangian. In other words, the same equations of motion result from a Lagrangian L as from one defined as

 L' = L + \frac{d F(q^i,t)}{dt} \; .

The proof is as follows. Applying the Euler-Lagrange operator to both sides of the definition of L' gives

 \frac{d}{dt} \left( \frac{\partial L'}{ \partial {\dot q}^i } \right) - \frac{\partial L'}{\partial q^i} = \frac{d}{dt} \left( \frac{\partial L}{ \partial {\dot q}^i } \right) - \frac{\partial L}{\partial q^i} + \frac{d}{dt} \left( \frac{\partial}{\partial {\dot q}^i} \frac{dF}{dt} \right) - \frac{\partial}{\partial q^i} \frac{dF}{dt} \;.

Invariance of the equations of motion requires the last two terms to cancel. The easiest way to see that they do is to first concentrate on the expansion of the total time derivative of F as

 \frac{d F}{dt} = \frac{\partial F}{\partial q^j} {\dot q}^j + \frac{\partial F}{\partial t} \; .

The third term above then becomes

 \frac{\partial}{\partial {\dot q}^i} \left( \frac{d F}{dt} \right) = \frac{\partial F}{\partial q^i} \; .

Cancellation of the last two terms now rests on the recognition that

 \frac{d}{dt} \left( \frac{\partial F}{\partial q^i} \right) = \frac{\partial}{\partial q^i} \left(\frac{dF}{dt} \right) \; ,

a fact easily established by realizing that the total time derivative is a specific sum of partial derivatives and that partial derivatives commute with each other.

This result allows us to play with the form of the Lagrangian in an attempt to find the simplest or most compact way of describing the motion of the physical system.
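This gauge freedom is easy to verify symbolically; here is a sketch using the hypothetical choice F(q,t) = t q^2 and a harmonic Lagrangian:

```python
import sympy as sp

# Gauge check with the hypothetical generating function F(q, t) = t*q**2
t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)
qdot = sp.Derivative(q, t)

L = sp.Rational(1, 2)*m*qdot**2 - sp.Rational(1, 2)*k*q**2
F = t*q**2
Lp = L + sp.diff(F, t)          # L' = L + dF/dt

def euler_lagrange(lagrangian):
    return sp.simplify((sp.diff(lagrangian, qdot).diff(t)
                        - sp.diff(lagrangian, q)).doit())

assert sp.simplify(euler_lagrange(Lp) - euler_lagrange(L)) == 0
print("L and L + dF/dt yield identical equations of motion")
```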

Next month's column will use the general structure of this formalism as a guide to the connection between the energy E and h. The columns in the following months will apply the two results derived here to various physical systems in order to illustrate the various cases for E and h outlined above.

Action and Reaction

Action and reaction; Newton's 3rd law tells us that for every action there is an equal and opposite reaction. Nature, in effect, is a banker whose accounts are carefully balanced according to the principle of double-entry bookkeeping. Modern theories emphasize this point of view by saying that at the fundamental level all interactions are mediated by the exchange of 'force quanta' that transfer energy, momentum, and angular momentum. So all is well - or is it?

Often textbooks muddy the water of this 'simple' situation by engaging in nuanced discussions and citing exceptions without emphasizing two points. One is that action-reaction is always honored. The second is that it may require careful logic supported by exact experiments to rescue new cases that seem to be exceptions.

Now, these nuances and exceptions, which roughly fall into three prototypical classes, are real and need to be addressed. Each is interesting in its own right, helping to illustrate and explore the richness of the action-reaction concept, but the 'answer' for each of these cases is always the same - there are degrees of freedom that are being ignored that balance the action-reaction.

Case 1 - Dissipative forces

Friction is one of the earliest examples encountered in which action-reaction seems to be violated. Energy is bled out of the system and the motion eventually comes to a halt. The prototype example is the damped harmonic oscillator. Its equation of motion is given by

 m {\ddot x} + \Gamma {\dot x} + k x = 0 \; ,

with the general solution

 x(t) = A e^{-\gamma t} \cos( \omega t + \phi ) \; ,

with \gamma = \Gamma/2m and \omega = \sqrt{k/m - \Gamma^2/4 m^2} in the underdamped case.
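As a quick check (not in the original post), sympy confirms that this solution satisfies the equation of motion in the underdamped regime, where \omega = \sqrt{k/m - \Gamma^2/4m^2} is real:

```python
import sympy as sp

# Verify the damped-oscillator solution in the underdamped regime
t, A, phi = sp.symbols('t A phi')
m, k, Gamma = sp.symbols('m k Gamma', positive=True)

gamma = Gamma/(2*m)
omega = sp.sqrt(k/m - Gamma**2/(4*m**2))
x = A*sp.exp(-gamma*t)*sp.cos(omega*t + phi)

residual = m*sp.diff(x, t, 2) + Gamma*sp.diff(x, t) + k*x
assert sp.simplify(residual) == 0
print("x(t) = A exp(-gamma t) cos(omega t + phi) solves the equation of motion")
```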

It is natural to ask, where does the energy go? In this model, the energy disappears with no account as to where and how. Of course, the accepted answer is straightforward enough. The energy goes into heat, into degrees of freedom not directly tracked by this model. There is balance on the action-reaction front: there is a reaction force of the oscillator on the environment, and it is this force which transfers the energy to the surroundings.

While this accepted answer is known to virtually everybody who's taken a physics class, it is worth pausing to consider that it wasn't necessarily obvious to those who first grappled with this problem, nor is it obvious to the beginner who is introduced to it. In fact, it is very difficult to find a detailed argument that traces through all of the evidence in a logical fashion. It is simply common knowledge supported by our belief in the balance of action and reaction.

Case 2 - Driving forces

The second case, which is actually closely related to the first, involves a driving force that maintains some aspect of the system. Energy, momentum, and/or angular momentum enter and leave the system without being explicitly tracked. The best example of such a system is the circular restricted three-body problem. There are numerous applications of this model to spaceflight, with many missions exploiting the L_1 and L_2 Lagrange points.

In the circular-restricted three-body problem, two massive bodies are in orbit around their common barycenter. This motion honors action-reaction perfectly, with the gravitational force between the bodies being equal and opposite. The 'forced motion', which breaks action-reaction, takes place in the idealization where a test particle is introduced into the system. The test particle is taken to be so small that its back-reaction on the motion of the two bodies can be ignored. The equation of motion is

  \ddot {\vec r} = G m_1 \frac{\vec r_1 - \vec r}{|\vec r_1 - \vec r|^3} + G m_2 \frac{\vec r_2 - \vec r}{|\vec r_2 - \vec r|^3} \; ,

with the motion of the primaries given by

 \vec r_1 = \frac{m_2}{m_1 + m_2} L \left( \cos(\omega t) \hat \imath + \sin(\omega t) \hat \jmath \right)

and

 \vec r_2 = -\frac{m_1}{m_1 + m_2} L \left( \cos(\omega t) \hat \imath + \sin(\omega t) \hat \jmath \right)

and with

 \omega = \sqrt{ \frac{G(m_1 + m_2)}{L^3} } \; .

That the motions of m_1 and m_2 are set by their interaction with no influence from the test particle is the source of the driving force. So, by construction, the model violates Newton's third law.
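A quick symbolic check, written in the standard barycentric normalization (with the mass-ratio denominators m_1 + m_2 spelled out explicitly), confirms that the primaries' circular motion satisfies the two-body equations:

```python
import sympy as sp

# The primaries' circular orbits in standard barycentric form
t, L, G = sp.symbols('t L G', positive=True)
m1, m2 = sp.symbols('m1 m2', positive=True)

omega = sp.sqrt(G*(m1 + m2)/L**3)
u = sp.Matrix([sp.cos(omega*t), sp.sin(omega*t)])   # co-rotating unit vector

r1 = (m2/(m1 + m2))*L*u
r2 = -(m1/(m1 + m2))*L*u

sep = r2 - r1                        # separation vector, |r2 - r1| = L
accel1 = G*m2*sep/L**3               # gravitational acceleration of body 1

assert sp.simplify(sp.diff(r1, t, 2) - accel1) == sp.zeros(2, 1)
print("The circular motion of the primaries satisfies the two-body equations")
```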

But treatments of the circular-restricted three-body problem often confuse this simple point by dwelling on the conservation of the Jacobi constant. This conservation is clearly important, as it helps in understanding the global properties of the solutions. Nonetheless, discussions of its importance rarely emphasize that while the model violates action-reaction, real physical situations do not, and so cannot be expected to exactly conserve the Jacobi constant; only a close approximation results. While this distinction may seem subtle in theory, it is quite important in practice.

Case 3 - Velocity-Dependent Forces

The case of velocity-dependent forces is perhaps the most confusing member of this list. There are two reasons for this. The first is that the physics literature usually employs this term to mean only one kind of velocity-dependent force: the force of magnetism. Second, the force of magnetism cannot be studied without direct analysis of the fields that act as intermediaries. This latter point should be one of clarity in the community when in fact it is often one of confusion.

A case in point is the discussion in Chapter 1 of Goldstein's Classical Mechanics concerning the weak and strong forms of action-reaction pairs. Goldstein has this interesting point to make about action and reaction:

...[T]he Biot-Savart law indeed may violate both forms of the action and reaction law.*

He goes on to say that:

Usually it is then possible to find some generalization of {\mathbf P} or {\mathbf L} that is conserved. Thus, in an isolated system of moving charges it is the sum of the mechanical angular momentum and the electromagnetic "angular momentum" of the field that is conserved.

Of course, this is exactly the well-known conservation law discussed in a previous post on Maxwell's equations. There is no reason to erode the reader's confidence by the inclusion of quotes around the field angular momentum.

The situation is somewhat relieved by a footnote that attempts to give concrete scenarios. There are two of them and this post will deal with them in turn.

The first part states

If two charges are moving uniformly with parallel velocity vectors that are not perpendicular to the line joining the charges, then the mutual forces are equal and opposite but do not lie along the vector between the charges

which, diagrammatically, looks like

Since there are only two particles, the most economical description has their motion take place in a plane. The parallel velocity vector is

 \hat v = \cos(\alpha) \hat \imath + \sin(\alpha) \hat \jmath

and the separation vector between them is

 \vec r_{12} = \vec r_{1} - \vec r_{2} = \rho \hat \jmath \; .

The combined electrostatic and magnetic force on particle 1 due to particle 2 is

 \vec F_{12} = \frac{q^2}{4 \pi \rho^2} \left[ \mu_0 v^2 \, \hat v \times ( \hat v \times \hat \jmath ) + \frac{1}{\epsilon_0} \hat \jmath \right] \; .

Substituting in the form of \hat v and simplifying gives

 \vec F_{12} = \frac{q^2}{4 \pi \rho^2} \left[ \mu_0 v^2 \left( \sin(\alpha) \cos(\alpha) \hat \imath - \cos^2(\alpha) \hat \jmath \right) + \frac{1}{\epsilon_0} \hat \jmath \right] \; .

Since the only thing that changes in computing the force on particle 2 due to particle 1 is the sign on the relative position vector it is easy to see that

 \vec F_{21} = -\vec F_{12}

and action-reaction is honored. However, this force is action-reaction in its weak form where the forces don't lie along the vector joining the two particles. This conclusion is best supported by computing the x-component of the force

 \vec F_{12} \cdot \hat \imath = \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \sin(\alpha) \cos(\alpha) \; ,

which is zero for only a few isolated angles.
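Because both the Coulomb term and the magnetic triple product are linear in the separation direction, the antisymmetry F_{21} = -F_{12} can be checked numerically; all constants are set to one, since only the directional structure matters here:

```python
import numpy as np

# Both force terms are linear in r_hat, so swapping the particles
# (r_hat -> -r_hat) flips the sign of the force: F21 = -F12.
def force(v_hat, r_hat):
    coulomb = r_hat                                      # along the separation
    magnetic = np.cross(v_hat, np.cross(v_hat, r_hat))   # v x (v x r_hat)
    return coulomb + magnetic

alpha = 0.7
v_hat = np.array([np.cos(alpha), np.sin(alpha), 0.0])
r12 = np.array([0.0, 1.0, 0.0])     # unit vector from particle 2 to particle 1

F12 = force(v_hat, r12)
F21 = force(v_hat, -r12)
assert np.allclose(F21, -F12)

# The x-component ~ sin(alpha) cos(alpha) shows the forces are not central
assert np.isclose(F12[0], np.sin(alpha)*np.cos(alpha))
print("F21 = -F12, but the forces do not lie along the separation")
```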

The second part of the footnote states

Consider, further, two charges moving (instantaneously) so as to "cross the T", i.e., one charge moving directly at the other, which in turn is moving at right angles to the first. Then the second charge exerts a nonvanishing force on the first, without experiencing any reaction force at all.

The scenario here can be pictured as

Now the statement in this footnote is clearly gibberish and it is corrected in the 3rd edition to change 'nonvanishing force' to 'nonvanishing magnetic force', so we'll focus solely on the magnetic force.

From the Biot-Savart law the magnetic force on particle 1 due to particle 2 is

 \vec F_{12} = -\frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \imath \times ( \hat \jmath \times \hat \jmath ) = 0

while the force on 2 due to 1 is

 \vec F_{21} =  \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \jmath \times ( \hat \imath \times \hat \jmath ) = \frac{\mu_0 q^2 v^2}{4 \pi \rho^2} \hat \imath \neq 0 \; .

Clearly action-reaction seems violated, but strictly speaking this is not true. What should be emphasized is that there is a 'third body' in the mix, the electromagnetic field, and that action-reaction can be restored (although not quite point-wise) by accounting for these additional degrees of freedom.
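The two triple products can be verified in a couple of lines of numpy (with the common prefactor \mu_0 q^2 v^2 / 4 \pi \rho^2 set to one):

```python
import numpy as np

# The two triple products from the 'crossing the T' scenario
i_hat = np.array([1.0, 0.0, 0.0])
j_hat = np.array([0.0, 1.0, 0.0])

F12 = -np.cross(i_hat, np.cross(j_hat, j_hat))   # magnetic force on 1 from 2
F21 = np.cross(j_hat, np.cross(i_hat, j_hat))    # magnetic force on 2 from 1

assert np.allclose(F12, np.zeros(3))
assert np.allclose(F21, i_hat)
print("F12 vanishes while F21 does not: the magnetic forces are unbalanced")
```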

Moving Frames and the Twisted Cubic

The last column presented the general theory of the method of moving frames and the concept of a minimal frame, in which the time derivatives of the basis vectors, when expressed in terms of themselves, require only two parameters rather than the usual three. Building on that work, this column has a two-fold aim. The first is to apply these techniques to the often-studied space curve known as the twisted cubic. The second is to touch base with earlier columns on moving frames, thus relating the nature of the moving frame (whether minimal or not) to the traditional concept of angular velocity and, in the process, shining light on the concept of a minimal frame from a new angle.

The twisted cubic, which is the star in this month's drama, is defined as a curve in three dimensions given by:

 \vec r(t) = t \hat e_x + t^2 \hat e_y + t^3 \hat e_z \; ,

where the parameter t can be regarded as time.

Since they will be needed later, the velocity and acceleration of the twisted cubic are given by

 \vec v(t) = \hat e_x + 2 t \hat e_y + 3 t^2 \hat e_z \;

and


 \vec a(t) =  2 \hat e_y + 6 t \hat e_z \; .


The twisted cubic, shown in the following figure for the range t \in [-2,2], has an unusual shape that is best revealed by projecting it to the three coordinate planes. In the x-y plane, the curve's projection is a simple parabola (i.e. y(x) = x^2). In the x-z plane, the curve's projection is a simple cubic (i.e. z(x) = x^3). Nothing amazing here. The interest lies in the projection into the y-z plane, where a 'cusp' appears when t passes through the value of 0.

Twisted Cubic - 3D View

To apply the method of moving frames, the VBN and RIC coordinate frames will be used. VBN is not minimal and RIC is, so there is a nice opportunity to compare and contrast.

While both frames were discussed in the previous column, it is convenient to reiterate their definitions and to present their derivatives in detail. In the latter case, some new results will be presented.

The VBN frame is defined as

 \hat V = \frac{\vec v}{|\vec v|} \; ,

 \hat N = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \;,

and


 \hat B = \hat V \times \hat N \; .

Using the general results derived last time, the derivative of \hat V is expressed as

 \frac{d \hat V}{dt}  = \frac{\vec a}{|\vec v|} - \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^3} \right) \; .

It will be convenient to define \vec L = \vec r \times \vec v. Doing so allows for the time derivative of \hat N to be compactly written as

 \frac{d \hat N}{dt} = \frac{\vec r \times \vec a}{|\vec L|} - \vec L \left( \frac{\vec L \cdot \dot{\vec L}}{|\vec L|^3} \right) \; ,

where \dot{\vec L} = \vec r \times \vec a.

The time derivative of \hat B is then given in terms of \hat V and \hat N and their derivatives as

 \frac{d \hat B}{d t} = \frac{d \hat V}{d t} \times \hat N + \hat V \times \frac{d \hat N}{d t} \; .
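These formulas are easy to check against a central finite difference; the following sketch does so for \hat V on the twisted cubic at an arbitrarily chosen time:

```python
import numpy as np

# Finite-difference check of the dV/dt formula on the twisted cubic
def v(t): return np.array([1.0, 2*t, 3*t**2])
def a(t): return np.array([0.0, 2.0, 6*t])

def V_hat(t):
    vel = v(t)
    return vel/np.linalg.norm(vel)

def dV_dt(t):
    vel, acc = v(t), a(t)
    n = np.linalg.norm(vel)
    return acc/n - vel*(vel @ acc)/n**3

t0, h = 0.8, 1e-6
numeric = (V_hat(t0 + h) - V_hat(t0 - h))/(2*h)   # central difference
analytic = dV_dt(t0)

assert np.allclose(numeric, analytic, atol=1e-6)
assert abs(analytic @ V_hat(t0)) < 1e-12          # dV/dt is perpendicular to V
print("The analytic dV/dt matches the finite difference")
```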

The RIC frame is defined by

 \hat R = \frac{\vec r}{|\vec r|} \; ,

 \hat C =  \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \; ,

and


 \hat I = \hat C \times \hat R \; .

The time derivative of \hat R is

 \frac{d \hat R}{dt}  = \frac{\vec v}{|\vec r|} - \vec r \left( \frac{\vec r \cdot \vec v}{|\vec r|^3} \right) \; .

which is a formula analogous to the one for \hat V.

Since RIC's \hat C is the same as VBN's \hat N, its time derivative is identical with a minor change in symbols.

The time derivative of \hat I is given by

 \frac{d \hat I}{d t} = \frac{d \hat C}{dt} \times \hat R + \hat C \times \frac{d \hat R}{dt} \; .

As discussed in Cross Products and Matrices, the moving frame's motion can be expressed in terms of a classical angular velocity, with the mapping \alpha \rightarrow -\omega_z, \beta \rightarrow \omega_y, and \gamma \rightarrow - \omega_x.

With this mapping in hand, one can see that minimal frames are special in that they have only two non-zero components of their angular velocity. For the twisted cubic, the three components of the VBN angular velocity are

Twisted Cubic - VBN Angular Velocity

and the three components of the RIC angular velocity are

Twisted Cubic - RIC Angular Velocity

Note that, in the VBN frame, all three components of the angular velocity are generally non-zero, whereas in the RIC, \omega_y is always zero.
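The minimal property of RIC can be verified numerically by extracting the angular velocity from finite differences of the basis vectors, using \vec \omega = \frac{1}{2} \sum_i \hat e_i \times \dot{\hat e}_i for an orthonormal moving frame:

```python
import numpy as np

# Extract the angular velocity of the RIC frame of the twisted cubic and
# confirm the minimal-frame component vanishes.
def frame(t):
    r = np.array([t, t**2, t**3])
    v = np.array([1.0, 2*t, 3*t**2])
    R = r/np.linalg.norm(r)
    C = np.cross(r, v)/np.linalg.norm(np.cross(r, v))
    I = np.cross(C, R)
    return R, I, C

t0, h = 0.8, 1e-6
e0 = frame(t0)
e_plus, e_minus = frame(t0 + h), frame(t0 - h)

# omega = (1/2) sum_i e_i x (d e_i/dt) for an orthonormal moving frame
omega = 0.5*sum(np.cross(e0[i], (e_plus[i] - e_minus[i])/(2*h)) for i in range(3))

R, I, C = e0
assert abs(omega @ I) < 1e-6        # omega_y = 0: RIC is minimal
assert abs(omega @ C) > 1e-3        # ...while the frame is genuinely rotating
print("omega in RIC components:", omega @ R, omega @ I, omega @ C)
```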

Also interesting is the time evolution of the magnitude of the angular velocity, |\mathbf{\omega}|, in the two frames.

Twisted Cubic - Mag Angular Velocity

Away from the 'origin' the VBN frame is generally 'rotating' more slowly than the RIC frame, but its rate quickly rises above RIC's in the vicinity of the origin.

To close, it is important to note that minimal frames are not intrinsically better than non-minimal frames. Minimal frames may be more convenient in some contexts and more inconvenient in others, but it is likely that their best use is in analytic models since the amount of work is reduced by a third.

Minimal Frames

One of the most remarkable things about vectors is also one of the most non-intuitive things - at least judging by how hard it is to teach; namely, that the rate of change of a set of vectors can be expressed in terms of the vectors themselves. I think it is hard to relate to because it is typically applied in situations where a set of unit vectors, spanning a space, is moving with respect to a set of fixed vectors. The notion that is hard to accept is that the time derivatives of these vectors can be written as linear combinations of the same vectors, even though they are changing. On the surface, it seems like a contradiction where something moving and fluid is expressed in terms of something that is moving as well - a sort of self-referential recipe for confusion. And yet, this is precisely what the method of moving frames does, and it has proven very successful.

In a nutshell, the method of moving frames states that the motion of a set of unit vectors \left\{\hat e_x, \hat e_y, \hat e_z\right\}, which, for example, may be the body axes for a rotating object, can be written as

 \frac{d}{dt} \left[ \begin{array}{c} \hat e_x \\ \hat e_y \\ \hat e_z \end{array} \right] = \left[ \begin{array}{ccc} 0 & \alpha & \beta \\ -\alpha & 0 & \gamma \\ -\beta & -\gamma & 0 \end{array} \right] \left[ \begin{array}{c} \hat e_x \\ \hat e_y \\ \hat e_z \end{array} \right] \; .

The antisymmetry of the pre-multiplying matrix is a consequence of the unit length of each member of the set and their mutual orthogonality. In other words, the orthonormality,

 \hat e_i \cdot \hat e_j = \delta_{ij} \; ,

of the set leads to the relation

 \frac{d}{dt} \left( \hat e_i \cdot \hat e_j \right) = 0 \;.

The above relation has two very different outcomes depending on what the indices i, j are. If they are identical, then the zero on the right-hand side reflects the unit length of each of the basis vectors. When they are different, the zero reflects the fact that they are perpendicular.

To be concrete, consider the case where i=j=x. Expanding the derivative in this case leads to the relation that the time-rate-of-change of each unit vector is perpendicular to the unit vector itself:

 \frac{d}{dt} \left(\hat e_x \cdot \hat e_x \right) = \frac{d \hat e_x}{dt} \cdot \hat e_x + \hat e_x \cdot \frac{d \hat e_x}{dt} = 2 \hat e_x \cdot \frac{d \hat e_x}{dt} = 0\; .

A consequence of this relationship and the fact that the set spans the space is that the time derivative of \hat e_x must be expressed as

\frac{d \hat e_x}{dt} = \alpha \hat e_y + \beta \hat e_z  \; ,

where \alpha and \beta can be determined as shown below.

Next consider the case where i=x and j=y. A simple expansion yields

\frac{d}{dt}\left( \hat e_x \cdot \hat e_y \right) = \frac{d \hat e_x}{dt} \cdot \hat e_y + \hat e_x \cdot \frac{d \hat e_y}{dt} = 0 \; ,

from which one can conclude that

 \frac{d \hat e_x}{dt} \cdot \hat e_y = - \hat e_x \cdot \frac{d \hat e_y}{dt} \; .

The antisymmetry of the matrix follows immediately.
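A short numerical experiment, using an arbitrarily chosen tumbling motion built from two time-dependent rotations, illustrates this antisymmetry:

```python
import numpy as np

# For any orthonormal moving frame, the matrix M[i, j] = d(e_i)/dt . e_j is
# antisymmetric. The tumbling motion below is chosen arbitrarily.
def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def Rx(a):
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

def frame(t):
    # rows are the moving basis vectors e_x(t), e_y(t), e_z(t)
    return (Rz(t) @ Rx(t**2)).T

t0, h = 0.6, 1e-6
E = frame(t0)
dE = (frame(t0 + h) - frame(t0 - h))/(2*h)

M = dE @ E.T                        # M[i, j] = d(e_i)/dt . e_j
assert np.allclose(M, -M.T, atol=1e-6)
print(np.round(M, 6))
```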

On the surface, the above analysis suggests that three functions, \alpha, \beta, and \gamma, are needed to fully specify the motion of a frame. A deeper analysis shows that there are cases where one of these functions can be set identically to zero. I refer to frames of this sort as minimal frames.

A minimal frame may arise in two ways: either the particular choice of motion for the frame results in a simplification, or the frame is intrinsically minimal, regardless of how the basis vectors twist and turn.

The interest in this column is the latter case (although the former will be touched upon).

The scenario that will be analyzed is one where the basis vectors are defined locally on a curve, typically in terms of the position \vec r(t) and the velocity \vec v(t). Since normalization plays an important role, a wise approach is to explore how to take derivatives of functions of vector norms in generic terms and then apply the results widely and fruitfully.

The prototype function is the norm itself, expressed, in terms of an arbitrary vector \vec A, as

 |\vec A | = \sqrt{ A_x^2 + A_y^2 + A_z^2 } \; .

The partial derivative of the norm with respect to one of the components is given by

 \frac{\partial |\vec A|}{\partial A_i} = \frac{1}{2} \left( A_x^2 + A_y^2 + A_z^2 \right)^{-1/2} \frac{\partial}{\partial A_i} \left( A_x^2 + A_y^2 + A_z^2 \right) = \frac{A_i}{|\vec A|} \; .

Unitizing a vector involves dividing by the norm, so the corresponding derivative,

 \frac{\partial}{\partial A_i} \frac{1}{|\vec A|} = -\frac{A_i}{|\vec A|^3} \; ,

is also handy to have lying around.

From these basic pieces, total time derivatives are constructed from the chain rule as

 \frac{d}{dt} |\vec A| = \frac{\partial |\vec A|}{\partial A_i} \frac{d A_i}{dt} = \frac{A_i {\dot A}_i}{|\vec A|}

and


 \frac{d}{dt} \frac{1}{|\vec A|} = - \frac{A_i {\dot A}_i}{|\vec A|^3} \; .

All the needed tools are at our fingertips so let's dig in.
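Before digging in, the two norm-derivative formulas can be confirmed symbolically for a generic vector \vec A(t):

```python
import sympy as sp

# Symbolic check of the norm-derivative formulas for a generic vector A(t)
t = sp.symbols('t')
Ax, Ay, Az = [sp.Function(name)(t) for name in ('Ax', 'Ay', 'Az')]

norm = sp.sqrt(Ax**2 + Ay**2 + Az**2)
dot = Ax*sp.diff(Ax, t) + Ay*sp.diff(Ay, t) + Az*sp.diff(Az, t)   # A . Adot

assert sp.simplify(sp.diff(norm, t) - dot/norm) == 0
assert sp.simplify(sp.diff(1/norm, t) + dot/norm**3) == 0
print("d|A|/dt = (A . Adot)/|A| and d(1/|A|)/dt = -(A . Adot)/|A|**3")
```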

The most famous minimal frame is the Frenet-Serret, defined by the set

 \hat T = \frac{\vec v}{|\vec v|} \; ,

 \hat N = \frac{d \hat T}{d t} / \left| \frac{d \hat T}{d t} \right | \; ,

and

 \hat B = \hat T \times \hat N \; .

By definition, the derivative of \hat T is expressed solely in terms of \hat N, and the Frenet-Serret frame is minimal by construction. It is conventional to rescale the derivatives to express them in terms of arc-length with

 \frac{d}{dt} \hat T = \frac{d \hat T}{d s}  \frac{ds}{dt} = v \frac{d \hat T}{d s}

as the key formula. Once the conversion has been performed, the definition of \hat N takes a particularly simple form

 \frac{d}{d s} \hat T \equiv \kappa \hat N \; .

The other derivatives follow suit, giving the well-known Frenet-Serret relations

 \frac{d}{d s} \hat N = - \kappa \hat T + \tau \hat B


 \frac{d}{d s} \hat B = -\tau \hat N \; .
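These relations can be checked against a curve whose curvature is known in closed form. For the helix \vec r(t) = (a \cos t, a \sin t, b t), the curvature is \kappa = a/(a^2 + b^2); the sketch below (sample values of a, b, and t are arbitrary) recovers it from the definition of \hat T:

```python
import math

# Frenet-Serret curvature check for a helix r(t) = (a cos t, a sin t, b t),
# whose exact curvature is kappa = a / (a^2 + b^2).
a, b, t = 2.0, 1.0, 0.7

def velocity(t):
    return (-a * math.sin(t), a * math.cos(t), b)

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def T_hat(t):
    v = velocity(t)
    n = norm(v)
    return tuple(c / n for c in v)

h = 1e-6
speed = norm(velocity(t))
# dT/ds = (dT/dt) / (ds/dt); its magnitude is the curvature kappa
dT_dt = tuple((p - m) / (2 * h) for p, m in zip(T_hat(t + h), T_hat(t - h)))
kappa = norm(dT_dt) / speed
print(round(kappa, 6), a / (a ** 2 + b ** 2))  # both 0.4
```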

The Frenet-Serret frame is not the only useful moving frame. Within the field of astrodynamics, there are several moving frames that help in understanding the motion of heavenly and man-made satellites. Two of the most useful are the VBN and RIC frames.

The VBN frame is defined as

 \hat V = \frac{\vec v}{|\vec v|} \; ,

 \hat N = \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \;,


 \hat B = \hat V \times \hat N \; .

Patterned in concept after the Frenet-Serret frame, the VBN frame's \hat V vector is the same as \hat T. However, VBN differs in one essential way: its \hat N points along the instantaneous angular momentum vector of the trajectory. This means that there is generally a roll angle about \hat T \equiv \hat V needed to bring the two frames into alignment. As a result, VBN is not a minimal frame.

The proof of this result starts from the computation of the time derivative of \hat V given by

 \frac{d \hat V}{dt}  = \frac{d}{dt} \frac{\vec v}{|\vec v|} = \frac{\vec a}{|\vec v|} - \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^3} \right) \; ,

using the formulas derived above.

As a check that the computation was carried through correctly, note that the time derivative of \hat V is perpendicular to \hat V itself:

 \frac{d \hat V}{dt} \cdot \hat V = \frac{\vec a \cdot \vec v}{|\vec v|^2} - \vec v \cdot \vec v \left( \frac{\vec v \cdot \vec a}{|\vec v|^4} \right) = 0 \; .

The time derivatives of \hat N and \hat B are straightforward but tedious and are not needed to demonstrate that the frame is non-minimal. All that is needed is to take the scalar product of the time derivative of \hat V with these vectors and then to use the antisymmetry and cyclic properties of the scalar triple product.

Doing so with \hat N yields

 \frac{d \hat V}{dt} \cdot \hat N = \frac{\vec a \cdot (\vec r \times \vec v) }{|\vec v||\vec r \times \vec v|} - \vec v \cdot (\vec r \times \vec v) \left( \frac{\vec v \cdot \vec a}{|\vec v|^3 |\vec r \times \vec v|} \right) \; .

The second term vanishes by the cyclic property of the triple-scalar product and the first can be transformed using the same rule into

 \frac{d \hat V}{dt} \cdot \hat N = \frac{\vec v \cdot (\vec a \times \vec r) }{|\vec v||\vec r \times \vec v|} \; .

Only in the special case of a central force, where \vec a is parallel to \vec r, is this derivative zero; generically it is not.
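This behavior is easy to see numerically. The sketch below uses a hypothetical trajectory, \vec r(t) = (t, t^2, t^3), chosen only for illustration, to evaluate \vec v \cdot (\vec a \times \vec r) / (|\vec v||\vec r \times \vec v|) at a sample time, and then repeats the computation with an acceleration parallel to \vec r:

```python
import math

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def norm(u):
    return math.sqrt(dot(u, u))

# Hypothetical non-central trajectory r(t) = (t, t^2, t^3) evaluated at t = 1
r, v, a = (1.0, 1.0, 1.0), (1.0, 2.0, 3.0), (0.0, 2.0, 6.0)

# dVhat/dt . Nhat = v . (a x r) / (|v| |r x v|)
twist = dot(v, cross(a, r)) / (norm(v) * norm(cross(r, v)))
print(twist != 0)  # the VBN frame rolls about Vhat here

# For a central force a is parallel to r, and the same expression vanishes
a_central = (-2.0, -2.0, -2.0)  # a = -2 r
print(abs(dot(v, cross(a_central, r))) < 1e-12)
```

Both prints show True: a generic acceleration rolls the frame about \hat V, while a central one does not.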

The final step is to replace \hat N with \hat B in the scalar product. This yields

 \frac{d \hat V}{dt} \cdot \hat B = \frac{d \hat V}{dt} \cdot (\hat V \times \hat N) = -\hat V \cdot \left(\frac{d \hat V}{dt} \times \hat N \right) \; .

Expanding the right-hand side of this relation gives

 \frac{d \hat V}{dt} \cdot \hat B = -\frac{ \vec  v \cdot [\vec a \times (\vec r \times \vec v)] }{ |\vec v|^2 |\vec r \times \vec v |} + \frac{ (\vec v \cdot \vec a) \, \vec v \cdot [\vec v \times (\vec r \times \vec v)]}{|\vec v|^4 |\vec r \times \vec v|} \; .

The second term is identically zero, as can be seen by expanding with the BAC-CAB rule

 \vec v \cdot [\vec v \times (\vec r \times \vec v)] = \vec v \cdot \left[ \vec r (\vec v \cdot \vec v) - \vec v (\vec v \cdot \vec r) \right ] = 0 \; .

Using the same technique, one concludes that the first term is generally not zero

 \vec v \cdot [\vec a \times (\vec r \times \vec v)] = \vec v \cdot \left[ \vec r (\vec v \cdot \vec a) - \vec v (\vec a \cdot \vec r) \right ] \; ,

since, generally, the acceleration need not cause a cancellation. (For a central force, where \vec a is parallel to \vec r, the \hat N component found above vanishes, so the derivative of \hat V is proportional to \hat B alone and VBN is a minimal frame.)

The other, commonly-used astrodynamics frame is the RIC frame, defined by

 \hat R = \frac{\vec r}{|\vec r|} \; ,

 \hat C =  \frac{ \vec r \times \vec v }{ | \vec r \times \vec v | } \; ,


 \hat I = \hat C \times \hat R \; .

Note that \hat C is the same as VBN's \hat N. The RIC frame is minimal, as can be seen by computing the time derivative of \hat C,

 \frac{d}{dt} \hat C = \frac{ \vec r \times \vec a}{|\vec r \times \vec v|} - \frac{ \vec r \times \vec v}{|\vec r \times \vec v|^3}  \left[ (\vec r \times \vec v) \cdot (\vec r \times \vec a) \right] \; ,

and then forming its scalar product with \hat R,

 \frac{d \hat C}{d t} \cdot \hat R = A (\vec r \times \vec a ) \cdot \vec r + B (\vec r \times \vec v) \cdot \vec r \; ,

where A and B are scalar functions whose form is irrelevant for this discussion.

Exploiting the cyclic property of the triple scalar product gives

 \frac{d \hat C}{d t} \cdot \hat R = 0 \; .

Since the derivative of a unit vector is always perpendicular to the vector itself, the time derivative of \hat C is also perpendicular to \hat C; being perpendicular to both \hat C and \hat R, it is proportional only to \hat I, thus proving that the RIC frame is minimal.
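A quick numerical check, with a hypothetical trajectory \vec r(t) = (t, t^2, t^3) chosen for illustration, confirms that the \hat R component of the derivative of \hat C vanishes:

```python
import math

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(c / n for c in u)

# Illustrative trajectory r(t) = (t, t^2, t^3) and its velocity
def r(t):
    return (t, t * t, t ** 3)

def v(t):
    return (1.0, 2 * t, 3 * t * t)

def C_hat(t):
    return unit(cross(r(t), v(t)))

t, h = 1.0, 1e-6
dC_dt = tuple((p - m) / (2 * h) for p, m in zip(C_hat(t + h), C_hat(t - h)))
# The Rhat component of dChat/dt vanishes: RIC is minimal
print(abs(dot(dC_dt, unit(r(t)))) < 1e-6)  # True
```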

Relative Motion and Waves Redux

Last month's column dealt with the transformation of the wave equation under a Galilean transformation. Under this transformation, a wave moving with speed c in a medium (e.g., a string or slinky or whatever) that is itself moving with speed v with respect to some observer will be seen by that observer to be moving with speed c \pm v, depending on direction. To get this result, some clever manipulations had to be made that related the mixed partial derivatives in one frame to those in the other.

It is natural enough to ask if the wave equation admits other transformations and, if so, whether any of them represent anything physical. The idea lurking around is special relativity and the famous null result of the Michelson-Morley interferometer. But for the sake of this post, we will assume only simple physics, dictated by symmetry, and will discover that the Lorentz transformation falls out of the mathematical constraints that arise from the wave equation.

The situation is the same as in the previous post, with an observer {\mathcal O}' seeing the wave generated in the frame of {\mathcal O}.

This time the transformation used will be symmetric in space and time, with the form

 x' = \gamma( x + vt )


 t' = \gamma( t + \alpha x) \; .

The term \gamma is a dimensionless quantity (whose value, at this point, may be unity) while \alpha has units of an inverse velocity. Both of them will be determined from the transformation and some simple physical principles.

Assume, as before, that the wave equation as described by {\mathcal O} is given by

 c^2 \frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial t^2} = 0 \; .

The transformation of the partial derivative with respect to the spatial variable in the unprimed to the primed frame is obtained through the chain rule

 \frac{\partial f}{\partial x} = \frac{\partial f}{\partial x'} \frac{\partial x'}{\partial x} + \frac{\partial f}{\partial t'} \frac{\partial t'}{\partial x} \; ,

with an analogous relation for the partial derivatives with respect to time.

Using the transformation above gives

 \frac{\partial f}{\partial x} = \gamma \frac{\partial f}{\partial x'} + \gamma \alpha \frac{\partial f}{\partial t'}


 \frac{\partial f}{\partial t} = \gamma v \frac{\partial f}{\partial x'} + \gamma \frac{\partial f}{\partial t'} \; .

The second partial derivatives are obtained in the same way, but because of the transformation, four terms result from the expansion of

 \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x} \left( \gamma \frac{\partial f}{\partial x'} + \gamma \alpha \frac{\partial f}{\partial t'} \right)

rather than 2 or 3 in the Galilean case considered before.

Assuming that the mixed partials commute, the middle two terms combine to give

 \frac{\partial^2 f}{\partial x^2} = \gamma^2 \frac{\partial^2 f}{\partial x'^2} + 2 \gamma^2 \alpha \frac{\partial^2 f}{\partial x' \partial t'} + \gamma^2 \alpha^2 \frac{\partial^2 f}{\partial t'^2} \; .

Likewise, the second partial derivative with respect to time

 \frac{\partial^2 f}{\partial t^2} = \frac{\partial}{\partial t} \left( \gamma v \frac{\partial f}{\partial x'} + \gamma \frac{\partial f}{\partial t'} \right)


 \frac{\partial^2 f}{\partial t^2} = \gamma^2 v^2 \frac{\partial^2 f}{\partial x'^2} + 2 \gamma^2 v \frac{\partial^2 f}{\partial t' \partial x'} + \gamma^2 \frac{\partial^2 f}{\partial t'^2} \; .

Substituting these results into the original wave equation leads to a wave equation expressed in {\mathcal O}''s frame of

 c^2 \frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial t^2}  = \gamma^2(c^2 - v^2) \frac{\partial^2 f}{\partial x'^2} - \gamma^2 \left(1 - \alpha^2 c^2 \right) \frac{\partial^2 f}{\partial t'^2} + 2 \gamma^2(\alpha c^2 - v) \frac{\partial^2 f}{\partial t' \partial x'} = 0 \; .

There is no need to argue the mixed partial derivative into a different form (as was the case in the Galilean analysis) since the simple selection of

 \alpha c^2 - v = 0

eliminates the term entirely. Solving this equation determines the unknown term \alpha as

 \alpha = \frac{v}{c^2} \; ,

leaving the wave equation, after dividing through by the common factor \gamma^2 \left( 1 - v^2/c^2 \right), in the invariant form

 c^2 \frac{\partial^2 f}{\partial x'^2} - \frac{\partial^2 f}{\partial t'^2} = 0 \; .

In other words, the speed of the wave is constant regardless of how either the source or the observer move with respect to each other.
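As a sanity check on the algebra, the coefficients produced by the chain rule can be evaluated numerically. The sketch below uses the sample values c = 1 and v = 0.6 (chosen arbitrarily) to confirm that the mixed term vanishes and that the surviving coefficients are in the ratio c^2:

```python
import math

# Numeric check of the transformed wave-operator coefficients,
# using arbitrary sample values v = 0.6 c with c = 1.
c, v = 1.0, 0.6
alpha = v / c ** 2
gamma = 1.0 / math.sqrt(1.0 - v ** 2 / c ** 2)

# Coefficients of f_x'x', f_x't', and f_t't' after the chain rule
coeff_xx = c ** 2 * gamma ** 2 - gamma ** 2 * v ** 2            # gamma^2 (c^2 - v^2)
coeff_mixed = 2 * c ** 2 * gamma ** 2 * alpha - 2 * gamma ** 2 * v
coeff_tt = c ** 2 * gamma ** 2 * alpha ** 2 - gamma ** 2        # -gamma^2 (1 - v^2/c^2)

print(abs(coeff_mixed) < 1e-12)                    # cross term eliminated
print(abs(coeff_xx / -coeff_tt - c ** 2) < 1e-12)  # ratio c^2: invariant form
```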

The remaining unknown, \gamma, can be determined by the requirements that the transformation be invertible and that the determinant of its matrix be unity. The first requirement rests on the fact that there is no preferred frame: the physics described in one frame can be related to the other no matter which frame is picked as the starting point. Failure to impose the second requirement would mean that the physics in one frame could be changed simply by transforming to another frame and then transforming back.

Since the transformation can be written in matrix form as

 \left[ \begin{array}{c} x' \\ t' \end{array} \right] = \left[ \begin{array}{cc} \gamma  & \gamma v \\ \gamma \frac{v}{c^2} & \gamma \end{array} \right] \left[ \begin{array}{c} x \\ t \end{array} \right]

then the requirement of a unit determinant means

 \gamma^2 - \gamma^2 \frac{v^2}{c^2} = 1 \; .

Solving for \gamma gives the famous Lorentz factor

 \gamma = \frac{1}{\sqrt{1-v^2/c^2}} \; .
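Both requirements can be verified numerically. The sketch below (again with arbitrary sample values c = 1, v = 0.6) checks that the matrix has unit determinant and that sending v \to -v produces the inverse transformation, as the "no preferred frame" argument demands:

```python
import math

# Check the two requirements on the transformation matrix, with c = 1
# and an arbitrary sample speed v = 0.6.
c, v = 1.0, 0.6

def boost(v):
    g = 1.0 / math.sqrt(1.0 - v ** 2 / c ** 2)
    return [[g, g * v], [g * v / c ** 2, g]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = matmul(boost(v), boost(-v))
det = boost(v)[0][0] * boost(v)[1][1] - boost(v)[0][1] * boost(v)[1][0]
print(abs(det - 1.0) < 1e-12)                        # unit determinant
print(all(abs(M[i][j] - (1.0 if i == j else 0.0)) < 1e-12
          for i in range(2) for j in range(2)))       # M(v) M(-v) = identity
```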

Note that in the limit as v \rightarrow 0, \gamma \rightarrow 1 and \alpha \rightarrow 0, thus restoring the Galilean transformation.

Thus, starting only with a symmetry principle that treats space and time on an equal footing and some simple physical requirements on the wave equation, we find a transformation that predicts that the speed of the wave is constant with respect to all observers. This is exactly what is seen in experiment, and the resulting transformation is the gateway into the full machinery of special relativity.