Monthly Archive: February 2015

Varying Variations

Mathematicians often complain about the physicist’s use of mathematics. They have even claimed that the math physicists use is comparable to a pidgin form of a spoken language.

That may be true. Certainly physicists tend to play with mathematics in a way that makes mathematicians uncomfortable, since physicists are interested in modeling the physical world and not expanding the mathematical frontiers. Nonetheless, I think that the criticism is mostly misplaced and widely exaggerated. However, there are pockets of ‘slang’ that pop up from time to time that are particularly troubling, and not just to mathematicians. This week, I would like to critique some notation that is commonly embraced in quantum field theory (QFT) circles for performing variational calculus.

My aim here is to normalize relations between a physicist’s desire for brevity and minimalism and the need for clarity for the beginner. To do that, I am going to first present the traditional approach to the calculus of variations, and then compare and contrast that to the approach currently in vogue in QFT circles. My presentation for the traditional approach follows roughly what can be found in Classical Dynamics of Particles and Systems by Marion or Classical Mechanics by Goldstein, although I have remedied some notational shortcomings in preparation for the comparison to the QFT approach. The modern QFT notation is taken from The Six Core Theories of Modern Physics by Stevens and Quantum Field Theory for the Gifted Amateur by Lancaster and Blundell.

The basic notion of variational calculus is the idea of a functional – a mapping between a set of functions and the real numbers. The definite integral

\[ I = \int_{-1}^{1} dx \, f(x) \]

is the prototype example, since it takes in a given function $$f(x)$$ and spits out a real number. Generally, more sophisticated examples take a function $$f(x)$$, feed it into an expression $$L(f(x),f'(x);x)$$ that serves as the integrand (where $$f'(x) = df/dx$$), and then carry out the integral

\[ J[f(x)] = \int_{x_1}^{x_2} dx \, L( f(x) ; x) \; .\]

A few words are in order about my notation. The square brackets are where the function $$f(x)$$ is plugged into the functional, and the variable appearing as the argument for the function (here $$x$$) is the dummy variable of integration. Generally, we don’t have to specify the dummy variable in an integral but, as will become clear below, when we are computing functional derivatives, specification of the dummy variable serves as a compass for navigating the notation. The expression $$L$$ that maps the function $$f$$ into the integrand is called the Lagrangian. Any additional parameters upon which the integral may depend will be either understood or, when needed explicitly, will be called out by appending conventional function notation on the end. So $$J[f(x)](\alpha)$$ means that the dependence of $$J[f(x)]$$ on the parameter $$\alpha$$ is being explicitly expressed. Note that $$L$$ can produce an integrand that involves some nonlinear function of $$f$$ (e.g., $$f^2$$, $$\cos(f)$$, etc.) and one or more of its derivatives (e.g., $$(f')^2$$, $$\sqrt{1 + f'^2}$$, $$f''$$, etc.). It is understood that the limits are known and fixed ahead of time so that these values will be suppressed.

As an example in using this notation, let’s take the Lagrangian to be

\[ L(f(x) ; x) = x^2 f(x) \]

and the functional as

\[ J[f(x)] = \int_{-1}^{1} dx \, L(f(x);x) = \int_{-1}^{1} dx \, x^2 f(x) \; \]

Then $$J[x^2] = 2/5$$, $$J[\cos(y)] = 4 \cos(1) - 2 \sin(1)$$, and $$J[\alpha/\sqrt{1+x^2}] = \alpha \left( \sqrt{2} - \sinh^{-1}(1) \right)$$. The Maxima code that implements this functional is

J(arg,x) := block([expr],
                  expr : arg*x^2,
                  integrate(expr,x,-1,1));

Note that the dummy variable of integration is separately specified and no error checking is done to ensure that arg is expressed in the same variable as x.
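Feeding the examples quoted above back through this function (a quick check of my own) reproduces the stated values:

J(x^2, x);                    /* => 2/5 */
J(cos(x), x);                 /* => 4*cos(1) - 2*sin(1) */
J(alpha/sqrt(1 + x^2), x);    /* => alpha*(sqrt(2) - asinh(1)), up to how Maxima chooses to write asinh(1) */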

The natural question to ask is how the value of the functional changes as the function plugged in is varied, and which function gives the minimum or maximum value. This is a much more complicated problem than the traditional derivative as it isn’t immediately clear what it means for one function to be close to another. The classical approach to this was to define a function family

\[ f_{\alpha}(x) = f(x) + \alpha \eta(x) \]

so that the function that yields an extremum obtains when $$\alpha = 0$$. Also $$\eta$$ is continuous and non-singular and $$\eta(x_1) = \eta(x_2) = 0 $$ so that $$f_{\alpha}$$ behaves like $$f$$ at the end points and differs only in between. Plugging $$f_{\alpha}$$ into the functional gives

\[ J[f_{\alpha}(x)] = \int_{x_1}^{x_2} dx L(f_{\alpha}(x);x) \; .\]

As rendered, $$J$$ is now a function of $$\alpha$$ and the extremum of the functional results from

\[ \left( \frac{ d J[f_{\alpha}(x)]}{d \alpha} \right) _{\alpha = 0} = 0 \; . \]

The derivative with respect to $$\alpha$$ is relatively easy to compute using the chain rule to yield

\[ \frac{d J[f_{\alpha}(x)]}{d\alpha} = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f_\alpha} \frac{\partial f_\alpha}{\partial \alpha} + \frac{\partial L}{\partial {f_\alpha}'} \frac{\partial {f_\alpha}'}{\partial \alpha} \right\} \; .\]

The classical Euler-Lagrange equation results from an integration-by-parts on the second term, the subsequent setting of the boundary term to zero, and finally setting $$\alpha$$ to zero as well, to give

\[ \left. \frac{d J[f_{\alpha}(x)]}{d \alpha} \right|_{\alpha = 0} = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'} \right) \right\} \left. \frac{\partial f}{\partial \alpha} \right|_{\alpha = 0} \; .\]

Of course, in order to get the equation free of the integral, one has to argue that the function $$\eta(x)$$ is arbitrary and so the only way that the integral can be zero is if the part of the integrand multiplying $$\eta(x)$$ is zero.
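As a concrete sanity check (my example, not drawn from the texts cited above), take the arc-length Lagrangian mentioned earlier, $$L = \sqrt{1 + f'^2}$$. Since $$\partial L/\partial f = 0$$, setting the bracketed term to zero gives

\[ \frac{d}{dx} \left( \frac{f'}{\sqrt{1 + f'^2}} \right) = 0 \; \Rightarrow \; f' = \textrm{constant} \; , \]

so the extremal curve joining the fixed endpoints is a straight line, as expected.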

Over time, there was a subsequent evolution of the notation to make variational derivatives look more like traditional derivatives. This ‘delta-notation’ defines

\[ \left. \frac{d J[f_{\alpha}(x)]}{d \alpha} \right|_{\alpha = 0} d \alpha \equiv \delta J \]

and

\[ \left. \frac{d f_{\alpha}}{d \alpha} \right|_{\alpha = 0} d \alpha \equiv \delta f \]

and then expresses the variation as

\[ \delta J = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'} \right) \right\} \delta f = \int_{x_1}^{x_2} dx \, \frac{\delta J}{\delta f(x)} \delta f(x) \; ,\]

which is analogous to the total derivative of a function $$J(\{q^i\})$$

\[ d J = \sum_{i} \frac{\partial J}{\partial q^i} dq^i \; .\]

The idea here is a typical one employed in theoretical physics where $$\sum_i \rightarrow \int dx$$, $$dq^i \rightarrow \delta f(x)$$, and $$i$$ and $$x$$ are indices, the former being discrete and the latter being continuous.

Over time, the physics community (mainly the QFT community) abstracted the delta notation even further by assuming that $$\eta(x) = \delta(x-x’)$$ where $$x’$$ is some fixed value, which I call the source value since it is imagined as the point source where the perturbation originates. The problem is that their notation for the functional derivative is given by (see e.g. Stevens page 34)

\[ \frac{\delta F[f(x)]}{\delta f(x’)} \; . \]

The natural question for the beginner who has seen the classical approach or is consulting older books is why the two indices $$x$$ and $$x’$$ when the delta-notation expression above has only $$x$$? There is no clear answer to be found in any of the texts I’ve surveyed so I’ll offer one on their behalf. By slightly modifying the definition of the functional derivative from the classical $$\alpha$$-family result to

\[ \frac{\delta J[f(x)]}{\delta f(x')} = \lim_{\alpha \rightarrow 0} \left\{ \frac{ J[f(x) + \alpha \delta(x-x')] - J[f(x)] }{\alpha} \right\} \; ,\]

we can get an economy of notation later on. The price we pay is that we now need to track two indices. The top one now gives us the dummy variable of integration and the bottom the source point. Of course, the dummy variable eventually becomes the source variable when the delta function breaks the integral but the definition requires its presence. (Note that Lancaster and Blundell, in particular, have a haphazard notation that sometimes drops the dummy variable and/or reverses the dummy and source points – none of these are particularly wrong but they are confusing and sloppy.)
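To see the modified definition at work on the toy functional from above, $$J[f(x)] = \int_{-1}^{1} dx \, x^2 f(x)$$, note that

\[ J[f(x) + \alpha \delta(x - x')] - J[f(x)] = \alpha \int_{-1}^{1} dx \, x^2 \delta(x - x') = \alpha \, {x'}^2 \; , \]

so that $$\delta J[f(x)]/\delta f(x') = {x'}^2$$ (for a source point $$x'$$ inside the limits of integration). The dummy variable $$x$$ is integrated away and only the source point $$x'$$ survives, which is exactly the role the two indices play.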

Now for the promised gain. To see the benefit of the new notation, consider the identity functional

\[ I[f(x)](y) = \int dx \, f(x) \delta(x-y) = f(y) \; .\]

Its functional derivative is

\[ \frac{\delta I[f(x)](y)}{\delta f(z)} = \lim_{\alpha \rightarrow 0} \frac{ I[f(x) + \alpha \delta(x-z)](y) - I[f(x)](y) }{\alpha} \; ,\]

which simplifies to

\[ \frac{\delta I[f(x)](y)}{\delta f(z)} = \int dx \, \delta(x-y) \delta(x-z) = \delta(y-z) \; . \]

Substituting $$f(y) = I[f(x)](y)$$ we get the very nice relation

\[ \frac{\delta f(y)}{\delta f(z)} = \delta(y-z) \; . \]

This relation leads to a particularly clean derivation of the functional derivative of a functional built from a kernel $$K(y,x)$$,

\[ J[f(x)](y) = \int dx \, K(y,x) f(x) \; \]

as

\[ \frac{\delta J[f(x)](y)}{\delta f(z)} = \frac{\delta}{\delta f(z)} \int dx \, K(y,x) f(x) = \int dx \, K(y,x) \delta(x-z) = K(y,z) \; . \]

From this relation we can also get the classical Euler-Lagrange equations in short order as follows. First compute the variational derivative

\[ \frac{\delta J[f(x)]}{\delta f(y)} = \frac{\delta}{\delta f(y)} \int dx \, L(f(x);x) = \int dx \, \left\{ \frac{\partial L}{\partial f(x)} \frac{\delta f(x)}{\delta f(y)} + \frac{\partial L}{\partial f'(x)} \frac{\delta f'(x)}{\delta f(y)} \right\} \; ,\]

and then integrate-by-parts (assuming the boundary term is zero) to get

\[ \frac{\delta J[f(x)]}{\delta f(y)} = \int dx \, \left\{ \frac{\partial L}{\partial f(x)} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'(x)} \right) \right\} \frac{\delta f(x)}{\delta f(y)} \; .\]

The delta function $$\delta f(x)/\delta f(y)$$ breaks the integral and gives the Euler-Lagrange equation.
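Written out, this last step gives

\[ \frac{\delta J[f(x)]}{\delta f(y)} = \frac{\partial L}{\partial f(y)} - \frac{d}{dy} \left( \frac{\partial L}{\partial f'(y)} \right) \; , \]

and demanding that this vanish for every source point $$y$$ is the Euler-Lagrange equation in its familiar form.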

I close this note out by emphasizing that there is no new content in the QFT abstraction. At its core, it is exactly the same computation as the classical case. It doesn’t even provide a new outlook in the way that Lagrangian or Hamiltonian mechanics do for Newtonian mechanics. It is simply a better way of bookkeeping.

Cross Products and Matrices

Last time we found the relationship

\[ \left( \vec \omega \times \right) \vec r = \left( -A(t)^{-1} \dot A(t) \right) \vec r \]

where $$A(t)$$ is the transformation (attitude) matrix for a time-varying transformation from a fixed inertial frame $$\{ \hat \imath, \hat \jmath, \hat k \}$$ to a rotating frame $$\{\hat I,\hat J, \hat K\}$$, and $$\vec \omega$$ is the instantaneous angular velocity of the rotating frame expressed in the fixed frame. As I argued earlier, the angular velocity is difficult to find, and I advocate solving the problem using the transformation matrix. In this entry, I will be discussing the bigger picture of why a vector cross product can be characterized by a matrix multiplication.

Understanding this connection starts with a general discussion of how to describe the motion of a rotating frame. The key observation can be summarized succinctly. Since the vectors of the rotating frame span the space, the time derivatives of the basis vectors can be expressed in terms of the basis vectors themselves. Mathematically, this observation is expressed as

\[ \frac{d \hat I }{dt} = \alpha \hat I + \beta \hat J + \gamma \hat K \; ,\]

with analogous expressions for $$\frac{d \hat J}{d t}$$ and $$\frac{d \hat K}{d t}$$.

At first glance there is a tendency to assume that $$9$$ numbers are needed to express the time-rate-of-change of the frame in terms of the frame itself. The actual number needed is much smaller and can be determined by using the orthogonality relation

\[ \hat e_i \cdot \hat e_j = \delta_{ij} \; , \quad i, j = 1,2,3 \; ,\]

where $$\hat e_1 = \hat I$$, $$\hat e_2 = \hat J$$, and $$\hat e_3 = \hat K$$.

Taking the time derivative of the orthogonality relation leads to the innocent-looking expression

\[ \frac{d }{dt} \left( \hat e_i \cdot \hat e_j \right) = 0 \; , \]

which is packed with a lot of simplifications. The first simplification comes by setting $$i = j$$ giving

\[ \frac{d \hat e_i}{dt} \cdot \hat e_i = 0 \; . \]

From that we immediately see that

\[ \frac{d \hat I }{dt} \cdot \hat I = 0 \; \; \Rightarrow \; \; \alpha = 0 \; . \]

Likewise, when $$i \neq j$$, we get

\[ \frac{d \hat e_i}{dt} \cdot \hat e_j = - \hat e_i \cdot \frac{d \hat e_j}{dt} \]

which gives, in turn,

\[ \frac{d \hat I }{dt} \cdot \hat J = - \frac{d \hat J }{dt} \cdot \hat I = \beta \]

and

\[ \frac{d \hat I }{dt} \cdot \hat K = - \frac{d \hat K }{dt} \cdot \hat I = \gamma \; .\]

This process can be carried out for $$\hat J$$ with its expansion being

\[ \frac{d \hat J }{dt} = -\beta \hat I + 0 \hat J + \delta \hat K \; , \]

where $$\delta$$ is defined through the equation

\[ \frac{d \hat J }{dt} \cdot \hat K = - \frac{d \hat K }{dt} \cdot \hat J = \delta \; .\]

At this point, there is no freedom left for $$\hat K$$ and the three functions $$\left\{ \beta, \gamma, \delta \right\}$$ completely specify how the rotating frame moves relative to itself. These three relationships better disclose their content when written in matrix form as

\[ \left[ \begin{array}{c} d \hat I/dt \\ d \hat J / dt \\ d \hat K /dt \end{array} \right] = \underbrace{\left[ \begin{array}{ccc} 0 & \beta & \gamma \\ -\beta & 0 & \delta \\ -\gamma & -\delta & 0 \end{array} \right]}_{W(t)} \left[ \begin{array}{c} \hat I \\ \hat J \\ \hat K \end{array} \right] \; . \]

There are two remaining steps. First is to relate $$\left\{ \beta, \gamma, \delta \right\}$$ to $$\vec \omega_{rotating}$$. The second step is to relate $$\vec \omega_{rotating}$$ to $$\vec \omega$$ by using the transformation matrix $$A(t)$$.

To relate $$\vec \omega_{rotating}$$ to $$\left\{ \beta, \gamma, \delta \right\}$$, let’s look at $$\vec \omega_{rotating} \times \vec r$$

\[ \left| \begin{array}{ccc} \hat I & \hat J & \hat K \\ \omega_I & \omega_J & \omega_K \\ r_I & r_J & r_K \end{array} \right| = \left[ \begin{array}{c} \omega_J r_K - \omega_K r_J \\ \omega_K r_I - \omega_I r_K \\ \omega_I r_J - \omega_J r_I \end{array} \right] \]

and compare it to $$W(t) \vec r_{rotating}$$

\[ W(t) \vec r_{rotating} = \left[ \begin{array}{c} \beta r_J + \gamma r_K \\ -\beta r_I + \delta r_K \\ -\delta r_J -\gamma r_I \end{array} \right] \; ,\]

To connect the two, note that a vector $$\vec r$$ that rides along with the rotating frame (constant components $$r_I$$, $$r_J$$, $$r_K$$) has velocity $$\sum_i r_i \, d\hat e_i/dt$$. By the antisymmetry of $$W(t)$$, the rotating-frame components of that velocity are $$-W(t) \vec r_{rotating}$$, and they must match the components of $$\vec \omega_{rotating} \times \vec r_{rotating}$$ computed above. Matching term by term gives $$\beta = \omega_K$$, $$\gamma = -\omega_J$$, and $$\delta = \omega_I$$, so that

\[ W(t) = \left[ \begin{array}{ccc} 0 & \beta & \gamma \\ -\beta & 0 & \delta \\ -\gamma & -\delta & 0 \end{array} \right] = \left[ \begin{array}{ccc} 0 & \omega_K & -\omega_J \\ -\omega_K & 0 & \omega_I \\ \omega_J & -\omega_I & 0 \end{array} \right] \; .\]

It is convenient to define

\[ \Omega(t) = - W(t) = \left[ \begin{array}{ccc} 0 & -\omega_K & \omega_J \\ \omega_K & 0 & -\omega_I \\ -\omega_J & \omega_I & 0 \end{array} \right] \]

since now the action of $$\Omega(t)$$ on $$\vec r_{rotating}$$ is the same as the action of $$\vec \omega_{rotating} \times$$.
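As a quick check (mine, in the same Maxima spirit as the snippet in the variational-calculus post; the symbols wI, wJ, wK and rI, rJ, rK are just stand-ins for the components), the matrix $$\Omega$$ really does reproduce the cross product:

Omega : matrix([0, -wK, wJ], [wK, 0, -wI], [-wJ, wI, 0])$
r : [rI, rJ, rK]$
Omega . r;
/* => matrix([wJ*rK - wK*rJ], [wK*rI - wI*rK], [wI*rJ - wJ*rI]),
      the same components as the determinant expansion above */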

Finally, to get back to the inertially fixed frame, we can use the chain of relations

\[A(t) \left( \vec \omega \times \vec r \right) = (A(t) \vec \omega) \times (A(t) \vec r) \\ = \left( \vec \omega_{rotating} \times \right) \left( A(t) \vec r \right) = \Omega(t) A(t) \vec r = - \dot A(t) \vec r \]

to obtain

\[ \Omega(t) A(t) = - \dot A(t) \; .\]

Note that this relation follows immediately from the $$d \hat e_i /dt = W_{ij} \hat e_j$$ equation by expanding $$\hat e_i $$ and $$d \hat e_i /dt$$ in terms of $$\{ \hat \imath, \hat \jmath, \hat k\}$$.

Now a few remarks on why this works. First note that, from the time derivative of the orthogonality relation, the matrix relating $$\{\dot {\hat e_i} \}$$ to $$\{\hat e_j\}$$ must be antisymmetric. The number of free components of an $$N$$-dimensional antisymmetric matrix is $$N(N-1)/2$$. Only in $$N=3$$ is $$N(N-1)/2$$ equal to $$N$$. So, quite by accident (or providence), only in three dimensions does an antisymmetric matrix have the same number of components as a vector. The cross-product then mimics or prefigures the matrix product. Geometrically, these observations can be summarized by saying that, in three dimensions, a two-dimensional plane is in one-to-one correspondence with the normal vector to the plane. In lower dimensions, there is simply not enough structure to construct the normal spaces. In four or more dimensions, the normal space is larger than one-dimensional. This result also explains why there seems to be a ‘mismatch’ in the number of transformations needed for the $$\vec \omega \times \vec r$$ expression

\[ A(t) \left( \vec \omega \times \vec r \right) = \left( A(t) \vec \omega \right) \times \left( A(t) \vec r \right) \; .\]

So, three dimensions is a special place to live.

Rotating Frames

Why care about rotating frames? Well, the first and foremost reason is that we live on a large rotating frame. The rotation of the Earth about its axis may not seem to be of direct importance in our day-to-day lives, but the Coriolis force due to the planet’s rotation has its effects. It shapes the patterns of the weather and plays an intimate role in long-range warfare. And who hasn’t been at least a little enchanted by the motion of Foucault’s pendulum? So, a basic understanding of rotating frames may be motivated by these reasons alone.

That said, I am actually thinking of more mundane and more focused applications of rotating frames. Control of the orbital motion of spacecraft is best understood and implemented in frames that are attached to the motion itself. For example, suppose we want to change the energy of a trajectory. The best way to perform this adjustment is by firing thrusters so that they are aligned with the instantaneous direction of the velocity. Other mechanical applications include understanding the effects of high accelerations on the occupants of cars, planes, roller coasters, etc.

So, how do we describe motion in a rotating frame? There are many ways to do this, but I’ll confine myself to just two different methods. The first is the traditional method found in many physics textbooks, which I will call the ‘classical’ method. It involves the $$\vec \omega \times$$ terms. In my opinion, this method should be used only with great care, if at all. It is often confusing and ambiguous in most people’s hands. The second method, which I will call ‘semi-classical’, is much more reliable. It can be used by novice and expert alike and it doesn’t require the same amount of care. It does require a bit more in the way of computations, but these are easy to implement. In addition, the very structure of this method opens the doors for the more sophisticated computations of differential geometry, differential forms, and geometric algebras.

The classical picture starts with an object located at a position $$\vec r$$ from some origin, with a corresponding velocity $$\vec v$$ with respect to some fixed frame (inertial) spanned by the unit triad $$ \left\{ \hat \imath, \hat \jmath, \hat k \right\} $$. Co-located at the origin of the first frame is a second one spanned by the unit triad $$\left\{ \hat I, \hat J, \hat K \right\}$$ which is rotating arbitrarily as a function of time. Since the rotation, while time-dependent, is an instantaneous orthogonal transformation, the magnitude of the radius vector of any point carried along with the rotating frame remains fixed. Therefore, the velocity of such a point must be perpendicular to its radius vector and, using arguments derived from uniform circular motion, the angular velocity, $$\vec \omega$$, defined implicitly as

\[ \vec v = \vec \omega \times \vec r \; ,\]

shows up on the scene.

Combining the velocity in the inertial frame with the apparent velocity caused by the rotating frame, we arrive at the ‘old tried-and-true’ relation

\[ \left. \frac{d}{dt} \right)_{rotating} = \left. \frac{d}{dt} \right)_{fixed} - \vec \omega \times \; .\]

This relationship is completely frame-independent, which is both its strength and its weakness. Common mistakes associated with this frame-independence are either to ‘expect’ a particular result based on physical intuition and to fail to obtain it because the outcome is expressed in the wrong frame, or to use the wrong angular velocity, $$\vec \omega$$, which is easy to do for even the simplest of cases. Even ‘experts’ come to seriously wrong conclusions when applying or deriving the classical relationship (e.g., see Chapter 1 of ‘Classical Dynamics of Particles and Systems’, 2nd edition by Marion). In particular, one has to be very careful about being consistent with which frame is being used.

The better treatment, suitable for beginners and experts alike, is what I call the ‘semi-classical’ treatment, in which transformation matrices play a central role. At any instant, the components of the position vector transform as

\[ \vec r_{rotating} = A(t) \vec r_{fixed} \; .\]

The corresponding relationship for the velocity is obtained by taking the time derivative of the position equation to get

\[ \vec v_{rotating} = A(t) \vec v_{fixed} + \dot A(t) \vec r_{fixed} \; \]

where $$A(t)$$ is the transformation matrix that gives the instantaneous orientation (attitude) of the rotating frame relative to the fixed one. Of course, it takes some effort to construct that matrix, but it is straightforward to do since

\[ A(t) = \left[ \begin{array}{ccc} \hat I \cdot \hat \imath & \hat I \cdot \hat \jmath & \hat I \cdot \hat k \\ \hat J \cdot \hat \imath & \hat J \cdot \hat \jmath & \hat J \cdot \hat k \\ \hat K \cdot \hat \imath & \hat K \cdot \hat \jmath & \hat K \cdot \hat k \end{array} \right] \]

and

\[ \dot A(t) = \left[ \begin{array}{ccc} \dot {\hat I} \cdot \hat \imath & \dot {\hat I} \cdot \hat \jmath & \dot {\hat I} \cdot \hat k \\ \dot {\hat J} \cdot \hat \imath & \dot {\hat J} \cdot \hat \jmath & \dot {\hat J} \cdot \hat k \\ \dot {\hat K} \cdot \hat \imath & \dot {\hat K} \cdot \hat \jmath & \dot {\hat K} \cdot \hat k \end{array} \right] \;.\]

For most cases where one has enough knowledge to construct $$\vec \omega$$ one has enough knowledge to construct $$A(t)$$ since a model of how the rotating frame unit vectors are moving relative to the fixed frame is needed in both cases.

To illustrate the two methods side by side, let’s consider the example of a bead on a helical trajectory (with the ‘fixed’ subscript dropped to keep the notational clutter down)

\[ \vec r = a \cos \omega t \hat \imath + a \sin \omega t \hat \jmath + b t \hat k \; .\]

The fixed-frame velocity is immediately obtained

\[ \vec v = \frac{d}{dt} \vec r = -a \omega \sin \omega t \hat \imath + a \omega \cos \omega t \hat \jmath + b \hat k \; .\]

Now, suppose we want to view the bead’s motion in a frame co-rotating with the bead. The rotating frame’s $$\hat K$$ axis coincides with the fixed frame $$\hat k$$ and the rotating frame spins about this axis with an angular rate of $$\omega$$ and with a corresponding angular velocity of

\[ \vec \omega = \omega \hat k = \omega \hat K \; .\]

Our physical expectation is that, in the rotating frame, the bead should just move vertically along $$\hat K$$. Let’s see if it does.

Application of the classical formula gives

\[ \vec v_{rotating} = \vec v - \vec \omega \times \vec r = \left[ \begin{array}{c} -a \omega \sin \omega t \\ a \omega \cos \omega t \\ b \end{array} \right] - \left| \begin{array}{ccc} \hat \imath & \hat \jmath & \hat k \\ 0 & 0 & \omega \\ a \cos \omega t & a \sin \omega t & 0 \end{array} \right| \; , \]

which becomes

\[ \vec v_{rotating} = \left[ \begin{array}{c} -a \omega \sin \omega t \\ a \omega \cos \omega t \\ b \end{array} \right] - \left[ \begin{array}{c} -a \omega \sin \omega t \\ a \omega \cos \omega t \\ 0 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ b \end{array} \right] = b \hat k\; .\]

So, from the point of view of an observer rotating with $$\vec \omega = \omega \hat k$$ the bead moves upward at a constant velocity $$b \hat k $$. But is this correct? Technically it is not correct, because our observer should be referencing his observation to the rotating frame. He ‘gets’ the correct answer only because $$\vec v_{rotating} = b \hat K = b \hat k$$.

As pointed out earlier, the classical formula is expressed in a frame-independent way, but it is easy for that distinction to be lost. In this case, mixing the frames results only in minor confusion that can be straightened out with some thought. For more complicated problems, this mistake can grind the computation to a complete halt.

Now let’s try the semi-classical approach. In this approach, the unit vectors of the rotating frame are expressed in the fixed frame as

\[ \hat I = \cos \omega t \hat \imath + \sin \omega t \hat \jmath \; ,\]

\[ \hat J = - \sin \omega t \hat \imath + \cos \omega t \hat \jmath \; ,\]

and

\[ \hat K = \hat k \; .\]

The corresponding attitude matrix is

\[ A(t) = \left[ \begin{array}{ccc} \cos \omega t & \sin \omega t & 0 \\ - \sin \omega t & \cos \omega t & 0 \\ 0 & 0 & 1 \end{array} \right] \]

and its time derivative is

\[ \dot A(t) = \left[ \begin{array}{ccc} - \omega \sin \omega t & \omega \cos \omega t & 0 \\ - \omega \cos \omega t & - \omega \sin \omega t & 0 \\ 0 & 0 & 0 \end{array} \right] \; .\]

Application of $$A(t)$$ to the position vector in the fixed frame gives

\[ A(t) \vec r = \left[ \begin{array}{c} a \\ 0 \\ b t \end{array} \right] = a \hat I + b t \hat K \; ,\]

which is the position of the bead seen by an observer co-moving in the rotating frame. As expected from our physical intuition, the bead is moving upward while maintaining a fixed offset of $$a$$ in the $$\hat I-\hat J$$ plane.

The velocity in the rotating frame obtains immediately by differentiating this term

\[ \frac{d}{dt} \left( A(t) \vec r \right) = \left[ \begin{array}{c} 0 \\ 0 \\ b \end{array} \right] = b \hat K \; ,\]
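For readers who want to check the algebra, here is a small Maxima sketch of this computation (my own; the symbol w stands for the angular rate $$\omega$$, and a and b are the helix parameters):

A : matrix([ cos(w*t), sin(w*t), 0],
           [-sin(w*t), cos(w*t), 0],
           [        0,        0, 1])$
r : [a*cos(w*t), a*sin(w*t), b*t]$
rrot : trigsimp(A . r);   /* => matrix([a], [0], [b*t]), the position in the rotating frame */
diff(rrot, t);            /* => matrix([0], [0], [b]), the velocity in the rotating frame   */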

and that’s all that is needed to find the velocity.

That said, we can get some insight into the $$\vec \omega \times$$ term in the classical equation by taking various combinations of $$A(t)$$, $$\dot A(t)$$, $$\vec r$$, $$\vec v$$, and $$\vec \omega$$.

First let’s compare the application of $$A(t)$$ to the fixed-frame velocity $$\vec v$$,

\[ A \vec v = \left[ \begin{array}{c} 0 \\ a \omega \\ b \end{array} \right] = a \omega \hat J + b \hat K \; , \]

to the application of $$\dot A(t)$$ to the fixed-frame position $$\vec r$$,

\[ \dot A \vec r = \left[ \begin{array}{c} 0 \\ - a \omega \\ 0 \end{array} \right] = -a \omega \hat J \; .\]

We interpret the first term as the linear velocity of the bead moving in uniform circular motion in the fixed frame but expressed in terms of the rotating frame’s basis vectors. It is not what a co-moving observer would observe, but rather a convenient way for the fixed-frame observer to talk about the motion by saying something like ‘the bead is moving upward with speed $$b$$ and is moving in uniform circular motion with tangential or linear speed $$a \omega$$’. The second term compensates for what is seen as uniform circular motion by the fixed-frame observer. The sum of these two, $$b \hat K$$, is what the co-moving observer sees in his frame.

Next let’s look at the application of $$A(t)$$ to $$-\vec \omega \times \vec r$$, which yields

\[ -A(t) \left( \vec \omega \times \vec r \right) = \left[ \begin{array}{c} 0 \\ - a \omega \\ 0 \end{array} \right] = -a \omega \hat J \; ,\]

which is just the frame correction term discussed above. So we can deduce the following ‘frame-correction’ relationship:

\[ -A(t) \left( \vec \omega \times \vec r \right) = \dot A(t) \vec r \]

or

\[ \vec \omega \times \vec r = - A^{-1}(t) \dot A(t) \vec r \; .\]
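Continuing the Maxima sketch from above (again my own check, with the same stand-in symbols), the frame-correction relation can be verified directly for the helix:

A : matrix([ cos(w*t), sin(w*t), 0],
           [-sin(w*t), cos(w*t), 0],
           [        0,        0, 1])$
Adot : diff(A, t)$
r : [a*cos(w*t), a*sin(w*t), b*t]$
cross(u, v) := [u[2]*v[3] - u[3]*v[2], u[3]*v[1] - u[1]*v[3], u[1]*v[2] - u[2]*v[1]]$
trigsimp(-(A . cross([0, 0, w], r)));   /* => matrix([0], [-a*w], [0]) */
trigsimp(Adot . r);                     /* => matrix([0], [-a*w], [0]) */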

Another fruitful comparison starts by asking what happens if $$\vec \omega$$ and $$\vec r$$ are first converted to the rotating frame and then inserted into the cross-product. This computation gives

\[ \left( A(t) \vec \omega \right) \times \left( A(t) \vec r \right) = \left| \begin{array}{ccc} \hat I & \hat J & \hat K \\ 0 & 0 & \omega \\ a & 0 & b t \end{array} \right| = \left[ \begin{array}{c} 0 \\ a \omega \\ 0 \end{array} \right] = a \omega \hat J \]

from which we are led to the interesting relationship

\[ A(t) \left( \vec \omega \times \vec r \right) = \left( A(t) \vec \omega \right) \times \left( A(t) \vec r \right) \; .\]

Next time I’ll explore why this relation holds and how it leads into more sophisticated ways of thinking about frame transformations.

One-Sided Greens Functions and Causality

This week we pick up where we left off in the last post and continue probing the structure of the one-sided Greens function $$K(t,\tau)$$. While the computations of the previous post can be found in most introductory textbooks, I would be remiss if I didn’t mention that both the previous post and this one were heavily influenced by two books: Martin Braun’s ‘Differential Equations and Their Applications’ and Larry C. Andrews’ ‘Elementary Partial Differential Equations with Boundary Value Problems’.

As a recap, we found that a second order inhomogeneous linear ordinary differential equation

\[ y''(t) + p(t) y'(t) + q(t) y(t) \equiv L[y] = f(t) \; , \]

($$y'(t) = \frac{d}{dt} y(t)$$) with boundary conditions

\[ y(t_0) = y_0 \; \; \& \; \; y'(t_0) = y_0' \; \]

possesses the solution

\[ y(t) = A y_1(t) + B y_2(t) + y_p(t) \; ,\]

where $$y_i$$ are solutions to the homogeneous equation, $$\{A,B\}$$ are constants chosen to meet the initial conditions, and $$y_p$$ is the particular solution of the form

\[ y_p(t) = \int_{t_0}^{t} \, d\tau \, K(t,\tau) f(\tau) \; . \]

By historical convention, we call the kernel that propagates the influence of the inhomogeneous term in time (either forward or backward) a one-sided Greens function. The Wronskian provides the explicit formula

\[ K(t,\tau) = \frac{ \left| \begin{array}{cc} y_1(\tau) & y_2(\tau) \\ y_1(t) & y_2(t) \end{array} \right| } { W[y_1,y_2](\tau)} = \frac{ \left| \begin{array}{cc} y_1(\tau) & y_2(\tau) \\ y_1(t) & y_2(t) \end{array} \right| } { \left| \begin{array}{cc} y_1(\tau) & y_2(\tau) \\ y_1'(\tau) & y_2'(\tau) \end{array} \right| } \; \]

for the one-sided Greens function. Plugging $$t=t_0$$ into the particular solution gives

\[ y_p(t_0) = \int_{t_0}^{t_0} \, d\tau \, K(t_0,\tau) f(\tau) = 0 \]

as the initial datum for $$y_p$$ and

\[ y_p'(t_0) = K(t_0,t_0) f(t_0) + \int_{t_0}^{t_0} \, d\tau \, \left. \frac{\partial}{\partial t} K(t,\tau) \right|_{t=t_0} f(\tau) = 0 \]

for the initial datum for $$y_p'$$, since the definite integral of any integrand with the same lower and upper limits is identically zero and because

\[ K(t,t) = \frac{ \left| \begin{array}{cc} y_1(t) & y_2(t) \\ y_1(t) & y_2(t) \end{array} \right| } { W[y_1,y_2](t)} = 0 \; . \]

The initial conditions on the particular solution provide the justification that the constants $$\{A, B\}$$ can be chosen to meet the initial conditions or, in other words, the initial values are carried by the homogeneous solutions.

The results for the one-sided Greens function can be extended in four ways that make the practice of handling systems much more convenient.

Arbitrary Finite Dimensions

An arbitrary number of dimensions in the original differential equation is handled straightforwardly by the relation

\[ K(t,\tau) = \frac{ \left| \begin{array}{cccc} y_1(\tau) & y_2(\tau) & \cdots & y_n(\tau) \\ y_1'(\tau) & y_2'(\tau) & \cdots & y_n'(\tau) \\ \vdots & \vdots & \ddots & \vdots \\ y_1^{(n-2)}(\tau) & y_2^{(n-2)}(\tau) & \cdots & y_n^{(n-2)} (\tau) \\ y_1(t) & y_2(t) & \cdots & y_n(t) \end{array} \right| } { W[y_1,y_2,\ldots,y_n](\tau)} \; ,\]

where the corresponding Wronskian is given by

\[ W[y_1,y_2,\cdots,y_n](\tau) = \left| \begin{array}{cccc} y_1(\tau) & y_2(\tau) & \cdots & y_n(\tau) \\ y_1'(\tau) & y_2'(\tau) & \cdots & y_n'(\tau) \\ \vdots & \vdots & \ddots & \vdots \\ y_1^{(n-1)}(\tau) & y_2^{(n-1)}(\tau) & \cdots & y_n^{(n-1)} (\tau) \end{array} \right| \]

and

\[ y^{(n)} \equiv \frac{d^n y}{d t^n} \; .\]

The generation of one-sided Greens functions is then a fairly mechanical process once the homogeneous solutions are known. Since we are guaranteed that the solutions for initial value problems exist and are unique, the corresponding one-sided Greens functions also exist and are unique. The following is a tabulated set of $$K(t,\tau)$$s adapted from Andrews’ book.

One-sided Greens functions – adapted from Elementary Partial Differential Equations with Boundary Value Problems by L. C. Andrews:

\[ \begin{array}{ll} \textrm{Operator} & K(t,\tau) \\ \hline D + b & e^{-b(t-\tau)} \\ D^n, \; n = 2, 3, 4, \ldots & \frac{(t-\tau)^{n-1}}{(n-1)!} \\ D^2 + b^2 & \frac{1}{b} \sin b(t-\tau) \\ D^2 - b^2 & \frac{1}{b} \sinh b(t-\tau) \\ (D-a)(D-b), \; a \neq b & \frac{1}{a-b} \left[ e^{a(t-\tau)} - e^{b(t-\tau)} \right] \\ (D-a)^n, \; n = 2, 3, 4, \ldots & \frac{(t-\tau)^{n-1}}{(n-1)!} e^{a(t-\tau)} \\ D^2 - 2 a D + a^2 + b^2 & \frac{1}{b} e^{a(t-\tau)} \sin b (t-\tau) \\ D^2 - 2 a D + a^2 - b^2 & \frac{1}{b} e^{a(t-\tau)} \sinh b (t-\tau) \\ t^2 D^2 + t D - b^2 & \frac{\tau}{2 b}\left[ \left( \frac{t}{\tau} \right)^b - \left( \frac{\tau}{t} \right)^b \right] \end{array} \]
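As a quick check of one entry (my own verification), take the operator $$D^2 + b^2$$ with homogeneous solutions $$y_1 = \cos bt$$ and $$y_2 = \sin bt$$, for which $$W[y_1,y_2](\tau) = b$$. The Wronskian formula then gives

\[ K(t,\tau) = \frac{ \cos b\tau \, \sin bt - \sin b\tau \, \cos bt }{b} = \frac{1}{b} \sin b(t-\tau) \; , \]

in agreement with the corresponding row of the table.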

Imposing Causality

The second extension is a little more subtle. Allow the inhomogeneous term $$f(t)$$ to be a delta-function so that the differential equation becomes

\[ L[y] = \delta(t-a), \; \; y(t_0) = 0, \; \; y'(t_0) = 0 \; .\]

The particular solution

\[ y = \int_{t_0}^t \, d \tau \, K(t,\tau) \delta(\tau - a) = \left\{ \begin{array}{lc} 0, & t_0 \leq t < a \\ K(t,a), & t \geq a \end{array} \right. \]

now represents how the system responds to the unit impulse delivered at time $$t=a$$ by the delta-function. The discontinuous response results from the fact that the system at $$t=a$$ receives a sharp blow that changes its evolution from the unforced evolution it was following before the impulse to a new unforced evolution with new initial conditions at $$t=a$$ that reflect the influence of the impulse. By applying a little manipulation to the right-hand side, and allowing $$t_0$$ to recede into the infinitely distant past, the above result transforms into

\[ K^+(t,\tau) = \left\{ \begin{array}{lc} 0, & t_0 \leq t < \tau \\ K(t,\tau), & \tau \leq t < \infty \end{array} \right. = \theta(t-\tau) K(t,\tau) \; ,\]

which is a familiar result from Quantum Evolution – Part 3. In this derivation, we get an alternative and more mathematically rigorous way of understanding why the Heaviside theta function (or step function, if you prefer) enforces causality. The undecorated one-sided Greens function $$K(t,\tau)$$ is a mathematical object capable of evolving the system forward or backward in time with equal facility. The one-sided retarded Greens function $$K^+(t,\tau)$$ is physically meaningful because it will not evolve the influence of an applied force to a time earlier than the force was applied.

Recasting in State Space Notation

An alternative and frequently more insightful approach to solving ordinary differential equations comes in recasting the structure into state space language, in which the differential equation(s) reduce to a set of coupled first order equations of the form

\[ \frac{d}{dt} \bar S = \bar f(\bar S; t) \; . \]

Quantum Evolution – Part 2 presents this approach applied to the simple harmonic oscillator. The propagator (or state transition matrix or fundamental matrix) of the system contains the one-sided Greens function as the upper-right portion of its structure. It is easiest to see that result by working with a second order system with linearly-independent solutions $$y_1$$ and $$y_2$$ and initial conditions $$y(t_0) = y_0$$ and $$y'(t_0) = y'_0$$. In analogy with the previous post, the constants can be solved for in terms of the initial conditions at time $$t_0$$ to yield the expression

\[ \left[ \begin{array}{c} C_1 \\ C_2 \end{array} \right] = \frac{1}{W(t_0)} \left[ \begin{array}{cc} y_2' & -y_2 \\ -y_1' & y_1 \end{array} \right]_{t_0} \left[ \begin{array}{c} y_0 \\ y_0' \end{array} \right] \equiv M_{t_0} \left[ \begin{array}{c} y_0 \\ y_0' \end{array} \right] \; , \]

where the subscript notation $$[]_{t_0}$$ means that all of the expressions in the matrix are evaluated at time $$t_0$$. Now the arbitrary solution $$y(t)$$ to the homogeneous equation is a linear combination of the independent solutions weighted by the constants just determined

\[ \left[ \begin{array}{c} y(t) \\ y'(t) \end{array} \right] = \left[ \begin{array}{cc} y_1 & y_2 \\ y_1' & y_2' \end{array} \right]_{t} \left[ \begin{array}{c} C_1 \\ C_2 \end{array} \right] \equiv \Omega_{t} \left[ \begin{array}{c} C_1 \\ C_2 \end{array} \right] \equiv \Omega_{t} M_{t_0} \left[ \begin{array}{c} y_0 \\ y_0' \end{array} \right] \; .\]

The propagator, which is formally defined as

\[ U(t,t_0) = \frac{\partial \bar S(t)}{\partial \bar S(t_0) } \; ,\]

is easily read off to be

\[ U(t,t_0) = \Omega_{t} M_{t_0} \; , \]

which, when back-substituting the forms of $$\Omega_t$$ and $$M_{t_0}$$, gives

\[ U(t,t_0) = \frac{1}{W(t_0)} \left[ \begin{array}{cc} y_1 & y_2 \\ y_1' & y_2' \end{array} \right]_{t} \left[ \begin{array}{cc} y_2' & -y_2 \\ -y_1' & y_1 \end{array} \right]_{t_0} \; .\]

In state space notation, the inhomogeneous term takes the form $$\left[ \begin{array}{c} 0 \\ f(t) \end{array} \right]$$ and so the relevant component of the matrix multiplication is the upper right element, which is
\[ \left\{ U(t,t_0) \right\}_{1,2} = \frac{y_1(t_0) y_2(t) - y_1(t) y_2(t_0)}{W(t_0)} \; , \]

which we recognize as the one-sided Greens function. Multiplication of the whole propagator by the Heaviside function enforces causality and gives the retarded, one-sided Greens function in the $$(1,2)$$ component.
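As a concrete illustration (my own, not taken from the references), take again the operator $$D^2 + b^2$$ with $$y_1 = \cos bt$$ and $$y_2 = \sin bt$$. Carrying out the matrix multiplication gives

\[ U(t,t_0) = \left[ \begin{array}{cc} \cos b(t-t_0) & \frac{1}{b} \sin b(t-t_0) \\ -b \sin b(t-t_0) & \cos b(t-t_0) \end{array} \right] \; , \]

whose $$(1,2)$$ element is the one-sided Greens function for $$D^2 + b^2$$ from the table above, and which properly reduces to the identity matrix when $$t = t_0$$.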

Using the Fourier Transform

While all of the machinery discussed above is straightforward to apply, it does involve a lot of steps (e.g., finding the independent solutions, forming the Wronskian, forming the one-sided Greens function, applying causality, etc.). There is often a faster way to perform all of these steps using the Fourier transform. This will be illustrated for a simple one-dimensional problem (adapted from ‘Mathematical Tools for Physics’ by James Nearing) of a mass moving in a viscous fluid subjected to a time-varying force

\[ \frac{dv}{dt} + \beta v = f(t) \; ,\]

where $$\beta$$ is a constant characterizing the fluid and $$f(t)$$ is the force per unit mass.

We assume that the velocity has a Fourier transform

\[ v(t) = \int_{-\infty}^{\infty} d \omega \, V(\omega) e^{-i\omega t} \; \]

with the corresponding transform pair

\[ V(\omega) = \frac{1}{2 \pi} \int_{-\infty}^{\infty} dt \, v(t) e^{+i\omega t} \; .\]

Likewise, the force possesses a Fourier transform

\[ f(t) = \int_{-\infty}^{\infty} d \omega \, F(\omega) e^{-i\omega t} \; .\]

Plugging the transforms into the differential equation yields the algebraic equation

\[ -i \omega V(\omega) + \beta V(\omega) = F(\omega) \; ,\]

which is easily solved for $$V(\omega)$$ and which, when substituted back in, gives the expression for particular solution

\[ v_p(t) = i \int_{-\infty}^{\infty} d \omega \frac{F(\omega)}{\omega + i \beta} e^{-i\omega t} \; .\]

Eliminating $$F(\omega)$$ by using its transform pair, we find that

\[ v_p(t) = \frac{i}{2 \pi} \int_{-\infty}^{\infty} d\tau K(t,\tau) f(\tau) \]

with the kernel

\[ K(t,\tau) = \int_{-\infty}^{\infty} d \omega \frac{e^{-i \omega (t-\tau)}}{\omega + i \beta} \; .\]

This is exactly the form of a one-sided Greens function. Even more pleasing is the fact that when complex contour integration is used to solve the integral, we discover that causality is already built-in and that what we have obtained is actually a retarded, one-sided Greens function

\[ K^+(t,\tau) = \left\{ \begin{array}{lc} 0 & t < \tau \\ -2 \pi i \, e^{-\beta(t-\tau)} & t > \tau \end{array} \right. \; . \]

Causality results since the pole of the denominator is in the lower half of the complex plane. The usual semi-circular contour used in Jordan’s lemma must be in the upper half-plane when $$t < \tau$$, in which case no poles are contained and no residue exists. When $$t > \tau$$, the semi-circle must be in the lower half-plane, where it surrounds the pole at $$\omega = - i \beta$$, giving a non-zero residue.
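To spell out the contour step for $$t > \tau$$: the clockwise circuit around the pole at $$\omega = -i\beta$$ picks up

\[ K^+(t,\tau) = -2\pi i \, \textrm{Res}_{\omega = -i\beta} \left[ \frac{e^{-i\omega(t-\tau)}}{\omega + i\beta} \right] = -2\pi i \, e^{-\beta(t-\tau)} \; , \]

and the factor of $$i/2\pi$$ already sitting in front of the $$\tau$$ integral cancels the $$-2\pi i$$, which is why no stray factors appear in the final result below.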

The final form of the particular solution is

\[ v_p(t) = \int_{-\infty}^t d \tau \, e^{-\beta (t-\tau)} f(\tau) \; , \]

which is the same result we would have obtained by using the one-sided Greens function for the operator $$D + \beta$$ shown in the table above.