Varying Variations

Mathematicians often complain about the physicist’s use of mathematics. They have even claimed that the math that physicists use is comparable to a pidgin in a spoken language.

That may be true. Certainly physicists tend to play with mathematics in a way that makes mathematicians uncomfortable, since physicists are interested in modeling the physical world and not in expanding the mathematical frontiers. Nonetheless, I think that the criticism is mostly misplaced and wildly exaggerated. However, there are pockets of ‘slang’ that pop up from time to time that are particularly troubling, and not just to mathematicians. This week, I would like to critique some notation that is commonly embraced in quantum field theory (QFT) circles for performing variational calculus.

My aim here is to normalize relations between a physicist’s desire for brevity and minimalism and the need for clarity for the beginner. To do that, I am going to first present the traditional approach to the calculus of variations, and then compare and contrast that to the approach currently in vogue in QFT circles. My presentation for the traditional approach follows roughly what can be found in Classical Dynamics of Particles and Systems by Marion or Classical Mechanics by Goldstein, although I have remedied some notational shortcomings in preparation for the comparison to the QFT approach. The modern QFT notation is taken from The Six Core Theories of Modern Physics by Stevens and Quantum Field Theory for the Gifted Amateur by Lancaster and Blundell.

The basic notion of variational calculus is the idea of a functional – a mapping between a set of functions and the real numbers. The definite integral

\[ I = \int_{-1}^{1} dx \, f(x) \]

is the prototype example, since it takes in a given function $$f(x)$$ and spits out a real number. Generally, more sophisticated examples take a function $$f(x)$$, feed it into an expression $$L(f(x),f'(x);x)$$ as the integrand (where $$f'(x) = df/dx$$; for brevity, the $$f'(x)$$ slot will often be suppressed and the expression written as $$L(f(x);x)$$), and then carry out the integral

\[ J[f(x)] = \int_{x_1}^{x_2} dx \, L( f(x) ; x) \; .\]

A few words are in order about my notation. The square brackets are where the function $$f(x)$$ is plugged into the functional, and the variable appearing as the argument of the function (here $$x$$) is the dummy variable of integration. Generally, we don’t have to specify the dummy variable in an integral but, as will become clear below, when we are computing functional derivatives, specification of the dummy variable serves as a compass for navigating the notation. The expression $$L$$ that maps the function $$f$$ into the integrand is called the Lagrangian. Any additional parameters upon which the integral may depend will either be understood or, when needed explicitly, will be called out by appending conventional function notation on the end. So $$J[f(x)](\alpha)$$ means that the dependence of $$J[f(x)]$$ on the parameter $$\alpha$$ is being explicitly expressed. Note that $$L$$ can produce an integrand that involves some nonlinear function of $$f$$ (e.g., $$f^2$$, $$\cos(f)$$, etc.) and one or more of its derivatives (e.g., $$(f')^2$$, $$\sqrt{1 + f'^2}$$, $$f''$$, etc.). It is understood that the limits are known and fixed ahead of time, so these values will be suppressed.

As an example in using this notation, let’s take the Lagrangian to be

\[ L(f(x) ; x) = x^2 f(x) \]

and the functional as

\[ J[f(x)] = \int_{-1}^{1} dx \, L(f(x);x) = \int_{-1}^{1} dx \, x^2 f(x) \; . \]

Then $$J[x^2] = 2/5$$, $$J[\cos(y)] = 4 \cos(1) - 2 \sin(1)$$, and $$J[\alpha/\sqrt{1+x^2}] = \alpha \left( \sqrt{2} - \sinh^{-1}(1) \right)$$. The Maxima code that implements this functional is

J(arg,x) := block([expr],
                  expr : arg*x^2,
                  integrate(expr,x,-1,1));

Note that the dummy variable of integration is separately specified and no error checking is done to ensure that arg is expressed in the same variable as x.
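These values can be cross-checked numerically. The following sketch re-implements the same functional in Python (the helper names are mine, not from any of the texts) using composite Simpson integration:

```python
import math

def J(f, a=-1.0, b=1.0, n=1000):
    """Numerically evaluate J[f] = integral from a to b of x^2 f(x) dx
    using the composite Simpson rule (n must be even)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        w = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += w * x**2 * f(x)
    return total * h / 3

print(J(lambda x: x**2))                     # should approximate 2/5
print(J(math.cos))                           # should approximate 4 cos(1) - 2 sin(1)
print(J(lambda x: 1 / math.sqrt(1 + x**2)))  # should approximate sqrt(2) - asinh(1)
```

Unlike the Maxima version, the integration here is numerical, so the results only approximate the closed forms quoted above.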

The natural question to ask is how the value of the functional changes as the function plugged in is varied, and which function gives the minimum or maximum value. This is a much more complicated problem than the traditional derivative, as it isn’t immediately clear what it means for one function to be close to another. The classical approach was to define a one-parameter family of functions

\[ f_{\alpha}(x) = f(x) + \alpha \eta(x) \]

chosen so that the extremum is obtained when $$\alpha = 0$$, i.e., $$f(x)$$ itself is the extremal function. Also, $$\eta$$ is continuous and non-singular, and $$\eta(x_1) = \eta(x_2) = 0 $$, so that $$f_{\alpha}$$ behaves like $$f$$ at the endpoints and differs only in between. Plugging $$f_{\alpha}$$ into the functional gives

\[ J[f_{\alpha}(x)] = \int_{x_1}^{x_2} dx L(f_{\alpha}(x);x) \; .\]

As rendered, $$J$$ is now a function of $$\alpha$$ and the extremum of the functional results from

\[ \left( \frac{ d J[f_{\alpha}(x)]}{d \alpha} \right) _{\alpha = 0} = 0 \; . \]
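For the concrete Lagrangian $$L = x^2 f(x)$$ used earlier, the functional is linear in $$f$$, so $$d J[f_{\alpha}(x)]/d\alpha = J[\eta(x)]$$ for every $$\alpha$$. A quick numerical sketch in Python (the particular choices of $$f$$ and $$\eta$$ are mine and arbitrary, except that $$\eta$$ vanishes at the endpoints) bears this out:

```python
import math

def simpson(g, a=-1.0, b=1.0, n=1000):
    # Composite Simpson rule (n must be even)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        w = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += w * g(x)
    return total * h / 3

# J[f] = integral of x^2 f(x) over [-1, 1]
J = lambda f: simpson(lambda x: x**2 * f(x))

f = math.cos                              # candidate function
eta = lambda x: (1 - x**2) * math.exp(x)  # variation; eta(-1) = eta(1) = 0

alpha = 1e-4
dJ_dalpha = (J(lambda x: f(x) + alpha * eta(x)) - J(f)) / alpha
print(dJ_dalpha, J(eta))  # the two agree because L is linear in f
```

For a Lagrangian nonlinear in $$f$$ or involving $$f'$$, the finite-difference quotient would only approximate the derivative at $$\alpha = 0$$, but the same check applies.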

The derivative with respect to $$\alpha$$ is relatively easy to compute using the chain rule to yield

\[ \frac{d J[f_{\alpha}(x)]}{d\alpha} = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f_\alpha} \frac{\partial f_\alpha}{\partial \alpha} + \frac{\partial L}{\partial {f_\alpha}'} \frac{\partial {f_\alpha}'}{\partial \alpha} \right\} \; .\]

The classical Euler-Lagrange equation results from an integration-by-parts on the second term, the subsequent setting of the boundary term to zero, and finally setting $$\alpha$$ to zero as well to give

\[ \left. \frac{d J[f_{\alpha}(x)]}{d \alpha} \right|_{\alpha = 0} = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'} \right) \right\} \left. \frac{\partial f_\alpha}{\partial \alpha} \right|_{\alpha = 0} \; .\]

Of course, in order to get an equation free of the integral, one has to argue that the function $$\eta(x) = \partial f_\alpha / \partial \alpha$$ is arbitrary, and so the only way that the integral can be zero is if the part of the integrand multiplying $$\eta(x)$$ is itself zero.
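As a standard worked example (the shortest-path problem, which appears in both Marion and Goldstein), take the Lagrangian $$L(f(x), f'(x); x) = \sqrt{1 + f'^2}$$, so that $$J[f(x)]$$ is the arc length of the curve between the endpoints. Since $$\partial L / \partial f = 0$$, the Euler-Lagrange equation reduces to

\[ \frac{d}{dx} \left( \frac{f'}{\sqrt{1 + f'^2}} \right) = 0 \quad \Rightarrow \quad f' = \textrm{constant} \; , \]

and the extremal curve is a straight line, as expected.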

Over time, there was a subsequent evolution of the notation to make variational derivatives look more like traditional derivatives. This ‘delta-notation’ defines

\[ \left. \frac{d J[f_{\alpha}(x)]}{d \alpha} \right|_{\alpha = 0} d \alpha \equiv \delta J \]

and

\[ \left. \frac{d f_{\alpha}}{d \alpha} \right|_{\alpha = 0} d \alpha \equiv \delta f \]

and then expresses the variation as

\[ \delta J = \int_{x_1}^{x_2} dx \left\{ \frac{\partial L}{\partial f} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'} \right) \right\} \delta f = \int_{x_1}^{x_2} dx \, \frac{\delta J}{\delta f(x)} \delta f(x) \; ,\]

which is analogous to the total derivative of a function $$J(\{q^i\})$$

\[ d J = \sum_{i} \frac{\partial J}{\partial q^i} dq^i \; .\]

The idea here is a typical one employed in theoretical physics where $$\sum_i \rightarrow \int dx$$, $$dq^i \rightarrow \delta f(x)$$, and $$i$$ and $$x$$ are indices, the former being discrete and the latter being continuous.

Over time, the physics community (mainly the QFT community) abstracted the delta notation even further by assuming that $$\eta(x) = \delta(x-x’)$$ where $$x’$$ is some fixed value, which I call the source value since it is imagined as the point source where the perturbation originates. The problem is that their notation for the functional derivative is given by (see e.g. Stevens page 34)

\[ \frac{\delta F[f(x)]}{\delta f(x’)} \; . \]

The natural question for the beginner who has seen the classical approach, or who is consulting older books, is why there are two indices $$x$$ and $$x'$$ when the delta-notation expression above has only $$x$$. There is no clear answer to be found in any of the texts I’ve surveyed, so I’ll offer one on their behalf. By slightly modifying the definition of the functional derivative from the classical $$\alpha$$-family result to

\[ \frac{\delta J[f(x)]}{\delta f(x')} = \lim_{\alpha \rightarrow 0} \left\{ \frac{ J[f(x) + \alpha \delta(x-x')] - J[f(x)] }{\alpha} \right\} \; ,\]

we can get an economy of notation later on. The price we pay is that we now need to track two indices. The top one gives us the dummy variable of integration and the bottom one the source point. Of course, the dummy variable eventually becomes the source variable when the delta function breaks the integral, but the definition requires its presence. (Note that Lancaster and Blundell, in particular, have a haphazard notation that sometimes drops the dummy variable and/or reverses the dummy and source points; none of these choices is particularly wrong, but they are confusing and sloppy.)
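To see the two-index bookkeeping in action, apply this definition to the simplest possible functional (a warm-up example of my own), $$J[f(x)] = \int dx \, f(x)$$:

\[ \frac{\delta J[f(x)]}{\delta f(x')} = \lim_{\alpha \rightarrow 0} \frac{1}{\alpha} \left\{ \int dx \, \left[ f(x) + \alpha \delta(x - x') \right] - \int dx \, f(x) \right\} = \int dx \, \delta(x - x') = 1 \; . \]

The dummy variable $$x$$ is integrated away while the source point $$x'$$ survives, here only trivially, since the delta function integrates to one.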

Now for the promised gain. To see the benefit of the new notation, consider the identity functional

\[ I[f(x)](y) = \int dx \, f(x) \delta(x-y) = f(y) \; .\]

Its functional derivative is

\[ \frac{\delta I[f(x)](y)}{\delta f(z)} = \lim_{\alpha \rightarrow 0} \frac{ I[f(x) + \alpha \delta(x-z)](y) - I[f(x)](y) }{\alpha} \; ,\]

which simplifies to

\[ \frac{\delta I[f(x)](y)}{\delta f(z)} = \int dx \, \delta(x-y) \delta(x-z) = \delta(y-z) \; . \]

Substituting $$f(y) = I[f(x)](y)$$ we get the very nice relation

\[ \frac{\delta f(y)}{\delta f(z)} = \delta(y-z) \; . \]

This relation leads to a particularly clean derivation of the functional derivative of the kernel functional

\[ J[f(x)](y) = \int dx \, K(y,x) f(x) \; \]

as

\[ \frac{\delta J[f(x)](y)}{\delta f(z)} = \frac{\delta}{\delta f(z)} \int dx \, K(y,x) f(x) = \int dx \, K(y,x) \delta(x-z) = K(y,z) \; . \]
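The result $$\delta J[f(x)](y)/\delta f(z) = K(y,z)$$ can also be checked on a grid, in the spirit of the $$\sum_i \rightarrow \int dx$$ correspondence above. The Python sketch below (the kernel and grid are arbitrary choices of mine) perturbs a single grid value and compares the difference quotient against $$K$$ directly; the one subtlety is that a unit perturbation at a single grid point represents a discrete delta function of weight $$1/\Delta x$$, so the quotient must be divided by $$\Delta x$$:

```python
import math

n = 50
a, b = -1.0, 1.0
dx = (b - a) / n
xs = [a + (i + 0.5) * dx for i in range(n)]  # midpoint grid

K = lambda y, x: math.exp(-(y - x)**2)       # an arbitrary smooth kernel
f = [math.cos(x) for x in xs]                # an arbitrary smooth function

def J(f_vals, iy):
    # Midpoint-rule discretization of J[f](y) = integral of K(y,x) f(x) dx
    return sum(K(xs[iy], xs[j]) * f_vals[j] for j in range(n)) * dx

iy, iz = 10, 30          # grid indices of y and the source point z
eps = 1e-4
f_plus = list(f)
f_plus[iz] += eps        # perturb f at the single grid point z
fd = (J(f_plus, iy) - J(f, iy)) / (eps * dx)
print(fd, K(xs[iy], xs[iz]))  # the two agree
```

Because the discretized functional is linear in the grid values, the agreement is limited only by floating-point rounding, not by the size of the perturbation.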

From this relation we can also get the classical Euler-Lagrange equations in short order as follows. First compute the variational derivative

\[ \frac{\delta J[f(x)]}{\delta f(y)} = \frac{\delta}{\delta f(y)} \int dx \, L(f(x);x) = \int dx \, \left\{ \frac{\partial L}{\partial f(x)} \frac{\delta f(x)}{\delta f(y)} + \frac{\partial L}{\partial f'(x)} \frac{\delta f'(x)}{\delta f(y)} \right\} \; ,\]

and then integrate-by-parts (assuming the boundary term is zero) to get

\[ \frac{\delta J[f(x)]}{\delta f(y)} = \int dx \, \left\{ \frac{\partial L}{\partial f(x)} - \frac{d}{dx} \left( \frac{\partial L}{\partial f'(x)} \right) \right\} \frac{\delta f(x)}{\delta f(y)} \; .\]

The delta function $$\delta f(x)/\delta f(y)$$ breaks the integral and gives the Euler-Lagrange equation.

I close this note out by emphasizing that there is no new content in the QFT abstraction. At its core, it is exactly the same computation as the classical case. It doesn’t even provide a new outlook the way Lagrangian or Hamiltonian mechanics does for Newtonian mechanics. It is simply a better way of bookkeeping.