Latest Posts

A Curvilinear Mantra – Part 2

The last post introduced the curvilinear mantra for students working with field equations in such disciplines as fluid mechanics, general relativity, and electricity and magnetism.  The textbook example (see, e.g. Acheson Appendix A, pp 352-3) is Euler’s equations for ideal fluids in two spatial dimensions. 

In cartesian coordinates these equations read

\[ \rho \left( V_x \partial_x + V_y \partial_y + \partial_t \right) V_x = -\partial_x p + f_x \;  \]

and

\[ \rho \left( V_x \partial_x + V_y \partial_y + \partial_t \right) V_y = -\partial_y p + f_y \; ,\]

whereas, in polar coordinates these equations read

\[ \rho \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta  + \partial_t \right) V_r - \rho \frac{{V_\theta}^2}{r} = -\partial_r p + f_r \; \]

and

\[ \rho \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta  + \partial_t \right) V_\theta + \rho \frac{V_r V_\theta}{r} = -\frac{1}{r} \partial_\theta p + f_\theta \; . \]

As discussed in the previous post, beginning students are often confused by two changes when transitioning from cartesian to polar coordinates.  The first is the appearance of $1/r$ scale factors that decorate various terms, such as $(V_\theta/r)\partial_\theta$.  The second is the appearance of additional additive terms, such as $V_r V_\theta/r$. 

The curvilinear mantra explains these changes as follows: the scale factors come from minding the units and the additive terms show up to account for how the basis unit vectors change from place to place.

The first half of the mantra was covered in the previous post.  This post finishes the exploration by demonstrating how the additive terms arise due to the spatial variations of the basis vectors. 

The first step involves writing the position vector in terms of the polar coordinates and the cartesian unit basis vectors

\[ {\vec r} = r \cos \theta {\hat x} + r \sin \theta {\hat y} \; .\]

The polar unit basis vectors are defined by taking the derivatives of the position vector with respect to the polar coordinates and then unitizing.  The radial basis vector (not unitized) is

\[ {\vec e}_r \equiv \frac{\partial {\vec r}}{\partial r} = \cos \theta {\hat x} + \sin \theta {\hat y} \; .\]

Conveniently, this vector has a unit length and we can immediately write the radial unit basis vector as

\[ {\hat r} = \cos \theta {\hat x} + \sin \theta {\hat y} \; . \]

Following the same procedure, the polar angle basis vector (not unitized) is

\[ {\vec e}_\theta \equiv \frac{\partial {\vec r}}{\partial \theta} = -r \sin \theta {\hat x} + r \cos \theta {\hat y} \; . \]

This vector has length $r$ and so the polar angle unit basis vector is

\[ {\hat \theta} = -\sin \theta {\hat x} + \cos \theta {\hat y}  \; .\]

Both vectors are independent of $r$ but do depend on $\theta$ and their variations are

\[ \partial_\theta {\hat r} = {\hat \theta} \; \]

and

\[ \partial_\theta {\hat \theta} = -{\hat r} \; . \]
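
These two derivative relations are easy to check symbolically.  Here is a minimal sympy sketch (a verification aid, not part of the derivation) that confirms both:

import sympy as sp

theta = sp.symbols('theta')

# polar unit basis vectors written out in cartesian components
r_hat = sp.Matrix([sp.cos(theta), sp.sin(theta)])
theta_hat = sp.Matrix([-sp.sin(theta), sp.cos(theta)])

# both differences print as the zero vector
print(sp.simplify(r_hat.diff(theta) - theta_hat))
print(sp.simplify(theta_hat.diff(theta) + r_hat))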

At this point we have all the ingredients we need.  From the first part of the curvilinear mantra, the velocity in polar coordinates is

\[ {\vec V} = V_r {\hat r} + V_\theta {\hat \theta} \;  \]

and the material (or total) time derivative is

\[ \frac{D}{Dt} = V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \; , \]

where the scale factors on the polar angle terms are due to minding units.

Applying the material time derivative to the velocity gives

\[ \frac{D {\vec V}}{Dt} = \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) \left( V_r {\hat r} + V_\theta {\hat \theta} \right) \; . \]

Expanding this expression term-by-term yields

\[ V_r \partial_r \left( V_r {\hat r} \right) + V_r \partial_r \left( V_\theta {\hat \theta} \right) + \frac{V_\theta}{r} \partial_\theta \left( V_r {\hat r} \right) + \frac{V_\theta}{r} \partial_\theta \left( V_\theta {\hat \theta} \right) + \left(\partial_t V_r \right) {\hat r} + \left( \partial_t V_\theta \right) {\hat \theta} \; . \]

Expanding the derivatives, taking care to evaluate the spatial derivatives of the unit basis vectors, yields

\[ \left( V_r \partial_r V_r \right) {\hat r} + \left( V_r \partial_r V_\theta \right) {\hat \theta} + \left( \frac{V_\theta}{r} \partial_\theta V_r \right) {\hat r} + \frac{V_\theta V_r}{r} {\hat \theta} + \\ \frac{V_\theta}{r} \left( \partial_\theta V_\theta \right) {\hat \theta} - \frac{{V_\theta}^2}{r} {\hat r} + \left(\partial_t V_r \right) {\hat r} + \left( \partial_t V_\theta \right) {\hat \theta} \; . \]

Collecting terms gives the radial term as

\[ V_r \partial_r V_r + \frac{V_\theta}{r} \partial_\theta V_r - \frac{{V_\theta}^2}{r} + \partial_t V_r \; \]

and the polar angle term as

\[ V_r \partial_r V_\theta + \frac{V_\theta}{r} \partial_\theta V_\theta + \frac{V_\theta V_r}{r} + \partial_t V_\theta \; .\]

Factoring the terms yields

\[ \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) V_r - \frac{{V_\theta}^2}{r} \; \]

and

\[ \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) V_\theta + \frac{V_\theta V_r}{r} \; .\]
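
The entire expansion can also be checked end-to-end with sympy by carrying the $\theta$-dependent basis vectors through the derivatives.  The following sketch (a verification aid with our own variable names, treating $V_r$ and $V_\theta$ as generic functions of $r$, $\theta$, and $t$) projects the material derivative of the velocity back onto the moving basis:

import sympy as sp

r, theta, t = sp.symbols('r theta t')
Vr = sp.Function('V_r')(r, theta, t)
Vt = sp.Function('V_theta')(r, theta, t)

# basis vectors in cartesian components
r_hat = sp.Matrix([sp.cos(theta), sp.sin(theta)])
t_hat = sp.Matrix([-sp.sin(theta), sp.cos(theta)])

V = Vr*r_hat + Vt*t_hat

# material derivative D/Dt = V_r d/dr + (V_theta/r) d/dtheta + d/dt, applied componentwise
DV = Vr*V.diff(r) + (Vt/r)*V.diff(theta) + V.diff(t)

# project back onto the moving basis to read off the radial and polar angle components
print(sp.simplify(DV.dot(r_hat)))  # expect: V_r dV_r/dr + (V_theta/r) dV_r/dtheta - V_theta^2/r + dV_r/dt
print(sp.simplify(DV.dot(t_hat)))  # expect: V_r dV_theta/dr + (V_theta/r) dV_theta/dtheta + V_r V_theta/r + dV_theta/dt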

Happily, these factored expressions match the textbook forms term for term (up to multiplication by $\rho$).  This shows the accuracy and power of the curvilinear mantra.  Hopefully it will catch on in classrooms.

A Curvilinear Mantra – Part 1

These next two posts are a bit of a departure from the thermal physics theme that had been the central point for the last many months.  They grew out of some discussions on classical field theory that arose in several venues with different people and it seemed important to capture what is a clean (and perhaps new) argument for the beginning student on the best way to transform differential equations into curvilinear coordinates.

The starting point is the recasting of the Euler equation for an ideal fluid (typically a gas)

\[ \rho \frac{D {\vec V}}{Dt} = -{\vec \nabla p} + {\vec f} \; , \]

where $\rho$ and $p$ are the mass density and pressure of the fluid, ${\vec V}$ is its velocity, $\frac{D}{Dt}$ is the material derivative, and ${\vec f}$ is the body force per unit mass.

Typically, within basic discussions of fluid mechanics, Euler’s equation is expressed in Cartesian coordinates (assumed here, without loss of generality to the method, to cover a two-dimensional space) where the velocity is given by

\[ {\vec V} = V_x {\hat x} + V_y {\hat y} \; ,\]

the material derivative takes on the simple form

\[ \frac{D}{Dt} = V_x \partial_x + V_y \partial_y + \partial_t \; ,\]

and Euler’s equation, in component form, is

\[ \rho \frac{D}{Dt} V_x = -\partial_x p + f_x \; ,\]

and

\[ \rho \frac{D}{Dt} V_y  = -\partial_y p + f_y \; .\]

This relatively simple form allows the student to focus on the Lagrangian nature of following a fluid flow, but it typically hides a subtle complication when using curvilinear (or even rotating) coordinates.  For example, the corresponding version of Euler’s equations in cylindrical coordinates (see also Acheson’s Appendix A.6) uses

\[ \frac{D}{Dt} = \partial_t + V_r \partial_r + \frac{V_{\theta}}{r} \partial_{\theta} \; \]

for the material derivative with the component equations being

\[ \rho \frac{D}{Dt} V_r - \rho \frac{{V_{\theta}}^2}{r} = - \partial_r p + f_r \; \]

and

\[ \rho \frac{D}{Dt} V_{\theta} + \rho \frac{V_r V_{\theta}}{r} = -\frac{1}{r} \partial_{\theta} p + f_{\theta} \; . \]

Suddenly there are new multiplicative terms (e.g. the $1/r$ multiplying the derivative with respect to the polar angle $\theta$) as well as additive terms on the left-hand side of the component equations (e.g. $-\rho {V_{\theta}}^2/r$) that weren’t there in the Cartesian version.  The student is left to wonder just why they are there.

Many books and lecture notes on the internet try to justify one or the other (but rarely both) with varying degrees of success.  The aim of this note is to suggest a simple mantra:  the multiplicative terms are strictly the result of minding units and the additive terms are strictly the result of the curvilinear basis vectors changing from point to point. 

The strategy behind the mantra is that even if the students don’t fully connect all the dots the first few times, they will have an explanation that is rock solid and easy to remember to guide them in exploring on their own. 

Let’s examine each of these claims in turn. 

The first claim of the mantra is that the multiplication of the $\partial_{\theta}$ term by $1/r$ is the result of minding units.  Of the two claims of the mantra, this one is the more conceptually difficult even though it is the easier of the two to understand mathematically.  The conceptual hurdle is rooted in the arguments used to define the material derivative in terms of the partial derivatives of a scalar field, $f(x,y,t)$, expressed in Cartesian coordinates

\[ df = \partial_x f dx + \partial_y f dy + \partial_t f dt \; .\]

Dividing by $dt$ immediately gives the Cartesian form of the material derivative

\[ \frac{Df}{Dt} = V_x \partial_x f + V_y \partial_y f + \partial_t f \; .\]

The student then asks why a similar relationship doesn’t hold for curvilinear coordinates.  For example, why isn’t the material derivative in cylindrical coordinates based on the differential of $g(r,\theta,t)$

\[ dg = \partial_r g dr + \partial_\theta g d\theta + \partial_t g dt \; ?\]

This point is most often and most clearly discussed within the realm of continuum mechanics or general relativity.  Schutz, in his book A First Course in General Relativity, notes in Section 5.5 that defining the gradient of $g$ essentially in terms of the differential given above is perfectly acceptable, but that the price paid for using it is basis vectors that are not normalized, which he summarizes with the equation

\[ {\vec e}_{\alpha} \cdot {\vec e}_{\beta} = g_{\alpha \beta} \neq \delta_{\alpha \beta} \; .\]

While this is certainly true and quite clearly argued, the beginning student consulting Schutz (or some similar text) as a reference has to know either the definition of the metric or the difference between vectors and differential forms and the natural duality between them.  In the first case, they need to know that the metric encodes all of the possible dot products between the basis vectors.  In the second, they are confronted with notation that expresses the duality between basis forms and vectors in the coordinate version as

\[ \left<d\theta, \partial_{\theta} \right> = 1 \; \]

and in the non-coordinate version as

\[ \left< {\tilde \omega}^{\hat \theta}, {\vec e}_{\hat \theta} \right> = 1 \; .\]

These mathematical distinctions are quite beyond the beginning student who, by definition, is struggling with a host of other things.

A cleaner way of justifying the first point of the mantra is to perform a unit analysis on the differential $dg$.  It doesn’t matter what units $g$ possesses but for the sake of this argument let’s assume $g$ has units of temperature.  The idea of a temperature field is familiar and the units are well known.  We will denote the units of a physical quantity by square brackets so that in this case $[g] = T$.

The differential must also have units of temperature which means that the partial derivatives have mixed units.  The partial derivative with respect to the radius $r$ has units of temperature per length

\[ \left[ \partial_r g \right] = T/L \; \]

while the partial derivative with respect to the azimuth $\theta$ has units of temperature

\[ \left[ \partial_{\theta} g \right] = T \; .\]

Dividing by $dt$ gives a material derivative of the form

\[ \frac{Dg}{Dt} = V_r \partial_r g + U_{\theta} \partial_{\theta} g + \partial_t g \; .\]

The units on the radial velocity $V_r \equiv dr/dt$ are length per unit time as we expect of a conventional derivative but the units on the azimuthal velocity $U_{\theta} \equiv d\theta/dt$ are radians per unit time, which are quite different (hence the use of the letter $U$ in place of $V$).  The next step is to challenge the student to think about how any lab would measure this angular velocity and to then argue that a much better way to link to experiments is to multiply $U_{\theta}$ by the radius $r$. 

Once this step is done, the remaining piece involves rewriting the differential as

\[ dg = dr \partial_r g + (r d\theta) (\frac{1}{r} \partial_{\theta} g) + dt \partial_t g \; , \]

where we’ve multiplied the second term by unity in the form of $r/r$.  Dividing by $dt$ immediately gives

\[ \frac{Dg}{Dt} = V_r \partial_r g + V_{\theta} \frac{1}{r} \partial_{\theta} g + \partial_t g \; , \]

which is the accepted form of the material derivative.
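
A quick numerical cross-check makes the point concrete.  The sketch below (using a made-up, steady scalar field $g = xy$ and a made-up velocity) shows that the spatial part of the material derivative computed in polar coordinates, with the $1/r$ scale factor in place, matches the Cartesian computation:

import numpy as np

# the same steady scalar field written both ways: g = x*y = r^2 sin(theta) cos(theta)
g_cart = lambda x, y: x*y
g_polar = lambda r, th: r**2*np.sin(th)*np.cos(th)

# sample point and velocity in cartesian components (arbitrary choices)
x0, y0 = 1.0, 2.0
Vx, Vy = 0.3, -0.5

# the same point and velocity in polar components
r0, th0 = np.hypot(x0, y0), np.arctan2(y0, x0)
Vr = Vx*np.cos(th0) + Vy*np.sin(th0)
Vth = -Vx*np.sin(th0) + Vy*np.cos(th0)

h = 1e-6  # finite-difference step

# cartesian: Vx dg/dx + Vy dg/dy
Dg_cart = Vx*(g_cart(x0 + h, y0) - g_cart(x0 - h, y0))/(2*h) \
        + Vy*(g_cart(x0, y0 + h) - g_cart(x0, y0 - h))/(2*h)

# polar: Vr dg/dr + (Vth/r) dg/dtheta -- note the scale factor
Dg_polar = Vr*(g_polar(r0 + h, th0) - g_polar(r0 - h, th0))/(2*h) \
         + (Vth/r0)*(g_polar(r0, th0 + h) - g_polar(r0, th0 - h))/(2*h)

print(Dg_cart, Dg_polar)  # the two values agree to finite-difference accuracy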

Next post will cover the second part of the mantra by showing that the additive terms result from how the basis vectors in curvilinear coordinates change from point to point in space.

A Binomial Gas

The last installment discussed Robert Swendsen’s critique of the common, and in his analysis erroneous, method of understanding the entropy of a classical gas of distinguishable particles.  As discussed in that post, his aim in making this analysis is to persuade the physics community to re-examine its understanding of entropy and to rediscover Boltzmann’s fundamental definition based on probability and not on phase space volume.  To quote some of Swendsen’s closing words:

Although the identification of the entropy with the logarithm of a volume in phase space did originate with Boltzmann, it was only a special case. Boltzmann’s fundamental definition of the entropy in his 1877 paper has none of the shortcomings resulting from applying an equation for a special case beyond its range of validity.

On the question of how this special case blossomed into textbook dogma we will have to content ourselves with speculations.  It seems likely that the passion by which quantum mechanics gripped the physics community made it attractive to view the entire world through the lens of indistinguishable particles.  Furthermore, quantum mechanics also elevated the concept of phase space since various dimensions could be viewed as canonically conjugate variables subject to the uncertainty principle.  So, it is plausible that the physics community, dazzled by this new theory of the subatomic, latched onto the special case and ignored Boltzmann’s fundamental definition.  If true, this would be incredibly ironic since the key focus of Boltzmann was on probability which is arguably the most shocking and intriguing aspect of quantum mechanics.

Regardless of these finer points of physics history, since the concept of probability is key in deriving the correct formula for a classical distinguishable gas, let’s focus on the toy example Swendsen provides in order to illustrate his point.  As in the last post, we will assume that the average energy per particle $\epsilon$ remains fixed so that attention can stay on the spatial configurations.

If we imagine a system with $N$ total distinguishable particles distributed between a volume $V$ partitioned into sub-volumes $V_1$ and $V_2$ then the probability $P(N_1,N_2)$ of having $N_1$ particles in $V_1$ and $N_2 = N - N_1$ in $V_2 = V - V_1$ is given by the binomial distribution

\[ P(N_1,N_2) = \left( \begin{array}{c} N \\ N_1 \end{array} \right) p^{N_1} (1-p)^{N_2} \; ,\]

where  $p$ is the probability of being found in $V_1$ (i.e. a ‘success’).  Since there are no constraints forcing particles to accumulate in any one section compared to the others they will distribute randomly within the entire domain.  Therefore, $p = V_1/V$ and the probability is given by

\[ P(N_1,N_2) = \left( \frac{N!}{N_1! N_2!} \right) \left( \frac{V_1}{V} \right)^{N_1} \left( \frac{V_2}{V} \right)^{N_2} \; .\]

This expression is Swendsen’s launching point for deriving the correct entropy of a classical gas of distinguishable particles.  But before continuing with the analysis, it is worth taking a few moments to better understand the physical content of that expression (even for those who understand the binomial distribution well). 

There is a very compact way to make a Monte Carlo simulation of this thought experiment using the Python ecosystem.  One starts by defining a random realization of the classical gas particles placed within the volume and then reporting out the macroscopic thermodynamic state. 

import numpy as np

def particles_in_a_box(V1, N, V):
    # place N particles uniformly at random within the total volume V
    pos = np.random.random(N)

    # count the number that land inside the sub-volume V1
    threshold = V1/V
    return np.count_nonzero(pos < threshold)

In this context, the macroscopic thermodynamic state is a measure of how many particles are found in the sub-volume $V_1$.  This is a critical point, particularly in light of the quantum interpretation that so many have embraced: two thermodynamic states can be identical without the underlying microstates being the same.  For example, if $N=3$ and $N_1=2$ then each of the following lists results in the same thermodynamic state:

  • [True,True,False]
  • [True,False,True]
  • [False,True,True]

where True and False result from the elementwise comparison pos < threshold and indicate whether the particle is found within $V_1$ (True) or not (False).

To get the probabilities, one makes an ensemble of such systems, which is what the following function does

def generate_MC_estimate(V1, N, V, num_trials):
    # build an ensemble of num_trials independent realizations of the box
    results = np.zeros(num_trials)
    for i in range(num_trials):
        results[i] = particles_in_a_box(V1, N, V)
    return results

The following plot shows how well the empirical results for an ensemble with 100,000 realizations agree with the formula derived above for a simulation of 2000 particles placed in a box where $V_1 = 0.3 V$.
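
For anyone wanting to reproduce that comparison, here is a minimal sketch (assuming the two functions above are in scope; the styling of the original figure is not reproduced):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

N, V1, V = 2000, 0.3, 1.0
num_trials = 100_000

counts = generate_MC_estimate(V1, N, V, num_trials)

# empirical distribution of N1 from the ensemble
bins = np.arange(counts.min() - 0.5, counts.max() + 1.5)
plt.hist(counts, bins=bins, density=True, label='Monte Carlo')

# analytic binomial distribution with p = V1/V
n1 = np.arange(int(counts.min()), int(counts.max()) + 1)
plt.plot(n1, binom.pmf(n1, N, V1/V), 'k-', label='binomial formula')

plt.xlabel('$N_1$')
plt.ylabel('probability')
plt.legend()
plt.show()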

Following Boltzmann, the entropy is

\[ S = k \ln P + C = k \ln \left[ \left( \frac{N!}{V^{N_1}V^{N_2}}\right) \left( \frac{{V_1}^{N_1}}{N_1 !} \right) \left( \frac{{V_2}^{N_2}}{N_2 !} \right) \right] + C \; ,\]

where the previous expression has been grouped into parts dealing with the entire system $(N,V)$, the first sub-volume $(N_1,V_1)$, and the second sub-volume $(N_2,V_2)$.  The constant $C$ depends only on the whole system, $N$ and $V$, but not on the subdivisions and, for reasons that should become obvious, we will take it to be

\[ C = k \ln \left( \frac{{V}^{N}}{N !} \right) \; . \]

We first expand the entropy expression along this grouping to get

\[ S = k \ln \left( \frac{N!}{{V}^{N}} \right) + k \ln \left( \frac{{V_1}^{N_1}}{N_1 !} \right) + k \ln \left( \frac{{V_2}^{N_2}}{N_2 !} \right) + k \ln \left( \frac{{V}^{N}}{N !} \right) \; .\]

The arguments of the first and last terms are reciprocals of each other and, under the action of the logarithm, those two terms cancel, leaving

\[ S = k \ln \left( \frac{{V_1}^{N_1}}{N_1 !} \right) + k \ln \left( \frac{{V_2}^{N_2}}{N_2 !} \right) \; .\]

As the whole is a sum of the parts, this expression is clearly extensive.

The final step is the application of Stirling’s approximation ($\ln n! \approx  n \ln n - n$).  To keep things clear, we will apply it to a generic term of the form

\[ k \ln \left( \frac{V^N}{N!} \right) \; \]

to get

\[ k \left( \ln V^N - \ln N! \right) \approx k \left( N \ln V - N \ln N + N \right) = k N \left( \ln \frac{V}{N} + 1 \right) \; , \]

which clearly shows that each such term, and hence $S$, scales linearly with the system size (at least in the thermodynamic limit).
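
Both the quality of Stirling’s approximation and the linear scaling are easy to check numerically.  A small sketch using math.lgamma for the exact $\ln N!$:

import math

def S_exact(V, N, k=1.0):
    # k ln(V^N / N!) with ln N! evaluated exactly via the log-gamma function
    return k*(N*math.log(V) - math.lgamma(N + 1))

def S_stirling(V, N, k=1.0):
    # the Stirling form k N (ln(V/N) + 1)
    return k*N*(math.log(V/N) + 1)

V, N = 10.0, 10**6
print(S_exact(V, N), S_stirling(V, N))   # nearly equal
print(S_exact(2*V, 2*N)/S_exact(V, N))   # very close to 2: doubling the system doubles S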

All told, Swendsen argues persuasively that the correct interpretation of the entropy is that it is always proportional to the logarithm of the probability, that the ‘traditional’ expression depending on the volume of phase space is a special case of the larger rule, and that by misapplying this special case large numbers of physicists have taught or have been taught incorrectly for decades.  So much for the ideas of settled science.

Of Milk and Entropy

Last month’s column teased the idea that there is a challenge to the common wisdom which states that the reason the traditional (T) expression for the entropy of a classical (C) gas of distinguishable (D) particles, given by

\[ S_{TCD} = k N \left[ \ln V + \frac{3}{2} \ln \frac{E}{N}  + X \right] \; , \]

where  $X$ is some constant, fails to be extensive is that classical mechanics overcounts the number of possible configurations.  A division of the partition function by $N!$ yields an extensive expression

\[ S_{T} = k N \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

as the ‘correct one’ and the conclusion is a philosophical one: there is no escaping quantum statistics; all gases are made up of indistinguishable (I) particles.

This conclusion seems ably rebutted by a paper entitled Statistical mechanics of colloids and Boltzmann’s definition of the entropy, by Robert H. Swendsen in 2006 in the American Journal of Physics.  Swendsen’s argument centers on looking at whole milk – the kind you can buy in any supermarket or convenience store.

I must confess that even though I purchase whole milk regularly and was well aware of the term ‘homogenized’ attached to it, I never really bothered to understand just what was made homogeneous.  The basic notion is that homogenized milk is a colloid with tiny fat and protein globules (Swendsen states characteristic sizes of the fat globules of ~0.5 microns) dispersed in a water medium.  Whole milk, which is roughly 4 percent fat, retains all of these globules in suspension.

There are two key assumptions that Swendsen makes at the core of his analysis of whole milk as a classical colloid:

  1. The globules are distinguishable
  2. The globules constitute a gas

That the globules are distinguishable is strongly supported by the fact that at a diameter of ~0.5 microns, there are approximately $10^{9}$ atoms (give or take an order of magnitude) contained within each globule, and, so, it would be extremely unlikely that any two globules would contain exactly the same number of atoms.  The odds of finding identical globules drop many orders of magnitude more once one considers that each globule will contain some amount of foreign contaminants, so that both the composition and the number of atoms found within any given globule will likely be unique and thus each globule will be microscopically distinguishable.

That the globules can be modeled as an ideal gas takes a bit more thought.  The key features of an ideal gas are that it is a collection of similar objects that only interact with each other over a very short range and that the time between interactions is large compared to the duration of the interaction.  The fact that the globules are suspended in water, a substance which continuously jostles them, doesn’t alter the fact that they interact with other fat globules, through a short-range electrostatic repulsion, only occasionally.

With Swendsen’s two assumptions well-supported, we are now equipped to argue against the conclusion that quantum mechanics is inescapable.  Here we have a gas of distinguishable particles, all much larger than an atom so that quantum statistics can hold no sway, for which the traditional expression for entropy predicts startlingly wrong conclusions.  One, we’ve already encountered in the Gibbs paradox discussion in the last post.  The other, which is a variation, also deals with mixing and goes something like this.

Imagine that we divide a tank of total volume $V$ into two subdivisions, $V  = V_1 + V_2$, subject to the constraint $V_1 > V_2$.  The larger sub-volume $V_1$ is filled with whole milk and the smaller sub-volume $V_2$ is filled with skim milk (completely devoid of fat globules).  Let $N$ be the total number of fat globules in the system, which are initially contained in $V_1$, and let $E = E_1$ be their total energy.  For simplicity, we can also assume that the average energy per particle $\epsilon$ remains fixed (no heat transfer and no work done).  The initial entropy of the system given by the traditional formula is

\[ S_{TCD,initial} = k N \left[ \ln V_1 + \frac{3}{2} \ln \epsilon + X \right] \; . \]

We then imagine opening a small port for the two systems to mix and then closing it.  The final entropy is

\[ S_{TCD,final} = k N_1 \left[ \ln V_1 + \frac{3}{2} \ln \epsilon + X \right] + k N_2 \left[ \ln V_2 + \frac{3}{2} \ln \epsilon + X \right] \; . \]

The difference in the entropy is then

\[ \Delta S_{TCD} = k N_1 \ln V_1 + k N_2 \ln V_2 - k N \ln V_1 = k N_2 \left( \ln V_2 - \ln V_1 \right) = k N_2 \ln \left( \frac{V_2}{V_1} \right) \; . \]

And here is the problem: given that by construction $V_2 < V_1$, the entropy change is always negative, even though mixing is an irreversible process; it takes work to restore the system to its ‘before’ state (macroscopically, that all the globules are back in the larger volume, if not precisely in the same microstate of positions ${\vec r}_i$ and velocities ${\vec v}_i$).
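
To put a (made-up) number on this, the sketch below evaluates $\Delta S_{TCD}$ for an illustrative choice of volumes and globule count; the result is negative for any values satisfying $V_2 < V_1$:

import numpy as np

k = 1.380649e-23  # Boltzmann constant in J/K

# illustrative values only: volumes in liters, N total fat globules
V1, V2 = 2.0, 1.0
N = 1.0e12
N2 = N*V2/(V1 + V2)  # globules expected in V2 after mixing

delta_S = k*N2*np.log(V2/V1)
print(delta_S)  # negative, despite mixing being an irreversible process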

This second inconsistency (and likely there are others) further emphasizes that the classical expression is deeply flawed and, of course, we already knew this.  But we can’t resort to quantum mechanics to come in and save the day, as was done with simpler gases, since that approach is also deeply flawed here, as each object in this system is distinguishable.

Swendsen resolves this problem by arguing that the methodology that led to the classical expression is wrong because it makes entropy related to the volume in phase space and not to the probability.  To quote Swendsen:

Oddly enough, Boltzmann would not have encountered these problems, because he would not have used Eq.1 [for $S_{TCD}$]. He wrote the entropy (in modern notation) as

\[ S_{dist} = kN \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X + 1\right] \; .\]

If we use [this equation], the entropy remains constant in the first experiment when the wall between the two subvolumes of milk is either removed or reinserted, as is appropriate for a reversible process. For the second experiment, it is easy to show that $S_{dist,total}$ is always positive; the entropy increases as it must for an irreversible process.

Next blog will delve into this question about probability a bit more to show how Monte Carlo simulations dealing with microstates dovetail with entropy and these observations.

Gibbs Paradox

This month’s column builds upon the basic building blocks from last month, namely that despite the seemingly simple presentation that most textbooks afford to the idea of entropy, there is an enormous amount of subtlety and nuance in an idea that is well over a hundred years old.  As discussed in that earlier post, and as Robert Swendsen argues in his 2011 article How physicists disagree on the meaning of entropy (American Journal of Physics 79, 342), the primary area where things seem to break down is that different people presuppose an implicit set of assumptions not necessarily shared by anyone else.  To quote Swendsen

When people discuss the foundations of statistical mechanics, the justification of thermodynamics, or the meaning of entropy, they tend to assume that the basic principles they hold are shared by others.  These principles often go unspoken, because they are regarded as obvious. It has occurred to me that it might be good to restart the discussion of these issues by stating basic assumptions clearly and explicitly, no matter how obvious they might seem.

One area that triggered this realization was his recent work on (and subsequent debate over) the Gibbs paradox.

The Gibbs Paradox, named after Josiah Willard Gibbs, is the derivation from classical statistical mechanics which leads to an entropy expression for the ideal gas that is not extensive.  The expectation that entropy is extensive amounts to saying that one expects to see the entropy of a system double when the system itself doubles in size (keeping all other things equal).  Since the ideal gas is the standard textbook example of a nontrivial collection of matter perfectly designed for understanding thermodynamics, finding a result that flies in the face of this expectation casts doubt on the underpinnings of statistical physics.  The usual way that this doubt is remedied is to patch up the classical analysis by appealing to quantum mechanics and the indistinguishability of particles.  The concept of indistinguishability among the particles lies, of course, at the heart of the Fermi-Dirac and the Bose-Einstein statistics for fermions and bosons, respectively.  The idea is basically that there is no way of labeling, of painting, of hanging a number on individual particles and, therefore, that our basic ignorance must be built into the counting used in statistical mechanics.

Specifically, the classical analysis of an ideal gas made of distinguishable particles (using what Swendsen calls the traditional definition of entropy) leads to the following expression for the entropy (‘CD’ = classical, distinguishable)

\[  S_{CD} = k N \left[ \ln V + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

where $$X$$ is some constant.  The objection is that this expression is not extensive due to the $$\ln V$$ term in brackets.  For example, scaling the system by some overall factor $$\alpha$$ ($$N \rightarrow \alpha N$$, $$E \rightarrow \alpha E$$, and $$V \rightarrow \alpha V$$) gives an entropy of

\[  S_{CD,\alpha} = k \alpha N \left[ \ln ( \alpha V)  + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

which simplifies to

\[ S_{CD,\alpha} = \alpha S_{CD} + k N  \alpha \ln \alpha \; . \]
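
This scaling identity is easy to confirm symbolically; a short sympy sketch:

import sympy as sp

k, N, V, E, X, alpha = sp.symbols('k N V E X alpha', positive=True)

S_CD = k*N*(sp.log(V) + sp.Rational(3, 2)*sp.log(E/N) + X)

# scale N, E, and V together; the ratio E/N is unchanged, so only ln V picks up alpha
S_scaled = S_CD.subs({N: alpha*N, E: alpha*E, V: alpha*V})

# the deviation from naive extensive scaling alpha*S_CD
print(sp.simplify(sp.expand_log(S_scaled - alpha*S_CD)))  # -> alpha*k*N*log(alpha)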

On the surface, the lack of extensivity might not seem alarming but consider the following composite system comprised of two tanks of an identical gas placed side-by-side.  Each collection has the same density and average energy per particle and each has the same volume.  Further suppose that there is a sliding panel at the interface between the tanks.  By removing the partition, the tank size now doubles (i.e. $$\alpha = 2$$) and the entropy change is

\[ S_{CD,new} - S_{CD,old} = \left( 2 S_{CD} + 2 k N \ln 2 \right) - 2 S_{CD} = 2 k N \ln 2 \; . \]

At this point, Gibbs notes that something is quite wrong.  The removal of the partition is a reversible process (since the gas is thermodynamically the same on both sides the presence or absence of the partition shouldn’t make a difference), meaning that the entropy should not increase at all. 

The remedy found in most textbooks (e.g. Fundamentals of Statistical and Thermal Physics by Reif from which the following quoted expressions come) starts by arguing that when we remove the partition and allow the gas molecules in one tank to mix with those in another we are implicitly assuming them “individually distinguishable, as though interchanging the positions of two like molecules would lead to a physically distinct state of the gas.”   The argument concludes by directing us to correct for the overcounting that “taking classical mechanics too seriously” has foisted upon us.  The correction for over-counting involves dividing a term earlier in the derivation (the partition function) by $$N!$$, which corrects the entropy (now adapted to indistinguishable particles hence the change from ‘D’ to ‘I’) to read

\[ S_{CI} = k N \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X' \right] \; , \]

which is obviously extensive with the equally obvious implication that the problem is solved and nothing more needed to be done.

Here the story seems to have stalled for some long period of time (decades), most likely due to the belief that quantum mechanics was the correct viewpoint (or at least the more correct one) for the world at large.  It seems to have been only fairly recently that a revived interest in putting classical statistical mechanics on firmer footing arose.  The result of this new effort has been the rediscovery of an old definition of entropy that Swendsen, who has been championing this viewpoint for nearly two decades, argues leads to more sensible results than a simple reflexive appeal to quantum mechanics.  And his most compelling argument to support this revived viewpoint is a substance that is likely to surprise:  simple homogenized milk.  However, that story, in all its glory, will have to wait until next month’s column.

An Invitation to Entropy

The subject matter over the last few months has touched upon thermodynamics in a variety of guises.  For example, the concept of enthalpy and isentropic flow has played a key role in compressible fluid flow.  In the posts discussing the Maxwell relations, the thermodynamic square and the classic relationships between second-order partial derivatives were the main tools used to eliminate pesky terms involving the entropy in favor of quantities easier to measure in the lab.

It seems that it is now prudent to put down a few notions about entropy itself.  No other physical quantity, with the possible exception of energy, is as ubiquitously used as entropy, and none is as poorly understood.  Indeed, in his 2011 article entitled How physicists disagree on the meaning of entropy, Robert Swendsen starts with the quotation from von Neumann “nobody understands entropy”.  Chemists use entropy to determine the direction of chemical reactions, physicists use it when looking at matter in motion (e.g. compressible gas within a cylinder), electrical engineers use it when characterizing information loss on a channel, the amount by which software can compress a file depends on its entropy, and so on.

Entropy seems to be a Swiss army knife concept with lots of different built-in gadgets that can be pulled out and used on a moment’s notice.  It’s no wonder that such a multi-faceted idea is not only poorly understood but also gives rise to radically contradictory notions.  For example, Swendsen starts his article with the following list of 18 properties that he has seen or heard attributed to entropy:

  • The theory of probability has nothing to do with statistical mechanics.
  • The theory of probability is the basis of statistical mechanics.
  • The entropy of an ideal classical gas of distinguishable particles is not extensive.
  • The entropy of an ideal classical gas of distinguishable particles is extensive.
  • The properties of macroscopic classical systems with distinguishable and indistinguishable particles are different.
  • The properties of macroscopic classical systems with distinguishable and indistinguishable particles are the same.
  • The entropy of a classical ideal gas of distinguishable particles is not additive.
  • The entropy of a classical ideal gas of distinguishable particles is additive.
  • Boltzmann defined the entropy of a classical system by the logarithm of a volume in phase space.
  • Boltzmann did not define the entropy by the logarithm of a volume in phase space.
  • The symbol W in the equation S=k log W, which is inscribed on Boltzmann’s tombstone, refers to a volume in phase space.
  • The symbol W in the equation S=k log W, which is inscribed on Boltzmann’s tombstone, refers to the German word “Wahrscheinlichkeit” (probability).
  • The entropy should be defined in terms of the properties of an isolated system
  • The entropy should be defined in terms of the properties of a composite system.
  • Thermodynamics is only valid in the “thermodynamic limit,” that is, in the limit of infinite system size.
  • Thermodynamics is valid for finite systems.
  • Extensivity is essential to thermodynamics.
  • Extensivity is not essential to thermodynamics.

This list, which is really a list of 9 pairs of contradictory statements about entropy, goes out of its way to show just how many diverging ideas scientists have about entropy.  And since it is trendy to have one’s own pet idea(s) about this fundamental concept, it seems about time that I get my own, and that is the aim of this blog and the ones that follow.  As a warm up to a deeper dive, I decided to return to the basic ideas introduced in Halliday and Resnick physics. 

The most intriguing aspect of the textbook discussion of entropy is that it is a state variable, that is to say, its value depends only on what the system is doing at any given time and not how the system got there.  This is a key concept because it means that we are relieved of trying to find the particular path through which the system evolved.

What is particularly remarkable about this discovery is that it came about in the 19th century.  This was the time in which the idea of smooth distributions of matter held the day.  When the primary concept was that of a field, continuous in every way.  A time well before the concept of discrete, microscopic states emerged from an understanding of the quantum mechanics of atoms, molecules, and other substances. 

The thermodynamic relationship for entropy reads

\[ S_f - S_i = \int_{i}^{f} \frac{dQ}{T} \; , \]

where any path connecting the initial state (denoted by $$i$$) with the final state (denoted by $$f$$) will do.  Nowhere in this definition can one find any clear signpost to indicate lumpy matter or the concept of the discrete.  In addition, nothing in this definition even hints at a particular substance or class of them; nor is a particular phase of matter required.  A breathtaking sweep of generality is hidden behind a few simple glyphs on a page. 

As an example of the universality of the fundamental statement consider a familiar household system, say a glass of milk.  If we do something prosaic like warm it by 10 degrees Celsius we arrive at the same entropy change as we would have if we had boiled the milk off into a vapor, melted the glass down, reconstituted the latter and recondensed the former.  No matter what bizarre journey we subject a material to, the resulting change in entropy will simply depend on the initial and final configurations and not on the details connecting one to the other.

The usual playground for first thinking about entropy is the ideal gas, and the usual example given to the student is the computation of the entropy change of the free expansion of a gas.  The context of this discussion usually follows upon the heels of an introduction to the kinetic theory of gases – a theory that presupposes the existence of atoms.  The free expansion of gas is, perhaps, the most radical of all irreversible processes.  There is no orderly flow, the very concept of a continuum fails to apply; every atom goes its own way and no macroscopic evolution of thermodynamic state can even be imagined. 

And yet, almost blithely, textbooks argue the ease with which the entropy change in such a process can be calculated.  The argument goes as follows.  From the kinetic theory of gases, one can show that during a free expansion, the internal energy does not change.  The reason for this is that the gas does no work (that is what ‘free’ really means) and the process happens fast enough that no heat is transferred in or out.  Since the change in internal energy is given by

\[ \Delta U = n C_V \Delta T \; \]

any ideal gas process that doesn’t change the internal energy also leaves the temperature unchanged.  The matching thermodynamic process, where reversibility and equilibrium are maintained at all times is the isothermal expansion. 

The first law

\[ dU = dQ - dW \; \]

can be specialized to any reversible ideal gas process, to yield

\[ n C_V dT = dQ – p dV = dQ – \frac{n R T}{V} dV \; .\]

Solving for $$dQ/T$$ gives

\[ \frac{dQ}{T} = n R \frac{dV}{V} + n C_V \frac{dT}{T} \; .\]

Integrating both sides from the initial to final state gives

\[ S_f - S_i = \int_i^{f} \frac{dQ}{T} = n R \ln \left( \frac{V_f}{V_i} \right) + n C_V \ln \left( \frac{T_f}{T_i} \right) \; .\]

This simplifies for an isothermal process to

\[ S_f - S_i = n R \ln \left( \frac{V_f}{V_i} \right) \; .\]
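
For a concrete number, a minimal sketch (the mole count and volumes are illustrative assumptions):

import numpy as np

R = 8.314462618  # molar gas constant in J/(mol K)

n = 1.0            # moles of ideal gas (assumed)
Vi, Vf = 1.0, 2.0  # initial and final volumes (only the ratio matters)

delta_S = n*R*np.log(Vf/Vi)
print(delta_S)     # ~5.76 J/K for a doubling of the volume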

So, the change in entropy for a free expansion must be exactly equal to the isothermal expression above even though, as the free expansion occurs, there is a complete absence of anything resembling an orderly thermodynamic path.  This is a subtle result that gets only more subtle when one reflects on the fact that statistical mechanics wasn’t available when the concept of entropy first appeared.

It is for this reason that the next several blogs will be looking at entropy.

Maxwell’s Relations in Action

Last month’s column introduced the notion of the thermodynamic square as a mnemonic for organizing certain second-order partial derivatives amongst the various thermodynamic potentials: the internal energy $$U$$, the Gibbs and Helmholtz free energies $$G$$ and $$F$$, and the enthalpy $$H$$.  As previously alluded to, physicists primarily use these relations (called the Maxwell relations) to eliminate terms difficult or impossible to measure experimentally in favor of parameters that are easily measured in the lab.

A practical example of the application of the Maxwell relations is the simplification of the ‘first’ and ‘second $$T dS$$ equations’ as listed in Exercise 7.3-1 on page 189 of Herbert B. Callen’s book Thermodynamics and an Introduction to Thermostatistics, 2nd edition.  The relevant physical properties in these equations are (Sec. 3.9 of Callen):

  • the number of particles $$N$$,
  • the differential heat $$dQ = T dS$$,
  • the heat capacity at constant volume $$c_V = \frac{T}{N} \left( \frac{\partial S }{\partial T} \right)_V= \frac{1}{N} \left(\frac{\partial Q }{\partial T}\right)_{V}$$,
  • the heat capacity at constant pressure $$c_P = \frac{T}{N} \left( \frac{\partial S }{\partial T} \right)_P = \frac{1}{N}\left(\frac{\partial Q}{\partial T}\right)_{P}$$,
  • the coefficient of thermal expansion $$\alpha = \frac{1}{V} \left(\frac{\partial V}{\partial T}\right)_P $$, and
  • the isothermal compressibility $$\kappa_T = – \frac{1}{V} \left( \frac{\partial V}{\partial P} \right)_T$$.

First $$TdS$$ Relation

The first relation we want to verify is

\[ T dS = N c_V dT + \frac{T \alpha}{\kappa_T} dV \; .\]

From the form of this equation, assume that the quantity in question is the entropy as a function of the temperature and volume $$S = S(T,V)$$.  Taking the first differential gives

\[ dS = \left(\frac{\partial S}{\partial T}\right)_{V} dT + \left(\frac{\partial S}{\partial V}\right)_{T} dV \; .\]

The first term is relatively easy to deal with in terms of the heat capacity at constant volume $$c_V$$:

\[ \left(\frac{\partial S}{\partial T}\right)_{V} = \frac{N c_V}{T} \; .\]

The second term requires a bit more work.  First use the Maxwell relation associated with the Helmholtz free energy $$F$$ to get

\[ \left(\frac{\partial S}{\partial V}\right)_{T} = \left(\frac{\partial P}{\partial T}\right)_{V} \; .\]

Next use the second classical partial derivative identity

\[ \left(\frac{\partial P}{\partial T}\right)_{V} \left(\frac{\partial T}{\partial V}\right)_{P} \left(\frac{\partial V}{\partial P}\right)_{T} = – 1 \; ,\]

and solve for

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = \frac{-1}{\left(\frac{\partial V}{\partial P}\right)_{T} \left(\frac{\partial T}{\partial V}\right)_{P} } \; . \]

Use another of the classical partial derivative identities to move the $$\left(\frac{\partial T}{\partial V}\right)_{P}$$ to the numerator to get

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = - \left(\frac{\partial V}{\partial T}\right)_{P} / \left(\frac{\partial V}{\partial P}\right)_{T} \; .\]

Multiply the numerator and denominator by $$1/V$$ and use the definitions of $$\alpha$$ and $$\kappa_T$$ to get

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = \frac{\alpha}{\kappa_T} \; . \]

At this point, the first differential stands as

\[ dS = \frac{N c_V}{T} dT + \frac{\alpha}{\kappa_T} dV \; . \]

Multiplying each side by $$T$$ gets us to the final form of the first $$T dS$$ equation

\[ T dS = N c_V dT + \left(\frac{T \alpha}{\kappa_T} \right) dV \; .\]
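
As a sanity check, this relation can be exercised symbolically on the monatomic ideal gas, whose entropy (up to an additive constant) and equation of state are simple enough to write down.  The model below is an assumption chosen only to test the identity:

import sympy as sp

T, V, N, k, P = sp.symbols('T V N k P', positive=True)

# monatomic ideal gas entropy up to an additive constant (assumed model;
# the constant drops out of every derivative)
S = N*k*(sp.log(V/N) + sp.Rational(3, 2)*sp.log(T))

# response functions computed from the equation of state V(T, P) = N k T / P
V_TP = N*k*T/P
alpha = sp.simplify(sp.diff(V_TP, T)/V_TP)      # coefficient of thermal expansion: 1/T
kappa_T = sp.simplify(-sp.diff(V_TP, P)/V_TP)   # isothermal compressibility: 1/P

# compare the dV coefficients on both sides of T dS = N c_V dT + (T alpha/kappa_T) dV
lhs_dV = sp.simplify(T*sp.diff(S, V))                   # N k T / V, i.e. the pressure
rhs_dV = sp.simplify(T*alpha/kappa_T).subs(P, N*k*T/V)

print(sp.simplify(lhs_dV - rhs_dV))  # -> 0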

Second $$TdS$$ Relation

The second relation is

\[ T dS = N c_P dT - T V \alpha dP \; . \]

From the form of this equation, assume that the quantity in question is the entropy as a function of the temperature and pressure $$S = S(T,P)$$.  As in the first $$T dS$$ relation, taking the first differential gives

\[ dS = \left(\frac{\partial S}{\partial T}\right)_{P} dT + \left(\frac{\partial S}{\partial P}\right)_{T} dP \; .\]

The first term, as in the case above, is also relatively easy to deal with in terms of the heat capacity, this time at constant pressure, $$c_P$$:

\[ \left(\frac{\partial S}{\partial T}\right)_{P} = \frac{N c_P}{T} \; .\]

The second term only requires a Maxwell relation in terms of the Gibbs free energy

\[ -\left(\frac{\partial S}{\partial P}\right)_{T} = \left(\frac{\partial V}{\partial T}\right)_{P} \; .\]

The first differential becomes

\[ dS = \frac{N c_P}{T} dT - \left(\frac{\partial V}{\partial T}\right)_{P} dP \; .\]

Multiplying the second term by $$V/V$$ and simplifying gives

\[ dS = \frac{N c_P}{T} dT - V \alpha  dP \; ,\]

which becomes the desired relation when multiplying both sides by $$T$$

\[ T dS = N c_P dT - T V \alpha dP \; .\]

To summarize, these two relations show how to express the heat $$T dS$$, which involves the entropy (a quantity that cannot be measured directly), in terms of parameters, such as temperature, pressure, and heat capacity, that are easily measured in the lab.   All that is needed is the machinery of partial derivatives.  This observation is the reason that so many textbooks on thermodynamics have specific sections devoted to these approaches.

Thermodynamic Partials – Maxwell’s Relation

Last month’s installment presented a clean derivation of the classic relations between partial derivatives and showed a simple example of how they work in the concrete.  As nice as that presentation is, the real power of these relations is only realized when dealing with systems with a large number of variables in play and in which various manipulations are required to extract meaning from the systems involved.  The prototypical example is classical thermodynamics.   

The fundamental concept in thermodynamics is the existence of a thermodynamic potential, which is a scalar function that encodes the state of the thermodynamic system in terms of the measurable quantities that describe the system, such as volume or temperature.  Changes in the values of these independent physical variables (sometimes called the natural variables) relate directly to changes in the potential through the corresponding partial derivatives. 

The textbook example of this type of relation is defined in the first law of thermodynamics, which asserts that there exists a function, called the internal energy $$U$$, that is a function of the entropy $$S$$, the volume $$V$$, and the number of particles $$N$$ making up the physical system being modeled (assuming a single type of substance; the generalization to multiple species is straightforward but cumbersome).  Changes in the internal energy $$U$$ can be calculated by

\[ dU = T dS - P dV + \mu dN \; ,\]

where the temperature, pressure and chemical potential are defined as

\[ {T} = \left( \frac{\partial U}{\partial S} \right)_{V,N} \; , \]

\[ {P} = -\left( \frac{\partial U}{\partial V} \right)_{S,N} \; , \]

and

\[ {\mu} = \left( \frac{\partial U}{\partial N} \right)_{S,V} \; , \]

respectively.

Without dwelling on the theory, suffice it to say that laboratory conditions vary and there are many circumstances where it is preferable to work with a different set of independent variables.  For example, heating water on a stove top in an uncovered pan is better understood in terms of fixed pressure rather than fixed volume. 

Thermodynamics supplies an approach for dealing with these cases using the Legendre transformation.  In the stove-top experiment mentioned above, the appropriate potential is called the enthalpy, defined as

\[ H = U + PV \; . \]

Taking the first differential gives

\[ dH = dU + P dV + V dP \\ = T dS - P dV + \mu dN + P dV + V dP \\ = T dS + V dP + \mu dN \; , \]

which demonstrates that $$H = H(S,P,N)$$.

Assuming that the order of differentiation can be exchanged, there are several relationships that exist (called Maxwell relations) between various partial derivatives.  For example:

\[ \left( \frac{\partial T}{\partial V} \right)_{S,N} = \left( \frac{\partial  }{\partial V } \left( \frac{\partial U}{\partial S} \right)_{V,N} \right)_{S,N} \\ = \left( \frac{\partial }{\partial S} \left( \frac{\partial U}{\partial V} \right)_{S,N} \right)_{V,N} \\ = -\left( \frac{\partial P}{\partial S} \right)_{V,N} \; . \]
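
This equality of mixed partials can be verified symbolically with any smooth model for $$U$$; the sketch below uses a Sackur-Tetrode-like form chosen purely for illustration:

import sympy as sp

S, V, N, k, c = sp.symbols('S V N k c', positive=True)

# a model internal energy (assumed here only to exercise the relation;
# any smooth U(S, V) would work just as well)
U = c * N**sp.Rational(5, 3) * V**sp.Rational(-2, 3) * sp.exp(2*S/(3*N*k))

T = sp.diff(U, S)    # temperature  T = (dU/dS)_V
P = -sp.diff(U, V)   # pressure     P = -(dU/dV)_S

# Maxwell relation: (dT/dV)_S = -(dP/dS)_V, so the sum below vanishes
print(sp.simplify(sp.diff(T, V) + sp.diff(P, S)))  # -> 0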

Obviously, there is a lot of notational overhead in the above relation.  For the sake of this analysis, we will make two simplifications to improve the clarity.  First, we will assume a single species with a fixed number of moles.  This assumption removes the need to carry $$N$$ and $$\mu$$.  Second, we will forego keeping track of the other variables being held constant.  Since we will be tacitly tracking which thermodynamic potential is being used, there is little chance of confusion.

The primary purpose the Maxwell relations serve is to eliminate terms involving the entropy in favor of physical parameters that can be experimentally measured, such as temperature, volume, or pressure.

A useful mnemonic exists for looking up the various relations without scanning through a table.  It’s called the thermodynamic square, and it’s constructed as below

The 4 most important thermodynamic potentials, the Helmholtz free energy $$F = U - TS$$, the Gibbs free energy $$G = U + PV - TS$$, the enthalpy $$H = U + PV$$, and the internal energy $$U$$, are arranged along the edges starting at the top and going clockwise.  The thermodynamic variables – the temperature $$T$$, the pressure $$P$$, the entropy $$S$$, and the volume $$V$$ – are arranged at the corners, starting just after $$F$$ and also going clockwise.  The Maxwell relations relate a partial derivative expressed in terms of three consecutive corners (shaded blue below) to the partial derivative expressed in terms of another three consecutive corners (shaded yellow).  The arrows lying along the diagonal set the sign of the partial derivative based on which variable is in the numerator:  those with the arrowhead are positive and those without are negative. 

These rules are far easier to understand with the mixed partial derivatives of $$U$$ discussed above.

The blue shaded region reads counterclockwise while the yellow region reads clockwise.  These orientations follow from the overlap they share on the left-hand side of the square (the one labeled $$U$$).  The signs are determined by the arrows.  Since the $$T$$ corner is the numerator and it has an arrowhead, the partial derivative is positive.  Likewise, since the $$P$$ corner is the numerator and it lacks an arrowhead, the partial derivative is negative.

Once the basic operations using the square are understood, it is easier to present a single square with the common side shaded in both blue and yellow, resulting in green.  An example for the Maxwell relations coming from the Helmholtz free energy $$F$$ is shown below.

With these tools, we can get to the meatier topics, namely using a combination of the classical rules for partial derivatives and the Maxwell relations, as presented in the thermodynamic square, to eliminate the entropy in favor of physically measurable quantities.  But this will be the topic for next month’s post.

Partial Derivatives – Completely Done

Partial derivatives are an important mathematical tool in a number of physics disciplines, most notably field theories (e.g. electricity & magnetism and general relativity) and in thermodynamics.

However, working with partial derivatives is always a bit tricky and teaching students about them is usually fraught with difficulties.  So it was to my pleasant surprise that I found a really nice discussion of how to derive the various ‘classic’ rules cleanly presented in Classical and Statistical Thermodynamics by Ashley H. Carter. 

My presentation here is strongly influenced and closely follows her presentation in Appendix A, although I’ve added on a bit in the theoretical flow and I’ve also provided explicit examples in terms of the standard paraboloid found in freshman calculus. 

Assume that three variables are linked by a relation of the form $$f(x,y,z) = 0$$.  This equation can be viewed as a constraint equation linking the values of the variables such that only two of them are independent.  That means that we can (at least locally) solve for:

  • $$x = x(y,z)$$
  • $$y = y(x,z)$$
  • $$z = z(x,y)$$

Focus on the first and second forms (other pairings will follow a simple relabeling of the variables).  The corresponding differentials are

\[ dx = \left( \frac{\partial x}{\partial y} \right)_z dy + \left( \frac{\partial x}{\partial z} \right)_y dz \; \]

and

\[ dy = \left( \frac{\partial y}{\partial x} \right)_z dx + \left( \frac{\partial y}{\partial z} \right)_x dz \; .\]

Now substitute the expansion of $$dy$$ into the expansion of $$dx$$

\[ dx = \left(\frac{\partial x}{\partial y}\right)_z \left[ \left(\frac{\partial y}{\partial x}\right)_z dx + \left(\frac{\partial y}{\partial z}\right)_x dz \right] + \left(\frac{\partial x}{\partial z}\right)_y dz \; ,\]

which simplifies to

\[ dx = \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z dx + \left[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x + \left(\frac{\partial x}{\partial z}\right)_y \right] dz \; .\]

Putting it all together gives

\[ \left[ 1 - \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z \right] dx - \left[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x + \left(\frac{\partial x}{\partial z}\right)_y \right] dz = 0 \; .\]

Since $$dx$$ and $$dz$$ are independent, each differential can be set to zero independently, giving one of the classic identities.

First set $$dz = 0$$ to get

\[ \left(\frac{\partial x}{\partial y}\right)_z = 1/ \left(\frac{\partial y}{\partial x}\right)_z \; , \]

which is called the reciprocal rule.

Next, setting $$dx = 0$$ yields

\[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x = \; - \; \left(\frac{\partial x}{\partial z}\right)_y \; , \]

which is called the fraction rule.

The manipulations are completed by using the reciprocal rule in the fraction rule and simplifying to get

\[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = -1 \; , \]

which is called the cyclic rule.

Let’s take a look at these relationships in action. Consider the implicit definition of the paraboloid

\[ x^2 + y^2 – z = 0 \; . \]

As mentioned earlier, this equation can be considered as a constraint equation that selects out a value for any one of the three variables given the other two. In other words, we can imagine a look up table where we select a value of $$x$$ and $$y$$, we rummage through the table to find a row with both values and then we scan to the right to find the allowed value of $$z$$ that makes it satisfy the implicit equation.

How do you construct this table, not at a finite set of points but functionally so that it works at any point?  It is natural and easy to determine $$z$$ given $$x$$ and $$y$$ by simply rewriting the implicit equation as

\[ z(x,y) = x^2 + y^2 \; .\]

However, it isn’t as easy to express $$x$$ or $$y$$ as functions of the remaining two variables because of the two possible signs that result from taking the square root. We need to have four functional relationships

\[ x_p(y,z) = \sqrt{ z \; - \; y^2 } \; ,\]

\[ x_n(y,z) = \; - \; \sqrt{z \; - \; y^2} \; , \]

\[ y_p(x,z) = \sqrt{z \; - \; x^2} \; , \]

and

\[ y_n(x,z) = \; - \; \sqrt{z \; - \; x^2} \; , \]

depending on the particular combination of whether $$x$$ is positive or negative and whether $$y$$ is also positive or negative.  In the language of differential geometry, we have 5 charts in our atlas.

We are now in position to try the various relations derived above. For example, let’s examine the reciprocal relation in the first quadrant of the $$x$$-$$y$$ plane.  We need to use $$x_p$$ as our local chart.

\[ \left(\frac{\partial x_p}{\partial z}\right)_y = \frac{1}{2} \frac{1}{\sqrt{z – y^2}} \; \]

or once we recognize the denominator as $$x_p$$ 

\[\left(\frac{\partial x_p}{\partial z}\right)_y = \frac{1}{2 x_p} \; . \]

The ‘reciprocal’ partial derivative is

\[ \left( \frac{\partial z}{\partial x} \right)_y = 2 x = 2 x_p \; ,\]

where there is no need for the $$z$$-chart to distinguish between positive and negative values of $$x$$.  As expected, the derivatives are reciprocals of each other.

Next, let’s test the fraction rule.  For fun, this time let’s test it in the 2nd quadrant in the $$x$$-$$y$$ plane ($$x < 0$$ and $$y > 0$$).  Calculating the partial derivatives on the left-hand side yields

\[ \left(\frac{\partial x_n }{\partial y_p } \right)_z = \frac{y_p}{\sqrt{z - y_p^2}} = \; - \; \frac{y_p}{x_n} \; \]

and

\[ \left(\frac{\partial y_p }{\partial z} \right)_{x_n} = \frac{1}{2\sqrt{z-x_n^2}} = \frac{1}{2 y_p} \; . \]

It is a simple matter to verify that

\[ \left(\frac{\partial x_n }{\partial y_p} \right)_z \left(\frac{\partial y_p }{\partial z} \right)_{x_n} = -\frac{1}{2 x_n} \; \]

is identical to 

\[ – \left(\frac{\partial x_n }{\partial z} \right)_{y_p} = \frac{1}{2 \sqrt{z – y_p^2} } = -\frac{1}{2 x_n} \; .\]

Finally, for the cyclic rule, let’s go into the fourth quadrant of the $$x$$-$$y$$ plane ($$x>0$$ and $$y<0$$).  Taking each partial derivative in turn yields

\[ \left(\frac{\partial x_p }{\partial y_n} \right)_{z} = -\frac{y_n}{\sqrt{z-y_n^2}} = -\frac{y_n}{x_p} \; ,\]

\[ \left(\frac{\partial y_n }{\partial z} \right)_{x_p} = -\frac{1}{2 \sqrt{z-x_p^2} } = \frac{1}{2 y_n} \; ,\]

and 

\[ \left(\frac{\partial z }{\partial x_p} \right)_{y_n} = 2 x_p \; .\]

Multiplying these terms in order gives

\[ -\frac{y_n}{x_p} \frac{1}{2 y_n} 2 x_p = -1 \; .\]
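These verifications are easy to automate.  Below is a minimal sympy sketch (the variable names are my own, not from the text) that checks the cyclic rule on the fourth-quadrant charts at the sample point $$(x, y, z) = (1, -2, 5)$$ on the paraboloid:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

# Fourth-quadrant charts of the paraboloid: x > 0 and y < 0
x_p = sp.sqrt(z - y**2)    # x as a function of (y, z)
y_n = -sp.sqrt(z - x**2)   # y as a function of (x, z)
z_f = x**2 + y**2          # z as a function of (x, y)

# Cyclic rule: the product of the three partials should be -1
product = sp.diff(x_p, y) * sp.diff(y_n, z) * sp.diff(z_f, x)

# Evaluate at a consistent point on the surface
print(product.subs({x: 1, y: -2, z: 5}))   # prints -1
```

Swapping in the other charts checks the remaining quadrants in the same way.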

Nice, neat, and more than partially done.

Elementary Compressible Flow – Part 4

In this final post on compressible fluid flow, we will take a look at an application of the machinery developed over the previous three posts to the converging-diverging nozzle, often called a rocket nozzle or a de Laval nozzle.  The setup is straightforward even though it is probably unfamiliar.  We imagine a reservoir containing a gas at high pressure and, often, high temperature that serves to supply the flow through the nozzle.  By convention, the thermodynamic state variables in the reservoir are given a ‘0’ subscript and are called the stagnant conditions, as there is no flow within the reservoir itself.  This reservoir feeds into a converging section of a nozzle in which the cross-sectional area decreases steadily until a minimum occurs at what is called the throat of the nozzle.  After the throat, the cross-sectional area increases again until the nozzle ends at the exit.  The nozzle and reservoir are placed into a larger pressure vessel (sometimes called the receiver), where the pressure outside of the exit is, by convention, called the back pressure, $P_b$.

Initially the two pressures are equal and there is no flow from the reservoir through the nozzle.  As the back pressure is dropped, flow develops from the reservoir.  de Laval’s innovation was to realize that if the flow could be brought to sonic conditions (i.e., Mach number $M=1$) at the throat, then the diverging section would further speed up the flow to supersonic speeds (see the discussion at the end of the second post of this series).

The goal of this post is to discuss the flow profiles along the length of the nozzle, from inlet at the reservoir, to exit into the pressure vessel, as a function of the ratio $P_b/P_0$ and to give a flavor of how the previous equations are used quantitatively.  Despite the relative simplicity of the system (reservoir, nozzle, and pressure vessel), a rich set of flow patterns results.  Qualitatively, these fall into two broad classes with several variations within them.

In the first class (shown in blue traces in the figure below), the flow is everywhere isentropic from the reservoir, through the nozzle, and into the pressure vessel.  Included in this class is the flat trace where $P_b/P_0 = 1$, for which no flow results.  As $P_b/P_0$ is lowered, flow commences, with the speed of the flow increasing as it approaches the throat from the converging section and then subsequently slowing down in the diverging section.  The maximum speed in the nozzle (which for this class is always at the throat) continues to increase as $P_b/P_0$ is lowered until a critical value of the back pressure, $P_{b_s}$, is reached at which the flow is sonic (i.e., $M=1$) in the throat but still subsonic everywhere else in the nozzle (including the diverging section).

At this point, lowering the $P_b/P_0$ ratio further develops the second class of flow (shown in red traces in the figure below), in which some portion of the flow in the diverging section is supersonic.  A shock develops somewhere downstream of the throat in order to decelerate the flow back to subsonic conditions so that it can match boundary conditions with the environment in the pressure vessel.  The only remaining question is precisely where the shock is found.  Five possible options are available:  1) the shock is found in the nozzle between the throat and the exit, 2) the shock falls at the exit, 3) the shock is beyond the nozzle and the exit pressure $P_e < P_b$, 4) the shock is beyond the nozzle and $P_e = P_b$, and 5) the shock is beyond the nozzle and $P_e > P_b$.  Options 3), 4), and 5) are called over-expanded, perfectly-expanded, and under-expanded flows, respectively (see the discussion of The Converging-Diverging Nozzle applet).

In the third post of this series, we derived expressions relating the pressure, density, and temperature anywhere in an isentropic portion of the flow to the stagnation properties in the reservoir.  The relations derived were:

\[ \frac{P_0}{P} = \left[1 + \frac{\gamma – 1}{2} M^2 \right]^{\frac{\gamma}{\gamma -1}} \; ,\]

\[ \frac{\rho_0}{\rho} = \left[1 + \frac{\gamma – 1}{2} M^2 \right]^{\frac{1}{\gamma -1}} \; ,\]

and

\[ \frac{T_0}{T} = 1 + \frac{\gamma – 1}{2} M^2 \; .\]

We also derived the area-Mach number relationship (AMR),

\[ \frac{A}{A^*} = \frac{1}{M} \left(1 + \frac{\gamma – 1}{2} \right)^{\frac{\gamma+1}{2(1-\gamma)}} \left[ 1 + \frac{\gamma – 1}{2} M^2 \right]^{\frac{\gamma+1}{2(\gamma-1)}} \; \]

that specified what the ratio of the local area, $A$, to the area of the throat, $A^*$, must be to achieve a target value of Mach number $M$.  A typical problem where these equations are useful comes from Fluid Mechanics Demystified by Merle Potter and reads

Air flows through a converging-diverging nozzle attached to a reservoir maintained at 400 kPa absolute and 20$^{\circ}$C to a receiver.  If the throat and exit diameters are 10 and 24 cm, respectively, what is the receiver pressure that will just result in supersonic flow throughout the diverging portion of the nozzle?

The solution starts with recognizing that if the flow in the diverging portion of the nozzle is to be supersonic, then choked conditions occur at the throat and, as a result, the throat area coincides with the critical area, $A_{throat} = A^*$.  The area ratio of the exit to the throat is then $\frac{A}{A^*} = \left( \frac{24}{10} \right)^2 = 5.76$.  Next we have to invert the AMR to find the value of $M$ that corresponds to this ratio.  A bisection routine, adapted to the two possible dynamical branches, subsonic and supersonic, was written in Python.  The resulting value, $M = 3.3244$, was then plugged into the $P_0/P$ relation and the reciprocal taken to arrive at an exit pressure of $P_{exit} = 6746.659 \, Pa$, only about $1.7\%$ of the stagnant pressure.
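For readers who want to reproduce these numbers, here is a minimal sketch of the inversion (my reconstruction, not the original script; the function names and the bracketing intervals are my own choices):

```python
GAMMA = 1.4

def area_ratio(M, gamma=GAMMA):
    """Area-Mach relation: A/A* as a function of Mach number M."""
    t = 1.0 + 0.5 * (gamma - 1.0) * M * M
    e = (gamma + 1.0) / (2.0 * (gamma - 1.0))
    return (2.0 * t / (gamma + 1.0)) ** e / M

def invert_amr(ratio, supersonic=True, tol=1e-10):
    """Bisect the AMR for M; the flag picks the sub- or supersonic branch."""
    lo, hi = (1.0, 50.0) if supersonic else (1e-6, 1.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # A/A* grows with M above M = 1 and shrinks with M below it
        if (area_ratio(mid) < ratio) == supersonic:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

M_exit = invert_amr((24.0 / 10.0) ** 2)         # A/A* = 5.76
P_exit = 400e3 / (1 + 0.2 * M_exit**2) ** 3.5   # isentropic P0/P for gamma = 1.4
print(M_exit, P_exit)                           # about 3.324 and 6.75 kPa
```

Calling invert_amr with supersonic=False recovers the subsonic branch appropriate to the first class of flows.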

The possibility of a shock in these flows requires us to add three new relations to our collection.  Our starting point for deriving the so-called jump conditions across the shock is the set of three basic fluid equations for mass, momentum, and energy, derived in the first post, but simplified by the tacit assumption that the shock is negligibly thin, so that the cross-sectional area is the same on both sides of the shock, giving:

\[ \rho_u V_u = \rho_d V_d \; , \]

\[ (\rho_u {V_u}^2 + P_u) = (\rho_d {V_d}^2 + P_d) \; ,\]

and

\[ h_u + \frac{{V_u}^2}{2} = h_0 = h_d + \frac{{V_d}^2}{2} \; , \]

respectively.  Note that the subscripts ‘u’ and ‘d’ stand for upstream and downstream, respectively.

As before, our thermodynamics will assume the fluid to be an ideal gas with the equation of state given by $P/\rho = R T$.  The general form of the enthalpy, $h = e + P/\rho$, takes the simple form $h = \frac{\gamma}{\gamma-1} \frac{P}{\rho}$.  The state variables relate to their values at different parts of the flow via

\[ \frac{P_0}{P} = \left( \frac{\rho_0}{\rho} \right)^{\gamma} = \left( \frac{h_0}{h} \right)^{\frac{\gamma}{\gamma-1}} \; . \]

The speed of sound in the gas, which is temperature dependent, takes on the forms

\[ c = \sqrt{\gamma R T } = \sqrt{\frac{\gamma P}{\rho}} = \sqrt{(\gamma-1) h} \; , \]

each useful in context.
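As a quick numerical sanity check, the three forms agree for air at room conditions (a sketch; the gas constant and state values below are my assumptions, not from the text):

```python
from math import sqrt

gamma, R = 1.4, 287.0        # specific gas constant for air, J/(kg K)
T, P = 300.0, 101325.0       # temperature in K, pressure in Pa
rho = P / (R * T)            # ideal gas density
h = gamma * R * T / (gamma - 1.0)   # enthalpy h = cp * T for an ideal gas

# All three forms of the sound speed give about 347 m/s
print(sqrt(gamma * R * T), sqrt(gamma * P / rho), sqrt((gamma - 1) * h))
```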

Finally, the energy equation, which had been relegated to a bystander role in incompressible fluid flow, offers two very useful relations.  The first, which will be termed the stagnant enthalpy expression (SEE), expresses the stagnant enthalpy in terms of the local sound speed via $(\gamma -1) h = c^2$, giving

\[ h_0 = h + \frac{V^2}{2} = h \left(1 + \frac{V^2}{2h} \right) = \frac{c^2}{\gamma-1} \left(1 + \frac{\gamma – 1}{2} M^2 \right) \; . \]

The second, which will be termed the local enthalpy expression (LEE), expresses the local enthalpy in terms of the local flow speed by

\[ (\gamma – 1) h = (\gamma – 1) \left( h_0 – \frac{1}{2}V^2 \right) \; . \]

One caution applies when using these relations.  The reservoir is operationally defined as the location where the flow is isentropically brought to a stop.  Since a shock produces a great deal of entropy for any fluid element crossing from one side to the other, the notional reservoir describing the downstream flow will have thermodynamic conditions different from those of the actual reservoir attached to the nozzle, and care must be taken not to confuse the notional one with the physical one.

To derive the normal shock relations, we follow these notes.

Begin by dividing the momentum equation by the mass continuity equation to get

\[ V_u – V_d = \frac{1}{\gamma} \left( \frac{{c_d}^2}{V_d} – \frac{{c_u}^2}{V_u} \right) \; .  \]

Eliminate the local speeds of sound by relating them to their local enthalpies and then using the LEE to get

\[ V_u – V_d = \frac{\gamma -1}{\gamma} \left[ \frac{h_0 (V_u – V_d)}{V_u V_d} + \frac{1}{2}(V_u – V_d) \right] \; .\]

Dividing both sides by $V_u - V_d$, rearranging, and squaring yields

\[ \left( \frac{\gamma+1}{2} \right)^2 = \frac{h_0^2 (\gamma-1)^2}{{V_u}^2 {V_d}^2 } \; .\]

Next, employ a neat trick by writing the numerator on the right-hand side as ${h_0}^2 (\gamma-1)^2 = h_{0u} (\gamma-1) h_{0d} (\gamma - 1)$, which is valid since the stagnant enthalpy is conserved across the shock.  Use the SEE to express each of these factors in terms of the local sound speed and Mach number.  Doing so yields

\[ \left( \frac{\gamma+1}{2} \right)^2 = \frac{1}{{M_u}^2} \left[1 + \frac{\gamma-1}{2} {M_u}^2 \right] \frac{1}{{M_d}^2} \left[1 + \frac{\gamma-1}{2} {M_d}^2 \right] \; . \]

Solving for $M_d$ is relatively painless with the substitutions $L = (\gamma-1)/2$ and $Q = (\gamma +1)/2$.  These auxiliary variables keep the clutter down, and the observation that $L^2 - Q^2 = -\gamma$ simplifies things enormously, so that we arrive at

\[ {M_d}^2 = \frac{1 + \left(\frac{\gamma - 1}{2}\right) {M_u}^2 }{\gamma {M_u}^2 - \left( \frac{\gamma - 1}{2} \right)} \; . \]
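As a sanity check, this relation is a one-liner to code (the function name is my own) and reproduces standard shock-table values; for example, $M_u = 2$ in air gives $M_d \approx 0.577$:

```python
from math import sqrt

def mach_downstream(M_u, gamma=1.4):
    """Downstream Mach number across a normal shock (requires M_u > 1)."""
    num = 1.0 + 0.5 * (gamma - 1.0) * M_u**2
    den = gamma * M_u**2 - 0.5 * (gamma - 1.0)
    return sqrt(num / den)

print(mach_downstream(2.0))   # about 0.5774, matching standard shock tables
```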

This Mach number jump relation allows us to get the jump in all of the local thermodynamic variables.  Starting with the density, use the continuity equation to write

\[ \frac{\rho_d}{\rho_u} = \frac{V_u}{V_d} = \frac{{V_u}^2}{V_u V_d} \; .\]

Using the definition of Mach number and the SEE together yields

\[ {V_u}^2 = \frac{(\gamma-1)h_0}{1 + \frac{\gamma-1}{2} {M_u}^2} {M_u}^2 \; . \]

Earlier, in the derivation of the Mach jump relation, we found that $\frac{1}{V_u V_d} = \frac{\gamma+1}{2 (\gamma-1) h_0}$.  Substituting these expressions back into the density equation yields

\[ \frac{\rho_d}{\rho_u} = \frac{(\gamma+1) {M_u}^2}{2 + (\gamma-1){M_u}^2} \; . \]

The momentum equation  gives

\[ P_d – P_u = \rho_u {V_u}^2 – \rho_d {V_d}^2 = \rho_u {V_u}^2 \left(1 – \frac{\rho_d V_d V_d}{\rho_u V_u V_u} \right) = \rho_u {V_u}^2 \left(1 – \frac{\rho_u}{\rho_d} \right)\; . \]

Substituting $\rho V^2 = \gamma P M^2$, which follows from the definition of the speed of sound, and dividing by $P_u$ gives

\[ \frac{P_d}{P_u} – 1 = \gamma {M_u}^2 \left( 1- \frac{\rho_u}{\rho_d} \right) \; . \]

Using the already-obtained expression for the density ratio (inverted, since here the upstream density sits in the numerator), followed by some simplification, gives

\[ \frac{P_d}{P_u} = 1 + \frac{2\gamma}{\gamma+1} \left( {M_u}^2 – 1 \right) \; . \]

Finally, when needed, the temperature ratio comes from substituting the ratios for pressure and density into the ideal gas equation of state:

\[ \frac{T_d}{T_u} = \frac{P_d}{P_u} \frac{\rho_u}{\rho_d} = \left[ 1 + \frac{2\gamma}{\gamma+1} \left( {M_u}^2 – 1 \right) \right] \frac{2 + (\gamma-1){M_u}^2}{(\gamma+1) {M_u}^2} \; .\]
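Collecting the results, the full set of jump relations fits in a few lines of Python (again a sketch under my own naming; the printed pressure ratio at $M_u = 2.940$ anticipates the worked problem below):

```python
from math import sqrt

def normal_shock(M_u, gamma=1.4):
    """Jump ratios across a normal shock given the upstream Mach number."""
    M_d = sqrt((1 + 0.5 * (gamma - 1) * M_u**2)
               / (gamma * M_u**2 - 0.5 * (gamma - 1)))
    rho_ratio = (gamma + 1) * M_u**2 / (2 + (gamma - 1) * M_u**2)  # rho_d/rho_u
    P_ratio = 1 + 2 * gamma / (gamma + 1) * (M_u**2 - 1)           # P_d/P_u
    T_ratio = P_ratio / rho_ratio                                  # T_d/T_u
    return M_d, rho_ratio, P_ratio, T_ratio

print(normal_shock(2.940))   # the pressure ratio P_d/P_u is about 9.92
```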

We are now in a position to solve another typical problem associated with converging-diverging nozzle design, again taken from Merle Potter, that reads

Air flows from a reservoir through a nozzle into a receiver.  The reservoir is maintained at 400 kPa absolute and 20$^{\circ}$C.  The nozzle has a 10-cm-diameter throat and a 20-cm-diameter exit.  What is the back pressure that locates the shock wave at the exit?

Since the shock is at the exit, we can assume that the flow just upstream of the shock is supersonic with the full exit area available, so $A/A^* = \left( \frac{20}{10} \right)^2 = 4$.  Using the inverse AMR function, the upstream Mach number is $M_u = 2.940$.  Then, using the standard isentropic relations, the pressure ratio is $P_u/P_0 = 0.0298$, resulting in an upstream pressure of $P_u = 11,914.79 \, Pa$.  Finally, the pressure jump relation yields $P_d/P_u = 9.919$ for a downstream pressure, which is equal to the receiver pressure, of $P_d = 118,179.93 \, Pa$.
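A short end-to-end check of these numbers (a sketch that hard-codes the Mach number found from the inverse AMR above):

```python
g = 1.4
P0 = 400e3     # reservoir pressure in Pa
M_u = 2.940    # upstream (exit) Mach number for A/A* = 4

P_u = P0 / (1 + 0.5 * (g - 1) * M_u**2) ** (g / (g - 1))  # isentropic relation
P_d = P_u * (1 + 2 * g / (g + 1) * (M_u**2 - 1))          # shock pressure jump
print(P_u, P_d)   # roughly 11.9 kPa and 118 kPa, as in the text
```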

Of course, this analysis just barely scratches the surface, but these basic relations demonstrate the interplay between the three basic laws of fluid mechanics (continuity, momentum, and energy) and how they contribute to the practical construction of a converging-diverging nozzle.  And, so, I guess this is rocket science.