
Carnot, Clausius, and Kelvin

This month we return to our exploration of entropy after our brief detour into field theory.  The earlier posts explored the definition of entropy derived from statistical mechanics. In this installment, we return to the thermodynamic roots of entropy that originated in the analysis of the 19th century. Our key players in this drama are Sadi Carnot, Lord Kelvin, and Rudolf Clausius, who will be with us both here and for several of the following posts.

This analysis closely follows the presentation found in Enrico Fermi’s book Thermodynamics with some additional extensions in the logic and new, explanatory diagrams that attempt to provide a cleaner approach to traditional material.

Thermodynamics rests upon the idea of a system in equilibrium, one that can be characterized by a very small number of state variables compared with the overwhelmingly enormous number of degrees of freedom the system possesses.  A bottle of water is a good poster child for a system in equilibrium. The bottle can be described by the amount of water $m$ (or the number of moles $n$), the volume it occupies $V$, its temperature $T$, pressure $P$, and the like. Even if the water were not pure and we were forced to also specify the percentage of impurities by type, there would still be far, far fewer numbers to specify than the astronomical number of position and velocity components required to describe the state as Newton would.  The state variables are completely independent of how the system made it into that configuration; their values represent average quantities in which individual, finer-grained fluctuations are smeared out.  Two state variables stand above the rest in importance: the internal energy $U$ and the entropy $S$.

The internal energy is relatively familiar to us based on its analogy to the traditional energies defined in classical mechanics and electrodynamics.  That said, it took quite a long time before it was appreciated in the mid-1800s that mechanical energy and heat were equivalent.  When the dust had settled, the first law of thermodynamics had been postulated as

\[ \Delta U = Q - W \; , \]

where $Q$ is the heat that enters or leaves the system and $W$ the work done by the system on its surroundings.  The sign convention is such that heat entering and work performed are both positive quantities.  If one regards energy as the ‘currency’ for physical transactions, then the first law amounts to an accounting principle that says the books must balance and, in this regard, it is relatively easy to understand the physical content.

The entropy, on the other hand, is more difficult to summarize succinctly.  Many people can offer aphorisms stating that the principle of entropy means that there is ‘no free lunch’ or that it ‘forbids perpetual motion’, but these slogans don’t provide much in the way of physical understanding.

There are several steps in arriving at a firm understanding of entropy.  The rest of this post centers on the first step, which involves the different ways of expressing the limitations the second law places on the conversion between work and heat.

We start by looking at the isothermal expansion of an ideal gas, in which a flame provides the heat that causes the expansion.  Since the internal energy of an ideal gas depends only on temperature, as long as the temperature remains constant there is no change in the internal energy: $\Delta U = 0$.  Then from the first law $W = Q$, which means that all of the heat energy is changed into the work needed to raise the piston.
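For a reversible isothermal expansion, the work follows from integrating $P\,dV$ with $P = nRT/V$, giving $W = nRT\ln(V_2/V_1)$, which by the argument above also equals the heat absorbed.  A minimal sketch follows; the function name and the numerical values are illustrative choices of mine, not quantities from the post.

```python
import math

R = 8.314  # J/(mol K), molar gas constant

def isothermal_work(n, T, V1, V2):
    # W = integral of P dV with P = n R T / V  ->  n R T ln(V2/V1)
    return n * R * T * math.log(V2 / V1)

# one mole doubling its volume at 300 K
W = isothermal_work(n=1.0, T=300.0, V1=1.0, V2=2.0)
Q = W  # with Delta U = 0, the first law gives Q = W
```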

It is natural to ask whether all physical processes allow for a complete conversion of heat to work, as permitted by the first law whenever $\Delta U = 0$, or whether there are limitations on how efficient arbitrary physical processes can be.

After much analysis and experimentation, most of which was done in the 1800s, the second law of thermodynamics emerged with a clear set of limitations for how changes between heat and work are made.  Its modern form expresses the statement in terms of entropy but we will avoid it in favor of more macroscopic statements.

Fermi provides two postulates that capture different aspects of the second law. The first postulate, attributed to Lord Kelvin, states that:

a transformation whose only final result is to transform into work the heat extracted from a source that is at the same temperature throughout is impossible.

Graphically this forbidden process is represented on a $PVT$ diagram as follows.

The circular arc reminds us that the state of the system must remain unchanged at the end of the transformation. 

The second postulate, attributed to Clausius, states

if heat flows by conduction from a body A to another body B then a transformation whose only final result is to transfer heat from B to A is impossible.

The graphical representation of this forbidden process is as follows.

Note that both postulates rule out as impossible certain transformations that leave the state of the system otherwise unchanged (“only final result is…”).  Since the state of the system is unchanged, we will focus on cyclic processes, of the kind used in engines, in which a complete circuit returns the system to its original state ($\Delta U = 0$) with some fraction of the heat absorbed being transformed into work.

The textbook example of a cyclic process is the Carnot cycle, which operates a system between two thermal reservoirs with temperatures $T_C$ and $T_H$ (with $T_C < T_H$). 

While the details of the Carnot cycle will be explored in the next post, for our purposes the final result relating the work derived to the heat exchanged, given by

\[ W = Q_H - Q_C \; \]

will be all that is needed.

Fermi devotes a large amount of effort to showing that the Kelvin and Clausius postulates are logically equivalent and are different facets of the same underlying limitations of the second law.

The first part of the proof, showing that the falsity of the Kelvin postulate implies the falsity of the Clausius postulate, is the easier to understand.  Suppose that the Kelvin postulate were false.  Then we could extract some work $W$ from a source $A$, leaving $A$ otherwise unchanged.  We could use the work to raise a block up an inclined plane, gaining gravitational potential energy, and then let the block slide back down, using friction to transform the potential energy into heat, which we then dump into a body $B$ hotter than $A$.  The only net result would be the transfer of heat from the colder body $A$ to the hotter body $B$, in violation of the Clausius postulate.

The converse leg of the proof, showing that the falsity of the Clausius postulate implies the falsity of the Kelvin postulate, is a bit more difficult.  Suppose that the Clausius postulate were false, so that an amount of heat $Q_H$ could be transferred from the cold reservoir at temperature $T_C$ to the hot reservoir at $T_H$ with no other changes.  We could then run a Carnot cycle between the two reservoirs that absorbs $Q_H$ from the hot reservoir, produces an amount of work $W$, and rejects $Q_C = Q_H - W$ to the cold reservoir.  The hot reservoir is thereby returned to its original state, and the only net result is that heat $Q_H - Q_C$ has been extracted from the cold reservoir and converted entirely into work, in violation of the Kelvin postulate.
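The bookkeeping behind this composite process can be sketched in a few lines of arithmetic; the values of $Q_H$ and $W$ below are arbitrary illustrative choices of mine, not numbers from the post.

```python
# Illustrative numbers (arbitrary energy units)
Q_H = 100.0    # heat the hypothetical Clausius violator moves from cold to hot
W = 40.0       # work produced by a Carnot engine absorbing Q_H from the hot reservoir
Q_C = Q_H - W  # heat that engine rejects back to the cold reservoir

# hot reservoir: gains Q_H from the violator, then loses Q_H to the engine
net_hot = Q_H - Q_H
# cold reservoir: loses Q_H to the violator, gains Q_C back from the engine
net_cold = -Q_H + Q_C

assert net_hot == 0.0   # hot reservoir ends unchanged
assert net_cold == -W   # net heat W left the cold reservoir...
# ...and was converted entirely into work: a Kelvin violation
```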

The logical equivalence of the Kelvin and Clausius postulates demonstrates that these various limitations are different facets of the second law.  This logical structure serves as the launching pad for exploring the concept of entropy from the macroscopic point of view.

[Note added after publication – the equivalence between the Kelvin and Clausius postulates is nicely described here.]

A Curvilinear Mantra – Part 2

The last post introduced the curvilinear mantra for students working with field equations in such disciplines as fluid mechanics, general relativity, and electricity and magnetism.  The textbook example (see, e.g. Acheson Appendix A, pp 352-3) is Euler’s equations for ideal fluids in two spatial dimensions. 

In Cartesian coordinates these equations read

\[ \rho \left( V_x \partial_x + V_y \partial_y + \partial_t \right) V_x = -\partial_x p + f_x \;  \]

and

\[ \rho \left( V_x \partial_x + V_y \partial_y + \partial_t \right) V_y = -\partial_y p + f_y \; ,\]

whereas, in polar coordinates these equations read

\[ \rho \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta  + \partial_t \right) V_r - \rho \frac{{V_\theta}^2}{r} = -\partial_r p + f_r \; \]

and

\[ \rho \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta  + \partial_t \right) V_\theta + \rho \frac{V_r V_\theta}{r} = -\frac{1}{r} \partial_\theta p + f_\theta \; . \]

As discussed in the previous post, beginning students are often confused by two changes when transitioning from Cartesian to polar coordinates.  The first is the appearance of $1/r$ scale factors that decorate various terms, such as $\frac{V_\theta}{r} \partial_\theta$.  The second is the appearance of additional additive terms, such as $V_r V_\theta/r$.

The curvilinear mantra explains these changes as follows: the scale factors come from minding the units and the additive terms show up to account for how the basis unit vectors change from place to place.

The first half of the mantra was covered in the previous post.  This post finishes the exploration by demonstrating how the additive terms arise due to the spatial variations of the basis vectors. 

The first step involves writing the position vector in terms of the polar coordinates and the cartesian unit basis vectors

\[ {\vec r} = r \cos \theta {\hat x} + r \sin \theta {\hat y} \; .\]

The polar unit basis vectors are defined by taking the derivatives of the position vector with respect to the polar coordinates and then unitizing.  The radial basis vector (not unitized) is

\[ {\vec e}_r \equiv \frac{\partial {\vec r}}{\partial r} = \cos \theta {\hat x} + \sin \theta {\hat y} \; .\]

Conveniently, this vector has a unit length and we can immediately write the radial unit basis vector as

\[ {\hat r} = \cos \theta {\hat x} + \sin \theta {\hat y} \; . \]

Following the same procedure, the polar angle basis vector (not unitized) is

\[ {\vec e}_\theta \equiv \frac{\partial {\vec r}}{\partial \theta} = -r \sin \theta {\hat x} + r \cos \theta {\hat y} \; . \]

This vector has length $r$ and so the polar angle unit base vector is

\[ {\hat \theta} = -\sin \theta {\hat x} + \cos \theta {\hat y}  \; .\]

Both vectors are independent of $r$ but do depend on $\theta$ and their variations are

\[ \partial_\theta {\hat r} = {\hat \theta} \; \]

and

\[ \partial_\theta {\hat \theta} = -{\hat r} \; . \]
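These two derivative relations are easy to verify numerically with a central finite difference; the helper names below are my own and the evaluation angle is arbitrary.

```python
import numpy as np

def r_hat(theta):
    # radial unit basis vector in fixed Cartesian components
    return np.array([np.cos(theta), np.sin(theta)])

def theta_hat(theta):
    # polar-angle unit basis vector in fixed Cartesian components
    return np.array([-np.sin(theta), np.cos(theta)])

# central finite differences at an arbitrary angle
th, h = 0.7, 1e-6
d_r_hat = (r_hat(th + h) - r_hat(th - h)) / (2 * h)
d_theta_hat = (theta_hat(th + h) - theta_hat(th - h)) / (2 * h)
```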

At this point we have all the ingredients we need.  From the first part of the curvilinear mantra, the velocity in polar coordinates, expressed in physical components, is

\[ {\vec V} = V_r {\hat r} + V_\theta {\hat \theta} \;  \]

and the material (or total) time derivative is

\[ \frac{D}{Dt} = V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \; , \]

where the scale factors on the polar angle terms are due to minding units.

Applying the material time derivative to the velocity gives

\[ \frac{D {\vec V}}{Dt} = \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) \left( V_r {\hat r} + V_\theta {\hat \theta} \right) \; . \]

Expanding this expression term-by-term yields

\[ V_r \partial_r \left( V_r {\hat r} \right) + V_r \partial_r \left( V_\theta {\hat \theta} \right) + \frac{V_\theta}{r} \partial_\theta \left( V_\theta {\hat \theta} \right) + \frac{V_\theta}{r} \partial_\theta \left( V_r {\hat r} \right) + \left(\partial_t V_r \right) {\hat r} + \left( \partial_t V_\theta \right) {\hat \theta} \; . \]

Expanding the derivatives, taking care to evaluate the spatial derivatives of the unit basis vectors, yields

\[ V_r \left( \partial_r V_r \right) {\hat r} + V_r \left( \partial_r V_\theta \right) {\hat \theta} + \frac{V_\theta}{r} \left( \partial_\theta V_\theta \right) {\hat \theta} - \frac{{V_\theta}^2}{r} {\hat r} + \left( \frac{V_\theta}{r} \partial_\theta V_r \right) {\hat r} + \\ \frac{V_\theta V_r}{r} {\hat \theta}  + \left(\partial_t V_r \right) {\hat r} + \left( \partial_t V_\theta \right) {\hat \theta} \; . \]

Collecting terms gives the radial term as

\[ V_r \partial_r V_r + \frac{V_\theta}{r} \partial_\theta V_r - \frac{{V_\theta}^2}{r} + \partial_t V_r \; \]

and the polar angle term as

\[ V_r \partial_r V_\theta + \frac{V_\theta}{r} \partial_\theta V_\theta + \frac{V_\theta V_r}{r} + \partial_t V_\theta \; .\]

Factoring the terms yields

\[ \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) V_r - \frac{{V_\theta}^2}{r} \; \]

and

\[ \left( V_r \partial_r + \frac{V_\theta}{r} \partial_\theta + \partial_t \right) V_\theta + \frac{V_\theta V_r}{r} \; .\]

Happily, these expressions match the textbook equations term for term (up to multiplication by $\rho$).  This shows the accuracy and power of the curvilinear mantra.  Hopefully it will catch on in classrooms.
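For readers who want an independent check, the whole expansion can be verified symbolically.  The sketch below uses sympy, writing the polar basis vectors in fixed Cartesian components so that their $\theta$-dependence is differentiated automatically; the variable names are my own.

```python
import sympy as sp

r, th, t = sp.symbols('r theta t')
Vr = sp.Function('V_r')(r, th, t)
Vth = sp.Function('V_theta')(r, th, t)

# polar unit basis vectors written in fixed Cartesian components
rhat = sp.Matrix([sp.cos(th), sp.sin(th)])
thhat = sp.Matrix([-sp.sin(th), sp.cos(th)])

V = Vr*rhat + Vth*thhat

# material derivative D/Dt = V_r d/dr + (V_theta/r) d/dtheta + d/dt
DV = Vr*V.diff(r) + (Vth/r)*V.diff(th) + V.diff(t)

# the radial and polar-angle components derived in the text
rad = Vr*Vr.diff(r) + (Vth/r)*Vr.diff(th) + Vr.diff(t) - Vth**2/r
pol = Vr*Vth.diff(r) + (Vth/r)*Vth.diff(th) + Vth.diff(t) + Vr*Vth/r

# the two expansions agree identically
assert sp.simplify(DV - rad*rhat - pol*thhat) == sp.zeros(2, 1)
```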

A Curvilinear Mantra – Part 1

These next two posts are a bit of a departure from the thermal physics theme that has been the central focus for the last many months.  They grew out of some discussions on classical field theory that arose in several venues with different people, and it seemed important to capture what is a clean (and perhaps new) argument for the beginning student on the best way to transform differential equations into curvilinear coordinates.

The starting point is the recasting of the Euler equation for an ideal fluid (typically a gas)

\[ \rho \frac{D {\vec V}}{Dt} = -{\vec \nabla p} + {\vec f} \; , \]

where $\rho$ and $p$ are the mass density and pressure of the fluid, ${\vec V}$ is its velocity, $\frac{D}{Dt}$ is the material derivative, and ${\vec f}$ is the body force per unit volume.

Typically, within basic discussions of fluid mechanics, Euler’s equation is expressed in Cartesian coordinates (assumed here, without loss of generality to the method, to cover a two dimensional space) where the velocity is given by

\[ {\vec V} = V_x {\hat x} + V_y {\hat y} \; ,\]

the material derivative takes on the simple form

\[ \frac{D}{Dt} = V_x \partial_x + V_y \partial_y + \partial_t \; ,\]

and Euler’s equation, in component form is

\[ \rho \frac{D}{Dt} V_x = -\partial_x p + f_x \; ,\]

and

\[ \rho \frac{D}{Dt} V_y  = -\partial_y p + f_y \; .\]

This relatively simple form allows the student to focus on the Lagrangian nature of following a fluid flow, but it typically hides a subtle complication when using curvilinear (or even rotating) coordinates.  For example, the corresponding version of Euler’s equations in cylindrical coordinates (see also Acheson’s Appendix A.6) uses

\[ \frac{D}{Dt} = \partial_t + V_r \partial_r + \frac{V_{\theta}}{r} \partial_{\theta} \; \]

for the material derivative with the component equations being

\[ \rho \frac{D}{Dt} V_r - \rho \frac{{V_{\theta}}^2}{r} = - \partial_r p + f_r \; \]

and

\[ \rho \frac{D}{Dt} V_{\theta} + \rho \frac{V_r V_{\theta}}{r} = -\frac{1}{r} \partial_{\theta} p + f_{\theta} \; . \]

Suddenly there are new multiplicative factors (e.g. the $1/r$ multiplying the derivative with respect to the polar angle $\theta$) as well as additive terms on the left-hand side of the component equations (e.g. $-\rho {V_{\theta}}^2/r$) that weren’t there in the Cartesian version.  The student is left to wonder just why they are there.

Many books and lecture notes on the internet try to justify one or the other (but rarely both) with varying degrees of success.  The aim of this note is to suggest a simple mantra:  the multiplicative terms are strictly the result of minding units and the additive terms are strictly the result of the curvilinear basis vectors changing from point to point. 

The strategy behind the mantra is that even if the students don’t fully connect all the dots the first few times, they will have an explanation that is rock solid and easy to remember to guide them in exploring on their own. 

Let’s examine each of these claims in turn. 

The first claim of the mantra is that the multiplication of the $\partial_{\theta}$ term by $1/r$ is the result of minding units.  Of the two claims of the mantra, this one is the more conceptually difficult even though it is the easier of the two to understand mathematically.  The conceptual hurdle is rooted in the arguments used to define the material derivative in terms of the partial derivatives of a scalar field, $f(x,y,t)$, expressed in Cartesian coordinates

\[ df = \partial_x f dx + \partial_y f dy + \partial_t f dt \; .\]

Dividing by $dt$ immediately gives the Cartesian form of the material derivative

\[ \frac{Df}{Dt} = V_x \partial_x f + V_y \partial_y f + \partial_t f \; .\]

The student then asks why a similar relationship doesn’t hold for curvilinear coordinates.  For example, why isn’t the material derivative in cylindrical coordinates based on the differential of $g(r,\theta,t)$,

\[ dg = \partial_r g dr + \partial_\theta g d\theta + \partial_t g dt \; ?\]

This point is most often and most clearly discussed within the realm of continuum mechanics or general relativity.  Schutz, in his book A First Course in General Relativity, notes in Section 5.5 that defining the gradient of $g$ essentially in terms of the differential given above is perfectly acceptable, but that the price paid for using it is that the basis vectors are not normalized, which he summarizes with the equation

\[ {\vec e}_{\alpha} \cdot {\vec e}_{\beta} = g_{\alpha \beta} \neq \delta_{\alpha \beta} \; .\]

While this is certainly true and quite clearly argued, the beginning student consulting Schutz (or some similar text) as a reference has to know either the definition of the metric or the difference between vectors and differential forms and the natural duality between them.  In the first case, they need to know that the metric encodes all of the possible dot products between the basis vectors.  In the second, they are confronted with notation that expresses the duality between basis forms and vectors in the coordinate version as

\[ \left<d\theta, \partial_{\theta} \right> = 1 \; \]

and in the non-coordinate version as

\[ \left< {\tilde \omega}^{\hat \theta}, {\vec e}_{\hat \theta} \right> = 1 \; .\]

These mathematical distinctions are quite beyond the beginning student who, by definition, is struggling with a host of other things.

A cleaner way of justifying the first point of the mantra is to perform a unit analysis on the differential $dg$.  It doesn’t matter what units $g$ possesses but for the sake of this argument let’s assume $g$ has units of temperature.  The idea of a temperature field is familiar and the units are well known.  We will denote the units of a physical quantity by square brackets so that in this case $[g] = T$.

The differential must also have units of temperature which means that the partial derivatives have mixed units.  The partial derivative with respect to the radius $r$ has units of temperature per length

\[ \left[ \partial_r g \right] = T/L \; \]

while the partial derivative with respect to the azimuth $\theta$ has units of temperature

\[ \left[ \partial_{\theta} g \right] = T \; .\]

Dividing by $dt$ gives a material derivative of the form

\[ \frac{Dg}{Dt} = V_r \partial_r g + U_{\theta} \partial_{\theta} g + \partial_t g \; .\]

The units on the radial velocity $V_r \equiv dr/dt$ are length per unit time as we expect of a conventional derivative but the units on the azimuthal velocity $U_{\theta} \equiv d\theta/dt$ are radians per unit time, which are quite different (hence the use of the letter $U$ in place of $V$).  The next step is to challenge the student to think about how any lab would measure this angular velocity and to then argue that a much better way to link to experiments is to multiply $U_{\theta}$ by the radius $r$. 

Once this step is done, the remaining piece involves rewriting the differential as

\[ dg = dr \partial_r g + (r d\theta) (\frac{1}{r} \partial_{\theta} g) + dt \partial_t g \; , \]

where we’ve multiplied the second term by unity in the form of $r/r$.  Dividing by $dt$ immediately gives

\[ \frac{Dg}{Dt} = V_r \partial_r g + V_{\theta} \frac{1}{r} \partial_{\theta} g + \partial_t g \; , \]

which is the accepted form of the material derivative.
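One way to convince a skeptical student that this form is consistent is to check it against the Cartesian computation for a concrete sample field.  The sketch below uses sympy with $f = x^2 y$ as an assumed example field; only the advective (spatial) part of the derivative is compared, since the $\partial_t$ term is identical in both systems.

```python
import sympy as sp

x, y = sp.symbols('x y')
r, th = sp.symbols('r theta', positive=True)
Vx, Vy = sp.symbols('V_x V_y')

f = x**2 * y                                    # assumed sample scalar field
to_polar = {x: r*sp.cos(th), y: r*sp.sin(th)}
g = f.subs(to_polar)                            # same field in polar coordinates

# physical polar velocity components (projections onto r_hat and theta_hat)
Vr = Vx*sp.cos(th) + Vy*sp.sin(th)
Vth = -Vx*sp.sin(th) + Vy*sp.cos(th)

# advective part of the material derivative in both coordinate systems
cart = (Vx*f.diff(x) + Vy*f.diff(y)).subs(to_polar)
polar = Vr*g.diff(r) + (Vth/r)*g.diff(th)

assert sp.simplify(cart - polar) == 0
```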

The next post will cover the second part of the mantra by showing that the additive terms result from how the basis vectors in curvilinear coordinates change from point to point in space.

A Binomial Gas

The last installment discussed Robert Swendsen’s critique of the common, and in his analysis erroneous, method of understanding the entropy of a classical gas of distinguishable particles.  As discussed in that post, his aim in making this analysis is to persuade the physics community to re-examine its understanding of entropy and to rediscover Boltzmann’s fundamental definition, based on probability and not on phase space volume.  To quote some of Swendsen’s closing words:

Although the identification of the entropy with the logarithm of a volume in phase space did originate with Boltzmann, it was only a special case. Boltzmann’s fundamental definition of the entropy in his 1877 paper has none of the shortcomings resulting from applying an equation for a special case beyond its range of validity.

On the question of how this special case blossomed into textbook dogma we will have to content ourselves with speculations.  It seems likely that the passion by which quantum mechanics gripped the physics community made it attractive to view the entire world through the lens of indistinguishable particles.  Furthermore, quantum mechanics also elevated the concept of phase space since various dimensions could be viewed as canonically conjugate variables subject to the uncertainty principle.  So, it is plausible that the physics community, dazzled by this new theory of the subatomic, latched onto the special case and ignored Boltzmann’s fundamental definition.  If true, this would be incredibly ironic since the key focus of Boltzmann was on probability which is arguably the most shocking and intriguing aspect of quantum mechanics.

Regardless of these finer points of physics history, since the concept of probability is key in deriving the correct formula for a classical distinguishable gas, let’s focus on the toy example Swendsen provides in order to illustrate his point.  As in the last post, we will assume that the average energy per particle $\epsilon$ remains fixed.

If we imagine a system with $N$ total distinguishable particles distributed between a volume $V$ partitioned into sub-volumes $V_1$ and $V_2$ then the probability $P(N_1,N_2)$ of having $N_1$ particles in $V_1$ and $N_2 = N – N_1$ in $V_2 = V – V_1$ is given by the binomial distribution

\[ P(N_1,N_2) = \left( \begin{array}{c} N \\ N_1 \end{array} \right) p^{N_1} (1-p)^{N_2} \; ,\]

where  $p$ is the probability of being found in $V_1$ (i.e. a ‘success’).  Since there are no constraints forcing particles to accumulate in any one section compared to the others they will distribute randomly within the entire domain.  Therefore, $p = V_1/V$ and the probability is given by

\[ P(N_1,N_2) = \left( \frac{N!}{N_1! N_2!} \right) \left( \frac{V_1}{V} \right)^{N_1} \left( \frac{V_2}{V} \right)^{N_2} \; .\]
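A quick numerical sanity check on this expression is that the probabilities sum to one over all possible values of $N_1$.  The function below is an illustrative sketch of mine, with arbitrary parameter values.

```python
from math import comb

def prob(N1, N, V1, V):
    # P(N1, N2) with N2 = N - N1 and success probability p = V1/V
    N2 = N - N1
    p = V1 / V
    return comb(N, N1) * p**N1 * (1 - p)**N2

# the distribution is normalized: summing over all N1 gives one
total = sum(prob(N1, 20, 0.3, 1.0) for N1 in range(20 + 1))
```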

This expression is Swendsen’s launching point for deriving the correct expression for a classical gas of distinguishable particles.  But before continuing with the analysis, it is worth taking a few moments to better understand the physical content of that expression (even for those who understand the binomial distribution well).

There is a very compact way to make a Monte Carlo simulation of this thought experiment using the python ecosystem.  One starts by defining a random realization of the classical gas particles placed within the volume and then reporting out the macroscopic thermodynamic state. 

import numpy as np

def particles_in_a_box(V1, N, V):
    #get random positions of the particles, uniform on [0, 1)
    pos = np.random.random(N)

    #count the number falling in the subvolume V1
    threshold = V1/V
    return np.count_nonzero(pos < threshold)

In this context, the macroscopic thermodynamic state is a measure of how many particles are found in the sub-volume $V_1$.  This is a critical point, particularly in light of the quantum interpretation that so many have embraced: two systems can be in the same thermodynamic state without the underlying microstates being the same.  For example, if $N=3$ and $N_1=2$ then each of the following lists results in the same thermodynamic state:

  • [True,True,False]
  • [True,False,True]
  • [False,True,True]

where True and False indicate whether each particle is found within $V_1$ (True) or not (False), as determined by comparing its position against the threshold.

To get the probabilities, one makes an ensemble of such systems, and this is what the following function does:

def generate_MC_estimate(V1, N, V, num_trials):
    import numpy as np

    #build an ensemble of independent realizations of the box
    results = np.zeros(num_trials)
    for i in range(num_trials):
        results[i] = particles_in_a_box(V1, N, V)
    return results

The following plot shows how well the empirical results for an ensemble with 100,000 realizations agree with the formula derived above for a simulation of 2000 particles placed in a box where $V_1 = 0.3 V$.
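A vectorized variant of the comparison behind that plot can be sketched as follows; the particle count, trial count, occupancy checked, and random seed here are smaller illustrative choices of mine rather than the values quoted in the post.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(42)

# smaller illustrative parameters: 50 particles, 100,000 ensemble members
N, V1, V, num_trials = 50, 0.3, 1.0, 100_000

# each row is one realization; count the particles landing inside V1
counts = (rng.random((num_trials, N)) < V1 / V).sum(axis=1)

# compare the empirical frequency of one occupancy with the binomial formula
N1 = 15
empirical = np.mean(counts == N1)
p = V1 / V
analytic = comb(N, N1) * p**N1 * (1 - p)**(N - N1)
```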

Following Boltzmann, the entropy is

\[ S = k \ln P + C = k \ln \left[ \left( \frac{N!}{V^{N_1}V^{N_2}}\right) \left( \frac{{V_1}^{N_1}}{N_1 !} \right) \left( \frac{{V_2}^{N_2}}{N_2 !} \right) \right] + C \; ,\]

where the previous expression has been grouped into parts dealing with the entire system $(N,V)$, the first sub-volume $(N_1,V_1)$, and the second sub-volume $(N_2,V_2)$.  The constant $C$ depends only on the whole system through $N$ and $V$ but not on the subdivisions and, for reasons that should become obvious, we will take it to be

\[ C = k \ln \left( \frac{{V}^{N}}{N !} \right) \; . \]

We first expand the entropy expression along this grouping to get

\[ S = k \ln \left( \frac{N!}{{V}^{N}} \right) + k \ln \left( \frac{{V_1}^{N_1}}{N_1 !} \right) + k \ln \left( \frac{{V_2}^{N_2}}{N_2 !} \right) + k \ln \left( \frac{{V}^{N}}{N !} \right) \; .\]

The first and last terms are inverses of each other and, under the action of the logarithm, cancel, leaving

\[ S = k \ln \left( \frac{{V_1}^{N_1}}{N_1 !} \right) + k \ln \left( \frac{{V_2}^{N_2}}{N_2 !} \right) \; .\]

As the whole is a sum of the parts, this expression is clearly extensive.

The final step is the application of Stirling’s approximation ($\ln n! \approx n \ln n - n$).  To keep things clear, we will apply it to a generic term of the form

\[ S = k \ln \left( \frac{V^N}{N!} \right) \; \]

to get

\[ S = k \left( \ln V^N - \ln N! \right) = k \left( N \ln V - N \ln N + N \right) = k N \left( \ln \frac{V}{N} + 1 \right) \; , \]

which clearly shows that $S$ scales linearly with the system size (at least in the thermodynamic limit).
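The quality of Stirling’s approximation, and hence of this linear scaling, can be checked directly by comparing against an exact evaluation of $\ln N!$ via the log-gamma function.  The function names and the choice $k = 1$, $V = 1$ below are illustrative assumptions of mine.

```python
import math

def entropy_exact(N, V, k=1.0):
    # S = k ln(V^N / N!) evaluated exactly, using lgamma(N+1) = ln N!
    return k * (N * math.log(V) - math.lgamma(N + 1))

def entropy_stirling(N, V, k=1.0):
    # S = k N (ln(V/N) + 1), the Stirling-approximated form
    return k * N * (math.log(V / N) + 1)

def rel_err(N, V=1.0):
    # relative error of the Stirling form against the exact evaluation
    return abs(entropy_exact(N, V) - entropy_stirling(N, V)) / abs(entropy_exact(N, V))
```

The relative error shrinks as $N$ grows, which is why the approximation is harmless in the thermodynamic limit.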

All told, Swendsen argues persuasively that the correct interpretation of the entropy is that it is always proportional to the logarithm of the probability, that the ‘traditional’ expression depending on the volume of phase space is a special case of the larger rule, and that by misapplying this special case large numbers of physicists have taught or have been taught incorrectly for decades.  So much for the idea of settled science.

Of Milk and Entropy

Last month’s column teased the idea that there is a challenge to the common wisdom about the traditional (T) expression for the entropy of a classical (C) gas of distinguishable (D) particles, given by

\[ S_{TCD} = k N \left[ \ln V + \frac{3}{2} \ln \frac{E}{N}  + X \right] \; , \]

where $X$ is some constant.  The common wisdom holds that this expression fails to be extensive because classical mechanics overcounts the number of possible configurations.  A division of the partition function by $N!$ yields an extensive expression

\[ S_{T} = k N \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

as the ‘correct’ one, and the conclusion is a philosophical one: there is no escaping quantum statistics; all gases are made up of indistinguishable (I) particles.

This conclusion seems ably rebutted by a paper entitled Statistical mechanics of colloids and Boltzmann’s definition of the entropy, by Robert H. Swendsen in 2006 in the American Journal of Physics.  Swendsen’s argument centers on looking at whole milk – the kind you can buy in any supermarket or convenience store.

I must confess that even though I purchase whole milk regularly and was well aware of the term ‘homogenized’ attached to it, I never really bothered to understand just what was made homogeneous.  The basic notion is that homogenized milk is a colloid with tiny fat and protein globules (Swendsen states characteristic sizes of the fat globules of ~0.5 microns) suspended in a water medium.  Whole milk, at roughly 4% fat, is just such a colloid.

There are two key assumptions that Swendsen makes at the core of his analysis of whole milk as a classical colloid:

  1. The globules are distinguishable
  2. The globules constitute a gas

That the globules are distinguishable is strongly supported by the fact that, at a diameter of ~0.5 microns, each globule contains approximately $10^{9}$ atoms (give or take an order of magnitude), and so it would be extremely unlikely that any two globules contain exactly the same number of atoms.  The odds of finding identical globules drop many orders of magnitude more once one considers that each globule will contain some amount of foreign contaminants, so that both the composition and the number of atoms found within any given globule will likely be unique; each globule is thus microscopically distinguishable.

That the globules can be modeled as an ideal gas takes a bit more thought.  The key features of an ideal gas are that it is a collection of similar objects that interact with each other only over a very short range and that the time between interactions is large compared to the duration of an interaction.  The fact that the globules are suspended in water, a substance which continuously jostles them, doesn’t alter the fact that they interact with other fat globules, through a short-range electrostatic repulsion, only occasionally.

With Swendsen’s two assumptions well-supported, we are now equipped to argue against the conclusion that quantum mechanics is inescapable.  Here we have a gas of distinguishable particles, all much larger than an atom so that quantum statistics can hold no sway, for which the traditional expression for entropy predicts startlingly wrong conclusions.  One, we’ve already encountered in the Gibbs paradox discussion in the last post.  The other, which is a variation, also deals with mixing and goes something like this.

Imagine that we divide a tank of total volume $V$ into two subdivisions $V  = V_1 + V_2$ subject to the constraint $V_1 > V_2$.  The larger sub-volume $V_1$ is filled with whole milk and the smaller sub-volume $V_2$ is filled with skim milk (completely devoid of fat globules).  Let $N$ be the total number of fat globules in the system, which are initially contained in $V_1$, and $E = E_1$ be their total energy.  For simplicity, we can also assume that the average energy per particle $\epsilon$ remains fixed (no heat transfer and no work done).  The initial entropy of the system given by the traditional formula is

\[ S_{TCD,initial} = k N \left[ \ln V_1 + \frac{3}{2} \ln \epsilon + X \right] \; . \]

We then imagine opening a small port for the two systems to mix and then closing it.  The final entropy is

\[ S_{TCD,final} = k N_1 \left[ \ln V_1 + \frac{3}{2} \ln \epsilon + X \right] + k N_2 \left[ \ln V_2 + \frac{3}{2} \ln \epsilon + X \right] \; . \]

The difference in the entropy is then

\[ \Delta S_{TCD} = k N_1 \ln V_1 + k N_2 \ln V_2 - k N \ln V_1 = k N_2 \left( \ln V_2 - \ln V_1 \right) = k N_2 \ln \left( \frac{V_2}{V_1} \right) \; . \]

And here is the problem: given that by construction $V_2 < V_1$, the entropy change is always negative, even though mixing is an irreversible process; it takes work to restore the system to its ‘before’ state (macroscopically, that all the globules are back in the larger volume, if not precisely at the same microstate of positions ${\vec r}_i$ and velocities ${\vec v}_i$).
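The sign of this entropy change is easy to exhibit numerically.  A minimal sketch follows; the globule count and volumes are arbitrary illustrative values of mine.

```python
import math

k_B = 1.380649e-23  # J/K, Boltzmann constant

def delta_S_TCD(N2, V1, V2):
    # entropy change k N_2 ln(V_2/V_1) predicted by the traditional formula
    return k_B * N2 * math.log(V2 / V1)

# arbitrary illustrative values with V2 < V1, as in the construction
dS = delta_S_TCD(N2=1_000, V1=2.0, V2=1.0)
```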

This second inconsistency (and likely there are others) further emphasizes that the classical expression is deeply flawed and, of course, we already knew this.  But we can’t resort to quantum mechanics to come in and save the day, as was done with simpler gases, since each object in this system is distinguishable.

Swendsen resolves this problem by arguing that the methodology that led to the classical expression is wrong because it relates entropy to the volume in phase space and not to the probability.  To quote Swendsen:

Oddly enough, Boltzmann would not have encountered these problems, because he would not have used Eq.1 [for $S_{TCD}$]. He wrote the entropy (in modern notation) as

\[ S_{dist} = kN \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X + 1\right] \; .\] If we use [this equation], the entropy remains constant in the first experiment when the wall between the two subvolumes of milk is either removed or reinserted, as is appropriate for a reversible process. For the second experiment, it is easy to show that $S_{dist,total}$ is always positive; the entropy increases as it must for an irreversible process.
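For contrast, the same toy mixing experiment can be evaluated with $S_{dist}$.  A hedged sketch (same invented numbers as before, $k = 1$, with the final state taken as the most probable partition, in which the densities equalize):

```python
import math

def s_dist(n, volume, energy, k=1.0, x=0.0):
    """Boltzmann/Swendsen entropy S = k N [ln(V/N) + (3/2) ln(E/N) + X + 1]."""
    return k * n * (math.log(volume / n) + 1.5 * math.log(energy / n) + x + 1.0)

v1, v2 = 2.0, 1.0
n_total, eps = 1000, 1.0

# Before: all N globules in V1.
s_initial = s_dist(n_total, v1, n_total * eps)

# After: most probable partition, equal densities in V1 and V2.
n1 = n_total * v1 / (v1 + v2)
n2 = n_total - n1
s_final = s_dist(n1, v1, n1 * eps) + s_dist(n2, v2, n2 * eps)

delta_s = s_final - s_initial
print(delta_s)  # positive: k N ln((V1+V2)/V1), as an irreversible process demands
```

With the densities equalized, the change collapses to $k N \ln\left(\frac{V_1+V_2}{V_1}\right)$, which is positive for any $V_2 > 0$.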

The next blog will delve into this question about probability a bit more to show how Monte Carlo simulations dealing with microstates dovetail with entropy and these observations.

Gibbs Paradox

This month’s column builds upon the basic building blocks from last month, namely that despite the seemingly simple presentation that most textbooks afford the idea of entropy, there is an enormous amount of subtlety and nuance in an idea that is well over a hundred years old.  As discussed in that earlier post, Robert Swendsen argues in his 2011 article How physicists disagree on the meaning of entropy (American Journal of Physics 79, 342) that the primary area where things seem to break down is that different people presuppose an implicit set of assumptions not necessarily shared by anyone else.  To quote Swendsen:

When people discuss the foundations of statistical mechanics, the justification of thermodynamics, or the meaning of entropy, they tend to assume that the basic principles they hold are shared by others.  These principles often go unspoken, because they are regarded as obvious. It has occurred to me that it might be good to restart the discussion of these issues by stating basic assumptions clearly and explicitly, no matter how obvious they might seem.

The one area that has triggered this realization was his recent work on (and subsequent debate over) the Gibbs paradox.

The Gibbs Paradox, named after Josiah Willard Gibbs, is the derivation from classical statistical mechanics that leads to an entropy expression for the ideal gas that is not extensive.  The expectation that entropy is extensive amounts to saying that one expects the entropy of a system to double when the system itself doubles in size (keeping all other things equal).  Since the ideal gas is the standard textbook example of a nontrivial collection of matter perfectly designed for understanding thermodynamics, finding a result that flies in the face of this expectation casts doubt on the underpinnings of statistical physics.  The usual way that this doubt is remedied is to patch up the classical analysis by appealing to quantum mechanics and the indistinguishability of particles.  The concept of indistinguishability among the particles, of course, lies at the heart of the Fermi-Dirac and Bose-Einstein statistics for fermions and bosons, respectively.  The idea is basically that there is no way of labeling, of painting, of hanging a number on individual particles and, therefore, that this basic ignorance must be built into the way we do statistical mechanics.

Specifically, the classical analysis of an ideal gas made of distinguishable particles (using what Swendsen calls the traditional definition of entropy) leads to the following expression for the entropy (‘CD’ = classical, distinguishable)

\[  S_{CD} = k N \left[ \ln V + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

where $$X$$ is some constant.  The objection is that this expression is not extensive due to the $$\ln V$$ term in brackets.  For example, scaling the system by some overall factor $$\alpha$$ ($$N \rightarrow \alpha N$$, $$E \rightarrow \alpha E$$, and $$V \rightarrow \alpha V$$) gives an entropy of

\[  S_{CD,\alpha} = k \alpha N \left[ \ln ( \alpha V)  + \frac{3}{2} \ln \frac{E}{N} + X \right] \; , \]

which simplifies to

\[ S_{CD,\alpha} = \alpha S_{CD} + k N  \alpha \ln \alpha \; . \]
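This scaling identity can be verified numerically.  The following sketch (arbitrary values for $N$, $E$, $V$, with $k = 1$) compares the directly scaled entropy against $\alpha S_{CD} + k \alpha N \ln \alpha$:

```python
import math

def s_cd(n, energy, volume, k=1.0, x=0.0):
    """Traditional entropy S_CD = k N [ln V + (3/2) ln(E/N) + X]."""
    return k * n * (math.log(volume) + 1.5 * math.log(energy / n) + x)

n, e, v = 100.0, 50.0, 2.0

for alpha in (2.0, 3.0, 10.0):
    direct = s_cd(alpha * n, alpha * e, alpha * v)
    predicted = alpha * s_cd(n, e, v) + alpha * n * math.log(alpha)  # k = 1
    print(alpha, direct, predicted)  # the last two columns agree
```

Note that $E/N$ is unchanged by the scaling, so the extra $k \alpha N \ln \alpha$ comes entirely from the $\ln(\alpha V)$ term.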

On the surface, the lack of extensivity might not seem alarming, but consider the following composite system composed of two tanks of identical gas placed side-by-side.  Each collection has the same density and average energy per particle and each has the same volume.  Further suppose that there is a sliding panel at the interface between the tanks.  By removing the partition, the tank size now doubles (i.e. $$\alpha = 2$$) and the entropy change is

\[S_{CD,new} - S_{CD,old} = 2 S_{CD} + 2 k N \ln 2 - 2 S_{CD} = 2 k N \ln 2 \; . \]

At this point, Gibbs notes that something is quite wrong.  The removal of the partition is a reversible process (since the gas is thermodynamically the same on both sides the presence or absence of the partition shouldn’t make a difference), meaning that the entropy should not increase at all. 

The remedy found in most textbooks (e.g. Fundamentals of Statistical and Thermal Physics by Reif, from which the following quoted expressions come) starts by arguing that when we remove the partition and allow the gas molecules in one tank to mix with those in another, we are implicitly assuming them “individually distinguishable, as though interchanging the positions of two like molecules would lead to a physically distinct state of the gas.”   The argument concludes by directing us to correct for the overcounting that “taking classical mechanics too seriously” has foisted upon us.  The correction for over-counting involves dividing a term earlier in the derivation (the partition function) by $$N!$$, which corrects the entropy (now adapted to indistinguishable particles, hence the change from ‘D’ to ‘I’) to read

\[ S_{CI} = k N \left[ \ln \frac{V}{N} + \frac{3}{2} \ln \frac{E}{N} + X’ \right] \; , \]

which is obviously extensive, with the equally obvious implication that the problem is solved and nothing more needs to be done.
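A quick numerical check (same style of sketch as before, with $k = 1$ and arbitrary values) confirms both the extensivity of $S_{CI}$ and the disappearance of the paradox:

```python
import math

def s_ci(n, energy, volume, k=1.0, xp=0.0):
    """Corrected entropy S_CI = k N [ln(V/N) + (3/2) ln(E/N) + X']."""
    return k * n * (math.log(volume / n) + 1.5 * math.log(energy / n) + xp)

n, e, v = 100.0, 50.0, 2.0

# Gibbs's thought experiment: one doubled tank vs. two identical tanks.
alpha = 2.0
scaled = s_ci(alpha * n, alpha * e, alpha * v)
two_tanks = 2 * s_ci(n, e, v)

print(scaled, two_tanks)  # identical: removing the partition changes nothing
```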

Here the story seems to have stalled for some long period of time (decades), most likely due to the belief that quantum mechanics was the correct viewpoint (or at least a more correct one) for the world at large.  It is only fairly recently that a revived interest in putting classical statistical mechanics on firmer footing arose.  The result of this new effort has been the rediscovery of an old definition of entropy that Swendsen, who has been championing this viewpoint for nearly two decades, argues leads to more sensible results than a simple reflexive appeal to quantum mechanics.  And his most compelling argument to support this revived viewpoint is a substance that is likely to surprise:  simple homogenized milk.  However, that story, in all its glory, will have to wait until next month’s column.

An Invitation to Entropy

The subject matter over the last few months has touched upon thermodynamics in a variety of guises.  For example, the concepts of enthalpy and isentropic flow have played a key role in compressible fluid flow.  In the posts discussing the Maxwell relations, the thermodynamic square and the classic relationships between second order partial derivatives were the main tools used to eliminate pesky terms involving the entropy in favor of quantities easier to measure in the lab.

It seems that it is now prudent to put down a few notions about entropy itself.  No other physical quantity, with the possible exception of energy, is as ubiquitously used as entropy, and none is as poorly understood.  Indeed, in his 2011 article entitled How physicists disagree on the meaning of entropy, Robert Swendsen starts with the quotation from von Neumann that “nobody understands entropy”.  Chemists use entropy to determine the direction of chemical reactions, physicists use it when looking at matter in motion (e.g. compressible gas within a cylinder), electrical engineers use it when characterizing information loss on a channel, the amount by which software can compress a file depends on its entropy, and so on.

Entropy seems to be a Swiss army knife concept with lots of different built-in gadgets that can be pulled out and used on a moment’s notice.  It’s no wonder that such a multi-faceted idea is not only poorly understood but also gives rise to radically contradictory notions.  For example, Swendsen starts his article with the following list of 18 statements that he has seen or heard attributed to entropy:

  • The theory of probability has nothing to do with statistical mechanics.
  • The theory of probability is the basis of statistical mechanics.
  • The entropy of an ideal classical gas of distinguishable particles is not extensive.
  • The entropy of an ideal classical gas of distinguishable particles is extensive.
  • The properties of macroscopic classical systems with distinguishable and indistinguishable particles are different.
  • The properties of macroscopic classical systems with distinguishable and indistinguishable particles are the same.
  • The entropy of a classical ideal gas of distinguishable particles is not additive.
  • The entropy of a classical ideal gas of distinguishable particles is additive.
  • Boltzmann defined the entropy of a classical system by the logarithm of a volume in phase space.
  • Boltzmann did not define the entropy by the logarithm of a volume in phase space.
  • The symbol W in the equation S=k log W, which is inscribed on Boltzmann’s tombstone, refers to a volume in phase space.
  • The symbol W in the equation S=k log W, which is inscribed on Boltzmann’s tombstone, refers to the German word “Wahrscheinlichkeit” (probability).
  • The entropy should be defined in terms of the properties of an isolated system.
  • The entropy should be defined in terms of the properties of a composite system.
  • Thermodynamics is only valid in the “thermodynamic limit,” that is, in the limit of infinite system size.
  • Thermodynamics is valid for finite systems.
  • Extensivity is essential to thermodynamics.
  • Extensivity is not essential to thermodynamics.

This list, which is really a list of 9 pairs of contradictory statements about entropy, goes out of its way to show just how many diverging ideas scientists have about entropy.  And since it is trendy to have one’s own pet idea(s) about this fundamental concept, it seems about time that I got my own; that is the aim of this blog and the ones that follow.  As a warm up to a deeper dive, I decided to return to the basic ideas introduced in Halliday and Resnick physics.

The most intriguing aspect of the textbook discussion of entropy is that it is a state variable, that is to say, its value depends only on what the system is doing at any given time and not how the system got there.  This is a key concept because it means that we are relieved of trying to find the particular path through which the system evolved.

What is particularly remarkable about this discovery is that it came about in the 19th century.  This was a time in which the idea of smooth distributions of matter held sway, when the primary concept was that of a field, continuous in every way; a time well before the concept of discrete, microscopic states emerged from an understanding of the quantum mechanics of atoms, molecules, and other substances.

The thermodynamic relationship for entropy reads

\[ S_f – S_i = \int_{i}^{f} \frac{dQ}{T} \; , \]

where any reversible path connecting the initial state (denoted by $$i$$) with the final state (denoted by $$f$$) will do.  Nowhere in this definition can one find any clear signpost to indicate lumpy matter or the concept of the discrete.  In addition, nothing in this definition even hints at a particular substance or class of them; nor is a particular phase of matter required.  A breathtaking sweep of generality is hidden behind a few simple glyphs on a page.

As an example of the universality of the fundamental statement, consider a familiar household system, say a glass of milk.  If we do something prosaic like warm it by 10 degrees Celsius, we arrive at the same entropy change as we would have if we had boiled the milk off into a vapor, melted the glass down, reconstituted the latter, and recondensed the former, ending in the same final state.  No matter what bizarre journey we subject a material to, the resulting change in entropy will simply depend on the initial and final configurations and not on the details connecting one to the other.

The usual playground for first thinking about entropy is the ideal gas, and the usual example given to the student is the computation of the entropy change of the free expansion of a gas.  The context of this discussion usually follows upon the heels of an introduction to the kinetic theory of gases, a theory that presupposes the existence of atoms.  The free expansion of a gas is, perhaps, the most radical of all irreversible processes.  There is no orderly flow, the very concept of a continuum fails to apply; every atom goes its own way and no macroscopic evolution of thermodynamic state can even be imagined.

And yet, almost blithely, textbooks argue the ease with which the entropy change in such a process can be calculated.  The argument goes as follows.  From the kinetic theory of gases, one can show that during a free expansion, the internal energy does not change.  The reason for this is that the gas does no work (that is what ‘free’ really means) and the process happens fast enough that no heat is transferred in or out.  Since the change in internal energy is given by

\[ \Delta U = n C_V \Delta T \; \]

any ideal gas process that doesn’t change the internal energy also leaves the temperature unchanged.  The matching thermodynamic process, where reversibility and equilibrium are maintained at all times is the isothermal expansion. 

The first law

\[ dU = dQ - dW \; \]

can be specialized to any reversible ideal gas process, to yield

\[ n C_V dT = dQ - p dV = dQ - \frac{n R T}{V} dV \; .\]

Solving for $$dQ/T$$ gives

\[ \frac{dQ}{T} = n R \frac{dV}{V} + n C_V \frac{dT}{T} \; .\]

Integrating both sides from the initial to final state gives

\[ S_f - S_i = \int_i^{f} \frac{dQ}{T} = n R \ln \left( \frac{V_f}{V_i} \right) + n C_V \ln \left( \frac{T_f}{T_i} \right) \; .\]

This simplifies for an isothermal process to be

\[ S_f – S_i = n R \ln \left( \frac{V_f}{V_i} \right) \; .\]
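As a concrete sketch (assuming a monatomic ideal gas with $C_V = \frac{3}{2}R$ and a doubling of the volume; all numbers illustrative), the general expression reduces to $nR\ln 2$ for the isothermal case:

```python
import math

R = 8.314  # molar gas constant, J/(mol K)

def delta_s(n, v_i, v_f, t_i, t_f, c_v):
    """S_f - S_i = n R ln(V_f/V_i) + n C_V ln(T_f/T_i) for an ideal gas."""
    return n * R * math.log(v_f / v_i) + n * c_v * math.log(t_f / t_i)

# Free expansion of 1 mol into double the volume: T is unchanged,
# so only the volume term survives.
ds = delta_s(n=1.0, v_i=1.0, v_f=2.0, t_i=300.0, t_f=300.0, c_v=1.5 * R)
print(ds)  # n R ln 2, about 5.76 J/K
```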

So, the change in entropy for a free expansion must be exactly equal to the expression above even though, as the free expansion occurs, there is a complete absence of anything resembling a well-defined volume.  This is a subtle result that gets only more subtle when one reflects on the fact that statistical mechanics wasn’t available when the concept of entropy first appeared.

It is for this reason that the next several blogs will be looking at entropy.

Maxwell’s Relations in Action

Last month’s column introduced the notion of the thermodynamic square as a mnemonic for organizing certain second-order partial derivatives amongst the various thermodynamic potentials: the internal energy $$U$$, the Gibbs and Helmholtz free energies $$G$$ and $$F$$, and the enthalpy $$H$$.  As previously alluded to, physicists primarily use these relations (called the Maxwell relations) to eliminate terms difficult or impossible to measure experimentally in favor of parameters that are easily measured in the lab.

A practical example of the application of the Maxwell relations is the simplification of the ‘first’ and ‘second $$T dS$$ equations’ as listed in Exercise 7.3-1 on page 189 of Herbert B. Callen’s book Thermodynamics and an Introduction to Thermostatistics, 2nd edition.  The relevant physical properties in these equations are (Sec. 3.9 of Callen):

  • the number of particles $$N$$,
  • the differential heat $$dQ = T dS$$,
  • the heat capacity at constant volume $$c_V = \frac{T}{N} \left( \frac{\partial S }{\partial T} \right)_V= \frac{1}{N} \left(\frac{\partial Q }{\partial T}\right)_{V}$$,
  • the heat capacity at constant pressure $$c_P = \frac{T}{N} \left( \frac{\partial S }{\partial T} \right)_P = \frac{1}{N}\left(\frac{\partial Q}{\partial T}\right)_{P}$$,
  • the coefficient of thermal expansion $$\alpha = \frac{1}{V} \left(\frac{\partial V}{\partial T}\right)_P $$, and
  • the isothermal compressibility $$\kappa_T = – \frac{1}{V} \left( \frac{\partial V}{\partial P} \right)_T$$.

First $$TdS$$ Relation

The first relation we want to verify is

\[ T dS = N c_V dT + \frac{T \alpha}{\kappa_T} dV \; .\]

From the form of this equation, assume that the quantity in question is the entropy as a function of the temperature and volume $$S = S(T,V)$$.  Taking the first differential gives

\[ dS = \left(\frac{\partial S}{\partial T}\right)_{V} dT + \left(\frac{\partial S}{\partial V}\right)_{T} dV \; .\]

The first term is relatively easy to deal with in terms of the heat capacity at constant volume $$c_V$$:

\[ \left(\frac{\partial S}{\partial T}\right)_{V} = \frac{N c_V}{T} \; .\]

The second term requires a bit more work.  First use the Maxwell relation associated with the Helmholtz free energy $$F$$ to get

\[ \left(\frac{\partial S}{\partial V}\right)_{T} = \left(\frac{\partial P}{\partial T}\right)_{V} \; .\]

Next use the cyclic rule for partial derivatives

\[ \left(\frac{\partial P}{\partial T}\right)_{V} \left(\frac{\partial T}{\partial V}\right)_{P} \left(\frac{\partial V}{\partial P}\right)_{T} = – 1 \; ,\]

and solve for

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = \frac{-1}{\left(\frac{\partial V}{\partial P}\right)_{T} \left(\frac{\partial T}{\partial V}\right)_{P} } \; . \]

Use the reciprocal rule to move the $$\left(\frac{\partial T}{\partial V}\right)_{P}$$ to the numerator to get

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = - \left(\frac{\partial V}{\partial T}\right)_{P} / \left(\frac{\partial V}{\partial P}\right)_{T} \; .\]

Multiply the numerator and denominator by $$1/V$$ and use the definitions of $$\alpha$$ and $$\kappa_T$$ to get

\[ \left(\frac{\partial P}{\partial T}\right)_{V} = \frac{\alpha}{\kappa_T} \; . \]

At this point, the first differential stands as

\[ dS = \frac{N c_V}{T} dT + \frac{\alpha}{\kappa_T} dV \; . \]

Multiplying each side by $$T$$ gets us to the final form of the first $$T dS$$ equation

\[ T dS = N c_V dT + \left(\frac{T \alpha}{\kappa_T} \right) dV \; .\]
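The key coefficient $(\partial P/\partial T)_V = \alpha/\kappa_T$ can be checked numerically.  The sketch below uses the ideal gas as a test equation of state and central finite differences; the specific numbers are illustrative, not essential:

```python
R = 8.314  # molar gas constant, J/(mol K)
n = 1.0    # mol

def volume(T, P):
    """Ideal-gas equation of state solved for V."""
    return n * R * T / P

T, P = 300.0, 101325.0
V = volume(T, P)
hT, hP = T * 1e-6, P * 1e-6

# (dP/dT)_V from P(T, V) = nRT/V by central difference
dP_dT = (n * R * (T + hT) / V - n * R * (T - hT) / V) / (2 * hT)

# alpha = (1/V)(dV/dT)_P and kappa_T = -(1/V)(dV/dP)_T from their definitions
alpha = (volume(T + hT, P) - volume(T - hT, P)) / (2 * hT) / V
kappa = -(volume(T, P + hP) - volume(T, P - hP)) / (2 * hP) / V

print(dP_dT, alpha / kappa)  # both equal nR/V = P/T for an ideal gas
```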

Second $$TdS$$ Relation

The second relation is

\[ T dS = N c_P dT - T V \alpha dP \; . \]

From the form of this equation, assume that the quantity in question is the entropy as a function of the temperature and pressure $$S = S(T,P)$$.  As in the first $$T dS$$ relation, taking the first differential gives

\[ dS = \left(\frac{\partial S}{\partial T}\right)_{P} dT + \left(\frac{\partial S}{\partial P}\right)_{T} dP \; .\]

The first term, as in the case above, is also relatively easy to deal with in terms of the heat capacity, this time at constant pressure, $$c_P$$:

\[ \left(\frac{\partial S}{\partial T}\right)_{P} = \frac{N c_P}{T} \; .\]

The second term only requires a Maxwell relation in terms of the Gibbs free energy

\[ -\left(\frac{\partial S}{\partial P}\right)_{T} = \left(\frac{\partial V}{\partial T}\right)_{P} \; .\]

The first differential becomes

\[ dS = \frac{N c_P}{T} dT - \left(\frac{\partial V}{\partial T}\right)_{P} dP \; .\]

Multiplying the second term by $$V/V$$ and simplifying gives

\[ dS = \frac{N c_P}{T} dT - V \alpha dP \; ,\]

which becomes the desired relation when multiplying both sides by $$T$$

\[ T dS = N c_P dT - T V \alpha dP \; .\]
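The Maxwell relation used here, $(\partial S/\partial P)_T = -(\partial V/\partial T)_P = -V\alpha$, can be verified with the same finite-difference game.  This sketch assumes the textbook ideal-gas entropy $S(T,P) = n(c_P \ln T - R \ln P)$, valid up to an additive constant:

```python
import math

R = 8.314      # molar gas constant, J/(mol K)
n = 1.0        # mol
c_p = 2.5 * R  # monatomic ideal gas

def entropy(T, P):
    """Ideal-gas entropy S(T, P), up to an additive constant."""
    return n * (c_p * math.log(T) - R * math.log(P))

T, P = 300.0, 101325.0
V = n * R * T / P
alpha = 1.0 / T  # coefficient of thermal expansion of an ideal gas

hP = P * 1e-6
dS_dP = (entropy(T, P + hP) - entropy(T, P - hP)) / (2 * hP)

print(dS_dP, -V * alpha)  # Maxwell: (dS/dP)_T = -(dV/dT)_P = -V alpha
```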

To summarize, these two relations show how to express the heat $$T dS$$, written in terms of the entropy (which cannot be measured directly), in terms of parameters, such as temperature, pressure, and heat capacity, that are easily measured in the lab.   All that is needed is the machinery of partial derivatives.  This observation is the reason that so many textbooks on thermodynamics have specific sections devoted to these approaches.

Thermodynamic Partials – Maxwell’s Relation

Last month’s installment presented a clean derivation of the classic relations between partial derivatives and showed a simple example of how they work in the concrete.  As nice as that presentation is, the real power of these relations is only realized when dealing with systems with a large number of variables in play and in which various manipulations are required to extract meaning from the systems involved.  The prototypical example is classical thermodynamics.   

The fundamental concept in thermodynamics is the existence of a thermodynamic potential, a scalar function that encodes the state of the thermodynamic system in terms of the measurable quantities that describe the system, such as volume or temperature.  Changes in the values of these independent physical variables (sometimes called the natural variables) relate directly to changes in the potential through the corresponding partial derivatives.

The textbook example of this type of relation is defined in the first law of thermodynamics, which asserts that there exists a function, called the internal energy $$U$$, that is a function of the entropy $$S$$, the volume $$V$$, and the number of particles $$N$$ making up the physical system being modeled (assuming a single type of substance; generalizations to multiple species are straightforward but cumbersome).  Changes in the internal energy $$U$$ can be calculated by

\[ dU = T dS – P dV + \mu dN \; ,\]

where the temperature, pressure and chemical potential are defined as

\[ {T} = \left( \frac{\partial U}{\partial S} \right)_{V,N} \; , \]

\[ {P} = -\left( \frac{\partial U}{\partial V} \right)_{S,N} \; , \]

and

\[ {\mu} = \left( \frac{\partial U}{\partial N} \right)_{S,V} \; , \]

respectively.

Without dwelling on the theory, suffice it to say that laboratory conditions vary and there are many circumstances where it is preferable to work with a different set of independent variables.  For example, heating water on a stove top in an uncovered pan is better understood in terms of fixed pressure rather than fixed volume.

Thermodynamics supplies an approach for dealing with these cases using the Legendre transformation.  In the stove-top experiment mentioned above, the appropriate potential is called the enthalpy, defined as

\[ H = U + PV \; . \]

Taking the first differential gives

\[ dH = dU + P dV + V dP \\ = T dS – P dV + \mu dN + P dV + V dP \\ = T dS + V dP + \mu dN \; , \]

which demonstrates that $$H = H(S,P,N)$$.

Assuming that the order of differentiation can be exchanged, there are several relationships that exist (called Maxwell relations) between various partial derivatives.  For example:

\[ \left( \frac{\partial T}{\partial V} \right)_{S,N} = \left( \frac{\partial  }{\partial V } \left( \frac{\partial U}{\partial S} \right)_{V,N} \right)_{S,N} \\ = \left( \frac{\partial }{\partial S} \left( \frac{\partial U}{\partial V} \right)_{S,N} \right)_{V,N} \\ = -\left( \frac{\partial P}{\partial S} \right)_{V,N} \; . \]

Obviously, there is a lot of notational overhead in the above relation.  For the sake of this analysis, we will make two simplifications to improve the clarity.  First, we will assume a single species with a fixed number of moles.  This assumption removes the need to carry $$N$$ and $$\mu$$.  Second, we will forego keeping track of the other variables being held constant.  Since we will be tacitly tracking which thermodynamic potential is being used, there is little chance of confusion.

The primary purpose the Maxwell relations serve is to eliminate terms involving the entropy in favor of physical parameters that can be experimentally measured, such as temperature, volume, or pressure.
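The mixed-partial relation derived above from $U$ can be checked numerically.  A sketch, assuming the monatomic ideal gas form $U(S,V) = A V^{-2/3} e^{2S/(3nR)}$ (the constant $A$ is arbitrary and fixes the entropy zero) and central finite differences:

```python
import math

nR = 8.314  # n R for one mole, J/K
A = 1.0     # arbitrary constant fixing the entropy zero

def U(S, V):
    """Internal energy of a monatomic ideal gas as a function of S and V."""
    return A * V ** (-2.0 / 3.0) * math.exp(2.0 * S / (3.0 * nR))

h = 1e-4

def T(S, V):
    """Temperature T = (dU/dS)_V by central difference."""
    return (U(S + h, V) - U(S - h, V)) / (2 * h)

def P(S, V):
    """Pressure P = -(dU/dV)_S by central difference."""
    return -(U(S, V + h) - U(S, V - h)) / (2 * h)

S0, V0 = 10.0, 2.0
dT_dV = (T(S0, V0 + h) - T(S0, V0 - h)) / (2 * h)
dP_dS = (P(S0 + h, V0) - P(S0 - h, V0)) / (2 * h)

print(dT_dV, -dP_dS)  # equal: (dT/dV)_S = -(dP/dS)_V
```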

A useful mnemonic exists for looking up the various relations without scanning through a table.  It’s called the thermodynamic square, and it’s constructed as below

The 4 most important thermodynamic potentials, the Helmholtz free energy $$F = U - TS$$, the Gibbs free energy $$G = U + PV - TS$$, the enthalpy $$H = U + PV$$, and the internal energy $$U$$, are arranged along the edges starting at the top and going clockwise.  The thermodynamic variables (the temperature $$T$$, the pressure $$P$$, the entropy $$S$$, and the volume $$V$$) are arranged at the corners, starting just after $$F$$ and also going clockwise.  The Maxwell relations relate a partial derivative expressed in terms of three consecutive corners (shaded blue below) to the partial derivative expressed in terms of another three consecutive corners (shaded yellow).  The arrows lying along the diagonals set the sign of the partial derivative based on which variable is in the numerator:  those with the arrowhead get a positive sign and those without a negative sign.

These rules are far easier to understand with the mixed partial derivatives of $$U$$ discussed above.

The blue shaded region reads counterclockwise while the yellow region reads clockwise.  These orientations follow from the overlap they share on the left-hand side of the square (the one labeled $$U$$).  The signs are determined by the arrows.  Since the $$T$$ corner is the numerator and it has an arrowhead, the partial derivative is positive.  Likewise, since the $$P$$ corner is the numerator and it lacks an arrowhead, the partial derivative is negative.

Once the basic operations using the square are understood, it is easier to present a single square with the common side shaded in both blue and yellow, resulting in green.  An example is the Maxwell relation coming from the Helmholtz free energy $$F$$:

With these tools, we can get to the meatier topics, namely using a combination of the classical rules for partial derivatives and the Maxwell relations, as presented in the thermodynamic square, to eliminate the entropy in favor of physically measurable quantities.  But this will be the topic for next month’s post.

Partial Derivatives – Completely Done

Partial derivatives are an important mathematical tool in a number of physics disciplines, most notably field theories (e.g. electricity & magnetism and general relativity) and in thermodynamics.

However, working with partial derivatives is always a bit tricky, and teaching students about them is usually fraught with difficulties.  So it was to my pleasant surprise that I found a really nice discussion of how to derive the various ‘classic’ rules cleanly presented in Classical and Statistical Thermodynamics by Ashley H. Carter.

My presentation here is strongly influenced and closely follows her presentation in Appendix A, although I’ve added on a bit in the theoretical flow and I’ve also provided explicit examples in terms of the standard paraboloid found in freshman calculus. 

Assume a function of 3 variables can be expressed as $$f(x,y,z) = 0$$.  This equation can be viewed as a constraint equation linking the values of the variables such that only two of them are independent.  That means that we can (at least locally) solve for:

  • $$x = x(y,z)$$
  • $$y = y(x,z)$$
  • $$z = z(x,y)$$

Focus on the first and second forms (the other pairings follow from a simple relabeling of the variables).  The corresponding differentials are

\[ dx = \left( \frac{\partial x}{\partial y} \right)_z dy + \left( \frac{\partial x}{\partial z} \right)_y dz \; \]

and

\[ dy = \left( \frac{\partial y}{\partial x} \right)_z dx + \left( \frac{\partial y}{\partial z} \right)_x dz \; .\]

Now substitute the expansion of $$dy$$ into the expansion of $$dx$$

\[ dx = \left(\frac{\partial x}{\partial y}\right)_z \left[ \left(\frac{\partial y}{\partial x}\right)_z dx + \left(\frac{\partial y}{\partial z}\right)_x dz \right] + \left(\frac{\partial x}{\partial z}\right)_y dz \; ,\]

which simplifies to

\[ dx = \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z dx + \left[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x + \left(\frac{\partial x}{\partial z}\right)_y \right] dz \; .\]

Putting it all together gives

\[ \left[ 1 – \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z \right] dx – \left[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x + \left(\frac{\partial x}{\partial z}\right)_y \right] dz = 0 \; .\]

Since $$dx$$ and $$dz$$ are independent, each differential can be set to zero independently, giving one of the classic identities.

First set $$dz = 0$$ to get

\[ \left(\frac{\partial x}{\partial y}\right)_z = 1/ \left(\frac{\partial y}{\partial x}\right)_z \; , \]

which is called the reciprocal rule.

Next, setting $$dx = 0$$ yields

\[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x = \; - \; \left(\frac{\partial x}{\partial z}\right)_y \; , \]

which is called the fraction rule.

The manipulations are complete when we use the reciprocal rule in the fraction rule and simplify to get

\[ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = -1 \; , \]

which is called the cyclic rule.

Let’s take a look at these relationships in action. Consider the implicit definition of the paraboloid

\[ x^2 + y^2 – z = 0 \; . \]

As mentioned earlier, this equation can be considered as a constraint equation that selects out a value for any one of the three variables given the other two. In other words, we can imagine a lookup table where we select a value of $$x$$ and $$y$$, rummage through the table to find a row with both values, and then scan to the right to find the value of $$z$$ that satisfies the implicit equation.

How do we construct this table, not at a finite set of points but functionally, so that it works at any point? It is natural and easy to determine $$z$$ given $$x$$ and $$y$$ by simply rewriting the implicit equation as

\[ z(x,y) = x^2 + y^2 \; .\]

However, it isn’t as easy to express $$x$$ or $$y$$ as functions of the remaining two variables because of the two possible signs that result from taking the square root. We need to have four functional relationships

\[ x_p(y,z) = \sqrt{ z - y^2 } \; ,\]

\[ x_n(y,z) = \; - \; \sqrt{z - y^2} \; , \]

\[ y_p(x,z) = \sqrt{z - x^2} \; , \]

and

\[ y_n(x,z) = \; - \; \sqrt{z - x^2} \; , \]

depending on the particular combination of whether $$x$$ is positive or negative and whether $$y$$ is also positive or negative.  In the language of differential geometry, we have 5 charts in our atlas.

We are now in position to try the various relations derived above. For example, let’s examine the reciprocal relation in the first quadrant of the $$x$$-$$y$$ plane.  We need to use $$x_p$$ as our local chart.

\[ \left(\frac{\partial x_p}{\partial z}\right)_y = \frac{1}{2} \frac{1}{\sqrt{z - y^2}} \; \]

or once we recognize the denominator as $$x_p$$ 

\[\left(\frac{\partial x_p}{\partial z}\right)_y = \frac{1}{2 x_p} \; . \]

The ‘reciprocal’ partial derivative is

\[ \left( \frac{\partial z}{\partial x} \right)_y = 2 x = 2 x_p \; ,\]

where there is no need for the $$z$$-chart to distinguish between positive and negative values of $$x$$.  As expected, the derivatives are reciprocals of each other.

Next, let’s test the fraction rule.  For fun, this time let’s test it in the 2nd quadrant in the $$x$$-$$y$$ plane ($$x < 0$$ and $$y > 0$$).  Calculating the partial derivatives on the left-hand side yields

\[ \left(\frac{\partial x_n }{\partial y_p } \right)_z = \frac{y_p}{\sqrt{z - y_p^2}} = \; - \; \frac{y_p}{x_n} \; \]

and

\[ \left(\frac{\partial y_p }{\partial z} \right)_{x_n} = \frac{1}{2\sqrt{z-x_n^2}} = \frac{1}{2 y_p} \; . \]

It is a simple matter to verify that

\[ \left(\frac{\partial x_n }{\partial y_p} \right)_z \left(\frac{\partial y_p }{\partial z} \right)_{x_n} = -\frac{1}{2 x_n} \; \]

is identical to 

\[ - \left(\frac{\partial x_n }{\partial z} \right)_{y_p} = \frac{1}{2 \sqrt{z - y_p^2} } = -\frac{1}{2 x_n} \; .\]

Finally, for the cyclic rule, let’s go into the 4th quadrant in the $$x$$-$$y$$ plane ($$x>0$$ and $$y<0$$).  Taking each partial derivative in turns yields

\[ \left(\frac{\partial x_p }{\partial y_n} \right)_{z} = -\frac{y_n}{\sqrt{z-y_n^2}} = -\frac{y_n}{x_p} \; ,\]

\[ \left(\frac{\partial y_n }{\partial z} \right)_{x_p} = -\frac{1}{2 \sqrt{z-x_p^2} } = \frac{1}{2 y_n} \; ,\]

and 

\[ \left(\frac{\partial z }{\partial x_p} \right)_{y_n} = 2 x_p \; .\]

Multiplying these terms in order gives

\[ -\frac{y_n}{x_p} \frac{1}{2 y_n} 2 x_p = -1 \; .\]
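All of these checks can also be automated with finite differences.  A short sketch at the same 4th-quadrant sample point (the chart functions mirror those defined above; tolerances are loose to absorb numerical error):

```python
import math

def z_of(x, y):
    """The z-chart of the paraboloid."""
    return x * x + y * y

def x_p(y, z):
    """Chart for x > 0."""
    return math.sqrt(z - y * y)

def y_n(x, z):
    """Chart for y < 0."""
    return -math.sqrt(z - x * x)

# Sample point in the 4th quadrant: x > 0, y < 0.
x0, y0 = 0.6, -0.3
z0 = z_of(x0, y0)
h = 1e-6

dx_dy = (x_p(y0 + h, z0) - x_p(y0 - h, z0)) / (2 * h)    # (dx/dy)_z
dy_dz = (y_n(x0, z0 + h) - y_n(x0, z0 - h)) / (2 * h)    # (dy/dz)_x
dz_dx = (z_of(x0 + h, y0) - z_of(x0 - h, y0)) / (2 * h)  # (dz/dx)_y
dx_dz = (x_p(y0, z0 + h) - x_p(y0, z0 - h)) / (2 * h)    # (dx/dz)_y

print(dx_dy * dy_dz * dz_dx)  # cyclic rule: product is -1
print(dz_dx * dx_dz)          # reciprocal rule: product is 1
```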

Nice, neat, and more than partially done.