The traditional mathematical axiomatization of probability, due to Kolmogorov, begins with a probability space and constructs random variables as certain measurable functions on it. But start doing any probability and it becomes clear that the underlying space is de-emphasized as much as possible; the real focus of probability theory is on the algebra of random variables. It would be nice to have an approach to probability theory that reflects this.
Moreover, in the traditional approach, random variables necessarily commute. However, in quantum mechanics, the random variables are self-adjoint operators on a Hilbert space, and these do not commute in general. For the purposes of doing quantum probability, it is therefore also natural to look for an approach to probability theory that begins with an algebra, not necessarily commutative, which encompasses both the classical and quantum cases.
Happily, noncommutative probability provides such an approach. Terence Tao’s notes on free probability develop a version of noncommutative probability geared towards applications to random matrices, but today I would like to take a more leisurely and somewhat scattered route geared towards getting a general feel for what this formalism is capable of talking about.
Classical and quantum probability
(Below, if the reader chooses, she can restrict herself to finite sets and finite-dimensional Hilbert spaces so as to ignore measure-theoretic and analytic difficulties.)
A classical probability space consists of the following data:
- A set $\Omega$, the sample space. Elements of this set describe possible states of some system.
- A $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$, the events. Events describe properties of states. The pair $(\Omega, \mathcal{F})$ is a measurable space.
- A probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$. This measures the probability that the system has some property.
Example. Let $\Omega = \{ H, T \}^n$ be the set of possible outcomes of $n$ coin flips. Letting $\mathcal{F} = 2^{\Omega}$ be the set of all subsets of $\Omega$, we can describe various events like “no heads are flipped” or “at most three tails are flipped,” and we can compute their probabilities using the fact that every point in $\Omega$ has probability $2^{-n}$.
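To make this concrete, here is a minimal Python sketch (my own illustration; the number of flips $n = 4$ and the two events are just the ones named above) that enumerates $\Omega = \{H, T\}^n$ and computes probabilities by counting.

```python
from itertools import product

n = 4  # number of coin flips (illustrative choice)

# The sample space Omega = {H, T}^n; each outcome has probability 2^{-n}.
omega = list(product("HT", repeat=n))
prob = lambda event: sum(1 for outcome in omega if event(outcome)) / len(omega)

# Event: "no heads are flipped".
no_heads = lambda outcome: "H" not in outcome

# Event: "at most three tails are flipped".
at_most_three_tails = lambda outcome: outcome.count("T") <= 3

print(prob(no_heads))             # 1/16 = 0.0625 when n = 4
print(prob(at_most_three_tails))  # 15/16 = 0.9375 when n = 4
```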
Example. Let $M$ be a $2n$-dimensional symplectic manifold with symplectic form $\omega$ (e.g. the cotangent bundle of some other manifold). The top exterior power $\omega^{\wedge n}$ of the symplectic form defines a volume form on $M$, which defines a Borel measure on $M$ called Liouville measure (locally just Lebesgue measure). Since Liouville measure is built from the symplectic form, it is preserved under all symplectomorphisms, and in particular under time evolution with respect to any Hamiltonian.
A random variable is a measurable function $f : \Omega \to \mathbb{R}$ (where $\mathbb{R}$ is given the Borel $\sigma$-algebra generated by the Euclidean topology). In our coin-flipping example, “number of heads flipped” is a random variable. A random variable which only takes the values $0$ or $1$ encodes the same data as an event (more precisely, it is the indicator function $1_E$ of a unique event $E$, which takes the value $1$ on $E$ and $0$ on its complement). More generally, we can construct events from random variables: for any Borel subset $B \subseteq \mathbb{R}$ the preimage $f^{-1}(B)$ is an event (the event that $f$ lies in $B$), often written $(f \in B)$, and so we can consider its probability $\mathbb{P}(f \in B)$. (As $B$ varies, these probabilities assemble into the pushforward measure $f_* \mathbb{P}$ on $\mathbb{R}$.)
Random variables should be thought of as real-valued observables of our system (and events, as random variables which take the value $0$ or $1$, are the observables given by answers to yes-no questions). By repeatedly measuring an observable and averaging, we can obtain its expected value

$$\mathbb{E}(f) = \int_{\Omega} f \, d\mathbb{P}$$

(if this integral converges). If $f$ only takes the values $0, 1$, then $\mathbb{E}(f)$ reduces to the probability of the corresponding event. In general, if $f$ is a random variable and $B$ is a Borel subset of $\mathbb{R}$, then $1_B(f)$ is the indicator function of the event $(f \in B)$, and $\mathbb{E}(1_B(f)) = \mathbb{P}(f \in B)$.
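As a quick sanity check on the identity $\mathbb{E}(1_B(f)) = \mathbb{P}(f \in B)$, here is a sketch on a finite sample space; the die, the random variable $f$, and the set $B$ are hypothetical choices for illustration.

```python
from fractions import Fraction

# A finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

# A random variable f and a Borel set B (here just a finite set of values).
f = lambda w: w * w          # square of the roll
B = {1, 4, 9}                # the event (f in B) is "the roll is 1, 2, or 3"

expectation = lambda g: sum(P[w] * g(w) for w in omega)

# Expected value of f.
print(expectation(f))                                # 91/6

# P(f in B) computed directly, and as E[1_B(f)]: the two agree.
print(sum(P[w] for w in omega if f(w) in B))         # 1/2
print(expectation(lambda w: 1 if f(w) in B else 0))  # 1/2
```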
If we wanted to define a quantum probability space by analogous data, it would consist of the following (not standard):
- A Hilbert space $H$, the space of states.
- An abstract “$\sigma$-algebra” of closed subspaces of $H$, the events. The intersection of two subspaces is their set-theoretic intersection, the union is the closure of their span, and the complement is the orthogonal complement.
- A unit vector $\psi \in H$, the state vector.
Example. Every classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ defines a quantum probability space as follows: $H = L^2(\Omega, \mathbb{P})$ is the Hilbert space of (equivalence classes of) square-integrable functions $\Omega \to \mathbb{C}$ under the inner product

$$\langle f, g \rangle = \int_{\Omega} \overline{f} g \, d\mathbb{P},$$

the “$\sigma$-algebra” consists of the closed subspaces of functions which are (a.e.) equal to zero except on a given event $E \in \mathcal{F}$, and $\psi$ is the function which is identically equal to $1$.
Example. The quantum probability space describing a qubit comes from applying the above construction to a bit; thus $H$ is a $2$-dimensional Hilbert space with orthonormal basis $|0\rangle, |1\rangle$, the “$\sigma$-algebra” consists of the four subspaces $0, \mathbb{C}|0\rangle, \mathbb{C}|1\rangle, H$, and $\psi = \alpha |0\rangle + \beta |1\rangle$ is the state of the qubit.
A quantum probability space does not have points in the classical sense, but we can still talk about the probability of an event $V$: if $P_V$ denotes the orthogonal projection onto $V$, then it is given by

$$\mathbb{P}(V) = \langle \psi, P_V \psi \rangle = \| P_V \psi \|^2,$$

and writing $\psi$ as the sum of its components parallel and orthogonal to $V$ we see that this is the squared norm of the component of $\psi$ parallel to $V$. This is a simple form of the Born rule, and it describes the probability that $\psi$, when measured to determine whether or not it lies in $V$, will in fact lie in $V$. Applied to a qubit, we conclude that a qubit described by $\psi = \alpha |0\rangle + \beta |1\rangle$, when measured, takes the value $0$ with probability $|\alpha|^2$ and the value $1$ with probability $|\beta|^2$.
Note that if $V$ is the entire Hilbert space $H$ then the condition that the corresponding probability is $1$ is precisely the condition that $\psi$ is a unit vector. Note also that the probability assigned by $\psi$ to an event does not change if $\psi$ is multiplied by a unit complex number; for this reason, state vectors are really points in the projective space $\mathbb{P}(H)$ over $H$. Thus the possible states of a qubit are parameterized by the Riemann sphere $\mathbb{CP}^1$ (called in this context the Bloch sphere).
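Here is a short numpy sketch of the Born rule for a qubit (the particular state vector is a hypothetical choice): it computes $\langle \psi, P_V \psi \rangle$ for the two one-dimensional events, checks that the probabilities sum to $1$, and checks invariance under multiplying $\psi$ by a unit complex number.

```python
import numpy as np

# A qubit state psi = alpha|0> + beta|1>, normalized to a unit vector.
psi = np.array([1.0 + 1.0j, 2.0 - 1.0j])
psi = psi / np.linalg.norm(psi)

# Projections onto the events C|0> and C|1>.
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)

born = lambda P, v: np.vdot(v, P @ v).real  # <psi, P psi>

print(born(P0, psi), abs(psi[0]) ** 2)  # both equal |alpha|^2
print(born(P1, psi), abs(psi[1]) ** 2)  # both equal |beta|^2
print(born(P0, psi) + born(P1, psi))    # total probability 1

# Multiplying psi by a unit complex number does not change any probability.
phase = np.exp(0.7j)
print(np.allclose(born(P0, phase * psi), born(P0, psi)))  # True
```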
A (real-valued) quantum random variable (probably not standard) is a self-adjoint operator $T$ on $H$ (possibly unbounded and/or densely defined in general). The values taken by $T$ are precisely its spectral values (the points in its spectrum $\sigma(T)$). This specializes even to the classical case: the values $\lambda$ for which a random variable $f$ has the property that $f - \lambda$ fails to be invertible are precisely its values (up to the subtlety that we can ignore the behavior of $f$ on a set of measure zero, but in practice we cannot meaningfully evaluate random variables at points anyway). In particular, for $T$ bounded, $T$ takes only the values $0, 1$ if and only if it is idempotent by Gelfand-Naimark, hence if and only if it is a projection; thus as in the classical case, random variables generalize events.
The expected value of a quantum random variable $T$ is

$$\mathbb{E}(T) = \langle \psi, T \psi \rangle$$

(when $\psi$ lies in the domain of $T$). If $T$ happens to have a countable orthonormal basis of eigenvectors $e_i$ with eigenvalues $\lambda_i$, then writing $\psi = \sum_i c_i e_i$ we compute that

$$\mathbb{E}(T) = \sum_i \lambda_i |c_i|^2,$$

so this really is the expected value of $T$ if we think of it classically as a random variable taking the value $\lambda_i$ with probability $|c_i|^2$.
As in the classical case, we can make sense of probabilities such as $\mathbb{P}(T \in B)$ where $B$ is a Borel subset of $\mathbb{R}$, but this requires more work. If $T$ has a countable orthonormal basis of eigenvectors this is straightforward; in general, we need the Borel functional calculus in order to define $1_B(T)$ as an operator so that we can compute $\mathbb{E}(1_B(T)) = \langle \psi, 1_B(T) \psi \rangle$ (note that we do not need $1_B(T)$ to be the projection onto one of our chosen events to compute this expectation, although this would be the appropriate analogue of the random variable being measurable). Roughly speaking we ought to be able to start from the continuous functional calculus and approximate the indicator function $1_B$ by continuous functions, then show that the corresponding limit exists as a self-adjoint operator.
Unlike the classical case, the expected value can be computed independently of any measurability hypotheses on $T$; in particular, the probability of a particular event occurring (that is, the expected value of an arbitrary projection) is automatically well-defined.
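The following sketch (the observable $T$ and the state $\psi$ are randomly generated, purely for illustration) checks that $\langle \psi, T \psi \rangle = \sum_i \lambda_i |c_i|^2$ and computes a probability $\mathbb{P}(T \in B)$ by assembling the spectral projection $1_B(T)$ from the eigendecomposition, which plays the role of the Borel functional calculus in finite dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random self-adjoint operator T on a 4-dimensional Hilbert space, and a unit state psi.
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = (M + M.conj().T) / 2
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi = psi / np.linalg.norm(psi)

# E(T) = <psi, T psi>.
expectation = np.vdot(psi, T @ psi).real

# Eigendecomposition: T = sum_i lambda_i |e_i><e_i|, and psi = sum_i c_i e_i.
lam, vecs = np.linalg.eigh(T)
c = vecs.conj().T @ psi
print(np.isclose(expectation, np.sum(lam * np.abs(c) ** 2)))  # True

# P(T in B) for the Borel set B = (0, infinity), via the spectral projection 1_B(T).
in_B = lam > 0
proj_B = vecs @ np.diag(in_B.astype(float)) @ vecs.conj().T
print(np.isclose(np.vdot(psi, proj_B @ psi).real, np.sum(np.abs(c[in_B]) ** 2)))  # True
```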
Noncommutative probability
The classical and quantum cases above have several features in common. In both cases we saw that, although we started with a description of events and their probabilities and moved on to a description of random variables and their expected values, we could recover events through their indicator functions as the idempotent random variables and recover probabilities of events as expected values of indicator functions. This suggests that we might fruitfully approach probability in general using algebras of random variables and the expectation.
If the algebra is commutative, we might hope to recover an underlying probability space, but a random-variables-first approach will allow us to work independently of a particular representation of a family of random variables as an algebra of functions on a probability space. If the algebra is noncommutative, we might hope to recover a Hilbert space on which it acts, but again, a random-variables-first approach will allow us to work independently of a particular representation as operators on a Hilbert space. We can also think of the algebra as the algebra of functions on a noncommutative space in the spirit of noncommutative geometry. Although noncommutative spaces don’t have a good notion of point, quantum probability spaces suggest that they have a good notion of measure (which we can think of as a “smeared-out” point, the Dirac measures corresponding to ordinary points).
The following definition is morally due to von Neumann and Segal. A random algebra (not standard) is a complex $*$-algebra $A$ together with a $*$-linear functional $\mathbb{E} : A \to \mathbb{C}$ such that

$$\mathbb{E}(a^* a) \ge 0 \text{ for all } a \in A, \quad \mathbb{E}(1) = 1.$$

Such a functional is called a state on $A$ (as it describes the state of some probabilistic system by describing the expected value of observables). The (real-valued) random variables in $A$ are its self-adjoint elements (and an event is a projection; that is, a self-adjoint idempotent).
A morphism $(A_1, \mathbb{E}_1) \to (A_2, \mathbb{E}_2)$ of random algebras is a morphism $\phi : A_1 \to A_2$ of complex $*$-algebras such that $\mathbb{E}_2(\phi(a)) = \mathbb{E}_1(a)$ for all $a \in A_1$. This defines the category of random algebras, and the category of noncommutative probability spaces is its opposite. (This is probably the wrong choice of morphisms, but we’ll ignore that for now.)
Example. From a classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ we obtain a random algebra by letting $A = L^{\infty}(\Omega, \mathbb{P})$ be the von Neumann algebra of essentially bounded measurable functions $\Omega \to \mathbb{C}$, with involution given by pointwise complex conjugation, and letting $\mathbb{E}(f) = \int_{\Omega} f \, d\mathbb{P}$ be the integral.
Example. From a quantum probability space we obtain a random algebra by letting $A$ be the span of the space of self-adjoint operators $T$ on $H$ such that $1_B(T)$ is the projection onto one of our events for all Borel subsets $B \subseteq \mathbb{R}$, and letting $\mathbb{E}$ be the functional $\mathbb{E}(T) = \langle \psi, T \psi \rangle$.
(Because we have not developed the Borel functional calculus, it will be cleaner just to work with an arbitrary $*$-algebra of bounded operators on $H$, from which a collection of events can but need not be derived. We can do the same thing in the classical case by starting with a collection of functions $\Omega \to \mathbb{R}$ and taking the preimages under all of them of the Borel subsets of $\mathbb{R}$ to define a $\sigma$-algebra on $\Omega$.)
The above examples require some analysis to define in full generality. However, the reason we do not require any analytic hypotheses on $A$ is to have a formalism flexible enough to discuss more algebraic examples such as the following.
Example. Let $G$ be a group. The group algebra $\mathbb{C}[G]$ is a $*$-algebra in the usual way (with involution extending $g \mapsto g^{-1}$, so that every element of $G$ is unitary). There is a distinguished state $\tau$ given by $\tau(1) = 1$ and $\tau(g) = 0$ for every non-identity $g \in G$.
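Here is a minimal sketch of $(\mathbb{C}[G], \tau)$ for the cyclic group $G = \mathbb{Z}/n$ (an illustrative choice of mine), representing elements as coefficient vectors: $\tau$ reads off the coefficient of the identity, and positivity is visible because $\tau(a^* a) = \sum_g |a_g|^2$.

```python
import numpy as np

n = 5  # work in the group algebra C[Z/n] (illustrative choice)

rng = np.random.default_rng(1)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # a = sum_g a_g g

def mult(a, b):
    # (ab)_g = sum_h a_h b_{g-h}: convolution on Z/n.
    return np.array([sum(a[h] * b[(g - h) % n] for h in range(n)) for g in range(n)])

def star(a):
    # a* = sum_g conj(a_g) g^{-1}: conjugate coefficients and invert group elements.
    return np.array([np.conj(a[(-g) % n]) for g in range(n)])

tau = lambda a: a[0]  # the distinguished state: coefficient of the identity element

# tau(a* a) = sum_g |a_g|^2 >= 0, and tau(1) = 1.
print(np.isclose(tau(mult(star(a), a)).real, np.sum(np.abs(a) ** 2)))  # True
one = np.zeros(n)
one[0] = 1
print(tau(one))  # 1.0
```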
The axioms we have chosen require some explanation. Working in a complex $*$-algebra is both convenient and has clear ties to quantum mechanics, but I do not have a good explanation of this axiom from first principles. The condition that $\mathbb{E}$ is $*$-linear reflects linearity of expectation, which holds both in the classical and quantum cases, and the fact that we want the expected value of a self-adjoint element to be real. The condition that $\mathbb{E}(a^* a) \ge 0$ (positivity) reflects the fact that we want probabilities to be non-negative in the following sense.
In any complex $*$-algebra $A$, we may define a positive (really non-negative) element to be an element of the form $a^* a$. A positive element is in particular self-adjoint. In the case of measurable functions on a probability space, the positive elements are precisely the elements which are (a.e.) non-negative, and in the case of operators on a Hilbert space, the positive elements are precisely the self-adjoint elements which have non-negative spectrum (this is somewhat subtle; see the comments below). Hence positivity is a natural analogue in the algebraic setting of the condition that probabilities are non-negative.
(Edit, 1/2/22: It’s been pointed out in the comments that a more natural definition of “positive” is an element which is a sum of elements of the form $a^* a$; this makes the positive elements form a convex cone. However, this doesn’t affect the definition of a positive linear functional, and the two are equivalent in any C*-algebra.)
Finally, the condition that $\mathbb{E}(1) = 1$ reflects the fact that we want the total probability to be $1$.
The semi-inner product
The state $\mathbb{E}$ allows us to define a sesquilinear map $\langle a, b \rangle = \mathbb{E}(a^* b)$ on any random algebra which satisfies all of the axioms of an inner product except that it is not necessarily positive-definite, but only satisfies the weaker axiom that $\langle a, a \rangle \ge 0$. We call such a gadget a semi-inner product (since it is positive-semidefinite).
As for classical random variables we can define the covariance

$$\mathrm{Cov}(a, b) = \mathbb{E}(a^* b) - \mathbb{E}(a^*) \, \mathbb{E}(b)$$

of two elements, and positive-semidefiniteness implies that the variance $\mathrm{Var}(a) = \mathrm{Cov}(a, a)$ is non-negative, hence that $\mathbb{E}(a^* a) \ge |\mathbb{E}(a)|^2$. More generally, the proof of the Cauchy-Schwarz inequality goes through without modification, and we conclude that

$$|\mathbb{E}(a^* b)|^2 \le \mathbb{E}(a^* a) \, \mathbb{E}(b^* b).$$
This is already enough for us to prove the following general version of Heisenberg’s uncertainty principle.
Theorem (Robertson uncertainty): Let $a, b$ be self-adjoint elements of a random algebra $(A, \mathbb{E})$. Then

$$\mathrm{Var}(a) \, \mathrm{Var}(b) \ge \frac{1}{4} \left| \mathbb{E}([a, b]) \right|^2.$$
Proof. Since both sides are invariant under translation of either $a$ or $b$ by a real multiple of the identity, we may assume without loss of generality that $a, b$ have mean zero (that is, that $\mathbb{E}(a) = \mathbb{E}(b) = 0$). This gives

$$\mathrm{Var}(a) \, \mathrm{Var}(b) = \mathbb{E}(a^2) \, \mathbb{E}(b^2) \ge |\mathbb{E}(ab)|^2$$

by Cauchy-Schwarz. We can write $ab$ as the sum of its real and imaginary parts

$$ab = \frac{ab + ba}{2} + \frac{ab - ba}{2} = \frac{\{a, b\}}{2} + \frac{[a, b]}{2}$$

and computing $|\mathbb{E}(ab)|^2$ using the above decomposition gives

$$|\mathbb{E}(ab)|^2 = \frac{1}{4} \left| \mathbb{E}(\{a, b\}) \right|^2 + \frac{1}{4} \left| \mathbb{E}([a, b]) \right|^2 \ge \frac{1}{4} \left| \mathbb{E}([a, b]) \right|^2,$$

where $\{a, b\} = ab + ba$ and $[a, b] = ab - ba$ (note that $\mathbb{E}(\{a, b\})$ is real and $\mathbb{E}([a, b])$ is purely imaginary, since $\{a, b\}$ is self-adjoint and $[a, b]$ is anti-self-adjoint). The conclusion follows.
Interpreting Robertson uncertainty will be easier once we do a little more work. By Cauchy-Schwarz, if an element $a$ satisfies $\mathbb{E}(a^* a) = 0$ then in fact it satisfies $\mathbb{E}(b^* a) = 0$ for all $b$ (and the converse is clear, taking $b = a$). In the classical picture, a function satisfying either of these conditions is equal to zero almost everywhere, which motivates the following definition. An element $a$ of a random algebra is null or zero almost surely (abbreviated a.s.) if $\mathbb{E}(b^* a) = 0$ for all $b$, which as we have seen is equivalent to $\mathbb{E}(a^* a) = 0$. The null elements form a subspace $N$ of $A$. Two elements $a, b$ are equal almost surely if $a - b$ is null, hence equality a.s. is equivalent to equality in the quotient $A/N$.
An element has variance zero if and only if it is constant (a scalar multiple of the identity) almost surely. Robertson uncertainty then says that if two self-adjoint elements $a, b$ have the property that their commutator $[a, b]$ has nonzero expectation, then neither of them can be constant almost surely in a strong sense: the product of their variances is bounded below by a positive constant, so as one decreases, the other must increase. In other words, not only are they uncertain, but a state in which $a$ is less uncertain is a state in which $b$ is more uncertain.
The standard application of Robertson uncertainty is to the case that $a, b$ are the position and momentum operators respectively acting on a quantum particle on $\mathbb{R}$. This application has the following purely mathematical interpretation: a function in $L^2(\mathbb{R})$ and its Fourier transform cannot simultaneously be too localized.
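Since position and momentum are unbounded, here is a finite-dimensional stand-in (Pauli matrices in a random qubit state; these choices are my own illustration) that numerically checks the Robertson inequality.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two noncommuting self-adjoint observables: Pauli matrices sigma_x and sigma_y.
a = np.array([[0, 1], [1, 0]], dtype=complex)
b = np.array([[0, -1j], [1j, 0]], dtype=complex)

# A random qubit state.
psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
psi = psi / np.linalg.norm(psi)

E = lambda x: np.vdot(psi, x @ psi)           # the state E(x) = <psi, x psi>
var = lambda x: (E(x @ x) - E(x) ** 2).real   # variance of a self-adjoint element

lhs = var(a) * var(b)
rhs = 0.25 * abs(E(a @ b - b @ a)) ** 2       # (1/4) |E([a, b])|^2

print(lhs, rhs, lhs >= rhs - 1e-12)           # Robertson: lhs >= rhs
```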
Independence
A fundamental notion in classical probability theory is the notion of independence. It can be generalized to random algebras as follows: two $*$-subalgebras $A_1, A_2$ of a random algebra $(A, \mathbb{E})$ are independent if

$$\mathbb{E}(a_1 a_2) = \mathbb{E}(a_1) \, \mathbb{E}(a_2)$$

for all $a_1 \in A_1, a_2 \in A_2$.
Example. Let $L^{\infty}(\Omega, \mathbb{P})$ be the random algebra associated to a classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $A_1, A_2$ be the subalgebras of functions which are measurable with respect to two $\sigma$-subalgebras $\mathcal{F}_1, \mathcal{F}_2$ of $\mathcal{F}$. Then $A_1, A_2$ are independent in the above sense if and only if $\mathcal{F}_1, \mathcal{F}_2$ are independent in the sense that

$$\mathbb{P}(E_1 \cap E_2) = \mathbb{P}(E_1) \, \mathbb{P}(E_2)$$

where $E_1 \in \mathcal{F}_1, E_2 \in \mathcal{F}_2$ (by the monotone class theorem). Note that this condition is equivalent to $\mathbb{E}(1_{E_1} 1_{E_2}) = \mathbb{E}(1_{E_1}) \, \mathbb{E}(1_{E_2})$.
Example. Let $A_1, A_2$ be two random algebras with expectations $\mathbb{E}_1, \mathbb{E}_2$. Their tensor product $A_1 \otimes A_2$ acquires a natural $*$-algebra structure given by $(a_1 \otimes a_2)(b_1 \otimes b_2) = a_1 b_1 \otimes a_2 b_2$ and $(a_1 \otimes a_2)^* = a_1^* \otimes a_2^*$ on pure tensors (it is the universal $*$-algebra admitting morphisms from $A_1$ and $A_2$ whose images commute), and moreover we can define on it a state given by

$$\mathbb{E}(a_1 \otimes a_2) = \mathbb{E}_1(a_1) \, \mathbb{E}_2(a_2)$$

on pure tensors. Conversely, any state on $A_1 \otimes A_2$ such that $A_1 \otimes 1$ and $1 \otimes A_2$ are independent is of this form. This is a noncommutative generalization of product measure; when $A_1, A_2$ come from classical probability spaces $(\Omega_1, \mathbb{P}_1), (\Omega_2, \mathbb{P}_2)$, a suitable completion of $A_1 \otimes A_2$ is the corresponding algebra of functions on the product $\Omega_1 \times \Omega_2$, and the state above comes from integration against the corresponding product measure $\mathbb{P}_1 \times \mathbb{P}_2$.
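In finite dimensions the product state can be realized with Kronecker products. The sketch below (hypothetical matrices and vector states, numpy assumed) checks that $\mathbb{E}(a_1 \otimes a_2) = \mathbb{E}_1(a_1) \, \mathbb{E}_2(a_2)$, i.e. that the two tensor factors are independent.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_state(dim):
    # A vector state E(x) = <psi, x psi> on the matrix algebra M_dim(C).
    psi = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    psi = psi / np.linalg.norm(psi)
    return psi, lambda x: np.vdot(psi, x @ psi)

psi1, E1 = random_state(2)
psi2, E2 = random_state(3)

# The product state on A1 (x) A2 is the vector state of psi1 (x) psi2.
psi = np.kron(psi1, psi2)
E = lambda x: np.vdot(psi, x @ psi)

a1 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
a2 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# E(a1 (x) a2) = E1(a1) E2(a2): the two factors are independent.
print(np.isclose(E(np.kron(a1, a2)), E1(a1) * E2(a2)))  # True
```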
Example. $A$ is independent of itself (in $(A, \mathbb{E})$) if and only if the state $\mathbb{E} : A \to \mathbb{C}$ is actually a homomorphism of $*$-algebras. Thinking of the case that $A = C(X)$ is a commutative C*-algebra in particular, the corresponding states can be thought of as Dirac measures supported at points of $X$. In the noncommutative case, $A$ may admit no homomorphisms to $\mathbb{C}$ (for example if $A$ contains the Weyl algebra), hence no Dirac measures, an expression of the general intuition that noncommutative spaces are “smeared out” and not easily expressible in terms of points.
Independence is a formalization of the intuitive idea that knowing the values of the random variables in $A_1$ doesn’t allow you to deduce anything about the values of the random variables in $A_2$ and vice versa. One indication of how this works in the setting of random algebras is as follows: if $p \in A_1$ is a projection with $\mathbb{E}(p) > 0$ (that is, an event that occurs with positive probability) we can define a conditional expectation

$$\mathbb{E}(a \mid p) = \frac{\mathbb{E}(p a p)}{\mathbb{E}(p)}.$$

(The first factor of $p$ is necessary in the noncommutative case to ensure that the result is still a state.) This represents the expected value of $a$ given that the event $p$ occurred. If $A_1, A_2$ are independent, it follows that $\mathbb{E}(a_2 \mid p) = \mathbb{E}(a_2)$ for all $a_2 \in A_2$; in other words, knowing that $p$ occurred has no effect on the expected value of any of the elements of $A_2$.
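Here is a sketch of the conditional expectation $\mathbb{E}(a \mid p) = \mathbb{E}(pap)/\mathbb{E}(p)$ in a matrix algebra (the state, projection, and observable below are hypothetical choices); it checks that the result is again a state, i.e. unital and positive.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 3

# A faithful state E(x) = tr(rho x) with a full-rank density matrix rho.
w = rng.random(dim)
w = w / w.sum()
rho = np.diag(w).astype(complex)
E = lambda x: np.trace(rho @ x)

# A projection p (onto the first two coordinates) with E(p) > 0.
p = np.diag([1.0, 1.0, 0.0]).astype(complex)

# Conditional expectation given p.
E_p = lambda x: E(p @ x @ p) / E(p)

identity = np.eye(dim, dtype=complex)
a = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))

print(np.isclose(E_p(identity), 1.0))     # E_p(1) = 1
print(E_p(a.conj().T @ a).real >= 0)      # E_p(a* a) >= 0: E_p is positive
```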
Independence is a very strong condition to impose if the subalgebras $A_1, A_2$ do not commute. For example, it implies that $\mathbb{E}([a_1, a_2]) = 0$ for all $a_1 \in A_1, a_2 \in A_2$, which is the only condition under which Robertson uncertainty cannot relate the variances of $a_1, a_2$. In the particular case of the position and momentum operators, the commutator $[q, p] = i \hbar$ is a nonzero scalar, hence always has nonzero expectation; it follows that position and momentum cannot be made independent! (By contrast, in the classical setting any pair of random variables is independent with respect to a Dirac measure.)
In the noncommutative setting, a different notion of independence, free independence (replacing the tensor product with the free product), becomes more natural and useful. We will not discuss this issue further, but see Terence Tao’s notes linked above.
The Gelfand-Naimark-Segal construction
If $H$ is any inner product space, $A$ any $*$-algebra of linear operators on $H$, and $\psi \in H$ is any unit vector, then $(A, \mathbb{E})$ is a concrete random algebra with expectation $\mathbb{E}(a) = \langle \psi, a \psi \rangle$. This subsumes the examples coming from both classical and quantum probability spaces. The goal of this section is to determine to what extent we can prove a Cayley’s theorem for random algebras to the effect that random algebras are concrete.
The above suggests the following definition. If $A$ is a complex $*$-algebra, then a $*$-representation of $A$ is a homomorphism from $A$ to the endomorphisms of an inner product space $V$ such that

$$\langle v, a w \rangle = \langle a^* v, w \rangle$$

for all $v, w \in V$ and $a \in A$. (Note that if $V$ is not a Hilbert space then $\mathrm{End}(V)$ is not necessarily a $*$-algebra because adjoints may not exist in general.) A Hilbert $*$-representation is a $*$-representation on a Hilbert space.
The semi-inner product $\langle a, b \rangle = \mathbb{E}(a^* b)$ on a random algebra $(A, \mathbb{E})$ descends to the quotient space $A/N$, where it becomes an inner product because we have quotiented by the elements of norm zero. Moreover, since for $a \in N$ we have

$$\mathbb{E}((ba)^* (ba)) = \mathbb{E}\big( (a^* b^* b) \, a \big) = 0,$$

it follows that $N$ is a left ideal, so the quotient map $A \to A/N$ is a quotient of left $A$-modules; consequently, $A$ acts on $A/N$ by linear operators. Since

$$\langle a, bc \rangle = \mathbb{E}(a^* b c) = \mathbb{E}((b^* a)^* c) = \langle b^* a, c \rangle,$$

it follows that this action defines a $*$-representation of $A$. The procedure we have outlined is essentially the Gelfand-Naimark-Segal (GNS) construction: we associate to any state $\mathbb{E}$ on a $*$-algebra $A$ a corresponding $*$-representation of $A$ on $A/N$ such that the state can be recovered from the representation as

$$\mathbb{E}(a) = \langle \psi, a \psi \rangle,$$

where $\psi$ is the image of $1$ in $A/N$. This may be regarded as a weak Cayley’s theorem: unfortunately, this $*$-representation is not faithful in general. To get a stronger statement about random algebras, we will now assume another condition, namely that if $\mathbb{E}(a^* a) = 0$, then $a = 0$ (the state is faithful).
The faithfulness axiom is equivalent to requiring $N = 0$, and also equivalent to requiring that $A$ with the form $\langle a, b \rangle = \mathbb{E}(a^* b)$ is an inner product space (rather than a semi-inner product space). It implies, but is stronger than, the assumption that the action of $A$ on $A/N$ is faithful. The remarks about the state above then prove the following.
“Cayley’s theorem for random algebras”: A random algebra with a faithful state is concrete.
This is still not a true analog of Cayley’s theorem because the converse is false: the state of a concrete random algebra need not be faithful.
From here we will assume, in addition to faithfulness, another condition, namely that for every $a \in A$ there exists a constant $C_a \ge 0$ such that $\mathbb{E}(b^* a^* a b) \le C_a \, \mathbb{E}(b^* b)$ for all $b \in A$ (boundedness). Boundedness is equivalent to requiring that $A$ acts on itself by bounded linear operators. This action therefore uniquely extends to the completion of $A$ with respect to its inner product, which we’ll denote by $H$, and consequently it follows that in this case $A$ admits a Hilbert $*$-representation $A \to B(H)$. The closure of the image of $A$ in $B(H)$ is a C*-algebra $\overline{A}$ of bounded linear operators on $H$, and moreover since $\mathbb{E}(a) = \langle \psi, a \psi \rangle$ where $\psi$ is the image of $1$ in $H$, the expectation uniquely extends to $\overline{A}$.
This motivates the following definition: a random C*-algebra is a random algebra which is also a C*-algebra. The above discussion proves the following.
Theorem: Let $(A, \mathbb{E})$ be a random algebra with a faithful state satisfying boundedness. Then $A$ canonically embeds as a dense $*$-subalgebra of a random C*-algebra $\overline{A}$ equipped with a Hilbert $*$-representation $\overline{A} \to B(H)$ via the GNS construction; moreover, there is a canonical vector $\psi \in H$ such that $\mathbb{E}(a) = \langle \psi, a \psi \rangle$ for all $a \in \overline{A}$.
This is a much stronger conclusion than the conclusion that $A$ is concrete, since it allows us to use facts from the theory of C*-algebras.
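In finite dimensions the GNS construction is just linear algebra. The sketch below (my own illustration) takes $A = M_2(\mathbb{C})$ with $\mathbb{E}(a) = \mathrm{tr}(\rho a)$, forms the Gram matrix of the semi-inner product $\langle a, b \rangle = \mathbb{E}(a^* b)$ on a basis, and reads off the dimension of the GNS Hilbert space $A/N$: for a full-rank (faithful) $\rho$ there are no null vectors, while for a pure state a two-dimensional null space gets quotiented away.

```python
import numpy as np

def gns_dimension(rho, tol=1e-10):
    """Dimension of the GNS Hilbert space A/N for A = M_2(C) with E(a) = tr(rho a)."""
    # A basis of A: the four matrix units E_ij.
    basis = []
    for i, j in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        e = np.zeros((2, 2), dtype=complex)
        e[i, j] = 1
        basis.append(e)
    E = lambda x: np.trace(rho @ x)
    # Gram matrix of the semi-inner product <a, b> = E(a* b) on this basis.
    gram = np.array([[E(a.conj().T @ b) for b in basis] for a in basis])
    eigenvalues = np.linalg.eigvalsh(gram)
    assert np.all(eigenvalues > -tol)   # positivity of the state
    # The null space N corresponds to zero eigenvalues; A/N is spanned by the rest.
    return int(np.sum(eigenvalues > tol))

rho_faithful = np.diag([0.7, 0.3]).astype(complex)  # full rank: faithful state
rho_pure = np.diag([1.0, 0.0]).astype(complex)      # rank one: not faithful

print(gns_dimension(rho_faithful))  # 4: no null vectors
print(gns_dimension(rho_pure))      # 2: a two-dimensional null space is quotiented away
```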
Corollary: Let $(A, \mathbb{E})$ be a commutative random algebra with a faithful state satisfying boundedness. Then $A$ canonically embeds as a dense $*$-subalgebra of the algebra $C(X)$ of continuous functions on a compact Hausdorff space $X$.
Proof. The closure of a commutative $*$-subalgebra of $B(H)$ is also commutative, since commutativity is preserved in the limit (multiplication is continuous). The conclusion then follows from Gelfand-Naimark.
Corollary (“Maschke’s theorem”): Let $A$ be a finite-dimensional random algebra with a faithful state. Then $A$ is semisimple.
Proof. A finite-dimensional random algebra automatically satisfies boundedness. The GNS construction equips $A$ with a faithful $*$-representation, namely $A$ itself with the inner product $\langle a, b \rangle = \mathbb{E}(a^* b)$. Let $V$ be a submodule of $A$. Then for every $a \in A$, $v \in V^{\perp}$, and $w \in V$,

$$\langle a v, w \rangle = \langle v, a^* w \rangle = 0,$$

so $V^{\perp}$ is also a submodule of $A$. So every submodule of $A$ is a direct summand; consequently, $A$ is semisimple.
Note that we really do recover Maschke’s theorem for complex representations of finite groups as a corollary, since $(\mathbb{C}[G], \tau)$, for $G$ a finite group, is a finite-dimensional random algebra with a faithful state.
Moments
The axioms for a random algebra may not seem strong enough to capture random variables. For example, it does not seem possible to directly access probabilities like $\mathbb{P}(a \in B)$. However, our axioms are enough to define the moments

$$\mathbb{E}(a^n), \quad n \ge 1,$$

of a random variable, and under suitable hypotheses (discussed under the general heading of the moment problem) it is possible to recover a random variable in the classical sense from its moments. We prove a result of this type for random C*-algebras.
Proposition: Any state $\mathbb{E}$ on a C*-algebra has norm $1$ (and in particular is continuous).
Proof. By examining real and imaginary parts, it suffices to show that a self-adjoint element $a$ of norm at most $1$ maps to a real number of absolute value at most $1$. Since $1 - a$ has non-negative spectrum, by the continuous functional calculus it has a square root, hence is positive, so

$$\mathbb{E}(1 - a) \ge 0, \text{ hence } \mathbb{E}(a) \le 1.$$

Similarly, $1 + a$ has non-negative spectrum, so by the continuous functional calculus it has a square root, hence is positive, so

$$\mathbb{E}(1 + a) \ge 0, \text{ hence } \mathbb{E}(a) \ge -1.$$

We conclude that $|\mathbb{E}(a)| \le 1$, with equality if $a = 1$.
In fact a much stronger statement is true due to the following corollary of the Riesz-Markov theorem, which we will not prove; see Terence Tao’s notes.
Theorem: Let $X$ be a compact Hausdorff space and $\mathbb{E} : C(X) \to \mathbb{C}$ be a positive linear functional. Then there is a unique Radon measure $\mu$ on $X$ such that

$$\mathbb{E}(f) = \int_X f \, d\mu$$

for all $f \in C(X)$ (and conversely any Radon measure defines a positive linear functional on $C(X)$).
It follows by Gelfand-Naimark that specifying a commutative random C*-algebra is equivalent to specifying a compact Hausdorff space $X$ and a Radon measure on it of total measure $1$.
Corollary: Let $(A, \mathbb{E})$ be a random C*-algebra. If $a \in A$ is normal, then there is a unique Radon measure $\mu$ on $\sigma(a)$ such that

$$\mathbb{E}(f(a)) = \int_{\sigma(a)} f \, d\mu$$

for all continuous functions $f : \sigma(a) \to \mathbb{C}$ (where $f(a)$ is defined using the continuous functional calculus).
Proof. Let $C^*(a)$ denote the (commutative, since $a$ is normal) C*-subalgebra of $A$ generated by $a$ and $1$; by Gelfand-Naimark it has the form $C(X)$ for a compact Hausdorff space $X$. Since the polynomials in $a$ and $a^*$ are dense in $C^*(a)$ by construction, a morphism $C^*(a) \to \mathbb{C}$ is uniquely determined by what it does to $a$, hence $a$, regarded as a function $X \to \mathbb{C}$, is injective. Since it is a continuous map between compact Hausdorff spaces, it is also an embedding, so we may regard $X$ as canonically embedded into $\sigma(a)$. (This embedding is actually a homeomorphism but we do not need this.) By Tietze extension, any continuous function $X \to \mathbb{C}$ extends to a continuous function $\sigma(a) \to \mathbb{C}$, so the continuous functions on $X$ given by applying the continuous functional calculus to $a$ include all continuous functions $X \to \mathbb{C}$, and we reduce to the previous result.
Corollary: With the same hypotheses as above, the Radon measure $\mu$ above is uniquely determined by the values

$$\mathbb{E}(P(a, a^*))$$

where $P$ is a polynomial in $a$ and $a^*$. Consequently, $\mu$ is uniquely determined by the $*$-moments $\mathbb{E}(a^n (a^*)^m)$. If $a$ is self-adjoint, $\mu$ is uniquely determined by the values

$$\mathbb{E}(P(a))$$

where $P$ is a polynomial in one variable. Equivalently, $\mu$ is uniquely determined by the moments $\mathbb{E}(a^n)$.
Proof. $\sigma(a)$ is a compact subset of $\mathbb{C}$, so by Stone-Weierstrass the polynomial functions in $z$ and $\overline{z}$ are uniformly dense in the space of continuous functions $\sigma(a) \to \mathbb{C}$. Now recall that the continuous functional calculus and $\mathbb{E}$ both preserve uniform limits. If $a$ is self-adjoint, $\sigma(a)$ is real, so we only need to take polynomial functions in $z$.
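Concretely, for a self-adjoint element under a vector state the measure $\mu$ is $\sum_i |c_i|^2 \delta_{\lambda_i}$, and its moments are the $\mathbb{E}(a^n)$; the sketch below (randomly generated matrix and state, for illustration only) checks this agreement.

```python
import numpy as np

rng = np.random.default_rng(5)

# A self-adjoint element of M_4(C) and a vector state.
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
a = (M + M.conj().T) / 2
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi = psi / np.linalg.norm(psi)
E = lambda x: np.vdot(psi, x @ psi).real

# The spectral measure of a in the state psi: weights |c_i|^2 at the eigenvalues lambda_i.
lam, vecs = np.linalg.eigh(a)
weights = np.abs(vecs.conj().T @ psi) ** 2

# The algebraic moments E(a^n) agree with the moments of the spectral measure.
for n in range(1, 6):
    print(np.isclose(E(np.linalg.matrix_power(a, n)), np.sum(weights * lam ** n)))  # True
```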
The proofs above generalize essentially unchanged to the following.
Corollary: Let $a_1, \dots, a_k$ be commuting normal elements of a random C*-algebra $(A, \mathbb{E})$. Then there exists a unique Radon measure $\mu$ on $\sigma(a_1) \times \dots \times \sigma(a_k)$ such that

$$\mathbb{E}(f(a_1, \dots, a_k)) = \int f \, d\mu$$

for all continuous functions $f : \sigma(a_1) \times \dots \times \sigma(a_k) \to \mathbb{C}$. Furthermore, $\mu$ is uniquely determined by the joint $*$-moments

$$\mathbb{E}\big( a_1^{n_1} (a_1^*)^{m_1} \cdots a_k^{n_k} (a_k^*)^{m_k} \big)$$

of the $a_i$. If the $a_i$ are self-adjoint, $\mu$ is uniquely determined by the joint moments $\mathbb{E}(a_1^{n_1} \cdots a_k^{n_k})$ of the $a_i$.
The hypothesis that the $a_i$ commute is crucial in the following sense. We restrict to the self-adjoint case for simplicity.
Proposition: Let $a, b$ be self-adjoint elements of a random C*-algebra $(A, \mathbb{E})$ with faithful state such that there exists a measure $\mu$ on a measure space $X$ and two measurable functions $f, g : X \to \mathbb{R}$ satisfying

$$\mathbb{E}(P(a, b)) = \int_X P(f, g) \, d\mu$$

for all polynomials $P$ in two noncommuting variables. Then $ab = ba$.
Proof. If $a, b$ are self-adjoint then so is $i(ab - ba)$. The hypothesis above implies that $\mathbb{E}\big((i(ab - ba))^2\big) = \int_X (i(fg - gf))^2 \, d\mu = 0$, but since $i(ab - ba)$ is self-adjoint $(i(ab - ba))^2$ is positive, hence by faithfulness $ab = ba$.
This result may be interpreted as saying that two noncommuting random variables do not in general have a reasonable notion of joint distribution.
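As a concrete instance, the sketch below takes the Pauli matrices $\sigma_x, \sigma_z$ with the normalized trace (a faithful state on $M_2(\mathbb{C})$; my choice of example) and exhibits the obstruction from the proof: $\mathbb{E}\big((i(ab - ba))^2\big)$ is strictly positive, whereas any classical joint distribution would force it to vanish.

```python
import numpy as np

# Two noncommuting self-adjoint elements of M_2(C): Pauli matrices sigma_x, sigma_z.
a = np.array([[0, 1], [1, 0]], dtype=complex)
b = np.array([[1, 0], [0, -1]], dtype=complex)

# The normalized trace is a faithful state on M_2(C).
E = lambda x: np.trace(x).real / 2

c = 1j * (a @ b - b @ a)  # self-adjoint since a, b are
print(E(c @ c))           # 4.0 > 0, so no classical joint distribution can exist
```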
Some closing remarks about quantumness
Classical mechanics is in principle deterministic: if the initial state of a system is known deterministically, then classical mechanics can in principle determine all future states. The predictions of quantum mechanics are, however, probabilistic: all that can be determined is a probability distribution on possible outcomes of a given experiment.
The two can be made to seem more similar if classical mechanics is generalized by allowing the state of the system to be probabilistic in the classical sense. Then classical and quantum mechanics can both be subsumed under the heading of random algebras, where in the classical case we keep track not of the position and momentum of a particle but of a probability distribution over all possible positions and momenta. What distinguishes the classical from the quantum cases is the noncommutativity of the random algebras in the latter case, and in particular the fact that the random algebras occurring in quantum mechanics generally do not admit any homomorphisms to $\mathbb{C}$, hence admit no Dirac measures, so we are forced to always work probabilistically.
The formal similarity between classical and quantum mechanics described here only applies to states and observables; to get time evolution back into the picture we should endow our random algebras with Poisson brackets, giving us random Poisson algebras, and Hamiltonians…