In a previous post we discussed some of the history of functional analysis and we also said some vague things about its role in mathematics. In this second part of the introduction we will see an example of the spirit of functional analysis in action, by taking a close look at the Stone-Weierstrass approximation theorem.
1. The Weierstrass approximation theorem
In 1885 Karl Weierstrass proved the following theorem:
Weierstrass Theorem: Let $latex f : [a,b] rightarrow mathbb{R}$ be a continuous function. Then for every $latex epsilon > 0$ there exists a polynomial $latex p$ (with real coefficients) such that for all $latex x in [a,b]$,
$latex |p(x) – f(x)|< epsilon .$
The theorem can be phrased differently, by saying that the polynomials are dense in the space of continuous functions with respect to the topology of uniform convergence. I think that this is a remarkable theorem. You see, only thirteen years before Weierstrass proved this theorem he broke the news that continuous functions can be much crazier then anyone ever thought, by constructing his famous Weierstrass function – a continuous function that is nowhere differentiable. But now, in 1885, Weierstrass actually showed that for many purposes, in practice (well, at least theoretically), continuous functions could be taken to be polynomials. We will see the usefulness of this theorem, directly and indirectly, at least at two points in the lectures.
2. The Stone-Weierstrass theorem – real version
Given a topological space $latex X$, the question whether or not a family of functions can approximate uniformly to any given precision any continuous function on $latex X$ is interesting and important. In a few lectures, when we will study Fourier series, we will need to use the following theorem:
Theorem (trigonometric approximation theorem): Let $latex f: mathbb{R}^d rightarrow mathbb{R}$ be a continuous, $latex mathbb{Z}^d$-periodic function. Then for every $latex epsilon > 0$ there exists a trigonometric polynomial $latex q$ such that for all $latex x in mathbb{R}^d$,
$latex |q(x) – f(x)|< epsilon .$
Two explanations are in order. A $latex mathbb{Z}^d$-periodic function is a function $latex f$ such that $latex f(x+n) = f(x)$ for all $latex x in mathbb{R}^d$ and $latex n in mathbb{Z}^d$. A trigonometric polynomial is a function $latex q$ of the form
$latex q(x) = sum_{n} a_n cos(2 pi ncdot x) + b_n sin(2 pi n cdot x) .$
(a finite sum). As it happens many times in mathematics, it turns out that the nicest way to obtain this theorem is to a consider a vastly more general problem. The solution of this more general problem is the Stone-Weierstrass Theorem, which we shall presently state and prove. Before doing so let us describe the setting for this theorem.
Let $latex X$ be a compact Hausdorff space. We will denote by $latex C_mathbb{R}(X)$ the space of continuous, real-valued functions on $latex X$, and by $latex C(X)$ the space of continuous, complex-valued functions on $latex X$. On both of these space we define the sup-norm of a function $latex f$ to be
$latex |f|_infty = sup_{x in X}|f(x)| .$
The quantity $latex |f-g|_infty$ is considered to be distance between the two functions $latex f$ and $latex g$. This distance makes both $latex C_mathbb{R}(X)$ and $latex C(X)$ into complete metric spaces. Both these spaces are vector spaces over the appropriate field, in fact they are algebras, with the usual operations of pointwise addition and multiplication. If $latex A$ is a subspace of $latex C_mathbb{R}(X)$ or of $latex C(X)$ then it is said to be a subalgebra if for all $latex f,g in A$, $latex fg$ is also in $latex A$. A subalgebra $latex A$ is said to separate points if for all $latex x,y in X$ there exists some $latex f in A$ such that $latex f(x) neq f(y)$. A subalgebra is said to be closed if it is a closed subset with respect to the metric.
Stone-Weierstrass Theorem (real version): Let $latex A$ be a closed subalgebra of $latex C_mathbb{R}(X)$ which contains the constant functions and separates points. Then $latex A = C_mathbb{R}(X)$.
To obtain the theorem above regarding approximation using trigonometric polynomials, you first note that, due to standard trigonometric identities, the trigonometric polynmials form an algebra. Next, you think about and realize that the $latex mathbb{Z}^d$-periodic functions on $latex mathbb{R}^d$ can be identified with the continuous functions on the $latex d$-torus $latex mathbb{T}^d = mathbb{R}^d / mathbb{Z}^d$. You then put $latex X = mathbb{T}^d$, and take $latex A$ to be the norm closure of the trigonometric polynomials in $latex C_mathbb{R}(X)$, noting that the norm closure of an algebra is an algebra. To apply the Stone-Weierstrass theorem it suffices now to show that the trigonometric polynomials separate points, and this is elementary. That concludes the proof of the theorem on approximation by trigonometric polynomials.
3. Proof of the Stone-Weierstrass Theorem
Let us isolate three lemmas before we reach the main argument of the proof.
Lemma 1: For every pair of distinct points $latex x,y in X$, and every $latex a,b in mathbb{R}$, there exists a function $latex g in A$ such that $latex g(x) = a$ and $latex g(y) = b$.
Proof: Exercise.
Lemma 2: If $latex f in A$ then the function $latex |f|$ is also in $latex A$.
Here $latex |f|$ denotes the function that sends every $latex x in X$ to $latex |f(x)|$.
Proof: Let $latex epsilon > 0$ be given. We will find a function $latex g in A$ such that $latex |g- |f||_infty < epsilon$. Since $latex A$ is closed and since $latex epsilon$ is arbitrary, this will show that $latex |f| in A$.
Let $latex I = [-|f|_infty, |f|]$. By the Weierstrass Theorem there exists a polynomials $latex p$ such that $latex sup_{t in I} |p(t) – |t|| < epsilon$. Put $latex g = p circ f$. Since $latex A$ is an algebra and $latex p$ is a polynomial, $latex g in A$. Thus
$latex |g-|f||_infty = sup_{x in X} |p(f(x)) – |f(x)|| leq sup_{t in I} |p(t) – |t|| < epsilon$
as required.
Lemma 3: If $latex f,g in A$, then the functions $latex f wedge g$ and $latex f vee g$ are also in $latex A$.
Here $latex f wedge g$ denotes the function that sends every $latex x in X$ to $latex min{f(x),g(x)}$, and $latex f vee g$ denotes the function that sends every $latex x in X$ to $latex max{f(x),g(x)}$.
Proof: This follows immediately from Lemma 2 together with the formulas $latex a wedge b = frac{a+b-|a-b|}{2}$ and $latex a vee b = frac{a+b+|a-b|}{2}$, which hold true for all real $latex a$ and $latex b$.
Completion of the proof of the Stone-Weierstrass Theorem. Let $latex f in C_mathbb{R}(X)$. We must show that $latex f in A$. It suffices, for every $latex epsilon >0$, to find $latex h in A$ such that $latex |f-h|_infty < epsilon$.
We start by constructing, for every $latex x,y in X$, a function $latex f_{x,y} in A$ such that $latex f_{xy}(x) = f(x)$ and $latex f_{xy}(y) = f(y)$. This is possible thanks to Lemma 1.
Next we produce, for every $latex x in X$, a function $latex g_x in A$ such that $latex g_x(x) = f(x)$ and $latex g_x(y) < f(y) + epsilon$ for all $latex y in X$. This is done as follows. For every $latex y in X$, let $latex U_y$ be a neighborhood of $latex y$ in which $latex f_{xy} < f + epsilon$. The compactness of $latex X$ ensures that there are finitely many of these neighborhoods, say $latex U_{y_1}, ldots, U_{y_m}$, that cover $latex X$. Then $latex g_x = f_{xy_1} wedge ldots wedge f_{xy_m}$ (which is in $latex A$, thanks to Lemma 3) does the job.
Finally, we find $latex h in A$ such that $latex |h(x)-f(x)| < epsilon$ for all $latex x in X$. For every $latex x in X$ let $latex V_x$ be a neighborhood of $latex x$ where $latex g_x > f – epsilon$. Again we find a finite cover $latex V_{x_1}, ldots, V_{x_n}$ and then define $latex h = g_{x_1} vee ldots vee g_{x_n}$. This function lies between $latex f + epsilon$ and $latex f – epsilon$, so it satisfies $latex |h(x)-f(x)| < epsilon$ for all $latex x in X$, and the proof is complete.
Remark: In the proof, we never used the assumption that $latex X$ is Hausdorff. But from the way the theorem is stated it seems like it only expects to be invoked in the compact Hausdorff case. Can you figure out why the theorem is stated with this assumption?
4. The Stone-Weierstrass theorem – complex version
Often, one finds it more convenient to study or to use the algebra $latex C(X)$ of complex valued functions. It turns out that it is much harder for a sub-algebra of functions to be dense in $latex C(X)$. Consider for example $latex A(mathbb{D})$ which is defined to be the closure of polynomials in $latex C(overline{mathbb{D}})$ (here $latex mathbb{D}$ denotes the open unit disc in $latex mathbb{C}$, and $latex overline{mathbb{D}}$ denotes its closure). Certainly, $latex A(mathbb{D})$ is a complex algebra which contains the constants and separates points. However, every element of $latex A(mathbb{D})$ is holomorphic in $latex mathbb{D}$, so this algebra is quite far from being the entire algebra $latex C(overline{mathbb{D}})$. To make the Stone-Weierstrass Theorem work in the complex valued case one needs to add another assumption, which is very simple, but turns out to be deeper than it seems at first.
Definition: A subspace $latex S subseteq C(X)$ is said to be self-adjoint if for every $latex f in S$, the complex conjugate $latex overline{f}$ is also in $latex S$.
Stone-Weierstrass Theorem (complex version): Let $latex A$ be a closed and self-adjoint subalgebra of $latex C(X)$ which contains the constant functions and separates points. Then $latex A = C(X)$.
Proof: From the assumptions that $latex A$ is a subspace that separates points it follows that $latex textrm{Re}A$, the space of all functions of the form $latex textrm{Re}f$, $latex f in A$, is a real subspace that separates points. From the self-adjointness assumption it follows that $latex textrm{Re}A subseteq A$, and from closedness of $latex A$ is follows now that $latex textrm{Re}A$ is closed. Thus $latex textrm{Re}A$ is a closed real algebra that contains the constants and separates points. By the (real) Stone-Weierstrass Theorem, $latex textrm{Re}A = C_mathbb{R}(X)$. It follows that every real valued continuous function on $latex X$ is in $latex A$. Symmetrically, every imaginary valued continuous function on $latex X$ is in $latex A$, so $latex C(X) = A$.
5. Concluding remarks
The Stone-Weierstrass Theorem and the way that we have applied it up serve as a baby example of functional analysis at work. We had a concrete approximation problem, which was solved, as it happens often in mathematics, by considering a vastly more general approximation problem. Considering a more general problem serves two purposes. First, after we have proved the result we have a ready-to-use tool that will be applicable in many situations. Second, by generalizing the problem we strip it away of the irrelevant details (for examples the particular nature of the functions we are trying to approximate with or the particular nature of the space on which they live) and we are left with the essence of the problem. Sometimes this leads to a very elegant proof (like above), and sometimes not.
To prove the theorem it was convenient to employ the language of functional analysis, namely, to introduce a norm and to consider the problem inside an algebra which is also a metric space. It was convenient to consider a closed algebra $latex A$, even though there was no closed algebra in the original problem (the trigonometric approximation theorem), and even though this closed algebra turned out to be the entire space of continuous functions. The student might wish to review the proof to see how this streamlined the argument.
Note that we did not obtain a new and “abstract” proof of the Weierstrass Theorem – in fact, the Weierstrass Theorem was used in the proof of the Stone-Weierstrass Theorem. This is not at all surprising, as it is an instance of the following guiding principle that I have come to believe in (please take this with a grain of salt):
There is no analysis without hard analysis.
What I mean by this is that you will never get an interesting theorem of substance, which applies to concrete cases in analysis, that does not involve some kind of hard analysis, some epsilonaustics, some sweat. In our example, the hard analysis comes in the proof of the Weierstrass Theorem. The Stone-Weierstrass Theorem, which has a very soft proof, then spares us the use of hard techniques when it gives us the trigonometric approximation theorem for free. But the hard analysis cannot be eliminated entirely.
I know of a different proof of the Stone-Weierstrass Theorem that does not use the Weierstrass Theorem, which relies on the Hahn-Banach and Krein-Milman theorems, but that proof uses the structure of the dual of $latex C(X)$, and that I consider another bit of hard analysis.
When we moved from the setting of the trigonometric approximation theorem to the setting of the Stone-Weierstrass Theorem we abstracted away the topological space on which the functions live, and we only cared that we have continuous functions on some compact space. In the next lecture, and in most of the rest of the course, we will abstract away everything, and we will consider abstract spaces satisfying certain axioms, and we will not care usually what the points of these spaces are.
We will begin with Hilbert spaces.
Remark 1.
In the definition of trigonometric polynomials perhaps you meant that the sum is finite and that the terms in the sum are inside parenthesis?
finite sum : corrected, thanks.
parenthesis : it is common practice to omit these parenthesis.
Remark 2.
Regarding the guiding principle that “There is no analysis without hard analysis”, that is, “that you will never get an interesting theorem of substance, which applies to concrete cases in analysis, that does not involve some kind of hard analysis”. Well, I don’t agree. In fact, this can be shown already in the proof Lemma 1 above because instead of using the Weierstrass approximation theorem one can use the following simple argument. Before formulating it let me say that I saw it many years ago when studying topology and later when studying approximation theory.
Before posting this comment I looked at many places to try to locate it ahgain, but the only places I saw an evidence to its existence
are [2, page 244] and [4, pages 175-177] (see below for the references).
The argument: it suffices to show that g(t)=|t| can be approximated by polynomials uniformly on [-1,1]. Once we know this we continue as in the proof given in the post. Note that |t|=(t^2)^(0.5)=(1-s)^(0.5) where s=1-t^2. Note that s is between 0 and 1. The function h(s)=(1-s)^(0.5) has a Taylor expansion about 0 and this series converges uniformly on (-1,1) due to well known and elementary theorems on power series. Convergence at s=1 follows from verifying the conditions of Abel’s theorem on power series. Since s is in [0,1] only the uniform convergent of the power series on [0,1] is needed. Therefore there is a sequence p_k(s) of polynomials which converges uniformly to h on [0,1]. By putting s=1-t^2 we obtain another sequence q_k(t) of polynomials which converges to g(t) uniformly on [-1,1].
One can argue that there is some hard analysis in this argument, e.g., when verifying the conditions of Abel’s theorem (in [4] the derivation is long but in [2] it is quite short). But this verification is elementary.
Indeed, one only requires that the function $latex g(t)=|t|$ be the uniform limit of polynomials. This can be proved directly, as you suggest, but I consider that good old hard analysis.
A few corrections (sorry):
1. Lemma 1 ==> Lemma 2
2. The power series of h converges uniformly to h on [-r,r] for
any 0<r<1. The Abel theorem shows that it converges uniformly
to h on [-r,1].
Remark 3 (another counterexample)
Now let me mention another counterexample, from one of my research papers. It is related to a geometric-analytic object called zone diagram, which was defined originally in computer science (computational geometry) in the following setting. A finite set of distinct points in the Euclidean plane is given. The points are called sites. To each site p_i one associate a region R_i defined as the set of all points in the plane whose distance to p_i is not greater than its distance to the union of the other regions R_j. This definition seems circular, since in order to know R_1 we need to know R_2,R_3, etc, but for knowing R_2 we need to know R_1 and the other regions, but we still don’t know R_1. However, after some thinking one can realize that formally a zone diagram is a fixed point of a certain mapping acting on a space of tuples of sets. Denote this mapping by Dom. The tuple R=(R_1,…,R_n) is a zone diagram if and only if R=Dom(R). Now it becomes clear that it is not at all obvious that a zone diagram exists, and if it does exist perhaps it is not unique. Asano, Matousek, and Tokuyama, the people who invented this concept in [1], proved that there exists a unique zone diagram. Their proof is long and not at all simple.
After knowing this, we generalize the setting so that the points belong to any nonempty set X (instead of R^2 as in the original setting), the sites are any nonempty subsets of X (instead of distinct points), there can be infinitely many sites (instead of finitely many), and the distance function is (instead of the Euclidean distance) any metric or even a more general function from X^2 to [-infinity, infinity] satisfying only the assumption that d(x,x) is not greater than d(x,y) for all x and y in X. It seem somewhat hard to believe that something nontrivial can be said in this setting, even for the case of two (general) sites.
However, in a joint work with Simeon Reich [3] we proved that in the case of two sites there does exist at least one zone diagram. The proof is much more simple and short than the original proof in [1] and it can be regarded as soft analysis. Moreover, using an even simpler argument we proved that a more complicated object called double zone diagram always exists, even if there are infinitely many sites. The definition of a double zone diagram is a tuple R satisfying R=Dom^2(R), that is, R is a fixed point of the second iteration of Dom (a periodic point of order 2).
Thank you, that is a very nice example!
Thanks for the compliment. Below is the link to the arXiv version of ref [3]. The existence theorems can be found in Section 5 (pages 9-11).
http://arxiv.org/abs/0708.2668
References
[1] T. Asano, J. Matoušek, and T. Tokuyama, Zone Diagrams: Existence, Uniqueness, and Algorithmic Challenge”. SIAM Journal on Computing 37 (2007), 1182–1198, a preliminary version in Proc. SODA 2007, pp. 756-765.
http://epubs.siam.org/doi/abs/10.1137/06067095X
[2] J. L. Kelley, General topology, Graduate texts in mathematics, Springer, New York, 1975 (reprint of the 1955 edition published by Van Nostrand).
[3] D. Reem and S. Reich, Zone and double zone diagrams in abstract spaces. Colloquium Mathematicum 115 (2009), 129–145.
http://journals.impan.pl/cgi-bin/doi?cm115-1-11
[4] M. H. Stone, The Generalized Weierstrass Approximation Theorem, Mathematics Magazine 21 (1948), 167–184.
http://www.jstor.org/stable/10.2307/3029750?origin=crossref
A small question: perhaps you can make the discussion about the identification of periodic functions and functions defined on the flat torus more precise? a short appendix at the end or a short post or something is enough. Sorry, but at the moment it seems too intuitive for an advanced course in mathematics. And please don’t give it as an exercise to the students, because in most textbooks this issue is ignored (it seems that almost everyone have adopted the habit of waving their hands regarding this). It might be useful to remember also to the fact that once a measure is introduced on the flat torus (for discussing integrable functions, etc.), it is no longer the Lebesgue measure but rather the Haar measure (but fortunately, the is no real difference from the Lebesgue measure in this setting and in particular all the computations and rules are as one expects them to be).
P.S. I see that your rate of posts is high: very good!
I will certainly try to improve the presentation, it might take me some time. Thanks for the comment.
No problem. Perhaps this may be postponed to the next time you teach the course. The point is that one should find an explanation which is clear not only for the teacher, but also for the students, and I am not sure that in this case it is a completely trivial task.
However, since it is not clear to me what is the background of the students (e.g., is measure theory assumed to be well known?
otherwise “Lebesgue” and “Haar” are just names which can be considered as black boxes) I can’t say for sure what can be regarded as a clear explanation.
Following previous remarks, the discussion in [pp. 136-137, 5] shows
that identifications can some times lead to problems. The context there is a Hilbert space and its dual.
[5] H. Brezis, Functional Analysis, Sobolev Spaces, and Partial Differential Equations, Universitext, Springer, New York, 2011
This is a good discussion (the one we are having and the one Brezis is making in his book). True, identifications are dangerous. But remember: if identifications are completely avoided then mathematics is impossible (and becomes much less interesting).