Category Archives: Solutions

Exercise 9.1.13

Exercise statement

Prove Theorem 9.1.24.

Theorem 9.1.24 (Heine-Borel theorem for the line). Let X be a subset of \mathbf R. Then the following two statements are equivalent:

(a) X is closed and bounded.

(b) Given any sequence (a_n)_{n=0}^\infty of real numbers which takes values in X (i.e., a_n \in X for all n), there exits a subsequence (a_{n_j})_{j=0}^\infty of the original sequence, which converges to some number L in X.

Hints

  1. To show (a) implies (b), use the Bolzano-Weierstrass theorem (Theorem 6.6.8) and Corollary 9.1.17.
  2. To show (b) implies (a), argue by contradiction, using Corollary 9.1.17 to establish that X is closed (note by Issa: actually I don’t think proof by contradiction is needed here). You will need the axiom of choice to show that X is bounded, as in Lemma 8.4.5. (Note by Issa: the axiom of choice is also not directly needed for this part, but proving this without direct use of the axiom of choice requires first showing that X is closed, and showing that X is closed implicitly requires the axiom of choice because it depends on Lemma 9.1.14.)

How to think about the exercise

This is a fairly simple exercise so I don’t have much to say.

Model solution 1

First we will show that (a) implies (b). Suppose X is closed and bounded. Let (a_n)_{n=0}^\infty be a sequence which takes values in X. Since X is bounded, this means that there exists some real number M > 0 such that X \subseteq [-M, M]. Since a_n \in X for each n, this means that a_n \in [-M, M] for each n as well. But this means -M \leq a_n \leq M, i.e., |a_n| \leq M for each n. So (a_n)_{n=0}^\infty is a bounded sequence. Thus by the Bolzano-Weierstrass theorem (Theorem 6.6.8), there exists a subsequence (a_{n_j})_{j=0}^\infty which converges to some limit L \in \mathbf R. The only thing left to show now is that L \in X. By assumption X is closed, and since (a_n)_{n=0}^\infty is a sequence in X this means that (a_{n_j})_{j=0}^\infty is also a sequence in X. Thus we can apply Corollary 9.1.17 to conclude that L = \lim_{j\to\infty} a_{n_j} \in X as desired.

Now we prove the converse, that (b) implies (a). Suppose that given any sequence (a_n)_{n=0}^\infty in X, there exits a subsequence (a_{n_j})_{j=0}^\infty which converges to some number L\in X. Our goal is to show that X is both closed and bounded.

We start by showing that X is closed. By Corollary 9.1.17, it suffices to show that every convergent sequence (a_n)_{n=0}^\infty of elements in X has its limit in X. So let (a_n)_{n=0}^\infty be a convergent sequence consisting of elements in X. Call the limit L. Our goal is to show that L \in X. By our assumption (b), there exists a subsequence (a_{n_j})_{j=0}^\infty which converges to some number L' \in X. But now by Proposition 6.6.5, since (a_n)_{n=0}^\infty converges to L, we know that its subsequence (a_{n_j})_{j=0}^\infty must also converge to L, so in fact L = L', and thus L \in X as required.

Finally, we show that X is bounded. Suppose for the sake of contradiction that X is not bounded. Thus for each natural number n, there exists some element x_n \in X such that |x_n| > n. More formally, for each natural number n, we are choosing some element x_n from the set \{ x \in X : |x| > n \} (which is non-empty for each n because X is not bounded) in order to define a sequence (x_n)_{n=0}^\infty; making infinitely many choices like this requires the axiom of choice. By our assumption (b), there exists some subsequence (x_{n_j})_{j=0}^\infty which converges to some number L \in X. But |x_{n_j}| > n_j \geq j for each j so (x_{n_j})_{j=0}^\infty is not bounded, and unbounded sequences cannot be convergent (Corollary 6.1.17). We have thus shown that (x_{n_j})_{j=0}^\infty is both convergent and not convergent, a contradiction. This contradiction shows that X must in fact be bounded.

Model solution 2

Here we prove just the fact that (b) implies X is bounded, without directly using the axiom of choice. Thanks to Leshuri and The Royal Group for help with this proof.

From Model solution 1, we know that X is closed; note that proving this fact implicitly used the axiom of choice because it depended on Corollary 9.1.17, which depends on Lemma 9.1.14. Lemma 9.1.14 requires the axiom of choice. Thanks to The Royal Group for pointing this out to me. However, in the rest of the proof below, we won’t make any new appeals to the axiom of choice. So you could view the proof below as an axiom-of-choice-free proof that property (b) (which is called sequential compactness) together with the assumption that X is closed implies that X is bounded. See also Remark 8.4.6.

To make the proof easier to understand, we first prove a lemma that X is bounded if and only if X is both bounded above and bounded below. More formally, we will show that if X is bounded, then there exists some M > 0 such that x \leq M for all x \in X, and also that there exists some N < 0 such that N \leq x for all x \in X.

If X is bounded, then there is some M > 0 such that -M \leq x \leq M, so M witnesses the fact that X is bounded above, and -M witnesses the fact that X is bounded below. Conversely, if X is bounded above, say by M > 0, and X is bounded below, say by N < 0, then we can consider the bound M - N. Then for all x \in X we have x \leq M < M - N and -(M - N) = -M + N < N \leq x. Thus M - N bounds the set X.

With that lemma out of the way, we will now show that X is bounded. Suppose for the sake of contradiction that X is not bounded. By the lemma above, X is either not bounded above or not bounded below. We will show the case when X is not bounded above; the case when X is not bounded below is very similar.

Since X is not bounded above, the set X_n := \{ x \in X : x > n \} is non-empty for each natural number n. Define x_n := \inf(X_n). Notice that we did not choose from the set X_n, rather we simply constructed the exact x_n we needed, so we did not use the axiom of choice in constructing the sequence (x_n)_{n=0}^\infty.

We want to show that each x_n lies in X. First we show that x_n is an adherent point of X_n. Let \varepsilon > 0 be given. Since x_n is the greatest lower bound of X_n, we see that x_n - \varepsilon is not a lower bound of X_n, so there exists some x \in X_n such that x_n \leq x < x_n + \varepsilon. So subtracting x_n from each side, this means -\varepsilon < 0 \leq x - x_n < \varepsilon so |x - x_n| < \varepsilon. But this means x_n is \varepsilon-adherent to X_n, and since \varepsilon was arbitrary, this means that x_n is an adherent point of X_n. In fact, we have proved something more. Since X_n \subseteq X, the x \in X_n which witnessed the fact that x_n is an adherent point X_n is also an element of X. So in fact, x_n is an adherent point of X as well. But X was already shown to be a closed set, so by Definition 9.1.15, we see that x_n \in X.

The hard part is now done. By our assumption (b), there exists some subsequence (x_{n_j})_{j=0}^\infty which converges to some number L \in X. But x_{n_j} \geq n_j \geq j for each j so (x_{n_j})_{j=0}^\infty is not bounded, and unbounded sequences cannot be convergent (Corollary 6.1.17). We have thus shown that (x_{n_j})_{j=0}^\infty is both convergent and not convergent, a contradiction. This contradiction shows that X must in fact be bounded.

Exercise 3.1.8

Exercise statement

Let A,B be sets. Prove the absorption laws A \cap (A \cup B) = A and A \cup (A \cap B) = A.

Hints

  • What are the ways to show that two sets are equal?

How to think about the exercise

Let’s take the first law, A \cap (A \cup B) = A. It would be simple enough to prove this by splitting into two statements, A \cap (A \cup B) \subseteq A and A \cap (A \cup B) \supseteq A. And then to say, for the first one, let x \in A \cap (A \cup B). Since x \in A \cap (A \cup B), we have x \in A and x \in A \cup B. But x \in A is what we wanted, so we are done, and we have A \cap (A \cup B) \subseteq A. Then we do the other side. Then we do the other equality. It’s kind of a tedious exercise. Is there a faster way, perhaps by direct algebraic manipulation?

Well, one thing to notice is that A \cap (A \cup B) looks expandable, using Proposition 3.1.27 (in the 4th edition; it’s Proposition 3.1.28 in the 3rd edition of the book). In particular, using distributivity we get A \cap (A \cup B) = (A \cap A) \cup (A \cap B). Now using the identity property (A \cap A = A) from the same Proposition, we can write this as A \cup (A \cap B). Now look at what we have: that’s just the left-hand side of the second absorption law! If we expand this using distributivity and then simplify using identity again, we get A \cup (A \cap B) = (A \cup A) \cap (A \cup B) = A \cap (A \cup B). Uh oh, we ended up right where we started. So this approach doesn’t get us all the way to a proof, but at least we can prove one of the absorption laws in terms of the other, which does cut down on about half of the tedium!

You might wonder how to know ahead of time whether it is possible to prove a set equality like this using algebra versus whether it is impossible and you need to go back to the equality of sets axiom (Axiom 3.2 in the 4th edition; Definition 3.1.4 in 3rd edition). I don’t actually know the answer to this. If anyone knows something about this, please write to me in the comments.

Model solution

Let’s first consider the second absorption law, A \cup (A \cap B) = A. By the distributivity and identity properties from Proposition 3.1.27, we have

A \cup (A \cap B) = (A \cup A) \cap (A \cup B) = A \cap (A \cup B)

But A \cap (A \cup B) is the left-hand side of the first absorption law. So if we can show the first absorption law, we get the second absorption law too simply by using the above chain of equalities.

To show A \cap (A \cup B) = A, we will show both A \cap (A \cup B) \subseteq A and A \cap (A \cup B) \supseteq A. First let x \in A \cap (A \cup B). Then we have x \in A and x \in A \cup B. But x \in A is what we wanted, so this shows A \cap (A \cup B) \subseteq A. Now let x \in A. Our goal is to show x \in A \cap (A \cup B). Since x \in A, we have that x \in A or x \in B. Thus x \in A \cup B. But now we have both x \in A and x \in A \cup B, so we have x \in A \cap (A \cup B) as desired. This completes the proof that A \cap (A \cup B) \supseteq A. Since we’ve now shown each set is a subset of the other, we have A \cap (A \cup B) = A, and the proof is done.

Exercise 6.6.2

Exercise statement

Can you find two sequences (a_n)_{n=0}^\infty and (b_n)_{n=0}^\infty which are not the same sequence, but such that each is a subsequence of the other?

Hints

None.

How to think about the exercise

In the book, Tao writes “The property of being a subsequence is reflexive and transitive, though not symmetric” (meaning it fails to be an equivalence relation). Using terminology from Definition 8.5.1, this exercise shows that the relation “is a subsequence of” is not anti-symmetric either, and hence is not a partial order. (Contrast this with the relation “is a subset of” for sets, which is anti-symmetric, and also happens to be reflexive and transitive, and so is a partial order.)

You will see in the solution below that in finding a counterexample, we are taking advantage of how the sequence elements repeat. Must all the examples we can find have repeating elements? I think it’s possible to prove this, and I think it’s even possible to prove a slightly stronger claim that all examples must have infinitely repeating elements, so that at least for the class of non-repeating (or non-infinitely repeating) sequences, “is a subsequence of” is a partial order.

Model solution

Consider (a_n)_{n=0}^\infty = (0,1,0,1,0,1,\ldots) defined by

\displaystyle a_n := \begin{cases}0 & \text{if }n\text{ is even} \\ 1 & \text{if }n\text{ is odd}\end{cases}

and (b_n)_{n=0}^\infty = (1,0,1,0,1,0,\ldots) defined by

\displaystyle b_n := \begin{cases}1 & \text{if }n\text{ is even} \\ 0 & \text{if }n\text{ is odd}\end{cases}

The function f : \mathbf N \to \mathbf N defined by f(n) := n+1 is increasing (why?), and we have b_n = a_{f(n)} and a_n = b_{f(n)} for each n \in \mathbf N. So each sequence is a subsequence of the other, as required.

(To show this rigorously we’d need to define “even” and “odd”, prove a version of Euclidean division with uniqueness so we can say that a natural number is exactly one of even or odd, and show that a natural number n is even (respectively odd) iff f(n) is odd (respectively even). But I think this exercise was intended just as an opportunity to quickly come up with a counterexample and move on, not a place to spend a bunch of time proving facts about number theory. But if anyone in the comments wants help making this rigorous, I can maybe do this at some point.)

 

Exercise 6.4.5

Exercise statement

Use Lemma 6.4.13 to prove Corollary 6.4.14.

Corollary 6.4.14 (Squeeze test). Let (a_n)_{n=m}^\infty, (b_n)_{n=m}^\infty, and (c_n)_{n=m}^\infty be sequences of real numbers such that

a_n \leq b_n \leq c_n

for all n \geq m. Suppose also that (a_n)_{n=m}^\infty and (c_n)_{n=m}^\infty both converge to the same limit L. Then (b_n)_{n=m}^\infty is also convergent to L.

Hints

  1. You will probably also want to make use of some other results from this section.

How to think about the exercise

This is one of those exercises that you can do purely by pattern matching, without any deep thought. Let me show what I mean by doing the exercise in this way.

We are supposed to use Lemma 6.4.13. Lemma 6.4.13 has a hypothesis that looks like a_n \leq b_n. Here we have a_n \leq b_n \leq c_n instead. That means we get for free:

\displaystyle \sup(a_n)_{n=m}^\infty \leq \sup(b_n)_{n=m}^\infty \leq \sup(c_n)_{n=m}^\infty

\displaystyle \inf(a_n)_{n=m}^\infty \leq \inf(b_n)_{n=m}^\infty \leq \inf(c_n)_{n=m}^\infty

\displaystyle \limsup_{n\to\infty} a_n \leq \limsup_{n\to\infty} b_n \leq \limsup_{n\to\infty} c_n

\displaystyle \liminf_{n\to\infty} a_n \leq \liminf_{n\to\infty} b_n \leq \liminf_{n\to\infty} c_n

Can we do anything useful with these inequalities? Nothing immediately comes to mind. But there’s still some hypotheses we haven’t used yet, namely that both (a_n)_{n=m}^\infty and (c_n)_{n=m}^\infty converge to the same limit L. Is there some previous result we have learned that allows us to connect limits to things like \sup, \inf, \limsup, \liminf? Flipping backwards in the book, we find Proposition 6.4.12(f), which mentions both \limsup, \liminf and limits. Can we make use of this result? The first part seems relevant. It says that if a limit exists, then both the limit superior and limit inferior are equal to it. So we can update two of the inequalities above:

L=\displaystyle \limsup_{n\to\infty} a_n \leq \limsup_{n\to\infty} b_n \leq \limsup_{n\to\infty} c_n=L

L=\displaystyle \liminf_{n\to\infty} a_n \leq \liminf_{n\to\infty} b_n \leq \liminf_{n\to\infty} c_n=L

This seems useful. We’ve constrained the values of both \limsup_{n\to\infty} b_n and \liminf_{n\to\infty} b_n. In fact, they both have to be equal to L. At this point, the second part of Proposition 6.4.12(f) becomes useful, and completes the result.

So the general strategy for this sort of exercise is to just flip backwards through the book, trying to look for any result that seems useful (e.g. mentions some of the same kinds of objects we are currently working with), and then see what new facts we can get.

Model solution

Since we have a_n \leq b_n \leq c_n for all n \geq m, Lemma 6.4.13 gives us

\displaystyle \limsup_{n\to\infty} a_n \leq \limsup_{n\to\infty} b_n \leq \limsup_{n\to\infty} c_n

and

\displaystyle \liminf_{n\to\infty} a_n \leq \liminf_{n\to\infty} b_n \leq \liminf_{n\to\infty} c_n

But since \lim_{n\to \infty} a_n = L, Proposition 6.4.12(f) gives us \limsup_{n\to\infty} a_n = \liminf_{n\to\infty} a_n = L. Similarly, since \lim_{n\to \infty} c_n = L, Proposition 6.4.12(f) gives us \limsup_{n\to\infty} c_n = \liminf_{n\to\infty} c_n = L. Thus the above inequalities become

\displaystyle L \leq \limsup_{n\to\infty} b_n \leq L

and

\displaystyle L \leq \liminf_{n\to\infty} b_n \leq L

This means that \limsup_{n\to\infty} b_n = L and \liminf_{n\to\infty} b_n = L. So by Proposition 6.4.12(f) again, we have \lim_{n\to\infty} b_n = L as desired.

Exercise 8.5.5

Exercise statement

Let f: X \to Y be a function from one set X to another set Y. Suppose that Y is partially ordered with some ordering relation \leq_Y. Define a relation \leq_X on X by defining x \leq_X x' if and only if f(x) <_Y f(x') or x=x'. Show that this relation \leq_X turns X into a partially ordered set. If we know in addition that the relation \leq_Y makes Y totally ordered, does this mean that the relation \leq_X makes X totally ordered also? If not, what additional assumption needs to be made on f in order to ensure that \leq_X makes X totally ordered?

Hints

  1. Why did Tao define \leq_X by the condition f(x) <_Y f(x') or x=x' rather than simply the condition f(x) \leq_Y f(x')?

How to think about the exercise

The first part of the exercise, where you must show that \leq_X is partially ordered, is straightforward so I won’t comment on it. The latter parts of the exercise are more interesting and may be tricky, so I will discuss how I thought through the exercise.

As you might have guessed, when Tao asks you whether something is true rather than just telling you to prove a fact, it typically means the statement in question is not true. Let’s try to see why. I’m going to make use of two heuristics here, which I will call “try a simple example” and “try to prove the false statement and see what goes wrong”.

Try a simple example

One way to solve problems that pretty reliably works is to just to try scribbling down some simple concrete examples. I started out by drawing some dots to represent the sets X and Y, with order signs  to indicate that Y is totally ordered:

And then I just drew some arrows to represent the function f:

I tried to not make the function too nice, so that it would appear to be a “random” function. In particular, I chose a function that was neither injective nor surjective. Now, can we totally order X? Notice that the rightmost dot in X is clearly larger than either of the dots on the left, but among the first and second dots, there’s no clear ordering between them.

Because the lack of total ordering happened at exactly the spot where f failed to be injective (we have two points in X both mapping to the same point in Y), our first guess should be that injectivity is the additional assumption we need to force \leq_X to be a total order. And sure enough, drawing a few more examples should convince you this is the case.

Try to prove the false statement and see what goes wrong

Another good strategy when trying to figure out what assumptions are needed to show something is to just try proving the thing, and seeing what goes wrong and what extra thing you need. In this case, we assume \leq_Y is a total order, and then we try to prove that \leq_X is also a total order. To do this, let x, x' \in X. We want to show that x \leq_X x' or x' \leq_X x. In other words, we want to show that f(x) <_Y f(x') or f(x') <_Y f(x) or x=x'. The case x=x' is easy; maybe we just got lucky and happened to pick the same element out of X both times. So suppose x\ne x'. Can we always show that f(x) <_Y f(x') or f(x') <_Y f(x)? Since we assumed that \leq_Y is a total order, we know that f(x) \leq_Y f(x') or f(x') \leq_Y f(x). Maybe we can show that when f(x) \leq_Y f(x') we actually have f(x) <_Y f(x') so that x \leq_X x' and similarly when f(x') \leq_Y f(x) we’ll have x' \leq_X x? In either case, the important thing we must show in order to prove strict inequality is f(x) \ne f(x'). And here we see the problem: we have only assumed x\ne x', and now we’re trying to show f(x) \ne f(x'). This is the defining property of an injection.

Another way to see the problem is that  we know f(x) <_Y f(x') or f(x') <_Y f(x), but we might have both f(x) \leq_Y f(x') and f(x') \leq_Y f(x), in which case anti-symmetry of \leq_Y gives us that f(x) = f(x'). In other words, we’ll end up with x\ne x' such that f(x) = f(x'), which you might notice is precisely a failure of f to be injective.

Model solution

We first show that the relation \leq_X is a partial ordering of X. To do this, we must show that \leq_X satisfies reflexivity, anti-symmetry, and transitivity. We will go through each property one by one:

  • Reflexivity: Let x \in X. We must show that x \leq_X x. But x=x, so by definition of \leq_X we have x \leq_X x.
  • Anti-symmetry: Let x,x' \in X and suppose we have both x \leq_X x' and x' \leq_X x. To show that \leq_X is anti-symmetric, we must now show that x=x'. Suppose for the sake of contradiction that x\ne x'. Since x \leq_X x', this means that f(x) <_Y f(x') (because we’ve ruled out the possibility that x=x'), and since x' \leq_X x this means that f(x') <_Y f(x). Since \leq_Y is a partial order, this means that we have both f(x) = f(x') (via anti-symmetry of \leq_Y) as well as f(x) \ne f(x') (by definition of <_Y). This contradiction shows that we have x=x' after all, and thus \leq_X is anti-symmetric.
  • Transitivity: Let x, x', x'' \in X and suppose we have both x \leq_X x' and x' \leq_X x''. To show transitivity, we must now prove that x \leq_X x''. If x=x' or x'=x'' then we automatically have transitivity; for instance if x=x' then x' \leq_X x'' can be written as x \leq_X x'', and similarly for the other case. Thus we only need to consider the case where x\ne x' and x'\ne x''. In this case, we have both f(x) <_Y f(x') and f(x') <_Y f(x''). Now by transitivity of \leq_Y we have f(x) \leq_Y f(x''). But not only that: if f(x)=f(x'') we would have f(x'') \leq_Y f(x) which combined with f(x) <_Y f(x') would (by transitivity) show that f(x'') \leq_Y f(x'). So we would have both f(x'') \leq_Y f(x') and f(x') <_Y f(x''), so by anti-symmetry we would have f(x'')=f(x'), a contradiction of f(x') <_Y f(x''). So we must have f(x)\ne f(x'') after all, which combined with f(x) \leq_Y f(x'') means f(x) <_Y f(x''). Thus x \leq_X x''.

Now suppose that \leq_Y is a total order. We will show that this does not necessarily mean \leq_X is a total order. Any non-injective function f will do, but for concreteness, let X := \{1,2\}, let Y := \mathbf Z, and let \leq_Y be the usual ordering relation \leq on the integers. Define f : X \to Y by f(x) := 0 for all x \in X. Now \leq_X cannot order 1 and 2 because all three of the statements f(1) < f(2), f(2) < f(1), and 1=2 are false. This shows that \leq_X is not a total order. In fact, this happens anytime f is not injective, as we will presently show.

The additional assumption that f is injective is both necessary and sufficient to ensure that \leq_X makes X totally ordered. First suppose that f is injective. We want to show that \leq_X is a total ordering of X. So let x,x' \in X. We want to show that x \leq_X x' or x' \leq_X x. In other words, we must show that f(x) <_Y f(x') or f(x') <_Y f(x) or x=x'. If x=x' are done, so suppose x\ne x'. Since f is injective, this means f(x) \ne f(x'). But now by the assumption that \leq_Y is a total order, we must have f(x) \leq_Y f(x') or f(x') \leq_Y f(x). Since the option f(x)=f(x') is not available, this means that f(x) <_Y f(x') or f(x') <_Y f(x). This completes the first direction.

Now suppose that f is not injective. Then there must be elements x,x' \in X such that x\ne x' and f(x)=f(x'). Our goal is to show that \leq_X is not a total order. We cannot have x \leq_X x' since this would mean f(x) <_Y f(x') (remember, f(x)=f(x')) or x=x', and we know both are false. We also cannot have x' \leq_X x for similar reasons.

Exercise 3.3.6

Exercise statement

Let f : X \to Y be a bijective function, and let f^{-1} : Y \to X be its inverse. Verify the cancellation laws f^{-1}(f(x)) = x for all x \in X and f(f^{-1}(y)) = y for all y \in Y. Conclude that f^{-1} is also invertible, and has f as its inverse (thus (f^{-1})^{-1} = f).

Hints

None.

How to think about the exercise

On one hand, this is a pretty obvious exercise and all of the results shown in it should make sense to you. On the other hand, I think it can be tricky to stick to just the definitions and what we know about bijective functions, and be vigilant so as to not smuggle in any extra “obvious” facts about bijective functions. But as long as you are extra careful, you should just be able to straightforwardly do the exercise.

Model solution

Let x \in X be arbitrary. Now f(x) \in Y. We can define the variable y := f(x). Since f is bijective, for this y \in Y, there is exactly one x' \in X such that f(x') = y, and we denote this x' by f^{-1}(y). But clearly this x' must be the same as our x. If that’s not clear to you, you can start with f(x') = y = f(x) and use the injectivity of f to conclude x'=x. So now just simply chasing down the equalities we know about, we see that x=x'=f^{-1}(y)=f^{-1}(f(x)) (in the last equality, we’re using the substitution axiom of equality). But that’s just what we wanted to show. So we’re done with the first claim. Note that we didn’t really need the variable y at all in this part, but I think using the extra variable makes things a bit clearer.

Now let y \in Y be arbitrary (this is a new variable we are declaring, and may not have anything to do with the y in the previous paragraph). Since f is bijective, for our given y there is exactly one x \in X such that f(x) = y, and we denote this x by f^{-1}(y). In other words, x = f^{-1}(y). Now what happens if we just apply f to both sides of that equation? We get f(x) = f(f^{-1}(y)). But we also know that f(x)=y, so we have y=f(x) = f(f^{-1}(y)), which is exactly what we wanted to show, so we’re done with the second cancellation law.

Now we have to show that f^{-1} is invertible. First we show that it is injective. Let y,y' \in Y, and suppose f^{-1}(y) = f^{-1}(y'). Using the second cancellation law, we apply f to both sides and get y=y'. So f^{-1} is injective as required. Now we show it’s also surjective. Let x \in X. We need to find some y \in Y such that f^{-1}(y)=x. Why don’t we just take f(x) \in Y as our y? (Since all we have to work with are x, f, and f^{-1}, we don’t really have much choice in constructing an element of Y.) Indeed, applying the first cancellation law, we get f^{-1}(f(x))=x, so f^{-1} is surjective.

Finally, we need to show that f^{-1} has f as its inverse. Since f^{-1} is bijective, for every x \in X there is exactly one y \in Y such that f^{-1}(y)=x. By the textbook, this value of y is denoted (f^{-1})^{-1}(x). If we can show that this (f^{-1})^{-1}(x) is the same as f(x), we will have shown that (f^{-1})^{-1}=f. To do this, we simply take the equation f^{-1}(y)=x and apply f to both sides, to get f(f^{-1}(y))=f(x). By the second cancellation law, this becomes y=f(x). But we already know that y = (f^{-1})^{-1}(x), so this means (f^{-1})^{-1}(x)=f(x), and now we are done.

Exercise 7.2.2

Exercise statement

Prove Proposition 7.2.5.

Proposition 7.2.5. Let \sum_{n=m}^\infty a_n be a formal series of real numbers. Then \sum_{n=m}^\infty a_n converges if and only if, for every real number \varepsilon > 0, there exists an integer N \geq m such that

\displaystyle \left|\sum_{n=p}^q a_n\right| \leq \varepsilon for all p,q \geq N.

Hints

  1. Use Proposition 6.1.12 and Theorem 6.4.18.
  2. If you want to do this exercise correctly, it is not as easy as just applying the given Proposition and/or Theorem!

How to think about the exercise

If you look up the hints, it is quite obvious how the exercise is done. I wanted to give a thought process of how to approach the problem if one randomly saw it (i.e. without accompanying hints), so in what follows I won’t be assuming that the hints have been given.

Let’s write out the two sides of the “if and only if” by making the quantifiers explicit:

For all \varepsilon > 0 there exists N \geq m such that for all p,q \geq N we have \left|\sum_{n=p}^q a_n\right| \leq \varepsilon           (1)

The claim that \sum_{n=m}^\infty a_n converges is:

For all \varepsilon > 0 there exists N \geq m such that for all k \geq N we have \left|\sum_{n=m}^k a_n - S\right| \leq \varepsilon           (2)

Above, I have used S to denote the sum. It’s actually equal to S=\sum_{n=m}^\infty a_n.

This shows something interesting. When formalizing what it meant for a series to converge, i.e., when writing out (2), we had to refer to what the sum converges to, i.e., to S. But in (1), we did not even have to refer to the final sum! In other words, (1) encodes the claim “this series converges” without talking about what the series converges to. This sounds awfully familiar … didn’t we have a way to say “this sequence converges” without referring to the limit of the sequence? That’s right, this is sounding exactly like what Cauchy sequences are. And in fact, if our pattern-matching is active, we might even notice that in (1) we had “for all p,q \geq N” which is reminiscent of the “for all j,k \geq N” that we used a lot when dealing with Cauchy sequences (the variable names don’t matter of course).

So let’s see … if S_N = \sum_{n=m}^N a_n is the Nth partial sum of the series, then \sum_{n=p}^q a_n is basically equal to S_q - S_p. (How did I see this? It was so quick it’s hard for me to say what my mind did here. I suppose it’s pattern-matching on things I’ve seen before, e.g. the integral rule \int_a^b + \int_b^c = \int_a^c, which can be written in subtracted form as \int_a^c - \int_a^b =\int_b^c.) So \left|\sum_{n=p}^q a_n\right| \leq \varepsilon is basically saying |S_q - S_p| \leq \varepsilon. So (1) is basically saying that (S_N)_{N=m}^\infty is Cauchy. And we have exactly the theorems to show that “is Cauchy” and “is convergent” are the same for sequences.

I said “basically” above because \sum_{n=p}^q a_n = S_q - S_p only holds when q \geq p. When q < p, the sum \sum_{n=p}^q a_n = 0 whereas S_q - S_p is not necessarily zero; instead it’s equal to -\sum_{n=q+1}^p a_n. And actually, I’ve made an off-by-one error: \sum_{n=p}^q a_n should actually be S_q - S_{p-1}. So this exercise is perhaps not as simple we thought! The first challenge was to figure out the “big idea” or strategy to use (i.e., recognize that Cauchyness is relevant), and the second challenge is now to make our idea actually work by getting all of the details right. Let’s write out all the variations of the claims we are working with, to get things straight:

(S_N)_{N=m}^\infty is Cauchy           (a)

For all \varepsilon > 0 there exists N \geq m such that for all p,q \geq N we have |S_q - S_p| \leq \varepsilon           (b)

For all \varepsilon > 0 there exists N \geq m such that for all p,q \geq N we have |S_q - S_{p-1}| \leq \varepsilon           (c)

For all \varepsilon > 0 there exists N \geq m such that for all p,q \geq N we have \left|\sum_{n=p}^q a_n\right| \leq \varepsilon           (d)

Our goal is to show that (a) and (d) are the same (why do we want to show both directions? It’s because the original Proposition is stated as an “if and only if”). That will allow us to use the fact that a sequence is Cauchy iff it is convergent.

It’s clear that (a) and (b) are equivalent, since (b) is just what it means for a sequence to be Cauchy. So we just need to show that (b) and (c) are equivalent and that (c) and (d) are equivalent.

(c) implies (b) since if |S_q - S_{p-1}| \leq \varepsilon holds for all p,q \geq N, then since p+1 \geq N as well, we have |S_q - S_{(p+1)-1}| \leq \varepsilon, i.e., |S_q - S_p| \leq \varepsilon.

(b) implies (c) since if we have some N such that |S_q - S_p| \leq \varepsilon for all p,q \geq N, then we can pick N+1 and see that if p,q \geq N+1 then p-1,q \geq N and so |S_q - S_{p-1}| \leq \varepsilon.

(The above two paragraphs are similar to Exercise 6.1.3 and Exercise 6.1.4. In all cases, what matters is what happens at the limits so shifting things around a bit won’t change whether a statement is true.)

So now we just have to show that (c) and (d) are equivalent. Being very mindful of off-by-one errors now, if q \geq p we have \sum_{n=p}^q a_n = S_q - S_{p-1}, and if q < p instead we have \sum_{n=p}^q a_n = 0 and

\begin{aligned}S_q - S_{p-1} &= (a_1 + \cdots + a_q) - (a_1 + \cdots + a_q + a_{q+1} + \cdots + a_p) \\ &= -(a_{q+1} + \cdots + a_p) \\ &= -\sum_{n=q+1}^p a_n\end{aligned}

So whatever p,q are, we have \left|\sum_{n=p}^q a_n\right|\leq |S_q - S_{p-1}|. This means (c) implies (d): if the bigger thing is less than \varepsilon, then so is the smaller thing. To show that (d) implies (c) as well, let \varepsilon > 0. Since (d) is true, we have some N; pick this N. Now let p,q \geq N. If q \geq p, we have \sum_{n=p}^q a_n = S_q - S_{p-1} so we have |S_q - S_{p-1}| = \left|\sum_{n=p}^q a_n\right| \leq \varepsilon. If q < p then from (d) we have \left|\sum_{n=q}^p a_n\right| \leq \varepsilon. But we know that S_q - S_{p-1} = -\sum_{n=q+1}^p a_n, so actually we want to show that \left|\sum_{n=q+1}^p a_n\right| \leq \varepsilon. But this is possible since q+1 \geq N. So in other words we have

\displaystyle |S_q - S_{p-1}| = \left|\sum_{n=q+1}^p a_n\right| \leq \varepsilon

That was a lot of fiddly details, but we’re finally done.

Model solution

Let S_N = \sum_{n=m}^N a_n be the Nth partial sum of the series \sum_{n=m}^\infty a_n. By definition of series convergence, \sum_{n=m}^\infty a_n converges iff the sequence (S_N)_{N=m}^\infty converges.

We will show that (S_N)_{N=m}^\infty is Cauchy if and only if, for every \varepsilon > 0 there exists an integer N \geq m such that for all p,q \geq N we have \left|\sum_{n=p}^q a_n \right| \leq \varepsilon. If we can do this, then Theorem 6.4.18 will guarantee the result that we are trying to show.

First suppose that (S_N)_{N=m}^\infty is Cauchy. Let \varepsilon > 0. Since (S_N)_{N=m}^\infty is Cauchy, this means that there exists an integer N \geq m such that for all p,q \geq N we have |S_q - S_p| \leq \varepsilon.  Consider the number N+1. If p,q \geq N+1, then q, p-1 \geq N. Now we have two cases. If q \geq p, then the sum \sum_{n=p}^q a_n is equal to S_q - S_{p-1} (Lemma 7.1.4(a)), so we have \left|\sum_{n=p}^q a_n\right| = |S_q - S_{p-1}| \leq \varepsilon by Cauchyness. If on the other hand q < p, then \left|\sum_{n=p}^q a_n\right| = 0 < \varepsilon (Definition 7.1.1). Let us summarize: we let \varepsilon > 0, then found a number N+1 \geq m such that if p,q \geq N+1 then \left|\sum_{n=p}^q a_n\right| \leq \varepsilon. So we’ve completed the first direction of the proof.

Now suppose that for every \varepsilon > 0 there exists an integer N \geq m such that for all p,q \geq N we have \left|\sum_{n=p}^q a_n \right| \leq \varepsilon. Let \varepsilon > 0. From our assumption, we are given some N \geq m, so pick this N. Now let p,q \geq N. We have \left|\sum_{n=p}^q a_n \right| \leq \varepsilon, and we want to show |S_q - S_p| \leq \varepsilon. We have three cases:

  • If q=p then |S_q - S_p| = 0 < \varepsilon.
  • Next, if q > p then q \geq p+1 and we have \sum_{n=p+1}^q a_n = S_q - S_p by Lemma 7.1.4(a). And since q,p+1 \geq N we can say that \left|\sum_{n=p+1}^q a_n \right| \leq \varepsilon. But this means |S_q - S_p| \leq \varepsilon as required.
  • Finally, if q < p then q+1 \leq p and we have S_q - S_p = -(S_p - S_q) = -\sum_{n=q+1}^p a_n. But since q+1, p \geq N we have \left|\sum_{n=q+1}^p a_n\right| \leq \varepsilon, so |S_q - S_p| = |S_p - S_q| = \left|\sum_{n=q+1}^p a_n\right| \leq \varepsilon as required.

In all three cases we have shown that |S_q - S_p| \leq \varepsilon so we are done.

Exercise 6.6.4

Exercise statement

Prove Proposition 6.6.5. (Note that one of the two implications has a very short proof.)

Proposition 6.6.5 (Subsequences related to limits). Let (a_n)_{n=0}^\infty be a sequence of real numbers, and let L be a real number. Then the following two statements are logically equivalent (each one implies the other):

(a) The sequence (a_n)_{n=0}^\infty converges to L.
(b) Every subsequence of (a_n)_{n=0}^\infty converges to L.

Hints

  1. For one of the directions, use Lemma 6.6.4.

How to think about the exercise

This is a pretty straightforward exercise. I could leave it at that, but I want to try something new. So here’s a kind of stream-of-consciousness account of how I approached the problem:

We want to show (a) and (b) are the same. So first suppose (a) is true. We want to show (b) is true. What does that mean? Let’s unwrap:

Let (a_n)_{n=0}^\infty be a sequence of real numbers, and suppose (a_n)_{n=0}^\infty converges to L \in \mathbf R. Now we want to show that every subsequence of (a_n)_{n=0}^\infty converges to L. It would be good to have an arbitrary subsequence and to give it a name. So let (b_n)_{n=0}^\infty be a subsequence of (a_n)_{n=0}^\infty. What does this mean? By definition, this means that there is a strictly increasing function f : \mathbf N \to \mathbf N such that b_n = a_{f(n)} for all n \in \mathbf N.

Now, we want to show that (b_n)_{n=0}^\infty converges to L. How do we do that? To show that any sequence converges at all, the usual thing to do is to let \varepsilon > 0 be arbitrary. Now we have to find some N such that n \geq N implies |b_n - L| \leq \varepsilon. How do we find such an N? The only possible way is to use some fact about (a_n)_{n=0}^\infty, because that’s the only sequence we know anything about. We know that (a_n)_{n=0}^\infty converges to L. That means that for any \varepsilon > 0 (including the one we defined earlier), there is N such that n \geq N implies |a_n - L| \leq \varepsilon. Okay, that is pretty close to what we want. Since we don’t know any other N to use, let’s just pick this N that we know exists, for now. We know that:

if n \geq N then |a_n - L|\leq \varepsilon. (*)

We want to show that:

if n \geq N then |b_n - L|\leq \varepsilon;

or, making the relation between (a_n)_{n=0}^\infty and (b_n)_{n=0}^\infty clear, we want to show that:

if n \geq N then |a_{f(n)} - L|\leq \varepsilon.

So in other words, assuming n \geq N, we have |a_n - L|\leq \varepsilon and we want |a_{f(n)} - L|\leq \varepsilon. Can we get what we want? We could if we knew that f(n) \geq N (because of (*)). In other words, we want to know if n \geq N implies f(n) \geq N. One way this could be true is if f(n) \geq n, so that f(n) \geq n \geq N. (Here I have “cheated” a bit. I have seen the pattern “f(n) \geq n” many times before, so it was immediately obvious to me that I should use it. I tried to give an account of how one might have reached the same conclusion without having known the pattern ahead of time, but I don’t know if I’ve succeeded.) This seems pretty plausibly true. The identity function seems like the slowest-growing strictly increasing function (it just counts up one by one), but other strictly increasing functions can be faster-growing (they can skip some numbers). It seems like an easy case of induction. It’s clear that f(0) \geq 0, and if f(n) \geq n then since f(n+1) > f(n), we have f(n+1) > n, so f(n+1) \geq n+1. So that completes the proof that (a) implies (b).

Now we do the other direction. Suppose every subsequence of (a_n)_{n=0}^\infty converges to L. We want to show that (a_n)_{n=0}^\infty converges to L. Since we know something about every subsequence, it seems like we ought to pick some clever particular subsequence to get what we want – oh, we can just pick (a_n)_{n=0}^\infty to be the subsequence!

Model solution

First we show that (a) implies (b). Suppose the sequence (a_n)_{n=0}^\infty converges to L. We want to show that every subsequence of (a_n)_{n=0}^\infty converges to L. So let (b_n)_{n=0}^\infty be a subsequence of (a_n)_{n=0}^\infty. This means there is some strictly increasing function f : \mathbf N \to \mathbf N such that b_n = a_{f(n)} for all n \in \mathbf N. We will show that (b_n)_{n=0}^\infty converges to L. Let \varepsilon > 0. Since (a_n)_{n=0}^\infty converges to L, there exists some N \geq 0 such that:

  • for all n\geq N we have |a_n - L| \leq \varepsilon. (*)

Pick this N, and let n \geq N. We want to show that |b_n - L| \leq \varepsilon. If we can show that f(n) \geq n for all n \in \mathbf N, then we would have f(n) \geq n \geq N, so by (*) we have |a_{f(n)} - L| \leq \varepsilon. By definition of (b_n)_{n=0}^\infty this means we have |b_n - L| \leq \varepsilon as required.

So it remains to show that f(n) \geq n for all n \in \mathbf N. We proceed by induction. The base case f(0) \geq 0 follows since the range of f is \mathbf N. Now suppose inductively that f(n) \geq n. Since f is strictly increasing, we have f(n+1) > f(n). Thus we have f(n+1) > n. By Proposition 2.2.12(e) we have f(n+1) \geq n+1 as required. This closes the induction.

Next we show that (b) implies (a). This follows immediately from Lemma 6.6.4: (a_n)_{n=0}^\infty is a subsequence of itself, so if every subsequence of (a_n)_{n=0}^\infty converges to L then in particular (a_n)_{n=0}^\infty must converge to L.

Exercise 6.6.1

Exercise statement

Let (a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty, and (c_n)_{n=0}^\infty be sequences of real numbers. Then (a_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty. Furthermore, if (b_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty, and (c_n)_{n=0}^\infty is a subsequence of (b_n)_{n=0}^\infty, then (c_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty.

Hints

  1. You may need to use induction depending on how careful you want to be.

How to think about the exercise

This exercise is almost mechanically straightforward (i.e. requires just unwrapping the definitions and figuring out the right functions to use), but there is one complication, which is that Tao decided to use a “simpler” definition of a strictly increasing function on the natural numbers. Because of this definition, we have to do a little bit of extra work to show the transitivity result.

The induction is even more complicated by the fact that the statement we want to prove by induction, when phrased in a natural way, starts at 1 rather than 0, but the book does not prove that induction works with an arbitrary base case. I’ve split that part off into a lemma to make the proof here look conceptually cleaner.

Model solution

First we show that (a_n)_{n=0}^\infty is a subsequence of itself. The function f : \mathbf N \to \mathbf N defined by f(n) := n (i.e. the identity function on the set of natural numbers) is strictly increasing because f(n+1) = n+1 > n = f(n). We also have a_n = a_{f(n)} for each n \in \mathbf N. Thus (a_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty.

Now suppose that (b_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty, and that (c_n)_{n=0}^\infty is a subsequence of (b_n)_{n=0}^\infty. We want to show that (c_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty. Since (b_n)_{n=0}^\infty is a subsequence of (a_n)_{n=0}^\infty, there exists a function g : \mathbf N \to \mathbf N which is strictly increasing such that b_n = a_{g(n)} for each n \in \mathbf N. Similarly, since (c_n)_{n=0}^\infty is a subsequence of (b_n)_{n=0}^\infty, there exists a function h: \mathbf N \to \mathbf N which is strictly increasing such that c_n = b_{h(n)} for each n \in \mathbf N. By composing g and h, we can produce a function which selects elements out of (a_n)_{n=0}^\infty that make up the sequence (c_n)_{n=0}^\infty. Concretely, we have c_n = b_{h(n)} = a_{g(h(n))} = a_{(g\circ h)(n)}.

It remains to verify that g\circ h : \mathbf N \to \mathbf N is a strictly increasing function. We first show that for a strictly increasing function g : \mathbf N \to \mathbf N, if n > m then g(n) > g(m). (This is actually the same thing as Exercise 6.1.1 but I had forgotten about that exercise when I wrote this post. So you can read along below or skip the subproof and refer to Exercise 6.1.1 instead.)

Fix a natural number m, and consider the proposition P(d) defined by “g(m+d) > g(m)”. We will use induction to show that P(d) is true for all d \geq 1.

For the base case d=1, we must show that g(m+1) > g(m), but this is true since g is a strictly increasing function.

Now let d \geq 1 and suppose inductively that P(d) is true, i.e. g(m+d) > g(m). We want to show that P(d+1) is true, i.e. g(m+d+1) > g(m). By inductive hypothesis, we have g(m+d) > g(m). Also, because g is strictly increasing, we have g((m+d) + 1) > g(m+d). So by transitivity of inequality, we have g(m+d+1) > g(m) as required. This closes the induction.

Now to actually prove the result that if n > m then g(n) > g(m), we see that n-m \geq 1 so P(n-m) is true, i.e. g(n)= g(m+(n-m)) > g(m).

With the result above, it is easy to show that g\circ h is a strictly increasing function. Since h is a strictly increasing function, we have h(n+1) > h(n) by definition. Now using the result above, we have g(h(n+1)) > g(h(n)). But this is the same thing as (g\circ h)(n+1) > (g\circ h)(n).

Induction with arbitrary base case

In this post, I will state a version of mathematical induction that uses an arbitrary natural number base case. I will assume that the reader has worked through the book up to and including section 2.2; I will also be making use of of subtracting, which is only introduced in section 4.1 (but the use of subtraction is not fundamental, and the same result can be shown using Lemma 2.2.10 instead, and replacing all mentions of n-1 with “the number a such that a{{+}\!{+}} = n”).

I’m splitting this result off as a lemma because it’s very useful for Exercise 6.6.1 but is very much not the point of Exercise 6.6.1, so I thought it would be distracting to include as part of the solution for that exercise.

Exercise statement

Lemma. Let n_0 be a natural number, and let P(n) be a property pertaining to an arbitrary natural number n. Suppose that P(n_0) is true. Suppose also that for each n\geq n_0, if P(n) is true then P(n+1) is also true. Then P(n) is true for all n \geq n_0.

Model solution 1

In this solution, we make use of the strong principle of induction. The strong principle of induction is a stronger version of induction with arbitrary base case, since it already allows an arbitrary case (and is stronger because it allows one to use more assumptions when proving the inductive case). So the idea below is to specialize/weaken the strong principle of induction.

Let n_0 be a natural number, and let P(n) be a property pertaining to an arbitrary natural number n. Suppose that P(n_0) is true. Suppose also that for each n\geq n_0, if P(n) is true then P(n+1) is also true. We must show that P(n) is true for all n \geq n_0.

We use the strong principle of induction. We must show the following implication for each n \geq n_0: if P(n') is true for all natural numbers n_0 \leq n' < n, then P(n) is also true. So let n \geq n_0 be a natural number, and suppose P(n') is true for all natural numbers n_0 \leq n' < n. We have two cases:

  • If n=n_0, then there is no natural number n' such that n_0 \leq n' < n_0, so the hypothesis of the implication is vacuously true. That means we have to show that P(n_0) is true. But we are already given that P(n_0) is true.
  • If n > n_0, then we have n-1 \geq n_0 by Proposition 2.2.12(e) (why?). So we have n_0 \leq n-1 < n. Since we assumed that P(n') is true for all natural numbers n_0 \leq n' < n, this means that in particular P(n-1) is true. Since n-1 \geq n_0, we have that P((n-1)+1) is true, i.e. P(n) is true as required.

We showed that P(n) is true in both cases, so this closes the induction.

Model solution 2

Let n_0 be a natural number, and let P(n) be a property pertaining to an arbitrary natural number n. Suppose that P(n_0) is true. Suppose also that for each n\geq n_0, if P(n) is true then P(n+1) is also true. We must show that P(n) is true for all n \geq n_0.

Define the statement Q(m) to be P(n_0+m). We will show using (ordinary) induction that Q(m) is true for all natural numbers m.

For the base case, we must show that Q(0) is true. But this is just the statement P(n_0), which we assumed to be true.

Now suppose inductively that Q(m) is true. We must show that Q(m+1) is true. Since Q(m) is true, we know that P(n_0+m) is true. By assumption about the property P, we know that P(n_0+m+1) is true. But this means Q(m+1) is true. This closes the induction.

We have now shown that Q(m) is true for all natural numbers m. Unwrapping the definition of Q, this means that P(n_0+m) is true for all m. But our goal was to show that P(n) is true for all n \geq n_0. So let n \geq n_0. By Definition 2.2.11, this means that there is some m such that n = n_0 + m. But we know that for all m the statement P(n_0+m) is true. So this means that in particular for our m, we have P(n_0+m). But since n = n_0 + m this means that P(n) as desired.