Exercise 7.2.1

Exercise statement

Is the series \sum_{n=1}^\infty (-1)^n convergent or divergent? Justify your answer. Can you now resolve the difficulty in Example 1.2.2?

Hints

None.

How to think about the exercise

This is a straightforward exercise.

Model solution

The series \sum_{n=1}^\infty (-1)^n diverges by the zero test, Corollary 7.2.6. Indeed, the limit \lim_{n\to\infty} (-1)^n does not exist as the sequence ((-1)^n)_{n=1}^\infty = (-1,1,-1,1,-1,\ldots) oscillates between -1 and 1. To prove that \lim_{n\to\infty} (-1)^n does not exist, let \varepsilon := 1. Then no matter how big N \geq 1 is, we can pick j := N and k:=N+1. We have |(-1)^j - (-1)^k| = 2 > \varepsilon. Thus the sequence is not Cauchy, so by Theorem 6.4.18 it is not convergent. (Alternatively, we could have used Lemma 6.5.2.)

The reasoning in Example 1.2.2 is not valid because the series does not converge, so variable S is not a real number and thus cannot be manipulated in the usual way using the laws of algebra.

Exercise 9.3.4

Exercise statement

Propose a definition for limit superior \limsup_{x\to x_0; x\in E} f(x) and limit inferior \liminf_{x\to x_0; x\in E} f(x), and then propose an analogue of Proposition 9.3.9 for your definition. (For an additional challenge: prove that analogue.)

Hints

None.

How to think about the exercise

Let’s recall how the limit superior was defined for sequences of real numbers. Given a sequence (a_n)_{n=0}^\infty, we first defined the auxiliary sequence (a^+_N)_{N=0}^\infty by a^+_N := \sup(a_n)_{n=N}^\infty, then we took the infimum of the sequence (which is equivalent to taking the limit, since the sequence is monotonic so always converges): \limsup_{n\to\infty} a_n := \inf(a^+_N)_{N=0}^\infty = \lim_{N \to \infty} a^+_N.

So to define a limit superior, we need to take the supremum over something (and this something must be indexed somehow), then take the limit using the index. To help us see the similarities more with the real-valued functions case, let’s rewrite the limit superior for sequences as \limsup_{n\to\infty} a_n = \lim_{N \to \infty} \sup \{a_n : n \geq N\}.

So our goal now is to figure out the set we are taking the supremum over. Which set? This set must get smaller over time, so that the limit is defined (otherwise we wouldn’t have a guarantee that the sequence is monotonic). In other words, we want to “go further” in some relevant sense. In the case of sequences, we “go further” in the limit by taking larger values of n, but in the case of functions we “go further” by taking x values closer to x_0. So the trick is to take a set that shrinks around the point x_0. In particular, we can take \{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. The smaller the value of \delta, the smaller the set. So we can define \limsup_{x\to x_0; x\in E} f(x) := \lim_{\delta \to 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. Here it is understood that when we take the limit as \delta \to 0 only positive values of \delta are considered (we could be clearer about this by writing something like \lim_{\delta \to 0; \delta > 0}). (TODO: Tao’s book doesn’t actually define divergent limits, so this part does fail in some edge cases, and so we either need to define such limits or use inf/sup. Thanks to crabman for the catch.)

(THIS IS WRONG. I WILL FIX LATER. Thanks to Karim Taha for the catch.) Now how do we state the analogue of Proposition 9.3.9? For part (a), instead of \lim_{x \to x_0; x\in E} f(x) = L we want to say \limsup_{x\to x_0; x\in E} f(x). For part (b), we can take a sequence (a_n)_{n=0}^\infty converging to x_0, but instead of saying \lim_{n\to \infty} f(a_n) = L, we can use the limit superior for sequences to say \limsup_{n\to \infty} f(a_n) = L.

If you want to think about this topic more, see the first edition of Pugh’s Real Mathematical Analysis, Chapter 3 Exercise 26, which makes the “go further” idea above more rigorous.

Model solution

We define the limit superior \limsup_{x\to x_0; x\in E} f(x) as \lim_{\delta \to 0; \delta > 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. Alternatively, using infimum instead of a limit we could define it as \inf\{\sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\} : \delta > 0\}.

(THIS IS WRONG. I WILL FIX LATER.) The analogue of Proposition 9.3.9 is as follows: Let X be a subset of \mathbf R, let f : X \to \mathbf R be a function, let E be a subset of X, let x_0 be an adherent point of E, and let L be a real number. Then the following two statements are logically equivalent:

(a) \limsup_{x\to x_0; x\in E} f(x)=L
(b) For every sequence (a_n)_{n=0}^\infty which consists entirely of elements of E, which converges to x_0, we have \limsup_{n\to\infty} f(a_n)=L.

The case for limit inferior is very similar: just interchange “superior” and “inferior” everywhere.

Exercise 8.3.3

Exercise statement

Recall from Exercise 3.6.7 that a set A is said to have lesser or equal cardinality than a set B iff there is an injective map f : A \to B from A to B. Using Exercise 8.3.2, show that if A,B are sets such that A has lesser or equal cardinality to B and B has lesser or equal cardinality to A, then A and B have equal cardinality. (This is known as the Schröder–Bernstein theorem, after Ernst Schröder (1841–1902) and Felix Bernstein (1878–1956).)

Hints

  1. Draw a picture.

How to think about the exercise

This is a quick corollary of Exercise 8.3.2. The only trick is to figure out what parts from Exercise 8.3.2 correspond to what parts of this current exercise. More precisely, we need to say what all of the objects named in Exercise 8.3.2 will be in terms of the objects introduced in Exercise 8.3.3.

Model solution 1

Let f : A \to B be an injection, and let g : B \to A be an injection. Let \tilde f : A \to f(A) be the map defined by \tilde f(x) := f(x) for all x \in A. Then \tilde f is a bijection since f is an injection. This means that \tilde f \circ g : B \to f(A) is an injection, since it is a composition of injections. Since f(A) \subseteq B, we can apply Exercise 8.3.2 (with f(A) in place of “A”, B in place of “B”, B in place of “C”, and \tilde f \circ g : B \to f(A) in place of “f : C \to A”) to produce a bijection h : f(A) \to B. Now consider h \circ \tilde f : A \to B. This is a composition of bijections so is itself a bijection.

Model solution 2

Thanks to Satira for this solution.

Let f : A \to B be an injection, and let g : B \to A be an injection. Since f and g are both injections, their composition f \circ g : B \to B is also an injection. This means that there is a bijection B \to f\circ g(B). By Exercise 8.3.2 (with f\circ g(B) in place of “A”, f(A) in place of “B”, B in place of “C”, and the bijection B \to f\circ g(B) in place of “f : C \to A”) there is a bijection f \circ g(B) \to f(A). Now we have a bijection f \circ g(B) \to f(A) and a bijection B \to f\circ g(B), so we can combine these to get a bijection f(A) \to B. But f provides a bijection A \to f(A) so we can combine these to obtain a bijection A \to B.

Here is a summary of all of the bijections, where each arrow represents a bijection: A \to f(A) \leftarrow f\circ g(B) \leftarrow B. Since the inverse of a bijection is also a bijection, we can traverse the arrows from A to B to obtain a bijection.

Exercise 8.3.2

Exercise statement

Let A,B,C be sets such that A \subseteq B \subseteq C, and suppose that there is an injection f : C \to A. Define the sets D_0, D_1, D_2, \ldots recursively by setting D_0 := B\setminus A, and then D_{n+1} := f(D_n) for all natural numbers n. Prove that the sets D_0, D_1, \ldots are all disjoint from each other (i.e., D_n \cap D_m = \emptyset whenever n \ne m). Also show that if g : A \to B is the function defined by setting g(x) := f^{-1}(x) when x \in \bigcup_{n=1}^\infty D_n, and g(x) := x when x \notin \bigcup_{n=1}^\infty D_n, then g does indeed map A to B and is a bijection between the two. In particular, A and B have the same cardinality.

Hints

  1. Draw a picture.
  2. You don’t lose much by taking C=B, and this simplifies the picture.
  3. Beware that Tao’s notation is non-standard here: typically in setting up the Schröder–Bernstein theorem we are given injections f : A \to B and g : B \to A (with no subset relation assumed between A and B) and are asked to define a bijection h : A \to B. In Tao’s book, the usual f is just the inclusion map \iota_{A\to B} : A\to B (see Exercise 3.3.8 for a definition), the usual g : B\to A is instead called f: C \to A, and the usual bijection h : A \to B is instead called g : A \to B. If you are referring to other expositions as you work on this exercise, you may want to temporarily convert Tao’s notation to match the notation in these other expositions.

How to think about the exercise

This is a pretty tough exercise, if you really want to understand what is going on instead of just bashing out a proof.

To simplify the exposition, let us assume that C = B. At the end of this section we will show how to adjust in the case where we only assume B \subseteq C, and in the model solution below we will revert to only assuming B \subseteq C.

To set the stage, let’s draw a picture of two sets A, B with A \subseteq B:

(Throughout this post, I will be copying from Richard Hammack’s exposition the idea of using suggestive shapes for the two sets as well as the idea generally of visually representing the sets by embedding them within each other.)

We define D_0 := B \setminus A:

We have an injection f : C \to A, i.e. f : B \to A since we assumed C=B. This means that we can put a copy f(B) \subseteq A of the set B inside A:

Notice two things: first the shape of f(B) is the same as the shape of B. This is because we assumed that f is injective, hence it doesn’t “deform” the shape by mapping multiple starting points onto the same point. Second, f(B) does not take up all of A, since we didn’t assume that f is a bijection.

Now we define D_1 := f(D_0). Since D_0 \subseteq B, we have f(D_0) \subseteq f(B). Thus we should see a copy of the green stuff within the copy of B that we produced:

We can keep going further with this same process. Our copy of B (namely f(B)) has a copy of A within it (namely f(A)) so using f we can place a copy of our copy of B (namely f(f(B))) inside our copy of A (namely f(A)):

By the injectivity of f, this “copy of a copy” of B again has a shape similar to that of the original B. The green stuff that is D_1 is embedded within f(B), so it also gets copied inside of f(f(B)). We can define D_2 := f(D_1) and see this as the third layer of green stuff in the picture:

Hopefully by this point you get the idea. As we make more and more copies of B within copies of A within copies of B, we accumulate more and more green stuff. The union of all of the green stuff is \bigcup_{n=0}^\infty D_n, and the white stuff we are left with is the complement, B \setminus \bigcup_{n=0}^\infty D_n.

Having all of these copies within copies is all nice and trippy, but how does this help us define a bijection g : A \to B? What do we even want, in terms of the picture we have drawn? We want to take each point in A and map it to some point in B in such a way that each point in B corresponds to exactly one point in A. The trick to doing this is the following: each point colored in white can stay put, and each point colored in green can “climb up” one level to hit a green point in an outer level:

In other words, the points in D_1 map to D_0 via f^{-1}, the points in D_2 map to D_1 via f^{-1}, and so on (that’s all the green stuff “climbing up”), while the points in the rest of A just stay where they are (that’s all the white stuff). If you look at the picture, I hope you can see that no matter which point in B you pick, there will be exactly one point in A that maps to it.

Now that we have the idea, how do we write down the formal definition of this bijection g : A \to B? The green stuff is \bigcup_{n=0}^\infty D_n, so you might be tempted to write the following:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=0}^\infty D_n \\ x & \text{otherwise}\end{cases}

But actually, what is D_0? We defined D_0 := B\setminus A, so the points in D_0 aren’t even in the domain of g! This means we don’t need to include D_0 in the union, so we can instead define:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}

with the union index starting at 1. In the end, starting the union at 0 or 1 does not change the behavior of g: we never encounter the case where x \in D_0 because A \cap D_0 is empty.

That concludes the intuition for the construction outlined in the exercise statement. Here are a few other points I want to make:

  • Why do we need to show that the sets D_0, D_1, D_2, \ldots are all disjoint from each other? It is not necessary to show this in order to prove the bijectivity of g. My understanding is that it is just an interesting additional fact one can prove about this setup, and also justifies drawing the separate green parts in a way that looks disjoint.
  • Why do we assume that f : C \to A is an injection rather than a bijection (as was originally the case in the 1st and 2nd editions of the book)? I haven’t thought about this in detail. I think it makes sense to use the weaker assumption of injectivity if bijectivity is not required, especially since the Schröder–Bernstein theorem only assumes injectivity.
  • What happens when we only assume B \subseteq C rather than C = B as above? The pictures above change slightly: now there is a bigger set C surrounding both B and A. Otherwise the pictures don’t change. In the proof there is slightly more work since we must verify that when we use f^{-1} in the definition of g, the output lands in B instead of C.
  • Since f is assumed to be an injection rather than a bijection, what does it even mean to write f^{-1}(x) for some x \in A? For an injection f : C \to A, the map \tilde f : C \to f(C) is a bijection (why?), so \tilde f^{-1} : f(C) \to C is a (bijective) function. So whenever x \in f(C) we can define f^{-1}(x) := \tilde f^{-1}(x). If x \notin f(C) we leave f^{-1}(x) undefined. This means that in the model solution below, when we define g, we must verify that if x \in \bigcup_{n=1}^\infty D_n then x \in f(C).
  • Why did Tao assume that A \subseteq B instead of allowing A, B to be unrelated sets (like in most expositions of the Schröder–Bernstein theorem)? Assuming A \subseteq B simplifies the drawing: if A,B are unrelated sets, one would need to draw two pictures instead of one (like on this page). By making A a subset of B, there is just one picture, and there is a simple rule for defining the bijection g: if you’re in a white area, stay put, and if you’re in a green area, “climb up” one level to take up a larger amount of space in a bijective way.

Finally, I want to consider a particularly simple case of the construction outlined in this exercise: what if the sets A,B,C are all finite? If there is an injection f:C\to A, that means \#(C) \leq \#(A) by Exercise 3.6.7. But since A \subseteq C, we also have \#(A) \leq \#(C) by Proposition 3.6.14(c). Thus \#(A) = \#(C). If A \ne C then we would have \#(A) < \#(C) by Proposition 3.6.14(c) again, a contradiction. Thus we must have A = C. Similarly we have \#(A) \leq \#(B) and \#(B) \leq \#(C) = \#(A) so \#(A) = \#(B). Since A \subseteq B, we can use Proposition 3.6.14(c) to conclude again that A=B. Thus in the finite case, all three sets must be the same. So we can try something like A=B=C=\{1,2,3\}. Then D_0 = B\setminus A = \emptyset, and D_1 = f(D_0) = f(\emptyset) = \emptyset, and so on, so that \bigcup_{n=1}^\infty D_n = \emptyset as well. This means that in the definition of g, we only ever encounter the case where g(x)=x. This does indeed give a bijection, since the identity map \iota_{A\to A} is a bijection.

Model solution

First we show that the sets D_0, D_1, \ldots are all disjoint from each other. We do this by showing the following statement: for every natural number n and every positive integer k, the sets D_n and D_{n+k} are disjoint. Once we have proved this, given two distinct natural numbers, we can let m be the larger one so that m > n. Then we can write m = n + k for some positive integer k and see that D_n and D_m = D_{n+k} are disjoint.

To prove the statement from above, let k be a positive integer. We will induct on n. For the base case, n=0, we must show that D_0 and D_k are disjoint. But D_0 = B \setminus A and D_k = f^k(D_0)\subseteq A so these two sets are disjoint: if x \in D_0 \cap D_k we would have x \notin A and x \in A, a contradiction. Now suppose inductively that D_n and D_{n+k} are disjoint. We need to show that D_{n+1} and D_{n+1+k} are disjoint. Suppose for sake of contradiction that there is some x \in D_{n+1} \cap D_{n+1+k}. Then since D_{n+1} = f(D_n) and D_{n+1+k} = D_{n+k+1} = f(D_{n+k}) there exist z \in D_n and z' \in D_{n+k} such that x = f(z) and x = f(z'). But f is injective so f(z) = x = f(z') implies z=z'. Since D_n and D_{n+k} are disjoint, this is impossible. This closes the induction.

Now define g : A \to B as in the exercise statement:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}

We first verify that g maps from A to B. The things to check are:

  • f^{-1}(x) is defined for each x \in \bigcup_{n=1}^\infty D_n: the value f^{-1}(x) is only defined when x \in f(C), so we must verify that if x \in \bigcup_{n=1}^\infty D_n then x \in f(C). So suppose x \in \bigcup_{n=1}^\infty D_n. Then x \in D_{n+1} for some natural number n, so we have x \in D_{n+1} = f(D_n) \subseteq f(C) as required.
  • If x \in \bigcup_{n=1}^\infty D_n then g(x) := f^{-1}(x) lands in B: We have x \in f(D_n) for some natural number n, so x = f(z) for some z \in D_n. Thus f^{-1}(x) = f^{-1}(f(z)) = z \in D_n. But D_0 = B\setminus A \subseteq B and D_m \subseteq A \subseteq B for each positive m, so in any case we have z \in B as required.
  • If x \notin \bigcup_{n=1}^\infty D_n then g(x) := x lands in B: In this case, we still have x \in A since x is in the domain of g, which means x \in B since A \subseteq B.

Thus g maps from A to B.

Now we show that g is injective. Let x, y \in A. We must show that g(x) = g(y) implies x=y. So suppose g(x) = g(y). We have three cases:

  • Both x and y are in \bigcup_{n=1}^\infty D_n: We have f^{-1}(x) = f^{-1}(y). Since f is injective, f^{-1} : f(C) \to C is a bijection. Thus whenever f^{-1} is defined, we can assume it acts bijectively. We thus have x=y.
  • Neither x nor y is in \bigcup_{n=1}^\infty D_n: By definition of g we have g(x)=x and g(y)=y so g(x)=g(y) means x=y.
  • Exactly one of the two elements is in \bigcup_{n=1}^\infty D_n: We can suppose x \in \bigcup_{n=1}^\infty D_n and y \notin \bigcup_{n=1}^\infty D_n (we can relabel x and y and repeat the same steps if it is the other way around). We have g(x) = f^{-1}(x) and g(y)=y so f^{-1}(x) = y. But x \in D_{n+1} for some natural number n, so x \in f^{n+1}(B\setminus A). Thus f^{-1}(x) \in f^n(B \setminus A). This means f^{-1}(x) is either in B\setminus A or is in \bigcup_{n=1}^\infty D_n. But y is in A (as it is in the domain of g) so it can’t be in B\setminus A, and y \notin \bigcup_{n=1}^\infty D_n by hypothesis of this case, so we have a contradiction. This means that this case actually isn’t possible so we don’t need to worry about it.

Now we show that g is surjective. To do this, let y \in B. We must find some x \in A such that g(x) = y. We have two cases, y \in \bigcup_{n=0}^\infty D_n or y \notin \bigcup_{n=0}^\infty D_n.

  • If y \in \bigcup_{n=0}^\infty D_n then y \in D_n for some natural number n so f(y) \in f(D_n), which means that f(y) \in \bigcup_{n=1}^\infty D_n. Thus if we let x := f(y) then we have g(x) = g(f(y)) = f^{-1}(f(y)) = y.
  • On the other hand, if y \notin \bigcup_{n=0}^\infty D_n then y is also not in the smaller set \bigcup_{n=1}^\infty D_n. Thus for x := y we have g(x) = g(y) = y.

In each case we have found some x \in A such that g(x) = y, so g is surjective.

Acknowledgments: Thanks to Satira for having extensive discussions with me about this exercise.

Exercise 6.4.1

Exercise statement

Prove Proposition 6.4.5.

Proposition 6.4.5 (Limits are limit points). Let (a_n)_{n=m}^\infty be a sequence which converges to a real number c. Then c is a limit point of (a_n)_{n=m}^\infty, and in fact it is the only limit point of (a_n)_{n=m}^\infty.

Hints

None.

How to think about the exercise

This is a straightforward exercise so I don’t have anything to say.

Model solution 1

Let (a_n)_{n=m}^\infty be a sequence which converges to a real number c. To show that c is a limit point of this sequence, let \varepsilon > 0 be a real number and let N' \geq m be an integer. Since the sequence converges to c, we know that there is some N \geq m such that |a_n - c| \leq \varepsilon for every n \geq N. We want to find some n \geq N' such that |a_n - c| \leq \varepsilon. We can take n := N+N'. Then we see that n \geq N', and since n \geq N we have |a_n-c|\leq \varepsilon as required.

Next we show that c is the only limit point of the sequence. Suppose for sake of contradiction that there there is another limit point c' \ne c. Put \varepsilon := |c-c'|/3. Since c\ne c', we know that \varepsilon is positive. Since the sequence converges to c, there is some N \geq m such that |a_n-c|\leq \varepsilon for all n\geq N. So fix this N. Since c' is a limit point, there exists an n \geq N such that |a_n - c'| \leq \varepsilon. Fix this n. Since n \geq N, we also know that |a_n-c|\leq \varepsilon. But now we have

|c-c'| = |c-a_n+a_n-c'| \leq |c-a_n| + |a_n-c'| \leq 2|c-c'|/3

which is a contradiction since |c-c'| > 0.

Model solution 2

Here’s an alternative way to prove the second part of this exercise.

We want to show that c is the only limit point of the sequence. Let c' be a limit point (not necessarily distinct from c), and let \varepsilon > 0 be arbitrary. Since the sequence converges to c, there is some N \geq m such that |a_n - c| \leq \varepsilon/2 for all n \geq N. Fix this N. Since N \geq m and c' is a limit point, there is some n \geq N such that |a_n - c'| \leq \varepsilon/2. Fix this n. Since n \geq N and the sequence converges to c, we have |a_n - c| \leq \varepsilon/2. Thus we have both |a_n-c'| \leq \varepsilon/2 and |a_n - c| \leq \varepsilon/2. But this means |c-c'| = |c-a_n+a_n-c'|\leq |c-a_n| + |a_n-c'| \leq \varepsilon. Since \varepsilon > 0 was arbitrary, we have just shown that |c-c'| \leq \varepsilon for every \varepsilon > 0. By Exercise 5.4.7 this means that c=c'.

Exercise 3.3.1

Exercise statement

Show that the definition of equality in Definition 3.3.7 is reflexive, symmetric, and transitive. Also verify the substitution property: if f, \tilde f: X \to Y and g, \tilde g: Y \to Z are functions such that f = \tilde f and g = \tilde g, then g \circ f = \tilde g \circ \tilde f.

Hints

None.

How to think about the exercise

This is a straightforward exercise. I do want to mention a couple of things though:

  • Definition 3.3.7 assumes that there is a notion of equality on elements of the range. If there was no such notion, then the statement f(x)=g(x) that appears in the definition wouldn’t make sense.
  • The exercise further assumes that the notion of equality on elements of the range obey the properties of an equivalence relation (reflexive, symmetric, transitive) as well as the substitution axiom; these four properties are called the axioms of equality in Appendix A.7. There is no way to complete this exercise without assuming these properties.

Model solution

To show the definition is reflexive, let f:X \to Y be a function. We must show f = f. Let x \in X. Then f(x) = f(x) by reflexivity of equality on elements of Y. Since x was arbitrary, this shows that f=f.

To show the definition is symmetric, let f : X \to Y and g : X \to Y be two functions with the same domain and range. Suppose f=g. This means that f(x)=g(x) for all x \in X. To show that g=f, we must show that g(x)=f(x) for all x \in X. So let x \in X. Then since f=g we have f(x)=g(x). By the symmetry of equality on the elements of Y, we have g(x)=f(x). Since x was arbitrary, this shows that g=f.

To show the definition is transitive, let f:X \to Y, g:X\to Y, and h:X\to Y be functions with the same domain and range. Suppose f=g and g=h. We must show that f=h. So let x \in X. Then we have f(x)=g(x) since f=g. We also have g(x)=h(x) since g=h. By transitivity of equality on elements of Y, we have f(x)=h(x). Since x was arbitrary, this shows that f=h.

Now let f, \tilde f: X \to Y and g, \tilde g: Y \to Z be functions such that f = \tilde f and g = \tilde g. We want to show g \circ f = \tilde g \circ \tilde f. Let x \in X. Then since f = \tilde f we have f(x) = \tilde f(x). By the substitution axiom of equality for elements of Y (see Appendix A.7), we have g(f(x)) = g(\tilde f(x)). Since g = \tilde g, we have g(\tilde f(x)) = \tilde g(\tilde f(x)). Now we have (g\circ f)(x) = g(f(x)) = g(\tilde f(x)) = \tilde g(\tilde f(x)) = (\tilde g \circ \tilde f)(x) using symmetry and transitivity of equality on elements of Z. Since x \in X was arbitrary, this shows that g \circ f = \tilde g \circ \tilde f as required.

Exercise 6.3.4

Exercise statement

Explain why Proposition 6.3.10 fails when x > 1. In fact, show that the sequence (x^n)_{n=1}^\infty diverges when x > 1. Compare this with the argument in Example 1.2.3; can you now explain the flaws in the reasoning in that example?

Hints

  1. Prove by contradiction and use the identity (1/x)^n x^n = 1 and the limit laws in Theorem 6.1.19.

How to think about the exercise

One thing that’s good to do when you’re trying to understand why a theorem fails in a particular case is to run through the proof of the theorem and locate the first step where the proof doesn’t make sense. This point comes early in the proof of Proposition 6.3.10 in the book. In particular, the sequence (x^n)_{n=1}^\infty is not decreasing when x > 1.

Model solution

Let x > 1. Suppose for sake of contradiction that \lim_{n\to\infty} x^n = L for some real number L, i.e. that the sequence (x^n)_{n=1}^\infty converges. Since x > 1, we have 0 < 1/x < 1. Thus by Proposition 6.3.10 we have \lim_{n\to\infty} (1/x)^n = 0. By the limit laws (Theorem 6.1.19(b)) we thus have \lim_{n\to\infty} ((1/x)^n x^n) = \left(\lim_{n\to\infty} (1/x)^n\right) \left(\lim_{n\to\infty} x^n\right) = 0L = 0. But (1/x)^n x^n = 1 by Proposition 4.3.10(a), which we can use thanks to Proposition 5.6.3. Thus we can compute \lim_{n\to\infty} ((1/x)^n x^n) = \lim_{n\to\infty} 1 = 1. This is a contradiction, and completes the proof.

Example 1.2.3 assumes that the limit \lim_{n\to\infty} x^n exists for all real x, which is false. If we restrict to 0 < x < 1 then indeed we have \lim_{n\to\infty} x^n = 0 by Proposition 6.3.10.

Exercise 7.3.2

Exercise statement

Prove Lemma 7.3.3.

Lemma 7.3.3 (Geometric series). Let x be a real number. If |x| \geq 1, then the series \sum_{n=0}^\infty x^n is divergent. If however |x| < 1, then the series is absolutely convergent and

\displaystyle \sum_{n=0}^\infty x^n = 1/(1-x).

Hints

  1. For the first part, use the zero test.
  2. For the second part, first use induction to establish the geometric series formula \sum_{n=0}^N x^n = (1-x^{N+1})/(1-x) and then apply Lemma 6.5.2.

How to think about the exercise

In the case where |x| < 1, Tao suggests to “use induction”. I think this is a pretty unhelpful hint. It nudges you toward a particular unenlightening kind of induction proof where you just blindly go through the motions – it is a totally valid proof, and will work without any trouble, but I just don’t like it. Here I want to walk through a better proof.

Let N be a natural number. The Nth partial sum of the series is

\displaystyle S_N := \sum_{n=0}^N x^n = 1 + x + x^2 + \cdots + x^{N-1} + x^N

The big insight is to notice that if you multiply this by x, you almost get the (N+1)th partial sum, S_{N+1}. In fact, we have

\displaystyle 1 + xS_N = 1 + x + x^2 + x^3 + \cdots + x^N + x^{N+1} = S_{N+1}

But for any series, the difference between the (N+1)th partial sum and the Nth partial sum is just the (N+1)th term. In other words, we have S_{N+1} - S_N = x^{N+1}. Substituting the identity S_{N+1} = 1 + xS_N from above, we have

x^{N+1} = S_{N+1} - S_N = (1 + xS_N) - S_N = 1 + (x-1)S_N

Rearranging, we obtain the desired formula

\displaystyle S_N = \frac{1 - x^{N+1}}{1-x}

Dividing by 1-x is valid since |x| < 1 hence x\ne 1.

The above wasn’t entirely rigorous because we used a bunch of “\cdots” to make it easier to see the pattern. But it is very simple to convert this into a rigorous proof, which we will do below in the model solution.

Model solution

First suppose |x| \geq 1. Consider the terms of this series 1, x, x^2, x^3, \ldots. By Lemma 6.5.2, this sequence either converges to 1 or else diverges. In either case, the sequence does not converge to zero, so by the zero test (Corollary 7.2.6) the series \sum_{n=0}^\infty x^n diverges.

Now suppose |x| < 1. We will show the geometric series formula \sum_{n=0}^N x^n = (1-x^{N+1})/(1-x). Write S_N for the Nth partial sum. Then we have

\displaystyle \begin{aligned}1 + xS_N &= 1 + x\sum_{n=0}^N x^n \\ &= 1 + \sum_{n=0}^N x^{n+1} \\ &= 1 + \sum_{n=1}^{N+1} x^n \\ &= \sum_{n=0}^{N+1} x^n \\ &= S_{N+1}\end{aligned}

We also have S_{N+1} - S_N = x^{N+1} for any series. Combining these two equalities and rearranging, we have S_N = (1-x^{N+1})/(1-x) as desired.

To show that the series \sum_{n=0}^\infty x^n is convergent, we directly compute the limit \lim_{N\to\infty} S_N. By Lemma 6.5.2 we have \lim_{N\to\infty} x^{N+1} = 0. Thus by the limit laws we have

\displaystyle \lim_{N\to\infty} S_N = \lim_{N\to\infty} \frac{1 - x^{N+1}}{1-x} = \frac{1 - \lim_{N\to\infty} x^{N+1}}{1-x} = \frac{1}{1-x}

as desired.

Finally we show that when |x| < 1 the series is absolutely convergent. We must show that the series \sum_{n=0}^\infty |x^n| converges. By Proposition 4.3.10(d) (which we can use thanks to Proposition 5.6.3), we have |x^n| = |x|^n. Thus it suffices to show that the series \sum_{n=0}^\infty |x|^n converges. But this is just a geometric series with the common ratio |x|, so the series converges (to \sum_{n=0}^\infty |x|^n = 1/(1-|x|)) by our proof above.

Exercise 7.2.4

Exercise statement

Prove Proposition 7.2.9.

Proposition 7.2.9 (Absolute convergence test). Let \sum_{n=m}^\infty a_n be a formal series of real numbers. If this series is absolutely convergent, then it is also conditionally convergent. Furthermore, in this case we have the triangle inequality

\displaystyle \left|\sum_{n=m}^\infty a_n\right| \leq \sum_{n=m}^\infty |a_n|

Hints

  1. Use Proposition 7.2.5.
  2. Use Proposition 7.1.4(e).

How to think about the exercise

In the proof below, we use Proposition 7.2.5 twice: once to go from the fact that \sum_{n=m}^\infty a_n absolutely converges to an inequality, and a second time to go from an inequality to the fact that \sum_{n=m}^\infty a_n conditionally converges. Make sure you introduce \varepsilon > 0 at the start of the proof, before you use Proposition 7.2.5 for the first time.

Model solution

Suppose \sum_{n=m}^\infty a_n is absolutely convergent. We want to show this series is conditionally convergent. To do so, we will use Proposition 7.2.5, which is also known as Cauchy’s convergence test. So let \varepsilon > 0 be a real number. Since \sum_{n=m}^\infty a_n is absolutely convergent, this means that \sum_{n=m}^\infty |a_n| is a convergent series. Thus there exists an integer N \geq m such that \left|\sum_{n=p}^q |a_n|\right| \leq \varepsilon for all p,q \geq N. The quantity \sum_{n=p}^q |a_n| is non-negative for all p,q (this is an easy induction), so in fact we have \left|\sum_{n=p}^q |a_n|\right| = \sum_{n=p}^q |a_n| which means that \sum_{n=p}^q |a_n| \leq \varepsilon for all p,q \geq N.

By the triangle inequality of Proposition 7.1.4(e), we have \left|\sum_{n=p}^q a_n\right| \leq \sum_{n=p}^q |a_n| for all p,q. Thus we have found an N\geq m such that for all p,q \geq N we have

\displaystyle \left|\sum_{n=p}^q a_n\right| \leq \sum_{n=p}^q |a_n| \leq \varepsilon

which proves convergence of \sum_{n=m}^\infty a_n.

It remains to show the triangle inequality, \left|\sum_{n=m}^\infty a_n\right| \leq \sum_{n=m}^\infty |a_n|. Let \alpha_N := \left|\sum_{n=m}^N a_n\right| be the absolute value of the Nth partial sum of the series \sum_{n=m}^\infty a_n, and let \beta_N := \sum_{n=m}^N |a_n| be the Nth partial sum of the series \sum_{n=m}^\infty |a_n|. By Proposition 7.1.4(e), we have \alpha_N \leq \beta_N for each N\geq m. Thus by the comparison principle, Lemma 6.4.13, we have the inequality \limsup_{N\to\infty} \alpha_N \leq \limsup_{N\to\infty} \beta_N.

We proved above that the series \sum_{n=m}^\infty a_n converges. Thus by the limit laws, Theorem 6.1.19 (c) and (g) (and the fact that |x| = \max(x, -x) for all real numbers x), we know that the sequence (\alpha_N)_{N=m}^\infty converges to \left|\sum_{n=m}^\infty a_n\right| (i.e. we know that the series converges, so this means the sequence of partial sums converges, which means by the limit laws that the sequence of absolute values of the partial sums also converges). Thus by Proposition 6.4.12(f) we have \limsup_{N\to\infty} \alpha_N = \lim_{N\to\infty} \alpha_N.

Similarly, by the hypothesis that \sum_{n=m}^\infty a_n is absolutely convergent, we know that the sequence (\beta_N)_{N=m}^\infty converges. Thus \limsup_{N\to\infty} \beta_N = \lim_{N\to\infty} \beta_N.

Combining these equalities with the inequality from above, we have

\displaystyle \left|\sum_{n=m}^\infty a_n\right| = \lim_{N\to\infty} \alpha_N \leq \lim_{N\to\infty} \beta_N = \sum_{n=m}^\infty |a_n|

as desired.

Exercise 3.2.1

Exercise statement

Show that the universal specification axiom, Axiom 3.8, if assumed to be true, would imply Axioms 3.2, 3.3, 3.4, 3.5, and 3.6. (If we assume that all natural numbers are objects, we also obtain Axiom 3.7.) Thus, this axiom, if permitted, would simplify the foundations of set theory tremendously (and can be viewed as one basis for an intuitive model of set theory known as “naive set theory”). Unfortunately, as we have seen, Axiom 3.8 is “too good to be true”!

Hints

None.

How to think about the exercise

This is a pretty straightforward exercise, so I don’t have anything to say.

Model solution

Axiom 3.2: Let P(x) be any statement pertaining to an object x that is always false, for instance the statement x \ne x or the statement 0=1. (You might not like using x\ne x because equality might not be defined for all objects, but typically in formal systems equality is defined for all objects and has the properties listed in Appendix A.7.) Then by Axiom 3.8 the set \{x : P(x)\text{ is true}\} exists. In other words, we have y \in \{x : P(x)\text{ is true}\} if and only if P(y) is true. Since P(y) is always false, this means that y \in \{x : P(x)\text{ is true}\} is also always false, which means the set does not contain any elements, i.e. for every object y we have y \notin \{x : P(x)\text{ is true}\}. This is exactly what Axiom 3.2 claims, so we are done.

Axiom 3.3: Let a be an object, and consider the statement P(x) defined as x=a. By Axiom 3.8, the set \{x : x=a\} exists. Thus the singleton set \{a\} exists. If a,b are objects, we can consider the statement Q(x) defined as “x=a or x=b”. Then by Axiom 3.8 the set \{x : x=a\text{ or }x=b\} exists, which is the pair set \{a,b\}. This proves Axiom 3.3.

Axiom 3.4: Let A,B be two sets. Consider the statement P(x) defined as “x\in A or x\in B”. Then by Axiom 3.8 the set \{x : P(x)\text{ is true}\} exists. Thus we have y \in \{x : P(x)\text{ is true}\} \iff (y \in A \text{ or }y \in B), which means \{x : P(x)\text{ is true}\} is the union of A and B.

Axiom 3.5: Let A be a set, and let P(x) be a statement pertaining to an object x \in A. Let Q(x) be the statement “x \in A and P(x) is true”. Then by Axiom 3.8 the set \{x : Q(x)\text{ is true}\} exists. Thus we have y \in \{x : Q(x)\text{ is true}\} \iff (y \in A \text{ and }P(y)\text{ is true}). This is what we mean by the set \{x \in A : P(x)\}, so this proves Axiom 3.5.

Axiom 3.6: Let A be a set, and let P(x,y) be a statement pertaining to x,y such that for each x \in A there is at most one y for which P(x,y) is true. Let Q(y) be the statement “P(x,y) is true for some x \in A”. Then by Axiom 3.8, the set \{y : Q(y) \text{ is true}\} exists. Thus we have z \in \{y : Q(y) \text{ is true}\} iff Q(z) is true iff P(x,z) is true for some x \in A. This is precisely what is meant by the set \{y : P(x,y) \text{ is true for some } x \in A\}, so this proves Axiom 3.6.

Axiom 3.7: Suppose all natural numbers are objects. Let P(x) be the statement “x is a natural number”. Then the set \{x : P(x)\text{ is true}\} exists by Axiom 3.8. This set is \mathbf N.

Create your website at WordPress.com
Get started