Exercise 3.1.11

Exercise statement

Show that the axiom of replacement implies the axiom of specification.

Hints

None.

How to think about the exercise

What does it mean for one axiom to imply another? In this case, both axioms are about constructing sets, so we just want to show that any set that we can construct using the axiom of specification can be constructed using the axiom of replacement instead. The axiom of specification allows us to construct sets of the form \{x \in A : P(x) \text{ is true}\}. So our goal is, given some set A and some property (predicate) P, to construct the set \{x \in A : P(x) \text{ is true}\} using the axiom of replacement.

To use the axiom of replacement, we must feed it a two-place property and a set. The set can just be A: we want to produce some subset of A, so it makes sense that we start with A and do some “replacement” on its elements somehow. So that leaves us to determine what the two-place property will be. Let’s call that property Q, since the variable P is already taken. Once we define what Q is, the axiom of replacement allows us to construct the set \{y : Q(x,y) \text{ is true for some }x \in A\}.

How do we decide what Q to use? Well, we want to keep only those elements x of A such that P(x) is true. So it makes sense to require P(x) as part of Q(x,y). And the elements of this new set should be the same elements from A, so we don’t want to apply any transformation to get the ys. So we can just let y = x. In other words, we can define Q(x,y) to be “y=x and P(x)” (it’s fine to say “y=x and P(y)” as well). Thus we have the set \{y : y=x\text{ and }P(x)\text{ for some }x \in A\}.

There is another way to look at this exercise, assuming you are comfortable with functions. The axiom of replacement basically says that if A is a set and f is an operation on elements of A, then \{f(x) : x \in A\} is a set. Here the operation f may return an undefined result (because for each x, the statement P(x,y) is true for at most one y rather than exactly one y). So to construct the set \{x \in A : P(x) \text{ is true}\}, we can define f(x) to be x if P(x) is true, and leave f(x) undefined if P(x) is false.

If you feel uncomfortable about f having undefined values, see Exercise 3.4.7, and also note that we can do everything with functions also: we split into cases depending on whether the resulting set will be empty. If A is empty or P(x) is false for all x\in A then the resulting set will be empty so we don’t need to do anything to construct it. Otherwise, since A is non-empty and P(x) is true for at least one x \in A, we can pick some element x_0 \in A for which P(x_0) is true and let this element be the default output of f for inputs we want to exclude. In other words, f : A \to A is the function defined by:

\displaystyle f(x) := \begin{cases}x & \text{if }P(x)\text{ is true} \\ x_0 & \text{otherwise}\end{cases}

But the discussion involving f is a bit informal, as we have not introduced functions at this point in the book. But nevertheless this way of looking at this problem provides good intuition.

Model solution

Let A be a set, and let P(x) be a statement pertaining to objects x \in A. To show that the axiom of replacement implies the axiom of specification, we will construct the set \{x \in A : P(x) \text{ is true}\} using just the axiom of replacement. Let Q(x,y) be the statement “y=x and P(x)”. To use the axiom of replacement, we must verify that for each x \in A, the statement Q(x,y) is true for at most one y. But for each x\in A, there is exactly one y such that y=x, so it follows that for the full statement of Q(x,y) there is at most one y satisfying the statement. The axiom of replacement thus allows us to construct the set \{y : y=x\text{ and }P(x)\text{ for some }x \in A\}. We claim that this is the same set as \{x \in A : P(x) \text{ is true}\}. We show the inclusion both ways:

  • Suppose z \in \{x \in A : P(x) \text{ is true}\}. Thus z \in A and P(z) is true. But this means that z=x and P(x) for some x \in A; specifically, there is z \in A such that z=z and P(z). Thus z \in \{y : y=x\text{ and }P(x)\text{ for some }x \in A\}.
  • Suppose z \in \{y : y=x\text{ and }P(x)\text{ for some }x \in A\}. This means that for some x \in A, we have z=x and P(x). Since z=x this means we have P(z) and z\in A as well. Thus z \in \{x \in A : P(x) \text{ is true}\}.

Since both directions of the inclusion hold, this means that the two sets are equal.

Exercise 8.3.5

Exercise statement

Show that no power set (i.e., a set of the form 2^X for some set X) can be countably infinite.

Hints

  1. You do not need the axiom of choice for this exercise (though of course if you have read Section 8.4 you may use it).

How to think about the exercise

I think the proof below is pretty straightforward, and I don’t have too much to say beyond that.

Model solution

We have three cases depending on the cardinality of X. If X is finite, then by Exercise 8.3.1 the power set 2^X has finite cardinality, so we are done. If X is countably infinite, then by Cantor’s theorem (Theorem 8.3.1) we see that 2^X cannot be countably infinite, so we are done. Thus it remains to show the result when X is uncountable, so let X be an uncountable set.

Let us introduce some notation. We write |A| \leq |B| to mean that there is an injection from A to B. The intuition for this notation comes from Exercise 8.3.3. The relation \leq has the transitivity property that if |A| \leq |B| and |B| \leq |C|, then |A| \leq |C|: if f : A \to B is an injection and g : B \to C is an injection, then g \circ f : A \to C is an injection by Exercise 3.3.2. (The relation \leq is anti-symmetric thanks to Exercise 8.3.3, and it is also reflexive because the identity map \mathrm{id} : A \to A is injective, so it is a reflexive, anti-symmetric, and transitive relation, i.e. a partial order. Thus we are justified in using the suggestive notation \leq.)

We have |X| \leq |2^X| because f : X \to 2^X defined by f(x) := \{x\} is an injection.

Now suppose for sake of contradiction that 2^X is countably infinite. Then there is a bijection between 2^X and \mathbf N, so in particular we have |2^X| \leq |\mathbf N|. We also showed above that |X| \leq |2^X|. So by transitivity of \leq, we have |X| \leq |\mathbf N|. Thus there is some injection g : X \to \mathbf N. Now consider the image g(X), which is a subset of \mathbf N. By Corollary 8.1.6, g(X) is at most countable. Define the function h : X \to g(X) by h(x) := g(x). This is a bijection: it is injective because g is injective, and it is surjective because g(X) = \{g(x) : x \in X\} is exactly the set of elements that g maps to. The bijection h shows that X has equal cardinality to g(X), which is at most countable. So X is at most countable, which contradicts the assumption that X is uncountable. This contradiction shows that 2^X cannot be countably infinite, which completes the proof.

Exercise 10.1.1

Exercise statement

Suppose that X is a subset of \mathbf R, x_0 is a limit point of X, and f : X \to \mathbf R is a function which is differentiable at x_0. Let Y \subseteq X be such that x_0 \in Y, and x_0 is also a limit point of Y. Prove that the restricted function f|_Y : Y \to \mathbf R is also differentiable at x_0, and has the same derivative as f at x_0. Explain why this does not contradict the discussion in Remark 10.1.2.

Hints

  1. Use Definition 9.3.6.

How to think about the exercise

This is a simple exercise and is a matter of putting together the right definitions.

Model solution

We want to show that

\displaystyle \lim_{y \to x_0; y \in Y\setminus\{x_0\}} \frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} = f'(x_0)

To do this, we will return to the definition of a limit, Definition 9.3.6. To show the limit exists, we must first verify that x_0 is an adherent point of Y\setminus\{x_0\}. But this is the case since x_0 is a limit point of Y. Now  let \varepsilon > 0. We want to find some \delta > 0 such that for all y \in Y\setminus\{x_0\}, if |y-x_0| < \delta then \left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon.

How can we find such a \delta? The only information we are given from which we could find a \delta is the fact that f is differentiable at x_0. Since f is differentiable at x_0, we know that for our \varepsilon > 0 in particular, there exists some \delta > 0 such that for all x \in X \setminus \{x_0\}, if |x-x_0| < \delta then \left|\frac{f(x) - f(x_0)}{x-x_0} - f'(x_0)\right| \leq \varepsilon.

So let’s use this \delta > 0. Now that we have a \delta, we must show that all y \in Y\setminus\{x_0\}, if |y-x_0| < \delta then \left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon. So let y \in Y\setminus\{x_0\}, and suppose |y-x_0| < \delta. We must show that \left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon.

Now Y \subseteq X, so this means Y \setminus \{x_0\} \subseteq X \setminus \{x_0\}. Thus we have y \in X\setminus \{x_0\}.

We know from the differentiability condition for f at x_0 that for all x \in X \setminus \{x_0\}, if |x-x_0| < \delta then \left|\frac{f(x) - f(x_0)}{x-x_0} - f'(x_0)\right| \leq \varepsilon. Since our y satisfies these conditions, we have \left|\frac{f(y) - f(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon.

To complete the proof, recall how f|_Y is defined. If y' \in Y, then f|_Y(y') := f(y'). Since y,x_0 \in Y, we have f|_Y(y) = f(y) and f|_Y(x_0) = f(x_0). Thus we have \left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right|=\left|\frac{f(y) - f(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon.

This does not contradict the discussion in Remark 10.1.2 because in the example there, 3 was not an adherent point of [1,2], so the limit was undefined. In contrast for this exercise we assumed that x_0 was an adherent point of Y \setminus \{x_0\} (i.e. a limit point of Y). And so if we used the example in Remark 10.1.2, the proof above would fail at the point where we tried to verify that x_0 is an adherent point of Y\setminus\{x_0\}.

The important point is that if x_0 were not a limit point of Y, then we could always pick \delta > 0 small enough so that the expression \frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} is undefined when y is further restricted to the set \{y \in Y \setminus \{x_0\} : |y - x_0| < \delta\}. This would mean that the limit is always defined no matter what L we use (since the implication in the definition of limit would be vacuous), so that in effect the derivative would equal any number possible. This would make the notation f|_Y'(x_0) ambiguous, and this concept is useless anyway, which is why we choose to not define the limit in this case.

Exercise 11.1.1

Exercise statement

Prove Lemma 11.1.4.

Lemma 11.1.4. Let X be a subset of the real line. Then the following two statements are logically equivalent:

(a) X is bounded and connected.
(b) X is a bounded interval.

Hints

  1. In order to show that (a) implies (b) in the case when X is non-empty, consider the supremum and infimum of X.

How to think about the exercise

This exercise is pretty straightforward, but one must be careful to stick to the definitions instead of using one’s intuitive notion of intervals; the whole point of the exercise is to make sure our formalization of ideas like “bounded” and “connected” and “interval” is correct. An example of what I mean: it seems like Tao defines “bounded interval” and then later on defines “bounded set”. Even though “bounded interval” contains the word “bounded”, the book doesn’t seem to show that bounded intervals are indeed bounded in the sense of bounded set. So part of the point of this exercise is to demonstrate this, and to see that nothing unexpected has happened.

Model solution

We first show that (b) implies (a). Suppose that X is a bounded interval (see Examples 9.1.3 for definition). Thus X is of the form (a,b), [a,b], (a,b], or [a,b) for real numbers a,b \in \mathbf R. We will just do one of the cases, namely the case X=[a,b), as the proof in the other cases is very similar. We first show that X is bounded. We claim that M := \max(|a|, |b|) + 1 is a bound (the +1 is a technicality, since Definition 9.1.22 requires M > 0 for some odd reason). Indeed, if x \in X then we have a \leq x < b by definition of the interval notation. Thus we have

-M < -\max(|a|, |b|) \leq -|a| \leq a \leq x < b \leq |b| \leq \max(|a|, |b|) < M

Thus we have x \in [-M,M]. Since x \in X was arbitrary, this shows that X \subseteq [-M, M] as required, so X is bounded.

Next we show that X is connected. Let x,y be elements of X such that x < y. We must show that [x,y] is a subset of X. Let z be a number such that x \leq z \leq y. We must show that z \in X. But since x \in X we have a \leq x, and since y \in X we have y < b. Thus by transitivity of inequalities we have a \leq x \leq z \leq y < b, so z \in [a, b) = X. This proves that X is connected.

Now we show that (a) implies (b). Suppose X is bounded and connected. We have two cases. Suppose first that X is empty. Then X is equal to an empty bounded interval such as (1,1).

Now suppose X is non-empty. Then define a := \inf(X) and b := \sup(X). Since X is a bounded and non-empty set, by Theorem 5.5.9 both a and b are real numbers. We have four cases depending on whether a \in X and b \in X. Since the proof is similar in each case, we will just show the case when a \in X and b \notin X. In this case, we will show that X equals the bounded interval [a, b). To do this, we will show that [a,b) \subseteq X and X \subseteq [a,b). We first show that [a,b) \subseteq X. Let x \in [a,b). Since x < b and b is the least upper bound for X, there exists some y \in X such that x < y (otherwise x would be a smaller upper bound for X). Now we have a,y \in X such that a \leq x < y; since X is connected, this means that x \in X as desired.

Now we show that X \subseteq [a, b). Let x \in X. Since a is the infimum of X and b is the supremum of X, we know that a \leq x \leq b. Thus to show that x \in [a,b) we just need to show that x \ne b. But this is easy, since we assumed that b \notin X.

Exercise 6.5.2

Exercise statement

Prove Lemma 6.5.2.

Lemma 6.5.2. Let x be a real number. Then the limit \lim_{n\to\infty} x^n exists and is equal to zero when |x| < 1, exists and is equal to 1 when x=1, and diverges when x=-1 or when |x| > 1.

Hints

  1. Use Proposition 6.3.10, Exercise 6.3.4, and the squeeze test.

How to think about the exercise

This is a straightforward exercise, but it requires some care in two respects:

  • It can be a little notationally confusing dealing with all of the absolute value signs so it is easy to think one has proved something when there is in fact a slight flaw in the proof.
  • If one wants to write up the proof in the shortest way possible, thinking of which cases to use can be tricky.

Model solution

First suppose 0 < |x| < 1. Since we have -|x|^n = -|x^n| \leq x^n \leq |x^n| = |x|^n for each n, we can use the squeeze test: by Proposition 6.3.10 we know that \lim_{n\to\infty} |x|^n = 0, so by the limit laws we have \lim_{n\to\infty} -|x|^n = 0 as well. This means that the sequence (x^n)_{n=1}^\infty must also converge to zero.

Next suppose x=1. Then we have x^n = 1^n = 1 for all n, so (x^n)_{n=1}^\infty is the constant sequence, which converges to 1. A similar proof can be given when x=0; in this case (x^n)_{n=1}^\infty is constantly zero, so converges to zero.

If x=-1, the limit \lim_{n\to\infty} (-1)^n does not exist as the sequence ((-1)^n)_{n=1}^\infty = (-1,1,-1,1,-1,\ldots) oscillates between -1 and 1. To prove that \lim_{n\to\infty} (-1)^n does not exist, let \varepsilon := 1. Then no matter how big N \geq 1 is, we can pick j := N and k:=N+1. We have |(-1)^j - (-1)^k| = 2 > \varepsilon. Thus the sequence is not Cauchy, so by Theorem 6.4.18 it is not convergent.

Finally suppose |x| > 1. Suppose for sake of contradiction that (x^n)_{n=1}^\infty converges to some limit L \in \mathbf R. Then by the limit laws, \lim_{n\to\infty} |x^n| = \lim_{n\to\infty} \max(x^n, -x^n) = \max(L,-L) = |L|. But |x^n| = |x|^n, so this means that (|x|^n)_{n=1}^\infty converges to |L|. This contradicts Exercise 6.3.4.

Exercise 7.2.1

Exercise statement

Is the series \sum_{n=1}^\infty (-1)^n convergent or divergent? Justify your answer. Can you now resolve the difficulty in Example 1.2.2?

Hints

None.

How to think about the exercise

This is a straightforward exercise.

Model solution 1

Let (a_n)_{n=1}^\infty be the sequence defined by a_n := 1 for all n \geq 1. This sequence is non-negative and decreasing, since a_n = 1 \geq 0 and a_n = 1 \geq 1 = a_{n+1} for every n \geq 1. We also have \lim_{n\to\infty} a_n = 1 \ne 0. Thus we can apply the alternating series test (Proposition 7.2.12) to conclude that the series \sum_{n=1}^\infty (-1)^n a_n = \sum_{n=1}^\infty (-1)^n diverges.

The reasoning in Example 1.2.2 is not valid because the series does not converge, so variable S is not a real number and thus cannot be manipulated in the usual way using the laws of algebra.

(Thanks to William for this solution.)

Model solution 2

The series \sum_{n=1}^\infty (-1)^n diverges by the zero test, Corollary 7.2.6. Indeed, the limit \lim_{n\to\infty} (-1)^n does not exist as the sequence ((-1)^n)_{n=1}^\infty = (-1,1,-1,1,-1,\ldots) oscillates between -1 and 1. To prove that \lim_{n\to\infty} (-1)^n does not exist, let \varepsilon := 1. Then no matter how big N \geq 1 is, we can pick j := N and k:=N+1. We have |(-1)^j - (-1)^k| = 2 > \varepsilon. Thus the sequence is not Cauchy, so by Theorem 6.4.18 it is not convergent. (Alternatively, we could have used Lemma 6.5.2.)

The reasoning in Example 1.2.2 is not valid because the series does not converge, so variable S is not a real number and thus cannot be manipulated in the usual way using the laws of algebra.

Exercise 9.3.4

Exercise statement

Propose a definition for limit superior \limsup_{x\to x_0; x\in E} f(x) and limit inferior \liminf_{x\to x_0; x\in E} f(x), and then propose an analogue of Proposition 9.3.9 for your definition. (For an additional challenge: prove that analogue.)

Hints

None.

How to think about the exercise

Let’s recall how the limit superior was defined for sequences of real numbers. Given a sequence (a_n)_{n=0}^\infty, we first defined the auxiliary sequence (a^+_N)_{N=0}^\infty by a^+_N := \sup(a_n)_{n=N}^\infty, then we took the infimum of the sequence (which is equivalent to taking the limit, since the sequence is monotonic so always converges): \limsup_{n\to\infty} a_n := \inf(a^+_N)_{N=0}^\infty = \lim_{N \to \infty} a^+_N.

So to define a limit superior, we need to take the supremum over something (and this something must be indexed somehow), then take the limit using the index. To help us see the similarities more with the real-valued functions case, let’s rewrite the limit superior for sequences as \limsup_{n\to\infty} a_n = \lim_{N \to \infty} \sup \{a_n : n \geq N\}.

So our goal now is to figure out the set we are taking the supremum over. Which set? This set must get smaller over time, so that the limit is defined (otherwise we wouldn’t have a guarantee that the sequence is monotonic). In other words, we want to “go further” in some relevant sense. In the case of sequences, we “go further” in the limit by taking larger values of n, but in the case of functions we “go further” by taking x values closer to x_0. So the trick is to take a set that shrinks around the point x_0. In particular, we can take \{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. The smaller the value of \delta, the smaller the set. So we can define \limsup_{x\to x_0; x\in E} f(x) := \lim_{\delta \to 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. Here it is understood that when we take the limit as \delta \to 0 only positive values of \delta are considered (we could be clearer about this by writing something like \lim_{\delta \to 0; \delta > 0}). (TODO: Tao’s book doesn’t actually define divergent limits, so this part does fail in some edge cases, and so we either need to define such limits or use inf/sup. Thanks to crabman for the catch.)

(THIS IS WRONG. I WILL FIX LATER. Thanks to Karim Taha for the catch.) Now how do we state the analogue of Proposition 9.3.9? For part (a), instead of \lim_{x \to x_0; x\in E} f(x) = L we want to say \limsup_{x\to x_0; x\in E} f(x). For part (b), we can take a sequence (a_n)_{n=0}^\infty converging to x_0, but instead of saying \lim_{n\to \infty} f(a_n) = L, we can use the limit superior for sequences to say \limsup_{n\to \infty} f(a_n) = L.

If you want to think about this topic more, see the first edition of Pugh’s Real Mathematical Analysis, Chapter 3 Exercise 26, which makes the “go further” idea above more rigorous.

Model solution

We define the limit superior \limsup_{x\to x_0; x\in E} f(x) as \lim_{\delta \to 0; \delta > 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}. Alternatively, using infimum instead of a limit we could define it as \inf\{\sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\} : \delta > 0\}.

(THIS IS WRONG. I WILL FIX LATER.) The analogue of Proposition 9.3.9 is as follows: Let X be a subset of \mathbf R, let f : X \to \mathbf R be a function, let E be a subset of X, let x_0 be an adherent point of E, and let L be a real number. Then the following two statements are logically equivalent:

(a) \limsup_{x\to x_0; x\in E} f(x)=L
(b) For every sequence (a_n)_{n=0}^\infty which consists entirely of elements of E, which converges to x_0, we have \limsup_{n\to\infty} f(a_n)=L.

The case for limit inferior is very similar: just interchange “superior” and “inferior” everywhere.

Exercise 8.3.3

Exercise statement

Recall from Exercise 3.6.7 that a set A is said to have lesser or equal cardinality than a set B iff there is an injective map f : A \to B from A to B. Using Exercise 8.3.2, show that if A,B are sets such that A has lesser or equal cardinality to B and B has lesser or equal cardinality to A, then A and B have equal cardinality. (This is known as the Schröder–Bernstein theorem, after Ernst Schröder (1841–1902) and Felix Bernstein (1878–1956).)

Hints

  1. Draw a picture.

How to think about the exercise

This is a quick corollary of Exercise 8.3.2. The only trick is to figure out what parts from Exercise 8.3.2 correspond to what parts of this current exercise. More precisely, we need to say what all of the objects named in Exercise 8.3.2 will be in terms of the objects introduced in Exercise 8.3.3.

Model solution 1

Let f : A \to B be an injection, and let g : B \to A be an injection. Let \tilde f : A \to f(A) be the map defined by \tilde f(x) := f(x) for all x \in A. Then \tilde f is a bijection since f is an injection. This means that \tilde f \circ g : B \to f(A) is an injection, since it is a composition of injections. Since f(A) \subseteq B, we can apply Exercise 8.3.2 (with f(A) in place of “A”, B in place of “B”, B in place of “C”, and \tilde f \circ g : B \to f(A) in place of “f : C \to A”) to produce a bijection h : f(A) \to B. Now consider h \circ \tilde f : A \to B. This is a composition of bijections so is itself a bijection.

Model solution 2

Thanks to Satira for this solution.

Let f : A \to B be an injection, and let g : B \to A be an injection. Since f and g are both injections, their composition f \circ g : B \to B is also an injection. This means that there is a bijection B \to f\circ g(B). By Exercise 8.3.2 (with f\circ g(B) in place of “A”, f(A) in place of “B”, B in place of “C”, and the bijection B \to f\circ g(B) in place of “f : C \to A”) there is a bijection f \circ g(B) \to f(A). Now we have a bijection f \circ g(B) \to f(A) and a bijection B \to f\circ g(B), so we can combine these to get a bijection f(A) \to B. But f provides a bijection A \to f(A) so we can combine these to obtain a bijection A \to B.

Here is a summary of all of the bijections, where each arrow represents a bijection: A \to f(A) \leftarrow f\circ g(B) \leftarrow B. Since the inverse of a bijection is also a bijection, we can traverse the arrows from A to B to obtain a bijection.

Exercise 8.3.2

Exercise statement

Let A,B,C be sets such that A \subseteq B \subseteq C, and suppose that there is an injection f : C \to A. Define the sets D_0, D_1, D_2, \ldots recursively by setting D_0 := B\setminus A, and then D_{n+1} := f(D_n) for all natural numbers n. Prove that the sets D_0, D_1, \ldots are all disjoint from each other (i.e., D_n \cap D_m = \emptyset whenever n \ne m). Also show that if g : A \to B is the function defined by setting g(x) := f^{-1}(x) when x \in \bigcup_{n=1}^\infty D_n, and g(x) := x when x \notin \bigcup_{n=1}^\infty D_n, then g does indeed map A to B and is a bijection between the two. In particular, A and B have the same cardinality.

Hints

  1. Draw a picture.
  2. You don’t lose much by taking C=B, and this simplifies the picture.
  3. Beware that Tao’s notation is non-standard here: typically in setting up the Schröder–Bernstein theorem we are given injections f : A \to B and g : B \to A (with no subset relation assumed between A and B) and are asked to define a bijection h : A \to B. In Tao’s book, the usual f is just the inclusion map \iota_{A\to B} : A\to B (see Exercise 3.3.8 for a definition), the usual g : B\to A is instead called f: C \to A, and the usual bijection h : A \to B is instead called g : A \to B. If you are referring to other expositions as you work on this exercise, you may want to temporarily convert Tao’s notation to match the notation in these other expositions.

How to think about the exercise

This is a pretty tough exercise, if you really want to understand what is going on instead of just bashing out a proof.

To simplify the exposition, let us assume that C = B. At the end of this section we will show how to adjust in the case where we only assume B \subseteq C, and in the model solution below we will revert to only assuming B \subseteq C.

To set the stage, let’s draw a picture of two sets A, B with A \subseteq B:

(Throughout this post, I will be copying from Richard Hammack’s exposition the idea of using suggestive shapes for the two sets as well as the idea generally of visually representing the sets by embedding them within each other.)

We define D_0 := B \setminus A:

We have an injection f : C \to A, i.e. f : B \to A since we assumed C=B. This means that we can put a copy f(B) \subseteq A of the set B inside A:

Notice two things: first the shape of f(B) is the same as the shape of B. This is because we assumed that f is injective, hence it doesn’t “deform” the shape by mapping multiple starting points onto the same point. Second, f(B) does not take up all of A, since we didn’t assume that f is a bijection.

Now we define D_1 := f(D_0). Since D_0 \subseteq B, we have f(D_0) \subseteq f(B). Thus we should see a copy of the green stuff within the copy of B that we produced:

We can keep going further with this same process. Our copy of B (namely f(B)) has a copy of A within it (namely f(A)) so using f we can place a copy of our copy of B (namely f(f(B))) inside our copy of A (namely f(A)):

By the injectivity of f, this “copy of a copy” of B again has a shape similar to that of the original B. The green stuff that is D_1 is embedded within f(B), so it also gets copied inside of f(f(B)). We can define D_2 := f(D_1) and see this as the third layer of green stuff in the picture:

Hopefully by this point you get the idea. As we make more and more copies of B within copies of A within copies of B, we accumulate more and more green stuff. The union of all of the green stuff is \bigcup_{n=0}^\infty D_n, and the white stuff we are left with is the complement, B \setminus \bigcup_{n=0}^\infty D_n.

Having all of these copies within copies is all nice and trippy, but how does this help us define a bijection g : A \to B? What do we even want, in terms of the picture we have drawn? We want to take each point in A and map it to some point in B in such a way that each point in B corresponds to exactly one point in A. The trick to doing this is the following: each point colored in white can stay put, and each point colored in green can “climb up” one level to hit a green point in an outer level:

In other words, the points in D_1 map to D_0 via f^{-1}, the points in D_2 map to D_1 via f^{-1}, and so on (that’s all the green stuff “climbing up”), while the points in the rest of A just stay where they are (that’s all the white stuff). If you look at the picture, I hope you can see that no matter which point in B you pick, there will be exactly one point in A that maps to it.

Now that we have the idea, how do we write down the formal definition of this bijection g : A \to B? The green stuff is \bigcup_{n=0}^\infty D_n, so you might be tempted to write the following:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=0}^\infty D_n \\ x & \text{otherwise}\end{cases}

But actually, what is D_0? We defined D_0 := B\setminus A, so the points in D_0 aren’t even in the domain of g! This means we don’t need to include D_0 in the union, so we can instead define:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}

with the union index starting at 1. In the end, starting the union at 0 or 1 does not change the behavior of g: we never encounter the case where x \in D_0 because A \cap D_0 is empty.

That concludes the intuition for the construction outlined in the exercise statement. Here are a few other points I want to make:

  • Why do we need to show that the sets D_0, D_1, D_2, \ldots are all disjoint from each other? It is not necessary to show this in order to prove the bijectivity of g. My understanding is that it is just an interesting additional fact one can prove about this setup, and also justifies drawing the separate green parts in a way that looks disjoint.
  • Why do we assume that f : C \to A is an injection rather than a bijection (as was originally the case in the 1st and 2nd editions of the book)? I haven’t thought about this in detail. I think it makes sense to use the weaker assumption of injectivity if bijectivity is not required, especially since the Schröder–Bernstein theorem only assumes injectivity.
  • What happens when we only assume B \subseteq C rather than C = B as above? The pictures above change slightly: now there is a bigger set C surrounding both B and A. Otherwise the pictures don’t change. In the proof there is slightly more work since we must verify that when we use f^{-1} in the definition of g, the output lands in B instead of C.
  • Since f is assumed to be an injection rather than a bijection, what does it even mean to write f^{-1}(x) for some x \in A? For an injection f : C \to A, the map \tilde f : C \to f(C) is a bijection (why?), so \tilde f^{-1} : f(C) \to C is a (bijective) function. So whenever x \in f(C) we can define f^{-1}(x) := \tilde f^{-1}(x). If x \notin f(C) we leave f^{-1}(x) undefined. This means that in the model solution below, when we define g, we must verify that if x \in \bigcup_{n=1}^\infty D_n then x \in f(C).
  • Why did Tao assume that A \subseteq B instead of allowing A, B to be unrelated sets (like in most expositions of the Schröder–Bernstein theorem)? Assuming A \subseteq B simplifies the drawing: if A,B are unrelated sets, one would need to draw two pictures instead of one (like on this page). By making A a subset of B, there is just one picture, and there is a simple rule for defining the bijection g: if you’re in a white area, stay put, and if you’re in a green area, “climb up” one level to take up a larger amount of space in a bijective way.

Finally, I want to consider a particularly simple case of the construction outlined in this exercise: what if the sets A,B,C are all finite? If there is an injection f:C\to A, that means \#(C) \leq \#(A) by Exercise 3.6.7. But since A \subseteq C, we also have \#(A) \leq \#(C) by Proposition 3.6.14(c). Thus \#(A) = \#(C). If A \ne C then we would have \#(A) < \#(C) by Proposition 3.6.14(c) again, a contradiction. Thus we must have A = C. Similarly we have \#(A) \leq \#(B) and \#(B) \leq \#(C) = \#(A) so \#(A) = \#(B). Since A \subseteq B, we can use Proposition 3.6.14(c) to conclude again that A=B. Thus in the finite case, all three sets must be the same. So we can try something like A=B=C=\{1,2,3\}. Then D_0 = B\setminus A = \emptyset, and D_1 = f(D_0) = f(\emptyset) = \emptyset, and so on, so that \bigcup_{n=1}^\infty D_n = \emptyset as well. This means that in the definition of g, we only ever encounter the case where g(x)=x. This does indeed give a bijection, since the identity map \iota_{A\to A} is a bijection.

Model solution

First we show that the sets D_0, D_1, \ldots are all disjoint from each other. We do this by showing the following statement: for every natural number n and every positive integer k, the sets D_n and D_{n+k} are disjoint. Once we have proved this, given two distinct natural numbers, we can let m be the larger one so that m > n. Then we can write m = n + k for some positive integer k and see that D_n and D_m = D_{n+k} are disjoint.

To prove the statement from above, let k be a positive integer. We will induct on n. For the base case, n=0, we must show that D_0 and D_k are disjoint. But D_0 = B \setminus A and D_k = f^k(D_0)\subseteq A so these two sets are disjoint: if x \in D_0 \cap D_k we would have x \notin A and x \in A, a contradiction. Now suppose inductively that D_n and D_{n+k} are disjoint. We need to show that D_{n+1} and D_{n+1+k} are disjoint. Suppose for sake of contradiction that there is some x \in D_{n+1} \cap D_{n+1+k}. Then since D_{n+1} = f(D_n) and D_{n+1+k} = D_{n+k+1} = f(D_{n+k}) there exist z \in D_n and z' \in D_{n+k} such that x = f(z) and x = f(z'). But f is injective so f(z) = x = f(z') implies z=z'. Since D_n and D_{n+k} are disjoint, this is impossible. This closes the induction.

Now define g : A \to B as in the exercise statement:

\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}

We first verify that g maps from A to B. The things to check are:

  • f^{-1}(x) is defined for each x \in \bigcup_{n=1}^\infty D_n: the value f^{-1}(x) is only defined when x \in f(C), so we must verify that if x \in \bigcup_{n=1}^\infty D_n then x \in f(C). So suppose x \in \bigcup_{n=1}^\infty D_n. Then x \in D_{n+1} for some natural number n, so we have x \in D_{n+1} = f(D_n) \subseteq f(C) as required.
  • If x \in \bigcup_{n=1}^\infty D_n then g(x) := f^{-1}(x) lands in B: We have x \in f(D_n) for some natural number n, so x = f(z) for some z \in D_n. Thus f^{-1}(x) = f^{-1}(f(z)) = z \in D_n. But D_0 = B\setminus A \subseteq B and D_m \subseteq A \subseteq B for each positive m, so in any case we have z \in B as required.
  • If x \notin \bigcup_{n=1}^\infty D_n then g(x) := x lands in B: In this case, we still have x \in A since x is in the domain of g, which means x \in B since A \subseteq B.

Thus g maps from A to B.

Now we show that g is injective. Let x, y \in A. We must show that g(x) = g(y) implies x=y. So suppose g(x) = g(y). We have three cases:

  • Both x and y are in \bigcup_{n=1}^\infty D_n: We have f^{-1}(x) = f^{-1}(y). Since f is injective, f^{-1} : f(C) \to C is a bijection. Thus whenever f^{-1} is defined, we can assume it acts bijectively. We thus have x=y.
  • Neither x nor y is in \bigcup_{n=1}^\infty D_n: By definition of g we have g(x)=x and g(y)=y so g(x)=g(y) means x=y.
  • Exactly one of the two elements is in \bigcup_{n=1}^\infty D_n: We can suppose x \in \bigcup_{n=1}^\infty D_n and y \notin \bigcup_{n=1}^\infty D_n (we can relabel x and y and repeat the same steps if it is the other way around). We have g(x) = f^{-1}(x) and g(y)=y so f^{-1}(x) = y. But x \in D_{n+1} for some natural number n, so x \in f^{n+1}(B\setminus A). Thus f^{-1}(x) \in f^n(B \setminus A). This means f^{-1}(x) is either in B\setminus A or is in \bigcup_{n=1}^\infty D_n. But y is in A (as it is in the domain of g) so it can’t be in B\setminus A, and y \notin \bigcup_{n=1}^\infty D_n by hypothesis of this case, so we have a contradiction. This means that this case actually isn’t possible so we don’t need to worry about it.

Now we show that g is surjective. To do this, let y \in B. We must find some x \in A such that g(x) = y. We have two cases, y \in \bigcup_{n=0}^\infty D_n or y \notin \bigcup_{n=0}^\infty D_n.

  • If y \in \bigcup_{n=0}^\infty D_n then y \in D_n for some natural number n so f(y) \in f(D_n), which means that f(y) \in \bigcup_{n=1}^\infty D_n. Thus if we let x := f(y) then we have g(x) = g(f(y)) = f^{-1}(f(y)) = y.
  • On the other hand, if y \notin \bigcup_{n=0}^\infty D_n then y is also not in the smaller sets D_0 and \bigcup_{n=1}^\infty D_n. From y\notin D_0 = B\setminus A we see that y \in A, so g(y) is defined. And from y\notin\bigcup_{n=1}^\infty D_n we see that g(y) = y. Thus for x := y we have g(x) = g(y) = y.

In each case we have found some x \in A such that g(x) = y, so g is surjective.

Acknowledgments: Thanks to Satira for having extensive discussions with me about this exercise.

Exercise 6.4.1

Exercise statement

Prove Proposition 6.4.5.

Proposition 6.4.5 (Limits are limit points). Let (a_n)_{n=m}^\infty be a sequence which converges to a real number c. Then c is a limit point of (a_n)_{n=m}^\infty, and in fact it is the only limit point of (a_n)_{n=m}^\infty.

Hints

None.

How to think about the exercise

This is a straightforward exercise so I don’t have anything to say.

Model solution 1

Let (a_n)_{n=m}^\infty be a sequence which converges to a real number c. To show that c is a limit point of this sequence, let \varepsilon > 0 be a real number and let N' \geq m be an integer. Since the sequence converges to c, we know that there is some N \geq m such that |a_n - c| \leq \varepsilon for every n \geq N. We want to find some n \geq N' such that |a_n - c| \leq \varepsilon. We can take n := N+N'. Then we see that n \geq N', and since n \geq N we have |a_n-c|\leq \varepsilon as required.

Next we show that c is the only limit point of the sequence. Suppose for sake of contradiction that there there is another limit point c' \ne c. Put \varepsilon := |c-c'|/3. Since c\ne c', we know that \varepsilon is positive. Since the sequence converges to c, there is some N \geq m such that |a_n-c|\leq \varepsilon for all n\geq N. So fix this N. Since c' is a limit point, there exists an n \geq N such that |a_n - c'| \leq \varepsilon. Fix this n. Since n \geq N, we also know that |a_n-c|\leq \varepsilon. But now we have

|c-c'| = |c-a_n+a_n-c'| \leq |c-a_n| + |a_n-c'| \leq 2|c-c'|/3

which is a contradiction since |c-c'| > 0.

Model solution 2

Here’s an alternative way to prove the second part of this exercise.

We want to show that c is the only limit point of the sequence. Let c' be a limit point (not necessarily distinct from c), and let \varepsilon > 0 be arbitrary. Since the sequence converges to c, there is some N \geq m such that |a_n - c| \leq \varepsilon/2 for all n \geq N. Fix this N. Since N \geq m and c' is a limit point, there is some n \geq N such that |a_n - c'| \leq \varepsilon/2. Fix this n. Since n \geq N and the sequence converges to c, we have |a_n - c| \leq \varepsilon/2. Thus we have both |a_n-c'| \leq \varepsilon/2 and |a_n - c| \leq \varepsilon/2. But this means |c-c'| = |c-a_n+a_n-c'|\leq |c-a_n| + |a_n-c'| \leq \varepsilon. Since \varepsilon > 0 was arbitrary, we have just shown that |c-c'| \leq \varepsilon for every \varepsilon > 0. By Exercise 5.4.7 this means that c=c'.

Create your website with WordPress.com
Get started