# Exercise 8.3.5

## Exercise statement

Show that no power set (i.e., a set of the form $2^X$ for some set $X$) can be countably infinite.

## Hints

1. You do not need the axiom of choice for this exercise (though of course if you have read Section 8.4 you may use it).

## How to think about the exercise

I think the proof below is pretty straightforward, and I don’t have too much to say beyond that.

## Model solution

We have three cases depending on the cardinality of $X$. If $X$ is finite, then by Exercise 8.3.1 the power set $2^X$ has finite cardinality, so we are done. If $X$ is countably infinite, then by Cantor’s theorem (Theorem 8.3.1) we see that $2^X$ cannot be countably infinite, so we are done. Thus it remains to show the result when $X$ is uncountable, so let $X$ be an uncountable set.

Let us introduce some notation. We write $|A| \leq |B|$ to mean that there is an injection from $A$ to $B$. The intuition for this notation comes from Exercise 8.3.3. The relation $\leq$ has the transitivity property that if $|A| \leq |B|$ and $|B| \leq |C|$, then $|A| \leq |C|$: if $f : A \to B$ is an injection and $g : B \to C$ is an injection, then $g \circ f : A \to C$ is an injection by Exercise 3.3.2. (The relation $\leq$ is anti-symmetric thanks to Exercise 8.3.3, and it is also reflexive because the identity map $\mathrm{id} : A \to A$ is injective, so it is a reflexive, anti-symmetric, and transitive relation, i.e. a partial order. Thus we are justified in using the suggestive notation $\leq$.)

We have $|X| \leq |2^X|$ because $f : X \to 2^X$ defined by $f(x) := \{x\}$ is an injection.

Now suppose for sake of contradiction that $2^X$ is countably infinite. Then there is a bijection between $2^X$ and $\mathbf N$, so in particular we have $|2^X| \leq |\mathbf N|$. We also showed above that $|X| \leq |2^X|$. So by transitivity of $\leq$, we have $|X| \leq |\mathbf N|$. Thus there is some injection $g : X \to \mathbf N$. Now consider the image $g(X)$, which is a subset of $\mathbf N$. By Corollary 8.1.6, $g(X)$ is at most countable. Define the function $h : X \to g(X)$ by $h(x) := g(x)$. This is a bijection: it is injective because $g$ is injective, and it is surjective because $g(X) = \{g(x) : x \in X\}$ is exactly the set of elements that $g$ maps to. The bijection $h$ shows that $X$ has equal cardinality to $g(X)$, which is at most countable. So $X$ is at most countable, which contradicts the assumption that $X$ is uncountable. This contradiction shows that $2^X$ cannot be countably infinite, which completes the proof.

# Exercise 10.1.1

## Exercise statement

Suppose that $X$ is a subset of $\mathbf R$, $x_0$ is a limit point of $X$, and $f : X \to \mathbf R$ is a function which is differentiable at $x_0$. Let $Y \subseteq X$ be such that $x_0 \in Y$, and $x_0$ is also a limit point of $Y$. Prove that the restricted function $f|_Y : Y \to \mathbf R$ is also differentiable at $x_0$, and has the same derivative as $f$ at $x_0$. Explain why this does not contradict the discussion in Remark 10.1.2.

## Hints

1. Use Definition 9.3.6.

## How to think about the exercise

This is a simple exercise and is a matter of putting together the right definitions.

## Model solution

We want to show that

$\displaystyle \lim_{y \to x_0; y \in Y\setminus\{x_0\}} \frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} = f'(x_0)$

To do this, we will return to the definition of a limit, Definition 9.3.6. To show the limit exists, we must first verify that $x_0$ is an adherent point of $Y\setminus\{x_0\}$. But this is the case since $x_0$ is a limit point of $Y$. Now  let $\varepsilon > 0$. We want to find some $\delta > 0$ such that for all $y \in Y\setminus\{x_0\}$, if $|y-x_0| < \delta$ then $\left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon$.

How can we find such a $\delta$? The only information we are given from which we could find a $\delta$ is the fact that $f$ is differentiable at $x_0$. Since $f$ is differentiable at $x_0$, we know that for our $\varepsilon > 0$ in particular, there exists some $\delta > 0$ such that for all $x \in X \setminus \{x_0\}$, if $|x-x_0| < \delta$ then $\left|\frac{f(x) - f(x_0)}{x-x_0} - f'(x_0)\right| \leq \varepsilon$.

So let’s use this $\delta > 0$. Now that we have a $\delta$, we must show that all $y \in Y\setminus\{x_0\}$, if $|y-x_0| < \delta$ then $\left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon$. So let $y \in Y\setminus\{x_0\}$, and suppose $|y-x_0| < \delta$. We must show that $\left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon$.

Now $Y \subseteq X$, so this means $Y \setminus \{x_0\} \subseteq X \setminus \{x_0\}$. Thus we have $y \in X\setminus \{x_0\}$.

We know from the differentiability condition for $f$ at $x_0$ that for all $x \in X \setminus \{x_0\}$, if $|x-x_0| < \delta$ then $\left|\frac{f(x) - f(x_0)}{x-x_0} - f'(x_0)\right| \leq \varepsilon$. Since our $y$ satisfies these conditions, we have $\left|\frac{f(y) - f(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon$.

To complete the proof, recall how $f|_Y$ is defined. If $y' \in Y$, then $f|_Y(y') := f(y')$. Since $y,x_0 \in Y$, we have $f|_Y(y) = f(y)$ and $f|_Y(x_0) = f(x_0)$. Thus we have $\left|\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0} - f'(x_0)\right|=\left|\frac{f(y) - f(x_0)}{y-x_0} - f'(x_0)\right| \leq \varepsilon$.

This does not contradict the discussion in Remark 10.1.2 because in the example there, $3$ was not an adherent point of $[1,2]$, so the limit was undefined. In contrast for this exercise we assumed that $x_0$ was an adherent point of $Y \setminus \{x_0\}$ (i.e. a limit point of $Y$). And so if we used the example in Remark 10.1.2, the proof above would fail at the point where we tried to verify that $x_0$ is an adherent point of $Y\setminus\{x_0\}$.

The important point is that if $x_0$ were not a limit point of $Y$, then we could always pick $\delta > 0$ small enough so that the expression $\frac{f|_Y(y) - f|_Y(x_0)}{y-x_0}$ is undefined when $y$ is further restricted to the set $\{y \in Y \setminus \{x_0\} : |y - x_0| < \delta\}$. This would mean that the limit is always defined no matter what $L$ we use (since the implication in the definition of limit would be vacuous), so that in effect the derivative would equal any number possible. This would make the notation $f|_Y'(x_0)$ ambiguous, and this concept is useless anyway, which is why we choose to not define the limit in this case.

# Exercise 11.1.1

## Exercise statement

Prove Lemma 11.1.4.

Lemma 11.1.4. Let $X$ be a subset of the real line. Then the following two statements are logically equivalent:

(a) $X$ is bounded and connected.
(b) $X$ is a bounded interval.

## Hints

1. In order to show that (a) implies (b) in the case when $X$ is non-empty, consider the supremum and infimum of $X$.

## How to think about the exercise

This exercise is pretty straightforward, but one must be careful to stick to the definitions instead of using one’s intuitive notion of intervals; the whole point of the exercise is to make sure our formalization of ideas like “bounded” and “connected” and “interval” is correct. An example of what I mean: it seems like Tao defines “bounded interval” and then later on defines “bounded set”. Even though “bounded interval” contains the word “bounded”, the book doesn’t seem to show that bounded intervals are indeed bounded in the sense of bounded set. So part of the point of this exercise is to demonstrate this, and to see that nothing unexpected has happened.

## Model solution

We first show that (b) implies (a). Suppose that $X$ is a bounded interval (see Examples 9.1.3 for definition). Thus $X$ is of the form $(a,b)$, $[a,b]$, $(a,b]$, or $[a,b)$ for real numbers $a,b \in \mathbf R$. We will just do one of the cases, namely the case $X=[a,b)$, as the proof in the other cases is very similar. We first show that $X$ is bounded. We claim that $M := \max(|a|, |b|) + 1$ is a bound (the $+1$ is a technicality, since Definition 9.1.22 requires $M > 0$ for some odd reason). Indeed, if $x \in X$ then we have $a \leq x < b$ by definition of the interval notation. Thus we have

$-M < -\max(|a|, |b|) \leq -|a| \leq a \leq x < b \leq |b| \leq \max(|a|, |b|) < M$

Thus we have $x \in [-M,M]$. Since $x \in X$ was arbitrary, this shows that $X \subseteq [-M, M]$ as required, so $X$ is bounded.

Next we show that $X$ is connected. Let $x,y$ be elements of $X$ such that $x < y$. We must show that $[x,y]$ is a subset of $X$. Let $z$ be a number such that $x \leq z \leq y$. We must show that $z \in X$. But since $x \in X$ we have $a \leq x$, and since $y \in X$ we have $y < b$. Thus by transitivity of inequalities we have $a \leq x \leq z \leq y < b$, so $z \in [a, b) = X$. This proves that $X$ is connected.

Now we show that (a) implies (b). Suppose $X$ is bounded and connected. We have two cases. Suppose first that $X$ is empty. Then $X$ is equal to an empty bounded interval such as $(1,1)$.

Now suppose $X$ is non-empty. Then define $a := \inf(X)$ and $b := \sup(X)$. Since $X$ is a bounded and non-empty set, by Theorem 5.5.9 both $a$ and $b$ are real numbers. We have four cases depending on whether $a \in X$ and $b \in X$. Since the proof is similar in each case, we will just show the case when $a \in X$ and $b \notin X$. In this case, we will show that $X$ equals the bounded interval $[a, b)$. To do this, we will show that $[a,b) \subseteq X$ and $X \subseteq [a,b)$. We first show that $[a,b) \subseteq X$. Let $x \in [a,b)$. Since $x < b$ and $b$ is the least upper bound for $X$, there exists some $y \in X$ such that $x < y$ (otherwise $x$ would be a smaller upper bound for $X$). Now we have $a,y \in X$ such that $a \leq x < y$; since $X$ is connected, this means that $x \in X$ as desired.

Now we show that $X \subseteq [a, b)$. Let $x \in X$. Since $a$ is the infimum of $X$ and $b$ is the supremum of $X$, we know that $a \leq x \leq b$. Thus to show that $x \in [a,b)$ we just need to show that $x \ne b$. But this is easy, since we assumed that $b \notin X$.

# Exercise 6.5.2

## Exercise statement

Prove Lemma 6.5.2.

Lemma 6.5.2. Let $x$ be a real number. Then the limit $\lim_{n\to\infty} x^n$ exists and is equal to zero when $|x| < 1$, exists and is equal to $1$ when $x=1$, and diverges when $x=-1$ or when $|x| > 1$.

## Hints

1. Use Proposition 6.3.10, Exercise 6.3.4, and the squeeze test.

## How to think about the exercise

This is a straightforward exercise, but it requires some care in two respects:

• It can be a little notationally confusing dealing with all of the absolute value signs so it is easy to think one has proved something when there is in fact a slight flaw in the proof.
• If one wants to write up the proof in the shortest way possible, thinking of which cases to use can be tricky.

## Model solution

First suppose $0 < |x| < 1$. Since we have $-|x|^n = -|x^n| \leq x^n \leq |x^n| = |x|^n$ for each $n$, we can use the squeeze test: by Proposition 6.3.10 we know that $\lim_{n\to\infty} |x|^n = 0$, so by the limit laws we have $\lim_{n\to\infty} -|x|^n = 0$ as well. This means that the sequence $(x^n)_{n=1}^\infty$ must also converge to zero.

Next suppose $x=1$. Then we have $x^n = 1^n = 1$ for all $n$, so $(x^n)_{n=1}^\infty$ is the constant sequence, which converges to $1$. A similar proof can be given when $x=0$; in this case $(x^n)_{n=1}^\infty$ is constantly zero, so converges to zero.

If $x=-1$, the limit $\lim_{n\to\infty} (-1)^n$ does not exist as the sequence $((-1)^n)_{n=1}^\infty = (-1,1,-1,1,-1,\ldots)$ oscillates between $-1$ and $1$. To prove that $\lim_{n\to\infty} (-1)^n$ does not exist, let $\varepsilon := 1$. Then no matter how big $N \geq 1$ is, we can pick $j := N$ and $k:=N+1$. We have $|(-1)^j - (-1)^k| = 2 > \varepsilon$. Thus the sequence is not Cauchy, so by Theorem 6.4.18 it is not convergent.

Finally suppose $|x| > 1$. Suppose for sake of contradiction that $(x^n)_{n=1}^\infty$ converges to some limit $L \in \mathbf R$. Then by the limit laws, $\lim_{n\to\infty} |x^n| = \lim_{n\to\infty} \max(x^n, -x^n) = \max(L,-L) = |L|$. But $|x^n| = |x|^n$, so this means that $(|x|^n)_{n=1}^\infty$ converges to $|L|$. This contradicts Exercise 6.3.4.

# Exercise 7.2.1

## Exercise statement

Is the series $\sum_{n=1}^\infty (-1)^n$ convergent or divergent? Justify your answer. Can you now resolve the difficulty in Example 1.2.2?

None.

## How to think about the exercise

This is a straightforward exercise.

## Model solution 1

Let $(a_n)_{n=1}^\infty$ be the sequence defined by $a_n := 1$ for all $n \geq 1$. This sequence is non-negative and decreasing, since $a_n = 1 \geq 0$ and $a_n = 1 \geq 1 = a_{n+1}$ for every $n \geq 1$. We also have $\lim_{n\to\infty} a_n = 1 \ne 0$. Thus we can apply the alternating series test (Proposition 7.2.12) to conclude that the series $\sum_{n=1}^\infty (-1)^n a_n = \sum_{n=1}^\infty (-1)^n$ diverges.

The reasoning in Example 1.2.2 is not valid because the series does not converge, so variable $S$ is not a real number and thus cannot be manipulated in the usual way using the laws of algebra.

(Thanks to William for this solution.)

## Model solution 2

The series $\sum_{n=1}^\infty (-1)^n$ diverges by the zero test, Corollary 7.2.6. Indeed, the limit $\lim_{n\to\infty} (-1)^n$ does not exist as the sequence $((-1)^n)_{n=1}^\infty = (-1,1,-1,1,-1,\ldots)$ oscillates between $-1$ and $1$. To prove that $\lim_{n\to\infty} (-1)^n$ does not exist, let $\varepsilon := 1$. Then no matter how big $N \geq 1$ is, we can pick $j := N$ and $k:=N+1$. We have $|(-1)^j - (-1)^k| = 2 > \varepsilon$. Thus the sequence is not Cauchy, so by Theorem 6.4.18 it is not convergent. (Alternatively, we could have used Lemma 6.5.2.)

The reasoning in Example 1.2.2 is not valid because the series does not converge, so variable $S$ is not a real number and thus cannot be manipulated in the usual way using the laws of algebra.

# Exercise 9.3.4

## Exercise statement

Propose a definition for limit superior $\limsup_{x\to x_0; x\in E} f(x)$ and limit inferior $\liminf_{x\to x_0; x\in E} f(x)$, and then propose an analogue of Proposition 9.3.9 for your definition. (For an additional challenge: prove that analogue.)

None.

## How to think about the exercise

Let’s recall how the limit superior was defined for sequences of real numbers. Given a sequence $(a_n)_{n=0}^\infty$, we first defined the auxiliary sequence $(a^+_N)_{N=0}^\infty$ by $a^+_N := \sup(a_n)_{n=N}^\infty$, then we took the infimum of the sequence (which is equivalent to taking the limit, since the sequence is monotonic so always converges): $\limsup_{n\to\infty} a_n := \inf(a^+_N)_{N=0}^\infty = \lim_{N \to \infty} a^+_N$.

So to define a limit superior, we need to take the supremum over something (and this something must be indexed somehow), then take the limit using the index. To help us see the similarities more with the real-valued functions case, let’s rewrite the limit superior for sequences as $\limsup_{n\to\infty} a_n = \lim_{N \to \infty} \sup \{a_n : n \geq N\}$.

So our goal now is to figure out the set we are taking the supremum over. Which set? This set must get smaller over time, so that the limit is defined (otherwise we wouldn’t have a guarantee that the sequence is monotonic). In other words, we want to “go further” in some relevant sense. In the case of sequences, we “go further” in the limit by taking larger values of $n$, but in the case of functions we “go further” by taking $x$ values closer to $x_0$. So the trick is to take a set that shrinks around the point $x_0$. In particular, we can take $\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}$. The smaller the value of $\delta$, the smaller the set. So we can define $\limsup_{x\to x_0; x\in E} f(x) := \lim_{\delta \to 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}$. Here it is understood that when we take the limit as $\delta \to 0$ only positive values of $\delta$ are considered (we could be clearer about this by writing something like $\lim_{\delta \to 0; \delta > 0}$). (TODO: Tao’s book doesn’t actually define divergent limits, so this part does fail in some edge cases, and so we either need to define such limits or use inf/sup. Thanks to crabman for the catch.)

(THIS IS WRONG. I WILL FIX LATER. Thanks to Karim Taha for the catch.) Now how do we state the analogue of Proposition 9.3.9? For part (a), instead of $\lim_{x \to x_0; x\in E} f(x) = L$ we want to say $\limsup_{x\to x_0; x\in E} f(x)$. For part (b), we can take a sequence $(a_n)_{n=0}^\infty$ converging to $x_0$, but instead of saying $\lim_{n\to \infty} f(a_n) = L$, we can use the limit superior for sequences to say $\limsup_{n\to \infty} f(a_n) = L$.

If you want to think about this topic more, see the first edition of Pugh’s Real Mathematical Analysis, Chapter 3 Exercise 26, which makes the “go further” idea above more rigorous.

## Model solution

We define the limit superior $\limsup_{x\to x_0; x\in E} f(x)$ as $\lim_{\delta \to 0; \delta > 0} \sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\}$. Alternatively, using infimum instead of a limit we could define it as $\inf\{\sup\{f(x) : x \in E \text{ and } |x - x_0| < \delta\} : \delta > 0\}$.

(THIS IS WRONG. I WILL FIX LATER.) The analogue of Proposition 9.3.9 is as follows: Let $X$ be a subset of $\mathbf R$, let $f : X \to \mathbf R$ be a function, let $E$ be a subset of $X$, let $x_0$ be an adherent point of $E$, and let $L$ be a real number. Then the following two statements are logically equivalent:

(a) $\limsup_{x\to x_0; x\in E} f(x)=L$
(b) For every sequence $(a_n)_{n=0}^\infty$ which consists entirely of elements of $E$, which converges to $x_0$, we have $\limsup_{n\to\infty} f(a_n)=L$.

The case for limit inferior is very similar: just interchange “superior” and “inferior” everywhere.

# Exercise 8.3.3

## Exercise statement

Recall from Exercise 3.6.7 that a set $A$ is said to have lesser or equal cardinality than a set $B$ iff there is an injective map $f : A \to B$ from $A$ to $B$. Using Exercise 8.3.2, show that if $A,B$ are sets such that $A$ has lesser or equal cardinality to $B$ and $B$ has lesser or equal cardinality to $A$, then $A$ and $B$ have equal cardinality. (This is known as the Schröder–Bernstein theorem, after Ernst Schröder (1841–1902) and Felix Bernstein (1878–1956).)

## Hints

1. Draw a picture.

## How to think about the exercise

This is a quick corollary of Exercise 8.3.2. The only trick is to figure out what parts from Exercise 8.3.2 correspond to what parts of this current exercise. More precisely, we need to say what all of the objects named in Exercise 8.3.2 will be in terms of the objects introduced in Exercise 8.3.3.

## Model solution 1

Let $f : A \to B$ be an injection, and let $g : B \to A$ be an injection. Let $\tilde f : A \to f(A)$ be the map defined by $\tilde f(x) := f(x)$ for all $x \in A$. Then $\tilde f$ is a bijection since $f$ is an injection. This means that $\tilde f \circ g : B \to f(A)$ is an injection, since it is a composition of injections. Since $f(A) \subseteq B$, we can apply Exercise 8.3.2 (with $f(A)$ in place of “$A$”, $B$ in place of “$B$”, $B$ in place of “$C$”, and $\tilde f \circ g : B \to f(A)$ in place of “$f : C \to A$”) to produce a bijection $h : f(A) \to B$. Now consider $h \circ \tilde f : A \to B$. This is a composition of bijections so is itself a bijection.

## Model solution 2

Thanks to Satira for this solution.

Let $f : A \to B$ be an injection, and let $g : B \to A$ be an injection. Since $f$ and $g$ are both injections, their composition $f \circ g : B \to B$ is also an injection. This means that there is a bijection $B \to f\circ g(B)$. By Exercise 8.3.2 (with $f\circ g(B)$ in place of “$A$”, $f(A)$ in place of “$B$”, $B$ in place of “$C$”, and the bijection $B \to f\circ g(B)$ in place of “$f : C \to A$”) there is a bijection $f \circ g(B) \to f(A)$. Now we have a bijection $f \circ g(B) \to f(A)$ and a bijection $B \to f\circ g(B)$, so we can combine these to get a bijection $f(A) \to B$. But $f$ provides a bijection $A \to f(A)$ so we can combine these to obtain a bijection $A \to B$.

Here is a summary of all of the bijections, where each arrow represents a bijection: $A \to f(A) \leftarrow f\circ g(B) \leftarrow B$. Since the inverse of a bijection is also a bijection, we can traverse the arrows from $A$ to $B$ to obtain a bijection.

# Exercise 8.3.2

## Exercise statement

Let $A,B,C$ be sets such that $A \subseteq B \subseteq C$, and suppose that there is an injection $f : C \to A$. Define the sets $D_0, D_1, D_2, \ldots$ recursively by setting $D_0 := B\setminus A$, and then $D_{n+1} := f(D_n)$ for all natural numbers $n$. Prove that the sets $D_0, D_1, \ldots$ are all disjoint from each other (i.e., $D_n \cap D_m = \emptyset$ whenever $n \ne m$). Also show that if $g : A \to B$ is the function defined by setting $g(x) := f^{-1}(x)$ when $x \in \bigcup_{n=1}^\infty D_n$, and $g(x) := x$ when $x \notin \bigcup_{n=1}^\infty D_n$, then $g$ does indeed map $A$ to $B$ and is a bijection between the two. In particular, $A$ and $B$ have the same cardinality.

## Hints

1. Draw a picture.
2. You don’t lose much by taking $C=B$, and this simplifies the picture.
3. Beware that Tao’s notation is non-standard here: typically in setting up the Schröder–Bernstein theorem we are given injections $f : A \to B$ and $g : B \to A$ (with no subset relation assumed between $A$ and $B$) and are asked to define a bijection $h : A \to B$. In Tao’s book, the usual $f$ is just the inclusion map $\iota_{A\to B} : A\to B$ (see Exercise 3.3.8 for a definition), the usual $g : B\to A$ is instead called $f: C \to A$, and the usual bijection $h : A \to B$ is instead called $g : A \to B$. If you are referring to other expositions as you work on this exercise, you may want to temporarily convert Tao’s notation to match the notation in these other expositions.

## How to think about the exercise

This is a pretty tough exercise, if you really want to understand what is going on instead of just bashing out a proof.

To simplify the exposition, let us assume that $C = B$. At the end of this section we will show how to adjust in the case where we only assume $B \subseteq C$, and in the model solution below we will revert to only assuming $B \subseteq C$.

To set the stage, let’s draw a picture of two sets $A, B$ with $A \subseteq B$:

(Throughout this post, I will be copying from Richard Hammack’s exposition the idea of using suggestive shapes for the two sets as well as the idea generally of visually representing the sets by embedding them within each other.)

We define $D_0 := B \setminus A$:

We have an injection $f : C \to A$, i.e. $f : B \to A$ since we assumed $C=B$. This means that we can put a copy $f(B) \subseteq A$ of the set $B$ inside $A$:

Notice two things: first the shape of $f(B)$ is the same as the shape of $B$. This is because we assumed that $f$ is injective, hence it doesn’t “deform” the shape by mapping multiple starting points onto the same point. Second, $f(B)$ does not take up all of $A$, since we didn’t assume that $f$ is a bijection.

Now we define $D_1 := f(D_0)$. Since $D_0 \subseteq B$, we have $f(D_0) \subseteq f(B)$. Thus we should see a copy of the green stuff within the copy of $B$ that we produced:

We can keep going further with this same process. Our copy of $B$ (namely $f(B)$) has a copy of $A$ within it (namely $f(A)$) so using $f$ we can place a copy of our copy of $B$ (namely $f(f(B))$) inside our copy of $A$ (namely $f(A)$):

By the injectivity of $f$, this “copy of a copy” of $B$ again has a shape similar to that of the original $B$. The green stuff that is $D_1$ is embedded within $f(B)$, so it also gets copied inside of $f(f(B))$. We can define $D_2 := f(D_1)$ and see this as the third layer of green stuff in the picture:

Hopefully by this point you get the idea. As we make more and more copies of $B$ within copies of $A$ within copies of $B$, we accumulate more and more green stuff. The union of all of the green stuff is $\bigcup_{n=0}^\infty D_n$, and the white stuff we are left with is the complement, $B \setminus \bigcup_{n=0}^\infty D_n$.

Having all of these copies within copies is all nice and trippy, but how does this help us define a bijection $g : A \to B$? What do we even want, in terms of the picture we have drawn? We want to take each point in $A$ and map it to some point in $B$ in such a way that each point in $B$ corresponds to exactly one point in $A$. The trick to doing this is the following: each point colored in white can stay put, and each point colored in green can “climb up” one level to hit a green point in an outer level:

In other words, the points in $D_1$ map to $D_0$ via $f^{-1}$, the points in $D_2$ map to $D_1$ via $f^{-1}$, and so on (that’s all the green stuff “climbing up”), while the points in the rest of $A$ just stay where they are (that’s all the white stuff). If you look at the picture, I hope you can see that no matter which point in $B$ you pick, there will be exactly one point in $A$ that maps to it.

Now that we have the idea, how do we write down the formal definition of this bijection $g : A \to B$? The green stuff is $\bigcup_{n=0}^\infty D_n$, so you might be tempted to write the following:

$\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=0}^\infty D_n \\ x & \text{otherwise}\end{cases}$

But actually, what is $D_0$? We defined $D_0 := B\setminus A$, so the points in $D_0$ aren’t even in the domain of $g$! This means we don’t need to include $D_0$ in the union, so we can instead define:

$\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}$

with the union index starting at $1$. In the end, starting the union at $0$ or $1$ does not change the behavior of $g$: we never encounter the case where $x \in D_0$ because $A \cap D_0$ is empty.

That concludes the intuition for the construction outlined in the exercise statement. Here are a few other points I want to make:

• Why do we need to show that the sets $D_0, D_1, D_2, \ldots$ are all disjoint from each other? It is not necessary to show this in order to prove the bijectivity of $g$. My understanding is that it is just an interesting additional fact one can prove about this setup, and also justifies drawing the separate green parts in a way that looks disjoint.
• Why do we assume that $f : C \to A$ is an injection rather than a bijection (as was originally the case in the 1st and 2nd editions of the book)? I haven’t thought about this in detail. I think it makes sense to use the weaker assumption of injectivity if bijectivity is not required, especially since the Schröder–Bernstein theorem only assumes injectivity.
• What happens when we only assume $B \subseteq C$ rather than $C = B$ as above? The pictures above change slightly: now there is a bigger set $C$ surrounding both $B$ and $A$. Otherwise the pictures don’t change. In the proof there is slightly more work since we must verify that when we use $f^{-1}$ in the definition of $g$, the output lands in $B$ instead of $C$.
• Since $f$ is assumed to be an injection rather than a bijection, what does it even mean to write $f^{-1}(x)$ for some $x \in A$? For an injection $f : C \to A$, the map $\tilde f : C \to f(C)$ is a bijection (why?), so $\tilde f^{-1} : f(C) \to C$ is a (bijective) function. So whenever $x \in f(C)$ we can define $f^{-1}(x) := \tilde f^{-1}(x)$. If $x \notin f(C)$ we leave $f^{-1}(x)$ undefined. This means that in the model solution below, when we define $g$, we must verify that if $x \in \bigcup_{n=1}^\infty D_n$ then $x \in f(C)$.
• Why did Tao assume that $A \subseteq B$ instead of allowing $A, B$ to be unrelated sets (like in most expositions of the Schröder–Bernstein theorem)? Assuming $A \subseteq B$ simplifies the drawing: if $A,B$ are unrelated sets, one would need to draw two pictures instead of one (like on this page). By making $A$ a subset of $B$, there is just one picture, and there is a simple rule for defining the bijection $g$: if you’re in a white area, stay put, and if you’re in a green area, “climb up” one level to take up a larger amount of space in a bijective way.

Finally, I want to consider a particularly simple case of the construction outlined in this exercise: what if the sets $A,B,C$ are all finite? If there is an injection $f:C\to A$, that means $\#(C) \leq \#(A)$ by Exercise 3.6.7. But since $A \subseteq C$, we also have $\#(A) \leq \#(C)$ by Proposition 3.6.14(c). Thus $\#(A) = \#(C)$. If $A \ne C$ then we would have $\#(A) < \#(C)$ by Proposition 3.6.14(c) again, a contradiction. Thus we must have $A = C$. Similarly we have $\#(A) \leq \#(B)$ and $\#(B) \leq \#(C) = \#(A)$ so $\#(A) = \#(B)$. Since $A \subseteq B$, we can use Proposition 3.6.14(c) to conclude again that $A=B$. Thus in the finite case, all three sets must be the same. So we can try something like $A=B=C=\{1,2,3\}$. Then $D_0 = B\setminus A = \emptyset$, and $D_1 = f(D_0) = f(\emptyset) = \emptyset$, and so on, so that $\bigcup_{n=1}^\infty D_n = \emptyset$ as well. This means that in the definition of $g$, we only ever encounter the case where $g(x)=x$. This does indeed give a bijection, since the identity map $\iota_{A\to A}$ is a bijection.

## Model solution

First we show that the sets $D_0, D_1, \ldots$ are all disjoint from each other. We do this by showing the following statement: for every natural number $n$ and every positive integer $k$, the sets $D_n$ and $D_{n+k}$ are disjoint. Once we have proved this, given two distinct natural numbers, we can let $m$ be the larger one so that $m > n$. Then we can write $m = n + k$ for some positive integer $k$ and see that $D_n$ and $D_m = D_{n+k}$ are disjoint.

To prove the statement from above, let $k$ be a positive integer. We will induct on $n$. For the base case, $n=0$, we must show that $D_0$ and $D_k$ are disjoint. But $D_0 = B \setminus A$ and $D_k = f^k(D_0)\subseteq A$ so these two sets are disjoint: if $x \in D_0 \cap D_k$ we would have $x \notin A$ and $x \in A$, a contradiction. Now suppose inductively that $D_n$ and $D_{n+k}$ are disjoint. We need to show that $D_{n+1}$ and $D_{n+1+k}$ are disjoint. Suppose for sake of contradiction that there is some $x \in D_{n+1} \cap D_{n+1+k}$. Then since $D_{n+1} = f(D_n)$ and $D_{n+1+k} = D_{n+k+1} = f(D_{n+k})$ there exist $z \in D_n$ and $z' \in D_{n+k}$ such that $x = f(z)$ and $x = f(z')$. But $f$ is injective so $f(z) = x = f(z')$ implies $z=z'$. Since $D_n$ and $D_{n+k}$ are disjoint, this is impossible. This closes the induction.

Now define $g : A \to B$ as in the exercise statement:

$\displaystyle g(x) := \begin{cases}f^{-1}(x) & \text{if } x \in \bigcup_{n=1}^\infty D_n \\ x & \text{otherwise}\end{cases}$

We first verify that $g$ maps from $A$ to $B$. The things to check are:

• $f^{-1}(x)$ is defined for each $x \in \bigcup_{n=1}^\infty D_n$: the value $f^{-1}(x)$ is only defined when $x \in f(C)$, so we must verify that if $x \in \bigcup_{n=1}^\infty D_n$ then $x \in f(C)$. So suppose $x \in \bigcup_{n=1}^\infty D_n$. Then $x \in D_{n+1}$ for some natural number $n$, so we have $x \in D_{n+1} = f(D_n) \subseteq f(C)$ as required.
• If $x \in \bigcup_{n=1}^\infty D_n$ then $g(x) := f^{-1}(x)$ lands in $B$: We have $x \in f(D_n)$ for some natural number $n$, so $x = f(z)$ for some $z \in D_n$. Thus $f^{-1}(x) = f^{-1}(f(z)) = z \in D_n$. But $D_0 = B\setminus A \subseteq B$ and $D_m \subseteq A \subseteq B$ for each positive $m$, so in any case we have $z \in B$ as required.
• If $x \notin \bigcup_{n=1}^\infty D_n$ then $g(x) := x$ lands in $B$: In this case, we still have $x \in A$ since $x$ is in the domain of $g$, which means $x \in B$ since $A \subseteq B$.

Thus $g$ maps from $A$ to $B$.

Now we show that $g$ is injective. Let $x, y \in A$. We must show that $g(x) = g(y)$ implies $x=y$. So suppose $g(x) = g(y)$. We have three cases:

• Both $x$ and $y$ are in $\bigcup_{n=1}^\infty D_n$: We have $f^{-1}(x) = f^{-1}(y)$. Since $f$ is injective, $f^{-1} : f(C) \to C$ is a bijection. Thus whenever $f^{-1}$ is defined, we can assume it acts bijectively. We thus have $x=y$.
• Neither $x$ nor $y$ is in $\bigcup_{n=1}^\infty D_n$: By definition of $g$ we have $g(x)=x$ and $g(y)=y$ so $g(x)=g(y)$ means $x=y$.
• Exactly one of the two elements is in $\bigcup_{n=1}^\infty D_n$: We can suppose $x \in \bigcup_{n=1}^\infty D_n$ and $y \notin \bigcup_{n=1}^\infty D_n$ (we can relabel $x$ and $y$ and repeat the same steps if it is the other way around). We have $g(x) = f^{-1}(x)$ and $g(y)=y$ so $f^{-1}(x) = y$. But $x \in D_{n+1}$ for some natural number $n$, so $x \in f^{n+1}(B\setminus A)$. Thus $f^{-1}(x) \in f^n(B \setminus A)$. This means $f^{-1}(x)$ is either in $B\setminus A$ or is in $\bigcup_{n=1}^\infty D_n$. But $y$ is in $A$ (as it is in the domain of $g$) so it can’t be in $B\setminus A$, and $y \notin \bigcup_{n=1}^\infty D_n$ by hypothesis of this case, so we have a contradiction. This means that this case actually isn’t possible so we don’t need to worry about it.

Now we show that $g$ is surjective. To do this, let $y \in B$. We must find some $x \in A$ such that $g(x) = y$. We have two cases, $y \in \bigcup_{n=0}^\infty D_n$ or $y \notin \bigcup_{n=0}^\infty D_n$.

• If $y \in \bigcup_{n=0}^\infty D_n$ then $y \in D_n$ for some natural number $n$ so $f(y) \in f(D_n)$, which means that $f(y) \in \bigcup_{n=1}^\infty D_n$. Thus if we let $x := f(y)$ then we have $g(x) = g(f(y)) = f^{-1}(f(y)) = y$.
• On the other hand, if $y \notin \bigcup_{n=0}^\infty D_n$ then $y$ is also not in the smaller sets $D_0$ and $\bigcup_{n=1}^\infty D_n$. From $y\notin D_0 = B\setminus A$ we see that $y \in A$, so $g(y)$ is defined. And from $y\notin\bigcup_{n=1}^\infty D_n$ we see that $g(y) = y$. Thus for $x := y$ we have $g(x) = g(y) = y$.

In each case we have found some $x \in A$ such that $g(x) = y$, so $g$ is surjective.

# Exercise 6.4.1

## Exercise statement

Prove Proposition 6.4.5.

Proposition 6.4.5 (Limits are limit points). Let $(a_n)_{n=m}^\infty$ be a sequence which converges to a real number $c$. Then $c$ is a limit point of $(a_n)_{n=m}^\infty$, and in fact it is the only limit point of $(a_n)_{n=m}^\infty$.

None.

## How to think about the exercise

This is a straightforward exercise so I don’t have anything to say.

## Model solution 1

Let $(a_n)_{n=m}^\infty$ be a sequence which converges to a real number $c$. To show that $c$ is a limit point of this sequence, let $\varepsilon > 0$ be a real number and let $N' \geq m$ be an integer. Since the sequence converges to $c$, we know that there is some $N \geq m$ such that $|a_n - c| \leq \varepsilon$ for every $n \geq N$. We want to find some $n \geq N'$ such that $|a_n - c| \leq \varepsilon$. We can take $n := N+N'$. Then we see that $n \geq N'$, and since $n \geq N$ we have $|a_n-c|\leq \varepsilon$ as required.

Next we show that $c$ is the only limit point of the sequence. Suppose for sake of contradiction that there there is another limit point $c' \ne c$. Put $\varepsilon := |c-c'|/3$. Since $c\ne c'$, we know that $\varepsilon$ is positive. Since the sequence converges to $c$, there is some $N \geq m$ such that $|a_n-c|\leq \varepsilon$ for all $n\geq N$. So fix this $N$. Since $c'$ is a limit point, there exists an $n \geq N$ such that $|a_n - c'| \leq \varepsilon$. Fix this $n$. Since $n \geq N$, we also know that $|a_n-c|\leq \varepsilon$. But now we have

$|c-c'| = |c-a_n+a_n-c'| \leq |c-a_n| + |a_n-c'| \leq 2|c-c'|/3$

which is a contradiction since $|c-c'| > 0$.

## Model solution 2

Here’s an alternative way to prove the second part of this exercise.

We want to show that $c$ is the only limit point of the sequence. Let $c'$ be a limit point (not necessarily distinct from $c$), and let $\varepsilon > 0$ be arbitrary. Since the sequence converges to $c$, there is some $N \geq m$ such that $|a_n - c| \leq \varepsilon/2$ for all $n \geq N$. Fix this $N$. Since $N \geq m$ and $c'$ is a limit point, there is some $n \geq N$ such that $|a_n - c'| \leq \varepsilon/2$. Fix this $n$. Since $n \geq N$ and the sequence converges to $c$, we have $|a_n - c| \leq \varepsilon/2$. Thus we have both $|a_n-c'| \leq \varepsilon/2$ and $|a_n - c| \leq \varepsilon/2$. But this means $|c-c'| = |c-a_n+a_n-c'|\leq |c-a_n| + |a_n-c'| \leq \varepsilon$. Since $\varepsilon > 0$ was arbitrary, we have just shown that $|c-c'| \leq \varepsilon$ for every $\varepsilon > 0$. By Exercise 5.4.7 this means that $c=c'$.

# Exercise 3.3.1

## Exercise statement

Show that the definition of equality in Definition 3.3.7 is reflexive, symmetric, and transitive. Also verify the substitution property: if $f, \tilde f: X \to Y$ and $g, \tilde g: Y \to Z$ are functions such that $f = \tilde f$ and $g = \tilde g$, then $g \circ f = \tilde g \circ \tilde f$.

None.

## How to think about the exercise

This is a straightforward exercise. I do want to mention a couple of things though:

• Definition 3.3.7 assumes that there is a notion of equality on elements of the range. If there was no such notion, then the statement $f(x)=g(x)$ that appears in the definition wouldn’t make sense.
• The exercise further assumes that the notion of equality on elements of the range obey the properties of an equivalence relation (reflexive, symmetric, transitive) as well as the substitution axiom; these four properties are called the axioms of equality in Appendix A.7. There is no way to complete this exercise without assuming these properties.

## Model solution

To show the definition is reflexive, let $f:X \to Y$ be a function. We must show $f = f$. Let $x \in X$. Then $f(x) = f(x)$ by reflexivity of equality on elements of $Y$. Since $x$ was arbitrary, this shows that $f=f$.

To show the definition is symmetric, let $f : X \to Y$ and $g : X \to Y$ be two functions with the same domain and range. Suppose $f=g$. This means that $f(x)=g(x)$ for all $x \in X$. To show that $g=f$, we must show that $g(x)=f(x)$ for all $x \in X$. So let $x \in X$. Then since $f=g$ we have $f(x)=g(x)$. By the symmetry of equality on the elements of $Y$, we have $g(x)=f(x)$. Since $x$ was arbitrary, this shows that $g=f$.

To show the definition is transitive, let $f:X \to Y$, $g:X\to Y$, and $h:X\to Y$ be functions with the same domain and range. Suppose $f=g$ and $g=h$. We must show that $f=h$. So let $x \in X$. Then we have $f(x)=g(x)$ since $f=g$. We also have $g(x)=h(x)$ since $g=h$. By transitivity of equality on elements of $Y$, we have $f(x)=h(x)$. Since $x$ was arbitrary, this shows that $f=h$.

Now let $f, \tilde f: X \to Y$ and $g, \tilde g: Y \to Z$ be functions such that $f = \tilde f$ and $g = \tilde g$. We want to show $g \circ f = \tilde g \circ \tilde f$. Let $x \in X$. Then since $f = \tilde f$ we have $f(x) = \tilde f(x)$. By the substitution axiom of equality for elements of $Y$ (see Appendix A.7), we have $g(f(x)) = g(\tilde f(x))$. Since $g = \tilde g$, we have $g(\tilde f(x)) = \tilde g(\tilde f(x))$. Now we have $(g\circ f)(x) = g(f(x)) = g(\tilde f(x)) = \tilde g(\tilde f(x)) = (\tilde g \circ \tilde f)(x)$ using symmetry and transitivity of equality on elements of $Z$. Since $x \in X$ was arbitrary, this shows that $g \circ f = \tilde g \circ \tilde f$ as required.