Matching Theory

October 1, 2014

Matching theory is an active field in mathematics, economics, and computer science. It earned Alvin E. Roth and Lloyd S. Shapley the Nobel Memorial Prize in 2012. The theory is applied in the real world to match students and colleges, doctors and hospitals, and to organize the allocation of donor organs. It all started with a beautiful paper by David Gale and Lloyd Shapley in 1962: College Admissions and the Stability of Marriage (read it!). In this post, we study the simplest version of what is known as the stable marriage problem and take a look at its inherent conflicts. The exposition follows chapter 22 of the wonderful book Game Theory by Michael Maschler, Eilon Solan, and Shmuel Zamir.

What are stable matchings?

There is a set \( G \) of \( n \) girls and a set \( B \) of \( n \) boys. All girls and boys are heterosexual and desperate enough to prefer every member of the opposite sex to staying single. Each girl \( g \) has preferences over the boys, represented by a total order \( \succeq_g \) on \( B \). Similarly, each boy \( b \) has preferences over the girls, represented by a total order \(\succeq_b\) on \( G \).

We want to pair up the girls and the boys. Formally, a matching is simply a bijection \( M \) from \(G\) to \(B\), and if \( M(g)=b \), we say that \( g \) and \( b \) are matched (under \( M\)). Now, nobody can force the girls and boys to be together, so we have to look at matchings in which the boys and girls are relatively content with whom they get. To be precise, a matching is stable if we cannot find a girl and a boy who prefer each other, under their preference orderings, to whomever they are matched with.

Are there any?

Before we go into a deep study of stable matchings, we ought to make sure that there are stable matchings. We do so by giving an explicit algorithm for finding a stable matching, the boy courtship algorithm.

Theorem: There is at least one stable matching.

The algorithm starts with each boy going to the girl he ranks highest according to his preference ordering. Then each girl who is visited by some boy selects her favorite boy among them (for now) and rejects the others. In the next round, each boy goes to the girl he ranks highest among the girls that have not rejected him yet. This is repeated until each girl is visited by exactly one boy, which provides us with a matching.
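The rounds above fit in a few lines of Python. This is a minimal sketch of the boy courtship algorithm; the names and preference lists at the bottom are a made-up example:

```python
def boy_courtship(boy_prefs, girl_prefs):
    """Return a stable matching as a dict girl -> boy."""
    # rank[g][b] = position of boy b in girl g's list (lower is better)
    rank = {g: {b: i for i, b in enumerate(prefs)} for g, prefs in girl_prefs.items()}
    next_choice = {b: 0 for b in boy_prefs}   # index of the next girl each boy visits
    current = {}                              # girl -> boy she currently keeps
    free = list(boy_prefs)
    while free:
        b = free.pop()
        g = boy_prefs[b][next_choice[b]]      # best girl who has not rejected b yet
        next_choice[b] += 1
        if g not in current:
            current[g] = b                    # first visitor is kept for now
        elif rank[g][b] < rank[g][current[g]]:
            free.append(current[g])           # she prefers b: the old boy is rejected
            current[g] = b
        else:
            free.append(b)                    # b is rejected immediately
    return current

boys = {"Al": ["Ann", "Bea"], "Cy": ["Ann", "Bea"]}
girls = {"Ann": ["Cy", "Al"], "Bea": ["Al", "Cy"]}
print(boy_courtship(boys, girls))  # {'Ann': 'Cy', 'Bea': 'Al'}
```

Here both boys court Ann first; she keeps Cy, and the rejected Al moves on to Bea.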

There are three properties of this algorithm we are going to use a lot:

(P1) If a boy visits a girl, he must already have visited every girl he prefers to her and have been rejected by them all.

(P2) If two boys visit the same girl at some point without being immediately rejected, the boy who visited her later must be the one she prefers. For a boy keeps visiting a girl until she rejects him (every girl he prefers has already rejected him), and she only rejects a boy when someone she likes better comes along.

(P3) If a girl is visited at some point, she will be visited ever after, for she only rejects a boy if someone better comes along and a boy only stops visiting a girl he has visited before if she rejects him.

We first note that the algorithm does terminate. By (P3), once every girl has been visited, every girl is visited in all later rounds. Since there are as many girls as there are boys, this is only possible if each girl is visited by exactly one boy, that is, if we have a matching. So it suffices to show that every girl is visited eventually. As long as some girl is unvisited, some other girl must be visited by more than one boy and reject at least one of them. Since no boy returns to a girl who has rejected him, once a boy has been rejected by every girl who is being visited, he has to visit a girl who hasn’t been visited before. Eventually, there can be no unvisited girl.

So the algorithm has produced a matching. We want to show it is stable. Suppose it is not. Then there are a boy, say Bob, and a girl, say Ann, who prefer each other to whomever they are matched with. Since Bob prefers Ann to his current partner, Bob must have visited Ann before and been rejected by her, by (P1). So Ann’s current partner must be preferred to Bob by (P2), in contradiction to Ann and Bob preferring each other to their current partners.

Who profits?

Now the boys start to complain that the whole system is unfair. They have to do all the proposing and get their egos crushed by rejection, while the girls can just choose and keep the boys coming to be judged. We can formulate the interests of the boys by a partial order \( \succeq_B \) on the set of all matchings. If \( M \) and \( L \) are matchings, we have \( M\succeq_B L\) if every boy considers the girl he is matched with under \( M \) to be at least as good as the girl he is matched with under \( L \). Let \( M_B \) be the matching we get from the boy courtship algorithm. It turns out the boys are not that badly off after all:

Theorem: For every stable matching \( M \), we have \( M_B\succeq_B M \).

By (P1), it suffices to show that if a girl rejects a boy under the boy courtship algorithm at some point, there is no stable matching in which he is matched with that girl. We prove this by induction on the steps of the boy courtship algorithm.

If a girl, Ann, rejects a boy, Bob, at the first stage of the algorithm, she does so because there is a boy, Carl, she prefers and for whom she is the top choice. So in any matching in which she is matched with Bob, she and Carl would prefer each other to their partners. So there can be no stable matching in which Ann and Bob are matched.

Now suppose that no boy can be matched, in any stable matching, with a girl who rejected him in the first \( k \) stages of the algorithm, and assume that Ann rejects Bob in stage \( k+1 \). So she must choose a boy, Ken, who is visiting her in the same round, over Bob. Now by the induction hypothesis, Ken cannot be matched in any stable matching with a girl who has rejected him before, and by (P1), these are exactly the girls Ken prefers to Ann. Now suppose there is a stable matching in which Ann and Bob are matched. In that matching, Ken is matched with a girl he likes less than Ann. So Ken and Ann would prefer each other to their partners. This contradiction shows that Ann and Bob cannot be matched under any stable matching.
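For a small instance the theorem can be checked by brute force. The sketch below uses a made-up 2×2 example with two stable matchings: it enumerates all matchings, keeps the stable ones, and confirms that a single stable matching simultaneously gives every boy his best achievable partner:

```python
from itertools import permutations

boys  = ["b1", "b2"]
girls = ["g1", "g2"]
boy_prefs  = {"b1": ["g1", "g2"], "b2": ["g2", "g1"]}
girl_prefs = {"g1": ["b2", "b1"], "g2": ["b1", "b2"]}

def is_stable(match):  # match: dict boy -> girl
    inv = {g: b for b, g in match.items()}
    for b, g in match.items():
        for better_g in boy_prefs[b]:
            if better_g == g:
                break
            rival = inv[better_g]  # b prefers better_g; does she prefer b back?
            if girl_prefs[better_g].index(b) < girl_prefs[better_g].index(rival):
                return False       # blocking pair found
    return True

stable = [m for m in (dict(zip(boys, p)) for p in permutations(girls))
          if is_stable(m)]
# each boy's best partner over all stable matchings:
best = {b: min((m[b] for m in stable), key=boy_prefs[b].index) for b in boys}
print(len(stable), best in stable)  # 2 True
```

There are two stable matchings here (the boy-optimal and the girl-optimal one), and the pointwise-best assignment for the boys is itself one of them, as the theorem promises.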

Do the girls like it?

We can also define a partial order for the girls. For two matchings \( M \) and \( L \), we have \( M\succeq_G L \) if every girl considers the boy she is matched with under \( M \) at least as good as the boy she is matched with under \( L \). By a symmetry argument, we can exchange the roles of the boys and the girls and get a girl courtship algorithm that gives rise to a stable matching \( M_G \). Again by symmetry, we have \( M_G\succeq_G M \) for every stable matching \( M \). So how do \( \succeq_B \) and \( \succeq_G \) relate to each other?

Theorem: For any two stable matchings \( M_1 \) and \( M_2 \), one has \( M_1\succeq_B M_2 \) if and only if \( M_2\succeq_G M_1 \).

Without loss of generality, it suffices to show that \( M_1\succeq_B M_2 \) implies \( M_2\succeq_G M_1 \). Let Ann be a girl who is matched with different boys under \( M_1 \) and \( M_2 \). Say she is matched with Bob under \( M_1 \) and with Ken under \( M_2 \). We want to show that she prefers Ken to Bob. Since \( M_1\succeq_B M_2 \), Bob prefers Ann to the girl he is matched with under \( M_2 \), Claire. If Ann preferred Bob to Ken, then Ann and Bob would be better off with each other than with their partners under \( M_2 \). But \( M_2 \) is a stable matching, so Ann does not prefer Bob to Ken. So Ann prefers Ken to Bob.

Playing with Partitions: Euler’s Pentagonal Theorem

August 25, 2014


Today we will play a small game which is really really simple. We will add up numbers to make numbers. And to keep matters simple we will only deal with the counting numbers \(1, 2, 3, \ldots\) As an example \(2 + 3 = 5\), but to make the game challenging we reverse the problem. Let’s ask how we can split \(5\) into a sum of other numbers. For completeness we take \(5\) itself as one of the ways to express \(5\) as a sum of numbers. Clearly we have the following other ways $$ 5 = 4 + 1 = 3 + 1 + 1 = 3 + 2 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1$$ so that there are \(7\) different ways of adding numbers to make \(5\). We don’t take into account the order of summands. Also a number can be repeated if needed.

Partitions of a Number

It turns out that a whole generation of distinguished mathematicians (Euler, Jacobi, Ramanujan, Hardy, Rademacher, etc.) were also interested in playing the above game. And needless to say, they made the whole thing very systematic by adding some definitions (it’s their habit, so to speak). Following in their footsteps we say that a tuple \((n_{1}, n_{2}, \ldots, n_{k})\) of positive integers is a partition of a positive integer \(n\) if $$n_{1} \geq n_{2} \geq \cdots \geq n_{k}\text{ and }n_{1} + n_{2} + \cdots + n_{k} = n$$ Thus \((3, 2), (3, 1, 1)\) etc. are partitions of \(5\). If \((n_{1}, n_{2}, \ldots, n_{k})\) is a partition of \(n\) then we say that each of the \(n_{i}\) is a part of this partition, \(k\) is the number of parts, \(n_{1}\) is the greatest part and \(n_{k}\) the least part of this partition.

The mathematicians were really not so interested in finding individual partitions of a number \(n\), but rather in finding the total number of partitions of \(n\). The example above deals with \(n = 5\) and clearly there are \(7\) partitions of \(5\). You should convince yourself, by taking a slightly larger number, say \(n = 8\) or \(n = 10\), that finding the number of partitions of a given number \(n\) is not that easy. In particular, writing out each partition of \(n\) and then counting them all is very laborious, and you may well worry that you have missed a partition. Mathematicians however found a smarter way to count partitions. We explore this technique next.
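Before bringing in the smart technique, here is the naive approach as a short Python sketch: list the partitions in non-increasing order and count them.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()        # the empty partition of 0
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):  # later parts never exceed `first`
            yield (first,) + rest

print(list(partitions(5)))
# [(5,), (4, 1), (3, 2), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]
print(len(list(partitions(5))), len(list(partitions(8))), len(list(partitions(10))))
# 7 22 42
```

Already for \(n = 10\) there are \(42\) partitions, so counting by hand quickly gets out of control.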

Counting via Method of Coefficients

To count partitions of \(n\) we need to count tuples \((n_{1}, n_{2}, \ldots, n_{k})\) such that \(n_{1} + n_{2} + \cdots + n_{k} = n\). Let us first try to solve a simpler problem. Suppose we need to find tuples \((x, y)\) such that \(x + y = 10\). Also let’s assume that \(x, y\) may be zero or positive integers. Clearly we have only \(11\) options \((0, 10), (1, 9), \ldots, (10, 0)\). Here we are paying attention to order also.

Let’s try to count tuples \((x, y, z)\) satisfying \(x + y + z = 10\). That does not seem as easy as the previous problem. But let’s work methodically. Let \(x = 0\) so that \(y + z = 10\) and we have \(11\) tuples \((y, z)\) for this problem. If \(x = 1\) then \(y + z = 9\) and we have \(10\) tuples \((y, z)\) satisfying \(y + z = 9\). Similarly we try \(x = 2, 3, \ldots, 10\) and then the total count of all the tuples \((x, y, z)\) is $$11 + 10 + 9 + \cdots + 1 = \frac{11\cdot 12}{2} = 66$$ Following this argument we can see that the number of non-negative integral solutions to \(x + y + z = n\) is \((n + 1)(n + 2)/2\). This argument can be extended to show that the number of solutions to $$x_{1} + x_{2} + \cdots + x_{m} = n$$ is \(\displaystyle \binom{n + m - 1}{m - 1}\).
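A quick numerical sanity check of this binomial formula, comparing a brute-force count against the closed form:

```python
from itertools import product
from math import comb

def count_solutions(m, n):
    """Brute-force count of non-negative integer solutions of x1 + ... + xm = n."""
    return sum(1 for t in product(range(n + 1), repeat=m) if sum(t) == n)

# compare with C(n + m - 1, m - 1)
print(count_solutions(2, 10), comb(11, 1))  # 11 11
print(count_solutions(3, 10), comb(12, 2))  # 66 66
```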

There is however another simple way to get the same result. Consider the expression $$(1 + x + x^{2} + \cdots + x^{i} + \cdots)(1 + x + x^{2} + \cdots + x^{j} + \cdots) = \frac{1}{1 - x}\cdot\frac{1}{1 - x}$$ If we multiply the two series on the left side we get terms of the type \(x^{i + j}\), and there will be as many terms \(x^{n}\) as there are solutions to \(i + j = n\). It follows (also note the right side) that the number of non-negative integer solutions to \(i + j = n\) is equal to the coefficient of \(x^{n}\) in \((1 - x)^{-2}\). Similarly the number of solutions to \(x_{1} + x_{2} + \cdots + x_{m} = n\) is the coefficient of \(x^{n}\) in \((1 - x)^{-m}\).

As a variation if we want solutions \((i, j)\) with \(i \geq 1, j \geq 2\) and \(i + j = n\) then we need to start the series corresponding to \(i\) with \(x\) and that for \(j\) with \(x^{2}\), so that the number of solutions is equal to the coefficient of \(x^{n}\) in $$(x + x^{2} + \cdots)(x^{2} + x^{3} + \cdots) = x^{3}(1 - x)^{-2}$$ Let’s now try out the equation \(i + 2j = n\). In this case the series should look like $$(1 + x + x^{2} + \cdots + x^{i} + \cdots)(1 + x^{2} + x^{4} + \cdots + x^{2j} + \cdots) = \frac{1}{(1 - x)(1 - x^{2})}$$ Similarly we can handle equations like \(i + 2j + 3k + l = n\).

Now we get back to counting partitions of \(n\). Clearly the least part in a partition can be \(1\) and the largest part can be \(n\). Let the integer \(i\) be a part in a partition of \(n\) and let it be repeated \(k_{i}\) times in that partition, so that $$1\cdot k_{1} + 2 \cdot k_{2} + \cdots + n\cdot k_{n} = n$$ Now each of the variables \(k_{1}, k_{2}, \ldots, k_{n}\) is a non-negative integer. Thus the total number of partitions of \(n\) is equal to the number of solutions of the above equation. However there is a catch here, as the number of variables \(k_{1}, k_{2}, \ldots\) is itself dependent on \(n\). The corresponding product is of the form $$(1 + x + x^{2} + \cdots)(1 + x^{2} + x^{4} + \cdots)\cdots(1 + x^{i} + x^{2i} + \cdots)\cdots$$ and this evaluates to $$\frac{1}{(1 - x)(1 - x^{2})(1 - x^{3})\cdots}$$ It follows that if \(p(n)\) is the number of partitions of \(n\) and \(p(0) = 1\) then $$\sum_{n = 0}^{\infty}p(n)x^{n} = \frac{1}{(1 - x)(1 - x^{2})(1 - x^{3})\cdots}$$ In the case of simple equations like \(i + 2j = n\), the products simplified into a finite number of factors of the type \(1/(1 - x^{i})\), and using the binomial theorem and multiplication of series we could get a finite formula for the number of solutions in terms of \(n\). But for the expression $$\frac{1}{(1 - x)(1 - x^{2})(1 - x^{3})\cdots}$$ we don’t have such a formula for the \(n^{\text{th}}\) coefficient (because of the infinite number of factors involved). Are we doomed then?
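Even without a closed formula, the generating function computes: factors \(1/(1-x^{i})\) with \(i > N\) cannot affect coefficients up to \(x^{N}\), so the sketch below multiplies in the first \(N\) factors, each one being the one-line series update shown in the comment.

```python
N = 20
coef = [1] + [0] * N            # the series "1"
for i in range(1, N + 1):       # multiply by 1/(1 - x^i) = 1 + x^i + x^{2i} + ...
    for j in range(i, N + 1):   # ascending j: new[j] = old[j] + new[j - i]
        coef[j] += coef[j - i]

print(coef[:11])  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```

The coefficients are exactly \(p(0), p(1), \ldots\): for instance \(p(5) = 7\) and \(p(10) = 42\), matching the hand count above.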

Euler’s Pentagonal Theorem

Luckily the answer is no, and we have some saving grace: we do have a formula for the \(n^{\text{th}}\) term of $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots$$ Euler found this formula by multiplying the product out by hand to get terms up to \(x^{52}\) and guessed a pattern for the coefficients in the resulting series. However it took him some years to give a theoretical justification for the formula and thereby provide a sound proof. We have the formula $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots = \sum_{j = -\infty}^{\infty}(-1)^{j}x^{(3j^{2} + j)/2}$$ so that we have to form terms for all integral values of \(j\) (negative and zero included), and for each \(j\) we have a term \(x^{(3j^{2} + j)/2}\) with coefficient \((-1)^{j}\). Thus there is a term \(x^{k}\) in the above series expansion if and only if \(k\) is of the form \((3j^{2} + j)/2\) for some integer \(j\). Such numbers \(k\) are called pentagonal numbers, and for this very reason Euler’s formula above is called the Pentagonal Theorem.

Putting \(j = 0, -1, 1, -2, 2, -3, 3\) we get the first few terms of the series as $$1 - x - x^{2} + x^{5} + x^{7} - x^{12} - x^{15} + \cdots$$ As we have mentioned earlier, it took some years for Euler to prove his formula and one might expect the proof to be extremely difficult. It turns out that the proof is not difficult but definitely very non-obvious, and it requires a good amount of heavy mathematical machinery. Near the end of the nineteenth century the American mathematician F. Franklin found a marvelous proof which involved no machinery at all, but rather arguments of a very different nature (termed “combinatorial” nowadays). This is what we discuss next.
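Euler’s empirical discovery is easy to replay by machine: expand the product up to \(x^{N}\) and compare it coefficient by coefficient with the pentagonal series.

```python
N = 30

# coefficients of (1-x)(1-x^2)...(1-x^N), truncated at x^N; factors with
# i > N cannot change these coefficients
coef = [1] + [0] * N
for i in range(1, N + 1):
    for j in range(N, i - 1, -1):   # descending j so coef[j - i] is still the old value
        coef[j] -= coef[j - i]

# the pentagonal series sum over all integers j of (-1)^j x^{(3j^2+j)/2}
penta = [0] * (N + 1)
for j in range(-5, 6):
    e = (3 * j * j + j) // 2
    if e <= N:
        penta[e] += -1 if j % 2 else 1

print(coef == penta)  # True
```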

Combinatorial Proof of Euler’s Pentagonal Theorem

Rather than focusing on \((1 - x)(1 - x^{2})\cdots\) let us first focus on $$(1 + x)(1 + x^{2})(1 + x^{3})\cdots$$ which might be simpler for two reasons:

  • it involves only \(+\) signs
  • it is already of the form \((1 + x^{i} + \cdots)(1 + x^{j} + \cdots)\cdots\)

This second part really helps us a lot and tells us that the coefficient of \(x^{n}\) in the new product can be interpreted as the number of solutions to \(k_{1} + k_{2} + \cdots + k_{j} = n\) where all the \(k\)’s are different (apart from \(x^{0} = 1\), no two factors contain the same power of \(x\)). This means that the coefficient of \(x^{n}\) in $$(1 + x)(1 + x^{2})(1 + x^{3})\cdots$$ gives us the number of partitions of \(n\) with distinct parts.

Thus for \(n = 5\) we have three such partitions: \((5), (4, 1), (3, 2)\). Let \(q(n)\) be the coefficient of \(x^{n}\) in \((1 + x)(1 + x^{2})\cdots\), so that \(q(5) = 3\). We also split \(q(n)\) into multiple terms based on the number of parts, so that \(q_{k}(n)\) denotes the number of partitions of \(n\) into \(k\) distinct parts and we have $$q(n) = q_{1}(n) + q_{2}(n) + \cdots = \sum_{k \geq 1}q_{k}(n)$$ We can now attack our original problem of finding the coefficient \(r(n)\) of \(x^{n}\) in the product $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots$$ Clearly when we multiply the individual factors and consolidate the terms \(x^{n}\), we again get one term for each partition of \(n\) with distinct parts, but unlike in the product \((1 + x)(1 + x^{2})\cdots\), each such term now carries a \(+\) or a \(-\) sign depending on the number of parts in the partition (\(+\) if the number of parts is even, \(-\) otherwise). Thus to get \(x^{5}\) we have three terms, namely \(-x^{5}, x^{5}, x^{5}\), corresponding to the partitions \((5), (4, 1), (3, 2)\) respectively. So the net result is the term \(x^{5}\).

It follows that $$r(n) = -q_{1}(n) + q_{2}(n) - q_{3}(n) + \cdots = \sum_{k \geq 1}(-1)^{k}q_{k}(n)$$ where \(r(n)\) is the coefficient of \(x^{n}\) in $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots$$ Thus to find the coefficient \(r(n)\) we need to count partitions of \(n\) with distinct parts and see how many of them (say \(A\)) have an even number of parts and how many (say \(B\)) have an odd number of parts. Then \(r(n) = A - B\). Franklin’s great (and at the same time simple) idea was to analyze the partitions of \(n\) with distinct parts and show that most of the time there is a pairing between partitions with an even number of parts and those with an odd number of parts, so that mostly \(A = B\) and hence the coefficient \(r(n) = 0\).

Only in exceptional cases will there be a few partitions (of “even” or “odd” type) left out without any pairing with the opposite type. Franklin found out exactly which partitions can be paired up and which ones cannot. He found that, depending on the value of \(n\), either there are no exceptional partitions of this kind or there is exactly one. We present Franklin’s beautiful idea next.

Franklin’s Pairing of Partitions

To proceed further we fix a pictorial notation for a given partition. If \((n_{1}, n_{2}, \ldots, n_{k})\) is a partition then it is represented as a left-aligned two-dimensional array of dots such that the number of rows in the array is equal to the number of parts in the partition (\(k\)) and the \(i^{\text{th}}\) row contains \(n_{i}\) dots. Thus the partition \((5, 4, 2, 2)\) is represented by an array with rows of \(5, 4, 2, 2\) dots. Such a notation is very helpful in analyzing the nature of partitions and is the key to most of the combinatorial proofs in the theory of partitions.

Franklin analyzed partitions with distinct parts via their graphical representations. He found that under certain conditions a partition with an odd number of parts could be transformed into a partition with an even number of parts and vice versa. His idea was to start with the first row of dots and continue as long as each successive row has exactly one dot fewer than the previous one. Thus for \((9, 8, 7, 5, 4)\) we can start with the first row of \(9\) dots and continue till the third row of \(7\) dots. We have \(3\) such rows from the top where the parts (or dots) decrease one by one. If the number of such rows is less than the last part (which is the case here, as the last part is \(4\)), then we can remove the last dot from each of these rows and put them after the last row as a new row, making another partition (which also has distinct parts, and the same total number of dots) with one extra row. This changes the number of rows by one and thereby its parity. The procedure can be reversed when the last part is at most the number of such rows. This pairing works, for example, between the partitions \((9, 8, 7, 5, 4)\) and \((8, 7, 6, 5, 4, 3)\). More formally, let \((n_{1}, n_{2}, \ldots, n_{k})\) be a partition of \(n\) with distinct parts and let \(j\) be the smallest integer for which \(n_{j + 1} < n_{j} - 1\) (with \(j = k\) if there is no such integer). If \(j < n_{k}\) then the last dots of the first \(j\) rows can be removed and put as the \((k + 1)^{\text{th}}\) row to make a partition with \(k + 1\) distinct parts. If \(j \geq n_{k}\) then the \(k^{\text{th}}\) row can be removed and its dots can be placed one by one at the end of each row starting from the first row.

This rearrangement will fail if \(j = k = n_{k}\) or \(j = k = n_{k} - 1\) and will work in all the other cases. In the first case we have $$n = j + (j + 1) + \cdots + (2j - 1) = \frac{3j^{2} - j}{2}$$ and in the second case $$n = (j + 1) + (j + 2) + \cdots + 2j = \frac{3j^{2} + j}{2}$$ In these two cases we are left with an unpaired partition with \(j\) parts.
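Franklin’s move is easy to implement and test. The sketch below encodes the two cases exactly as stated above and checks by brute force that the move is a parity-reversing involution on partitions with distinct parts, with the predicted unpaired staircase appearing only when \(n\) is pentagonal:

```python
def franklin(p):
    """Apply Franklin's move to a strictly decreasing tuple of parts.
    Returns the paired partition, or None in the two exceptional cases."""
    k, s = len(p), p[-1]            # number of parts, smallest part n_k
    j = 1                           # length of the initial staircase
    while j < k and p[j] == p[j - 1] - 1:
        j += 1
    if j == k and s in (j, j + 1):  # j = k = n_k  or  j = k = n_k - 1
        return None
    if j >= s:                      # remove the last row, add a dot to the first s rows
        return tuple(x + 1 for x in p[:s]) + p[s:-1]
    # take one dot from each of the first j rows, append a new row of j dots
    return tuple(x - 1 for x in p[:j]) + p[j:] + (j,)

def distinct_partitions(n, largest=None):
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest

for n in (12, 13, 15):
    parts = list(distinct_partitions(n))
    paired = [p for p in parts if franklin(p) is not None]
    assert all(franklin(franklin(p)) == p for p in paired)          # involution
    assert all(len(franklin(p)) % 2 != len(p) % 2 for p in paired)  # flips parity
    print(n, [p for p in parts if franklin(p) is None])
# 12 [(5, 4, 3)]
# 13 []
# 15 [(6, 5, 4)]
```

The unpaired partitions \((5,4,3)\) and \((6,5,4)\) are exactly the staircases for \(n = (3\cdot 3^2 - 3)/2 = 12\) and \(n = (3\cdot 3^2 + 3)/2 = 15\), each with \(j = 3\) parts.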

It now follows that the sum \(r(n) = \sum_{k \geq 1}(-1)^{k}q_{k}(n)\) is \(0\) if \(n\) is not of the form \(n = (3j^{2} \pm j)/2\) and is \((-1)^{j}\) otherwise. We thus have $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots = 1 + \sum_{j = 1}^{\infty}(-1)^{j}x^{(3j^{2} + j)/2} + \sum_{j = 1}^{\infty}(-1)^{j}x^{(3j^{2} - j)/2}$$ which is also written as $$(1 - x)(1 - x^{2})(1 - x^{3})\cdots = \sum_{j = -\infty}^{\infty}(-1)^{j}x^{(3j^{2} + j)/2}$$ which is Euler’s Pentagonal Theorem.

Counting Partitions via Euler’s Pentagonal Theorem

We now have the formula $$\sum_{n = 0}^{\infty}p(n)x^{n} = \frac{1}{(1 - x)(1 - x^{2})\cdots} = \dfrac{1}{{\displaystyle \sum_{n = -\infty}^{\infty}(-1)^{n}x^{(3n^{2} + n)/2}}}$$ or $$\left(1 + \sum_{n = 1}^{\infty}p(n)x^{n}\right)\left(1 + \sum_{n = 1}^{\infty}(-1)^{n}x^{(3n^{2} + n)/2} + \sum_{n = 1}^{\infty}(-1)^{n}x^{(3n^{2} - n)/2}\right) = 1$$ Equating coefficients of \(x^{n}\) on both sides gives us a recursive relation to calculate \(p(n)\), namely $$p(n) = p(n - 1) + p(n - 2) - p(n - 5) - p(n - 7) + p(n - 12) + p(n - 15) - \cdots$$ where the pattern \(1, 2, 5, 7, 12, 15,\ldots\) is the sequence of pentagonal numbers of the form \((3j^{2} \pm j)/2\). We start with \(p(1) = 1\) and the convention that \(p(0) = 1\) and \(p(n) = 0\) if \(n < 0\). We have the next few calculations $$\begin{aligned}p(2) &= p(1) + p(0) = 2\\ p(3) &= p(2) + p(1) = 3\\ p(4) &= p(3) + p(2) = 5\\ p(5) &= p(4) + p(3) - p(0) = 7\\ p(6) &= p(5) + p(4) - p(1) = 11\\ p(7) &= p(6) + p(5) - p(2) - p(0) = 15\\ p(8) &= p(7) + p(6) - p(3) - p(1) = 22\\ p(9) &= p(8) + p(7) - p(4) - p(2) = 30\\ p(10) &= p(9) + p(8) - p(5) - p(3) = 42\end{aligned}$$ The Indian mathematician S. Ramanujan was not so happy calculating partitions in this manner (because it involves keeping a table of all previously calculated values) and he gave an almost exact formula for \(p(n)\) in terms of \(n\), which was further refined by Hans Rademacher into an exact formula. It is of interest to note that before the Ramanujan–Rademacher formula, Euler’s pentagonal theorem was the only practical way to calculate partitions of a number. The approach of Ramanujan is considerably more difficult to understand but is a gem of mathematics, and we will have occasion to discuss it later on this blog.
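The recurrence is a few lines of Python; this sketch reproduces the table of values above:

```python
def partition_numbers(N):
    """Return [p(0), ..., p(N)] via Euler's pentagonal recurrence."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        j, total = 1, 0
        while True:
            g1 = (3 * j * j - j) // 2   # generalized pentagonal numbers ...
            g2 = (3 * j * j + j) // 2   # ... (3j^2 - j)/2 and (3j^2 + j)/2
            if g1 > n:
                break
            sign = -1 if j % 2 == 0 else 1
            total += sign * p[n - g1]
            if g2 <= n:
                total += sign * p[n - g2]
            j += 1
        p[n] = total
    return p

print(partition_numbers(10))  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```

Unlike the brute-force enumeration, this needs only \(O(\sqrt n)\) table lookups per value of \(n\), which is exactly why the recurrence was for so long the method of choice.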

Binary quadratic forms over the rational integers and class numbers of quadratic fields.

August 23, 2014

I wrote an article with Irving Kaplansky on indefinite binary quadratic forms, integral coefficients. At the time, I believe I used high-precision continued fractions or similar. It took me years to realize that the right way to solve Pell’s equation, or find out the “minimum” of an indefinite form (and other small primitively represented values), or the period of its continued fraction, was the method of “reduced” forms in cycles/chains, due to Lagrange, Legendre, Gauss. It is also the cheapest way to find the class number and group multiplication for ideals in real quadratic fields, this probably due to Dirichlet. For imaginary quadratic fields, we have easier “reduced” positive forms.

A binary quadratic form, with integer coefficients, is some $$ f(x,y) = A x^2 + B xy + C y^2. $$ The discriminant is $$ \Delta = B^2 - 4 A C. $$ We will abbreviate this by $$ \langle A,B,C \rangle. $$ It is primitive if \({\gcd(A,B,C)=1. }\) Standard fact, hard to discover but easy to check: $$ (A x^2 + B x y + C D y^2 ) (C z^2 + B z w + A D w^2 ) = A C X^2 + B X Y + D Y^2,$$ where \({ X = x z - D yw, \; Y = A xw + C yz + B yw. }\) This gives us Dirichlet’s definition of “composition” of quadratic forms of the same discriminant, $$ \langle A,B,CD \rangle \circ \langle C,B,AD \rangle = \langle AC,B,D \rangle. $$ In particular, if this \({D=1,}\) the result represents \({1}\) and is (\({SL_2 \mathbb Z}\)) equivalent to the “principal” form for this discriminant. Oh, duplication or squaring in the group; if \({\gcd(A,B)=1,}\) $$ \langle A,B,AD \rangle^2 = \langle A^2,B,D \rangle. $$ This comes up with positive forms: \({ \langle A,B,C \rangle \circ \langle A,-B,C \rangle = \langle 1,B,AC \rangle }\) is principal, the group identity. Probably should display some \({SL_2 \mathbb Z}\) equivalence rules; these are how we calculate when things are not quite right for Dirichlet’s rule: $$ \langle A,B,C \rangle \cong \langle C,-B,A \rangle, $$ $$ \langle A,B,C \rangle \cong \langle A, B + 2 A, A + B +C \rangle, $$ $$ \langle A,B,C \rangle \cong \langle A, B - 2 A, A - B +C \rangle. $$
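The “standard fact” is a polynomial identity, so a machine can confirm it by evaluating both sides on enough integer points (each variable appears to degree at most 2, so the small grid below pins the identity down):

```python
from itertools import product

def check(A, B, C, D, x, y, z, w):
    """True iff Dirichlet's composition identity holds at this point."""
    X = x * z - D * y * w
    Y = A * x * w + C * y * z + B * y * w
    lhs = (A * x * x + B * x * y + C * D * y * y) * (C * z * z + B * z * w + A * D * w * w)
    rhs = A * C * X * X + B * X * Y + D * Y * Y
    return lhs == rhs

print(all(check(*t) for t in product(range(-1, 3), repeat=8)))  # True
```

With \(A = C = D = 1, B = 0\) this is just the Brahmagupta–Fibonacci identity \((x^2+y^2)(z^2+w^2) = X^2 + Y^2\).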

Imaginary first. Suppose we want to know about \({\mathbb Q(\sqrt {-47}).}\) Reduced positive forms \({ \langle A,B,C \rangle }\) obey \({|B| \leq A \leq C}\) and \({B \neq -A,}\) plus whenever \({A = C,}\) we have \({B \geq 0.}\) Our group of binary forms is

$$ \begin{array}{rrrrr} \langle & 1, & 1, & 12 & \rangle \\ \langle & 2, & -1, & 6 & \rangle \\ \langle & 2, & 1, & 6 & \rangle \\ \langle & 3, & -1, & 4 & \rangle \\ \langle & 3, & 1, & 4 & \rangle \end{array} $$

This is an abelian group in any case, so it is cyclic of order 5. These are also the five elements of the ideal class group of \({\mathbb Q(\sqrt {-47}).}\) Here is the mapping from forms to ideals: given \({ \langle A,B,C \rangle, }\) drop the letter \({C.}\) That’s it. $$ \langle A,B,C \rangle \mapsto \left[ A, \frac{B + \sqrt \Delta}{2} \right]. $$ Oh, why is this an ideal, rather than just some \({\mathbb Z}\)-lattice? Because, given \({\alpha,\beta}\) rational integers, $$ \left[ \alpha, \frac{\beta + \sqrt \Delta}{2} \right] $$ is an ideal if and only if $$ 4 \alpha | ( \Delta - \beta^2 ). $$
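A short search reproduces the table above. This sketch enumerates the reduced positive forms of a negative discriminant, using the bound \(3A^2 \le -\Delta\) that follows from \(|B| \le A \le C\):

```python
from math import isqrt

def reduced_forms(D):
    """Reduced positive forms <A,B,C> of discriminant D < 0:
    |B| <= A <= C, B != -A, and B >= 0 whenever A = C."""
    forms = []
    for A in range(1, isqrt(-D // 3) + 1):
        for B in range(-A, A + 1):
            num = B * B - D               # need B^2 - 4AC = D, i.e. 4A | B^2 - D
            if num % (4 * A):
                continue
            C = num // (4 * A)
            if C < A:
                continue
            if B < 0 and (A == -B or A == C):
                continue                  # exclude B = -A, and B < 0 when A = C
            forms.append((A, B, C))
    return forms

print(reduced_forms(-47))
# [(1, 1, 12), (2, -1, 6), (2, 1, 6), (3, -1, 4), (3, 1, 4)]
```

Five forms, class number 5, exactly the group displayed above.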

Group: we already see how to do $$ \langle 2,1,6 \rangle^2 \cong \langle 4,1,3 \rangle \cong \langle 3,-1,4 \rangle; $$ $$ \langle 2,1,6 \rangle \circ \langle 3,-1,4 \rangle \cong \langle 2,5,9 \rangle \circ \langle 3,5,6 \rangle \cong \langle 6,5,3 \rangle \cong \langle 3,-5,6 \rangle \cong \langle 3,1,4 \rangle; $$ $$ \langle 2,1,6 \rangle \circ \langle 3,1,4 \rangle \cong \langle 6,1,2 \rangle \cong \langle 2,-1,6 \rangle. $$ $$ \langle 2,1,6 \rangle \circ \langle 2,-1,6 \rangle \cong \langle 1,1,12 \rangle $$ in any case.

Oh, not at all by the way, if we want \({\mathbb Q(\sqrt {K})}\) for squarefree \({K \equiv 2,3 \pmod 4,}\) we actually use quadratic forms of discriminant \({\Delta = 4K.}\) Not my fault. Indeed, below, in the Pell solution for \({61,}\) the discriminant is actually \({244,}\) there is a single cycle of discriminant \({61}\) which looks a bit different, mainly due to the odd numbers in the middle coefficient.

There are a few extra tricks for indefinite forms. The right way to calculate things is to use “reduced” forms. The definition of \({ \langle A,B,C \rangle }\) being reduced is $$ 0 < B < \sqrt \Delta, \; \; \mbox{and} \; \; \sqrt \Delta -B < 2 |A| < \sqrt \Delta + B, $$ this being equivalent to $$ 0 < B < \sqrt \Delta, \; \; \mbox{and} \; \; \sqrt \Delta -B < 2 |C| < \sqrt \Delta + B. $$

If you actually calculate enough Conway topographs, you realize eventually that

THEOREM \({ \langle A,B,C \rangle }\) is reduced if and only if $$ AC < 0 \; \; \mbox{and} \; \; B > |A+C|. $$ I have never seen this in print; it should be attributed to Conway or to Jagy or to Marty Weissman of UC Santa Cruz and Singapore; Conway knows much more than he writes down. With the required relation $$ B^2 - 4 AC = \Delta $$ and evident inequalities, this gives a quick way to find all reduced forms.
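The two criteria can be compared by machine. A sketch of a numerical sanity check (the coefficient bound 15 is arbitrary; squares and degenerate forms are skipped, since reduction is only defined for non-square positive discriminants):

```python
from math import isqrt, sqrt

def classical(A, B, C, D):
    """0 < B < sqrt(D) and sqrt(D) - B < 2|A| < sqrt(D) + B."""
    r = sqrt(D)
    return 0 < B < r and r - B < 2 * abs(A) < r + B

def topograph(A, B, C):
    """The Conway-style criterion: AC < 0 and B > |A + C|."""
    return A * C < 0 and B > abs(A + C)

ok = True
for A in range(-15, 16):
    for B in range(-15, 16):
        for C in range(-15, 16):
            D = B * B - 4 * A * C
            if A == 0 or C == 0 or D <= 0 or isqrt(D) ** 2 == D:
                continue
            ok = ok and (classical(A, B, C, D) == topograph(A, B, C))
print(ok)  # True
```

(The strict inequalities are safe in floating point here: equality could only occur if \(\sqrt\Delta\) were rational, which non-square \(\Delta\) rules out.)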

Now, different reduced forms may be equivalent, which sounds bad, but the cycle method quickly decides these relationships. No decimal accuracy or “cycle detection” is ever required. A single calculation, \({\left\lfloor \sqrt \Delta \right\rfloor, }\) is done once and remembered forever. I actually wrote Newton’s square root method with integer arithmetic. Everything else, forever, is integer arithmetic. Given a form \({ \langle A,B,C \rangle }\) we want to find its “right neighbor.” First, we find a nonzero integer I like to call \({\delta,}\) with $$ \delta C > 0 \; \; \mbox{and} \; \; |\delta| = \; \left\lfloor \; \frac{B + \left\lfloor \sqrt \Delta \right\rfloor }{2|C|} \; \right\rfloor . $$ The absolute values of the \({\delta}\)’s are the partial quotients in the continued fraction of, well, something. Note that \({B}\) and \({\left\lfloor \sqrt \Delta \right\rfloor }\) are positive.

Alright, given a \({\delta,}\) the form and its right neighbor are $$ \langle A,B,C \rangle \rightarrow \langle C,-B + 2 C \delta, A – B \delta +C \delta^2 \rangle. $$ For those who know what a Gram matrix is, call it \({G,}\) the Gram matrix of the right neighbor is \({P^T G P,}\) where $$ P \; = \; \left( \begin{array}{rr} 0 & -1 \\ 1 & \delta \end{array} \right). $$

Keeping a cumulative product of these \({P}\) matrices is how we solve Pell, quickly and without tears.

$$ \begin{array}{rrrrrrrr} 0. & \langle & 1, & 14, & -12 & \rangle & \delta= & -1 \\ 1. & \langle & -12, & 10, & 3 & \rangle & \delta= & 4 \\ 2. & \langle & 3, & 14, & -4 & \rangle & \delta= & -3 \\ 3. & \langle & -4, & 10, & 9 & \rangle & \delta= & 1 \\ 4. & \langle & 9, & 8, & -5 & \rangle & \delta= & -2 \\ 5. & \langle & -5, & 12, & 5 & \rangle & \delta= & 2 \\ 6. & \langle & 5, & 8, & -9 & \rangle & \delta= & -1 \\ 7. & \langle & -9, & 10, & 4 & \rangle & \delta= & 3 \\ 8. & \langle & 4, & 14, & -3 & \rangle & \delta= & -4 \\ 9. & \langle & -3, & 10, & 12 & \rangle & \delta= & 1 \\ 10. & \langle & 12, & 14, & -1 & \rangle & \delta= & -14 \\ 11. & \langle & -1, & 14, & 12 & \rangle & \delta= & 1 \\ 12. & \langle & 12, & 10, & -3 & \rangle & \delta= & -4 \\ 13. & \langle & -3, & 14, & 4 & \rangle & \delta= & 3 \\ 14. & \langle & 4, & 10, & -9 & \rangle & \delta= & -1 \\ 15. & \langle & -9, & 8, & -5 & \rangle & \delta= & 2 \\ 16. & \langle & 5, & 12, & -5 & \rangle & \delta= & -2 \\ 17. & \langle & -5, & 8, & 9 & \rangle & \delta= & 1 \\ 18. & \langle & 9, & 10, & -4 & \rangle & \delta= & -3 \\ 19. & \langle & -4, & 14, & 3 & \rangle & \delta= & 4 \\ 20. & \langle & 3, & 10, & -12 & \rangle & \delta= & -1 \\ 21. & \langle & -12, & 14, & 1 & \rangle & \delta= & 14 \\ 22. & \langle & 1, & 14, & -12 & \rangle & & \end{array} $$

Automorph, written on right of Gram matrix: $$ \left( \begin{array}{rr} 183241189 & 2713847760 \\ 226153980 & 3349396909 \end{array} \right). $$

Pell automorph $$ \left( \begin{array}{rr} 1766319049 & 13795392780 \\ 226153980 & 1766319049 \end{array} \right). $$

Pell unit $$1766319049^2 - 61 \cdot 226153980^2 = 1$$

Pell NEGATIVE $$29718^2 – 61 \cdot 3805^2 = -1$$

4 PRIMITIVE $$1523^2 – 61 \cdot 195^2 = 4$$

-4 PRIMITIVE $$39^2 – 61 \cdot 5^2 = -4 $$

Here we are using $$ \langle 1,0,-61 \rangle \cong \langle 1,14,-12 \rangle. $$ It begins with \({\langle 1,14,-12 \rangle}\) and does not end until it gets \({\langle 1,14,-12 \rangle}\) again, this being guaranteed. No floating point reals anywhere, no decimal accuracy, no doubts. Think Gwen Stefani. Note that “Automorph” is a traditional word for the matrix \({P}\) when \({P^T G P = G.}\)

Ideals: with positive prime discriminant \({p \equiv 1 \pmod 4,}\) the principal form always represents both \({1}\) and \({-1,}\) and, just as with positive forms, the mapping to ideals is a bijection and a group isomorphism. I did some examples with class number above one; note that the first computer they gave me was called phoebus after Phoebus Apollo. This one was intended to be phoebusjunior, but the spelling went sideways. Also, I ran out of steam putting angle brackets \({\langle \rangle}\) around the coefficient triples for binary forms, I hope you can adjust:

jagy@phobeusjunior: ./indefCycle_All_Reduced 229
1. 1 15 -1 cycle length 2
2. 3 13 -5 cycle length 6
3. 5 13 -3 cycle length 6
form class number is 3
jagy@phobeusjunior: ./indefCycle_All_Reduced 257
1. 1 15 -8 cycle length 6
2. 2 15 -4 cycle length 6
3. 4 15 -2 cycle length 6
form class number is 3
jagy@phobeusjunior: ./indefCycle_All_Reduced 401
1. 1 19 -10 cycle length 6
2. 2 19 -5 cycle length 6
3. 5 19 -2 cycle length 6
4. 4 17 -7 cycle length 10
5. 7 17 -4 cycle length 10
form class number is 5
Now, if the discriminant is divisible by any prime \({q \equiv 3 \pmod 4,}\) or we just have bad luck with the prime factors \({p \equiv 1 \pmod 4,}\) then the principal form does not represent \({-1}\) and the mapping from forms to ideals becomes two to one. I really thought there was a big mystery, but no. Here is an example \({\Delta = 5 \cdot 13 \cdot 17 \cdot 41 = 45305.}\) There are 316 reduced forms collected into 16 cycles:
jagy@phobeusjunior: ./indefCycle_All_Reduced 45305
45305 factored 5 * 13 * 17 * 41
1. 1 211 -196 cycle length 12
2. -1 211 196 cycle length 12
3. 2 211 -98 cycle length 16
4. -2 211 98 cycle length 16
5. 7 211 -28 cycle length 18
6. -7 211 28 cycle length 18
7. 14 211 -14 cycle length 20
8. -14 211 14 cycle length 20
9. 5 205 -164 cycle length 20
10. -5 205 164 cycle length 20
11. 10 205 -82 cycle length 24
12. -10 205 82 cycle length 24
13. 13 195 -140 cycle length 22
14. -13 195 140 cycle length 22
15. 26 195 -70 cycle length 26
16. -26 195 70 cycle length 26
form class number is 16
My software tends to report, for each cycle, a form with particularly large \({B.}\) Now, as you can see, \({\langle 1,211,-196 \rangle}\) and \({\langle -1,211,196 \rangle}\) are distinct \({SL_2 \mathbb Z}\) classes, distinct cycles.

When this happens, in slang I made up \({1 \neq -1,}\) we can prove that $$ \langle A,B,-C \rangle $$ and $$ \langle -A,B,C \rangle $$ are ALWAYS DISTINCT \({SL_2 \mathbb Z}\) classes. Because, you see,$$ \langle -1,B,AC \rangle \circ \langle A,B,-C \rangle = \langle -A,B,C \rangle. $$ So, the mapping to ideals says IDENTIFY \({ \langle A,B,-C \rangle }\) and \({ \langle -A,B,C \rangle ,}\) send them to the same ideal. Why not? We already have an equality of ideals in $$ \left[ A, \frac{B + \sqrt \Delta}{2} \right] = \left[ -A, \frac{B + \sqrt \Delta}{2} \right]. $$ So, we just take the form from each pair with positive \({A,}\) which cuts the form class number by half, down from 16 to 8, and the ideals are $$ \left[1 , \frac{211 + \sqrt{45305}}{2} \right], $$ $$ \left[ 2, \frac{211 + \sqrt{45305}}{2} \right], $$ $$ \left[ 7, \frac{211 + \sqrt{45305}}{2} \right], $$ $$ \left[ 14, \frac{211 + \sqrt{45305}}{2} \right], $$ $$ \left[5 , \frac{205 + \sqrt{45305}}{2} \right], $$ $$ \left[10 , \frac{205 + \sqrt{45305}}{2} \right], $$ $$ \left[13 , \frac{195 + \sqrt{45305}}{2} \right], $$ $$ \left[26 , \frac{195 + \sqrt{45305}}{2} \right]. $$

One other valuable idea that I can impart here is that of genus. If you check, with either the forms or the ideals, you will find that each one squares to the identity. We say that each genus has only one class. It follows automatically that the ideal class group is not cyclic; it is \({(\mathbb Z/ 2 \mathbb Z)^3. }\) I worked a fair amount to get the group table, then I thought, why not just use the Legendre symbols? And that works perfectly.

$$ \begin{array}{rrrrr} & 5 & 13 & 17 & 41 \\ 1 & + & + & + & + \\ 2 & - & - & + & + \\ 7 & - & - & - & - \\ 14 & + & + & - & - \\ 5 \rightarrow 46 & + & - & - & + \\ 10 \rightarrow 23 & - & + & - & + \\ 13 \rightarrow 38 & - & + & + & - \\ 26 \rightarrow 19 & + & - & + & - \end{array} $$ As you can see, if the first coefficient has a common divisor with the discriminant, we switch to some other number represented by the quadratic form. This table, because there is just one class per genus, immediately tells the multiplication. Just multiply the Legendre symbols for each factor \({5,13,17,41.}\)
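Anyone who wants to check this table can do it with Euler's criterion; here is a quick Python sketch (the row labels 46, 23, 38, 19 are the substituted representatives explained above):

```python
# Euler's criterion: for an odd prime p and gcd(a, p) = 1, the power
# a^((p-1)/2) mod p is 1 exactly when a is a square mod p.
def legendre(a, p):
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

primes = [5, 13, 17, 41]
# sign pattern of each row of the genus table above
rows = {1: [1, 1, 1, 1], 2: [-1, -1, 1, 1], 7: [-1, -1, -1, -1],
        14: [1, 1, -1, -1], 46: [1, -1, -1, 1], 23: [-1, 1, -1, 1],
        38: [-1, 1, 1, -1], 19: [1, -1, 1, -1]}
for a, signs in rows.items():
    assert [legendre(a, p) for p in primes] == signs
```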

The Cohen-Lenstra heuristics say that the vast majority of class groups are cyclic. I did an example where that works, despite having more than one genus. Here, most forms do not square to the identity.

jagy@phobeusjunior: ./indefCycle_All_Reduced 1345
1345 factored 5 * 269
1. 1 35 -30 cycle length 6
2. -1 35 30 cycle length 6
3. 2 35 -15 cycle length 8
4. -2 35 15 cycle length 8
5. 3 35 -10 cycle length 8
6. -3 35 10 cycle length 8
7. 4 33 -16 cycle length 10
8. -4 33 16 cycle length 10
9. 8 33 -8 cycle length 10
10. -8 33 8 cycle length 10
11. 16 33 -4 cycle length 10
12. -16 33 4 cycle length 10
form class number is 12
jagy@phobeusjunior:~/old drive/home/jagy/Cplusplus$
Now, as forms, $$ \langle 8,33,-8 \rangle^2 = \langle 64, 33,-1 \rangle $$ is minus the identity. However, as ideals this is identified with the principal class. I worked it all out; the multiplication table is

$$\begin{array}{rrrrrrr} \bullet & 1 & 4 & 16 & 8 & 2 & 3 \\ 1 & 1 & 4 & 16 & 8 & 2 & 3 \\ 4 & 4 & 16 & 1 & 2 & 3 & 8 \\ 16 & 16 & 1 & 4 & 3 & 8 & 2 \\ 8 & 8 & 2 & 3 & 1 & 4 & 16 \\ 2 & 2 & 3 & 8 & 4 & 16 & 1 \\ 3 & 3 & 8 & 2 & 16 & 1 & 4 \end{array}$$ The two genera should be evident; \({1,4,16}\) are quadratic residues \({\pmod 5}\) and \({\pmod {269}.}\) Then \({8,2,3}\) are nonresidues. The group is cyclic, either \({2}\) or \({3}\) is a generator.

To give one straightforward example, the (non-reduced) forms \({ \langle 1, 0,-3 \rangle }\) and \({ \langle -1, 0,3 \rangle }\) are sent to the same ideal. In this case, because the discriminant is divisible by at least one prime \({q \equiv 3 \pmod 4,}\) the two forms are in different genera and it is easy to say which (positive) primes are represented. \({x^2 - 3 y^2}\) represents all primes \({p \equiv 1 \pmod {12},}\) while \({3 x^2 - y^2}\) represents \({2,3}\) and all primes \({p \equiv 11 \pmod {12}.}\)

Finally, something quite tricky. The ideal viewpoint identifies the quadratic forms \({ \langle 1, 13,-13 \rangle }\) and \({ \langle -1, 13,13 \rangle }\) of discriminant \({221 = 13 \cdot 17.}\) However, the traditional concern is to find out the (positive) primes integrally represented by a form. This then moves to what is called class field theory; the primes represented by \({ \langle 1, 13,-13 \rangle }\) are those (answer by Noam Elkies) for which $$ x^8 + 34 x^6 + 83 x^4 + 34 x^2 + 1 \pmod p $$ factors into all linear factors. Put another way, \({ \langle 1, 13,-13 \rangle }\) represents \({17}\) and all primes with \({(p|13) = (p|17) = 1}\) such that $$ x^4 + x^3 + x^2 + 2 x + 4 \pmod p $$ factors into all linear factors. Go Figure. In comparison, the primes represented by \({ \langle 5, 11,-5 \rangle }\) are simply all those that are quadratic nonresidues \({\pmod {13}}\) and \({\pmod {17},}\) because the other class in its (form) genus is \({ \langle -5, 11,5 \rangle ,}\) which represents exactly the same numbers. Oh, and this is a Markov form: the nonzero number represented of smallest absolute value is \({5,}\) which is called the “minimum” and might be written \({m.}\) Then \({\Delta / m^2 = 221 / 25 = 8.84 < 9,}\) which means this must be a Markov form, parametrised by the Markov Numbers, which include \({5.}\) There is considerable rigidity when \({9m^2 > \Delta,}\) i.e. an extremely large minimum.

Two points determine a line, three a quadratic — what has that got to do with CDs?

July 24, 2014 by . 5 comments

In this post I describe how simple facts about polynomials are applied in correcting errors, for example scratches on compact disks. The same technique is used in many other places, e.g. in the 2-dimensional Quick Response (QR) bar codes.


The two facts from algebra that we need are:

Theorem 1. A polynomial of degree \(n\) has at most \(n\) zeros.

Theorem 2. If \((x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\) are \(n\) points on the \(xy\)-plane such that \(x_i\neq x_j\) whenever \(i\neq j\), then there is a unique polynomial \(f(x)\) of degree \(<n\) such that \(f(x_i)=y_i\) for all \(i\).

So, if \(n=2\), we want the polynomial \(f(x)\) to be linear. In that case, the graph \(y=f(x)\) will be the line passing through the points \((x_1,y_1)\) and \((x_2,y_2)\). Similarly, when \(n=3\) we want the polynomial to be (at most) quadratic, and we want its graph to pass through the given three points. Finding the coefficients of such a quadratic is not too arduous an exercise in linear systems of equations. For general \(n\) there is a known formula for the polynomial \(f(x)\), called Lagrange’s interpolation polynomial. The uniqueness of such a polynomial follows from Theorem 1: if \(f_1(x)\) and \(f_2(x)\) were two different polynomials of degree \(<n\) passing through all these \(n\) points, then their difference \(f_1(x)-f_2(x)\) would be a nonzero polynomial of degree \(<n\) vanishing at all the points \(x_i, i=1,2,\ldots,n\), which is impossible by Theorem 1.
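As a concrete illustration, here is a direct Python sketch of Lagrange's formula over the rationals (the function name `lagrange` is mine, not a library routine):

```python
from fractions import Fraction

def lagrange(points):
    """Return a function evaluating the unique polynomial of degree
    < len(points) through the given points (Theorem 2); the x-coordinates
    must be distinct."""
    def f(x):
        total = Fraction(0)
        for xi, yi in points:
            # basis term: yi times the product of (x - xj)/(xi - xj)
            term = Fraction(yi)
            for xj, _ in points:
                if xj != xi:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return f

f = lagrange([(1, 3), (2, 4)])   # the line through (1,3) and (2,4)
assert f(3) == 5                 # f(x) = x + 2
```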

Extending a message using a polynomial

The applications I discuss are about communication. We have two parties, a transmitter and a receiver. The transmitter wants to send a message to the receiver. We assume that they have in advance agreed upon a method of coding the messages as sequences of numbers \(y_1,y_2,\ldots,y_k\) for some natural number \(k\). The simplest way of communicating would be for the transmitter to simply write this list of numbers to a channel that the receiver can later read. The channel could be something like a note that you pass to a classmate, or it could be a compact disk, where the transmitter just writes the numbers. It could be something fancier like a radio frequency band, or an optical fiber, but we ignore the physical nature of the channel here.

If everything goes well, the receiver can read the message correctly. But, they may misread one or more of the numbers. Or, one of the numbers may get smeared and become illegible while the note is exchanging hands. What to do? The basic idea is to add some redundancy to the message. We seek to extend the message from \(k\) numbers to \(n\) numbers, \(n>k\), in such a way that the receiver can recover from a few such mishaps. A useful way of doing this is to:

  • Agree in advance on a sequence of \(n\) values for \(x\)-coordinates \(x_1,\ldots,x_n\).
  • View the message \(y_1,y_2,\ldots,y_k\) as a sequence of \(k\) points $$(x_1,y_1), (x_2,y_2),\ldots, (x_k,y_k).$$
  • Find the unique polynomial \(f(x)\) of degree \(<k\) passing through all these \(k\) points.
  • Extend the message to \(y_1,y_2,\ldots,y_n\) in such a way that \(y_i=f(x_i)\) for all \(i=1,2,\ldots,n\). Note that the actual ‘payload’ message consists of the \(k\) first numbers, the remaining \(r=n-k\) are the redundant numbers.
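The four steps above can be sketched in a few lines of Python, using exact rational arithmetic; the helper names are mine:

```python
from fractions import Fraction

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given points (distinct x-coordinates)."""
    total = Fraction(0)
    for xi, yi in points:
        term = Fraction(yi)
        for xj, _ in points:
            if xj != xi:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def extend(payload, n):
    """Append n - k redundant numbers to a payload of k numbers,
    using the agreed x-coordinates 1, 2, ..., n as in the figures."""
    k = len(payload)
    points = list(zip(range(1, k + 1), payload))
    return payload + [interpolate(points, x) for x in range(k + 1, n + 1)]

assert extend([3, 4], 4) == [3, 4, 5, 6]   # the toy example: f(x) = x + 2
```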

Toy Example #1 – recovery from an illegible number

In the Figures below I always use \(x_1=1,x_2=2,\ldots,x_n=n\) for simplicity. Let us consider as an example the case \(k=2, n=3\), so our polynomials will always have degree \(\le1\), and their graphs are thus lines. Let us further select the message to be \((y_1,y_2)=(3,4)\). In this case the polynomial is \(f(x)=x+2\), as \(f(1)=y_1\) and \(f(2)=y_2\). As \(f(3)=5\), the extended message is then \((y_1,y_2,y_3)=(3,4,5)\). In the figure below the payload part of the message is formed by the \(y\)-coordinates of the black dots and the \(y\)-coordinates of the red dot(s) are there for redundancy. I also include the entire graph of \(f(x)\) to make what follows clearer.


How does this minimum amount of redundancy help? Consider the scenario when the second number was, indeed, smeared and the receiver could only read the sequence as \((3,?,5)\). Now, our scheme allows the receiver to deduce that the missing number must be equal to \(4\). This is because they know that the points \((1,3), (2,?)\) and \((3,5)\) must all be on the same line. Since two points determine a line, the receiver can solve for \(y_2=4\). What the receiver sees is shown in the Figure below.


Observe that it is immaterial here which of the three numbers was garbled. The receiver knows two points on the line, and can thus recover the polynomial \(f(x)\) and hence also the message.

Toy example #2 – spotting an error, and correcting it

However, adding a single redundant number does not protect us from other kinds of errors. Suppose that one of the numbers is misread, and the receiver reads \((y_1,y_2,y_3)=(3,2,5)\) instead of the correct sequence above. Now they cannot cope. The receiver can notice that the three points are not collinear, so they know that something went wrong. But, they have no way of knowing which of the three numbers is incorrect. Any pair of points gives a line all right, but now the received version of the message matches equally well with three distinct lines — all passing through two out of three points. The picture is as follows.


How can we cope with such errors? The solution is to add more redundancy. Let’s try \(k=2, n=4, r=4-2=2\) with two redundant numbers. Let’s again use the message \((y_1,y_2)=(3,4)\). This time we add a fourth point \((x_4,y_4)=(4,f(4))=(4,6)\), so the extended message (payload + redundancy) is now \((3,4,5,6)\). The picture is as follows.


If, again, one of the numbers is read incorrectly, then this time we can tell which one is incorrect, because the other three points are collinear.

Theorem 3. Given four points \(P_1,P_2,P_3,P_4\) on the plane there can be at most a single line containing at least three of those points.

Proof. Assume contrariwise that there are two lines, \(L_1\) and \(L_2\), both passing through at least three of the points. Because at most one of the four points is not on \(L_1\), and similarly at most one of the points is not on \(L_2\), we see that the lines \(L_1\) and \(L_2\) share at least \(4-1-1=2\) of the points. But there is a unique line through those two points, so \(L_1=L_2\). Q.E.D.

If in our toy example the receiver again makes a mistake with the second number and reads \((y_1,y_2,y_3,y_4)=(3,2,5,6)\), it will be easy for them to tell from the following picture, which point is the odd one out. Even without the plot of \(y=f(x)\) that I added to make things easier for you.
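The receiver's reasoning in this toy example can be automated with a brute-force search over pairs of points, as sketched below (illustrative Python, not a production decoder):

```python
from fractions import Fraction
from itertools import combinations

def correct_one_error(received):
    """Given four numbers read at x = 1, 2, 3, 4, return the unique
    sequence agreeing with a line in at least 3 positions (Theorem 3)."""
    points = list(enumerate(received, start=1))   # (x, y) pairs
    for (x1, y1), (x2, y2) in combinations(points, 2):
        # the line through this pair, evaluated at all four x-coordinates
        slope = Fraction(y2 - y1, x2 - x1)
        candidate = [y1 + slope * (x - x1) for x in range(1, 5)]
        if sum(c == y for c, y in zip(candidate, received)) >= 3:
            return candidate
    return None   # more than one error: no line fits

assert correct_one_error([3, 2, 5, 6]) == [3, 4, 5, 6]
```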


Coping with many problems in a single message

Ok, let us generalize. Assume that the payload consists of \(k\) numbers, and that we tag \(r=n-k\) redundant numbers to it so the entire transmitted list has \(n\) numbers. What kind of problems does this scheme protect us against? Above we encountered two kinds of problems. An erasure happens when the receiver cannot make out the written number. Observe that the receiver still knows the \(x\)-coordinate of that point (those were agreed in advance) — only the \(y\)-coordinate is missing. Another type of problem was an error, an incorrectly read number. From the toy examples we already got the feeling that, of these two, an error is more difficult to cope with. We needed two redundant numbers to correct a single error, but a single redundant number was enough to cope with an erasure.

Assume that we want to cater for the possibility of \(t\) errors and \(e\) erasures in a single transmission. How many redundant numbers do we need to cope with that? Altogether, the message gives us \(n=k+r\) points. The erasures mean that we only have \(n-e\) points remaining, and must manage with them. The goal is that these should somehow determine a unique polynomial of degree \(<k\). The solution is provided by the following generalization of Theorem 3.

Theorem 4. Given \(k+2t\) points with distinct \(x\)-coordinates on the plane there can be at most a single polynomial of degree \(<k\) whose graph passes through all but at most \(t\) of the points.

Proof. I am feeling a bit naughty and leaving this as an exercise. Use Theorem 2 in the same way that I used the fact “two points determine a unique line” in the proof of Theorem 3.

Corollary. When we append \(r\) redundant numbers to a payload of \(k\) according to this scheme we can cope with a combination of \(e\) erasures and \(t\) errors, if $$r\ge 2t+e.$$

So we see that adding redundancy makes the scheme more robust against the prescribed types of problems created by the channel. Of course, the added redundancy comes with a cost. A redundant number consumes the same amount of channel resources as a payload number (space on a CD or that piece of paper). Therefore the user of such a system needs to consider the probabilities of both errors and erasures, and then settle on a certain quality of service such that the inequality of the Corollary is satisfied with a high enough probability.

Snakes in the paradise – need more algebra

There is one serious concern that I need to mention and address. Above I have loosely used the concept of a number without specifying what I meant by that. The figures undoubtedly left you with the impression that the numbers are usual real numbers. Unfortunately that won’t do. The basic reason is that to write down a real number with full precision takes infinitely many bits. But the messages on e.g. CDs consist of bytes. Programmers are used to viewing the value of a byte as an integer \(b\) in the range \(0\le b\le 255\). Even if we restrict the payload numbers to this range we cannot guarantee that the redundant numbers to be tagged would also oblige. Consider the following example of \(k=7, r=4\) and the payload \((y_1,\ldots,y_7)=(3, 7, 0, 1, 9, 6, 8)\). Here’s what the Lagrange interpolation polynomial looks like with these inputs.


Where are the red dots indicating the redundant numbers? Well, you can guess from the plot that the polynomial \(f(x)\) (it is of degree six) increases very rapidly after the last payload point. Indeed, this sextic polynomial gives the values \(f(8)=164, f(9)=903, f(10)=3129\) and \(f(11)=8464\) as the redundant numbers. So we face a serious overflow problem. Here’s a rescaled version of the plot.
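You do not have to take my word for these values; a short Python computation with exact rationals reproduces them:

```python
from fractions import Fraction

def interpolate(points, x):
    # Lagrange's formula at x for the unique degree < len(points)
    # polynomial through the given points
    total = Fraction(0)
    for xi, yi in points:
        term = Fraction(yi)
        for xj, _ in points:
            if xj != xi:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

payload = [3, 7, 0, 1, 9, 6, 8]                 # k = 7, at x = 1..7
points = list(zip(range(1, 8), payload))
redundant = [interpolate(points, x) for x in range(8, 12)]   # r = 4
assert redundant == [164, 903, 3129, 8464]      # the overflow in action
```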


Underflow is also possible as the redundant numbers may very well be negative. One might also worry that even though the payload consists of integers the redundant numbers might not be integers. After all, Lagrange’s interpolation formula introduces divisions. Actually, as the overflow example above already illustrates, this never happens. But that is an artefact of the way we set up things. I leave this as an exercise for the interested reader.

Exercise. Let \(f(x)\in\mathbb{Q}[x]\) be the Lagrange interpolation polynomial (of degree \(<k\)) passing through the points \((1,y_1), (2,y_2), \ldots, (k,y_k)\), where all the numbers \(y_i, i=1,2,\ldots,k\) are integers. Show that \(f(n)\) is then an integer for all \(n\in\mathbb{Z}\).

But, we absolutely don’t want to use more than a single byte to represent the redundant numbers. Or, more precisely, we insist that a redundant number should use the same amount of memory as a number in the payload part (a byte, a word,…). People familiar with elementary number theory may already feel the urge to reinterpret a byte as a residue class modulo \(256\), i.e. as an element of the residue class ring \(R=\mathbb{Z}_{256}\). Unfortunately the algebra breaks down if we do that. To wit, already Theorem 1 fails dramatically. As an example, consider the linear polynomial \(f(x)=16x\). Even though it is linear, it has no less than sixteen distinct zeros in the ring \(R\), namely the residue classes of the form \(\overline{16m}\) for \(m=0,1,\ldots,15\). When you learn a bit more about the structures in abstract algebra you realize that the problem is that the ring \(R\) is not a field. But there is a field with 256 elements that fits the bill exactly! Its construction requires familiarity with polynomial rings and their quotient rings, and I don’t want to reproduce scores of pages from a first textbook on abstract algebra. To a reader who knows what is going on I will tell that

$$\mathbb{F}_{256}\cong\mathbb{F}_2[x]/\langle x^8+x^4+x^3+x^2+1\rangle$$

is a popular way of describing this field. A consequence of this is that the total length (payload + redundancy) of the messages in bytes cannot exceed \(256\). This is because at that point we run out of \(x\)-coordinates!
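For the curious, here is a minimal Python sketch of multiplication in this field, with bytes as coefficient vectors of polynomials over \(\mathbb{F}_2\) and the modulus \(x^8+x^4+x^3+x^2+1\) encoded as the bit pattern `0x11d`:

```python
# Multiplication in F_256 = F_2[x] / <x^8 + x^4 + x^3 + x^2 + 1>.
# A byte is read as the coefficient vector of a polynomial over F_2.
def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a          # "addition" is XOR in characteristic 2
        b >>= 1
        a <<= 1                  # multiply a by x ...
        if a & 0x100:
            a ^= 0x11d           # ... and reduce by the modulus
    return result

# The byte 2 (i.e. the polynomial x) generates the multiplicative group:
# its powers run through all 255 nonzero elements before repeating.
powers, g = [], 1
for _ in range(255):
    powers.append(g)
    g = gf_mul(g, 2)
assert len(set(powers)) == 255 and g == 1
```

As a quick sanity check, \(x \cdot x^7 = x^8 = x^4+x^3+x^2+1\) in this field, so `gf_mul(2, 128)` returns `29` (hex `0x1d`).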

Links to more information

Coding theory is a topic straddling mathematics and communication engineering. It deals (among other things) with the design of schemes, like the one outlined here, aiming at coping with errors introduced into messages by the channel. Two American coding theorists, Irving S. Reed and Gustave Solomon, were the first to figure out this way of using the algebra of finite fields. One of their inventions was to choose the sequence of \(x\)-coordinates in a clever way that makes the calculation of the redundant numbers very easy (they can skip the time-consuming use of Lagrange’s interpolation formula entirely). The resulting codes are called Reed–Solomon codes (or just RS-codes) in their honor. The Wikipedia article on RS-codes describes the algebra very differently, but in the end it is equivalent. That description reflects the more efficient calculation of the redundant part. I chose to present the codes in this way because it makes it easier to understand why these codes can cope with the prescribed types of errors.

Practical issues

I will close my contribution with a few remarks about certain practical aspects. The receiver faces the task of figuring out the correct polynomial. In the toy example of four points we could just take any pair of points (there are only six pairs), and check whether the line determined by them passes through a third point. But if we use a RS-code capable of correcting five errors with parameters \(k=240, r=10\), then the receiver following an analogous approach would need to generate \(\binom{250}{10}=219005316087032475\) Lagrange interpolation polynomials of degree \(239\), and check if one gives a good enough match. Not efficient at all! Thankfully, much more efficient algorithms have been developed by several other coding theorists.

The other remark I want to make has to do with compact disks. There, Reed-Solomon codes are used in the following way that produces good protection against typical scratching of the disk surface. The data on a CD is split into blocks like the one shown below.


Here each dot represents a single byte (actually, there is another error-correcting code involved in reading a single byte from the disk; that one works at the level of individual bits and does not concern us here). The rows and columns in such a matrix are all encoded using an appropriate RS-code. The black dots are payload bytes. The encoding is done as follows. First, we add to each column four redundant bytes (green). Then we also add four redundant bytes to all the rows, in the payload part (red) as well as in the redundant part (blue). The bytes of a single row are close to each other on the CD, so the CD-player can read them quickly. It attempts to correct eventual errors. Usually it succeeds, but if that row is in a scratchy area, it fails, and then marks the entire row as unreliable data (i.e. an erasure). But different rows are more spaced out on the CD surface, so there is high hope that only a few rows from a given matrix are lost in this fashion. If at most four of the rows are marked as erasures, the column-by-column RS-code can recover the lost data. For the CD-player to be able to retrieve such matrices from the disk fast, the entire matrix is on the same track (at the same distance from the disk center). Thus, a circular scratch may wipe out enough of the matrix, and the scheme will fail.

A happy thought: algebraic tools triumph in solving problems arising in communication.

Zabreiko’s lemma and four fundamental theorems of functional analysis

June 25, 2014 by . 0 comments

There is no doubt that the open mapping theorem, the closed graph theorem, the bounded inverse theorem, and the uniform boundedness principle are fundamental theorems of functional analysis. All of them are similar in the sense that they explicitly or implicitly use the Baire category theorem, but the details differ. However, all of these theorems can be deduced from the continuity of certain seminorms defined on normed spaces.

The proof of the lemma is very similar to the usual Banach–Schauder proof of the open mapping theorem, and the lemma itself is actually stronger than the usual consequences of the Baire category theorem in basic functional analysis. The lemma can be used to prove the aforementioned theorems easily.

I would like to thank the user t.b. for drawing our attention to this lemma and for his adaptation of the original proof from the general case of linear metric spaces, which he gave as an answer to the question Direct Approach to the Closed Graph Theorem on the main site.

Lemma. (Zabreiko, [1], [2]) Let \(X\) be a Banach space and let \(p: X \to [0,\infty)\) be a seminorm. If for all absolutely convergent series \(\sum_{n=1}^\infty x_n\) in \(X\) we have \[ p\left(\sum_{n=1}^\infty x_n\right) \leq \sum_{n=1}^\infty p(x_n) \in [0,\infty] \] then \(p\) is continuous. That is to say, there exists a constant \(C \geq 0\) such that \(p(x) \leq C\lvert x\rvert _{X}\) for all \(x \in X\).

Proof. Let \(A_n = p^{-1}([0,n])\) and \(F_n = \overline{A_n}\). Note that \(A_n\) and \(F_n\) are symmetric and convex because \(p\) is a seminorm. We have \(X = \bigcup_{n=1}^\infty F_n\) and Baire’s theorem implies that there is \(N\) such that the interior of \(F_N\) is nonempty.

Therefore, there are \(x_0 \in X\) and \(R \gt 0\) such that \(B_R(x_0) \subset F_N\). By symmetry of \(F_N\) we have \(B_{R}(-x_0) = -B_{R}(x_0) \subset F_N\), too. If \(\lvert x\rvert \lt R\) then \(x+x_0 \in B_{R}(x_0)\) and \(x-x_0 \in B_{R}(-x_0)\), so \(x \pm x_0 \in F_{N}\). By convexity of \(F_N\) it follows that \[ x = \frac{1}{2}(x-x_0) + \frac{1}{2}(x+x_0) \in F_N, \] so \(B_R(0) \subset F_N\). But, in fact, we can prove much more: \[ B_{R}(0) \subset A_N. \] Suppose \(\lvert x\rvert \lt R\) and choose \(r\) such that \(\lvert x\rvert \lt r \lt R\). Fix \(0 \lt q \lt 1-\frac{r}{R}\), so \(\frac{1}{1-q} \frac{r}{R} \lt 1\). Then \(y = \frac{R}{r}x \in B_{R}(0) \subset F_N = \overline{A_N}\), so there is \(y_{0} \in A_N\) such that \(\lvert y-y_0\rvert \lt qR\), so \(q^{-1}(y-y_0) \in B_R(0)\). Now choose \(y_1 \in A_N\) with \(\lvert q^{-1}(y-y_0) - y_1\rvert \lt q R\), so \(\lvert y-y_0 - qy_1\rvert \lt q^2 R\). By induction we obtain a sequence \( (y_k)\subset A_N\) such that \[ \left\lvert y - \sum_{k=0}^n q^k y_k\right\rvert \lt q^n R \quad \text{for all }n \geq 0, \] hence \(y = \sum_{k=0}^\infty q^k y_k\). Observe that by construction \(|y_k| \leq R + qR\) for all \(k\), so the series \(\sum_{k=0}^\infty q^k y_k\) is absolutely convergent. But then the countable subadditivity hypothesis on \(p\) implies that \[ p(y) = p\left(\sum_{k=0}^\infty q^k y_k\right) \leq \sum_{k=0}^\infty q^k p(y_k) \leq \frac{1}{1-q} N \] and thus \(p(x) = \frac{r}{R}\,p(y) \leq \frac{r}{R} \frac{1}{1-q} N \lt N\), which means \(x \in A_N\), as we wanted.

Finally, for any \(x \neq 0\) and any \(\varepsilon \gt 0\), we have with \(\lambda = \frac{R}{\lvert x\rvert (1+\varepsilon)}\) that \(\lambda x \in B_{R}(0) \subset A_N\), so \(p(\lambda x) \leq N\) and thus \(p(x) \leq \frac{N(1+\varepsilon)}{R} \lvert x\rvert \). Letting \(\varepsilon \to 0\) gives \(p(x) \leq \frac{N}{R} \lvert x\rvert \), as desired. \(\blacksquare\)

Now the proof of the fundamental theorems becomes an easy exercise.

  1. The open mapping theorem. Hint: set \(p(y) = \inf_{x\in T^{-1}(y)}\lvert x\rvert \).
  2. The bounded inverse theorem. Hint: set \(p(x) = \lvert T^{-1}(x)\rvert \).
  3. The uniform boundedness principle. Hint: set \(p(x) = \sup_{i\in I}\lvert T_i (x)\rvert \).
  4. The closed graph theorem. Hint: set \(p(x) = \lvert T(x)\rvert \).
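To illustrate how these hints work, here is a sketch of the third one, the uniform boundedness principle. Suppose \((T_i)_{i \in I}\) is a family of bounded operators on \(X\) with \(\sup_{i\in I}\lVert T_i x\rVert \lt \infty\) for every \(x \in X\). Then \(p(x) = \sup_{i\in I}\lVert T_i x\rVert\) is a finite seminorm, and for every absolutely convergent series \(\sum_{n=1}^\infty x_n = x\) we have, by continuity of each \(T_i\), \[ \lVert T_i x \rVert = \Bigl\lVert \sum_{n=1}^\infty T_i x_n \Bigr\rVert \leq \sum_{n=1}^\infty \lVert T_i x_n\rVert \leq \sum_{n=1}^\infty p(x_n) \] for each \(i\); taking the supremum over \(i\) gives \(p\bigl(\sum_{n=1}^\infty x_n\bigr) \leq \sum_{n=1}^\infty p(x_n)\). Zabreiko’s lemma now provides a constant \(C\) with \(\sup_{i\in I}\lVert T_i x\rVert \leq C\lVert x\rVert\), which is exactly uniform boundedness.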

Detailed solutions can be found in theorems \(1.6.5\), \(1.6.6\), \(1.6.9\) and \(1.6.11\) of Robert E. Megginson, An Introduction to Banach Space Theory, Graduate Texts in Mathematics, Springer.

[1] П. П. Забрейко, Об одной теореме для полуаддитивных функционалов, Функциональный анализ и его приложения, 3:1 (1969), 86–88

[2] P. P. Zabreiko, A theorem for semiadditive functionals, Functional Analysis and Its Applications 3 (1), 1969, 70-72

When do n and 2n have the same decimal digits?

June 11, 2014 by . 3 comments

A recent question on math.stackexchange asks for the smallest positive number \( A \) for which the number \( 2A \) has the same decimal digits in some other order.

Math geeks may immediately realize that \( 142857 \) has this property, because it is the first 6 digits of the decimal expansion of \( \frac 17 \), and the cyclic behavior of the decimal expansion of \( \frac n7 \) is well-known. But is this the minimal solution? It is not. Brute-force enumeration of the solutions quickly reveals that there are 12 solutions of 6 digits each, all permutations of \( 142857 \), and that larger solutions, such as \( 1025874 \) and \( 1257489 \), seem to follow a similar pattern. What is happening here?
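That brute-force enumeration takes only a few lines of, say, Python:

```python
# A solution is an A whose double has the same multiset of decimal
# digits; same digits forces same length, so A < 500000 for 6 digits.
def same_digits(a, b):
    return sorted(str(a)) == sorted(str(b))

solutions = [a for a in range(1, 500000) if same_digits(a, 2 * a)]
assert solutions[0] == 125874                 # the minimal solution
assert len(solutions) == 12                   # twelve 6-digit solutions,
assert all(sorted(str(a)) == sorted("142857") for a in solutions)
# ... and every one is a permutation of 142857
```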

Stuck in Dallas-Fort Worth airport last month, I did some work on the problem, and although I wasn’t able to solve it completely, I made significant progress. I found a method that allows one to hand-calculate that there is no solution with fewer than six digits, and to enumerate all the solutions with 6 digits, including the minimal one. I found an explanation for the surprising behavior that solutions tend to be permutations of one another. The short form of the explanation is that there are fairly strict conditions on which sets of digits can appear in a solution of the problem. But once the set of digits is chosen, the conditions on the order of the digits in the solution are fairly lax.

So one typically sees, not only in base 10 but in other bases, that the solutions to this problem fall into a few classes that are all permutations of one another; this is exactly what happens in base 10 where all the 6-digit solutions are permutations of \( 124578 \). As the number of digits is allowed to increase, the strict first set of conditions relaxes a little, and other digit groups can (and do) appear as solutions.


The property of interest, \( P_R(A) \), is that the numbers \( A \) and \( B=2A \) have exactly the same base-\( R \) digits. We would like to find numbers \( A \) having property \( P_R \) for various \( R \), and we are most interested in \( R=10 \). Suppose \( A \) is an \( n \)-digit numeral having property \( P_R \); let the (base-\( R \)) digits of \( A \) be \( a_{n-1}\ldots a_1a_0 \) and similarly the digits of \( B = 2A \) are \( b_{n-1}\ldots b_1b_0 \). The reader is encouraged to keep in mind the simple example of \( R=8, n=4, A=\mathtt{1042}, B=\mathtt{2104} \) which we will bring up from time to time.

Since the digits of \( B \) and \( A \) are the same, in a different order, we may say that \( b_i = a_{P(i)} \) for some permutation \( P \). In general \( P \) might have more than one cycle, but we will suppose that \( P \) is a single cycle. All the following discussion of \( P \) will apply to the individual cycles of \( P \) in the case that \( P \) is a product of two or more cycles. For our example of \( A=\mathtt{1042}, B=\mathtt{2104} \), we have \( P = (0\,1\,2\,3) \) in cycle notation. We won’t need to worry about the details of \( P \), except to note that \( i, P(i), P(P(i)), \ldots, P^{n-1}(i) \) completely exhaust the indices \( 0, \ldots, n-1 \), and that \( P^n(i) = i \) because \( P \) is an \( n \)-cycle.

Conditions on the set of digits in a solution

For each \( i \) we have $$ a_{P(i)} = b_{i} \equiv 2a_{i} + c_i\pmod R $$ where the ‘carry bit’ \( c_i \) is either 0 or 1 and depends on whether there was a carry when doubling \( a_{i-1} \). (When \( i=0 \) we are in the rightmost position and there is never a carry, so \( c_0= 0 \).) We can then write:

$$ \begin{align} a_{P(P(i))} &= 2a_{P(i)} + c_{P(i)} \\ &= 2(2a_{i} + c_i) + c_{P(i)} &&= 4a_i + 2c_i + c_{P(i)}\\ a_{P(P(P(i)))} &= 2(4a_i + 2c_i + c_{P(i)}) + c_{P(P(i))} &&= 8a_i + 4c_i + 2c_{P(i)} + c_{P(P(i))}\\ &&&\vdots\\ a_{P^n(i)} &&&= 2^na_i + v \end{align} $$

all equations taken \( \bmod R \). But since \( P \) is an \( n \)-cycle, \( P^n(i) = i \), so we have $$ a_i \equiv 2^na_i + v\pmod R $$ or equivalently $$ \big(2^n-1\big)a_i + v \equiv 0\pmod R\tag{\( \star \)} $$ where \( v\in\{0,\ldots 2^n-1\} \) depends only on the values of the carry bits \( c_i \)—the \( c_i \) are precisely the binary digits of \( v \).

Specifying a particular value of \( a_0 \) and \( v \) that satisfy this equation completely determines all the \( a_i \). For example, \( a_0 = 2, v = \color{darkblue}{0010}_2 = 2 \) is a solution when \( R=8, n=4 \) because \( \bigl(2^4-1\bigr)\cdot2 + 2\equiv 0\pmod 8 \), and this solution allows us to compute

$$ \def\db#1{\color{darkblue}{#1}}\begin{align} a_0&&&=2\\ a_{P(0)} &= 2a_0 &+ \db0 &= 4\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 0 \\ a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 1\\ \hline a_{P^4(0)} &= 2a_{P^3(0)} &+ \db0 &= 2\\ \end{align} $$

where the carry bits \( c_i = \langle 0,0,1,0\rangle \) are visible in the third column, and all the sums are taken \( \pmod 8 \). Note that \( a_{P^n(0)} = a_0 \) as promised. This derivation of the entire set of \( a_i \) from a single one plus a choice of \( v \) is crucial, so let’s see one more example. Let’s consider \( R=10, n=3 \). Then we want to choose \( a_0 \) and \( v \) so that \( \left(2^3-1\right)a_0 + v \equiv 0\pmod{10} \) where \( v\in\{0\ldots 7\} \). One possible solution is \( a_0 = 5, v=\color{darkblue}{101}_2 = 5 \). Then we can derive the other \( a_i \) as follows:

$$ \begin{align} a_0&&&=5\\ a_{P(0)} &= 2a_0 &+ \db1 &= 1\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 2 \\\hline a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 5\\ \end{align} $$

And again we have \( a_{P^n(0)}= a_0 \) as required.
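The derivation is mechanical enough to automate. A small sketch (function name mine), applying the bits of \( v \) most-significant-bit first, as in the two worked examples:

```python
def derive_cycle(a0, v, n, R):
    """Starting digit a0 and carry pattern v determine the whole cycle
    a0, a_{P(0)}, ..., a_{P^{n-1}(0)}.  Returns None if the cycle fails
    to close up, i.e. if a_{P^n(0)} != a0."""
    carries = [(v >> (n - 1 - k)) & 1 for k in range(n)]  # MSB first
    seq, a = [a0], a0
    for c in carries:
        a = (2 * a + c) % R
        seq.append(a)
    return seq[:-1] if seq[-1] == a0 else None
```

Here `derive_cycle(2, 2, 4, 8)` reproduces the first example’s digits \( 2,4,0,1 \), and `derive_cycle(5, 5, 3, 10)` the second’s \( 5,1,2 \); it returns `None` when the carries fail to bring the cycle back to \( a_0 \).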

Since the bits of \( v \) are used cyclically, not every pair \( \langle a_0, v\rangle \) will yield a different solution. Rotating the bits of \( v \) and pairing them with different choices of \( a_0 \) will yield the same cycle of digits starting from a different place. In the first example above, we had \( a_0 = 2, v = 0010_2 = 2 \). If we were to take \( a_0 = 4, v = 0100_2 = 4 \) (which also solves \( (\star) \)) we would get the same cycle of values of the \( a_i \) but starting from \( 4 \) instead of from \( 2 \), and similarly if we take \( a_0=0, v = 1000_2 = 8 \) or \( a_0 = 1, v = 0001_2 = 1 \). So we can narrow down the solution set of \( (\star) \) by considering only the so-called bracelets of \( v \) rather than all \( 2^n \) possible values. Two values of \( v \) are considered equivalent as bracelets if one is a rotation of the other. When a set of \( v \)-values are equivalent as bracelets, we need only consider one of them; the others will give the same cyclic sequence of digits, but starting in a different place. For \( n=4 \), for example, the bracelets are \( 0000, 0001, 0011, 0101, 0111, \) and \( 1111 \); the sequences \( 0110, 1100, \) and \( 1001 \) being equivalent to \( 0011 \), and so on.
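One representative per rotation class is easy to enumerate by brute force; a sketch, taking the smallest member of each class as the representative:

```python
def bracelets(n):
    """One representative (the smallest) of each rotation class of
    n-bit carry patterns -- the 'bracelets' described above."""
    mask = (1 << n) - 1
    reps = set()
    for v in range(1 << n):
        reps.add(min(((v >> k) | (v << (n - k))) & mask for k in range(n)))
    return sorted(reps)
```

For instance, `bracelets(4)` yields \( [0, 1, 3, 5, 7, 15] \), i.e. \( 0000, 0001, 0011, 0101, 0111, 1111 \).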


Let us take \( R=9, n=3 \), so we want to find 3-digit numerals with property \( P_9 \). According to \( (\star) \) we need \( 7a_i + v \equiv 0\pmod{9} \) where \( v\in\{0\ldots 7\} \). There are 9 possible values for \( a_i \); for each one there is at most one possible value of \( v \) that makes the sum zero:

$$ \begin{array}{rrr} a_i & 7a_i & v \\ \hline 0 & 0 & 0 \\ 1 & 7 & 2 \\ 2 & 14 & 4 \\ 3 & 21 & 6 \\ 4 & 28 & \\ 5 & 35 & 1 \\ 6 & 42 & 3 \\ 7 & 49 & 5 \\ 8 & 56 & 7 \\ \end{array} $$

(For \( a_i=4 \) there is no solution.) We may disregard the non-bracelet values of \( v \), as these will give us solutions that are the same as those given by bracelet values of \( v \). The bracelets are:

$$ \begin{array}{rl} 000 & 0 \\ 001 & 1 \\ 011 & 3 \\ 111 & 7 \end{array} $$

so we may disregard the solutions except when \( v=0,1,3,7 \). Calculating the digit sequences from these four values of \( v \) and the corresponding \( a_i \) we find:

$$ \begin{array}{ccl} a_0 & v & \text{digits} \\ \hline 0 & 0 & 000 \\ 5 & 1 & 512 \\ 6 & 3 & 637 \\ 8 & 7 & 888 \\ \end{array} $$

(In the second line, for example, we have \( v=1 = 001_2 \), so \( 1 = 2\cdot 5 + 0; 2 = 2\cdot 1 + 0; \) and \( 5 = 2\cdot 2 + 1 \).)

Any number \( A \) of three digits, for which \( 2A \) contains exactly the same three digits, in base 9, must therefore consist of exactly the digits \( 000, 125, 367, \) or \( 888 \).

A warning

All the foregoing assumes that the permutation \( P \) is a single cycle. In general, it may not be. Suppose we did an analysis like that above for \( R=10, n=5 \) and found that there was no possible digit set, other than the trivial set 00000, that satisfied the governing equation \( (2^5-1)a_0 + v\equiv 0\pmod{10} \). This would not completely rule out a base-10 solution with 5 digits, because the analysis only rules out a cyclic set of digits. There could still be a solution where \( P \) was a product of a \( 2 \)-cycle and a \( 3 \)-cycle, or a product of still smaller cycles.

Something like this occurs, for example, in the \( n=4, R=8 \) case. Solving the governing equation \( (2^4-1)a_0 + v \equiv 0\pmod 8 \) yields only four possible digit cycles, namely \( \{0,1,2,4\}, \{1,3,6,4\}, \{2,5,2,5\} \), and \( \{3,7,6,5\} \). But there are several additional solutions: \( 2500_8\cdot 2 = 5200_8, 2750_8\cdot 2 = 5720_8, \) and \( 2775_8\cdot 2 = 5772_8 \). These correspond to permutations \( P \) with more than one cycle. In the case of \( 5720_8 \), for example, \( P \) exchanges the \( 5 \) and the \( 2 \), and leaves the \( 0 \) and the \( 7 \) fixed.

For this reason we cannot rule out the possibility of an \( n \)-digit solution without first considering all smaller \( n \).

The Large Equals Odd rule

When \( R \) is even there is a simple condition we can use to rule out certain sets of digits from being single-cycle solutions. Recall that \( A=a_{n-1}\ldots a_0 \) and \( B=b_{n-1}\ldots b_0 \). Let us agree that a digit \( d \) is large if \( d\ge \frac R2 \) and small otherwise. That is, \( d \) is large if, upon doubling, it causes a carry into the next column to the left.

Since \( b_i =(2a_i + c_i)\bmod R \), where the \( c_i \) are carry bits, we see that, except for \( b_0 \), the digit \( b_i \) is odd precisely when there is a carry from the next column to the right, which occurs precisely when \( a_{i-1} \) is large. Thus the number of odd digits among \( b_1,\ldots b_{n-1} \) is equal to the number of large digits among \( a_0,\ldots a_{n-2} \). This leaves the digits \( b_0 \) and \( a_{n-1} \) uncounted. But \( b_0 \) is never odd, since there is never a carry in the rightmost position, and \( a_{n-1} \) is always small (since otherwise \( B \) would have \( n+1 \) digits, which is not allowed). So the number of large digits in \( A \) is exactly equal to the number of odd digits in \( B \). And since \( A \) and \( B \) have exactly the same digits, the number of large digits in \( A \) is equal to the number of odd digits in \( A \). Observe that this is the case for our running example \( 1042_8 \): there is one odd digit (the 1) and one large digit (the 4).

When \( R \) is odd the analogous condition is somewhat more complicated, but since the main case of interest is \( R=10 \), we have the useful rule that:

For \( R \) even, the number of odd digits in any solution \( A \) is equal to the number of large digits in \( A \).
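Stated as a predicate (naming mine), the rule reads:

```python
def large_equals_odd(digit_set, R):
    """Necessary condition, for even R, on a single-cycle solution:
    it must contain as many odd digits as 'large' digits (d >= R/2)."""
    odds = sum(1 for d in digit_set if d % 2)
    larges = sum(1 for d in digit_set if 2 * d >= R)
    return odds == larges
```

It accepts the running example \( 1042_8 \), and rejects both of the base-10 candidate sets \( \{1,2,5\} \) and \( \{4,7,8\} \) that appear below.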

Conditions on the order of digits in a solution

We have determined, using the above method, that the digits \( \{5,1,2\} \) might form a base-9 numeral with property \( P_9 \). Now we would like to arrange them into a base-9 numeral that actually does have that property. Again let us write \( A = a_2a_1a_0 \) and \( B=b_2b_1b_0 \), with \( B=2A \). Note that if \( a_i = 1 \), then \( b_i = 3 \) (if there was a carry from the next column to the right) or \( 2 \) (if there was no carry), but since \( b_i=3 \) is impossible, we must have \( b_i = 2 \) and therefore \( a_{i-1} \) must be small, since there is no carry into position \( i \). But since \( a_{i-1} \) is also one of \( \{5,1,2\} \), and it cannot also be \( 1 \), it must be \( 2 \). This shows that the \( 1 \), unless it appears in the rightmost position, must be to the left of the \( 2 \); it cannot be to the left of the \( 5 \). Similarly, if \( a_i = 2 \) then \( b_i = 5 \), because \( 4 \) is impossible, so the \( 2 \) must be to the left of a large digit, which must be the \( 5 \). Similar reasoning produces no constraint on the position of the \( 5 \); it could be to the left of a small digit (in which case it doubles to \( 1 \)) or a large digit (in which case it doubles to \( 2 \)). We can summarize these findings as follows:

$$ \begin{array}{cl} \text{digit} & \text{to the left of} \\ \hline 1 & 1, 2, \text{end} \\ 2 & 5 \\ 5 & 1,2,5,\text{end} \end{array} $$

Here “end” means that the indicated digit could be the rightmost.

Furthermore, the leftmost digit of \( A \) must be small (or else there would be a carry in the leftmost place and \( 2A \) would have 4 digits instead of 3) so it must be either \( 1 \) or \( 2 \). It is not hard to see from this table that the digits must be in the order \( 125 \) or \( 251 \), and indeed, both of those numbers have the required property: \( 125_9\cdot 2 = 251_9 \), and \( 251_9\cdot 2 = 512_9 \).
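Rather than reasoning digit by digit, one can also simply test every ordering of a candidate digit set. A brute-force sketch (function name mine):

```python
from itertools import permutations

def valid_orderings(digit_set, base):
    """Orderings of the digit multiset (no leading zero) for which
    doubling permutes the digits."""
    def digits_of(a):
        ds = []
        while a:
            ds.append(a % base)
            a //= base
        return ds or [0]
    good = set()
    for p in set(permutations(digit_set)):
        if p[0] == 0:
            continue
        a = 0
        for d in p:
            a = a * base + d
        if sorted(digits_of(2 * a)) == sorted(digit_set):
            good.add(p)
    return sorted(good)
```

Here `valid_orderings([1, 2, 5], 9)` returns exactly the two orderings \( 125 \) and \( 251 \) found above.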

This was a simple example, but in more complicated cases it is helpful to draw the order constraints as a graph. Suppose we draw a graph with one vertex for each digit, and one additional vertex to represent the end of the numeral. The graph has an edge from vertex \( v \) to \( v' \) whenever \( v \) can appear to the left of \( v' \). Then the graph drawn for the table above looks like this:

Graph for 125 base 9

A 3-digit numeral with property \( P_9 \) corresponds to a path in this graph that starts at one of the nonzero small digits (marked in blue), ends at the red node marked ‘end’, and visits each node exactly once. Such a path is called hamiltonian. Obviously, self-loops never occur in a hamiltonian path, so we will omit them from future diagrams.

Now we will consider the digit set \( 637 \), again base 9. An analysis similar to the foregoing allows us to construct the following graph:

Graph for 367 base 9

Here it is immediately clear that the only hamiltonian path is \( 3-7-6-\text{end} \), and indeed, \( 376_9\cdot 2 = 763_9 \).

In general there might be multiple instances of a digit, and so multiple nodes labeled with that digit. Analysis of the \( 0,0,0 \) case produces a graph with no legal start nodes and so no solutions, unless leading zeroes are allowed, in which case \( 000 \) is a perfectly valid solution. Analysis of the \( 8,8,8 \) case produces a graph with no path to the end node and so no solutions. These two trivial patterns appear for all \( R \) and all \( n \), and we will ignore them from now on.

Returning to our ongoing example, \( 1042 \) in base 8, we see that \( 1 \) and \( 2 \) must double to \( 2 \) and \( 4 \), so must be to the left of small digits, but \( 4 \) and \( 0 \) can double to either \( 0 \) or \( 1 \) and so could be to the left of anything. Here the constraints are so lax that the graph doesn’t help us narrow them down much:

Graph for 1042 base 8

Observing that the only arrow into the 4 is from 0, so that the 4 must follow the 0, and that the entire number must begin with 1 or 2, we can enumerate the solutions: \( 1042, 1204, 2041, \) and \( 2104 \).

If leading zeroes are allowed we have also \( 0412 \) and \( 0421 \). All of these are solutions in base 8.
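A brute-force check of the 4-digit base-8 case, padding every number to four digits so that leading zeroes are included (this snippet is mine, a sanity check rather than part of the original analysis):

```python
def base8_digits(a, width=4):
    """Base-8 digits of a, zero-padded to the given width, LSB first."""
    return [(a >> (3 * k)) & 7 for k in range(width)]

# all a whose double still fits in four octal digits
solutions = [a for a in range(1, 8 ** 4 // 2)
             if sorted(base8_digits(a)) == sorted(base8_digits(2 * a))]

# restrict to numerals using exactly the digits {0, 1, 2, 4}
four_digit = [a for a in solutions if set(base8_digits(a)) == {0, 1, 2, 4}]
```

Filtering for the digit set \( \{0,1,2,4\} \) yields exactly \( 0412, 0421, 1042, 1204, 2041, \) and \( 2104 \).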

The case of \( R=10 \)

Now we turn to our main problem, solutions in base 10.

To find all the solutions of length 6 requires an enumeration of smaller solutions, which, if they existed, might be concatenated into a solution of length 6. This is because our analysis of the digit sets that can appear in a solution assumes that the digits are permuted cyclically; that is, the permutations \( P \) that we considered had only one cycle each. If we perform the analysis only for single cycles, we could overlook solutions whose permutation \( P \) splits into two or more smaller cycles, each of which is subject to the same analysis with a smaller \( n \).

There are no smaller solutions, but to prove that the length 6 solutions are minimal, we must analyze the cases for smaller \( n \) and rule them out. We now produce a complete analysis of the case \( R=10 \), \( n\le 6 \). For \( n=1 \) there is only the trivial solution of \( 0 \), which we disregard. (The question asked for a positive number anyway.)

\( n=2 \)

For \( n=2 \), we want to find solutions of \( 3a_i + v \equiv 0\pmod{10} \) where \( v \) is a two-bit bracelet number, one of \( 00_2, 01_2, \) or \( 11_2 \). Tabulating the values of \( a_i \) and \( v\in\{0,1,3\} \) that solve this equation we get:

$$ \begin{array}{cc} v& a_i \\ \hline 0 & 0 \\ 1& 3 \\ 3& 9 \\ \end{array} $$

We can disregard the \( v=0 \) and \( v=3 \) solutions because the former yields the trivial solution \( 00 \) and the latter yields the nonsolution \( 99 \). So the only possibility we need to investigate further is \( a_i = 3, v = 1 \), which corresponds to the digit sequence \( 36 \): Doubling \( 3 \) gives us \( 6 \) and doubling \( 6 \), plus a carry, gives us \( 3 \) again.

But when we tabulate which digits must be left of which, we see that there is no solution with just \( 3 \) and \( 6 \), because the graph we get, once self-loops are eliminated, looks like this:

graph for 36 base 10

which obviously has no hamiltonian path. Thus there is no solution for \( R=10, n=2 \).

\( n=3 \)

For \( n=3 \) we need to solve the equation \( 7a_i + v \equiv 0\pmod{10} \) where \( v \) is a bracelet number in \( \{0,\ldots 7\} \), specifically one of \( 0,1,3, \) or \( 7 \). Since \( 7 \) and \( 10 \) are relatively prime, for each \( v \) there is a single \( a_i \) that solves the equation. Tabulating the possible values of \( a_i \) as before, and this time omitting rows with no solution, we have:

$$ \begin{array}{rrl} v & a_i & \text{digits}\\ \hline 0& 0 & 000\\ 1& 7 & 748 \\ 3& 1 & 125\\ 7&9 & 999\\ \end{array} $$

The digit sequences \( 0,0,0 \) and \( 9,9,9 \) yield trivial solutions or nonsolutions as usual, and we will omit them in the future. The other two lines suggest the digit sets \( 1,2,5 \) and \( 4,7,8 \), both of which fail the “odd equals large” rule.

This analysis rules out the possibility of a digit set with \( a_0 \to a_1 \to a_2 \to a_0 \), but it does not completely rule out a 3-digit solution, since one could be obtained by concatenating a one-digit and a two-digit solution, or three one-digit solutions. However, we know by now that no one- or two-digit solutions exist. Therefore there are no 3-digit solutions in base 10.

\( n=4 \)

For \( n=4 \) the governing equation is \( 15a_i + v \equiv 0\pmod{10} \) where \( v \) is a 4-bit bracelet number, one of \( \{0,1,3,5,7,15\} \). This is a little more complicated because \( \gcd(15,10)\ne 1 \). Tabulating the possible digit sets, we get:

$$ \begin{array}{crrl} a_i & 15a_i& v & \text{digits}\\ \hline 0 & 0 & 0 & 0000\\ 1 & 5 & 5 & 1250\\ 1 & 5 & 15 & 1375\\ 2 & 0 & 0 & 2486\\ 3 & 5 & 5 & 3636\\ 3 & 5 & 15 & 3751\\ 4 & 0 & 0 & 4862\\ 5 & 5 & 5 & 5012\\ 5 & 5 & 15 & 5137\\ 6 & 0 & 0 & 6248\\ 7 & 5 & 5 & 7498\\ 7 & 5 & 15 & 7513\\ 8 & 0 & 0 & 8624 \\ 9 & 5 & 5 & 9874\\ 9 & 5 & 15 & 9999 \\ \end{array} $$

where the second column has been reduced mod \( 10 \). Note that even restricting \( v \) to bracelet numbers the table still contains duplicate digit sequences; the 15 entries on the right contain only the seven basic sequences \( 0000, 0125, 1375, 2486, 3636, 4987, \) and \( 9999 \). Of these, only \( 0000, 3636, \) and \( 9999 \) obey the odd equals large criterion, and we will disregard \( 0000 \) and \( 9999 \) as usual, leaving only \( 3636 \). We construct the corresponding graph for this digit set as follows: each \( 3 \) must double to \( 6 \), not \( 7 \), since there is no \( 7 \) in the set, so each \( 3 \) must be to the left of a small digit (the other \( 3 \)) or at the end of the numeral. Each \( 6 \) must double to \( 3 \), not \( 2 \), so each \( 6 \) must be to the left of a large digit, which can only be the other \( 6 \). This is the graph we drew for \( 36 \) in the \( n=2 \) case with every node doubled, and it likewise has no hamiltonian path: there is no edge from a \( 3 \) to a \( 6 \) or from a \( 6 \) to a \( 3 \), so no path can visit both pairs. So there is no solution with \( R=10 \) and \( n=4 \).

\( n=5 \)

We leave this case as an exercise. There are 8 solutions to the governing equation, all of which are ruled out by the odd equals large rule.

\( n=6 \)

For \( n=6 \) the possible solutions are given by the governing equation \( 63a_i + v \equiv 0\pmod{10} \) where \( v \) is a 6-bit bracelet number, one of \( \{0,1,3,5,7,9,11,13,15,21,23,27,31,63\} \). Tabulating the possible digit sets, we get:

$$ \begin{array}{rrl} v & a_i & \text{digits}\\ \hline 0 & 0 & 000000\\ 1 & 3 & 362486 \\ 3 & 9 & 986249 \\ 5 & 5 & 500012 \\ 7 & 1 & 124875 \\ 9 & 7 & 748748 \\ 11 & 3 & 362501 \\ 13 & 9 & 986374 \\ 15 & 5 & 500137 \\ 21 & 3 & 363636 \\ 23 & 9 & 987499 \\ 27 & 1 & 125125 \\ 31 & 3 & 363751 \\ 63 & 9 & 999999 \\ \end{array} $$

After ignoring \( 000000 \) and \( 999999 \) as usual, the large equals odd rule allows us to ignore all the other sequences except \( 124875 \) and \( 363636 \). The latter fails for the same reason that \( 36 \) did when \( n=2 \). But \( 124875 \), the lone survivor, gives us a complicated derived graph containing many hamiltonian paths, every one of which is a solution to the problem:

graph for 124578 base 10

It is not hard to pick out from this graph the minimal solution \( 125874 \), for which \( 125874\cdot 2 = 251748 \), and also our old friend \( 142857 \) for which \( 142857\cdot 2 = 285714 \).
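The hamiltonian paths can also be enumerated by brute force over the arrangements of the digit set (a quick check, not the method used above):

```python
from itertools import permutations

def is_solution(a):
    """True when a and 2a consist of the same decimal digits."""
    return sorted(str(a)) == sorted(str(2 * a))

six_digit_solutions = sorted({int("".join(p))
                              for p in permutations("124578")
                              if is_solution(int("".join(p)))})
```

The smallest number this produces is \( 125874 \), and \( 142857 \) is on the list, as claimed.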

We see here the reason why all the small numbers with property \( P_{10} \) contain the digits \( 124578 \). The constraints on which digits can appear in a solution are quite strict, and rule out all other sequences of six digits and all shorter sequences. But once a set of digits passes these stringent conditions, the constraints on it are much looser, because \( B \) is only required to have the digits of \( A \) in some order, and there are many possible orders, many of which will satisfy the rather loose conditions involving the distribution of the carry bits. This graph is typical: it has a set of small nodes and a set of large nodes, and each node is connected to either all the small nodes or all the large nodes, so that the graph has many edges, and, as in this case, a largish clique of small nodes and a largish clique of large nodes, and as a result many hamiltonian paths.


This analysis is tedious but is simple enough to perform by hand in under an hour. As \( n \) increases further, enumerating the solutions of the governing equation becomes very time-consuming. I wrote a simple computer program to perform the analysis for given \( R \) and \( n \), and to emit the possible digit sets that satisfied the large equals odd criterion. I had wondered if every base-10 solution contained equal numbers of the digits \( 1,2,4,8,5, \) and \( 7 \). This is the case for \( n=7 \) (where the only admissible digit set is \( \{1,2,4,5,7,8\}\cup\{9\} \)), for \( n=8 \) (where the only admissible sets are \( \{1,2,4,5,7,8\}\cup \{3,6\} \) and \( \{1,2,4,5,7,8\}\cup\{9,9\} \)), and for \( n=9 \) (where the only admissible sets are \( \{1,2,4,5,7,8\}\cup\{3,6,9\} \) and \( \{1,2,4,5,7,8\}\cup\{9,9,9\} \)). But when we reach \( n=10 \) the increasing number of bracelets has loosened up the requirements a little and there are 5 admissible digit sets. I picked two of the promising-seeming ones and quickly found by hand the solutions \( 4225561128 \) and \( 1577438874 \), both of which wreck any theory that the digits \( 1,2,4,5,8,7 \) must all appear the same number of times.
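The program itself is not reproduced here, but the analysis can be sketched as follows, under the conventions used earlier (carry bits applied most-significant-bit first, one bracelet representative per rotation class); the function name and structure are mine:

```python
def admissible_digit_sets(R, n):
    """Single-cycle candidate digit multisets for base R, cycle length n,
    filtered by the large-equals-odd rule (valid for even R only)."""
    mask = (1 << n) - 1
    # one representative per rotation class of carry patterns
    vs = {min(((v >> k) | (v << (n - k))) & mask for k in range(n))
          for v in range(1 << n)}
    found = set()
    for v in sorted(vs):
        for a0 in range(R):
            if ((2 ** n - 1) * a0 + v) % R:
                continue                      # (star) is not satisfied
            seq, a = [a0], a0
            for k in range(n):                # apply carry bits, MSB first
                a = (2 * a + ((v >> (n - 1 - k)) & 1)) % R
                seq.append(a)
            if seq[-1] != a0:
                continue                      # cycle fails to close
            digit_set = tuple(sorted(seq[:-1]))
            odds = sum(1 for d in digit_set if d % 2)
            larges = sum(1 for d in digit_set if 2 * d >= R)
            if odds == larges:
                found.add(digit_set)
    return sorted(found)
```

For \( R=10, n=6 \) it reports that the only surviving digit sets are the trivial \( 000000 \) and \( 999999 \) together with \( \{1,2,4,5,7,8\} \) and \( \{3,3,3,6,6,6\} \), matching the table above.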


Thanks to Karl Kronenfeld for corrections to this article, and many helpful suggestions about what to omit.

Green’s Theorem and Area of Polygons

June 4, 2014 by . 10 comments

A common method used to find the area of a polygon is to break the polygon into smaller shapes of known area. For example, one can separate the polygon below into two triangles and a rectangle:

Figure 1

By breaking this composite shape into smaller ones, the area is at hand: $$\begin{align}A_1 &= bh = 5\cdot 2 = 10 \\ A_2 = A_3 &= \frac{bh}{2} = \frac{2\cdot 1}{2} = 1 \\ A_{total} &= A_1+A_2+A_3 = 12\end{align}$$

Unfortunately, this approach can be difficult for a person to use when they cannot physically (or mentally) see the polygon, such as when a polygon is given as a list of many vertices.



Happily, there is a formula for the area of any simple polygon that only requires knowledge of the coordinates of each vertex. It is as follows: $$A = \sum_{k=0}^{n} \frac{(x_{k+1} + x_k)(y_{k+1}-y_{k})}{2} \tag{1}$$ (Where the vertices, labelled in a counter-clockwise manner, are \({(x_0, y_0), \ldots, (x_n, y_n)}\), and \({(x_{n+1}, y_{n+1}) = (x_0, y_0)}\); that is, the starting vertex is found both at the start and end of the list of vertices.)

It should be noted that the formula is not “symmetric” with respect to the signs of the \({x}\) and \({y}\) coordinates. This can be explained by considering the “negative areas” incurred when adding the signed areas of the triangles with vertices \({(0,0)-(x_k, y_k)-(x_{k+1}, y_{k+1})}\).

In the next sections, I derive this formula using Green’s Theorem, show an example of its use, and provide some applications.



Green’s Theorem states that, for a “well-behaved” curve \({C}\) forming the boundary of a region \({D}\):

\(\displaystyle \oint_C P(x, y)\;\mathrm dx + Q(x, y)\;\mathrm dy = \iint_D \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\;\mathrm dA \ \ \ \ \ (2)\)

(In this context, “well behaved” means, among other things, that \({C}\) is piecewise smooth. For a full listing of the requirements for \({C}\) and \({D}\), please see here.)

Recalling that the area of \({D}\) is equal to \({\iint_D \;\mathrm dA}\), we can use Green’s Theorem to calculate area if we choose \({P}\) and \({Q}\) such that \({\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}=1}\). Clearly, choosing \({P(x, y) = 0}\) and \({Q(x, y) = x}\) satisfies this requirement.

Thus, letting \({A}\) be the area of the region \({D}\):

\(\displaystyle A = \oint_C x\;\mathrm dy \ \ \ \ \ (3)\)

Now, consider the polygon below, bordered by the piecewise-smooth curve \({C = C_0\cup C_1\cup\cdots\cup C_{n-1}\cup C_n}\), where \({C_k}\) starts at the point \({(x_k, y_k)}\) and ends at the “next” point along the polygon’s edge when proceeding counter-clockwise. (N.B. If we proceed in a clockwise manner, we get the negative of the area of the polygon.)

Figure 2

Note: the dashed lines indicate that there could be any number of points between \({(x_2, y_2)}\) and \({(x_{n-1}, y_{n-1})}\); this is a complete polygon.

Because line integrals over piecewise-smooth curves are additive over length, we have that:

\(\displaystyle A = \oint_C x\;\mathrm dy = \int_{C_0} x\;\mathrm dy + \cdots + \int_{C_n} x\;\mathrm dy \ \ \ \ \ (4)\)

To compute the \({k}\)-th line integral above, parametrize the segment from \({(x_k, y_k)}\) to \({(x_{k+1}, y_{k+1})}\):

\(\displaystyle C_k:\; \vec{r}=\left((x_{k+1} – x_k)t + x_k,\; (y_{k+1} – y_k)t + y_k\right),\quad 0\le t\le 1 \ \ \ \ \ (5)\)

Substituting this parametrization into the integral, we find: $$\begin{align}\int_{C_k} x\;\mathrm dy &= \int_0^1 \left((x_{k+1} – x_k)t + x_k\right)\left(y_{k+1} – y_k\right)\;\mathrm dt \\ &= \frac{(x_{k+1} + x_k)(y_{k+1}-y_{k})}{2}\end{align}$$

Summing all of the \(C_k\)’s, we then find the total area: $$A = \sum_{k=0}^{n} \frac{(x_{k+1} + x_k)(y_{k+1}-y_{k})}{2} \tag{6}$$


To demonstrate use of this formula, let us apply this to the shape at the beginning of the document. A copy of the image is found below, but this one is marked with the coordinates of the vertices.

Figure 3

Starting with the coordinate \({(0, 0)}\) and proceeding counter-clockwise (again, if we go in the opposite direction, we get the negative of the correct area), we apply our formula:

$$\begin{align} A &= \sum_{k=0}^{n} \frac{(x_{k+1} + x_k)(y_{k+1}-y_{k})}{2} \\ &= \frac{(1.5 + 0)(0 – 0)}{2} + \frac{(2.5 + 1.5)(-1 – 0)}{2} + \frac{(3.5 + 2.5)(0 – (-1))}{2} + \frac{(5+3.5)(0-0)}{2} \\ &\quad + \frac{(5+5)(2-0)}{2} + \frac{(3.5+5)(2-2)}{2} + \frac{(2.5+3.5)(3-2)}{2}\\ &\quad +\frac{(1.5+2.5)(2-3)}{2} + \frac{(0+1.5)(2-2)}{2} + \frac{(0+0)(0-2)}{2}\\ &= 0 + (-2) + 3 + 0 + 10 + 0 + 3 + (-2) + 0 + 0\\ &= 12 \end{align}$$

We arrive at the same answer we had above, which is good. (Obviously!)
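Formula (6) translates directly into code; a short sketch (function name mine):

```python
def polygon_area(vertices):
    """Signed area of a simple polygon from its vertices, listed
    counter-clockwise, via A = sum (x_{k+1} + x_k)(y_{k+1} - y_k) / 2."""
    n = len(vertices)
    total = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]   # wraps back to the first vertex
        total += (x1 + x0) * (y1 - y0) / 2
    return total
```

Called on the ten vertices of Figure 3 it returns 12; listing the vertices clockwise instead returns −12, illustrating the sign convention noted earlier.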

Possible Applications

Although this formula is an interesting application of Green’s Theorem in its own right, it is important to consider why it is useful. As can be seen above, this approach involves a lot of tedious arithmetic. Thus, its main benefit arises when it is applied in a computer program, where the tedium is no longer an issue.

From an engineering perspective, this formula could be used to determine the area (and, using similarly-derived formulas for the moments of a polygon about the \({x}\) and \({y}\) axes, the centroid) of a component in a CAD program. Many computer programs estimate all shapes as polygons or connected line segments (rather than exact curves/circles), which means this formula can be directly applied to the existing representation in software.

In a contrived context, this formula can also be useful in collegiate programming competitions. (This was why I first took note of this formula.) Some problems in the “computational geometry” category may involve computing the area of specific polygons, such as the convex hull of a set of points. Knowing the derivation can make it easier to recall during a competition event.

As a final example, one can use this formula to approximate the area of a plot of land. Over a small region, surface area on a sphere can be approximated as that on a plane. Thus, an irregularly-shaped plot of land can be modelled as a polygon. Once coordinates are established for each vertex, the formula derived above can be applied to find its area.

Welcome to the Math.SE Blog!

June 3, 2014 by . 1 comments

Welcome to the Math StackExchange community blog!

Four years ago today, Dan Dumitru proposed the creation of Math.StackExchange. The site has evolved into a community with hundreds of thousands of questions and answers, and over 30 thousand questions and answers appearing each month.

Now, we have a blog.

This blog provides a way of going beyond the Q&A format to allow exposition and discussion. There might be posts about mathematics, or Math.SE, or a book review, or whatever seems appropriate. Anyone can contribute to the blog. This blog is written and edited by community members, and we are actively soliciting both one-time and regular contributors! So if you have an idea of something you’d like to hear about or read about, or if you would like to contribute, check out the blog chat room and the Blog FAQ thread on meta.

I’m looking forward to seeing what we make here.


Filed under all levels, expository