Andrew Tweddle's Blog: December 2015

Thursday, December 03, 2015

A combinatorial solution to the sum of squares formula

Overview

In my previous two blog posts, I showed that the number of squares of various sizes in an nxn grid is $\sum_{{i}={1}}^{n}i^2 = \frac{n(n + 1)(2n + 1)}{6}$. In this post I'm going to use combinatorial Mathematics to derive a different closed form solution.

All posts in this series

A quick review of combinations

In combinatorics, the combination function $\binom{n}{k}$ is used to counts the number of different ways that k items can be chosen from a set of n unique items, with the order of the k items being irrelevant. $\binom{n}{k}$ is usually read as "n combination k" or "n choose k". It is also known as the binomial coefficient. You can read more about combinations on Wikipedia.

You also need to know that: $$ \begin{align*} n! &= 1 . 2 . 3 ... n & \text{n! is the product of the first n natural numbers} \\ \binom{n}{k} &= \frac{n!}{(n-k)! k!} \\ &= \frac{n (n - 1) ... (n - k + 1)}{1 . 2 . 3 ... k} \end{align*} $$

The approach

We are going to use combinatorial Mathematics to count the number of possible squares in the grid. To do this we will need to define a coordinate system for the grid and a convenient data representation for a valid square in that grid. By a data representation, I mean a set of numeric values that can uniquely identify a particular square in the grid.

We will then use combinations to count all possible ways of generating valid data representations. But we will need to take into account that combinations generate unique numbers. That will require us to consider separate cases based on the number of identical variables in our representation.

A useful data representation

A recap of the problem

The problem is to count the total number of squares of any size in a grid. So if you think of a Chessboard, how many 1x1, 2x2, ..., 8x8 squares are there?

Let's consider whether there is a useful notation for describing any valid square in the grid.

Defining a coordinate system

First let's define a coordinate system for the grid.

We'll say that (x, y) is the coordinate of a space on the checkerboard in column x and row y. And we'll choose (1,1) to be the coordinate of the top-left corner. That way columns go from left to right and rows go from top to bottom - the same direction as reading pages in a book (for most cultures).

[This is just one possible convention... I could have started at the bottom left instead, like a chart, or made the corner square (0, 0) instead of (1,1).]

Comparing different representations for valid squares in the grid

We could use the co-ordinates $(x_1, y_1)$ and $(x_2, y_2)$ of two opposite corners of a square block. But that representation encodes any rectangle in the grid, not just the squares. We'd prefer to choose a data representation which enforces constraints naturally.

So what if we used $(x_1, y_1, s)$ as our data representation, where $(x_1, y_1)$ is the top left corner of the square and $s$ is the size of the sides of the square? This is better, because we have 3 variables representing a valid square instead of 4, and we have expressed the constraint that a square's sides have the same size.

However this representation is still a little cumbersome. To express the constraint that the square fits within the grid, we need to ensure that the bottom right corner is somewhere inside the grid. That requires calculating the x and y coordinates of the bottom-right corner. Our data representation would be more convenient if we could avoid all unnecessary calculations.

So let's encode the bottom-right corner into the data representation instead. This will give us our desired data representation...

The chosen data representation

We will represent a square in the grid by the triple $(x, y, s)$ where (x, y) is the bottom-right corner of the square and s is the size.

Our data representation (x, y, s) is summarized in the following diagram:

With this data representation, our constraints on a valid square are: $$ \begin{align*} 1 \leq x \leq n & \tag{1} \\ 1 \leq y \leq n & \tag{2} \\ 1 \leq s \leq n & \tag{3} \\ s \leq x & \tag{4} \\ s \leq y & \tag{5} \\ \end{align*} $$

There is some redundancy in constraint 3, since the other 4 constraints ensure that $s \leq n$. I've chosen to do it this way so that constraints 1, 2 and 3 all have the same form. They will be satisfied naturally when we use the combination function. Only constraints 4 and 5 will need to be considered explicitly.

Generating all possible data representations

Choosing values of x, y and s from the set {1, 2, ... n}

We can address constraints 1, 2 and 3 by choosing our variables x, y and s from the set {1, 2, 3, ..., n}. That allows us to use the combination function.

Unfortunately there's a catch! Combinations count unique numbers, and some of the variables x, y and s could be the same. So we'll need to consider separate cases depending on how many of the variables have unique values.

Let's start by defining variables for each case. Let $c_i$ be the number of ways of generating representations $(x, y, s)$ when $i$ unique numbers are taken from the set {1, 2, 3, ..., n}. The total number of squares is then $c_1 + c_2 + c_3$.

Counting $c_3$: the number of squares when x, y and s are all different

There are $\binom{n}{3}$ ways of choosing three unique numbers from the set {1, 2, ..., n}. Of these, the smallest must always be assigned to $s$ so that constraints 4 and 5 are addressed. However the largest could be assigned to x or y. This leads to two sub-cases: $$ \begin{align*} \text{ i) } & s < x < y \\ \text{ii) } & s < y < x \end{align*} $$

So we have two different representations generated from each set of three distinct numbers. Thus: $$ c_3 = 2 \binom{n}{3} $$

Counting $c_2$: the number of squares when only two of x, y and s are different

There are $\binom{n}{2}$ ways of choosing two unique numbers from the set {1, 2, ..., n}. But there are three different ways that these two numbers can be assigned to the values (x, y, s): $$ \begin{align*} \text{ i) } & s < x = y \\ \text{ ii) } & s = x < y \\ \text{ iii) } & s = y < x \\ \end{align*} $$

Thus: $$ c_2 = 3 \binom{n}{2} $$

Counting $c_1$: the number of squares when x, y and s are all the same

There are $\binom{n}{1} = n$ ways of choosing a single number from the set {1, 2, ..., n}.

Each such number will define a single solution with: $$ \begin{align*} s = x = y \\ \end{align*} $$ Clearly all the constraints will be satisfied. Thus: $$ c_1 = n $$

Putting it all together

So the total number of squares of various sizes in an n x n grid is: $$ c_1 + c_2 + c_3 = n + 3 \binom{n}{2} + 2 \binom{n}{3} $$

Checking the answer

In the second post of the series we showed that the number of squares is $ \frac{n(n + 1)(2n + 1)}{6} $. Let's check our answer by seeing if the two formulas are equivalent.

$$ \begin{align*} n + 3 \binom{n}{2} + 2 \binom{n}{3} &= n + 3 \frac{n (n - 1)}{2} + 2 \frac{ n(n - 1)(n - 2)}{6} \\ &= \frac{n}{6} [ 6 + 9 (n - 1) + 2(n - 1)(n - 2) ] & \text{ Extract } \frac{n}{6} \text { from all terms } \\ &= \frac{n}{6} [ 6 + 9n - 9 + 2n^2 - 6n + 4 ] \\ &= \frac{n}{6} [ 2n^2 + 3n + 1 ] \\ &= \frac{n}{6} [ (n + 1)(2n + 1) ] \\ &= \frac{n(n + 1)(2n + 1)}{6} \\ \end{align*} $$ So our calculations seem correct, since this is the formula we generated in the previous blog post.

The value of a good data representation

Rob Pike's 5th rule of programming supposedly states that:

Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

This echoes Fred Brooks' statement in "The Mythical Man-Month" that: "Representation is the essence of programming".

This blog post is about Mathematics, not programming. But I'd argue that the elegance of this solution also comes from choosing the right data representation.

Conclusion

This is the third post in the series. The first three posts have shown that the number of squares in an nxn grid is: $$ \begin{align*} & \sum_{{i}={1}}^{n}i^2 & & \text{ from blog post 1} \\ = & \frac{n(n + 1)(2n + 1)}{6} & & \text{ from blog post 2} \\ = & n + 3 \binom{n}{2} + 2 \binom{n}{3} & & \text{ from blog post 3} \\ \end{align*} $$

Next time

This post had far less algebra than the previous blog post. But there was still some algebraic manipulation required to arrive at the formula $\frac{n(n + 1)(2n + 1)}{6}$. It's certainly not obvious why the formula has that specific form. My aim in the next two blog posts will be to address this. I want to arrive at that formula via a much more direct route.

Wednesday, December 02, 2015

Guessing a closed form solution for the sum of squares

Overview

In a previous blog post, I showed that the number of squares of various sizes in an nxn grid is $\sum_{{i}={1}}^{n}i^2 $. As n gets large, it's going to take a long time to sum up all those squares. So a closed form solution would be much better.

In this blog post I'm going to demonstrate a very mechanical way of finding that formula. It doesn't require any flashes of genius, and it generalizes to many similar problems. If you were the manager of a team of Mathematicians selling commercial Mathematics projects, this is probably how you'd want them to solve the problem. It's boring, predictable, general and doesn't require (nor provide) any special insight. And you can even solve the algebra programmatically, which will reduce the risk of a calculation error. Kaching!

What? You don't work for a company that sells Mathematics for money? How surprising. Well, stay tuned to this blog series. Because in future blog posts I'm going to present a variety of more inspired solutions. These will give much better insight into why the formula comes out the way it does. But first let's derive the formula...

All posts in this series

The mechanical approach

Step 1: guess the pattern of the formula

The degree of a polynomial is the highest power that the variable is raised to. If the summation is over a polynomial of degree $d$, then make an educated "guess" that the closed form expression is a polynomial of degree $d+1$. This seems plausible, because it holds for the following well known formulae: $$ \begin{align*} \sum_{{i} = {1}}^{n}1 &= n \\ \sum_{{i} = {1}}^{n}i &= \frac{n(n+1)}{2} &= \frac{1}{2} n^2 + \frac{1}{2} n \\ \end{align*} $$

Firstly, let $f(n) = \sum_{{i}={1}}^{n}i^2 $ be the actual sum of squares function.

Note that $f(0) = 0$ since there are no terms to add when $n = 0$ (there are zero squares in a zero by zero grid).

Now let $g(n)$ be our guess for the closed form equivalent of function f: $$ \begin{align*} g(n) &= \sum_{{j}=0}^{d+1}a_j.n^j \\ &= a_{0} + a_{1} n + a_{2} n^{2} + a_{3} n^{3} & \text{since d = 2} & \tag{1} \end{align*} $$

Next we need to calculate values for the $a_j$ values. We will start with $a_0$.

Step 2: Ensure that $g(0) = f(0) = 0$

$ g(0) = a_3 . 0^3 + a_2 . 0^2 + a_1 . 0 + a_0 = a_0 $

Hence set: $a_0 = 0 \tag{2}$

Step 3: Choose a method for determining the other coefficients

At this point it's tempting to set $g(1) = f(1)$, $g(2) = f(2)$ and $g(3) = f(3)$ to calculate the coefficients $a_j$. This is quick and simple and it will generate the correct values of the $a_j$ if $f(n)$ is indeed a polynomial.

But it won't provide any justification that $f(n)$ really is a polynomial of degree $d + 1$ i.e. that $f(n) = g(n)$ for all non-negative integers $n$. To be more rigorous, we are going to calculate the coefficients a different way...

Let $h(i)$ be the expression in the summation i.e. $h(i) = i^2 \tag{3} $

[Note that you can generalize this method using other polynomial functions for $h$.]

We are going to choose values for the $a_j$ coefficients so that: $$ \begin{align*} g(i) - g(i-1) &= h(i) & \forall i \in \mathbb{N} & \tag{4} \\ \end{align*} $$

Providing we can find such values, this will enable us to prove that $$ \begin{align*} g(n) &= f(n) & \forall n \in \mathbb{N} \end{align*} $$ This is because:

$$ \begin{align*} f(n) &= \sum_{{i}={1}}^{n}h(i) \\ &= \sum_{{i}={1}}^{n}[g(i) - g(i-1)] \\ &= [g(n) - g(n-1)] + [g(n-1) - g(n-2)] + ... + [g(2) - g(1)] + [g(1) - g(0)] \\ &= g(n) - g(0) & \text{because all other terms cancel out} \\ &= g(n) & \text{since } g(0) = 0 \text{ by equation 2} \end{align*} $$

There's some tedious and error-prone algebraic manipulation involved in calculating $g(i) - g(i-1)$. Instead of doing it by hand, let's use symbolic mathematics software to do it for us...

Step 4: Determine the other coefficients programmatically

First let's make some technology choices:

Use the Python programming language, since it is popular in the Scientific community.
I already have Python 2.7.10 installed via the PythonXY scientific Python distribution for Windows.
Use the SymPy Python library as our symbolic Mathematics package.
Use the iPython notebook as our REPL, as it allows easy sharing of the code.
Store the notebook in GitHub, which has built-in support for hosting and displaying iPython notebooks.
Extract the code into a separate Python script, and copy this script into a GitHub gist for easy sharing in the blog.

The first step was to develop the SymPy code in an iPython web notebook. You can download or view the notebook on GitHub.

If you download it, you can then open it from the command line by navigating to a suitable folder and running

ipython notebook --ip=localhost

This will open a local web page which you can use to navigate to the notebook and edit it:

After calculating the coefficients and factorizing the polynomial, the notebook spits out the following formula: $$ \frac{n(n + 1)(2n + 1)}{6} $$

Step 5 (optional): Generalize the code to work for other polynomial summations

After getting the code working using the iPython Notebook, I ported it to a Python script. I then modified the script so that, instead of only supporting the case $h(i) = i^2$, it would accept any polynomial expression in the variable $i$ as a command line parameter.

To demonstrate that this method works more generally, I've included a short Powershell snippet to calculate the closed form of the sums of k^th powers for k from 0 to 10:

Running this Powershell snippet, produces the following output:

I'm fairly happy with how the Python script turned out in the end. But I found it quite frustrating getting to that point. While SymPy is reasonably well documented at the level of modules and functions, it's harder to work out what the allowed values are for many of the function parameters and what they mean. Although the code is working, there are still quite a few hacks and TODO's left in the code.

As well as using the official SymPy documentation, I also found this SciPy 2011 Tutorial quite useful.

It would have been quicker to do the algebra by hand, although admittedly not for the more general case. Arguably that's also due to my lack of regular practice with SymPy, iPython and Python.

Conclusion

In this blog post I've used SymPy to derive the following result: $$ \sum_{{i}={1}}^{n}i^2 = \frac{n(n + 1)(2n + 1)}{6} $$

In my next few blog posts on this problem, I'm going to demonstrate other derivations of the closed form solution. In particular, I'm looking for derivations which provide greater insight into the problem.