Andrew Tweddle's Blog

An animated visual explanation of the sum of squares formula

2016-09-02T17:41:00.003-04:00

Overview

This is the fourth in a series of posts about a recreational maths problem - counting the number of squares (of all sizes) on a chessboard. The first three posts have shown that the number of squares of various sizes in an nxn grid is $\sum_{{i}={1}}^{n}i^2 = \frac{n(n + 1)(2n + 1)}{6}$. In this post we will solve the problem in a different way that will explain the structure of that formula.

All posts in this series

A recap

In the first post of the series I showed that the number of squares in an n by n grid is $\sum_{{i}={1}}^{n}i^2$.

In the second post I used algebra to show that this summation equals $\frac{n(n + 1)(2n + 1)}{6}$.

But can we find a more direct link between this formula and the number of squares in an n x n grid? Preferably one that explains each part of the formula.

In this post I'm going to use an animated graphical demo to show you how to get to the formula directly. This will help us to build an intuition about the problem. In later blog posts we'll use that intuition to derive a very elegant solution.

tl;dr

If you don't feel like reading the full explanation, you can jump straight to the animations:

Skip straight to the first demo!
Skip to the second demo

The solution approach

The formula can be rewritten as follows: $$ \begin{align*} f(n) & = \frac{n(n + 1)(2n + 1)}{6} \\ & = \frac{n(n + 1)}{2} \frac{2n + 1}{3} \\ & = \binom{n + 1}{2} . \frac{1}{3} . (2n + 1)\\ \end{align*} $$

I'll explain the three parts of this formula separately:

There are $\binom{n + 1}{2}$ ways of selecting a vertical "slice" or subset of an n by n grid.
The $\frac{1}{3}$ is because we're going to do this in three different ways.
And we'll do this in a clever way, so that the number of squares that exactly fit the width of the 3 slices will always add up to $2n + 1$.

Below is an animated demonstration for $n = 8$. You can see the 3 grids and an initial set of vertical slices of each grid.

When you click the start button, the demonstration will run through all the possible ways of generating vertical slices of the 3 grids. A pink square will slide down each vertical slice. The top left corner of the square is highlighted in a darker colour. As the pink square slides down the yellow slice, the dark pink corner will leave behind a "residue" of golden squares. This will make it easier to count the total number of squares that fit the widths of the 3 slices.

Below the grids is a table which will be used to keep track of the total count. As the animation is running, watch how the rightmost column of the table (A + B + C) always comes to 17 (i.e. $2n+1$).

An interactive demonstration

[Interactive demo: Grids A, B and C, each labelled 0 to 8, followed by a table showing the combinations 5 at a time. The table columns are #, $i$, $j$, the pink squares per slice (A, B, C) and the total squares (A + B + C).]

Total (all combinations):
$\binom{n + 1}{2} = \frac{n(n+1)}{2} = \frac{8 (8 + 1)}{2} = 36$			$f(n) + f(n) + f(n) = 3 f(n)$			$ = \binom{n + 1}{2} (2n + 1)$

Note: Once the table has been fully populated, you will be able to:

use the slider to scroll through the rows
select an individual row by clicking on it

Explanation

I'll explain the solution in detail further down. But the essence of it is to:

Choose a pair of numbers from the set {0, 1, ..., n}.
Find three different ways of defining a vertical "slice" of the grid using that pair of numbers.
Count the number of squares that fit the width $w$ of each slice exactly i.e. count how many $w$ x $w$ squares fit into the slice.
The number of squares in each of the slices depends on the width $w$ of that slice, which depends on the specific pair of numbers chosen.
But add the number of $w$ x $w$ squares in all three slices, and the answer is always $2n+1$, regardless of the numbers chosen.

I've created an interactive demonstration below which allows you to choose a pair of numbers and see the slices generated. Play around with it. You may be able to understand the solution without needing the explanation that follows.

Choose a pair of distinct numbers between 0 and n

First we choose any two numbers from the set $\{0, 1, ..., n\}$.

Let $i$ be the smaller and $j$ the larger of the two, so that $0 \le i \lt j \le n$.

A data representation

For each of the 3 grids we will use $i$ and $j$ to generate a slice of the grid. We will label the left and right edges of each slice $p$ and $q$.

Another demonstration

The previous demo showed all possible pink squares that fit into each vertical slice. This demo shows the possible top left hand corners of the squares in a golden colour. The number of golden squares is the total number of squares that fit into the 3 slices for a particular combination of $(i, j)$.



Grid A

0	1	2	3	4	5	6	7	8

A	Formula	Value
$p$	$i$
$q$	$j$
$w$	$j - i$
$s$	$n - j + i + 1$

Grid B

0	1	2	3	4	5	6	7	8

B	Formula	Value
$p$	$n - j$
$q$	$n - j + i + 1$
$w$	$i + 1$
$s$	$n - i$

Grid C

0	1	2	3	4	5	6	7	8

C	Formula	Value
$p$	$j - i - 1$
$q$	$n - i$
$w$	$n - j + 1$
$s$	$j$

Squares in all three slices

Grid	Size of squares ($w$ x $w$)	Formula for $s$	Value (# of squares)
A		$n - j + i + 1$
B		$n - i$
C		$j$
	Total:	$2n + 1$	17

A few notes

I suggest taking a few moments to satisfy yourself that $p$ and $q$ are always valid for the three grids i.e. that $ 0 \le p \lt q \le n $. This can be deduced from the constraint on the chosen pair i.e. $0 \le i \lt j \le n$.

To be completely rigorous, we should also verify that $p$ and $q$ can map to any pair of integers from the set {0, 1, ..., n}, otherwise there could be squares which aren't being counted. I'm not going to go to those lengths, because the purpose of this blog post is to give you an intuitive understanding of why the sum of squares formula has the form it does. [Hint: Show that each of the mappings from ($i$, $j$) to ($p$, $q$) is a bijection (one-to-one and onto).]
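For readers who would rather check the hint numerically than prove it, here is a minimal Python sketch. It assumes the formulas for $p$ and $q$ listed in the tables of the second demo. The function names are purely illustrative and are not taken from the demo's code.

```python
def slice_edges(n, i, j):
    """Left and right edges (p, q) of the slices in grids A, B and C."""
    return {
        "A": (i, j),
        "B": (n - j, n - j + i + 1),
        "C": (j - i - 1, n - i),
    }

def mappings_are_bijections(n):
    """Check that each grid's mapping (i, j) -> (p, q) hits every valid pair exactly once."""
    # All pairs with 0 <= i < j <= n, generated in sorted order.
    pairs = [(i, j) for i in range(n + 1) for j in range(i + 1, n + 1)]
    for grid in "ABC":
        images = [slice_edges(n, i, j)[grid] for i, j in pairs]
        # On a finite set, a bijection is just a rearrangement of the same pairs.
        if sorted(images) != pairs:
            return False
    return True

print(mappings_are_bijections(8))
```

A numerical check for a handful of values of n is of course not a proof, but it is a quick way to catch a mistake in one of the formulas.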

Also take a look at the table of calculations. Notice how the $i$ and $j$ variables cancel out when you add the three formulae. This causes the total number of squares in the 3 slices to always be the same, no matter which pair of $i$ and $j$ values we choose.
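The cancellation can also be checked by brute force. The following is a small Python sketch, assuming the three formulas for $s$ from the table of calculations. The helper names are hypothetical.

```python
def squares_per_slice(n, i, j):
    """Number of w x w squares in the slices of grids A, B and C (the s formulas)."""
    return (n - j + i + 1, n - i, j)

def count_squares(n):
    """Total number of squares in an n x n grid, counted via the three slices."""
    pairs = [(i, j) for i in range(n + 1) for j in range(i + 1, n + 1)]
    # Every pair contributes the same total, whatever i and j are.
    assert all(sum(squares_per_slice(n, i, j)) == 2 * n + 1 for i, j in pairs)
    # Each square is counted once per grid, hence the division by 3.
    return len(pairs) * (2 * n + 1) // 3

print(count_squares(8))  # 204
```

For $n = 8$ there are 36 pairs, each contributing 17 squares, and $36 \times 17 / 3 = 204$.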

A detailed explanation

Recall the formula we derived previously: $$ \begin{align*} \frac{n(n + 1)(2n + 1)}{6} & = \frac{n(n + 1)}{2} \frac{2n + 1}{3} \\ & = \binom{n + 1}{2} \frac{1}{3} (2n + 1)\\ \end{align*} $$

This gives a few clues for solving the problem. I'm going to break the formula into separate parts. Later I'll show you how to solve each part. Here's a brief overview of how each part of the formula will be used:

#	Expression	Notes
1.	$ \binom{n + 1}{2} $	Select two numbers from the set {0, 1, ..., n}. Treat these as the left and right edge of a vertical slice of the grid. This combination gives all possible ways of slicing the grid.
2.	$ \frac{1}{3} $	Find two other ways of using the same two numbers to define vertical slices of the grid. This gives three different ways of partitioning the grid into slices. Add up all the squares that fit exactly in the 3 slices defined by the pair of numbers. But that means that every square in the final total is going to be counted 3 times. So we need to divide the total by 3.
3.	$ 2n + 1 $	Count the number of squares which fit into each slice exactly (i.e. with the same width as the slice). Each pair of numbers gives 3 different vertical slices. Each slice will accommodate a different number of squares. But together the 3 slices are always going to have exactly $2n + 1$ squares.

Conclusion

This post gives us an intuitive idea of why the formula for the sum of squares looks like $\frac{n(n + 1)(2n + 1)}{6}$.

But can we do better than this?

Where do the three mappings from $(i, j)$ to $(p, q)$ come from? Must we rely on trial and error to determine them? Or is there a more satisfying way of deriving them?

And where does the $2n + 1$ term come from? We used algebraic cancellation to determine it. But can we find a more intuitive explanation for why it must have that specific form?

In the next few blog posts, I am going to answer these questions. In the process we're going to derive a very elegant solution which addresses each of these questions.

A combinatorial solution to the sum of squares formula

2015-12-03T21:07:00.001-05:00

Overview

In my previous two blog posts, I showed that the number of squares of various sizes in an nxn grid is $\sum_{{i}={1}}^{n}i^2 = \frac{n(n + 1)(2n + 1)}{6}$. In this post I'm going to use combinatorial Mathematics to derive a different closed form solution.

A quick review of combinations

In combinatorics, the combination function $\binom{n}{k}$ is used to count the number of different ways that k items can be chosen from a set of n unique items, with the order of the k items being irrelevant. $\binom{n}{k}$ is usually read as "n combination k" or "n choose k". It is also known as the binomial coefficient. You can read more about combinations on Wikipedia.

You also need to know that: $$ \begin{align*} n! &= 1 . 2 . 3 ... n & \text{n! is the product of the first n natural numbers} \\ \binom{n}{k} &= \frac{n!}{(n-k)! k!} \\ &= \frac{n (n - 1) ... (n - k + 1)}{1 . 2 . 3 ... k} \end{align*} $$
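As a quick sanity check of the factorial formula, here is a short Python sketch. It compares a hand-written version (the hypothetical helper `choose`) with the standard library's `math.comb`, which is available from Python 3.8.

```python
from math import comb, factorial

def choose(n, k):
    """n choose k, computed directly from the factorial formula."""
    return factorial(n) // (factorial(n - k) * factorial(k))

# Choosing 2 numbers from the 9 values {0, 1, ..., 8}.
print(choose(9, 2), comb(9, 2))  # 36 36
```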

The approach

We are going to use combinatorial Mathematics to count the number of possible squares in the grid. To do this we will need to define a coordinate system for the grid and a convenient data representation for a valid square in that grid. By a data representation, I mean a set of numeric values that can uniquely identify a particular square in the grid.

We will then use combinations to count all possible ways of generating valid data representations. But we will need to take into account that combinations generate unique numbers. That will require us to consider separate cases based on the number of identical variables in our representation.

A useful data representation

A recap of the problem

The problem is to count the total number of squares of any size in a grid. So if you think of a chessboard, how many 1x1, 2x2, ..., 8x8 squares are there?

Let's consider whether there is a useful notation for describing any valid square in the grid.

Defining a coordinate system

First let's define a coordinate system for the grid.

We'll say that (x, y) is the coordinate of a space on the checkerboard in column x and row y. And we'll choose (1,1) to be the coordinate of the top-left corner. That way columns go from left to right and rows go from top to bottom - the same direction as reading pages in a book (for most cultures).

[This is just one possible convention... I could have started at the bottom left instead, like a chart, or made the corner square (0, 0) instead of (1,1).]

Comparing different representations for valid squares in the grid

We could use the co-ordinates $(x_1, y_1)$ and $(x_2, y_2)$ of two opposite corners of a square block. But that representation encodes any rectangle in the grid, not just the squares. We'd prefer to choose a data representation which enforces constraints naturally.

So what if we used $(x_1, y_1, s)$ as our data representation, where $(x_1, y_1)$ is the top left corner of the square and $s$ is the size of the sides of the square? This is better, because we have 3 variables representing a valid square instead of 4, and we have expressed the constraint that a square's sides have the same size.

However this representation is still a little cumbersome. To express the constraint that the square fits within the grid, we need to ensure that the bottom right corner is somewhere inside the grid. That requires calculating the x and y coordinates of the bottom-right corner. Our data representation would be more convenient if we could avoid all unnecessary calculations.

So let's encode the bottom-right corner into the data representation instead. This will give us our desired data representation...

The chosen data representation

We will represent a square in the grid by the triple $(x, y, s)$ where (x, y) is the bottom-right corner of the square and s is the size.

Our data representation (x, y, s) is summarized in the following diagram:

With this data representation, our constraints on a valid square are: $$ \begin{align*} 1 \leq x \leq n & \tag{1} \\ 1 \leq y \leq n & \tag{2} \\ 1 \leq s \leq n & \tag{3} \\ s \leq x & \tag{4} \\ s \leq y & \tag{5} \\ \end{align*} $$

There is some redundancy in constraint 3, since the other 4 constraints ensure that $s \leq n$. I've chosen to do it this way so that constraints 1, 2 and 3 all have the same form. They will be satisfied naturally when we use the combination function. Only constraints 4 and 5 will need to be considered explicitly.
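To make the representation concrete, here is a minimal Python sketch that lists every triple $(x, y, s)$ satisfying the five constraints. The function name is hypothetical, and the brute-force enumeration is only meant as an illustration, not as the counting method used in this post.

```python
def valid_squares(n):
    """All triples (x, y, s) satisfying constraints 1 to 5."""
    values = range(1, n + 1)                      # constraints 1, 2 and 3
    return [(x, y, s)
            for x in values for y in values for s in values
            if s <= x and s <= y]                 # constraints 4 and 5

print(len(valid_squares(8)))  # 204
```

For a chessboard this gives 204 triples, one for each square of any size.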

Generating all possible data representations

Choosing values of x, y and s from the set {1, 2, ... n}

We can address constraints 1, 2 and 3 by choosing our variables x, y and s from the set {1, 2, 3, ..., n}. That allows us to use the combination function.

Unfortunately there's a catch! Combinations count unique numbers, and some of the variables x, y and s could be the same. So we'll need to consider separate cases depending on how many of the variables have unique values.

Let's start by defining variables for each case. Let $c_i$ be the number of ways of generating representations $(x, y, s)$ when $i$ unique numbers are taken from the set {1, 2, 3, ..., n}. The total number of squares is then $c_1 + c_2 + c_3$.

Counting $c_3$: the number of squares when x, y and s are all different

There are $\binom{n}{3}$ ways of choosing three unique numbers from the set {1, 2, ..., n}. Of these, the smallest must always be assigned to $s$ so that constraints 4 and 5 are addressed. However the largest could be assigned to x or y. This leads to two sub-cases: $$ \begin{align*} \text{ i) } & s < x < y \\ \text{ii) } & s < y < x \end{align*} $$

So we have two different representations generated from each set of three distinct numbers. Thus: $$ c_3 = 2 \binom{n}{3} $$

Counting $c_2$: the number of squares when only two of x, y and s are different

There are $\binom{n}{2}$ ways of choosing two unique numbers from the set {1, 2, ..., n}. But there are three different ways that these two numbers can be assigned to the values (x, y, s): $$ \begin{align*} \text{ i) } & s < x = y \\ \text{ ii) } & s = x < y \\ \text{ iii) } & s = y < x \\ \end{align*} $$

Thus: $$ c_2 = 3 \binom{n}{2} $$

Counting $c_1$: the number of squares when x, y and s are all the same

There are $\binom{n}{1} = n$ ways of choosing a single number from the set {1, 2, ..., n}.

Each such number will define a single solution with: $$ \begin{align*} s = x = y \\ \end{align*} $$ Clearly all the constraints will be satisfied. Thus: $$ c_1 = n $$

Putting it all together

So the total number of squares of various sizes in an n x n grid is: $$ c_1 + c_2 + c_3 = n + 3 \binom{n}{2} + 2 \binom{n}{3} $$
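The three cases can be checked against a brute-force tally. The sketch below, with hypothetical names, groups every valid triple $(x, y, s)$ by the number of distinct values it contains and compares the tallies with $c_1$, $c_2$ and $c_3$ for $n = 8$.

```python
from collections import Counter
from math import comb

def case_counts(n):
    """Tally valid (x, y, s) triples by how many distinct values they contain."""
    counts = Counter()
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            for s in range(1, min(x, y) + 1):     # s <= x and s <= y
                counts[len({x, y, s})] += 1
    return counts

n = 8
counts = case_counts(n)
print(counts[1], counts[2], counts[3])  # 8 84 112
print(counts[1] == n, counts[2] == 3 * comb(n, 2), counts[3] == 2 * comb(n, 3))
```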

Checking the answer

In the second post of the series we showed that the number of squares is $ \frac{n(n + 1)(2n + 1)}{6} $. Let's check our answer by seeing if the two formulas are equivalent.

$$ \begin{align*} n + 3 \binom{n}{2} + 2 \binom{n}{3} &= n + 3 \frac{n (n - 1)}{2} + 2 \frac{ n(n - 1)(n - 2)}{6} \\ &= \frac{n}{6} [ 6 + 9 (n - 1) + 2(n - 1)(n - 2) ] & \text{ Extract } \frac{n}{6} \text { from all terms } \\ &= \frac{n}{6} [ 6 + 9n - 9 + 2n^2 - 6n + 4 ] \\ &= \frac{n}{6} [ 2n^2 + 3n + 1 ] \\ &= \frac{n}{6} [ (n + 1)(2n + 1) ] \\ &= \frac{n(n + 1)(2n + 1)}{6} \\ \end{align*} $$ So our calculations seem correct, since this is the formula we generated in the previous blog post.

The value of a good data representation

Rob Pike's 5th rule of programming supposedly states that:

Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

This echoes Fred Brooks' statement in "The Mythical Man-Month" that: "Representation is the essence of programming".

This blog post is about Mathematics, not programming. But I'd argue that the elegance of this solution also comes from choosing the right data representation.

Conclusion

This is the third post in the series. The first three posts have shown that the number of squares in an nxn grid is: $$ \begin{align*} & \sum_{{i}={1}}^{n}i^2 & & \text{ from blog post 1} \\ = & \frac{n(n + 1)(2n + 1)}{6} & & \text{ from blog post 2} \\ = & n + 3 \binom{n}{2} + 2 \binom{n}{3} & & \text{ from blog post 3} \\ \end{align*} $$

Next time

This post had far less algebra than the previous blog post. But there was still some algebraic manipulation required to arrive at the formula $\frac{n(n + 1)(2n + 1)}{6}$. It's certainly not obvious why the formula has that specific form. My aim in the next two blog posts will be to address this. I want to arrive at that formula via a much more direct route.

Guessing a closed form solution for the sum of squares

2015-12-02T23:03:00.000-05:00

Overview

In a previous blog post, I showed that the number of squares of various sizes in an nxn grid is $\sum_{{i}={1}}^{n}i^2 $. As n gets large, it's going to take a long time to sum up all those squares. So a closed form solution would be much better.

In this blog post I'm going to demonstrate a very mechanical way of finding that formula. It doesn't require any flashes of genius, and it generalizes to many similar problems. If you were the manager of a team of Mathematicians selling commercial Mathematics projects, this is probably how you'd want them to solve the problem. It's boring, predictable, general and doesn't require (nor provide) any special insight. And you can even solve the algebra programmatically, which will reduce the risk of a calculation error. Kaching!

What? You don't work for a company that sells Mathematics for money? How surprising. Well, stay tuned to this blog series. Because in future blog posts I'm going to present a variety of more inspired solutions. These will give much better insight into why the formula comes out the way it does. But first let's derive the formula...

The mechanical approach

Step 1: guess the pattern of the formula

The degree of a polynomial is the highest power that the variable is raised to. If the summation is over a polynomial of degree $d$, then make an educated "guess" that the closed form expression is a polynomial of degree $d+1$. This seems plausible, because it holds for the following well known formulae: $$ \begin{align*} \sum_{{i} = {1}}^{n}1 &= n \\ \sum_{{i} = {1}}^{n}i &= \frac{n(n+1)}{2} = \frac{1}{2} n^2 + \frac{1}{2} n \\ \end{align*} $$

Firstly, let $f(n) = \sum_{{i}={1}}^{n}i^2 $ be the actual sum of squares function.

Note that $f(0) = 0$ since there are no terms to add when $n = 0$ (there are zero squares in a zero by zero grid).

Now let $g(n)$ be our guess for the closed form equivalent of function f: $$ \begin{align*} g(n) &= \sum_{{j}=0}^{d+1}a_j.n^j \\ &= a_{0} + a_{1} n + a_{2} n^{2} + a_{3} n^{3} & \text{since d = 2} & \tag{1} \end{align*} $$

Next we need to calculate values for the $a_j$ values. We will start with $a_0$.

Step 2: Ensure that $g(0) = f(0) = 0$

$ g(0) = a_3 . 0^3 + a_2 . 0^2 + a_1 . 0 + a_0 = a_0 $

Hence set: $a_0 = 0 \tag{2}$

Step 3: Choose a method for determining the other coefficients

At this point it's tempting to set $g(1) = f(1)$, $g(2) = f(2)$ and $g(3) = f(3)$ to calculate the coefficients $a_j$. This is quick and simple and it will generate the correct values of the $a_j$ if $f(n)$ is indeed a polynomial.

But it won't provide any justification that $f(n)$ really is a polynomial of degree $d + 1$ i.e. that $f(n) = g(n)$ for all non-negative integers $n$. To be more rigorous, we are going to calculate the coefficients a different way...

Let $h(i)$ be the expression in the summation i.e. $h(i) = i^2 \tag{3} $

[Note that you can generalize this method using other polynomial functions for $h$.]

We are going to choose values for the $a_j$ coefficients so that: $$ \begin{align*} g(i) - g(i-1) &= h(i) & \forall i \in \mathbb{N} & \tag{4} \\ \end{align*} $$

Provided we can find such values, this will enable us to prove that $$ \begin{align*} g(n) &= f(n) & \forall n \in \mathbb{N} \end{align*} $$ This is because:

$$ \begin{align*} f(n) &= \sum_{{i}={1}}^{n}h(i) \\ &= \sum_{{i}={1}}^{n}[g(i) - g(i-1)] \\ &= [g(n) - g(n-1)] + [g(n-1) - g(n-2)] + ... + [g(2) - g(1)] + [g(1) - g(0)] \\ &= g(n) - g(0) & \text{because all other terms cancel out} \\ &= g(n) & \text{since } g(0) = 0 \text{ by equation 2} \end{align*} $$

There's some tedious and error-prone algebraic manipulation involved in calculating $g(i) - g(i-1)$. Instead of doing it by hand, let's use symbolic mathematics software to do it for us...

Step 4: Determine the other coefficients programmatically

First let's make some technology choices:

Use the Python programming language, since it is popular in the Scientific community.
I already have Python 2.7.10 installed via the PythonXY scientific Python distribution for Windows.
Use the SymPy Python library as our symbolic Mathematics package.
Use the iPython notebook as our REPL, as it allows easy sharing of the code.
Store the notebook in GitHub, which has built-in support for hosting and displaying iPython notebooks.
Extract the code into a separate Python script, and copy this script into a GitHub gist for easy sharing in the blog.

The first step was to develop the SymPy code in an iPython web notebook. You can download or view the notebook on GitHub.

If you download it, you can then open it from the command line by navigating to a suitable folder and running

ipython notebook --ip=localhost

This will open a local web page which you can use to navigate to the notebook and edit it:

After calculating the coefficients and factorizing the polynomial, the notebook spits out the following formula: $$ \frac{n(n + 1)(2n + 1)}{6} $$
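The notebook itself relies on SymPy. As a rough stand-in that needs only the standard library, the sketch below solves equation 4 by matching coefficients of powers of $i$. It is an illustration under stated assumptions, not the notebook's code: the polynomial $h$ is passed as a list of coefficients, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def closed_form_coefficients(h):
    """Return [a_0, ..., a_{d+1}] such that g(i) - g(i-1) = h(i).

    h is a list where h[m] is the coefficient of i**m, so i**2 is [0, 0, 1].
    """
    d = len(h) - 1
    a = [Fraction(0)] * (d + 2)   # a[0] stays 0 so that g(0) = 0
    for m in range(d, -1, -1):
        # In i**j - (i-1)**j the coefficient of i**m (for m < j) is
        # -comb(j, m) * (-1)**(j - m); for j = m + 1 this is m + 1.
        known = sum(a[j] * -comb(j, m) * (-1) ** (j - m)
                    for j in range(m + 2, d + 2))
        a[m + 1] = (Fraction(h[m]) - known) / (m + 1)
    return a

def g(n, a):
    """Evaluate the polynomial with coefficients a at n."""
    return sum(coeff * n ** power for power, coeff in enumerate(a))

coeffs = closed_form_coefficients([0, 0, 1])
print([str(c) for c in coeffs])  # ['0', '1/6', '1/2', '1/3']
```

These coefficients give $g(n) = \frac{1}{6} n + \frac{1}{2} n^2 + \frac{1}{3} n^3$, which factorizes to the formula above. Because the system of equations is triangular, the same function should work for any polynomial $h$, for example `[0, 0, 0, 1]` for the sum of cubes.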

Step 5 (optional): Generalize the code to work for other polynomial summations

After getting the code working using the iPython Notebook, I ported it to a Python script. I then modified the script so that, instead of only supporting the case $h(i) = i^2$, it would accept any polynomial expression in the variable $i$ as a command line parameter.

To demonstrate that this method works more generally, I've included a short PowerShell snippet to calculate the closed form of the sums of $k^{th}$ powers for k from 0 to 10:

Running this PowerShell snippet produces the following output:

I'm fairly happy with how the Python script turned out in the end. But I found it quite frustrating getting to that point. While SymPy is reasonably well documented at the level of modules and functions, it's harder to work out what the allowed values are for many of the function parameters and what they mean. Although the code is working, there are still quite a few hacks and TODO's left in the code.

As well as using the official SymPy documentation, I also found this SciPy 2011 Tutorial quite useful.

It would have been quicker to do the algebra by hand, although admittedly not for the more general case. Arguably that's also due to my lack of regular practice with SymPy, iPython and Python.

Conclusion

In this blog post I've used SymPy to derive the following result: $$ \sum_{{i}={1}}^{n}i^2 = \frac{n(n + 1)(2n + 1)}{6} $$

In my next few blog posts on this problem, I'm going to demonstrate other derivations of the closed form solution. In particular, I'm looking for derivations which provide greater insight into the problem.

Counting squares in a grid using the sum of squares method

2015-06-10T14:20:00.001-04:00

Introduction

A while back a colleague asked some of us how we would go about solving a particular Mathematical problem. The problem was to count the total number of squares on a chessboard. This included all squares of size 1x1, 2x2, up to 8x8.

I found a number of different ways of solving the problem which I'm going to present in the next few blog posts.

As has been the theme with my previous blog posts on Mathematical puzzles, my goal is to progress from solutions which are algebraic to solutions which provide insight.

This blog post is about the first way of solving the problem, which I've dubbed the sum of squares method.

Note that I'll be solving the problem for any n x n grid (or lattice), not just for the 8 x 8 chessboard.

Step 1: How many k x k squares fit into an n x n grid?

Suppose we have an n x n grid. In how many different ways can we fit a square of size k x k into the grid (with its edges aligned to the grid lines, of course)?

Consider the k x k pink square shown in the diagram below...

I've shown the top-left cell of the pink square in a darker pink. We can't move the dark pink square any further right or down. But we could move it up or to the left. As we move the square around, the dark pink cell can end up anywhere within the yellow area.

So the number of possible positions for the pink k x k square is precisely the number of positions for the dark pink cell, which is the size of the yellow area.

Now the yellow area has sides of length: $n - (k - 1) = n - k + 1$. So there are $(n - k + 1)^2$ possible k x k squares that can fit into the n x n grid.
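This count is easy to confirm by listing the positions explicitly. Here is a minimal Python sketch, assuming cells are numbered from 1 and using a hypothetical function name. It lists every cell where the top-left corner of a k x k square can go.

```python
def placements(n, k):
    """Top-left cells (column, row) where a k x k square fits in an n x n grid."""
    # The corner can be in columns and rows 1 to n - k + 1 (the yellow area).
    return [(x, y) for x in range(1, n - k + 2) for y in range(1, n - k + 2)]

print(len(placements(8, 3)))  # 36
```

For a 3 x 3 square on a chessboard this gives $(8 - 3 + 1)^2 = 36$ positions.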

Step 2: Add up the number of squares for each value of k

The smallest value of k is 1 and there will be $n^2$ one-by-one squares fitting in the grid. The largest value of k is n and there will be only one n x n square.

If we add up the number of k x k squares for each value of k we get a formula of:

$$ \sum_{{k}={1}}^{n}(n - k + 1)^2 $$
That's quite a neat formula. But we can make it even neater by adding a new variable $i = n - k + 1$ and summing over all possible values of $i$ instead. The following table will make this clearer:

$k$	$i = n - k + 1$	Number of squares of size k x k $ = (n - k + 1)^2 = i^2$
$1$	$n$	$n^2$
$2$	$n - 1$	$(n-1)^2$
...	...	...
$n - 1$	$2$	$2^2$
$n$	$1$	$1^2$

Now instead of summing over the values of k (from top to bottom), we can instead sum over the values of i (i.e. starting from the bottom row to the top row). And our formula becomes: $$ \sum_{{i}={1}}^{n}i^2 $$
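The change of variable can be checked with a couple of lines of Python. This is only a sketch with hypothetical function names. Both sums add the same terms, just in the opposite order.

```python
def total_by_size(n):
    """Sum over k of the number of k x k squares."""
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))

def total_reindexed(n):
    """The same sum after substituting i = n - k + 1."""
    return sum(i ** 2 for i in range(1, n + 1))

print(total_by_size(8), total_reindexed(8))  # 204 204
```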

Step 3: Derive a closed form solution

The formula above is very succinct and beautiful. But it has a drawback... as $n$ gets large, it's going to take a long time to do all those summations. To address this, what we really want to find is a closed form solution without any summation symbols.

In future posts I'm going to show you a few different ways of getting a closed formula.

Next steps...

In the next blog post in this series, I'm going to derive the closed form solution through algebraic manipulation of the summation formula.

In later posts I will derive the number of squares in the grid in other ways. These will lead directly to the closed form solution. I'm hoping they will also give more insight into why the formula comes out the way it does.

Andrew Loves Math(s)

2014-03-02T12:44:00.001-05:00

Introduction

The other day I was discussing Mathematical puzzles with my colleagues and one of them mentioned the "Johnny Hates Math" problem. The problem statement can be found on the SPOJ web site.

Essentially the problem is to insert plus signs at appropriate places in a string of digits such that the resulting sum equals a target value. The plus signs will divide the digits into a set of numbers. None of these numbers should be more than 5 digits long and none should start with a zero, except possibly the number zero itself.

The instructions on the SPOJ web site are to print out the solution with the least number of plus signs. But I think it's more interesting to know the total number of solutions. So I'm just going to provide code to solve that problem, not satisfy the SPOJ specification.

I'm going to solve the problem in a number of different ways and using different programming language features (mainly using Scala). The purpose is to get a feel for which solution methods produce the most concise code and which produce the fastest code. It will also be an opportunity to experiment with Scala and develop more of a feel for the language.

A PowerShell solution

My first attempt was in PowerShell and took 46 minutes to code. A lot of that time was spent fixing PowerShell syntax errors.

Interestingly, I automatically turned to recursion to solve the problem. I guess that's the effect of experimenting with functional programming in the last little while!

My approach was to:

Branch on the number of digits in the next number (line 6 in the code snippet below)
Prune each branch based on having enough digits left (line 8), not starting with a zero (line 11), not exceeding the target value (line 16) and not running out of digits before reaching the target value (line 25)
Recursively call the function with the remaining digits and remaining target value (line 28)
End the recursion when there are no digits remaining and the target value has been reached exactly (line 21)

A quick comment on line 28: @(...) wraps its contents in an array unless the contents are already an array. This solves a problem in PowerShell where, if a function or script emits multiple values, these values are automatically wrapped in an array - but if there is only a single value, then it is returned as is.

This inconsistency can lead to some very tricky bugs. So it's important to coerce the return value to an array.

My laptop has an i5-2430M CPU. On it, the PowerShell script took under 5 seconds to find the ways of inserting plus signs into the left hand side of: 15442147612367219875=472

The break statements provided a very convenient way of excluding infeasible branches. This didn't feel as clean in my first Scala attempt, since Scala treats if statements as expressions (similar to the ?: ternary conditional operator in languages like C# and Java).
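For readers who don't use PowerShell, the branch-and-prune recursion described in this section can be sketched in Python. This is not a port of the script: the function name is hypothetical, it returns every solution as a list of numbers, and it assumes that a string with no plus signs at all counts as a solution when it is at most 5 digits long and equals the target.

```python
from typing import List

def solutions(digits: str, target: int) -> List[List[int]]:
    """All ways of splitting digits into numbers that sum to target."""
    if not digits:
        # No digits left: a solution only if the target was reached exactly.
        return [[]] if target == 0 else []
    found = []
    # Branch on the number of digits in the next number (at most 5).
    for length in range(1, min(5, len(digits)) + 1):
        chunk = digits[:length]
        if length > 1 and chunk[0] == "0":
            break   # longer numbers would also start with a zero
        value = int(chunk)
        if value > target:
            break   # longer prefixes are at least as large
        # Recurse on the remaining digits and the remaining target.
        for rest in solutions(digits[length:], target - value):
            found.append([value] + rest)
    return found

print(solutions("1111", 13))  # [[1, 1, 11], [1, 11, 1], [11, 1, 1]]
```

As in the PowerShell version, the `break` statements do the pruning. Here they are safe because each longer prefix is at least as large as the shorter one.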

A few Scala solutions

A Scala base class for the solver algorithm

I refactored the algorithm-independent features into a base class. The code for this is shown below:

I made the somewhat strange decision of defining the abstract solve method to take the same parameters as the class constructor. This was purely to give the derived classes the option of recursively calling the solve method. But it's not a great API design choice. It feels like it is trying to be both a static method (although Scala doesn't have such a concept as it would violate pure OO) and an abstract method.

I'm uncomfortable with this API, which usually means something's wrong. But I'm going to leave it for now, because my focus is on solving the problem, not designing an API.

In line 19, the companion object uses the "SolverUsingFlatMap" derived class as its default solver. I'll provide that sub-class next...

Scala solution 1: SolverUsingFlatMap

The code for my first Scala solution is shown below:

This solver works in a very similar fashion to the PowerShell algorithm. However, instead of returning arrays of strings, it represents a single solution as a List[Int] (i.e. a Scala linked list of integers). A list of solutions is then represented as a List[List[Int]].

I returned an empty list of solutions to indicate that a branch was infeasible. But the break statements in PowerShell felt a bit more intuitive to me.

Scala solution 2: SolverUsingFor

I wanted to get away from returning empty lists as a way of indicating a failed branch. My first thought was to use an Option[List[List[Int]]] type to represent a pruned branch using an option of None. But this felt like overkill.

Instead my second attempt used a for statement to get away from the need to return empty lists. This code turned out to be very concise and pleasing:

Scala solution 3: SolverUsingFoldLeft

I then started wondering whether it was possible to use a foldLeft operation to calculate a solution.

There were a few conceptual issues with making this work. Firstly, a foldLeft takes a fixed list of items and an initial accumulator, and updates the accumulator for each successive item. In this case, each item was an individual digit.

Secondly, all the previous solutions branched in up to 5 directions at each step. To get branching to work with foldLeft, I needed to store all the branches in the accumulator. To do this, I created a PartialSolution class to keep track of the numbers generated so far, as well as the current number which was being built up. At each step, I would then split each PartialSolution into at most 2 new PartialSolutions, by either terminating the current number or adding another digit to it.

The code is shown below:

This code is considerably more verbose. I also didn't expect it to perform very well. Firstly, it is effectively doing a breadth-first search. Secondly I would expect it to be more memory-intensive and memory access is often much slower than computation. This is partially mitigated by storing the numbers in reverse order (so that large parts of the previous level's data structures are reused, since prepending a number to a list does not create a new list). Also, the foldLeft is tail recursive.

I also expected the foldLeft solution to degrade very quickly for large target totals, as the number of partial solutions is potentially doubling in size at every step. So with a large target value, the pruning effect of comparing the running total to the target is lost.
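
The fold-based idea described above can be sketched in a few lines of Python. This is only an illustration, not the Scala code from this post. It assumes, purely for the sake of the example, that the puzzle is to split a sequence of digits into consecutive numbers whose sum equals a target. The names `PartialSolution`, `make_step` and `solve` and the field layout are hypothetical. Note also that Python tuples are copied when prepended to, so the structural sharing mentioned above for Scala lists does not carry over.

```python
from functools import reduce
from typing import List, NamedTuple, Tuple


class PartialSolution(NamedTuple):
    completed: Tuple[int, ...]  # finished numbers, most recent first
    current: int                # the number still being built up
    total: int                  # sum of the finished numbers


def make_step(target: int):
    """Build the fold step: one digit in, a new list of partial solutions out."""
    def step(partials: List[PartialSolution], digit: int) -> List[PartialSolution]:
        result = []
        for p in partials:
            # Branch 1: add the digit to the number being built up.
            grown = p.current * 10 + digit
            if p.total + grown <= target:
                result.append(PartialSolution(p.completed, grown, p.total))
            # Branch 2: terminate the current number and start a new one.
            closed = p.total + p.current
            if closed + digit <= target:
                result.append(
                    PartialSolution((p.current,) + p.completed, digit, closed))
        return result
    return step


def solve(digits: List[int], target: int) -> List[List[int]]:
    # Assumed puzzle: split the digits into numbers that sum to the target.
    first, rest = digits[0], digits[1:]
    start = [PartialSolution((), first, 0)]
    # The accumulator holds every surviving branch (a breadth-first search).
    finished = reduce(make_step(target), rest, start)
    return [
        list(reversed((p.current,) + p.completed))
        for p in finished
        if p.total + p.current == target
    ]
```

Under these assumptions, `solve([1, 2, 3], 15)` returns `[[12, 3]]`. Each partial solution splits into at most two new ones per digit, which is the potential doubling mentioned above. The comparisons against `target` are the only pruning.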

It turns out that this solver performed a lot better than I expected. And although it did degrade more with larger totals, the effect was not as pronounced as I had been expecting.

Scala worksheet to compare performance of the solvers

I used a Scala worksheet to test the code and compare performance of the various algorithms. The worksheet code and results are shown below:

[UPDATE: This way of measuring performance is not ideal, as explained in this blog post by Aleksey Shipilёv. Rather use a micro-benchmarking framework, such as JMH (possibly with sbt-jmh) or ScalaMeter.]

I tested the 3 solvers twice - first using a low target value, and then again using a high target value (to get an idea of the scalability of the algorithms).

There are some interesting things to note here.

Firstly, the flatMap method was fastest in all cases.

Despite being so short, the method using for comprehensions was much slower. In fact, for the smaller target value, I was quite surprised to see that the breadth-first foldLeft solver was sometimes faster!

With a much larger value, the foldLeft solver showed its poor scalability. However it wasn't that much worse than the method using the for comprehension.

All the Scala solvers were much faster than the PowerShell algorithm. This isn't too surprising, since PowerShell is an interpreted scripting language.

Impressions of Scala

Last year I completed the two Scala courses given on Coursera. This was my first foray into Scala since completing those courses. So what are my impressions of the language?

Programming in the small

Firstly, I've really enjoyed solving this problem with Scala. As with F#, I love being able to have concise, readable code - not unlike a dynamically typed language - and yet still gain the benefits of type safety and good performance.

I can see how functional programming in general provides a great solution for "programming in the small".

Scripting

Scripting is another aspect of programming in the small. So at some point I'd like to experiment with using the Scala REPL for writing scripts.

There is definitely promise in this area. I've had a lot of experience using PowerShell for scripting at work and at home. But I've had colleagues demonstrate that they can generally match the features of PowerShell using the F# REPL. It's obvious that functional programming languages can mount a credible challenge in this space - at least from a language perspective. The major advantage PowerShell continues to have in a work setting is its ubiquity on Windows servers.

I also recently finished working on a "record linkage" challenge in my spare time using Python and Pandas for the data munging. If and when time permits, I would like to re-develop the project in Scala to see which code base is easier to work with.

Programming in the large

Something I really like about Scala is that it is clearly designed for "programming in the large" as well. It's not just about functional programming. The baby has not been thrown out with the bathwater: object orientation is treated as an equal citizen in the language.

Actually I'd argue that Scala's true potential in the enterprise has very little to do with being a functional programming language. There are many Scala features which allow very concise, readable Object-Oriented code to be written. This succinctness is part of the style of functional programming. However the overtly functional features in Scala are often too easily abused and could be a distraction from the very real value that Scala offers.

I've seen a number of recent indicators that Scala may be ready to break into the enterprise development space:

The January 2014 issue of the Thoughtworks technology radar places "Scala - The good parts" in the adopt category at the centre of the radar.
Rod Johnson is the creator of the Spring framework. In his keynote address at ScalaDays 2013, he stated his opinion that by 2018 Scala would be the leading "new" language and that it would find its niche as the leading enterprise language.
Vaughn Vernon, author of "Implementing Domain-Driven Design", has recently been blogging about using Scala and Akka for Domain-Driven Design. Tomorrow I will be attending a 3 day workshop given by Vaughn. So I'm looking forward to hearing his opinions on Scala.
The frequently down (and out?) HammerPrinciple.com web site provides survey-based comparisons of various programming languages. Scala ranked first (above C# and Java) for "this language is best for very large projects", second for "I would use this language for writing server programs" and third for "this language is good for distributed computing".

On the other hand...

Scala has momentum. But that doesn't mean it will achieve breakthrough. Numerous great languages have fallen by the wayside, and there's no guarantee that Scala won't join their ranks. It's a big leap from a language being ready for the enterprise, to the enterprise being ready for a language!

On the negative side, Scala has a lot of very advanced language features which can easily trip up newcomers. This complexity is a substantial barrier to enterprise adoption.

This explains why Thoughtworks qualifies its recommendation of Scala as being just "the good parts".

Rod Johnson also addresses this issue in the ScalaDays keynote mentioned earlier, emphasizing the need to favour readability over poetry.

Martin Odersky, the designer of the Scala language, is well aware of the challenge. In 2011 he provided a list of language features with recommendations for which features were suitable for different levels of application developers and which for library designers.

Library design

This raises another area where Scala appears to be very elegant... library design.

The hammerprinciple.com site has Scala at number 3 for "I rarely have difficulty abstracting patterns I find in my code" and "this language is expressive". It comes in at number 2 for "I find code written in this language is very elegant". These suggest that Scala could be a great language for designing libraries for Scala and the JVM.

Of course you have to take this with a pinch of salt, since early adopters are likely to bias the HammerPrinciple survey outcomes. However there seems to be enough promise here to justify further experimentation.

On that note, there were some things in my JHM code that I really didn't like. I want to refactor the solve() method. And even though it's only in a worksheet, I'd also like to get rid of the implicit parameter to the time() function. It's one of those features which can easily make a code base more opaque. And even though the way I've used it is fairly transparent, just the act of using it legitimizes its use by less experienced programmers (who either haven't developed the intuition to know when not to use the feature, or don't yet have the ability to hold their peers to account by being able to explain the reasoning behind that intuition).

In a later post I'd like to refactor the code for reusability, and in so doing understand more of Scala's library design features.

Conclusion

This has been an interesting algorithmic challenge and I have really enjoyed solving it with Scala.

I've seen enough promise with Scala to justify using it as my preferred hobby language (along with Python for experimenting with Data Analytics and data munging).

It's been a lot of fun programming with Scala. And that's pretty important, because there's no guarantee that it will ever become a marketable skill.

RIP Amber

2014-02-27T16:43:00.001-05:00

Today we took our beautiful bull mastiff to be put down. Heart-breaking, but necessary... she was over 10 years old and had been diagnosed with kidney failure. She was skipping meals and her weight had dropped to just 34 kilograms (from 51 kg a few years before).

She had a wonderful sniff around the vet's waiting room, and it was lovely to see some of her old joy and bounciness back. But it also brought back memories of her younger self. It made it all the sadder to say good-bye and stroke her head as she fell asleep for the last time...

RIP, Amber!

Birthday Board Games Bash

2014-02-23T17:54:00.001-05:00

Thank you!

Firstly, a big thank you to my wonderful wife for arranging a board games party on Saturday for my birthday.

Around 40 friends attended, with the numbers being fairly evenly split between adults and children. We started at 10:00 am, and things went very smoothly, despite a power outage to replace some stolen electrical cable in the area.

There were 3 separate groups of friends present. I was a little apprehensive as it's always difficult to mix different social groups together. But it went really well.

Ricochet Robots

We started out playing Ricochet Robots as it can handle an arbitrary number of people and it's easy to join or leave at any time. It's a very clever competitive puzzle and it can be a real brain-burner. It's not everyone's cup of tea, but I love it!

There are online versions at http://www.ricochetrobot.com and http://www.ricochetrobots.com.

The Resistance: Avalon

Two of my friends had brought along copies of Avalon (the Resistance) so we then split into two groups - one for novices and the other for the experts. I played in the expert group, but felt a bit out of my depth at first.

There is a group of us who play games once a week after work. The rest of the guys have been playing Avalon a lot over the past few months and even have their own terminology now. But over that time I was temporarily seconded to another project to produce an architecture document for a client. So I have a lot of catching up to do.

A present!

Although everyone had been instructed not to bring presents, my colleagues clubbed together to buy me a present anyway. They had colluded with my wife, who had sneakily found out which games were on my most desired list. So I received a copy of Libertalia.

This game has been a big hit in our weekly after-work games sessions. It's a lot of fun to play. So I was delighted!

Ultimate Werewolf

After one of the Avalon owners left the party, we ended up with around 15 people still wanting to play. Although the Resistance improves on Werewolf in just about every way, it only caters for up to 10 players. Werewolf can handle many more than that.

So we switched over to a few games of Ultimate Werewolf instead. Great fun as always!

The birthday buffoon!

As numbers dwindled, we switched back to Avalon again. We ended with an absolutely excellent game, where the minions of Mordred won all 3 quests and also knew who Merlin was.

I ended up being a perfect patsy for two of the minions of Mordred, who inveigled their way into the trusted group. It was so impressive to see how cleverly they pulled the wool over our eyes, that I felt honoured to have been part of the game - even if I was one of the ones who had been so comprehensively fooled by them.

Usually our monthly board games events last until late into a Saturday night, but this time everyone was gone before supper.

And thanks again!

All in all, it was a really fun way to spend a birthday. Thanks to everyone who attended, and made this an ideal birthday party for me!

Researching the Josephus problem

2014-02-08T10:05:00.000-05:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

This is the ninth post in a series about an interesting Mathematical problem known as the Josephus problem.

Imagine a number of people sitting in a circle. Every second person is asked to leave the circle. This continues until only one person is left. If you label the people 1 to n, what is the label of this last person?

In the first 6 posts in the series:

I derived a formula for the label of the last person left,
I provided various proofs of the formula, and
I wrote code in the F# programming language to calculate the answer in various ways.

In the seventh and eighth posts I created an efficient algorithm for calculating the answer when the number of people skipped is not two but some arbitrary interval k. But I wasn't able to find a closed formula. And I felt I had gone as far as I could on my own. It was time to see how other people had solved the problem.

This post is about what I discovered.

The problem has a name

The first thing I discovered is that the problem has a name. It is known as the Josephus problem and it is widely used in Mathematical and Programming challenges. So I will be updating the previous posts to reflect the accepted name.

You can read more about the history of the Josephus problem on Wikipedia or Wolfram Mathworld.

In search of an elegant solution

My own attempt

When there are n people in the circle labelled 1 to n, the formula for the last person left in the circle is: $$ f(n) = 2n + 1 - 2^{{\lfloor \log_2 n \rfloor} + 1} \tag{1} $$
In the third post in the series I used algebra to prove this result. But although algebra is a very powerful tool for proving the formula, it doesn't give any insight into why the formula is true.

One of the challenges I set myself was to provide that insight. The closest I came was in the sixth post in the series, where I used a calculation graph to show the pattern of values for $f(n)$.

Chamberlain's solution is much more elegant...

Chamberlain's solution

Chamberlain's Solution can be found on page 403 of Problem Solving and Recreational Mathematics 2012 by Paul Yiu of Florida Atlantic University.

His solution is based on a few key insights. I've paraphrased these as follows:

1. When the number of people in the circle is a power of two, the last person left is the first person skipped (person one). This is because:
   a. Everyone in an even-numbered position is removed on each pass around the circle
   b. Exactly half the people are eliminated, so the remaining size of the circle is again a power of two
   c. Person one will still be the first person skipped on the next traversal of the circle
   d. So the process repeats itself with the circle halving each time, until person one is the only person left

2. Suppose the size of the circle is $2^m + r$ (with $0 \le r < 2^m$). Carry out the first $r$ removals. Then:
   a. The first $r$ people removed are persons $2, 4, 6, ..., 2r$
   b. The next person to be skipped will be person $2r + 1$
   c. This leaves a new, smaller circle with $2^m$ people
   d. So we can apply the rule from point 1 above (for a circle of size $2^m$)
   e. So person $2r + 1$ will be the last person left in the circle
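
The result of this argument can be checked against a direct simulation of the circle. Below is a minimal Python sketch (not code from this series; the function names are my own, chosen for illustration) that removes every second person by brute force and compares the survivor with $2r + 1$:

```python
def last_person_by_simulation(n: int) -> int:
    """Remove every second person from a circle of 1..n; return the survivor."""
    people = list(range(1, n + 1))
    index = 0  # position of the next person to be skipped
    while len(people) > 1:
        index = (index + 1) % len(people)  # step past the skipped person
        people.pop(index)                  # remove the person after them
        index %= len(people)               # wrap around if we removed the last entry
    return people[0]


def last_person_by_formula(n: int) -> int:
    """Write n as 2^m + r with 0 <= r < 2^m; the survivor is 2r + 1."""
    m = n.bit_length() - 1  # exponent of the largest power of two not exceeding n
    r = n - (1 << m)
    return 2 * r + 1


for n in range(1, 200):
    assert last_person_by_simulation(n) == last_person_by_formula(n)
```

For example, a circle of 5 people is $2^2 + 1$, so $r = 1$ and both functions return 3.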

Making it rigorous

There are a few points above which may not be obvious. Let's try to make them more rigorous:

In point 2.a., the assumption is made that the labels of the first $r$ people removed would not "clock over" (i.e. one revolution of the circle hasn't been completed). Is this assumption reasonable?

$$ \begin{align*} \text{By definition: } 0 & \le r < 2^m \tag{2} \\ \text{and: } n & = 2^m + r \tag{3} \\ \\ \therefore 2r & < 2^m + r &\text{ by adding } r \text{ to both sides of the inequality on the right of (2)}\\ \text{i.e. } 2r & < n \tag{4} \\ \\ \text{Also: } 2r + 1 & \le n & \text{ since } 2r \text{ and } n \text{ are integers} \tag{5} \end{align*} $$
Equation 4 shows that this is a reasonable assumption.

In point 2.b. we also assumed that the next person to be skipped will be person $2r + 1$. Equation 5 proves that this label doesn't "clock over" either.

So this proves that $f(2^m + r) = 2r + 1 \text{ where } 0 \le r < 2^m$.

And this can be re-expressed in terms of logarithm and floor functions by noting that:
$$ \begin{align*} 0 & \le r < 2^m & \text{ from equation 2} \\ \therefore 2^m & \le 2^m + r < 2^m + 2^m \\ \therefore 2^m & \le n < 2^{m+1} & \text{ from equation 3} \\ \therefore m & \le \log_2{n} < m+1 & \text{ since logarithms are monotonic (order-preserving) } \\ \therefore m & = {\lfloor \log_2 n \rfloor} \tag{6} \\ \\ \text{So:} \\ f(n) & = 2r + 1 \\ & = 2( n - 2^m ) + 1 & \text{ from equation 3}\\ & = 2n + 1 - 2^{m+1} \\ & = 2n + 1 - 2^{{\lfloor \log_2 n \rfloor} + 1} & \text{ from equation 6} \end{align*} $$
And this is the formula given in equation 1 earlier.
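
As a quick worked check (the value $n = 41$ is chosen here purely for illustration), the two forms of the formula agree: $$ \begin{align*} n & = 41 = 2^5 + 9 & \text{ so } m = 5 \text{ and } r = 9 \\ f(41) & = 2r + 1 = 19 \\ 2n + 1 - 2^{{\lfloor \log_2 n \rfloor} + 1} & = 83 - 2^6 = 19 \end{align*} $$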

A beautiful visualization of Chamberlain's solution

You can find a beautiful visual explanation of this solution on the Exploring Binary web site.

Another very succinct solution

I was also very impressed by this solution. But although it was very succinct, I didn't feel that it provided the same "aha" moment as Chamberlain's solution.

Generalizing the Josephus problem to arbitrary intervals

I found two algorithms for generalizing the Josephus problem to arbitrary intervals.

Jakobczyk's algorithm

One of these was in a paper entitled On the Generalized Josephus Problem by F Jakobczyk. Although interesting in its own right, I'm rather going to concentrate on the other algorithm I found, as it was far more elegant...

The Ahrens-Schubert algorithm

I found the first page of an article by I. M. Davids published by the Applied Probability Trust. This page gives a very elegant algorithm for the Josephus problem generalized to arbitrary intervals. But since only the first page (with the abstract) is provided, I don't have a proof for the algorithm.

The Ahrens-Schubert solution uses a number series known as an Ahrens array. These are also referred to in the paper as "rounded up" arrays. They are similar to geometric sequences, except that the answer is rounded up after every step. So each term is the previous term multiplied by a factor $f$ and then rounded up to the nearest integer.

Suppose $a_0$ is the initial integer term. Then: $$ \begin{align*} a_1 & = \lceil{a_0 . f}\rceil & \\ a_2 & = \lceil{a_1 . f}\rceil & = \lceil{\lceil{a_0 . f} \rceil . f} \rceil \\ & ... \\ a_m & = \lceil{a_{m-1} . f}\rceil &= \overbrace{\lceil{ \text{ ... } \lceil{\lceil{a_0 . f} \rceil . f} \rceil} \text{ ... } .f \rceil}^\text{m times} \\ \end{align*} $$

The Ahrens-Schubert solution works by finding the largest number (say $a_{m}$) in this series which is less than $kn+1$, where $k$ is the interval to skip and $n$ is the number of people in the circle. The answer is then $ kn + 1 - a_m$.

But what are the values for $a_0$ and $f$ in the rounded up series?

Well, one neat thing about the Ahrens-Schubert solution, is that by choosing different values for $a_0$ it can determine not just the last person left in the circle, but any arbitrary $e^\text{th}$ person removed from the circle. Here are the parameters: $$ \begin{align*} f & = \frac{k}{k-1} \\ a_0 & = k(n - e) + 1 \\ \therefore a_0 & = 1 & \text{when determining the last person left in the circle (set } e = n \text{)}\\ \end{align*} $$
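
To make this concrete, here is a small worked example (the values $k = 3$ and $n = 10$ are chosen purely for illustration). To find the last person left, set $e = n$: $$ \begin{align*} f & = \frac{3}{2}, \quad a_0 = 1, \quad kn + 1 = 31 \\ a_1 & = \lceil 1.5 \rceil = 2, \quad a_2 = \lceil 3 \rceil = 3, \quad a_3 = \lceil 4.5 \rceil = 5, \quad a_4 = \lceil 7.5 \rceil = 8 \\ a_5 & = \lceil 12 \rceil = 12, \quad a_6 = \lceil 18 \rceil = 18, \quad a_7 = \lceil 27 \rceil = 27, \quad a_8 = \lceil 40.5 \rceil = 41 \end{align*} $$ The largest term below 31 is $a_7 = 27$, so the last person left is $31 - 27 = 4$. Removing every third person from a circle of 10 by hand gives the removal order 3, 6, 9, 2, 7, 1, 8, 5, 10, 4, which agrees.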

The Ahrens-Schubert solution for $k = 2$

Something else I like about the Ahrens-Schubert solution is how cleanly the closed formula for $k = 2$ emerges from the algorithm. When $k = 2$: $$ \begin{align*} f & = \frac{2}{2 - 1} = 2 \\ a_0 & = k(n - e) + 1 = 2(n-e) + 1 \\ \text{ or: } a_0 & = 1 & \text{for the last person left in the circle}\\ \therefore a_m & = \overbrace{\lceil{ \text{ ... } \lceil{\lceil{a_0 . f} \rceil . f} \rceil} \text{ ... } .f \rceil}^\text{m times} \\ & = \overbrace{\lceil{ \text{ ... } \lceil{\lceil{1 . 2} \rceil . 2} \rceil} \text{ ... } .2 \rceil}^\text{m times} \\ & = 2^m &\text{since the ceiling falls away as all terms are integers} \end{align*} $$ So the answer is $2n + 1 - 2^m$ where $2^m$ is the largest value of $a_i = 2^i$ less than $2n + 1$. And this reduces to equation 1!
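
The reduction to equation 1 can be spelled out. The index of the largest term is not the same quantity as the $m$ of equation 6, so call it $j$ here (a symbol introduced only for this check). Since $2n + 1$ is odd, a power of two is less than $2n + 1$ exactly when it is at most $2n$: $$ \begin{align*} 2^j & \le 2n < 2^{j+1} \\ \therefore 2^{j-1} & \le n < 2^j \\ \therefore j - 1 & = {\lfloor \log_2 n \rfloor} \\ \therefore 2n + 1 - 2^j & = 2n + 1 - 2^{{\lfloor \log_2 n \rfloor} + 1} \end{align*} $$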

My implementation of the Ahrens-Schubert algorithm

Below is my implementation of the algorithm in F#:

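// Walks the rounded-up geometric series (each term is ceil(previous * factor))
// and returns the largest term that is strictly below the boundary.
let rec getLargestTermInRoundedUpGeometricSeriesBelow
        (startValue:int) (factor:double) boundary =
    if startValue >= boundary then
        raise (
            System.ArgumentException(
                "The starting element in the series is not below the bound!"))
    else
        // Next term of the series: multiply by the factor, then round up
        let nextValue = int (System.Math.Ceiling( (double startValue) * factor ))
        if nextValue >= boundary then
            startValue
        else
            getLargestTermInRoundedUpGeometricSeriesBelow nextValue factor boundary

// Label of the person removed at position eliminationIndex (1 = first removed,
// sizeOfCircle = last person left) when every interval-th person is removed.
let calcEliminationNumberInCircleWithIntervalByRoundedUpSeries
        eliminationIndex interval sizeOfCircle =
    let dInterval = double interval
    let factor = dInterval / (dInterval - 1.0)   // f = k / (k - 1)
    let bound = interval * sizeOfCircle + 1      // kn + 1
    let initialValue = interval * (sizeOfCircle - eliminationIndex) + 1   // a_0 = k(n - e) + 1
    bound - getLargestTermInRoundedUpGeometricSeriesBelow initialValue factor bound

// The last person left is the one "removed" at position e = n
let calcLastLeftInCircleWithIntervalByRoundedUpSeries interval sizeOfCircle =
    calcEliminationNumberInCircleWithIntervalByRoundedUpSeries sizeOfCircle interval sizeOfCircle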
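
Since I don't have a proof of the algorithm, one way to gain confidence in it is to compare it with a brute-force simulation. The Python sketch below is an illustration only, with function names of my own choosing. It differs from the F# version in one deliberate respect: the ceiling is computed with integer arithmetic rather than floating point, which is my own substitution to avoid any rounding concerns.

```python
def elimination_label(e: int, k: int, n: int) -> int:
    """Label of the e-th person removed, via the rounded-up series (needs k >= 2)."""
    bound = k * n + 1
    a = k * (n - e) + 1  # a_0
    while True:
        # ceil(a * k / (k - 1)) using integers only
        nxt = (a * k + k - 2) // (k - 1)
        if nxt >= bound:
            return bound - a  # a is the largest term below kn + 1
        a = nxt


def elimination_order(k: int, n: int) -> list:
    """Brute force: repeatedly remove every k-th person from a circle of 1..n."""
    people = list(range(1, n + 1))
    order = []
    index = 0
    while people:
        index = (index + k - 1) % len(people)
        order.append(people.pop(index))
    return order


# Compare every elimination position for a range of small intervals and circles.
for k in range(2, 7):
    for n in range(1, 31):
        order = elimination_order(k, n)
        for e in range(1, n + 1):
            assert elimination_label(e, k, n) == order[e - 1]
```

For example, `elimination_order(3, 10)` gives `[3, 6, 9, 2, 7, 1, 8, 5, 10, 4]`, and `elimination_label(10, 3, 10)` gives 4, the last person left.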

Performance of the Ahrens-Schubert algorithm

Below are timings for the algorithm:

These are the same inputs as I used in my own algorithm from post 8 of the series, shown below:

Comparing the timings, you can see that the Ahrens-Schubert algorithm is noticeably faster. It took 0.9 seconds to my algorithm's 1.2 seconds when k = 2, and 2.4 seconds versus 2.6 seconds when k = 100.

Other references

Code to solve the Josephus problem

Implementations in a variety of languages can be found on the Rosetta Code web site.

Conclusion

In this series of articles, I have tackled and successfully solved the Josephus problem. This includes finding an efficient algorithm when the problem is generalized so that an arbitrary number of people is skipped.

I have discovered that this is a well-known problem with an extensive history and a number of explanations and algorithms, including two that were extremely elegant. This makes it an ideal problem to tackle yourself, as there are extensive online resources to compare your approach against.

Using F# to implement various algorithms has been particularly rewarding. This has piqued my interest in functional programming. As a result, I have started dabbling in Haskell. I also recently completed Martin Odersky's Functional Programming Principles in Scala course on Coursera and the follow-up course on the Principles of Reactive Programming. These are thoroughly enjoyable courses, which I recommend highly.

But I'm glad to be finally wrapping up this series on the Josephus problem. It's definitely time to move on to something new!

The 2013 Entelect AI Challenge Play-offs

2013-09-22T15:13:00.000-04:00

Introduction

Last Saturday evening (the 14th of September), I attended the play-offs of the 2013 Entelect Artificial Intelligence Programming Challenge.

Last year the theme was based on the Tron light-cycles game, but on a sphere instead of a grid. This year the theme was inspired by the 1980s tank warfare game, Battle City.

Here's a whirlwind summary of the game rules...

There are 8 fixed boards ranging in size from about 61x61 to 81x81. Each player has 2 tanks. A player wins by either shooting or driving over the enemy base. Bullets move at twice the speed of tanks, and tanks can shoot each other and shoot through walls. Tanks occupy a 5x5 area. Bases and bullets occupy a single cell. When a wall is shot, the two walls on either side are also destroyed (thus allowing a tank to shoot a path through the walls). If tanks try to move into a wall or another tank, the tank will turn but not move. A tank can only have one bullet in play at a time. So the tank is effectively disarmed until its bullet hits a wall, another tank, a base or the edge of the board. Both players move simultaneously and there are 3 seconds between turns. The players communicate with the game engine via SOAP web service calls.

Last year there were over 100 competitors, but this year the challenge was definitely a few notches up in complexity. In the end there were only 22 contestants. So I was expecting the play-offs to be much quieter than last year. But fortunately it wasn't like that at all.

As with last year's competition the camaraderie was superb. There was a lot of lively chatter. And it was great to put faces to some of the people I had been interacting with on the unofficial Google Groups forum for the competition.

My path to the play-offs

My first bot

Although I worked hard on my bot, I had also been working crazy hours at work and I was still quite tired. But I was given a couple of days off work to compensate for the long hours I had been working. So I don't think the hours at work affected my productivity too much (I ended up writing over 15 000 lines of code in the 7 weeks of the competition). But I think I made more careless mistakes than I usually would.

Between my own bugs and quite a few bugs and race conditions in the official test harness, I lost most of the final week of the competition to bug fixing. I found myself with a number of really nice algorithms, but with no bot to make use of them!

The result is that I very nearly didn't submit a bot at all. My initial entry used a very basic heuristic of having one tank take the shortest path to attack the enemy base and the other tank attack the closest enemy tank. It had very basic bullet avoidance code which I added at the death (so to speak), and which hadn't been adequately tested.

At that stage I had given up on doing well. My main reason for entering was to attend the play-offs, meet up with some of the other contestants and enjoy the superb spirit of the event. I had very low expectations for my bot and I named it "Vrot Bot".

Vrot Bot

"Vrot" is the Afrikaans word for rotten. Like many Afrikaans words it has become a standard part of English slang in South Africa.

It is pronounced like the English word "fraught", but with a short vowel sound and a rolled "r". But if you're an English-speaking South African, you will mis-pronounce it slightly so that "vrot" rhymes with "bot". So it will sound like "frot bot".

The entry date gets extended

Due to the small number of competitors and some technical issues with the test harness, the competition ended up being extended twice.

This gave me an extra week to put a much more sophisticated bot together. However it was only in the last 24 hours of the competition that my new bot ended up beating the original vrot bot convincingly. And a lot of the new code had not gone through nearly enough testing, so it was bound to be buggy.

On that basis I decided to keep the name Vrot Bot.

Testing, testing, testing...

Last year I came fourth in the competition. My secret sauce was building a WPF user interface that allowed me to play against my bot, save and load games, rewind games to arbitrary points in their history, visualize a variety of parameters at each cell of the board and drill down into the search tree of moves (I used the NegaMax algorithm to choose moves). I ended up with a very robust, well-tested bot.

Although I had planned to do something similar this year, the competition had a much shorter duration than last year and I never quite got there. At one point I had a choice between writing my own test harness and using the official test harness provided by the organisers.

I chose to use the official test harness, because I was concerned about the risk of integrating a C# WCF web service client with the SOAP web service exposed by the Java test harness. So I wanted to test that integration as thoroughly as possible, and my own test harness wouldn't have provided that capability.

I ended up using a variety of simpler methods for testing my bot's logic. However this made the test cycle much slower, and I paid the price in having a much buggier bot. There were many parts of my code which were very poorly tested because I simply didn't have a quick and easy way of forcing the game into the scenario I wanted to test.

If there was only one thing I could change about my approach to this year's competition, it would be to write my own test harness (as well as using the official test harness for testing integrations). That would have given me the ability to load a previously played game at any point in its history, instead of having to wait minutes for the harness to reach the turn where I knew the bug to be.

Performance tuning...

Last year my bot's main weakness was its slow performance. The first and third-placed players both used C++ for their bots, so having good performance gave a distinct competitive advantage. So this year I put a lot more effort into tweaking the performance of my bot.

There's a very true saying that premature optimization is the root of all evil. I was well aware of this. However I also felt that to get really good performance the data structures would need to be designed with performance in mind. And that's something that's not easy to change later if you get it wrong.

With hindsight I shouldn't have spent as much time worrying about performance. The search space for this year's challenge was massive. That meant that a brute force approach was far less valuable than last year. My time would have been better spent improving my testing capability.

The play-offs

The 22 contestants were grouped into 4 round robin pools with 5 to 6 players per pool.

Many of the matches were decided on silly mistakes, so I think many of the other contestants had also been struggling with the same issues: the shorter competition timeframe, a much more complex problem space and difficulties with adequately testing their bots.

My bot performed much better than I was expecting. I topped my pool quite easily, winning 4 of my 5 pool matches.

Strangely enough, my bot seemed to be a lot more decisive than I was expecting. But I didn't think too much of it at the time.

My one loss was on a smaller, fairly open board. Bullet-dodging was something I had tested quite thoroughly. Yet my tanks ran straight into the enemy tanks' bullets instead of dodging them. That was also quite surprising.

My bot's algorithm is based on running through a number of scenarios and, for each applicable scenario, calculating a value for each of the 6 possible tank actions for each tank. At the time I suspected that a number of scenarios had combined to create a cumulative value for moving forward that was greater than the value of either dodging or shooting approaching bullets.

It was only a few days later that I discovered the real reason for the lack of bullet-dodging behaviour... but I'll get to that later.

The elimination round

I was expecting the top 2 players in each pool to go through to the finals at the rAge computer gaming expo on 5 October. However there was a twist in proceedings...

The players were seeded based on how many games they had won in the pool stage. The top 16 seeds went into an elimination round. The elimination round worked as follows: seed 1 played seed 16, seed 2 played seed 15, and so on. The winners of those 8 matches would go through to the finals at rAge.

Although I was one of the top 5 seeds, I had a sinking feeling at this point as a single loss could knock me out of the tournament. Imagine my feelings when I saw the board for my knock-out game... It was the same board as my one loss in the pool rounds! And sadly the result was the same... my tanks raced towards the enemy tanks, ate a bullet each and I was knocked out of the competition!

I was disappointed and a little irritated that everything had hinged on a single game. At least last year the format was a double elimination, so a single bad game couldn't knock a good bot out of the competition.

On the other hand, my expectations had been low to begin with. So the end result wasn't different from what I had been expecting. And I had the added satisfaction of knowing that my bot had generally been much stronger than expected. So all I could do was be philosophical about the loss...

The debacle of the official test harness

Bots that froze

Something surprising and quite sad happened to some contestants. On start-up their bots froze and did nothing for the rest of the game.

Last year's winner, Jaco Cronje, had this problem on a number of boards. But he won easily when his bot played. Fortunately he won enough games to be one of the lower seeds of the top 16, and during the knockout round he got the board that his bot did not freeze on. He won that game easily and made it through to the finals.

Another competitor, Bernhard Häussermann, had a similar issue which prevented his tank from moving in any of his games. This is very sad when you consider how much effort people put into their entries for the competition.

But was it really his own error that led to the frozen tanks?

The mysterious case of the frozen tanks

In the week after the play-offs Jaco and Bernhard went on a mission to identify the reason for their tanks freezing.

During the final weekend before submission, Jaco had set his bot running through the night to check for weird issues. It had run fine. The competition organisers had said they would use the same test harness to run the competition as they had provided for the players to test against. So how was it possible for Jaco to get a different result during the play-offs?

The details of their digging are on this page of the google group for the competition. Here's the short version...

A number of players had noticed that the id's of the 4 tanks were always 0, 1, 2 and 3. One player would get tanks 0 and 3. The other player would get tank id's 1 and 2. Since tank movement was resolved in the order of tank id's, this ordering made it a little fairer to choose who got precedence when two tanks tried to move into the same space on the same turn.

But in the play-offs the test harness started assigning very large tank id's. This caused out of range exceptions for some of the bots when they tried to save the tank information into a much smaller array.

Perhaps the test harness was modified to run multiple games without requiring a restart, instead of the correct approach of building a "tournament runner application" which would re-start the unmodified test harness between games.

So why didn't my tanks dodge bullets?

On Wednesday I had a hunch about why my tanks didn't dodge bullets. And why they seemed to behave differently to what I'd been expecting.

So on Wednesday evening, after getting home from a Data Science meetup group, I put my hunch to the test.

I simulated a tank id outside the range of 0 to 3. As expected, this caused my bot to also throw an out-of-range exception in the part of my code which configures tank movement sequences.

The reason my bot didn't freeze is that I had put in a fail-safe mechanism to catch any errors made by my bot.

If an error occurred, I would see if my bot had made a move yet. If it hadn't, I would run a very simple algorithm so that it would at least make a move rather than just freezing. My closest tank would attack the enemy base, and my other tank would attack the closest enemy tank. There was no bullet dodging logic, as I didn't want to risk running any extra code that could have been the cause of the original error.
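In outline, a fail-safe of this kind might look something like the F# sketch below. It is an illustration only, not the bot's actual code: the function and parameter names are all hypothetical, and the real fallback logic (closest tank attacks the base, the other attacks the closest enemy tank) is assumed to live behind simpleFallback.

```fsharp
// Hypothetical sketch of a fail-safe wrapper; none of these names come from the actual bot.
let chooseMovesWithFailSafe
        (hasMoved: unit -> bool)
        (mainAlgorithm: unit -> unit)
        (simpleFallback: unit -> unit) =
    try
        mainAlgorithm ()
    with _ ->
        // Only fall back if the main algorithm failed before a move was made,
        // and keep the fallback as simple as possible so it can't fail too.
        if not (hasMoved ()) then
            simpleFallback ()
```

The important design choice is that the fallback shares as little code as possible with the main algorithm, so that whatever caused the original error is unlikely to be triggered again.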

So this explains why my bot had failed to dodge bullets. And also why it had acted so single-mindedly.

I had worked so hard on my scenario-driven bot in the final (extended) week of the competition. And it probably didn't even get used due to this bug. How sad. Since I still won my pool with this very basic bot, imagine what I could have done with my much more sophisticated bot!

[Bernhard was in the same pool as me. So without the tank ID bug I might not have won the pool. But at least my best bot would have been on display.]

To be or not to be bitter

So at this point I have two choices...

Jaco and Bernhard have decompiled the jar file for the official test harness, and there should be no way that the test harness can assign a tank id out of the range 0 to 3. So the organizers must have changed the test harness after the final submission date, in which case it nullifies its purpose as a test harness. I could choose to be bitter about this.

My second choice is to suck it up and learn from the mistakes that I made. And I can only do that if I don't take the easy way out by blaming someone else for what went wrong (tempting though that is).

I used a variety of defensive coding practices, including my fail-safe algorithm. But I failed to code defensively against the possibility of the tank id's being outside the range of 0 to 3. I have a vague memory of feeling very uncomfortable at making this assumption. But I made the poor judgement call of ignoring my intuition and going with a cheap solution rather than the right solution (or even just a cheap solution with more defensive coding). That mistake is my mistake, and it's something I can learn from.
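As an illustration of what "more defensive coding" could have looked like, here is a minimal F# sketch that assigns each tank a dense index rather than assuming that the ids are 0 to 3. The function name and the example ids are hypothetical, and this is not taken from the bot's source code.

```fsharp
// Hypothetical sketch: map whatever tank ids the harness supplies onto indices 0, 1, 2, ...
// Sorting first preserves the relative order of the ids, which matters because
// tank movement was resolved in order of tank id.
let buildTankIndexLookup (tankIds: int list) =
    tankIds
    |> List.distinct
    |> List.sort
    |> List.mapi (fun index tankId -> (tankId, index))
    |> Map.ofList

// Example with made-up large ids:
let lookup = buildTankIndexLookup [1007; 1004; 1006; 1005]
let indexOfTank1006 = Map.find 1006 lookup      // 2
let unknownTank = Map.tryFind 42 lookup         // None, rather than an exception
```

Any array of tank information can then be indexed by the looked-up index instead of the raw id.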

Keeping the goal in mind

The big picture with entering competitions like this is that you are not doing it for the prize money or the fame (though those are useful motivators to help give you a focus for your efforts). Doing so would be a form of moonlighting and is arguably unethical.

To some extent you are doing it for the fun and the camaraderie - the opportunity to interact with smart developers who enjoy challenging themselves with very tricky problems. But I get similar enjoyment from playing German-style board games with my colleagues and friends - and with far less investment of time.

However the most important reason is for the learning experience. You are exposing yourself to a challenging situation which will stretch you to your limits, and in which you are bound to make some mistakes (as well as learn new coding techniques). The mistakes you make and the lessons you learn are part of your growth as a programmer.

With this perspective I would rather learn from my mistakes than look for someone to blame.

The alternative is to become bitter at the injustice of the situation. But life is full of injustice, and in many cases (such as this one) it is not caused by malevolence, but simply by human fallibility. As software developers we know all too well how seemingly innocuous changes can have unexpectedly large side-effects.

I'm not advocating that we should lower the standards we set for ourselves and others. But I think we should balance that against having the maturity to forgive ourselves and others when we make unfortunate mistakes despite our best intentions.

The most valuable lesson I learnt

The most valuable lesson learnt was not about coding more defensively. Or about making testability a primary consideration. Or to avoid premature optimization. Or even to listen to my gut feeling when I'm feeling uncomfortable about code that I'm writing (valuable though that is).

Those are all things I already knew. As the saying goes, experience is that thing that allows you to recognise a mistake when you make it again!

The most valuable lesson was a new insight: beware of the second system effect!

Last year I placed fourth in the challenge with a fairly standard NegaMax search tree approach. I was conservative in my approach, did the basics well, and got a great result! But I had some innovative ideas which I never got as far as implementing...

This year I wanted to do better than last year. And I wanted to push the envelope more. And as a result I fell foul of the second system effect, and did worse than last year.

And this is not the only recent instance of this happening...

In 2011 I did really well in the UDT Super 15 Rugby Fantasy League with a simple statistical model (built in Excel) and a linear programming model for selecting my Fantasy League team each round. And I placed in the 99.4th percentile (183rd out of around 30 000 entrants)!

In 2012 I wanted to do even better, so I used the R programming language to build a much more complex statistical model. I put a lot more effort into it. Yet I only placed in the 90th percentile. The second system effect again!

In both cases I learnt far more from my second system than from the more conservative (but more successful) first system. So it's not all bad, since learning and self-development is the primary goal. But in future I'd like to find a better balance between getting a good result and "expressing my creativity".

Wrap-up

As one of the eight finalists in the 2012 competition, I ended up being interviewed after the play-offs last year. That interview can be found on youtube.

Although I didn't make it to the final eight this year, I still found myself being roped into an interview. I'll update this page with the link once that interview has been uploaded to youtube.

For anyone who's interested, I've also uploaded the source code for my bot entry on GitHub.

There's more detail on the approach that I and the other contestants took at this page on the google groups forum.

What's next?

At some point I'd like to post a blog entry on my shortest path algorithm, as I came up with a neat trick to side-step the various complexities of doing a Dijkstra algorithm with binary heaps, pairing heaps, r-divisions on planar graphs, and so forth.

But first I'd like to wrap up my series of posts on an interesting problem in recreational Mathematics known as the Josephus problem.

The past two months...

2013-09-13T18:18:00.000-04:00

A break from blogging

Over the past 2 months I've taken a break from blogging.

During that time I've spent a week in the Kruger park, worked crazy hours on a software architecture investigation for a client, and worked equally crazy hours in my spare time on the 2013 Entelect Artificial Intelligence competition.

This blog post is a recap of those 3 activities.

The Kruger Park

I took my family camping at Punda Maria campsite in the Northern section of the Kruger Park from 14 to 19 July.

Back in 2008 we had one of our best Kruger trips ever in that part of the park, seeing no less than 4 leopard, including an excellent sighting where the leopard was metres from our car. So we had extolled the virtues of Northern Kruger to our friends John and Sarah, who were accompanying us with their 3 children.

A leopard sighting from our trip to Northern Kruger in October 2008

This time was very quiet, however, as the floods earlier in the year had allowed the animals to disperse much further than they normally would in the dry month of July. We narrowly missed seeing a leopard near the campsite and we only heard the lions at night.

In my opinion, Northern Kruger is the most beautiful part of the Kruger park, particularly the area near the Pafuri picnic site and along the Nyala drive. So although we were disappointed at not seeing any of the big cats, we still enjoyed our visit immensely.

Northern Kruger has a reputation for being a birders' paradise, and we certainly saw our fair share of beautiful birds. I even managed to capture a Lilac-Breasted Roller in flight, something I had tried many times before without success.

A lilac-breasted roller in flight

A korhaan sighting on the way back from Pafuri

A hornbill that scavenges left-overs at the Babalala picnic area

An architectural investigation

On returning from Kruger, I jumped straight into a 4 week architectural investigation for a client. The client was concerned about their dependence on a 3rd party integration hub whose host was experiencing financial uncertainty. My role was to assist the client in understanding their IT ecosystem, assess their and their business partners' dependencies on the 3rd party vendor, and provide a roadmap (with estimated costs) for mitigating the risk.

This is very different from the work that I do on a day to day basis. Most of the time I am one of the software designers/architects on a team of 25 to 30 developers, testers, business analysts, architects and project managers. I spend my day doing UML diagrams for new features, assisting developers when requested, estimating task sizes, performing code reviews, troubleshooting performance issues, and so forth. In other words, I am very much a cog in a bigger machine.

With the architectural investigation I was on my own. I had complete autonomy to carry out the investigation as I saw fit. I love doing investigative work. I love work that has strategic impact. I love feeling my way towards a solution. So I revelled in this independence.

Although exciting, it was very intense work too. On Friday 16 August I gave a presentation of my findings to the technical stakeholders. I worked right through the night on both the Tuesday and Thursday night beforehand, clocking over 25 hours of work on Tuesday and Wednesday, and over 27 hours on Thursday and Friday.

I've done all-nighters before in my career. But never two in one week. It's definitely not something I would recommend...

I was very happy with the outcome of the work though, and the presentation seemed to go very well, despite getting off to a slow start due to my lack of sleep and the adrenalin of putting the finishing touches to the presentation only minutes before the meeting started!

The final presentation to the executive committee took place the following Tuesday. The previous presentation had included a lot more technical detail, and the focus had been on presenting a wide variety of options. The final presentation was much more condensed, and focused on 2 primary recommendations. The presentation went extremely well, and the feedback was very positive - both from the customer's CFO and from a number of my superiors at Dariel.

A few days later I was asked whether I would like to do similar investigations in future. I replied that this was similar to asking someone who has just completed the Comrades Ultra-marathon whether they would like to do it again next year!

The 2013 Entelect AI Challenge

Last year I participated in the inaugural Entelect R 100,000 Artificial Intelligence Challenge. I was delighted to place 4th out of 101 contestants.

I was really hoping to better that this year. However the 2013 challenge took place over the same period as the trip to Kruger and the architectural investigation. Additionally, last year's contestants had from mid-July until 24 September to submit their entries. This year the closing date was 2nd September. So there was simply less time to recover from the long hours on the architectural investigation.

This year's competition was to program two tanks to take on two other tanks on one of 8 boards, up to 81x81 squares in size, in a simultaneous movement tank battle based loosely on the 1980s arcade game Battle City, and using SOAP web services to communicate with the server. This was a massive jump in complexity from last year's competition, which was a Tron-like turn-based game with one unit per player, played on a 30x30 sphere with communication via reading and writing a text file.

If I were a wiser man, I would have thrown in the towel before even starting. Fortunately I'm not!

I started my career as an Operations Research consultant at the CSIR and I still retain a strong passion for decision optimization and decision automation. My outlet for that passion is entering competitions like the Entelect AI Challenge.

It's also a great way to keep my coding skills current, as my role as a software designer means that I don't get to write as much production code as I would like.

So, against my better judgement, and despite all obstacles, I have participated in the Entelect programming challenge again this year! It's been very tiring and the stress levels have been immense. I have barely lurched over the finish line, assisted in no small measure by the final entry date being extended by a week from 2nd September to Monday 9th September.

But at least I have managed to complete an entry. And one that I'm reasonably proud of, even though I don't think it will be good enough to get me to the final 8 like last year.

The camaraderie last year was amazing, and the way the contestants have assisted and encouraged each other in this year's competition has been no less amazing. You would never guess we were competing with each other for a prize of R 100,000 (roughly $10,000). That's a lot of money and the prize for second place is tiny by comparison. But you'd never guess it by the way contestants have generously shared advice with each other, and encouraged each other to keep going when it gets tough. And believe me, this has been a very tough challenge this year (as shown by the far smaller number of people who got as far as submitting an entry).

The play-offs for the competition take place tomorrow evening at the Fire and Ice Protea Hotel in Melrose Arch, Johannesburg. I will be there to share in that spirit of camaraderie again.

Although my expectations for my bot are low, my hopes are high. I'm hoping that my bot gives it horns...

... and that some remaining bug doesn't sabotage my efforts and leave me meekly hiding in the shadows!

What comes next?

I'm still busy with a couple of blog postings for the problem of calculating the last person left in a circle if every second person is asked to leave. I hope to post those shortly and close off the series.

I worked out a few very nice algorithms for the Battle City AI challenge. So I'd like to write a series of articles on those as well. I'd also like to analyse some of the mistakes I made in the competition, and what I would do differently if I could start over.

And once my bot is knocked out of the competition I'm planning on posting the code for my entry to GitHub.

I suspect this will be of interest to the other contestants in the competition (we have already been doing post-mortems of our strategies on the unofficial google groups forum for the competition). But it may also have a few nice ideas that will be useful to people competing in other AI competitions in future.

So please keep a lookout for those!

More (an efficient circle elimination algorithm for arbitrary intervals)

2013-06-09T16:14:00.001-04:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In my previous post I generated a recursive equation for calculating the last person left in a circle if every $k^{\text{th}}$ person is removed from the circle.

I also wrote an F# program which highlighted a weakness of this algorithm. For circles with more than 60 000 people, the F# algorithm would experience a stack overflow.

In this post I am going to develop a more efficient algorithm to address this flaw.

Warning: Lots of algebra ahead!

The initial formatting of the Mathematical equations can be slow. If so, please be patient.

The recursive equation for arbitrary intervals

To recap, here is the main equation from the previous blog post: $$ f_k(n) = [f_k(n-1) + k - 1 ] \bmod n + 1 \tag{4} $$
[This is very much a continuation of the previous post, so I have gone with the same equation numbering].
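For reference, equation 4 can also be evaluated with a simple loop rather than by recursion. The F# sketch below is only an illustration (the name calcLastLeftByRecurrence is hypothetical and it is not the function from the previous post). It still takes $n - 1$ steps, but it doesn't consume stack space, which makes it a handy baseline for checking the faster algorithm developed in this post.

```fsharp
// Hypothetical helper: evaluates equation 4 iteratively, starting from f_k(1) = 1.
let calcLastLeftByRecurrence interval sizeOfCircle =
    if sizeOfCircle < 1 then
        invalidArg "sizeOfCircle" "The size of the circle must be positive!"
    let mutable label = 1
    for n in 2 .. sizeOfCircle do
        // f_k(n) = [f_k(n-1) + k - 1] mod n + 1
        label <- (label + interval - 1) % n + 1
    label

// The first ten values for an interval of 3:
let firstTenLabels = [ for n in 1 .. 10 -> calcLastLeftByRecurrence 3 n ]
// [1; 2; 2; 1; 4; 1; 4; 7; 1; 4]
```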

Arranging the function outputs in rows

Recall the calculation graph from post 6 in the series:

Notice how $f(n)$ "clocks over" for the first value in each row.

Let's arrange the values of $f_k(n)$ into rows in a similar way. To do this I am going to consider the values of $n$ at which $f_k(n)$ clocks over. Every time this happens, a new row will be started.

Let $r$ be the index of a row in the graph.
Let $n_r$ be the first value of $n$ in row $r$.

Now consider the previous value of $n$: $n_r - 1$. Since the modulo operator clocks over for $n_r$, we can take the argument to this operator in equation 4, and we must have:
$$ \begin{align*} f_k(n_r - 1) + k - 1 &\ge n_r \\ \therefore f_k(n_r - 1) &\ge n_r - (k - 1)\\ \end{align*} $$
However, we also know - by the definition of $f_k(n)$ - that:
$$ f_k(n_r - 1) \le n_r - 1 $$
So it must be that:
$$ \begin{align*} f_k(n_r - 1) &= n_r - j & \text{for some } j \in \mathbb{N} \text{ such that } 1 \le j \le k-1 \tag{5} \end{align*} $$
From our definition of $n_r$ we know that $f_k(n)$ doesn't clock over for any $n \in \left\{n_{r-1}+1, n_{r-1} + 2, ..., n_r - 1\right\}$.

And because it doesn't clock over, we can remove the mod operator in equation 4, which simplifies to: $$ \begin{align*} f_k(n) &= [f_k(n-1) + k - 1 ] \bmod n + 1 & \text{from equation 4}\\ &= [f_k(n-1) + k - 1 ] + 1 \\ &= f_k(n-1) + k & \forall n \in \left\{n_{r-1}+1, n_{r-1} + 2, ..., n_r - 1\right\} \\ \\ \therefore f_k(n_{r-1} + i) &= f_k(n_{r-1}) + ik &\forall i \in \left\{0, 1, \ldots, n_r - n_{r-1}-1 \right\} \tag{6} \\ \\ \text{In particular:} \\ f_k(n_r - 1) &= f_k(n_{r-1}) + k.(n_r - n_{r-1} - 1) \tag{7} \\ \end{align*} $$
Equations 5 and 7 give us two different formulae for $f_k(n_r-1)$, so we can combine them: $$ \begin{align*} n_r - j &= f_k(n_{r-1}) + k.(n_r - n_{r-1} - 1)\\ \Leftrightarrow (k-1).n_r &= k.n_{r-1} + k - j - f_k(n_{r-1}) \tag{8} \end{align*} $$
First we are going to use equation 8 to determine the value of j.

We can re-arrange equation 8 as follows: $$ \begin{align*} j - 1 & = k.n_{r - 1} + k - 1 - f_k(n_{r-1}) - (k-1).n_r \\ & = [(k-1)+1].n_{r - 1} + (k-1) - f_k(n_{r-1}) - (k-1).n_r \\ & = n_{r-1} - f_k(n_{r-1}) + (k-1).[n_{r - 1} + 1 - n_r] \tag{9} \end{align*} $$
Why $j-1$ on the left side of the equation?

Recall from equation 5 that $1 \leq j \leq k-1 $. So $0 \leq j - 1 \leq k-2 < k-1$.

And hence $j - 1 = (j - 1) \bmod (k - 1)$. This allows us to simplify the right hand side of equation 9 by taking the "$\bmod (k-1)$" of both sides.

So: $$ \begin{align*} j - 1 & = [j - 1] \bmod (k - 1) \\ & = [n_{r-1} - f_k(n_{r-1}) + (k-1).[n_{r - 1} + 1 - n_r]] \bmod (k - 1) & \text{from equation 9} \\ & = [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) & \text{since multiples of }k - 1\text{ can be removed} \\ \\ \Rightarrow j & = [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) + 1 \tag{10} \end{align*} $$
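As a quick sanity check of equation 10, take $k = 3$. Applying equation 4 directly gives $f_3(4) = 1$ and $f_3(5) = 4$, with clock-overs at $n = 4$ and $n = 6$. So with $n_{r-1} = 4$ and $n_r = 6$:
$$ \begin{align*} j &= [4 - f_3(4)] \bmod 2 + 1 \\ &= 3 \bmod 2 + 1 \\ &= 2 \end{align*} $$
And indeed $f_3(n_r - 1) = f_3(5) = 4 = n_r - j$, as required by equation 5.
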

Now that we have a formula for j, we can determine both $n_r$ (using equation 8) and $f_k(n_r)$ (using equations 4 and 5).

Let's determine $n_r$ first. Substituting equation 10 into equation 8 gives:

$$ \begin{align*} (k-1).n_r &= k.n_{r-1} + k - j - f_k(n_{r-1}) &\text{from equation 8} \\ &= k.n_{r-1} + k - ([n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) + 1) - f_k(n_{r-1}) &\text{from equation 10} \\ &= [(k-1) + 1].n_{r-1} + (k-1) - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) - f_k(n_{r-1}) \\ &= (k-1).n_{r-1} + n_{r-1} + (k-1) - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) - f_k(n_{r-1}) \\ &= (k-1)[n_{r-1} + 1] + n_{r-1} - f_k(n_{r-1}) - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) \\ \Rightarrow n_r &= n_{r-1} + 1 + \frac{n_{r-1} - f_k(n_{r-1}) - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) }{k-1} \\ \end{align*} $$ This rather complicated quotient can be simplified quite significantly by observing that: $$ \frac{m - m \bmod d}{d} = \lfloor \frac{m}{d} \rfloor $$
This follows by expressing $m$ as: $$ \begin{align*} m &= qd+r & \text{where }0 \leq r < d &\text{i.e. }r = m \bmod d \\ \Rightarrow q & = \frac{m - r}{d} \\ & = \frac{m - m \bmod d}{d} \\ \text{But: } q = \lfloor \frac{m}{d} \rfloor &\text{and the result follows} \end{align*} $$

So substituting $n_{r-1} - f_k(n_{r-1})$ for $m$ and $k-1$ for $d$ we obtain the following simplification:

$$ n_r = n_{r-1} + 1 + \lfloor \frac{n_{r-1} - f_k(n_{r-1}) }{ k - 1 } \rfloor \tag{11} $$

Now let's determine the value of $f_k(n_r)$:

$$ \begin{align*} f_k(n_r) &= [f_k(n_r - 1) + k - 1 ] \bmod n_r + 1 & \text{from equation 4} \\ &= [(n_r - j) + k - 1 ] \bmod n_r + 1 & \text{from equation 5} \\ &= [k - 1 - j] \bmod n_r + 1 \\ &= [k - 1 - ([n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) + 1)] \bmod n_r + 1 & \text{from equation 10} \\ &= [k - 2 - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1)] \bmod n_r + 1 \end{align*} $$
This is already sufficient. However we can simplify things further. The expression inside the outer brackets always lies between $0$ and $k-2$, so when $n_r \ge k-1$, the $ \text{mod }n_r $ can be dropped. This gives the following set of formulae:

$$ \begin{align*} f_k(n_{1}) & = f_k(1) = 1 \tag{12} \\ f_k(n_r) & = k - 1 - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) & \text{ where } n_r \geq k - 1 \tag{13} \\ f_k(n_r) & = [k - 2 - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1)] \bmod n_r + 1 & \forall n_r \text{ where } r > 1 \tag{14} \\ \end{align*} $$

So equations 11 through to 14 provide formulae for:

$n_r$ in terms of $n_{r-1}$ and $f_k(n_{r-1})$
$f_k(n_r)$ in terms of $n_r$, $n_{r-1}$ and $f_k(n_{r-1})$
$n_1$ and $f_k(n_1)$
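
To see these formulae in action, take $k = 3$ and a row starting at $n_{r-1} = 9$ with $f_3(9) = 1$ (a value that can be checked by applying equation 4 repeatedly from $f_3(1) = 1$). Then:
$$ \begin{align*} n_r &= 9 + 1 + \lfloor \frac{9 - 1}{2} \rfloor = 14 & \text{from equation 11} \\ f_3(n_r) &= 3 - 1 - [9 - 1] \bmod 2 = 2 & \text{from equation 13, since } 14 \geq 2 \end{align*} $$
Applying equation 4 one step at a time gives $4, 7, 10, 13$ for $f_3(10)$ through to $f_3(13)$, and then $f_3(14) = (13 + 2) \bmod 14 + 1 = 2$, which agrees.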

With these pieces in place we are ready to create a more efficient algorithm for calculating $f_k(n)$ for arbitrary n.

The row-based algorithm

Given a value $n$ for which we wish to calculate $f_k(n)$, we can't work backwards down to $f_k(1)$ as we have done in the past. The problem is that we don't know which row $r$ it falls within, or what $n_r$ and $f_k(n_r)$ are.

Instead we are going to have to work our way forwards to find the row that $n$ belongs in. We will start at $n_{1} = 1, f_k(1) = 1$. From this we will derive $n_{2}$ and $f_k(n_{2})$. We will continue generating this sequence of values for $n_r$ and $n_{r+1}$ until we find a value of $r$ for which $n_r \le n < n_{r+1}$. When we do, we can use equation 6 to determine $f_k(n)$: $$ f_k(n) = f_k(n_r) + k.(n-n_r) \tag{15} $$
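For example, with $k = 3$ there are two consecutive rows starting at $n_r = 9$ (where $f_3(9) = 1$) and $n_{r+1} = 14$; these values were worked out by hand from equation 4. For $n = 12$ we have $9 \le 12 < 14$, so equation 15 gives: $$ f_3(12) = f_3(9) + 3.(12 - 9) = 1 + 9 = 10 $$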
While working our way through the rows we will keep track of the values for the next row: $n_{r+1}$ and $f_k(n_{r+1})$. This will allow us to easily calculate $n_{r+2}$ and $f_k(n_{r + 2})$.

Below is an F# function to calculate these values for the next row:

let getNextRowStartSizeAndLabel thisRowStartSize thisRowStartLabel interval =
    // n_r - f_k(n_r) is never negative, so integer division
    // gives the floor that appears in equation 11
    let difference = thisRowStartSize - thisRowStartLabel
    let nextRowStartSize = thisRowStartSize + 1 + difference / (interval - 1)
    let nextRowStartLabel = 
        if nextRowStartSize < interval - 1 then
            // Equation 14: the general case
            (interval - 2 - difference % (interval - 1)) % nextRowStartSize + 1
        else
            // Equation 13: the "mod n_r" can be dropped
            interval - 1 - difference % (interval - 1)
    (nextRowStartSize, nextRowStartLabel)

The variables map to the formula as follows:

thisRowStartSize $\mapsto n_r$
thisRowStartLabel $\mapsto f_k(n_r)$
interval $\mapsto k$
nextRowStartSize $\mapsto n_{r+1}$
nextRowStartLabel $\mapsto f_k(n_{r+1})$

The following F# functions use the function above to calculate $f_k(n)$:

let rec getLabelOfLastLeftInCircleWithIntervalByRow 
        thisRowStartSize thisRowStartLabel 
        nextRowStartSize nextRowStartLabel 
        interval sizeOfCircle =
    if (sizeOfCircle >= thisRowStartSize) && (sizeOfCircle < nextRowStartSize) then
        thisRowStartLabel + interval * (sizeOfCircle - thisRowStartSize)
    else
        let (newNextRowStartSize, newNextRowStartLabel) 
            = getNextRowStartSizeAndLabel nextRowStartSize nextRowStartLabel interval
        getLabelOfLastLeftInCircleWithIntervalByRow 
            nextRowStartSize nextRowStartLabel 
            newNextRowStartSize newNextRowStartLabel 
            interval sizeOfCircle

let calcLastLeftInCircleWithIntervalByRow interval sizeOfCircle =
    if sizeOfCircle < 1 then
        raise (System.ArgumentException("The size of the circle must be positive!"))
    if interval < 2 then
        raise (System.ArgumentException("The interval must be 2 or more!"))
    let (nextRowStartSize, nextRowStartLabel) = getNextRowStartSizeAndLabel 1 1 interval
    getLabelOfLastLeftInCircleWithIntervalByRow 
        1 1 nextRowStartSize nextRowStartLabel interval sizeOfCircle

The efficiency of the row-based algorithm

Look back at equation 11:
$$ n_r = n_{r-1} + 1 + \lfloor \frac{n_{r-1} - f_k(n_{r-1})}{k-1} \rfloor $$

Consider what happens as $n_{r-1}$ becomes large compared to $k$. $f_k(n_{r-1}) \le k$ (since a "clock-over" occurs at $n_{r-1}$), so this term becomes negligible relative to $n_{r-1}$. So as $n_{r-1}$ becomes large:
$$ n_r \approx (1 + \frac{1}{k-1}).n_{r-1} $$

So the $\left\{n_r\right\}$ series is approximately a geometric progression. Since its values are going to grow exponentially, the algorithm will rapidly find the row containing n, even for very large values of n. The size of the stack is going to scale as O(log n), so we are far less likely to reach a value of n which will cause stack overflows.
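
As a rough back-of-the-envelope estimate (it ignores the floor and the $f_k$ term, and the slower growth of the first few rows, so treat the figure as indicative only), the number of rows needed to reach $n$ is about:
$$ r \approx \frac{\ln n}{\ln \frac{k}{k-1}} $$
For $k = 3$ and $n = 10^9$ this is roughly $\frac{20.7}{0.405} \approx 51$ rows, compared with around a billion steps when applying equation 4 directly. For large $k$, $\ln \frac{k}{k-1} \approx \frac{1}{k-1}$, which suggests that the number of rows grows roughly in proportion to $k$.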

If we run the latest F# algorithm it is indeed blindingly fast:

It was able to handle some very large 32 bit numbers almost instantaneously.

I would expect it to fail when the next higher $n_r$ value is larger than the maximum 32 bit integer. It might be possible to work around this by inserting a "try catch" block to detect and recover from this situation, since we can still calculate the answer using the previous value of $n_r$.
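
An alternative to catching the overflow is to side-step it by using 64 bit integers. The sketch below is a variation on the row-based algorithm rather than the code used for the timings in this post: the name calcLastLeftByRow64 is hypothetical, it uses a loop instead of recursion, and it assumes the inputs themselves fit comfortably within the 32 bit range so that the 64 bit arithmetic can't overflow.

```fsharp
// Hypothetical loop-based variant of the row-based algorithm, using 64 bit arithmetic.
let calcLastLeftByRow64 (interval: int64) (sizeOfCircle: int64) =
    if sizeOfCircle < 1L then
        invalidArg "sizeOfCircle" "The size of the circle must be positive!"
    if interval < 2L then
        invalidArg "interval" "The interval must be 2 or more!"
    let mutable rowStartSize = 1L     // n_r
    let mutable rowStartLabel = 1L    // f_k(n_r)
    let mutable finished = false
    while not finished do
        let difference = rowStartSize - rowStartLabel
        // Equation 11: the start of the next row
        let nextRowStartSize = rowStartSize + 1L + difference / (interval - 1L)
        if nextRowStartSize > sizeOfCircle then
            // sizeOfCircle falls within the current row
            finished <- true
        else
            // Equation 14: the label at the start of the next row
            rowStartLabel <-
                (interval - 2L - difference % (interval - 1L)) % nextRowStartSize + 1L
            rowStartSize <- nextRowStartSize
    // Equation 15
    rowStartLabel + interval * (sizeOfCircle - rowStartSize)
```

For example, calcLastLeftByRow64 3L 41L evaluates to 31L, and calcLastLeftByRow64 2L 41L evaluates to 19L.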

I thought that another failure condition could be with very large values of $k$, since the exponential growth rate would be very small when $k$ is large. So I tested the algorithm with $k$ and $n$ both equal to a billion (one thousand million). The calculation took slightly under 32 seconds on my i5 laptop.

Displaying the rows in the row-based algorithm

Ideally we would like a closed form solution for arbitrary k. I tried looking for patterns in the rows for various values of k, but I couldn't see a consistent pattern.

For anyone who would like to try, or just if you're curious to see the row structure in action, here is an F# script to allow you to see all $(n, f_k(n))$ pairs arranged in rows. It depends on the function getNextRowStartSizeAndLabel defined earlier, so make sure that function has been defined in your F# interactive session:

let rec showLabelsOfLastLeftInCircleWithIntervalByRow 
        showFullRow thisRowStartSize thisRowStartLabel 
        nextRowStartSize nextRowStartLabel interval 
        sizeOfCircle maxSizeOfCircle =
    if (sizeOfCircle <= maxSizeOfCircle) then
        let result = thisRowStartLabel + interval * (sizeOfCircle - thisRowStartSize)
        if sizeOfCircle = thisRowStartSize then
            System.Console.WriteLine()
            System.Console.Write( "f({0})={1}; ", sizeOfCircle, result)
        elif showFullRow then
            System.Console.Write( "f({0})={1}; ", sizeOfCircle, result)
        
        if sizeOfCircle < nextRowStartSize - 1 then
            showLabelsOfLastLeftInCircleWithIntervalByRow 
                showFullRow thisRowStartSize thisRowStartLabel 
                nextRowStartSize nextRowStartLabel
                interval (sizeOfCircle+1) maxSizeOfCircle 
        else
            let (newNextRowStartSize, newNextRowStartLabel) =
                getNextRowStartSizeAndLabel nextRowStartSize nextRowStartLabel interval
            showLabelsOfLastLeftInCircleWithIntervalByRow 
                showFullRow nextRowStartSize nextRowStartLabel 
                newNextRowStartSize newNextRowStartLabel 
                interval (sizeOfCircle+1) maxSizeOfCircle

let showLastLeftInCircleWithIntervalByRow showFullRow interval maxSizeOfCircle =
    if maxSizeOfCircle < 1 then
        raise 
            ( System.ArgumentException(
                "The maximum size of the circle must be 1 or more!")
            )
    if interval < 2 then
        raise 
            ( System.ArgumentException(
                "The interval must be 2 or more!")
            )
    let (nextRowStartSize, nextRowStartLabel) = 
        getNextRowStartSizeAndLabel 1 1 interval
    showLabelsOfLastLeftInCircleWithIntervalByRow 
        showFullRow 1 1 
        nextRowStartSize nextRowStartLabel interval 
        1 maxSizeOfCircle
    System.Console.WriteLine()

When the showFullRow parameter to the function is false, then only the first pair in each row will be shown. I find this is a lot more useful than seeing the intermediate values as well, which soon leads to each console line overflowing.

Validating the various algorithms against one another

The following F# script can be used to check that the 3 algorithms produce the same answers:

type circleCalculationWithInterval = {
    SizeOfCircle: int;
    Interval: int;
    LastLeftInCircleByBruteForce: int;
    LastLeftInCircleByRecursiveFormula: int;
    LastLeftInCircleByRow: int
}

let getCircleCalculationWithInterval includeBruteForce interval sizeOfCircle =
    // Brute force is slow to calculate. Provide the option to omit it...
    let lastLeftInCircleByBruteForce = 
        match includeBruteForce with 
        | true -> calcLastLeftInCircleWithIntervalByBruteForce interval sizeOfCircle
        | false -> 0
    let lastLeftInCircleByRecursiveFormula = 
        calcLastLeftInCircleWithIntervalByRecursiveFormula interval sizeOfCircle
    let lastLeftInCircleByRow = 
        calcLastLeftInCircleWithIntervalByRow interval sizeOfCircle
    let circleCalc = {
        SizeOfCircle = sizeOfCircle;
        Interval = interval;
        LastLeftInCircleByBruteForce = lastLeftInCircleByBruteForce; 
        LastLeftInCircleByRecursiveFormula = lastLeftInCircleByRecursiveFormula;
        LastLeftInCircleByRow = lastLeftInCircleByRow
    }
    circleCalc

You can then check the results using snippets similar to the following:

// With brute force:
[1..1000] |> List.map (getCircleCalculationWithInterval true 2) |> List.filter (
    fun cc -> cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByBruteForce
           || cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByRecursiveFormula
    );;

[1..100] |> List.map (getCircleCalculationWithInterval true 100) |> List.filter (
    fun cc -> cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByBruteForce
           || cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByRecursiveFormula
    );;

// Without brute force:
[1..10000] |> List.map (getCircleCalculationWithInterval false 2) |> List.filter (
    fun cc -> cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByRecursiveFormula
    );;

[1..10000] |> List.map (getCircleCalculationWithInterval false 100) |> List.filter (
    fun cc -> cc.LastLeftInCircleByRow <> cc.LastLeftInCircleByRecursiveFormula
    );;

A new proof for the formula when removing every second person

We can use the general formulae to provide a new derivation of the closed form solution when every second person is removed from the circle.

The proof ends up being remarkably simple, because $k - 1 = 1$. Terms such as $(...) \bmod (k-1)$ simply fall away, since any integer mod 1 is zero. Also some terms involve a division by $k-1$, and these also simplify very nicely.

But these simplifications only work for $k = 2$. This reduces my confidence in finding a closed form solution for the general case.

Anyway, here's the proof:
$$ \begin{align*} f_2(n_r) & = k - 1 - [n_{r-1} - f_k(n_{r-1})] \bmod (k - 1) & \text{ from equation 13, since } n_r \geq k - 1 = 1 \\ & = 2 - 1 - [n_{r-1} - f_2(n_{r-1})] \bmod 1 & \text{(but any number mod 1 is 0)} \\ & = 1 & \text{ for }n_r > 1 \\ \\ \text{but: } f_2(n_{1}) & = f_2(1) = 1 & \text{from equation 12} \\ \\ \therefore f_2(n_r) & = 1 & \forall n_r \tag{16} \\ \\ n_r & = n_{r-1} + 1 + \lfloor \frac{n_{r-1} - f_k(n_{r-1}) }{ k - 1 } \rfloor & \text{ from equation 11} \\ & = n_{r-1} + 1 + \lfloor n_{r-1} - f_k(n_{r-1}) \rfloor & \\ & = n_{r-1} + 1 + n_{r-1} - f_k(n_{r-1}) & \text{(since floor has an integer argument)} \\ & = n_{r-1} + 1 + n_{r-1} - 1 & \text{since }f_k(n_{r-1}) = 1 \text{ by equation 16 } \\ & = 2.n_{r-1} & \text{a geometric progression} \tag{17} \\ \\ \therefore n_r & = 2^{r-1} & \text{by equations 16 and 17} \tag{18} \\ \\ f_2(n) & = f_2(n_r) + 2.(n-n_r) & \text{from equation 15, where: } n_r \le n < n_{r+1} \\ & = 1 + 2(n - 2^{r-1}) & \text{ where: } 2^{r-1} \le n < 2^r \text{ (by equation 18)}\\ & = 1 + 2n - 2.2^{r-1} & \text{where: } 2^{r-1} \le 2^{log_2 n} < 2^r \\ & = 1 + 2n - 2.2^{\lfloor log_2 n \rfloor} & \text{since: } 2^{r-1} = 2^{\lfloor log_2 n \rfloor} < 2^{r}\\ & = 2n + 1 - 2^{{\lfloor log_2 n \rfloor} + 1} \tag{19} \\ \end{align*} $$
And this is the same equation derived in blog post three in the series, albeit through a completely different method.
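
To see the last few steps with actual numbers, here is a small worked example (my own illustration, using $n = 41$). The row starts are powers of 2 by equation 18, and $2^5 = 32 \le 41 < 64 = 2^6$, so $r = 6$ and $n_r = 32$. Then: $$ \begin{align*} f_2(41) &= 1 + 2(41 - 32) = 19 \\ 2n + 1 - 2^{{\lfloor log_2 n \rfloor} + 1} &= 82 + 1 - 2^{5 + 1} = 83 - 64 = 19 \end{align*} $$ Both the row-based calculation and the closed form from the third post give 19.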

Conclusion

In this blog post and the previous post I derived formulae which are applicable to intervals other than 2.

I was able to use these formulae to find a new proof of the closed form solution for the original interval of 2 (i.e. removing every second person from the circle).

I wasn't able to derive a closed form solution for arbitrary intervals. However I was able to devise an efficient algorithm for calculating the answer.

Next time

I feel that I have progressed as far as I can with the problem on my own. I don't know whether a closed form solution exists. But even if it does, I doubt I'm going to find it.

So now it's time to search the internet to see if this is a problem which other people have solved. In my next blog post, I hope to report back on what I found.

I'm also hoping to share some of the scripts and tools I used to generate the artefacts in this series of blog posts.

But most of all, I'm looking forward to wrapping up this series, so that I can move on to other topics that capture my interest!

Seven potato (generalizing to other intervals)

2013-06-06T16:01:00.001-04:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In the previous posts in the series I derived Mathematical formulae for calculating the last person left in a circle if every second person is asked to leave (continuing until only one person remains).

But what if we want to remove every third person from the circle, or every eighth person (as in the "One Potato, Two potato" children's game)?

In this post and the next I will attempt to generalize the results to arbitrary intervals.

The challenge

Let $f_k(n)$ be the label of the last person left in a circle of $n$ people when every $k^{\text{th}}$ person is asked to leave.

Clearly $f_k(1) = 1$.

When the interval is two, the basic equations allowed us to express $f_{2}(2n+b)$ in terms of $f_{2}(n)$. My gut feel is that this same method won't be useful for an interval of $k$, because it will only allow us to express $f_k(kn+d)$ in terms of $f_k((k-1)n)$, not in terms of $f_k(n)$.

Instead I'm going to find a way to express $f_k(n)$ in terms of $f_k(n-1)$.

Interestingly, the inspiration for doing this comes from the generalization of the F# brute force calculation method presented in the fifth blog post in the series. So not only have the functional algorithms proved useful for checking the Mathematics, they have also led to new Mathematical insights!

Brute force computation

So to recap, let's write an F# script which generalizes the brute force calculation method to arbitrary intervals:

let rec getLastLabelInCircleWithIntervalByBruteForce numberToSkip interval circle =
    match (numberToSkip, circle) with
    | (_, current :: []) -> current
    | (0, current :: restOfCircle)
        -> getLastLabelInCircleWithIntervalByBruteForce
            (interval-1) interval restOfCircle
    | (negativeToSkip, _) when negativeToSkip < 0
        -> raise (System.ArgumentException("The number to skip can't be negative!"))
    | (positiveToSkip, current :: restOfCircle)
        -> getLastLabelInCircleWithIntervalByBruteForce
             (positiveToSkip-1) interval (restOfCircle @ [current])
    | (_,_) -> raise (System.ArgumentException("The circle mustn't be empty!"))

let calcLastLeftInCircleWithIntervalByBruteForce interval sizeOfCircle = 
    getLastLabelInCircleWithIntervalByBruteForce
        (interval-1) interval [1 .. sizeOfCircle]

The method recursively skips a person and moves them to the end of the list until it reaches the next person to remove from the circle. It removes that person and starts over with the next person to skip being at the front of the list.

It runs in a similar time to the previous brute force algorithm.

A recursive formula

Let's consider what happens when we have $n$ people in a circle, and we remove the next person. Let $m$ be the label of the next person removed from the circle. As with the F# algorithm, after removing person $m$, we will shift the entire circle around so that person $m+1$ is the first person in the list (or person 1 if $m = n$).

So after removing person $m$ and shifting everyone around, we are left with $n-1$ people in the circle: $$ \begin{array}{| l | c | c | c | c | c | c | c |} \hline \text{Index: }& 1 & 2 & \ldots & n-m & n-m+1 & \ldots & n-1\\ \text{Label: }& m+1 & m+2 & \ldots & n & 1 & \ldots & m-1\\ \hline \end{array} $$
We will be using the "mod" (modulo) operator to handle the clocking over from position n to position 1. To do this we need to modify the labels to be zero-based:

$$ \begin{array}{| l | c | c | c | c | c | c | c |} \hline \text{Index: }& 1 & 2 & \ldots & n-m & n-m+1 & \ldots & n-1\\ \text{Label - 1: } & m & m+1 & \ldots & n-1 & 0 & \ldots & m-2\\ \hline \end{array} $$
Note that $n-1 = (n-1) \bmod n$ and $0 = n \bmod n$. This suggests a way of expressing all the labels in a uniform manner:

$$ \begin{array}{| l | c | c | c | c | c | c | c |} \hline \text{Index: } & 1 & 2 & \ldots & n-m & n-m+1 & \ldots &n-1\\ \text{Label - 1: } & m\bmod n & (m+1)\bmod n & \ldots & (n-1)\bmod n & n\bmod n & \ldots & (n+m-2)\bmod n\\ \hline \end{array} $$
Now is a good time to make the labels one-based again. I'll also rearrange the terms slightly to make the mapping from indexes to labels more obvious:

$$ \begin{array}{| l | c | c | c | c |} \hline \text{Index: }& 1 & 2 & \ldots & n-1\\ \text{Label:} & [1+(m-1)]\bmod n + 1 & [2+(m-1)]\bmod n + 1 & \ldots & [(n-1) + (m-1)]\bmod n + 1\\ \hline \end{array} $$
This gives us a way of mapping from indexes to labels. The $i^{\text{th}}$ label in the list is $[i+(m-1)] \bmod n + 1$.

There are now $n-1$ people in the circle. So the index of the last person left will be $f_k(n-1)$. Hence the label of the last person left will be $[f_k(n-1)+m-1]\bmod n + 1$. Hence:

$$ f_k(n) = [f_k(n-1)+m-1]\bmod n + 1 \tag{1} $$
$m$ is the label of the first person removed from the circle of n people. How do we determine $m$?

If $k \le n$ then $m = k$. But what if $k > n$? In this case the circle will be circumnavigated one or more times before the $k^{\text{th}}$ person is eliminated.

We can use the modulo operator to determine the value of m, remembering that we must first zero-base the label, then apply the modulo operator, then add back 1 to the label at the end. So:

$$ \begin{align*} m = & (k-1) \bmod n + 1 & \text{for }k > n \\ m = & k & \text{for } k \le n \end{align*} $$

But these two equations can be collapsed into one, because when $k \le n$ then $k-1 < n$, so $k = (k-1) \bmod n + 1$. So this gives: $$ m = (k-1) \bmod n + 1 \tag{2} $$
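
As a quick check of equation 2 (a worked example added for illustration), take $k = 8$ and $n = 3$, and compare with $k = 3$ and $n = 5$: $$ \begin{align*} m &= (8-1) \bmod 3 + 1 = 1 + 1 = 2 & \text{for } k = 8, n = 3 \\ m &= (3-1) \bmod 5 + 1 = 2 + 1 = 3 & \text{for } k = 3, n = 5 \end{align*} $$ In the first case the count goes 1, 2, 3, 1, 2, 3, 1, 2, so the eighth count does indeed land on person 2. In the second case $k \le n$, and the formula simply gives back $m = k$.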
Substituting into equation 1, we get the following:

$$ \begin{align*} f_k(n) &= [f_k(n-1) + ((k - 1) \bmod n + 1) - 1] \bmod n + 1 \\ &= [f_k(n-1) + (k - 1) \bmod n ] \bmod n + 1 \tag{3} \\ \text{So: }\\ f_k(n) &= [f_k(n-1) + k - 1 ] \bmod n + 1 \tag{4} \end{align*} $$
As promised, we have a recursive formula for $f_k(n)$ in terms of $f_k(n-1)$.
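
Here is a short worked example of equation 4 (added for illustration), taking $k = 3$ and building up from $f_3(1) = 1$: $$ \begin{align*} f_3(2) &= [f_3(1) + 2] \bmod 2 + 1 = 3 \bmod 2 + 1 = 2 \\ f_3(3) &= [f_3(2) + 2] \bmod 3 + 1 = 4 \bmod 3 + 1 = 2 \\ f_3(4) &= [f_3(3) + 2] \bmod 4 + 1 = 4 \bmod 4 + 1 = 1 \\ f_3(5) &= [f_3(4) + 2] \bmod 5 + 1 = 3 \bmod 5 + 1 = 4 \end{align*} $$ Counting by hand for a circle of five people gives the same answer: the people leave in the order 3, 1, 5, 2, so person 4 is the last one left.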

F# code for the recursive formula

The following F# function implements the recursive formula:

let rec calcLastLeftInCircleWithIntervalByRecursiveFormula interval sizeOfCircle =
    match sizeOfCircle with
        | 1 -> 1
        | n when n > 1 
            -> ( calcLastLeftInCircleWithIntervalByRecursiveFormula
                    interval (sizeOfCircle-1)
                 + interval - 1
               ) % sizeOfCircle
               + 1
        | _ -> raise (System.ArgumentException("The size of the circle must be positive!"))

This function runs much faster than the brute force approach. However it has a significant flaw. On my machine, a stack overflow occurs at some value of $n$ between 60 000 and 70 000. This is not really surprising, since the size of the stack is O(n) and the recursive call is not tail recursive.

By contrast, the recursive formula from the fifth post in the series was O(log n) in the stack size, since it expressed $f_{2}(2n+b)$ in terms of $f_{2}(n)$.
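
One way to sidestep the stack overflow, without changing the formula, is to apply equation 4 from the bottom up, starting at a circle of size 1 and carrying the running answer in an accumulator. The following is only a minimal sketch under that assumption. The function name calcLastLeftInCircleWithIntervalBottomUp is hypothetical (it does not appear elsewhere in this series), and the approach still takes O(n) steps, so it fixes the stack depth but not the running time.

```fsharp
// Hypothetical helper (not part of the original scripts):
// applies equation 4 for circle sizes 2, 3, ..., sizeOfCircle in turn.
let calcLastLeftInCircleWithIntervalBottomUp interval sizeOfCircle =
    if sizeOfCircle < 1 then
        raise (System.ArgumentException("The size of the circle must be positive!"))
    if interval < 1 then
        raise (System.ArgumentException("The interval must be positive!"))
    // On entry, lastLeft holds f_k(size - 1).
    // The recursive call is in tail position, so the stack does not grow.
    let rec loop size lastLeft =
        if size > sizeOfCircle then lastLeft
        else loop (size + 1) ((lastLeft + interval - 1) % size + 1)
    loop 2 1
```

For example, calcLastLeftInCircleWithIntervalBottomUp 3 5 evaluates to 4, and calcLastLeftInCircleWithIntervalBottomUp 2 13 evaluates to 11.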

Next time

My next blog post will be a continuation of this topic.

I will be deriving a more efficient algorithm which has O(log n) recursive calls rather than O(n). This will not only solve the stack overflow problem. It will also make the algorithm incredibly fast.

Six potato (building a calculation graph)

2013-05-11T02:35:00.000-04:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In previous posts in the series I derived Mathematical formulae (and provided F# scripts) for calculating the last person left in a circle if every second person is asked to leave (continuing until only one person remains).

In the third post in the series I developed an algebraic formula. However algebra often fails to provide insight into the solution. In this post I'd like to provide that insight by taking a very different approach to the problem.

Finding a pattern

In the second post in the series I derived the basic equations needed to solve the problem:

$$ \begin{align*} f(1) &= 1 \tag{1} \\ f(2n) &= 2.f(n) - 1 \tag{2} \\ f(2n+1) &= 2.f(n) + 1 \tag{3} \end{align*} $$

Notice how equations 2 and 3 both reference $f(n)$. So knowing $f(n)$ allows you to generate two other numbers: $f(2n)$ and $f(2n+1)$.

Let's generate a graph showing how the values are determined:

There seems to be a pattern here. The output in the first row is 1. The outputs in the second row are 1, 3. The outputs in the 3rd row are 1, 3, 5, 7. It gets a bit harder to see, but if we expand another level down we get 1, 3, 5, 7, 9, 11, 13, 15:

So the inputs in row r go from $2^{r-1}$ to $2^{r}-1$ in increments of 1.

And the outputs in row r seem to go from 1 to $2^{r} - 1$ in increments of 2.
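
If you'd like to reproduce the rows without drawing the graph, the following F# sketch lists the outputs for a given row directly from the three basic equations. The names lastLeft and outputsInRow are my own hypothetical helpers, not functions from the earlier posts.

```fsharp
// Hypothetical helper: the three basic equations, applied recursively.
let rec lastLeft n =
    if n < 1 then raise (System.ArgumentException("The size of the circle must be positive!"))
    elif n = 1 then 1                              // equation 1
    elif n % 2 = 0 then 2 * lastLeft (n / 2) - 1   // equation 2
    else 2 * lastLeft (n / 2) + 1                  // equation 3

// Hypothetical helper: row r holds the inputs from 2^(r-1) to 2^r - 1.
let outputsInRow r =
    [ for n in (1 <<< (r - 1)) .. ((1 <<< r) - 1) -> lastLeft n ]

// outputsInRow 3 evaluates to [1; 3; 5; 7]
// outputsInRow 4 evaluates to [1; 3; 5; 7; 9; 11; 13; 15]
```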

Insight

A bit further on I will prove this pattern using the principle of induction. But first let's look for the insight into why this pattern is so.

There are two things to show:

The output of the first number in each row must always be 1
Consecutive outputs in a row always differ by 2

Why is the output of the first number in a row always 1? In other words why does $f(2^{m}) = 1$?

If the size of the circle is a power of 2, then exactly half the people will be removed during one trip around the circle. And the last person removed will be the person just before person 1. So person 1 is not going to be eliminated in the next round of eliminations either.

And since half of a power of 2 is also a power of 2, this pattern is going to repeat itself. So person 1 is never going to be eliminated. So the answer has to be 1.

Why does each pair of successive nodes in a row always differ by 2?

Subtract equation 2 from equation 3 in the basic equations. The $f(n)$ terms cancel out, showing that: $$ f(2n+1) - f(2n) = [2.f(n) + 1] - [2.f(n) - 1] = 2 $$

But we also need to consider the other way of pairing successive terms (i.e. an odd followed by an even argument).

In other words, why is $f(2n+2) - f(2n+1) = 2$ for all n, except when $2n+2$ is a power of 2?

We will prove this by induction. Suppose $f(n+1) - f(n) = 2$ in the previous row. Then:

$$ \begin{align*} f(2n+2) &= f(2(n+1)) \\ &= 2.f(n+1) - 1 & \text{by (2)} \\ \\ \text{So: }f(2n+2) - f(2n + 1) & = [2.f(n+1) - 1] - [2.f(n) + 1] & \text{by (3)} \\ &= 2.[f(n+1) - f(n)] - 2 \\ &= 2.[2] - 2 \\ &= 2 \end{align*} $$
Look at the second row in the graph: $f(3) - f(2) = 2$. Starting at this row, the property of successive outputs differing by 2 is going to ripple down from row to row in the graph.

Now let's prove this formally. We'll do it by induction again, but this time we'll induct on the row number in the graph (or more correctly the row number minus 1).

Proof by induction

The inductive step

Suppose that for some integer k, the numbers from $2^k$ to $2^{k+1} - 1$ satisfy:

$$ f(2^{k} + i) = 2i + 1 \tag{4} $$
Then we wish to show that this holds true for k+1 as well. In other words, the numbers from $2^{k+1}$ to $2^{k+2} - 1$ should satisfy $f(2^{k+1} + j) = 2j + 1$.

To prove this we consider the two cases of odd and even values of j separately:

Proof for even values:

Let $j = 2i$ for any $i \in \left\{ 0, 1, ..., 2^{k}-1 \right\}$.

First note that j will have values in the following range:

$$ j \in \left\{0, 2, ..., 2^{k+1} - 2 \right\} \tag{5} $$

Then:

$$ \begin{align*} f(2^{k+1} + j) &= f(2.2^{k} + 2i) \\ &= f(2[2^{k} + i]) \\ &= 2.f(2^{k} + i) - 1 & \text{by (2)} \\ &= 2.( 2i + 1 ) - 1 & \text{by (4)} \\ &= 2.( j + 1 ) - 1 \\ &= 2j + 1 \end{align*} $$

Proof for odd values:

Let $j = 2i + 1$ for any $i \in \left\{ 0, 1, ..., 2^{k}-1 \right\}$.

Note that j will have values in the following range:

$$ j \in \left\{1, 3, ..., 2^{k+1} - 1 \right\} \tag{6} $$

Then:

$$ \begin{align*} f(2^{k+1} + j) &= f(2.2^{k} + 2i + 1) \\ &= f(2[2^{k} + i] + 1) \\ &= 2.f(2^{k} + i) + 1 & \text{by (3)} \\ &= 2.( 2i + 1 ) + 1 & \text{by (4)} \\ &= 2.( j ) + 1 \\ &= 2j + 1 \end{align*} $$

So combining these two cases, we see that $f(2^{k+1} + j) = 2j + 1$ for all $j \in \left\{ 0, 1, ..., 2^{k+1}-1 \right\}$, from (5) and (6).

The base case

When k = 0:

$f(2^{0}) = f(1) = 1 = 2 . 0 + 1$

So when k = 0 and $i \in \left\{0, ..., 2^{k} - 1\right\} = \left\{0\right\}$, $f(2^{k} + i) = 2i + 1$

Conclusion

So this is our proof by induction of the following:

For all non-negative integers $k$ and $i$, with $i \in \left\{ 0, 1, ..., 2^{k}-1 \right\}$:

$f(2^{k} + i) = 2i + 1$

But for $n = 2^{k} + i$ this gives: $$ \begin{align*} f(n) &= 2i + 1 \\ &= 2.[i + 2^{k} - 2^{k}] + 1 \\ &= 2.[2^{k} + i] + 1 - 2^{k+1} \\ &= 2n + 1 - 2^{k+1} \\ &= 2n + 1 - 2^{{\lfloor log_2 n \rfloor} + 1} \end{align*} $$ And that is the same as the equation derived in the third post of the series.
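
As a worked example (added for illustration), take $n = 100$. The largest power of 2 not exceeding 100 is $2^6 = 64$, so $k = 6$ and $i = 36$: $$ \begin{align*} f(100) &= 2i + 1 = 2.36 + 1 = 73 \\ 2n + 1 - 2^{k+1} &= 200 + 1 - 128 = 73 \end{align*} $$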

Next time

In the next blog post in the series I will try to generalize the result to intervals other than 2.

Five potato (verifying the formulae with F#)

2013-03-24T17:34:00.000-04:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In the first four posts in the series I derived Mathematical formulae for calculating the last person left in a circle if every second person is asked to leave (continuing until only one person remains).

In this post I am going to write F# scripts to check these formulae.

Why F#?

Functional programming languages often express Mathematical concepts very elegantly
I felt like learning a functional language (though I had originally planned to learn Haskell)
I opened the F# interactive console on a whim and the solutions were flowing before I had time to change my mind!

I'm very happy with how natural it felt to use F#. And it does build on my knowledge of .Net. So I don't regret my somewhat accidental choice of F# instead of Haskell.

Brute force computation

The following F# functions use a brute force approach to calculate the number of the last person left in the circle:

let rec getLastLabelInCircleByBruteForce circle =
    match circle with
    | onlyOneLeft :: [] -> onlyOneLeft
    | nextToSkip :: nextOut :: restOfCircle 
        -> getLastLabelInCircleByBruteForce (restOfCircle @ [nextToSkip])
    | _ -> raise (System.ArgumentException("The circle mustn't be empty!"))

let calcLastLeftInCircleByBruteForce sizeOfCircle = 
    getLastLabelInCircleByBruteForce [1..sizeOfCircle]

Here's a short explanation of the F# syntax...

"let rec <functionName> <parameter> =" defines a recursive function taking a single parameter.
F# is strongly typed. It uses Hindley-Milner type inference to deduce parameter types.
"match circle with" does pattern matching on circle.
Each pipe ("|") is the start of a pattern. Whatever comes after "->" is the result if the pattern is the first to match.

You will also need to know a few things about lists in F#. I recommend reading this wikibooks article or this article by Microsoft's Chris Smith.

Note the following...

F# lists are linked lists, not the .Net List class.
"head::tail" is a list comprising an element called head pointing to a sub-list called tail.
"[]" is an empty list.
So "onlyOneLeft :: []" matches a list with a single item (the answer).
"nextToSkip :: nextOut :: restOfCircle" matches the first two items in the list. restOfCircle is the remaining people in the circle (which could be the empty list).
"_" is a placeholder which will match anything.
"list1 @ list2" concatenates two lists.
"[1..N]" creates a list with the numbers 1, 2, 3, ... N.

So the algorithm repeatedly moves the first person to the end of the list and removes the second person. It does this recursively until a single person is left. If you like, we are rotating the circle towards us, instead of moving around the circle (which would involve unnecessary complexity, such as keeping track of our current position and clocking over correctly when we reached the end of the list).
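
To watch the rotation happen, here is a small variant that returns every intermediate state of the circle rather than just the final label. It is only an illustrative sketch, and circleStates is a hypothetical name of my own.

```fsharp
// Hypothetical helper: records the circle before each removal.
let rec circleStates circle =
    match circle with
    | nextToSkip :: _ :: restOfCircle ->
        // Keep the current state, then rotate the skipped person to the back
        // and drop the person after them.
        circle :: circleStates (restOfCircle @ [nextToSkip])
    | _ -> [circle]

// circleStates [1..5] evaluates to:
// [[1; 2; 3; 4; 5]; [3; 4; 5; 1]; [5; 1; 3]; [3; 5]; [3]]
```

Each state begins with the next person to be skipped, and the final single-item list holds the answer (person 3 for a circle of five).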

Let's run the fsi application and calculate the results for circles with 10 and 10,000 people (with timing switched on):

This appears to be quite slow. It took around 0.9 seconds to run the calculation for a circle with 10,000 people.

This is partly because "(restOfCircle @ [nextToSkip])" is an O(n) operation, where n is the size of restOfCircle.
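
One possible way to avoid the repeated list appends, at the cost of giving up the purely functional style, is to hold the circle in a mutable .Net queue, where adding to the back and removing from the front are both cheap. The sketch below is an assumption about how that might look rather than something I timed for this post, and calcLastLeftInCircleByQueue is a hypothetical name.

```fsharp
// Hypothetical alternative to the list-based brute force method.
let calcLastLeftInCircleByQueue sizeOfCircle =
    if sizeOfCircle < 1 then
        raise (System.ArgumentException("The circle mustn't be empty!"))
    let circle = System.Collections.Generic.Queue<int>(seq { 1 .. sizeOfCircle })
    while circle.Count > 1 do
        circle.Enqueue(circle.Dequeue())   // skip: move the first person to the back
        circle.Dequeue() |> ignore          // remove the next person from the circle
    circle.Dequeue()
```

For example, calcLastLeftInCircleByQueue 10 evaluates to 5, matching the list-based version.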

Recursive computation

let rec calcLastLeftInCircleByHalving sizeOfCircle =
    let (div, rem) = (sizeOfCircle / 2, sizeOfCircle % 2)
    match (div,rem) with
    | (0,1) -> 1
    | (n,0) -> 2 * (calcLastLeftInCircleByHalving n) - 1
    | (n,1) -> 2 * (calcLastLeftInCircleByHalving n) + 1
    | (_,_) -> 0  // to suppress compiler warnings

Notice how closely the first 3 patterns correspond to the 3 formulae.

The code also runs much faster than the brute force approach. So much faster that it calculated all the numbers up to a million in 0.629 seconds!

Computation via the algebraic formula

In my third post of the series I derived an algebraic formula for the last person left in the circle: $$ f(n) = 2n + 1 - 2^{{\lfloor log_2 n \rfloor} + 1} \tag{9} $$ In F# this becomes:

let calcLastLeftInCircleByFormula n =
    let floorOfLog = System.Math.Floor(System.Math.Log( double n, 2.0) )
    2 * n + 1 - int ( 2.0 ** (floorOfLog + 1.0))

Note that System.Math is the static class in the .Net framework for Mathematical calculations. Its Log() method has an overload which takes the base of the logarithm as its second parameter.

Running this for all the numbers up to a million takes 0.543 seconds - slightly quicker than the previous method...

Two things make me uncomfortable with this approach:

It feels inefficient to be doing floating point calculations to solve an integer problem.
I'm concerned that rounding errors could cause a calculation error

The rounding error concern is easily addressed. Simply add 0.5 to the value inside the int(...) conversion. But perhaps we should be avoiding floating point calculations altogether...

Computation via binary arithmetic

In the fourth post in the series I presented a very elegant rule for calculating the answer using binary representations.

This method can be re-stated (for computational convenience) as follows:

take the binary representation of the number
drop the leading digit
shift left
add 1

Let's convert that to F#:

First we need to find the leading digit, so we can drop it. This is equivalent to finding the highest power of 2 that is less than (or equal to) the number. In other words it's the term $2^{\lfloor log_2 n \rfloor}$ in the algebraic formula.

This is the same as finding the left-most bit in the binary representation of the number and setting all remaining bits to zero.

We can do this easily by repeatedly shifting the number to the right (the final bit drops out) and simultaneously shifting the number 1 to the left each time (this multiplies by 2, giving the next higher power of 2). Repeat until the original number is 1, and the other number will be the power of 2 we are looking for.

In F# the ">>>" operator does a bitwise right shift, and "<<<" does a left shift. So the following pair of F# functions calculate $2^{\lfloor log_2 n \rfloor}$:

let rec calcFirstBinaryDigit powerOf2 shiftedNumber =
    match shiftedNumber with
    | 0 -> raise (System.ArgumentException("No power of 2 is less than zero"))
    | 1 -> powerOf2
    | _ -> calcFirstBinaryDigit (powerOf2 <<< 1) (shiftedNumber >>> 1)

let calcHighestPowerOfTwoNotGreaterThan = calcFirstBinaryDigit 1
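
To illustrate the shifting (a trace added for clarity), here are the successive arguments to calcFirstBinaryDigit when the number is 13, which is 1101 in binary: $$ \begin{array}{| l | c | c | c | c |} \hline \text{powerOf2: } & 1 & 2 & 4 & 8 \\ \text{shiftedNumber: } & 13 & 6 & 3 & 1 \\ \hline \end{array} $$ The recursion stops when shiftedNumber reaches 1, returning 8, which is the highest power of 2 not greater than 13.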

With this building block in place, we can easily express the binary calculation method as follows:

let calcLastLeftInCircleByBinaryFormula n =
    ((n - calcHighestPowerOfTwoNotGreaterThan n) <<< 1) + 1

The binary formula took 0.465 seconds, which is marginally faster than the floating point formula:

Comparing the results of the various methods

Now let's return to the original purpose of writing these various F# scripts, which was to check the Mathematics.

First I'm going to create a data structure to hold all the answers from the various methods for a single size of circle...

type circleCalculation = { 
    SizeOfCircle: int;
    LastLeftInCircleByBruteForce: int;
    LastLeftInCircleByHalving: int;
    LastLeftInCircleByFormula: int;
    LastLeftInCircleByBinaryFormula: int
}

Then I'm going to create a function to perform the calculation of a single record. I want this function to have a parameter allowing me to switch off the brute force method if I want to, since I know it is much slower than the other methods:

let getCircleCalculation includeBruteForce sizeOfCircle =
    // Brute force is so slow to calculate. Provide the option to omit it...
    let lastLeftInCircleByBruteForce = 
        match includeBruteForce with 
        | true -> calcLastLeftInCircleByBruteForce sizeOfCircle
        | false -> 0
    let lastLeftInCircleByHalving = calcLastLeftInCircleByHalving sizeOfCircle
    let lastLeftInCircleByFormula = calcLastLeftInCircleByFormula sizeOfCircle
    let lastLeftInCircleByBinaryFormula = calcLastLeftInCircleByBinaryFormula sizeOfCircle
    let circleCalc = {
        SizeOfCircle = sizeOfCircle;
        LastLeftInCircleByBruteForce = lastLeftInCircleByBruteForce; 
        LastLeftInCircleByHalving = lastLeftInCircleByHalving;
        LastLeftInCircleByFormula = lastLeftInCircleByFormula;
        LastLeftInCircleByBinaryFormula = lastLeftInCircleByBinaryFormula
    }
    circleCalc

Something that really amazed me is that I didn't need to specify the circleCalculation type in the function. F# is able to infer the data type from the usage pattern. Very neat!

With this in place, I can use the following script to check that the methods all match up:

// With brute force:
[1..1000] |> List.map (getCircleCalculation true) |> List.filter (
    fun cc -> cc.LastLeftInCircleByBinaryFormula <> cc.LastLeftInCircleByBruteForce
           || cc.LastLeftInCircleByBinaryFormula <> cc.LastLeftInCircleByHalving
           || cc.LastLeftInCircleByBinaryFormula <> cc.LastLeftInCircleByFormula
    )

// Without brute force:
[1..1000000] |> List.map (getCircleCalculation false) |> List.filter (
    fun cc -> cc.LastLeftInCircleByBinaryFormula <> cc.LastLeftInCircleByHalving
           || cc.LastLeftInCircleByBinaryFormula <> cc.LastLeftInCircleByFormula
    )

Running this shows that all the methods produce the same result. Mission accomplished...

Next time

In the next blog post in the series, I will show a different way of solving the problem. Instead of using the basic equations to recursively calculate a large number from the formula for smaller numbers, I'm going to work in the opposite direction.

What will emerge from this is a pattern of answers for successive values of the circle size. You might like to have a look at some of the earlier screen captures to see if you can work out what the pattern is.

Four (a binary calculation method)

2013-03-24T15:59:00.001-04:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In my first three posts in the series I derived Mathematical formulae for calculating the last person left in a circle if every second person is asked to leave (continuing until only one person remains).

In this post I am going to show a really neat way of calculating the answer. It is based on the binary number representation of the number of people in the circle.

If you are unfamiliar with the binary number system, then I suggest looking for a basic tutorial on binary numbers first.

The binary calculation rule

Take the binary representation of the number of people in the circle. Move the left-most (non-zero) bit to the end. Convert back to decimal and you have the number of the last person left.

Some examples

$$ f(\underbrace{10}_{\text{decimal}}) = f(\underbrace{1010}_{\text{binary}}) = \underbrace{0101}_{\text{binary}} = \underbrace{5}_{\text{decimal}} $$ $$ f(\underbrace{13}_{\text{decimal}}) = f(\underbrace{1101}_{\text{binary}}) = \underbrace{1011}_{\text{binary}} = \underbrace{11}_{\text{decimal}} $$
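
The rule is easy to try out in code. Here is a minimal Python sketch of it, working directly on the binary digits as a string. The function name is my own choice for illustration and does not come from the series' F# code.

```python
def last_person_by_bit_rotation(n: int) -> int:
    """Rotate the leading binary digit of n to the end and read the result as binary."""
    if n < 1:
        raise ValueError("the circle needs at least one person")
    bits = format(n, "b")         # e.g. 10 -> "1010"; the first digit is always "1"
    rotated = bits[1:] + bits[0]  # move the leading digit to the end: "0101"
    return int(rotated, 2)        # leading zeros are ignored when converting back


for people in (10, 13):
    print(people, format(people, "b"), "->", last_person_by_bit_rotation(people))
```

For 10 and 13 people this gives 5 and 11, matching the two examples above.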

Derivation

In my third post of the series I derived an algebraic formula for the last person left in the circle: $$ f(n) = 2n + 1 - 2^{\lfloor \log_2 n \rfloor + 1} \tag{9} $$ I'd like to write that formula slightly differently: $$ f(n) = 2(n - 2^{\lfloor \log_2 n \rfloor}) + 1 \tag{10} $$

$2^{\lfloor \log_2 n \rfloor}$ is just the highest power of 2 that is less than (or equal to) n. Now each position in a binary representation represents a power of 2. So this power of 2 corresponds to the left-most bit in the binary representation of n (with all other positions set to zero).

Then $n - 2^{\lfloor \log_2 n \rfloor}$ is just the original number n with its first (non-zero) binary digit dropped (i.e. replaced with a zero). Let's call this number m.

In binary, you can multiply a number by 2 by shifting all its bits one place to the left. So 2m+1 is just m shifted one place to the left, with the rightmost bit then set to 1.

So we can express the binary method as:

take the binary representation of the number
drop the leading digit
append a 1 (i.e. shift left and add 1)
convert back to decimal

But since we are dropping the leading 1, and appending a trailing 1, you could see this as rotating the first bit around to the end.
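
As a sanity check on the argument above, here is a small Python sketch that compares equation (10) with the "drop the leading bit, shift left, set the last bit" recipe using integer bit operations. The function names are illustrative, and I am assuming that `n.bit_length() - 1` is an acceptable stand-in for $\lfloor \log_2 n \rfloor$ when n is a positive integer.

```python
def formula_10(n: int) -> int:
    """Equation (10): f(n) = 2(n - 2^k) + 1, where 2^k is the leading power of 2 in n."""
    k = n.bit_length() - 1   # position of the leading bit, i.e. floor(log2(n)) for n >= 1
    m = n - 2 ** k           # n with its leading bit dropped
    return 2 * m + 1


def drop_shift_and_set(n: int) -> int:
    """The same steps expressed as bit operations."""
    k = n.bit_length() - 1
    m = n ^ (1 << k)         # clear the leading bit
    return (m << 1) | 1      # shift left by one place and set the rightmost bit


mismatches = [n for n in range(1, 10_001) if formula_10(n) != drop_shift_and_set(n)]
print(mismatches)
```

The list of mismatches comes out empty, since the two functions are the same arithmetic written in two ways.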

Next time

In the next post in the series, I will be writing code to calculate the answer using each of the methods presented in the first four posts.

Three potato (an algebraic solution)

2013-01-20T08:04:00.000-05:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In my first post of the series I described a situation where a number of people are sitting in a circle. Every second person is asked to leave. This continues until only one person remains. If the people are labelled 1 to N, what is the formula for that final person?

In the second post in the series I derived the basic equations needed to solve the problem:

$$ \begin{align*} f(1) &= 1 \tag{1} \\ f(2n) &= 2.f(n) - 1 \tag{2} \\ f(2n+1) &= 2.f(n) + 1 \tag{3} \end{align*} $$
In this post I will use these equations to derive the formula for the label of the last person left.

In future posts I will show a more intuitive way of solving the problem (but still starting from these 3 equations).

Derivation

Equations 2 and 3 allow us to calculate the formula for a number in terms of the formula for a smaller number. Starting from any number we can keep on repeating this process until we get the formula in terms of f(1).

For example, suppose there are 10 people in the circle. Then: $$ \begin{align*} f(10) &= 2.f(5) - 1 & \text{by equation (2)} \\ &= 2.(2.f(2) + 1) - 1 & \text{by equation (3)} \\ &= 2.(2.(2.f(1) - 1) + 1) - 1 & \text{by equation (2)} \\ &= 2.(2.(2.(1) - 1) + 1) - 1 & \text{by equation (1)} \\ &= 2.(2.(1) + 1) - 1 \\ &= 2.(3) - 1 \\ &= 5 \\ \end{align*} $$
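
The same repeated substitution can be written as a short recursive function. This is only a sketch in Python of equations (1) to (3); the function name is my own.

```python
def last_person(n: int) -> int:
    """Label of the last person left, using equations (1), (2) and (3)."""
    if n < 1:
        raise ValueError("the circle needs at least one person")
    if n == 1:
        return 1                          # equation (1)
    half, remainder = divmod(n, 2)
    if remainder == 0:
        return 2 * last_person(half) - 1  # equation (2): n is even
    return 2 * last_person(half) + 1      # equation (3): n is odd


print(last_person(10))
```

The recursion depth grows only with the number of binary digits of n, so even large circles need just a handful of calls.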

We would like to apply this same process to an arbitrary number n. Hopefully a pattern will emerge that will allow us to generate a formula in terms of n.

One potential problem is that we do different things (add or subtract one), depending on whether the number at that step is odd or even. We can avoid this problem if we have a formula which doesn't depend on whether the argument is odd or even.

Fortunately it's quite easy to do. Just note that:

$$ 2b-1 = \left\{ \begin{array}{l l} -1 & \text{if $b = 0$}\\ +1 & \text{if $b = 1$} \end{array} \right. $$
So we can collapse equations 2 and 3 into this equation: $$ \begin{align*} f(2n + b) = 2.f(n) + 2b - 1 & \text{ where b $\in \{0,1\}$}\tag{4} \\ \end{align*} $$

Equations 2 and 3 depend on whether the number we are solving for is odd or even. And the number we are solving for halves after each step. My first instinct on seeing this pattern was to express n as a binary number:

Suppose that: $$ 2^{k} \leq n < 2^{k+1} $$ Then we can express n as: $$ \begin{align*} n = \sum_{{i}={0}}^{k}{a_{i}.2^i} & \text{ where $a_{i} \in \{0,1\}$}\tag{5} \end{align*} $$ Note that the leading binary digit must be 1: $$ a_{k} = 1 \tag{6} $$

We're also going to need a well-known equation for binary sums: $$ \sum_{{i}={0}}^{k-1}2^{i} = 2^{k} - 1 \tag{7} $$ This is really just saying that if all the binary digits in a number are 1, then adding 1 will cause all columns to clock over to give the next higher power of 2. This is similar to how addition works in the decimal system. For example: $$ 10^{3} - 1 = 1000 - 1 = 999 = \sum_{{i}={0}}^{2}{9.10^{i}} = \sum_{{i}={0}}^{2}{(10-1).10^{i}} $$

Let's carry out the steps to derive the formula for f(n): $$ \begin{align*} f(n) &= f( \sum_{{i}={0}}^{k}{a_{i}.2^i} ) & \text{by equation (5)} \\ &= f( 2 \sum_{{i}={1}}^{k}{a_{i}.2^{i-1}} + a_{0} ) \\ &= 2.f( \sum_{{i}={1}}^{k}{a_{i}.2^{i-1}} ) + 2 a_{0} - 1 & \text{by equation (4)} \\ &\text{Let's rewrite this formula as follows:} \\ f(n) &= 2.f( \sum_{{i}={1}}^{k}{a_{i}.2^{i-1}} ) + \sum_{{i}={0}}^{0}[(2 a_{i}-1).2^{i}] & \text{after step 1} \\ &= 2.f( 2 \sum_{{i}={2}}^{k}{a_{i}.2^{i-2}} + a_{1} ) + \sum_{{i}={0}}^{0}[(2 a_{i}-1).2^{i}] \\ &= 2.[2.f( \sum_{{i}={2}}^{k}{a_{i}.2^{i-2}}) + 2 a_{1} - 1] + \sum_{{i}={0}}^{0}[(2 a_{i}-1).2^{i}] & \text{by equation (4)} \\ &= 2^{2}.f( \sum_{{i}={2}}^{k}{a_{i}.2^{i-2}}) + 2^{1}[2 a_{1} - 1] + \sum_{{i}={0}}^{0}[(2 a_{i}-1).2^{i}] \\ &= 2^{2}.f( \sum_{{i}={2}}^{k}{a_{i}.2^{i-2}}) + \sum_{{i}={0}}^{1}[(2 a_{i}-1).2^{i}] & \text{after step 2} \\ &= \text{ ...} \\ &= 2^{j}.f( \sum_{{i}={j}}^{k}{a_{i}.2^{i-j}}) + \sum_{{i}={0}}^{j-1}[(2 a_{i}-1).2^{i}] & \text{after some step j, $1 \leq j \leq k $} \\ &= \text{ ...} \\ &= 2^{k}.f( \sum_{{i}={k}}^{k}{a_{i}.2^{i-k}}) + \sum_{{i}={0}}^{k-1}[(2 a_{i}-1).2^{i}] & \text{after step k} \\ \end{align*} $$
So what I've done is to break the formula on the right into two terms, each with a summation. Then I've progressively eaten away at the left summation while building up the right.
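
To convince yourself that every intermediate line of that derivation is equal to f(n), you can evaluate the "after step j" expression numerically. The following Python sketch does this. The helper names are hypothetical, and `f` is simply equation (4) applied recursively.

```python
def f(n: int) -> int:
    """Equation (4): f(2n + b) = 2.f(n) + 2b - 1, with f(1) = 1."""
    if n == 1:
        return 1
    return 2 * f(n // 2) + 2 * (n % 2) - 1


def expansion_after_step(n: int, j: int) -> int:
    """2^j . f(sum_{i=j..k} a_i 2^(i-j)) + sum_{i=0..j-1} (2 a_i - 1) 2^i."""
    remaining = n >> j  # the binary digits a_j .. a_k, shifted down by j places
    built_up = sum((2 * ((n >> i) & 1) - 1) * 2 ** i for i in range(j))
    return 2 ** j * f(remaining) + built_up


n = 10
k = n.bit_length() - 1  # 2^k <= n < 2^(k+1)
for j in range(k + 1):
    print(j, expansion_after_step(n, j))
```

For n = 10 every step evaluates to 5: the left term shrinks towards $2^{k}.f(1)$ while the right summation absorbs one more binary digit at each step.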

Now we just need to expand and simplify: $$ \begin{align*} f(n) &= 2^{k}.f( \sum_{{i}={k}}^{k}{a_{i}.2^{i-k}}) + \sum_{{i}={0}}^{k-1}[(2 a_{i}-1).2^{i}] \\ &= 2^{k}.f(a_{k}) + 2 [\sum_{{i}={0}}^{k-1}a_{i}.2^{i}] - \sum_{{i}={0}}^{k-1}2^{i} \\ &= 2^{k}.f(a_{k}) + 2 [\sum_{{i}={0}}^{k-1}a_{i}.2^{i} + a_{k}.2^{k} - a_{k}.2^{k} ] - (2^{k} - 1) & \text{ by equation (7)} \\ &= 2^{k}.f(1) + 2 [\sum_{{i}={0}}^{k}a_{i}.2^{i} - 1.2^{k} ] - 2^{k} + 1 & \text{ since $a_{k} = 1$ by equation (6)} \\ &= 2^{k}.1 + 2 [\sum_{{i}={0}}^{k}a_{i}.2^{i} - 2^{k} ] - 2^{k} + 1 & \text{ by equation (1) } \\ &= 2^{k} + 2 [n - 2^{k} ] - 2^{k} + 1 & \text{ by equation (5) } \\ &= 2^{k} + 2n - 2^{k+1} - 2^{k} + 1 \\ &= 2n+1 - 2^{k+1} & \tag{8} \\ \end{align*} $$
Since we can easily calculate k from n, that is our solution.

The calculation method

Let's summarize how we solve this for a particular value of n:

Double n and add 1
Find the largest power of 2 that is less than or equal to n, and double it
Subtract the second number from the first
That is your answer!

Let's check it using our previous example of n = 10: $$ 2^{3} \leq 10 < 2^{4} $$ Hence k = 3.

So: $$ \begin{align*} f(10) &= 2 \times 10 + 1 - 2^{3+1} \\ &= 20 + 1 - 2^{4} \\ &= 21 - 16 \\ &= 5 \\ \end{align*} $$ Which is the answer we got previously!

The algebraic formula

If we want to formalize things we can also express the formula in terms of n only. Simply note that: $$ k = \lfloor \log_2 n \rfloor $$ So that: $$ f(n) = 2n + 1 - 2^{\lfloor \log_2 n \rfloor + 1} \tag{9} $$
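
The calculation method above translates almost word for word into code. Here is a minimal Python sketch; the function names are my own. It finds the largest power of 2 by repeated doubling rather than by taking a logarithm, which keeps everything in integer arithmetic.

```python
def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of 2 that is less than or equal to n (for n >= 1)."""
    power = 1
    while power * 2 <= n:
        power *= 2
    return power


def last_person_by_formula(n: int) -> int:
    first = 2 * n + 1                               # double n and add 1
    second = 2 * largest_power_of_two_at_most(n)    # largest power of 2 <= n, doubled
    return first - second                           # subtract the second from the first


print(last_person_by_formula(10))
```

For n = 10 the two intermediate numbers are 21 and 16, as in the worked example.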

Conclusion

So now we have a formula. But it took a lot of algebra. And we still don't have an intuitive feel for why the formula comes out the way it does.

In my next few blog posts I'm going to present a different way of looking at the problem. Instead of going straight for brute force calculation, we are going to apply a bit more insight first.

Two potato (the basic equations)

2012-12-16T15:37:00.001-05:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

In my previous blog entry I described a problem from the 2010 South African Mathematics Olympiad. In the next few posts I'm going to show you a few ways of solving the problem. But in this blog entry, I'm just going to set up the equations which the various solutions will all make use of.

Let's start with a recap of the problem...

Imagine n people sitting in a circle. Label them 1 to n in a clockwise direction. Imagine walking around the circle tapping every second person on the shoulder (person 2, 4, 6 and so on). People leave the circle when they are tapped. This continues until only one person is left. What is the label of that person?

Derivation

Firstly let's define f(n) to be the label of the last person remaining when there are n people in the circle.

Clearly f(1) = 1.

What we'd like to do is find expressions for f(n) in terms of f(m) where m is smaller than n.

Every second person is being asked to leave, with roughly half as many people remaining after each walk around the circle. This suggests that we should consider the cases of n being odd and even separately.

So let's see what happens when there are an even number of people around the circle, say 2n people.

In our first trip around the circle, the n even numbered people will leave the circle. This will leave n odd-numbered people.

Let's re-label these odd people as 1, 2, ..., n. Then the (new) label of the last person remaining would equal f(n). That comes straight from the definition of f(n).

But we are interested in the original label of this last person. That is not f(n) but 2.f(n) - 1. This is because 2k - 1 is the kth odd number.

Hence f(2n) = 2 f(n) - 1.

Next we will find a similar formula for circles with an odd number of people.

We already have a formula for n = 1. All other positive odd numbers can be expressed as 2n+1 for some positive integer n.

Initially the situation looks like this for 2n+1 people:

We would like to express f(2n+1) in terms of f(n). That means still having n people in the circle after the first round of removals.

To do this we need to remove n+1 people. The first n people removed are all even numbered as before. But now the (n+1)th person removed is person number 1. This leaves the following arrangement:

Note that the kth person in the new circle of people has a label of 2k+1.

So we can re-label the remaining people from 1 to n as before, and f(n) is the new label of the last person remaining if we continue the elimination process. But in the original labelling scheme, this becomes 2.f(n) + 1.

So we have shown that f(2n+1) = 2.f(n) + 1.

The basic equations

If we put all of this together, we get the following set of equations (where n is a positive integer):

$$ \begin{align*} f(1) &= 1 \tag{1} \\ f(2n) &= 2.f(n) - 1 \tag{2} \\ f(2n+1) &= 2.f(n) + 1 \tag{3} \end{align*} $$
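
These equations can be checked against a direct simulation. The Python sketch below is my own illustration. It removes people one walk around the circle at a time, mirroring the even and odd cases in the derivation, and then tests equations (2) and (3) for small n.

```python
def last_person_by_rounds(n: int) -> int:
    """Simulate the removals one walk around the circle at a time."""
    people = list(range(1, n + 1))
    while len(people) > 1:
        if len(people) % 2 == 0:
            # Even case: every second person leaves; the first person stays.
            people = people[0::2]
        else:
            # Odd case: every second person leaves, and then the first person too.
            people = people[2::2]
    return people[0]


violations = [
    n for n in range(1, 501)
    if last_person_by_rounds(2 * n) != 2 * last_person_by_rounds(n) - 1
    or last_person_by_rounds(2 * n + 1) != 2 * last_person_by_rounds(n) + 1
]
print(last_person_by_rounds(10), violations)
```

In both cases the next walk begins by skipping the first person in the shortened list, which is why the same slicing rule can be applied again in every round.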

Next time

In my next few blog posts I will use these equations to provide a formula for f(n).

One potato (the problem statement)

2012-12-02T09:17:00.000-05:00

All posts in this series:

One Potato: A description of the problem
Two potato: The basic equations
Three potato: An algebraic formula for the last person left
Four: A neat calculation method using binary numbers
Five potato: F# functions to check the various formulae
Six potato: Using a calculation graph to see the pattern
Seven potato: Generalizing to other intervals
More: An efficient algorithm for arbitrary intervals
Online research: In which I discover that it is called the Josephus problem

Introduction

Some months back I discovered the web site of the SA Math Foundation including past papers for the South African Mathematics Olympiad. Back in high school I used to love solving Mathematics problems. So I downloaded a bunch of question papers and started solving some of them.

This series of blog postings describes some interesting solutions I found to one of the problems.

[Edit: I've since discovered that it is known as the Josephus problem]

The problem statement

One of the first papers I glanced at was the 2010 senior paper, second round.

The first question described a situation in which there are 10 people in a circle, and every 2nd person is asked to leave the circle. This continues until only one person remains. If we number the people 1 to 10, and eliminate person 2 then 4 and so on, what is the number of the last person left?

So it works much like the children's game "One potato, two potato", except that every 2nd person is eliminated, not every 8th person.

The plan

Later I was thinking back to the problem and made a fortuitous mistake. I accidentally thought that the problem was phrased as 2010 people, not 10 people (Math Olympiad problems often use the current year in the problem statement).

So I ended up solving the problem generally, not just for the number 10 (which can easily be solved by brute force).
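
For the record, the brute force approach is only a few lines of code. Here is a sketch in Python using a double-ended queue. The function name is illustrative, and this is not the code used later in the series.

```python
from collections import deque


def elimination_order(n: int):
    """Return (labels in the order they leave, label of the last person left)."""
    circle = deque(range(1, n + 1))
    removed = []
    while len(circle) > 1:
        circle.rotate(-1)                 # skip one person (move them to the back)
        removed.append(circle.popleft())  # the next person leaves the circle
    return removed, circle[0]


order, survivor = elimination_order(10)
print(order, survivor)
print(elimination_order(2010)[1])
```

For 10 people the order of removal is 2, 4, 6, 8, 10, 3, 7, 1, 9, which leaves person 5. The same function also handles my accidental 2010-person version, although it gives no insight into why the answer is what it is.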

In the next few blog postings I intend to show you a few solutions to the problem. As I did with the Pyramid of Hexagons problem, I want to provide an algebraic proof as well as intuitive insight into why the proof is true.

So my plan is as follows:

First I will generate the basic equations which all the solutions will use
Thereafter I will provide my first algebraic solution
Then I will try to give some insight into an interesting pattern behind the formula. Hint: There is a link to the binary number system.
Then I will give a simpler proof inspired by this pattern.
And finally I will provide the code I wrote to generate the illustrations.

Disclaimer

While the visualization should be very helpful, it's not going to be as striking as my Pyramid of Hexagons visualization.

So I'm hoping that a reader of this blog will be able to find an even more elegant way of explaining the formula.

A Powershell script to sample CPU utilizations by process

2012-12-02T08:09:00.002-05:00

We recently had an issue at work where CPU utilizations on our app servers started "flat-lining" at 100%. This starved our WCF service hosts of CPU cycles, leading to complaints of slow performance.

The performance decline was dramatic and it didn't occur straight after a release. This suggested that something had changed elsewhere in the organisation.

The approach on our project has been to regularly release new modules side-by-side with the legacy system. Integration messages keep the data in the two systems synchronized. This significantly minimizes risk compared to a "big bang" approach.

But some users continue using the old system surreptitiously. Printing is a case in point. Print driver issues had been a frequent source of problems in the past. So some users had gone back to using the old system for printing.

Unbeknownst to us, the legacy team had removed the ability to print documents through the legacy system. So the printing load had suddenly increased by 50%.

This led to other problems in the system. For example, our performance logging queue started building up on one of the app servers. Messages are inserted in parallel, but the logging service is throttled to only process one message at a time. So when the logging service is starved of CPU cycles, it starts falling behind. This leads to a constant drain on CPU, making further "flat-lining" more likely.

We had been aware for some time that printing could cause performance problems. So fortunately I already had a design in place for moving most of the printing load onto the print servers, where CPU was under-utilized. But it would take a week or two to develop and test the code.

As an interim solution we commissioned a few more app servers.

To check whether printing was the sole source of problems, I took some random snapshots of resource monitor graphs on one of the application servers. At times printing was clearly causing the CPU to flatline at 100%. But at other times the app servers flatlined when no printing was taking place. So I suspected that printing was not the only cause of high CPU usage. Instead of app servers being able to recover from short-lived "distress states", the extra printing load had caused performance to reach a runaway situation.

To quantify this, I wrote a Powershell script to use WMI to sample CPU utilization for each process on each app server. I wrote the results to a csv file, opened it in Excel, and generated a pivot table off the data.
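
If you would rather not use Excel, the same kind of summary can be approximated in a few lines of code. The Python sketch below is only an illustration. It assumes the csv file has columns named Server, Process and AvgCPU, and the sample rows are made up purely to show the shape of the input.

```python
import csv
import io
from collections import defaultdict


def average_cpu_by_server_and_process(csv_text: str) -> dict:
    """Average the AvgCPU column for each (Server, Process) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of samples, number of samples]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Server"], row["Process"])
        totals[key][0] += float(row["AvgCPU"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample data (not real measurements)
sample = """Server,Process,AvgCPU
APP01,ProcessA,40.0
APP01,ProcessA,60.0
APP01,ProcessB,25.0
APP02,ProcessA,10.0
"""

summary = average_cpu_by_server_and_process(sample)
for (server, process), average in sorted(summary.items()):
    print(server, process, round(average, 2))
```

A pivot table is still more convenient for slicing interactively by hour or day of the week, but a script like this is handy when the check needs to be repeated.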

This highlighted some other issues that we hadn't been aware of before. For example, there was an SNMP monitoring process, SysEdge, which was consuming 25% CPU on 3 of the app servers for long periods each morning. I alerted the infrastructure team and they were able to fix this issue so that SysEdge was consuming under 1% CPU on all app servers.

You can download the script from this location... Get-CPUUtilizationByServerAndProcess.ps1:

param (
    [string[]] $serverNames = $( throw 'The list of server names was not provided' ),
    [int] $coresPerServer = $( throw 'The number of cores per server was not provided' ),
    [int] $repetitions = 1,
    [int] $intervalInSeconds = 60
)

1..$repetitions | foreach-object {
    $repetition = $_
    [DateTime] $now = [DateTime]::Now
    Write-Host "Getting CPU utilizations at $now (repetition $repetition)" -foregroundColor Green

    foreach ($server in $serverNames)
    {
        Write-Host "    Getting processes on $server" -foregroundColor Cyan
        
        $prc = gwmi Win32_PerfFormattedData_PerfProc_Process -computerName $server  # To get around bug where it is zero the first time called
        $prc = gwmi Win32_PerfFormattedData_PerfProc_Process -computerName $server | ? { $_.Name -ne '_Total' -and $_.Name -ne 'Idle' }
        $recordedAt = [DateTime]::Now
        
        $summary = $prc | select-object IDProcess,Name,PercentProcessorTime,WorkingSet,@{n='PercentTime';e={$_.PercentProcessorTime/$coresPerServer}}
        
        foreach ($processSummary in $summary)
        {
            $processName = $processSummary.Name
            $processName = ($processName.Split('#'))[0]
            $percentTime = $processSummary.PercentTime
            $workingSet = $processSummary.WorkingSet
            
            $record = new-object -typeName 'PSObject' -property @{
                Server = $server
                Process = $processName
                Repetition = $repetition
                AvgCPU = [Math]::Round( $percentTime, 2 )
                WorkingSet = $workingSet
                RecordedAt = $recordedAt
                Year = $recordedAt.Year
                Month = $recordedAt.Month
                Day = $recordedAt.Day
                DayOfWeek = $recordedAt.DayOfWeek
                Hour = $recordedAt.Hour
                Minute = $recordedAt.Minute
            }
            $record
        }
    }
    
    $timeTillNextRepetition = $now.AddSeconds($intervalInSeconds) - [DateTime]::Now
    $secondsToWait = $timeTillNextRepetition.TotalSeconds
    if ($secondsToWait -gt 0)
    {
        Start-Sleep -Seconds $secondsToWait
    }
}

A few notes on the Powershell script...

You will notice that I make the WMI call twice. This is because the first call for a particular app server will return all zeroes. I suspect the first call for each app server could be moved outside the loop (e.g. have a loop zero whose data is discarded). But it was good enough for my purposes.

I also kept the following piece of code to illustrate a point...

1..$repetitions | foreach-object {
    $repetition = $_
    ...
}

I used to write code like this a lot, assigning the $_ iteration variable to another variable. This was for 2 reasons...

Firstly, for debugging purposes. I would assign the variable a value to test. Then copy the remainder of the inner script block into the Powershell REPL.

And secondly, so that I could nest loops but still access the outer iteration variable.

However there is an easier way to get the same effect...

foreach ($repetition in 1..$repetitions) {
    ...
}

We have since gone live with the new printing solution. And so far things are looking very promising. CPU load has only decreased slightly, but its "spikiness" seems to have improved significantly. Average execution times of service calls appear to have decreased by up to 18% at the busiest time of day. And it certainly looks like the print processes are terminating faster than before, presumably because print spooling is now taking place locally.

This is also the first time that we have moved an asynchronous process to a different server, thus improving the performance of the synchronous processes which affect user perception of speed. I can see a lot of opportunity for doing more of this in future.

My toolkit for troubleshooting performance issues

2012-12-02T03:34:00.001-05:00

For the past couple of years I've been involved in a long-running project to replace many of the software applications for a large hospital network. For much of that time I've been the team lead for the production support team (although more recently my role has changed to being the team's designer, using Enterprise Architect to create UML diagrams for various enhancements to deployed modules).

Over that time I've spent a lot of time troubleshooting and fixing performance issues. This has ranged from identifying and fixing problems caused by database blocking, to hunting down memory leaks with windbg and sosex, to investigating the causes of high CPU utilization on app servers.

My approach invariably starts with finding a way to gather the right data to allow me to pinpoint where the problem lies.

Even in a high pressure severity 1 outage, this is still my most common starting point. The data allows me to cut out a whole lot of areas of the system for further investigation. With just a few areas to look at in detail, I can then delegate in-depth troubleshooting across team members.

In a crisis, the last thing you should do is panic and take a shotgun approach to looking for problems. First make sure you understand the problem. Data is a very objective way of doing this. But you can't do that unless you have data readily at hand and a toolkit of scripts and habits for rapidly gathering and analyzing that data.

The general purpose tools that I keep on using are Powershell and Excel pivot tables.

These are complemented by specific sources of data for each layer of the application, such as:

Various SQL DM views and query plans for understanding SQL performance problems
Our performance logs (saved to the database) for analyzing WCF service call durations
WMI for app server and print server performance
Memory dumps for memory leak issues at the client

In addition to this, there are some other tools that are very useful for getting a gut feel for a problem, such as SCOM (System Center Operations Manager) and Windows Resource Monitor.

SCOM graphs are great for getting a high level view of what's happening across a range of servers.

Resource Monitor is very useful for seeing what's happening on your servers at a point in time. The Process Explorer utility from the sysinternals suite is more powerful (we have it installed on all our app servers). But I've found that Resource Monitor is usually good enough for my purposes.

Interestingly enough, I haven't yet needed to use a code profiler, though this is probably the tool most developers think of first when you mention performance troubleshooting.

Even when the problem is with the code, there are usually other ways of pinpointing which code is at fault.

For example, one of the developers wrote code to hook into the NHibernate layer and write a debug message whenever a possible N+1 selects problem was detected. So those kinds of problems can be picked up by the "new development" teams before they get deployed into the production environment.

We also have a Castle DynamicProxy interceptor which hooks into the web service proxy and logs performance data for every WCF service call made.

We have a monthly release cycle, and after each release we run a query against the performance logs to look for service calls which are performing significantly worse than a month earlier (this is the only reasonable baseline, since service call volumes differ significantly by day of the week and time of the month). So this also helps us to find poorly performing code without the need for a profiler.
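
The real check is a SQL query against our performance logs, which I haven't shown here. But the idea behind it can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the function name, the service names, the durations and the 1.5x threshold are invented for illustration only.

```python
def find_regressions(baseline_ms: dict, current_ms: dict, min_ratio: float = 1.5) -> list:
    """List service calls whose average duration grew by at least min_ratio."""
    regressions = []
    for service, current in current_ms.items():
        baseline = baseline_ms.get(service)
        if baseline is None or baseline <= 0:
            continue  # a new service call, or no usable baseline to compare against
        ratio = current / baseline
        if ratio >= min_ratio:
            regressions.append((service, baseline, current, ratio))
    # Worst regressions first
    return sorted(regressions, key=lambda item: item[3], reverse=True)


# Hypothetical average durations in milliseconds, one month apart
a_month_ago = {"ServiceA": 100.0, "ServiceB": 200.0}
this_month = {"ServiceA": 210.0, "ServiceB": 220.0, "ServiceC": 50.0}

for service, before, after, ratio in find_regressions(a_month_ago, this_month):
    print(f"{service}: {before:.0f} ms -> {after:.0f} ms ({ratio:.1f}x)")
```

Comparing ratios rather than absolute differences means that a fast call which has doubled in duration is flagged just as readily as a slow one.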

Over the coming months I would like to share more details with you, including some of the Powershell and SQL scripts I've written to gather and analyze the data.

UPDATE:

A post about a Powershell script to sample CPU utilization by process

hexnet.org and the pyramid of hexagons

2012-11-25T02:12:00.001-05:00

The hexnet.org web site recently found my blog posting on why a "pyramid" of hexagons is cubic and linked to it on Google+. Here is a link to their explanation of the same fact.

Blender model for the pyramid of hexagons

2012-11-17T10:49:00.001-05:00

In my previous post I provided a visual explanation of why a pyramid of hexagons is cubic. I created the visuals for that post using Blender 2.64 and Python. For those interested in playing with the Blender model, I have uploaded it (with its embedded Python script) to this Google docs drive.

Bear in mind that this is my first foray into both Blender and Python. So it's unlikely to be "idiomatic" Python or an example of Blender best practices!

To reduce size, the model only contains various cameras and lights. These are contained in layers 0, 8, 9, 10, 18 and 19 as can be seen at the bottom of the 3D view:

To create the blocks and hexagons you need to run the embedded script. First change from the "Default" layout to "Scripting" layout. Ensure the "Text" script is selected. And click on the "Run script" button.

Then change back to the default layout. You will see that layers 1 to 4 and 11 to 14 now show dots to indicate that they contain objects:

Layers 1 to 4 (the numbering starts at zero) in the top left block contain the various shells making up the cube. Layers 11 to 14 in the bottom left block contain the various layers making up the pyramid.

Since these two structures overlap, you shouldn't select both at the same time.

However when you want to start again, you should select just these 8 layers, hit A to toggle selection of all objects, and DELETE to delete the objects. Make sure you haven't selected any of the layers with cameras and lights, otherwise you will accidentally delete these too.

Use F12 to render the image as seen by the selected camera. But first you need to make sure that you have selected the layer that the camera and light are in, otherwise you will just see a silhouette.

Also make sure that the correct camera is selected for the scene...

The Python script allows you to generate an entire cube or pyramid of a particular size using the following lines of code at the end of the script:

createCube(4)
createHexPyramid(4)

However you can also generate just part of the structure. This gives more insight into how the cube can be constructed from the cornerstone outwards. For example, the following code generates the outermost ring of the outermost shell in a 4x4x4 cube:

createRingOfBlocks(4, 4, 4)

And here is the code that does the same for the pyramid:

createRingOfHexTiles(4, 4, 4, radius=1, thickness=1)

...or from another angle...

One quirk to note about the code...

I calculated the South-Western line of blocks to start from the orange block going to the yellow block (in fact, originally the orange block was coloured red). My thinking was to label each line from 0 to n-1, but ignore hex zero (since it is also hex n-1 of the previous line). It turned out that wasn't the best idea as far as hex colours go.

Instead of re-calculating the formulae for all the hexagons, I took a short-cut and simply remapped the colour of each hexagon as if it was the hexagon following it (in clockwise order)...

def getColorForBlockOrHex(layer, ring, direction, position):
    # There is a better colour scheme than the one I chose.
    # Luckily it is easy to change to it. 
    # Each hex/block uses the colour that the hex after it would have been assigned.
    (direction, position) = getDirAndPosOfNextClockwiseBlockOrHex( direction, position, ring)
    startColor = getStartColorForDirection(direction)
    if ring == 1 or ring == 2:
        (r, g, b) = startColor
    else:
        nextDir = getNextDir(direction)
        endColor = getStartColorForDirection(nextDir)
        (r1, g1, b1) = startColor
        (r2, g2, b2) = endColor
        ratio1 = (ring - position)/(ring-1)
        ratio2 = (position - 1)/(ring-1)
        r = ratio1 * r1 + ratio2 * r2
        g = ratio1 * g1 + ratio2 * g2
        b = ratio1 * b1 + ratio2 * b2
    return (r,g,b)
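
The colour calculation itself is just a linear blend between the start colours of two adjacent directions. Here it is in isolation as a standalone Python sketch that runs outside Blender. The function name and the example colours are my own choices for illustration, and I am assuming that positions run from 1 upwards along each line.

```python
def blend_colour(start, end, position, ring):
    """Linearly interpolate between two RGB colours along one side of a ring."""
    if ring <= 2:
        # As in the script: small rings just use the start colour
        # (this also avoids dividing by zero when ring == 1).
        return start
    ratio_start = (ring - position) / (ring - 1)  # 1.0 when position == 1
    ratio_end = (position - 1) / (ring - 1)       # 1.0 when position == ring
    return tuple(ratio_start * s + ratio_end * e for s, e in zip(start, end))


red = (1, 0, 0)
yellow = (1, 1, 0)
for position in range(1, 4):
    print(position, blend_colour(red, yellow, position, ring=3))
```

The two ratios always add up to 1, so each colour channel stays between its start and end values.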

Here is the full Python script:

import bpy
import math

from math import *

# From http://blenderscripting.blogspot.com/2011/05/blender-25-python-selecting-layer.html:
def selectLayer(layer):   
    return tuple(i == layer for i in range(0, 20))

# From http://wiki.blender.org/index.php/Dev:2.5/Py/Scripts/Cookbook/Code_snippets/Materials_and_textures:
def makeMaterial(name, diffuse, specular, alpha):
    mat = bpy.data.materials.new(name)
    mat.diffuse_color = diffuse
    mat.diffuse_shader = 'LAMBERT' 
    mat.diffuse_intensity = 1.0 
    mat.specular_color = specular
    mat.specular_shader = 'COOKTORR'
    mat.specular_intensity = 0.5
    mat.alpha = alpha
    mat.ambient = 1
    return mat
 
def setMaterial(ob, mat):
    me = ob.data
    me.materials.append(mat)

# ************************************************
# Following code by Andrew Tweddle, November 2012:
# ************************************************

# -----------------------------------
# Code common to blocks and hexagons:
# -----------------------------------

class Direction:
    C = 0
    SW = 1
    W = 2
    NW = 3
    NE = 4
    E = 5
    SE = 6

def DirToString(direction):
    if direction == Direction.C:
        return "C"
    elif direction == Direction.SW:
        return "SW"
    elif direction == Direction.W: 
        return "W"
    elif direction == Direction.NW:
        return "NW"
    elif direction == Direction.NE:
        return "NE"
    elif direction == Direction.E:
        return "E"
    else:
        return "SE"


def getNextDir(direction):
    if direction == Direction.C:
        return Direction.C
    elif direction == Direction.SE:
        return Direction.SW
    else:
        return direction + 1

def getDirAndPosOfNextClockwiseBlockOrHex(direction, position, ring):
    if direction == Direction.C:
        return (direction, position)
    if position == ring - 1:
        newDir = getNextDir(direction)
        return (newDir, 1)
    return (direction, position + 1)

def getStartColorForDirection(direction):
    if direction == Direction.C:
        return (1,1,1)  # White
    elif direction == Direction.SW:
        return (1, 0, 0)  # Red
    elif direction == Direction.W:
        return (1, 1, 0)  # Yellow
    elif direction == Direction.NW:
        return (0, 1, 0)  # Green
    elif direction == Direction.NE:
        return (0, 1, 1) # Cyan
    elif direction == Direction.E:
        return (0, 0, 1)  # Blue
    else:  # direction == Direction.SE
        return (1, 0, 1)  # Magenta
    
def getColorForBlockOrHex(layer, ring, direction, position):
    # There is a better colour scheme than the one I chose.
    # Luckily it is easy to change to it. 
    # Each hex/block uses the colour that the hex after it would have been assigned.
    (direction, position) = getDirAndPosOfNextClockwiseBlockOrHex( direction, position, ring)
    startColor = getStartColorForDirection(direction)
    if ring == 1 or ring == 2:
        (r, g, b) = startColor
    else:
        nextDir = getNextDir(direction)
        endColor = getStartColorForDirection(nextDir)
        (r1, g1, b1) = startColor
        (r2, g2, b2) = endColor
        ratio1 = (ring - position)/(ring-1)
        ratio2 = (position - 1)/(ring-1)
        r = ratio1 * r1 + ratio2 * r2
        g = ratio1 * g1 + ratio2 * g2
        b = ratio1 * b1 + ratio2 * b2
    return (r,g,b)

    
# ----------------------------------
# Code specific to blocks and cubes:
# -----------------------------------
 
def getBlockPositionInLastLayer(ring, direction, position):
    if direction == Direction.C:
        return (0,0,0)
    elif direction == Direction.SW:
        return (ring-1, ring-position-1,0)
    elif direction == Direction.W:
        return (ring-1, 0,position)
    elif direction == Direction.NW:
        return (ring-position-1, 0, ring-1)
    elif direction == Direction.NE:
        return (0, position,ring-1)
    elif direction == Direction.E:
        return (0, ring-1, ring-position-1)
    else:  # direction == Direction.SE
        return (position, ring-1, 0)

def createBlock(layer, ring, direction, position, numberOfLayers):
    (x,y,z) = getBlockPositionInLastLayer(ring, direction, position)
    if layer < numberOfLayers:
        offset = numberOfLayers-layer
        (x,y,z) = (x+offset,y+offset,z+offset)
    blockName = "Block_L" + str(layer) + "R" + str(ring) + DirToString(direction) + "_" + str(position)
    blenderLayer = layer - ring + 1  # For bottom-up layering use: blenderLayer = layer
    layerSelection = selectLayer(blenderLayer)
    bpy.context.scene.layers[blenderLayer] = True  # Otherwise bpy.context.object is not the newly added cube!
    bpy.ops.mesh.primitive_cube_add(location=(x, y, z), layers=layerSelection)
    bpy.context.object.name = blockName
    bpy.context.object.dimensions = (0.995,0.995,0.995)
    # Note: Made it slightly smaller to create shadows between adjacent blocks of the same colour
    # 
    # Create a material for the new block:
    diffuseColor = getColorForBlockOrHex(layer, ring, direction, position)
    matName = "Mat_" + blockName
    mat = makeMaterial(name = matName, diffuse = diffuseColor, specular=(1,1,1), alpha=1)
    setMaterial(bpy.context.object, mat)

def createLineInRingOfBlocks(layer, ring, direction, numberOfLayers):
    for pos in range(1,ring):
        createBlock(layer, ring, direction, pos, numberOfLayers)

def createRingOfBlocks(layer, ring, numberOfLayers):
    if ring == 1:
        createBlock(layer, 1, Direction.C, 1, numberOfLayers)
    else:
        for direction in range(1,7):  # Direction.SW .. Direction.SE
            createLineInRingOfBlocks(layer, ring, direction, numberOfLayers)
        
def createLayerOfBlocks(layer, numberOfLayers):
    for ring in range(1,layer+1):
        createRingOfBlocks(layer, ring, numberOfLayers)

def createCube(numberOfLayers):
    for layer in range(1, numberOfLayers+1):
        createLayerOfBlocks(layer, numberOfLayers)


# ----------------------------------
# Code specific to hexagons:
# -----------------------------------
def getHexPositionOnFlatSurface(ring, direction, position, radius):
    sqrt3 = sqrt(3)  # sqrt is provided by "from math import *"; math.sqrt would need "import math"
    if direction == Direction.C:
        return ( 0, 0)
    elif direction == Direction.SW:
        return ( -1.5 * position * radius, -sqrt3 * (ring-1) * radius + sqrt3 / 2 * position * radius)
    elif direction == Direction.W:
        return ( -1.5 * (ring - 1) * radius, -sqrt3 / 2 * (ring - 1 - 2 * position ) * radius)
    elif direction == Direction.NW:
        return ( -1.5 * (ring - 1 - position) * radius, sqrt3 / 2 * (ring - 1 + position ) * radius )
    elif direction == Direction.NE:
        return ( 1.5 * position * radius, sqrt3 * ( ring - 1 - position / 2 ) * radius )
    elif direction == Direction.E:
        return ( 1.5 * (ring-1) * radius, sqrt3 / 2 * radius * ( ring - 1 - 2 * position))
    else:  # direction == Direction.SE
        return ( 1.5 * radius * ( ring - 1 - position), - sqrt3 / 2 * radius  * ( ring - 1 + position ))
    
def getHexPosition(layer, ring, direction, position, numberOfLayers, radius, thickness):
    (x,y) = getHexPositionOnFlatSurface(ring, direction, position, radius)
    z = thickness * (numberOfLayers - layer)
    return (x,y,z)

def createHexTile(layer, ring, direction, position, numberOfLayers, radius, thickness):
    (x,y,z) = getHexPosition(layer, ring, direction, position, numberOfLayers, radius, thickness)
    hexName = "Hex_L" + str(layer) + "R" + str(ring) + DirToString(direction) + "_" + str(position)
    blenderLayer = 10 + layer - ring + 1  # For bottom-up layering, use blenderLayer = 10 + layer
    # Note: hexagons are in a separate set of layers from blocks
    layerSelection = selectLayer(blenderLayer)
    bpy.context.scene.layers[blenderLayer] = True  # Otherwise bpy.context.object is not the newly added hex tile!
    # Make it slightly smaller to create shadows between adjacent hexes of the same colour:
    adjustedThickness = thickness * 0.995
    adjustedRadius = radius * 0.995
    bpy.ops.mesh.primitive_cylinder_add(vertices=6, radius=adjustedRadius, depth=adjustedThickness, end_fill_type='NGON', location=(x,y,z), rotation=(0,0,radians(30)), layers=layerSelection)
    bpy.context.object.name = hexName
    # Create a material for the new hex tile:
    diffuseColor = getColorForBlockOrHex(layer, ring, direction, position)
    matName = "Mat_" + hexName
    mat = makeMaterial(name = matName, diffuse = diffuseColor, specular=(1,1,1), alpha=1)
    setMaterial(bpy.context.object, mat)

def createLineInRingOfHexTiles(layer, ring, direction, numberOfLayers, radius, thickness):
    if ring == 1:
        createHexTile(layer, 1, Direction.C, 1, numberOfLayers, radius, thickness)
    else:
        for pos in range(1,ring):
            createHexTile(layer, ring, direction, pos, numberOfLayers, radius, thickness)

def createRingOfHexTiles(layer, ring, numberOfLayers, radius, thickness):
    if ring == 1:
        createLineInRingOfHexTiles(layer, ring, Direction.C, numberOfLayers, radius, thickness)
    else:
        for direction in range(1,7):  # Direction.SW .. Direction.SE
            createLineInRingOfHexTiles(layer, ring, direction, numberOfLayers, radius, thickness)
    
def createLayerOfHexTiles(layer, numberOfLayers, radius, thickness):
    for ring in range(1,layer+1):
        createRingOfHexTiles(layer, ring, numberOfLayers, radius, thickness)

def createHexPyramid(numberOfLayers, radius = 1, thickness = 1):
    for layer in range(1, numberOfLayers+1):
        createLayerOfHexTiles(layer, numberOfLayers, radius, thickness)


# --------------------------------
# Generate cubes and hex pyramids:
# --------------------------------

# Use any number of layers up to 7.
# More than that and hexes/blocks will be placed in camera layers, 
# making deleting more difficult...
createCube(4)
createHexPyramid(4)


# -------------------------------------------------------------
# Sample code to generate cubes and hex pyramids incrementally:
# -------------------------------------------------------------
 
# You can build up the cube incrementally e.g.
# createLineInRingOfBlocks(4, 4, Direction.SW, 4)
# createLineInRingOfHexTiles(4, 4, Direction.SW, 4, radius=1, thickness=1)
# 
# createRingOfBlocks(4, 4, 4)
# createRingOfHexTiles(4, 4, 4, radius=1, thickness=1)

Anyway, that's it. Enjoy!

Why a pyramid of hexagons is cubic

2012-11-11T05:04:00.000-05:00

Below is a "pyramid" made out of hexagonal blocks:

Surprisingly, such a structure always contains a cubic number of hexagonal tiles (aka hexes). The structure above is 4 high and contains 4 x 4 x 4 = 64 hexes.

In a previous blog post, I gave an algebraic proof of why this must be so. Now I present a visual demonstration to help you really understand why!

In a way, what I'm trying to show is this...

Let's look at the pyramid of hexagons from almost directly above. See how each hex has an equivalent block in the cube...

But what about the hidden pieces in the pyramid and cube - do they also correspond directly?

Yes, they do. The visible pieces of each structure form a "shell" over that structure. Remove these shells and you are left with a smaller pyramid and cube. The outer shells of these smaller structures map across in exactly the same way. Keep on peeling away the layers and you see that every hex tile maps to a unique block in the cube and vice versa.
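
The peeling argument can also be checked by brute force. The sketch below is a Blender-free reworking of the block-placement rules from the script elsewhere on this blog. The names `block_position` and `cube_cells`, and the use of strings for directions, are shorthand chosen for this example rather than part of that script. Each (layer, ring, direction, position) key names one hex tile in the pyramid, so if every key lands on a different cell and the cells fill the cube, the mapping is one-to-one.

```python
from itertools import product

DIRECTIONS = ("SW", "W", "NW", "NE", "E", "SE")

def block_position(ring, direction, position):
    """Cell of a block in the outermost shell (three faces meeting at the origin)."""
    m = ring - 1
    if direction == "C":
        return (0, 0, 0)
    return {
        "SW": (m, m - position, 0),
        "W": (m, 0, position),
        "NW": (m - position, 0, m),
        "NE": (0, position, m),
        "E": (0, m, m - position),
        "SE": (position, m, 0),
    }[direction]

def cube_cells(number_of_layers):
    """One cell per (layer, ring, direction, position) key, i.e. per hex tile."""
    cells = []
    for layer in range(1, number_of_layers + 1):
        offset = number_of_layers - layer  # inner shells are shifted along the diagonal
        for ring in range(1, layer + 1):
            if ring == 1:
                keys = [("C", 1)]
            else:
                keys = [(d, p) for d in DIRECTIONS for p in range(1, ring)]
            for direction, position in keys:
                x, y, z = block_position(ring, direction, position)
                cells.append((x + offset, y + offset, z + offset))
    return cells

cells = cube_cells(4)
print(len(cells), len(set(cells)))                      # 64 64
print(set(cells) == set(product(range(4), repeat=3)))   # True
```

No two tiles share a cell, and together they cover all 4 x 4 x 4 = 64 cells.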

I set up a spreadsheet to show how this nesting works...

And that, my friends, is how beautiful Mathematics can be!

Some hints for why the pyramid of hexagons is cubic

2012-11-08T10:12:00.000-05:00

In my previous two posts I noted that a pyramid built of hexagonal boards contains a cubic number of hexagons and I provided an algebraic proof of this.

My goal is to provide a visual explanation of why this is so.

For those who would like to tackle this challenge themselves, here are a few visual hints...

Is the following shape a hexagon?

Or the silhouette of a cube?

Or the silhouette of the base and two adjacent sides of the same cube?

It's also possible to view this last figure from below, like this:

Looking from below makes it easier to see the cornerstone and edge pieces, which aren't visible in the previous figure, so you can count the blocks more easily.

At this point you might want to count the number of blocks adjacent to the white cornerstone (including diagonally). And the number of blocks 2 steps away. And the number of blocks 3 steps away.

Are you seeing a pattern yet?
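
If you would like to check your counts afterwards, here is a small sketch. It rests on a few assumptions of mine: the base and two adjacent sides are modelled as the blocks of an n x n x n cube with at least one coordinate equal to 0, the white cornerstone sits at (0, 0, 0), and a diagonal move counts as a single step. The helper name `shell_counts` is made up for this example.

```python
from collections import Counter
from itertools import product

def shell_counts(n):
    """Count shell blocks by number of steps from the cornerstone at (0, 0, 0)."""
    counts = Counter()
    for cell in product(range(n), repeat=3):
        if min(cell) == 0:          # block lies on the base or one of the two sides
            counts[max(cell)] += 1  # diagonal moves allowed, so steps = largest coordinate
    return dict(sorted(counts.items()))

print(shell_counts(4))  # {0: 1, 1: 6, 2: 12, 3: 18}
```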

Proof that a pyramid of hexagons contains a cubic number of tiles

2012-11-03T15:39:00.001-04:00

In my previous blog post I observed that there are a cubic number of hexagons in a pyramid of hexagonal boards (such as found in the game Take It Easy).

My main aim is to provide a visual explanation of why this is so. But first I would like to prove the result algebraically...

The first few equations are reminders of some well-known mathematical results which we will need.

If f is a function defined on the numbers 0, 1, ..., n, then

$$ \sum_{{i}={1}}^{n}[f(i)-f(i-1)] = f(n)-f(0) \tag{1}$$
This is true because if you expand out the summation, all the terms except f(n) and f(0) cancel out:
$$ [f(n)-f(n-1)] + [f(n-1)-f(n-2)]+...+[f(2)-f(1)] + [f(1)-f(0)] $$
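
A quick numerical check of equation (1), as a sketch only: the helper name `telescoped_sum` and the sample function `f` are arbitrary choices for illustration.

```python
def telescoped_sum(f, n):
    """Left-hand side of equation (1): sum of f(i) - f(i-1) for i = 1..n."""
    return sum(f(i) - f(i - 1) for i in range(1, n + 1))

def f(i):
    return i**3 + 2 * i  # any function defined on 0..n will do

n = 10
print(telescoped_sum(f, n), f(n) - f(0))  # 1020 1020
```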

Recall the formula for the triangular number series: $$ \sum_{{i}={1}}^{n}i= \frac{n \cdot \left(n+1\right)}{2} \tag{2} $$
[You can easily prove this using equation 1. Just set $ f(i) = \frac{i(i+1)}{2} $. Then $ f(i) - f(i-1) = i $ .]
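
Written out, the difference is:
$$ f(i) - f(i-1) = \frac{i(i+1)}{2} - \frac{(i-1)i}{2} = \frac{i \cdot [(i+1) - (i-1)]}{2} = i $$
Since $ f(0) = 0 $, equation (1) then gives $ \sum_{{i}={1}}^{n}i = f(n) - f(0) = \frac{n(n+1)}{2} $.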

Expand: $$ (n-1)^{3} = n^{3} -3n^{2} + 3n - 1 $$ Then rearrange the terms to give: $$ n^{3} - (n-1)^{3} = 3n^{2}- 3n + 1 \tag{3} $$

A hexagonal board can be seen as concentric rings of hexagonal tiles. Ring 1, the innermost "ring", simply contains the single hexagon at the centre of the board. Ring 2, just outside it, contains 6 hexagons. Ring 3 contains 12. Ring 4 contains 18, and so forth.

So the number of hexagons in ring k is:
$$ r(k)=\begin{cases} 1& \text{if } k = 1,\\ 6(k-1)& \text{if } k > 1 \end{cases} $$
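
The ring sizes can be confirmed by brute force. The sketch below assumes axial coordinates (q, r) for the hexes, a common convention that this post does not itself use, in which the number of steps from the centre hex is max(|q|, |r|, |q + r|). Ring k is then the set of hexes k - 1 steps from the centre. The function name `ring_size` is made up for this example.

```python
def ring_size(k):
    """Count the hexes in ring k by scanning axial coordinates (q, r)."""
    d = k - 1  # ring k sits d steps from the centre hex
    return sum(
        1
        for q in range(-d, d + 1)
        for r in range(-d, d + 1)
        if max(abs(q), abs(r), abs(q + r)) == d
    )

print([ring_size(k) for k in range(1, 6)])  # [1, 6, 12, 18, 24]
```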

Hence the number of hexagons in all the rings from 1 to k is:

$$ \begin{align*} h(k) &= \sum_{{i}={1}}^{k}r(i) \\ &= r\left(1\right) + \sum_{{i}={2}}^{k}[6(i-1)] \\ &= 1 + 6 \cdot \sum_{{i}={1}}^{k-1}i & \text{by re-indexing the summation} \\ &= 1 + 6 \cdot \frac{ \left(k-1\right) \cdot k }{2} & \text{by equation (2)} \\ &= 1 + 3k^{2} - 3k \\ &= 3k^{2} - 3k + 1 \tag{4} \end{align*} $$

Let $ f(k) = k^{3} $

Then:
$$ \begin{align*} f(k) - f(k-1) &= k^{3} - (k-1)^{3} \\ &= 3k^{2} - 3k + 1 & \text{by equation (3) } \\ &= h(k) & \text{by equation (4) } \end{align*} $$ Hence: $$ h(k) = f(k) - f(k-1) \tag{5} $$

So the number of hexagonal tiles in a hexagonal pyramid with n layers is:

$$ \begin{align*} p(n) &= \sum_{{k}={1}}^{n}h(k) \\ &= \sum_{{k}={1}}^{n}[f(k) - f(k-1)] & \text{by equation (5)} \\ &= f(n) - f(0) & \text{by equation (1)} \\ &= n^{3} - 0^{3} \\ &= n^{3} \end{align*} $$

Q.E.D.
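
As a sanity check on the algebra, here is a short sketch that builds the totals directly from the ring sizes and compares them with the closed forms. The function names mirror the symbols r, h and p used above; the code is only an illustration, not part of the proof.

```python
def r(k):
    """Hexes in ring k."""
    return 1 if k == 1 else 6 * (k - 1)

def h(k):
    """Hexes in a board made of rings 1..k."""
    return sum(r(i) for i in range(1, k + 1))

def p(n):
    """Hexes in a pyramid whose layers are boards of 1, 2, ..., n rings."""
    return sum(h(k) for k in range(1, n + 1))

print([h(k) for k in range(1, 5)])  # [1, 7, 19, 37]
print([p(n) for n in range(1, 6)])  # [1, 8, 27, 64, 125]
```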