This is an old revision of the document!
Table of Contents
How Parkrun volunteers sort barcodes - a computer scientist's perspective
On the Christmas day of 2018, I volunteered at Norwich Parkrun. Towards the end, I ended up helping out with sorting the plastic barcodes. I find the whole process interesting. This is because sorting algorithm is an essential part of computer science curriculum 1). I do have a special connection to sorting algorithm - I bumped into Sir Charles Antony Richard Hoare FRS FREng 2)3), the inventor of quicksort algorithm 4), the day before I was formally taught quicksort for the second time 1) as a computer science undergrad in University of York.
Sorting numbers inside a computer is a bit different to sorting objects in physical world. This is mainly because the uniform cost model does not apply in the physical world 5). However some of the concepts in algorithm analysis can still be applied to physical world, for example time complexity and space complexity 6).
In this blog post, we analyse the algorithm which Parkrun volunteers use to sort barcodes, using some concepts from computer science.
Analysis of algorithms
An algorithm is an unambiguous set of instruction to solve a certain class of problem. A class of problem may be solved by multiple distinctly different algorithms. Different algorithms may be designed for different situation. They may have different advantages and disadvantages.
For example, there are different ways of doing multiplication. One can perform multiplication by repeated addition. However, more commonly for human, multiplication involving two single-digit number is done by reciting the multiplication table (also known as the “times table”). This is because repeated addition can take a long time. For multiplying two numbers which have multiple digits, one might prefer to perform long multiplication, which itself involves reciting the multiplication table. These three methods of performing multiplication are equally valid. They are all examples of algorithm.
In human, multiplication by repeated addition takes such a long time, so it only tends to be used as a last ditch effort for primary school children to complete their exam - it is only used by pupils who cannot remember their multiplication table. In computers, multiplication by repeated addition can be easily implemented on by the use of a for-loop. Although this does not happen very often, as multiplication is supported by most hardware. Although long multiplication and reciting the multiplication table is popular amongst human, computers tend not to use that. Computers use the “shift and add” algorithm, partly because computers operate in binary 7).
Different algorithms have different “complexity”. The complexity of an algorithm describes how the algorithm behaves as the input size changes. There are two kind of complexities - time complexity and space complexity. Time complexity describes how the number of steps required by an algorithm scales, as the input size increases. The space complexity describes how the temporary storage space required by an algorithm scales, as the input size increases.
We describe algorithmic complexity using the “big O notation”. The letter O stands for “the order of the function”. So if one describes the time complexity of function $T$ as $T(n) = O(n^2)$, it means that for function $T$ to process $n$ inputs, the steps required scales in the order of $n^2$. In layman's term, for function $T$ to process $n$ inputs, it needs at least $n^2$ amount of steps. I know some computer scientists and mathematician think that the big O notation is effectively an abuse of mathematical notation, but this is literally how the world works. Oh wells.
The same notation can be applied for the (temporary) space requirement of an algorithm. You can think of this as the amount of scratch paper you need. For example, to do long multiplication, you need at least $3+n$ lines, $n$ being the number of digits of your shortest number. If you don't believe me, you can try it out yourself. The number $3$ comes from your two multiplicand and your results, the $n$ comes from each intermediate lines of your multiplication. This is assuming that you know about commutativity of multiplication, and you always put the longest multiplicand in your first line of your long multiplication! Of course in computers, this refer to the temporary storage space for your variable.
Cost models
When we analyse algorithms on a computer, we tend to use uniform cost model 8), this assumes that the time taken for certain fundamental operation is the same, for example, accessing an element of an array, addition, subtraction, multiplication, division and comparison.
The uniform cost model is certainly not true for human. A good example would be accessing an element of an array. For a computer, it makes no difference whether you are trying to access the 12th element of an array or the 112th element of an array. However, in the physical world, things are slightly different. First of all, we need to define what an “array” is in the physical world. It can be a table of numbers, or a stack of plate. Either way, there is going to be some distance between the 12th element of an array and the 112th element of an array. So accessing them is going to take different amount of time. This can be especially a problem if you did not label your elements, and you end up having to count from the first element of an array. (I suppose this can be a problem for computers too, if you decided to use linked list to implement your data structure.)
Bucket sort
Having got the basics of algorithmic analysis out of the way, we can finally look at how Parkrun volunteers sort the bar codes. Parkrun volunteers use a physical implementation of bucket sort 9)10). This is illustrated by the following two pictures taken at Parkrun.
Bucket sort is a distribution sort - the original input is split into multiple substructure, where comparison sort is performed. It is effectively a two-stage sorting algorithm, how it works can be summarised in the following four steps:
- Initialise the empty buckets.
- Put each object into their corresponding bucket.
- Sort each non-empty bucket using a comparison sort algorithm.
- Visit the buckets in order and put all elements back into the original array.
On computers, bucket sort has the following properties:
- average time complexity $O(n+\frac{n^2}{k}+k)$,
- worst case time complexity: $O(n^2)$,
- worst case space complexity: $O(n \cdot k)$,
where $k$ is the number of buckets.
In Parkrun's particular implementation, the total number of buckets is not defined. Instead, each bucket is set to hold 25 barcodes. The total number of buckets depends on the amount of plastic barcodes.
The advantage of Parkrun's two-stage sorting approach is that the