public:how_parkrun_volunteers_sort_barcodes_-_a_computer_scientist_s_perspective
Differences
This shows you the differences between two versions of the page.
public:how_parkrun_volunteers_sort_barcodes_-_a_computer_scientist_s_perspective [2018/12/27 23:43] – fangfufu | public:how_parkrun_volunteers_sort_barcodes_-_a_computer_scientist_s_perspective [2018/12/28 11:23] (current) – fangfufu | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== How Parkrun volunteers sort barcodes - a computer scientist' | ====== How Parkrun volunteers sort barcodes - a computer scientist' | ||
- | On Christmas Day 2018, I volunteered at Norwich Parkrun. Towards the end, I ended up helping out with sorting the plastic barcodes. I found the whole process interesting, because sorting algorithms are an essential part of the computer science curriculum [(https:// | + | On Christmas Day 2018, I volunteered at Norwich Parkrun. Towards the end, I ended up helping out with sorting the plastic barcodes. I found the whole process interesting, because sorting algorithms are an essential part of the computer science curriculum [(https:// |
Sorting numbers inside a computer is a bit different from sorting objects in the physical world. This is mainly because the uniform cost model does not apply in the physical world [(costmodel > https:// | Sorting numbers inside a computer is a bit different from sorting objects in the physical world. This is mainly because the uniform cost model does not apply in the physical world [(costmodel > https:// | ||
- | In this blog post, we analyse the algorithm which Parkrun volunteers use to sort barcodes, using some concepts from computer science. | + | In this blog post, we analyse the algorithm which Parkrun volunteers use to sort barcodes, using some concepts from computer science. This blog post is written so that you can follow it even if you have not formally studied |
- | ===== Algorithm analysis | + | In short, Parkrun uses bucket sort to sort its barcodes after each event. I do not think I have encountered an implementation of bucket sort on computers, yet I was treated to a real-life implementation of bucket sort on Christmas Day. |
+ | |||
+ | ===== Analysis of algorithms | ||
An algorithm is an unambiguous set of instructions to solve a certain class of problems. A class of problems may be solved by multiple distinctly different algorithms. Different algorithms may be designed for different situations. They may have different advantages and disadvantages. | An algorithm is an unambiguous set of instructions to solve a certain class of problems. A class of problems may be solved by multiple distinctly different algorithms. Different algorithms may be designed for different situations. They may have different advantages and disadvantages. | ||
Line 26: | Line 28: | ||
===== Bucket sort ===== | ===== Bucket sort ===== | ||
- | Having got the basics of algorithmic analysis out of the way, we can finally look at how Parkrun volunteers sort the bar codes. Parkrun volunteers use a physical implementation of bucket sort [(https:// | + | Having got the basics of algorithmic analysis out of the way, we can finally look at how Parkrun volunteers sort the barcodes. Parkrun volunteers use a physical implementation of bucket sort [(bucketsort > https:// |
<columns 100% - - > | <columns 100% - - > | ||
Line 36: | Line 38: | ||
Bucket sort is a distribution sort - the original input is split into multiple substructures, | Bucket sort is a distribution sort - the original input is split into multiple substructures, | ||
- Initialise the empty buckets. | - Initialise the empty buckets. | ||
- | - Put each object into its corresponding bucket. | + | - Put each object into its corresponding bucket |
- | - Sort each non-empty bucket using a comparison sort algorithm. | + | - Sort each non-empty bucket using a comparison sort algorithm |
- | - Visit the buckets | + | - Visit each bucket |
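The steps listed above can be sketched in Python. This is only a minimal illustration, not a description of what Parkrun actually does: it assumes the barcode tokens carry non-negative integer position numbers, and the name `bucket_sort` and the bucket width of 100 are made up for the example. Python's built-in `sorted` stands in for the comparison sort used inside each bucket.

```python
def bucket_sort(numbers, bucket_size=100):
    """Sort non-negative integers by distributing them into range buckets.

    bucket_size is a hypothetical width: bucket 0 holds 0-99,
    bucket 1 holds 100-199, and so on.
    """
    if not numbers:
        return []

    # Step 1: initialise the empty buckets.
    bucket_count = max(numbers) // bucket_size + 1
    buckets = [[] for _ in range(bucket_count)]

    # Step 2: put each item into its corresponding bucket.
    for number in numbers:
        buckets[number // bucket_size].append(number)

    # Steps 3 and 4: sort each non-empty bucket with a comparison sort,
    # then visit the buckets in order and gather their contents.
    result = []
    for bucket in buckets:
        if bucket:
            result.extend(sorted(bucket))
    return result
```

Because every number in a lower bucket is smaller than every number in a higher bucket, gathering the sorted buckets in order yields a fully sorted list.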
On computers, bucket sort has the following properties: | On computers, bucket sort has the following properties: | ||
Line 48: | Line 50: | ||
where $k$ is the number of buckets. | where $k$ is the number of buckets. | ||
- | In Parkrun' | + | In Parkrun' |
+ | |||
+ | The advantage of Parkrun' | ||
+ | |||
+ | ==== Distributing barcodes into their respective buckets ==== | ||
+ | The process of distributing barcodes to their respective buckets (stage 1) is a completely stateless process. Volunteers can join and exit the sorting process any time they want. The volunteer pool operates much like a thread pool [(https:// | ||
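The thread pool analogy can be made concrete with a small Python sketch. It is an illustration under assumptions rather than a model of the real process: the tokens are taken to be non-negative integers, and `BUCKET_SIZE`, `bucket_index`, and `distribute` are hypothetical names. The point is that deciding which bucket a token belongs to needs nothing but the token itself, so any worker can handle any token.

```python
from concurrent.futures import ThreadPoolExecutor

BUCKET_SIZE = 100  # hypothetical bucket width, chosen for the example


def bucket_index(token):
    """Stateless task: the answer depends only on the token itself."""
    return token // BUCKET_SIZE


def distribute(tokens, volunteers=4):
    """Let a pool of workers decide the bucket for each token."""
    tokens = list(tokens)
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        # map() hands tokens to whichever worker is free and
        # returns the results in the original token order.
        indices = list(pool.map(bucket_index, tokens))

    # Gather the tokens into their buckets.
    buckets = {}
    for token, index in zip(tokens, indices):
        buckets.setdefault(index, []).append(token)
    return buckets
```

Changing `volunteers` changes only how many workers share the load, not the result, which mirrors volunteers joining and leaving mid-task.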
+ | |||
+ | In fact, I took advantage of the stateless and atomic nature of the stage 1 workload when I was helping out. My hands were quite cold and I was losing dexterity, so I exited the task. I then went and got a cup of hot chocolate to warm up my hands. I rejoined the task once my hands were warm. | ||
+ | |||
+ | The first stage establishes a partial ordering [(https:// | ||
+ | |||
+ | ==== Sorting individual buckets using insertion sort ==== | ||
+ | Different worker threads (volunteers) tend to use different sorting algorithms. Personally, I use insertion sort, which is a comparison sort. | ||
+ | |||
+ | Insertion sort is simple - this is how it works: | ||
+ | - Get a list of unsorted items. | ||
+ | - Divide the list into two partitions - sorted and unsorted. Set a marker for the sorted partition after the first item in the list. The first item is now marked as sorted. The rest of the items are marked as unsorted. | ||
+ | - Repeat steps 4 to 6 until the unsorted partition is empty. | ||
+ | - Select the first unsorted item. | ||
+ | - Swap this item backwards into the sorted partition until it arrives at its correct sorted position. | ||
+ | - Advance the marker, so the sorted partition increases its size by one, and the unsorted partition decreases its size by one. | ||
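The steps above translate almost line for line into Python. This is a minimal sketch, and the function name `insertion_sort` and the variable `marker` are simply chosen to match the description; it works on a copy so that the caller's list is left alone.

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort."""
    items = list(items)  # step 1: the list of unsorted items

    # Step 2: everything before the marker is sorted.
    # Initially that is just the first item.
    marker = 1

    # Step 3: repeat until the unsorted partition is empty.
    while marker < len(items):
        # Step 4: select the first unsorted item.
        position = marker

        # Step 5: swap it backwards until it reaches its sorted position.
        while position > 0 and items[position - 1] > items[position]:
            items[position - 1], items[position] = items[position], items[position - 1]
            position -= 1

        # Step 6: advance the marker.
        marker += 1

    return items
```

If the input is already sorted, the inner loop never runs, which is where the linear best case comes from. A reverse-sorted input forces every item to travel all the way to the front, giving the quadratic worst case.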
+ | |||
+ | On computers, insertion sort has the following properties: | ||
+ | * best case time complexity: $O(n)$, | ||
+ | * average time complexity: $O(n^2)$, | ||
+ | * worst case time complexity: $O(n^2)$, | ||
+ | |||
+ | * worst case space complexity: $O(n)$ in total, of which only $O(1)$ is auxiliary. | ||
+ | |||
+ | More information about insertion sort can be found at [(https:// | ||
+ | |||
+ | There are also video animations of insertion sort on the Internet: | ||
+ | <columns 100% - - > | ||
+ | <WRAP centeralign> | ||
+ | //You might want to turn off the audio for this video. // | ||
+ | |||
+ | {{youtube> | ||
+ | </ | ||
+ | < | ||
+ | <WRAP centeralign> | ||
+ | //Leave the audio on for some Romanian folk dance music. // | ||
+ | |||
+ | {{youtube> | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | As a human, I like insertion sort, because it lets me know how far I have got. The only state information I need to track is the position of the partition marker. This can be done easily by holding the unsorted barcodes in my hand and placing the sorted barcodes on the floor. | ||
+ | |||
+ | However, I made some physical optimisations. I collapsed consecutive sorted barcodes into clusters. This is because physically inserting a barcode involves moving all the adjacent barcodes, which takes quite a bit of effort on a rough surface. The two figures below illustrate what I mean: | ||
+ | |||
+ | <columns 100% 50% 50% > | ||
+ | <WRAP centeralign> | ||
+ | // | ||
+ | |||
+ | {{: | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | <WRAP centeralign> | ||
+ | //Partially collapsed clusters of consecutive barcodes. This made insertion much easier. // | ||
+ | |||
+ | {{: | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | Preallocation of enough empty space for the missing barcodes should help with the problem as well - that way you don't need to shuffle those out-of-place barcodes. | ||
+ | |||
+ | ===== Conclusion ===== | ||
+ | I quite enjoyed participating in barcode sorting. It is quite interesting to observe a physical implementation of bucket sort. A lot of computer algorithms are inspired by real-life processes. I think perhaps the bucket sort algorithm you learn from textbooks is an abstraction of the real-life bucket sort process. After all, Wikipedia does not say who came up with bucket sort [(bucketsort)]. It really does make me think: a lot of assumptions in the physical world do not apply in computers, and a lot of assumptions that work in computers do not apply in real life. If real-life processes can inspire computer algorithms, then surely computer algorithms can inspire real-life processes. | ||
+ | |||
+ | ===== Further reading ===== | ||
+ | There are other topics which link computer science to the way the physical world works, for example operational research [(https:// | ||
+ | |||
- | The advantage of Parkrun' | ||
public/how_parkrun_volunteers_sort_barcodes_-_a_computer_scientist_s_perspective.1545954200.txt.gz · Last modified: 2018/12/27 23:43 by fangfufu