User Tools

Site Tools


public:on_uea_gig_ticket_price_and_ftse_100

This is an old revision of the document!


On UEA Gig Ticket Price and FTSE 100

Please don't take any of the findings too seriously!

Abstract

By analysing The Gig List 1), I discovered that a ticket to a UEA SU's gig has been increasing on average 7.57%, since sterling decimalisation 2) This is higher than the inflation rate in most years. I also compared the price inflation of the gig ticket against FTSE 100. I discovered that between 1985-01-01 and 2017-01-01, FTSE 100 on average has grown by 309%, while average UEA gig ticket price has grown by 936%. This means that if UEA SU was a commercial organisation, it might have been more effective to invest in UEA SU than an index fund linked to FTSE 100.

Finally, some of these results were presented on Instagram and Concrete Confession. Reaction from others have been fairly interesting. The way these data get presented can certainly affect people's opinion of UEA SU.

Background

On Friday evening, I played Scrabble with AV. I threatened to write a program to solve Scrabble, because I was so bad at playing that game. Writing a Scrabble solver is a two part problem - first you need an anagram solver, second you need to search through the board to find the highest scoring solution. But then I discovered Scrabble solver on the Internet, which I gave me a massive boost, and I managed to defeat AV in one game. However, cheating makes the game so boring. Since the problem solving part of my brain was active, I was actively looking for problems to solve.

After AV has left, I arrived at UEA SU shop to get some snacks. I discovered a book named The Gig List, which contains the list of gigs played in UEA venue between 1966-02-22 to 2018-07-20. It cost £6. I immediate realised the historical significance of this book. It is a good tool for economics research I came up with the idea of investigating how the price of gigs changed over time.

Objective

  1. Convert The Gig List from the paper format to text format.
  2. Import The Gig List into Microsoft Excel, to plot a graph to show how price has changed over time.
  3. Analyse the graph.
  4. Disseminate findings from the analysis.

Converting The Gig List from paper format to Excel spreadsheet

From paper to image

Initially I planned to sacrifice my copy of the Gig List. I started by cutting off the spine of the book using a pair of scissors, and scan the pages using the departmental scanner, which has a document autofeeder. However, after 20% in, I realised that this task was too painful for my delicate hands. I then discovered that there is a digital version of the book3). However the downloading of that document had been disabled. I had to use a third party tool 4). The downloaded PDF file does not include any of the textual information. Only images were downloaded. This meant that I still had to perform OCR on the resulting image. I suppose the good thing is that the quality of the input image for OCR is guaranteed, the bad thing is that I wasted my £6.

From image to text

To convert the downloaded images to text, I ran the following command:

pdfimages Gig_list.pdf -all ./Gig_list
for i in {000..088}; do tesseract Gig_list-$i.jpg Gig_list-$i -l eng -c preserve_interword_spaces=1 ; done
cat jpg/*.txt | grep ^[[:digit:]]?*[[:alnum:]]?*  | grep -E 'p$'\|£ > Gig_list_semi_valid.txt
cat Gig_list_semi_valid.txt|sed -r -e 's/^.{2}/&£/'|sed -r -e 's/^.{7}/&£/'|sed -r -e 's/^.{10}/&£/' > Gig_list_delimited.txt

OCR

I tried various OCR engines, including Ocrad, Cuneiform, GOCR and Tesseract. Tesseract produced the best result, probably because it is made by Google.
Note that preserve_interword_spaces was needed to preserve the formatting of the table.

grep and sed

I agree that the way I used grep was particularly ugly. One regular expression search for strings started with number, then alphabets, the next regular expression searches for the letter p and the £ sign. It can definitely be constructed more elegantly, for a starter, these two regular expressions can be combined as one. I agree that I don't fully understand how regular expression works with grep. If someone can properly teach me how it works in person, that would be much appreciated. I don't use it enough to spend time learning about it. I know one day this will come back and bite me, but whatever.

And, no I do not understand how to use sed at all. I basically copied and pasted the code from Stackoverflow. I basically decided to use £ sign as the delimiter, and I decided to use sed to insert those delimiters.

From text format to Excel spreadsheet

Now a bit of manual processing is required - invalid lines needed to be removed. There weren't that many of them. Then open up the text file using Excel, set the delimiter to £ sign.

Excel is pretty much GUI, it is pretty easy to use. So it needs no further explanation. However it should be noted that when plotting the chart, exponential trendline is required, as inflation is an exponential growth.

public/on_uea_gig_ticket_price_and_ftse_100.1574032314.txt.gz · Last modified: 2019/11/17 23:11 by fangfufu