Downloading Graduation Ceremony Videos for the University of East Anglia in 2019
This is a rather amusing story of my successful attempt at downloading graduation videos from the University of East Anglia. I have to say it has been a rather educational experience.
When I was an undergrad, I was told that I had to write an ethical statement for my final year project, otherwise I would lose marks. So I think I should probably add an ethical statement to this web page.
Quite frankly, I don't think it is fair to charge new graduates twenty pounds for the DVD version of the ceremony, or twenty-five pounds for the HD version on a USB memory stick. A lot of universities provide this kind of video for free, for example the University of York. Rumour has it that the University of Bath also provides it free to its graduates.
One of the more amusing conversations I had was with a GP. He asked me why I didn't charge fifteen pounds for these videos, and my reply was “I hate capitalism” 1). I do happen to believe that charging so much for the graduation video is exploitative. And yes, on this occasion, I would like to denounce capitalism. LONG LIVE COMMUNISM!!!
Anyway, people who study Medicine have to do a Situational Judgement Test. I definitely would not pass the CS equivalent of the Situational Judgement Test.
The graduation video for medicine
I wanted to download the graduation video for medicine because two of my good friends were in that ceremony. I also happen to know a lot of people who study that degree, for some reason.
The graduation ceremony for this degree happened to be the first ceremony of graduation week: it took place first thing on Monday morning. The university promised to provide a live stream of the ceremony. Unfortunately, the third-party provider was underprepared. They did not expect the number of people who tried to stream the ceremony, so their server crashed. This was really disappointing, because I got up early to watch the live stream in the lecture theatre. Another friend and I sat in the lecture theatre for 1.5 hours and saw nothing, because the lecture theatre used the same public live stream URL provided by that third party. Effectively, I got up early for nothing. However, this did lead to the decision to provide an online recording of the live stream.
A quick inspection of the source code of the web page with the video recording did not reveal where the source video was located. So I decided to fire up the browser's developer tools and use Copy as cURL on one of the video requests in the network panel. The copied text is a cURL command that replays the chosen HTTP request. However, I still had to download the video segment by segment and then merge the whole video.
So I jumped to the beginning and the end of the video and recorded the first and last segment filenames. I wrote a for loop in Bash which enumerated all the cURL commands necessary for downloading every segment. I then merged the video segments together using ffmpeg; the details of how I merged the video fragments are described later on this page.
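The for loop can be sketched roughly as follows. This is not the original script: the base URL, the segment_N.ts naming scheme and the index range are hypothetical placeholders. In practice the URL (and any headers) come from the copied cURL command, and the range comes from the first and last segment filenames. The function only prints the commands, so they can be checked before anything is downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print one curl command per video segment.
# The base URL and the "segment_N.ts" naming scheme are placeholders,
# not the real stream's; take the real URL from "Copy as cURL".
gen_curl_commands() {
    local base_url=$1 first=$2 last=$3 n
    for ((n = first; n <= last; n++)); do
        printf "curl -s -o 'segment_%d.ts' '%s/segment_%d.ts'\n" "$n" "$base_url" "$n"
    done
}

# Example: segments 1 to 3 of a made-up stream.
gen_curl_commands "https://example.com/stream" 1 3
```

Redirecting the output to a file (for example `gen_curl_commands URL FIRST LAST > fetch.sh`) and running it with `bash fetch.sh` keeps the generated commands around for inspection.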
I have to say that, in some ways, it was great that the live stream failed; otherwise I would not have been able to download the video, and the best I could have done was a screen capture. I did not figure out how to download a live stream until Wednesday morning. My housemates had their ceremony on Tuesday morning, so all I could do for them was a screen capture.
Downloading other graduation videos
Another friend of mine had her graduation ceremony on Friday. After obtaining videos for two separate ceremonies, I wondered if I could take my art further. I felt the screen captures I did for my housemates were not good enough.
I thought about capturing my own network traffic and then extracting the video fragments from the traffic dump. There are two problems with this approach:
- The network traffic dump will contain traffic irrelevant to video capture.
- The website uses HTTPS.
To solve the first problem, we use a virtual machine to isolate the network: the virtual machine cannot see network traffic that it did not generate. To solve the second problem, we launch the browser with the environment variable $SSLKEYLOGFILE set, so that it logs the TLS master secret.
The rest of this section details the setup of my capture environment. We assume you are running Debian Buster 1).
Setting up your environment for processing the network dump
I installed the following packages:
wireshark tshark ffmpeg
Setting up the virtual machine
I decided to use Oracle VirtualBox 5) for the virtual machine. Again, I used Debian Buster as the guest operating system. Please make sure you have a desktop environment installed in the guest operating system, because you need a GUI to run the browser. I also installed the following extra packages:
You also need to set up a shared folder between your virtual machine and the host. Please follow the guide here 8).
Configuration for SSL decryption
Please review the information at this link 9). It explains how to set the $SSLKEYLOGFILE environment variable so that the browser writes a Key Log File containing the TLS secrets, and it shows the configuration Wireshark / TShark need in order to decrypt HTTPS traffic.
Please note that, in my experience, despite the $SSLKEYLOGFILE environment variable being set, the Firefox 10) that came with Debian refused to write the Key Log File. If you insist on using a browser that does not honour $SSLKEYLOGFILE, you might want to try mitmproxy 11), which can generate its own Key Log File.
Finally, TShark does not read $SSLKEYLOGFILE, so I configured the Key Log File location in Wireshark's GUI instead; TShark picks up the same saved preference.
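If you would rather skip the GUI, TShark's -o option can set a preference for a single run. A sketch, assuming the Key Log File was copied next to the dump under the hypothetical name keylog.log; the preference is called tls.keylog_file on Wireshark 3.0 and later, while older releases call it ssl.keylog_file. I have not verified this against the original capture.

```shell
# Hypothetical file names: dump.pcap from tcpdump, keylog.log copied from the VM.
# Wireshark 3.0+ preference name; on older releases use ssl.keylog_file instead.
tshark -o "tls.keylog_file:keylog.log" -r dump.pcap --export-objects "http,destdir"
```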
Capturing the data
In your virtual machine, launch Chromium and verify that the Key Log File is being generated. (Please note that if you are making a new capture, the old Key Log File should be deleted.)
Run the following command to start capturing network traffic:
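One way to verify that the Key Log File is being written is to count the lines in the NSS key log format that browsers use: CLIENT_RANDOM lines for TLS 1.2 connections, and labels such as CLIENT_TRAFFIC_SECRET_0 for TLS 1.3. A minimal sketch; the helper name count_keylog_entries is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count secret entries in an NSS-format Key Log File.
# TLS 1.2 connections produce CLIENT_RANDOM lines; TLS 1.3 connections
# produce *_HANDSHAKE_TRAFFIC_SECRET and *_TRAFFIC_SECRET_0 lines.
count_keylog_entries() {
    grep -cE '^(CLIENT_RANDOM|(CLIENT|SERVER)_HANDSHAKE_TRAFFIC_SECRET|(CLIENT|SERVER)_TRAFFIC_SECRET_0) ' "$1"
}
```

For example, start the browser with `SSLKEYLOGFILE="$HOME/keylog.log" chromium` (the path is an arbitrary choice), open any HTTPS page, then run `count_keylog_entries "$HOME/keylog.log"`. A missing file or a count of 0 suggests the browser is ignoring the variable.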
sudo tcpdump -i enp0s3 -nn -s0 -vvv port 443 -w dump.pcap
After the video has ended, press Ctrl+C to terminate tcpdump, and close Chromium. Copy dump.pcap and the Key Log File to the host.
Processing the network traffic dump
The network traffic dump must be processed on the host, because TShark uses a lot of memory (8GB!!!).
Run the following command to extract video segments from the HTTPS packets:
tshark -r dump.pcap --export-objects "http,destdir"
The above command creates a new directory named destdir. I suppose you could attempt the same thing in the Wireshark GUI (File > Export Objects > HTTP), but I can guarantee that it is extremely painful 12).
We can then merge the video fragments together using the following two commands:
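# Build the concat list: skip names containing "(", sort naturally; ">" overwrites any old list.
ls destdir/*.ts* | grep -v '(' | sort -V | while IFS= read -r i; do echo "file '$i'"; done > list
# Concatenate without re-encoding; aac_adtstoasc repackages the ADTS AAC audio for MP4.
ffmpeg -safe 0 -f concat -i list -c copy -bsf:a aac_adtstoasc output.mp4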
The first command generates the list of video fragments to be concatenated. Note the -V option to sort: with that option, the filenames are sorted in “natural sort” order. So if you have the numbers “1 3 10 2”, they get sorted into “1 2 3 10” rather than “1 10 2 3”. Normally, sort compares text character by character.
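The difference is easy to see on the numbers from the example above. This uses GNU coreutils sort; LC_ALL=C simply pins the plain sort to byte order.

```shell
#!/usr/bin/env bash
# Plain sort compares character by character: prints 1 10 2 3
printf '%s\n' 1 3 10 2 | LC_ALL=C sort | paste -sd' ' -
# sort -V compares embedded numbers by value ("natural sort"): prints 1 2 3 10
printf '%s\n' 1 3 10 2 | sort -V | paste -sd' ' -
```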
I have no idea why the graduation ceremony video file for medicine is bigger than the others (2.7GB vs 1.1GB). I don't know whether it actually has more entropy than the other graduation ceremony videos, or whether whoever made it used lower compression settings. To be fair, the standard variant of the Medicine degree takes 5 years, compared to 3 years for a normal degree. Perhaps the file size reflects that.