public:downloading_certain_videos_of_a_certain_university

public:downloading_certain_videos_of_a_certain_university [2022/07/17 22:24] fangfufu
public:downloading_certain_videos_of_a_certain_university [2022/07/17 23:17] (current) – removed fangfufu
-====== Downloading Graduation Ceremony Videos for University of East Anglia in 2019 ====== 
-This is a rather amusing story of my successful attempt at downloading graduation videos from the University of East Anglia. I have to say it has been a rather educational experience.
- 
-===== Ethical Statement ===== 
-When I was an undergrad, I was told that I had to write an ethical statement for my final year project, otherwise I would lose marks. So I think I should probably add an ethical statement to this web page.
- 
-Quite frankly, I don't think it is fair to charge new graduates twenty pounds for the DVD version of the ceremony, or twenty-five pounds for the HD version on a USB memory stick. A lot of universities provide this kind of video for free, for example the University of York. Rumour has it that the University of Bath also provides it for free to its graduates.
- 
-One of the more amusing conversations I had was with a GP. He asked me why I didn't charge fifteen pounds for these videos. My reply was "I hate capitalism". ((Well, I would definitely get myself into trouble if I started selling these videos at a cheaper price! I may be dumb, but I'm not stupid.)) I do happen to believe that charging so much for the graduation video is exploitative. And yes, on this occasion, I would like to denounce capitalism. LONG LIVE COMMUNISM!!!
- 
-Anyway, the people who studied Medicine have to do a Situational Judgement Test. I definitely would not pass the CS equivalent of a Situational Judgement Test.
- 
-===== The video for The Medicine ===== 
-Well, it is called The Medicine because two of my good friends were in that ceremony. I also happen to know a lot of people who studied that degree for some reason.
- 
-The graduation ceremony for this degree happened to be the first ceremony of the graduation week -- it took place first thing on Monday morning. The university promised to provide a live stream of the ceremony. Unfortunately the third-party provider was underprepared: they did not expect so many people to stream the ceremony, so their server crashed. This was really disappointing, because I got up early to watch the graduation stream live in the lecture theatre. A friend and I sat in the lecture theatre for 1.5 hours and saw nothing, because the lecture theatre used the same public live stream URL provided by that third party. Effectively I got up early for nothing. However, this did lead to the decision to provide an online recording of the live stream.
- 
-A quick inspection of the source code of the web page with the video recording did not reveal where the source video was located. So I decided to fire up the ''Developer tools'' of Google Chrome. I immediately realised that the full video was split into multiple segments, and that some JavaScript was sending out requests for individual video segments. It is possible to replay an HTTP request sent from the browser to the server by right-clicking the individual network event and clicking ''copy as cURL''. The copied text is a ''cURL'' command that replays the chosen HTTP request. However, I still had to download the video segment by segment, then merge the segments into a whole video.
- 
-So I seeked to the beginning and the end of the video, and recorded the segment filenames. I wrote a for loop in ''Bash'' which enumerated all the ''cURL'' commands necessary for downloading every segment. I then merged the video segments together using ''ffmpeg''; the details of how I merged the video fragments are described in the next section.
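
The loop described above might have looked roughly like the sketch below. The base URL, the ''segment_N.ts'' filename pattern and the segment range are all made up for illustration; the real values come from the ''copy as cURL'' output, which may also carry cookies and headers that have to be kept.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace them with what "copy as cURL" shows you.
BASE_URL="https://video.example.com/ceremony"   # assumed base URL
FIRST=0                                          # assumed first segment number
LAST=4                                           # assumed last segment number

# Print one curl command per segment instead of running it,
# so the list can be inspected before anything is downloaded.
gen_curl_cmds() {
    local i
    for ((i = FIRST; i <= LAST; i++)); do
        printf 'curl -fsS -o segment_%d.ts %s/segment_%d.ts\n' "$i" "$BASE_URL" "$i"
    done
}

gen_curl_cmds
```

Once the printed commands look right, piping the output into ''bash'' runs the downloads.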
- 
-I have to say that in some ways it was great that the live stream failed, otherwise I would not have been able to download the video stream - the best I could have done was a screen capture. I did not figure out how to download a live stream until Wednesday morning. My housemates had their ceremony on Tuesday morning - all I could do for them was a screen capture.
- 
-===== Downloading other graduation videos ===== 
-Another one of my friends had her graduation ceremony on Friday. After obtaining videos for two separate ceremonies, I wondered if I could take my art further. I felt the screen capture I did for my housemates was not good enough.
- 
-I thought about capturing my own network traffic, then extracting the video fragments from the network traffic dump. There are two problems with this approach:
-  - The network traffic dump will contain traffic irrelevant to video capture.  
-  - The website uses HTTPS.  
- 
-To solve problem 1), we use a virtual machine to achieve network isolation: the virtual machine cannot see network traffic that it did not generate. To solve problem 2), we launch our browser with the environment variable ''$SSLKEYLOGFILE'' set, so that it logs the TLS master secret.
- 
-The rest of this section details the setup of my capture environment. We assume you are running Debian Buster [(https://www.debian.org/releases/buster/)].  
- 
-==== Setting your environment for processing the network dump ==== 
-I installed the following packages on the host:
- 
-     wireshark tshark ffmpeg 
- 
-We need Wireshark [(https://www.wireshark.org/)] to configure the SSL decryption settings. We need TShark [(https://www.wireshark.org/docs/man-pages/tshark.html)] to extract video fragments from the HTTP packets. We need ffmpeg [(https://ffmpeg.org/)] to merge the video fragments together.  
- 
-==== Setting up the virtual machine ==== 
-I decided to use Oracle VirtualBox [(https://www.virtualbox.org/)] to run my virtual machine. Again, I used Debian Buster as the guest operating system. Please make sure you have a desktop environment installed in your guest operating system, because you need the GUI to run the browser. I also installed the following extra packages:
- 
-    tcpdump chromium 
- 
-We need Chromium [(https://www.chromium.org/)] to play the graduation video, because it honours the ''$SSLKEYLOGFILE'' environment variable. We need tcpdump [(https://www.tcpdump.org/)] to capture the network traffic.
- 
-You also need to set up a shared folder between your virtual machine and the host. Please follow the guide here [(https://help.ubuntu.com/community/VirtualBox/SharedFolders)]. 
- 
-==== Configuration for SSL decryption ==== 
-Please review the information at this link [(https://wiki.Wireshark.org/TLS?action=show&redirect=SSL#Using_the_.28Pre.29-Master-Secret)]. It explains how to set the ''$SSLKEYLOGFILE'' environment variable so that the browser generates the Key Log File, which captures the pre-master secret. It also shows the configuration required for Wireshark / TShark to decrypt HTTPS traffic.
- 
-Please note that in my own experience, despite the ''$SSLKEYLOGFILE'' environment variable being set, the Firefox [(https://www.mozilla.org/en-GB/firefox/new/)] that came with Debian refused to log the pre-master secret. If you insist on using a browser that does not honour ''$SSLKEYLOGFILE'', you might want to try mitmproxy [(https://docs.mitmproxy.org/stable/howto-Wireshark-tls/)], which can generate its own Key Log File.
- 
-Finally, TShark does not actually read ''$SSLKEYLOGFILE'' itself, so I configured the location of the Key Log File in Wireshark's GUI instead.
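
As an alternative to the GUI, TShark accepts preference overrides through its ''-o'' option. The sketch below only picks the preference name: as far as I know it is ''ssl.keylog_file'' before Wireshark 3.0 (which should include the version shipped with Buster) and ''tls.keylog_file'' from 3.0 onwards. The helper name ''keylog_pref'' and the file name ''keys.log'' are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Wireshark/TShark version string such as
# "2.6.20" to the name of the key log preference for that version.
keylog_pref() {
    local major="${1%%.*}"          # keep only the major version number
    if [ "$major" -ge 3 ]; then
        echo "tls.keylog_file"      # dissector was renamed to TLS in 3.0
    else
        echo "ssl.keylog_file"
    fi
}

# Usage (not run here); keys.log is an assumed file name, and the
# version is whatever "tshark --version" reports on your machine:
#   tshark -r dump.pcap -o "$(keylog_pref 2.6.20):keys.log" \
#          --export-objects "http,destdir"
```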
- 
-==== Capturing the data ==== 
-In your virtual machine, launch chromium, and verify that the Key Log File is being generated. (Please note that if you are making a new capture, the old Key Log File should be deleted.) 
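
A minimal sketch of that check, assuming the Key Log File lives at ''$HOME/sslkeylog.log'' unless ''$SSLKEYLOGFILE'' is already set. The helper name ''keylog_ok'' is made up; it only tests that the file contains at least one line in the usual key log layout of a label followed by two hexadecimal fields (for example a ''CLIENT_RANDOM'' entry).

```shell
#!/usr/bin/env bash
# Assumed path for the Key Log File; any writable location works.
KEYLOG="${SSLKEYLOGFILE:-$HOME/sslkeylog.log}"

# Succeed if the file is non-empty and holds at least one line of the
# form "<LABEL> <hex> <hex>", which is what a key log entry looks like.
keylog_ok() {
    [ -s "$1" ] && grep -Eq '^[A-Z0-9_]+ [0-9a-fA-F]+ [0-9a-fA-F]+$' "$1"
}

# Typical use inside the guest (not run here):
#   rm -f "$KEYLOG"                      # start from a fresh file
#   SSLKEYLOGFILE="$KEYLOG" chromium &   # the browser inherits the variable
#   # ...open any HTTPS page, then:
#   keylog_ok "$KEYLOG" && echo "key log is being written"
```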
- 
-Run the following command to start capturing the network traffic:
- 
-        sudo tcpdump -i enp0s3 -nn -s0 -vvv port 443 -w dump.pcap 
-         
-After the video has ended, press ''Ctrl+C'' to terminate tcpdump, and close Chromium. Copy ''dump.pcap'' and the Key Log File to the host.
- 
-==== Processing the network traffic dump  ==== 
-The network traffic dump must be processed on the host, because TShark uses a lot of memory (8GB!!!).
- 
-Run the following command to extract video segments from the HTTPS packets: 
- 
-    tshark -r dump.pcap --export-objects "http,destdir" 
- 
-The above command creates a new directory named ''destdir''. I suppose you could attempt this in the Wireshark GUI, but I can guarantee that it would be extremely painful [(https://www.youtube.com/watch?v=RZhp-Uctd-c)].
- 
-We can then merge the video fragments together using the following two commands:  
- 
-    # list every fragment, skip names containing "(", overwrite any old list
-    printf "file '%s'\n" destdir/*.ts* | grep -v '(' | sort -V > list
-    ffmpeg -safe 0 -f concat -i list -c copy -bsf:a aac_adtstoasc output.mp4
- 
-The first command generates the list of video fragments to be concatenated. Note the ''-V'' option of ''sort'': with it, the filenames are sorted in "natural sort" order. So if you have the numbers "1 3 10 2", they get sorted into "1 2 3 10" rather than "1 10 2 3". Normally ''sort'' compares text character by character.
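
The difference is easy to try out. This is a small demonstration, assuming GNU ''sort'' (the one that ships with Debian); the helper names ''plain_order'' and ''natural_order'' are made up.

```shell
#!/usr/bin/env bash
# Plain sort compares character by character, so "10" lands before "2".
plain_order() {
    printf '%s\n' 1 3 10 2 | sort | tr '\n' ' '
}

# sort -V compares the embedded numbers by value instead.
natural_order() {
    printf '%s\n' 1 3 10 2 | sort -V | tr '\n' ' '
}

plain_order;   echo    # prints: 1 10 2 3
natural_order; echo    # prints: 1 2 3 10
```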
- 
-===== Other notes ===== 
-I have no idea why the graduation ceremony video file for The Medicine is bigger than the others (2.7GB vs 1.1GB). I don't know if it actually has more entropy than the other graduation ceremony videos, or if whoever made it used a lower compression setting. To be fair, the standard variant of The Medicine takes 5 years, compared to 3 years for a normal degree. Perhaps the file size reflects that.
  
public/downloading_certain_videos_of_a_certain_university.1658096682.txt.gz · Last modified: 2022/07/17 22:24 by fangfufu