
The Tale of HTTPDirFS

HTTPDirFS is a FUSE filesystem that lets you mount an HTTP directory listing. It had a rather interesting beginning.

The story starts with a conversation with the admin of the-eye.eu, a website hosting a lot of content of questionable copyright status. Below is the chat log, copied from Discord.

fangfufu 07/15/2018
could you enable webdav please? lol
i know i asked this before and got denied
httpfs2 doesn't work
i want to browse your collection
i don't want to have to download everything
or install the server side script for this? https://github.com/cyrus-and/httpfs

-Archivist 07/15/2018
nope
why do you have to be the one in millions that doesn't want to view the site like a normal person

fangfufu 07/15/2018
is there any ways to "stream" the website?
well because i don't want to download the whole website
it would be nice if i can mount it locally

-Archivist 07/15/2018
it's a website, its provided as is, we already do enough extra shit

fangfufu 07/15/2018
ok nvm then, oh well :frowning:

I thought it would be funny to actually write a piece of software that lets me mount an HTTP directory listing locally, and throw that in the Archivist's face. The project turned out to be fairly difficult, mainly because libfuse is multithreaded, and I only got about 40% in the concurrency coursework for Principles of Programming Languages back when I was an undergraduate at York. I am reliably told by a computer science postdoc that nobody likes dealing with race conditions.

Naturally, I dedicated the project to the Archivist in its README, and people on Reddit found it funny.

It is kind of crazy how far this project has come: the software is now available in Debian. It was interesting enough to attract a Debian Developer, who packaged and uploaded it.

Finally, researchers in Germany have decided to incorporate HTTPDirFS into their research software framework for importing data. Judging by their publications so far, their framework is used primarily for biomedical research.

I really don't know how to feel or what to say about this one: the project was originally designed to annoy someone on the Internet. It was never meant to be useful or helpful, so it feels really strange that researchers are taking it seriously. I am in the UEA Triathlon Club, so I have a lot of friends who study medicine, and I do enjoy being around them. But I find it highly weird that HTTPDirFS has somehow wound up helping biomedical research: when will people related to medicine leave me alone? (Only joking, of course!) My dad does biomedical research, so I suppose it feels great to contribute to the field indirectly. :-)

So overall, I am not sure whether this project has been a success or a failure in terms of its original purpose. I don't know if the Archivist is annoyed. However, I believe I have provided ample entertainment for the Redditors in his own subreddit.

What is certain is that I am really proud of this project: it feels great when people on the Internet take your toy project seriously, especially when it was never meant to be serious at all. Putting knowledge I learnt badly as an undergraduate to use in real life has brought me great satisfaction. Thank you for teaching me about concurrency, Professor Alan Burns.

Email to Professor Alan Burns

There might still be race conditions in my code, because my code is crappy and my knowledge is shoddy. I have emailed my undergraduate professor, and hopefully he will give me some help. At the very least, I hope he finds my story funny. I know that if someone sent me an email like this, I would love it.

Dear Alan, 
I don't know if you remember me. I was the 2nd year undergraduate
course rep back in 2012. I am currently a PhD student at the University of
East Anglia, working on Computer Vision.

Thank you for teaching me concurrency programming in POPL back then. I
am afraid I didn't do so great in your coursework - I think I scraped a
40%. 

However, the things you taught me have proven to be incredibly useful
and valuable, because my hobby project depends on it. I basically wrote
a filesystem (https://github.com/fangfufu/httpdirfs), and it is mildly
popular. Somebody in Canada decided to package my software and upload
it to the Debian repository. Debian is one of the largest Linux
distributions.

Without the knowledge I gained from your module, I might not have been
able to figure out what I was facing. During the process of writing the
cache system for my filesystem, I have hunted down numerous race
conditions. It was kind of funny and bizarre when my code ran fine when
I forced it to run with one thread, but it failed mysteriously when I
decided to run it with multiple threads. I needed multithreading for
performance reasons. 

I have since identified the critical sections of my code, and guarded
them using pthread mutexes. During my own testing, I haven't encountered
deadlocks. Unfortunately, my Canadian friend has reported a symptom
that sounds like a deadlock. The most annoying thing is that it is not
very reproducible.

I am pretty sure that if I persevere, I will eventually figure
something out. I am doing a PhD, after all. But I just thought my
situation was kind of funny, and you might appreciate the irony of my
story. I never thought that my hobby project would depend on
knowledge from one of my worst-performing modules back in my
undergraduate days!

So my question is: is there a way to check for deadlocks by static code
analysis? I am sorry, but I do not remember if there was a way to check
whether my solution to the dining philosophers coursework would cause
deadlock, other than running it. It would be nice if you could point me to
some reading materials.

Best wishes,
Fufu

Professor Alan Burns's reply

Hi,
Thanks for your 'story', indeed ironic but interesting - I am pleased to have had a possible input.

Deadlock detection is VERY hard - testing will not identify the subtle situations that can lead to this failure

Two approaches - one use a resource usage protocol that prevents deadlocks (they exists for single processor
systems but are not as common for true parallelism) - two, use model checking on a model of your software
to 'prove' deadlock free in ALL circumstances. The latter is a powerful but not too easy to apply, especially to
code that already exists.

Good luck

Alan

HTTPDirFS was accepted into the Debian repository by the then DPL himself!

Chris Lamb was the Debian Project Leader in March 2019, and in his blog post about that month's free software activities, he mentioned httpdirfs-fuse. This is really cool!!! It feels like I unknowingly got an autograph from a Hollywood star!!!

https://web.archive.org/web/20190731125517/https://chris-lamb.co.uk/posts/free-software-activities-in-march-2019

public/the_tale_of_httpdirfs.txt · Last modified: 2019/08/09 12:24 by fangfufu