2017-06-14 Data Extinction Event Investigation Report

Introduction

My laptop's home partition uses Btrfs together with Snapper. Snapper creates periodic volume snapshots, which I used as backups. This worked well for recovering from accidental file deletion. However, I recently put some bulky files (8x ~2GB) on my home partition. Somehow this led to occasional 100% CPU usage by one of the Btrfs-related processes, so I decided to clear my volume snapshots to stop it.

I decided to manually dump all the snapshots by running the following commands:

cd /home/.snapshots
for i in *; do btrfs subvolume delete $i/snapshot; done
rm -rf *

The problem was that the currently active default subvolume was mounted inside a folder within /home/.snapshots too! My files were gone by the time I realised what had happened.
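With hindsight, a more cautious version of the loop above would only touch directories that look like Snapper snapshots, would print its plan before doing anything, and would never be followed by a blanket rm -rf. The following is a minimal sketch, not what I actually ran. The function name list_or_delete_snapshots and the DRY_RUN variable are made up for illustration, and it assumes that Snapper keeps each snapshot in a purely numeric directory (for example 12/snapshot). It does not detect the default subvolume by itself, so the printed list should still be compared by hand with the output of `btrfs subvolume get-default /home` before any real deletion.

```shell
#!/bin/sh
# Sketch only: the function name and DRY_RUN are hypothetical helpers.
# Assumption: snapshots live in purely numeric directories, e.g. 12/snapshot.

list_or_delete_snapshots() {
    root=$1
    for dir in "$root"/*/; do
        name=$(basename "$dir")
        # Skip anything whose name is empty or contains a non-digit.
        case $name in
            ''|*[!0-9]*) continue ;;
        esac
        # Skip numeric directories that do not contain a snapshot.
        [ -d "$root/$name/snapshot" ] || continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Default: only report what would be removed.
            echo "would delete: $root/$name/snapshot"
        else
            btrfs subvolume delete "$root/$name/snapshot"
        fi
    done
}
```

Running `list_or_delete_snapshots /home/.snapshots` prints the candidates without changing anything. Only after checking that list would I rerun it with DRY_RUN=0, and anything left behind would be inspected rather than removed with a wildcard.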

Background

Well, it is clear that I know how to do backups, as evidenced by this 1) and this 2). I also introduced my mum to Unison 3). The problem was that I had been too lazy to back things up. I think it is important to analyse what led to the decision not to back up my home partition properly, as this incident has a mild impact on the progress of my PhD.

To understand why I decided not to do backups, I think I need to look at my historical and current data handling practices and their consequences. So I will start by looking at similar events that happened in the past. I will also include an interesting Bitcoin-related story that I heard during my time in York.

Historical extinction level events

Notable near-misses

Gary's Bitcoin story

Dangerous cultural practices

It is clear that I understand the danger of losing data while performing risky operations. However, it seems that I always get away with it - in the sense that the mission-critical files always have outdated backups somewhere.

I think I basically grew complacent. I believe I have picked up a lot of bad habits and, rather than changing them, I managed to build myself layers of defences against them.

Rather than stop doing shift-delete, I decided to install volume snapshots so that I could liberally delete files. Rather than backing up data before changing the partition layout, I relied on the fact that it is pretty easy to revert changes to LVM partitions by using the LVM configuration backup.

I seem to have been ignoring the danger of losing data, because the benefits of getting things done quickly have blinded me.

Backup solutions that are being used

The following backup solutions are currently being used:

Backup solutions that can be considered

Future action plan

  1. Commit and push source files more frequently and more diligently.
  2. Expand the coverage of Resilio Sync.
  3. Push more data into Google Drive - after all, I have unlimited Google Drive storage space as a York alumnus.
  4. Investigate btrbk.