Hi! It's me, Joris.

It looks like I've linked you here myself. Linking people to a blog post I wrote is often a bit awkward, especially at work.

I likely shared this blog in an attempt to further a conversation. Usually the post does a better job of succinctly sharing information than I could by talking.

In any case, I hope me sharing this post doesn't come across as humblebragging - that's really the opposite of what I'm trying to achieve.

Thanks for reading!

My Backup Strategy

How to avoid permanent data loss

A few times a year (such as during annual reviews), I go through a backup routine. This gives me peace of mind that all our data is stored redundantly - a computer or hard drive failure can never lead to permanent data loss.

Cloud first, Offline second

Overview of my backup process - details in this post!

I live by a cloud-first approach: any device only contains a local copy (a cache if you will) of what’s permanently stored in the cloud. This way, you can at most lose a day’s worth of data when any device crashes or is stolen.

This isn’t rocket science, yet I’m surprised how many people don’t live by this principle - techies and non-techies alike (granted, for some this is a deliberate privacy-related choice).

However, cloud-first does not mean cloud-alone. It’s paramount that you take local backups of any data that’s sitting in the cloud.

Why?

  1. Account lock-out: You might lock yourself out (forgotten password, lost two-factor authentication, etc.) OR worse, the cloud provider locks you out. There’s a ton of horror stories of big providers locking people out of their accounts (for whatever reason - you’ll typically never find out). More often than not, there’s no way to get support or have access restored.
  2. Hackers: Providers can get hacked and your data might be stolen and/or erased. This happens all the time.
  3. Disaster strikes: While cloud providers typically have very good data retention procedures, they also can (and do) make mistakes that lead to data loss.

In all of these cases, there’s serious risk of permanent data loss. Don’t overtrust your cloud providers - it’s only a matter of time before you’ll get stung.

Offline Backups - What’s included

  • Google Drive: Stores all of our family’s administrative documents as well as the cloud-synced data of a few programs (such as SimpleMind). Google Drive is an essential piece of the paperless workflow I adopted many years ago.
  • Google Photos: All photos from our smartphones are automatically uploaded in full resolution using the Google Photos mobile app.
  • Gmail: email 📨
  • Google Contacts: cloud storage for contacts (never use phone storage for contacts!)
📦 Exporting Google Data: We have a 2TB Google One family storage plan and use Google Takeout to do exports in 10GB zip files (see the verification snippet after this list).
  • GitHub: All repositories (including private ones) are cloned locally using a simple script:

    # Token generated via https://github.com/settings/tokens/new, takes 30 secs
    export GH_USERNAME="jorisroovers"; export GH_TOKEN="token"; export GH_URL="https://api.github.com"

    # IMPORTANT: Use an authenticated call to also include private repos
    # Get all repos owned by jorisroovers, exclude repos owned by others that we've contributed to
    curl -s -u ${GH_USERNAME}:${GH_TOKEN} "$GH_URL/user/repos?per_page=100&page=1" > /tmp/gh_repos
    # Clone every repo (ssh key should be in place). SSH required for cloning private repos
    cat /tmp/gh_repos | jq -r '.[] | select(.owner.login=="jorisroovers") | .ssh_url' | xargs -L1 git clone

Simple script to clone all GitHub repos
  • Home Assistant: The excellent Home Assistant Google Drive Backup add-on pushes all Home Assistant config to Google Drive on a daily basis.
  • Apple Health: Contains a bunch of personal health data such as exercise data, weight tracking, sleep tracking, etc. Synced to iCloud (one of the very few things we use iCloud for). You can also do a direct export from the app itself.
  • SimpleMind: Excellent cross-platform mindmapping tool. Synced to a Google Drive folder.
  • Bookmarks: I no longer use bookmark syncing (provided by most browsers), instead doing a regular export via the Bookmark Manager.
  • Misc: Various bits and pieces, like some critical online data for my parents.
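Since Takeout exports arrive as a series of zip files, it’s worth a quick integrity check before filing them away - a truncated download is easy to miss. A minimal sketch (the takeout-*.zip filename pattern is an assumption based on Takeout’s current naming):

    # Test each downloaded Takeout archive; unzip -t checks the zip structure
    # without extracting anything. Filenames assumed to match takeout-*.zip
    for f in takeout-*.zip; do
        if unzip -t "$f" > /dev/null 2>&1; then
            echo "OK:     $f"
        else
            echo "BROKEN: $f"
        fi
    done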

The backup process

My backup process is a variant of the popular 3-2-1 backup strategy, which says to keep at least:

  • 3 copies of all data
  • 2 copies on different storage media
  • 1 offsite copy

Jeff Geerling does an excellent job explaining the 3-2-1 backup strategy and his personal implementation of it in this video and the accompanying GitHub repository.

I use the [Synology DS920+](https://global.download.synology.com/download/Document/Hardware/DataSheet/DiskStation/20-year/DS920+/enu/Synology_DS920_Plus_Data_Sheet_enu.pdf) NAS combined with 2 x 4TB Seagate NAS Drives and an older 512GB SanDisk SSD. I also added 16GB extra RAM to it (20GB total) to better support some VM workloads. This NAS gets almost universal praise and rightly so - it’s great!

Backblaze pricing - which is entirely accurate. Recommended.

At least 3 copies of all data

  1. The primary copy in the cloud (e.g. Google Drive for documents, Google Photos for photos, Notion for notes, etc.)
  2. Google Drive is continuously synced to a local NAS (this doesn’t cover all data but a large and important part)
  3. Full offline backups (a few times a year): going to each of the cloud providers and doing a data export
  4. Off-site copies (see below)
  5. Any local cached copies on laptops or phones (these don’t really count)

2 copies on different storage media

  1. The primary cloud copy: Most cloud providers have very strict processes in place to ensure data durability.
  2. Network Attached Storage: I use a Synology DS920+ NAS to store offline backups. The NAS is set up using SHR RAID, ensuring that a single drive failure can never lead to permanent data loss.
  3. Portable SSDs: Samsung T5 portable SSDs, kept in various safe locations around the house. These are air-gapped (no permanent connection to any computer), which brings the overall approach close to a 3-2-1-1-0 backup strategy (a checksum sketch for verifying these copies follows this list).
  4. Backblaze B2 cloud storage (detailed under the off-site copy below)
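To be sure the copies on the portable SSDs actually match the source, checksums can be generated at copy time and re-verified during later backup routines. A minimal sketch, assuming the SSD mounts at /Volumes/T5 and the archives sit in a local backups/ folder (both paths are hypothetical):

    # Record SHA-256 checksums for every archive, then copy both to the SSD
    cd backups
    shasum -a 256 *.zip > /Volumes/T5/checksums.sha256
    cp *.zip /Volumes/T5/

    # Later: re-verify the copies on the SSD itself
    cd /Volumes/T5 && shasum -a 256 -c checksums.sha256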

1 offsite copy

  1. I use Backblaze B2 cloud storage to store an off-site copy of the 3 most recent backups. Backblaze is cheap (much cheaper than AWS S3) and works well with Synology (using Synology Hyper Backup). I manually trigger the upload at the end of the backup routine.
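Hyper Backup drives this upload in my setup, but for reference, a manual equivalent with rclone would look roughly like this - a sketch only, where the b2 remote and bucket name are assumptions (rclone config creates the remote from a B2 key ID and application key):

    # Push the local backup folder to the B2 bucket (bucket name is hypothetical)
    rclone sync /volume1/backups b2:my-backup-bucket/synology --progress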

Future Improvements

There are definitely ways to improve my backup strategy:

  1. Fully automated backups: I used to have some automation, but today it takes a few hours to run through the backup routine. I’ve looked at rclone before, but it doesn’t cover all cloud services (e.g. Gmail, Calendar). Timeliner has the right idea, but is still missing a lot of pieces and looks unmaintained. (A rough rclone sketch follows this list.)
rclone supports a [large number of cloud providers](https://rclone.org/#providers) and looks like the perfect tool. Unfortunately, not all cloud providers play along and provide APIs to export their data.

  2. Automated restore checks: The worst thing is having backups and finding out they’re incomplete or unusable (e.g. faulty compression) when you actually need them. Any good disaster recovery strategy must include monitoring of backups and automated restoration verification (a basic check is sketched below).
  3. Encrypted backups: Some of my local backups are encrypted, but definitely not all of them. I’d like to encrypt more, but worry about not having access to the encryption keys themselves in case of a severe data loss (i.e. the chicken-and-egg problem). There are definitely ways to mitigate this, but I haven’t spent any time on that. (A gpg example follows below.)
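For the services rclone does cover, the Google Drive to NAS sync could be scheduled rather than manual. A minimal sketch, assuming a gdrive remote has already been set up via rclone config (remote name and paths are hypothetical):

    # Mirror Google Drive to the NAS; files deleted or changed remotely are moved
    # to a dated backup dir instead of being dropped, guarding against mistakes
    rclone sync gdrive: /volume1/backups/google-drive \
        --backup-dir "/volume1/backups/google-drive-deleted/$(date +%Y-%m-%d)" \
        --progress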
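A basic restore check doesn’t need much either: periodically pull a random archive back from the off-site copy and test it. A sketch under the same assumptions as the rclone examples above (bucket name is hypothetical, and it assumes the backups are zip files):

    # Pick a random archive from the off-site copy, download it, and test it
    f=$(rclone lsf b2:my-backup-bucket/synology | shuf -n 1)
    rclone copy "b2:my-backup-bucket/synology/$f" /tmp/restore-test/
    unzip -t "/tmp/restore-test/$f" && echo "Restore check passed: $f"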
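On encryption: symmetric gpg sidesteps part of the chicken-and-egg problem - there’s no keypair to lose, only a passphrase, which can live in a password manager plus, say, a printed copy in a safe place. A minimal sketch (filename is hypothetical):

    # Encrypt an archive with a passphrase (gpg prompts for it interactively)
    gpg --symmetric --cipher-algo AES256 photos-2023.tar.gz   # -> photos-2023.tar.gz.gpg

    # Decrypting later
    gpg --decrypt --output photos-2023.tar.gz photos-2023.tar.gz.gpg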

Yet, I’m fairly confident that a lot of things would have to go wrong at the same time for permanent data loss to occur. Chances are I’d have bigger problems if that were to happen (like a solar storm or other calamity).

If you don’t have a clear backup strategy today, then I highly recommend adopting one. It doesn’t have to be comprehensive from day one - the important part is to start, keep it up, and expand a bit every time. Just don’t wait for disaster to strike - that’s really just a matter of time!