Freedombox longevity ideas?

jonny · April 17, 2022, 7:41am

Last year my Freedombox crashed due to a corrupted SD-Card and I was not able to recover my backups due to the missing LDAP backup. This made me think about how to improve the durability and reliability of my setup. It led me to three main categories, which I have since changed:

data storage: moved the root partition from SD-Card to a 256 GB SSD (SD-Card free setup)
improve cooling: installed a solid aluminum passive heat-sink which keeps my Raspberry around 35°C
minimize write access to SSD: installed log2ram and configured the log syncing only on reboot/shutdown

I’m curious what you are doing to ensure a reliable and durable Freedombox setup. Are there some measures I could add to those mentioned? My overall aim is to get my box up and running and basically forget about it afterwards (or at least to minimize the admin effort as much as possible).

Tido · April 24, 2022, 10:49am

This post, "Root filesystem completely in RAM (only data mounts) - #16 by Tido",
and the following 4 may also be of interest to you.

timmy · May 4, 2022, 3:46pm

I run Freedombox as a VM on Proxmox Virtual Environment (PVE).

Proxmox is running on an array of ZFS disks in Raidz1. VM is backed up to Proxmox Backup Server (PBS) on network once a week (just had to run a restore yesterday after borking matrix really good).

Benefits-
-VMs are easier to back up and restore than other options, and the backup is complete. You are not saving some folders or a config file. You are saving the entire system as it existed at a moment in time. This is akin to pulling an SD card out of an SBC and imaging it to an img file. Only faster, 100% more reliable, and 99.9999% less down time. (literally, the pause on a VM for running bitmap capture is less time than it would take me to pull out the SD card from an rpi)
-Want to try a risky change? Run a backup real quick (either local snapshot or full on backup) then let it rip. Snapshots and backups can be run while the VM is running. A quick pause for dirty bitmap gathering and it's back up to serve.
-Did an automatic upgrade bork the system? Restore from the last backup or snapshot prior to the update.
-Accidentally deleted an important file? Restore individual files if needed.
-Out-grow the disk? Provisioning more space on the VM is fairly easy.
-Need to assign more RAM or CPU to accommodate a growing load? Piece of cake.

PVE community version is stable and installs on just about anything desktop size or better. Old gaming computer? Say hello to your new VM host. And since PVE supports ZFS out of the box, no need to buy and set up RAID hardware. Set up automated backups to happen however often you like and keep as many local snapshots as you please.

PBS is the same way. Simple to set up, connect it to the PVE server on network, and you have a physically separate backup server (with all the same ZFS benefits, as it supports ZFS as well). Set up verification, pruning, and garbage collection to trim down the number of backups and preserve however much history you like.

OR you can forgo the PBS and simply copy the local snapshots of the VM to an external hard drive for safe keeping (i.e. in the event of a power surge, fire etc). Or copy them up to an AWS S3 bucket, if that’s your fancy. The entire PVE server can bite the dust from a lightning strike and the saved snapshot can be restored on a new PVE server as if nothing happened. PBS setup can always be added later.

ZFS provides disk redundancy without requiring RAID hardware. Should PVE server OS disk fail (if not installed on array at setup), recovering the ZFS pool after replacement and install of new disk is fairly easy (no worries about failed hardware config resulting in data loss).
ZFS also provides protection against bitrot. Four 1TB disks (very cheap) in a raidz1 config give you about 2.6TB of usable storage and tolerate a single disk failure. Resilvering a new disk can be done live and is also fairly straightforward.
Want to buy the cheapest of disks and take the risk of off-brand disks having higher failure rates? Set it to Raidz2 (tolerates two disk failures) for a usable storage of 1.7TB. Now you are safe from a second disk failure during a resilver.
I’ve even used resilver to swap from 500GB disks to 1TB disks over the course of a day. Just pull and resilver one disk at a time until they are all replaced. Then make sure autoexpand is set on the pool.
A host of other advantages with ZFS, too many to list here.
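
For anyone wanting to sanity-check the raidz numbers above, here is a rough sketch of the arithmetic. It only subtracts the parity disks and converts label terabytes to binary TiB; the function name is made up for illustration, and the remaining gap down to the 2.6TB / 1.7TB figures is, I assume, ZFS metadata and padding overhead, which this does not model.

```python
TB = 10**12   # decimal terabyte, as printed on disk labels
TIB = 2**40   # binary tebibyte, as most tools report sizes


def raidz_usable_tib(disk_count, disk_size_tb, parity):
    """Rough upper bound on usable raidz capacity in TiB, before ZFS overhead.

    Assumes all disks are the same size (hypothetical helper, not a ZFS API).
    """
    if disk_count <= parity:
        raise ValueError("need more disks than the parity level")
    data_disks = disk_count - parity
    return data_disks * disk_size_tb * TB / TIB


raidz1 = raidz_usable_tib(4, 1, parity=1)  # four 1TB disks, one disk of parity
raidz2 = raidz_usable_tib(4, 1, parity=2)  # four 1TB disks, two disks of parity
print(f"raidz1 upper bound: {raidz1:.2f} TiB")
print(f"raidz2 upper bound: {raidz2:.2f} TiB")
```

Treat the result as a ceiling; the pool will report somewhat less once it is created.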

Pick up cheap SSDs or HDDs or go with really nice stuff or even SAS drives. Your pick. Doesn’t matter. I’ve even got one server running a 3TB, 6TB, and two 8TB disks all in the same pool (experiment machine). ZFS rolls with it. Have old disks lying around? Repurpose them and still have insurance against one failing on you. It’s free real estate then.

This setup also means you can spin up new VMs of other services that might interest you; the infrastructure is already there and VMs can run side-by-side. Or it frees up your SBCs for other jobs like Pihole or chat bot host. Experimentation is easy in a VM. Also, this may repurpose hardware that is otherwise worthless for its original use but overkill just to run a Freedombox setup. You can shop for free, ready-to-roll VMs at turnkeylinux.org.

You can temporarily shut down the current live service and restore a backup to verify it is working correctly, validating that your backups are good to go. Down time is about a minute at the front and back ends for startup and shutdown, as the restoration can run concurrently with the live VM. Then just shut down the production VM and start the restored backup. Once testing is complete, shut down the restored VM and start the production one again. Delete the restored VM. The backup is now verified good or bad.
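
The restore test described above can be written down as an ordered plan. This is only a sketch: it builds the command list and prints it, it does not run anything. I am assuming the standard Proxmox CLI tools `qmrestore` and `qm`; the archive name and the VM IDs (100 and 999) are placeholders I made up, so check everything against your own PVE version before running any of it by hand.

```python
def restore_test_plan(archive, prod_vmid, test_vmid):
    """Return the ordered commands for a restore test. Nothing is executed."""
    if prod_vmid == test_vmid:
        raise ValueError("restore into a different VM ID than production")
    prod, test = str(prod_vmid), str(test_vmid)
    return [
        ["qmrestore", archive, test],  # restore runs while production is still live
        ["qm", "shutdown", prod],      # start of the short downtime window
        ["qm", "start", test],         # boot the restored copy and test it by hand
        ["qm", "shutdown", test],      # testing finished
        ["qm", "start", prod],         # production back in service
        ["qm", "destroy", test],       # remove the restored copy
    ]


# Placeholder archive name and VM IDs, purely for illustration.
plan = restore_test_plan("BACKUP-ARCHIVE-PLACEHOLDER", 100, 999)
for command in plan:
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the order before touching the live VM.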

Drawbacks-
A PVE server is an obviously larger, more power hungry, and somewhat noisier setup than a little SBC running on 5V USB with passive cooling. (Not that my servers are loud by any means but there is a set of quiet fans running and that’s more than the two rpi4s with passive aluminum radiators a few feet away)
It requires space for the machine and either hardware lying around or free cash to spend on buying budget hardware to assemble. (I run a vertical mount on the wall holding 6U of surplus servers)
If you go with a PBS as well, then double that.

PVE, VMs, et al may involve a learning curve, depending on prior experience.
There is lots of helpful information out there, a supportive community, and loads of resources.
But these resources only shorten the time and effort to get going, not eliminate them.
It is also obviously more work to set up initially and verify the system is all arranged how you want it.

And there is some effort once a month (or whatever interval you wish to pick) to verify backups. But this is true for any arrangement with backups. A backup system with no functional verification is a lottery game on whether it works or not. Data/hash verification hasn’t failed me yet; however, “Trust but verify” is a motto for a reason. I’d like to not learn something isn’t right the hard way.

EDIT- Reasons I went this route.
1- I had hardware on hand and a desire to learn more about VM operations so as to host a number of services on the network.

2- Raspberry Pi storage space is limited when using SD cards. Doing SSD hip-ons for all of them starts to negate the small form factor and results in a physical space issue. Add enough Rpis and the resulting equipment cost and physical space consumption was as much as the repurposed hardware. Data storage required more drives and/or larger, more costly setups. No way to share storage across each discrete pi without resorting to some central network storage. If a central network storage was employed, there was no point in having pis for running the services when the size/cost penalty of turning the NAS into a proper hypervisor running VMs was virtually nil.

3- Rpi operations simply were not reliable. Yes, Rpi4 and current gen raspbian have come a LONG way in reducing the random filesystem corruptions that used to occur. But they are ultimately bound by the SD card, if you are keeping the form-factor small.
And trying to burn a backup image of a 32GB card is painful. I had to do it three times, then hash the output files, and compare. Sometimes I had three different values which then required a 4th image write to see which one it matched. On top of this, there was a hefty storage penalty for keeping more than one image per device, as I did not benefit from deduplication or other storage tricks stashing them on my computer or external hard drive.
This doesn’t even count writing that back up image back to a card. Not a perfect science either.
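
The hash-and-compare routine I described for the card images looks roughly like this in Python. It is a minimal sketch, not the exact commands I used back then: the helper names are made up, SHA-256 is just my pick of hash, and the file names in the usage note are placeholders.

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a 32GB image never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def agreeing_images(paths):
    """Return (paths sharing the most common digest, True if a strict majority agrees)."""
    digests = {str(path): sha256_of(path) for path in paths}
    winner, count = Counter(digests.values()).most_common(1)[0]
    agreeing = [path for path, digest in digests.items() if digest == winner]
    return agreeing, count > len(digests) / 2
```

Usage would be something like `agreeing_images(["card-read1.img", "card-read2.img", "card-read3.img"])`. If the second value comes back False, no majority agrees and another read of the card is needed, which is the 4th image situation above.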

This isn’t a knock on raspberry pi. I love experimenting with them and one 4B has been running a pihole on network without issue for a year or more at this point (pihole has an easy backup tool to store the config in case you have to rebuild). Perfect hosts for services with no user/shared data.
But for hosting services that store user or shared data, they have their limitations. If that data only exists on the devices and backups you manage, it becomes a hassle vs stepping up to a serious hardware solution.

timmy · June 3, 2022, 7:32pm

Follow up-> There is a project out there called Pimox. It is an adaptation to run Proxmox on a Raspberry Pi.
In light of the above, this is attractive for running VMs that can be stored back to a back up device (spare computer) that doesn’t need to run full time. I’ve not gotten it to run from a backup of my freedombox instance yet but work continues.

Since I was in a rush when I first attempted this, I did manage to run Proxmox from an old CF-30 Toughbook. This is probably the most minimal hardware I’ve run Proxmox with a VM on.
2 Cores, 4GB of RAM. No virt capacity to speak of. Passive thermal cooling.
Boot time for the VM from start to Freedombox answering on all services is 15mins or so. A pi4 would spank this thing, if I get it running.

Time to backup from current VM to PBS, restore from PBS backup to new host, reconfigure for hardware differences, and start VM boot - about an hour (includes all normal integrity checks from VM → Backup and from Backup → New VM instance on host).
The integrity check is CPU dependent so it being performed by the ancient CF-30 can be improved upon using something newer. During the swap, I’d offline the VM to prevent data loss as messages and actions on the still running VM would be lost on the restore back from the other device.

If the Pimox works, then another benefit is that new services and systems (a second freedombox on network with a different user base) can be trialed from a very low power device with minimum cost. If things work, it can always be moved to another device with more resources.

Note Pimox is nowhere near something stable I’d consider for a real production environment. I’d consider a frequent backup regime to another device, just in case, if using it.

joseph · June 7, 2022, 5:11am

I care deeply that my freedombox always works as it is my router. I travel frequently and need remote access. I’ll share some things that are working for me and some lessons learned along the way:

I have an app from my ISP to reset my cable modem. When there is an internet problem it is usually the cable modem at issue. That corrects almost all outages.
I only do freedombox configuration through the GUI. EVER.
I am very conservative about installing non-freedombox packages through apt. Almost never, and nothing that would appear to conflict with a service FreedomBox provides.
I do not install software I can’t get from Debian.
I use auto-update. I survived the buster to bullseye upgrade.
I don’t use FB apps that I don’t think I need.
I don’t worry too much about installing an FB app that I decide not to use. I just turn it off or ignore it. I do not try to purge it from the system and modify the FB sqlite data stores.
I have redundant storage using BtrFS RAID1 (except for the boot image. I live dangerously here, and I’m not sure about how to do that…)
I use ECC memory.
After I got some experience with FreedomBox and came to depend on it, I invested in some quality hardware - cheap, good, ~~fast~~. Look at the home office/small business edge servers, but make sure you know the noise rating before you buy.
I keep the filesystem simple. All storage is in the BtrFS / volume (but BtrFS does make me nervous - I use it because it is supported by FB. It’s been good to me.)
If you have hot-plug disks, lock the chassis. My network cord got hung up in a disk latch and one disk ran for days/weeks with the mirror disconnected. That eventually caught up with me. Close the door. Lock it. That’s why it’s there.
don’t rely on BtrFS snapshots to get you out of a jam - read up on what you’re doing and test it on another machine. My snapshot restore experience was a super slow system after rollback.
A BIOS and power supply that support restoring the last power state will turn the system back on after a power outage. Highly recommended.

FreedomBox is a great product for me. It is imperfect. I live with those imperfections, happy in the knowledge that I have a secure, robust little server that is always there when I need it. Treat it like a production server and it will work like a production server.

timmy · June 9, 2022, 2:16pm

> I have redundant storage using BtrFS RAID1 (except for the boot image. I live dangerously here, and I’m not sure about how to do that…)

If you host FB in a VM on a hypervisor that supports booting from a ZFS pool, you can leverage ZFS to boot from a mirror.
My backup server is a raidz2 across four disks but, IIRC, the boot section is mirrored across all four. Proxmox/ZFS/GRUB handle ensuring replication of the same boot data across all locations upon any update of boot components. I’d lose data before I lost the ability to boot.

And you don’t need to move off the RAID 1 if you are comfortable with hardware raid. Another server of mine has a ZFS mirror only for the boot disks (a pair of SSDs). The data drives are their own pool apart from the hypervisor.
Run your VM of FB on the RAID1 hardware and let a pair of cheap SSDs hold the hypervisor on a ZFS mirror.

If the OS disk pool completely dies at the same time, no big deal. Slap a new set of disks in, install the hypervisor, re-acquire your RAID with all your VM data, and you are up again.

Running a hypervisor also means you can simulate the same environment on any spare machine for testing (or restore from physical backups after a critical failure). Regular backups can be scheduled and run over network to another device or off site. The duplicate machine can be located in another physical location (if fire/flooding/etc are concerns).

> don’t rely on BtrFS snapshots to get you out of a jam - read up on what you’re doing and test it on another machine. My snapshot restore experience was a super slow system after rollback.

This. I’ve really come to love the advantages of running VMs. Yes, there is overhead. But the advantages outweigh the disadvantages.

As long as I have an intact ZFS pool with my data, the OS disks are meaningless. Rip a defunct one out, stick a new one in, install, and I’m back up.

If I misconfigure or install something that turns out to be a big no-no (like the time I tried to federate my FB matrix server), I can delete and restore from backup in minutes.

Zero fear of screw ups or testing out new software. Rip an image of the VM either locally on the server or to the backup server and verify integrity. Then have at it. If I break it, no big deal. Restore, learn, and try again if desired.