Recovering from a full & corrupted root btrfs filesystem

Hello,

I’m using FreedomBox buster on a lime2 computer, mostly for syncthing and git-annex through ssh.

Problem Description

Lately it failed me: the root partition filled up, which prevented LDAP logins.
I also had filesystem errors, which led to the filesystem being remounted read-only.

Most of the data filling the partition came from syncthing.
I also experimented with propellor recently, which might be part of the issue: because propellor is compiled and run as root, it may have bypassed quotas (I found more than 700MiB of cabal files related to propellor).

I don’t think I lost any data, and could reinstall, but I would prefer to repair the partition rather than reinstall.

Recovering

I plugged the SD card containing the root filesystem into my laptop.
I can mount it, and remove files to make space.
When doing so, I see no error in sudo journalctl -f.

But when I unmount it and mount it again, I’m back at the starting point: the partition is full, and the files I removed are present again.

# mount /dev/sda2 /mnt
# grep /dev/sda2 /proc/mounts
/dev/sda2 /mnt btrfs rw,relatime,space_cache,subvolid=5,subvol=/ 0 0
# df -h /dev/sda2 
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        30G   30G     0 100% /mnt
# rm -rf /mnt/var/lib/syncthing/redmi6a_DCIM/*
# df -h /dev/sda2
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        30G   21G  8.8G  70% /mnt
# umount /mnt/
# mount /dev/sda2 /mnt
# df -h /dev/sda2 
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        30G   30G     0 100% /mnt

Btrfs being a COW filesystem, failing to remove files when the disk is full is conceivable, but I would have expected some error messages, and I have seen none.
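One way to tell whether writes reach the card at all, independently of btrfs, is to plant a marker file, unmount, remount, and check that the marker survived. The following is only a sketch: plant_marker and check_marker are hypothetical helper names, the marker path .persist-test is made up for illustration, and the checksum is kept in a state file that must live off the card (for example under /tmp). It assumes some space has just been freed, otherwise the write itself fails.

```shell
#!/bin/sh
# Sketch only: helper names and the marker file name are assumptions.

# plant_marker MOUNTPOINT STATEFILE
# Writes 1 MiB of random data to the mounted filesystem and stores its
# SHA-256 checksum in STATEFILE, which should be on another disk.
plant_marker() {
    mnt=$1
    state=$2
    head -c 1048576 /dev/urandom > "$mnt/.persist-test" || return 1
    sync
    sha256sum < "$mnt/.persist-test" | cut -d' ' -f1 > "$state"
}

# check_marker MOUNTPOINT STATEFILE
# Succeeds only if the marker still exists and matches the stored checksum.
check_marker() {
    mnt=$1
    state=$2
    [ -f "$mnt/.persist-test" ] || return 1
    current=$(sha256sum < "$mnt/.persist-test" | cut -d' ' -f1)
    [ "$current" = "$(cat "$state")" ]
}

# Intended use, as root (not run here):
#   plant_marker /mnt /tmp/marker.sha256
#   umount /mnt && mount /dev/sda2 /mnt
#   check_marker /mnt /tmp/marker.sha256 && echo persisted || echo lost
```

If the marker is gone or differs after the remount, the writes never made it to stable storage, which would point at the card rather than at the filesystem.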

I tried to scrub and repair, with no lasting result.

# btrfs scrub start -B /dev/sda2
scrub done for 8529575b-5edd-4392-9102-573824cb5380
scrub started at Tue Jan 28 14:41:38 2020 and finished after 00:06:01
total bytes scrubbed: 29.01GiB with 34 errors
error details: super=2 verify=32
corrected errors: 0, uncorrectable errors: 32, unverified errors: 0
ERROR: there are uncorrectable errors
# umount /mnt/
# btrfs check --repair /dev/sda2 
enabling repair mode
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 8529575b-5edd-4392-9102-573824cb5380
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 30995677184 bytes used, no error found
total csum bytes: 30058576
total tree bytes: 156532736
total fs tree bytes: 102285312
total extent tree bytes: 16285696
btree space waste bytes: 33822300
file data blocks allocated: 30904627200
referenced 30748667904
# btrfs scrub start -B /dev/sda2
scrub done for 8529575b-5edd-4392-9102-573824cb5380
scrub started at Tue Jan 28 15:00:02 2020 and finished after 00:05:48
total bytes scrubbed: 29.01GiB with 0 errors

But then, if I remove files and scrub again, scrub reports errors again.
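To compare scrub runs before and after deleting files without reading the whole report each time, the error count can be pulled out of the summary line. This is a small sketch that assumes the summary format shown in the transcripts above ("total bytes scrubbed: ... with N errors"); other btrfs-progs versions may print the summary differently. scrub_error_count is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: parses the scrub summary format seen in the output above.

# scrub_error_count: reads scrub output on stdin and prints the number N
# from the line "total bytes scrubbed: <size> with N errors".
# Prints nothing if no such line is present.
scrub_error_count() {
    sed -n 's/^total bytes scrubbed: .* with \([0-9][0-9]*\) errors$/\1/p'
}

# Intended use, as root (not run here):
#   btrfs scrub start -B /dev/sda2 | scrub_error_count
```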

I guess I’ll reinstall in the end.
Am I missing something obvious?

Information

  • Hardware: lime2
  • How did you install FreedomBox?: downloaded stable image

Well, there is something fishy going on with the microSD card itself. I’m not able to repartition it on either Linux or macOS.
I’m still puzzled that this can happen without any scary logs.

If this is a hardware problem (SD card going bad), it explains a lot. If you are still able to read the data but not write to it, the following would be worth a try for data recovery:

dd if=/dev/mmcblk0 of=sdcard.img bs=1M status=progress
kpartx -avs sdcard.img

Then check or mount the device /dev/mapper/loop0p2 (or whatever gets created by kpartx) and remove files if possible.
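Before working on the image, it may be worth confirming that it really matches what the card returns. Below is a minimal sketch, assuming GNU coreutils (dd, sha256sum); image_and_verify is a hypothetical helper name. It copies the source to an image and then reads the source a second time to compare checksums. On a failing card two reads may differ, so a mismatch is itself a useful hint.

```shell
#!/bin/sh
# Sketch only: the helper name is an assumption, not an existing tool.

# image_and_verify SOURCE IMAGE
# Copies SOURCE (a block device or a regular file) to IMAGE with dd, then
# succeeds only if the SHA-256 checksums of SOURCE and IMAGE are identical.
image_and_verify() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=1M 2>/dev/null || return 1
    src_sum=$(sha256sum < "$src" | cut -d' ' -f1)
    img_sum=$(sha256sum < "$img" | cut -d' ' -f1)
    [ "$src_sum" = "$img_sum" ]
}

# Intended use, as root (not run here):
#   image_and_verify /dev/mmcblk0 sdcard.img && echo "image matches card"
```

Once the image is verified, all further repair attempts can be made on the copy, leaving the card untouched.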

For hardware disk failures, the one thing I know of that could give us early warnings is smartd from smartmontools. Unfortunately, we don’t have that set up in FreedomBox yet.

I created an issue to consider adding smartmontools and enabling it by default.

Hi Sunil, thank you for your help!

I had never heard of SMART for flash storage. Does it really exist?

This event raised a few questions:

  • Is there any quota mechanism preventing me (or syncthing) from filling the disk?
  • What led to the choice of btrfs for the root filesystem?
  • Are there plans to make it easier to set up two FreedomBoxes, with automated backup from one to the other?

Thank you again!