Problem Description
After about 10 days of FreedomBox uptime, subprocesses related to the backup application (at least df and mountpoint) block indefinitely. This causes other applications (such as tt-rss and samba), as well as the backup application itself, to malfunction.
Steps to Reproduce
I’m not sure this can be easily reproduced.
- Set up a remote SSH server.
- Log in to FreedomBox.
- Go to the backup application.
- Configure the remote scheduled backup.
- After about 10 days of uptime, the symptoms start to appear.
Expected Results
I expected the backups to work indefinitely, without affecting other applications.
Actual results
I see many suspended df and mountpoint processes. If I invoke those programs from the command line, the command line blocks as well.
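To check for this state without hanging the shell, a command can be run with a timeout. The following is a minimal sketch, not part of FreedomBox: the helper name responds_within and the 5-second default are assumptions made for illustration. It does not wait for the child after killing it, because a process stuck on a stale mount may not exit promptly.

```python
import subprocess


def responds_within(cmd, timeout_s=5.0):
    """Return True if cmd exits within timeout_s seconds, False if it hangs.

    Hypothetical helper for diagnosis only; not part of FreedomBox.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill the child, but do not wait for it again:
        # a process blocked on an unresponsive mount may take a long time
        # to go away, and waiting here would block the caller too.
        proc.kill()
        return False
```

For example, responds_within(['df']) or responds_within(['mountpoint', path]), where path is the sshfs mount point used by the backup, would return False in the situation described above instead of blocking.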
Another apparent side effect (unconfirmed): the tt-rss application stops working because it cannot synchronize feeds, apparently because of a timeout when accessing resources shared with the backup application.
Another apparent side effect (unconfirmed): several days later, other applications, such as cockpit and samba, stop working.
Rebooting the FreedomBox solves the problem for ~10 days.
Information
- FreedomBox version: Debian GNU/Linux 11 (bullseye); FreedomBox version 22.8
- Hardware: Virtual machine.
- How did you install FreedomBox?: Fresh Debian install.
Proposed patch
I have experienced similar issues with custom scripts, and solved them by tuning the configuration of sshfs.
This problem might be related to the SSH configuration, but I think that FreedomBox should be robust enough to deal with it.
I’ll report back if this patch solves the issue.
diff --git a/actions/sshfs b/actions/sshfs
index b2fb6b5c7..b47a2439d 100755
--- a/actions/sshfs
+++ b/actions/sshfs
@@ -55,10 +55,24 @@ def subcommand_mount(arguments):
kwargs = {}
# the shell would expand ~/ to the local home directory
remote_path = remote_path.replace('~/', '').replace('~', '')
+ # 20220415glalejos: added reconnect, ServerAliveInterval and
+ # ServerAliveCountMax. After ~11 uptime days, backups scheduled via
+ # the backup application stop working. I can see a lot of processes
+ # '/usr/share/plinth/actions/storage usage-info' suspended in the
+ # invocation of 'df' program (this blocks too if I manually invoke it from
+ # the command line). Also there are a lot of
+ # '/usr/share/plinth/actions/sshfs is-mounted' suspended in the invocation
+ # of 'mountpoint' program (this, too, blocks if manually invoked from the
+ # command line).
+ # Apparently, this situation has some side effects, such as tt-rss
+ # failing to update feeds after several uptime days.
+ # Other custom scripts that I used in the past had similar issues, and
+ # the options included in the following 'cmd' helped solve them.
cmd = [
'sshfs', remote_path, arguments.mountpoint, '-o',
f'UserKnownHostsFile={arguments.user_known_hosts_file}', '-o',
- 'StrictHostKeyChecking=yes'
+ 'StrictHostKeyChecking=yes', '-o', 'reconnect', '-o',
+ 'ServerAliveInterval=15', '-o', 'ServerAliveCountMax=3'
]
if arguments.ssh_keyfile:
cmd += ['-o', 'IdentityFile=' + arguments.ssh_keyfile]
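
To show what the added options amount to, here is a small sketch that builds them as a list and estimates how long a dead connection goes unnoticed. As I understand ssh_config(5), ssh sends a keepalive after ServerAliveInterval seconds of silence and disconnects after ServerAliveCountMax unanswered keepalives, so the values in the patch give roughly 45 seconds. The sshfs reconnect option then tries to re-establish the connection. The function name sshfs_keepalive_options is hypothetical and does not exist in FreedomBox.

```python
def sshfs_keepalive_options(interval_s=15, count_max=3):
    """Return the extra sshfs arguments and the approximate detection delay.

    Hypothetical helper for illustration; the defaults mirror the patch.
    """
    options = [
        'reconnect',
        f'ServerAliveInterval={interval_s}',
        f'ServerAliveCountMax={count_max}',
    ]
    args = []
    for option in options:
        args += ['-o', option]
    # ssh gives up after count_max unanswered keepalives sent every
    # interval_s seconds, so the delay is approximately their product.
    return args, interval_s * count_max


args, delay_s = sshfs_keepalive_options()
print(args)
print(delay_s)  # 45
```

Lower values detect a broken link sooner but could drop the connection on a slow network, so the numbers may need tuning per setup.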