System unstable after upgrade to bullseye

I run a Pioneer-FreedomBox-HSK behind a Router (AVM Fritzbox 7390). The Pioneer is mounted and connected to a Bay-SSD. I moved the file system to the ssd as described here.
I have to HDD’s connected to the USB-Slots, which are formatted as a RAID 1 (Also described in the post linked above).
I have installed bepasty, calibre, mindlna, radicale, samba and synthing. I also run a gitea service on the box (Outside of freedombox)

I bought / ordered it on 2021-01-06
About: You are running Debian GNU/Linux 11 (bullseye) and FreedomBox version 21.7. FreedomBox is up to date.

The system ran smoothly until a couple of days before when it upgraded to bullseye. Since than the box reboots rather randomly every other day and is not reachable anymore over the network until I unplug power and power it on again.

At the moment I have no clue where to start looking. I disabled auto-update and auto-update to next stable release but to no avail. I also had issues with the syslog service not running anymore. I reinstalled and reenabled rsyslog.service again, but I have the feeling, that it misses a lot of entries, as there are huge gaps in the timestamps.

Where may I start looking for possible issues?

Regards Oliver

1 Like

This is intentional as single log with journald is better on SD cards instead of multiple files and copies. Please use systemd-journald instead. To see the logs, run:

$ journalctl 

Various units and date ranges can be filtered, see man journalctl.

My own primary FreedomBox Olimex Pioneer hardware went to twice after the upgrade. Each time it seem to have lasted around 3 days. Logs show nothing unusual, just regular messages and then suddenly nothing until next reboot.

I have now connected a USB serial console to the machine and started it. This is a good way to debug further. If there are any kernel messages such as crashes we should be able to see them on this console. If I find anything, I will post here.

1 Like

I scanned through my logs and found a number of odd/ suspicious entries:

sudo journalctl -q

# The log starts on Jul 13 with a number of records, then:
# Oddly the recorded timestamp is quite different from the next one in the log
Jul 13 17:29:42 freedombox systemd-timesyncd[350]: System clock time unset or jumped backwards, restoring from recorded timestamp: Sat 2021-08-28 04:47:02 UTC
Aug 18 02:00:52 freedombox kernel: Booting Linux on physical CPU 0x0

# pam-abl started to throw errors
Aug 18 03:39:36 freedombox pam-abl[6793]: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery (-30973) while starting transaction
Aug 18 03:39:36 freedombox pam-abl[6793]: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery (-30973) while starting the transaction to record_attempt.

# Switching dates???
Aug 18 04:18:38 freedombox sshd[8401]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=78.192.6.4
Jul 13 17:29:38 freedombox systemd-timesyncd[370]: System clock time unset or jumped backwards, restoring from recorded timestamp: Wed 2021-08-18 04:18:39 UTC
Aug 18 04:18:40 freedombox sshd[8401]: Failed password for invalid user conta from 78.192.6.4 port 46970 ssh2

# A huge time gap
Aug 18 04:39:37 freedombox sshd[9527]: Disconnected from invalid user suporte 139.59.244.237 port 43338 [preauth]
Aug 18 09:00:29 freedombox systemd-timesyncd[370]: Initial synchronization to time server 144.91.116.85:123 (0.debian.pool.ntp.org).

# last log entry
Aug 18 18:38:52 freedombox gitea-zblesk[383]: 2021/08/18 18:38:52 Completed GET /user/events 200 OK in 10.600166ms
# I repowered the system
Aug 19 16:37:21 freedombox kernel: Booting Linux on physical CPU 0x0

# last log entry
Aug 19 16:38:39 freedombox syncthing[958]: [M7FML] INFO: Ready to synchronize "Default Folder" (default) (sendreceive)
# Again, I pulled the power plug and repowered
Aug 19 17:55:08 freedombox systemd-timesyncd[372]: Initial synchronization to time server 193.141.27.1:123 (0.debian.pool.ntp.org).

# Again, same scenario
Aug 20 16:54:37 freedombox sshd[19543]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=24.166.58.59
Aug 22 16:37:12 freedombox kernel: Booting Linux on physical CPU 0x0

Aug 22 16:38:27 freedombox systemd[1]: Started Session 1 of user oliver.
# time gap
Aug 22 19:24:00 freedombox systemd-timesyncd[372]: Initial synchronization to time server 144.76.59.37:123 (0.debian.pool.ntp.org).

# An error during auto update I guess
Aug 25 07:00:50 freedombox /usr/bin/plinth[11987]: Error running setup - PackageException(error_string="Error during installation", error_details="dpkg: error: dpkg frontend lock was locked by another process with pid 11372

# last log entry
Aug 26 22:37:22 freedombox sshd[23705]: Disconnected from invalid user admin 164.90.134.223 port 52398 [preauth]
# Repowering
Aug 27 09:22:20 freedombox kernel: Booting Linux on physical CPU 0x0

# last log entry
Aug 28 04:50:18 freedombox sshd[14557]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=2.181.126.23
# Repowering
Aug 28 09:44:47 freedombox systemd-timesyncd[350]: Initial synchronization to time server 176.221.42.125:123 (0.debian.pool.ntp.org).

# First occurence of many pam-abl errors of the same type. Eventually I removed the complete directory in /var/lib/abl to initialize new databases, now the errors are gone
Aug 28 09:51:26 freedombox pam-abl[2473]: Resource temporarily unavailable (11) while opening the database environment
Aug 28 09:51:26 freedombox pam-abl[2473]: Resource temporarily unavailable (11) while Creating database environment.
Aug 28 09:51:26 freedombox pam-abl[2473]: The database environment could not be opened

So far, my box is running since 3 days without noticeable problems.

1 Like

I think, the Pioneer board does not have a battery backed RTC clock. So, the time might jump and when the machine gets connected to the Internet, the time is set properly using the NTP protocol by systemd-timesyncd. So, this may be okay.

This is not serious. When logging into a console, pam-abl module is run by PAM. It is used to ensure that too many failures will result in an temporary ban on the user account. Since network login attempts are protected differently, this is not serious.

Adversaries on the Internet keep trying to login via SSH by trying known usernames and guessing passwords. We have fail2ban that will impose temporary bans on these attempts when certain number of attempts fail. This is not a problem unless you have week password and have not disabled password authentication yet.

Last entry but like in my case, it does not seem serious.

This can happen if FreedomBox tries a package operation when apt is busy. This is harmless as it will try again later.

My machine too is stable. Let’s keep watching. Nice work so far.

1 Like

Adversaries on the Internet keep trying to login via SSH by trying known usernames and guessing passwords. We have fail2ban that will impose temporary bans on these attempts when certain number of attempts fail. This is not a problem unless you have week password and have not disabled password authentication yet.

Meanwhile I disabled password authentication, but thank you anyway for hinting.

My FreedomBox hung today. And I saw the following on the serial console. It looks like there is not enough memory for all the services I am running on my FreedomBox (I run quite a few). I will try trimming them a bit and try again.

[218575.169685] systemd[1]: Failed to start Journal Service.
[218673.885821] systemd[1]: Failed to start Journal Service.
[218768.222955] systemd[1]: Failed to start Journal Service.
[219009.404160] systemd[1]: Failed to start Journal Service.
[219113.418769] systemd[1]: Failed to start Journal Service.
[219211.965658] systemd[1]: Failed to start Journal Service.
[219491.164682] systemd[1]: Failed to start Journal Service.
[219584.282928] systemd[1]: Failed to start Journal Service.
[282024.855801] systemd[1]: Failed to start Journal Service.
[282127.806916] systemd[1]: Failed to start Journal Service.
[282329.369764] systemd[1]: Failed to start Journal Service.
[282502.910094] INFO: task systemd:1 blocked for more than 120 seconds.
[282502.917029]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[282502.925061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[282667.948641] systemd[1]: Failed to start Journal Service.
[282793.442973] systemd[1]: Failed to start Journal Service.
[282988.381403] systemd[1]: Failed to start Journal Service.
[283333.734162] systemd[1]: Failed to start Journal Service.
[283512.250056] systemd[1]: Failed to start Journal Service.
[283590.372255] INFO: task (php):4306 blocked for more than 120 seconds.
[283590.378988]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[283590.387025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[283590.399213] INFO: task cron:4350 blocked for more than 120 seconds.
[283590.405705]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[283590.413735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[283590.425760] INFO: task cron:4351 blocked for more than 120 seconds.
[283590.432236]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[283590.440253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[283952.862894] INFO: task tar:4376 blocked for more than 120 seconds.
[283952.869390]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[283952.877461] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[284073.692893] INFO: task nscd:370 blocked for more than 120 seconds.
[284073.699414]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[284073.707433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[284073.719503] INFO: task tar:4376 blocked for more than 241 seconds.
[284073.725854]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[284073.733841] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[284276.689264] systemd[1]: Failed to start Journal Service.
[284436.186446] INFO: task cron:4460 blocked for more than 120 seconds.
[284436.193099]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[284436.201227] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[284557.027062] INFO: task btrfs-transacti:180 blocked for more than 120 seconds.
[284557.034547]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[284557.042597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[284557.053072] INFO: task nscd:370 blocked for more than 120 seconds.
[284557.059475]       Tainted: G         C  E     5.10.0-8-armmp-lpae #1 Debian 5.10.46-4
[284557.067516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[289968.573976] Out of memory: Killed process 11040 (tmux: server) total-vm:106136kB, anon-rss:103112kB, file-rss:8kB, shmem-rss:0kB, UID:10001 pgtables:232kB oom_score_adj:0
[290159.168768] systemd[1]: Failed to start Journal Service.

You can also use zram to create a compressed part in RAM which serves as swap. I had the same problem with hard freeze when ram is full and zram help a lot.

the package is named zram-tools.

1 Like

FreedomBox includes zram-tools from v21.5, so all installations on Debian Bullseye should have it.

2 Likes

Thanks for the info, I missed it on the announcement (and already had it configured at the time)

my freedombox is no longer accessible , always after a day or 3, I then pull the plug and restart

Could you please share the list of services you are running? Also could you try disabling some of the services?

Bepasty ,Cockpit, ejabberd, jsxc, mediawiki ,minetest,mumble, Radical,Roundcube,Syncthing, and searx
Hi I do not have much knowledge
I will uninstall ejabberd, jsxc, mediawiki ,minetest,mumble, Radical,Roundcube, and then wait and see ,these are not really in use yet

sorry right now only bepasty, share, minetest searx and syncthing running. I can not switch of JSXC and Tor