System unstable after upgrade to bullseye

Server keeps running :slight_smile: after 7 days
with
bepasty, sharing, JSXC, Minetest, search, Syncthing

2 days ago, I installed the matrix synapse chat server.
Today, my system booted 4 times (according to journalctl --boot … – comments) between 12:19 and 12:28 for no apparent reason; then, half an hour later, the journal entries stopped until I power-cycled the box at around 18:00.
Again, no conspicuous messages …
So my feeling is: the fewer services, the more stable the system runs, which makes it quite useless :frowning:

It’s not the case. I tried with only Matrix and WireGuard running, and it’s the same: it hangs every second day. It’s very frustrating. I can’t believe that there is no solution for that problem yet. After all, this is the Pioneer Edition; it’s kind of a symbol of the system, and it’s not working. I think I have waited long enough for a solution. I’m wondering whether to go back to the previous, actually stable release (buster), get other hardware (e.g. a Raspberry Pi), or just switch to another server OS (e.g. OMV).

Boom, same procedure, this time after 10 days. It IS frustrating, because it happens for no apparent reason, and I only become aware that the system is down when I can’t chat anymore or commit to my git service, sometimes while I am on the road and NOT able to just hard reset the box.
I also wonder if I should change the OS.
But I’d prefer to have this issue fixed ASAP, because I’m quite happy with my setup, were it not for the occasional hangs. Any alternative probably means a time-consuming process of reinstalling and reconfiguring what I’ve already done.

Is it normal that, at boot, the logs don’t show the same long sequence of kernel messages?

In my own logs, except when I had serious problems, I usually see something starting with:

kernel: Booting Linux on physical CPU 0x0
kernel: Linux version 5.10.0-9-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
kernel: CPU: ARMv7 Processor [410fc074] revision 4 (ARMv7), cr=30c5387d

On my system, this is the journal output when filtering for those kernel messages:

oliver@freedombox:~$ sudo journalctl -q -g "Linux version"
Failed to get MESSAGE field: Bad message
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Aug 18 02:00:52 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>
Aug 18 09:19:41 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>

As @sunil pointed out above, the identical timestamps may stem from "System clock time unset or jumped backwards", because the board has no battery-backed RTC.
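One way to confirm that several boots were logged under the same timestamp is to count the "Linux version" lines per timestamp. This is only a minimal Python sketch: it assumes the journalctl output was saved as text in the short format shown above, and the function name repeated_boot_timestamps is made up for illustration.

```python
from collections import Counter


def repeated_boot_timestamps(lines):
    """Return {timestamp: count} for 'Linux version' lines that share a timestamp.

    Assumes the short journalctl format: "Mon DD HH:MM:SS host kernel: ...".
    """
    stamps = Counter()
    for line in lines:
        if "kernel: Linux version" not in line:
            continue  # skips noise such as "Failed to get MESSAGE field"
        # The first three whitespace-separated fields form the timestamp.
        stamp = " ".join(line.split()[:3])
        stamps[stamp] += 1
    # Only timestamps that occur more than once are suspicious.
    return {stamp: n for stamp, n in stamps.items() if n > 1}


sample = [
    "Failed to get MESSAGE field: Bad message",
    "Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>",
    "Jul 13 17:29:26 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>",
    "Aug 18 02:00:52 freedombox kernel: Linux version 5.10.0-8-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.>",
]
print(repeated_boot_timestamps(sample))  # {'Jul 13 17:29:26': 2}
```

A timestamp that shows up for more than one boot fits the unset-clock explanation: each of those boots started before the clock had been set, so they all carry the same stale time.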

There are no messages saying Linux version ... anymore after the dist-upgrade to Bullseye, or so it seems. Why is that?

We have to thank the Olimex team for properly updating the Bullseye images. Thanks to them, the system now works stably.

Hello Johnny,
Is it possible to install the new FreedomBox image with the fixed Bullseye over an existing installation? Or do we just reinstall the system from scratch?
Dave

Hi, Dave! The system was unstable after the upgrade to Bullseye, and this went on for a long time without being fixed. So I decided to contact Olimex about this. They replied that they were not aware of this problem and would check what they could do about it. Their reaction was lightning fast: on the same day they uploaded images with a properly updated Bullseye and wrote to me that they would contact the FreedomBox team. Without my having to reinstall it, my Pioneer started working stably after the last update, so I guess the problem is solved. Do you have any problems with yours?

Hi Johnny, after the upgrade to Bullseye, the system would crash and become unreachable from the outside world (web, ssh, etc.) every 1 to 3 days. Thinking that it might be hardware related, I purchased a new Olimex A20 Lime 2 board (Rev. L I believe) in December, installed the weekly FB stable image, and it too crashed after a few days. I can’t say if the latest update fixed the problem or not because I just wanted to get it running again after reading your post that Olimex issued a new Bullseye version. I only had Samba, OpenVPN, and Apache running, so it certainly was not overloaded, and the SD card was not filled with snapshots. I just installed the latest version today and am waiting to see if it is now stable. Thanks for your fast reply.

Hello Oliver,
If you don’t already know, about three days ago Olimex updated their Bullseye FreedomBox image to fix the random crashes that I, and apparently you, have been suffering since the upgrade to Bullseye. I have heard that the normal updates have fixed the problem, but I can’t verify that, since I installed the latest weekly image today. Information from Johnny about update from Olimex
Just wanted to let you know.
Dave Oliver

Hello,
I tried the 2022-01-07 img, and it lasted 1 day.
The server crashed while using Transmission.

I tried the 2022-01-07 img and it crashed three times over the last week. Still only running Cockpit, OpenVPN, and Samba. I did find 330 nearly identical log entries that occurred in rapid succession after the last two crashes:

"[1641941461.3557] dhcp4 (eth0): selecting lease failed: -131 NetworkManager"

I am wondering if this could be some sort of external attack that renders the server unreachable. The router does indicate DoS attacks from various IP addresses. I have strengthened my password and selected “Disable password authentication” under Secure Shell (SSH) Server in the System settings.
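To see how many of those rapid-fire entries really are the same message, the bracketed NetworkManager timestamp can be stripped before counting. The snippet below is only a rough Python sketch under assumptions: the log lines are available as plain text, the function name count_repeated_messages is invented here, and the second sample timestamp is made up for illustration.

```python
import re
from collections import Counter

# Leading "[seconds.fraction]" stamp, as in "[1641941461.3557] dhcp4 (eth0): ...".
_STAMP = re.compile(r"^\s*\[\d+\.\d+\]\s*")


def count_repeated_messages(lines):
    """Count log messages after removing the leading bracketed timestamp."""
    counts = Counter()
    for line in lines:
        message = _STAMP.sub("", line, count=1).strip()
        if message:
            counts[message] += 1
    return counts


sample = [
    "[1641941461.3557] dhcp4 (eth0): selecting lease failed: -131",
    "[1641941461.3601] dhcp4 (eth0): selecting lease failed: -131",  # hypothetical stamp
]
for message, n in count_repeated_messages(sample).most_common():
    print(n, message)
```

A count alone does not show whether the DHCP failures caused the outage or merely followed it, but it makes it easier to compare what dominates the log after each crash.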

The first crash that occurred had something to do with either the automatic updates or the automatic backups which occur at night. I unscheduled automatic backups to see if that makes the FB more stable. I can report that the FB lasted through the night. I’ll report back if this solves the problem. The next thing to try is re-flash a new image and not select the recommended automatic updates.

Hello Johnny,
I believe my 10+ year old router/modem was letting DoS and DDoS attacks through and crashing my FBX within a day or two after rebooting or reinstalling the newest FBX image. I replaced my modem with a new model and so far, so good; it’s been going for 3 whole days. The router/modem logs and FBX logs show an increase in attacks coming from around the world, but I think most are being deflected. Only time will tell. If people are having crashes with old routers, it might be worthwhile to replace the insecure equipment. I will report back in a week or two. To see my experience with day one of the new router/modem go here https://discuss.freedombox.org/t/attacks-on-freedombox-from-around-the-world/1915

Hi all,
apologies for my late response. Unfortunately, the crashes persist: every 5 or 6 days my box stops responding. The last time it went silent was today at 7:00. Here are some lines from journalctl:

Feb 16 07:00:06 freedombox systemd[1]: Started Timeline of Snapper Snapshots.
Feb 16 07:00:06 freedombox systemd[1]: Started WordPress Scheduled Events Trigger (Cron).
Feb 16 07:00:06 freedombox dbus-daemon[366]: [system] Activating via systemd: service name='org.ope>
Feb 16 07:00:06 freedombox systemd[1]: Starting DBus interface for snapper...
Feb 16 07:00:06 freedombox dbus-daemon[366]: [system] Successfully activated service 'org.opensuse.>
Feb 16 07:00:06 freedombox systemd[1]: Started DBus interface for snapper.
Feb 16 07:00:06 freedombox systemd-helper[26412]: running timeline for 'root'.
-- Boot e7561208bb914a52b985bb8a66b9e17c --
Feb 16 08:10:50 freedombox kernel: Booting Linux on physical CPU 0x0
Feb 16 08:10:50 freedombox kernel: Linux version 5.10.0-11-armmp-lpae (debian-kernel@lists.debian.o>
Feb 16 08:10:50 freedombox kernel: CPU: ARMv7 Processor [410fc074] revision 4 (ARMv7), cr=30c5387d
Feb 16 08:10:50 freedombox kernel: CPU: div instructions available: patching division code
Feb 16 08:10:50 freedombox kernel: CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instructi>
Feb 16 08:10:50 freedombox kernel: OF: fdt: Machine model: Olimex A20-OLinuXino-LIME2
Feb 16 08:10:50 freedombox kernel: Memory policy: Data cache writealloc
Feb 16 08:10:50 freedombox kernel: efi: UEFI not found.
Feb 16 08:10:50 freedombox kernel: Reserved memory: created CMA memory pool at 0x000000004a000000, >
Feb 16 08:10:50 freedombox kernel: OF: reserved mem: initialized node default-pool, compatible id s>
Feb 16 08:10:50 freedombox kernel: Zone ranges:
Feb 16 08:10:50 freedombox kernel:   DMA      [mem 0x0000000040000000-0x000000006fffffff]
Feb 16 08:10:50 freedombox kernel:   Normal   empty
Feb 16 08:10:50 freedombox kernel:   HighMem  [mem 0x0000000070000000-0x000000007fffffff]
Feb 16 08:10:50 freedombox kernel: Movable zone start for each node
Feb 16 08:10:50 freedombox kernel: Early memory node ranges
Feb 16 08:10:50 freedombox kernel:   node   0: [mem 0x0000000040000000-0x000000007fffffff]
Feb 16 08:10:50 freedombox kernel: Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
Feb 16 08:10:50 freedombox kernel: On node 0 totalpages: 262144
Feb 16 08:10:50 freedombox kernel:   DMA zone: 1728 pages used for memmap
Feb 16 08:10:50 freedombox kernel:   DMA zone: 0 pages reserved
Feb 16 08:10:50 freedombox kernel:   DMA zone: 196608 pages, LIFO batch:63
Feb 16 08:10:50 freedombox kernel:   HighMem zone: 65536 pages, LIFO batch:15
Feb 16 08:10:50 freedombox kernel: psci: probing for conduit method from DT.
Feb 16 08:10:50 freedombox kernel: psci: Using PSCI v0.1 Function IDs from DT
Feb 16 08:10:50 freedombox kernel: percpu: Embedded 21 pages/cpu s54668 r8192 d23156 u86016
Feb 16 08:10:50 freedombox kernel: pcpu-alloc: s54668 r8192 d23156 u86016 alloc=21*4096
Feb 16 08:10:50 freedombox kernel: pcpu-alloc: [0] 0 [0] 1 
Feb 16 08:10:50 freedombox kernel: Built 1 zonelists, mobility grouping on.  Total pages: 260416
Feb 16 08:10:50 freedombox kernel: Kernel command line: console=ttyS0,115200 quiet
Feb 16 08:10:50 freedombox kernel: Dentry cache hash table entries: 131072 (order: 7, 524288 bytes,>
Feb 16 08:10:50 freedombox kernel: Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, l>
Feb 16 08:10:50 freedombox kernel: mem auto-init: stack:off, heap alloc:on, heap free:off
Feb 16 08:10:50 freedombox kernel: Memory: 896504K/1048576K available (12288K kernel code, 1680K rw>

At 8:10 I unplugged the power cable and reconnected it a couple of seconds later. So the last line logged before the box went silent reads:
freedombox systemd-helper[26412]: running timeline for 'root'
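Finding that last line by hand gets tedious when there are many crashes. As a rough aid, the Python sketch below scans saved journalctl text for the "-- Boot <id> --" separators and reports the last entry before and the first entry after each one. It assumes the journal was dumped to plain text in the format shown above; the name entries_around_boots is invented for this example.

```python
import re

# Separator that journalctl prints between boots, e.g.
# "-- Boot e7561208bb914a52b985bb8a66b9e17c --".
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")


def entries_around_boots(lines):
    """Return (boot_id, last_line_before, first_line_after) per boot separator."""
    results = []
    previous = None   # most recent log line seen so far
    current = None    # record still waiting for its first line after the marker
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        match = BOOT_MARKER.match(line)
        if match:
            current = [match.group(1), previous, None]
            results.append(current)
            continue
        if current is not None and current[2] is None:
            current[2] = line
        previous = line
    return [tuple(record) for record in results]


sample = [
    "Feb 16 07:00:06 freedombox systemd[1]: Started DBus interface for snapper.",
    "Feb 16 07:00:06 freedombox systemd-helper[26412]: running timeline for 'root'.",
    "-- Boot e7561208bb914a52b985bb8a66b9e17c --",
    "Feb 16 08:10:50 freedombox kernel: Booting Linux on physical CPU 0x0",
]
for boot_id, before, after in entries_around_boots(sample):
    print(boot_id)
    print("  last before:", before)
    print("  first after:", after)
```

Comparing the "last before" lines across several crashes may show whether the same job (here the snapper timeline) is always the final thing logged, or whether the hang is unrelated to what happened to be running.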

The next time my system goes down I’ll contact Olimex myself.

Regards Oliver

System keeps crashing. Anyone else still having this issue?

Jun 06 03:39:01 freedombox systemd[1]: Starting Clean php session files...
Jun 06 03:39:01 freedombox CRON[20022]: pam_unix(cron:session): session closed for user root
Jun 06 03:39:03 freedombox systemd[1]: phpsessionclean.service: Succeeded.
Jun 06 03:39:03 freedombox systemd[1]: Finished Clean php session files.
Jun 06 03:39:03 freedombox systemd[1]: phpsessionclean.service: Consumed 2.021s CPU time.
Jun 06 03:40:05 freedombox systemd[1]: Started WordPress Scheduled Events Trigger (Cron).
Jun 06 03:40:08 freedombox systemd[1]: wordpress-freedombox.service: Succeeded.
Jun 06 03:40:08 freedombox systemd[1]: wordpress-freedombox.service: Consumed 2.374s CPU time.
Jun 06 03:41:54 freedombox sshd[20101]: Received disconnect from 61.177.173.49 port 53246:11:  [preauth]
Jun 06 03:41:54 freedombox sshd[20101]: Disconnected from authenticating user root 61.177.173.49 port 53246 [preauth]
Jun 06 03:44:40 freedombox sshd[20107]: Unable to negotiate with 61.177.173.54 port 36886: no matching key exchange method fo>
Jun 06 03:50:05 freedombox systemd[1]: Started WordPress Scheduled Events Trigger (Cron).
Jun 06 03:50:08 freedombox systemd[1]: wordpress-freedombox.service: Succeeded.
Jun 06 03:50:08 freedombox systemd[1]: wordpress-freedombox.service: Consumed 2.254s CPU time.
Jun 06 03:52:58 freedombox /usr/bin/plinth[514]: # storage usage-info
Jun 06 03:52:58 freedombox sudo[20133]:   plinth : PWD=/ ; USER=root ; COMMAND=/usr/share/plinth/actions/storage usage-info
Jun 06 03:52:58 freedombox sudo[20133]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=110)
Jun 06 03:52:59 freedombox sudo[20133]: pam_unix(sudo:session): session closed for user root
-- Boot 78d0fad7344344b9bf5359ff0b68d2eb --
Jun 06 07:51:35 freedombox kernel: Booting Linux on physical CPU 0x0
Jun 06 07:51:35 freedombox kernel: Linux version 5.10.0-14-armmp-lpae (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1>
Jun 06 07:51:35 freedombox kernel: CPU: ARMv7 Processor [410fc074] revision 4 (ARMv7), cr=30c5387d
Jun 06 07:51:35 freedombox kernel: CPU: div instructions available: patching division code
Jun 06 07:51:35 freedombox kernel: CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Jun 06 07:51:35 freedombox kernel: OF: fdt: Machine model: Olimex A20-OLinuXino-LIME2
Jun 06 07:51:35 freedombox kernel: Memory policy: Data cache writealloc

As usual, I can’t find any hint in the log as to why this happens.

Did you try booting on an SD card with a generic Debian and then check the root file system on your SSD? (you could also use that opportunity to check file systems of your other disks).

Perhaps SSDs are more robust than SD cards, but powering off by pulling the plug probably increases the risk of file system issues.

I solved the problem by replacing my 10+ year old router/modem, which was letting DoS and DDoS attacks through and crashing my FBX within a day or two after rebooting or reinstalling the newest FBX image. My system has been up and running ever since I installed my Netgear CAX30. I subscribe to their Armor A.I. service, which learns about attacks and blocks them. Good luck. (Attacks on FreedomBox from Around the World?)

No I didn’t (yet) but I’m going to give it a try.
Removing power: I certainly don’t want to do that but at the moment it’s my only chance to ‘wake up’ my box, unfortunately.

I also see a lot of connection attempts in the FreedomBox journal. My router is a Fritzbox 7590, 2 or 3 years old. But I realised just now that I had not updated its OS for a while, so I updated my Fritzbox a couple of minutes ago.
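To get a feel for how many of those connection attempts there are and where they come from, the sshd lines in the journal can be tallied per source address. This is just a small Python sketch, assuming the journal was saved as text in the format posted earlier in the thread; count_ssh_sources is a made-up name, and the script only counts, it does not block anything.

```python
import re
from collections import Counter

# First IPv4 address that appears in an sshd journal line.
SSHD_IP = re.compile(r"sshd\[\d+\]: .*?\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def count_ssh_sources(lines):
    """Count sshd log lines per remote IPv4 address."""
    counts = Counter()
    for line in lines:
        match = SSHD_IP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Jun 06 03:41:54 freedombox sshd[20101]: Received disconnect from 61.177.173.49 port 53246:11:  [preauth]",
    "Jun 06 03:41:54 freedombox sshd[20101]: Disconnected from authenticating user root 61.177.173.49 port 53246 [preauth]",
    "Jun 06 03:44:40 freedombox sshd[20107]: Unable to negotiate with 61.177.173.54 port 36886: no matching key exchange method fo>",
    "Jun 06 03:50:05 freedombox systemd[1]: Started WordPress Scheduled Events Trigger (Cron).",
]
print(count_ssh_sources(sample).most_common())
# [('61.177.173.49', 2), ('61.177.173.54', 1)]
```

Failed logins like these are common on any box reachable from the internet, so a high count by itself does not prove that they cause the hangs. It may still help to check whether the attempts spike shortly before the journal goes quiet.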