[FB 20.12][Solved] Plinth fails to start due to new frontpage.py shortcuts and filesystem permissions

Dear FB fellows.

Problem

My FB did an automated upgrade this morning from 20.11 to 20.12 but failed to start the Plinth service afterwards. I’ve seen in the 20.12 announcement[0] that new shortcut paths have been added to the frontpage:

  • frontpage: Allow adding shortcuts using .d drop-in files
  • frontpage: Read shortcuts from multiple locations in /etc/, /usr/share and /var/lib

When starting the service via systemctl I do not get any meaningful error messages:

Jul 03 11:00:03 box systemd[1]: plinth.service: Main process exited, code=exited, status=1/FAILURE
Jul 03 11:00:03 box systemd[1]: plinth.service: Failed with result 'exit-code'.

When starting the plinth service manually in foreground, I get more detail and can see the following error:

root@box:/var/lib/freedombox# sudo -u plinth plinth
    INFO plinth.__main__      FreedomBox Service (Plinth) version - 20.12
    INFO plinth.__main__      Script prefix - /plinth
    INFO axes.watch_login     AXES: BEGIN LOG
    INFO axes.watch_login     AXES: Using django-axes 4.4.0
    INFO axes.watch_login     AXES: blocking by IP only.
    INFO plinth.module_loader Module load order - ['apache', 'api', 'names', 'avahi', 'storage', 'backups', 'bind', 'cockpit', 'firewall', 'config', 'datetime', 'deluge', 'diagnostics', 'dynamicdns', 'ejabberd', 'first_boot', 'help', 'ikiwiki', 'infinoted', 'jsxc', 'letsencrypt', 'matrixsynapse', 'mediawiki', 'minetest', 'mldonkey', 'monkeysphere', 'mumble', 'networks', 'openvpn', 'pagekite', 'power', 'privoxy', 'quassel', 'radicale', 'roundcube', 'searx', 'security', 'shadowsocks', 'snapshot', 'ssh', 'sso', 'syncthing', 'tahoe', 'tor', 'transmission', 'ttrss', 'upgrades', 'users', 'i2p', 'gitweb', 'samba', 'minidlna', 'wireguard', 'sharing', 'coturn', 'performance']
    INFO plinth.modules.names Added domain box.local of type domain-type-local with services __all__
    INFO plinth.modules.names Added domain box.example.org of type domain-type-static with services __all__
    INFO plinth.actions       # dynamicdns status
    INFO plinth.actions       $ ikiwiki get-sites
    INFO plinth.modules.letsencrypt Checking if any Let's Encrypt certificates got renewed.
    INFO plinth.actions       # letsencrypt get-status
Traceback (most recent call last):
  File "/usr/bin/plinth", line 6, in <module>
    plinth.__main__.main()
  File "/usr/lib/python3/dist-packages/plinth/__main__.py", line 152, in main
    frontpage.add_custom_shortcuts()
  File "/usr/lib/python3/dist-packages/plinth/frontpage.py", line 131, in add_custom_shortcuts
    custom_shortcuts = get_custom_shortcuts()
  File "/usr/lib/python3/dist-packages/plinth/frontpage.py", line 175, in get_custom_shortcuts
    for file_path in get_custom_shortcuts_paths():
  File "/usr/lib/python3/dist-packages/plinth/frontpage.py", line 169, in get_custom_shortcuts_paths
    return cfg.expand_to_dot_d_paths(file_paths)
  File "/usr/lib/python3/dist-packages/plinth/cfg.py", line 60, in expand_to_dot_d_paths
    for dot_d_file in sorted(path_d.glob('*' + path.suffix)):
  File "/usr/lib/python3.7/pathlib.py", line 1102, in glob
    for p in selector.select_from(self):
  File "/usr/lib/python3.7/pathlib.py", line 483, in select_from
    if not is_dir(parent_path):
  File "/usr/lib/python3.7/pathlib.py", line 1351, in is_dir
    return S_ISDIR(self.stat().st_mode)
  File "/usr/lib/python3.7/pathlib.py", line 1161, in stat
    return self._accessor.stat(self)
PermissionError: [Errno 13] Permission denied: '/var/lib/freedombox/custom-shortcuts.json.d'

Root cause

Following the traceback, I see that the corresponding code in /usr/lib/python3/dist-packages/plinth/frontpage.py hardcodes, among other paths, /var/lib/freedombox/custom-shortcuts.json.d. This directory does not exist on my machine. Its parent folder belongs to root and has 600 permissions, so it lacks the execute (search) bit and other users such as plinth cannot even stat entries inside it.
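The condition described above can be checked without triggering the error. This is a minimal sketch, assuming Python 3 on a POSIX system; the helper name others_can_search is hypothetical and not part of Plinth. It only looks at the "other" execute bit, so it ignores ownership, group membership and ACLs.

```python
import os
import stat


def others_can_search(directory):
    """Return True if `directory` is a directory whose mode grants the
    execute (search) bit to 'other' users.

    Simplification: only the 'other' bit is inspected; owner, group and
    ACL rules are not considered.
    """
    mode = os.stat(directory).st_mode
    return stat.S_ISDIR(mode) and bool(mode & stat.S_IXOTH)
```

On the affected box, others_can_search('/var/lib/freedombox') would be expected to return False for a directory with mode 600, which matches the PermissionError raised when plinth tries to stat a path below it.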

I am running an original Olimex FreedomBox[1], and it’s been working fine without hassle so far. I am surprised at this breaking change, since I am running a vanilla version with just a few services and no manual customizations.

Solution

I’ve fixed the issue by allowing non-root users to ‘execute’ the directory:

chmod a+x /var/lib/freedombox

Afterwards I was able to successfully start the service again using systemctl start plinth.
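For reference, the effect of chmod a+x on the mode bits can be sketched in Python. This is an illustration only, assuming a POSIX system; the function name add_execute_for_all is made up for this example.

```python
import os
import stat


def add_execute_for_all(path):
    """Mimic `chmod a+x path`: set the execute bit for user, group and
    other, leave every other permission bit unchanged, and return the
    resulting permission bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    updated = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, updated)
    return updated
```

Starting from mode 600 this yields 711: other users can traverse the directory to reach paths they already know, but they still cannot list its contents because the read bit stays unset for group and other.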

Remaining Question

  • Should /var/lib/freedombox be ‘a+x’?
  • If so, should the FreedomBox installer / post-install script set the permissions accordingly?

Thanks for listening, I hope this transcript may help others.

Cheers,
Axel

[0] FreedomBox 20.12 released
[1] Pioneer-FreedomBox-HSK - Open Source Hardware Board

Hi Axel,

I am also running an Olimex Pioneer Box and had the same problem after the 20.12 update this morning. I could still access gitweb but plinth was not accessible ("503 Service unreachable"). After some trial and error I decided to boot from another SD card with the image from July 2019 (https://ftp.freedombox.org/pub/freedombox/pioneer/).
However, with this new installation (without any customization except for the username/password) the exact same problem happened after I manually upgraded to 20.12.

Your workaround solved it and I can access the plinth page now.
Thanks a lot! :+1:

Chris

Sorry folks, this is a serious regression. Unfortunately, it didn’t surface in our automated tests or manual ones and slipped by. I will prepare a fix which should become available soon and flow in automatically.

Hey Sunil,

Thanks for taking care. These things happen, so do not worry too much. After all, it was comparatively easy to spot and fix. I am super happy with my FB and the experience of running, using and administering it. You’re doing a great job!

Cheers,
Axel

The same happened to me. I am hoping for the update. My FreedomBox Pioneer Edition updates automatically, so I think the problem will be fixed in a short time.

Probably all boxes have downloaded the auto-update by now, but would it be possible to remove a breaking update from the servers?

So far my Freedombox Pioneer remains broken as even an ‘apt update’ only brought in an updated ca-certs package several minutes ago. I will check the work-around.

Thanks!

Are the automated tests only based on debian testing, or also stable + proposed backports?

Thanks for your support and understanding. I just posted a fix: cfg, frontpage: Ignore errors while reading config and shortcuts (!1854) · Merge requests · FreedomBox / FreedomBox · GitLab. If this gets released today, unstable users will get it immediately, testing users 2 days after and stable users (via backports) a day or so after that.

Since this comes from Debian repositories, I don’t know if there is a way to revert to the old version. Even if it is allowed, apt won’t be able to downgrade automatically without additional configuration. Further, there needs to be testing to ensure that the old version doesn’t bail out on the changes/upgrades done by the newer version. As a general practice, it is preferable to undo the changes and release a newer version than to roll back to an older version.

In this case, the fix is simple. Fixing the issue and testing it is better.
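To illustrate the idea of ignoring errors while reading the drop-in paths, here is a minimal sketch in Python. It is not the content of the merge request. The function name expand_to_dot_d_paths_tolerant is hypothetical, and the rule that drop-ins live in a sibling directory named after the file plus '.d' is an assumption inferred from the path in the traceback.

```python
from pathlib import Path


def expand_to_dot_d_paths_tolerant(file_paths):
    """Return each path followed by the sorted files with the same suffix
    from its '<name>.d' drop-in directory, skipping any drop-in directory
    that cannot be inspected."""
    expanded = []
    for file_path in file_paths:
        path = Path(file_path)
        expanded.append(str(path))
        # Assumption: drop-ins live in a sibling directory '<file name>.d'.
        path_d = path.with_name(path.name + '.d')
        try:
            # sorted() consumes the glob generator inside the try block,
            # so errors raised while scanning are caught here.
            drop_ins = sorted(path_d.glob('*' + path.suffix))
        except OSError:
            # Covers PermissionError (EACCES) as seen in the traceback.
            drop_ins = []
        expanded.extend(str(drop_in) for drop_in in drop_ins)
    return expanded
```

With this shape, an inaccessible /var/lib/freedombox would only mean that no drop-in shortcuts are read from there, instead of the whole service failing at startup.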

Currently, the functional tests are based on unstable only. They were somewhat flaky but have undergone some major improvements. We were hoping to run them in different situations.

Irrespective of the official infrastructure, members of the community can also help by running the functional tests in various scenarios, such as on Debian derivatives like Ubuntu and Raspbian, and in situations such as upgraded machines.

I see, downgrading or a snapshot roll-back would already seem like a temporary quick-fix of the problem.

What I had in mind originally was just whether Debian has a way to remove or block a broken upgrade that got released unfortunately. That might prevent the further propagation of the problem by blocking further installations by manually upgrading users, or auto-upgrades in timezones that have not downloaded the package yet. (So they can later update directly to a fixed follow-up version without any breakage.)

Very good idea. Would you have a link to a guide or docs in FreedomBox/Contribute - Debian Wiki?

And maybe let us know, but I guess it could also help if users consider making regular donations: Donate — FreedomBox Foundation

Thank you, would a directly downloaded unstable package also be installable with dpkg -i in this case (link to python package)?

To me, this sounds like Debian CI / autopkgtest: Debian Continuous Integration - plinth
The tests are defined in debian/tests:
debian/tests/control · master · FreedomBox / FreedomBox · GitLab

If a package in unstable (freedombox or one of its dependencies) causes the tests to fail in testing, then that package is blocked from migrating to testing (and therefore not eligible for backports).

I would like to add some more tests to cover the core functionality. It wouldn’t work for apps though, because they are optional and don’t have a dependency relation to freedombox package. (It would not have caught this issue either, because it depends on a particular filesystem state.)

Those sound like good preventive measures.

Seems like what I meant is called manual “removal of packages” that are confirmed to have a bug that breaks earlier installations.
https://wiki.debian.org/ftpmaster_Removals#Removals_from_backports

Another idea would be to allow users to schedule auto-updating of non-security backports. For example, “no-wait”, 1, 2, 3, or 5 days, 1 or 2 weeks.

Or just a random value between 0-48 hours, by default, to stagger all auto downloads. That would allow removing a package from the servers if a breaking bug surfaced. (With a “no-wait” option to allow users to help with their final update testing and verification. And a “disable” option to temporarily disable backports-updates during times when utmost stability is desired.)

A default delay for non-security backports would probably provide the easiest way for regular users to do their own testing, and to help the community with testing. (While still being able to rely on automatic updates, during periods of lower priority for the server.) Just by running a backup of their server, e.g. in a virtual machine, with the delay disabled.

The delay mechanism would just have to make sure that all security updates, as well as special bugfix package updates, are still installed as soon as possible.
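The proposed delay rules could be sketched as follows in Python. This is purely hypothetical: no such setting exists in FreedomBox or unattended-upgrades as far as this thread shows, and the function name, the setting values and the 48-hour constant are taken from or invented for the proposal above.

```python
import random

MAX_STAGGER_HOURS = 48  # upper bound of the proposed default random delay


def backports_delay_hours(setting="random", is_security=False, rng=None):
    """Return the number of hours to wait before installing an update,
    or None if backports updates are disabled.

    `setting` is "random", "no-wait", "disable", or a whole number of days.
    """
    if is_security:
        # Security updates are never delayed.
        return 0
    if setting == "no-wait":
        return 0
    if setting == "disable":
        return None
    if setting == "random":
        if rng is None:
            rng = random.Random()
        return rng.uniform(0, MAX_STAGGER_HOURS)
    # Otherwise treat the setting as a fixed delay in days.
    return int(setting) * 24
```

For example, backports_delay_hours(3) gives 72 hours and backports_delay_hours(14) gives two weeks, while a security update returns 0 regardless of the setting.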

Is there anybody here whose FreedomBox has come back up after the last FreedomBox update was installed?

I still cannot reach my FreedomBox running on the Pioneer Edition.

Please check again today after 06:30 in your configured time zone.

Now my FreedomBox Pioneer Edition is working again. Thank you all.

Not sure if the issue you are talking about is what I am having a problem with. We were on vacation and just got back July 17th. I turned on the TV and Freedom Box and saw there was an update ready for install. I went through the install and then restarted, and now all I have is a “blue screen”. I can get to settings and tried to do a factory reset, but I have nothing but the blue screen. Can someone help? I love my Freedom Box.