Laurent Bercot (@ska@treehouse.systems)
Apr 9, 2026, 10:23 AM
So, systemd's NOTIFY_SOCKET readiness notification mechanism has a sort of authentication mechanism by pid: it doesn't like it when the notification is sent from another process than the main one, unless you add the NotifyAccess=all directive. Okay, fair.
Except that even with this directive, it still can fail for some reason:
> if an auxiliary process of the unit sends an sd_notify() message and immediately exits, the service manager might not be able to properly attribute the message to the unit, and thus will ignore it, even if NotifyAccess=all is set for it.
Huh? Even if the supervisor wants to check that the MAINPID= given in the message belongs to the relevant unit, it shouldn't need the notifier process to still be around, but let's ignore that for now. So, it has a barrier mechanism to tell the notifier to stick around until the supervisor has acknowledged it. What is that barrier mechanism? The notifier passes an fd to the supervisor via SCM_RIGHTS in the notification message, and when the supervisor closes that fd, it means it has acked the notification, and the notifier can now exit.
That's right, folks: it's a notification that the notification has been received!
There are two mistakes here: 1. the supervisor shouldn't need the notifier process to be alive when it processes the notification, and 2. because NOTIFY_SOCKET is a socket, any process can connect to it, so the supervisor has to authenticate the sender against the relevant unit.
And that is how a bad design decision cascades into a maze of complexity and inefficiency, and a pain to implement. Unless you're willing to use libsystemd to get access to sd_pid_notify_barrier(), good luck getting this to work right.
(systemd people will say this is conspiratorial, but if they wanted to force users to link against libsystemd, they wouldn't behave differently. But I really don't care if it's intentional or not. If it's not intentional, it's just terrible engineering.)
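For readers who want to see what the barrier described above involves without libsystemd, here is a minimal sketch in C. It is not systemd's code: `notify_barrier`, `supervisor_ack` and `barrier_demo` are hypothetical names, the supervisor is a stand-in that only receives the message and closes the fd, and a socketpair replaces the real NOTIFY_SOCKET (connecting to the path in that variable is left out).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control-message buffer big enough for exactly one file descriptor. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Notifier side (hypothetical helper): send "BARRIER=1" on sock with the
 * write end of a fresh pipe attached via SCM_RIGHTS, then wait until that
 * end is closed everywhere. Returns 0 once acknowledged, -1 otherwise. */
int notify_barrier(int sock, int timeout_ms)
{
    assert(sock >= 0);
    int p[2];
    if (pipe(p) < 0) return -1;

    char msg[] = "BARRIER=1";
    struct iovec iov = { .iov_base = msg, .iov_len = sizeof msg - 1 };
    union fd_ctl ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = ctl.buf,
                         .msg_controllen = sizeof ctl.buf };
    struct cmsghdr *c = CMSG_FIRSTHDR(&mh);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &p[1], sizeof(int));

    ssize_t n = sendmsg(sock, &mh, MSG_NOSIGNAL);
    close(p[1]); /* drop our copy, or the hangup below never arrives */
    if (n < 0) { close(p[0]); return -1; }

    struct pollfd pfd = { .fd = p[0], .events = 0 }; /* POLLHUP is always reported */
    int r;
    do r = poll(&pfd, 1, timeout_ms); while (r < 0 && errno == EINTR);
    close(p[0]);
    return (r == 1 && (pfd.revents & POLLHUP)) ? 0 : -1;
}

/* Stand-in supervisor: receive one message and close the fd it carried. */
int supervisor_ack(int sock)
{
    char data[32];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    union fd_ctl ctl;
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = ctl.buf,
                         .msg_controllen = sizeof ctl.buf };
    if (recvmsg(sock, &mh, 0) < 0) return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&mh);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS) return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return close(fd); /* closing the received fd is the acknowledgement */
}

/* Runs both sides: the child plays supervisor, the parent plays notifier. */
int barrier_demo(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        close(sv[0]);
        _exit(supervisor_ack(sv[1]) == 0 ? 0 : 1);
    }
    close(sv[1]);
    int r = notify_barrier(sv[0], 5000);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return r;
}
```

Calling `barrier_demo()` should return 0 once the stand-in supervisor has closed the descriptor. Even in this reduced form, the notifier needs a pipe, an ancillary-data message, and a poll loop just to learn that its message was seen.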
What would be a simpler way to handle all this? Have the supervisor run the unit with a pipe from the daemon to the supervisor, not a socket anyone can connect to. Processes that can write to the pipe are the ones that can inherit it, which means they're necessarily part of the same unit. No need for authentication via pid. When the supervisor receives a certain token on the pipe (e.g. a newline), it means that the daemon is ready. In the daemon or any process notifying on its behalf, that spells: write(fd, "\n", 1); and that's it.
In other words: the whole s6 readiness notification protocol, which is exactly what the paragraph above describes, costs no more than the synchronization step systemd needs on top of its own notification protocol.
The real tragedy is that when you go through systemd with a magnifying glass, every part of it is this way. It still works, and nobody notices the inefficiencies because modern machines are blazingly fast, but it is. so. bad.
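The pipe-based scheme described in this post can be sketched in a few lines of C. This is an illustration under assumptions, not s6's source: `notify_ready`, `wait_ready` and `readiness_demo` are hypothetical names, and the "daemon" is a forked child that does no real initialization.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Daemon side: a single newline on the inherited fd means "ready". */
int notify_ready(int fd)
{
    assert(fd >= 0);
    return write(fd, "\n", 1) == 1 ? 0 : -1;
}

/* Supervisor side: read until a newline shows up.
 * Returns 1 if the daemon is ready, 0 if every writer went away without
 * notifying, -1 on error. */
int wait_ready(int fd)
{
    char c;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n == 1) {
            if (c == '\n') return 1;
            continue;
        }
        if (n == 0) return 0;
        if (errno != EINTR) return -1;
    }
}

/* The supervisor creates the pipe; only its descendants can hold the
 * write end, so no pid-based authentication is needed. */
int readiness_demo(void)
{
    int p[2];
    if (pipe(p) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {            /* the "daemon" */
        close(p[0]);
        /* ... initialization would happen here ... */
        _exit(notify_ready(p[1]) == 0 ? 0 : 1);
    }
    close(p[1]);               /* the supervisor keeps only the read end */
    int r = wait_ready(p[0]);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return r;
}
```

`readiness_demo()` returns 1 when the newline arrives. The daemon can exit immediately after the `write()`: the byte stays in the pipe, so no acknowledgement round trip is required.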

Replies

d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 9, 2026, 10:25 AM
@ska the anonymity of pipes had not quite occurred to me as a permissions mechanism
Hugo 雨果 (@whynothugo@fosstodon.org)
Apr 9, 2026, 11:10 AM
@hipsterelectron @ska Passing around pipes like this (or file descriptors in general) is very very close to how capabilities work in capability-based systems. They are quite pleasant to use.
d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 9, 2026, 11:12 AM
@whynothugo @ska it happens to have a very curious analogy to a theory of delegated identities i have been working on for anonymous cryptography
Hugo 雨果 (@whynothugo@fosstodon.org)
Apr 9, 2026, 11:13 AM
@hipsterelectron @ska I can imagine why you say this, but I'm still curious to hear more.
d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 9, 2026, 11:25 AM
@whynothugo @ska i am too! i've been trying to write it out. it originated from the desire to expand signal's sealed sender to cover recipient protection (this is impossible in signal's model due to the need for message routing in plaintext). there is one way to expand this in general through the mechanism of ratcheting https://eprint.iacr.org/2020/148 (awesome paper) but in general very little work has been done on anonymity so the application of identity separation is much less well developed
d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 9, 2026, 11:27 AM
@whynothugo @ska fds have a useful analogy to keypairs in that the only way to achieve any form of atomicity in the VFS layer is to open up an fd. but i also think every single filesystem is terrible and could fix that without necessarily invoking capabilities
Laurent Bercot (@ska@treehouse.systems)
Apr 9, 2026, 12:06 PM
@hipsterelectron @whynothugo The set of filesystem primitives is indubitably the weakest part of the Unix API; the blame isn't on the filesystems themselves. What we need is a well-designed transactional file API, and then various filesystem implementations will follow.
That said, the concept of fd is really useful; when Unix people say "everything is a file" they really mean "everything is a file descriptor" and that applies cleanly to a surprising lot of stuff. An fd is a generic handle that has very nice properties (lifetime, configurable sharability, etc.) and can be piggybacked onto by a lot of systems that need the same properties, e.g. capabilities.
d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 9, 2026, 12:09 PM
@ska @whynothugo oh yes my complaints with filesystems go beyond their compliance with the essentially missing VFS API but i wholeheartedly agree here
Hugo 雨果 (@whynothugo@fosstodon.org)
Apr 9, 2026, 11:56 PM
@ska @hipsterelectron One of the more notable missing APIs is one to atomically replace a file.
"Replace file at path with this other file, unless the given fd no longer refers to it". Or something along these lines.
Laurent Bercot (@ska@treehouse.systems)
Apr 10, 2026, 12:00 AM
@whynothugo @hipsterelectron Huh? rename() exists and it's the only reason why the API is even usable. It's the most powerful tool in the box.
I have endless complaints about how it's not powerful enough because it cannot replace directories and I have to constantly use symlinks in order to emulate that; but for not-directories, it's definitely there and essential.
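The usual way rename() gets used for this is the write-to-a-temporary-file-then-rename pattern, sketched below in C. The helper names (`atomic_replace`, `replace_demo`) and the file names are hypothetical, and the temporary path is assumed to be on the same filesystem as the target, since rename() does not cross filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write data to tmp_path, then rename() it over path. Readers of path see
 * either the old contents or the new ones, never a partial file.
 * Returns 0 on success, -1 on failure (tmp_path is removed on failure). */
int atomic_replace(const char *path, const char *tmp_path,
                   const char *data, size_t len)
{
    assert(path && tmp_path && data);
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            goto fail;
        }
        data += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0) goto fail;   /* data on disk before the name flips */
    if (close(fd) < 0) { fd = -1; goto fail; }
    fd = -1;
    if (rename(tmp_path, path) == 0) return 0;
fail:
    if (fd >= 0) close(fd);
    unlink(tmp_path);
    return -1;
}

/* Creates a scratch directory, replaces a file twice, checks the result. */
int replace_demo(void)
{
    char dir[] = "/tmp/replace-demo-XXXXXX";
    char path[64], tmp[64], buf[8] = {0};
    if (!mkdtemp(dir)) return -1;
    snprintf(path, sizeof path, "%s/config", dir);
    snprintf(tmp, sizeof tmp, "%s/config.tmp", dir);

    if (atomic_replace(path, tmp, "old\n", 4) < 0) return -1;
    if (atomic_replace(path, tmp, "new\n", 4) < 0) return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    unlink(path);
    rmdir(dir);
    return (n == 4 && strcmp(buf, "new\n") == 0) ? 0 : -1;
}
```

`replace_demo()` returns 0 when the second write has replaced the first. The symlink emulation mentioned above for directories relies on the same call: create a new symlink under a temporary name, then rename() it over the old symlink.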
Hugo 雨果 (@whynothugo@fosstodon.org)
Apr 10, 2026, 3:17 PM
@ska @hipsterelectron rename(3) unconditionally overwrites the target file and is racy. If another process overwrites a file after you checked it, that data is silently lost forever.
Laurent Bercot (@ska@treehouse.systems)
Apr 10, 2026, 3:44 PM
@whynothugo @hipsterelectron True, but that does not mean the system call (it is a system call) is racy, it only means it should be used with caution, just like any potentially destructive system call, including write(2) and unlink(2).
I agree, however, that standardizing something like renameat2() would have been better than rename().
navi (@navi@social.vlhl.dev)
Apr 10, 2026, 3:50 PM
@ska @whynothugo @hipsterelectron

simultaneous processes editing a file without a lock seems like it'd be tricky regardless

like, i suppose you could renameat2 with RENAME_EXCHANGE, and then check again for possible differences, but then the remote process could be doing the same and ahhh

dunno a good fix for this that doesn't involve some flock or redesigning the thing to avoid two writers
Hugo 雨果 (@whynothugo@fosstodon.org)
Apr 11, 2026, 1:15 AM
@navi @hipsterelectron @ska I don't really see renameat2 solving the problem. You can't atomically replace a file ensuring that there are no races. You can exchange it, and then reverse it if the change was wrong, but now you have a new opportunity for races.
What I wish we had was a way to perform the operation "only if fd N still points to that file; if it no longer does, then bail".
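One possible reading of that wish, emulated in user space with existing calls, is sketched below in C. This is an interpretation rather than anything proposed in the thread: `fd_still_names`, `replace_if_unchanged`, `touch` and `cas_demo` are hypothetical names. The sketch also shows why the emulation falls short: the check and the rename() are two separate steps, so a window remains between them.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if path currently names the file that fd has open, 0 if not, -1 on error. */
int fd_still_names(int fd, const char *path)
{
    struct stat a, b;
    if (fstat(fd, &a) < 0 || stat(path, &b) < 0) return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Hypothetical emulation: rename new_path over path only if fd still
 * refers to path. Returns 0 on success, -1 if it bailed or failed. */
int replace_if_unchanged(int fd, const char *path, const char *new_path)
{
    assert(fd >= 0 && path && new_path);
    if (fd_still_names(fd, path) != 1) return -1;
    /* RACE WINDOW: another process can replace path right here, and the
     * rename() below will overwrite its work without noticing. */
    return rename(new_path, path);
}

/* Create an empty file; 0 on success. */
static int touch(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    return fd < 0 ? -1 : close(fd);
}

int cas_demo(void)
{
    char dir[] = "/tmp/cas-demo-XXXXXX";
    char path[64], cand[64];
    if (!mkdtemp(dir)) return -1;
    snprintf(path, sizeof path, "%s/target", dir);
    snprintf(cand, sizeof cand, "%s/candidate", dir);

    int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0 || touch(cand) < 0) return -1;
    int first = replace_if_unchanged(fd, path, cand);  /* fd names path: goes through */

    if (touch(cand) < 0) return -1;
    int second = replace_if_unchanged(fd, path, cand); /* path is a new file now: bails */

    close(fd);
    unlink(cand);
    unlink(path);
    rmdir(dir);
    return (first == 0 && second == -1) ? 0 : -1;
}
```

`cas_demo()` returns 0 when the first replacement succeeds and the second one bails. A kernel-level version of `replace_if_unchanged` would have to perform the identity check and the rename as one atomic step, which is what the commented window cannot provide.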
Laurent Bercot (@ska@treehouse.systems)
Apr 11, 2026, 1:25 AM
@whynothugo @navi @hipsterelectron I really don't understand what you're going for, because I have literally never encountered the issues you seem to be focused on.
- What is a sequence of operations that you think happens and causes a problem, even with renameat2, and that you want to protect against?
- What does your suggested operation accomplish?
d@nny disc@ (@hipsterelectron@circumstances.run)
Apr 11, 2026, 1:29 AM
@ska @whynothugo @navi would also be curious re this
Gabriele Svelto (@gabrielesvelto@mas.to)
Apr 9, 2026, 7:29 PM
@hipsterelectron @ska they are extremely useful. I use the same mechanism within Firefox crash reporting machinery because they guarantee that processes at both ends are what they claim to be, and both ends can immediately see if the other end went away without having to deal with PIDs, waitpid() or anything else for that matter, so they work even across unrelated processes.
Laurent Bercot (@ska@treehouse.systems)
Apr 9, 2026, 11:29 PM
@gabrielesvelto @hipsterelectron pipes as a death detection mechanism in unrelated processes are really underestimated and underused, probably because "unrelated" processes still need a common ancestor to create the pipe. But it's a really useful pattern, and that's only one of the many uses of pipes.
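The death-detection pattern mentioned here can be sketched as follows in C. The names `peer_is_gone` and `death_demo` are hypothetical, and the watched process is a forked child only because that is the simplest way to hand it the write end; any process that ends up holding that end would be detected the same way.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Watch the read end of a pipe whose write end is held by the peer and
 * never written to. Returns 1 once every holder of the write end is gone,
 * 0 if the timeout expired first, -1 on error. */
int peer_is_gone(int read_fd, int timeout_ms)
{
    assert(read_fd >= 0);
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    int r;
    do r = poll(&pfd, 1, timeout_ms); while (r < 0 && errno == EINTR);
    if (r < 0) return -1;
    if (r == 0) return 0;
    /* Nothing is ever written, so any readiness here means end-of-file. */
    return (pfd.revents & (POLLHUP | POLLIN)) ? 1 : -1;
}

int death_demo(void)
{
    int p[2];
    if (pipe(p) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        close(p[0]);
        for (;;) pause();   /* holds p[1] open simply by being alive */
    }
    close(p[1]);            /* the watcher must not hold the write end */

    int before = peer_is_gone(p[0], 0);     /* peer alive: nothing to report */
    kill(pid, SIGKILL);                     /* no chance for it to clean up */
    int after = peer_is_gone(p[0], 5000);   /* the kernel closed p[1] for it */

    close(p[0]);
    waitpid(pid, NULL, 0);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

`death_demo()` returns 0 when the watcher sees nothing while the peer is alive and a hangup after it is killed. The detection itself uses no pid and no waitpid(); the `waitpid()` at the end is only there to reap the demo's own child.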
LisPi (@lispi314@udongein.xyz)
Apr 9, 2026, 5:16 PM
@ska > What would be a simpler way to handle all this? Have the supervisor run the unit with a pipe from the daemon to the supervisor, not a socket anyone can connect to. Processes that can write to the pipe are the ones that can inherit it, which means they're necessarily part of the same unit.

POSIX does guarantee a whole 7 other file descriptors available for such purposes, after all, why not use them?
navi (@navi@social.vlhl.dev)
Apr 9, 2026, 11:08 PM
@lispi314 @ska fwiw _POSIX_OPEN_MAX (which is the minimum value for the max fds) is 20
but honestly i doubt any system nowadays has fewer than 1024? if not unlimited (w/o accounting for ulimit)
Laurent Bercot (@ska@treehouse.systems)
Apr 9, 2026, 11:36 PM
@navi @lispi314 systemd is Linux-only, so the POSIX limits are extremely conservative here. The default is 1024, yes, but that can be changed by the admin, who knows. Would systemd even run with a 20 fd limit? 😁
mirabilos🐈‍⬛ (@mirabilos@toot.mirbsd.org)
Apr 9, 2026, 11:43 PM
@ska @navi @lispi314 I’ve had fun with very low fd limits on some systems I ported mksh to…