Hacker News

PeterWhittaker today at 2:35 PM

Interesting article, but it compares apples to a fruit stand: The approach could be improved by comparing Capsicum to using seccomp in the same way.

Some time ago I wrote a library for a customer that did exactly that: open a number of resources (e.g., stdin, stdout, stderr, a pipe or two, a socket or two), make the seccomp calls necessary to restrict read/write/etc. to the associated file descriptors, then lock out all other system calls, including the seccomp-related ones.
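To make that recipe concrete, here is a minimal sketch of such a filter written as raw seccomp BPF. It is not the library described above. It assumes x86-64 Linux, a fixed policy of "read on fd 0, write on fds 1 and 2, plus exit_group", and the names `fd_filter` and `seal_process` are invented for illustration. Only the low 32 bits of the first argument are compared, since the kernel treats the descriptor as an `unsigned int`.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Assumption: x86-64, little-endian, so the low word of args[0] sits at its base offset. */
#define SC_ARCH  (offsetof(struct seccomp_data, arch))
#define SC_NR    (offsetof(struct seccomp_data, nr))
#define SC_ARG0  (offsetof(struct seccomp_data, args[0]))

/* Hypothetical policy: read(0), write(1), write(2), exit_group; kill on anything else. */
static const struct sock_filter fd_filter[] = {
    /*  0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SC_ARCH),
    /*  1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /*  2 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /*  3 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SC_NR),
    /*  4 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 9, 0), /* -> 14 allow */
    /*  5 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 2, 0),       /* -> 8 */
    /*  6 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 3, 0),      /* -> 10 */
    /*  7 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /*  8 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SC_ARG0),               /* read: fd */
    /*  9 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 4, 3),              /* fd 0 ? allow : kill */
    /* 10 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SC_ARG0),               /* write: fd */
    /* 11 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 2, 0),              /* fd 1 -> allow */
    /* 12 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 1, 0),              /* fd 2 -> allow */
    /* 13 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* 14 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter. prctl() and seccomp() are not on the list, so once this
 * returns 0 the process cannot loosen or replace the policy. */
int seal_process(void)
{
    struct sock_fprog prog = {
        .len = sizeof(fd_filter) / sizeof(fd_filter[0]),
        .filter = (struct sock_filter *)fd_filter,
    };
    /* Required for unprivileged processes before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller would open everything it needs first, call `seal_process()`, and from then on stick to `read`, `write` and `_exit`. A real policy would take the descriptor numbers at runtime and cover far more calls (`close`, `poll`, `rt_sigreturn`, and so on), which is where most of the work mentioned here goes.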

Basically, the library took a very Capsicum-like approach of whitelisting specific actions then sealing itself against further changes.

This is a LOT of work, of course, and the available APIs don't make it particularly easy or elegant, but it is definitely doable. I chose this approach because the docker whitelist approach was far too open-ended and "uncurated", if you will, for the use-case we were targeting.

In this particular case, I was aided by the fact that the library was written to support the very specific use-case of filters running in containers using FIFOs for IPC, logging, and reporting: every filter saw exactly the same interfaces to the world, so it was relatively easy to lock things down.

Having said that, I wish Linux had a Capsicum-equivalent call, or, even better for the approach I took, a friendlier way to whitelist specific calls.


Replies

thomashabets2 today at 3:24 PM

A problem with that approach is that, after an upgrade, libc can decide to start making syscalls you were not expecting. For example, the first time you call `printf()` it calls `newfstatat()`. Only the first time. Maybe in the future it'll call it more often than that, and then your binary breaks.

I'm not sure what glibc's latest policy is on linking statically, but at least it used to be basically unsupported and bugs about it were ignored. But even if it is supported, you can't know whether, under some configurations or runtime circumstances, it uses dlopen for something.

Or maybe once you juggle more than X file descriptors some code switches from using `poll()` to using `select()` (or `epoll()`).

My thoughts last time I looked at seccomp: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...

hrmtst93837 today at 6:34 PM

You can make seccomp mimic Capsicum by whitelisting syscalls and checking FD arguments with libseccomp, but that quickly becomes error-prone once you factor in syscall variants and helper calls. Read and write take the FD as arg0 (as do pread and pwrite), while sendfile and splice carry two descriptors in different argument positions, io_uring changes the semantics altogether, and ioctl or fcntl can defeat naive filters, so you wind up with a huge BPF program and still miss corner cases.
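A small sketch of why one rule shape does not fit every call: the table below records which argument slots carry a descriptor for a handful of Linux syscalls, going by their man-page signatures. As far as those signatures go, `pread64` and `pwrite64` still take the descriptor first; the calls that move or duplicate it are ones like `sendfile`, `splice` and `mmap`. The struct, the table and `fd_arg_mask` are hypothetical names, and the list is nowhere near complete.

```c
#include <stddef.h>
#include <sys/syscall.h>

/* Bit i of fd_mask set means argument i of the syscall is a file descriptor. */
struct fd_arg_entry {
    long nr;
    const char *name;
    unsigned char fd_mask;
};

static const struct fd_arg_entry fd_arg_table[] = {
    { SYS_read,     "read",     1u << 0 },
    { SYS_write,    "write",    1u << 0 },
    { SYS_pread64,  "pread64",  1u << 0 },
    { SYS_pwrite64, "pwrite64", 1u << 0 },
    { SYS_sendfile, "sendfile", (1u << 0) | (1u << 1) },  /* out_fd, in_fd */
    { SYS_splice,   "splice",   (1u << 0) | (1u << 2) },  /* fd_in, fd_out */
    { SYS_mmap,     "mmap",     1u << 4 },                /* fd is the fifth argument */
};

/* Returns the descriptor-argument mask for a syscall number,
 * or 0 if the syscall is not in this (incomplete) table. */
unsigned char fd_arg_mask(long nr)
{
    for (size_t i = 0; i < sizeof(fd_arg_table) / sizeof(fd_arg_table[0]); i++) {
        if (fd_arg_table[i].nr == nr)
            return fd_arg_table[i].fd_mask;
    }
    return 0;
}
```

A filter generator would need one argument check per set bit, so every entry with more than one bit, or with a bit other than bit 0, adds another special case to the BPF program. Calls like `ioctl`, `fcntl` and `io_uring_enter` do not fit this table at all, because what they do to a descriptor depends on data the filter cannot inspect.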

Capsicum attaches rights to descriptors and gives kernel-enforced primitives like cap_enter and cap_rights_limit, so delegation is explicit and easier to reason about. If you want Linux parity, use libseccomp to shrink the syscall surface, combine it with mount and user namespaces and Landlock for filesystem constraints, and design your app around FD-based delegation instead of trying to encode every policy into BPF.