Hacker News

my123, today at 1:47 PM

It was a kernel panic for Tahoe. Nothing between macOS 12 and 26 was tested, so the releases in between might have more issues.

The userspace reboot after FileVault password entry acts a bit oddly with QEMU input devices, so you might need to attach a new USB tablet or keyboard from the QEMU monitor.

> looks like there's a separate patch set for this

Yup, and it's a bit of a problem to figure out the right thing to do about it on the upstreaming side, as normal guests aren't supposed to do that.

> It's possible to patch out this functionality without special privileges or talk to the in-kernel hypervisor directly

Or pre-patch them all to HVC #1 works too. Patching the host Hypervisor.framework sounds quite brittle especially after they moved to a pile of C++
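
For reference, a minimal sketch of what the HVC #1 encoding looks like, in case it helps anyone scanning for or rewriting instructions. The bit layout is the standard A64 exception-generation encoding; the helper names (`encode_hvc`, `decode_hvc`) are made up for illustration and are not taken from any actual patch set.

```python
# A64 HVC #imm16: fixed bits 0xD4000002, with imm16 stored in bits [20:5].
HVC_BASE = 0xD4000002
# Bits [31:21] and [4:0] are fixed for HVC; everything else is the immediate.
HVC_FIXED_MASK = 0xFFE0001F


def encode_hvc(imm16: int) -> int:
    """Return the 32-bit A64 encoding of HVC #imm16."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("imm16 must fit in 16 bits")
    return HVC_BASE | (imm16 << 5)


def decode_hvc(insn: int):
    """Return imm16 if insn is an HVC instruction, otherwise None."""
    if insn & HVC_FIXED_MASK != HVC_BASE:
        return None
    return (insn >> 5) & 0xFFFF


# Bytes as they would appear in (little-endian) guest memory.
hvc1_bytes = encode_hvc(1).to_bytes(4, "little")
```

Here `encode_hvc(1)` gives `0xD4000022`, and `decode_hvc` rejects the neighbouring SVC and SMC encodings, since those differ in the low two bits.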


Replies

m132, today at 2:06 PM

> It was a kernel panic for Tahoe.

Ah, must be something else then.

> normal guests aren't supposed to do that

Oh how I wish Arm didn't let anything like this slip into the architecture spec to begin with! Massive source of pain, especially with protected memory/CCA guests. It's not macOS triggering this in isolation either. Most startup binaries for QNX do this too, somehow also in the GIC init path.

I've looked at how different hypervisors/VMMs handle this and, if this makes that patch set any less hacky, Virtualization.framework, QNX Hypervisor, and (I think) VMware all decode and emulate those instructions in software. Virtualization.framework is a remarkable spaghetti in this regard :)

> Or pre-patch them all to HVC #1 works too. Patching the host Hypervisor.framework sounds quite brittle especially after they moved to a pile of C++

Possibly! IIRC, if HCR_EL2.HCD==1, HVC should trap as an undefined instruction. Not sure how much of HCR_EL2 can be set from user space, but perhaps this could be the least invasive way.
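
A small sketch of the bit in question, assuming the Arm ARM layout where HCD is bit 29 of HCR_EL2. The helper names are hypothetical, and this says nothing about whether the host lets user space control that bit; it only shows the mask arithmetic.

```python
# Assumption: HCD ("HVC instruction disable") is bit 29 of HCR_EL2.
HCR_EL2_HCD = 1 << 29


def hvc_disabled(hcr_el2: int) -> bool:
    """True if the HCD bit is set in the given HCR_EL2 value."""
    return bool(hcr_el2 & HCR_EL2_HCD)


def with_hvc_disabled(hcr_el2: int) -> int:
    """Return a copy of the HCR_EL2 value with HCD set."""
    return hcr_el2 | HCR_EL2_HCD
```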

Simply ignoring the instruction, though, is not enough. I remember in my setup, with HVC handling stubbed out, secondary cores would always fail to start. I suspect this to be the culprit.

The SMP bring-up code would fail to pass pointer authentication on the first indirect branch. It would then immediately pivot into FLEH->SLEH->panic(). panic() would shortly attempt to make an indirect jump itself, hoping to crash the other processors, but would instead get stuck in a loop of calling itself. This would eventually get caught by a stack overflow guard somewhere in FLEH/SLEH, which would place the core in an infinite loop, and... the system would continue to run with just the boot core. Yo dawg, I heard you like panics :)
