Damn... ~1min in he verbally asks to put the 2 ingredients on the list.
Like... my dude, that's way slower than dragging and dropping the text onto the list right next to it!
Same later on with changing the calendar appointment from whatever to 8pm... he's sitting at a desktop with a mouse. Just type the number or click the arrows to adjust.
I bet some people will mention that those are "just" simple-to-understand examples or that it's great for accessibility... but it's not. It's not reliable enough for complex cases and not reliable enough for accessibility. So... yes, JUST basic examples that are slower than other means.
PS: I've built prototypes using voice and pointing in XR, and yes, that paradigm IS powerful, but that's because it's multimodal.