Useful collection of links, thanks. The suggested learning path section reads as AI-generated. It won't take anyone five weeks to work all this out; a weekend would likely suffice (coming from someone who has implemented most variations of voice, including SIP).
My personal take on learning this stuff: ask Claude Code to build a greenfield project that does what you want and then actually read the code it produced and really try to understand what it’s doing.