HTML -> markdown -> LLM is standard practice. We strip elements like aside, embed, head, iframe, etc. The criteria are set conservatively to avoid removing too many elements (especially in extractMain mode).
https://github.com/lightfeed/extractor/blob/main/src/convert...
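For illustration, here is a rough sketch of that kind of stripping step in Python, using only the standard library. It is not the library's actual code (see the link above for that). The tag list only covers the four elements named here, and `TagStripper` and `strip_elements` are hypothetical names made up for this example. Comments and the doctype are dropped as a side effect of keeping the sketch small.

```python
from html.parser import HTMLParser

# Assumption: only the four tags named in the comment; the real list is longer.
STRIP_TAGS = {"aside", "embed", "head", "iframe"}
# Void elements have no closing tag, so they must not open a skipped subtree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagStripper(HTMLParser):
    """Re-emits HTML while dropping STRIP_TAGS elements and their contents."""

    def __init__(self):
        # Keep entities as written so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_tag = None  # name of the stripped element we are inside
        self.depth = 0        # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
            return
        if tag in STRIP_TAGS:
            if tag not in VOID_TAGS:
                self.skip_tag, self.depth = tag, 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: keep unless stripped or skipped.
        if self.skip_tag or tag in STRIP_TAGS:
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
            return
        if tag not in STRIP_TAGS:  # ignore stray closing tags of stripped elements
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_tag:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_tag:
            self.out.append(f"&#{name};")


def strip_elements(html: str) -> str:
    """Return html with the STRIP_TAGS elements removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = ("<html><head><title>t</title></head><body><p>Keep this</p>"
            "<aside>sidebar</aside><iframe src='a'></iframe></body></html>")
    print(strip_elements(page))
```

The cleaned HTML would then go to whatever HTML-to-markdown converter you use. Tracking depth only for the tag that started the skip keeps the sketch tolerant of unclosed tags like `<p>` inside a stripped element.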
I have used Gemma 3 and had good results.
Once Gemini 3 Flash drops the preview suffix, I will update the examples. Thank you for the pointer.