Cool project! I have a suggestion: since the processing is done on a moderately powerful laptop anyway, is it possible to bypass the foot pedal and use audio (from the laptop or the glasses) to predict when to switch to the next bar? I assume it will be complicated, but would trying to match the FFT series to the sheet music pitch data work (or would harmonics cause major headaches)?
Music transcription has been around for decades, algorithm-wise. In principle this one should actually be easier since it doesn't need to transcribe from scratch but "only" find where in a given sheet you are most likely to currently be.
(Next step: evaluate afterwards and point out mistakes.)
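To make the "find where in a given sheet you are" idea concrete, here is a rough Python sketch of one way it could be done, not a claim about how the project works. It assumes some upstream pitch detector (not shown) already turns the FFT frames into one MIDI note number per played note, and that the score is available as a list of bars. The `ScoreFollower` class, its `hear` method, and the three cost parameters are all made up for illustration. It keeps a running edit-distance alignment between what has been heard and the score, and compares pitch classes (note number mod 12) rather than absolute pitch, so an octave error caused by a strong harmonic does not throw it off.

```python
class ScoreFollower:
    """Hypothetical online score follower based on edit-distance alignment."""

    def __init__(self, bars, skip_cost=1.0, extra_cost=1.0, mismatch_cost=1.0):
        # bars: list of bars, each bar a list of MIDI note numbers (assumed format).
        self.pitch_classes = []
        self.bar_of_note = []
        for bar_index, bar in enumerate(bars):
            for midi_note in bar:
                self.pitch_classes.append(midi_note % 12)  # ignore the octave
                self.bar_of_note.append(bar_index)
        if not self.pitch_classes:
            raise ValueError("score contains no notes")
        self.skip_cost = skip_cost          # player skipped a written note
        self.extra_cost = extra_cost        # player added an unwritten note
        self.mismatch_cost = mismatch_cost  # player hit a wrong note
        # costs[j] = cheapest way to explain everything heard so far while
        # having consumed the first j notes of the score.
        note_count = len(self.pitch_classes)
        self.costs = [j * skip_cost for j in range(note_count + 1)]

    def hear(self, midi_note):
        """Feed one detected note; return the most likely current bar index."""
        heard_pc = midi_note % 12
        prev = self.costs
        new = [prev[0] + self.extra_cost]
        for j in range(1, len(prev)):
            matches = self.pitch_classes[j - 1] == heard_pc
            substitution = 0.0 if matches else self.mismatch_cost
            new.append(min(
                prev[j - 1] + substitution,  # heard note is score note j
                prev[j] + self.extra_cost,   # heard note is not in the score
                new[j - 1] + self.skip_cost, # score note j was not played
            ))
        self.costs = new
        # The cheapest position wins; ties go to the earliest position.
        best = min(range(len(new)), key=new.__getitem__)
        return self.bar_of_note[max(best - 1, 0)]
```

With a score of `[[60, 62, 64, 65], [67, 69, 71, 72]]`, feeding the notes 60, 62, 64, 65, 67 one at a time returns bar 0 four times and then bar 1, which is where a page or bar switch would be triggered. The mismatch and skip costs left in `costs` are also a starting point for the "point out mistakes afterwards" step, since a nonzero cost means the player deviated from the score somewhere.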