Author here. I've got a few papers about this problem (including one in submission), but it is very very hard to do, especially with acceptable overhead. The state of the art is maybe 100x overhead.