I've dealt with this exact issue in the past, so I'll mention it: it may have been a 30fps vs 29.97fps issue. In my case the audio was a fixed length, but the frame rate was SLIGHTLY too fast. The problem can manifest as video that runs either too slow or too fast, depending on which side is expecting 30fps and which is expecting 29.97fps.
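To put a rough number on how quickly a 30fps vs 29.97fps mismatch adds up, here is a small sketch. It assumes "29.97" means the usual NTSC rate of 30000/1001 and that the audio plays back in real time; the function name drift_seconds is made up for illustration.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # the rate usually written as 29.97
NOMINAL_FPS = Fraction(30)


def drift_seconds(real_seconds, recorded_fps=NTSC_FPS, assumed_fps=NOMINAL_FPS):
    """Return how far the video ends before (+) or after (-) the audio.

    real_seconds is the true length of the recording, which is also the
    length of the audio track. The video was captured at recorded_fps but
    is played back as if it were assumed_fps.
    """
    frames = Fraction(real_seconds) * recorded_fps
    video_seconds = frames / assumed_fps
    return Fraction(real_seconds) - video_seconds


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        d = drift_seconds(minutes * 60)
        print(f"{minutes:>3} min: video and audio differ by {float(d):.3f} s")
```

Swapping the two rates flips the sign, which is the "too slow or too fast depending on which side" part.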
This is very likely it.
I think it was just clock drift on the camcorder during the initial recording, as I'm pretty sure I tried adjusting the frequency of the audio track to make it the same duration as the video track, and the A/V sync was still wrong.
I'm so glad the audio and video tracks are stored interleaved, as it made my solution possible, and the results I got were great. I split the interleaved video into small enough chunks, padded each chunk's audio, and cut it to exactly the video length, which made the padding practically imperceptible.
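The pad-then-cut step for a single chunk could look roughly like the sketch below. It is not the exact procedure used here, just an illustration under assumptions: the audio is headerless signed PCM (so silence is all zero bytes), and the defaults of 48 kHz, stereo, 16-bit are guesses. The name fit_pcm_to_duration is hypothetical.

```python
def fit_pcm_to_duration(pcm, video_seconds, sample_rate=48000,
                        channels=2, bytes_per_sample=2):
    """Pad with silence or trim so the chunk lasts exactly video_seconds.

    Assumes signed PCM, where silence is encoded as zero bytes.
    """
    frame_size = channels * bytes_per_sample  # bytes per sample frame
    target_bytes = round(video_seconds * sample_rate) * frame_size

    # Drop any trailing partial sample frame so channels stay aligned.
    usable = pcm[: len(pcm) - len(pcm) % frame_size]

    if len(usable) >= target_bytes:
        return usable[:target_bytes]  # audio too long: cut to video length
    return usable + b"\x00" * (target_bytes - len(usable))  # too short: pad
```

With short chunks, each one only needs a few milliseconds of padding or trimming, which fits with the padding being hard to hear.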
The only issue I ran into was that I couldn't get ffmpeg to cut audio with any real precision. I eventually figured out that I could dump the audio track to a headerless PCM file, calculate the exact byte offsets for my cut points, and cut at them with perfect precision using the head and tail commands from GNU coreutils. This worked out well because I could then use the cat command to combine all of the padded audio chunks into a single raw PCM file, which I encoded to AAC with ffmpeg and muxed with my original encoded video track.
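For anyone wanting to try the byte-offset trick, the arithmetic is sketched below in Python rather than head/tail. The format defaults (48 kHz, stereo, 16-bit) are assumptions, as are the function names; the real values depend on how the PCM was dumped.

```python
import io


def byte_offset(seconds, sample_rate=48000, channels=2, bytes_per_sample=2):
    """Byte position of a timestamp in headerless PCM.

    Rounds to a whole sample frame first, so a cut never lands in the
    middle of a sample or between the channels of one frame.
    """
    frame_size = channels * bytes_per_sample
    return round(seconds * sample_rate) * frame_size


def cut_pcm(stream, start_s, end_s, **fmt):
    """Read the bytes between two timestamps from a seekable binary stream."""
    start = byte_offset(start_s, **fmt)
    end = byte_offset(end_s, **fmt)
    stream.seek(start)
    return stream.read(end - start)


if __name__ == "__main__":
    # Tiny fake "file": 4 frames of 1 Hz stereo 16-bit audio.
    raw = io.BytesIO(bytes(range(16)))
    fmt = dict(sample_rate=1, channels=2, bytes_per_sample=2)
    chunks = [cut_pcm(raw, 0, 1, **fmt), cut_pcm(raw, 1, 3, **fmt)]
    combined = b"".join(chunks)  # plays the role of cat
    print(len(combined), "bytes")
```

The coreutils version of the same cut would be along the lines of `tail -c +$((start + 1)) audio.raw | head -c $((end - start))`, since `tail -c +N` starts output at byte N counting from 1.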