In this case I think you'd want to use Source-Aware Training [0] to associate a "timestamp" vector with each native-context chunk of the conversation (chunks perhaps overlapping). I'd encode the timestamps as a kind of Gray code, so that the immediate out-of-native-context history can be retrieved via the nearby Gray codes 1, 2, etc. steps behind the current timestep's code.
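To make the Gray-code part concrete, here's a minimal sketch (the source-aware training itself is omitted; the `gray`/`hamming` helpers are just illustrative, not from any library). The point is that consecutive timesteps get codes differing in exactly one bit, so "recent history" is always a small Hamming hop away:

```python
def gray(t: int) -> int:
    """Standard reflected-binary Gray code of integer t."""
    return t ^ (t >> 1)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

def timestamp_vector(t: int, width: int = 16) -> list[int]:
    """Hypothetical 'timestamp' bit-vector you'd attach to a context chunk."""
    g = gray(t)
    return [(g >> i) & 1 for i in range(width)]

# Consecutive chunk timestamps are always 1 bit apart:
codes = [gray(t) for t in range(34, 38)]
dists = [hamming(a, b) for a, b in zip(codes, codes[1:])]
print(dists)  # -> [1, 1, 1]
```

Note that only *adjacent* steps are guaranteed to be one bit apart; the code for `t-2` can be two bits away, so "nearby in time" maps to "nearby in Hamming distance" rather than strictly monotonically.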