I think that might have been the example given here: https://bactra.org/notebooks/nn-attention-and-transformers.h...