Hacker News

djeastm, today at 4:35 PM (3 replies)

They provide an explanation for using the term "sleep":

> In animals, the transfer from short-term memory to long-term memory is thought to be supported by hippocampal replay [33], especially during sleep [41]; in this phase, short-term hippocampal memories are reactivated and consolidated into cortical synaptic weights. Sleep makes animals unable to respond to external stimuli, suggesting that it must provide enough cognitive benefit to justify this cost [41]. Inspired by these biological processes, we propose a method for transferring context-window memory into persistent weights. When the model’s context window becomes full during inference, the model enters a “sleep” in which it performs multiple forward passes over the accumulated context and recursively updates its fast weights via a learned local rule. As in animal sleep, the model receives no external input tokens during this phase. After consolidation, the context window is cleared, and the model resumes operation with updated fast weights. During training, the model is optimized end-to-end by backpropagating through the entire process to maximize task performance after sleep.
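For anyone who finds the control flow easier to read as code, here is a rough toy sketch of the loop the quote describes. Everything in it is an assumption made for illustration: the class `SleepyModel`, its parameters, and especially the Hebbian outer-product update, which only stands in for the paper's learned local rule. The end-to-end training by backpropagation is not shown, and the toy forward pass does not attend over the context at all.

```python
import math


class SleepyModel:
    """Toy model (hypothetical): a context buffer plus a fast-weight matrix."""

    def __init__(self, dim, context_limit, sleep_passes=2, lr=0.1):
        self.dim = dim
        self.context_limit = context_limit  # buffer size that triggers a sleep
        self.sleep_passes = sleep_passes    # passes over the buffer per sleep
        self.lr = lr                        # step size of the stand-in rule
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.context = []                   # accumulated input vectors
        self.sleeps = 0                     # number of consolidations so far

    def forward(self, x):
        # h = tanh(x + W x), where W is the current fast-weight matrix.
        wx = [sum(w * xi for w, xi in zip(row, x)) for row in self.fast_weights]
        return [math.tanh(a + b) for a, b in zip(x, wx)]

    def sleep(self):
        # No new external tokens are read here: only the buffer is replayed.
        for _ in range(self.sleep_passes):
            for x in self.context:
                h = self.forward(x)  # uses the weights updated so far
                # Assumed stand-in for the learned local rule: W += lr * h x^T
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * h[i] * x[j]
        self.context.clear()  # the context window is cleared after consolidation
        self.sleeps += 1

    def step(self, x):
        # Normal inference: buffer the input, answer, and sleep once full.
        self.context.append(x)
        out = self.forward(x)
        if len(self.context) >= self.context_limit:
            self.sleep()
        return out


model = SleepyModel(dim=2, context_limit=3)
before = model.forward([1.0, 0.0])
for token in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]):
    model.step(token)
after = model.forward([1.0, 0.0])
```

After the third input the buffer is empty again, and the same input produces a different output because the replayed context now lives in `fast_weights` rather than in the context window.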


Replies

pcrh, today at 4:40 PM

The function of sleep in animals is largely obscure.

One thing we do know for certain is that it is necessary: it is needed in "dumb" animals as well as in you and me. If an animal can't sleep, it will eventually die.

I don't think that applies to the activity described in the OP. Does their LLM "die" if it can't perform the function described?

order-matters, today at 5:01 PM

but isn't sleep already a defined technical term for significantly reducing a machine's power consumption while preserving its state until it is woken up?

i feel like it's confusing to reuse the word for a process that deliberately aims to change the state of the machine / process