Surely the system prompt is cached across accounts?
You can cache the K and V matrices, but with a prompt that large every new token still has to attend over the whole cache, so you pay a lot of attention compute even if the user just adds a five-word question.
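To put rough numbers on that, here is a back-of-the-envelope sketch. It only counts multiply-adds for the attention scores and the weighted sum over values, and ignores projections, MLPs and memory bandwidth. The function name and all the sizes in the example are made up for illustration; they are not the dimensions of any real model.

```python
def attention_flops(n_cached: int, n_new: int, d_model: int, n_layers: int) -> int:
    """Approximate multiply-adds spent on attention for n_new tokens.

    Assumes standard causal attention with a KV cache: each new token's
    query is dotted against every key it can see, and the resulting weights
    are applied to the same number of value vectors.
    """
    total = 0
    for i in range(n_new):
        # Causal mask: the cached prompt plus the new tokens up to this one.
        visible = n_cached + i + 1
        # visible * d_model for the QK^T scores, the same again for weights @ V.
        total += 2 * visible * d_model
    return total * n_layers


if __name__ == "__main__":
    # Hypothetical sizes: 20k-token cached prompt, 5-token question.
    with_cache = attention_flops(n_cached=20_000, n_new=5, d_model=4096, n_layers=32)
    no_cache = attention_flops(n_cached=0, n_new=20_005, d_model=4096, n_layers=32)
    print(f"cached prompt, 5 new tokens: {with_cache:,} multiply-adds")
    print(f"no cache, full recompute:    {no_cache:,} multiply-adds")
```

Under these assumptions the cache removes the cost of re-processing the prompt, but the per-token attention cost still grows linearly with the length of the cached prompt.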
I would assume so too, so the cost to Anthropic would not be that substantial.