How true is this statement: "He asserted that any country with its own language that did not have a sovereign LLM trained in that language was at a disadvantage as a globally trained, English-speaking LLM would not know about that country’s history, news and culture that was described in the local language."
I thought all big players already train on basically everything remotely available to them no matter the language or quality, so his take sounds like an opinion formed in the early days of generally available LLMs.
> The Olivia system is an HPE Cray Supercomputing EX system, with 448 GPUs and 64,512 CPU cores.
Training a sovereign LLM on this meager hardware, as opposed to a LoRA on some open-source model, seems like a huge mistake and a potential red flag.
There is no way these people have the resources to train a fully fledged LLM, so claiming that is their goal makes me think they don't intend for the LLM to be useful.
Which raises the question: whose money are they wasting, and why?
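Whether 448 GPUs is "meager" depends on the model you are aiming for. A common rule of thumb puts dense-transformer training compute at about 6 FLOPs per parameter per token (C ≈ 6·N·D). The sketch below uses that rule. Only the 448-GPU count comes from the quote. The per-GPU peak (~1 PFLOP/s BF16), the 40% utilization and both model sizes are hypothetical. Under those assumptions a 7B model on 2T tokens takes roughly 5 days, while a 400B model on 15T tokens takes more than 6 years.

```python
# Rough training-time estimate using the C ≈ 6·N·D rule of thumb for dense
# transformers (about 6 FLOPs per parameter per training token, forward + backward).
# Only the 448-GPU count comes from the quote; every other number is hypothetical.

def training_days(params: float, tokens: float, gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days to train, given model size N, token count D and cluster throughput."""
    total_flops = 6 * params * tokens
    sustained_flops_per_s = gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_s / 86_400  # seconds -> days

if __name__ == "__main__":
    GPUS = 448
    PEAK = 1e15   # assumed ~1 PFLOP/s dense BF16 peak per GPU (not Olivia's actual spec)
    MFU = 0.40    # assumed fraction of peak actually sustained during training
    for label, n, d in [("7B params, 2T tokens", 7e9, 2e12),
                        ("400B params, 15T tokens", 4e11, 1.5e13)]:
        print(f"{label}: {training_days(n, d, GPUS, PEAK, MFU):,.0f} days")
```

Real runs also lose time to restarts, data loading and evaluation, so treat these figures as lower bounds.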
I wonder if instead (or in parallel), Norway should build a set of training data and share it (for free) with all the model builders.
Seems like making the frontier models know Norwegian and their culture is a better (or additional!) way to reach the end they are going for here.
> Marius Husnes, the Head of IT Platform at the library (Nasjonalbiblioteket), discussed the project at Huawei’s ID Forum 2026 in Paris, saying that no commercial LLM provider was developing a local (Norwegian) language LLM. He asserted that any country with its own language that did not have a sovereign LLM trained in that language was at a disadvantage as a globally trained, English-speaking LLM would not know about that country’s history, news and culture that was described in the local language.
I am not overly confident that Marius Husnes knows what he’s talking about here.
The Welsh language is getting LLM training with Nemotron:
https://www.bangor.ac.uk/news/2025-09-15-reaching-across-the...
That may not be the most efficient way to go about things, but there remains a seemingly obvious use case for non-Latin languages to do things from scratch.
See sarvam.ai and their tokenisation improvements for local languages [1]. Not every LLM needs to help with coding, nor does it need to already be a Babel fish.
Language is culture, so I can see the motivation behind their initiative. It must be nice to be able to afford to do this yourself.
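One concrete reason tokenisation matters for non-Latin scripts is raw byte cost. A byte-level BPE tokenizer has every byte in its base vocabulary. So for a script it learned few merges for, it can fall back to one token per UTF-8 byte in the worst case. The sketch below only counts UTF-8 bytes per character. It does not use any real tokenizer, and the sample words are illustrative.

```python
# Average UTF-8 bytes per Unicode code point for a few sample words.
# More bytes per character means more tokens per word for a byte-level
# tokenizer that has not learned merges for that script.

def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per code point in `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)

samples = {
    "English": "hello",
    "Norwegian": "bl\u00e5b\u00e6r",  # "blåbær": å and æ take 2 bytes each
    "Hindi (Devanagari)": "\u0928\u092e\u0938\u094d\u0924\u0947",  # "namaste": 3 bytes per code point
}

for name, text in samples.items():
    print(f"{name:20} {utf8_bytes_per_char(text):.2f} bytes/char")
```

Norwegian costs only slightly more than English here. Devanagari costs three times as much per character, which is the gap that script-specific tokenizers try to close.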
This is a massive storage deployment. Given the I/O demands of LLM training, especially for checkpointing, moving to this scale of NVMe flash makes sense compared to traditional disk arrays.
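To see why checkpoint I/O drives storage choices, here is a back-of-the-envelope sketch. It assumes the usual mixed-precision Adam accounting. About 14 bytes per parameter are saved: 2 for BF16 weights, plus 4 each for FP32 master weights and the two Adam moments. Gradients are assumed not to be checkpointed. The 7B model size and the write bandwidths are hypothetical.

```python
# Back-of-the-envelope checkpoint sizing for mixed-precision Adam training.
# Assumption: bf16 weights (2) + fp32 master weights (4) + Adam m (4) + Adam v (4)
# bytes are saved per parameter; gradients are not checkpointed.
BYTES_PER_PARAM = 2 + 4 + 4 + 4

def checkpoint_bytes(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Approximate size of one full training checkpoint."""
    return params * bytes_per_param

def write_seconds(size_bytes: float, write_bw_bytes_per_s: float) -> float:
    """Time to write one checkpoint at a given aggregate storage write bandwidth."""
    return size_bytes / write_bw_bytes_per_s

if __name__ == "__main__":
    size = checkpoint_bytes(7_000_000_000)      # hypothetical 7B-parameter model
    print(f"{size / 1e9:.0f} GB per checkpoint")  # 98 GB
    for bw in (1e9, 10e9, 100e9):               # hypothetical aggregate write bandwidths
        print(f"{bw / 1e9:>5.0f} GB/s -> {write_seconds(size, bw):7.1f} s")
```

Training typically stalls or slows while a checkpoint is written. Frequent checkpoints on a slow tier can therefore eat a noticeable share of GPU time.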
>As Husnes put it; Norway is a small country solving a problem every non-English-speaking nation will face: how do you build AI that reflects your language, your culture and your history? AI needs custodians, not just builders.
I'm afraid the answer is, mostly you don't.
Such a thing requires strong political will that, at least in my environment, seems basically impossible to align.
The costs are prohibitive, but beyond that, the type of person who cares about local representation like that is either completely fine with letting foreign companies implement it (after all, you can use ChatGPT in Basque if you want to) or is against the idea of AI altogether.
The wording in this article is a bit strange. Why the extreme focus on the brand of storage media? Also, the term LLM seems to be used very broadly here: are they actually building a language model from scratch, or are they fine-tuning?
Huawei? You'd think that the recent European backlash against using overseas providers would have reached Norway's public sector too.
Norway isn't in the EU (no restrictions on Huawei) and has cheap electricity; it could become an AI powerhouse.
How about that, they actually asked for permission to use data and the companies said yes.
This can’t be right. 2 PB of flash is like $200k. It’s within reach of many individuals. Then again I guess you don’t need that much storage so maybe it is.
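The "$200k for 2 PB" figure implies a unit price you can check against current drive prices. The sketch uses decimal units (1 PB = 1,000 TB), as drive vendors do. It only prices raw capacity. A complete storage system adds controllers, networking and software on top of the drives.

```python
# Implied raw-capacity price behind the "2 PB of flash is like $200k" estimate.

def price_per_tb(total_usd: float, capacity_pb: float) -> float:
    """USD per TB, using decimal units (1 PB = 1,000 TB)."""
    return total_usd / (capacity_pb * 1_000)

print(f"${price_per_tb(200_000, 2):.0f}/TB")  # $100/TB
```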
> He asserted that any country with its own language that did not have a sovereign LLM trained in that language was at a disadvantage as a globally trained, English-speaking LLM would not know about that country’s history, news and culture that was described in the local language.
I don’t know that this is true. But whatever sounds true enough and gets funding seems to be what flies these days.
As a Norwegian I have never needed a Norwegian language model. Doing most things in Norwegian puts you at a disadvantage internationally anyway. Maybe this has value in schools, but wouldn't it just make kids trust and rely on LLMs even more? My friends who work in education report that group work has become insufferable because many students do not think critically and ask an LLM to verify everything. I really don't see a benefit, but maybe they will find one - that is what research is for.
I am reminded that we recently concluded that our experiment of forcing things to go digital in schools was a flop. These things have a cost if we are wrong.
Sapir-Whorf hypothesis, but for AI
As a Norwegian this sounds like a mistake. Who will use this LLM? Where? For what? The underlying data could be made more easily searchable and digestible for agents in general if the goal is better knowledge of Norwegian culture.
What is called culture here will increasingly be propaganda. It reminds me of people cheering Twitter as a replacement for RSS, or using Facebook to communicate with your customers rather than email. You won't know which will be the winning company, you don't know who might control it in the future, and we can't predict what it will cost. It doesn't take much to be very annoying.
This is how much storage the average r/datahoarder user has in their basement. Fewer than 100 hard drives.
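The "fewer than 100 hard drives" claim depends on drive size. The sketch counts raw capacity only. The drive sizes are hypothetical examples, and any RAID or erasure-coding redundancy would add drives on top.

```python
import math

def drives_needed(capacity_tb: float, drive_tb: float) -> int:
    """Minimum drive count for a raw capacity, ignoring RAID/erasure-coding overhead."""
    return math.ceil(capacity_tb / drive_tb)

for size in (20, 22, 24):  # hypothetical hard-drive capacities in TB
    print(f"{size} TB drives: {drives_needed(2_000, size)}")
```

With 20 TB drives it is exactly 100. Anything larger gets you under 100.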
I thought the US had already coerced most countries not to buy hardware from Huawei.
At least in my country, Chinese companies have been barred from official tenders and procurement.
Ad for Huawei?
That's about 350 MB per capita. A human can produce 2-6 KB of text per hour. That's 13 years of non-stop typing. I wonder where it all comes from. I guess it's websites that aren't compressed / extracted.
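The per-capita arithmetic can be reproduced under a few assumptions: 2 PB in decimal units, a population of about 5.6 million, and typing rates of 2-6 KB per hour (read as kilobytes of plain ASCII text). The result is roughly 357 MB per person, or about 7 to 20 years of non-stop typing depending on the rate. The 13-year figure falls inside that range.

```python
# Per-capita share of 2 PB and how long it would take one person to type it.
# Assumptions: decimal units, ~5.6 million people, 1 byte per typed character.
HOURS_PER_YEAR = 24 * 365

def per_capita_bytes(total_bytes: float, population: float) -> float:
    return total_bytes / population

def typing_years(size_bytes: float, bytes_per_hour: float) -> float:
    """Years of non-stop typing needed to produce `size_bytes` of text."""
    return size_bytes / bytes_per_hour / HOURS_PER_YEAR

share = per_capita_bytes(2e15, 5.6e6)
print(f"{share / 1e6:.0f} MB per person")
for rate in (2_000, 4_000, 6_000):  # bytes per hour
    print(f"{rate / 1000:.0f} KB/h -> {typing_years(share, rate):.1f} years")
```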
384-core CPU cluster? 2 petabytes?
Dell just launched a 2U system that fits almost 10 petabytes. It's probably not 384-core capable, but that is very doable right now: EPYC chips go up to 192 cores each! https://www.techradar.com/pro/dell-launches-record-shatterin...
2 PB? They will not come close to training on that amount. Maybe years from now.
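How much training data 2 PB could represent depends on how much of it is clean text. This sketch assumes about 4 bytes of text per token, a common rule of thumb for English with BPE tokenizers. Real ratios vary by language and tokenizer. The text fractions are hypothetical, since much of the storage may hold scans, audio or checkpoints rather than text.

```python
# Rough conversion from raw text bytes to training tokens.
BYTES_PER_TOKEN = 4.0  # assumption: rule-of-thumb average for BPE on English text

def tokens_from_bytes(text_bytes: float, bytes_per_token: float = BYTES_PER_TOKEN) -> float:
    return text_bytes / bytes_per_token

for text_fraction in (1.0, 0.1, 0.01):  # hypothetical share of the 2 PB that is clean text
    tokens = tokens_from_bytes(2e15 * text_fraction)
    print(f"{text_fraction:>5.0%} text -> {tokens:.1e} tokens")
```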
Even entire governments are captured by a mild LLM psychosis, which is sad in the case of Norway. I lived in Norway for two years and always found their government to be highly rational. This is not a rational use of public funds (but I suppose they have plenty of capital).
Western society is completely captured by this form of psychosis, and it's going to bite us in the a* very soon.
I firmly believe all the Boomer leaders throughout the world are being sold a bag of lies by technocrats that "AI", specifically LLMs, are going to cure disease and death, and therefore they are willing to hand over all control to the technocrats. Fckin croakers at it again.
Ehhh. None of this sounds right. Translation problems, maybe. Lack of technical understanding, maybe... I don't know. Probably not news.
I'm a Norwegian, and I use the national library almost every day for searching through texts. They truly have one of the best-working user interfaces (and functionality) for searching through massive amounts of text.