Tilde, a Latvian language technology company, released TildeOpen LLM on September 3, 2025. It is an open-source foundation large language model (LLM) built to support European languages, especially under-represented languages of smaller countries and regions. The release is an important step for the EU in terms of language equity and digital sovereignty.


TildeOpen LLM is a dense decoder-only model with 30 billion parameters, released under the permissive CC-BY-4.0 license. It covers a wide range of languages, from Latvian and Lithuanian to Ukrainian and Turkish. The model was trained on the European supercomputers LUMI (Finland) and JUPITER (Germany), using 2 million GPU hours of compute awarded through the European Commission's Large AI Grand Challenge.
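
As a rough sanity check on the model size, the layer count and embedding dimension reported for TildeOpen (60 layers, width 6144, 48 attention heads) can be plugged into a common rule of thumb for dense transformers. The sketch below is an illustration, not Tilde's own accounting: the `12 * layers * d_model^2` approximation and the function names are assumptions, and embeddings and normalization weights are ignored.

```python
# Back-of-the-envelope size check from the reported hyperparameters.
N_LAYERS = 60    # reported number of layers
D_MODEL = 6144   # reported embedding dimension
N_HEADS = 48     # reported number of attention heads


def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head dimension; the width must divide evenly across heads."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return d_model // n_heads


def approx_block_params(n_layers: int, d_model: int) -> int:
    """Rule-of-thumb parameter count for the transformer blocks only.

    Assumption: roughly 4*d^2 weights for attention plus 8*d^2 for the
    feed-forward network per layer. Embeddings and norms are not counted.
    """
    return 12 * n_layers * d_model ** 2


if __name__ == "__main__":
    print(head_dim(D_MODEL, N_HEADS))                             # 128
    print(f"{approx_block_params(N_LAYERS, D_MODEL) / 1e9:.1f}B")  # 27.2B
```

The estimate of roughly 27 billion block parameters, before embeddings are added, is in line with the "30b" suffix in the model's Hugging Face repository name.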

On the technical side, TildeOpen LLM was trained with scripts based on EleutherAI's GPT-NeoX, for 450,000 updates covering approximately 2 trillion tokens. Training used a three-stage sampling curriculum: first, uniform sampling across languages; then sampling that follows the natural data distribution, which favors high-volume languages; and finally a return to uniform sampling to restore balance. The model has 60 layers, an embedding dimension of 6144, 48 attention heads, and an 8192-token context window, and it uses SwiGLU activation, RoPE positional encoding, and RMSNorm layer normalization.
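
The three-stage sampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tilde's training code: the corpus sizes, the stage boundaries (30% and 90% of training), and all function names are hypothetical.

```python
import random
from typing import Dict

# Hypothetical corpus sizes (arbitrary units), not TildeOpen's real data mix.
CORPUS_TOKENS: Dict[str, int] = {"en": 800, "de": 400, "lt": 60, "lv": 40}

# Hypothetical stage boundaries, as fractions of total training progress.
STAGES = [(0.0, "uniform"), (0.3, "natural"), (0.9, "uniform")]


def stage_at(progress: float) -> str:
    """Return the sampling stage active at a given training progress in [0, 1]."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    name = STAGES[0][1]
    for start, stage in STAGES:
        if progress >= start:
            name = stage
    return name


def language_weights(progress: float, corpus: Dict[str, int]) -> Dict[str, float]:
    """Sampling probability per language for the current stage."""
    langs = sorted(corpus)
    if stage_at(progress) == "uniform":
        # Every language is equally likely, regardless of corpus size.
        return {lang: 1.0 / len(langs) for lang in langs}
    # "Natural" stage: probability proportional to the amount of data.
    total = sum(corpus.values())
    return {lang: corpus[lang] / total for lang in langs}


def sample_language(progress: float, corpus: Dict[str, int], rng: random.Random) -> str:
    """Draw the language of the next training document."""
    weights = language_weights(progress, corpus)
    langs = list(weights)
    return rng.choices(langs, weights=[weights[l] for l in langs], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.1, 0.5, 0.95):
        print(p, stage_at(p), language_weights(p, CORPUS_TOKENS))
    print(sample_language(0.5, CORPUS_TOKENS, rng))
```

In the uniform stages a small language such as Latvian is drawn as often as English, while in the natural stage its share falls back to its share of the data, which is the balance the curriculum is meant to manage.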

On language equity and data sovereignty: mainstream models tend to focus on English and other major languages, so they perform poorly on Baltic, Slavic, and other smaller European languages, often producing grammatical errors and awkward phrasing. TildeOpen addresses this with a "fair tokenizer" that represents text in different languages in a similar way, which reduces the number of tokens needed and improves inference efficiency for less-represented languages. In addition, organizations can self-host the model in their own data centers or in secure clouds that meet EU requirements, which helps them comply with GDPR and other data protection regulations and avoids the sovereignty concerns of relying on models hosted in the US or Asia.
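
One common way to see what a fairer tokenizer buys is to measure how many tokens it spends per word in each language. The sketch below is a hypothetical illustration rather than Tilde's evaluation: `tokens_per_word` and `compare_languages` are made-up helper names, the sample sentences are arbitrary, and the stand-in `chunk_tokenizer` is a toy. Any real tokenizer function that maps a string to a list of tokens could be passed in instead.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def tokens_per_word(tokenize: Tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


def compare_languages(tokenize: Tokenizer, samples: Dict[str, str]) -> Dict[str, float]:
    """Tokens-per-word ratio for each language sample."""
    return {lang: tokens_per_word(tokenize, text) for lang, text in samples.items()}


def chunk_tokenizer(text: str, size: int = 4) -> List[str]:
    """Toy stand-in tokenizer: cuts each word into pieces of at most `size` characters."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + size] for i in range(0, len(word), size))
    return tokens


if __name__ == "__main__":
    samples = {
        "en": "The weather is nice today",
        "lv": "Šodien ir jauks laiks",
    }
    for lang, ratio in compare_languages(chunk_tokenizer, samples).items():
        print(f"{lang}: {ratio:.2f} tokens per word")
```

A tokenizer that treats languages evenly would show similar ratios across languages; a large gap means speakers of the higher-ratio language pay more tokens, and therefore more compute and context, for the same text.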

Because TildeOpen is a foundation model, Tilde is expected to build more specialized versions on top of it, such as instruction-tuned and translation models, further extending its capabilities. Through Tilde's efforts, Latvia hopes to establish itself in the global tech landscape while remaining committed to preserving linguistic diversity.

Hugging Face: https://huggingface.co/TildeAI/TildeOpen-30b

Technical details: https://tilde.ai/lv/tildeopen-llm/

Key Points:  

🌍 TildeOpen LLM is an open-source large language model that supports many European languages, with a special focus on the languages of smaller countries.  

💻 The model was trained on European supercomputers with a three-stage sampling curriculum designed to keep languages balanced and fairly represented.  

🔒 Organizations can self-host the model, complying with data protection regulations like GDPR, thereby enhancing data sovereignty.