Türkiye has officially entered the generative-AI race with Kumru AI, a 7.4-billion-parameter large language model trained from scratch on Turkish text. Built by local start-up VNGRS, Kumru runs on a single 16 GB GPU, beats 70 B models on Turkish benchmarks and can be deployed on-premise for banks, hospitals and government—slashing foreign-vendor reliance and complying with Türkiye’s data-residency rules.
Turkish engineers on Sunday unveiled Kumru AI, the country’s first domestically developed large language model, in a move Ankara hopes will end dependence on Silicon Valley giants and anchor the goal of Türkiye’s National AI Strategy: turning artificial intelligence into a 5%-of-GDP engine by 2025.
- Model size: 7.4 B parameters (decoder-only)
- Training data: 500 GB cleaned Turkish corpus; 300 B tokens
- Context window: 8,192 Turkish tokens (~20 A4 pages)
- Hardware need: 16 GB VRAM (e.g. RTX A4000; the 24 GB RTX 3090 also fits)
- Price tag: ≈ US $2,000 on-premise licence—15× cheaper than running LLaMA-70 B on an H100
- Open-source sibling: Kumru-2B (4.8 GB) for mobiles & edge devices
- Benchmark win: Outscores LLaMA-3.3-70B, Gemma-3-27B and Qwen-2-72B on Turkish tasks [^1238^]
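The link between the parameter count and the 16 GB hardware figure in the list above can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it assumes weights are stored at 16 bits per parameter (the article does not state the precision), counts weights alone, and ignores the KV cache, activations and framework overhead. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Excludes KV cache, activations and runtime overhead, so real usage
    will be higher than the value returned here.
    """
    return n_params * bits_per_param / 8 / 1e9


# Parameter count taken from the spec list; precisions are assumptions.
KUMRU_7B_PARAMS = 7.4e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(KUMRU_7B_PARAMS, bits)
    print(f"{label:>6}: {gb:.1f} GB of weights")
```

Under these assumptions, 16-bit weights come to about 14.8 GB, which fits a 16 GB card but leaves little headroom for long prompts. Lower-precision storage would widen that margin.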
Why Kumru Matters—Ankara’s “Digital Independence” Push
Unlike multilingual models that treat Turkish as an after-thought, Kumru was pre-trained exclusively on Turkish text scraped from court rulings, medical journals, OCR-ed newspapers and parliamentary minutes, then fine-tuned on 1 million instruction samples covering summarisation, legal-QA, healthcare coding and customer support.
“Owning the tokenizer means owning the language,” said lead developer Melikşah Türker, referring to Kumru’s custom byte-pair vocabulary that uses up to 98% fewer tokens than rival multilingual models—saving both compute cost and inference time.
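A token-saving claim like the one above can in principle be measured by tokenizing the same texts with two tokenizers and comparing totals. The sketch below shows one way to do that. It is not VNGRS's evaluation method: `token_reduction` is a hypothetical helper, and the two tokenizers are toy stand-ins (character-level and whitespace) so the example runs without downloading any model.

```python
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]


def token_reduction(texts: Sequence[str],
                    baseline: Tokenizer,
                    candidate: Tokenizer) -> float:
    """Fraction of tokens saved by `candidate` relative to `baseline`.

    0.25 means the candidate needs 25% fewer tokens on these texts;
    a negative value means it needs more.
    """
    base_total = sum(len(baseline(t)) for t in texts)
    cand_total = sum(len(candidate(t)) for t in texts)
    if base_total == 0:
        raise ValueError("baseline produced no tokens")
    return 1.0 - cand_total / base_total


# Toy stand-ins, NOT real LLM tokenizers: swap in real ones to measure.
def char_tokenizer(text: str) -> List[str]:
    return list(text)


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


sample = ["Kumru Türkçe metinlerle eğitildi."]
saving = token_reduction(sample, char_tokenizer, whitespace_tokenizer)
print(f"Tokens saved on sample: {saving:.1%}")
```

For scale, a 98% reduction would mean the baseline tokenizer emits 50 times as many tokens for the same text, so a figure that large is worth checking against the corpus it was measured on.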
The release dovetails with Türkiye’s 2024–2025 AI Action Plan that earmarks US $300 million for domestic generative-AI, high-performance clusters and a “Central Public Data Area” where start-ups can legally tap state archives.
Benchmarks & Performance
On Cetvel, a 26-task Turkish NLP suite, Kumru-7B tops the leaderboard in:
- Grammatical error correction (87.4 % F1)
- Abstractive summarisation (ROUGE-L 52.1)
- Legal entailment (93.8 % acc.)—a 14-point jump over LLaMA-3.3-70B.
Early adopters include İşbank, piloting Kumru for internal document review, and Ankara City Hospital, testing a patient-discharge summariser that keeps records inside hospital firewalls—crucial for KVKK, Türkiye’s GDPR-equivalent.
Deployment Scenarios
Because the model runs on a single GPU, midsize companies can air-gap it on-premise, avoiding cloud-related CBDT (Cross-Border Data Transfer) headaches. VNGRS offers:
- Kumru-Enterprise: fine-tune on client data (US $8k)
- Kumru-Edge: 2B version for Android/iOS apps (MIT licence)
- API on AWS Istanbul Region: <200 ms latency inside Türkiye
Open-Source & Community Impact
Weights of Kumru-2B are already on Hugging Face under Apache 2.0; the 7B base model will follow once safety red-teaming ends next month. Developers at İstanbul Technical University forked the repo within hours, pushing quantized 4-bit builds that squeeze onto Raspberry Pi 5 for rural schools.
“We want a Turkish LLM stack—from tokenizer to RLHF—fully auditable by our universities,” said Industry Minister Mehmet Fatih Kacır in a recorded message.
Global Context
Türkiye joins Vietnam (ViGPT), Indonesia (Sahabat-AI) and the Czech Republic (CzechLLM) in the small but growing club of nations building sovereign language models to safeguard cultural nuance and data sovereignty. Analysts at Emerging Europe estimate Ankara could save US $120 million/year in licence fees if half of public-sector chatbot traffic shifts to Kumru derivatives.
Road-Map & Challenges
VNGRS admits the current model hallucinates 12% more than GPT-4-Turbo on English QA, lacks multimodal vision and has only 8k context. A 14B successor trained on 1 trillion tokens is slated for Q2 2026, alongside a Turkish RLHF pipeline funded by a fresh TÜBİTAK 2247 grant.
FAQ—Kumru AI
Q: Is Kumru safe for regulated industries?
A: Yes—on-prem deployment keeps data inside Türkiye; VNGRS provides audit logs and KVKK documentation templates.
Q: How does pricing compare with OpenAI?
A: ≈ US $0.0015 / 1k tokens (self-hosted) versus US $0.03 / 1k for GPT-4—roughly 20× cheaper for high-volume apps.
Q: Can I fine-tune Kumru for my niche?
A: Absolutely—LoRA adapters, 1–4 epoch cycles, starting at US $1,500 using VNGRS cloud GPUs in İzmir.
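The pricing answer above can be reproduced with a few lines of arithmetic. The sketch below uses only the per-1k-token prices quoted in the FAQ and the roughly US $2,000 licence figure from the spec list. It makes two assumptions the article does not confirm: that the licence is a one-off fixed cost, and that the self-hosted per-token price does not already include it. The function names are hypothetical.

```python
SELF_HOSTED_PER_1K = 0.0015  # US$ per 1k tokens, from the FAQ
GPT4_PER_1K = 0.03           # US$ per 1k tokens, from the FAQ
LICENCE_USD = 2_000          # approximate on-premise licence, from the spec list


def cost_usd(tokens: float, price_per_1k: float) -> float:
    """Usage cost in US$ for a given number of tokens."""
    return tokens / 1000 * price_per_1k


def breakeven_tokens(fixed_cost: float,
                     hosted_per_1k: float,
                     self_hosted_per_1k: float) -> float:
    """Tokens needed before per-token savings repay a one-off fixed cost."""
    saving_per_1k = hosted_per_1k - self_hosted_per_1k
    if saving_per_1k <= 0:
        raise ValueError("self-hosting is not cheaper per token")
    return fixed_cost / saving_per_1k * 1000


ratio = GPT4_PER_1K / SELF_HOSTED_PER_1K
tokens_needed = breakeven_tokens(LICENCE_USD, GPT4_PER_1K, SELF_HOSTED_PER_1K)
print(f"Per-token price ratio: {ratio:.0f}x")
print(f"Break-even volume: {tokens_needed / 1e6:.1f} million tokens")
```

Under these assumptions the price ratio is 20 and the licence pays for itself after roughly 70 million tokens. Hardware, power and staffing costs are not modelled and would push the break-even point higher.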
Kumru AI is more than a tech demo: it is Türkiye’s declaration of digital independence, offering companies and agencies a low-cost, compliant alternative to U.S. and Chinese giants. If adoption matches Ankara’s bullish targets (50% of public-sector chatbots by 2027), Kumru could become the keystone of a new regional AI hub stretching from the Balkans to Central Asia.
Try the demo: https://kumru.ai
GitHub & weights: vngrs-ai/Kumru-2B