UK Exeter Talk: Developing Turkish Language Models: Tokenization, Data Quality and Domain Adaptation

This video features my seminar presentation at the University of Exeter, where I shared the core findings of my PhD research. The talk focuses on building more robust, efficient, and scalable language models for morphologically rich and relatively low-resource languages, taking Turkish as the primary framework. I approach the LLM development pipeline from a holistic perspective, discussing not just a single bottleneck but the interconnected challenges of evaluation, tokenization, data quality, and domain adaptation.

Key topics covered in this talk:
• The critical need for robust evaluation: Building the TR-MMLU ecosystem
• Morphologically-aware tokenizer design for Turkish
• Measuring tokenization efficiency with TR and Pure metrics
• Data quality-aware learning approaches
• Domain-specific model adaptations, featuring a Turkish medical dataset
• Embedding generation: Insights into the Magibu 200M model
• Open-source AI tools, datasets, and accessible benchmarks

My ultimate goal with this research is to foster an open-science ecosystem for Turkish NLP that prioritizes high representational power, computational efficiency, and reproducibility.

Thank you to the academic community in the UK for the engaging discussions, and to everyone watching.
