Measuring the Unmeasurable: How Do You Evaluate Generative AI Systems? (Mesurer l’immesurable : Comment évaluer les systèmes à base d’IA générative ?)

Presentation by: Erin Pacquetet (SCIAM)

📕 Summary: Generative AI is changing application development and opening up a range of uses: assistants, content generation, augmented search, and support for complex tasks. But a major challenge remains: accurately evaluating products built on models that are both creative and unpredictable. This session explores that paradox: leveraging LLMs while keeping the evaluation of their output under control. We will see how to adjust criteria and methods to assess technical accuracy, consistency, and business relevance.

The program includes: the limitations of traditional metrics, automated evaluation via "LLM-as-a-judge" (and its biases), the importance of human evaluation, and continuous monitoring to detect deviations and side effects.

We will analyze the case of a RAG chatbot, where linguistic creativity and the requirement for truthfulness clash. The evaluation has to balance factuality and fluency, and to check accuracy without controlling the question asked. This real-world case study will serve as our guide to implementing a comprehensive and reproducible evaluation pipeline.

This session provides benchmarks and tools for methodically evaluating generative systems and leveraging them as a strategic asset in AI.

Recorded in April 2026 in Paris, Palais des Congrès, Porte Maillot.

🔥 To stay up-to-date with Devoxx France news, follow us on:
LinkedIn: / devoxx-france
Bluesky: https://bsky.app/profile/devoxx.fr
Visit our website: https://www.devoxx.fr/

L'Agentic Coding, nouveau territoire du Platform Engineering

Spécialisez vos Agents avec les Skills

Arthur Mensch, co-founder of Mistral AI, is being questioned at the National Assembly - 12/05/2026

RL for Agents Workshop - Deep Dive on Training Agents with RL and Open Source

OCTO Counter - Mastering RAG: Connecting AI Gen Models to Enterprise Data

Le Rôle de la Génération Augmentée de Récupération (RAG) en IA

La découverte qui s’apprête à bouleverser l’informatique quantique

Kafka 4, fantastique ?

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

2 ans après, les devs n'ont pas disparu : du coup l'IA ca sert à rien ?

[Leçon inaugurale] Yann Le Cun - Apprentissage profond et au-delà : les nouveaux défis de l'IA

Le Frontend mérite aussi du monitoring !

//. 132 Running an AI locally: tools, models and hardware configuration

Agentic development: Stitch and Jules in the antigravity of the Gemini constellation

LLM, RAG et IA agentique : comprendre l'évolution de l'IA

Don't learn AI Agents without Learning these Fundamentals

"Perspectives on IA" : conf. de Yann LeCun, WinterWeek – Graduate School – Univ. Gustave Eiffel

OWASP's Top 10 Ways to Attack LLMs: AI Vulnerabilities Exposed

Production Troubleshooting : boostez vos skills, une étude de cas

How AI agents & Claude skills work (Clearly Explained)
