Measuring the Unmeasurable: How Do You Evaluate Generative AI Systems? (original title: Mesurer l’immesurable : Comment évaluer les systèmes à base d’IA générative ?)

Presentation by: Erin Pacquetet (SCIAM)

📕 Summary: Generative AI is transforming application development and opening up a variety of uses: assistants, content generation, augmented search, and support for complex tasks. But a major challenge remains: accurately evaluating products built on models that are both creative and unpredictable. This session explores this paradox: leveraging LLMs while keeping the evaluation of their output under control. We will see how to adjust criteria and methods to assess technical accuracy, consistency, and business relevance.

The program includes: the limitations of traditional metrics, automated evaluation via "LLM-as-a-judge" (and its biases), the importance of human evaluation, and continuous monitoring to detect drift and side effects.

We will analyze the case of a RAG chatbot, where linguistic creativity and the requirement for truthfulness clash. The evaluation must balance factuality and fluency, and check accuracy without any control over the questions asked. This real-world case study will serve as our guide to implementing a comprehensive and reproducible evaluation pipeline.

This session provides benchmarks and tools for methodically evaluating generative systems and leveraging them as a strategic asset in AI.

Recorded in April 2026 in Paris, Palais des Congrès, Porte Maillot.

🔥 To stay up-to-date with Devoxx France news, follow us on:
LinkedIn:   / devoxx-france
Bluesky: https://bsky.app/profile/devoxx.fr
Visit our website: https://www.devoxx.fr/
