Measuring the Unmeasurable: How Do You Evaluate Generative AI-Based Systems?

Presentation by: Erin Pacquetet (SCIAM)

📕 Summary: Generative AI is transforming application development and opening up a variety of uses: assistants, content generation, augmented search, and support for complex tasks. But a major challenge remains: accurately evaluating products built on models that are both creative and unpredictable.

This session explores that paradox: taking advantage of LLMs while keeping the evaluation of their output under control. We will see how to adjust criteria and methods to assess technical accuracy, consistency, and business relevance. The program covers the limitations of traditional metrics, automated evaluation via "LLM-as-a-judge" (and its biases), the importance of human evaluation, and continuous monitoring to detect drift and side effects.

We will analyze the case of a RAG chatbot, where linguistic creativity and the requirement for truthfulness clash. The evaluation has to balance factuality and fluency, and verify accuracy without any control over the questions asked. This real-world case study will serve as our guide to implementing a comprehensive and reproducible evaluation pipeline.

This session provides benchmarks and tools for methodically evaluating generative systems and leveraging them as a strategic asset in AI.

Recorded in April 2026 in Paris, Palais des Congrès, Porte Maillot.

🔥 To stay up to date with Devoxx France news, follow us on:
LinkedIn: / devoxx-france
Bluesky: https://bsky.app/profile/devoxx.fr
Visit our website: https://www.devoxx.fr/
