INSAIT Tech Series: Prof. Zico Kolter - AI Safety & Robustness: Recent Advances & Future Directions
Abstract

To prevent undesirable outputs, most large language models (LLMs) have built-in “guardrails” that enforce policies specified by their developers, for example, that the LLM should not produce output deemed harmful. Unfortunately, adversarial attacks on such models have made it possible to circumvent these safeguards, allowing bad actors to manipulate LLMs for unintended purposes. Historically, such adversarial attacks have been extremely hard to prevent. However, in this talk I will highlight several recent advances that have substantially improved the practical robustness of LLMs. This work has culminated in a recent competition in which attackers were unable to break an LLM we have deployed after a month of attempts. I will also cover the current state of the field and its challenges, and discuss the future of safe AI systems.

