Text encoding débacles (PGConf.dev 2026)

Presented by Thomas Munro at PGConf.dev 2026 (https://2026.pgconf.dev)

You might think that text encoding is a problem that was solved by UTF-8. This is basically true for many developers, but PostgreSQL continues to support dozens of encodings and multi-encoding configurations. There are some rough and even dangerous edges, with implications even if you only use UTF-8. I want to present prototypes to address those with a practical model, and some other opportunities I have spotted along the way.

- Overview of the PostgreSQL text encoding model, related OS concepts and motivations
- The holes in that model, including shared catalogs and views, authentication, file systems and more
- In which usage patterns do we get away with that? Or not?
- A proposed model to nail down the encoding of everything, while allowing for reasonable usage patterns
- Overview of closely related pg_wchar, holes and improvements
- Opportunities to go faster
- What would it take to support NUL in text?

https://2026.pgconf.dev/session/572

🎬 See more PGConf.dev 2026 videos at • PGConf.dev 2026

Join us in Montreal for PGConf.dev 2027
https://2027.pgconf.dev

Connect with us:
Mastodon: https://mastodon.social/@pgconfdev
Web: https://2026.pgconf.dev

#postgresql #PGConfDev