Data Lake in 2025 (modern data stack) | Iceberg, S3 MinIO, Trino, Spark, PostgreSQL
🚀 In this video, you'll see how to build a real data lake from scratch and understand why a data engineer needs Iceberg, Trino, MinIO, Spark, and PostgreSQL! I'll demonstrate everything on a live project: we'll connect analytics, set up S3 storage, create a metastore, and learn how to write and read data using SQL and PySpark.

Links:
IT Mentoring/Consulting – https://korsak0v.notion.site/Data-Eng...
TG Channel – https://t.me/DataLikeQWERTY
Instagram – / i__korsakov
Habr – https://habr.com/ru/users/k0rsakov/pu...
Project GitHub – https://github.com/k0rsakov/pet_proje...
Apache Iceberg Data Engineer Infrastructure – https://habr.com/ru/articles/850674/

🔻 What awaits you:
• What a Data Lake is and why it's needed in 2025 (in simple terms, in a nutshell!)
• How a Data Lake differs from a classic DWH
• What problems Trino + Iceberg + S3 + Spark + PostgreSQL solve
• What a modern data engineer's infrastructure looks like (and how to set it up quickly)
• How Trino reads data from different sources
• How to create tables via SQL and view them in S3
• How the PostgreSQL-based metastore works and why it's needed
• How to fill a Data Lake with external data using Apache Spark
• Hands-on: queries, schemas, table creation, reading via Spark and Trino
• Tips and life hacks for working with a Data Lake

Timecodes:
00:00 – Start
00:23 – What is a Data Lake
02:17 – Infrastructure overview
04:51 – Setting up a connection to the Data Lake
05:51 – Setting up a connection to OLTP
08:29 – First write to the Iceberg Data Lake via Trino
13:29 – Writing data to the Iceberg Data Lake via Spark (PySpark)
16:43 – Reading data from the Iceberg Data Lake via Trino
17:03 – Reading data from the Iceberg Data Lake via Spark (PySpark)
17:22 – Summary

#DataLake #Trino #Iceberg #S3 #MinIO #Spark #PostgreSQL #DataEngineering #BigData #ETL #SQL

🔥 Don't forget to like, subscribe to the channel, and turn on the bell so you don't miss new videos!
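The description covers writing to Iceberg tables from PySpark, with the data stored in MinIO and the metastore running on PostgreSQL. The following is only a rough sketch, not the configuration used in the video or the project repository. It builds the Spark settings for one possible setup under these assumptions: Iceberg's JDBC catalog stores table pointers in PostgreSQL, and Iceberg's S3FileIO writes files to MinIO. The catalog name `lake`, the hostnames, the bucket, and the credentials are hypothetical placeholders.

```python
"""Hypothetical helper: Spark settings for an Iceberg catalog on MinIO + PostgreSQL.

Assumptions (not taken from the video): Iceberg's JDBC catalog keeps table
metadata pointers in PostgreSQL, and MinIO is reached through Iceberg's
S3FileIO. All names, hosts, and credentials below are placeholders.
"""


def iceberg_spark_conf(
    catalog: str = "lake",  # hypothetical catalog name
    pg_url: str = "jdbc:postgresql://postgres:5432/iceberg_catalog",
    pg_user: str = "postgres",
    pg_password: str = "postgres",
    warehouse: str = "s3://warehouse/",  # hypothetical MinIO bucket
    s3_endpoint: str = "http://minio:9000",
) -> dict[str, str]:
    """Return Spark config key/value pairs for one Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (MERGE INTO, CALL procedures, ...)
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog under the chosen name
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Table metadata pointers are stored in PostgreSQL via the JDBC catalog
        f"{prefix}.catalog-impl": "org.apache.iceberg.jdbc.JdbcCatalog",
        f"{prefix}.uri": pg_url,
        f"{prefix}.jdbc.user": pg_user,
        f"{prefix}.jdbc.password": pg_password,
        # Data and metadata files are written under this warehouse location
        f"{prefix}.warehouse": warehouse,
        # S3FileIO talks to MinIO's S3-compatible API
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO setups usually need path-style URLs (http://host/bucket/key)
        f"{prefix}.s3.path-style-access": "true",
    }
```

To use it, you could pass each pair to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. The Iceberg Spark runtime, the Iceberg AWS bundle, and the PostgreSQL JDBC driver would also need to be on Spark's classpath. S3FileIO would read MinIO credentials from the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`). The video's own stack may use a different catalog type, so check the project repository for the actual settings.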
