Building real-time data products at LinkedIn with Apache Samza
Presented at Strata+Hadoop World, New York, 16 October 2014
http://strataconf.com/stratany2014/pu...
Slides: https://speakerdeck.com/ept/building-...

Abstract:

The world is going real-time. MapReduce, SQL-on-Hadoop and similar batch processing tools are fine for analyzing and processing data after the fact, but sometimes you need to process data continuously as it comes in, and react to it within a few seconds or less. How do you do that at Hadoop scale?

Apache Samza is an open source stream processing framework designed to solve these kinds of problems. It is built upon YARN/Hadoop 2.0 and Apache Kafka. You can think of Samza as a real-time, continuously running version of MapReduce.

Samza has some unique features that make it powerful. It provides high performance for stateful processing jobs, including aggregation and joins between many input streams. It is designed to support an ecosystem of many different jobs written by different teams, and it isolates them from each other, so that one badly behaved job can’t affect the others.

At LinkedIn, we have been using Samza in production for some time, both for internal analytics purposes and for data products that are served on the live site. In this talk, we’ll discuss our experience of working with Samza.

You’ll learn about:

- What kinds of real-time data problems you can solve with Samza
- How Samza reliably scales to millions of messages per second
- How Samza compares to other stream processing frameworks
- How Samza can help collaboration between different data science, product, and engineering teams within an organization
- How to avoid implementing the same data pipeline twice (once for offline/batch processing and once for real-time/stream processing)
- Lessons we learnt on how to structure real-time data pipelines for scale and flexibility

