Stream Azure OpenAI Responses Token by Token in Python

Streaming Azure OpenAI replies sounds like flipping `stream=True`, until a buffering proxy swallows your tokens and a content filter cuts the reply off mid-sentence. This walks through what's actually on the wire — Server-Sent Events over one long-lived HTTP response, JSON delta chunks, and the literal `[DONE]` marker — then shows the synchronous generator pattern and the async FastAPI `StreamingResponse` version that terminates Azure's upstream stream and re-emits a fresh one to the browser. The gotchas worth your time: guard every `delta.content` against None or you'll print the string "None", pass `stream_options` with `include_usage` if you want token counts at all, and disable proxy buffering plus raise idle timeouts so Nginx or Front Door doesn't defeat the whole point. Also handle `CancelledError` and propagate client disconnects, or you'll keep paying for generation nobody reads. For anyone building human-facing chat UIs in Python — and a useful reminder that backend parsing, JSON validation, and tool-calling flows are usually cleaner as plain blocking calls. ⏱️ Chapters: 0:00 Intro 0:04 Why Stream Tokens? 0:41 What's On the Wire 1:23 Turning It On 2:02 Reading the Chunks 2:45 Async to the Browser 3:30 Production Gotchas 4:21 Stream vs Non-Stream 4:57 Recap and Takeaway Subscribe for more practical Azure engineering walkthroughs. Check the current Azure docs — cloud services change. #AzureOpenAI #Python #FastAPI #ServerSentEvents #LLMEngineering