How to Build a Virtual Cell in Python from Scratch
This is a gentle introduction to building a virtual cell in Python. We build a simple model that predicts how a cell’s gene expression changes in response to a perturbation. I try to explain everything step by step to show the complete thought process behind every decision.

We start by answering the fundamental questions: what is a virtual cell and why it matters for disease understanding and drug discovery. Then we go through the entire process and cover:
- downloading and exploring single-cell RNA-seq data
- preprocessing the data
- designing a training pipeline
- splitting data by unseen perturbations
- representing perturbations with gene embeddings
- building PyTorch datasets and data loaders
- training a simple neural net
- testing the model
- comparing against a baseline
- dealing with model collapse
- improving the model with highly variable genes, pseudobulk expression, better evaluation, and delta prediction
- discussing possible next steps

The final model takes a perturbation embedding, predicts a change in expression, and adds that change to the control cell state.

To be clear: this tutorial is for educational purposes and aims to illustrate the main steps involved in building a virtual cell. It does not produce a competitive model for perturbation response prediction, but it is a starting point for you to play around with and improve.

Code: https://github.com/MaciejPiernik/virt...

Resources
Arc Institute Virtual Cell Atlas: https://arcinstitute.org/tools/virtua...
Virtual Cell Challenge dataset: https://github.com/ArcInstitute/arc-v...
Gene embeddings (benchmark paper + downloads): https://www.biorxiv.org/content/10.11...
CELLxGENE: https://cellxgene.cziscience.com/

Chapters
0:00 Intro
6:30 Representing a cell
11:24 Project setup
12:58 Intuition
20:25 What data we need?
25:02 Downloading data
28:58 Exploring data
35:09 Preprocessing data
43:12 The training pipeline
45:52 Splitting data
57:31 Encoding perturbations
1:02:55 Gene embeddings
1:12:40 The full training loop
1:27:52 The model
1:30:34 Data loaders
1:39:31 Mapping genes to embeddings
1:57:52 First training run
2:01:00 Refactor
2:07:32 Testing the model
2:15:21 Technical improvements
2:21:17 Model collapse
2:24:55 Fix #1: Highly variable genes
2:26:52 Fix #2: Pseudobulk
2:34:06 Fix #3: Loss & eval
2:40:20 Baseline
2:43:55 Fix #4: Predicting delta
2:48:13 Improving over baseline
2:54:46 Next steps and Conclusion

#VirtualCell #MachineLearning #Bioinformatics #Python #SingleCell #RNASeq #DeepLearning #DrugDiscovery #ComputationalBiology
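A quick sketch of three ideas from the description, on toy data: splitting by unseen perturbations, pseudobulk expression, and delta prediction. This is not the code from the video or the linked repository. The gene labels, the two-gene expression vectors, the helper names, and the mean-delta baseline are all assumptions made up for illustration, and the baseline used in the video may differ. In the tutorial the delta comes from a neural net fed with a perturbation embedding; here a simple average stands in for it.

```python
import random
from collections import defaultdict


def split_by_perturbation(labels, test_fraction=0.25, seed=0):
    """Hold out whole perturbations so test perturbations never appear in training."""
    unique = sorted(set(labels) - {"control"})
    random.Random(seed).shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])  # (train, test)


def pseudobulk(cells):
    """Average the expression vectors of all cells sharing a perturbation label."""
    grouped = defaultdict(list)
    for label, expression in cells:
        grouped[label].append(expression)
    return {
        label: [sum(values) / len(values) for values in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def mean_delta(profiles, train_labels):
    """Assumed baseline: the average shift from control over training perturbations."""
    control = profiles["control"]
    deltas = [
        [p - c for p, c in zip(profiles[label], control)]
        for label in sorted(train_labels)
    ]
    return [sum(values) / len(values) for values in zip(*deltas)]


def apply_delta(control, delta):
    """Delta prediction: predicted state = control state + predicted change."""
    return [c + d for c, d in zip(control, delta)]


# Toy data: (perturbation label, expression of two hypothetical genes) per cell.
cells = [
    ("control", [1.0, 2.0]),
    ("control", [3.0, 2.0]),
    ("GENE_A", [4.0, 2.0]),
    ("GENE_A", [4.0, 4.0]),
    ("GENE_B", [0.0, 2.0]),
    ("GENE_C", [2.0, 6.0]),
    ("GENE_D", [3.0, 1.0]),
]

profiles = pseudobulk(cells)
train, test = split_by_perturbation([label for label, _ in cells])
baseline_delta = mean_delta(profiles, train)

for held_out in sorted(test):
    predicted = apply_delta(profiles["control"], baseline_delta)
    print(held_out, "predicted:", predicted, "observed:", profiles[held_out])
```

Predicting a delta rather than the full expression vector means a model only has to learn what changes relative to control, and a baseline like this one gives a reference point that any learned model should beat on the held-out perturbations.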