CSC4700-GPU Programming, the C++ way

This lecture introduces GPU programming using C++ and the Kokkos library. The primary goal is to demonstrate how to write portable C++ code that runs efficiently on both CPUs and GPUs, addressing the challenges posed by diverse HPC architectures and programming models. The lecture covers the Kokkos API, parallel patterns (algorithms), views for data management, and integration with HPX for asynchronous operations.

[00:00:00] - Introduction and Announcements
[00:01:04] - Review of Previous Lecture: GPUs
[00:03:02] - GPU Programming Model Differences
[00:03:21] - Bridge Between CPU and GPU Programming
[00:04:05] - Overview of Programming Models and Challenges
[00:07:05] - Introduction to Kokkos Library
[00:09:37] - Kokkos Users, Capabilities, and Backends
[00:11:39] - Kokkos Kernels and Patterns
[00:12:20] - Parallelization Example: Sequential and Parallel
[00:16:04] - Passing Work to Kokkos: Function Objects and Lambdas
[00:22:40] - Importance of Lambda Captures in GPUs
[00:24:58] - Parallel Reduction Patterns in Kokkos
[00:31:39] - Naming and Debugging Kokkos Kernels
[00:32:52] - Summary So Far & Transition
[00:33:45] - Kokkos Views: Motivation and Usage
[00:36:04] - Kokkos Views: Dimensions and Syntax
[00:40:48] - View Allocation and Memory Management
[00:43:45] - Questions and Limitations About Views
[00:44:28] - Execution Spaces in Kokkos
[00:48:12] - Memory Spaces and Data Placement in Kokkos
[00:51:39] - Host, Device, and Mirrored Views
[00:59:07] - Data Layouts in Kokkos
[01:01:09] - Integrating Kokkos with HPX
[01:03:04] - Asynchronous Execution and HPX Futures
[01:08:16] - Kokkos Executors and Execution Policies in HPX
[01:12:09] - Final Example and Multi-dimensional Iteration
[01:12:54] - Conclusion and Summary
