Croqtile Tutorial
Welcome to the Croqtile tutorial. This guide walks you through writing high-performance GPU kernels using Croqtile, starting from scratch and building up to production-grade patterns.
Each chapter introduces a small set of new concepts by evolving a running example. By the end, you will have encountered every major Croqtile construct in a concrete, working program. For detailed syntax design and language reference, see the Coding Reference.
Chapters
- Installation: Setting Up the Croqtile Compiler
- Hello Croqtile: From Zero to Running Kernel
- Data Movement: Tiles Instead of Elements
- Parallelism: Mapping Work to Hardware
- Tensor Cores: The `mma` Operations
- Branch and Control: Warp Roles and Persistent Kernels
- Synchronization: Pipelines, Events, and Double Buffering
- Advanced Data Movement: TMA, Swizzle, and Irregular Access
- C++ Interop: Inline Code and the Preprocessor
- Debug and Verbose: Printing, RTTI, and GDB
Prerequisites
- Basic C++ knowledge (functions, pointers, arrays)
- Familiarity with GPU programming concepts (threads, blocks, shared memory)
- A working Croqtile compiler (see Chapter 0, Installation)