Croqtile Tutorial

Welcome to the Croqtile tutorial. This guide walks you through writing high-performance GPU kernels using Croqtile, starting from scratch and building up to production-grade patterns.

Each chapter introduces a small set of new concepts by evolving a running example. By the end, you will have seen every major Croqtile construct used in a concrete, working program. For the detailed syntax and language reference, see the Coding Reference.

Chapters

  1. Installation: Setting Up the Croqtile Compiler
  2. Hello Croqtile: From Zero to Running Kernel
  3. Data Movement: Tiles Instead of Elements
  4. Parallelism: Mapping Work to Hardware
  5. Tensor Cores: The mma Operations
  6. Branch and Control: Warp Roles and Persistent Kernels
  7. Synchronization: Pipelines, Events, and Double Buffering
  8. Advanced Data Movement: TMA, Swizzle, and Irregular Access
  9. C++ Interop: Inline Code and the Preprocessor
  10. Debug and Verbose: Printing, RTTI, and GDB

Prerequisites

  • Basic C++ knowledge (functions, pointers, arrays)
  • Familiarity with GPU programming concepts (threads, blocks, shared memory)
  • A working Croqtile compiler (see the Installation chapter)