Croqtile
Tile Data with ChunkAt
codes1gn/croqtile-tutorial
(Translation in progress)
This section is currently being translated.