# GPU.cpp Lifecycle

```mermaid
flowchart TD
    %% Data Preparation & Upload
    subgraph "Data Preparation & Upload"
        A["CPU Data"]
        B["Define Data Properties<br>(shape, type, size)"]
        C["Create GPU Buffer<br>(allocate raw buffer)"]
        D["Create Tensor<br>(allocates Array with one<br>or more buffers<br>and associates Shape)"]
        E["Upload Data via toGPU<br>(raw buffer)<br>toGPU(ctx, data, buffer, size)"]
        F["Upload Data via toGPU<br>(Tensor overload)<br>toGPU(ctx, data, tensor)"]
        G["Optional:<br>Kernel Parameters<br>toGPU(ctx, params, Kernel)"]
    end

    %% Buffer Setup & Bindings
    subgraph "Buffer & Binding Setup"
        H["Define Bindings<br>(Bindings, TensorView)"]
        I["Map GPU buffers<br>to shader bindings<br>(collection from Tensors<br>or single buffers)"]
    end

    %% Kernel Setup & Execution
    subgraph "Kernel Setup & Execution"
        J["Define KernelCode<br>(WGSL template, workgroup size, precision)"]
        K["Create Kernel"]
        L["Dispatch Kernel"]
    end

    %% GPU Execution & Result Readback
    subgraph "GPU Execution & Result Readback"
        M["Kernel Execution<br>(GPU shader runs)"]
        N["Readback Data<br>(toCPU variants)"]
    end

    %% Context & Resources
    O["Context<br>(Device, Queue,<br>TensorPool, KernelPool)"]

    %% Flow Connections
    A --> B
    B --> C
    B --> D
    C --> E
    D --> F
    F --> H
    E --> H
    H --> I
    I --> K
    J --> K
    G --- K
    K --> L
    L --> M
    M --> N

    %% Context shared by all stages
    O --- D
    O --- E
    O --- F
    O --- K
    O --- L
    O --- N
```
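End to end, the stages in the diagram map onto a short program. The sketch below is modeled on gpu.cpp's example usage (create a Context, create Tensors, build a Kernel from WGSL, dispatch, read back); exact signatures may differ between library versions, and it requires a WebGPU-capable environment to run.

```cpp
// Sketch of the lifecycle above, modeled on gpu.cpp's example usage.
// Exact signatures may differ across library versions.
#include <array>
#include <future>
#include "gpu.hpp" // gpu.cpp single-header library

using namespace gpu;

// WGSL shader template; {{workgroupSize}} is substituted via KernelCode.
static const char *kDouble = R"(
@group(0) @binding(0) var<storage, read_write> inp: array<f32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;
@compute @workgroup_size({{workgroupSize}})
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i: u32 = gid.x;
    if (i < arrayLength(&inp)) {
        out[i] = 2.0 * inp[i];
    }
}
)";

int main() {
  static constexpr size_t N = 4096;
  Context ctx = createContext(); // Device, Queue, TensorPool, KernelPool

  std::array<float, N> inputArr{}, outputArr{};
  for (size_t i = 0; i < N; ++i) inputArr[i] = static_cast<float>(i);

  // Create Tensors (Array + Shape) in the Context's TensorPool;
  // the overload taking host data uploads it immediately.
  Tensor input = createTensor(ctx, Shape{N}, kf32, inputArr.data());
  Tensor output = createTensor(ctx, Shape{N}, kf32);

  // KernelCode (WGSL template + workgroup size + precision) and Bindings
  // combine into a Kernel; the last argument is the dispatch grid.
  Kernel op = createKernel(ctx, {kDouble, /*workgroupSize=*/256, kf32},
                           Bindings{input, output},
                           {cdiv(N, 256), 1, 1});

  // Dispatch asynchronously, wait for completion, then read back to CPU.
  std::promise<void> promise;
  std::future<void> future = promise.get_future();
  dispatchKernel(ctx, op, promise);
  wait(ctx, future);
  toCPU(ctx, output, outputArr.data(), sizeof(outputArr));
  return 0;
}
```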
- The `gpu::Array` (which wraps a GPU buffer with its usage flags and size) and the `gpu::Shape` (which defines dimensions and rank) are combined during creation to produce a `gpu::Tensor`.
- A `gpu::TensorView` provides a non-owning view into a slice of a `gpu::Tensor`, e.g. `TensorView view = {tensor, 0, 256};`.
- `gpu::Bindings` collects multiple Tensors (or TensorViews), along with view offset/size information, for use in a kernel.
- The `gpu::TensorPool` (managed by the Context) is responsible for the lifetime of tensors and for GPU resource cleanup.
- `gpu::KernelCode` contains the WGSL shader template plus the metadata (workgroup size, precision, label, and entry point) that drives kernel configuration.
- The `gpu::createKernelAsync`/`gpu::createKernel` functions (within the Execution Flow) use the `gpu::Context`, `gpu::Bindings`, and `gpu::KernelCode` to configure and construct a `gpu::Kernel`, which manages all of the underlying GPU resources (buffers, bind groups, compute pipeline, etc.).
- `gpu::KernelCode`'s workgroup size (a `gpu::Shape`) defines the dispatch configuration, and the `gpu::Kernel` ultimately uses the underlying `gpu::Array` (containing the `WGPUBuffer`, `WGPUBufferUsage`, and `size_t` size) and the `gpu::Shape` data from the created Tensor.
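A minimal sketch of how these binding types compose, using the `{tensor, offset, span}` initializer shown above and assuming an existing `Context ctx` (signatures may vary across gpu.cpp versions):

```cpp
Tensor a = createTensor(ctx, Shape{1024}, kf32);
Tensor b = createTensor(ctx, Shape{1024}, kf32);

// Non-owning view over the first 256 elements of `a`
// ({tensor, offset, span}, per the TensorView example above).
TensorView aHead = {a, 0, 256};

// Bindings accepts whole Tensors and TensorViews interchangeably;
// the order must match the @binding(...) indices in the WGSL shader.
Bindings bindings{aHead, b};
```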
`gpu::Tensor` Ranks:

- Rank 0: scalar
- Rank 1: vector
- Rank 2: matrix
- Rank 3: 3D tensor (or cube)
- Rank 4: 4D tensor
- Ranks 5–8: higher-dimensional tensors (maximum rank is 8)