mirror of
https://github.com/fumiama/gozel.git
synced 2026-06-11 03:40:24 +08:00
feat(examples): add image_scale (#7)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
61
examples/vadd_event/README.md
Normal file
@@ -0,0 +1,61 @@
# Vector Addition — Immediate Command List with Events

> [!TIP]
> This kernel is written in **SYCL**, which is not common practice.
> For the more typical approach, see the **OpenCL** kernel examples such as [image_scale](../image_scale/).

The same vector addition workload as the `vadd` example, but driven by an **immediate command list** and **events** instead of explicit command queues. This demonstrates fine-grained dependency tracking: memory copies signal events, and the kernel launch waits on those events before executing.

## What It Does

1. Discovers a GPU device and prints its basic and compute properties
2. Allocates host and device memory for two float32 vectors (256 MiB each)
3. Fills both vectors with random values
4. Loads a SPIR-V kernel (`vector_add`) that computes `a[i] += b[i]` in parallel
5. Creates an **event pool** with 3 events to express data-flow dependencies
6. Submits all work through a single **immediate command list**:
   - Two H→D copies, each signaling its own event
   - Kernel launch that **waits** on both copy events before executing, then signals its own event
   - D→H copy that waits on the kernel event
7. Synchronizes via `HostSynchronize` on the immediate command list
8. Validates every element against the CPU reference

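The dependency chain in step 6 can be modelled in plain Go without touching the GPU. The sketch below is an analogy only: it does not use the gozel or Level Zero API, and the `event` type, `submit`, and `runPipeline` are hypothetical names introduced here. Closing a channel stands in for signaling an event, and receiving from it stands in for waiting on it.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for a device event:
// closing the channel signals it, receiving from it waits on it.
type event chan struct{}

// submit starts op once every event in waits has been signaled,
// then signals `signal` if it is non-nil.
func submit(wg *sync.WaitGroup, op func(), signal event, waits ...event) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for _, e := range waits {
			<-e // block until the dependency is signaled
		}
		op()
		if signal != nil {
			close(signal)
		}
	}()
}

// runPipeline mirrors the README steps: two copies in, an add that waits
// on both, and a copy out that waits on the add. It assumes len(a) == len(b).
func runPipeline(a, b []float32) []float32 {
	devA := make([]float32, len(a)) // pretend device buffers
	devB := make([]float32, len(b))
	out := make([]float32, len(a))

	copyA, copyB, kernelDone := make(event), make(event), make(event) // 3 events

	var wg sync.WaitGroup
	submit(&wg, func() { copy(devA, a) }, copyA)
	submit(&wg, func() { copy(devB, b) }, copyB)
	submit(&wg, func() {
		for i := range devA {
			devA[i] += devB[i] // same arithmetic as the kernel: a[i] += b[i]
		}
	}, kernelDone, copyA, copyB)
	submit(&wg, func() { copy(out, devA) }, nil, kernelDone)

	wg.Wait() // plays the role of the final host synchronization
	return out
}

func main() {
	fmt.Println(runPipeline([]float32{1, 2, 3}, []float32{10, 20, 30}))
}
```

Because each operation starts only after its wait list is satisfied, the two copies may run in either order while the add and the copy back stay strictly ordered.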
## Key Difference from `vadd`

| Aspect | `vadd` | `vadd_event` |
|--------|--------|--------------|
| Submission | 3 separate command lists executed on a command queue | 1 immediate command list |
| Synchronization | `zeCommandQueueSynchronize` | `zeCommandListHostSynchronize` |
| Dependencies | Implicit via command list ordering + barriers | Explicit via events (wait lists) |

## Run

```bash
go run main.go
```

## Sample Output

```
=============== Device Basic Properties ===============
Running on device: ID = 32103 , Name = Intel(R) Graphics @ 0.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z): (1024, 1024, 1024)
Max Group Count (X, Y, Z): (4294967295, 4294967295, 4294967295)
Max Total Group Size: 1024
Max Shared Local Memory: 65536
Num Subgroup Sizes: 3
Subgroup Sizes: [8 16 32 0 0 0 0 0]
=============== Computation Configuration ===============
Group Size (X, Y, Z): (1024, 1, 1)
Group Count: 65536
Total Elements (N): 67108864
Buffer Size: 256 MiB
=============== Calculation Results ===============
GPU Execution Time: 51.768500 ms
GPU Throughput: 5.19 GiB/s
=============== Validation Results ===============
CPU Execution Time: 38.237400 ms
CPU Throughput: 7.02 GiB/s
Test Passed!!!
```
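
The figures in the Computation Configuration block follow from the buffer size and the group size. A small sketch of that arithmetic, assuming 4-byte float32 elements and a group count rounded up to cover every element (the helper names are hypothetical and not taken from main.go):

```go
package main

import "fmt"

const (
	bufferBytes = 256 << 20 // 256 MiB per vector, as stated in the README
	elemBytes   = 4         // size of one float32
	groupSizeX  = 1024      // Group Size X from the sample output
)

// elementCount returns how many elements of elemSize bytes fit in the buffer.
func elementCount(bytes, elemSize uint64) uint64 {
	return bytes / elemSize
}

// groupCount returns the number of work groups needed to cover n elements,
// rounding up so a partial last group is still launched (an assumption here;
// with the README's sizes the division is exact).
func groupCount(n, groupSize uint64) uint64 {
	return (n + groupSize - 1) / groupSize
}

func main() {
	n := elementCount(bufferBytes, elemBytes)
	fmt.Println("Total Elements (N):", n)
	fmt.Println("Group Count:", groupCount(n, groupSizeX))
}
```

With the values above this reproduces the 67108864 elements and 65536 groups shown in the sample output.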
@@ -16,10 +16,11 @@ import (
	"github.com/fumiama/gozel/ze"
)

//go:generate clang++ -fsycl -fsycl-device-only -fsycl-targets=spirv64 -Xclang -emit-llvm-bc main.cpp -o device_kern.bc
//go:generate sycl-post-link -symbols -split=auto -o device_kern.table device_kern.bc
//go:generate llvm-spirv -o main.spv device_kern_0.bc
//go:generate clang++ -fsycl -fsycl-device-only -fno-sycl-instrument-device-code -fsycl-targets=spirv64 -Xclang -emit-llvm-bc main.cpp -o device_kern.bc
//go:generate sycl-post-link -symbols -split=auto -emit-param-info -properties -o device_kern.table device_kern.bc
//go:generate llvm-spirv --sycl-opt -o main.spv device_kern_0.bc
//go:generate clang++ -target spirv64-unknown-unknown -S -emit-llvm -x ir device_kern_0.bc -o main.ll
//go:generate llvm-spirv -to-text main.spv -o main.spt

//go:embed main.spv
var kernelspv []byte

79
examples/vadd_event/main.spt
Normal file
@@ -0,0 +1,79 @@
119734787 66560 393230 34 0
2 Capability Addresses
2 Capability Linkage
2 Capability Kernel
2 Capability Int64
5 ExtInstImport 1 "OpenCL.std"
3 MemoryModel 2 2
12 EntryPoint 6 29 "__sycl_kernel_vector_add" 5 6
3 ExecutionMode 29 31
3 Source 4 100000
11 Name 5 "__spirv_BuiltInGlobalInvocationId"
9 Name 6 "__spirv_BuiltInGlobalOffset"
9 Name 11 "__sycl_kernel_vector_add"

13 Decorate 5 LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
3 Decorate 5 Constant
4 Decorate 5 BuiltIn 28
4 Decorate 5 Alignment 32
11 Decorate 6 LinkageAttributes "__spirv_BuiltInGlobalOffset" Import
3 Decorate 6 Constant
4 Decorate 6 BuiltIn 33
4 Decorate 6 Alignment 32
11 Decorate 11 LinkageAttributes "__sycl_kernel_vector_add" Export
4 Decorate 12 FuncParamAttr 5
4 Decorate 12 Alignment 4
4 Decorate 13 FuncParamAttr 5
4 Decorate 13 FuncParamAttr 6
4 Decorate 13 Alignment 4
4 Decorate 30 FuncParamAttr 5
4 Decorate 30 Alignment 4
4 Decorate 31 FuncParamAttr 5
4 Decorate 31 FuncParamAttr 6
4 Decorate 31 Alignment 4
4 TypeInt 2 64 0
5 Constant 2 21 2147483648 0
4 TypeVector 3 2 3
4 TypePointer 4 5 3
2 TypeVoid 7
3 TypeFloat 8 32
4 TypePointer 9 5 8
5 TypeFunction 10 7 9 9
4 TypePointer 15 5 2
2 TypeBool 22
4 Variable 4 5 5
4 Variable 4 6 5



5 Function 7 11 0 10
3 FunctionParameter 9 12
3 FunctionParameter 9 13

2 Label 14
4 Bitcast 15 16 5
6 Load 2 17 16 2 32
4 Bitcast 15 18 6
6 Load 2 19 18 2 32
5 ISub 2 20 17 19
5 ULessThan 22 23 20 21
5 InBoundsPtrAccessChain 9 24 13 20
6 Load 8 25 24 2 4
5 InBoundsPtrAccessChain 9 26 12 20
6 Load 8 27 26 2 4
5 FAdd 8 28 27 25
5 Store 26 28 2 4
1 Return

1 FunctionEnd

5 Function 7 29 0 10
3 FunctionParameter 9 30
3 FunctionParameter 9 31

2 Label 32
6 FunctionCall 7 33 11 30 31
1 Return

1 FunctionEnd