mirror of https://github.com/fumiama/gozel.git synced 2026-06-05 00:10:24 +08:00

feat(examples): add image_scale (#7)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
fumiama
2026-03-29 17:11:22 +08:00
committed by GitHub
parent 68ca8b5e2e
commit 6522bde914
123 changed files with 1074 additions and 163 deletions

examples/vadd/README.md Normal file

@@ -0,0 +1,48 @@
# Vector Addition — Command Queue
> [!TIP]
> This kernel is written in **SYCL**, which is not common practice.
> See also the **OpenCL** kernel examples, such as [image_scale](../image_scale/).

A classic GPU compute example: add two large float32 vectors element-wise on the GPU, then validate the result against a CPU reference.
## What It Does
1. Discovers a GPU device and prints its basic & compute properties
2. Allocates host and device memory for two float32 vectors (256 MiB each)
3. Fills both vectors with random values and copies them to device memory
4. Loads a SPIR-V kernel (`vector_add`) that computes `a[i] += b[i]` in parallel
5. Launches the kernel via a **command queue** with explicit command lists (pre-copy → compute → post-copy)
6. Reads back the results and validates every element against the CPU reference
7. Reports GPU vs. CPU execution time and throughput
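
The CPU reference and validation in steps 6 and 7 can be sketched in plain Go as follows. This is a minimal illustration, not the code in `main.go`: the names `vectorAddInPlace` and `validate` are hypothetical, the "GPU result" is simulated on the host, and exact float32 equality is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math/rand"
)

// vectorAddInPlace is the CPU reference for the kernel: a[i] += b[i].
// b must be at least as long as a.
func vectorAddInPlace(a, b []float32) {
	for i := range a {
		a[i] += b[i]
	}
}

// validate reports the first index where got and want differ.
// Exact equality is an assumption of this sketch, not taken from main.go.
func validate(got, want []float32) error {
	if len(got) != len(want) {
		return fmt.Errorf("length mismatch: got %d, want %d", len(got), len(want))
	}
	for i := range got {
		if got[i] != want[i] {
			return fmt.Errorf("mismatch at %d: got %g, want %g", i, got[i], want[i])
		}
	}
	return nil
}

func main() {
	const n = 1 << 16 // small stand-in for the example's 67108864 elements
	rng := rand.New(rand.NewSource(1))
	a := make([]float32, n)
	b := make([]float32, n)
	for i := range a {
		a[i], b[i] = rng.Float32(), rng.Float32()
	}

	// Hypothetical stand-in for the values read back from the device.
	gpuResult := make([]float32, n)
	for i := range gpuResult {
		gpuResult[i] = a[i] + b[i]
	}

	vectorAddInPlace(a, b) // CPU reference, computed in place like the kernel
	if err := validate(gpuResult, a); err != nil {
		fmt.Println("Test Failed:", err)
		return
	}
	fmt.Println("Test Passed!!!")
}
```

Because the kernel updates `a` in place, the sketch keeps the simulated device result in a separate slice before running the reference over `a`.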
## Run
```bash
go run main.go
```
## Sample Output
```
=============== Device Basic Properties ===============
Running on device: ID = 32103 , Name = Intel(R) Graphics @ 0.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z): (1024, 1024, 1024)
Max Group Count (X, Y, Z): (4294967295, 4294967295, 4294967295)
Max Total Group Size: 1024
Max Shared Local Memory: 65536
Subgroup Sizes: [8 16 32]
=============== Computation Configuration ===============
Group Size (X, Y, Z): (1024, 1, 1)
Group Count: 65536
Total Elements (N): 67108864
Buffer Size: 256 MiB
=============== Calculation Results ===============
GPU Execution Time: 53.858600 ms
GPU Throughput: 4.98 GiB/s
=============== Validation Results ===============
CPU Execution Time: 65.882900 ms
CPU Throughput: 4.07 GiB/s
Test Passed!!!
```
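
The configuration figures in the sample output follow from simple arithmetic, sketched below. The helper names are hypothetical. The throughput lines appear to match one 256 MiB buffer divided by the elapsed time in units of 10^9 bytes per second; that is an inference from the printed numbers, not something the source states, so the sketch prints both that unit and true GiB/s for comparison.

```go
package main

import "fmt"

const bytesPerFloat32 = 4

// bufferBytes returns the size in bytes of one float32 vector of n elements.
func bufferBytes(n int64) int64 { return n * bytesPerFloat32 }

// groupCount returns how many groups of groupSize cover n elements (rounded up).
func groupCount(n, groupSize int64) int64 { return (n + groupSize - 1) / groupSize }

// rate divides bytes by elapsed milliseconds and scales by unit (bytes per unit).
func rate(bytes int64, ms, unit float64) float64 {
	return float64(bytes) / (ms / 1000) / unit
}

func main() {
	const n, groupSize = 67108864, 1024 // values from the sample output
	size := bufferBytes(n)
	fmt.Printf("Buffer Size: %d MiB\n", size>>20)
	fmt.Printf("Group Count: %d\n", groupCount(n, groupSize))

	// Timings copied from the sample output.
	runs := []struct {
		name string
		ms   float64
	}{{"GPU", 53.8586}, {"CPU", 65.8829}}
	for _, r := range runs {
		fmt.Printf("%s: %.2f x 10^9 B/s, %.2f GiB/s\n",
			r.name, rate(size, r.ms, 1e9), rate(size, r.ms, 1<<30))
	}
}
```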


@@ -16,10 +16,11 @@ import (
"github.com/fumiama/gozel/ze"
)
-//go:generate clang++ -fsycl -fsycl-device-only -fsycl-targets=spirv64 -Xclang -emit-llvm-bc main.cpp -o device_kern.bc
-//go:generate sycl-post-link -symbols -split=auto -o device_kern.table device_kern.bc
-//go:generate llvm-spirv -o main.spv device_kern_0.bc
+//go:generate clang++ -fsycl -fsycl-device-only -fno-sycl-instrument-device-code -fsycl-targets=spirv64 -Xclang -emit-llvm-bc main.cpp -o device_kern.bc
+//go:generate sycl-post-link -symbols -split=auto -emit-param-info -properties -o device_kern.table device_kern.bc
+//go:generate llvm-spirv --sycl-opt -o main.spv device_kern_0.bc
+//go:generate clang++ -target spirv64-unknown-unknown -S -emit-llvm -x ir device_kern_0.bc -o main.ll
+//go:generate llvm-spirv -to-text main.spv -o main.spt
//go:embed main.spv
var kernelspv []byte

examples/vadd/main.spt Normal file

@@ -0,0 +1,79 @@
119734787 66560 393230 34 0
2 Capability Addresses
2 Capability Linkage
2 Capability Kernel
2 Capability Int64
5 ExtInstImport 1 "OpenCL.std"
3 MemoryModel 2 2
12 EntryPoint 6 29 "__sycl_kernel_vector_add" 5 6
3 ExecutionMode 29 31
3 Source 4 100000
11 Name 5 "__spirv_BuiltInGlobalInvocationId"
9 Name 6 "__spirv_BuiltInGlobalOffset"
9 Name 11 "__sycl_kernel_vector_add"
13 Decorate 5 LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
3 Decorate 5 Constant
4 Decorate 5 BuiltIn 28
4 Decorate 5 Alignment 32
11 Decorate 6 LinkageAttributes "__spirv_BuiltInGlobalOffset" Import
3 Decorate 6 Constant
4 Decorate 6 BuiltIn 33
4 Decorate 6 Alignment 32
11 Decorate 11 LinkageAttributes "__sycl_kernel_vector_add" Export
4 Decorate 12 FuncParamAttr 5
4 Decorate 12 Alignment 4
4 Decorate 13 FuncParamAttr 5
4 Decorate 13 FuncParamAttr 6
4 Decorate 13 Alignment 4
4 Decorate 30 FuncParamAttr 5
4 Decorate 30 Alignment 4
4 Decorate 31 FuncParamAttr 5
4 Decorate 31 FuncParamAttr 6
4 Decorate 31 Alignment 4
4 TypeInt 2 64 0
5 Constant 2 21 2147483648 0
4 TypeVector 3 2 3
4 TypePointer 4 5 3
2 TypeVoid 7
3 TypeFloat 8 32
4 TypePointer 9 5 8
5 TypeFunction 10 7 9 9
4 TypePointer 15 5 2
2 TypeBool 22
4 Variable 4 5 5
4 Variable 4 6 5
5 Function 7 11 0 10
3 FunctionParameter 9 12
3 FunctionParameter 9 13
2 Label 14
4 Bitcast 15 16 5
6 Load 2 17 16 2 32
4 Bitcast 15 18 6
6 Load 2 19 18 2 32
5 ISub 2 20 17 19
5 ULessThan 22 23 20 21
5 InBoundsPtrAccessChain 9 24 13 20
6 Load 8 25 24 2 4
5 InBoundsPtrAccessChain 9 26 12 20
6 Load 8 27 26 2 4
5 FAdd 8 28 27 25
5 Store 26 28 2 4
1 Return
1 FunctionEnd
5 Function 7 29 0 10
3 FunctionParameter 9 30
3 FunctionParameter 9 31
2 Label 32
6 FunctionCall 7 33 11 30 31
1 Return
1 FunctionEnd
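
The first line of `main.spt` is the five-word SPIR-V module header: magic number, version, generator, ID bound, and schema. The sketch below decodes those words from a byte slice, which is how one might sanity-check an embedded `main.spv` before handing it to the driver. The names `parseSPIRVHeader` and `encodeWords` are hypothetical helpers, and little-endian word order is an assumption of the sketch; it is not part of the gozel API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const spirvMagic = 0x07230203 // 119734787 in decimal

// spirvHeader holds the decoded fields of the 5-word module header.
type spirvHeader struct {
	Major, Minor     uint8
	GeneratorID      uint16 // high 16 bits of the generator word
	GeneratorVersion uint16 // low 16 bits of the generator word
	Bound            uint32 // all result IDs are below this value
	Schema           uint32
}

// parseSPIRVHeader decodes the header, assuming little-endian words.
func parseSPIRVHeader(b []byte) (spirvHeader, error) {
	if len(b) < 20 {
		return spirvHeader{}, errors.New("module shorter than the 5-word header")
	}
	if binary.LittleEndian.Uint32(b[0:4]) != spirvMagic {
		return spirvHeader{}, errors.New("bad SPIR-V magic number")
	}
	version := binary.LittleEndian.Uint32(b[4:8])
	generator := binary.LittleEndian.Uint32(b[8:12])
	return spirvHeader{
		Major:            uint8(version >> 16),
		Minor:            uint8(version >> 8),
		GeneratorID:      uint16(generator >> 16),
		GeneratorVersion: uint16(generator),
		Bound:            binary.LittleEndian.Uint32(b[12:16]),
		Schema:           binary.LittleEndian.Uint32(b[16:20]),
	}, nil
}

// encodeWords packs 32-bit words into little-endian bytes (used only for the demo).
func encodeWords(words []uint32) []byte {
	buf := make([]byte, 4*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint32(buf[4*i:], w)
	}
	return buf
}

func main() {
	// Header words copied from the first line of main.spt.
	h, err := parseSPIRVHeader(encodeWords([]uint32{119734787, 66560, 393230, 34, 0}))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("SPIR-V %d.%d, generator %d (version %d), bound %d, schema %d\n",
		h.Major, h.Minor, h.GeneratorID, h.GeneratorVersion, h.Bound, h.Schema)
}
```

For the header above this decodes to SPIR-V version 1.4 with generator tool ID 6 and an ID bound of 34, which agrees with the highest result ID used in the listing (33).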