Vector Addition — Command Queue
Note
The kernel in this example is written in SYCL, which is not common practice. See also the OpenCL kernel examples, such as image_scale.
A classic GPU compute example: perform element-wise addition of two large float32 vectors on the GPU, then validate the result against a CPU reference.
What It Does
- Discovers a GPU device and prints its basic & compute properties
- Allocates host and device memory for two float32 vectors (256 MiB each)
- Fills both vectors with random values and copies them to device memory
- Loads a SPIR-V kernel (vector_add) that computes a[i] += b[i] in parallel
- Launches the kernel via a command queue with explicit command lists (pre-copy → compute → post-copy)
- Reads back the results and validates every element against the CPU reference
- Reports GPU vs. CPU execution time and throughput
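The host-side half of these steps (random fill, CPU reference, element-by-element validation) can be sketched in plain Go. This is a minimal illustration, not the code in main.go: the helper names addCPU and firstMismatch are hypothetical, the device readback is replaced by a host-computed stand-in, and exact float32 equality is an assumption that holds for a single addition per element but may need a tolerance for other kernels.

```go
package main

import (
	"fmt"
	"math/rand"
)

// addCPU computes a[i] += b[i] on the host, mirroring what the
// vector_add kernel is described as doing on the GPU.
// (Hypothetical helper, not taken from the example's source.)
func addCPU(a, b []float32) {
	for i := range a {
		a[i] += b[i]
	}
}

// firstMismatch returns the index of the first element where got and
// want differ, or -1 if every element matches. If the lengths differ,
// the length of the shorter slice is reported as the mismatch index.
func firstMismatch(got, want []float32) int {
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	if len(got) != len(want) {
		return n
	}
	return -1
}

func main() {
	const n = 1 << 10 // small stand-in for the example's 67108864 elements
	rng := rand.New(rand.NewSource(1))

	a := make([]float32, n)
	b := make([]float32, n)
	for i := range a {
		a[i] = rng.Float32()
		b[i] = rng.Float32()
	}

	// Stand-in for the buffer read back from the device. In the real
	// example this data would come from the post-copy command list.
	gpuResult := make([]float32, n)
	for i := range gpuResult {
		gpuResult[i] = a[i] + b[i]
	}

	// CPU reference, computed in place like the kernel.
	addCPU(a, b)

	if i := firstMismatch(gpuResult, a); i >= 0 {
		fmt.Printf("Mismatch at index %d: got %v, want %v\n", i, gpuResult[i], a[i])
		return
	}
	fmt.Println("Test Passed!!!")
}
```

Reporting the first mismatching index, rather than only pass or fail, makes it easier to tell a completely wrong result from an off-by-one in the launch dimensions.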
Run
go run main.go
Sample Output
=============== Device Basic Properties ===============
Running on device: ID = 32103 , Name = Intel(R) Graphics @ 0.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z): (1024, 1024, 1024)
Max Group Count (X, Y, Z): (4294967295, 4294967295, 4294967295)
Max Total Group Size: 1024
Max Shared Local Memory: 65536
Subgroup Sizes: [8 16 32]
=============== Computation Configuration ===============
Group Size (X, Y, Z): (1024, 1, 1)
Group Count: 65536
Total Elements (N): 67108864
Buffer Size: 256 MiB
=============== Calculation Results ===============
GPU Execution Time: 53.858600 ms
GPU Throughput: 4.98 GiB/s
=============== Validation Results ===============
CPU Execution Time: 65.882900 ms
CPU Throughput: 4.07 GiB/s
Test Passed!!!
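The figures in the Computation Configuration block are consistent with each other, and the arithmetic can be checked with a short sketch. The function names elemCount and groupCount are hypothetical, and rounding the group count up is an assumption; the example itself may use a plain division, since N here is an exact multiple of the group size.

```go
package main

import "fmt"

// elemCount returns how many float32 values (4 bytes each) fit in a
// buffer of bufBytes bytes. (Hypothetical helper.)
func elemCount(bufBytes uint64) uint64 {
	return bufBytes / 4
}

// groupCount returns the number of groups of groupSize work-items
// needed to cover n elements, rounding up. (Hypothetical helper.)
func groupCount(n, groupSize uint64) uint64 {
	return (n + groupSize - 1) / groupSize
}

func main() {
	const bufBytes = 256 << 20 // 256 MiB, as in the sample output
	const groupSizeX = 1024    // Group Size X from the sample output

	n := elemCount(bufBytes)
	fmt.Println("Total Elements (N):", n)                  // 67108864
	fmt.Println("Group Count:", groupCount(n, groupSizeX)) // 65536
}
```

If N were not a multiple of the group size, a rounded-up group count would launch extra work-items, and the kernel would need a bounds check to skip indices at or beyond N.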