1
0
mirror of https://github.com/fumiama/gozel.git synced 2026-06-05 00:10:24 +08:00
Files
gozel/examples/vadd/README.md
2026-03-29 17:12:56 +08:00

1.9 KiB

Vector Addition — Command Queue

Note

SYCL is used to write this kernel, which is not a common practice. Please also have a look at the OpenCL kernel examples like image_scale.

A classic GPU compute example: perform element-wise addition of two large float32 vectors on the GPU, then validate the result against a CPU reference.

What It Does

  1. Discovers a GPU device and prints its basic & compute properties
  2. Allocates host and device memory for two float32 vectors (256 MiB each)
  3. Fills both vectors with random values and copies them to device memory
  4. Loads a SPIR-V kernel (vector_add) that computes a[i] += b[i] in parallel
  5. Launches the kernel via a command queue with explicit command lists (pre-copy → compute → post-copy)
  6. Reads back the results and validates every element against the CPU reference
  7. Reports GPU vs. CPU execution time and throughput

Run

go run main.go

Sample Output

===============  Device Basic Properties  ===============
Running on device: ID = 32103 , Name = Intel(R) Graphics @ 0.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z):    (1024, 1024, 1024)
Max Group Count (X, Y, Z):   (4294967295, 4294967295, 4294967295)
Max Total Group Size:        1024
Max Shared Local Memory:     65536
Subgroup Sizes:              [8 16 32]
=============== Computation Configuration ===============
Group Size (X, Y, Z):        (1024, 1, 1)
Group Count:                 65536
Total Elements (N):          67108864
Buffer Size:                 256 MiB
===============    Calculation Results    ===============
GPU Execution Time:          53.858600 ms
GPU Throughput:              4.98 GiB/s
===============    Validation Results    ===============
CPU Execution Time:          65.882900 ms
CPU Throughput:              4.07 GiB/s
Test Passed!!!