Atomic operations on local memory are broken #572

@dzhang314

While I was working on some code for big-integer accumulation on GPUs, I discovered that atomic integer operations on local memory are broken in oneAPI.jl. I've simplified the problematic code down to the following minimal reproducer:

using oneAPI
using KernelAbstractions
using Atomix: @atomic


@kernel function test_kernel!(result::AbstractVector{UInt32})
    i = @index(Global, Linear)
    g = @index(Group, Linear)
    l = @index(Local, Linear)

    # Initialize an accumulator in local memory.
    a = @localmem(UInt32, (1,))
    if isone(l)
        a[1] = 0
    end

    @synchronize()

    # Every thread atomically adds its index to the accumulator.
    @atomic a[1] += UInt32(i)

    @synchronize()

    # The first thread in each workgroup writes the sum to the output array.
    if isone(l)
        result[g] = a[1]
    end
end


function run_test(backend::Backend)
    result = KernelAbstractions.allocate(backend, UInt32, 256)
    test_kernel!(backend, 256)(result; ndrange=65536)
    return Vector{UInt32}(result)
end


const REFERENCE_VALUES = [65536 * UInt32(g) - 32640 for g = 1:256]

for trial = 1:10
    println("Trial $trial:")
    test_values = run_test(oneAPIBackend())
    for (g, (ref_val, test_val)) in enumerate(zip(REFERENCE_VALUES, test_values))
        if ref_val != test_val
            println("  Workgroup $g miscalculated: expected $ref_val, got $test_val")
        end
    end
end
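
The closed form behind REFERENCE_VALUES can be checked on the host without a GPU. The sketch below assumes the launch configuration from the reproducer (256 workgroups of 256 threads, 1-based global linear indices); the names `group_size`, `num_groups`, `brute_force_sum`, and `closed_form` are illustrative and not part of the original code.

```julia
# Host-side check of the closed form used for REFERENCE_VALUES; no GPU needed.
# All names below are illustrative and do not appear in the reproducer.
group_size = 256
num_groups = 256

# Workgroup g owns global linear indices 256(g-1)+1 through 256g.
brute_force_sum(g) = sum((group_size * (g - 1) + 1):(group_size * g))

# Same expression as in the reproducer, written with plain integers.
closed_form(g) = 65536 * g - 32640

# The closed form agrees with the brute-force sum for every workgroup.
@assert all(brute_force_sum(g) == closed_form(g) for g in 1:num_groups)

# The largest expected sum is far below typemax(UInt32), so a correct
# UInt32 accumulator never wraps around in this test.
@assert closed_form(num_groups) == 16744576
@assert closed_form(num_groups) < typemax(UInt32)
```

Since no expected value comes close to 2^32, integer overflow in the accumulator can be ruled out as an explanation for any mismatch.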

This program consistently produces incorrect output in roughly 1-3% of workgroups, and the set of affected workgroups changes from run to run. A typical output on my system looks like this:

Trial 1:
  Workgroup 62 miscalculated: expected 4030592, got 2764760
  Workgroup 64 miscalculated: expected 4161664, got 2855384
Trial 2:
  Workgroup 20 miscalculated: expected 1278080, got 553912
  Workgroup 62 miscalculated: expected 4030592, got 2765272
  Workgroup 64 miscalculated: expected 4161664, got 3125344
Trial 3:
  Workgroup 19 miscalculated: expected 1212544, got 1136120
  Workgroup 20 miscalculated: expected 1278080, got 1116016
  Workgroup 23 miscalculated: expected 1474688, got 1383672
  Workgroup 46 miscalculated: expected 2982016, got 2608496
  Workgroup 57 miscalculated: expected 3702912, got 2774368
  Workgroup 62 miscalculated: expected 4030592, got 3779064
  Workgroup 63 miscalculated: expected 4096128, got 3839736
Trial 4:
  Workgroup 20 miscalculated: expected 1278080, got 1199096
  Workgroup 46 miscalculated: expected 2982016, got 2796536
  Workgroup 59 miscalculated: expected 3833984, got 3593208
Trial 5:
  Workgroup 15 miscalculated: expected 950400, got 889080
  Workgroup 18 miscalculated: expected 1147008, got 1075960
  Workgroup 23 miscalculated: expected 1474688, got 1289072
  Workgroup 54 miscalculated: expected 3506304, got 3286264
  Workgroup 57 miscalculated: expected 3702912, got 3471864
Trial 6:
  Workgroup 21 miscalculated: expected 1343616, got 1258488
  Workgroup 22 miscalculated: expected 1409152, got 1319416
  Workgroup 24 miscalculated: expected 1540224, got 956752
  Workgroup 46 miscalculated: expected 2982016, got 2793720
  Workgroup 56 miscalculated: expected 3637376, got 3408120
  Workgroup 58 miscalculated: expected 3768448, got 3531256
Trial 7:
  Workgroup 19 miscalculated: expected 1212544, got 1063536
  Workgroup 20 miscalculated: expected 1278080, got 1039592
  Workgroup 21 miscalculated: expected 1343616, got 1174384
  Workgroup 22 miscalculated: expected 1409152, got 1233008
  Workgroup 23 miscalculated: expected 1474688, got 1380856
  Workgroup 49 miscalculated: expected 3178624, got 2980856
  Workgroup 52 miscalculated: expected 3375232, got 2952560
  Workgroup 54 miscalculated: expected 3506304, got 3287288
Trial 8:
  Workgroup 14 miscalculated: expected 884864, got 828920
  Workgroup 18 miscalculated: expected 1147008, got 1075704
  Workgroup 21 miscalculated: expected 1343616, got 1005152
  Workgroup 49 miscalculated: expected 3178624, got 2979064
  Workgroup 56 miscalculated: expected 3637376, got 3181936
Trial 9:
  Workgroup 19 miscalculated: expected 1212544, got 1059440
  Workgroup 20 miscalculated: expected 1278080, got 1040104
  Workgroup 22 miscalculated: expected 1409152, got 1230192
  Workgroup 23 miscalculated: expected 1474688, got 1383160
  Workgroup 46 miscalculated: expected 2982016, got 2796024
  Workgroup 61 miscalculated: expected 3965056, got 3716344
  Workgroup 63 miscalculated: expected 4096128, got 3841272
Trial 10:
  Workgroup 57 miscalculated: expected 3702912, got 3008488
  Workgroup 62 miscalculated: expected 4030592, got 3273960
  Workgroup 63 miscalculated: expected 4096128, got 3838712
  Workgroup 64 miscalculated: expected 4161664, got 3901944
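
The mismatch rate in the log above can be tallied with a few lines of host-side Julia. This is only a summary of the ten trials shown here, transcribed by hand into the illustrative variable `mismatched`; it says nothing about other runs or other hardware.

```julia
# Workgroup indices reported as miscalculated in each of the ten trials above.
# `mismatched`, `counts`, and `rates` are illustrative names.
mismatched = [
    [62, 64],
    [20, 62, 64],
    [19, 20, 23, 46, 57, 62, 63],
    [20, 46, 59],
    [15, 18, 23, 54, 57],
    [21, 22, 24, 46, 56, 58],
    [19, 20, 21, 22, 23, 49, 52, 54],
    [14, 18, 21, 49, 56],
    [19, 20, 22, 23, 46, 61, 63],
    [57, 62, 63, 64],
]
num_groups = 256

counts = length.(mismatched)           # mismatches per trial
rates = 100 .* counts ./ num_groups    # per-trial mismatch rate in percent

println("per-trial mismatch rate (%): ", rates)
println("overall rate (%): ", 100 * sum(counts) / (num_groups * length(mismatched)))
println("range of affected workgroups: ", extrema(reduce(vcat, mismatched)))
```

For this log, the per-trial rate ranges from about 0.8% to 3.1% (about 2% overall), and every reported mismatch falls in workgroups 14 through 64.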

Details of the machine I'm running on:

  • CPU: Intel Core i9 11900KF
  • GPU: Intel Arc Pro B50 (Intel Device E212)
  • OS: Linux Mint 22.3 (kernel 6.17)
  • Julia version 1.12, all packages at latest release (oneAPI v2.6.1, KernelAbstractions v0.9.41, Atomix v1.1.3)

I'm happy to provide further details of my setup if it would be useful for debugging. Here's what I know about the problem so far:

  • The use of local memory is essential; the problem goes away if the accumulator is placed in global memory.
  • Int32 and UInt32 both trigger the bug, so it does not appear to be related to signedness.
  • Int64 and UInt64 also trigger the bug if you change the lines:
        if isone(l)
            result[g] = a[1]
        end
    to instead say:
        if isone(l)
            a[1] &= 0x0000FFFFFFFFFFFF
            result[g] = a[1]
        end
    For some reason, this in-place mutation of local memory is necessary to trigger the bug with 64-bit integers. Writing, for example, result[g] = a[1] & 0x0000FFFFFFFFFFFF does not trigger the bug; it's the writeback to local memory that does it. Again, both Int64 and UInt64 are affected, so the bug appears unrelated to signedness.
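
For the first observation in the list above (the accumulator placed in global memory), the report does not show the exact code used for comparison. One possible form is sketched below, assuming the same launch configuration and accumulating directly into the output array; `test_kernel_global!` and `run_global_test` are hypothetical names, and the output array must start zeroed for the sums to be meaningful.

```julia
using KernelAbstractions
using Atomix: @atomic

# Hypothetical global-memory variant of the reproducer: each thread adds its
# global index straight into result[g], so no local memory or barrier is used.
@kernel function test_kernel_global!(result::AbstractVector{UInt32})
    i = @index(Global, Linear)
    g = @index(Group, Linear)
    @atomic result[g] += UInt32(i)
end

function run_global_test(backend::Backend)
    # The accumulators live in global memory and must be zero-initialized.
    result = KernelAbstractions.zeros(backend, UInt32, 256)
    test_kernel_global!(backend, 256)(result; ndrange=65536)
    # Wait for the kernel to finish before copying the result to the host.
    KernelAbstractions.synchronize(backend)
    return Vector{UInt32}(result)
end
```

After `using oneAPI`, calling `run_global_test(oneAPIBackend())` and comparing against `REFERENCE_VALUES` exercises the same arithmetic as the reproducer without touching local memory. Passing `CPU()` instead runs the same kernel on the host as a baseline.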
