Since GGML now has direct conv2d support for CPU and Vulkan, we might want to try it out here and see if it helps. Compared to im2col, the direct kernel uses less memory and should run faster on GPUs that don't have matrix cores.
As a quick test, I naively replaced all instances of `ggml_conv_2d` in the code with `ggml_conv_2d_direct` and swapped the ggml directory for the one from llama.cpp. Right now it generates images fine on CPU (it's a bit slower than im2col), but it segfaults on Vulkan.
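To put a rough number on the memory point, here is a minimal sketch of how large the im2col scratch matrix gets for a single convolution. The helper names (`conv_out_size`, `im2col_bytes`) are hypothetical and not part of ggml, and the example shape and the 2-byte element size (an F16 scratch buffer) are assumptions chosen for illustration. A direct convolution skips this intermediate buffer entirely.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not ggml functions.

// Output extent of a convolution along one axis (standard formula).
constexpr int64_t conv_out_size(int64_t in, int64_t kernel, int64_t stride,
                                int64_t pad, int64_t dilation) {
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

// Size in bytes of the im2col scratch matrix: one row per output pixel,
// one column per (input channel, kernel row, kernel column) triple.
constexpr int64_t im2col_bytes(int64_t batch, int64_t in_ch,
                               int64_t in_h, int64_t in_w,
                               int64_t k_h, int64_t k_w,
                               int64_t stride, int64_t pad, int64_t dilation,
                               int64_t elem_size) {
    const int64_t out_h = conv_out_size(in_h, k_h, stride, pad, dilation);
    const int64_t out_w = conv_out_size(in_w, k_w, stride, pad, dilation);
    return batch * out_h * out_w * in_ch * k_h * k_w * elem_size;
}

// Assumed example: 3x3 kernel, stride 1, padding 1, 320 input channels,
// 64x64 feature map, 2-byte elements.
constexpr int64_t example_scratch = im2col_bytes(1, 320, 64, 64, 3, 3, 1, 1, 1, 2);
constexpr int64_t example_input   = int64_t{1} * 320 * 64 * 64 * 2;
```

For this assumed shape the scratch matrix comes to 23,592,960 bytes (about 22.5 MiB), nine times the 2,621,440-byte input, because every input value is copied once per kernel tap. Larger feature maps scale that overhead proportionally.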