Update madrona upstream #248
|
Hi @shacklettbp,
(lldb) next
Process 5870 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100001f48 viewer`main(argc=2, argv=0x000000016fdff348) at viewer.cpp:69:19
66 true;
67 #endif
68
-> 69 WindowManager wm {};
70 WindowHandle window = wm.makeWindow("GPUDrive", 1920, 1080);
71 render::GPUHandle render_gpu = wm.initGPU(0, { window.get() });
72
Target 0: (viewer) stopped.
(lldb) next
Process 5870 stopped
* thread #2, stop reason = breakpoint 2.3
frame #0: 0x000000018b890648 Foundation`-[NSThread main]
Foundation`-[NSThread main]:
-> 0x18b890648 <+0>: pacibsp
0x18b89064c <+4>: stp x29, x30, [sp, #-0x10]!
0x18b890650 <+8>: mov x29, sp
0x18b890654 <+12>: ldr x8, [x0, #0x8]
Target 0: (viewer) stopped.
(lldb) continue
Process 5870 resuming
OOM: 42
Process 5870 exited with status = 1 (0x00000001)
I am getting this error on macOS, but I also see it on Linux. |
|
Also, I am unable to get the headless binary to run on CUDA.
(base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Compiler Flags:
-I/home/aarav/gpudrive/external/madrona/src/mw/device/include
-I/home/aarav/gpudrive/external/madrona/src/common/../../include
-I/usr/local/cuda-12.8/targets/x86_64-linux/include
-std=c++20
-default-device
-rdc=true
-use_fast_math
-DMADRONA_GPU_MODE=1
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP
-DCCCL_DISABLE_BF16_SUPPORT=1
-DCUB_DISABLE_BF16_SUPPORT=1
-arch
sm_89
-DMADRONA_MWGPU_NUM_SMS=(76_i32)
-DMADRONA_MWGPU_MAX_BLOCKS_PER_SM=(1_i32)
-dopt=on
--extra-device-vectorization
-lineinfo
-dlto
-DMADRONA_MWGPU_LTO_MODE=1
-DMADRONA_MWGPU_TASKGRAPH=1
Linker Flags:
-arch=sm_89
-ftz=1
-prec-div=0
-prec-sqrt=0
-fma=1
-optimize-unused-variables
-lineinfo
-lto
-verbose
Compiling GPU engine code:
/home/aarav/gpudrive/external/madrona/src/mw/device/memory.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/state.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/crash.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/consts.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/taskgraph.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/taskgraph_utils.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/sort_archetype.cpp
/home/aarav/gpudrive/external/madrona/src/mw/device/host_print.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../common/hashmap.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../common/navmesh.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../core/base.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/physics.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/geo.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/xpbd.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/tgs.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/narrowphase.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../physics/broadphase.cpp
/home/aarav/gpudrive/external/madrona/src/mw/../render/ecs_system.cpp
/home/aarav/gpudrive/src/sim.cpp
/home/aarav/gpudrive/src/level_gen.cpp
/home/aarav/gpudrive/src/level_gen.cpp(280): warning #177-D: function "gpudrive::createFloorPlane" was declared but never referenced
static void createFloorPlane(Engine &ctx)
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
CUDA linking Failed!
/home/aarav/gpudrive/src/level_gen.cpp: link error: error: Linking globals named 'initBVHParams': symbol multiply defined!
ERROR 9 in nvvmCompileProgram callback
Error at /home/aarav/gpudrive/external/madrona/src/mw/cuda_exec.cpp:703 in auto madrona::compileCode(const char **, int64_t, const char **, int64_t, const char **, int64_t, const char **, int64_t, const MegakernelConfig *, int64_t, CompileConfig::OptMode, ExecutorMode, bool)::(anonymous class)::operator()(nvJitLinkResult) const
nvJitLink error: NVVM compilation error
Aborted (core dumped) |
|
@aaravpandya I checked out your branch and I get a segfault in loadPhysicsObjects when running ./headless CUDA 1. Am I missing some files or is this another bug? |
|
Can you also give me the command to run the viewer? |
|
It looks like another bug. I have been getting that sporadically too while running on CPU. To run the viewer, it's simply |
|
OK, I think I fixed the loadPhysicsObjects bug; it was a Madrona issue. By the way, for future reference: This is unsafe because the std::string (the return value of .string()) is a temporary that is destroyed as soon as this statement finishes, leaving the const char * pointers dangling. Unfortunately, you need to put the std::strings in an array that keeps them in scope and then have a separate array of const char * that point into that array (I'll push this fix). |
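As a rough illustration of the pattern described above (not the actual fix that was pushed), the sketch below keeps the `std::string` objects alive in one array and builds a second array of `const char *` that points into it. `PathArgs` is a hypothetical helper name invented for this example; it is not part of Madrona or GPUDrive.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

// The unsafe pattern, for comparison (left commented out on purpose):
//   const char *bad = some_path.string().c_str();
// The temporary std::string returned by .string() is destroyed at the end of
// that statement, so `bad` dangles immediately afterwards.

// Hypothetical helper (an assumption for this sketch, not a Madrona API):
// owns the strings and exposes a parallel array of C-string pointers.
struct PathArgs {
    std::vector<std::string> storage;  // keeps the character data alive
    std::vector<const char *> ptrs;    // points into `storage`

    explicit PathArgs(const std::vector<std::filesystem::path> &paths)
    {
        // Fill `storage` completely first: growing the vector can move the
        // strings, which would invalidate pointers taken too early.
        storage.reserve(paths.size());
        for (const auto &p : paths) {
            storage.push_back(p.string());
        }

        // Only now take the pointers; `storage` is not modified again.
        ptrs.reserve(storage.size());
        for (const std::string &s : storage) {
            ptrs.push_back(s.c_str());
        }
        assert(ptrs.size() == storage.size());
    }

    // Copying would leave the new `ptrs` pointing into the old object's
    // strings, so copies (and implicit moves) are disabled.
    PathArgs(const PathArgs &) = delete;
    PathArgs &operator=(const PathArgs &) = delete;

    const char *const *data() const { return ptrs.data(); }
    std::size_t size() const { return ptrs.size(); }
};
```

A caller would construct a `PathArgs` from its list of paths and pass `args.data()` and `args.size()` to whatever function expects a `const char **`, making sure the `PathArgs` object outlives that call.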
|
@aaravpandya After my latest commit this branch works for me now. Let me know if you run into any problems. The one thing I notice is that when simulating multiple worlds, only the first world shows anything in the viewer, on both CPU and GPU backends. Has the viewer always behaved this way in GPUDrive? This isn't the case for our other environments. |
|
Thank you so much @shacklettbp. I see that it now works on CUDA and CPU. I was not able to fully test the viewer on Ubuntu, but @daphne-cornelisse ran it and it got past the earlier OOM error. However, on Mac, I am still getting the OOM error. I don't think I am out of memory because I have 48 GB of RAM. Regarding the viewer, I believe there was a drop-down where we could select the world we want to view. I was actually going to ask whether you could provide us with some release notes or documentation on the new changes and features in Madrona that we can use. For example, I see that Madrona has a new batch renderer and some new named tensor interfaces. I would like to know how we can use them. |
|
What is the full backtrace for the error you're getting on macOS? It works on my MacBook. Right, when I select anything other than world 0 in that drop-down in the viewer, the world is blank (no entities). There are no release notes and only minimal documentation, unfortunately. We added the batch renderer as part of a SIGGRAPH Asia paper: https://madrona-engine.github.io/renderer.html. If you're interested in using it, it would probably make sense to set up a Zoom call with everyone (me, anyone on your side, and Luc, the first author on the batch renderer paper). The named tensor thing is a kind of TensorDict-like interface; I'm currently using it for JAX interop. I don't think it's useful for you guys in its current state. Are there any features you need in that space? |
|
I get the same backtrace as before. I made sure I am on the latest commit and that all submodules are updated. It fails when creating the WindowManager object.
(lldb)
Process 98495 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100001f48 viewer`main(argc=2, argv=0x000000016fdff348) at viewer.cpp:69:19
66 true;
67 #endif
68
-> 69 WindowManager wm {};
70 WindowHandle window = wm.makeWindow("GPUDrive", 1920, 1080);
71 render::GPUHandle render_gpu = wm.initGPU(0, { window.get() });
72
Target 0: (viewer) stopped.
(lldb) next
Process 98495 stopped
* thread #2, stop reason = breakpoint 1.11
frame #0: 0x000000018512c648 Foundation`-[NSThread main]
Foundation`-[NSThread main]:
-> 0x18512c648 <+0>: pacibsp
0x18512c64c <+4>: stp x29, x30, [sp, #-0x10]!
0x18512c650 <+8>: mov x29, sp
0x18512c654 <+12>: ldr x8, [x0, #0x8]
Target 0: (viewer) stopped.
(lldb) continue
Process 98495 resuming
OOM: 42
Process 98495 exited with status = 1 (0x00000001)
Perhaps I am on some incompatible Xcode or macOS version? I am on |
|
I see. Can you run it again under the debugger with the latest commit and submodule update, and post the debugger backtrace? You should be able to get a proper backtrace now. |
|
Hi @shacklettbp, sorry, I got distracted and didn't work on this PR further. Also, I see you have removed the sorting code from the taskgraph. Do we no longer need to sort archetypes manually? Is it handled inside Madrona now? Thanks |
|
This is very confusing! I removed the fprintf for the OOM statement, and it still prints it. When I rebuild, I can see that the change is being recompiled. Is this OOM not being triggered in Perhaps something is wrong with my dependencies? |
|
@shacklettbp So I tried this on a different Mac (my work laptop) and I am able to run the viewer there. The problem seems to be specific to my local machine. We are going to merge this. Thanks for all the help :) |
8ada1fe to
edbaedd
No description provided.