Memory bandwidth in terms of SM number

Hi, thanks a lot for the nice tool!

I have a few quick questions. I'm not familiar with the underlying design of GPUs, and I'm curious how the achieved memory bandwidth depends on the number of SMs used.

Can a single SM saturate the memory bandwidth, at least in theory? Or is there contention when multiple SMs access data in global memory, compared with using only one SM?

I'm not sure whether it is correct to measure the memory bandwidth achieved by, e.g., a single SM by changing the following code to `const int TOTAL_BLOCKS=1;`:

https://github.com/ekondis/gpumembench/blob/838a4fcd5e6bdf8d09a10da10fb4b98bf484771a/cachebench-cuda/cache_kernels.cu#L353
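To make the question concrete, here is a minimal standalone sketch of the kind of test meant here. It is not the gpumembench code: the kernel `read_kernel`, the helper `bandwidth_gbps`, the 256 MiB buffer, and the 256 threads per block are all assumptions chosen for illustration. It relies on the fact that a thread block always executes on a single SM, so a launch with one block exercises only one SM. With more blocks, the placement of blocks on SMs is decided by the hardware scheduler and is not controlled here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s (line %d)\n",               \
                    cudaGetErrorString(err_), __LINE__);                \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical helper: convert bytes moved in `ms` milliseconds to GB/s.
static double bandwidth_gbps(size_t bytes, float ms) {
    return (double)bytes / ((double)ms * 1e6);
}

// Hypothetical read-only kernel. The grid-stride loop lets any number of
// blocks cover the whole buffer, so the block count is the only variable.
__global__ void read_kernel(const int4 *in, size_t n, int *sink) {
    int acc = 0;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        int4 v = in[i];
        acc += v.x + v.y + v.z + v.w;
    }
    // Data-dependent store so the compiler cannot remove the loads.
    // The buffer is zero-filled, so this branch is never taken at run time.
    if (acc == 0x7fffffff) *sink = acc;
}

int main() {
    const size_t bytes = (size_t)256 << 20;  // 256 MiB, assumed larger than L2
    const size_t n = bytes / sizeof(int4);
    const int threads = 256;                 // assumed block size

    int4 *buf = nullptr;
    int *sink = nullptr;
    CHECK(cudaMalloc(&buf, bytes));
    CHECK(cudaMalloc(&sink, sizeof(int)));
    CHECK(cudaMemset(buf, 0, bytes));

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Sweep the block count from 1 up to a few blocks per SM.
    for (int blocks = 1; blocks <= 4 * prop.multiProcessorCount; blocks *= 2) {
        read_kernel<<<blocks, threads>>>(buf, n, sink);  // warm-up launch
        CHECK(cudaGetLastError());
        CHECK(cudaEventRecord(start));
        read_kernel<<<blocks, threads>>>(buf, n, sink);  // timed launch
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        printf("blocks=%4d  %8.2f GB/s\n", blocks, bandwidth_gbps(bytes, ms));
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(buf));
    CHECK(cudaFree(sink));
    return 0;
}
```

This can be built with `nvcc -O3` and run as is. Comparing the `blocks=1` row with the later rows would show how far a single block on a single SM is from the bandwidth reached when many SMs issue loads. Note that one block of 256 threads does not necessarily expose all the memory parallelism one SM can sustain, so the single-block figure is a lower bound for that SM rather than its limit.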
