Skip to content

[DietCode] Local Padding#11793

Closed
ArmageddonKnight wants to merge 1 commit into
apache:mainfrom
UofT-EcoSystem:bojian/DietCode_Upstreaming
Closed

[DietCode] Local Padding#11793
ArmageddonKnight wants to merge 1 commit into
apache:mainfrom
UofT-EcoSystem:bojian/DietCode_Upstreaming

Conversation

@ArmageddonKnight

Copy link
Copy Markdown
Contributor

This PR is for the code generation changes required for dynamic MetaScheduler (see apache/tvm-rfcs#72 for the RFC, #11516 for the tracking issue describing the changes). Any feedback or comments are welcome.

FYI, @comaniac @junrushao1994

@comaniac

Copy link
Copy Markdown
Contributor

Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi

@junrushao junrushao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending out the PR! We might want to deliberate on the implementation to ensure its correctness. To be clear, running a pass inside a Schedule class will invalidate all the scheduling states and thus lead to incorrect results.

@Hzfengsy Hzfengsy left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to clarify some conceptions so that we can be on the same page:

  1. postproc is one of the stages in meta-schedule, which also means it is part of Schedule
  2. During Scheduling, we can only mutate mod with schedule primitives. IRMutator is not allowed.
  3. Schedule transformations with primitives can be traced by printing the tracing path.

However, this PR tries to directly mutate mod with IRMutator in postproc (also at schedule stages). I recommend doing the following steps and we can move on:

  1. Add a schedule primitive called "padding", which can pad local buffers during schedule
  2. Call the padding primitive at postproc

Please let me know if you have any questions @ArmageddonKnight

@comaniac

Copy link
Copy Markdown
Contributor

Per offline discussion with @junrushao1994 and @ArmageddonKnight, here is the current action items:

  1. The local padding pass will be moved to TIR transformation, meaning that local padding becomes an implicit transformation similar to loop partitioning. A config will be exposed to control whether to turn on or off (default off) to keep all current workloads unchanged.
  2. In the local padding implementation, the logic related to var node name hints will be improved to leverage a more reliable factor (e.g., pointer reference).

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from e4d5ee8 to c27bd64 Compare June 23, 2022 20:50
@ArmageddonKnight

Copy link
Copy Markdown
Contributor Author

@junrushao1994 @Hzfengsy I have finished the revision. Please have a second look when you have time.

Also cc @comaniac

@ArmageddonKnight

ArmageddonKnight commented Jun 26, 2022

Copy link
Copy Markdown
Contributor Author

It seems that for some reason the CI build is stopped (as I am unable to query the current CI status), would it be possible to re-trigger the CI.

@ArmageddonKnight

Copy link
Copy Markdown
Contributor Author

@tvm-bot rerun

Comment thread src/driver/driver_api.cc
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
Comment thread src/tir/transforms/local_pad.cc Outdated
PrimExpr predicate_lhs_;

friend class LocalPadder;
};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I would love to propose that we restructure the logic of this class a little bit.

Looks like the analyzer is interested in the following patterns:

if A </<=/>=/> X:
  B[Y] = ...

If so, there isn't much reason to use a visitor pattern because recursion didn't actually happen. Instead, let's go with a more plain and readable fashion, for example

// inputs:
IfThenElse if_then_else;
// extract the lhs & rhs of the if-condition
PrimExpr predicate_lhs{nullptr};
PrimExpr predicate_rhs{nullptr};
if (const auto *op = if_then_else->condition.as<LENode>()) {
  predicate_lhs = op->a;
  predicate_rhs = op->a;
} else if (...) {
  // use a macro or something to deal with LT, GE, GT
}
// then let's analyze the body statement
const BufferStoreNode* buffer_store = if_then_else->then_case.as<BufferStoreNode>();
ICHECK(buffer_store);
if (StructuralEqual()(buffer_store->indices[0], predicate_lhs)) {
  ... // some logic here
} else {
  ... // some logic here
}

@ArmageddonKnight ArmageddonKnight Jul 5, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid we cannot directly do it like this. The reason is because the predicates are usually combined together into a single one and hence we need some way of splitting them. The implementation you provide might not be able to handle situations like the following:

if (inlineable_predicate1 && non_inlineable_predicate2 && inlineable_predicate3)
  A_shared[...] = A[...];

|  // after transformation

if (noninlineable_predicate2)
  A_shared[...] = inlineable_predicate1 && inlineable_predicate3 ? A[...] : padded_value;

@junrushao junrushao dismissed their stale review July 4, 2022 01:46

The PR has been going towards positive direction

Comment thread tests/python/unittest/test_tir_transform_local_pad.py Outdated
@renfeier

Copy link
Copy Markdown

hi, @ArmageddonKnight
it seems the tvm transform config "tir.enable_local_pad "does not work since the same schedule build result kernel src code are the same when set config tir.enable.lcal_pad true/false, when i use the test example you upload before, example code will show belows:

def save_kernel_source(kernel, log_kernel_filename):
kernel_src=kernel.imported_modules[0].get_source()
if log_kernel_filename is not None:
with open(log_kernel_filename, 'w') as fout:
fout.write("{}".format(kernel_src))
else:
print({}.format(kernel_src))

@tvm.testing.requires_gpu
@tvm.testing.requires_cuda
def test_dense_local_padding():
"""
Test that local padding is delivering the correct compute outcome.
"""
x_np = np.random.uniform(-0.1, 0.1, size=(960, 770)).astype(np.float32)
w_np = np.random.uniform(-0.1, 0.1, size=(770, 2304)).astype(np.float32)
y_np = np.matmul(x_np, w_np)
y_empty = np.empty(shape=y_np.shape, dtype=y_np.dtype)
tir_sched = Schedule(Dense_960x770x2304)
sample_dense_sched(tir_sched)
with tvm.transform.PassContext(config={"tir.enable_local_pad": False}):
nopad_cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
save_kernel_source(nopad_cuda_kernel, "nolocalpad_kernel.cu")
with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
save_kernel_source(cuda_kernel, "localpad_kernel.cu")

 cuda_ctx = tvm.cuda()
 module_data = [x_np, w_np, y_empty]
 module_data = [tvm.nd.array(d, device=cuda_ctx) for d in module_data]
 cuda_kernel(*module_data)
 np.testing.assert_allclose(module_data[-1].numpy(), y_np, atol=1e-3, rtol=1e-3)

the localpad_kernel.cu are same with nolocalpad_kernel.cu

@ArmageddonKnight

Copy link
Copy Markdown
Contributor Author

@renfeier The reason is ebcause we are refactoring the implementation, so the pass itself is temporarily commented out. Sorry I was quite busy with university business and will finish the refactoring recently.

@renfeier

Copy link
Copy Markdown

refactoring
@ArmageddonKnight
Thank you for the prompt reply. Looking forward to your update

@ArmageddonKnight

ArmageddonKnight commented Aug 6, 2022

Copy link
Copy Markdown
Contributor Author

@junrushao1994 As was discussed, I have fixed the implementation. Please review it again.

@ArmageddonKnight

ArmageddonKnight commented Aug 8, 2022

Copy link
Copy Markdown
Contributor Author

Hmm ... seems that the Cortex CI pipelines are always interrupted for some reason, and this is happening on the main branch as well.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 20c9054 to 80bb3ee Compare August 8, 2022 23:38
@ArmageddonKnight

Copy link
Copy Markdown
Contributor Author

@junrushao1994 The refactored implementation has passed the CI tests. Please review it when you have time available. Thanks.

@ArmageddonKnight ArmageddonKnight force-pushed the bojian/DietCode_Upstreaming branch from 0ce3787 to 5fa292a Compare August 26, 2022 05:10
@ArmageddonKnight

ArmageddonKnight commented Aug 30, 2022

Copy link
Copy Markdown
Contributor Author

Hi @junrushao , it has been sometime since this PR is submitted. May I know whether there are any updates on this? And whether further changes are required?

@areusch areusch added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 19, 2022
@masahi

masahi commented Dec 12, 2022

Copy link
Copy Markdown
Member

@ArmageddonKnight @junrushao What is the status of this PR or DietCode upstreaming in general? I'm interested in dynamic shape tuning, and I can help this effort.

@masahi

masahi commented Dec 13, 2022

Copy link
Copy Markdown
Member

This looks similar to #12750, maybe we don't need this? cc @vinx13

@vinx13

vinx13 commented Dec 13, 2022

Copy link
Copy Markdown
Member

@masahi PadEinsum can achieve something similar since the padding is in the shared memory

@tqchen tqchen closed this Jan 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants