
[Metaschedule] Add test case for multi-anchor subgraph #10856

Merged: junrushao merged 9 commits into apache:main from masahi:e2e-multi-anchor on Apr 1, 2022


@masahi (Member) commented on Apr 1, 2022

This adds a demonstration of extracting, scheduling, and end-to-end compiling Relay subgraphs with multiple anchor ops. Since task extraction is no longer tied to TE scheduling, extracting a subgraph with multiple anchor TE computes just works.

The test case manually creates a simple fused module with two `relay.dense` ops. In the future, an effort like #9628 should make it easier to construct multi-anchor subgraphs.

The extracted TensorIR module, with one block for each of the two TE `dense` computes, looks like this:

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(placeholder: T.Buffer[(128, 128), "float32"], placeholder_1: T.Buffer[(128, 128), "float32"], placeholder_2: T.Buffer[(128, 128), "float32"], T_matmul_NT: T.Buffer[(128, 128), "float32"]) -> None:
        # function attr dict
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        # body
        # with T.block("root")
        # Intermediate buffer holding the result of the first dense
        T_matmul_NT_1 = T.alloc_buffer([128, 128], dtype="float32")
        # First dense: T_matmul_NT_1 = placeholder @ placeholder_1^T
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(placeholder[i, k], placeholder_1[j, k])
                T.writes(T_matmul_NT_1[i, j])
                T.block_attr({"layout_free_placeholders":[placeholder_1]})
                with T.init():
                    T_matmul_NT_1[i, j] = T.float32(0)
                T_matmul_NT_1[i, j] = T_matmul_NT_1[i, j] + placeholder[i, k] * placeholder_1[j, k]
        # Second dense consumes the intermediate: T_matmul_NT = T_matmul_NT_1 @ placeholder_2^T
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT_1"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(T_matmul_NT_1[i, k], placeholder_2[j, k])
                T.writes(T_matmul_NT[i, j])
                T.block_attr({"layout_free_placeholders":[placeholder_2]})
                with T.init():
                    T_matmul_NT[i, j] = T.float32(0)
                T_matmul_NT[i, j] = T_matmul_NT[i, j] + T_matmul_NT_1[i, k] * placeholder_2[j, k]
```
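To make the semantics of the two blocks concrete, here is a small NumPy reference sketch (not part of the PR). It mirrors each block's index expression with `np.einsum` (reduction over the last axis of both operands, the "NT" layout) and checks that the chain equals two `relay.dense` calls, i.e. `(a @ b.T) @ c.T`. The function name `dense_dense` and the random inputs are illustrative assumptions.

```python
import numpy as np

def dense_dense(a, b, c):
    # Block "T_matmul_NT": intermediate[i, j] = sum_k a[i, k] * b[j, k]
    inter = np.einsum("ik,jk->ij", a, b)
    # Block "T_matmul_NT_1": out[i, j] = sum_k inter[i, k] * c[j, k]
    return np.einsum("ik,jk->ij", inter, c)

rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal((128, 128)).astype("float32") for _ in range(3))
out = dense_dense(a, b, c)

# relay.nn.dense(x, w) computes x @ w.T, so two chained dense ops give (a @ b.T) @ c.T
assert np.allclose(out, (a @ b.T) @ c.T, rtol=1e-4, atol=1e-2)
```

Both anchors land in a single PrimFunc with an intermediate buffer, which is exactly what makes this a multi-anchor workload rather than two separate tasks.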
    

@junrushao1994 @csullivan @comaniac @mbs-octoml @mikepapadim

@masahi force-pushed the e2e-multi-anchor branch from d895854 to b3a3a7c on April 1, 2022 03:31

@comaniac (Contributor) left a comment:

LGTM

Review thread on `tests/python/unittest/test_meta_schedule_multi_anchor.py` (outdated)
Review thread on `src/relay/backend/te_compiler_cache.cc` (outdated)
Co-authored-by: Junru Shao <junrushao1994@gmail.com>

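```python
# Commit the manually scheduled trace as a tuning record; [0.0] is a placeholder
# run time, since the manual schedule is not measured.
tune_rec = TuningRecord(sch.trace, [0.0], workload, tvm.target.Target(target), [])
database.commit_tuning_record(tune_rec)
```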

@masahi (Member, PR author) commented:

@junrushao1994 @zxybazh

I keep writing this database boilerplate for manual scheduling. I'm thinking about a cleaner API so that users don't have to go through the explicit task extraction -> database creation steps. Right now it looks like this:

```python
relay_mod = tvm.IRModule.from_expr(...)
target = "llvm"
params = {"weight1": weight1_np, "weight2": weight2_np}

def schedule_fn(task, sch):
    # Return True only for tasks that this function actually schedules
    if "nn_dense_nn_dense" in task.task_name:
        schedule_dense_dense(sch)
        return True
    return False

# Proposed helper: extract tasks, apply schedule_fn, and return a populated database
database = apply_manual_schedules(relay_mod, target, params, schedule_fn)

with ApplyHistoryBest(database):
    ...
```

If this looks OK, I can send a PR for it after this one.
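The control flow such a helper would hide can be sketched in plain Python. This is a hypothetical illustration only: `Task`, `Schedule`, and `apply_manual_schedules_sketch` are stand-ins invented here, not TVM's task-extraction, schedule, or database APIs, and a plain dict stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    # Hypothetical stand-in for an extracted task: a name plus its workload key
    task_name: str
    workload: str

@dataclass
class Schedule:
    # Hypothetical stand-in for a schedule that records applied steps as a trace
    workload: str
    trace: List[str] = field(default_factory=list)

def apply_manual_schedules_sketch(
    tasks: List[Task], schedule_fn: Callable[[Task, Schedule], bool]
) -> Dict[str, List[str]]:
    """Run schedule_fn on every task and keep a record only for tasks it handled."""
    database: Dict[str, List[str]] = {}
    for task in tasks:
        sch = Schedule(task.workload)
        if schedule_fn(task, sch):
            # One record (the trace) per manually scheduled workload
            database[task.workload] = sch.trace
    return database

def schedule_fn(task: Task, sch: Schedule) -> bool:
    if "nn_dense_nn_dense" in task.task_name:
        sch.trace.append("schedule_dense_dense")  # placeholder for real schedule primitives
        return True
    return False

tasks = [Task("fused_nn_dense_nn_dense", "wl0"), Task("fused_nn_relu", "wl1")]
db = apply_manual_schedules_sketch(tasks, schedule_fn)
print(db)  # {'wl0': ['schedule_dense_dense']}
```

The boolean return value lets one callback cover every task in a model while only committing records for the subgraphs it knows how to schedule.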

A member commented:

There is a `DummyDatabase` class in `python/tvm/meta_schedule/testing/utils.py` that doesn't require creating JSON files for intermediate results. I was wondering if we could further reduce the boilerplate by enhancing that class. What do you think?
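The idea of a database that never touches disk can be illustrated with a minimal in-memory record store. This is a hypothetical sketch, not TVM's `DummyDatabase`: the class `InMemoryDatabase` and its `commit` and `best` methods are invented here for illustration, and traces are plain strings.

```python
from typing import Dict, List, Optional, Tuple

class InMemoryDatabase:
    """Hypothetical record store kept entirely in memory: no JSON files on disk."""

    def __init__(self) -> None:
        # workload key -> list of (mean run time in seconds, trace)
        self._records: Dict[str, List[Tuple[float, str]]] = {}

    def commit(self, workload: str, trace: str, run_secs: List[float]) -> None:
        mean = sum(run_secs) / len(run_secs)
        self._records.setdefault(workload, []).append((mean, trace))

    def best(self, workload: str) -> Optional[str]:
        # Return the trace with the lowest mean run time, or None if nothing was committed
        records = self._records.get(workload)
        if not records:
            return None
        return min(records, key=lambda r: r[0])[1]

db = InMemoryDatabase()
# A manual schedule has no measured time, so a placeholder like [0.0] is committed
db.commit("dense_dense", "manual_trace", [0.0])
print(db.best("dense_dense"))  # manual_trace
print(db.best("unknown"))      # None
```

Keeping records in a dict means a test can populate and query the database in one process without creating or cleaning up temporary files.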

A member commented:

Yeah, using the Dummy classes can be very helpful in tuning, and it's actually a good idea to provide a new user interface as you mentioned: a schedule function that works at the Relay level. Let me know when you have the PR ready :)

@tmoreau89 (Contributor) left a comment:

LGTM!

@junrushao (Member) left a comment:

Niiiiiice work! Thanks @masahi!

@junrushao merged commit 93b255c into apache:main on Apr 1, 2022
junrushao pushed a commit that referenced this pull request on Apr 5, 2022:
As discussed in #10856 (comment), add a utility under `meta_schedule/testing/utils.py` to clean up the database boilerplate. Also use `DummyDatabase` instead of `JsonDatabase` for further cleanup, as suggested by @junrushao1994.
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…#10876)

mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
…#10876)

5 participants