Skip to content

[ARITH] Tight bound for floormod#6771

Merged
tqchen merged 4 commits into
apache:mainfrom
hzfan:i64_regression_pr
Oct 28, 2020
Merged

[ARITH] Tight bound for floormod#6771
tqchen merged 4 commits into
apache:mainfrom
hzfan:i64_regression_pr

Conversation

@hzfan

@hzfan hzfan commented Oct 27, 2020

Copy link
Copy Markdown
Contributor

Estimate the range of floormod(a, b) when b < 0. Fix #6691

Comment thread src/arith/const_int_bound.cc Outdated
Comment on lines +262 to +279
/* let a / b = x + y, where x is integer, y \in [0, 1)
* floormod(a, b) = a - floordiv(a, b) * b
* floordiv(a, b) = x
* floormod(a, b) = a - floordiv(a, b) * b
* = a - x * b
* = a - (a / b - y) * b
* = a - a + y * b
* = y * b
* note that 0 <= y < 1
* when b > 0, 0 <= b * y < b
* 0 <= b * y <= b - 1
* when b < 0, b < b * y <= 0
* b + 1 <= b * y <= 0
* In all cases, min(0, b + 1) <= b * y <= max(0, b - 1)
* min(0, b_min + 1) <= b * y <= max(0, b_max - 1)
* That is, min(0, b_min + 1) <= floormod(a, b) <= max(0, b_max - 1)
*/
int64_t b_min_cap = InfAwareAdd(b.min_value, 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion: why don't you move this (very well done) explanation at the beginning of the function? It seems to cover more than just the else branch

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks for the suggestion.

@tqchen

tqchen commented Oct 27, 2020

Copy link
Copy Markdown
Member

@hzfan can you also comment why negative b appears in the case of #6691, is it because we did not capture bound of b? Usually negative bound should not appear

@tqchen tqchen added the status: need test case need test cases to cover the change label Oct 27, 2020
@hzfan

hzfan commented Oct 28, 2020

Copy link
Copy Markdown
Contributor Author

@tqchen The ir (before narrowing) is like

for (ax0.ax1.fused.ax2.outer.fused: int64, 0i64, 42i64) "parallel" {
  for (n.oc_chunk.fused.oh.outer.fused: int64, 0i64, a_very_long_upper_bound) {
    ...
    ...
    for (oh.inner: int32, 0, 2) {
      for (ow.inner: int32, 0, 14) {
        for (oc_block: int32, 0, 16) "vectorized" {
          ...floormod(n.oc_chunk.fused.oh.outer.fused, cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)))...
        }
      }
    }
  }
}

Let a = n.oc_chunk.fused.oh.outer.fused (the first operand of floormod), b = cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)) (second operand of floormod)
const_int_bound gives 0 <= a <= 6 and -5 <= b <= 7.

In a simplified form,

b = floordiv( cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2), 2)
c = floormod(ax0.ax1.fused.ax2.outer.fused, 7)

I guess cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) is not simplified as 2, instead, its bound is analyzed directly, so the min becomes negative due to the minus operation.

@hzfan

hzfan commented Oct 28, 2020

Copy link
Copy Markdown
Contributor Author

It sort of make sense, because at the narrowing pass, we still don't know c * 2 + 1 fits in i32 and the cast cannot be removed yet.

I added the above example as a test. The FPS improves to 400 with INDEX_DEFAULT_I64=ON

@tqchen

tqchen commented Oct 28, 2020

Copy link
Copy Markdown
Member

I see, in this particular case, perhaps it makes sense to optimize such pattern and make sure cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) get simplified as well. One way to do so is to first change i32 casts to i64 before narrow.

It would also be useful to find out why cast(value, i32) is inserted since we preferred i64 in most cases.

We can do that in another PR. This PR's improvement is certainly useful

@tqchen tqchen merged commit b4858d4 into apache:main Oct 28, 2020
@tqchen tqchen added status: accepted and removed status: need test case need test cases to cover the change labels Oct 28, 2020
@tqchen

tqchen commented Oct 28, 2020

Copy link
Copy Markdown
Member

THanks @hzfan @giuseros . It would also be great if we can followup further on the above case:

  • Find out why cast i32 is inserted (ideally we should be all in i64 in compute mode)
  • Consider insert a cast removing pass if cast i32 persists, or update rewrite simplify to handle cast

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

3 participants