selector: fuse mul+add → mla — flat_flight filter emits separate mul+add where native uses MLA #257

@avrabe

Lever #2 from the flat_flight gap decomposition: fuse mul + add → mla

Per the 262→103 gap decomposition, after const-CSE the next instruction-selection lever for flat_flight is multiply-accumulate fusion. Measured on current main (f6a0c96, cortex-m4f):

synth lowers the filter's gyro*980 + accel*20 as separate mul then add (2 sites — pitch + roll axes):

movw r4, #0x3d4        ; 980
mul  r5, r3, r4        ; r5 = gyro * 980
movw r7, #0x14         ; 20
mul  r8, r6, r7        ; r8 = accel * 20
add.w r2, r5, r8       ; r2 = r5 + r8   ← fuse this add into the mul

native (gcc -O2) uses MLA — the add is free:

mla r2, r7, r6, r2     ; r2 = 980 * accel + r2   (one instruction)
mla r4, r7, r5, r4

The transform

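Peephole: when the result of a mul rD, rA, rB feeds exactly one add rE, rD, rC and rD is not otherwise live, rewrite the pair to mla rE, rA, rB, rC and drop the mul. Because add is commutative, rD may appear as either operand. The rewrite is only valid if rA and rB are not overwritten between the mul and the add, and if the add is not flag-setting (Thumb-2 MLA has no flag-setting form). Cortex-M4 has single-cycle MLA. This saves 1 instruction + 1 temp register per site: 2 sites in flat_flight, and the pattern recurs in any a*k1 + b*k2 filter/accumulator (very common in control code).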
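
To make the matching conditions concrete, here is a minimal Python sketch of the peephole over a straight-line instruction list. It is not synth's implementation: the `Insn` record, `fuse_mul_add`, and the `live_out` parameter are hypothetical names invented for illustration, and the model assumes a single basic block and non-flag-setting `add` only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    op: str                    # e.g. "mul", "add", "mla", "movw"
    dst: str                   # destination register
    srcs: tuple = ()           # source registers, in operand order
    imm: Optional[int] = None  # immediate operand, if any


def _dead_after(insns, start, reg, live_out):
    """True if `reg` is overwritten before it is read from index `start` on."""
    for insn in insns[start:]:
        if reg in insn.srcs:
            return False
        if insn.dst == reg:
            return True
    return reg not in live_out


def fuse_mul_add(insns, live_out=frozenset()):
    """Rewrite mul rD,rA,rB ... add rE,rD,rC into mla rE,rA,rB,rC."""
    out = list(insns)
    i = 0
    while i < len(out):
        mul = out[i]
        i += 1
        if mul.op != "mul":
            continue
        rd, (ra, rb) = mul.dst, mul.srcs
        for j in range(i, len(out)):
            user = out[j]
            if rd in user.srcs:
                # First reader of rD: it must be a plain add that uses rD as
                # exactly one operand, and rD must be dead afterwards.
                rest = [s for s in user.srcs if s != rd]
                if (user.op == "add" and len(rest) == 1
                        and (user.dst == rd
                             or _dead_after(out, j + 1, rd, live_out))):
                    out[j] = Insn("mla", user.dst, (ra, rb, rest[0]))
                    del out[i - 1]   # drop the mul
                    i -= 1           # indices shifted down by one
                break
            if user.dst in (ra, rb, rd):
                break                # an input (or rD) is clobbered: give up
    return out


# The flat_flight sequence from this issue (one axis).
prog = [
    Insn("movw", "r4", imm=980),
    Insn("mul", "r5", ("r3", "r4")),
    Insn("movw", "r7", imm=20),
    Insn("mul", "r8", ("r6", "r7")),
    Insn("add", "r2", ("r5", "r8")),
]
fused = fuse_mul_add(prog)
```

In this sketch only one of the two multiplies can be folded, since the add has a single accumulator slot; the first mul in program order wins and the second stays a plain mul whose result becomes the MLA addend.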

Bonus adjacent: multiply-by-constant strength reduction

The multipliers here are constants (980, 20). Native strength-reduces *20 to add.w r,r,r,lsl#2; lsls #2 (= *5 *4), avoiding the movw #20; mul entirely. A mul-by-small-constant → shift/add peephole would compound with the MLA fusion. (Lower priority than the MLA fold itself.)
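
As a rough illustration of which constants fit that two-instruction shape, the Python sketch below recognises multipliers of the form (2^a + 1) * 2^b and checks the shift/add result against a plain 32-bit multiply. The helper names `shift_add_plan` and `apply_plan` are hypothetical, and this covers only the single pattern mentioned above, not a general strength-reduction scheme.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound


def shift_add_plan(k):
    """Return (a, b) such that k == (2**a + 1) * 2**b with a >= 1, else None."""
    if k <= 0:
        return None
    b = (k & -k).bit_length() - 1      # number of trailing zero bits
    odd = k >> b
    a = (odd - 1).bit_length() - 1
    if odd >= 3 and odd - 1 == 1 << a:
        return a, b
    return None


def apply_plan(x, a, b):
    """Evaluate the shift/add sequence the way the two instructions would."""
    t = (x + (x << a)) & MASK32        # add.w t, x, x, lsl #a
    return (t << b) & MASK32           # lsls  t, t, #b
```

Under this check, 20 decomposes as (a, b) = (2, 2), matching the *5 *4 sequence, while 980 does not fit the pattern and would keep its movw + mul (or feed the MLA fold instead).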

Scope

Pure instruction selection (no regalloc dependency), so it composes cleanly with the const-CSE/spill work on the VCR-RA-001 track. flat_flight-microbench (261) + controller (168) are staged — I'll post the silicon delta when it lands. Filing per my offer on #209; close as dup/wontfix if it's already on the roadmap.
