Release the constraint to enable e8mf8 in EEW=32 #1613

sequencer · 2024-08-24T14:48:32Z

Since T1 has implemented LMUL=1/8 in the EEW=32 case, we submit this PR for consideration:
allow the e8mf8 type when VLEN>=64 and EEW=32. The common case is VLEN>>EEW, e.g. VLEN=64/128/256 with EEW=32.
On the other hand, e8mf8 doesn't make much sense architecturally, because the tuple (e8mf8, e16mf4, e32mf2) can always be replaced by (e8mf4, e16mf2, e32m1).
However, if the specification allows SEW_min=4 in the future, e4mf8 might find its place in some edge AI scenarios.

At the very least, even if we keep the specification's rule as is (disallowing e8mf8 in EEW=32), I think we still need to change the reason the specification gives for why e8mf8 is not allowed.
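To make the occupancy argument concrete, here is a minimal sketch using the spec's formula VLMAX = LMUL * VLEN / SEW. The helper names (`vlmax`, `common_vlmax`) and the chosen VLEN values are illustrative assumptions, not part of the PR.

```python
from fractions import Fraction

def vlmax(vlen, sew, lmul):
    """Elements per register group: VLMAX = LMUL * VLEN / SEW."""
    return lmul * vlen / sew

MF8 = Fraction(1, 8)

# e8mf8 occupancy for a few VLEN values (hypothetical sample points).
# A result below 1 means not even one 8-bit element fits.
E8MF8_VLMAX = {vlen: vlmax(vlen, 8, MF8) for vlen in (32, 64, 128, 256)}

# The two tuples from the description, as (SEW, LMUL) pairs.
NARROW = [(8, Fraction(1, 8)), (16, Fraction(1, 4)), (32, Fraction(1, 2))]
WIDE = [(8, Fraction(1, 4)), (16, Fraction(1, 2)), (32, Fraction(1))]

def common_vlmax(vlen, types):
    """Return the VLMAX shared by every (SEW, LMUL) pair in the tuple."""
    values = {vlmax(vlen, sew, lmul) for sew, lmul in types}
    assert len(values) == 1, "tuple does not keep SEW/LMUL constant"
    return values.pop()

if __name__ == "__main__":
    for vlen, n in E8MF8_VLMAX.items():
        print(f"VLEN={vlen}: e8mf8 VLMAX = {n}")
    print("VLEN=64 narrow tuple VLMAX:", common_vlmax(64, NARROW))
    print("VLEN=64 wide tuple VLMAX:", common_vlmax(64, WIDE))
```

Each tuple keeps SEW/LMUL constant, so its three types share one VLMAX; the wider tuple simply doubles it, which is why it can stand in for the narrower one.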

gfavor · 2024-08-24T20:02:21Z

src/v-st-ext.adoc

+that of the vector register width.  In general, the requirement is to 
+support LMUL {ge} SEW~MIN~/VLEN, where SEW~MIN~ is the narrowest supported
+SEW value and VLEN is the length of vector register.  In the standard
+extensions, SEW~MIN~=8.  For standard vector extensions with VLEN=32,


This changed text expresses different requirements than the original text and, in essence, is a material change to the ratified spec. For example, it no longer expresses a requirement for implementations with VLEN>64 to support fractional LMULs.

At the same time, this change seems unnecessary since an implementation with VLEN>64 is already free to support e8mf8. The requirement expressed in the ELEN=32 example does not imply a limit on mf8 support when at least one full 8-bit element can be supported (i.e. with VLEN>32). It is just expressing minimum requirements.

And, in fact, the initial statement "Implementations must provide ..." effectively expresses a requirement that e8mf8 must be supported in VLEN>32 implementations since such implementations can hold at least one 8-bit element in a vector register. And the second sentence expresses a general minimum requirement, which the further sentences expand on with concrete examples.

Now I would agree that the first sentence could be clarified a little bit. For example, saying "Implementations must provide fractional LMUL settings that allow at least one element of the narrowest supported type to occupy a fraction of a vector register corresponding to the ratio of the narrowest supported type's width to that of the largest supported type's width."
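As a rough illustration of the minimum requirement under discussion, the sketch below assumes the ratified wording LMUL >= SEW_MIN/ELEN with SEW_MIN=8. The function name `required_fractional_lmuls` is hypothetical; it lists only the fractional LMULs an implementation must provide and says nothing about which extra settings it may provide.

```python
from fractions import Fraction

# The fractional LMUL encodings defined by the vector extension.
FRACTIONAL_LMULS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]

def required_fractional_lmuls(elen, sew_min=8):
    """Fractional LMULs that satisfy LMUL >= SEW_MIN / ELEN."""
    floor = Fraction(sew_min, elen)
    return [lmul for lmul in FRACTIONAL_LMULS if lmul >= floor]

if __name__ == "__main__":
    for elen in (32, 64):
        names = [str(l) for l in required_fractional_lmuls(elen)]
        print(f"ELEN={elen}: required fractional LMULs = {names}")
```

Under this reading, mf8 falls outside the required set for ELEN=32, which makes it optional rather than forbidden.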

aswaterman

I'd echo what Greg said.

I do foresee the addition of 4-bit types, but that support would come in the form of new ISA extensions. Those ISA extensions might impose additional constraints on what values vtype must support. It's neither necessary nor appropriate to impose those stricter requirements on all implementations.

Some other thoughts. Generic support for 4-bit types is probably not what's needed. Often we want either to perform mixed-precision arithmetic (e.g. 16b += 4b x 8b), which doesn't cleanly fit into this framework, or to perform dot products, which do not rely on narrower SEW/LMUL. (Consider the vqdot proposal, which has both input EEW and output EEW the same; we perform 4-input dot products of 8b numbers into 32b accumulators using SEW=32.) In neither case is it clear that the proposed change is helpful.
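The dot-product shape described here can be sketched as a scalar model of a single SEW=32 element. This is only an illustration, not the vqdot proposal's definition: the function names are hypothetical, and treating all inputs as signed with a wrapping 32-bit accumulator is an assumption.

```python
def unpack_i8x4(word):
    """Split a 32-bit word into four signed 8-bit values, low byte first."""
    out = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out

def qdot_element(acc, a, b):
    """acc += dot(a, b) over four packed int8 lanes, wrapped to signed 32 bits.

    Assumption: signed x signed products and wrap-around accumulation.
    """
    total = acc + sum(x * y for x, y in zip(unpack_i8x4(a), unpack_i8x4(b)))
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total

if __name__ == "__main__":
    # Lanes of a are [1, 2, 3, 4]; lanes of b are [1, 1, 2, -1].
    print(qdot_element(10, 0x04030201, 0xFF020101))
```

Both source operands and the accumulator are 32 bits wide here, so the 8-bit inputs never need their own SEW or a fractional LMUL.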

sequencer · 2024-08-30T04:25:12Z

I agree that 16b (e16mf2) += 4b (e4mf8) x 8b (e8mf4) is a good point. The source of the problem is that the vector datatype is encoded in the instruction opcode plus vcsr rather than as a tag on each element (which would be expensive). Adding such instructions and CSRs might resolve this issue, but that would be a totally different ISA.
I'm also not sure how to add int4 support in our lane-based design. We cannot implement vqdot because it complicates the lane-based datapath (huge wire congestion...).

I updated this PR and ask for another round of review. The update clarifies the constraint: it does not add a requirement, but allows implementing mf8 in EEW=32 when VLEN>=64, and it adds the reason why e8mf8 is not useful in EEW=32.

e8mf8 type is allowed when VLEN>=64 and EEW=32, the common case of VLEN is VLEN>EEW, e.g. VLEN=64/128/256, and EEW=32. Signed-off-by: Jiuyang Liu <[email protected]>

gfavor reviewed Aug 24, 2024


aswaterman requested changes Aug 24, 2024


sequencer added 3 commits September 1, 2024 12:55

release the constraint to enable e8mf8 in EEW=32

ae496fb

e8mf8 type is allowed when VLEN>=64 and EEW=32, the common case of VLEN is VLEN>EEW, e.g. VLEN=64/128/256, and EEW=32. Signed-off-by: Jiuyang Liu <[email protected]>

add Jiuyang Liu to contributors

b6667f8

fixup: release the constraint to enable e8mf8 in EEW=32

40bc05c

sequencer force-pushed the patch-1 branch from c0287c2 to 40bc05c Compare September 1, 2024 04:55
