SpacemiT A100 (SpacemiT K3) uarch-tool benchmarks
VLEN: 1024
Detect whether the agnostic tail/mask policy writes all 1s, using a simple code snippet:
Tail agnostic policy: undisturbed
Mask agnostic policy: undisturbed
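The classification step of such a check could be sketched in plain C as follows. This is not the tool's actual snippet: `classify_tail`, `enum tail_result`, and the byte-buffer layout are hypothetical names chosen for illustration. It assumes the register was seeded with a known pattern other than 0xFF, an operation was run with vl < VLMAX under the agnostic policy, and the whole register was then stored to memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; not part of any real tool or API. */
enum tail_result { TAIL_UNDISTURBED, TAIL_ALL_ONES, TAIL_MIXED };

/*
 * Compare the tail bytes [vl, vlmax) of a register dump taken before and
 * after an operation. "before" must not contain 0xFF in the tail, otherwise
 * undisturbed and all-1s cannot be told apart. An empty tail (vl == vlmax)
 * is reported as undisturbed.
 */
enum tail_result classify_tail(const uint8_t *before, const uint8_t *after,
                               size_t vl, size_t vlmax)
{
    int undisturbed = 1;
    int all_ones = 1;

    for (size_t i = vl; i < vlmax; i++) {
        if (after[i] != before[i])
            undisturbed = 0;   /* tail byte was overwritten */
        if (after[i] != 0xFF)
            all_ones = 0;      /* tail byte is not all 1s */
    }
    if (undisturbed)
        return TAIL_UNDISTURBED;
    if (all_ones)
        return TAIL_ALL_ONES;
    return TAIL_MIXED;         /* some other overwrite pattern */
}
```

The same comparison would apply to masked-off elements for the mask policy, with the loop restricted to the inactive element positions instead of the tail.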
Is vl always set to min(AVL,VLMAX): yes
Note: spec allows ceil(AVL/2)<=vl<=VLMAX for VLMAX<AVL<2*VLMAX
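The rule in the note can be written out as a small check. The sketch below is illustrative only: `vl_is_legal` and `vl_is_min` are hypothetical helper names, and the three cases follow the constraints on vl as stated in the RVV 1.0 specification (vl = AVL when AVL <= VLMAX, ceil(AVL/2) <= vl <= VLMAX when VLMAX < AVL < 2*VLMAX, vl = VLMAX when AVL >= 2*VLMAX).

```c
#include <stdbool.h>
#include <stddef.h>

/* True if vl is a result the spec permits for the given AVL and VLMAX. */
bool vl_is_legal(size_t avl, size_t vlmax, size_t vl)
{
    if (avl <= vlmax)
        return vl == avl;          /* whole request fits */
    if (avl >= 2 * vlmax)
        return vl == vlmax;        /* must use a full register group */

    /* VLMAX < AVL < 2*VLMAX: any value in [ceil(AVL/2), VLMAX] is allowed. */
    size_t half = (avl + 1) / 2;
    return vl >= half && vl <= vlmax;
}

/* True if vl matches the stricter min(AVL, VLMAX) behaviour reported above. */
bool vl_is_min(size_t avl, size_t vlmax, size_t vl)
{
    return vl == (avl < vlmax ? avl : vlmax);
}
```

For example, with VLEN=1024, SEW=8 and LMUL=1, VLMAX is 128. For AVL=200 the spec would permit any vl from 100 to 128, while an implementation that always picks min(AVL,VLMAX) returns 128.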
Measures how LMUL scheduling impacts when results are ready:
A) LMUL=8 v0 overlap with LMUL=1 v0: 963.3002700 cycles/iter
B) LMUL=8 v0 overlap with LMUL=1 v3: 960.7684240 cycles/iter
C) LMUL=8 v0 overlap with LMUL=1 v7: 972.4773225 cycles/iter
D) LMUL=8 v0 overlap with LMUL=1 v8: 964.3978443 cycles/iter
E) LMUL=8 v0 overlap with LMUL=1 v0..v8: 909.1791667 cycles/iter
- The difference between A and C (about 9.2 cycles/iter) indicates that results for the upper part of a vector register group are completed later than those for the lower part, even when long dependencies are present.
Measures overhead of reinterpreting a mask as a vector:
A) reinterpret: 20.4754600 cycles/iter
B) don't reinterpret: 20.0241394 cycles/iter
- There seems to be a small overhead (about 0.45 cycles/iter, roughly 2%) in reinterpreting a mask register.