Tenstorrent Ascalon-X uarch-tool benchmarks
VLEN: 256
Detects all1s tail/mask policy with a simple code snippet:
Tail agnostic policy: all1s
Mask agnostic policy: all1s
Is vl always set to min(AVL,VLMAX): yes
Note: the RVV spec allows ceil(AVL/2) <= vl <= VLMAX when VLMAX < AVL < 2*VLMAX
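
The vl rule in the note above can be sketched in plain C. This is only an illustration of the range the RVV spec permits, not code from the benchmark tool; the names vl_range, allowed_vl and vlmax_for are hypothetical, and vlmax_for assumes an integer LMUL.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of vl values permitted for a given AVL and VLMAX.
 * Hypothetical helper type, introduced only for this illustration. */
typedef struct {
    size_t min_vl;
    size_t max_vl;
} vl_range;

/* Permitted vl range for a vsetvl-style request:
 *   AVL <= VLMAX          -> vl == AVL
 *   VLMAX < AVL < 2*VLMAX -> ceil(AVL/2) <= vl <= VLMAX
 *   AVL >= 2*VLMAX        -> vl == VLMAX */
static vl_range allowed_vl(size_t avl, size_t vlmax)
{
    vl_range r;
    assert(vlmax > 0);
    if (avl <= vlmax) {
        r.min_vl = r.max_vl = avl;
    } else if (avl < 2 * vlmax) {
        r.min_vl = (avl + 1) / 2; /* ceil(AVL/2) */
        r.max_vl = vlmax;
    } else {
        r.min_vl = r.max_vl = vlmax;
    }
    return r;
}

/* VLMAX = VLEN / SEW * LMUL, assuming an integer LMUL (1, 2, 4 or 8). */
static size_t vlmax_for(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    assert(sew_bits > 0);
    return vlen_bits / sew_bits * lmul;
}
```

With VLEN=256 as reported above, and assuming SEW=32 and LMUL=1 (neither is stated in this output), vlmax_for(256, 32, 1) is 8. For AVL=12, allowed_vl(12, 8) gives the range 6..8; a core that always sets vl to min(AVL,VLMAX) picks the upper end, 8.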
Measures how LMUL scheduling impacts when results are ready:
A) LMUL=8 v0 overlap with LMUL=1 v0: 96.9213867 cycles/iter
B) LMUL=8 v0 overlap with LMUL=1 v3: 95.1469726 cycles/iter
C) LMUL=8 v0 overlap with LMUL=1 v7: 126.3798217 cycles/iter
D) LMUL=8 v0 overlap with LMUL=1 v8: 80.5019226 cycles/iter
E) LMUL=8 v0 overlap with LMUL=1 v0..v8: 153.7512207 cycles/iter
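
For reading cases A to E, it may help to recall which registers an LMUL=8 group covers. The sketch below assumes, as the labels suggest but this output does not confirm, that the LMUL=8 instruction targets the group based at v0; in_register_group is a hypothetical helper, not part of the tool.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if vector register `reg` belongs to the register group
 * that starts at `base` with the given integer LMUL.
 * A group with LMUL=n covers registers base .. base+n-1, and its base
 * register number must be a multiple of n. */
static bool in_register_group(unsigned base, unsigned lmul, unsigned reg)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(base % lmul == 0);
    assert(base < 32 && reg < 32);
    return reg >= base && reg < base + lmul;
}
```

Under that assumption, v0, v3 and v7 (cases A to C) fall inside the v0..v7 group, while v8 (case D) lies outside it and is therefore independent of the LMUL=8 result.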
Measures overhead of reinterpreting a mask as a vector:
A) reinterpret: 8.2517700 cycles/iter
B) don't reinterpret: 8.6268005 cycles/iter