EV Tensor ABI Proposal
Date: 2026-04-02
Status: proposal only
Scope: host/firmware ABI design for EV kernels
Goal
Replace graph-specific EV config structs with:
- one shared ABI header
- one shared tensor descriptor
- one shared physical-layout descriptor
- one optional quantization descriptor
- small per-op parameter structs
This is intended for all EV kernels, not only tess/detess.
Why Change
The current EV graph ABIs have drifted in two bad ways:
- different graphs evolved different struct shapes
- some graphs are half-migrated and carry both legacy scalar fields and array-based fields
Examples from the current tree:
- internals/sima-ai-cvu-sw/graphs/tessellate/config/tessellate_config.h mixes array fields with duplicated legacy scalar fields.
- internals/sima-ai-cvu-sw/graphs/detessellate/config/detessellate_config.h is already array-based but different from tess.
- internals/sima-ai-cvu-sw/graphs/quantize/config/quantize_config.h and internals/sima-ai-cvu-sw/graphs/dequantize/config/dequantize_config.h repeat the same pattern with different per-op fields.
- internals/sima-ai-cvu-sw/graphs/detessdequant/config/detessdequant_config.h is a larger multi-head variant of the same idea.
The result is continued ABI drift, harder roundtrip validation, and a higher risk of host/device mismatches.
Design Principles
- Logical tensor view and physical storage layout are different concepts.
- sizes + strides are authoritative for the logical tensor, like PyTorch.
- Tiled EV storage needs its own first-class descriptor.
- Bus addresses should be the preferred canonical address domain for EV dispatch.
- Per-op configs should stay small and readable.
- Versioning must be explicit.
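To make the "sizes + strides are authoritative" principle concrete, here is a minimal sketch of how a byte offset could be derived from a logical view alone. The struct name, the rank limit, and the helper are hypothetical and are not taken from the draft header; only the ideas of per-dimension sizes and byte strides come from this proposal.

```c
#include <stdint.h>

#define SKETCH_EV_MAX_RANK 6 /* hypothetical rank limit */

/* Hypothetical logical view: sizes in elements, strides in bytes. */
typedef struct {
    uint32_t rank;
    uint32_t sizes[SKETCH_EV_MAX_RANK];
    int64_t  strides_bytes[SKETCH_EV_MAX_RANK];
} sketch_ev_logical_view;

/* Writes the byte offset of element `index` to *out.
 * Returns 1 on success, 0 if the rank or any index is out of range. */
static int sketch_ev_element_offset(const sketch_ev_logical_view *v,
                                    const uint32_t *index, int64_t *out)
{
    if (v->rank == 0 || v->rank > SKETCH_EV_MAX_RANK) return 0;
    int64_t off = 0;
    for (uint32_t d = 0; d < v->rank; ++d) {
        if (index[d] >= v->sizes[d]) return 0;   /* bounds come from sizes */
        off += (int64_t)index[d] * v->strides_bytes[d];
    }
    *out = off;
    return 1;
}
```

For a contiguous 1x2x3x4 tensor of one-byte elements with strides {24, 12, 4, 1}, element (0, 1, 2, 3) lands at byte offset 23. A transposed or padded view changes only the strides, not the addressing code.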
Proposed Header
Draft header:
core/include/pipeline/EvTensorAbi.h
The proposal introduces:
- sima_ev_abi_header
- sima_ev_storage_desc
- sima_ev_strided_desc
- sima_ev_tiled_desc
- sima_ev_quant_desc
- sima_ev_tensor_desc
Then each kernel embeds those descriptors in a small op-specific config, for example:
- sima_ev_tess_config_v1
- sima_ev_detess_config_v1
- sima_ev_quantize_config_v1
- sima_ev_dequantize_config_v1
- sima_ev_detessdequant_config_v1
- sima_ev_preproc_config_v1
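As an illustration of what explicit versioning in a shared ABI header could look like, the sketch below checks a header before firmware trusts the rest of a config. Every name, field, and constant here is an assumption made for the example (hence the sketch_ prefix); the actual layout is whatever the draft header defines.

```c
#include <stdint.h>

#define SKETCH_EV_ABI_MAGIC   0x45564142u /* hypothetical tag value */
#define SKETCH_EV_ABI_VERSION 1u          /* hypothetical current version */

/* Hypothetical shared header placed at the start of every per-op config. */
typedef struct {
    uint32_t magic;        /* marks the blob as an EV ABI config */
    uint16_t abi_version;  /* bumped on any incompatible layout change */
    uint16_t op_kind;      /* which per-op wrapper follows */
    uint32_t struct_size;  /* total bytes of the enclosing config */
    uint32_t reserved;     /* must be zero */
} sketch_ev_abi_header;

typedef enum {
    SKETCH_EV_OK = 0,
    SKETCH_EV_BAD_MAGIC,
    SKETCH_EV_BAD_VERSION,
    SKETCH_EV_BAD_OP,
    SKETCH_EV_BAD_SIZE
} sketch_ev_status;

/* Rejects a config whose header does not match what this build expects. */
static sketch_ev_status sketch_ev_check_header(const sketch_ev_abi_header *h,
                                               uint16_t expected_op,
                                               uint32_t expected_size)
{
    if (h->magic != SKETCH_EV_ABI_MAGIC)         return SKETCH_EV_BAD_MAGIC;
    if (h->abi_version != SKETCH_EV_ABI_VERSION) return SKETCH_EV_BAD_VERSION;
    if (h->op_kind != expected_op)               return SKETCH_EV_BAD_OP;
    if (h->struct_size != expected_size || h->reserved != 0)
        return SKETCH_EV_BAD_SIZE;
    return SKETCH_EV_OK;
}
```

Carrying the struct size next to the version lets either side detect a host/device build mismatch before it reads any op-specific field.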
Why This Shape
Better Than Parallel Arrays
Parallel arrays like:
- input_width_array[32]
- input_height_array[32]
- tile_width_array[32]
- input_dtype_array[32]
are compact, but they are brittle:
- fields drift independently
- host/device generation mismatches are easy
- validation is awkward
- multi-IO graphs duplicate logic
An array of structs is better than a struct of parallel arrays for ABI stability here.
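The contrast can be sketched in a few lines. The parallel-array field names below are the ones listed above; their element types, the descriptor struct, and the validation rules are assumptions made for illustration only.

```c
#include <stdint.h>

#define SKETCH_MAX_IO 32

/* Legacy-style shape: the fields of one IO are related only by index.
 * Element types are assumed here. */
typedef struct {
    uint32_t input_width_array[SKETCH_MAX_IO];
    uint32_t input_height_array[SKETCH_MAX_IO];
    uint32_t tile_width_array[SKETCH_MAX_IO];
    uint32_t input_dtype_array[SKETCH_MAX_IO];
} sketch_parallel_config;

/* Array-of-structs shape: everything about one IO travels together. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t tile_width;
    uint32_t dtype;
} sketch_io_desc;

typedef struct {
    uint32_t       num_io;
    sketch_io_desc io[SKETCH_MAX_IO];
} sketch_aos_config;

/* Gathers one IO's fields from the parallel arrays into one descriptor. */
static sketch_io_desc sketch_io_from_parallel(const sketch_parallel_config *p,
                                              uint32_t i)
{
    sketch_io_desc d = { p->input_width_array[i], p->input_height_array[i],
                         p->tile_width_array[i], p->input_dtype_array[i] };
    return d;
}

/* One validator covers a whole IO; illustrative rules only. */
static int sketch_io_desc_is_valid(const sketch_io_desc *d)
{
    return d->width > 0 && d->height > 0 &&
           d->tile_width > 0 && d->tile_width <= d->width;
}

static int sketch_aos_config_is_valid(const sketch_aos_config *c)
{
    if (c->num_io == 0 || c->num_io > SKETCH_MAX_IO) return 0;
    for (uint32_t i = 0; i < c->num_io; ++i)
        if (!sketch_io_desc_is_valid(&c->io[i])) return 0;
    return 1;
}
```

With the descriptor form, adding a field touches one struct and one validator, rather than a new array that every graph has to keep index-aligned with the others.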
Better Than One Giant Universal Config Blob
One giant transform_config for every EV graph is too blunt:
- preproc has crop/resize/color params that do not belong to quantize
- quantize has rounding/saturation params that do not belong to tess
- detessdequant naturally wants many input/output descriptors
So the right layering is:
- shared descriptor family
- per-op wrappers
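A rough sketch of that layering follows: two per-op wrappers share the same leading descriptors and differ only in their op-specific tail. All struct and field names are hypothetical stand-ins, not the contents of the draft header.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared pieces (hypothetical fields). */
typedef struct {
    uint32_t abi_version;
    uint32_t struct_size;
} sketch_header;

typedef struct {
    uint64_t addr;
    uint32_t dtype;
    uint32_t rank;
    uint32_t sizes[4];
    int64_t  strides_bytes[4];
} sketch_tensor;

/* Per-op wrapper: shared prefix plus quantize-only parameters. */
typedef struct {
    sketch_header header;
    sketch_tensor input;
    sketch_tensor output;
    uint32_t rounding_mode;
    uint32_t saturation_mode;
} sketch_quantize_config;

/* Per-op wrapper: same prefix, preproc-only parameters. */
typedef struct {
    sketch_header header;
    sketch_tensor input;
    sketch_tensor output;
    uint32_t crop_x, crop_y, crop_w, crop_h;
    uint32_t color_format;
} sketch_preproc_config;

/* A shared helper works on the tensors of any wrapper. */
static uint64_t sketch_tensor_num_elements(const sketch_tensor *t)
{
    uint64_t n = 1;
    for (uint32_t d = 0; d < t->rank && d < 4; ++d) n *= t->sizes[d];
    return n;
}
```

Because both wrappers embed the same descriptor types, helpers such as the element count above can be written once, while crop or rounding fields stay out of configs that have no use for them.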
Mapping To Current EV Kernels
Tessellate
Logical input:
- dense NDHWC or NCHW-style tensor described by logical.sizes and logical.strides_bytes
Physical output:
- tiled descriptor with explicit tile sizes, tile traversal order, channel blocking, and tile alignment
Recommended config:
sima_ev_tess_config_v1
Detessellate
Logical output:
- dense tensor described exactly the same way as tess input
Physical input:
- tiled descriptor that must exactly match tess output layout
Recommended config:
sima_ev_detess_config_v1
This is the key contract win: tess and detess can roundtrip through the same descriptor family.
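One way a host-side validator might enforce that roundtrip contract is to compare the tiled descriptor emitted for tess against the one handed to detess. The fields below mirror the properties named for the tess output (tile sizes, traversal order, channel blocking, tile alignment), but their names, types, and the rank of 4 are assumptions.

```c
#include <stdint.h>

#define SKETCH_TILE_RANK 4 /* assumed number of tiled dimensions */

/* Hypothetical tiled-layout descriptor. */
typedef struct {
    uint32_t tile_sizes[SKETCH_TILE_RANK]; /* tile extent per dimension */
    uint32_t traversal_order;              /* enum value for tile walk order */
    uint32_t channel_block;                /* channels grouped per block */
    uint32_t tile_align_bytes;             /* alignment of each tile start */
} sketch_tiled_desc;

/* Field-by-field comparison, so padding bytes can never cause a
 * false mismatch the way a raw memcmp could. Returns 1 when equal. */
static int sketch_tiled_desc_equal(const sketch_tiled_desc *a,
                                   const sketch_tiled_desc *b)
{
    for (int d = 0; d < SKETCH_TILE_RANK; ++d)
        if (a->tile_sizes[d] != b->tile_sizes[d]) return 0;
    return a->traversal_order  == b->traversal_order &&
           a->channel_block    == b->channel_block &&
           a->tile_align_bytes == b->tile_align_bytes;
}
```

A check like this only becomes possible once both kernels describe their tiled side with the same struct, which is the point of sharing the descriptor family.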
Quantize
Logical input:
- dense tensor
Logical output:
- dense tensor
Quantization:
- carried in sima_ev_quant_desc
Per-op params:
- rounding mode
- saturation mode
Recommended config:
sima_ev_quantize_config_v1
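To show how the rounding and saturation parameters could interact with the quantization descriptor, here is a small reference-style sketch. It assumes affine int8 quantization with a scale and zero point, and it invents two rounding modes and two saturation modes for the example; the proposal does not specify which modes exist.

```c
#include <stdint.h>

/* Hypothetical per-op parameters. */
typedef enum {
    SKETCH_ROUND_HALF_AWAY   = 0, /* ties go away from zero */
    SKETCH_ROUND_TOWARD_ZERO = 1  /* truncate */
} sketch_rounding_mode;

typedef enum {
    SKETCH_SAT_CLAMP = 0, /* clamp to [-128, 127] */
    SKETCH_SAT_WRAP  = 1  /* keep the low 8 bits */
} sketch_saturation_mode;

/* Hypothetical quantization descriptor (affine, per-tensor). */
typedef struct {
    float   scale;
    int32_t zero_point;
} sketch_quant_desc;

/* Quantizes one float to int8 as round(x / scale) + zero_point. */
static int8_t sketch_quantize_i8(float x, const sketch_quant_desc *q,
                                 sketch_rounding_mode rm,
                                 sketch_saturation_mode sm)
{
    double r = (double)x / (double)q->scale;
    if (r != r) r = 0.0;            /* NaN guard */
    if (r > 1.0e9) r = 1.0e9;       /* keep the integer cast well defined */
    if (r < -1.0e9) r = -1.0e9;
    if (rm == SKETCH_ROUND_HALF_AWAY) r += (r >= 0.0) ? 0.5 : -0.5;
    int64_t v = (int64_t)r + q->zero_point; /* cast truncates toward zero */
    if (sm == SKETCH_SAT_CLAMP) {
        if (v > 127) v = 127;
        if (v < -128) v = -128;
        return (int8_t)v;
    }
    /* Wrap: take the low byte and reinterpret it as two's complement. */
    uint8_t low = (uint8_t)((uint64_t)v & 0xFFu);
    return (int8_t)(low >= 128 ? (int)low - 256 : (int)low);
}
```

The descriptor says what the numbers mean, while the two mode fields say how this particular op resolves ties and overflow, which is why they sit in the per-op wrapper rather than in the shared quantization descriptor.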
Dequantize
Logical input:
- dense tensor with quant metadata
Logical output:
- dense tensor with floating-point dtype
Recommended config:
sima_ev_dequantize_config_v1
QuantTess
This should not invent a new descriptor shape. It should be one op that:
- consumes a dense logical tensor with quant metadata
- produces a tiled output tensor
That can use the same sima_ev_tensor_desc building blocks plus a small combined-op wrapper.
DetessDequant
This is the main multi-head case. It should use:
- one shared ABI header
- a list of input tensor descriptors
- a list of output tensor descriptors
- a per-head params block where needed
Recommended config:
sima_ev_detessdequant_config_v1
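A possible shape for the multi-head case is sketched below. It assumes one input, one output, and one params block per head, with a fixed upper bound on heads; the real config may pair inputs and outputs differently. All names and fields are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MAX_HEADS 8 /* hypothetical upper bound */

/* Hypothetical, heavily reduced tensor descriptor. */
typedef struct {
    uint64_t addr;
    uint32_t size_bytes;
    uint32_t dtype;
} sketch_tensor_ref;

/* Hypothetical per-head dequantization parameters. */
typedef struct {
    float   scale;
    int32_t zero_point;
} sketch_head_params;

typedef struct {
    uint32_t           abi_version;
    uint32_t           num_heads;
    sketch_tensor_ref  inputs[SKETCH_MAX_HEADS];  /* tiled, quantized */
    sketch_tensor_ref  outputs[SKETCH_MAX_HEADS]; /* dense, floating point */
    sketch_head_params head[SKETCH_MAX_HEADS];
} sketch_detessdequant_config;

/* Returns 1 if every active head has usable buffers and a positive scale. */
static int sketch_detessdequant_is_valid(const sketch_detessdequant_config *c)
{
    if (c->num_heads == 0 || c->num_heads > SKETCH_MAX_HEADS) return 0;
    for (uint32_t i = 0; i < c->num_heads; ++i) {
        if (c->inputs[i].addr == 0 || c->inputs[i].size_bytes == 0) return 0;
        if (c->outputs[i].addr == 0 || c->outputs[i].size_bytes == 0) return 0;
        if (!(c->head[i].scale > 0.0f)) return 0; /* also rejects NaN */
    }
    return 1;
}
```

Keeping each head's tensors and parameters at the same index within descriptor arrays, rather than spreading them across many scalar arrays, keeps the validator a single loop.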
Address Domain
Why bus is the preferred canonical domain:
- EV hardware dereferences device-visible addresses, not host CPU physical addresses
- the current platform already pins several working graphs to bus/bus
- the current shared helper reflects that in internals/sima-ai-soc-pipeline/config_manager/include/configManagerCommon.h
- the current policy note records graph-specific correctness differences in internals/sima-ai-soc-pipeline/config_manager/doc/evxx_graph_address_policy.md
The proposal still keeps addr_space explicit during migration.
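Keeping addr_space explicit means a dispatcher can check the domain instead of assuming it. The sketch below classifies an input/output pair; the enum values, the storage struct, and the three outcomes are assumptions for illustration, with only the addr_space field name and the bus preference taken from this proposal.

```c
#include <stdint.h>

/* Hypothetical address domains. */
typedef enum {
    SKETCH_ADDR_BUS       = 0, /* device-visible bus address */
    SKETCH_ADDR_HOST_PHYS = 1  /* host CPU physical address */
} sketch_addr_space;

/* Hypothetical storage descriptor with an explicit domain tag. */
typedef struct {
    uint64_t addr;
    uint32_t addr_space; /* one of sketch_addr_space */
    uint32_t size_bytes;
} sketch_storage_desc;

typedef enum {
    SKETCH_DISPATCH_OK = 0,            /* both sides already bus/bus */
    SKETCH_DISPATCH_NEEDS_TRANSLATION, /* a host-physical side remains */
    SKETCH_DISPATCH_UNKNOWN_SPACE      /* unrecognised tag, reject */
} sketch_dispatch_check;

static sketch_dispatch_check
sketch_check_dispatch(const sketch_storage_desc *in,
                      const sketch_storage_desc *out)
{
    const sketch_storage_desc *both[2] = { in, out };
    int needs_translation = 0;
    for (int i = 0; i < 2; ++i) {
        if (both[i]->addr_space == SKETCH_ADDR_HOST_PHYS)
            needs_translation = 1;
        else if (both[i]->addr_space != SKETCH_ADDR_BUS)
            return SKETCH_DISPATCH_UNKNOWN_SPACE;
    }
    return needs_translation ? SKETCH_DISPATCH_NEEDS_TRANSLATION
                             : SKETCH_DISPATCH_OK;
}
```

During migration, graphs that are not yet on bus/bus would surface as the middle outcome rather than silently dereferencing the wrong kind of address.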
Migration Plan
- Freeze the descriptor vocabulary
- land the shared header
- do not change live kernels yet
- Convert host builders first
- make config-manager emit the new descriptors in parallel with old configs
- add roundtrip/validation helpers from the shared descriptors
- Convert single-IO kernels
- tess
- detess
- quantize
- dequantize
- Convert combined and multi-head kernels
- quanttess
- detessdequant
- preproc
- Delete legacy per-graph ABI structs
- once host and firmware both use the shared descriptor family
Non-Goals
- This proposal does not try to encode every preproc algorithm detail into the shared tensor descriptor.
- This proposal does not force every kernel to use one identical top-level config blob.
- This proposal does not remove graph-specific validation; it gives those validators a common descriptor vocabulary.
Recommendation
Adopt the shared descriptor ABI in:
core/include/pipeline/EvTensorAbi.h
Then migrate EV kernels in this order:
- tessellate
- detessellate
- quantize
- dequantize
- quanttess
- detessdequant
- preproc
That gives the cleanest path to one durable EV ABI without repeating the current half-legacy, half-array drift.