Fushed push with bounds check

discussion of forwardcom instruction set and corresponding hardware and software

Moderator: agner

Post Reply
James
Posts: 5
Joined: 2024-01-28, 18:02:15

Fushed push with bounds check

Post by James »

I don't know if it makes sense at the hardware level, but the common dynamic array push operation wants to do a bounds check either immediately before or after the write.
agner
Site Admin
Posts: 177
Joined: 2017-10-15, 8:07:27
Contact:

Re: Fushed push with bounds check

Post by agner »

The addressing mode with bounds check works only if the array size is fixed and known at compile time.
The push instruction will signal an error if writing to inaccessible memory.
The push instruction is already quite complex at the hardware level. Adding still more complexity might be critical. It would also need an extra operand for the stack limit.

I think it is easier to just insert a compare instruction before or after the push. The compare instruction can jump to an error branch which is more efficient than an error trap.
HubertLamontagne
Posts: 80
Joined: 2017-11-17, 21:39:51

Re: Fushed push with bounds check

Post by HubertLamontagne »

Yeah, this is std::vector::push_back(), right? This is at least 5 micro-ops even in the very best case:
- Load [array size, current allocation size, pointer to data]
- (second micro-op from 24byte unaligned vector load)
- Increment array size and check that it's lower or equal to the allocated size, else jump to an allocation routine
- Write [new array size]
- Write [data] at [pointer + old array size]

Compare this to 7 micro-ops in a more conventional architecture:
- Load [array size]
- Load [current allocation size]
- Load [data pointer]
- Increment array size
- Compare and jump to allocation routine if larger
- Write [new array size]
- Write [data] at [pointer + old array size]
Here's example of what a C++ compiler outputs for this: https://godbolt.org/z/vYhMYbeWE

Essentially your savings would come from 2 sources:

- Squeezing the 3 initial loads into 2 loads: That's a lot of lifting to save 1 contiguous memory load in my opinion. I think conventional architectures have better schemes to deal with this, such as adding more data cache load ports, data cache banking (so that the loads come from different banks), and contiguous multi-register load instructions such as ARM64's Load Pair instruction (LDP).

- Combining the array size increment and the compare-and-jump together: That's actually not crazy. Incrementing a register by a small immediate and comparing to another register and jumping is a common sequence of operations, you see it often on loop ends. It would still be a kinda large instruction but it has many properties that make it OK for the typical pipelines: it only reads 2 registers, it only writes 1 register, it only does 1 conditional jump and always to the same address, it doesn't load or save from memory or cause traps. It's also OK for compilers (they can simply combine the contiguous ADD and the COMPARE-AND-BRANCH-IF-LARGER).
James
Posts: 5
Joined: 2024-01-28, 18:02:15

Re: Fushed push with bounds check

Post by James »

Nice
It would also allow pushing vector registers without needing to carry around their size separately
Post Reply