Previously the pattern to extract status flags from inline assembly
blocks was to use setcc in the block to write the flag to a register.
This was suboptimal in a few ways:
- It would lead to code like: sete %cl; test %cl,%cl; jne, i.e. a flag
  would be copied into a register only to be tested back into a flag.
- The setcc would force the block to use an additional register.
- If the client code did not use the flag value, the setcc would be
  entirely pointless but could not be eliminated by the optimizer.
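For illustration, the old pattern looks roughly like the sketch below.
The helper sub_is_zero_setcc is hypothetical and not taken from this
change; the non-x86 fallback exists only so the snippet compiles
everywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from this change): subtracts b from *a and
 * reports whether the result is zero, using the older setcc pattern. */
static inline bool sub_is_zero_setcc(uint64_t *a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned char zero;
    __asm__("subq %2, %0\n\t"
            "sete %1"           /* copy ZF into a byte register */
            : "+r"(*a), "=q"(zero)
            : "r"(b)
            : "cc");
    return zero;
#else
    /* Portable fallback so the sketch builds on other targets. */
    *a -= b;
    return *a == 0;
#endif
}
```

A caller that writes if (sub_is_zero_setcc(&x, y)) typically still has
to test the byte register again before it can branch, and the sete
stays in the block even when the result is ignored.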
A more modern inline asm construct (since gcc 6, and recent clang)
allows for "flag output operands", written as "=@cc<cond>" constraints,
where a C variable can be written directly from a flag. The optimizer
can then branch on the flag directly, so the flag does not take a trip
through a register, and it can drop the output entirely when the value
is unused.
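A minimal sketch of a flag output operand follows. The helper
sub_is_zero_flag is hypothetical and not taken from this change; the
sketch assumes the compiler defines __GCC_ASM_FLAG_OUTPUTS__ when the
feature is available and otherwise falls back to plain C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from this change): subtracts b from *a and
 * reports whether the result is zero via a flag output operand. */
static inline bool sub_is_zero_flag(uint64_t *a, uint64_t b)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
    bool zero;
    __asm__("subq %2, %0"
            : "+r"(*a), "=@ccz"(zero)   /* zero is read from ZF */
            : "r"(b));
    return zero;
#else
    /* Portable fallback so the sketch builds on other targets. */
    *a -= b;
    return *a == 0;
#endif
}
```

With this form, a caller that writes if (sub_is_zero_flag(&x, y)) gives
the compiler the chance to branch on ZF right after the subq, and no
extra register is reserved for the flag.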
In practice this makes each affected operation sequence shorter by five
bytes of instructions. It's unlikely this has a measurable performance
impact.