Paths

Table of Contentst

amd64 atomic.h: minor codegen optimization in flag access
ClosedPublic
Actions

Authored by rlibby on Feb 28 2020, 5:50 AM.

Details

Reviewers

kib
markj
mjg

Commits

rS358439: amd64 atomic.h: minor codegen optimization in flag access

Summary

Previously the pattern to extract status flags from inline assembly
blocks was to use setcc in the block to write the flag to a register.
This was suboptimal in a few ways:

It would lead to code like: sete %cl; test %cl; jne, i.e. a flag would just be loaded into a register and then reloaded to a flag.
The setcc would force the block to use an additional register.
If the client code didn't care for the flag value then the setcc would be entirely pointless but could not be eliminated by the optimizer.

A more modern inline asm construct (since gcc 6, and recent clang)
allows for "flag output operands", where a C variable can be written
directly from a flag. The optimizer can then use this to produce direct
code where the flag does not take a trip through a register.

In practice this makes each affected operation sequence shorter by five
bytes of instructions. It's unlikely this has a measurable performance
impact.

Test Plan

kyua test -k /usr/tests/sys/Kyuafile

Manual inspection of generated code. E.g.

generic_stop_cpus:

Before:

ffffffff80c23fff: b8 ff ff ff ff        movl    $4294967295, %eax
ffffffff80c24004: f0                    lock
ffffffff80c24005: 0f b1 0d 88 17 cf 00  cmpxchgl        %ecx, 13571976(%rip)
ffffffff80c2400c: 0f 94 c1              sete    %cl
ffffffff80c2400f: 84 c9                 testb   %cl, %cl
ffffffff80c24011: 0f 84 0b 01 00 00     je      267 <generic_stop_cpus+0x1a2>

After:

ffffffff80c23907: b8 ff ff ff ff        movl    $4294967295, %eax
ffffffff80c2390c: f0                    lock
ffffffff80c2390d: 0f b1 0d 80 1e cf 00  cmpxchgl        %ecx, 13573760(%rip)
ffffffff80c23914: 74 2e                 je      46 <generic_stop_cpus+0xc4>

smr_lazy_advance:

Before:

ffffffff80c25ebc: 8b 46 0c              movl    12(%rsi), %eax
ffffffff80c25ebf: 89 c2                 movl    %eax, %edx
ffffffff80c25ec1: 29 da                 subl    %ebx, %edx
ffffffff80c25ec3: 85 d2                 testl   %edx, %edx
ffffffff80c25ec5: 7e 08                 jle     8 <smr_advance+0x10f>
ffffffff80c25ec7: f0                    lock
ffffffff80c25ec8: 0f b1 5e 0c           cmpxchgl        %ebx, 12(%rsi)
ffffffff80c25ecc: 0f 94 c2              sete    %dl
ffffffff80c25ecf: 89 c8                 movl    %ecx, %eax
ffffffff80c25ed1: f0                    lock
ffffffff80c25ed2: 0f b1 5e 08           cmpxchgl        %ebx, 8(%rsi)
ffffffff80c25ed6: 0f 94 c1              sete    %cl

After:

ffffffff80c257bc: 8b 46 0c              movl    12(%rsi), %eax
ffffffff80c257bf: 89 c2                 movl    %eax, %edx
ffffffff80c257c1: 29 da                 subl    %ebx, %edx
ffffffff80c257c3: 85 d2                 testl   %edx, %edx
ffffffff80c257c5: 7e 05                 jle     5 <smr_advance+0x10c>
ffffffff80c257c7: f0                    lock
ffffffff80c257c8: 0f b1 5e 0c           cmpxchgl        %ebx, 12(%rsi)
ffffffff80c257cc: 89 c8                 movl    %ecx, %eax
ffffffff80c257ce: f0                    lock
ffffffff80c257cf: 0f b1 5e 08           cmpxchgl        %ebx, 8(%rsi)

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rlibby created this revision.Feb 28 2020, 5:50 AM

Harbormaster completed remote builds in B29661: Diff 68928.Feb 28 2020, 5:50 AM

Thank you for this.

This revision is now accepted and ready to land.Feb 28 2020, 12:36 PM

I only have a minor comment that the feature is somewhat new in clang, I don't know if we care [that is, if we should version check]. I also happened to have a complete rewrite here D23661, but I think this should go in regardless of what happens with that review.

markj accepted this revision.Feb 28 2020, 2:54 PM

In D23869#524905, @mjg wrote:

I only have a minor comment that the feature is somewhat new in clang, I don't know if we care [that is, if we should version check]. I also happened to have a complete rewrite here D23661, but I think this should go in regardless of what happens with that review.

Thanks for the pointer.

Yes, clang 9, it looks like. I'll update the commit log with that version number.

Given that this is not MFC'd, it should be enough that the in-tree clang version supports it, I think? I'd rather not add #ifdef __GCC_ASM_FLAG_OUTPUTS__. That would be a possibility, but maybe worse than just doing nothing.

Yea, just go for it. Should anyone complain we can revisit.

Closed by commit rS358439: amd64 atomic.h: minor codegen optimization in flag access (authored by rlibby). · Explain WhyFeb 28 2020, 6:32 PM

This revision was automatically updated to reflect the committed changes.

rlibby added a commit: rS358439: amd64 atomic.h: minor codegen optimization in flag access.

Herald added a subscriber: imp. · View Herald TranscriptFeb 28 2020, 6:32 PM

How does this compare to using the __atomic compiler builtins?

I've changed CheriBSD RISCV to use them (https://github.com/CTSRD-CHERI/cheribsd/blob/master/sys/riscv/include/atomic.h) and am considering submitting a patch to use them for all architectures.

In D23869#525257, @arichardson wrote:

How does this compare to using the __atomic compiler builtins?

I've changed CheriBSD RISCV to use them (https://github.com/CTSRD-CHERI/cheribsd/blob/master/sys/riscv/include/atomic.h) and am considering submitting a patch to use them for all architectures.

Please see D23661

This has been supported since GCC 6/clang 9 (https://reviews.llvm.org/D57394). It would be better if the logic were conditioned on __GCC_ASM_FLAG_OUTPUTS__.

Now, on FreeBSD 12.1 (cc is clang 8.0.1), I cannot build world with WITHOUT_CLANG_BOOTSTRAP=yes...

Revision Contents
Changeset List

Path

Size

head/

sys/

amd64/

include/

atomic.h

18 lines

Diff 68957

View Options

head/sys/amd64/include/atomic.h

amd64 atomic.h: minor codegen optimization in flag accessClosedPublicActions