Serpent OS

Boulder: Default -march
Open, High, Public


Support for the new x86-64-v2, x86-64-v3 and x86-64-v4 -march levels appears in the upcoming GCC 11 and Clang 12 releases. Boulder is currently plumbed to use -march=haswell as the default, as it covers the Haswell and Zen generations onwards. Switching to x86-64-v3 may reduce performance without meaningfully changing the set of supported CPUs.

Differences in the compilers
The main difference between -march=haswell and x86-64-v3 is that x86-64-v3 drops fsgsbase, pclmul, rdrnd and xsaveopt, all of which are supported by Zen CPUs. That doesn't seem like a positive, as software like zlib utilises pclmul explicitly. We may need to add a couple of options to the FLAGS to ensure full compatibility (either enabling these features explicitly or excluding some others).

Compiler-specific extras enabled by haswell but not by x86-64-v3:
GCC: PTA_HLE (-mno-hle; deprecated in Clang years ago)
Clang: FeatureERMSB (doesn't seem to be plumbed in) | FeatureINVPCID (-mno-invpcid)

You can check which target options a given -march level enables by running gcc -march=LEVEL -Q --help=target once for each of haswell, x86-64-v3 and znver1.
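A minimal Python sketch of the same check, assuming gcc is on PATH and that every option line of the -Q --help=target output has the form "-option   value" (as in the excerpt later in this task). The helper names parse_target_options and query_target_options are hypothetical. It invokes gcc once per level, because a shell would expand -march={haswell,x86-64-v3,znver1} into three -march= flags on a single command line.

```python
from __future__ import annotations

import subprocess


def parse_target_options(text: str) -> dict[str, str]:
    """Map each option (e.g. '-mpclmul') to its reported value (e.g. '[enabled]')."""
    options: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: option lines start with '-'; section headers and
        # lists of valid arguments do not, so they are skipped.
        if not stripped.startswith("-"):
            continue
        parts = stripped.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def query_target_options(march: str) -> dict[str, str]:
    """Run gcc for a single -march level and parse its target option report."""
    result = subprocess.run(
        ["gcc", f"-march={march}", "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_target_options(result.stdout)
```

For example, query_target_options("x86-64-v3").get("-mpclmul") should report [disabled] with GCC 11, matching the diff in the timeline.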


  • Wait for the new compiler releases
    • Clang 12 (in bootstrap)
    • GCC 11 (tested in bootstrap)
  • Benchmark a small core system built with haswell vs x86-64-v3 (unnecessary, since the differences are documented in the timeline below)
  • Determine which approach to take

Event Timeline

sunnyflunk created this task.
sunnyflunk updated the task description.

Spot the difference (and note that it matches my reading of the source):

< #define __tune_haswell__ 1
< #define __core_avx2__ 1
< #define __XSAVEOPT__ 1
< #define __core_avx2 1
< #define __PCLMUL__ 1
< #define __haswell 1
< #define __tune_core_avx2__ 1
< #define __RDRND__ 1
< #define __FSGSBASE__ 1
< #define __haswell__ 1

> #define __k8 1
> #define __k8__ 1

<   -march=                                     haswell
>   -march=                                     x86-64-v3

<   -mfsgsbase                                  [enabled]
>   -mfsgsbase                                  [disabled]

<   -mhle                                       [enabled]
>   -mhle                                       [disabled]

<   -mpclmul                                    [enabled]
>   -mpclmul                                    [disabled]

<   -mrdrnd                                     [enabled]
>   -mrdrnd                                     [disabled]

<   -mtune=                                     haswell
>   -mtune=                                     generic

<   -mxsaveopt                                  [enabled]
>   -mxsaveopt                                  [disabled]
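The option half of this comparison can be reproduced with a small Python sketch, assuming both gcc outputs have already been parsed into {option: value} mappings. diff_target_options is a hypothetical helper; the values for the differing options are copied from the diff above, and -mavx2 (enabled at both levels) is added only to show that unchanged options are filtered out.

```python
from __future__ import annotations


def diff_target_options(
    a: dict[str, str], b: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {option: (value_in_a, value_in_b)} for every option whose value differs.

    An option present on only one side is reported as None on the other side.
    """
    return {
        opt: (a.get(opt), b.get(opt))
        for opt in sorted(a.keys() | b.keys())
        if a.get(opt) != b.get(opt)
    }


# Differing values copied from the GCC 11 diff; -mavx2 is the same at both
# levels and is included to show that it does not appear in the result.
haswell = {
    "-march=": "haswell", "-mtune=": "haswell", "-mavx2": "[enabled]",
    "-mfsgsbase": "[enabled]", "-mhle": "[enabled]", "-mpclmul": "[enabled]",
    "-mrdrnd": "[enabled]", "-mxsaveopt": "[enabled]",
}
x86_64_v3 = {
    "-march=": "x86-64-v3", "-mtune=": "generic", "-mavx2": "[enabled]",
    "-mfsgsbase": "[disabled]", "-mhle": "[disabled]", "-mpclmul": "[disabled]",
    "-mrdrnd": "[disabled]", "-mxsaveopt": "[disabled]",
}

for opt, (old, new) in diff_target_options(haswell, x86_64_v3).items():
    print(f"{opt:<12} {old} -> {new}")
```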

Adding -mtune=haswell -mfsgsbase -mpclmul -mrdrnd -mxsaveopt to -march=x86-64-v3 makes it functionally equivalent to -march=haswell (only -mhle differs, and HLE isn't available on Zen chips anyway).
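The reasoning above can be sketched in Python: start from x86-64-v3, re-enable every feature the diff shows haswell has but x86-64-v3 lacks, and skip any feature deliberately left out (here hle, which Zen does not implement). haswell_equivalent_flags and its parameters are hypothetical names used only for illustration.

```python
from __future__ import annotations


def haswell_equivalent_flags(
    base_march: str, tune: str, haswell_only: list[str], exclude: set[str]
) -> str:
    """Build -march/-mtune plus the -m<feature> flags that close the gap to haswell."""
    flags = [f"-march={base_march}", f"-mtune={tune}"]
    # Re-enable each feature that haswell has and the base level lacks,
    # unless it has been deliberately excluded.
    flags += [f"-m{feat}" for feat in haswell_only if feat not in exclude]
    return " ".join(flags)


# Features reported [enabled] for haswell but [disabled] for x86-64-v3 (GCC 11).
haswell_only = ["fsgsbase", "hle", "pclmul", "rdrnd", "xsaveopt"]

print(haswell_equivalent_flags("x86-64-v3", "haswell", haswell_only, exclude={"hle"}))
# prints: -march=x86-64-v3 -mtune=haswell -mfsgsbase -mpclmul -mrdrnd -mxsaveopt
```

Passing "skylake" as tune gives the alternative flag set proposed in the next paragraph.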

Based on this, no benchmarks are necessary. We will either keep haswell or move to -march=x86-64-v3 -mtune=skylake -mfsgsbase -mpclmul -mrdrnd -mxsaveopt (note that, going by the GCC output, -mtune doesn't actually change anything), so that the oddities of -mhle (and -minvpcid with Clang) are dropped from the compiler flags.