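Background

RISC-V and vendor extensions

RISC-V is a modular instruction set architecture. On top of the base instruction set there are official standard extensions, and hardware vendors are also allowed to define their own extensions.
https://en.wikipedia.org/wiki/RISC-V#ISA_base_and_extensions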
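Take the TH1520 SoC on the LicheePi 4A as an example. Its processor core is the XuanTie C910, and the supported instruction set is rv64gc_zfh_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadcondmov_xtheadfmemidx_xtheadfmv_xtheadint_xtheadmac_xtheadmemidx_xtheadmempair_xtheadsync

Here "g" is shorthand for imafd_zicsr_zifencei, and rv64i denotes the RISC-V 64-bit base integer instruction set. The remaining single letters and the names starting with "z" are standard extensions. Names starting with "x" are vendor extensions, and thead is the prefix used by Alibaba's T-Head: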
Standard extensions

    i: base integer ISA
    m: multiply / division
    a: atomic
    f: single-precision floating-point
    d: double-precision floating-point
    zicsr: control and status registers
    zifencei: i-cache flush (for i-d consistency)
    c: compressed (16-bit) instructions
    zfh: half-precision floating-point
T-Head vendor extensions

    xtheadba: address calculations
    xtheadbb: basic bit-manipulation
    xtheadbs: single-bit instructions
    xtheadcmo: cache management operations
    xtheadcondmov: conditional moves
    xtheadfmemidx: floating-point memory operations
    xtheadfmv: double floating-point high-bit data transmission instructions
    xtheadint: acceleration interruption instructions
    xtheadmac: multiply-accumulate
    xtheadmemidx: GPR memory operations
    xtheadmempair: two-GP-register memory operations
    xtheadsync: multi-processor synchronization

https://wiki.sipeed.com/hardware/zh/lichee/th1520/lp4a.html
https://www.xrvm.cn/product/xuantie/C910
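The naming rule described above is mechanical enough to sketch in a few lines of shell. The helper name classify_march is made up for illustration, and it only applies the simplified rule stated in this post: the first part is the base plus single-letter extensions, "z" names are standard extensions, "x" names are vendor extensions. Anything else is reported as "other".

```shell
# Hypothetical helper: split an ISA string on '_' and label each part
# according to the simplified naming rule described in this post.
classify_march() {
    (
        # The subshell keeps the IFS and globbing changes local.
        IFS=_
        set -f
        first=1
        for part in $1; do
            if [ "$first" = 1 ]; then
                # e.g. rv64gc: base ISA plus single-letter extensions
                echo "base+single-letter: $part"
                first=0
                continue
            fi
            case $part in
                z*) echo "standard: $part" ;;
                x*) echo "vendor: $part" ;;
                *)  echo "other: $part" ;;
            esac
        done
    )
}

classify_march "rv64gc_zfh_xtheadba_xtheadsync"
```

Running it on the shortened string above labels rv64gc as the base part, zfh as standard, and the two xthead names as vendor extensions.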
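Mainline toolchains have gradually added support for these extensions. We can enable them with the gcc -march option, which helps programs run faster on hardware that implements them.
https://gcc.gnu.org/gcc-13/changes.html#riscv
https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Options.html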
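NixOS for the LicheePi 4A

Others have already built a NixOS system image for the LicheePi 4A by cross-compiling on x86_64. This post tries to enable vendor extension support on top of that work.
https://thiscute.world/posts/how-nixos-start-on-licheepi4a/
https://github.com/ryan4yin/nixos-licheepi4a

Although the process described here is based on the LicheePi 4A, the same logic should apply to any RISC-V vendor extension.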
The problem

The good news is that, while adding ARM support, NixOS already gained fairly complete support for cross-compilation and for the gcc -march option. We can specify the cross-compilation target and the gcc options roughly like this:

    nixpkgs.crossSystem = {
      config = "riscv64-unknown-linux-gnu";
      gcc.arch = "rv64gc_zfh_xtheadba_...";
      gcc.abi = "lp64d";
    };

https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Options.html
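The bad news is that enabling the extensions in gcc alone is not enough. The NixOS cross-compilation process uses QEMU to emulate the target architecture so that it can run the freshly built programs, for example to run tests or to generate initial configuration.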
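The good news is that QEMU's support for vendor-defined extensions is also steadily improving. We can specify the instruction set with the -cpu option or the QEMU_CPU environment variable:

    qemu-riscv64 -cpu rv64gc,zfh=true,xtheadba=true,... <binary>

    QEMU_CPU="rv64gc,zfh=true,xtheadba=true,..." qemu-riscv64 <binary>

https://wiki.qemu.org/ChangeLog/8.0#RISC-V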
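Since the gcc -march string and the QEMU CPU string in the commands above carry the same list of extensions, one can be derived from the other. Below is a minimal sketch; march_to_qemu_cpu is a hypothetical helper, not part of nixpkgs or QEMU. It assumes that the first part of the -march string is accepted as-is by the QEMU build in use and that every extension has a QEMU property of the same name, which is worth checking with qemu-riscv64 -cpu help.

```shell
# Hypothetical helper: turn "rv64gc_zfh_xtheadba" into
# "rv64gc,zfh=true,xtheadba=true", the format used in this post.
# Assumption: QEMU accepts the first part and knows every extension name.
march_to_qemu_cpu() {
    (
        # The subshell keeps the IFS and globbing changes local.
        IFS=_
        set -f
        out=
        for part in $1; do
            if [ -z "$out" ]; then
                out=$part               # first part is kept unchanged
            else
                out="$out,$part=true"   # each extension becomes name=true
            fi
        done
        printf '%s\n' "$out"
    )
}

march_to_qemu_cpu "rv64gc_zfh_xtheadba_xtheadbb"
# prints: rv64gc,zfh=true,xtheadba=true,xtheadbb=true
```

Generating the string this way keeps gcc.arch and the QEMU parameter in sync, so an extension cannot be enabled in the compiler but forgotten in the emulator.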
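The bad news is that the NixOS cross-compilation process offers no direct way to specify QEMU's CPU parameter. Worse, the build invokes QEMU in many different ways, some implicit and some explicit, such as through meson and binfmt. With no unified way of invoking QEMU, there is no unified way of passing the parameter either.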
For example, when building fontconfig, NixOS invokes QEMU by hand to run fc-cache and generate the cache:
https://github.com/NixOS/nixpkgs/blob/7d66df760c0d524479a6f946e34963fb055211e0/pkgs/development/libraries/fontconfig/make-fonts-cache.nix

    runCommand "fc-cache"
      {
      }
      ''
        # ...
        ${stdenv.hostPlatform.emulator buildPackages} ${lib.getExe' fontconfig "fc-cache"} -sv
        # ...
      ''
Similarly, when building gdk-pixbuf, NixOS invokes QEMU by hand in the postInstall hook script:
https://github.com/NixOS/nixpkgs/blob/291addf97dbb30867590494b0dab8ffbb39abd20/pkgs/development/libraries/gdk-pixbuf/default.nix

    postInstall =
      ''
        # ...
      ''
      + lib.optionalString withIntrospection ''
        # We need to install 'loaders.cache' in lib/gdk-pixbuf-2.0/2.10.0/
        ${stdenv.hostPlatform.emulator buildPackages} $dev/bin/gdk-pixbuf-query-loaders --update-cache
      '';
Other packages rely on the exe_wrapper setting in the meson cross-compilation file cross-file.conf so that meson invokes QEMU automatically:

    nativeBuildInputs = [
    ]
    ++ lib.optionals (!stdenv.buildPlatform.canExecute stdenv.hostPlatform) [
      mesonEmulatorHook
    ];

https://github.com/NixOS/nixpkgs/blob/fba6f87e2635373ac77608841d9a239ea35a410d/pkgs/top-level/all-packages.nix#L2265-L2284

    mesonEmulatorHook = makeSetupHook {
      name = "mesonEmulatorHook";
      substitutions = {
        crossFile = writeText "cross-file.conf" ''
          [binaries]
          exe_wrapper = '${lib.escape [ "'" "\\" ] (stdenv.targetPlatform.emulator pkgs)}'
        '';
      };
    } (
      if (!stdenv.hostPlatform.canExecute stdenv.targetPlatform) then
        ../by-name/me/meson/emulator-hook.sh
      else
        throw "mesonEmulatorHook may only be added to nativeBuildInputs when the target binaries can't be executed; however you are attempting to use it in a situation where ${stdenv.hostPlatform.config} can execute ${stdenv.targetPlatform.config}. Consider only adding mesonEmulatorHook according to a conditional based canExecute in your package expression."
    );
Finally, there are special cases. When building gobject-introspection, NixOS invokes QEMU through a custom wrapper script:
https://github.com/NixOS/nixpkgs/blob/fa42801050c1d56f70c783cf5f43fd79f3ab542a/pkgs/development/libraries/gobject-introspection/wrapper.nix

    buildCommand = ''
      # ...
      export emulator=${lib.escapeShellArg (stdenv.targetPlatform.emulator buildPackages)}
      export emulatorwrapper="$dev/bin/g-ir-scanner-qemuwrapper"
      # ...
      substituteAll "${./wrappers/g-ir-scanner-qemuwrapper.sh}" "$dev/bin/g-ir-scanner-qemuwrapper"
    ''

https://github.com/NixOS/nixpkgs/blob/fa42801050c1d56f70c783cf5f43fd79f3ab542a/pkgs/development/libraries/gobject-introspection/wrappers/g-ir-scanner-qemuwrapper.sh

    exec @emulator@ ${GIR_EXTRA_OPTIONS:-} \
        ${GIR_EXTRA_LIBS_PATH:+-E LD_LIBRARY_PATH="${GIR_EXTRA_LIBS_PATH}"} \
        "$@"
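To make matters worse, RISC-V reserves part of the instruction encoding space for vendor-defined extensions, and vendors are not required to keep their encodings from colliding with other non-standard extensions. Given a raw binary, QEMU therefore has no way to tell from the encoding alone which vendor and which extension an instruction belongs to. The correct CPU parameter has to be passed to QEMU by hand. Otherwise the program hits an illegal instruction exception, or QEMU may even emulate the wrong behavior.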
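Attempted solution

nix's flexible overlay mechanism gives us a chance to pass the parameter correctly without modifying most of nixpkgs. We can try to add the -cpu option everywhere QEMU is invoked, or set the QEMU_CPU environment variable before the invocation.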
That said, I have only been using nix for about half a year, so the approach below may not be optimal. Treat it as a reference only.
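The two ways of passing the CPU string mentioned above can be put side by side as a minimal sketch. The function names are hypothetical. The only thing assumed about the emulator is what this post already relies on: it accepts -cpu on the command line and reads QEMU_CPU from the environment.

```shell
# Hypothetical helper: pass the CPU string as a command-line option.
# $1: emulator command, $2: CPU string, rest: program and its arguments
run_with_cpu_flag() {
    emulator=$1 cpu=$2
    shift 2
    "$emulator" -cpu "$cpu" "$@"
}

# Hypothetical helper: pass the CPU string through the environment,
# leaving the emulator command line untouched.
run_with_cpu_env() {
    emulator=$1 cpu=$2
    shift 2
    QEMU_CPU=$cpu "$emulator" "$@"
}
```

For instance, run_with_cpu_flag qemu-riscv64 "rv64gc,zfh=true" ./binary changes the command line, while run_with_cpu_env leaves it alone. The second form is the one that fits build scripts where the QEMU command line is generated elsewhere and cannot easily be edited.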
mesonEmulatorHook

Many packages reuse mesonEmulatorHook to pass parameters to meson through cross-file.conf, so fixing this one hook fixes most of the problem.

Note that the exe_wrapper setting in cross-file.conf can be a list. Replacing the single $emulator string with a list such as [$emulator, '-cpu', $qemu-cpu] is enough to pass the parameter to QEMU.

The overlay is implemented as follows:

overlay.nix

    {
      qemu-cpu,
      ...
    }:

    self: super: {
      mesonEmulatorHook = super.makeSetupHook {
        name = "mesonEmulatorHook";
        substitutions = {
          crossFile = super.writeText "cross-file.conf" ''
            [binaries]
            exe_wrapper = ['${super.lib.escape [ "'" "\\" ] (super.stdenv.targetPlatform.emulator super.buildPackages)}', '-cpu', '${qemu-cpu}']
          '';
        };
      } (
        if (super.stdenv.buildPlatform != super.stdenv.targetPlatform) then
          ./emulator-hook.sh
        else
          throw "mesonEmulatorHook may only be added to nativeBuildInputs when the target binaries can't be executed; however you are attempting to use it in a situation where ${super.stdenv.hostPlatform.config} can execute ${super.stdenv.targetPlatform.config}. Consider only adding mesonEmulatorHook according to a conditional based canExecute in your package expression."
      );
    }
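For reference, the cross file that the overlay above writes should look roughly like what the following sketch prints. write_cross_file is a hypothetical helper and /path/to/qemu-riscv64 is a placeholder for the store path of the emulator. Unlike the overlay, this sketch does not escape quotes or backslashes in its arguments.

```shell
# Hypothetical helper: print a meson cross file whose exe_wrapper is a
# list, so that extra arguments reach the emulator.
# $1: emulator path (placeholder below), $2: QEMU CPU string
write_cross_file() {
    printf '[binaries]\n'
    printf "exe_wrapper = ['%s', '-cpu', '%s']\n" "$1" "$2"
}

write_cross_file /path/to/qemu-riscv64 "rv64gc,zfh=true,xtheadba=true"
# prints:
# [binaries]
# exe_wrapper = ['/path/to/qemu-riscv64', '-cpu', 'rv64gc,zfh=true,xtheadba=true']
```

With the list form, meson prepends all three elements to every target binary it needs to run, so the CPU string is passed on each invocation.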
Case-by-case fixes

Most of the remaining cases invoke QEMU by hand in a build script, like the fontconfig example above. For these we can simply override the build script:

make-fonts-cache.nix

    {
      buildPackages,
      fontconfig,
      lib,
      runCommand,
      stdenv,
      qemu-cpu,
    }:

    runCommand "fc-cache"
      {
      }
      ''
        # ...
        export QEMU_CPU=${qemu-cpu} # added
        ${stdenv.hostPlatform.emulator buildPackages} ${lib.getExe' fontconfig "fc-cache"} -sv
        # ...
      ''

Then create an overlay that uses the new build script:

overlay.nix

    {
      qemu-cpu,
      ...
    }:

    self: super: {
      makeFontsCache =
        if (super.stdenv.buildPlatform != super.stdenv.targetPlatform) then
          super.callPackage ./make-fonts-cache.nix { inherit qemu-cpu; }
        else
          super.makeFontsCache;
    }
Packages that invoke QEMU from hooks such as postInstall are even easier to fix. An overlay that adds an export QEMU_CPU line in front of the original hook script is enough:

overlay.nix

    {
      qemu-cpu,
      ...
    }:

    self: super: {
      gdk-pixbuf =
        if (super.stdenv.buildPlatform != super.stdenv.targetPlatform) then
          super.gdk-pixbuf.overrideAttrs (old: {
            postInstall = ''
              export QEMU_CPU=${qemu-cpu}
            ''
            + old.postInstall;
          })
        else
          super.gdk-pixbuf;
    }
gobject-introspection

For this special case, we modify the wrapper script to add an emulatorargs parameter that is passed straight to QEMU:

g-ir-scanner-qemuwrapper.sh

    exec @emulator@ @emulatorargs@ ${GIR_EXTRA_OPTIONS:-} \
        ${GIR_EXTRA_LIBS_PATH:+-E LD_LIBRARY_PATH="${GIR_EXTRA_LIBS_PATH}"} \
        "$@"

Then, in the overlay, we extend the original buildCommand so that it installs the new wrapper script and passes it the emulatorargs parameter:

overlay.nix

    {
      qemu-cpu,
      ...
    }:

    self: super: {
      gobject-introspection =
        if (super.stdenv.buildPlatform != super.stdenv.targetPlatform) then
          super.gobject-introspection.overrideAttrs (old: {
            buildCommand = old.buildCommand + ''
              (
                export emulator=${super.lib.escapeShellArg (super.stdenv.targetPlatform.emulator super.buildPackages)}
                export emulatorargs="-cpu ${qemu-cpu}"
                substituteAll "${./g-ir-scanner-qemuwrapper.sh}" "$dev/bin/g-ir-scanner-qemuwrapper"
                chmod +x "$dev/bin/g-ir-scanner-qemuwrapper"
              )
            '';
          })
        else
          super.gobject-introspection;
    }
While writing this, it occurred to me that emulator is just an arbitrary string. The extra emulatorargs parameter is unnecessary, so the wrapper script does not need to change at all. Overriding buildCommand in the overlay is enough:

overlay.nix

    export emulator="${super.lib.escapeShellArg (super.stdenv.targetPlatform.emulator super.buildPackages)} -cpu ${qemu-cpu}"
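To see why this works, here is a small sketch that imitates the textual substitution with sed. It is only a stand-in for substituteAll, the template is a simplified one-line version of the wrapper script, and the emulator path is a placeholder.

```shell
# Hypothetical helper: substitute @emulator@ in a simplified wrapper
# template, using sed as a stand-in for substituteAll.
# $1: the value of the emulator variable
render_wrapper() {
    printf '%s\n' 'exec @emulator@ ${GIR_EXTRA_OPTIONS:-} "$@"' |
        sed "s|@emulator@|$1|"
}

render_wrapper "'/path/to/qemu-riscv64' -cpu rv64gc,zfh=true"
# prints: exec '/path/to/qemu-riscv64' -cpu rv64gc,zfh=true ${GIR_EXTRA_OPTIONS:-} "$@"
```

Because the substitution is purely textual, the quoted emulator path and the extra -cpu argument end up as separate words on the exec line of the generated script.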
Results

With these changes, a NixOS image with the vendor extensions enabled builds successfully. After flashing it to the board, gcc -v shows the xthead vendor extensions in the compiler configuration:

    Using built-in specs.
    COLLECT_GCC=/nix/store/8algs7jgq516m2v1di5v15l9n1d7c288-gcc-riscv64-unknown-linux-gnu-13.2.0/bin/gcc
    COLLECT_LTO_WRAPPER=/nix/store/8algs7jgq516m2v1di5v15l9n1d7c288-gcc-riscv64-unknown-linux-gnu-13.2.0/libexec/gcc/riscv64-unknown-linux-gnu/13.2.0/lto-wrapper
    Target: riscv64-unknown-linux-gnu
    Configured with: ../gcc-13.2.0/configure --prefix=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-riscv64-unknown-linux-gnu-13.2.0 --with-gmp-include=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gmp-with-cxx-riscv64-unknown-linux-gnu-6.3.0-dev/include --with-gmp-lib=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gmp-with-cxx-riscv64-unknown-linux-gnu-6.3.0/lib --with-mpfr-include=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-mpfr-riscv64-unknown-linux-gnu-4.2.1-dev/include --with-mpfr-lib=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-mpfr-riscv64-unknown-linux-gnu-4.2.1/lib --with-mpc=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-libmpc-riscv64-unknown-linux-gnu-1.3.1 --with-native-system-header-dir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-riscv64-unknown-linux-gnu-2.39-52-dev/include --with-build-sysroot=/ --with-gxx-include-dir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-riscv64-unknown-linux-gnu-13.2.0/include/c++/13.2.0/ --program-prefix= --enable-lto --disable-libstdcxx-pch --without-included-gettext --with-system-zlib --enable-static --enable-languages=c,c++ --disable-multilib --disable-plugin --disable-libcc1 --with-isl=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-isl-riscv64-unknown-linux-gnu-0.20 --with-arch=rv64gc_zfh_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadfmemidx_xtheadfmv_xtheadmac_xtheadmemidx_xtheadsync --with-abi=lp64d --disable-bootstrap --build=x86_64-unknown-linux-gnu --host=riscv64-unknown-linux-gnu --target=riscv64-unknown-linux-gnu
    Thread model: posix
    Supported LTO compression algorithms: zlib
    gcc version 13.2.0 (GCC)

The relevant part of the "Configured with:" line is:

    --with-arch=rv64gc_zfh_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadfmemidx_xtheadfmv_xtheadmac_xtheadmemidx_xtheadsync --with-abi=lp64d
A quick CoreMark run:

with xthead*

    2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 16599
    Total time (secs): 16.599000
    Iterations/Sec   : 3614.675583
    Iterations       : 60000
    Compiler version : GCC14.2.1 20250322
    Compiler flags   : -O2 -march=rv64gc_zfh_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadcondmov_xtheadfmemidx_xtheadfmv_xtheadint_xtheadmac_xtheadmemidx_xtheadmempair_xtheadsync -DPERFORMANCE_RUN=1 -lrt
    Memory location  : Please put data memory location here
                       (e.g. code in flash, data on heap etc)
    seedcrc          : 0xe9f5
    [0]crclist       : 0xe714
    [0]crcmatrix     : 0x1fd7
    [0]crcstate      : 0x8e3a
    [0]crcfinal      : 0xbd59
    Correct operation validated. See README.md for run and reporting rules.
    CoreMark 1.0 : 3614.675583 / GCC14.2.1 20250322 -O2 -march=rv64gc_zfh_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadcondmov_xtheadfmemidx_xtheadfmv_xtheadint_xtheadmac_xtheadmemidx_xtheadmempair_xtheadsync -DPERFORMANCE_RUN=1 -lrt / Heap

native

    2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 12419
    Total time (secs): 12.419000
    Iterations/Sec   : 3220.871246
    Iterations       : 40000
    Compiler version : GCC14.2.1 20250322
    Compiler flags   : -O2 -DPERFORMANCE_RUN=1 -lrt
    Memory location  : Please put data memory location here
                       (e.g. code in flash, data on heap etc)
    seedcrc          : 0xe9f5
    [0]crclist       : 0xe714
    [0]crcmatrix     : 0x1fd7
    [0]crcstate      : 0x8e3a
    [0]crcfinal      : 0x25b5
    Correct operation validated. See README.md for run and reporting rules.
    CoreMark 1.0 : 3220.871246 / GCC14.2.1 20250322 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
That is roughly a 12% improvement in Iterations/Sec (3614.68 with the xthead extensions versus 3220.87 without).
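The percentage can be checked directly from the two Iterations/Sec values reported above. The helper name speedup_percent is made up for this sketch. The only tool it needs is awk.

```shell
# Hypothetical helper: relative improvement of $1 over $2, in percent.
speedup_percent() {
    awk -v new="$1" -v old="$2" \
        'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

# Iterations/Sec with xthead* versus the plain -O2 build
speedup_percent 3614.675583 3220.871246
# prints: 12.2
```

Comparing Iterations/Sec rather than total time matters here, because the two runs used different iteration counts (60000 and 40000).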
Repository

https://github.com/ngc7331/nixos-licheepi4a/tree/25.05-xthead