Implement the ffs and fls functions, and their longer counterparts, in cpufunc for riscv architectures, in terms of GCC builtins such as __builtin_ffs. Use those, rather than the simple libkern implementations, when building riscv kernels.
This is a clone of D20250, which was for arm64.
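
For illustration only, a minimal sketch of how such wrappers might look. This is not the code from this change: the sketch_ prefix is a hypothetical naming choice to avoid clashing with libc declarations, and building the fls variants from __builtin_clz is an assumption, since the description names only __builtin_ffs. __builtin_clz is undefined for a zero argument, so the fls variants check for zero first.

```c
#include <limits.h>

/*
 * Hypothetical sketch, not the actual cpufunc code.
 * ffs family: 1-based index of the least significant set bit, 0 if none.
 * The GCC builtins already return 0 for a zero argument.
 */
static inline int
sketch_ffs(int mask)
{
	return (__builtin_ffs(mask));
}

static inline int
sketch_ffsl(long mask)
{
	return (__builtin_ffsl(mask));
}

static inline int
sketch_ffsll(long long mask)
{
	return (__builtin_ffsll(mask));
}

/*
 * fls family: 1-based index of the most significant set bit, 0 if none.
 * Assumption: built on the count-leading-zeros builtins, which are
 * undefined for 0, hence the explicit zero check.
 */
static inline int
sketch_fls(int mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) - __builtin_clz((unsigned int)mask));
}

static inline int
sketch_flsl(long mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) - __builtin_clzl((unsigned long)mask));
}

static inline int
sketch_flsll(long long mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) -
	    __builtin_clzll((unsigned long long)mask));
}
```

With these definitions, sketch_ffs(8) and sketch_fls(8) both return 4, and every variant returns 0 for a zero argument.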