How to add an FPU to SiFive FE310 RISC-V

Dumi Loghin
Mar 5, 2020

In this story, I will show you how to add a “hardfloat” Floating Point Unit (FPU) to a RISC-V core and run it on an FPGA. Specifically, I am using the SiFive Freedom E310 project that allows running a Rocket Chip core on an Artix-7 FPGA.

First, you should follow this tutorial and successfully run the “Hello World” program on the FPGA. We are going to use those steps in our story.

In this story, I am using an Artix-7 100T Arty FPGA Evaluation Kit, Vivado v2019.2 (64-bit) on Ubuntu 18.04.4 LTS, and riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14 and riscv-openocd-0.10.0-2019.08.2-x86_64-linux-ubuntu14 from SiFive.

Problem

On the hardware side, we look at the original freedom repository. We can see in src/main/scala/everywhere/e300artydevkit/Config.scala that DefaultFreedomEConfig is based on TinyConfig. In src/main/scala/system/Configs.scala, TinyConfig is based on With1TinyCore. Then, in src/main/scala/subsystem/Configs.scala, With1TinyCore has “fpu = None”.

On the software side, when compiling a program from the original freedom-e-sdk, you can see that a 32-bit “IMAC” architecture is used:

$ make BSP=metal PROGRAM=hello TARGET=freedom-e310-arty 
...
softwareCFLAGS="-march=rv32imac -mabi=ilp32 -mcmodel=medlow ...

“IMAC” means that the ISA has the Base Integer Instruction Set (i) + Standard Extension for Integer Multiplication and Division (m) + Standard Extension for Atomic Instructions (a) + Standard Extension for Compressed Instructions (c) (see RISC-V ISA base and extensions on Wikipedia). There is no f or d in this string, so the compiler will not emit floating-point instructions.
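To make the naming scheme concrete, here is a minimal sketch in C that splits an -march string into its register width and extension letters. The function names (describe_extension, march_xlen, print_march) are made up for this illustration, and the sketch only knows the six letters discussed in this story.

```c
#include <stdio.h>
#include <string.h>

/* Return a short description for a single-letter ISA extension,
 * or NULL if the letter is not one of those discussed here. */
const char *describe_extension(char letter)
{
    switch (letter) {
    case 'i': return "base integer instruction set";
    case 'm': return "integer multiplication and division";
    case 'a': return "atomic instructions";
    case 'f': return "single-precision floating point";
    case 'd': return "double-precision floating point";
    case 'c': return "compressed instructions";
    default:  return NULL;
    }
}

/* Parse the register width from an -march string such as "rv32imac".
 * Returns 32 or 64, or 0 if the string starts with neither prefix. */
int march_xlen(const char *march)
{
    if (strncmp(march, "rv32", 4) == 0) return 32;
    if (strncmp(march, "rv64", 4) == 0) return 64;
    return 0;
}

/* Print one line per extension letter that follows the "rvNN" prefix. */
void print_march(const char *march)
{
    int xlen = march_xlen(march);
    if (xlen == 0) {
        printf("%s: not an rv32/rv64 string\n", march);
        return;
    }
    printf("%s: %d-bit\n", march, xlen);
    for (const char *p = march + 4; *p != '\0'; p++) {
        const char *text = describe_extension(*p);
        printf("  %c: %s\n", *p, text ? text : "(not covered in this sketch)");
    }
}
```

Calling print_march("rv32imac") lists the four extensions above and shows at a glance that neither f nor d is present.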

Solution

Software: let’s use a program that computes the constant e (Euler’s number, the base of natural logarithms) by summing the first terms of the infinite series e = 1/0! + 1/1! + 1/2! + ...:

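int i;
int n = 20;        /* number of terms (assumed here; the run below reports 20 iterations) */
float e = 2.0;     /* first two terms: 1/0! + 1/1! */
float fact = 1.0;  /* running value of 1/i! */
for (i = 2; i < n; i++) {
    fact = fact / i;
    e = e + fact;
}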
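If you want to try the loop on your workstation before touching the FPGA, a self-contained version could look like the sketch below. The function names euler_float and float_bits are my own for this illustration and are not taken from the repository; float_bits returns the raw IEEE 754 pattern, which is the form gdb shows later in this story.

```c
#include <stdint.h>
#include <string.h>

/* Approximate e with the partial sum 1/0! + 1/1! + ... + 1/(n-1)!,
 * using single-precision arithmetic as in the loop above. */
float euler_float(int n)
{
    float e = 2.0f;     /* 1/0! + 1/1! */
    float fact = 1.0f;  /* holds 1/(i-1)! at the top of each iteration */
    for (int i = 2; i < n; i++) {
        fact = fact / i;  /* now 1/i! */
        e = e + fact;
    }
    return e;
}

/* Return the IEEE 754 bit pattern of a float without breaking aliasing rules. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, printf("%f 0x%08x\n", euler_float(20), (unsigned)float_bits(euler_float(20))) prints both the decimal value and its bit pattern. The terms shrink so quickly that a float stops changing after roughly a dozen iterations.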

You can find the C code in my fork of the freedom-e-sdk repository. When you compile this program for TARGET=freedom-e310-arty, the resulting .elf contains no floating-point instructions:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty software
$ riscv64-unknown-elf-objdump -d software/euler/debug/euler.elf | grep fadd
<no result>

The settings for the freedom-e310-arty target are in the bsp/freedom-e310-arty folder. Let’s copy this folder to a new folder under bsp called freedom-e310-arty-64bit-fpu. Next, we replace the first three non-comment lines in settings.mk with:

RISCV_ARCH=rv64imafdc
RISCV_ABI=lp64d
RISCV_CMODEL=medany
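One way to confirm from inside a program which architecture and ABI the compiler actually applied is to look at its predefined macros. The sketch below relies on __riscv_xlen and __riscv_flen, which RISC-V toolchains define; the function names target_xlen, target_flen and print_target are assumptions of mine. On a non-RISC-V host both functions simply return 0.

```c
#include <stdio.h>

/* Width in bits of the integer registers targeted by the compiler,
 * or 0 when the code is not compiled for RISC-V. */
int target_xlen(void)
{
#ifdef __riscv_xlen
    return __riscv_xlen;
#else
    return 0;
#endif
}

/* Width in bits of the FPU registers, or 0 when the target has
 * no F/D extension (or is not RISC-V at all). */
int target_flen(void)
{
#ifdef __riscv_flen
    return __riscv_flen;
#else
    return 0;
#endif
}

/* Print the register widths together with the sizes of long and pointers. */
void print_target(void)
{
    printf("xlen=%d flen=%d sizeof(long)=%d sizeof(void *)=%d\n",
           target_xlen(), target_flen(),
           (int)sizeof(long), (int)sizeof(void *));
}
```

With the settings above I would expect xlen and flen to be 64 and both sizes to be 8, while the original rv32imac/ilp32 settings should give xlen=32, flen=0 and sizes of 4.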

We changed the ISA to 64-bit and added the Standard Extension for Single-Precision Floating-Point (f) + Standard Extension for Double-Precision Floating-Point (d). The lp64d ABI specifies that long and pointers are 64-bit and that the FPU registers can hold 64-bit double values. Now, when we compile our test program for the new target and check for floating-point instructions, we get:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu software
$ riscv64-unknown-elf-objdump -d software/euler/debug/euler.elf | grep fadd
20400290: 00f777d3 fadd.s fa5,fa4,fa5
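As a cross-check of what objdump reports, the instruction word 00f777d3 can be decoded by hand. Below is a minimal sketch in C that extracts the fields of a floating-point (OP-FP) instruction according to the RISC-V encoding; the struct and function names are made up for this illustration.

```c
#include <stdint.h>

/* Fields of an R-type floating-point (OP-FP) instruction word. */
struct fp_fields {
    unsigned opcode; /* bits 6:0, 0x53 for OP-FP          */
    unsigned rd;     /* bits 11:7, destination register   */
    unsigned rm;     /* bits 14:12, rounding mode         */
    unsigned rs1;    /* bits 19:15, first source          */
    unsigned rs2;    /* bits 24:20, second source         */
    unsigned fmt;    /* bits 26:25, 0 = single, 1 = double */
    unsigned funct5; /* bits 31:27, 0 = FADD              */
};

/* Split a 32-bit instruction word into its OP-FP fields. */
struct fp_fields decode_fp(uint32_t insn)
{
    struct fp_fields f;
    f.opcode = insn & 0x7f;
    f.rd     = (insn >> 7)  & 0x1f;
    f.rm     = (insn >> 12) & 0x7;
    f.rs1    = (insn >> 15) & 0x1f;
    f.rs2    = (insn >> 20) & 0x1f;
    f.fmt    = (insn >> 25) & 0x3;
    f.funct5 = (insn >> 27) & 0x1f;
    return f;
}

/* Return 1 if the word encodes a single-precision floating-point add. */
int is_fadd_s(uint32_t insn)
{
    struct fp_fields f = decode_fp(insn);
    return f.opcode == 0x53 && f.funct5 == 0 && f.fmt == 0;
}
```

For 0x00f777d3 this gives rd = 15, rs1 = 14 and rs2 = 15, which are the registers fa5, fa4 and fa5 in ABI naming, and fmt = 0, so the compiler chose a single-precision add because e is declared as float. The rounding mode field is 7, meaning the dynamic rounding mode is used.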

Hardware: let’s upgrade the tiny core to 64-bit. In my fork of the freedom repository, in src/main/scala/subsystem/Configs.scala, I first added:

class With1Tiny64bitCore extends Config((site, here, up) => {
  case XLen => 64
  case RocketTilesKey => List(RocketTileParams(
    core = RocketCoreParams(
      useVM = false,
      mulDiv = Some(MulDivParams(mulUnroll = 8))),
    btb = None,
    dcache = Some(DCacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 256, // 16Kb scratchpad
      nWays = 1,
      nTLBEntries = 4,
      nMSHRs = 0,
      blockBytes = site(CacheBlockBytes),
      scratch = Some(0x80000000L))),
    icache = Some(ICacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 64,
      nWays = 1,
      nTLBEntries = 4,
      blockBytes = site(CacheBlockBytes)))))
  case RocketCrossingKey => List(RocketCrossingParams(
    crossingType = SynchronousCrossing(),
    master = TileMasterPortParams()
  ))
})

Second, let’s add an FPU in src/main/scala/subsystem/Configs.scala. Note that With1Tiny64bitCore above, unlike With1TinyCore, does not set “fpu = None”:

class With1Tiny64bitFPUCore extends Config((site, here, up) => {
  case RocketTilesKey => up(RocketTilesKey, site) map { r =>
    r.copy(core = r.core.copy(
      fpu = r.core.fpu.map(_.copy(fLen = 64))))
  }
})

Third, we add a new configuration named TinyFPUConfig for the system in src/main/scala/system/Configs.scala:

class TinyFPUConfig extends Config(
  new WithNoMemPort ++
  new WithNMemoryChannels(0) ++
  new WithNBanks(0) ++
  new With1Tiny64bitCore ++
  new With1Tiny64bitFPUCore ++
  new BaseConfig)

Fourth, we use this configuration in src/main/scala/everywhere/e300artydevkit/Config.scala by replacing TinyConfig with TinyFPUConfig in the DefaultFreedomEConfig class.

Finally, we recompile:

make BOARD=arty_a7_100 -f Makefile.e300artydevkit clean
make BOARD=arty_a7_100 -f Makefile.e300artydevkit verilog
make BOARD=arty_a7_100 -f Makefile.e300artydevkit mcs

or just use make.sh in my repository.

Next, follow the steps described in the original tutorial to upload the mcs image builds/e300artydevkit/obj/E300ArtyDevKitFPGAChip.mcs to the FPGA.

Debug: let’s debug the code running on the FPGA using gdb. I recommend using three terminals. In the first terminal, compile and upload the code:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu software
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu upload

Before running the upload command, I recommend changing scripts/upload by commenting out the last three commands and running openocd in the foreground instead:

# $openocd -f $cfg &
$openocd -f $cfg
# $gdb $elf --batch -ex "set remotetimeout 240" -ex "target extended-remote localhost:${GDB_PORT}" -ex "monitor reset halt" -ex "monitor flash protect 0 64 last off" -ex "load" -ex "monitor resume" -ex "monitor shutdown" -ex "quit"
# kill %1

In the second terminal, open the UART (it may be /dev/ttyUSB1 or /dev/ttyUSB2, depending on your setup):

$ miniterm /dev/ttyUSB1 57600

In the third terminal, go to software/euler in my repository. Note that there is a .gdbinit file there that loads the program, sets a breakpoint on the return statement of main and prints the hexadecimal value of e:

set remotetimeout 300
target remote localhost:3333
load
break euler.c:45
cont
x &e
x e
quit

For gdb to load this file automatically, you need to mark its path as safe in your $HOME/.gdbinit. So add a line like:

add-auto-load-safe-path /home/<yourhome>/git/freedom-e-sdk/software/euler/.gdbinit

Now, run gdb and you should get something like:

$ riscv64-unknown-elf-gdb debug/euler.elf 
GNU gdb (SiFive GDB 8.3.0-2019.08.0) 8.3

Reading symbols from debug/euler.elf…
main (argc=1, argv=0x20405cb8) at euler.c:45
45 return 0;
Loading section .init, size 0x19e lma 0x20400000
Loading section .text, size 0x598a lma 0x20400200
Loading section .rodata, size 0x618 lma 0x20405b90
Loading section .init_array, size 0x8 lma 0x204061a8
Loading section .data, size 0xb00 lma 0x204061b0
Start address 0x20400000, load size 27720
Transfer rate: 40 KB/sec, 4620 bytes/write.
Breakpoint 1 at 0x204002a8: file euler.c, line 45.
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, main (argc=1, argv=0x20405cb8) at euler.c:45
45 return 0;
0x80001008: 0x402df855
0x2: 0x00000000
A debugging session is active.
Inferior 1 [Remote target] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
[Inferior 1 (Remote target) detached]

In the second terminal, you should see:

Euler’s number with 20 iterations:

Unfortunately, the floating-point value is not printed. I suspect there is an issue with floating-point formatting in the BSP software. I will investigate this in a future post.

However, if you take the hexadecimal value 0x402df855 that gdb printed for address 0x80001008 (the location of e) and interpret it as a single-precision float (e.g. you can use this online converter), you get the correct value of 2.71828. Done!
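If you prefer not to depend on an online converter, the same conversion takes a few lines of C on your workstation. This is a minimal sketch; the function name bits_to_float is my own, and it assumes the host uses 32-bit IEEE 754 floats, which is the case on common desktop platforms.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.
 * memcpy is used instead of a pointer cast to avoid aliasing problems. */
float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, printf("%.5f\n", bits_to_float(0x402df855u)) prints 2.71828. By hand: the exponent field is 0x80, so the scale is 2^1, and the mantissa 0x2df855 contributes 1 + 3012693/8388608, which gives about 2.718282.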

Dumi Loghin

I am a Research Fellow in Computer Science with experience in parallel and distributed systems, blockchain, and performance evaluation.