How to add an FPU to SiFive FE310 RISC-V

Mar 5, 2020

In this story, I will show you how to add a “hardfloat” Floating Point Unit (FPU) to a RISC-V core and run it on an FPGA. Specifically, I am using the SiFive Freedom E310 project that allows running a Rocket Chip core on an Artix-7 FPGA.

First, you should follow this tutorial and successfully run the “Hello World” program on the FPGA. We are going to use those steps in our story.

In this story, I am using an Artix-7 100T Arty FPGA Evaluation Kit, Vivado v2019.2 (64-bit) on Ubuntu 18.04.4 LTS, and riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14 and riscv-openocd-0.10.0-2019.08.2-x86_64-linux-ubuntu14 from SiFive.

Problem

On the hardware side, we look at the original freedom repository. We can see in src/main/scala/everywhere/e300artydevkit/Config.scala that DefaultFreedomEConfig is based on TinyConfig. In src/main/scala/system/Configs.scala, TinyConfig is based on With1TinyCore. Then, in src/main/scala/subsystem/Configs.scala, With1TinyCore has “fpu = None”.

On the software side, when compiling a program from the original freedom-e-sdk, you can see that a 32-bit “IMAC” architecture is used:

$ make BSP=metal PROGRAM=hello TARGET=freedom-e310-arty 
...
softwareCFLAGS="-march=rv32imac -mabi=ilp32 -mcmodel=medlow ...

“IMAC” means that the ISA consists of the Base Integer Instruction Set (i) + Standard Extension for Integer Multiplication and Division (m) + Standard Extension for Atomic Instructions (a) + Standard Extension for Compressed Instructions (c) (see RISC-V ISA base and extensions on Wikipedia). There is no f or d in that list, so the compiler will not emit any floating-point instructions.

Solution

Software: let's use a program that computes the constant e (Euler’s number, the base of natural logarithms) by summing the first n terms of the infinite series 1/0! + 1/1! + 1/2! + ...:

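int i, n = 20;      /* number of terms (assumed; the UART output later reports 20 iterations) */
float e = 2.0;      /* 1/0! + 1/1! */
float fact = 1.0;   /* running value of 1/i! */
for (i = 2; i < n; i++) {
  fact = fact / i;
  e = e + fact;
}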

You can find the C code in my fork of the freedom-e-sdk repository. When you compile this program for TARGET=freedom-e310-arty, the resulting .elf contains no floating-point instructions:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty software
$ riscv64-unknown-elf-objdump -d software/euler/debug/euler.elf | grep fadd
<no result>

The settings for the freedom-e310-arty target are in the bsp/freedom-e310-arty folder. Let's copy that folder to a new folder under bsp called freedom-e310-arty-64bit-fpu. Next, we replace the first three non-comment lines in settings.mk with:

RISCV_ARCH=rv64imafdc
RISCV_ABI=lp64d
RISCV_CMODEL=medany

We changed the ISA to 64-bit and added the Standard Extension for Single-Precision Floating-Point (f) + Standard Extension for Double-Precision Floating-Point (d). The lp64d ABI specifies that long and pointers are 64-bit and that floating-point arguments, up to 64-bit doubles, are passed in FPU registers. Now, when we compile our test program for the new target and check for floating-point instructions, we get:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu software
$ riscv64-unknown-elf-objdump -d software/euler/debug/euler.elf | grep fadd
    20400290: 00f777d3           fadd.s fa5,fa4,fa5

Hardware: let's upgrade the tiny core to 64-bit. In my fork of the freedom repository, in src/main/scala/subsystem/Configs.scala, I first added:

class With1Tiny64bitCore extends Config((site, here, up) => {
  case XLen => 64
  case RocketTilesKey => List(RocketTileParams(
    core = RocketCoreParams(
      useVM = false,
      mulDiv = Some(MulDivParams(mulUnroll = 8))),
    btb = None,
    dcache = Some(DCacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 256, // 16Kb scratchpad
      nWays = 1,
      nTLBEntries = 4,
      nMSHRs = 0,
      blockBytes = site(CacheBlockBytes),
      scratch = Some(0x80000000L))),
    icache = Some(ICacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 64,
      nWays = 1,
      nTLBEntries = 4,
      blockBytes = site(CacheBlockBytes)))))
  case RocketCrossingKey => List(RocketCrossingParams(
    crossingType = SynchronousCrossing(),
    master = TileMasterPortParams()
  ))
})

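Second, let's add an FPU configuration fragment in src/main/scala/subsystem/Configs.scala. Note that With1Tiny64bitCore above no longer sets “fpu = None” as With1TinyCore does, so the core falls back to whatever RocketCoreParams uses by default (as far as I can tell, an FPU is enabled by default). The map below only sets fLen on an FPU that already exists: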

class With1Tiny64bitFPUCore extends Config((site, here, up) => { 
  case RocketTilesKey => up(RocketTilesKey, site) map { r =>
    r.copy(core = r.core.copy(
      fpu = r.core.fpu.map(_.copy(fLen = 64))))
  }
})

Third, we add a new configuration named TinyFPUConfig for the system in src/main/scala/system/Configs.scala:

class TinyFPUConfig extends Config(
  new WithNoMemPort ++
  new WithNMemoryChannels(0) ++
  new WithNBanks(0) ++
  new With1Tiny64bitCore ++
  new With1Tiny64bitFPUCore ++
  new BaseConfig)

Fourth, we use this configuration in src/main/scala/everywhere/e300artydevkit/Config.scala by replacing TinyConfig with TinyFPUConfig in the DefaultFreedomEConfig class.

Finally, we recompile:

make BOARD=arty_a7_100 -f Makefile.e300artydevkit clean
make BOARD=arty_a7_100 -f Makefile.e300artydevkit verilog
make BOARD=arty_a7_100 -f Makefile.e300artydevkit mcs

or just use make.sh in my repository.

Next, follow the steps described in the original tutorial to upload the mcs image builds/e300artydevkit/obj/E300ArtyDevKitFPGAChip.mcs to the FPGA.

Debug: let's debug the code running on the FPGA using gdb. I recommend using three terminals. In the first terminal, compile and upload the code:

$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu clean
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu software
$ make BSP=metal PROGRAM=euler TARGET=freedom-e310-arty-64bit-fpu upload

Before running the upload command, I recommend changing scripts/upload by commenting out the last three commands and running openocd in the foreground instead:

# $openocd -f $cfg &
$openocd -f $cfg
# $gdb $elf --batch -ex "set remotetimeout 240" -ex "target extended-remote localhost:${GDB_PORT}" -ex "monitor reset halt" -ex "monitor flash protect 0 64 last off" -ex "load" -ex "monitor resume" -ex "monitor shutdown" -ex "quit"
# kill %1

In the second terminal, open the UART (it may be /dev/ttyUSB1 or /dev/ttyUSB2, depending on your setup):

$ miniterm /dev/ttyUSB1 57600

In the third terminal, go to software/euler in my repository. Note that there is a .gdbinit file that sets a breakpoint on the return statement of main and prints the hexadecimal value of e:

set remotetimeout 300
target remote localhost:3333
load
break euler.c:45
cont
x &e
x e
quit

You need to add the path to this file in your $HOME/.gdbinit to be able to use it. So add a line like:

add-auto-load-safe-path /home/<yourhome>/git/freedom-e-sdk/software/euler/.gdbinit

Now, run gdb and you should get something like:

$ riscv64-unknown-elf-gdb debug/euler.elf 
GNU gdb (SiFive GDB 8.3.0-2019.08.0) 8.3
...
Reading symbols from debug/euler.elf...
main (argc=1, argv=0x20405cb8) at euler.c:45
45 return 0;
Loading section .init, size 0x19e lma 0x20400000
Loading section .text, size 0x598a lma 0x20400200
Loading section .rodata, size 0x618 lma 0x20405b90
Loading section .init_array, size 0x8 lma 0x204061a8
Loading section .data, size 0xb00 lma 0x204061b0
Start address 0x20400000, load size 27720
Transfer rate: 40 KB/sec, 4620 bytes/write.
Breakpoint 1 at 0x204002a8: file euler.c, line 45.
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, main (argc=1, argv=0x20405cb8) at euler.c:45
45 return 0;
0x80001008: 0x402df855
0x2: 0x00000000
A debugging session is active.
Inferior 1 [Remote target] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
[Inferior 1 (Remote target) detached]

In the second terminal, you should see:

Euler’s number with 20 iterations:

Unfortunately, the floating-point value is not printed. I suppose there is an issue with formatting floating-point values in the BSP software. I will investigate this in a future post.

However, if you take the hexadecimal value 0x402df855 from gdb and convert it to a single-precision float (e.g., with an online IEEE 754 converter), you get the correct value of 2.71828. Done!

Written by Dumi Loghin