How to Increase the Size of the Data Memory on SiFive FE310 RISC-V

Dumi Loghin
Mar 18, 2020


(Throughout this story, I am using my freedom repository, forked from the SiFive freedom repository. You can read my previous story to get started with the SiFive FE310 with a hardfloat FPU on the Arty A7-100T FPGA.)

In this short story, I will show you how to increase the size of the data memory on a SiFive Freedom FE310 RISC-V system running on an Arty A7-100T FPGA.

Hardware

By default, the data memory size is 0x4000 bytes (16 kB). You can see this in the address map printed when you build an image, for example:

./make.sh fp32
...
80000000–80004000 ARWX dtim@80000000
...

This may not be enough for programs that use large static or dynamic data arrays.

This small size comes from the subsystem/Configs.scala file in the Rocket Chip repository, where TinyConfig (or TinyFPUConfig) is defined as:

class TinyConfig extends Config(
  new WithNoMemPort ++
  new WithNMemoryChannels(0) ++
  new WithNBanks(0) ++
  new With1TinyCore ++
  new BaseConfig)

class TinyFPUConfig extends Config(
  new WithNoMemPort ++
  new WithNMemoryChannels(0) ++
  new WithNBanks(0) ++
  new With1Tiny64bitCore ++
  new With1Tiny64bitFPUCore ++
  new BaseConfig)

The system has no memory channels or banks (that is why I refer to it as “data memory” and not RAM: it is the data cache configured as a scratchpad). Instead, it has 16 kB of scratchpad memory, as defined in subsystem/Configs.scala:

class With1Tiny64bitCore extends Config((site, here, up) => {
  case XLen => 64
  case RocketTilesKey => List(RocketTileParams(
    ...
    dcache = Some(DCacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 256, // 16Kb scratchpad
      nWays = 1,
      nTLBEntries = 4,
      nMSHRs = 0,
      blockBytes = site(CacheBlockBytes),
      scratch = Some(0x80000000L))),
    icache = Some(ICacheParams(
      ...

(Note that the size is 16 kB (kilobytes), not 16 Kb (kilobits) as written in the code comment.) This memory is mapped to the RAMB18 blocks of the FPGA, each of which holds 18 Kb. The Arty A7-100T has 4,860 Kb (607.5 kB) of block RAM, that is, 270 RAMB18 blocks. Since the number of sets has to be a power of 2, the scratchpad size is a power of 2 as well, and the largest size that fits within this physical limit is 512 kB. With the 64 B per set implied by the default geometry (16 kB / 256 sets), this is equivalent to 8192 sets. Let's define a new class With1Tiny64bitCoreXMem:

class With1Tiny64bitCoreXMem extends Config((site, here, up) => {
  case XLen => 64
  case RocketTilesKey => List(RocketTileParams(
    ...
    dcache = Some(DCacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nSets = 8192, // 512kB scratchpad
      nWays = 1,
      nTLBEntries = 4,
      nMSHRs = 0,
      blockBytes = site(CacheBlockBytes),
      scratch = Some(0x80000000L))),
    icache = Some(ICacheParams(
      ...

I modified the build script (make.sh) to build the new systems as:

./make.sh fp32xmem
...
...
80000000–80080000 ARWX dtim@80000000
...

Software

Now let's test our modification with a test program. We are going to use the matrix multiplication code from my freedom-e-sdk repository. The code is in software/mm/mm.c. Make sure you define large enough matrices (#define N 64) and comment out the line #define WITH_POSIT_32:

#define PFDEBUG
// #define WITH_POSIT_32
#define N 64

The code uses three square matrices of size N x N filled with floats (32-bit or 4 B). When N is 64, the matrices use 3 x 64 x 64 x 4 B = 49,152 B, which is more than the initial data memory size of 16 kB (16,384 B). If you try to compile this code for the original target, you get an error:

make PROGRAM=mm TARGET=freedom-e310-arty-64bit-fpu CONFIGURATION=debug upload
...
region `ram' overflowed by 39004 bytes
...

The problem is in the metal.default.lds file of the bsp (bsp/freedom-e310-arty-64bit-fpu/metal.default.lds), where the ram region is only 16 kB (0x4000) long:

MEMORY
{
  flash (rxai!w) : ORIGIN = 0x20400000, LENGTH = 0x1fc00000
  itim (wx!rai) : ORIGIN = 0x8000000, LENGTH = 0x4000
  ram (wxa!ri) : ORIGIN = 0x80000000, LENGTH = 0x80000
}

Let's create a new bsp configuration by copying the folder freedom-e310-arty-64bit-fpu into a new folder freedom-e310-arty-64bit-fpu-xmem. Then, let's modify metal.default.lds in this new folder so that the ram region is 512 kB (0x80000) long:

MEMORY
{
  flash (rxai!w) : ORIGIN = 0x20400000, LENGTH = 0x1fc00000
  itim (wx!rai) : ORIGIN = 0x8000000, LENGTH = 0x4000
  ram (wxa!ri) : ORIGIN = 0x80000000, LENGTH = 0x80000
}

Now we are ready to compile and upload the program:

make PROGRAM=mm TARGET=freedom-e310-arty-64bit-fpu-xmem CONFIGURATION=debug upload
...
   text    data     bss     dec     hex filename
  26032    2840  172364  201236   31214 mm.elf

Enjoy!

Dumi Loghin

I am a Research Fellow in Computer Science with experience in parallel and distributed systems, blockchain, and performance evaluation.