Raspberry Pi 4 or Jetson Nano — Which One Is Better?

Dumi Loghin
Nov 16, 2020

Disclosure: Some of the links below are affiliate links.

Raspberry Pi 4 Model B (4GB RAM) (RP4) is the 4th edition of the famous Raspberry Pi embedded system. Raspberry Pi systems are widely used by hobbyists and makers due to their low cost, low power consumption and good software support. Compared to the previous edition, the 4th edition brings more RAM and 4 ARM Cortex-A72 CPU cores. The price of this system is $56 on Amazon.

Jetson Nano is an embedded system from Nvidia that integrates 4 ARM Cortex-A57 CPU cores and 128 Nvidia Maxwell GPU cores in the same chip, with 4GB of LPDDR4 memory on the same module. Jetson Nano is designed with Raspberry Pi compatibility in mind: for example, the J41 header of Nano has the same pinout as the 40-pin header of Raspberry Pi. The price of Nano (4GB) is $99 on Amazon.

There is a 2GB Jetson Nano version at $59, but in this article we evaluate the 4GB Jetson Nano.

                      Table 1. Specs
+------------------+------------------+-----------------+
|                  | Jetson Nano      | Raspberry Pi 4  |
+------------------+------------------+-----------------+
| CPU              | ARM Cortex-A57   | ARM Cortex-A72  |
| Type             | 64bit            | 64bit           |
| Cores            | 4                | 4               |
| Frequency        | 102MHz - 1.43GHz | 600MHz - 1.5GHz |
| L1 data cache    | 32kB             | 32kB            |
| L2 cache         | 2MB              | 1MB             |
| Memory           | 4GB              | 4GB             |
| Memory Type      | LPDDR4           | LPDDR4          |
| GPU Architecture | Maxwell          | VideoCore VI 3D |
| GPU cores        | 128              | -               |
+------------------+------------------+-----------------+

Let’s evaluate the performance of these two systems at the sub-system level, starting with the CPU.

CPU

To evaluate performance at the CPU level, we use the CoreMark benchmark. We run it on one, two, three and all four cores, and we measure the power used by the entire board during each run. As expected, the CPU of RP4 has better performance, but the difference is very small, as shown in Figure 1. However, on one and two cores, RP4 uses more power to achieve its slightly higher performance, while on three and four cores, Nano uses a bit more power.

Figure 1. CPU performance and power

In summary, RP4 and Nano exhibit similar performance and power results at the CPU level.
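Because Figure 1 plots performance and power separately, a single efficiency number (CoreMark score per watt of board power) can make the trade-off easier to see. The Python sketch below computes it; the function name is hypothetical and the input numbers are placeholders chosen for illustration, not the measurements behind Figure 1.

```python
def coremark_per_watt(score, board_power_w):
    """Return CoreMark iterations/s per watt of whole-board power."""
    if board_power_w <= 0:
        raise ValueError("board power must be positive")
    return score / board_power_w


# Placeholder inputs, NOT the measurements behind Figure 1:
# board B scores 5% higher than board A but draws 10% more power.
board_a = coremark_per_watt(10000.0, 5.0)
board_b = coremark_per_watt(10500.0, 5.5)

if __name__ == "__main__":
    print(f"A: {board_a:.1f} it/s/W, B: {board_b:.1f} it/s/W")
```

With these placeholder inputs, board B has the higher score but the lower efficiency (about 1909 vs. 2000 iterations/s per watt), which is the kind of trade-off described above for RP4 on one and two cores.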

Memory

To evaluate the performance of the memory sub-system, we use lmbench to measure the read-write bandwidth.

Figure 2. Memory bandwidth

As shown in Figure 2, Nano has higher bandwidth at both the cache and main memory levels. For example, the main memory bandwidth is 3.2 GB/s on Nano compared to 1.4 GB/s on RP4. We also observe the benefit of Nano’s extra 1MB of L2 cache. Hence, Nano is the winner at the memory level.
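For readers who want a rough feel for the cache-versus-DRAM gap without installing lmbench, the Python sketch below times repeated buffer copies at a few sizes. It is not lmbench: it measures copy bandwidth rather than lmbench’s read-write test, the function name and buffer sizes are assumptions (picked so the buffers fit roughly in L1, in L2, and in main memory on both boards), and Python interpreter overhead makes the small-buffer numbers pessimistic.

```python
import time


def copy_bandwidth_gbs(size_bytes, repeats=20):
    """Rough copy bandwidth in GB/s for a buffer of size_bytes.

    Each copy reads size_bytes and writes size_bytes, so 2 * size_bytes
    of memory traffic is counted per copy.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up: touch both buffers before timing
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-length slice assignment is a single bulk copy in CPython
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return 2 * size_bytes * repeats / elapsed / 1e9


if __name__ == "__main__":
    # 8 KiB and 256 KiB (two buffers each) should fit in L1 and L2 on both boards;
    # 64 MiB is far larger than either L2 and mostly exercises main memory.
    for size in (8 * 1024, 256 * 1024, 64 * 1024 * 1024):
        print(f"{size // 1024:>8} KiB: {copy_bandwidth_gbs(size):6.2f} GB/s")
```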

Networking

At the networking level, both systems have Gigabit Ethernet, but Nano has neither WiFi nor Bluetooth. Although you can add your own WiFi module through USB or the M.2 Key E slot, this incurs extra cost and setup time. Thus, I give Nano a big negative point for the lack of WiFi and Bluetooth.

Storage

Both systems have microSD card slots. Alternatively, you can attach larger disks or SSDs through the USB3 interface.

Operating System

The official OS of RP4 is Raspberry Pi OS (previously known as Raspbian), a Debian-based Linux distribution. Even though the CPU is 64bit, the official OS is 32bit (armv7l):

$ uname -a
Linux raspberrypi 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux

On the other hand, Nano comes with 64bit Ubuntu 18.04.4 LTS (aarch64 architecture):

$ uname -a
Linux dumi-jetnano 4.9.140-tegra #1 SMP PREEMPT Mon Dec 9 22:47:42 PST 2019 aarch64 aarch64 aarch64 GNU/Linux
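Note that uname -a reports the kernel’s architecture, and the kernel and user space do not always match (for example, Raspberry Pi OS can boot a 64bit kernel while keeping a 32bit userland). A small Python sketch, assuming a Python 3 interpreter is available on the board, reports both the machine type and the pointer width of the running interpreter; the function name is hypothetical.

```python
import platform
import struct


def describe_bitness():
    """Return (machine type, pointer width in bits of this Python build)."""
    machine = platform.machine()              # same value as `uname -m`, e.g. 'armv7l' or 'aarch64'
    userland_bits = struct.calcsize("P") * 8  # size of a C pointer: 32 or 64
    return machine, userland_bits


if __name__ == "__main__":
    machine, bits = describe_bitness()
    print(f"machine: {machine}, userland: {bits}bit")
```

With the OS images used in this article, this should report armv7l with a 32bit userland on RP4 and aarch64 with a 64bit userland on Nano.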

Machine Learning

In this article, I evaluate the GPU on Machine Learning (ML) tasks. But first, as a baseline, let us measure the performance of the two systems when running ML inference on the CPU. I am using TensorFlow Lite and its pre-built benchmarking tool with the COCO SSD MobileNet v1 model.

# On RP4
$ ./linux_arm_benchmark_model --graph=ssd_mobilenet_v1_1_metadata_1.tflite --num_threads=4
# On Nano
$ ./linux_aarch64_benchmark_model --graph=ssd_mobilenet_v1_1_metadata_1.tflite --num_threads=4

Surprisingly, inference runs faster on Nano’s CPU (40.461 ms on average) than on RP4’s CPU (55.873 ms on average). Since the CPU performance is similar, the difference may be due to slower memory accesses on RP4 and/or the different OS and executables. Since the software stack on RP4 is 32bit, operations on 64bit integers need additional instructions. I will talk more about this in another article.
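The gap is easier to interpret as a speedup and as throughput. A minimal Python sketch using the two averages quoted above (the function name is hypothetical):

```python
def latency_summary(nano_ms, rp4_ms):
    """Compare two average per-inference latencies given in milliseconds."""
    return {
        "speedup_nano_over_rp4": rp4_ms / nano_ms,  # > 1 means Nano is faster
        "nano_fps": 1000.0 / nano_ms,               # inferences per second
        "rp4_fps": 1000.0 / rp4_ms,
    }


summary = latency_summary(nano_ms=40.461, rp4_ms=55.873)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

This works out to Nano being about 1.38 times faster, or roughly 24.7 vs. 17.9 inferences per second.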

To investigate this issue further, I use perf (a Linux profiling tool) to get runtime metrics such as instructions, cycles, and memory accesses. The results show that Nano executes ~17 billion instructions in ~13 billion cycles, while RP4 executes ~22 billion instructions in ~16 billion cycles. Moreover, RP4 makes about twice as many memory accesses (~7 billion) as Nano (~3.4 billion), as shown below. Two more factors affect performance: (1) the CPU utilization, 3.5 on RP4 vs. 3.7 on Nano (out of 4, the number of cores), and (2) the actual runtime frequency, 1.41 GHz on RP4 vs. 1.48 GHz on Nano. Combined, these lead to a longer execution time on RP4 than on Nano, because T = C/(f * U), where T is the execution time, C is the number of cycles (summed over all cores), f is the frequency and U is the CPU utilization.

# On RP4
$ perf stat -e armv7_cortex_a15/mem_access/ ./linux_arm_benchmark_model --graph=ssd_mobilenet_v1_1_metadata_1.tflite --num_threads=4
...
7,062,668,270 armv7_cortex_a15/mem_access/u
3.385302077 seconds time elapsed

# On Nano
$ perf stat -e armv8_pmuv3/mem_access/ ./linux_aarch64_benchmark_model --graph=ssd_mobilenet_v1_1_metadata_1.tflite --num_threads=4
...
3,430,775,018 armv8_pmuv3/mem_access/
2.551278354 seconds time elapsed
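The relation T = C/(f * U) can be checked against the numbers above. The Python sketch below plugs in the rounded counts from the text (cycles, instructions, frequency, utilization) together with the exact memory-access counts and elapsed times from the perf output; since the inputs are rounded, it should only be expected to land in the same ballpark as the measured time. The function and variable names are illustrative.

```python
def estimated_time_s(cycles, freq_hz, utilization):
    """T = C / (f * U): cycles summed over all cores divided by
    (per-core frequency * average number of busy cores)."""
    return cycles / (freq_hz * utilization)


# Rounded counts quoted in the text; mem_accesses and elapsed_s are from perf stat.
runs = {
    "Nano": dict(cycles=13e9, instructions=17e9, freq_hz=1.48e9, util=3.7,
                 mem_accesses=3_430_775_018, elapsed_s=2.551278354),
    "RP4": dict(cycles=16e9, instructions=22e9, freq_hz=1.41e9, util=3.5,
                mem_accesses=7_062_668_270, elapsed_s=3.385302077),
}

estimates = {name: estimated_time_s(r["cycles"], r["freq_hz"], r["util"])
             for name, r in runs.items()}
ipc = {name: r["instructions"] / r["cycles"] for name, r in runs.items()}  # instructions per cycle
mem_ratio = runs["RP4"]["mem_accesses"] / runs["Nano"]["mem_accesses"]

if __name__ == "__main__":
    for name, r in runs.items():
        print(f"{name}: estimated {estimates[name]:.2f} s, "
              f"measured {r['elapsed_s']:.2f} s, IPC {ipc[name]:.2f}")
    print(f"RP4/Nano memory accesses: {mem_ratio:.2f}x")
```

With these inputs the estimate is about 2.37 s on Nano and 3.24 s on RP4, against 2.55 s and 3.39 s measured, and RP4 makes about 2.06 times as many memory accesses as Nano.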

GPU

Nano’s biggest advantage is its integrated Nvidia GPU. In my previous article, I evaluated the performance of the GPU in different Jetson systems using the SHOC benchmark. We saw that Nano’s GPU can achieve 230 GFLOPS while using close to 7W of power.
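To put the 230 GFLOPS figure in context, the sketch below compares it with a theoretical single-precision peak and computes GFLOPS per watt. It assumes a maximum GPU clock of 921.6 MHz (Nvidia’s published figure for Nano), two floating-point operations per CUDA core per cycle from a fused multiply-add, and that the SHOC result was measured in single precision; the variable names are illustrative.

```python
CUDA_CORES = 128               # from Table 1
GPU_MAX_CLOCK_HZ = 921.6e6     # assumption: Nvidia's published maximum GPU clock for Nano
FLOPS_PER_CORE_PER_CYCLE = 2   # one FP32 fused multiply-add = 2 floating-point operations

MEASURED_GFLOPS = 230.0        # SHOC result quoted in the text (assumed single precision)
POWER_W = 7.0                  # "close to 7W", as quoted in the text

peak_gflops = CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * GPU_MAX_CLOCK_HZ / 1e9
fraction_of_peak = MEASURED_GFLOPS / peak_gflops
gflops_per_watt = MEASURED_GFLOPS / POWER_W

if __name__ == "__main__":
    print(f"peak {peak_gflops:.1f} GFLOPS, measured {fraction_of_peak:.0%} of peak, "
          f"{gflops_per_watt:.1f} GFLOPS/W")
```

Under these assumptions the peak is about 235.9 GFLOPS, so 230 GFLOPS is roughly 97% of it, at about 32.9 GFLOPS per watt.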

Using the same class of model (SSD Mobilenet v1, but with a different implementation than in TensorFlow Lite), we evaluate how fast it runs on the GPU. With this model, inference takes 82.7 ms per image on average, including both CPU and GPU time. For this measurement, I followed Nvidia’s tutorial for Jetson systems: I run DetectNet with the SSD Mobilenet v1 model over the 85 images that are placed by default in the images folder under bin:

...jetson-inference/build/aarch64/bin$ for FILE in images/*; do [ -f "$FILE" ] || continue; ./detectnet-console --network=ssd-mobilenet-v1 "$FILE" output.jpg >> log.txt; done

Summary

RP4 is cheaper than Nano, comes with WiFi and Bluetooth by default, and has an established community and many tutorials that help users get started. On the negative side, RP4 has lower memory performance than Nano, and its official OS is still 32bit even though the hardware is 64bit. On the other hand, Nano has a powerful GPU that can be used to accelerate ML tasks, and there are many tutorials and tools that help users quickly get started with ML on Jetson systems.

In the end, it really depends on the project. For projects that do not need to run heavy ML models, RP4 is powerful enough to do the job (and cheaper). But if you need higher performance, Nano is the better choice, even though it costs almost twice as much.

Dumi Loghin

I am a Research Fellow in Computer Science with experience in parallel and distributed systems, blockchain, and performance evaluation.