Jetson TK1, Jetson TX1, Jetson TX2, Jetson Nano — Which one is more efficient?
In this post, we benchmark some boards from the famous Jetson family, developed by Nvidia. In particular, we look at the performance and power usage of Jetson TK1, TX1, TX2, and Nano.
Jetson TK1, launched in 2014, was the first in the family. At around $199, this system had a quad-core, 32-bit ARM Cortex-A15 CPU, 192 Nvidia Kepler GPU cores, and 2GB of LPDDR3 RAM. The board had only one USB 3 port, but included a SATA port. You can find its full specifications here.
Jetson TX1 was launched in 2015 and was a significant improvement over TK1 because it was equipped with a 64-bit CPU and a Maxwell-generation GPU. At $599, this board comes with a quad-core ARM Cortex-A57 CPU, 256 Nvidia Maxwell GPU cores, and 4GB of LPDDR4 RAM. The development board accommodates the TX1 module, which comes with a big heat sink and a fan. The board has only one USB 3 port, but it is equipped with WiFi and Bluetooth modules, including antennas. You can read more about TX1 here.
Jetson TX2 was launched in 2017 and the module is fully compatible with the carrier board of TX1. At $699, TX2 comes with a hexa-core CPU with two Nvidia Denver and four ARM Cortex-A57 cores, 256 Nvidia Pascal GPU cores, and 8GB LPDDR4 RAM. This page has more details about TX2.
Jetson Nano, launched in 2019, is a more affordable development kit than TX1/TX2. For $99, it offers a quad-core ARM Cortex-A57 CPU clocked at 1.43GHz, 128 Nvidia Maxwell GPU cores, and 4GB of LPDDR4 RAM. The board has four USB 3 ports, one HDMI port, one DisplayPort, one Ethernet port, one barrel jack, and one micro USB-B port. Its expansion header is compatible with the one on the Raspberry Pi. On the negative side, the board does not come with any WiFi or Bluetooth modules. You can find more about Nano here.
Table 1 summarizes the specifications of these four Jetson systems.
I start the presentation of my benchmarks with the idle power of the boards, measured when only the OS is running. I use a power meter to measure the AC power of the entire board, including the power converter (from 240V to 12V for TK1, to 19V for TX1 and TX2, and to 5V for Nano). I measure AC power for a practical reason: it represents the electricity you pay for at the end of the month.
Fig. 1 shows the idle power of the boards. Nano has a very low idle power of only 1.5 Watts. Interestingly, TX2 has 40% lower idle power than TX1, even though they use the same carrier board and power supply. This suggests a big improvement in the design and manufacturing of the TX2 module, despite its extra CPU cores and memory.
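Since the point of measuring AC power is what it costs at the end of the month, here is a minimal sketch converting a constant power draw into monthly energy and cost. It assumes the board runs 24/7 over a 30-day month, and the price per kWh is a hypothetical placeholder; only the 1.5 W idle figure for Nano comes from the measurements above.

```python
# Sketch: convert a board's average AC power draw into monthly energy and cost.
# Assumptions: the board runs 24/7 for a 30-day month; the electricity price is a placeholder.

HOURS_PER_MONTH = 24 * 30  # 720 hours in an assumed 30-day month


def monthly_energy_kwh(watts: float) -> float:
    """Energy in kWh consumed by a constant load of `watts` over one month."""
    return watts * HOURS_PER_MONTH / 1000.0


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Monthly cost of a constant load at the given price per kWh."""
    return monthly_energy_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    nano_idle_w = 1.5  # idle power of Jetson Nano, from Fig. 1
    price = 0.20       # hypothetical price per kWh, in your local currency
    print(f"{monthly_energy_kwh(nano_idle_w):.2f} kWh per month, "
          f"cost {monthly_cost(nano_idle_w, price):.3f}")
```

At idle, Nano uses about 1.08 kWh per month under these assumptions; plug in your own measured wattage and tariff for the other boards.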
Next, we will take a look at the CPU, GPU, and memory sub-systems.
CPU
I use CoreMark to measure CPU performance in iterations per second. This benchmark is designed to estimate the performance of embedded systems and is intended to replace the well-known Dhrystone benchmark.
First, Fig. 2 shows CPU performance when CoreMark runs on a single core, together with the peak power of the entire board. In ascending order of performance, the boards are Nano, TK1, TX1, and TX2; in ascending order of power, they are Nano, TK1, TX2, and TX1. TX2 thus delivers more performance than TX1 for less power. Nano delivers less performance than TK1, even though its CPU is 64-bit and of a newer generation. However, Nano's maximum clock frequency is 1.43GHz, compared to 2.32GHz for TK1. For a 62% higher frequency, TK1 delivers only 8.6% more performance while using 67% more power.
The performance-to-power ratio (PPR) is a good estimate of the efficiency of a computer system. By this measure, Jetson Nano is the most efficient of the four boards at the single-core level, as shown in Fig. 4.
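The single-core efficiency claim follows directly from the relative numbers quoted above. This sketch uses Nano as the baseline (performance 1, power 1) and applies the 8.6% and 67% figures to TK1. It works only with ratios, so no absolute CoreMark scores or wattages are assumed.

```python
# Sketch: performance-to-power ratio (PPR) from the relative single-core figures in the text.
# Only ratios are used; Nano is the baseline (performance = 1, power = 1).

def ppr(performance: float, power_watts: float) -> float:
    """Performance-to-power ratio, e.g. CoreMark iterations/s per watt."""
    return performance / power_watts


freq_ratio = 2.32 / 1.43   # TK1 clock relative to Nano
nano = ppr(1.0, 1.0)       # baseline
tk1 = ppr(1.086, 1.67)     # 8.6% more performance for 67% more power

print(f"TK1 clock is {freq_ratio - 1:.0%} higher than Nano's")
print(f"TK1 PPR relative to Nano: {tk1 / nano:.2f}")
print(f"Nano PPR relative to TK1: {nano / tk1:.2f}")
```

TK1 ends up with about 0.65 of Nano's PPR, which means Nano does roughly 1.54 times as much single-core work per watt.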
Next, we look at the performance of all cores, with one CoreMark instance running on each core. TX2 is expected to outperform the other boards since it has six cores. But it delivers 96% more performance than TX1 (almost double) with only 50% more cores (six instead of four), and all this while using less power than TX1, as shown in Fig. 3. Clearly, TX2 is the champion in terms of efficiency when all cores are used. Nano comes second, as shown in Fig. 4.
In summary, TX1 is not a very successful design from a performance and power point of view. TX2 is a very powerful board, but it is expensive. Nano is a good choice, being both cheap and efficient.
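To separate the effect of the extra cores from the rest of TX2's advantage, this sketch divides the quoted all-core speedup by the core-count ratio. It is simple arithmetic on the numbers in the text and assumes that performance would otherwise scale linearly with the number of cores.

```python
# Sketch: how much of TX2's all-core advantage over TX1 comes from the extra cores.
# Inputs are the relative numbers quoted in the text (TX1 = baseline).
# Assumption: without other differences, throughput would scale linearly with core count.

tx1_cores, tx2_cores = 4, 6
speedup = 1.96                        # TX2 delivers 96% more all-core performance

core_ratio = tx2_cores / tx1_cores    # 50% more cores
per_core_gain = speedup / core_ratio  # average throughput per core, TX2 vs TX1

print(f"core ratio: {core_ratio:.2f}")
print(f"per-core gain: {per_core_gain:.2f}x")
```

On average, each TX2 core delivers roughly 1.31 times the throughput of a TX1 core. The benchmark alone does not tell us why; the Denver cores and different clock frequencies are plausible contributors, but that is an assumption.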
GPU
The main distinctive feature of the Jetson boards is their Nvidia GPU. This GPU allows them to run CUDA code, including machine learning and deep learning applications.
To measure GPU performance in GFLOPS (billions of floating-point operations per second), I use the SHOC benchmark, in particular its level0/MaxFlops.cu program. I then parse the results to extract the maximum GFLOPS value across the different operations and plot it in Fig. 5. I observed that the maximum performance is obtained by the multiply-add kernel operating on single-precision (32-bit) floats.
Again, the champion in terms of both maximum performance and efficiency is Jetson TX2. TX1 comes second, but its power usage is the highest, at almost 12W. Jetson Nano is comparable to TX1 in terms of efficiency and to TK1 in terms of raw performance, even though it has only 128 cores, compared to TK1's 192.
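The multiply-add kernel comes out on top because a fused multiply-add counts as two floating-point operations but is issued as a single instruction. This sketch computes the resulting theoretical FP32 peak as cores × 2 × clock. The 921.6 MHz maximum GPU clock for Nano is an assumption here; check the setting on your own board.

```python
# Sketch: theoretical FP32 peak of a CUDA GPU, assuming each core retires one
# fused multiply-add (FMA = 2 floating-point operations) per clock cycle.

def peak_fp32_gflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS = cores x 2 FLOPs per FMA x clock in GHz."""
    return cuda_cores * 2 * clock_ghz


if __name__ == "__main__":
    # Assumed maximum GPU clock of Jetson Nano (921.6 MHz).
    print(f"Nano FP32 peak: {peak_fp32_gflops(128, 0.9216):.1f} GFLOPS")
```

A kernel that only adds or only multiplies performs one FLOP per instruction, so it can reach at most about half of this peak. The measured maximum in Fig. 5 can be compared against this upper bound.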
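For a rough comparison, about 267 thousand TX2 boards would match the performance of the top supercomputer in the world as of November 2019, Summit at Oak Ridge National Laboratory (148.6 PFLOPS). These boards would draw about 2,536.5 kW, compared to Summit's 10,096 kW. This is not a like-for-like comparison, though. Summit's figure is sustained double-precision LINPACK performance, while ours is the single-precision peak of a synthetic kernel, and the Jetson GPUs are much slower in double precision. At this scale, the raw numbers therefore overstate Jetson's efficiency.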
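The board count comes from dividing Summit's performance by one board's, so the unit prefixes matter. 148.6 PFLOPS is 148.6 × 10^6 GFLOPS, and treating "peta" as 10^18 inflates both the board count and the total power by a factor of 1000, turning about 267 thousand boards into about 267 million. In this sketch, the per-board TX2 values (556.6 GFLOPS, 9.5 W) are assumptions chosen to be consistent with the figures quoted above, not new measurements.

```python
import math

# Sketch: how many boards match a target performance, and how much power they draw.
# Summit figures are from the November 2019 TOP500 list; the TX2 figures are
# assumptions consistent with the numbers in the text, not measurements.

SUMMIT_GFLOPS = 148.6e6   # 148.6 PFLOPS = 148.6e15 FLOP/s = 148.6e6 GFLOPS
SUMMIT_KW = 10_096

TX2_GFLOPS = 556.6        # assumed per-board GFLOPS (replace with the Fig. 5 value)
TX2_WATTS = 9.5           # assumed per-board power in watts


def boards_needed(target_gflops: float, board_gflops: float) -> int:
    """Smallest number of boards whose combined GFLOPS reach target_gflops."""
    return math.ceil(target_gflops / board_gflops)


def cluster_power_kw(n_boards: int, board_watts: float) -> float:
    """Total power of n_boards boards, in kW."""
    return n_boards * board_watts / 1000.0


n = boards_needed(SUMMIT_GFLOPS, TX2_GFLOPS)
print(f"{n:,} boards, {cluster_power_kw(n, TX2_WATTS):,.1f} kW vs {SUMMIT_KW:,} kW for Summit")

# Mistaking PFLOPS for 1e18 FLOP/s inflates both results by 1000x:
n_slip = boards_needed(SUMMIT_GFLOPS * 1000, TX2_GFLOPS)
print(f"{n_slip:,} boards, {cluster_power_kw(n_slip, TX2_WATTS):,.0f} kW")
```

With these assumed values, the first line gives roughly 267 thousand boards drawing about 2,536 kW, and the second reproduces the 1000x-larger figures.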
Memory
The memory of these Jetson boards is shared between the CPU and the GPU. Here, I use lmbench to measure memory bandwidth from the CPU's perspective. Fig. 7 clearly shows the three levels of the memory hierarchy: L1 cache, L2 cache, and main memory.
At the L1 cache level, TK1 is the best, mainly because it runs at the highest clock frequency (2.32GHz). Nano comes last, but it also has the lowest clock frequency (1.43GHz).
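The idea behind a bandwidth-versus-size plot like Fig. 7 is to sweep the working-set size: bandwidth stays high while the buffers fit in a cache and drops as they spill into the next level. Here is a crude Python sketch of such a copy-bandwidth sweep. It is not equivalent to lmbench, which is written in C. Python call overhead distorts the smallest sizes, so this sketch may show the drop to main memory but is unlikely to resolve the L1 level cleanly. The sizes and the 256 MiB total per measurement are arbitrary choices.

```python
# Crude sketch of a copy-bandwidth sweep over working-set sizes (not lmbench).
# Bandwidth counts the bytes copied; Python overhead distorts small sizes.
import time


def copy_bandwidth_gbs(size_bytes: int, total_bytes: int = 256 * 1024 * 1024) -> float:
    """Copy a size_bytes buffer repeatedly until ~total_bytes have moved; return GB/s."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    reps = max(1, total_bytes // size_bytes)
    dst[:] = src  # warm-up: touch both buffers once
    start = time.perf_counter()
    for _ in range(reps):
        dst[:] = src  # equal-length slice assignment is a bulk memory copy
    elapsed = time.perf_counter() - start
    return reps * size_bytes / elapsed / 1e9


if __name__ == "__main__":
    for kib in (16, 64, 256, 1024, 4096, 16384, 65536):
        print(f"{kib:6d} KiB: {copy_bandwidth_gbs(kib * 1024):6.2f} GB/s")
```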
However, at the main memory level, Nano achieves 3.2 GB/s, which is close to the 3.9 GB/s of TX2.
Summary
These performance results show that TX2 is the most powerful board in the family. However, it is also the most expensive. TX1 does not seem to be a very successful design. Even though it delivers more performance than the first member of the family (TK1), TX1 uses more power and is quite inefficient, at least at the CPU level. Jetson Nano is not impressive in terms of performance, but it is the most efficient once we also consider power and cost.
In a future story, I will benchmark these boards with some AI applications.