1.3 Paper outline
This paper highlights some of the choices and design features of the SHAKTI processor program. Section-2 presents the reasoning behind choosing RISC-V as the ISA for SHAKTI. Section-3 describes some of the advantages of using BSV as the HLS language. On the hardware side of the project, Section-4 provides a brief overview of the c-class micro-architecture and discusses some of its features. Section-5 and Section-6 walk through some of the verification issues and the meta-platforms that assist significantly in processor development.
2. Choosing an Instruction Set Architecture
The Instruction Set Architecture (ISA) plays the most important role in defining the processor HW and its associated SW stack. As seen in Figure-2, the ISA forms the interface between the hardware and software. For the hardware community, it strongly influences important metrics like power, performance and area. A bad ISA can easily lead to huge area overheads, thereby increasing the net power consumption of the chip. For example, the addressing modes supported by the ISA largely determine the number of read and write ports required on a register file. An instruction that requires multiple reads from and multiple updates to the register file will incur a huge overhead.
Fig. 2. ISA forms the most critical interface between hardware and software.
For the software community, the ISA defines the basis for a variety of infrastructure like compilers, operating systems, applications, common libraries, etc. It is the ISA that dictates the flexibility and ease of porting an application or OS onto a platform. A complex and fragmented ISA increases the burden on the software programmer to maintain compatibility across multiple platforms.
Even though it is quite evident that the ISA forms the spine of any processor ecosystem (both HW and SW), the ISAs that dominate the industry today are all proprietary. This greatly limits the ability of open processor eco-systems to evolve. Not only are the licensing fees for these ISAs exorbitant, but some companies also restrict the licensee from making any modifications or optimizations, and strictly restrict publication of internal findings. By adopting a free and open ISA such as SPARC v8, OpenRISC or RISC-V, one not only circumvents the licensing issues but also gains benefits similar to those mentioned in Section-1.1.
Free and open RISC ISAs do exist today: SPARC v8, OpenRISC and RISC-V. Before choosing one of them for SHAKTI processor development, we need to evaluate which has the capability to scale to next-generation compute requirements. It is likely that the following three platforms will dominate future workloads: IoT, personal mobile computing and warehouse-scale computers. This suggests three key requirements that an ISA needs to support:
—Base Plus Extension ISA: The industry today is moving towards domain-specific computing, where custom application-specific accelerators are used. To match these needs, the ISA should have a small core instruction set on which compilers and OSes can depend. The ISA should also specify optional but common extensions to serve standard compute workloads. Finally, the ISA should leave space for custom opcodes that invoke application-specific accelerators.
—Code Density: For IoT-based applications, the ISA should allow programs to be encoded with a high code density.
—Single, Double and Quad Precision support: Warehouse-scale applications, even today, process large data sets using quad-precision (QP) software libraries. ISAs which expose these operations as dedicated opcodes and allow easy hardware implementations will be more attractive.
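The "space to add custom opcodes" requirement can be made concrete with a small sketch. The Python snippet below is an illustration only, not part of the SHAKTI code base: it assumes the 32-bit RISC-V encoding, in which the major opcode occupies bits [6:0] and four major opcodes (custom-0 to custom-3) are set aside for non-standard extensions. The function name `opcode_space` and the small table of standard opcodes are hypothetical names chosen for this example, and the table lists only a handful of standard major opcodes.

```python
# Major opcodes (instruction bits [6:0]) that the RISC-V 32-bit encoding
# sets aside for custom, non-standard extensions.
CUSTOM_OPCODES = {
    0b0001011: "custom-0",
    0b0101011: "custom-1",
    0b1011011: "custom-2",
    0b1111011: "custom-3",
}

# A few standard major opcodes, listed only for illustration (not exhaustive).
STANDARD_OPCODES = {
    0b0000011: "LOAD",
    0b0010011: "OP-IMM",
    0b0100011: "STORE",
    0b0110011: "OP",
    0b1010011: "OP-FP",
    0b1100011: "BRANCH",
}


def opcode_space(instr: int) -> str:
    """Return the name of the major opcode of a 32-bit instruction word."""
    opcode = instr & 0x7F  # keep bits [6:0]
    if opcode in CUSTOM_OPCODES:
        return CUSTOM_OPCODES[opcode]
    return STANDARD_OPCODES.get(opcode, "other")


if __name__ == "__main__":
    add_x1_x2_x3 = 0x003100B3      # add x1, x2, x3 (major opcode OP)
    accelerator_call = 0x0000000B  # hypothetical word using custom-0
    print(opcode_space(add_x1_x2_x3))
    print(opcode_space(accelerator_call))
```

A decoder built this way can route any word in the custom space to an accelerator interface while leaving the base decode path untouched, which is the property the first requirement asks for.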
As evident from Table-II, RISC-V is the only free and open ISA which meets all of the above requirements. It has also avoided some of the mistakes of previous ISAs (like delayed branches, module ISA extensions, etc.). This makes RISC-V simple and easy to adopt. RISC-V also compares favorably with commercial competitors such as the MIPS and ARM ISAs. The following is a quick summary of the points that give RISC-V the upper hand:
—MIPS: Over 30 years, it has evolved into a much larger ISA, now with about 400 instructions. The ISA is over-optimized for a specific micro-architectural pattern: the five-stage, single-issue, in-order pipeline. RISC-V, on the other hand, does not mandate any micro-architectural features, thereby enabling varied micro-architectures ranging from simple in-order cores to out-of-order multi-cores.
—Oracle’s SPARC: To accelerate function calls, SPARC employs a large, windowed register file; the operating system must be routinely invoked to handle window overflows and underflows. The register windows come at a significant area and power cost for all implementations. SPARC was designed to be implemented in a single-issue, in-order, five-stage pipeline, and the ISA reflects this assumption.
—ARMv7: This is a popular 32-bit RISC-inspired ISA. Between ARM and Thumb, there are over 600 instructions in the integer ISA alone. NEON, the integer SIMD and floating-point extension, adds hundreds more. Even if it had been legally feasible for us to implement ARMv7, it would have been quite challenging technically. There was no support for 64-bit addresses, and the ISA lacked hardware support for the IEEE 754-2008 standard. RISC-V, on the other hand, comprises merely 40 base instructions which any compiler, OS and hardware implementation needs to support.
—ARMv8: This revision introduced 64-bit addresses and an expanded integer register set. The new architecture removed several features of ARMv7 that complicated implementations. For example, the program counter is no longer part of the integer register set; instructions are no longer predicated; the load-multiple and store-multiple instructions were removed; and the instruction encoding was regularized. Overall, however, the ISA is complex and unwieldy. There are 1070 instructions comprising 53 formats and eight data addressing modes, all of which takes 5,778 pages to document. Finally, like its predecessor, ARMv8 is a closed standard. It cannot be subsetted, making implementations far too bulky to serve as embedded processors or as control units for custom accelerators.
—Intel’s 8086 architecture: Outside of the domain of embedded systems, virtually all popular software has been ported to, or was developed for, the x86.
Table II. Comparison of Free and Open RISC based ISAs
In addition to the features listed in Table-II, the RISC-V ISA also clearly separates the user-mode ISA from the privileged-mode ISA, allowing full virtualization and enabling experimentation in the privileged ISA while maintaining user application binary interface (ABI) compatibility. Some more features which make RISC-V an attractive ISA for the future are:
(1) Separates the ISA into a small base ISA and optional extensions. The base ISA is lean enough to be suitable for educational purposes and for many embedded processors, including the control units of custom accelerators.
(2) Supports both 32-bit and 64-bit address spaces, as 32-bit will continue to be popular in small systems for centuries while 64-bit is desirable even for modest personal computers.
(3) Facilitates custom ISA extensions, including tightly coupled functional units and loosely coupled accelerators.
(4) Supports variable-length instruction set extensions, both for improved code density and for expanding the space of possible custom ISA extensions.
(5) Orthogonalizes the user ISA and privileged architecture, allowing full virtualizability and enabling experimentation in the privileged ISA while maintaining user application binary interface (ABI) compatibility.
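Variable-length encoding, item (4) above, can be illustrated with a short sketch. The Python function below is a minimal example, assuming the length-encoding convention described in the RISC-V user-level specification, where the low-order bits of the first 16-bit parcel indicate the instruction length. The name `instruction_length` is hypothetical, and the 48-bit and 64-bit cases are included only to show how the scheme is laid out; the standard extensions use 16-bit and 32-bit encodings.

```python
from typing import Optional


def instruction_length(first_parcel: int) -> Optional[int]:
    """Return the instruction length in bits, given the first 16-bit parcel.

    Returns None for encodings longer than 64 bits, which this sketch
    does not attempt to decode.
    """
    if first_parcel & 0b11 != 0b11:
        return 16  # bits [1:0] != 11: compressed 16-bit instruction
    if (first_parcel >> 2) & 0b111 != 0b111:
        return 32  # bits [1:0] == 11 and bits [4:2] != 111
    if first_parcel & 0b111111 == 0b011111:
        return 48  # bits [5:0] == 011111
    if first_parcel & 0b1111111 == 0b0111111:
        return 64  # bits [6:0] == 0111111
    return None


if __name__ == "__main__":
    print(instruction_length(0x0001))  # c.nop, a compressed instruction
    print(instruction_length(0x00B3))  # low half of add x1, x2, x3
```

Because the length is known from the first parcel alone, a fetch unit can mix 16-bit and 32-bit instructions in one stream, which is what allows code density to improve without a separate execution mode.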
The RISC-V ISA is now officially maintained by the RISC-V Foundation [RISC-V 2015], which is supported by more than 100 members. The arguments and facts presented in this section make it clear that computing devices should not only adopt a free and open ISA but should specifically adopt RISC-V for its potential to scale to future workloads. The SHAKTI program has thus adopted RISC-V as its standard base ISA.
3. Choosing a Hardware Description Language
Another critical aspect of developing processors is choosing the right description language in which to design them. As mentioned in the earlier sections, one of the primary reasons for building open-source processor eco-systems is the reusability of the code-base and the IP. This requires the micro-architecture description to be easily understandable, quick to modify and fast to prototype. Additionally, these description languages need to be open-source rather than proprietary, for reasons similar to those mentioned in Section-2 (as shown by the need for reverse-engineering of FPGAs: it is no good releasing the HDL source for a design if the toolchain needed to compile it to an FPGA target is itself proprietary and costly).
While the industry has standardized on Verilog and VHDL for all production-grade chips over the past three decades, these languages suffer from several limitations that make them a poor choice for developing next-generation processors. Both languages were originally intended for the simulation of digital circuits; only later were they adopted for logic synthesis as well. As a result, each of them contains constructs that cannot be synthesized, which makes them difficult to use for designing complex processors. Another major barrier is the low level of abstraction they offer. Describing a complex design in Verilog requires the abstract specification to be mapped correctly, by hand, onto low-level gate structures, which leads to a large number of human errors in the design. Modern software engineers familiar with object-oriented programming languages such as Python, C++, Rust, etc. look at this situation with a significant degree of bewilderment and disbelief, and older software engineers will easily recognize both Verilog and VHDL as containing stagnated constructs and design methodologies from software languages developed as far back as the 1980s.
To overcome some of the major limitations of Verilog and VHDL, the industry has seen a growth in High-Level-Synthesis (HLS) languages: Bluespec System Verilog (BSV) [Nikhil 2004], Chisel [Bachrach et al. 2012], Clash [Baaij et al. 2010], SystemC [sys 2012], SystemVerilog [sys 2018], etc. Languages like BSV and Chisel are open-source, and anyone is at liberty to build their own compilers and libraries to support the language constructs. The SHAKTI ecosystem uses Bluespec System Verilog for its development for the following reasons:
—Architecturally Transparent: BSV is architecturally transparent, and the designer has complete control over explicitly defining the architecture of the design. The powerful types and static elaboration enable the user to express architecture elegantly and succinctly. Since static elaboration is deterministic, the HLS flow does not create surprises for the user.
—Superior Behavioral Semantics: BSV supports atomic rules and parameterized interface definitions. The atomic nature of rules allows the concurrent behavior of the architecture to be defined at a higher level of abstraction. The interfaces in BSV allow you to bundle different combinations of ports (like input, output, etc.) as sub-interfaces and methods, thus avoiding the rats-nest-like code found in Verilog and VHDL.
—Strong Parameterization: BSV has a strong parameterization feature (similar to templates in C++) which improves code size, code structure, code reuse and correctness. This feature allows a user to define parameterized modules, interfaces and functions, which can in turn generate interfaces, modules, nested rules, etc. This provides the user with higher expressive power.
—Guaranteed Synthesis: The constructs and structure defined by BSV are completely synthesizable. This enables the user to quickly prototype their designs on FPGAs from day one without facing any synthesis issues (like loops, latches, etc.). BSV also enables generating synthesizable test-benches which can be ported to FPGAs as well.
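The idea of atomic rules can be illustrated outside of BSV with a toy model. The Python sketch below is not BSV and does not reflect how the Bluespec compiler schedules hardware; it only mimics one-rule-at-a-time semantics, in which a rule fires only when its guard holds and all of its register updates take effect together. The `Rule` class, the `run` function and the GCD rule set are hypothetical names written for this illustration, modelled on the well-known two-rule GCD example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]    # when the rule is allowed to fire
    action: Callable[[State], State]  # register updates, computed from the old state


def run(state: State, rules: List[Rule], max_steps: int = 1000) -> State:
    """Fire one enabled rule at a time until no guard holds (or max_steps)."""
    state = dict(state)
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            break
        # All updates of the chosen rule are applied together (atomically).
        state.update(enabled[0].action(state))
    return state


gcd_rules = [
    # Swap reads both registers from the old state, so no temporary is needed.
    Rule("swap",
         lambda s: s["x"] > s["y"] and s["y"] != 0,
         lambda s: {"x": s["y"], "y": s["x"]}),
    Rule("subtract",
         lambda s: s["x"] <= s["y"] and s["y"] != 0,
         lambda s: {"y": s["y"] - s["x"]}),
]

if __name__ == "__main__":
    final = run({"x": 15, "y": 6}, gcd_rules)
    print(final["x"])  # the GCD is left in x once y reaches 0
```

In this model the designer states only what each rule does and when it may fire; deciding which rules can fire in the same clock cycle is left to the tool, which is the higher level of abstraction that the second bullet refers to.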
Today, Bluespec Inc. provides a BSV compiler which can generate both synthesizable Verilog and a cycle-accurate C model of the BSV design. The C model is known to be nearly 8-10x faster in simulation than state-of-the-art Verilog simulators on the market. This drastically speeds up the verification of designs, leading to quick turnaround and reduced time to market. All of the above points make BSV a sensible, risk-reducing choice for designing RISC-V based processors.