**Nuclei RISC-V Processor** 900 Series Product Brief

### **Overall Introduction**

Nuclei 900 Series is a high-performance processor series based on the RISC-V architecture, compatible with RV32IMACFDBPKV/Zcxlcz or RV64IMACFDBPKV/Zc. The 900 Series features a dual-issue, 9-stage, in-order execution pipeline.

The 900 Series has 4 classes: the N900 Series is a 32b embedded processor; the U900 Series is a 32b application processor; the NX900 Series is a 64b embedded processor; and the UX900 Series is a 64b application processor.

As a high-performance processor series, **N/U900** delivers 2.84/6.05 (legal/best) Dhrystone/MHz and 5.5 CoreMark/MHz; **NX900 and UX900** deliver 3.09/7.7 (legal/best) Dhrystone/MHz and 5.5 CoreMark/MHz.

The 900 Series supports both instruction and data local memory (ILM/DLM) for better real-time processing capability. Users can also configure instruction and data caches (I-Cache/D-Cache) to improve overall subsystem performance. The 900 Series supports SMP with up to 16 cores per cluster, and a cluster cache with cache coherence is supported in SMP mode.

The 900 Series supports various RISC-V extensions, including vector, single/double-precision floating point, DSP, NICE (Nuclei Instruction Co-unit Extension), and TEE (Trusted Execution Environment), giving customers rich configuration options. The 900 Series is in mass production across a broad range of applications.
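The ISA strings quoted in the introduction pack the register width and the enabled extensions into a single token. As a minimal sketch (not a Nuclei tool), the Python below splits such a string into its XLEN, its single-letter extensions, and the trailing `/Z...` suffix. The function name `decode_isa` and the one-line descriptions in `EXTENSIONS` are assumptions made for illustration; they follow the usual RISC-V letter conventions, and the suffix is returned unparsed.

```python
import re

# Short descriptions of the single-letter extensions named in this brief.
# The wording is illustrative and not taken from Nuclei documentation.
EXTENSIONS = {
    "I": "base integer instructions",
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "C": "compressed (16-bit) instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "B": "bit manipulation",
    "P": "packed-SIMD DSP",
    "K": "scalar cryptography",
    "V": "vector",
}


def decode_isa(isa):
    """Split e.g. 'RV32IMACFDBPKV/Zcxlcz' into (xlen, letters, suffix)."""
    # Everything after the first '/' is kept as an opaque suffix.
    base, _, suffix = isa.partition("/")
    match = re.fullmatch(r"RV(32|64)([A-Z]+)", base)
    if match is None:
        raise ValueError(f"unrecognised ISA string: {isa!r}")
    xlen = int(match.group(1))
    letters = list(match.group(2))
    return xlen, letters, suffix


if __name__ == "__main__":
    for isa in ("RV32IMACFDBPKV/Zcxlcz", "RV64IMACFDBPKV/Zc"):
        xlen, letters, suffix = decode_isa(isa)
        print(f"{isa}: XLEN={xlen}, suffix={suffix!r}")
        for letter in letters:
            description = EXTENSIONS.get(letter, "not described in this sketch")
            print(f"  {letter}: {description}")
```

Running the script lists the ten single-letter extensions for each of the two ISA strings, which makes it easier to see that the 32b and 64b classes differ in XLEN and suffix rather than in their extension letters.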
[Figure: 900 Series block diagram. Labels: RV32IMACFDBPKV/Zcxlcz, RV64IMACFDBPKV/Zc, Dual-Issue, Interrupt Extension, AXI System Bus, RISC-V Standard Debug (4-Wire JTAG, 2-Wire cJTAG)]

# 900 Series Common Configs and Applications

- 32b architecture, 9-stage pipeline
- Dual issue
- RV32IMACFDBPKV/Zcxlcz
- On chip instruction and data local memory
- SMP up to 16 cores

Applications: MCU, Industrial, Automotive

- 32b architecture, 9-stage pipeline
- Dual issue
- RV32IMACFDBPKV/Zcxlcz
- 16-64KB instruction and data cache
- Support MMU, able to run Linux
- SMP up to 16 cores

Applications: 32b Linux, Security

- 64b architecture, 9-stage pipeline
- Dual issue
- RV64IMACFDBPKV/Zc
- On chip instruction and data local memory
- 16-64KB instruction and data cache
- SMP up to 16 cores

Applications: AI, AR/VR, Storage

- 64b architecture, dual-issue 9-stage pipeline
- RV64IMACFDBPKV/Zc
- On chip instruction and data local memory
- 16-64KB instruction and data cache
- Support MMU, able to run Linux
- SMP up to 16 cores
- Cluster Cache configurable, support cache coherence

Applications: 64b Linux, Application Processor, ADAS, Robotics

## 900 Single Core Feature

- RV32IMACFDBPKV/Zcxlcz or RV64IMACFDBPKV/Zc compatible;
- 9-stage in-order pipeline, dual-issue, for high-performance embedded and Linux applications;
- Support 64b AXI system interface, 32b AHB-Lite private peripheral interface and ILM/DLM interface;
- Double/single-precision floating point and DSP Extension;
- Support VPU (128/256/512/1024);
- Configurable ILM (Instruction Local Memory) & DLM (Data Local Memory) with ECC;
- Configurable I-Cache with Scratchpad mode & D-Cache with ECC;
- Configurable MMU (SV32/SV39/SV48);
- Configurable PMP and TEE (Trusted Execution Environment) for system security;
- Support standard JTAG & cJTAG interface and Linux/Windows debug tools;
- Support standard RISC-V GNU toolchain and Linux/Windows dev environment (IDE)

## **Different 900 Series Class**

| Class | Address width (bits) | MMU | Multi-core |
|-------|----------------------|-----|------------|
| N900  | 32 | No  | Yes |
| U900  | 32 | Yes | Yes |
| NX900 | 64 | No  | Yes |
| UX900 | 64 | Yes | Yes |

#### 900 Series SMP Feature

- Support configurable dual-mode feature (Application Mode and Real-time Mode);
- Single cluster supports up to 16 cores with sync or async clock configuration and MOESI coherence protocol;
- System interface
  - Support 64/128/256/512-b AXI Cluster Memory Ports
  - Support 32-b AHB-Lite Cluster Peripheral Ports
  - Number of IOCP (I/O Coherent Ports) is configurable;
- Support Hardware Data Prefetch;
- Cluster Cache
  - Size configurable
  - 64-Byte Cache Line Size
  - Configurable Tag RAM and Data RAM
  - Support mostly exclusive mechanism with core cache
  - Support up to 16-way structure
  - Support dynamically configurable Cluster Local Memory mode

#### 900 Series Dual-Mode

900 SMP supports Application Mode + Real-time Mode. The Cluster Cache can be defined as cluster local memory (CLM) or as a cluster-level cache.

- Application mode with MMU
  - Cluster cache can be used as system-level instruction and data cache
  - Private Timer
  - System-level interrupt handling, support PLIC (Platform Level Interrupt Controller)
- Real-time mode
  - Cluster cache can be used as system-level local memory
  - Private Timer
  - Fast interrupt handling, support ECLIC

## **Vector Computing Unit (VPU)**

- High frequency that matches the main CPU core;
- 3 instruction queues, support 8/16/32/64 integer, 8/16/32/64 fixed point, 16/32/64 floating point and BF16

#### **900 Series DSP Extension**

- Support Packed-SIMD DSP features that follow the RISC-V "P" Extension;
- Can be configured with the Nuclei custom DSP instruction: change any particular byte to an XLEN GPR;
- Support 3 extra extensions: N1, N2 and N3, increasing SIMD parallel computing performance by 1x;
- Support the DSP library NMSIS, which is compatible with ARM CMSIS, helping customers handle complicated DSP computation;
- For detailed definitions and supported instructions, refer to: Nuclei® RISC-V Packed-SIMD DSP QuickStart

#### **L2 Pre-fetch Feature**

- L2 pre-fetch increases
L2 cache system performance
- Pre-fetch engine supports streaming, fixed stride, complicated pattern and next line pre-fetch
- Much better performance under SPECint2k6 benchmark

[Figure: enhancement factor (Y-axis) against the average and max enhancement of each NMSIS DSP Library under P-extension (X-axis)]

# 900 Series Memory Subsystem

The 900 Series supports local instruction and data memory, ILM (Instruction Local Memory) and DLM (Data Local Memory), providing real-time processing capability:

- ILM and DLM can each be configured from 128B to 2GB, allowing excellent flexibility;
- AHB-Lite interface and SRAM interface with customized address space.

#### 900 Series supports Instruction Cache

- 2-way, 64B cache line structure
- Cache size from **8KB-64KB**
- Support cache line LOCK and INVAL operation

#### 900 Series supports Data Cache

- 2-way, 64B cache line structure
- Cache size from **8KB-64KB**
- Support cache line LOCK and INVAL operation

#### 900 Series supports Cluster Cache

- 16-way, 64B cache line structure
- Cache size from 128KB-4MB
- Snoop activity of each core can be managed by software
- Cluster cache can be configured as cluster local memory for better real-time capability

# **900 Series System Interface Introduction**

| Bus Interface | Description | Atomic Support | Burst Support | Cacheability | Protocol | Bus Width |
|-----------------|------------------------------------------|-----|-----|--------------|----------|--------------------|
| System Bus | System Instruction and Data | Yes | Yes | Configurable | AXI4 | 64/128/256/512 bit |
| ILM Interface | Local Instruction | No | No | No | SRAM | 64 bit |
| DLM Interface | Local Data | No | No | No | SRAM | 2×32 bit |
| PPI Interface | Private Peripherals | No | No | No | AHB-Lite | 32 bit |
| Slave Interface | External Master Read | No | Yes | No | AXI4 | 64 bit |
| IOCP Interface | Cache Coherence with External Masters | No | Yes | No | AXI4 | 64/128 bit |

# **Nuclei CPU Subsystem**

Nuclei uses its internal tools to integrate CPU IPs with other peripheral IPs, then verifies and delivers a full subsystem solution to the customer.

- Save money: a full subsystem IP reduces the customer's cost;
- Save time: a fully customized SoC subsystem shortens the customer's development cycle;
- Save effort: the related SoC drivers and SDK help with fast prototype bring-up.

# **Innovative Subsystem IP Use Case**

#### Use Case #1

Single-core: the customer completed bring-up in 2 weeks based on the delivered IP package & SDK

#### Use Case #2

Multi-core: supported two modes (real-time & application), including IDU, bus matrix, etc.

# **Nuclei IDE**

Eclipse CDT-based development environment, easy to get started with using the manual.

- Nuclei RISC-V GCC, OpenOCD and QEMU integrated
- Nuclei Package (NPK) software solution
- Support SoC Subsystem SDK one-click import
- Portable executables, without installation
- One-click project template
- One-click project configuration
- In-system debugging and programming
- Integrated serial port tool
- Real-time register display
- Support Linux and Windows
- Deeply integrated with RV Prof, a professional performance profiling and optimizing tool, accurate at instruction and cycle level
- Built-in RISC-V e-trace support: debug and analyze performance with the ATB2AXI module and trace decoder

# 900 Series Has Been Deployed to Various Applications