DeepSPICE: Accelerating Digital Cell Characterization Using Deep Learning

Ishwar Suriyaprakash Homestead High School, Cupertino, CA

# INTRODUCTION

- Digital integrated circuits (ICs, or chips) implement complex arithmetic and logic functions in silicon
- Circuits are built from a network of building blocks called cells (or gates) drawn from a collection (library)
- Cells implement smaller primitive Boolean functions such as NOR and NAND



Figure: NOR cell, NAND cell, and a circuit for a complex function implemented with cells



- Circuit validation includes determining properties such as power consumption and speed
- Circuit speed: propagation times of events (0→1 or 1→0) from inputs to outputs
- The method used to determine circuit speed is hierarchical
	- 1. Input-to-output propagation delays of each cell type are determined using simulations
	- 2. Worst-case circuit propagation times are calculated from the pre-computed cell delays
- Cell simulations (Step 1 above) to determine delays are computationally very expensive
- **Goal:** Compute delays with cell simulation time reduced by 2X using deep learning
	- Delay prediction error should be within 15% of average simulation delays
	- Method should work for cells with 2 to 7 inputs
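Step 2 of the hierarchical method above can be illustrated with a minimal sketch. It assumes a single fixed delay per cell, whereas real cell delays depend on the input event, transition time and load. The netlist `CIRCUIT`, the signal names and the delay values are hypothetical, chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical netlist: each gate output maps to (cell delay in ps, fan-in signals).
# Delay values are invented for illustration; primary inputs do not appear as keys.
CIRCUIT = {
    "n1": (12.0, ("a", "b")),
    "n2": (15.0, ("b", "c")),
    "out": (10.0, ("n1", "n2")),
}


@lru_cache(maxsize=None)
def arrival_time(signal: str) -> float:
    """Latest time (ps) at which an event can reach `signal`."""
    if signal not in CIRCUIT:
        return 0.0  # primary input: event applied at time 0
    delay, fanin = CIRCUIT[signal]
    # Worst case: wait for the slowest fan-in, then add this cell's delay.
    return delay + max(arrival_time(s) for s in fanin)


print(arrival_time("out"))  # 25.0 (path through n2: 15 + 10)
```

The recursion is a longest-path computation over the circuit graph, which is why the cell delays only need to be characterized once per cell type.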


#### Why are cell simulations expensive?

- A modern cell library contains approximately 30K different cell types
- The number of inputs per cell can range from 1 (e.g. inverter) to 20
- Determining cell input-to-output delays involves transistor-level simulations
- Event combinations at the cell's inputs need to be simulated to determine delays



- Event $0 \rightarrow 1$ is called a rise transition
- Event $1 \rightarrow 0$ is called a fall transition
- Event $0 \rightarrow 0$ is called steady 0
- Event $1 \rightarrow 1$ is called steady 1

- Number of simulations for a k-input cell = $2^k \times 2^k = 2^{2k} = 4^k$ ($2^k$ initial input states times $2^k$ final input states)
- Example: simulations for a 12-input cell = $4^{12} \approx 16.8$ million
	- Assuming 1 second per simulation, this would take ~194 days!
	- In practice, a large number of parallel machines is used to contain this cost
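The growth in simulation count can be checked with a few lines of Python. This is a sketch that uses the 1-second-per-simulation figure assumed above; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def num_simulations(k: int) -> int:
    """Input event combinations for a k-input cell: 4 events per input."""
    return 4 ** k


def days_at_one_second_each(k: int) -> float:
    """Serial run time in days, assuming 1 second per simulation."""
    return num_simulations(k) / SECONDS_PER_DAY


for k in (2, 7, 12):
    print(k, num_simulations(k), round(days_at_one_second_each(k), 2))
```

For k = 12 this gives 16,777,216 simulations, or roughly 194 days of serial run time.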

DeepSPICE approach: Learn from simulations on a small subset of input event combinations & predict the delays for the rest of the event combinations

- Recent research has focused on learning delays from transistor behavior at one part of the silicon wafer (on which circuits are manufactured), using the full set of simulations, and predicting those delays at another part of the same wafer
- Those works did not focus on reducing the number of input event combinations

DeepSPICE is the first to estimate cell delay by learning from a subset of input event combinations


#### How are cells constructed?

- Cells are implemented using 3-terminal Metal Oxide Semiconductor (MOS) transistor switches
- Complementary MOS (CMOS) technology uses two types of transistors: NMOS and PMOS



- Cells are implemented using PMOS and NMOS transistor networks
- The PMOS network implements out = 1 by establishing a charge path to out from the power supply
- The NMOS network implements out = 0 by establishing a discharge path to ground from out





- The transistor-level circuit of a cell, comprising the PMOS and NMOS networks, is used for simulation
- Inputs are connected to voltage sources
- Voltage sources are set up to apply events as piece-wise linear (PWL) waveforms
	- Each input event combination is a different set of PWL waveforms at the inputs
- Output is connected to a load capacitor
- The SPICE<sup>1</sup> file format is used to represent the connections of transistors, inputs and output
- A process technology file that describes transistor model parameters is included for simulation
- A transistor-level simulator (HSPICE<sup>2</sup>, SPECTRE<sup>3</sup>, NGSPICE<sup>4</sup>) is used for simulating each cell
	- The simulator solves nonlinear circuit equations based on Kirchhoff's laws to compute voltage and current values
	- Voltage and current values are computed for nodes within the cell at each time step
- The input-to-output delay for each input event combination is obtained through this simulation
- For each input event combination, delays are obtained for different input transition (rise, fall) times

<sup>1</sup> Simulation Program with Integrated Circuit Emphasis; <sup>2</sup> Synopsys; <sup>3</sup> Cadence; <sup>4</sup> Public
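As an illustration of how one input event combination maps to stimulus, the sketch below generates SPICE voltage-source lines, using a PWL waveform for rise and fall events and a DC source for steady inputs. The supply voltage `VDD`, the start time, the pin names and the helper names are assumptions for this example and are not taken from the poster.

```python
VDD = 1.8  # assumed supply voltage in volts; the poster does not state it

# Each event is an (initial voltage, final voltage) pair.
EVENTS = {
    "rise": (0.0, VDD),
    "fall": (VDD, 0.0),
    "steady0": (0.0, 0.0),
    "steady1": (VDD, VDD),
}


def pwl_source(pin: str, event: str, t_start_ps: int = 1000, t_tran_ps: int = 20) -> str:
    """Return one SPICE voltage-source line that applies `event` at input `pin`."""
    v0, v1 = EVENTS[event]
    if v0 == v1:
        # Steady input: a constant DC source is enough.
        return f"V{pin} {pin} 0 DC {v0}"
    t_end_ps = t_start_ps + t_tran_ps
    # Hold v0 until t_start, then ramp linearly to v1 over the transition time.
    return f"V{pin} {pin} 0 PWL(0 {v0} {t_start_ps}p {v0} {t_end_ps}p {v1})"


def stimulus(combination: dict) -> str:
    """Build the source lines for one input event combination."""
    return "\n".join(pwl_source(pin, event) for pin, event in combination.items())


print(stimulus({"a": "rise", "b": "steady1"}))
```

Times are kept as integer picoseconds with the SPICE `p` scale suffix so that the generated numbers stay free of floating-point noise.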

# METHODS



#### **Transistor-level simulations performed to create the baseline results**

- 14 cells implementing Boolean functions were created for the experiment
	- Two 2-input, three 3-input, three 4-input, three 5-input, two 6-input and one 7-input cell
- The CMOS transistor network for each cell was implemented in SPICE file format
- A load capacitor of 5 fF was connected at each cell's output
- A temperature of 27 °C was used as the set-point for simulation
- The length of each transistor was set to 0.18 micron; the width was adjusted based on cell type
- Three different input transition times (10 ps, 20 ps and 30 ps) were chosen for the experiment
- Simulations were performed with the publicly available NGSPICE transistor-level simulator
- Number of input event combinations, NIC = (number of truth table rows with output 1) × (number of truth table rows with output 0) × 2 *(multiplied by 2 for output rise and fall)*
- Total number of baseline simulations, BS = NIC × 3 *(multiplied by 3 for the 3 different input transition times)*
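The NIC and BS formulas above can be evaluated directly from a truth table. The sketch below uses a 2-input NAND purely as an illustrative function; the poster does not list the Boolean functions of its 14 cells.

```python
from itertools import product


def count_baseline_simulations(func, k: int, n_transition_times: int = 3):
    """Return (NIC, BS) for a k-input Boolean function `func`."""
    outputs = [func(*bits) for bits in product((0, 1), repeat=k)]
    ones = sum(outputs)            # truth table rows with output 1
    zeros = len(outputs) - ones    # truth table rows with output 0
    nic = ones * zeros * 2         # x2: output rise and output fall
    return nic, nic * n_transition_times


def nand2(a: int, b: int) -> int:
    return int(not (a and b))


print(count_baseline_simulations(nand2, 2))  # (6, 18)
```

NAND2 has three rows with output 1 and one row with output 0, so NIC = 3 × 1 × 2 = 6 and BS = 18. Functions with a balanced truth table maximize the product and therefore the number of baseline simulations.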




#### **Deep Learning performed to learn and predict from subset of simulation results**

- A Deep Neural Network (DNN) was built with dense layers in Keras
	- The number of neurons per layer and the number of layers were scaled based on the number of cell inputs
	- The ReLU function was used as the non-linearity in each neuron
- The DNN was trained with data from a subset of simulation results forming the training set TR
	- Training features are the initial voltage, final voltage, and transition time value at each input
	- Training output is the measured input-to-output delay for each input combination from simulation
	- The trained model is used to predict input-to-output delays for input combinations in the test set TS
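A minimal sketch of such a network is shown below. The feature layout (three values per input) follows the description above, but the scaling rule for width and depth, the optimizer and the loss are assumptions for illustration, since the poster does not specify them.

```python
def encode_features(events, transition_ps: float) -> list:
    """Flatten per-input (initial voltage, final voltage) pairs into one feature row."""
    features = []
    for v_initial, v_final in events:
        features.extend([v_initial, v_final, transition_ps])
    return features


def build_model(num_inputs: int):
    """Dense ReLU network whose size grows with the number of cell inputs."""
    # Imported here so that encode_features works without TensorFlow installed.
    from tensorflow import keras

    width = 16 * num_inputs            # hypothetical scaling rule
    depth = max(2, num_inputs // 2)    # hypothetical scaling rule
    layers = [keras.Input(shape=(3 * num_inputs,))]
    layers += [keras.layers.Dense(width, activation="relu") for _ in range(depth)]
    layers.append(keras.layers.Dense(1))  # single regression output: the delay
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model


# One feature row for a 2-input cell: input a rises, input b stays at 1.8 V.
print(encode_features([(0.0, 1.8), (1.8, 1.8)], transition_ps=20.0))
```

Training would then call `build_model(k).fit(X_train, y_train, ...)` on the rows of TR and `predict` on the rows of TS. In practice the features and delays would likely need normalization before training.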

#### **Metrics for measuring DeepSPICE goodness**

**Error (%)** = 100 × NRMSE, where NRMSE is the normalized root mean square error:

$$\text{NRMSE} = \frac{\text{RMSE between simulated and predicted delays for TS}}{\text{Mean delay of test set TS}}$$

- Time for baseline = time for simulating all combinations for a cell
- Time for DeepSPICE = time for simulating training subset TR + training time + prediction time for TS
- **Acceleration Factor** = $\frac{\text{Time for baseline}}{\text{Time for DeepSPICE}}$
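Both metrics are straightforward to compute. The sketch below assumes the mean delay in the denominator is the mean of the simulated delays in TS; the numbers in the example are invented for illustration.

```python
import math


def error_percent(simulated, predicted) -> float:
    """100 x NRMSE: RMSE over the test set divided by the mean simulated delay."""
    n = len(simulated)
    rmse = math.sqrt(sum((s - p) ** 2 for s, p in zip(simulated, predicted)) / n)
    mean_delay = sum(simulated) / n
    return 100.0 * rmse / mean_delay


def acceleration_factor(t_baseline, t_sim_train, t_training, t_prediction) -> float:
    """Baseline time divided by the total DeepSPICE time."""
    return t_baseline / (t_sim_train + t_training + t_prediction)


# Invented example values (delays in ps, times in seconds).
print(round(error_percent([100.0, 200.0], [110.0, 190.0]), 2))  # 6.67
print(acceleration_factor(100.0, 25.0, 10.0, 5.0))              # 2.5
```

The second call shows why a 25% training subset does not automatically give 4X acceleration: training and prediction time are added to the simulation time for TR.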

# RESULTS

- For each cell, DeepSPICE was run with the following options
	- Training with (a) a 25% subset of simulation results, and (b) a 30% subset of simulation results
	- For each training option above, 3 trials, each selecting a different random training set
	- For each trial above, 3 DNN training and prediction runs with different initial DNN starting states
		- Only the mean result of the 3 DNN runs for each trial is reported in the table below
- Cells are ordered in the table below in non-decreasing order of number of inputs
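The trial structure described above can be sketched as a repeated random split of the simulation results into TR and TS. The seed handling and the function name are assumptions; the poster does not say how its random subsets were drawn.

```python
import random


def split_trials(num_results: int, train_fraction: float, num_trials: int = 3, seed: int = 0):
    """Yield (train_indices, test_indices) for each trial's random training subset."""
    rng = random.Random(seed)  # fixed seed so the trials are reproducible
    indices = list(range(num_results))
    n_train = round(train_fraction * num_results)
    for _ in range(num_trials):
        train = set(rng.sample(indices, n_train))
        test = [i for i in indices if i not in train]
        yield sorted(train), test


for train, test in split_trials(num_results=100, train_fraction=0.25):
    print(len(train), len(test))
```

Each trial would then be followed by three DNN training and prediction runs with different initial weights, with the mean of those runs reported.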

Table: cell properties and baseline time; results for training with 25% of simulation results; results for training with 30% of simulation results

Legend: Inp = number of inputs; Tran = number of transistors; BS = total number of baseline simulations; Time DS = time for DeepSPICE


Comparison of the mean results over the 3 trials for each training option is shown in the 2 graphs below



#### **Results with 25% training subset**

- For the 3 largest cells (12, 13, 14), with 6 and 7 inputs, DeepSPICE achieves the goal of >2X acceleration and <15% error
- For 5-input cells, DeepSPICE achieves the <15% error goal but only under 1.7X acceleration
- For 3- and 4-input cells, most have >20% error, with limited acceleration on 5 of the 6 cells
- For 2-input cells, DeepSPICE takes at least 5X more time than baseline and has >25% error

#### **Results with 30% training subset**

- For the 3 largest cells, DeepSPICE achieves <10% error but acceleration is below the 2X goal
- For 5-input cells, DeepSPICE achieves <10% error but achieves acceleration of >1.2X on only 2 of the 3 cells
- Overall, the trend is similar to 25% training but with less error and less acceleration (due to increased training)

# DISCUSSION

- DeepSPICE shows promising results, especially for large cells, for which simulations are expensive
- Results on small cells are poor; this is somewhat expected
	- Small cells have few baseline simulation combinations to begin with
	- Training with an even smaller set of simulations can lose significant information
	- The baseline approach is preferred for small cells and is computationally feasible
- The intuition behind DeepSPICE is validated
	- It is possible to learn about charging and discharging paths within cells from a few simulations
- Other research on accelerating cell simulations does not aim to reduce input simulation combinations
- Several challenges were encountered and overcome
	- Finding a free transistor-level simulator (NGSPICE) and learning how to use a circuit simulator
	- Creating Boolean functions for cells that can best validate DeepSPICE
		- For a given number of cell inputs, it took time to find Boolean functions that maximize baseline simulation combinations
		- The Boolean function should also be implementable without any additional inverters at the inputs
	- Creating transistor-level circuits for cells and setting up simulations in NGSPICE
	- Manually running the simulations and deep learning was difficult initially
		- So the entire flow was automated using Python to perform simulations, training and predictions

# CONCLUSIONS

- It was exciting to observe DeepSPICE performing well at predicting delays of larger cells
- Plan to investigate modifications to the approach that reduce error further without increasing time
	- Example: use the time series of input waveforms as training data in place of initial and final voltages and transition time
- DeepSPICE goodness has to be evaluated for cells that include resistances and capacitances
	- The current evaluation was on cells that contain only transistors, with interconnecting wires treated as ideal conductors
	- Actual cell implementations have wires with resistance and node pairs within cells with capacitance
- The approach can be extended to different applications
	- To predict power consumption of cells
	- To predict delays for large circuits composed of interconnected cells
- Plan to explore Recurrent Neural Networks in place of the DNN to determine the improvement in goodness
- The project was an exciting learning experience on multiple concepts
	- Digital design process, cells and CMOS networks, transistor-level simulation
	- Deep learning with Keras

# ACKNOWLEDGMENTS

Many thanks to my math teacher, Mr. Greg Burroughs, for his feedback and suggestions.

# REFERENCES

1. Batten, Christopher. ECE 5745 Complex Digital ASIC Design. https://www.csl.cornell.edu/courses/ece5745/handouts/ece5745-T05-methodology-auto.pdf.
2. Černý, David, and Josef Dobeš. "Deep Learning Neural Network Algorithm for Computation of Spice Transient Simulation of Nonlinear Time Dependent Circuits." Electronics, vol. 11, no. 1, 2021, p. 15., https://doi.org/10.3390/electronics11010015.
3. Lee, Chuan-Zheng. Transistors. https://web.stanford.edu/class/archive/engr/engr40m.1178/slides/transistors.pdf.
4. NGSPICE: Circuit Simulator - Oregon State University. https://web.engr.oregonstate.edu/~traylor/ece391/smith\_NGSPICE\_USERGUIDE\_ECE391.pdf.
5. "PrimeSim HSPICE the Gold Standard for Accurate Circuit Simulation." The Gold Standard for Accurate Circuit Simulation, https://www.synopsys.com/implementation-and-signoff/amssimulation/primesim-hspice.html.
6. Rosenbaum, E., et al. "Machine Learning for Circuit Aging Simulation." University of Illinois Urbana-Champaign, Institute of Electrical and Electronics Engineers Inc., 12 Dec. 2020, https://experts.illinois.edu/en/publications/machine-learning-for-circuit-aging-simulation.
7. "Spectre Simulation Platform." Cadence, https://www.cadence.com/en\_US/home/tools/custom-ic-analog-rf-design/circuit-simulation/spectresimulation-platform.html.
8. Tan, Wei-Lii. "Machine Learning Overcomes Library Challenges at the Latest Process Nodes." Tech Design Forum Techniques, https://www.techdesignforums.com/practice/technique/machinelearning-overcomes-library-challenges-at-newer-process-nodes/.
9. "What Is Library Characterization? – How It Works & Techniques." Synopsys, https://www.synopsys.com/glossary/what-is-library-characterization.html.