OpenRAM Manual

Matthew R. Guthaus - mrg@ucsc.edu
James Stine - james.stine@okstate.edu
and many others

January 25, 2018

1 License

Copyright 2018 Regents of the University of California and The Board of Regents for the Oklahoma Agricultural and Mechanical College (acting for and on behalf of Oklahoma State University)

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

2 Introduction

The OpenRAM project aims to provide a free, open-source memory compiler development framework for Random-Access Memories (RAMs). Most academic Integrated Circuit (IC) design methodologies are inhibited by the availability of memories. Many standard-cell process design kits (PDKs) are available from foundries and vendors, but these PDKs do not come with memory arrays or compilers. Some PDKs have options to request "black box" memory models, but these are not modifiable, have limited available configurations, and do not have full details available to academics. These restrictions make comparison and experimentation with real memory systems impossible. OpenRAM, however, is user-modifiable and portable through technology libraries to enable experimentation with real-world memories at a variety of performance points and costs.

The specific features of OpenRAM are:

• Memory Array Generation - Currently, OpenRAM supports simple 1 read/write port synchronous memories, but it will be extended to multi-port memories, register files, and asynchronous memories in the future. The generation includes features such as automatic word-line driver sizing, efficient decoder sizing, multiple-word column support, and self-timing with replica bitlines.

• Portability and Extensibility - OpenRAM is a Python program. Python enables portability to numerous platforms and enables the program to be extended by anyone. In general, it works on Linux, MacOS, and Windows platforms. User-readable technology files enable migration to a variety of process technologies. Currently, implementations in the non-fabricable 45nm FreePDK45 technology and the MOSIS Scalable CMOS (SCN3ME SUBM.30) technology are provided. The compiler has also been extended to several other technologies.
We hope to work with vendors to distribute the technology information of other commercial technologies soon. OpenRAM makes calls to commercial circuit simulators and DRC/LVS tools in an abstracted way for circuit simulation and verification. This enables adaptation to other design methodologies. However, it also supports a completely open-source platform for older technologies.

• Timing and Power Characterization - OpenRAM provides a basic framework for analysis of timing and power. This includes analytical estimates, un-annotated spice simulations, and back-annotated simulations. The timing and power views are provided in the open Liberty format for use with the most common logic synthesis and timing analysis tools.

• Commercial Tool Independence and Interoperability - To keep OpenRAM portable and maximize its usefulness, it is independent from any specific commercial tool suite or language. OpenRAM interfaces to both open-source (e.g., NGSpice) and commercial circuit simulators through the standard Spice3 circuit format. The physical layout is directly generated in the GDSII layout stream format, which can be imported into any academic or commercial layout tool. We provide a Library Exchange Format (LEF) file for interfacing with commercial Placement and Routing tools. We provide a Verilog behavioral model for simulation.

• Silicon Verification - TBD

2.1 Requirements

Development is done on Ubuntu or MacOS systems with Python 2.7.

2.1.1 Timing Verification Tools

For performance reasons, OpenRAM uses analytical delay models by default. If you wish to enable simulation-based timing characterization, you must enable it on the command line with the "-c" argument. OpenRAM can use the following circuit simulators, and possibly others if they support the Spice3 file format:

• HSpice I-2013.12-1 or later
• ngSpice 26 (http://ngspice.sourceforge.net/)
• CustomSim (xa) M-2017.03-SP5 or later

2.1.2 Physical Verification Tools

By default, OpenRAM will perform DRC and LVS on each level of hierarchy. To do this, you must have a valid DRC and LVS tool and the corresponding rule files for the technology. OpenRAM can, however, run without DRC and LVS verification using the "-n" command line argument. This is not recommended if you make any changes, however.

DRC can be done with:

• Calibre 2012.3 15.13 or later (SCMOS or FreePDK45)
• Magic (http://opencircuitdesign.com/magic/) (SCMOS only)

LVS can be done with:

• Calibre 2012.3 15.13 or later (SCMOS or FreePDK45)
• Netgen (http://opencircuitdesign.com/netgen/) (SCMOS only)

2.1.3 Technology Files

To work with FreePDK45, you must install the FreePDK baseline kit from: https://www.eda.ncsu.edu/wiki/FreePDK45:Contents

We have included an example Calibre DRC deck for the MOSIS SCMOS design rules (https://www.mosis.com/files/scmos/scmos.pdf), but DRC with Magic relies on its own design rules. We require tech file format 32 or later to enable stacked vias, which is included with Qflow:

    git clone http://opencircuitdesign.com/qflow
    cp tech/osu050/SCN3ME_SUBM.30.tech

You can override the location of the DRC and LVS rules with the DRCLVS_HOME environment variable.

2.1.4 Spice Models

FreePDK45 comes with a spice device model, and it is used once installed. SCMOS, however, does not come with a spice device model. This must be obtained from MOSIS or another vendor; we use the ON Semiconductor 0.5um device models. You can override the location of the spice models with the SPICE_MODEL_DIR environment variable.

2.2 Environment Variables

In order to make OpenRAM flexible, it uses two environment variables to make it relocatable in a variety of user scenarios. Specifically, the user may want technology directories that are separate from OpenRAM, or the user may want to have several versions of OpenRAM. This is done with the following required environment variables:

• OPENRAM_HOME defines the location of the compiler source directory.
• OPENRAM_TECH defines the location of the OpenRAM technology files. This is discussed later in Section ??.

Other environment variables and additional required paths for specific technologies are dynamically added at runtime by sourcing a technology setup script. These are located in the "$OPENRAM_TECH/setup_scripts" directory. Example scripts for SCMOS and FreePDK45 are included with the distribution. These set up anything needed by the PDK.
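For example, a wrapper script can verify that the required variables are set before invoking the compiler. This is a minimal sketch; the variable names come from the list above, and the checking logic itself is only illustrative:

    import os

    # Both variables are required by OpenRAM; fail early with a clear
    # message if either one is missing from the environment.
    for var in ("OPENRAM_HOME", "OPENRAM_TECH"):
        value = os.environ.get(var)
        if value is None:
            raise SystemExit("Please set the %s environment variable." % var)
        print("%s = %s" % (var, value))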
2.3 Design Flow

Figure 1: Overall Compilation and Characterization Methodology

2.4 Usage

The OpenRAM compiler requires a single argument: a configuration file. The configuration file specifies, at a minimum, the memory size parameters in terms of the number of words, the word size (in bits), and the number of banks. By default, OpenRAM will choose the number of columns to make the memory reasonably square. Commonly, the configuration file also includes parameters for the output path, the base output file name, and the technology of an SRAM.

The configuration file can be used to override any option in the options.py file. Many of these can also be controlled on the command line, which overrides the configuration file. The one exception is the technology name: the technology name in a config file will override a command-line option. The unit tests use the command line to read a configuration file, so it is a chicken-and-egg situation.

Lastly, the configuration file can override any of the circuit implementations for each module. For example, you can replace the default address decoder or bitcell by specifying a new Python module that implements it. An entire example configuration file looks like:

    word_size = 16
    num_words = 32
    num_banks = 1
    tech_name = "freepdk45"
    output_path = "/tmp/outputdir"
    output_name = "mysram"
    bitcell = "custom_bitcell"

In this example, the user has specified a custom bitcell that will be used when creating the bitcell array and other modules.

OpenRAM has many command line arguments. Other useful command line arguments are:

• -h : get help for the command-line options
• -v : increase the verbosity (may be used multiple times)
3 Overview of the SRAM Structure

The baseline SRAMs generated by OpenRAM have 1 read/write port as shown in Figure ??. The address is decoded (Section ??) into a one-hot set of word lines (WL) which are driven by word line drivers (Section ??) over the bit-cell array (Section ??). To facilitate reads, the precharge circuitry (Section ??) precharges the bitlines so that the column mux (Section ??) can select the appropriate word, which is then sensed by the sense amplifiers (Section ??). Write drivers (Section ??) use the bidirectional nature of the column mux to write the appropriate columns in a given memory row. A representative layout of such a memory closely resembles the logical representation and is shown in Figure ??. The address and data flip-flops and control circuitry are not shown but are detailed in Section ??.

Figure 2: Single Port SRAM Architecture

Figure 3: 1k SRAM with Two Columns and 16-bit Data

3.1 Inputs/Outputs

The inputs to the SRAM are:

• clk - external clock
• CSb - active-low chip select
• WEb - active-low write enable
• OEb - active-low output enable
• ADDR# - corresponds to the address bus input, labeled 0 to N address bits
• DATA# - corresponds to the bi-directional data bus

The outputs of the SRAM are:

• DATA# - corresponds to the bi-directional data bus

3.2 Top-Level SRAM Module

The sram class in sram.py is the top-level SRAM module. This class handles the overall organization of the memory and the input/output signals. Based on the user inputs, the various bus and array sizes are calculated and passed to the bank module. All other sub-modules access the values of the sizes from bank. The overall organization is depicted in Figure ??, the design data structure is discussed in Section ??, and the modules contained in the top-level SRAM are detailed in Section ??.

When the user has specified the desired size of the memory to be generated (word size, total number of words, and number of banks), the remaining parameters must be calculated. There are several constraints to be considered in these calculations: (i) sram can generate 1, 2, or 4 banks; (ii) the area of each bank should be as square as possible, which depends on the area of a 6T cell (one possible heuristic is sketched below); (iii) there are several options for multiplexing (column-mux): 2-way, 4-way, 8-way, and none. All of the top-level routing is performed in the sram class. FIXME: More soon...
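To make constraint (ii) concrete, the following sketch shows one way a words-per-row (column-mux) factor could be chosen so that the array comes out roughly square. This is an illustrative helper, not OpenRAM's actual sizing code, and it assumes square bitcells:

    def choose_words_per_row(num_words, word_size, options=(1, 2, 4, 8)):
        """Pick the column-mux factor that makes the array closest to square."""
        best = None
        for words_per_row in options:
            rows = num_words // words_per_row
            cols = word_size * words_per_row
            # Distance of the aspect ratio (columns/rows) from 1.0.
            badness = abs(cols / float(rows) - 1.0)
            if best is None or badness < best[0]:
                best = (badness, words_per_row)
        return best[1]

    # A 2048-word, 32-bit memory: 8 words per row gives 256 columns x 256 rows.
    print(choose_words_per_row(2048, 32))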
4 Modules

This section provides an overview of the main modules that are used in an SRAM. For each module, we provide both an architectural description and an explanation of how that design is generated and used in OpenRAM. The modules described below are provided in the first release of OpenRAM, but this is by no means an exhaustive list of the circuits that can be adapted into an SRAM architecture; refer to Section ?? for more information on adding different module designs to the compiler.

Each module has a corresponding Python class in the compiler directory. These classes are used to generate both the GDSII layout and spice netlists. Each module can consist of library cells (discussed in Section ??), parameterized cells (Section ??), or other modules. A discussion of the design hierarchy and how to implement a module is provided in Section ??. When combining modules at any level of hierarchy, DRC rules for minimum spacing of metals, wells, etc. must be followed, and DRC and LVS are run by default after each hierarchical module's creation.

4.1 The Bitcell and Bitcell Array

The 6T cell is the most commonly used memory cell in SRAM devices. It is named a 6T cell because it consists of 6 transistors: 2 access transistors and 2 cross-coupled inverters, as shown in Figure ??. The cross-coupled inverters hold a single data bit that can either be driven into, or read from, the cell by the bitlines. The access transistors isolate the cell from the bitlines so that data is not corrupted while the cell is not being accessed.

Figure 4: Schematic of 6T cell.

The 6T cell can be accessed to perform the two main operations associated with memory: reading and writing. When a read is to be performed, both bitlines are precharged to Vdd. This precharging is done during the first half of the read cycle and is handled by the precharge circuitry. In the second half of the read cycle, the wordline is asserted, which enables the access transistors. If a 1 is stored in the cell, then BLB is discharged to Gnd and BL is pulled up to Vdd. Conversely, if the value stored is a 0, then BL is discharged to Gnd and BLB is pulled up to Vdd.

While performing a write operation, both bitlines are also precharged to Vdd during the first half of the write cycle. Again, the word line is asserted, and the access transistors are enabled. The value to be written into the cell is applied to BL, and its complement is applied to BLB. The drivers applying the signals to the bitlines must be appropriately sized so that the previous value in the cell can be overwritten.

The 6T cells are tiled together in both the horizontal and vertical directions to make up the memory array. The size of the memory array is directly related to the number and size of the words that will need to be stored in the RAM. For example, an 8kb memory with a word size of 8 bits could be implemented as 8 columns and 1024 rows. It is common practice to keep the aspect ratio of the memory array as square as possible (future versions will consider optimizing delay and/or power as well). This helps ensure that the bitlines do not become too long, which can increase the bitline capacitance, slow down the operation, and lead to more leakage. To make the design "more square", multiple words can share rows by interleaving the bits of each word. If the previous 8kb memory was rearranged to allow 2 words per row, then the array would have 16 columns and 512 rows; this arithmetic is sketched below.

In OpenRAM, we provide a library cell for the 6T cell so that users can easily swap in different memory cell designs. The memory cell is the most important cell in the RAM and should be customized to minimize area and optimize performance. The memory cell is the most replicated cell in the RAM; minimizing its size can have a drastic effect on the overall size of the RAM. Also, the transistors in the cell must be carefully sized to allow for correct read and write operation as well as protection against corruption.

The bitcell class in bitcell.py instantiates a single memory cell and is usually a pre-made library cell. The bitcell_array class in bitcell_array.py dynamically implements the memory cell array by instantiating a single memory cell according to the number of rows and columns. During the tiling process, the cells are abutted so that all bitlines and word lines are connected in the vertical and horizontal directions, respectively. In order to share supply rails, cells are flipped in alternating rows. To avoid any extra routing, the power/ground rails, bitlines, and wordlines should span the entire width/height of the cell so that they are automatically connected when the cells are abutted.
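The row/column arithmetic from the 8kb example above can be captured in a few lines. This is an illustrative helper, not part of the compiler:

    def array_dimensions(num_bits, word_size, words_per_row):
        """Rows and columns for the interleaved organization described above."""
        cols = word_size * words_per_row
        rows = num_bits // cols
        return rows, cols

    print(array_dimensions(8192, 8, 1))  # (1024, 8)
    print(array_dimensions(8192, 8, 2))  # (512, 16)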
4.2 Precharge Circuitry

The precharge circuit is depicted in Figure ?? and is implemented with three PMOS transistors. The input signal to the cell, clk, enables all three transistors during the first half of a read or write cycle (i.e., while the clock signal is low). M1 and M2 charge BL and BLB to Vdd, and M3 helps to equalize the voltages seen on BL and BLB.

Figure 5: Schematic of a single precharge cell. FIXME: Change PCLK to CLK.

In OpenRAM, the precharge circuitry is dynamically generated using the parameterized transistor class (ptx). The precharge class in precharge.py dynamically generates a single precharge cell. The offsets of the bitlines and the width of the precharge cell are equal to the 6T cell so that the bitlines are correctly connected down to the 6T cell. The precharge_array class is then used to generate a precharge array, which is a single row of n precharge cells, where n equals the number of columns in the bitcell array.

4.3 Address Decoders

The address decoder takes the row address bits from the address bus as inputs and asserts the appropriate wordline of the row that data is to be read from or written to. An n-bit address input controls 2^n word lines. OpenRAM provides a hierarchical address decoder as the default, but will soon have other options.

4.3.1 Hierarchical Decoder

A hierarchical decoder is a decoder whose construction takes place hierarchically. A simple 2:4 decoder is shown in Figure ??. The operation of this decoder can be explained as follows: soon after the address signals A0 and A1 are put on the address lines, depending on the signal combination, one of the wordlines will rise after a brief delay. For example, if the address input is A0A1=00, then the output is W0W1W2W3=1000. The 2:4 address decoder uses inverters and two-input NAND gates for its construction, with the gates sized to have equal rise and fall times. As the decoder size increases, the size of the NAND gates required for decoding also increases. Table ?? gives the detailed input and output signals for the 2:4 hierarchical decoder.

Figure 6: Schematic of 2:4 simple decoder.

Table 1: Truth table for the 2:4 hierarchical decoder.

    A[1:0]      | 00 | 01 | 10 | 11
    Selected WL |  0 |  1 |  2 |  3

An n-bit decoder requires 2^n logic gates, each with n inputs. For example, with n = 6, 64 NAND6 gates are needed to drive 64 inverters to implement the decoder. It is clear that gates with more than 3 inputs create large series resistances and long delays. Rather than using n-input gates, it is preferable to use a cascade of gates. Typically two stages are used: a predecode stage and a final decode stage. The predecode stage generates intermediate signals that are used by multiple gates in the final decode stage.

Figure 7: Schematic of 4:16 hierarchical decoder.

Figure ?? shows the 4:16 hierarchical decoder. The structure consists of two 2:4 decoders for predecoding, and 2-input NAND gates and inverters for final decoding, forming the 4:16 decoder. In the predecoder, a total of 8 intermediate signals are generated from the address bits and their complements. Using a predecoding stage and a final decoding stage to construct an address decoder is very productive, since small decoders such as the 2:4 decoder are reused for predecoding. The operation of the 4:16 hierarchical decoder can be explained with an example. If the address is A0A1A2A3=0000, the outputs of predecoder1 and predecoder2 will both be WL0WL1WL2WL3=1000. According to the connections in Figure ??, wordline 0 of predecoder1 and wordline 0 of predecoder2 are connected to the first 2-input NAND gate in the decode stage, representing wordline 0 of the final decoding stage.
Hence, depending on the combination of the input signals, one of the word lines will rise. In this case, since the address input is A0A1A2A3=0000, wordline 0 goes high. Table ?? gives the detailed input and output signals for the 4:16 hierarchical decoder.

Table 2: Truth table for the 4:16 hierarchical decoder.

    A[3:0] | predecoder1 | predecoder2 | Selected WL
    0000   | 1000        | 1000        | 0
    0001   | 1000        | 0100        | 1
    0010   | 1000        | 0010        | 2
    0011   | 1000        | 0001        | 3
    0100   | 0100        | 1000        | 4
    0101   | 0100        | 0100        | 5
    0110   | 0100        | 0010        | 6
    0111   | 0100        | 0001        | 7
    1000   | 0010        | 1000        | 8
    1001   | 0010        | 0100        | 9
    1010   | 0010        | 0010        | 10
    1011   | 0010        | 0001        | 11
    1100   | 0001        | 1000        | 12
    1101   | 0001        | 0100        | 13
    1110   | 0001        | 0010        | 14
    1111   | 0001        | 0001        | 15

As the size of the address increases, higher-level decoders can be created from lower-level decoders. For example, an 8:256 decoder can be formed from two instances of a 4:16 decoder followed by 256 2-input NAND gates and inverters. To construct the 8:256 decoder, the 4:16 decoders must first be constructed from 2:4 decoders; hence the name hierarchical decoder.
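The predecode/final-decode structure can be checked with a few lines of Python. This is a behavioral sketch of the logic in Figure 7 (in hardware the AND is a NAND gate plus an inverter); the bit ordering follows Table 2:

    def decode_2to4(a1, a0):
        """2:4 decoder: one-hot outputs, WL0 selected for address 00."""
        return [int(2 * a1 + a0 == i) for i in range(4)]

    def decode_4to16(a3, a2, a1, a0):
        """4:16 hierarchical decoder built from two 2:4 predecoders."""
        pre1 = decode_2to4(a3, a2)  # upper address bits
        pre2 = decode_2to4(a1, a0)  # lower address bits
        # Each final word line ANDs one output from each predecoder.
        return [p1 & p2 for p1 in pre1 for p2 in pre2]

    # Compare with Table 2: address 0000 selects WL0, address 1011 selects WL11.
    print(decode_4to16(0, 0, 0, 0).index(1))  # 0
    print(decode_4to16(1, 0, 1, 1).index(1))  # 11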
4.4 Wordline Driver

Word line drivers are inserted between the word line output of the address decoder and the word line input of the bitcell array. The word line drivers ensure that, as the size of the memory array increases and the word line length and capacitance increase, the word line signal is still able to turn on the access transistors in the 6T cells. Also, since the bank select signal in multi-bank structures is ANDed with the word line output of the decoder, bitcells turn on only when the bank is selected. Figure ?? shows the diagram of the word line driver and its input/output pins. In OpenRAM, word line drivers are created using the pinv and nand2 classes, which take the transistor size and cell height as inputs (so that the driver can abut the 6T cell). The word line driver is added as a separate module in the compiler.

Figure 8: Diagram of word line driver.

4.5 Column Mux

The column mux takes the column address bits from the address bus and selects the appropriate bitlines for the word that is to be read from or written to. It takes n bits from the address bus and can select 2^n bitlines. The column mux is used for both the read and write operations; it connects the bitlines of the memory array to both the sense amplifier and the write driver. OpenRAM provides several options for the column mux, but the default is a single-level column mux sized for optimal speed.

4.5.1 Tree Decoding Column Mux

The schematic for a 4-1 tree multiplexer is shown in Figure ??. FIXME: Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.

This tree mux selects pairs of bitlines (both BL and BL_B) as inputs and outputs. The 4-1 tree mux illustrates the process of choosing the correct bitlines when there are 4 words per row in the memory array. Each bitline pair represents a single bit from each word. A binary reduction pattern, shown in Table ??, is used to select the appropriate bitlines. As the number of words per row in the memory array increases, the depth of the column mux grows. The depth of the column mux is equal to the number of bits in the column address bus; the 4-1 tree mux has a depth of 2. In level 1, the least significant bit of the column address bus selects either the first and second words or the third and fourth words. In level 2, the most significant column address bit selects one of the words passed down from the previous level.

Relative to other column mux designs, the tree mux uses significantly fewer devices. However, this type of design can provide poor performance if a large decoder with many levels is needed: the delay of a tree mux increases quadratically with each level. For this reason, other types of column decoders should be considered for larger arrays.

Figure 9: Schematic of 4-1 tree column mux that passes both of the bitlines.

Table 3: Binary reduction pattern for the 4-1 tree column mux.

    Selected BL | Inp1     | Inp2     | Binary
    BL0         | SEL0_bar | SEL1_bar | 00
    BL1         | SEL0     | SEL1_bar | 01
    BL2         | SEL0_bar | SEL1     | 10
    BL3         | SEL0     | SEL1     | 11

In OpenRAM, the tree column mux is a dynamically generated design. The tree_mux_array is made up of two dynamically generated cells: muxa and mux_abar. The only difference between these cells is that the input select signal is hooked up to either the SEL or SEL_bar signal (see the highlighted boxes in Figure ??). These cells are initialized by the column_muxa and column_muxabar classes in column_mux.py. Instances of ptx PMOS transistors are added to the design, and the necessary routing is performed using the add_rect() function. A horizontal rail is added in metal2 for both the SEL and SEL_bar signals. Underneath those input rails, horizontal straps are added. These straps are used to connect the BL and BL_B outputs from muxa to the BL and BL_B outputs of mux_abar. Vertical connectors in metal3 are added at the bottom of the cell so that connections can be made down to the sense amp. Vertical connectors are also added in metal1 so that the cells can connect down to other mux cells when the depth of the tree mux is more than one level.

The tree_mux_array class is used to generate the tree mux. Instances of both the muxa and mux_abar cells are instantiated and tiled row by row. The offset of a cell in a row is determined by the depth of that row in the tree mux. The pattern used to determine the offset of the mux cells is muxa.width * i * (2 * row_depth), where i is the column number. As the depth increases, the mux cells become further apart. A separate "for" loop is invoked if the depth > 1, which extends the power/ground and select rails across the entire width of the array. Similarly, if the depth > 1, spice net names are created for the intermediate connections made at the various levels. This is necessary to ensure that a correct spice netlist is generated and that the input/output pins of the column mux match the pins in the modules it is connected to.
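The binary reduction of Table 3 can be expressed as a short sketch: at each level, the next column-address bit halves the set of candidate bitline pairs. This is illustrative only and is not code from the compiler:

    def tree_mux_select(sel, bitlines):
        """Walk the tree mux levels; 'sel' is the column address."""
        level = 0
        while len(bitlines) > 1:
            bit = (sel >> level) & 1  # LSB chooses at level 1, and so on
            bitlines = bitlines[bit::2]
            level += 1
        return bitlines[0]

    # Matches Table 3: binary 10 selects BL2.
    print(tree_mux_select(0b10, ["BL0", "BL1", "BL2", "BL3"]))  # BL2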
4.5.2 Single Level Column Mux

The optimal design for the column mux uses a single NMOS device, driven by the input address or decoded input addresses. Figure ?? shows the schematic of a 2:1 single-level column mux. In this column mux, one address bit and its complement drive the pass transistors. The selected transistors connect their corresponding bitlines (one set of columns out of two) to the sense-amp and write-driver circuitry for the read or write operation. Figure ?? shows the schematic of a 4:1 single-level column mux. In this column mux, 2 address bits are decoded using a 2:4 decoder (the 2:4 decoder is explained in Section ??). The 2:4 decoder provides a one-hot set of outputs, so only one set of columns will be selected and connected to the sense-amp and write-driver (in Figure ??, one set of columns out of four is selected).

In OpenRAM, the single-level_mux_array is a dynamically generated design made up of a dynamically generated cell (single-level_mux). single-level_mux uses the parameterized transistor class ptx to generate two NMOS transistors which connect the BL and BLB of the selected columns to the sense-amp and write-driver. Horizontal rails are added for the sel signals. Vertical straps connect the BL and BLB of the bitcell array to the BL and BLB of the single-level column mux, and also the BL-out and BLB-out of the single-level column mux to the BL and BLB of the sense-amp and write-driver.

Figure 10: Schematic of a 2:1 single level column mux.

Figure 11: Schematic of a 4:1 single level column mux.

4.6 Sense Amplifier

The sense amplifier is used to sense the difference between the bitline and bitline bar while a read operation is performed. The sense amp is necessary to recover the signals from the bitlines because they do not experience full voltage swing. As the size of the memory array grows, the load of the bitlines increases and the voltage swing is limited by the small memory cell driving this large load. A differential sense amplifier is used to "sense" the small voltage difference between the bitlines.

Figure 12: Schematic of a single sense amplifier cell.

The schematic for the sense amp is shown in Figure ??. The sense amplifier is enabled by the SCLK signal, which initiates the read operation. Before the sense amplifier is enabled, the bitlines are precharged to Vdd by the precharge unit. When the sense amp is enabled, one of the bitlines experiences a voltage drop based on the value stored in the memory cell. If a zero is stored, the bitline voltage drops; if a one is stored, the bitline bar voltage drops. The output signal is then taken to a true logic level and latched for output to the data bus (a behavioral sketch of this read path appears at the end of this section).

In OpenRAM, the sense amplifier is a library cell. The associated layout and spice netlist can be found in the gds_lib and sp_lib in the FreePDK45 directory. The sense_amp class in sense_amp.py instantiates a single instance of the sense amp library cell. The sense_amp_array class handles the tiling of the sense amp cells. One sense amp cell is needed per data bit, and the sense amp cells need to be appropriately spaced so that they can hook up to the column mux bitline pairs. The spacing is determined by the number of words per row in the memory array. Instances are added, and then Vdd, Gnd, and SCLK rails that span the entire width of the array are drawn using the add_rect() function.

We chose to leave the sense amp as a library cell so that custom amplifier designs can be swapped into the memory as needed. The two major things that need to be considered while designing the sense amplifier cell are the size of the cell and the bitline/input pitches. Optimally, the cell should be no larger than the 6T cell so that it abuts the column mux and no extra routing or space is needed. Also, the bitline inputs of the sense amp need to line up with the outputs of the write driver. In the current version of OpenRAM, the write driver is situated under the sense amp, which has bitlines spanning the entire height of the cell. In this case, the sense amplifier is disabled during a write operation, but the bitlines still connect the write driver to the column mux without any extra routing.
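Here is the promised behavioral sketch of the read path: precharge both bitlines, let the cell pull one of them down, then sense differentially. The voltage numbers are illustrative, not from the manual:

    def read_bit(stored_value, vdd=1.0, swing=0.1):
        """Behavioral sketch of a read: precharge, cell pulldown, sense."""
        bl, blb = vdd, vdd          # both bitlines precharged to Vdd
        if stored_value == 0:
            bl -= swing             # a stored 0 discharges BL
        else:
            blb -= swing            # a stored 1 discharges BL_bar
        # The differential sense amp compares the two bitlines.
        return 1 if bl > blb else 0

    print(read_bit(0), read_bit(1))  # 0 1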
4.7 Write Driver

The write driver is used to drive the input signal into the memory cell during a write operation. As shown in Figure ??, the write driver consists of two tristate buffers, one inverting and one non-inverting. It takes a data bit from the data bus and outputs that value on the bitline, and its complement on bitline bar. The bitlines need to be complements so that the data value can be correctly stored in the 6T cell. Both tristates are enabled by the EN signal.

Figure 13: Schematic of a write driver cell, which consists of 2 tristates (non-inverting and inverting) to drive the bitlines.

Currently, in OpenRAM, the write driver is a library cell. The associated layout and spice netlist can be found in the gds_lib and sp_lib in the FreePDK45 directory. Similar to the sense_amp_array, the write_driver_array class tiles the write driver cells. One driver cell is needed per data bit, and the Vdd, Gnd, and EN signals must be extended to span the entire width of the array. It is not optimal to have the write driver as a library cell, because the driver needs to be sized based on the capacitance of the bitlines: a large memory array needs a stronger driver to drive the data values into the memory cells. We are working on a parameterized tristate class, which will dynamically generate write driver cells of different sizes/strengths.

4.8 Flip-Flop Array

In a synchronous SRAM it is necessary to synchronize the inputs and outputs with a clock signal by using flip-flops. In FreePDK45 we provide a library cell for a simple master-slave flip-flop; see the schematic in Figure ??. In our library cell we provide both Q and Q_bar as outputs of the flop, because the inverted signals are used in various modules. The ms_flop class in ms_flop.py instantiates a single master-slave flop, and the ms_flop_array class generates an array of flip-flops. Arrays of flops are necessary for the data bus (an array for both the inputs and outputs) as well as the address bus (an array for the row and column inputs). The ms_flop_array takes the number of flops and the type of array as inputs. Currently, the type of the array must be one of "data in", "data out", "addr row", or "addr col", verbatim. The array type input is used to look up the associated pin names for each of the flop arrays. This was implemented very quickly and should be improved in the near future...

Figure 14: Schematic of a master-slave flip-flop provided in the FreePDK45 library.

4.9 Control Logic

The details of the control logic architecture are outlined in Section ??. The control logic module, control_logic.py, instantiates a control_logic class that arranges all of the flip-flops and logic associated with the control signals into a single module. Flip-flops are instantiated for each control signal input, and library NAND and NOR gates are used for the logic. A delay chain of variable length is also generated using parameterized inverters. The associated layouts and spice netlists can be found in the gds_lib and sp_lib in the FreePDK45 directory.

5 Bank and SRAM

The overall memory architecture is shown in Figure ??. As shown in this figure, one bank contains several modules: a precharge-array positioned above the bitcell-array; a column-mux-array located below the bitcell-array; a sense-amp-array; a write-driver-array; a data-in-ms-flop-array to synchronize the input data with the negative edge of the clock; a tri-gate-array to share the bidirectional data-bus between input and output data; a hierarchical decoder (predecoder + decoder) placed on the right side of the bitcell-array; a wordline-driver which drives the wordlines horizontally across the bitcell-array; and address-ms-flops to synchronize the input address with the positive edge of the clock.
In the bitcell-array, each memory cell is mirrored vertically and horizontally in order to share VDD and GND rails with adjacent cells and form the array. The data-bus is connected to the tri-gate, the address-bus is connected to the address-ms-flops, and the bank-select signal enables the bank when it goes high.

To complete the SRAM design, the bank is connected to the control-logic as shown in Figure ??. The control-logic controls the timing of the modules inside the bank. CSb, OEb, WEb, and clk are inputs to the control logic, and the outputs of the control logic are ANDed with the bank-select signal and sent to the corresponding modules.

In order to reduce delay and power, a divided wordline strategy is used in this compiler. Part of the address bits are used to define the global wordline (bank-select), and the rest of the address bits are connected to the hierarchical decoder inside each bank to generate the local wordlines that actually drive the bitcell access transistors. As shown in Figure ??, the SRAM is divided into two banks which share the data-bus, address-bus, control-bus, and control-logic. In this case, one address bit (the most significant bit) goes to an ms-flop, and the outputs of the ms-flop (address-out and address-out-bar) are connected to the banks as bank-select signals. The control logic is shared between the two banks, and based on which bank is selected, the control signals will activate the modules inside the selected bank. In this architecture, the total cell capacitance is reduced by up to a factor of two. Therefore, the power is reduced greatly, and the delay along the wordlines is also reduced. In Figure ??, four banks are connected together. In this case, a 2:4 decoder is added to select one of the banks using the two most significant bits of the input address. Control signals are connected to all banks but will turn on only the selected bank.

6 Software Implementation

OpenRAM is implemented using object-oriented data structures in the Python programming language. The top-level executable is openram.py, which parses the input arguments, creates the memory, and saves the output.

6.1 Design Hierarchy

All modules in OpenRAM are derived from the design class in design.py. The design class is a data structure that consists of a spice netlist, a layout, and a name. The spice netlist capabilities are inherited from the hierarchy_spice class, while the layout capabilities are inherited from the hierarchy_layout class. The only additional function in design.py is DRC_LVS(), which performs a DRC/LVS check on the module.

Figure 15: Overall bank and SRAM architecture.

6.1.1 Spice Hierarchy

The spice hierarchy is stored in the spice class in hierarchy_spice.py. When the design class is initialized for a module, a data structure for the spice hierarchy is created. The spice data structure name becomes the name of the top-level subcircuit definition for the module. The list of pins for the module is added to the subcircuit definition using the add_pin() function. The add_mod() function adds an instance of a module/library cell/parameterized cell as a subcircuit to the top-level structure.

Figure 16: SRAM divided into two banks which share the control-logic.

Each time a sub-module is added to the hierarchy, the pins of the sub-module must be connected using the connect_pins() function. It is important to note that the pins must be listed in the same order as they were added to the submodule. Also, an assertion error will occur if there is a mismatch in the number of net connections.
The spice class also contains functions for reading and writing spice files:

• sp_read(): reads in spice netlists and parses the inputs defined by the "subckt" definition.
• sp_write(): creates an empty spice file in write mode and calls sp_write_file().
• sp_write_file(): recursively writes the modules and sub-modules from the data structure into the spice file created by sp_write().

6.1.2 Layout Hierarchy

The layout hierarchy is stored in the layout class in hierarchy_layout.py. When the design class is initialized for a module, a data structure for the layout hierarchy is created. The layout data structure has two main components: a structure for the instances of sub-modules contained in the layout, and a structure for the objects (such as shapes, labels, etc.) contained in the layout.

Figure 17: SRAM divided into 4 banks which are controlled by the control-logic and a 2:4 decoder.

Figure 18: Class hierarchy.

The functions included in the layout class are:

• add_inst(self, name, mod, offset, mirror): adds an instance of a physical layout (library cell, module, or parameterized cell) to the module. The input parameters are:
  - name: name for the instance.
  - mod: the associated spice module.
  - offset: the x-y coordinates, in microns, where the instance should be placed in the layout.
  - mirror: mirror or rotate the instance before it is added to the layout. Accepted values are "R0", "R90", "R180", "R270" (currently, only "R0" works), "MX" or "x", "MY" or "y", and "XY" or "xy" ("xy" is equivalent to "R180").

• add_rect(self, layerNumber, offset, width, height): adds a rectangle to the module's layout. The inputs are:
  - layerNumber: the layer that the rectangle is to be drawn in.
  - offset: the x-y coordinates, in microns, where the rectangle's origin will be placed in the layout.
  - width: the width of the rectangle; can be a positive or negative value.
  - height: the height of the rectangle; can be a positive or negative value.

• add_label(self, text, layerNumber, offset, zoom): adds a label to the layout. The inputs are:
  - text: the text for the label.
  - layerNumber: the layer that the label is to be drawn in.
  - offset: the x-y coordinates, in microns, where the label will be placed in the layout.
  - zoom: magnification of the label (e.g., "1e9").

• add_path(self, layerNumber, coordinates, width): this function is under construction...
• gds_read(): reads in a GDSII file and creates a VlsiLayout() class for it.
• gds_write(): writes the entire GDS of the object to a file using the gdsMill VlsiLayout() class and calling the gds2_writer() (see Sections ?? and ??).
• gds_write_file(): recursively writes the instances and objects in the layout data structure to the gds file.
• pdf_write(): this function is under construction...

6.2 Creating a New Design Module

Each module in the SRAM is its own Python class, which contains a design class, or data structure, for the layout and spice. The design class (design.py) is initialized within the module class, subsequently creating separate data structures to hold the layout (hierarchy_layout) and spice (hierarchy_spice) information. By having a class for each module, it is very easy to instantiate instances of the modules at any level of the hierarchy. Follow these guidelines when creating a new module (a complete sketch assembling these steps follows the list):

• Derive your class from the design module: class bitcell_array(design.design):
• Always use the Python constructor __init__ method so that your class is initialized when an object of the module is instantiated. The module parameters should also be declared: def __init__(self, cols, rows):
• In the constructor, call the base class constructor with the name, such as: design.design.__init__(self, "bitcell_array")
• Add the pins that will be used in the spice netlist for your module using the add_pin() function from the hierarchy_spice class: self.add_pin("vdd")
• Create an instance of the module/library cell/parameterized cell that you want to add to your module: cell = bitcell.bitcell(cell_6t)
• Add the subckt/submodule instance to the spice hierarchy using the add_mod() function from the hierarchy_spice class: self.add_mod(cell)
• Add a layout instance into your module's layout hierarchy using the add_inst() function, which takes a name, mod, offset, and mirror as inputs: self.add_inst(name=name, mod=cell, offset=[x_off, y_off], mirror=x)
• Connect the pins of the instance that was just added by using the connect_inst() function from the hierarchy_spice class: self.connect_inst(["BL[%d]" % col, "BR[%d]" % col, "WL[%d]" % row, "gnd", "vdd"]). The pins must be listed in the same order as they were added to the submodule. Also, an assertion error will occur if there is a mismatch in the number of net connections.
• Do whatever else needs to be done: add rectangles for power/ground rails or routing, add labels, etc.
• Every module needs to have "self" height and width variables that can be accessed from outside of the module class. These parameters are commonly used for placing instances of modules in a layout. For library cells, the self.width and self.height variables are automatically parsed from the GDSII layout using the cell_size() function in vlsi_layout. Users must define the width and height of dynamically generated designs.
• Add a call to the DRC_LVS() function.
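Putting these guidelines together, here is a minimal sketch of a module that tiles bitcells into an array. It follows the calls listed above; the offsets, mirroring, and the bitcell constructor argument are illustrative rather than copied from bitcell_array.py:

    import design
    import bitcell

    class my_bitcell_array(design.design):

        def __init__(self, cols, rows):
            design.design.__init__(self, "my_bitcell_array")
            self.add_pin("vdd")
            self.add_pin("gnd")

            # The submodule: a single 6T library cell.
            cell = bitcell.bitcell("cell_6t")
            self.add_mod(cell)

            for row in range(rows):
                for col in range(cols):
                    # Flip alternating rows so adjacent cells share rails.
                    mirror = "MX" if row % 2 else "R0"
                    self.add_inst(name="bit_r%d_c%d" % (row, col),
                                  mod=cell, mirror=mirror,
                                  offset=[col * cell.width, row * cell.height])
                    self.connect_inst(["BL[%d]" % col, "BR[%d]" % col,
                                       "WL[%d]" % row, "gnd", "vdd"])

            # Every module must expose its width and height.
            self.width = cols * cell.width
            self.height = rows * cell.height
            self.DRC_LVS()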
6.3 GDSII Files and GdsMill

GDSII is the standard file format used in industry to store the layout information of an integrated circuit. A GDSII file is a stream file that consists of records and data types that hold the data for the various instances, shapes, labels, etc. in the layout. In OpenRAM, we utilize a tool called gdsMill to read, write, and manipulate GDSII files. GdsMill was developed by Michael Wieckowski at the University of Michigan.

Figure 19: Example of a GDSII file.

6.3.1 GDSII File Format

A GDSII file contains several parts, as shown in Figure ??. The first part is the file header, which contains the GDSII version number, date modified, date last accessed, library, user units, and database units. The second part is the list of structures.
These structures contain geometries or references to other structures of the layout in hierarchical form. Within a structure there are several kinds of records:

• Rectangle - the basic geometry unit in a design; represents one layer of material in a circuit (e.g., a metal pin). Five coordinates and the layer number are stored in a rectangle record.
• Structure Reference - a reference to another structure used in this structure. The referenced structure is stored as a structure in the same gds file.
• Text - a text record used for labels.
• Path - used to represent a wire.
• Boundary - defines a filled polygon.
• Array Reference - specifies an array of structure instances.
• Node - electrical nets may be specified with the NODE record.

The last part is the tail of the GDSII file, which ends the GDS library. FIXME: Provide a link to the complete GDSII specification.

6.3.2 GdsMill

As previously stated, GdsMill is a set of scripts that can be used to read, write, and manipulate GDSII files.

The gds2_reader and gds2_writer: In GdsMill, the gds2_reader and gds2_writer classes contain the various functions used to convert data between GDSII files and the VlsiLayout class. These classes process the data by iterating through every record in the GDS structures and checking or writing every data record. The record type (see Section ??) is tracked and identified using flags. FIXME: Do we need more information on these classes, or should we just point to the GdsMill documentation?

The VlsiLayout Class: After the gds2_reader class reads in the records, the data has to be stored in a way that can be easily used by our code. Thus, the VlsiLayout class is made to represent the layout. VlsiLayout contains the same information as the GDSII file, but in a different way. VlsiLayout stores records in data structures, which are defined in gdsPrimitives.py. Each record type has a corresponding class defined in gdsPrimitives. Thus, a VlsiLayout should contain at least the following member data:

• self.rootStructureName - name of the top design.
• self.structures - list of the structures that are used in the design.
• self.xyTree - a list of all structure names that appear in the design.

The VlsiLayout class also contains many functions for adding structures and records to a layout class, but the important and most useful functions have been aggregated into a wrapper file. This wrapper is called geometry.py and is located in the compiler directory.

6.3.3 OpenRAM-GdsMill Interface

Dynamically generated cells and arrays each need to build a VlsiLayout data structure to represent the hierarchical layout. This is performed using various functions from the VlsiLayout class in GdsMill, but the GdsMill file is very large and can be difficult to understand. To make things easier, OpenRAM has its own wrapper class called geometry in geometry.py. This wrapper class initializes data structures for the instances and objects that will be added to the VlsiLayout class. The functions add_inst(), add_rect(), and add_label() in hierarchy_layout add the structures to the geometry class, which is then written out to a GDSII file using VlsiLayout and the gds2_writer.

User-provided library cells, which should be in gds files, can be used alongside dynamically generated cells by using GdsMill. Cell information such as the cell size and pin locations can be obtained using built-in functions of the VlsiLayout class. The cell size can be found using the readLayoutBorder function of the VlsiLayout class.
A boundary layer should be drawn in each library cell to indicate the cell area; the readLayoutBorder function returns the width and height of this boundary. If a boundary layer does not exist in the layout, then measureSize can find the physical size of the cell. The first method is used as the primary method in the auto_Measure_libcell utility, while the second is used as a backup. Each technology setup imports this utility function and reads the library cells. Pin locations can be found using the readPin function of the VlsiLayout class. The readPin function returns the biggest boundary which covers the label and is on the same layer as the label.

6.4 Technology Directory

The aim of the technology directory is to make OpenRAM portable to different technologies. This directory contains all the information related to the specific process/technology that is being used. In OpenRAM, the default technology is FreePDK45, which has its own technology directory in the trunk. The technology-specific directory should consist of the following:

• Technology-Specific Parameters - These parameters should include layer numbers and any design rules that may be needed for generating dynamic designs (DRC rules). The parameters should be added in /techdir/tech/tech.py and the layer map in /techdir.
• Library Cells - The library cells and corresponding spice netlists should be added to the /gds_lib and /sp_lib directories.
• Portation Functions - Some of the dynamically generated cells may need helper functions to deal with technology-specific requirements. Additional tech-specific functions should be added to /techdir.

For more information regarding the technology directory and how to set one up for a new technology, refer to Section ??.

6.5 DRC/LVS Interface

Each design class contains a function DRC_LVS() that performs both DRC and LVS on the current design module. This enables bottom-up correct-by-construction design and easy identification of where errors occur. It does incur some run-time overhead and can be disabled on the command line. The DRC_LVS() function saves a GDSII file and a spice file into a temporary directory and then calls two tool-dependent functions to perform DRC and LVS. A reference implementation of the DRC and LVS functions is provided for Calibre, since this is the most common DRC/LVS tool. Each of these functions generates a batch-mode "runset" file which contains the options to correctly run DRC and LVS. The functions then parse the batch-mode output for any potential errors and return the number of errors encountered.

The function run_drc() requires a cell name and a GDSII file. The cell name corresponds to the top-level cell in the GDSII file. It also uses the layer map file for the technology to correctly import the GDSII file into the tool's database to perform DRC. The function returns the number of DRC violations.

The function run_lvs() requires a cell name, a GDSII file, and a spice file. Calibre will extract a spice netlist from the GDSII file and then compare this netlist with the OpenRAM spice netlist. The function returns the number of uncompared and unmatched devices/nets in the design.

For both DRC and LVS, the summary file and other report files are left in the OpenRAM temporary directory after DRC/LVS is run. These report files can be examined to further understand why errors were encountered. In addition, by increasing the debug level, the command line to re-create the DRC/LVS check can be obtained and run manually.
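For example, a bottom-up flow can call these hooks after writing out each module. This sketch assumes only the two function signatures described above; the module name drc_lvs_calibre and the file paths are hypothetical:

    import drc_lvs_calibre as verify  # hypothetical tool-dependent module

    # Verify one module: the cell name must match the top cell in the GDSII.
    drc_errors = verify.run_drc("bitcell_array", "/tmp/bitcell_array.gds")
    lvs_errors = verify.run_lvs("bitcell_array", "/tmp/bitcell_array.gds",
                                "/tmp/bitcell_array.sp")
    assert drc_errors == 0, "DRC violations found"
    assert lvs_errors == 0, "LVS mismatches found"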
7 Custom Layout Design Functions in Software

OpenRAM provides classes that can be used to generate parameterized cells for the most common cells: transistors, inverters, nand2, nand3, etc. There are many advantages to having parameterized cells. The main advantage is that they make it easier to dynamically generate designs and cut down the code that must be written. We also need parameterized cells because some designs, such as the wordline drivers, need to be dynamically sized based on the size of the memory. Lastly, there may be certain physical dimension requirements that need to be met for a cell while still maintaining the expected operation/performance. In OpenRAM we currently provide five parameterized cells: the parameterized transistor (ptx), parameterized inverter (pinv), parameterized nand2 (nand_2), parameterized nand3 (nand_3), and parameterized nor2 (nor_2).

7.1 Parameterized Transistor

The parameterized transistor class generates a transistor of a specified width and number of mults. The ptx is constructed as follows:

    def __init__(self, name, width, mults, tx_type)

An explanation of the ptx parameters is shown in Table ??. The layout of a ptx generated by the following instantiation is depicted in Figure ??:

    fet = ptx.ptx(name="nmos_1_finger",
                  width=tech.drc["minwidth_tx"],
                  mults=1,
                  tx_type="nmos")

Table 4: Parameter explanation of ptx.

    Parameter | Explanation
    width     | active height (transistor width)
    mults     | number of mults (fingers) of the transistor
    tx_type   | type of transistor: "nmos" or "pmos"

Figure 20: An example of a parameterized transistor (ptx).

7.2 Parameterized Inverter

The parameterized inverter (pinv) class generates an inverter of a specified size/strength and height. The pinv is constructed as follows:

    def __init__(self, cell_name, size, beta=tech.pinv["beta"], cell_size=tech.cell["height"])

The parameterized inverter can provide significant drive strength while adhering to physical cell size limitations. This is achieved by connecting many small transistors in parallel, so the height of the inverter cell can be manipulated without affecting the drive strength. The NMOS size is an input parameter, and the PMOS size is determined by beta * NMOS_size, where beta is the ratio of the PMOS channel width to the NMOS channel width. The following code instantiates the pinv instance seen in Figure ??:

    a = pinv.pinv(cell_name="pinv", size=tech.drc["minwidth_tx"]*8)

Figure 21: An example of a parameterized inverter (pinv).

The pinv parameters are explained in Table ??.

Table 5: Parameter explanation of pinv.

    Parameter                       | Explanation
    size                            | the logic size of the NMOS transistor in the pinv
    beta = tech.pinv["beta"]        | ratio of PMOS channel width to NMOS channel width
    cell_size = tech.cell["height"] | physical dimension of the cell height

7.3 Parameterized NAND2

The parameterized nand2 (nand_2) class generates a 2-input NAND gate of a specified size/strength and height. The nand_2 is constructed as follows:

    def __init__(self, name, nmos_width, height=tech.cell_6t["height"])

The NMOS size is an input parameter, and the PMOS size is made equal to the NMOS size so that the output has equal rise and fall times. The following code instantiates the nand_2 instance seen in Figure ??:
    a = nand_2.nand_2(name="nand2",
                      nmos_width=2*tech.drc["minwidth_tx"],
                      height=tech.cell_6t["height"])

Figure 22: An example of a parameterized NAND2 (nand_2).

The nand_2 parameters are explained in Table ??.

Table 6: Parameter explanation of nand_2.

    Parameter                       | Explanation
    nmos_width                      | the logic size of the NMOS transistors in the nand2
    height = tech.cell_6t["height"] | physical dimension of the cell height

7.4 Parameterized NAND3

The parameterized nand3 (nand_3) class generates a 3-input NAND gate of a specified size/strength and height. The nand_3 is constructed as follows:

    def __init__(self, name, nmos_width, height=tech.cell_6t["height"])

The NMOS size is an input parameter, and the PMOS size is made equal to 2/3 of the NMOS size so that the output has equal rise and fall times. The following code instantiates the nand_3 instance seen in Figure ??:

    a = nand_3.nand_3(name="nand3",
                      nmos_width=3*tech.drc["minwidth_tx"],
                      height=tech.cell_6t["height"])

Figure 23: An example of a parameterized NAND3 (nand_3).

The nand_3 parameters are explained in Table ??.

Table 7: Parameter explanation of nand_3.

    Parameter                       | Explanation
    nmos_width                      | the logic size of the NMOS transistors in the nand3
    height = tech.cell_6t["height"] | physical dimension of the cell height

7.5 Parameterized NOR2

The parameterized nor2 (nor_2) class generates a 2-input NOR gate of a specified size/strength and height. The nor_2 is constructed as follows:

    def __init__(self, name, nmos_width, height=tech.cell_6t["height"])

The NMOS size is an input parameter, and the PMOS size is made equal to twice the NMOS size so that the output has equal rise and fall times. The following code instantiates the nor_2 instance seen in Figure ??:

    a = nor_2.nor_2(name="nor2",
                    nmos_width=2*tech.drc["minwidth_tx"],
                    height=tech.cell_6t["height"])

Figure 24: An example of a parameterized NOR2 (nor_2).

The nor_2 parameters are explained in Table ??.

Table 8: Parameter explanation of nor_2.

    Parameter                       | Explanation
    nmos_width                      | the logic size of the NMOS transistors in the nor2
    height = tech.cell_6t["height"] | physical dimension of the cell height

7.6 Path and Wire

OpenRAM provides two routing classes for custom layout design. Both the path and wire classes take a list of coordinates and connect those points with rectilinear metal connections. The difference is that path uses the same layer for both vertical and horizontal connections, while wire uses two different adjacent metal layers. This example constructs a metal1 path:

    layer_stack = ("metal1",)  # a one-element tuple: path uses a single layer
    position_list = [(0,0), (0,3), (1,3), (1,1), (4,3)]
    w = path.path(layer_stack, position_list)

This example constructs a wire using metal1 for vertical connections and metal2 for horizontal connections:

    layer_stack = ("metal1", "via1", "metal2")
    position_list = [(0,0), (0,3), (1,3), (1,1), (4,3)]
    w = wire.wire(layer_stack, position_list)

8 Porting to a New Technology

The following sub-directories and files should be added to your new technology directory:

• /sp_lib - spice netlists for the library cells
• /gds_lib - GDSII files for the library cells
• layers.map - layer/purpose pair map for the technology
• /tech - contains tech parameters, layers, and portation functions

8.1 The GDS and Spice Libraries

The GDS and spice libraries, /gds_lib and /sp_lib, should contain the GDSII layouts and spice netlists for each of the library cells in your SRAM design. For the FreePDK45 technology, library cells for the 6T cell, sense amp, write driver, flip-flops, and control logic are provided.
To reiterate: all layouts must be exported in the GDSII file format. The following commands can be used to stream GDSII files into or out of Cadence Virtuoso. To stream out of Cadence:

strmout -layerMap ../sram_lib/layers.map -library sram -topCell $i -view layout -strmFile ../sram_lib/$i.gds

To stream a layout back into Cadence:

strmin -layerMap ../sram_lib/layers.map -attachTechFileOfLib NCSU_TechLib_FreePDK45 -library sram_4_32 -strmFile sram_4_32.gds

When you import a GDS file, make sure to attach the correct tech lib or you will get incorrect layers in the resulting library.

8.2 Technology Directory

Inside the /tech directory should be the Python classes for tech.py, ptx_port.py, and any other porting functions. The tech.py file is very important and should contain the following:

• Layer Number/Name - GDSII files only contain layer numbers, and it can be difficult to keep track of which layer corresponds to which number. In OpenRAM code, layers are referred to by name, and tech.py maps the layer names used in OpenRAM to the layer numbers in layers.map. This association means the compiler code does not need to be changed for each technology; a minimal sketch of such a mapping is shown after this list.

• Tech Parameters - important rules from the DRC rule deck (such as layer spacing and minimum sizes) should be included here. Please refer to the rules that are included in tech.py to get a better idea of what is important.

• Cell Sizes and Pin Offsets - the cell_size() and pin_finder() functions should be used to populate this class with the various cell sizes and pin locations in your library cells. These functions are relatively slow because they must traverse every shape in the entire hierarchy of a design. For this reason, they are not invoked each time the compiler is run; they should be run once, or whenever changes have been made to the library cells. The gathered sizes and pin locations are needed to generate the dynamic cells and perform routing at the various levels of the hierarchy. It is suggested that boundary boxes on a specific layer be added to define the cell size.
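The following is a minimal sketch of the kind of dictionaries tech.py provides (the compiler elsewhere references tech.drc["minwidth_tx"], for example). All layer numbers and rule values below are illustrative placeholders, not values from a real PDK:

# tech.py (hedged sketch; all numbers are illustrative placeholders)

# Map OpenRAM layer names to the GDSII layer numbers in layers.map.
layer = {
    "active": 1,
    "poly": 9,
    "contact": 10,
    "metal1": 11,
    "via1": 12,
    "metal2": 13,
}

# Key DRC rules referenced throughout the compiler (units: um).
drc = {
    "minwidth_tx": 0.09,        # minimum transistor width
    "minwidth_metal1": 0.065,   # minimum metal1 width
    "metal1_to_metal1": 0.065,  # minimum metal1 spacing
}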
9 Timing and Control Logic

This section outlines the necessary signals, timing considerations, and control circuitry for a synchronous SRAM.

9.1 Signals

Top-Level Signals:

• ADDR - address bus
• DATA - bi-directional data bus
• CLK - the global clock
• OEb - active-low output enable
• CSb - active-low chip select
• WEb - active-low write enable

Internal Signals:

• clk_bar - enables the precharge unit
• s_en - enables the sense amp during a read operation
• w_en - enables the write driver during a write operation
• tri_en and tri_en_bar - enable the data input tri-gate during a read operation

Figure 25: Timing diagram for read operation showing the setup, hold, and read times.

9.2 Timing Considerations

The main timing considerations for an SRAM are:

• Setup Time - time an input must be stable before the positive/negative clock edge.
• Hold Time - time an input must stay valid after the positive/negative clock edge.
• Minimum Cycle Time - time between subsequent memory operations.
• Memory Read Time - time from the positive clock edge until valid data appears on the data bus.
• Memory Write Time - time from the negative clock edge until the data has been driven into a memory cell.

9.3 SRAM Operation

Read Operation:

1. Before the clock transition (low to high) that initiates the read operation:
(a) The chip must be selected (CSb low).
(b) WEb must be high (read).
(c) The row and column addresses must be applied to the address input pins (ADDR).
(d) OEb must be asserted (OEb low).

2. On the rising edge of the clock (CLK):
(a) The control signals and address are latched into flip-flops and the read cycle begins.
(b) The precharging of the bitlines starts.
(c) The address bits become available to the decoder and column mux, which select the row and columns to read from.

3. On the falling edge of the clock (CLK):
(a) The word line is asserted, and the value stored in the memory cells pulls down one of the bitlines (BL if a 0 is stored, BL_bar if a 1 is stored).
(b) s_en enables the sense amplifier, which senses the voltage difference on the bitlines, produces the output, and keeps the value in its latch circuitry.
(c) The tri-gate is enabled and puts the output data on the data bus. Data remains valid on the data bus for a complete clock cycle.

Write Operation:

1. Before the clock transition (low to high) that initiates the write operation:
(a) The chip must be selected (CSb low).
(b) WEb must be low to enable the data input tristates.
(c) The row and column addresses must be applied to the address input pins (ADDR).
(d) OEb must be high (no output is available and the sense amp is disabled).

2. On the rising edge of the clock (CLK):
(a) OEb stays high (no output is available and the sense amp is disabled).
(b) The input addresses are latched into flip-flops, precharging starts, and the write operation begins.
(c) The address bits become available to the decoder and column mux, which select the row and columns to write to.

3. On the falling edge of the clock (CLK):
(a) The data to be written must be applied to DATA and latched into flip-flops.
(b) w_en enables the write driver, which drives the data input through the column mux and into the selected memory cells. The write delay is the time from the negative clock edge until the data value is stored in the memory cell on node X.

Figure 26: Timing diagram for write operation showing the setup, hold, and write times.

9.4 Zero Bus Turnaround (ZBT)

In SRAM timing, data should be available after the clock edge during a read, while data should be set up before the clock edge during a write. Because of this, a wait state (dead cycle) is necessary when the SRAM switches from read mode to write mode. To avoid dead cycles, which slow down operation and degrade SRAM performance, the Zero Bus Turnaround (ZBT) technique is used. With ZBT, during a write, data is set up after the positive clock edge and before the negative clock edge, and the input data is latched in negative-edge flip-flops. Using ZBT, we obtain higher memory throughput with no wait states. Figure 27 shows the correct timing of the input signals during a write operation to avoid wait states, and how a write cycle can be followed by a read cycle with no wait state. Input address bits should be ready before the positive edge so they can be loaded into positive-edge flip-flops. Output data is ready to be loaded onto the data bus during the second half of the cycle (after the negative clock edge), and input data should be ready before the negative clock edge so it can be loaded into negative-edge flip-flops.

Figure 27: (a) Zero Bus Turnaround timing.

9.5 Control Logic

The control circuitry ensures that the SRAM operates as intended during a read or write cycle by enabling the necessary structures in the SRAM.
As shown in Figure 28, the control logic takes three active-low signals as inputs: chip select bar (CSb), output enable bar (OEb), and write enable bar (WEb). CSb enables the entire SRAM chip. When CSb is low, the appropriate control signals are generated and sent to the architecture blocks. Conversely, if CSb is high, no control signals are generated and the SRAM is turned off or disabled. The OEb signal signifies a read operation; while it is low, the value seen on the data bus is an output from the memory. Similarly, the WEb signal signifies a write operation.

All of the input control signals are latched with master-slave flip-flops, ensuring that each control signal stays valid for the entire operation cycle. The control signal flip-flops use the normal clock to generate the local signals that enable or disable structures based on the operation. The address flip-flops are also clocked with the global clock. In a standard SRAM, switching from a read to a write operation results in a dead cycle. To avoid this dead cycle, the data flip-flops are latched with clk_bar in order to obtain a Zero Bus Turnaround (ZBT) memory. More details on ZBT timing are outlined in Section 9.4.

After all control signals are latched, they are ANDed with clk_bar because the read/write circuitry should only be enabled after the precharging of the bitlines has ended on the negative edge of the clock. The w_en signal enables the write driver during a write to the memory. The s_en signal is generated using a Replica Bitline (RBL) to enable the sense amplifier during a read operation; details of the RBL architecture are outlined in Section 9.6. tri_en and tri_en_bar enable the tristates during a read in order to drive the outputs onto the data bus.

Table 9 shows the truth table for the control logic. The s_en signal that enables the sense amplifier is true when CS·OE·clk_bar is true. Similarly, the write driver enable signal, w_en, is true when CS·WE·clk_bar is true. tri_en is true when ¬(OEb ∨ clk) is true, and tri_en_bar, its complement, is true when ¬(¬OEb ∧ ¬clk) is true.

Operation   CSb   OEb   WEb   s_en   w_en   tri_en
READ         0     0     1     1      0      1
WRITE        0     1     0     0      1      0

Table 9: Generation of control signals.
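The truth table can be summarized with a short combinational sketch. This is plain Python for illustration, not OpenRAM code, and it simplifies s_en to a pure AND term; in the actual design s_en is timed by the replica bitline described in Section 9.6.

# Hedged sketch of the combinational control-signal logic; all inputs
# and outputs are 0/1. In hardware, s_en is timed by the replica bitline.
def control_signals(CSb, OEb, WEb, clk):
    CS, OE, WE = 1 - CSb, 1 - OEb, 1 - WEb
    clk_bar = 1 - clk
    s_en = CS & OE & clk_bar   # sense amp: read, second half of cycle
    w_en = CS & WE & clk_bar   # write driver: write, second half of cycle
    tri_en = OE & clk_bar      # output tristates: equivalent to NOT(OEb OR clk)
    return s_en, w_en, tri_en

# Read cycle, after the negative clock edge:
assert control_signals(CSb=0, OEb=0, WEb=1, clk=0) == (1, 0, 1)
# Write cycle, after the negative clock edge:
assert control_signals(CSb=0, OEb=1, WEb=0, clk=0) == (0, 1, 0)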
Figure 28: (a) Control Logic diagram and (b) Replica Bitline Schematic.

9.6 Replica Bitline Delay

In an SRAM read operation, discharging the bitline is the most time-consuming step. Generally, the sense amplifier amplifies the small voltage difference on the bitlines at the proper sense timing to realize high-speed operation. Therefore, the timing of the sense amplifier enable (s_en) is extremely important for a high-speed, low-power SRAM. If s_en arrives before the bitline difference exceeds the offset voltage of the sense amplifier input transistors, a read functional failure may occur. Conversely, a late-arriving s_en consumes unnecessary time and wastes power.

The conventional way of generating the s_en signal is to use a replica bitline (RBL). The RBL, shown in Figure 29, consists of a column of SRAM cells (dummy cells) that track the random process variation in the array. The RBL matches the delay of the sense amplifier activation to the delay of the propagation of the required voltage swing on the bitlines. In the RBL technique, the memory cell that drives the delay in the control path is the same as in the read path, so the delay shift of the control path under Process, Voltage, and Temperature (PVT) variation has the same ratio as that of the read path. The RBL technique thus attains self-timed tracking with near-optimal s_en timing across PVT variation. Using replica circuits, the mismatch between the delay of the sense amp activation and the bitline swing is minimized.

Figure 29: Replica Bitline Schematic

The RBL technique uses a Replica Cell (RC) driving a short bitline. The short bitline's capacitance is set to be a fraction of the main bitline capacitance (e.g., one tenth). This fraction is determined by the bitline swing required for proper sensing, i.e., a bitline voltage difference larger than the offset voltage at the input transistors of the sense amplifier. An extra column block in the SRAM is converted into the replica column, whose capacitance is the desired fraction of the main bitline; its capacitance ratio to the main bitlines is therefore set purely by the ratio of the geometric lengths (e.g., one tenth). The RC is hard-wired to store a zero so that it discharges the RBL once it is accessed. Because of its similarity to the actual memory cell (in terms of design and fabrication), the delay of the RBL tracks the delay of the real bitlines very well and can be made roughly equal. Figure 30 shows the schematic of the 6T replica cell.

Figure 30: Replica Bitline Schematic

The timing for s_en is generated as follows. First, the RBL and the normal bitlines are precharged to VDD. Next, the selected memory cells and the RC are activated. The RC draws current from the RBL while the normal bitlines are discharged through the accessed cells. The discharged swing on the RBL is inverted and then buffered to generate the signal that enables the sense amplifier.

9.7 Timing and Power Characterizer

This section will provide an explanation of the characterizer, which generates spice stimuli for the top-level SRAM and performs spice timing simulations to determine the memory setup and hold times, the write delay, and the read delay. It also provides a spice power estimate.

10 Unit Tests

OpenRAM comes with a unit testing framework based on the Python unittest framework. Since OpenRAM is technology independent, these unit tests can be run in any technology to verify that the technology is properly ported. By default, FreePDK45 is supported. The unit tests consist of the following tests, which exercise each module/sub-block of OpenRAM:

• 00_code_format_check__test.py - Check the code formatting; returns an error if a TAB character is found in the code
• 01_library_drc_test.py - DRC of library cells in the technology gds_lib
• 02_library_lvs_test.py - LVS of library cells in the technology gds_lib and sp_lib
• 03_contact_test.py - Test contacts/vias between different layers
• 03_path_test.py - Test different types of paths based on the wire module
• 03_ptx_test.py - Test various sizes/fingers of PMOS and NMOS parameterized transistors
• 03_wire_test.py - Test different types of wires with different layers
• 04_pinv_test.py - Test various sizes of the parameterized inverter
• 04_nand_2_test.py - Test various sizes of the parameterized nand2
• 04_nand_3_test.py - Test various sizes of the parameterized nand3
• 04_nor_2_test.py - Test various sizes of the parameterized nor2
• 04_wordline_driver_test.py - Test a wordline driver array
• 05_array_test.py - Test a small bit-cell array
• 06_nand_decoder_test.py - Test a dynamic NAND address decoder
• 06_hierarchical_decoder_test.py - Test a dynamic hierarchical address decoder
• 07_tree_column_mux_test.py - Test a small tree column mux
• 07_single_level_column_mux_test.py - Test a small single-level column mux
• 08_precharge_test.py - Test a dynamically generated precharge array
• 09_sense_amp_test.py - Test a sense amplifier array
• 10_write_driver_test.py - Test a write driver array
• 11_ms_flop_array_test.py - Test a master-slave flip-flop array
• 13_control_logic_test.py - Test the control logic module
• 14_delay_chain_test.py - Test a delay chain array
• 15_tri_gate_array_test.py - Test a tri-gate array
• 16_replica_bitline_test.py - Test a replica bitline
• 19_bank_test.py - Test a bank
• 20_sram_test.py - Test a complete small SRAM
• 21_timing_sram_test.py - Test the timing of an SRAM
• 22_sram_func_test.py - Test the functionality of an SRAM

Each unit test instantiates a small component and performs DRC/LVS. Automatic DRC/LVS inside OpenRAM is disabled so that Python unittest assertions can be used to track failures, errors, and successful tests as follows:

self.assertFalse(calibre.run_drc(a.cell_name,tempgds))
self.assertFalse(calibre.run_lvs(a.cell_name,tempgds,tempspice))

Each of these assertions triggers a test failure when DRC or LVS reports violations. If there are problems interpreting modified code due to syntax errors, the unit test framework will not capture this, and it will result in an Error rather than a failure.
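For illustration, a minimal test following this pattern might look like the sketch below. The module imports, the temporary-file path, and the gds_write() call are assumptions about the surrounding framework, not verbatim OpenRAM code:

import unittest

# Hedged sketch of the unit-test pattern; pinv, calibre, tech, and
# gds_write() are assumed to be available as in OpenRAM's test suite.
import pinv
import calibre
import tech

class pinv_drc_test(unittest.TestCase):
    def runTest(self):
        # Instantiate a small component...
        a = pinv.pinv(cell_name="pinv", size=tech.drc["minwidth_tx"] * 8)
        tempgds = "/tmp/pinv_test.gds"   # illustrative temp path
        a.gds_write(tempgds)             # assumed layout-export method
        # ...and assert that DRC reports zero violations.
        self.assertFalse(calibre.run_drc(a.cell_name, tempgds))

if __name__ == "__main__":
    unittest.main()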
10.1 Usage

A regression script is provided to check all of the unit tests. Run:

python tests/regress.py

from the compiler directory located at "OpenRAM/trunk/compiler/". Each individual test can be run with:

python tests/{unit-test file}

e.g.,

python tests/05_array_test.py

from the same compiler directory. As an example, the unit tests all complete and provide the following output, except for the final 20_sram_test, which has 2 DRC violations:

[trunk/compiler]$ python tests/regress.py
runTest (01_library_drc_test.library_drc_test) ... ok
runTest (02_library_lvs_test.library_lvs_test) ... ok
runTest (03_contact_test.contact_test) ... ok
runTest (03_path_test.path_test) ... ok
runTest (03_ptx_test.ptx_test) ... ok
runTest (03_wire_test.wire_test) ... ok
runTest (04_pinv_test.pinv_test) ... ok
runTest (04_nand_2_test.nand_2_test) ... ok
runTest (04_nand_3_test.nand_3_test) ... ok
runTest (04_nor_2_test.nor_2_test) ... ok
runTest (04_wordline_driver_test.wordline_driver_test) ... ok
runTest (05_array_test.array_test) ... ok
runTest (06_hierdecoder_test.hierdecoder_test) ... ok
runTest (07_single_level_column_mux_test.single_level_column_mux_test) ... ok
runTest (08_precharge_test.precharge_test) ... ok
runTest (09_sense_amp_test.sense_amp_test) ... ok
runTest (10_write_driver_test.write_driver_test) ... ok
runTest (11_ms_flop_array_test.ms_flop_test) ... ok
runTest (13_control_logic_test.control_logic_test) ... ok
runTest (14_delay_chain_test.delay_chain_test) ... ok
runTest (15_tri_gate_array_test.tri_gate_array_test) ... ok
runTest (19_bank_test.bank_test) ... ok
runTest (20_sram_test.sram_test) ... ok

If there are any DRC/LVS violations during a test, the summary, output, and error files are generated in the technology directory's "openram_temp" folder. View those files to determine the cause of the DRC/LVS violations. More information on the Python unittest framework is available at http://docs.python.org/2/library/unittest.html.

11 Debug Framework

All output in OpenRAM should use the shared debug framework. This is still under development but is in a usable state. It will eventually be replaced with the Python logging framework, which is quite simple.

All of the debug framework is contained in debug.py and is based around the concept of a "debug level", which is a single global variable in this file. This level is, by default, 0, which produces normal, minimal output. The general guidelines for debug output are:

• 0 - normal output
• 1 - verbose output
• 2 - detailed output
• 3+ - excessively detailed output

The debug level can be adjusted on the command line when arguments are parsed using the "-v" flag. Adding more "-v" flags increases the debug level, as in the following examples:

python tests/01_library_drc_test.py -vv
python openram.py 4 16 -v -v

each of which puts the program in debug level 2 (detailed output). Since every module may output a lot of information at the higher debug levels, the output format is standardized to allow easy searching via grep or other command-line tools. The standard output formatting is used through three interface functions:

• debug.info(level, msg)
• debug.warning(msg)
• debug.error(msg)

The msg string in each case can be any string, including formatted data or other useful debug information. The string should also contain enough context to make it human-understandable; it should not just be a number! The warning and error messages are independent of the debug level, while the info message prints only if the current debug level is at or above the level parameter. The output format of the debug info messages is:

[module]: msg

where module is the calling module name and msg is the string provided. This enables a grep command to retrieve the relevant lines. The warning and error messages include the file name and line number of the warning/error.
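For example, a module might emit output like the following. This is a minimal sketch; the message contents are illustrative, and only the three interface functions above are assumed:

import debug

# Printed only when the debug level is 1 (verbose) or higher:
debug.info(1, "bitcell_array: generating {0} rows x {1} cols".format(64, 32))

# Always printed; the framework appends the file name and line number:
debug.warning("wordline driver exceeds cell height; illustrative message")
debug.error("unknown tx_type; expected 'nmos' or 'pmos'")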
GDSMill

OpenRAM uses gdsMill, a GDS library written by Michael Wieckowski at the University of Michigan. Michael gave us complete permission to use the code. Since then, we have made several bug fixes and performance enhancements to gdsMill. In addition, gdsMill is no longer available on the web, so we distribute it along with OpenRAM.

From: Michael Wieckowski
Date: Thu, Oct 14, 2010 at 12:49 PM
Subject: Re: GDS Mill
To: Matthew Guthaus

Hi Matt, Feel free to use / modify / distribute the code as you like. -Mike

On Oct 14, 2010, at 3:07 PM, Matthew Guthaus wrote:
> Hi Michael (& Dennis),
>
> A student and I were looking at your GDS tools, but
> we noticed that there is no license. What is the license?
>
> Thanks,
>
> Matt