DRAMsim Manual
University of Maryland Memory System Simulator Manual

I. INTRODUCTION

The simulated memory system described in this manual consists of a bus interface unit (BIU), one or more transaction-driven memory controllers, and one or more command-driven memory systems. This documentation has been prepared to familiarize the user with the terminology used in the design of the memory system and to provide a brief explanation of the basic assumptions of the simulated memory system as well as the simulation framework. Figure 1 shows the system topology of the simulated processor-memory system.

[Fig. 1: Abstract Illustration of a Load Instruction in a Processor-Memory System. Part A, searching on chip for data (CPU clocking domain), proceeds alongside the Fetch, Decode, Exec, Mem, and WB stages of instruction execution: [A1] virtual-to-physical address translation (DTLB access); [A2] L1 D-Cache access; [A3] L2 cache access, and on a miss the request is sent to the BIU; [A4] the bus interface unit (BIU) obtains data from main memory. Part B, going off chip for data (DRAM clocking domain): [B1] the BIU arbitrates for ownership of the address bus**; [B2] the request is sent to the system controller; [B3] physical-to-memory address translation; [B4] memory request scheduling**; [B5] the memory controller sets up the request address (RAS/CAS); [B6, B7] the DRAM device obtains data and returns it to the controller; [B8] the system controller returns data to the CPU via the read data buffer. (** Steps not required for some processor/system controllers; protocol specific.)]

Three distinct and separate entities that interact in the life of a memory transaction request are assumed in this framework: processor(s), memory controller(s), and DRAM memory system(s). Each of these three entities is assumed to be an independently clocked synchronous state machine that operates in its own clocking domain. In the current implementation of the simulation framework, there are only two clocking domains: the CPU clock domain and the DRAM memory system clock domain (FB-DIMM memory systems excepted). The simulation framework assumes that the DRAM memory system as well as the memory controller operate in the DRAM memory system clock domain, and that the CPU operates in the CPU clock domain. This assumption holds true for legacy systems with separate memory controllers; in newer systems where the memory controller is integrated into the CPU core, the assumption may be reversed, and the memory controller is assumed to operate in the same clocking domain as the CPU. A more generalized model would operate the three separate entities as three independent clock domains, with the frequency of each clock domain set separately and the model altered as necessary. However, at this time we believe that such an implementation would be unnecessarily complex and would decrease simulation speed for minimal increase in the flexibility and accuracy of the system simulation model.

II. BUS INTERFACE UNIT

The memory system's basic assumptions about the processor are illustrated in Figure 2.
[Fig. 2a: Bus Interface of the Simulated CPU: the cpu-fe, cpu-exec, and cpu-commit portions of the processor all connect to the BIU (bus interface unit).]
[Fig. 2b: Bus Interface Unit Data Structure: each entry records a status (Valid or Invalid), a request id (rid), a start_time, an address, and an access_type (I Fetch, D Read, or D Write).]

In essence, the memory system assumes an out-of-order execution core in which different portions of the processor can all generate memory requests. The simulator assumes that each request is tagged with a request id (rid), so that when the memory callback function is invoked, the callback function can uniquely identify the functional unit that generated the request, as well as the specific pending operation, by the request id. The simulator assumes that each functional unit can sustain more than one outstanding memory transaction at a given instant in time, and that memory transactions may be returned out of order by the memory system.

We assume that the life of a memory transaction request begins when a requesting functional unit generates a DRAM memory request. The requesting unit begins this process by attempting to place the request into a slot in the bus interface unit (BIU), which is the functional equivalent of MSHRs in this simulator. In the simulation framework, the BIU is a data structure with multiple entries (slots), and the entries in the BIU have no assumed ordering. If a free slot is available, the request is placed into the bus interface unit, the status MEM_UNKNOWN is returned to the requesting functional unit, and the memory system reports the latency of the request at a later time. If all of the slots are filled and no free slot is available, MEM_RETRY is returned to the requesting functional unit, and the functional unit must retry the request at a later time to see if a slot has become available.
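The BIU behavior described above can be summarized in a minimal C sketch. The slot fields mirror Figure 2b and the return values and SCHEDULED state are named in this manual, but the structure, enum, and function names (biu_slot_t, biu_acquire_slot, BIU_SLOT_COUNT) are illustrative assumptions rather than the simulator's actual interface.

    #include <stdint.h>

    #define BIU_SLOT_COUNT 32           /* illustrative size for this sketch */

    typedef enum { INVALID, VALID, SCHEDULED } slot_status_t;        /* per Fig. 2b and Sec. III.A */
    typedef enum { MEM_IFETCH, MEM_READ, MEM_WRITE } access_type_t;  /* I Fetch, D Read, D Write   */
    typedef enum { MEM_UNKNOWN, MEM_RETRY } mem_status_t;            /* returned to the requester  */

    typedef struct {
        slot_status_t status;      /* Invalid = free, Valid = pending, Scheduled = in transaction queue */
        int           rid;         /* request id supplied by the requesting functional unit */
        uint64_t      start_time;  /* cycle at which the request entered the BIU */
        uint64_t      address;     /* physical address of the request */
        access_type_t access_type;
    } biu_slot_t;

    static biu_slot_t biu[BIU_SLOT_COUNT];

    /* Attempt to place a request into any free slot; slots have no assumed ordering. */
    mem_status_t biu_acquire_slot(int rid, uint64_t now, uint64_t addr, access_type_t type)
    {
        for (int i = 0; i < BIU_SLOT_COUNT; i++) {
            if (biu[i].status == INVALID) {
                biu[i] = (biu_slot_t){ VALID, rid, now, addr, type };
                return MEM_UNKNOWN;   /* latency reported later via the callback */
            }
        }
        return MEM_RETRY;             /* all slots full; requester must retry later */
    }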
III. SYSTEM CONTROLLER

Figure 3 shows a generalized system controller that supports multiple processors.

[Fig. 3: Transaction Queue and Memory Controller(s) System Architecture: requests from cpu 0 through cpu 3 pass through the BIU into the transaction queue of the system controller/memory controller (northbridge), which generates memory controller command sequences for one or more DRAM systems.]

Simulation of the system controller begins with the selection of a memory transaction from the BIU into the transaction queue. The transaction queue maps the physical address of the transaction to a memory address in terms of channel ID, rank ID, bank ID, row ID, and column ID via an address mapping scheme. Then, depending on the row buffer management policy used by the system, a sequence of DRAM commands is generated for each memory transaction. The simulated memory system supports multiple memory controllers, each of which can independently control a logical channel of memory. Each logical channel may contain multiple physical channels of memory. As an example, each Alpha EV7 processor contains 2 logical channels of memory, and each logical channel consists of 4 physical channels of Direct RDRAM. To simulate such a system in our simulator, specify the channel count as "2" and the channel width as "8" (the unit is in bytes).

A. THE TRANSACTION QUEUE AND THE TRANSACTION ORDERING POLICY

In the memory system simulator, the simulation begins when the system controller goes to the BIU to select a request for processing. After the appropriate BIU entry (slot) has been selected, the status of the BIU entry is marked as SCHEDULED, and a memory transaction is created in the memory transaction queue. Unlike the BIU, the memory transaction queue is nominally implemented as an in-order queue, where DRAM commands of an earlier memory transaction are given higher priority than DRAM commands from later transactions. The selection of a memory request from the BIU into the transaction queue is referred to as the transaction ordering policy. Since the transaction queue is an in-order queue, the transaction ordering policy that selects which request is serviced is of great importance in determining the bandwidth and latency characteristics of DRAM memory systems. In this simulation framework, four transaction ordering policies are supported: First Come First Serve (FCFS), Read or Instruction Fetch First (RIFF), Bank Round Robin (BRR), and Command Pair Rank Hopping (CPRH).

B. ROW BUFFER MANAGEMENT POLICY

Modern memory controllers typically deploy one of two policies to manage the operation of the sense amplifiers. Since a DRAM access is essentially a two-step process, in cases where the memory access sequence has a high degree of spatial locality it is favorable to direct the memory access sequences to the same row of memory. The Open Page row buffer management policy is designed to favor memory accesses to the same row of memory by keeping the sense amplifiers open and holding an entire row of data for ready access. In contrast, the Close Page row buffer management policy is designed to favor random accesses to different rows of memory. Other row buffer management policies exist, including dynamic policies that use timers to keep pages open for a limited period of time before closing them; however, dynamic and other alternative policies are typically derivatives of either the close page or open page policy. For the sake of simplicity, the discussion and examination in this text is limited to the open page and close page policies.

In the Open Page row buffer management policy, the primary assumption is that once a row of data is brought to the array of sense amplifiers, different parts of the same row may be accessed again in the near future. Under this assumption, after a column access is performed, the sense amplifiers are kept active and the entire row of data is buffered to await another memory access to the same row. In the case that another access is made to the same row, that memory access can occur with the minimal latency of tCAS, since the row is already active in the sense amplifiers. However, in the case that the access is to a different row of the same bank, the memory controller must first precharge the DRAM array, perform another row access, and then perform the column access; the minimal latency to access data in this case is tRP + tRCD + tCAS. In the Close Page row buffer management policy, the primary assumption is that there is limited spatial locality in the memory access pattern; as soon as the data has been obtained via a column access, the DRAM array and sense amplifiers are precharged in preparation for another memory access to a different row of memory.
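The latency cases just described can be summarized in a short sketch. The latency terms (tCAS, tRCD, tRP) follow the text above; the function name and the bank-state representation are illustrative assumptions for this sketch only.

    /* Illustrative access-latency calculation for the two row buffer management
     * policies described above.  Timing values are in DRAM cycles. */
    typedef enum { BANK_IDLE, BANK_OPEN } bank_state_t;

    int access_latency(bank_state_t state, int open_row, int requested_row,
                       int tRP, int tRCD, int tCAS, int open_page_policy)
    {
        if (!open_page_policy)                  /* close page: bank is always precharged */
            return tRCD + tCAS;                 /* row access + column access */

        if (state == BANK_OPEN && open_row == requested_row)
            return tCAS;                        /* open-page hit: column access only */
        if (state == BANK_OPEN)
            return tRP + tRCD + tCAS;           /* bank conflict: precharge + row + column */
        return tRCD + tCAS;                     /* bank idle: row + column */
    }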
C. ADDRESS MAPPING

In a memory system, before data can be read from or written to a memory location, the physical address given by the CPU has to be translated into a memory address in the form of channel ID, rank ID, bank ID, row ID, and column ID. In a memory system that implements the open page row buffer management policy, the role of the address mapping scheme is to exploit the temporal and spatial locality of the address request stream, directing memory accesses to an open DRAM row (bank) and minimizing DRAM bank conflicts. However, in a memory system that implements the close page row buffer management policy, the goal of the address mapping scheme is to minimize temporal and spatial locality to any given bank and instead distribute memory accesses throughout the different banks in the memory system. In this manner, the DRAM memory system can avoid memory accesses to the same bank of memory and instead rely on transaction and DRAM command ordering algorithms that rotate through all available DRAM banks to achieve maximum DRAM bandwidth.

The address mapping scheme depends not only on the row buffer management policy, but also on the configuration of the DRAM memory system as well as the expandability (or non-expandability) of the memory system. For example, depending on the design, the channel ID or rank ID can be mapped to the low-order address bits to obtain the most bank parallelism; but in memory systems that allow end users to flexibly configure the memory system by adding more ranks or changing channel configurations, the channel ID and rank IDs are typically mapped to the high-order address bits.

Figure 4 shows the device configurations of a specific 256 Mbit SDRAM device.

[Fig. 4: Different Configurations of a 256 Mbit DRAM device]
  Device config:    64 Meg x 4          32 Meg x 8          16 Meg x 16
  Configuration:    16 M x 4 x 4 bks    8 M x 8 x 4 bks     4 M x 16 x 4 bks
  Row addressing:   8K (A0-A12)         8K (A0-A12)         8K (A0-A12)
  Bank addressing:  4 (BA0, BA1)        4 (BA0, BA1)        4 (BA0, BA1)
  Col addressing:   2K (A0-A9, A11)     1K (A0-A9)          512 (A0-A8)
(8 of the x8 devices, or 4 of the x16 devices, form a 64-bit-wide data bus; the "DRAM page size" differs with the different configurations.)

Figure 4 shows that a 256 Mbit SDRAM device may be shipped in one of three different configurations: 64 M x 4, 32 M x 8, and 16 M x 16. Basically, the same 256 Mbit device could contain 64 million uniquely addressable locations that are each 4 bits wide, 32 million uniquely addressable 8-bit-wide locations, or 16 million uniquely addressable 16-bit-wide locations. However, many modern memory systems use 64-bit-wide data busses, so multiple DRAM devices are combined to form the 64-bit-wide data bus. The effect of these different configurations is that there are different numbers of columns per "DRAM page" for each configuration. Due to the variable sizes and configurations of the memory devices used in memory systems, the address mapping differs with each configuration. One difficulty related to the precise definition of an address mapping scheme is that in a memory system with differently configured memory modules, the mapping scheme must differ from module to module.

Figure 5 assumes one memory address mapping scheme with 256 Mbit SDRAM devices and a uniform configuration of the memory system. In this figure there are 4 ranks of memory modules, each module formed from 16-bit-wide 256 Mbit SDRAM devices.
[Fig. 5: Open Page Address Mapping Scheme of a 512 MB system with 256 Mbit DRAM devices. Device configuration 16 Meg x 16 (4 M x 16 x 4 banks; row addressing 8K, A0-A12; bank addressing 4, BA0-BA1; column addressing 512, A0-A8). The 32-bit byte-addressable physical address is divided, from high-order to low-order bits, into unused bits, rank ID, row ID, bank ID, and column ID.]

In this configuration, the 16-bit-wide 256 Mbit devices use 9 bits for column addressing, 2 bits to address 4 different banks, 13 bits to address 8192 rows, and 2 bits to address 4 ranks of memory. Altogether, 29 bits of physical address are used here to address 512 Megabytes of memory. The address mapping policy illustrated in figure 5 is optimized for an open page memory system, since the column IDs are mapped to the lowest-order bits, and multiple accesses to the same array of memory would most likely be mapped to different columns within the same row and same bank of DRAM devices. Alternative memory address mapping schemes may achieve a higher degree of performance depending on the configuration and row buffer management policy. Finally, the memory addressing scheme presented in figure 5 is specified for a single channel of memory. Multi-channel memory systems require an address mapping policy that can adequately distribute the memory accesses to the different channels.
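As a worked illustration of the single-channel mapping of Figure 5, the sketch below extracts the memory address fields from a physical address. The exact bit boundaries (3 low-order bits selecting the byte within the 64-bit data bus, then 9 column bits, 2 bank bits, 13 row bits, and 2 rank bits) are an assumption consistent with the field widths given in the text, and the struct and function names are illustrative.

    #include <stdint.h>

    /* Illustrative decomposition of a 32-bit physical address for the 512 MB,
     * single-channel, 4-rank open-page system of Figure 5.  Field widths follow
     * the text: 9 column bits, 2 bank bits, 13 row bits, 2 rank bits. */
    typedef struct {
        uint32_t rank, row, bank, col;
    } dram_addr_t;

    dram_addr_t map_open_page(uint32_t paddr)
    {
        dram_addr_t a;
        uint32_t x = paddr >> 3;          /* drop the byte offset within the 8-byte bus width */
        a.col  = x & 0x1FF;  x >>= 9;     /* 9 bits : 512 columns (lowest order: open-page locality) */
        a.bank = x & 0x3;    x >>= 2;     /* 2 bits : 4 banks  */
        a.row  = x & 0x1FFF; x >>= 13;    /* 13 bits: 8192 rows */
        a.rank = x & 0x3;                 /* 2 bits : 4 ranks (high order: expandable) */
        return a;
    }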
D. BASIC TIMING PARAMETERS

In any DRAM memory-access protocol, a set of timing parameters is used to characterize command durations and latencies. Although an exacting description of a full and complete protocol requires the use of tens of different timing parameters, a generic protocol can be well described with a subset of the timing parameters. The timing parameters used in the simulation framework are summarized in Table 1.

Table 1: Summary of DRAM Timing Parameters
  tBurst: Data burst duration; the time period that a data burst occupies on the data bus, typically 4 or 8 beats of data. In DDR SDRAM, 4 beats of data occupy 2 full cycles. Also known as tBL. (figure 7)
  tCAS: Column Access Strobe latency; the time interval between a column access command and the return of data by the DRAM device(s). Also known as tCL. (figure 7)
  tCMD: Command transport duration; the time period that a command occupies on the command bus as it is transported from the DRAM controller to the DRAM devices. (figure 6)
  tCWD: Column Write Delay; the time interval between issuance of a column write command and placement of data on the data bus by the DRAM controller. (figure 7)
  tDQS: Data strobe turnaround; used in DDR and DDR2 SDRAM memory systems, not used in SDRAM or Direct RDRAM memory systems. 1 full cycle in DDR SDRAM systems. (figure 12)
  tFAW: Four bank Activation Window; a rolling time frame in which a maximum of four bank activations may be engaged. Limits the peak current profile. (figure 16)
  tRAS: Row Access Strobe; the time interval between a row access command and data restoration in the DRAM array. After tRAS, the DRAM bank may be precharged. (figure 6)
  tRC: Row Cycle; the time interval between accesses to different rows in the same bank. tRC = tRAS + tRP. (figure 7)
  tRCD: Row to Column command Delay; the time interval between a row access command and data ready at the sense amplifiers. (figure 6)
  tRFC: Refresh Cycle; the time between refresh commands, or between a refresh command and a row activation. (figure 10)
  tRRD: Row activation to Row activation Delay; the minimum time interval between two row activation commands to the same DRAM device. Limits the peak current profile. (figure 15)
  tRP: Row Precharge; the time interval that it takes for a DRAM array to be precharged and readied for another row access. (figure 9)
  tWR: Write Recovery time; the minimum time interval between the end of a write data burst and the start of a precharge command. Allows the sense amplifiers to restore data to the cells. (figure 8)

E. BASIC DRAM COMMANDS

In this section, five basic DRAM commands are described: the row access command, the column read command, the column write command, the precharge command, and the refresh command. The descriptions of these basic commands form the foundation of accurate and flexible simulation of DRAM memory systems.

ROW ACCESS COMMAND

In DRAM memory systems, data must be retrieved from the DRAM cells and resolved into digital values by an array of sense amplifiers before it can be accessed by the DRAM controller. The array of sense amplifiers is also known as the row buffer, because a row access command moves an entire row of data from the DRAM cells into the array of sense amplifiers. DRAM memory systems accomplish this movement of data by issuing a row access command, also known as a row activation command. Figure 6 illustrates the progression of a generic row access command.

[Fig. 6: Row Access Command Illustrated. The command occupies the command and address bus for tCMD; the selected bank performs data sense for tRCD and data restore to the DRAM cells until tRAS. To issue a RAS command: (1) the command bus must be available now; (2) the bank must be idle, or have a "set idle" event, at now + tCMD. RAS-command event sequence: (1) command transport: use the command bus for tCMD cycles; (2) activation: for tRCD cycles; (3) restore: the row is accessible for tRAS - tRCD cycles; (4) active: the row remains active until a precharge command.]

A row access command only moves data internally within a given bank; it does not make use of the I/O gating resources, nor does it make use of the data bus to move data between the DRAM controller and the DRAM devices. Two timing parameters are associated with the row access command: tRCD and tRAS. The first is the row to column delay, labelled as tRCD in figure 6. The row to column delay measures the time it takes for the row access command to move data from the DRAM cell arrays to the sense amplifiers. After tRCD, the sense amplifiers have completed the task of resolving the electronic charges stored in the DRAM cells into digital values. At this point, one or more column access commands can retrieve data from the sense amplifiers and move it through the data bus to the memory controller, or store data from the memory controller into the sense amplifiers. However, the act of reading data discharges the DRAM storage cells, and the data must be restored from the sense amplifiers back into the DRAM storage cells. The row access strobe latency, tRAS, describes the time duration between activation of a row and restoration of data into the DRAM cell arrays. Essentially, a precharge command to prepare the sense amplifiers for another row access cannot be issued until at least tRAS after the previous row access command.
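The issue rules listed with Figure 6 can be captured in a small scheduling check. The two conditions and the timing follow the figure; the data structure and function names are illustrative assumptions for this sketch.

    /* Illustrative check of whether a row access (RAS) command may issue now,
     * following the two conditions listed for Figure 6.  Times are in DRAM cycles. */
    typedef struct {
        long cmd_bus_free_at;     /* cycle at which the command bus becomes free */
        long bank_idle_at;        /* cycle at which the target bank becomes idle */
    } dram_state_t;

    int can_issue_ras(const dram_state_t *s, long now, int tCMD)
    {
        return s->cmd_bus_free_at <= now &&          /* 1. command bus available now          */
               s->bank_idle_at    <= now + tCMD;     /* 2. bank idle when the command arrives */
    }

    /* On issue: the command bus is busy for tCMD cycles, data sense completes
     * after tRCD, and the bank may not be precharged until tRAS has elapsed
     * (timing origins are an assumption of this sketch). */
    void issue_ras(dram_state_t *s, long now, int tCMD, int tRCD, int tRAS,
                   long *row_ready_at, long *earliest_precharge_at)
    {
        s->cmd_bus_free_at     = now + tCMD;
        *row_ready_at          = now + tCMD + tRCD;  /* column commands may target the row after this */
        *earliest_precharge_at = now + tRAS;         /* restore must complete before precharge */
    }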
COLUMN READ COMMAND

In DRAM memory systems, once data has been moved into the array of sense amplifiers, it can be accessed by the DRAM controller through one or more column read commands or column write commands. The purpose of a column read command is to move a subset of a row of data from the array of sense amplifiers through the shared data bus back to the memory controller. There are two timing parameters associated with a column read command: tCAS and tBurst. The time between the issue of the column read command and the placement of the requested data onto the data bus by the DRAM device is known as the column access strobe latency, tCAS (sometimes referred to as tCL, or CAS latency; the column access strobe signal no longer exists in modern DRAM systems, but the terminology remains). After tCAS, the requested data is moved onto the data bus and then into the memory controller. Modern DRAM memory systems move data in relatively short bursts, usually occupying 4 or 8 beats on the data bus. The duration of the data burst is simply tBurst (sometimes referred to as tBL). Figure 7 illustrates the progression of a column read command and shows that the column read command goes through three basic phases.

[Fig. 7: Column Read Command. The command occupies the command and address bus; the selected bank (with row x open) drives the I/O gating; the data burst occupies the data bus for tBurst. To issue a CAS read command: (1) the command bus must be available now; (2) the bank must be in the Restore or Active state at now + tCMD; (3) the data bus must be free at now + tCAS. CAS read-command event sequence: (1) command transport: use the command bus for tCMD cycles; (2) CAS latency: for tCAS - tCMD cycles; (3) burst: move data for tBurst cycles.]

In phase one, the command is transported on the address and command busses, then latched and decoded by the DRAM devices. In phase two, the appropriate columns of data are retrieved from the sense amplifier array of the selected bank, moved through the I/O gating structures, and readied for transport across the data bus. In phase three, the data flows through the I/O gating and out onto the data bus, occupying the data bus for the duration of tBurst. One basic assumption of the column read command illustrated in figure 7 is that, before the I/O gating phase of the command can proceed, the accessed DRAM bank must be open to the selected row, labelled as row x in figure 7. That is, tRCD time must have passed since the row access command was issued to the selected row x before the column read command can be issued.
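The three conditions for issuing a column read listed with Figure 7 can be expressed in the same style as the row access check; the state fields and function name are assumptions for this sketch.

    /* Illustrative check of the three conditions for issuing a column (CAS) read. */
    typedef struct {
        long cmd_bus_free_at;     /* command bus availability */
        long data_bus_free_at;    /* data bus availability    */
        long row_ready_at;        /* cycle at which row x is open (tRCD after its RAS) */
    } chan_state_t;

    int can_issue_cas_read(const chan_state_t *s, long now, int tCMD, int tCAS)
    {
        return s->cmd_bus_free_at  <= now &&          /* 1. command bus free now                */
               s->row_ready_at     <= now + tCMD &&   /* 2. bank restored/active at now + tCMD  */
               s->data_bus_free_at <= now + tCAS;     /* 3. data bus free when the burst starts */
    }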
COLUMN WRITE COMMAND

In DRAM memory systems, once a row of data has been moved to the sense amplifiers, write commands can be issued to overwrite data in the array of sense amplifiers. The array of sense amplifiers then seamlessly restores the new data values back into the DRAM cells (some DRAM devices with write buffers operate in a slightly different manner; the analysis here assumes no write buffer). From the perspective of the memory access protocol, the column write command goes through a similar set of operations as the column read command. However, the primary differences are that the direction of data movement is reversed, and that the column write command has one additional phase that accounts for the time needed to write data from the sense amplifiers back into the DRAM cells. Moreover, unlike the timing of the column read command, the timing of the command transport phase with respect to the data transport phase of a column write command is defined differently for different DRAM memory systems. Figure 8 illustrates the progression of a column write command through four phases.

[Fig. 8: Column Write Command in SDRAM, DDR SDRAM, and DDR2 SDRAM. The command occupies the command and address bus; the data burst occupies the data bus after tCWD; the data moves through the I/O gating into the selected bank; data restore ties up the rank for tWR. tCWD = 0 in SDRAM, 1 full cycle in DDR SDRAM, and tCAS - 1 in DDR2 SDRAM. To issue a CAS write command: (1) the command bus must be available now; (2) the bank must be in the Restore or Active state at now + tCWD; (3) the data bus must be free at now + tCWD. CAS write-command event sequence: (1) command transport: use the command bus for tCMD cycles; (2) CWD latency: for tCWD cycles; (3) burst: move data for tBurst cycles; (4) write recovery: tie up the entire rank for tWR cycles (assuming no write buffers or write caches).]

In figure 8, phase one shows the column address and column write command being placed on the address and command bus. In phase two, the data is placed on the data bus by the memory controller. In phase three, the data flows through the I/O gating structures to the array of sense amplifiers. Finally, in phase four, the sense amplifiers in the selected bank overwrite the data in the DRAM cells with the newly received data.

One timing parameter associated with a column write command is tCWD, the column write delay (sometimes referred to as command write delay). The column write delay defines the time between the issue of the column write command and the placement of data onto the data bus by the DRAM controller. Figure 8 shows that in an SDRAM memory system, the command, address, and data are placed on their respective busses in the same cycle; tCWD is therefore zero cycles in duration. In a DDR SDRAM memory system, data for the write command is placed on the data bus one full cycle after the command and address are placed on the command and address busses, so tCWD is defined as one full cycle. Finally, in a DDR2 SDRAM memory system, the write delay is one full cycle less than tCAS. Defining the write delay to match the read latency simplifies DRAM command scheduling in a DDR2 SDRAM memory system.

Figure 8 also illustrates tWR, the write recovery time. The write recovery time denotes the time between the end of the data burst and the completion of the movement of data into the DRAM arrays. Because the data must be moved into the DRAM arrays during this time, in the case of a bank conflict with the next DRAM request, the precharge command that prepares the array of sense amplifiers for another row access cannot begin until the write recovery time of the current write command has been satisfied.

One more difference between a read command and a write command is that the data bursts are offset to different clock phases in SDRAM, DDR SDRAM, and DDR2 SDRAM memory systems. However, the difference in clock phases may be masked by the use of the tDQS timing parameter to account for the overhead of the data bus turnaround. One consequence of this difference is that defining the column write delay as equal to the column access latency provides no benefit to the scheduling algorithms, because the overhead of tDQS would still have to be inserted into the protocol to denote the data bus turnaround time. As a result, the difference in the relative clock phases of read and write data bursts may be abstracted out, and it has no impact on the description of the abstract DRAM access protocol.
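The device-dependent write delay described above can be expressed compactly; the enum and function names are illustrative assumptions, while the three values follow Figure 8.

    /* Illustrative column write delay (tCWD), in DRAM clock cycles, for the
     * three memory system types discussed above. */
    typedef enum { SDRAM, DDR_SDRAM, DDR2_SDRAM } dram_type_t;

    int column_write_delay(dram_type_t type, int tCAS)
    {
        switch (type) {
        case SDRAM:      return 0;          /* command, address, and data in the same cycle  */
        case DDR_SDRAM:  return 1;          /* data follows the command by one full cycle    */
        case DDR2_SDRAM: return tCAS - 1;   /* write delay defined as read latency minus one */
        }
        return 0;
    }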
PRECHARGE COMMAND

DRAM device data access is a two-step process. A row access command moves an entire row of data from the DRAM cells to the array of sense amplifiers. The data remains in the array of sense amplifiers for one or more column access commands to move data between the DRAM devices and the DRAM controller. In this framework, a precharge command completes the sequence: it resets the array of sense amplifiers and the bitlines to a preset voltage and prepares the sense amplifiers for another row access command. Figure 9 illustrates the progression of a precharge command.

[Fig. 9: Row Precharge Command Illustrated. The precharge command follows a row access to the same bank by at least tRAS; the bank precharge occupies the bank for tRP, and tRC = tRAS + tRP spans the full row cycle. To issue a PREC command: (1) the command bus must be available now; (2) the bank must be Active at now + tCMD. PREC-command event sequence: (1) command transport: use the command bus for tCMD cycles; (2) precharge: for tRP cycles; (3) upon completion: the bank status is set to idle.]

Figure 9 shows that in the first phase, the precharge command is sent to the DRAM device, and in the second phase, the array of sense amplifiers in the selected bank is precharged to the preset voltage. The timing parameter associated with the precharge command is the row precharge duration, tRP. The row precharge duration describes the length of time the DRAM device takes to precharge the bitlines and the sense amplifiers. Figure 9 also shows that a precharge command cannot be issued to the DRAM device until at least tRAS after the previous row access command to the same bank.

Collectively, the sum of tRAS and tRP forms tRC, the row cycle time. The row cycle time of a given DRAM device measures the speed at which the device can bring data from the DRAM cell arrays into the sense amplifiers, restore the data values back into the DRAM cells, and then precharge the bitlines and sense amplifiers back to the reference voltage level in readiness for another row access command. The row cycle time is the fundamental limitation on the speed at which data may be retrieved from different rows of a given DRAM bank.
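For example, with the representative DDRx timing values listed with Figure 17 later in this manual (tRAS = 40 ns, tRP = 10 ns), the row cycle time is tRC = tRAS + tRP = 40 ns + 10 ns = 50 ns, so a given bank can begin an access to a new row at most once every 50 ns, regardless of the data rate.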
REFRESH COMMAND AND TIMING

In DRAM devices, data is stored in the form of electrical charge in DRAM cells. Each DRAM cell is composed of a storage capacitor and an access transistor. With the passage of time, the electrical charge stored in the capacitor gradually leaks away through the access transistor. A low level of charge leakage is acceptable as long as the remaining electrical charge still resolves to the correct digital value. However, without intervention, charge leakage eventually leads to a state where the stored digital values can no longer be correctly resolved by the sense amplifiers. As a result, data held in DRAM cells must be periodically read out to the sense amplifiers and restored, with full electrical charge levels, back into the DRAM cells.

As long as DRAM cells are periodically refreshed before the levels of electrical charge deteriorate to indistinguishable values, DRAM refresh cycles can be used to overcome leaky DRAM cells and ensure data integrity. The drawback of any refresh mechanism is that refresh commands constitute an overhead in terms of utilizable bandwidth and additional power consumption in the DRAM devices. There are multitudes of DRAM refresh strategies designed to minimize peak power consumption or maximize available device bandwidth. Figure 10 illustrates a basic refresh command in which the DRAM controller sends a single refresh command to a DRAM device: the device takes the address of the row to be refreshed from an internal register, sends the same row address to all banks in the device concurrently, and each bank then brings a row of data into the sense amplifiers, resolves the stored electrical charges to full voltage levels, restores the data back into the DRAM cells, and precharges the DRAM array to ready it for another row access. This single, basic refresh command to all banks takes one row refresh cycle time, tRFC, to complete. The reason that tRFC is longer than tRC is that the bank-concurrent refresh command draws a large amount of current, and it takes longer than tRC for the DRAM device to recover from the current spike. In many modern DRAM memory systems, the memory controller injects one row refresh command per row in a bank every 32 or 64 milliseconds. Depending on the design and refresh policy, refresh commands may be issued consecutively or opportunistically, one at a time.

[Fig. 10: Refresh Command Illustrated. A refresh acts as a row access to all banks followed by a precharge of all banks; the all-bank refresh occupies the device for tRFC, which exceeds tRC by the additional time the device needs to recover from the current spike of an all-bank refresh. To issue a REFRESH command: (1) the command bus must be available now; (2) the bank must be idle at now + tCMD.]
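As a worked example, for a device with 8192 rows per bank (as in the configuration of Figure 5), distributing the refresh commands evenly over a 64 millisecond refresh period corresponds to one refresh command roughly every 64 ms / 8192 = 7.8 microseconds, which matches the refresh interval t_REFI of 7.8 microseconds given in the sample power input file later in this manual.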
F. DRAM COMMAND SCHEDULING IN MULTI-RANK SYSTEMS

Figure 11 illustrates the topology of a DDRx SDRAM memory system. DDRx SDRAM memory systems use a topology similar to that of SDRAM.

[Fig. 11: DDRx SDRAM Memory System Topology. The DDR SDRAM controller connects to multiple ranks of DRAM chips (DIMMs) through a shared address and command bus, a shared data bus with DQS (data strobe) signals, and per-DIMM chip selects.]

DDRx SDRAM memory systems use source-synchronous data strobe signals to ensure proper timing on the data bus. However, the use of the source-synchronous data strobe signal creates problems in scheduling column access commands to different ranks in DDRx SDRAM memory systems. Figure 12 illustrates the timing and command sequence of two consecutive read commands to different ranks of DRAM devices in a memory system that uses a data strobe to synchronize timing on the data bus.

[Fig. 12: Consecutive Read Commands to Different Ranks. Read 0 goes to bank i of rank m and read 1 to bank j of rank n (n != m); the two data bursts on the shared data bus must be separated by tBurst + tDQS to allow the data strobe to re-synchronize.]

In figure 12, the rank-switching penalty is labelled tDQS, the read-write data strobe re-synchronization time. For relatively low-frequency SDRAM memory systems, data synchronization strobes are not used, and tDQS is zero. For Direct RDRAM memory systems, the use of a topology-matched, source-synchronous clocking scheme obviates the need for separate strobe signals, and tDQS is also zero. However, for DDRx SDRAM memory systems, the use of the data strobe signal means that the tDQS data strobe re-synchronization penalty between read bursts to different ranks requires at least one full clock cycle.

G. SCHEDULING FOR SERIAL MEMORY PROTOCOLS

The newest DRAM standard is the Fully Buffered DIMM (FB-DIMM). The specifications are currently being hammered out by the DRAM standards committee, JEDEC. Based on initial presentations made at the Intel Developer Forum and MemCon, we built a picture of the protocol, and the simulator supports this model of the FB-DIMM.

In recent years, memory system designers have moved towards wider and faster buses with lower supportable DIMM capacity. Due to this limitation of high-speed multi-drop buses, the proposal is to replace the wide bus with a serial interface and to add to the existing DIMM a logic block known as the Advanced Memory Buffer (AMB). The memory controller sends its requests via the high-speed serial link to the AMB. This buffer performs the necessary serial-parallel conversion of the bundles into standard DRAM commands/data and passes them on to the DRAM via a standard DRAM interface. The protocol replaces the bi-directional link with two uni-directional links: one to the DIMMs (the southbound link) and the other back to the memory controller (the northbound link). All serial links are point-to-point, i.e. data is sent to an AMB, which forwards it to the next AMB, and so on. The transmission time for a bundle/frame on the northbound link is the same as on the southbound link. The base configuration has a 14-bit-wide northbound link and a 10-bit-wide southbound link. The bundles on the northbound link are larger and communicate only data. Bundles on the southbound link are smaller and are a combination of command and data; a southbound frame can comprise either 3 commands, 1 command and 2 data slots, or 3 data slots. Each slot is equivalent to roughly 24 bits of information. The additional parameters required by the simulator are listed below.

[Figure 13: Simplified representation of an FB-DIMM system. The memory controller connects to each channel of DIMMs through a southbound link and a northbound link. The FB-DIMM system depicted has two channels that are only three DIMMs deep; a fully populated system is expected to go up to 6 channels, each 8 deep.]

  tBUNDLE: Transmission time for a single frame/bundle.
  tBUS: Additional latency to account for delays due to bus propagation time and AMB receive-send overheads.
  tAMB_UP: Time required for the AMB to perform the serial-parallel conversion of packet information and then activate the DRAM.
  tAMB_DOWN: Overhead associated with converting the parallel data bursts received from the DRAM into the bundle format.
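A minimal sketch of how a southbound frame and its delivery latency might be represented is shown below, following the three slot combinations and the parameter descriptions above; the type names, field names, and latency composition are illustrative assumptions rather than the FB-DIMM frame encoding.

    #include <stdint.h>

    /* Illustrative representation of an FB-DIMM southbound frame: three slots of
     * roughly 24 bits each, carrying 3 commands, 1 command + 2 data slots, or
     * 3 data slots. */
    typedef enum { FRAME_3_CMD, FRAME_1_CMD_2_DATA, FRAME_3_DATA } sb_frame_type_t;

    typedef struct {
        sb_frame_type_t type;
        uint32_t        slot[3];      /* each slot holds ~24 bits of command or data */
    } sb_frame_t;

    /* Latency to deliver one southbound frame: one bundle transmission plus the
     * bus propagation and AMB receive-send overhead. */
    long southbound_frame_latency(long tBUNDLE, long tBUS)
    {
        return tBUNDLE + tBUS;
    }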
H. ADDITIONAL CONSTRAINTS: POWER

Numerous constraints exist in modern DRAM memory systems that limit the bandwidth utilization of the DRAM devices. One such constraint is related to the power consumption of the DRAM devices. With continuing emphasis placed on memory system performance, DRAM manufacturers are expected to push for ever higher data transfer rates in each successive generation of DRAM devices. However, just as increasing operating frequencies lead to higher activity rates and higher power consumption in modern processors, increasing data rates for DRAM devices also increase the potential for higher activity rates and higher power consumption in DRAM devices. One solution deployed to limit the power consumption of DRAM devices is to constrain their activity rate. However, constraints on the activity rate of DRAM devices in turn limit their capability to move data, and thereby limit the performance capability of DRAM memory systems.

In modern DRAM devices, each time a row is activated, thousands of bits are discharged, sensed, and then recharged in parallel. As a result, the row activation command is a relatively energy-intensive operation. Figure 14 shows the current profile of a DRAM read cycle.

[Fig. 14: Current Profile of a DRAM Read Cycle. Over a row activation, column read, and precharge to bank i, the current draw (in abstract units) rises above the quiescent current draw of the active device with each command-related activity.]

Figure 14 shows that an active DRAM device draws a relatively constant quiescent current level, and that it draws additional current for each activity on the device. The total current draw of the DRAM device is simply the sum of the quiescent current draw and the current draw of each activity on the device.

I. POWER MODEL

The DRAM simulator incorporates a power model for DDR and DDR2 SDRAM. The basic approach is to calculate the average power over one activation-to-activation cycle: the power in each DRAM state is calculated and then multiplied by the fraction of time the device spends in that state during one activation-to-activation cycle. For simplicity, we consider the power model for DDR SDRAM first, and then make some extensions to cover the DDR2 case.

The power consumption of DDR SDRAM is calculated as follows. The calculation involves parameters extracted from a DDR SDRAM data sheet; Table 2 shows the IDD values from a 128 Mb DDR SDRAM data sheet and the description of each value. In order to calculate the power, two states are defined. When data is held in any of the sense amplifiers, the DRAM is said to be in the "active state"; after all banks of the DDR SDRAM have been restored to the memory array, it is said to be in the "precharge state". Additionally, CKE, the device clock enable signal, is considered. In order to send commands, or to read or write data to the DDR SDRAM, CKE must be HIGH. If CKE is LOW, the DDR SDRAM clock and input buffers are turned off, and the device is in power-down mode.
Table 2: IDD values from a 128 Mb DDR SDRAM data sheet (units: mA; the two values are for the -75/-75Z and -8 speed grades)
  IDD0 (105 / 100): Operating current; one bank, active-precharge, tRC = tRC MIN, tCK = tCK MIN.
  IDD2P (3 / 3): Precharge power-down standby current; all banks idle, power-down mode, tCK = tCK MIN, CKE = LOW.
  IDD2F (45 / 35): Idle standby current; CS_ = HIGH, all banks idle, tCK = tCK MIN, CKE = HIGH.
  IDD3P (18 / 18): Active power-down standby current; one bank, power-down mode, tCK = tCK MIN, CKE = LOW.
  IDD3N (45 / 35): Active standby current; CS_ = HIGH, one bank, tCK = tCK MIN, CKE = HIGH.
  IDD4R (110 / 90): Operating current; burst = 2, READs, continuous burst, one bank active, tCK = tCK MIN, IOUT = 0 mA.
  IDD4W (110 / 90): Operating current; burst = 2, WRITEs, continuous burst, one bank active, tCK = tCK MIN.
  IDD5 (5 / 5): Auto refresh current; tRC = 15.625 ms.

Data sheet assumptions:
  1. IDD is dependent on output loading and cycle rates. Specified values are obtained with the minimum cycle time at CL = 2 for -75Z and -8, and CL = 2.5 for -75, with the outputs open.
  2. 0°C ≤ TA ≤ 70°C.
  3. VDDQ / VDD = 2.5 V ± 0.2 V.
  4. CKE must be active (HIGH) during the entire time a REFRESH command is executed; that is, from the time the AUTO REFRESH command is registered, CKE must be active at each rising clock edge until tREF later.

From the definitions of the active/precharge states and CKE above, a DRAM device can be in one of four background states:

  1. Precharge power-down power: p(PREpdn) = IDD2P x VDD x BNKpre x CKEloPRE
  2. Precharge standby power: p(PREstby) = IDD2F x VDD x BNKpre x (1 - CKEloPRE)
  3. Active power-down power: p(ACTpdn) = IDD3P x VDD x (1 - BNKpre) x CKEloACT
  4. Active standby power: p(ACTstby) = IDD3N x VDD x (1 - BNKpre) x (1 - CKEloACT)

where the IDD values are defined in the data sheet and VDD is the maximum voltage supply of the device. BNKpre is the fraction of time the DRAM device spends in the precharge state (all banks of the DRAM precharged) relative to the actual activation-to-activation cycle time. CKEloPRE is the fraction of the time spent in the precharge state during which CKE is low, and CKEloACT is the fraction of the time spent in the active state during which CKE is low.

In addition, when the DRAM device is in the active standby state, commands can be sent to the device. Therefore, four more power terms are added on top of the active standby state:

  1. Activate power: p(ACT) = (IDD0 - IDD3N) x (tRC / tACT) x VDD
  2. Write power: p(WR) = (IDD4W - IDD3N) x WRpercent x VDD
  3. Read power: p(RD) = (IDD4R - IDD3N) x RDpercent x VDD
  4. Termination power: p(DQ) = p(perDQ) x (numDQ + numDQS) x RDpercent

where tRC is the shortest activation-to-activation cycle time specified in the data sheet, and tACT is the actual activation-to-activation cycle time in the real system. WRpercent is the fraction of time that write data occupies the data pins relative to the actual activation-to-activation cycle time, and RDpercent is the fraction of time that read data occupies the data pins relative to the actual activation-to-activation cycle time. p(perDQ) is the power of each DQ; it depends on the termination scheme, and in this case we use p(perDQ) = 6.88 mW for DDR SDRAM. numDQ and numDQS are the number of DQ and DQS pins on the device, respectively.
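Before continuing to the refresh and scaling terms, the terms defined so far can be combined in a short sketch. The formulas are transcribed directly from the equations above; the struct and function names are illustrative assumptions. Since the IDD values are in mA and VDD in volts, the result is in mW.

    /* Illustrative DDR SDRAM background and operation power terms, before
     * voltage/frequency scaling. */
    typedef struct {
        double IDD0, IDD2P, IDD2F, IDD3P, IDD3N, IDD4R, IDD4W;   /* data sheet currents, mA */
        double VDD;                                              /* maximum supply voltage  */
        int    numDQ, numDQS;
    } ddr_params_t;

    typedef struct {
        double BNKpre;          /* fraction of time all banks are precharged            */
        double CKEloPRE;        /* fraction of precharge time with CKE low              */
        double CKEloACT;        /* fraction of active time with CKE low                 */
        double tRC, tACT;       /* data sheet minimum and actual act-to-act cycle times */
        double RDpercent, WRpercent;
    } ddr_usage_t;

    double ddr_power_mw(const ddr_params_t *p, const ddr_usage_t *u, double p_perDQ)
    {
        double pre_pdn  = p->IDD2P * p->VDD * u->BNKpre         * u->CKEloPRE;
        double pre_stby = p->IDD2F * p->VDD * u->BNKpre         * (1.0 - u->CKEloPRE);
        double act_pdn  = p->IDD3P * p->VDD * (1.0 - u->BNKpre) * u->CKEloACT;
        double act_stby = p->IDD3N * p->VDD * (1.0 - u->BNKpre) * (1.0 - u->CKEloACT);
        double act      = (p->IDD0 - p->IDD3N) * (u->tRC / u->tACT) * p->VDD;
        double wr       = (p->IDD4W - p->IDD3N) * u->WRpercent * p->VDD;
        double rd       = (p->IDD4R - p->IDD3N) * u->RDpercent * p->VDD;
        double dq       = p_perDQ * (p->numDQ + p->numDQS) * u->RDpercent;
        return pre_pdn + pre_stby + act_pdn + act_stby + act + wr + rd + dq;
    }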
In addition, the refresh power is: p(REF) = (IDD5 - IDD2P) x VDD. Notice that IDD3N is deducted from the calculation since it is already included in p(ACTstby). Also, in the current version of the DRAM simulator, a refresh command is simulated as a row activate command followed by a precharge command, so the refresh power is ignored at this point.

We then scale the voltage and frequency to the values at which the device actually operates, obtaining:

  P(PREpdn)  = p(PREpdn)  x (useVDD^2 / maxVDD^2)
  P(ACTpdn)  = p(ACTpdn)  x (useVDD^2 / maxVDD^2)
  P(PREstby) = p(PREstby) x (usefreq / specfreq) x (useVDD^2 / maxVDD^2)
  P(ACTstby) = p(ACTstby) x (usefreq / specfreq) x (useVDD^2 / maxVDD^2)
  P(ACT)     = p(ACT)     x (useVDD^2 / maxVDD^2)
  P(WR)      = p(WR)      x (usefreq / specfreq) x (useVDD^2 / maxVDD^2)
  P(RD)      = p(RD)      x (usefreq / specfreq) x (useVDD^2 / maxVDD^2)
  P(DQ)      = p(DQ)      x (usefreq / specfreq)
  P(REF)     = p(REF)     x (useVDD^2 / maxVDD^2)

Finally, everything is summed to obtain the total power:

  P(TOT) = P(PREpdn) + P(PREstby) + P(ACTpdn) + P(ACTstby) + P(ACT) + P(WR) + P(RD) + P(DQ) + P(REF)

In the case of DDR2 SDRAM, most of the calculations remain the same except for p(ACT), p(REF), and the I/O and termination power. For DDR2 SDRAM, p(ACT) before the voltage/frequency scaling is:

  p(ACT) = ( IDD0 - (IDD3N x tRAS + IDD2N x (tRC - tRAS)) / tRC ) x VDD

which is then scaled in the same way as in the DDR SDRAM case. The refresh power p(REF) is:

  p(REF) = (IDD5 - IDD3N) x VDD x (tRFCmin / tREFI)

In the power model for DDR2 SDRAM, the simulator supports two cases: 1) the one-rank case, and 2) the four-rank case. For the one-rank case, the termination powers are:

  Write termination power: p(termW) = p(dqW) x (numDQ + numDQS + 1) x WRpercent
  Read termination power: p(DQ) = p(dqR) x (numDQ + numDQS) x RDpercent

and the read and write termination powers due to the other ranks are zero:

  p(termRoth) = p(termWoth) = 0

where p(dqW) = 8.2 mW and p(dqR) = 1.1 mW. In the four-rank case, the read and write termination powers take the same form, with p(dqW) = 0 and p(dqR) = 1.5 mW; however, the termination powers due to the other ranks are:

  p(termRoth) = p(dqRDoth) x (numDQ + numDQS) x termRDsch
  p(termWoth) = p(dqWRoth) x (numDQ + numDQS + 1) x termWRsch

where p(dqRDoth) is the termination power when terminating a read from another DRAM, equal to 13.1 mW, and p(dqWRoth) is the termination power when terminating write data to another DRAM, equal to 14.6 mW. termRDsch is the fraction of time that reads are terminated from another DRAM, and termWRsch is the fraction of time that writes are terminated to another DRAM. Finally, we sum everything to obtain the total power of the DDR2 SDRAM:

  P(TOT) = P(PREpdn) + P(PREstby) + P(ACTpdn) + P(ACTstby) + P(ACT) + P(WR) + P(RD) + P(DQ) + P(REF) + p(termW) + p(termWoth) + p(termRoth)

Detailed information on these calculations can be obtained from Micron's website: http://www.micron.com/products/dram/syscalc.html
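The voltage/frequency scaling above can be expressed as a small helper; the function name and the flag arguments are assumptions for this sketch, while the scaling factors follow the equations above.

    /* Illustrative voltage/frequency scaling of a pre-computed power term.
     * freq_scaled selects whether the (usefreq/specfreq) factor applies
     * (standby, read, and write terms) or not (power-down, activate, and
     * refresh terms); the DQ term scales with frequency only. */
    double scale_power(double p, double useVDD, double maxVDD,
                       double usefreq, double specfreq,
                       int vdd_scaled, int freq_scaled)
    {
        double s = 1.0;
        if (vdd_scaled)
            s *= (useVDD * useVDD) / (maxVDD * maxVDD);
        if (freq_scaled)
            s *= usefreq / specfreq;
        return p * s;
    }

    /* Example: P(RD) = scale_power(p_RD, useVDD, maxVDD, usefreq, specfreq, 1, 1);
     *          P(DQ) = scale_power(p_DQ, useVDD, maxVDD, usefreq, specfreq, 0, 1); */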
J. tRRD: ROW (ACTIVATION) TO ROW (ACTIVATION) DELAY

In DDR2 SDRAM devices, the timing parameter tRRD has been defined to specify the minimum time period between row activations on the same DRAM device. In the present context, the acronym RRD stands for row-to-row activation delay. The timing parameter tRRD is specified in nanoseconds; figure 15 shows that by specifying tRRD in nanoseconds instead of a number of cycles, a minimum spacing between row activations is maintained regardless of the operating data rate. For memory systems that implement the close page row buffer management policy, tRRD effectively limits the maximum sustainable bandwidth of a memory system with a single rank of memory (in a memory system with two or more ranks, consecutive row activation commands may be directed to different ranks).

[Fig. 15: Row to Row Activation Limited by tRRD. Two row activations to banks i and j of the same device must be spaced at least tRRD apart, limiting the device's peak current draw.]

K. tFAW: FOUR BANK ACTIVATION WINDOW

In DDR2 SDRAM devices, the timing parameter tFAW has been defined to specify a rolling time frame in which a maximum of four row activations on the same DRAM device may be engaged concurrently. The acronym FAW stands for Four bank Activation Window. Figure 16 shows a sequence of row activation requests to different banks on the same DDR2 SDRAM device that respects both tRRD and tFAW: the row activation requests are spaced at least tRRD apart from each other, and the fifth row activation to a different bank is deferred until at least tFAW has passed since the first row activation was initiated. For memory systems that implement the close page row buffer management policy, tFAW places an additional constraint on the maximum sustainable bandwidth of a single-rank memory system, regardless of the operating data rate.

[Fig. 16: Maximum of Four Row Activations in any tFAW time frame. Activations to banks i, j, k, and l are each separated by at least tRRD; the activation to bank m must wait until at least tFAW after the activation to bank i.]

L. DRAM COMMAND CHAIN

In a DRAM-based memory system, each memory transaction is translated into one or more DRAM commands. In this simulation framework, this sequence of DRAM commands is referred to as the DRAM command chain. The difficulty associated with the translation from a transaction to a DRAM command chain is that the sequence of DRAM commands in the chain depends on the row buffer management policy as well as on the state of the DRAM memory system. In an open page memory system, a memory transaction may be translated into: a single column access command, if the row is already open; a precharge command, a row access command, and a column access command, if there is a bank conflict; or just a row access command and a column access command, if the bank is currently idle. In a close page memory system, every memory transaction translates into the sequence of three DRAM commands that completes a read cycle.
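The translation just described can be sketched as follows; the enum and function names are assumptions for this sketch, while the command sequences follow the text above.

    /* Illustrative translation of a memory transaction into its DRAM command
     * chain under the two row buffer management policies. */
    typedef enum { CMD_RAS, CMD_CAS, CMD_PRE } dram_cmd_t;

    /* Returns the number of commands written into chain[] (at most 3). */
    int build_command_chain(int open_page_policy, int bank_open, int open_row,
                            int requested_row, dram_cmd_t chain[3])
    {
        int n = 0;
        if (!open_page_policy) {                 /* close page: always RAS + CAS + PRE */
            chain[n++] = CMD_RAS;
            chain[n++] = CMD_CAS;
            chain[n++] = CMD_PRE;
            return n;
        }
        if (bank_open && open_row == requested_row) {
            chain[n++] = CMD_CAS;                /* row already open: column access only */
        } else if (bank_open) {
            chain[n++] = CMD_PRE;                /* bank conflict: precharge, then row + column */
            chain[n++] = CMD_RAS;
            chain[n++] = CMD_CAS;
        } else {
            chain[n++] = CMD_RAS;                /* bank idle: row + column */
            chain[n++] = CMD_CAS;
        }
        return n;
    }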
Figure 17 illustrates a complete read cycle in a close-page DDRx SDRAM memory system.

[Fig. 17: A Complete "Read Cycle" in a DDRx SDRAM Memory System (at 1 Gbit). A row activation command (R), a column read command (C), and a precharge command (P) are issued in sequence: each command occupies the command bus for tCMD, the data burst follows the column read by tCAS, the precharge may issue tRAS after the row activation, and tRC spans the full cycle. Example timing at 1 Gbps: tCAS = 10 ns = 5 cycles = 10 beats; tRCD = 10 ns = 5 cycles = 10 beats; tRP = 10 ns = 5 cycles = 10 beats; tRRD = 10 ns = 5 cycles = 10 beats; tRAS = 40 ns = 20 cycles = 40 beats; tBurst = 8 ns = 4 cycles = 8 beats; tFAW = 48 ns = 24 cycles = 48 beats; tDQS = 4 ns = 2 cycles = 4 beats.]

DRAM User's Guide: DRAM Related Options

This section provides a summary of the available options related to the DRAM system simulator.

-biu:transaction_ordering_policy ?transaction_ordering_policy?
transaction_ordering_policy specifies the ordering policy for selecting and prioritizing memory transactions. Currently supported transaction ordering policies are First Come First Serve (fcfs), Read or Instruction Fetch First (riff), Wang's algorithm (wang), Least Pending (least_pending), Most Pending (most_pending), Open Bank First (obf), and Greedy (greedy). Note that the FB-DIMM system supports only Greedy.

-cpu:frequency ?cpu_frequency?
cpu_frequency specifies the frequency of the CPU core. Since everything is relative, the frequency of the CPU core must be known so that the memory system has a reference ratio with which to interact with it. The unit is assumed to be MHz. A default setting of 2000 MHz is assumed if no option is specified.

-dram:type ?dram_type?
dram_type specifies the type of DRAM system to be used. "sdram", "ddrsdram", "ddr2", "ddr3" and "fbdimm" are the currently supported options. A default of "sdram" is assumed if no option is specified.

-dram:frequency ?dram_frequency?
dram_frequency specifies the operating frequency of the memory system. The unit is assumed to be MHz. A default setting of 100 MHz is assumed if no option is specified. A PC800 RDRAM memory system should have this option set to 800, and a PC1600 DDR SDRAM memory system should have this option set to 200.

-dram:channel_count ?channel_count?
channel_count specifies the number of logical channels of DRAM in the memory system. The current implementation supports one, two, four, and eight logical channels of memory. A default setting of 1 channel is assumed if no option is specified.

-dram:channel_width ?channel_width?
channel_width specifies the width of the data channel in the memory system on a per-channel basis. The unit is assumed to be bytes. To simulate a dual RDRAM channel system with a single memory controller (as in the Intel i850), the channel_count switch above should be set to 1 (channel), and the channel_width setting should be set to 32 (bytes).

-dram:refresh ?auto_refresh_time_period?
auto_refresh_time_period specifies the time period over which the memory controller will cycle through and refresh all of the rows in a DRAM-based memory system. The unit is milliseconds. The default is 0, a special case that specifies that no refresh is simulated. An auto refresh time setting of 10 milliseconds should reduce the available bandwidth by about 1 to 5%, depending on the memory system and refresh scheme.

-dram:row_buffer_management_policy ?row_buffer_management_policy?
row_buffer_management_policy specifies the row buffer management policy. Currently available options are "open_page" and "close_page".
The default is set to "open_page". The open page policy keeps an accessed page open for as long as possible, until a row refresh closes it or until another row access to the same bank forces that page to be closed. The close page policy closes each page immediately after the column access.

-dram:address_mapping_policy ?address_mapping_policy?
The address mapping policy determines how a physical address is broken down into rank, bank, row, and column addresses in the memory system. Currently supported options are "sdram_base_map", "intel845g_map", "sdram_close_page_map", "burger_base_map", and "burger_alt_map". sdram_base_map and intel845g_map are intended for SDRAM and DDR SDRAM memory systems.

-dram:chipset_delay ?chipset_delay_value?
To simulate the minimum latency through the system controller and memory controller, we implemented chipset_delay in our simulator. The unit is DRAM cycles. Since DRDRAM-based systems are clocked much higher, please set this delay to a higher value for such systems. DDR SDRAM based memory systems should also have this value set to twice that of SDRAM based systems for "equivalent" latency in terms of nanoseconds through the system controller.

-dram:spd_input ?input_filename?
Since it gets tedious to specify 20 different parameters for a memory system, the preferred way to do so is with a configuration file. Numerous timing parameters and DRAM system configurations can be specified with a .spd file. A sample .spd file is shown below; comments are allowed after //. Sample .spd files are provided under the subdirectory /mem_system_def/.

    // DDR3 1000 memory system.
    // Composed of 1 Gbit chips. 2 ranks, each rank has 8 (x8) 1 Gbit chips.
    // Total is 2 GB
    // Peak Bandwidth is 8 GB/s
    //
    type                 ddr3
    datarate             1000
    channel_count        1                    // Logical channel
    channel_width        8                    // Byte width
    PA_mapping_policy    sdram_hiperf_map     // Comments are allowed here
    rank_count           2
    bank_count           8                    // 8 banks per chip
    row_count            16384
    col_count            1048
    t_cas                10                   // 10ns
    t_cmd                2
    t_cwd                8
    t_dqs                4
    t_faw                48
    t_ras                40
    t_rc                 50
    t_rcd                10
    t_rrd                10
    t_rp                 10
    t_wr                 10
    posted_cas           FALSE
    t_al                 8
    auto_refresh         FALSE
    auto_refresh_policy  refresh_one_chan_all_rank_all_bank
    refresh_time         64000                // specified in us

Fig. 18: Sample DRAM Configuration File for a 1 Gbit DDR3 SDRAM system

-dram:power_input ?input_filename?
A power input file (similar to the spd file) can also be specified. In the power input file, comments begin with #. A sample power input file is shown below.

    #####################################################
    # DDR SDRAM Configuration and Data Sheet Parameters #
    # corresponding to 1 Gb 667 MHz 4-4-4 spd file      #
    #####################################################
    density    1024     # Mb
    DQS        2        # per chip
    max_VDD    1.9      # V
    min_VDD    1.7      # V
    IDD0       100      # mA
    IDD2P      7        # mA
    IDD2F      65       # mA
    IDD3P      40       # mA
    IDD3N      70       # mA
    IDD4R      205      # mA
    IDD4W      180      # mA
    IDD5       270      # mA
    t_CK       3        # ns
    t_RFC_min  127.5    # ns
    t_REFI     7.8      # microseconds
    ###################################################
    # DRAM Usage Conditions in the System Environment #
    ###################################################
    VDD        1.8      # V

Fig. 19: Sample DRAM Power Input File for a 1 Gbit DDR2 SDRAM device

-debug:biu
This switch turns on the bus interface debugging feature and dumps output to stderr for each bus interface unit slot acquisition and release.
-debug:transaction
This switch turns on the transaction interface debugging feature and dumps output to stderr each time a transaction enters a queue, is broken down into a command sequence, or retires.

-debug:threshold
This switch turns on the transaction interface debugging feature and dumps output to stderr only after a particular transaction number has been passed.

-debug:bundle
This switch turns on the bundle debugging feature and dumps the contents of each bundle being sent when simulating an FB-DIMM based system.

-debug:amb
This switch turns on the Advanced Memory Buffer (AMB) debugging feature and dumps to stderr the state of the AMB every time a command is sent that will occupy the buffer or release its contents to the DRAM. This flag is valid only when FB-DIMM based configurations are simulated.

-debug:all
This switch turns on BIU, transaction, and DRAM memory system debugging all at the same time.

-debug:wave
This switch turns on a simple ASCII text based DRAM memory system waveform display.

-stat:biu
The bus interface unit (BIU) stats file collects latency information for each and every memory transaction. The final result is placed in stderr or in a common statistics file. The output shows one column that represents the number of CPU cycles that it took for a memory access to complete, and a second column that represents the number of accesses that incurred that latency.

-stat:dram:bank_hit
The bank hit statistic collects information on the number of DRAM cycles that occur between accesses that hit an open bank.

-stat:dram:bank_conflict
This flag enables the collection of information on the number of DRAM cycles that occur between accesses that conflict on an open bank.

-stat:dram:cas_per_ras
This enables collection of data on the number of column read or write commands that are issued for each row access (open bank/page) command.

-stat:dram:power ?output_filename?
The power stats file collects the power statistics during a specified period of time and the average power per rank during that time. The format of each line in the file is / <# of chips per rank> <# of accesses>. All variables are as explained in the "Power Model" section.