Intel CTO Justin Rattner discusses the Hybrid Memory Cube at the Intel Developer Forum
It was slipped into only about one minute of Intel CTO Justin Rattner's hour-long presentation on the final day of the Intel Developer Forum (IDF), so many observers may have missed the announcement that Micron and Intel are collaborating on development of the Hybrid Memory Cube (HMC). HMC was first introduced just one month earlier by Micron Fellow and Chief Technologist J. Thomas Pawlowski, who gave a more detailed presentation at the Stanford University Hot Chips conference (without disclosing Intel's involvement). The Hybrid Memory Cube is a 3D IC innovation that goes beyond the processor-DRAM die stacking demonstrated by companies such as Samsung, to an entirely new architecture for the memory-processor interface.
Speaking at Hot Chips, Pawlowski said that in order to continue increasing bandwidth while reducing power and latency to meet the demands of multi-core processing, direct control of memory must give way to some form of memory abstraction. The need for an industry standards body, such as JEDEC, to agree on the ~80 parameters used to specify DRAMs results in a "lowest common denominator" solution, according to Pawlowski.
In the HMC, communication from the processor to memory goes through a high-speed SERDES data link to a local logic controller die at the bottom of the DRAM stack. In the prototype example shown at IDF, four DRAMs were connected by through-silicon vias (TSVs) to the logic die, but stacking of up to eight DRAMs was also described. The processor is not integrated into the stack, avoiding die-size mismatch and thermal/cooling issues. The HMC is a complete DRAM module that can be attached in close proximity to the CPU in a multi-chip module (MCM) or on a 2.5D passive interposer. Micron also described "far memory" configurations in which some HMCs connect to a host and others connect to other HMCs through serial links, forming networks of memory cubes.
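As a rough illustration of that topology, here is a minimal Python sketch of a host with a "near memory" cube and a "far memory" cube chained behind it. This is a toy model only; the class and attribute names are hypothetical and not part of any Micron or Intel specification.

```python
# Illustrative sketch only, not Micron/Intel code: a toy model of the
# near/far memory topology described above. All names are hypothetical.

from dataclasses import dataclass, field
from typing import List


@dataclass
class HybridMemoryCube:
    """One cube: a logic controller die under a TSV-connected DRAM stack."""
    name: str
    dram_dies: int = 4        # prototype showed 4 DRAM die; up to 8 was described
    serdes_links: int = 4     # high-speed serial links to the host or other cubes
    chained: List["HybridMemoryCube"] = field(default_factory=list)

    def chain(self, cube: "HybridMemoryCube") -> None:
        """Attach a 'far memory' cube over a serial link, building a cube network."""
        self.chained.append(cube)


@dataclass
class Host:
    """CPU packaged next to its 'near memory' cubes on an MCM or 2.5D interposer."""
    near_memory: List[HybridMemoryCube] = field(default_factory=list)


# One near cube attached to the host, with a far cube reached through it.
near = HybridMemoryCube("near0")
near.chain(HybridMemoryCube("far0"))
host = Host(near_memory=[near])
```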
HMC eliminates the need for a complex memory scheduler, using just a thin arbitrator that results in shallow queues, Pawlowski said at Hot Chips. The HMC architecture also eliminates complex standards requirements, since only the high-speed SERDES interface and form factor need to be standardized. Timing constraints no longer need to be standardized, since those specifications can be tuned to the application in the custom logic die, while the high-volume DRAM die remain the same across numerous applications.
Pawlowski said that while the serial links slightly increase latency, HMC achieves a significant overall latency reduction: the DRAM cycle time (tRC) is lower by design, and shallower queues and higher bank availability further shorten system latency. The first-generation HMC prototype uses four 40 GBps (billion bytes per second) links per cube, for a total throughput of 160 GBps per cube.
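A quick sanity check on those link figures, as a Python sketch using only the numbers quoted above:

```python
# Quick arithmetic check of the first-generation link figures quoted above.
links_per_cube = 4
gbytes_per_second_per_link = 40

total_gbytes_per_second = links_per_cube * gbytes_per_second_per_link
print(total_gbytes_per_second)  # 160 GBps per cube, matching the stated total
```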
Micron introduced the Hybrid Memory Cube at the Hot Chips Conference at Stanford University in August.
The Micron-Intel Hybrid Memory Cube achieved a record-breaking sustained transfer rate of more than 1 Tbps.
Micron and Intel constructed the first-generation 27 mm x 27 mm HMC prototype by combining 1 Gb, 50 nm DRAM arrays with a 90 nm prototype logic die, for a total capacity of 512 MB in the DRAM cube. The resulting performance was roughly 3X better in energy efficiency (in pJ/bit) than next-generation DDR4. The bandwidth of 128 GBps at a 1.2 volt VDD equates to a record-breaking sustained transfer rate of more than 1 terabit per second (Tbps, or 1 trillion bits per second), according to Intel.
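A short back-of-the-envelope check of those prototype figures, assuming only the numbers quoted above (four 1 Gb DRAM die, 128 GBps, 8 bits per byte):

```python
# Back-of-the-envelope check of the prototype figures quoted above.

# Capacity: four 1 Gb DRAM die per cube.
dram_dies = 4
gigabits_per_die = 1
capacity_megabytes = dram_dies * gigabits_per_die * 1024 // 8
print(capacity_megabytes)          # 512 MB

# Bandwidth: 128 GBps expressed in bits per second.
gbytes_per_second = 128
terabits_per_second = gbytes_per_second * 8 / 1000
print(terabits_per_second)         # ~1.02, i.e. just over 1 Tbps
```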