System-Level Detection, Modeling, and Mitigation of DRAM Failures to Enable Efficient Scaling of DRAM Memory
Future computing systems will be dominated by enormous amounts of data processing. These systems will have to compute over exponentially growing user-focused data from ubiquitous networks and the Internet of Things (e.g., sensors, self-driving cars, mobile devices, social media). At the same time, the progress of scientific innovation will depend greatly on fast and efficient computation over high-volume datasets generated by scientific experiments (e.g., analyzing gravitational waves or colliding particles in particle accelerators). Current computing systems are bottlenecked by memory, yet high-capacity, scalable memory is essential for fast and efficient data processing in the future. Unfortunately, DRAM, the predominant underlying technology for memory, is facing a major scaling challenge. As DRAM scales down to smaller technology nodes, cells become more vulnerable, resulting in DRAM failures. Enabling a higher-capacity memory system without sacrificing reliability is a major research challenge.
This research focuses on developing fundamental breakthroughs that can enable scalable memory for future systems. This proposal provides a research plan and ideas to solve the DRAM scaling challenge with a completely new approach: separating the responsibility for providing reliable DRAM operation from the design of memory cells with smaller feature sizes. The central vision of this proposal is to develop system-level detection and mitigation techniques for DRAM failures so that cells can be manufactured to be smaller without providing any reliability guarantee. The ideas developed in this research are expected to bridge the gap between circuits and systems and to enable a holistic approach to solving the DRAM scaling challenge. The cross-cutting nature of the work will influence circuit-level testing, computer architecture, and OS and systems design, and can potentially enable collaboration between different communities (testing and systems/architecture). The ideas developed in this research will not only impact innovation in computing, but will also help numerous scientific fields advance toward new innovations. The results of this research will be integrated into existing and new courses to impact student training and education; these courses will be designed with a focus on attracting minority groups to hardware and systems design to enhance diversity in the field.