John von Neumann's original computer architecture, in which logic and memory are separate domains, has had a good run. But some companies are betting that it's time for a change.
Lately, the shift toward more parallel processing and a massive increase in the size of neural networks mean processors need to access more data from memory more quickly. And yet "the performance gap between DRAM and processor is wider than ever," says Joungho Kim, an expert in 3D memory chips at Korea Advanced Institute of Science and Technology, in Daejeon, and an IEEE Fellow. The von Neumann architecture has become the von Neumann bottleneck.
What if, instead, at least some of the processing happened in the memory? Less data would need to move between chips, and you'd save energy, too. It's not a new idea. But its moment may finally have arrived. Last year, Samsung, the world's largest maker of dynamic random-access memory (DRAM), started rolling out processing-in-memory (PIM) technology. Its first PIM offering, unveiled in February 2021, integrated AI-focused compute cores inside its Aquabolt-XL high-bandwidth memory. HBM is the kind of specialized DRAM that surrounds some top AI accelerator chips. The new memory is designed to act as a "drop-in replacement" for ordinary HBM chips, said Nam Sung Kim, an IEEE Fellow, who was then senior vice president of Samsung's memory business unit.
Last August, Samsung revealed results from tests in a partner's system. When used with the Xilinx Virtex Ultrascale+ (Alveo) AI accelerator, the PIM technology delivered a nearly 2.5-fold performance gain and a 62 percent cut in energy consumption for a speech-recognition neural net. Samsung has been providing samples of the technology integrated into the current generation of high-bandwidth DRAM, HBM2. It is also developing PIM for the next generation, HBM3, and for the low-power DRAM used in mobile devices. It expects to complete the standard for the latter with JEDEC in the first half of 2022.
There are many ways to add computational smarts to memory chips. Samsung chose a design that is fast and simple. HBM consists of a stack of DRAM chips linked vertically by interconnects called through-silicon vias (TSVs). The stack of memory chips sits atop a logic chip that acts as the interface to the processor.
The highest data bandwidth in the stack lies within each chip, followed by the TSVs, and finally the connections to the processor. So Samsung chose to put the processing on the DRAM chips to take advantage of the high bandwidth there. The compute units are designed to do the most common neural-network calculation, called multiply and accumulate, and little else.
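For readers unfamiliar with the operation, the sketch below shows in plain C what a multiply-and-accumulate reduction computes. The function name, data types, and fixed-point width are illustrative assumptions, not details of Samsung's hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of a multiply-and-accumulate (MAC) reduction, the
 * operation the in-memory compute units perform. The function name, data
 * types, and fixed-point width are assumptions for illustration only and
 * are not details of Samsung's design. */
static int32_t mac_reduce(const int16_t *weights,
                          const int16_t *activations,
                          size_t n)
{
    int32_t acc = 0;                     /* running sum: the "accumulate" */
    for (size_t i = 0; i < n; ++i) {
        /* one multiply and one add per element: the "multiply" */
        acc += (int32_t)weights[i] * (int32_t)activations[i];
    }
    return acc;
}
```

The appeal of doing this inside the DRAM die is that the values being multiplied never have to cross the TSVs or the link to the processor, which is where the bandwidth and energy costs pile up.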
Other designs have put the AI logic on the interface chip or used more complex processing cores.
Samsung's two largest rivals, SK hynix and Micron Technology, aren't quite ready to take the leap on PIM for HBM, though they have each made moves toward other types of processing-in-memory.
Icheon, South Korea–based SK hynix, the No. 2 DRAM supplier, is exploring PIM from several angles, says Il Park, vice president and head of memory-solution product development. For now it is pursuing PIM in standard DRAM chips rather than HBM, which might be simpler for customers to adopt, says Park.
HBM PIM is more of a mid- to long-term possibility for SK hynix. At the moment, customers are already dealing with enough issues as they try to move HBM DRAM physically closer to processors. "Many experts in this domain do not want to add more, and quite significant, complexity on top of the already busy situation involving HBM," says Park.
That said, SK hynix researchers worked with Purdue University computer scientists on a comprehensive design of an HBM-PIM product called Newton in 2019. Like Samsung's Aquabolt-XL, it places multiply-and-accumulate units in the memory banks to take advantage of the high bandwidth within the dies themselves.
"Samsung has put a stake in the ground." —Bob O'Donnell, chief analyst at Technalysis Research
Meanwhile, Rambus, based in San Jose, Calif., was motivated to explore PIM by power-consumption issues, says Rambus fellow and distinguished inventor Steven Woo. The company designs the interfaces between processors and memory, and two-thirds of the power consumed by a system-on-chip and its HBM memory goes to transporting data horizontally between the two chips. Transporting data vertically within the HBM uses much less energy because the distances are much shorter. "You might be going 10 to 15 millimeters horizontally to get data back to an SoC," says Woo. "But vertically you're talking on the order of a couple hundred microns."
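To make the distance argument concrete, here is a rough back-of-envelope sketch. It assumes interconnect energy per bit scales roughly linearly with wire length, a first-order simplification of my own rather than a model Rambus provided; the only inputs are the distances Woo cites above.

```c
#include <stdio.h>

/* Back-of-envelope comparison of off-chip vs. in-stack data movement.
 * Assumption (ours, not Rambus's): energy per bit scales roughly linearly
 * with wire length. Inputs are the distances Woo cites above. */
int main(void)
{
    const double horizontal_mm = 12.5; /* midpoint of the 10-15 mm estimate */
    const double vertical_mm   = 0.2;  /* "a couple hundred microns" */

    double ratio = horizontal_mm / vertical_mm;
    printf("The horizontal path is roughly %.0fx longer than the vertical one,\n"
           "so under a linear-with-length assumption, moving a bit within the\n"
           "stack could cost on the order of 1/%.0f the energy.\n",
           ratio, ratio);
    return 0;
}
```

The numbers are only meant to show why the vertical path is attractive; real per-bit energy also depends on drivers, termination, and signaling, which the article does not quantify.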
Rambus's experimental PIM design adds an extra layer of silicon at the top of the HBM stack to do AI computation. To avoid the potential bandwidth bottleneck of the HBM's central through-silicon vias, the design adds TSVs to connect the memory banks with the AI layer. Having a dedicated AI layer in each memory chip could let memory makers customize memories for different applications, argues Woo.
How quickly PIM is adopted will depend on how desperate the makers of AI accelerators are for the memory-bandwidth relief it provides. "Samsung has put a stake in the ground," says Bob O'Donnell, chief analyst at Technalysis Research. "It remains to be seen whether [PIM] becomes a commercial success."
This article appears in the January 2022 print issue as "AI Computing Comes to Memory Chips."