HBM Hits 36GB Stacks at 9.8 Gbps
Samsung’s annual Memory Tech Day is taking place in San Jose this morning, and as part of the event, the company is making a couple of notable memory technology announcements/disclosures. The highlight of Samsung’s event is the introduction of Shinebolt, Samsung’s HBM3E memory that will set new marks for both memory bandwidth and memory capacity for high-end processors. The company is also disclosing a bit more on their GDDR7 memory, which will mark a significant technological update to the GDDR family of memory standards.
Starting things off, we have today’s marquee announcement: Shinebolt HBM3E memory. Like the rest of the memory industry, Samsung is preparing a successor to the current generation of HBM3 memory that is being used with high-end/HPC-grade processors, with the industry settling around the upcoming HBM3E standard. HBM3E is slated to offer both significantly higher capacities and higher memory bandwidth than HBM3, helping the high-bandwidth memory technology keep up with ever-growing workloads on high-end processors.
Samsung HBM Memory Generations

|  | HBM3E (Shinebolt) | HBM3 (Icebolt) | HBM2E (Flashbolt) | HBM2 (Aquabolt) |
|---|---|---|---|---|
| Max Capacity | 36 GB | 24 GB | 16 GB | 8 GB |
| Max Bandwidth Per Pin | 9.8 Gb/s | 6.4 Gb/s | 3.6 Gb/s | 2.0 Gb/s |
| Number of DRAM ICs per Stack | 12 | 12 | 8 | 8 |
| Effective Bus Width | 1024-bit | 1024-bit | 1024-bit | 1024-bit |
| Voltage | ? | 1.1 V | 1.2 V | 1.2 V |
| Bandwidth per Stack | 1.225 TB/s | 819.2 GB/s | 460.8 GB/s | 256 GB/s |
The basis of Shinebolt is a new 24Gbit HBM memory die, which Samsung will be producing on their D1a process, the company’s EUV-based 4th generation 10nm-class (14nm) node. Samsung will be producing both 8-Hi and eventually 12-Hi stacks based on this new die, allowing for total stack capacities of 24GB and 36GB respectively, 50% more capacity than their HBM3 (Icebolt) equivalents.
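For reference, those stack capacities follow directly from the die density; here is a quick back-of-the-envelope sketch, assuming only the 24Gbit die size and the 8-Hi/12-Hi stack heights quoted above:

```python
# Stack capacity math for Shinebolt, using only the figures quoted above.
die_gbit = 24                      # stated Shinebolt DRAM die density
die_gbyte = die_gbit / 8           # 3 GB per die

for stack_height in (8, 12):
    capacity = stack_height * die_gbyte
    print(f"{stack_height}-Hi stack: {capacity:.0f} GB")
# Output:
# 8-Hi stack: 24 GB
# 12-Hi stack: 36 GB
```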
According to Samsung, Shinebolt will be able to hit memory clockspeeds as high as 9.8Gbps/pin, better than 50% faster than their HBM3 products. Though given some of Samsung’s previous memory clockspeed claims, there’s a good chance this is a semi-overclocked state. Shinebolt development isn’t far enough along for Samsung to list individual SKUs, but even on the conservative end of things, Samsung is promoting data rates of at least 8Gbps/pin in their event press release. And if Samsung’s ambitious memory frequencies do come to fruition, then that would put Samsung ahead of their competition as well; so far, SK hynix and Micron have announced plans for 9Gbps/pin and 9.2Gbps/pin memory respectively, so Samsung’s claims are certainly the most aggressive.
Overall, these clockspeeds would give a single HBM3E stack a minimum bandwidth of 1TB/sec, and a maximum bandwidth of 1.225TB/sec, well ahead of the 819GB/sec data rate of HBM3. Or to frame things in the context of a high-end processor (e.g. NVIDIA H100), a 6-stack chip would be able to access as much as 216GB of memory with an aggregate memory bandwidth as high as 7.35TB/sec.
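Those figures are straightforward to reproduce from the per-pin data rates and HBM’s 1024-bit per-stack interface; a minimal sketch of the math follows (the 6-stack configuration is our H100-style example from above, not a specific Samsung product):

```python
# Per-stack and aggregate bandwidth math, using the data rates quoted above
# and HBM's 1024-bit per-stack interface.
BUS_WIDTH_BITS = 1024

def stack_bandwidth_gbs(pin_rate_gbps: float) -> float:
    """Per-stack bandwidth in GB/s for a given per-pin data rate."""
    return pin_rate_gbps * BUS_WIDTH_BITS / 8

print(stack_bandwidth_gbs(8.0))    # 1024.0 GB/s -> the ~1 TB/sec floor
print(stack_bandwidth_gbs(9.8))    # 1254.4 GB/s -> quoted as 1.225 TB/sec (/1024)

# Hypothetical 6-stack, H100-class processor using 12-Hi (36 GB) stacks:
stacks = 6
print(stacks * 36)                          # 216 GB of memory
print(stacks * stack_bandwidth_gbs(9.8))    # 7526.4 GB/s, i.e. ~7.35 TB/sec
```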
As for power efficiency, things look to be a bit of a mixed bag. On a relative basis, Samsung says that Shinebolt will be 10% more efficient than Icebolt – in other words, consuming 10% less power per bit transferred (pJ/bit). However, a 25%+ clockspeed improvement will more than wipe out those gains due to the significant increase in bits transferred. So while Shinebolt will be more efficient overall, on an absolute basis it seems that total power consumption for HBM memory will continue to grow with the next generation.
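To illustrate why the relative gain doesn’t translate into an absolute one, treat power as roughly energy-per-bit times bits moved per second; the multipliers below are our own illustration based on the figures above, not Samsung-published power numbers:

```python
# Rough relative power model: power ~ (pJ/bit) x (bits transferred per second).
# The 10% efficiency gain and the data rates are the figures quoted above;
# the resulting multipliers are illustrative only.
def relative_power(efficiency_gain: float, speedup: float) -> float:
    return (1 - efficiency_gain) * speedup

print(relative_power(0.10, 8.0 / 6.4))   # ~1.13 -> ~13% more power at 8 Gbps/pin
print(relative_power(0.10, 9.8 / 6.4))   # ~1.38 -> ~38% more power at 9.8 Gbps/pin
```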
Either way, for the high-end processor market that Samsung is targeting with Shinebolt, chipmakers are unlikely to be fazed by the power increase. Like the rest of the high-end processor space, Samsung has the AI market set squarely in its sights – a market segment where both memory bandwidth and memory capacity are limiting factors, especially with massive large language models (LLMs). Along with the traditional supercomputer and networking market segments, Samsung should have little trouble selling faster HBM in the middle of a booming AI market.
Like the other major memory vendors, Samsung expects to ship Shinebolt at some point in 2024. Given that the company just started sampling the memory – and that HBM3 Icebolt itself just hit mass production – Shinebolt likely isn’t shipping until the later part of the year.
A Brief Teaser on HBM4: FinFETs & Copper-to-Copper Bonding
Finally, looking even farther into the future, Samsung is briefly talking about their plans for HBM4 memory. While that technology is still several years off (there isn’t even an approved specification for it yet), we know from past disclosures that the memory industry is aiming to move to a wider, 2048-bit memory interface. Which, as Samsung likes to frame things, is the only practical choice when further HBM clockspeed increases would blow out power consumption.
For HBM4, Samsung is looking at employing more advanced fab and packaging technologies that are currently the domain of logic chips. On the fab side of things, the company wants to move to using FinFET transistors for their memory, as opposed to the planar transistors still used there. As with logic, FinFETs would reduce the drive current required, which would help to improve DRAM energy efficiency. Meanwhile on the packaging side of things, Samsung is looking at moving from micro-bump bonding to bumpless (direct copper-to-copper) bonding, a packaging technique that is still on the cutting edge of development even in the logic space. Embracing cutting-edge technologies will be essential to keeping HBM bandwidth growing as it has over the last decade, but the costs and complexities of doing so also underscore why HBM remains an exclusively niche high-end memory technology.
GDDR7 Update: 50% Lower Stand-By Power Than GDDR6
Besides HBM3E, Samsung’s other big bandwidth memory update of the day is a brief status update on their GDDR7 memory.
Back in July of this year, Samsung announced that they completed initial development on their GDDR7 memory. The next generation of GDDR memory, GDDR7 brings with it several major changes versus today’s GDDR6, the most significant of which is a switch to PAM3 encoding. PAM3 allows for 1.5 bits to be transferred per cycle (or rather 3 bits over two cycles), opening the door to improving memory transfer rates without resorting to the more costly means of further increasing the frequency of the memory bus.
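The “3 bits over two cycles” framing comes from the fact that two three-level symbols offer more states than three bits require; the mapping below is purely illustrative and is not the actual GDDR7 encoding table:

```python
# Two PAM3 symbols give 3^2 = 9 states, enough to carry 2^3 = 8 bit patterns,
# i.e. 1.5 bits per symbol. Illustrative mapping only, not the spec's table.
from itertools import product

LEVELS = (-1, 0, +1)                               # the three PAM3 signal levels
symbol_pairs = list(product(LEVELS, repeat=2))
print(len(symbol_pairs))                           # 9 states per two-symbol pair

bit_patterns = [format(i, "03b") for i in range(8)]
encoding = dict(zip(bit_patterns, symbol_pairs))   # one of the 9 states goes unused
print(encoding["101"])                             # (0, 1) under this arbitrary mapping
```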
GDDR Memory Generations

|  | GDDR7 | GDDR6X | GDDR6 |
|---|---|---|---|
| B/W Per Pin | 32 Gbps (Projected) | 24 Gbps (Shipping) | 24 Gbps (Sampling) |
| Chip Density | 2 GB (16 Gb) | 2 GB (16 Gb) | 2 GB (16 Gb) |
| Total B/W (256-bit bus) | 1024 GB/sec | 768 GB/sec | 768 GB/sec |
| DRAM Voltage | 1.2 V | 1.35 V | 1.35 V |
| Data Rate | QDR | QDR | QDR |
| Signaling | PAM-3 | PAM-4 | NRZ (Binary) |
| Packaging | 266 FBGA | 180 FBGA | 180 FBGA |
As a quick recap from Samsung’s July announcement, Samsung will be rolling out 16Gbit (2GB) modules, which will be able to run at up to 32Gbps/pin. That’s a 33% improvement in bandwidth per pin over current GDDR6 memory, and would bring the aggregate bandwidth of a 256-bit memory bus to a cool 1TB/second. GDDR7 should also deliver a 20% improvement in power efficiency over Samsung’s GDDR6 (in terms of pJ/bit), thanks in part to the use of Samsung’s 3rd generation D1z (10nm-class) fab node.
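The bandwidth claim checks out with simple per-pin math; a quick sketch using just the quoted data rates and a 256-bit card-level bus:

```python
# Card-level bandwidth math using the per-pin rates quoted above.
def bus_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int = 256) -> float:
    return pin_rate_gbps * bus_width_bits / 8

print(bus_bandwidth_gbs(32))        # 1024.0 GB/s -> the "cool 1TB/second"
print(bus_bandwidth_gbs(24))        # 768.0 GB/s for today's 24 Gbps GDDR6(X)
print(32 / 24 - 1)                  # ~0.33 -> the 33% per-pin improvement
```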
Today’s event from Samsung is largely a recap of July’s announcement, but in the process we have learned a couple of new technical details on GDDR7 that Samsung hasn’t previously disclosed. First off, GDDR7 isn’t just improving active power consumption, but the tech will also improve on stand-by power consumption to a significant degree. Thanks to additional clock controls, GDDR7 will consume 50% less stand-by power than GDDR6.
Second, in discussing why Samsung (and the industry as a whole) went with PAM3 encoding for GDDR7 instead of even denser PAM4, the company confirmed some of our technical suppositions about the new technology. In short, PAM3 has a lower average bit error rate (BER) than PAM4, largely thanks to the wider margins of the eye window. None of which makes PAM4 unworkable (as Micron has already proven), but Samsung and the rest of the memory industry are favoring the relative simplicity of PAM3, given the trade-offs.
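Our reading of the eye-window argument, in idealized terms: with N signal levels there are N-1 vertical eye openings to resolve, so PAM3’s two eyes each get roughly half the voltage swing versus a third for PAM4. The sketch below is a deliberate simplification that ignores noise, jitter, and equalization:

```python
# Idealized eye-height comparison for PAM3 vs PAM4: N levels -> N-1 eyes,
# each roughly 1/(N-1) of the full voltage swing. Real-world BER also depends
# on noise, jitter, and equalization, which this deliberately ignores.
for name, levels, bits_per_symbol in (("PAM3", 3, 1.5), ("PAM4", 4, 2.0)):
    eye_height = 1 / (levels - 1)
    print(f"{name}: {bits_per_symbol} bits/symbol, ~{eye_height:.2f} of swing per eye")
# PAM3 gives up 25% of PAM4's per-symbol density in exchange for ~50% taller
# eyes; that extra vertical margin is the BER advantage Samsung is pointing to.
```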
Besides the usual video card/gaming customers, Samsung is expecting GDDR7 to be adopted by AI chip makers, and perhaps a bit more surprisingly, the automotive industry. In fact, some of these non-traditional customers may be the first to adopt the memory; as the traditional GPU vendors are still mid-cycle on their current generation of products, it will still be quite some time before they ship any GDDR7-capable silicon.
At this point Samsung has not announced a projected date for when their GDDR7 memory will go into mass production. But the company is still expecting that they will be the first vendor to ship the next-generation memory, presumably in 2024.