Korean researchers power-shame Nvidia with new neural AI chip, claiming 625 times less power draw and a 41 times smaller footprint

A team of scientists from the Korea Advanced Institute of Science and Technology (KAIST) detailed their 'Complementary-Transformer' AI chip during the recent 2024 International Solid-State Circuits Conference (ISSCC). The new C-Transformer chip is claimed to be the world's first ultra-low-power AI accelerator chip capable of large language model (LLM) processing.
In a press release, the researchers power-shame Nvidia, claiming that the C-Transformer uses 625 times less power and is 41x smaller than the green team's A100 Tensor Core GPU. It also reveals that the Samsung-fabbed chip's achievements largely stem from refined neuromorphic computing technology.
Though we're told that the KAIST C-Transformer chip can do the same LLM processing tasks as one of Nvidia's beefy A100 GPUs, none of the press or conference materials we've seen provide any direct comparative performance metrics. That's a significant statistic, conspicuous by its absence, and the cynical would probably surmise that a performance comparison wouldn't do the C-Transformer any favors.
The above gallery includes a 'chip photograph' and a summary of the processor's specifications. You can see that the C-Transformer is currently fabbed on Samsung's 28nm process and has a die area of 20.25mm². It runs at a maximum frequency of 200 MHz while consuming under 500mW. At best, it can achieve 3.41 TOPS. At face value, that is 183x slower than the claimed 624 TOPS of the Nvidia A100 PCIe card (but the KAIST chip is claimed to use 625x less power). However, we would prefer some kind of benchmark performance comparison rather than looking at each platform's claimed TOPS.
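A quick back-of-envelope check puts the quoted figures in perspective. The A100 power number below is our assumption, not something stated in the KAIST materials: the A100 PCIe card has a 250 W TDP, which happens to be consistent with the 625x power claim (250 W / 625 = 400 mW, inside the chip's stated sub-500 mW envelope).

```python
# Back-of-envelope comparison using the figures quoted above.
# Assumption: A100 PCIe TDP of 250 W (not stated in KAIST's materials).
c_transformer_tops = 3.41    # claimed peak throughput
c_transformer_watts = 0.5    # upper bound: "under 500 mW"
a100_tops = 624              # Nvidia's claimed peak TOPS for the A100
a100_watts = 250             # assumed A100 PCIe TDP

throughput_gap = a100_tops / c_transformer_tops   # ~183x, as quoted
implied_chip_mw = a100_watts * 1000 / 625         # power implied by the 625x claim

print(f"A100 peak throughput advantage: ~{throughput_gap:.0f}x")
print(f"C-Transformer power implied by the 625x claim: {implied_chip_mw:.0f} mW")
print(f"Efficiency: C-Transformer {c_transformer_tops / c_transformer_watts:.1f} TOPS/W "
      f"vs A100 {a100_tops / a100_watts:.1f} TOPS/W")
```

Even taking every number at face value, this shows why raw TOPS and raw watts tell different stories: the KAIST chip trades roughly two orders of magnitude of peak throughput for roughly three orders of magnitude of power, which is the right trade for mobile silicon but not for a datacenter part.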
The architecture of the C-Transformer chip is interesting and is characterized by three main functional feature blocks. First, there is a Homogeneous DNN-Transformer / Spiking-transformer Core (HDSC) with a Hybrid Multiplication-Accumulation Unit (HMAU) to efficiently process the dynamically changing distribution energy. Second, there is an Output Spike Speculation Unit (OSSU) to reduce the latency and computations of spike domain processing. Third, the researchers implemented an Implicit Weight Generation Unit (IWGU) with Extended Sign Compression (ESC) to reduce External Memory Access (EMA) energy consumption.
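The general energy appeal of the spiking path mentioned above comes from event-driven computation: a spiking core only accumulates weights for inputs that actually fire, while a conventional dense multiply-accumulate touches every weight. The toy sketch below illustrates that general principle only; it is not a model of KAIST's actual HDSC design, and all names in it are illustrative.

```python
# Illustrative only: event-driven accumulation vs. a dense MAC.
# Not KAIST's HDSC design; just the general spiking-compute principle.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=256)       # one neuron's input weights
spikes = rng.random(256) < 0.1       # sparse binary spike vector, ~10% firing

# Dense MAC: multiply-accumulate over every input (256 multiply-adds).
dense_out = float(weights @ spikes.astype(float))

# Event-driven path: with binary spikes, multiplies vanish entirely;
# we only add the weights where a spike occurred, so work scales with
# the number of events rather than the layer width.
event_out = float(weights[spikes].sum())
events = int(spikes.sum())

print(f"{events} spike events -> {events} adds instead of 256 multiply-adds")
```

Both paths produce the same result; the event-driven one simply does far less arithmetic when activity is sparse, which is the kind of saving a neuromorphic block can bank as lower energy per inference.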
It's explained that the C-Transformer chip doesn't just add some off-the-shelf neuromorphic processing as its 'special sauce' to compress the large parameters of LLMs. Previously, neuromorphic computing technology wasn't accurate enough for use with LLMs, says the KAIST press release. However, the research team says it "succeeded in improving the accuracy of the technology to match that of [deep neural networks] DNNs."
Though there are uncertainties about the performance of this first C-Transformer chip due to the lack of direct comparisons with industry-standard AI accelerators, it's hard to dispute claims that it will be an attractive option for mobile computing. It is also encouraging that the researchers have gotten this far with a Samsung test chip and extensive GPT-2 testing.