Nvidia RTX 4090 vs M1 Pro with MLX (updated with M2/M3)

2023-12-13 08:52:36

How fast is my Whisper benchmark with the MLX framework from Apple? Nvidia 4090 / M1 Pro / M2 Ultra / M3

(… see below for the M2 Ultra / M3 Max update)

Apple released a machine learning framework for Apple Silicon. Along with it come some examples that show how things work. They also use Whisper for benchmarking. So I dug out my benchmark and used that to measure performance.

I simply added a new file to the repo (the Whisper large model was already downloaded). See the original source dir.

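import datetime
from pprint import pprint

from whisper import transcribe

if __name__ == '__main__':
    audio_file = "whisper/assets/audio.wav"
    # Take wall-clock timestamps around the transcription call
    start_time = datetime.datetime.now()
    x = transcribe(audio=audio_file, model="large")
    end_time = datetime.datetime.now()
    # Printing a timedelta gives the H:MM:SS.ffffff format quoted below
    print(end_time - start_time)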

It reports back a list of segments with the following structure:

{'avg_logprob': -0.18728541468714807,
 'compression_ratio': 1.3786764705882353,
 'end': 589.92,
 'id': 139,
 'no_speech_prob': 0.0017877654172480106,
 'seek': 56892,
 'start': 586.92,
 'temperature': 0.0,
 'text': ' Ich heiße Moses Fendel, danke fürs Zuhören und '
 'tokens': [51264,

The structure is the same as the one I get with Python Whisper on my RTX 4090.

The audio file is the same as in my other benchmarks with the M1 and the 4090.
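To make that output easier to read, here is a minimal sketch that prints one line per segment. It assumes Whisper's usual segment keys `start`, `end` (both in seconds) and `text`; the helper name `format_segments` is made up for this example, and the sample values are copied from the segment shown above.

```python
def format_segments(segments):
    """Return one '[start -> end] text' line per segment dict."""
    lines = []
    for seg in segments:
        # Assumed keys: 'start' and 'end' in seconds, 'text' as a string.
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines


# Sample values copied from the segment shown above.
sample = [
    {
        "id": 139,
        "start": 586.92,
        "end": 589.92,
        "text": " Ich heiße Moses Fendel, danke fürs Zuhören und ",
    }
]

for line in format_segments(sample):
    print(line)
```

The same function should work unchanged on the output of the Python Whisper run on the 4090, as long as both return segments with these keys.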

Result

The result for a 10-minute audio file is 0:03:36.296329 (216 seconds). Compare that to 0:03:06.707770 (186 seconds) on my Nvidia 4090. The 2000 € GPU is still 30 seconds faster, which means the M1 Pro needs ~16% longer. All graphics cores were fully utilized during the run, and I quit all programs and disabled the desktop picture and similar things for that run.
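The comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic using the two wall-clock times quoted above; the variable names are my own.

```python
from datetime import timedelta

# Wall-clock times as reported in the text.
m1_pro = timedelta(minutes=3, seconds=36, microseconds=296329)
rtx_4090 = timedelta(minutes=3, seconds=6, microseconds=707770)

# Absolute gap in seconds, and how much longer the M1 Pro takes
# relative to the 4090 run.
gap = (m1_pro - rtx_4090).total_seconds()
extra_pct = 100 * gap / rtx_4090.total_seconds()

print(f"gap: {gap:.1f} s, M1 Pro takes {extra_pct:.1f}% longer")
```

Note that the percentage is relative to the 4090 time. Measured against the M1 Pro time instead, the same gap is a bit under 14%.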

Update: I ran the same tests several times; the time is now measured without loading the model into memory in both cases.

My MacBook hardware specs:

  • 14″ MacBook with M1 Pro, 8 (6 performance and 2 efficiency) cores (2021 model)
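One possible way to keep model loading out of the measurement is an untimed warm-up call followed by several timed calls. The sketch below is not the exact script used here: `time_warm` and `fake_transcribe` are hypothetical names, and the warm-up only helps if the library keeps the loaded model cached between calls, which is an assumption.

```python
import time


def time_warm(fn, *args, repeats=3, **kwargs):
    """Call fn once untimed (warm-up), then return wall-clock seconds per timed call."""
    fn(*args, **kwargs)  # warm-up: model loading and caches are paid for here
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload so the sketch runs anywhere; swap in the real transcribe call.
def fake_transcribe(audio):
    time.sleep(0.01)
    return {"segments": []}


runs = time_warm(fake_transcribe, "audio.wav", repeats=3)
print(f"best of {len(runs)}: {min(runs):.3f} s")
```

Reporting the best or the median of several runs also smooths out some of the noise from other processes.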
  • 32 GB RAM
  • 16 GPU Cores

PC Spec:

  • Intel Core I7-12700KF 8x 3.60GHz
  • 2×32 GB RAM 3200 MHz DDR4, Kingston FURY Beast
  • SSD M.2 PCIe 2280 – 1000GB Kingston KC3000 PCIe 4.0 NVMe
    7000 MBps (read) / 6000 MBps (write)
  • GeForce RTX 4090, 24GB GDDR6X / Palit RTX 4090 GameRock OmniBlack

The new M3 Max chip has 30 GPU cores (configurable up to 40 GPU cores) and 14 CPU cores (up to 16 CPU cores). If these GPU cores (roughly double the number) are 30% faster than those in my laptop, this should beat the RTX 4090 easily. That machine currently costs 3200 $.

M2 Ultra / M3 Max Update

Ivan over at Twitter ran the same audio file on an M2 Ultra with 76 GPU cores and an M3 Max with 40 GPU cores. Both are much faster than my M1 Pro, but the two run at similar speed.
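As a back-of-the-envelope sketch of that guess: the code below assumes that runtime shrinks linearly with the number of GPU cores and with per-core speed. That is an idealisation and not a measurement; real scaling is usually worse. The function name and the 30% figure are just the assumptions from the paragraph above.

```python
M1_PRO_SECONDS = 216.3    # measured above, 16 GPU cores
RTX_4090_SECONDS = 186.7  # measured above


def projected_seconds(gpu_cores, per_core_speedup, base_cores=16, base_seconds=M1_PRO_SECONDS):
    """Idealised estimate: runtime scales inversely with core count and per-core speed."""
    return base_seconds / ((gpu_cores / base_cores) * per_core_speedup)


for cores in (30, 40):
    est = projected_seconds(cores, per_core_speedup=1.3)
    print(f"{cores} cores: ~{est:.0f} s (4090: {RTX_4090_SECONDS:.0f} s)")
```

Treat the output as an upper bound on what the hardware could gain, not as a prediction.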

Ivan tested it on M2+M3

Whisper performance

Keep in mind, this is not 100% accurate. The rough idea should be visible. Other running processes, loading times, and cold vs. warm starts can influence the numbers.

Power consumption

Difference in power draw between idle and GPU under load, for the PC and the M1 Pro:

  • PC +242 W (Nvidia 4090 running vs. idle)
  • MacBook +38 W (16 M1 GPU cores running vs. idle)

I measured that with a Shelly plug. This might not be 100% accurate but gives an idea of where it's going.
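Combining these power figures with the run times from the result section gives a rough energy cost per transcription. This sketch assumes the extra power draw stays constant for the whole run, which the plug measurement does not guarantee; the helper name is my own.

```python
def extra_energy_wh(extra_watts, seconds):
    """Energy above idle for one run, in watt-hours."""
    return extra_watts * seconds / 3600


pc_wh = extra_energy_wh(242, 186.7)   # RTX 4090 run
mac_wh = extra_energy_wh(38, 216.3)   # M1 Pro run

print(f"PC: {pc_wh:.1f} Wh, MacBook: {mac_wh:.1f} Wh, ratio: {pc_wh / mac_wh:.1f}x")
```

So even though the MacBook takes longer, under these assumptions it uses only a fraction of the extra energy per run.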

Dear Reddit comments:
This isn't supposed to be a scientific measurement. It gives you a rough idea of what the MLX framework is capable of :). A ~2-year-old MacBook running Whisper is almost as fast as the fastest consumer graphics card (~1 year old) on the market.

Way to go, Apple.

Why am I doing this?

I run a podcast search engine. I transcribe tens of thousands of episodes, make them full-text searchable and run some data mining on them.

Update Dec 11th: Added specs and more tests without loading the model.

Update Dec 12th: The 4090 is the fastest consumer graphics card. Also updated numbers for M2/M3.
