Bullshit Graph Database Performance Benchmarks
How is the Graph Database category supposed to grow when vendors keep spouting off complete bullshit? I wrote a bit about the ridiculous benchmark Memgraph published last month hoping they'd do the right thing and make an attempt at a real evaluation. Instead, these clowns put it on a banner on top of their home page. So let's tear into it.
At first I considered replicating it using their own repository, but it's about 2000 lines of Python and I don't know Python. Worse still, the work is under a "Business Source License" which states:
"Approved Purpose" means any of the following, provided always that (a) you do not embed or otherwise distribute the Licensed Work to third parties; and (b) you do not provide third parties direct access to operate or control the Licensed Work as a standalone solution or service:
1…
2….
3. using the Licensed Work to create a work or solution which competes (or might reasonably be expected to compete) with the Licensed Work.
https://github.com/memgraph/memgraph/blob/master/licenses/BSL.txt
IANAL, but that sounds like you aren't allowed to use their benchmark code if you provide a competing solution. Which means the other database vendors can't actually use this anyway. Why would they do that? Because it's a bullshit benchmark and they don't actually want anybody looking too deeply at it.
So I decided to just do a simple project using Gatling, an industry-standard tool for performance testing. Why doesn't everybody just do this instead of creating their own weird thing that's probably full of problems? Oh right, because everybody produces bullshit benchmarks.
They decided to provide the data not in a CSV file like a normal human being would, but instead in a giant Cypher file performing individual transactions for each node and each relationship created. Not batches of transactions… but painful, individual, one-at-a-time transactions, 1.8 million times. So instead of the import taking 2 minutes, it takes hours. Why would they do that? I keep forgetting… because they don't actually want anybody trying this.
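For contrast, here is a minimal sketch of what a batched import could look like with the official neo4j Python driver, assuming the node data has first been pulled out into a plain CSV of id and age columns. The file name, connection details and batch size are my own placeholders, not anything from their repository:

import csv

from neo4j import GraphDatabase

BATCH_SIZE = 10_000
QUERY = "UNWIND $rows AS row CREATE (:User {id: row.id, age: row.age})"

def write_batch(session, rows):
    # One transaction per 10,000 nodes instead of one transaction per node.
    session.execute_write(lambda tx: tx.run(QUERY, rows=rows).consume())

def load_users(path="nodes.csv", uri="bolt://localhost:7687", auth=("neo4j", "password")):
    with GraphDatabase.driver(uri, auth=auth) as driver, driver.session() as session:
        batch = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                batch.append({"id": int(row["id"]), "age": int(row["age"])})
                if len(batch) >= BATCH_SIZE:
                    write_batch(session, batch)
                    batch = []
        if batch:  # flush whatever is left over
            write_batch(session, batch)

The same trick works against Memgraph over Bolt, and with batches like this a couple of million nodes should import in minutes rather than hours.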
Okay, what's their hardware for the benchmark results?
A G6? I'm feeling so fly like a G6. Those came out in 2009. I could go on eBay and buy a refurbished one for 50 bucks, but I don't have a rack to put it in and no guarantee it won't catch on fire the second I turn it on. I'll just use my gaming computer with an Intel® Core™ i7-10700K CPU @ 3.80GHz × 8 cores. Debian 4.19… uh… they can't mean that, they probably mean Debian 10 with Linux kernel 4.19. I'll stick with Ubuntu 22.04 with Linux kernel 6.0.5.
We'll use the medium dataset (which already takes hours to import the way they set it up) from a "hot engine". Has anybody ever tried testing the performance of a sports car with a "cold engine"? No, because that's stupid, so we won't do that here. Alright, the first four queries are:
Q1: MATCH (n:User) RETURN n.age, COUNT(*)
Q2: MATCH (n) RETURN count(n), count(n.age)
Q3: MATCH (n:User) WHERE n.age >= 18 RETURN n.age, COUNT(*)
Q4: MATCH (n) RETURN min(n.age), max(n.age), avg(n.age)
These are not "graphy" queries at all, so why are they in a graph database benchmark? Okay, whatever. Let's look at the raw data for the first query [neo4j and memgraph]:
On the left we have Neo4j, on the right we have Memgraph. Neo4j executed the query 195 times, taking 1.75 seconds, and Memgraph 183 times, taking 1.04 seconds. Why would you execute the query a different number of times or for different durations? That makes no sense. In a proper benchmark you would run each query for at least 60 seconds, ideally more, and then compare. They do some division and come up with 112 requests per second for Neo4j and 175 for Memgraph:
However, the CPUs of both systems weren't at 100%, which means they weren't fully utilized. The query statistics part is also weird. "iterations": 100… wait a minute, no they didn't… they didn't take 100 extra queries with a single worker, separate from the queries they used for throughput, and generate "p" latencies in the most idiotic way possible. Yes they did. But of course they did:
for i in range(0, iteration):
    ret = client.execute(queries=[query_list[i]], num_workers=1)
    latency.append(ret[0]["duration"])
latency.sort()
query_stats = {
    "iterations": iteration,
    "min": latency[0],
    "max": latency[iteration - 1],
    "mean": statistics.mean(latency),
    "p99": latency[math.floor(iteration * 0.99) - 1],
    "p95": latency[math.floor(iteration * 0.95) - 1],
    "p90": latency[math.floor(iteration * 0.90) - 1],
    "p75": latency[math.floor(iteration * 0.75) - 1],
    "p50": latency[math.floor(iteration * 0.50) - 1],
}
You know what's also weird? They reported the p99 of only 100 queries, which could be high for any number of reasons, but not the mean latency. Why not? Because in mean latency Neo4j comes in at 39.6ms vs 46.6ms for Memgraph. Neo4j is actually faster if we look at the other metrics. The p95, p90, p50 and min are all faster for Neo4j. Talk about egg on your face.
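If you are going to report percentiles at all, the sane thing to do is to keep every latency observed during the actual throughput run and compute the percentiles from that, not from a separate 100-query, single-worker pass. A rough sketch of that, assuming durations holds every recorded latency in milliseconds:

import statistics

def latency_stats(durations):
    # durations: every latency (ms) recorded during the timed throughput run,
    # not a separate 100-query single-worker pass.
    ordered = sorted(durations)
    cuts = statistics.quantiles(ordered, n=100)  # 99 percentile cut points
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "p50": cuts[49],
        "p75": cuts[74],
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }

With thousands of samples instead of 100, a single slow outlier stops dictating the p99.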
At this point I haven't even run a single one of my own tests and I can already dismiss this benchmark as Gigli bad. Okay, let's try running query 1 for 60 seconds using 8 workers, since I only have 8 cores vs their 12, and see what Gatling tells us:
Instead of 112 queries per second, I get 531 q/s. Instead of a p99 latency of 94.49ms, I get 28ms, with a min, mean, p50, p75 and p95 of 14ms to 18ms. Alright, what about query 2? Same story.
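My harness is Gatling, but the shape of the test is simple enough to sketch in Python: 8 workers running one query flat out against a 60-second deadline, then dividing completed requests by the duration. The URI, credentials and query below are placeholders, and this is an approximation of the idea rather than my actual Gatling simulation:

import time
from concurrent.futures import ThreadPoolExecutor

from neo4j import GraphDatabase

QUERY = "MATCH (n:User) RETURN n.age, COUNT(*)"
DURATION_S = 60
WORKERS = 8

def worker(driver):
    completed = 0
    deadline = time.monotonic() + DURATION_S
    with driver.session() as session:      # one session per worker thread
        while time.monotonic() < deadline:
            session.run(QUERY).consume()   # drain the result so the query finishes
            completed += 1
    return completed

def main():
    with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")) as driver:
        with ThreadPoolExecutor(max_workers=WORKERS) as pool:
            totals = list(pool.map(lambda _: worker(driver), range(WORKERS)))
    total = sum(totals)
    print(f"{total} queries in {DURATION_S}s = {total / DURATION_S:.1f} q/s")

if __name__ == "__main__":
    main()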
Let's see the rest:
                               Requests             Response Time (ms)
                               Total     Cnt/s      Min   50th   75th   95th    99th    Max     Mean
Q1  - aggregate                31866     531.1      14    14     15     18      28      82      15
Q2  - aggregate_count          41888     698.133    10    11     11     14      22      48      11
Q3  - aggregate_with_f…        30820     505.246    14    15     15     18      28      95      16
Q4  - min_max_avg              31763     529.383    14    14     15     19      28      81      15
Q5  - expansion_1              615698    10093.41   0     1      1      1       2       90      1
Q6  - expansion_1_with…        615168    10084.721  0     1      1      1       2       64      1
Q7  - expansion_2              57683     945.623    0     2      7      41      81      583     8
Q8  - expansion_2_with…        109390    1793.279   0     1      4      17      41      252     4
Q9  - expansion_3              3027      49.623     0     95     233    552     733     1028    159
Q10 - expansion_3_wit…         4832      79.213     0     59     148    328     479     803     99
Q11 - expansion_4              226       3.054      1     888    2157   9194    22886   25890   2261
Q12 - expansion_4_wit…         247       3.087      1     679    2026   8584    21703   24169   2138
Q13 - neighbours_2             56106     919.77     0     2      8      42      84      367     8
Q14 - neighbours_2_wit         105232    1725.115   0     1      5      18      43      328     4
Q15 - neighbours_2_wi…         32580     534.098    0     5      15     68      121     385     15
Q16 - neighbours_2_wi…         60791     996.574    0     4      10     27      61      414     8
Q17 - pattern_cycle            523845    8587.623   0     1      1      2       3       82      1
Q18 - pattern_long             602254    9873.016   0     1      1      1       2       31      1
Q19 - pattern_short            616306    10103.377  0     1      1      1       2       20      1
Q20 - single_edge_wri…         242302    3972.164   1     2      2      3       6       32      2
Q21 - single_vertex_w…         284782    4668.557   1     2      2      2       5       75      2
Q22 - single_vertex_p…         9646      158.131    39    49     51     58      71      139     49
Q23 - single_vertex_r…         614535    10074.344  0     1      1      1       2       109     1
These numbers are wildly different from the numbers Memgraph calculated for Neo4j in their benchmark. Let's see the breakdown:
                               Memgraph   Neo4j (reported)   Neo4j (mine)   Winner     By
Q1  - aggregate                175        112                531            Neo4j      3.03*
Q2  - aggregate_count          272        129                698            Neo4j      2.56*
Q3  - aggregate_with_filter    137        88                 505            Neo4j      3.68*
Q4  - min_max_avg              125        97                 529            Neo4j      4.23*
Q5  - expansion_1              29648      517                10093          Memgraph   2.93
Q6  - exp_1_with_filter        31553      467                10085          Memgraph   3.12
Q7  - expansion_2              2164       30                 946            Memgraph   2.29
Q8  - exp_2_with_filter        3603       61                 1793           Memgraph   2.01
Q9  - expansion_3              134        7                  50             Memgraph   2.68
Q10 - exp_3_with_filter        159        12                 79             Memgraph   2.01
Q11 - expansion_4              4          1                  3              Memgraph   1.33
Q12 - exp_4_with_filter        5          2                  3              Memgraph   1.66
Q13 - neighbours_2             2171       59                 920            Memgraph   2.36
Q14 - n2_with_filter           727        48                 1725           Neo4j      2.37*
Q15 - n2_with_data             1286       43                 534            Memgraph   2.40
Q16 - n2_w_data_and_filter     3453       83                 997            Memgraph   3.46
Q17 - pattern_cycle            21718      371                8588           Memgraph   2.53
Q18 - pattern_long             33130      1127               9873           Memgraph   3.36
Q19 - pattern_short            36187      1508               10103          Memgraph   3.58
Q20 - single_edge_write        32211      337                3972           Memgraph   8.11
Q21 - single_vertex_write      35172      557                4669           Memgraph   7.53
Q22 - s_v_property_update      239        106                158            Memgraph   1.51
Q23 - single_vertex_read       36843      1841               10074          Memgraph   3.66
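In case the "By" column isn't obvious: it appears to be the winner's throughput divided by the loser's, using Memgraph's self-reported number for Memgraph and my measured number for Neo4j, with an asterisk marking the rows Neo4j wins. For Q1 that's 531 / 175 ≈ 3.03; for Q20 it's 32211 / 3972 ≈ 8.11.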
It looks like Neo4j is faster than Memgraph in the aggregate queries by about 3 times. Memgraph is faster than Neo4j for the queries they chose by about 2-3x, except for query 14, where Neo4j plays the Uno Reverse card. I checked it a number of times, it's correct.
So there you have it, folks. Memgraph is not 120 times faster than Neo4j… let's fix their home page:
The source code and instructions for recreating this benchmark are on GitHub.