Is 20M rows still a valid soft limit for a MySQL table in 2023? – Yisheng's blog
Rumor
There's a rumor around the internet that we should avoid having more than 20M rows in a single MySQL table. Otherwise, the table's performance degrades and you will find SQL queries much slower than usual once you are above this soft limit. These judgements were made on HDDs many years ago. I'm wondering whether it still holds for MySQL on SSDs in 2023, and if it does, why.
Settings
- Database
MySQL version: 8.0.25
Instance type: AWS db.r5.large (2 vCPUs, 16GiB RAM)
EBS storage type: General Purpose SSD (gp2)
- Test Client
Linux kernel version: 6.1
Instance type: AWS t2.micro (1 vCPU, 1GiB RAM)
Experiment Design
Create tables with the same schema but different sizes. I created tables with 100k, 200k, 500k, 1M, 2M, 5M, 10M, 20M, 30M, 50M, and 60M rows respectively.
1. Create several tables with the same schema:
CREATE TABLE row_test(
`id` int NOT NULL AUTO_INCREMENT,
`person_id` int NOT NULL,
`person_name` VARCHAR(200),
`insert_time` int,
`update_time` int,
PRIMARY KEY (`id`),
KEY `query_by_update_time` (`update_time`),
KEY `query_by_insert_time` (`insert_time`)
);
2. Insert different numbers of rows into the tables. I used both a test client and table copying to create them. The script can be found here.
# test client
INSERT INTO {table} (person_id, person_name, insert_time, update_time) VALUES ({person_id}, {person_name}, {insert_time}, {update_time})
# copy
create table <new-table> like <table>
insert into <new-table> (`person_id`, `person_name`, `insert_time`, `update_time`)
select `person_id`, `person_name`, `insert_time`, `update_time` from <table>
Values for person_id, person_name, insert_time and update_time are randomized.
3. Use the test client to execute the following SQL queries to test performance. The script can be found here.
select count(*) from <table> -- full table scan
select count(*) from <table> where id = 12345 -- query by primary key
select count(*) from <table> where insert_time = 12345 -- query by index
select * from <table> where insert_time = 12345 -- query by index, but causes 2 index tree lookups
4. Look up the InnoDB buffer pool status (a small helper query is sketched after this list):
SHOW ENGINE INNODB STATUS
SHOW STATUS LIKE 'innodb_buffer_pool_page%'
5. Make sure to reboot the database every time you finish testing a table! Flush the InnoDB buffer pool to avoid reading stale cache and getting incorrect results!
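As a sketch for step 4 (my addition, not part of the original scripts), buffer pool usage can also be broken down per table via INFORMATION_SCHEMA.INNODB_BUFFER_PAGE; note that scanning this view can itself be expensive on a large buffer pool:
-- Rough per-table view of what is currently cached in the buffer pool.
-- The LIKE filter matches the row_test tables from the schema above.
SELECT
  table_name,
  COUNT(*)                               AS cached_pages,
  ROUND(SUM(data_size) / 1024 / 1024, 2) AS cached_mb
FROM information_schema.innodb_buffer_page
WHERE table_name LIKE '%row_test%'
GROUP BY table_name
ORDER BY cached_pages DESC;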
Result
Query 1: select count(*) from <table>
This query causes a full table scan, which MySQL is poor at.
No-cache round: (First round) The query is executed for the first time, when no data is cached in the buffer pool.
Cache round: (Other rounds) The query is executed when there is already data cached in the buffer pool, usually right after the first execution.
Some observations:
1. The query runtime of the first execution is longer than that of the following ones
The reason is that MySQL uses the innodb_buffer_pool to cache data pages. Before the first execution, the buffer pool is empty and MySQL has to do heavy disk I/O to load the table from the .ibd file. After the first execution, the data is kept in the buffer pool, so the following executions can compute the result in memory and avoid disk I/O, which is much faster. This process is called MySQL buffer pool warm-up.
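As a rough way to quantify how warmed-up the buffer pool is, a hit rate can be derived from two server counters. This is a hedged sketch of my own, not part of the original test scripts:
-- Buffer pool hit rate: 'reads' are logical reads that had to go to disk,
-- 'read_requests' are all logical reads.
SELECT 1 - (
    (SELECT variable_value FROM performance_schema.global_status
      WHERE variable_name = 'Innodb_buffer_pool_reads')
    /
    (SELECT variable_value FROM performance_schema.global_status
      WHERE variable_name = 'Innodb_buffer_pool_read_requests')
  ) AS buffer_pool_hit_rate;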
2. select count(*) from <table> will try to load the whole table into the buffer pool
I compared the innodb_buffer_pool statistics before and after the experiment. After running the query, the change in buffer pool usage is equal to the table size if the buffer pool is big enough; otherwise only part of the table is cached in the buffer pool. The reason is that select count(*) from <table> does a full table scan, counting rows one after another, which requires loading the full table into memory if it isn't cached yet. Why? Because InnoDB supports transactions and can't guarantee that different transactions see the same table at different points in time. A full table scan is the only safe way to get an accurate row count.
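A minimal sketch of that before/after comparison, assuming the default 16KB page size (the table name row_test_20m is illustrative):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_data';  -- record the value before
SELECT COUNT(*) FROM row_test_20m;                         -- full table scan
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_data';  -- (after - before) * 16KB ≈ bytes pulled in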
3. If the buffer pool cannot hold the full table, there will be a burst in query latency
I found that the innodb_buffer_pool size can dramatically impact query performance, so I ran the query under different configurations. With an 11G buffer pool, the burst in query latency happens when the table size reaches 50M. Cutting the buffer pool to 7G, the burst happens at a table size of 30M. Finally, cutting the buffer pool to 3G, the burst happens at a table size of 20M. It's clear that if some of the table data can't be cached in the buffer pool, select count(*) from <table> has to do expensive disk I/O to load it, which leads to a burst in query runtime.
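For reference, this is one way to check and change the buffer pool size between runs; it is a sketch rather than the configuration method actually used in the experiment (on RDS the value is normally changed through the parameter group rather than SET GLOBAL):
-- Check the current size, then resize online (supported since MySQL 5.7;
-- the value is rounded to a multiple of innodb_buffer_pool_chunk_size).
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;
SET GLOBAL innodb_buffer_pool_size = 7 * 1024 * 1024 * 1024;  -- 7G, as in one run above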
4. The no-cache round has a linear relationship between query runtime and table size, regardless of buffer pool size.
The no-cache round runtime is determined by disk I/O, which is independent of the buffer pool size. There is no difference in using select count(*) to warm up the buffer pool on storage disks with the same IOPS.
5. The difference in query runtime between the no-cache round and the cache round is constant if the table can't be fully cached.
Also notice that although there is a burst in query runtime when the table can't be fully cached in the buffer pool, the runtime is predictable. The difference between the no-cache round runtime and the cache round runtime is constant, regardless of table size. The reason is that the table is partly cached in the buffer pool, and the difference is the time saved by reading from the buffer pool instead of disk.
Query 2, 3: select count(*) from <table> where <index_column> = 12345
This query uses the index. Since it isn't a range query, it only needs to look up the pages along the path of the B+ tree from top to bottom, and cache those pages in the InnoDB buffer pool.
The B+ trees of the tables I created are all 3 levels deep, which causes 3-4 disk I/Os to warm up the buffer pool and costs 4-6ms on average. After that, if I run the same query again, it looks up the result directly from memory, which takes 0.5ms, roughly equal to the network RTT. If the cached pages are not hit for a long time and get evicted from the buffer pool, they have to be loaded from disk again, which requires at most 4 disk I/Os.
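A quick sanity check, which is my addition rather than part of the original scripts, is to run EXPLAIN and confirm that queries 2 and 3 are single-index lookups rather than scans:
EXPLAIN SELECT COUNT(*) FROM row_test WHERE id = 12345;           -- expect key = PRIMARY, type = const
EXPLAIN SELECT COUNT(*) FROM row_test WHERE insert_time = 12345;  -- expect key = query_by_insert_time, type = ref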
Query 4: select * from <table> where <index_column> = 12345
This query causes 2 index lookups. Since select * requires the query to fetch person_name and person_id, which are not contained in the index, the database engine has to look up 2 B+ trees during query execution. First it looks up the insert_time B+ tree to get the target row's primary key, then it looks up the primary key B+ tree to fetch the full data of the row, as explained by the graph below:
This is the reason why we should avoid select * in production. In the experiments, the data confirms that this query loads twice as many page blocks as query 2 or 3, up to 8 at most. The average query runtime is 6-10ms, which is also 1.5 to 2 times that of query 2 or 3.
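A small sketch of the difference, using the row_test schema above (my addition): selecting only columns contained in the secondary index (plus the primary key) turns it into a covering-index query and skips the second lookup:
EXPLAIN SELECT id, insert_time FROM row_test WHERE insert_time = 12345;  -- Extra: Using index (covering)
EXPLAIN SELECT *               FROM row_test WHERE insert_time = 12345;  -- also visits the clustered (primary key) index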
Where does the rumor come from?
First we need to know the physical structure of InnoDB index pages. The default page size is 16KB, which consists of headers, system records, user records, the page directory, and a trailer. Only about 15KB is left to store the actual data.
Let's say you use INT as the primary key (4 bytes) and have 1KB of payload in each row. Each leaf page can store 15 rows, and it takes 4+8=12 bytes to make a pointer to a page. Thus, each non-leaf page can hold at most 15KB / 12 bytes = 1280 pointers. If you have a 4-layer B+ tree, it can hold at most 1280*1280*15 = 24.6M rows of data.
Back in the days when HDDs dominated the market and SSDs were too expensive for databases, 4 random I/Os might be the worst case we could tolerate, and queries with 2 index tree lookups made it even worse. Engineers at that time wanted to control the depth of the index tree and keep it from growing too deep. Now that SSDs are gaining popularity and random I/O is cheaper than before, we can revisit the rules set 10 years ago.
By the way, a 5-layer B+ tree can hold 1280*1280*1280*15 = 31.4B rows of data, which exceeds the maximum number an INT can hold. Different assumptions about the size of each row will result in different soft limits, smaller or larger than 20M. E.g., in my experiment, each row is about 816 bytes (I use the utf8mb4 charset, so each character takes 4 bytes), and the soft limit that a 4-layer B+ tree can hold is 29.5M rows.
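The capacity math can be double-checked with a bit of arithmetic; the following query is just a back-of-the-envelope sketch that mirrors the assumptions above (about 15KB of usable space per page, 12-byte pointers, 15 rows of roughly 1KB per leaf page):
-- Pure arithmetic, no table access.
SELECT
  FLOOR(15 * 1024 / 12)                               AS pointers_per_non_leaf_page,  -- 1280
  FLOOR(15 * 1024 / 12) * FLOOR(15 * 1024 / 12) * 15  AS rows_in_4_layer_tree,        -- ~24.6M
  POW(FLOOR(15 * 1024 / 12), 3) * 15                  AS rows_in_5_layer_tree;        -- ~31.4B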
Conclusion
- The innodb_buffer_pool size vs. the table size decides whether there will be a performance downgrade.
- A more meaningful metric for deciding whether you need to split a MySQL table is the query runtime / buffer pool hit rate. If the queries always hit the buffer pool, there won't be any performance issue. 20M rows is just a value based on experience.
- Besides splitting the table, increasing the innodb_buffer_pool size / database memory is also an option.
- If possible, avoid select * in production, since it causes 2 index tree lookups in the worst case.
- (My personal opinion) Considering that SSDs are gaining popularity now, 20M rows is not a very valid soft limit for a MySQL table.
Reference
- The physical structure of InnoDB index pages, Jeremy Cole
- B+ Tree and Buffer Pool
- An in-depth look at Database Indexing
Appendix
The original data can be found here