15k inserts/s with Rust and SQLite

2023-04-01 07:51:38

There’s this rising sentiment in tech that stacking more and more layers of complexity to reach the sacrosanct “infinite scalability” isn’t the way to go.

Debt.

First, because it rarely materializes: you need a lot of other things than fancy tech to reach millions of people.

Second, because the tradeoffs of complex systems are often misunderstood, and most of the time, they create more problems than benefits. And these problems compound over time.

So here’s a little experiment to show you how to reach 15,000 inserts per second with simple technology, which is roughly 1.3 billion inserts per day. 1.3 Billion.

Is it possible to improve this micro-benchmark? Of course: by bundling all the inserts in a single transaction, for example, or by using another, non-async database driver. But it doesn’t make sense, as that’s not what a real-world codebase accessing a database looks like. We favor simplicity over theoretical numbers.

Without further ado, here are the results:

$ cargo run --release -- -c 3 -i 100000
Inserting 100000 records. concurrency: 3
Time elapsed to insert 100000 records: 6.523381395s (15329.47 inserts/s)

The code

Cargo.toml

[package]
name = "high_performance_rust_with_sqlite"
version = "0.1.0"
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
tokio = { version = "1", features = ["full"] }
sqlx = { version = "0.5", features = [ "runtime-tokio-rustls", "sqlite", "uuid", "chrono", "migrate" ] }
futures = "0.3"
chrono = "0.4"
uuid = { version = "0.8", features = ["v4"] }
clap = "2"

main.rs

use clap::{App, Arg};
use futures::*;
use sqlx::{
    sqlite::{SqliteConnectOptions, SqliteJournalMode, SqlitePoolOptions, SqliteSynchronous},
    Pool, Sqlite,
};
use std::time::{Duration, Instant};
use std::{fs, str::FromStr};

struct User {
    id: uuid::Uuid,
    created_at: chrono::DateTime<chrono::Utc>,
    username: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli_matches = App::new("Rust to the mooooon")
        .version("1.0")
        .arg(
            Arg::with_name("concurrency")
                .short("c")
                .long("concurrency")
                .help("Number of concurrent inserts")
                .default_value("3"),
        )
        .arg(
            Arg::with_name("inserts")
                .short("i")
                .long("inserts")
                .help("Number of inserts to perform")
                .default_value("40000"),
        )
        .get_matches();

    let concurrency = cli_matches
        .value_of("concurrency")
        .unwrap()
        .parse::<usize>()
        .unwrap_or(1);
    let inserts = cli_matches
        .value_of("inserts")
        .unwrap()
        .parse::<usize>()
        .unwrap_or(1);

    let database_file = "db.sqlite";
    let database_url = format!("sqlite://{}", database_file);
    let pool_timeout = Duration::from_secs(30);
    // with pool_max_connections = 1, the pool timeouts. Maybe related to https://github.com/launchbadge/sqlx/issues/1210
    let pool_max_connections = if concurrency == 1 {
        2
    } else {
        concurrency as u32
    };

    let _ = fs::remove_file(database_file);

    let connection_options = SqliteConnectOptions::from_str(&database_url)?
        .create_if_missing(true)
        .journal_mode(SqliteJournalMode::Wal)
        .synchronous(SqliteSynchronous::Normal)
        .busy_timeout(pool_timeout);

    let sqlite_pool = SqlitePoolOptions::new()
        .max_connections(pool_max_connections)
        .connect_timeout(pool_timeout)
        .connect_with(connection_options)
        .await?;

    sqlx::migrate!("./db").run(&sqlite_pool).await?;

    sqlx::query("pragma temp_store = memory;")
        .execute(&sqlite_pool)
        .await?;
    sqlx::query("pragma mmap_size = 30000000000;")
        .execute(&sqlite_pool)
        .await?;
    sqlx::query("pragma page_size = 4096;")
        .execute(&sqlite_pool)
        .await?;

    println!(
        "Inserting {} records. concurrency: {}",
        inserts, concurrency
    );

    let start = Instant::now();
    insert(inserts, concurrency, &sqlite_pool).await;
    let duration = start.elapsed();

    let inserts_per_sec = inserts as f64 / duration.as_secs_f64();
    println!(
        "Time elapsed to insert {} records: {:?} ({:.2} inserts/s)",
        inserts, duration, inserts_per_sec
    );

    Ok(())
}

async fn insert(inserts: usize, concurrency: usize, sqlite_pool: &Pool<Sqlite>) {
    let stream = stream::iter(0..inserts);

    stream
        .for_each_concurrent(concurrency, |_| async move {
            let user = User {
                id: uuid::Uuid::new_v4(),
                created_at: chrono::Utc::now(),
                username: String::from("Hello"),
            };

            sqlx::query(
                "INSERT INTO users (id, created_at, username)
            VALUES (?, ?, ?)",
            )
            .bind(user.id)
            .bind(user.created_at)
            .bind(&user.username)
            .execute(sqlite_pool)
            .await
            .expect("inserting in db");
        })
        .await;
}

db/000_init.sql

CREATE TABLE IF NOT EXISTS users (
    id BLOB PRIMARY KEY NOT NULL,
    created_at TEXT NOT NULL,
    username TEXT NOT NULL
);

CREATE UNIQUE INDEX idx_users_on_id ON users(id);

Concurrency

Increasing concurrency should increase performance, right?

$ cargo run --release -- -c 100 -i 100000
Inserting 100000 records. concurrency: 100
Time elapsed to insert 100000 records: 10.255768373s (9750.61 inserts/s)

What happens? SQLite allows only one concurrent writer to a database. Thus, if we increase concurrency too much, we encounter lock contention, and performance degrades.

One way to limit lock contention is to use concurrency primitives in your own code and a good scheduler. In this example, we use a Stream with tokio, which seems way better at handling concurrency than SQLite’s locking mechanism.

The machine

For the record, the server is a Scaleway ENT1-S, so not the slowest VPS on the market, but not that expensive either.


$ sudo lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           1
Model name:                      AMD EPYC 7543 32-Core Processor
Stepping:                        1
CPU MHz:                         2794.750
BogoMIPS:                        5589.50
Virtualization:                  AMD-V
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        4 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmx
                                 ext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse
                                 4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_l
                                 egacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 s
                                 mep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 wbnoinvd arat npt nrip_save umip vaes vpclm
                                 ulqdq arch_capabilities
$ sudo lshw -class disk -class storage
  *-scsi
       description: SCSI storage controller
       product: Virtio SCSI
       vendor: Red Hat, Inc.
       physical id: 1
       bus info: pci@0000:00:01.0
       version: 01
       width: 64 bits
       clock: 33MHz
       capabilities: scsi msix bus_master cap_list
       configuration: driver=virtio-pci latency=0
       resources: iomemory:180-17f irq:21 memory:9100a000-9100afff memory:1800000000-1800003fff
  *-sata
       description: SATA controller
       product: 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
       vendor: Intel Company
       physical id: 1f.2
       bus info: pci@0000:00:1f.2
       version: 02
       width: 32 bits
       clock: 33MHz
       capabilities: sata msi ahci_1.0 bus_master cap_list
       configuration: driver=ahci latency=0
       resources: irq:36 ioport:1040(size=32) memory:91000000-91000fff
  *-scsi
       physical id: 5
       logical name: scsi0
     *-disk
          description: SCSI Disk
          product: b_ssd
          vendor: SCW
          physical id: 0.0.0
          bus info: scsi@0:0.0.0
          logical name: /dev/sda
          version: v42
          size: 13GiB (15GB)
          capabilities: 5400rpm gpt-1.00 partitioned partitioned:gpt
          configuration: ansiversion=5 guid=9ceb264d-ecc9-413c-a6dc-180fa42c5342 logicalsectorsize=512 sectorsize=4096

Conclusion

Less is more.

This post was nothing more than a reminder that you don’t need a fancy serverless cluster to crunch some serious numbers.

We all agree that testing new tech is fun and exciting, myself included (after all, you’re on a blog talking about Rust). But new tech is often marketed by hiding its drawbacks, and they will come back to bite you at the worst time when scaling your business.

If SQLite isn’t the best choice for you due to its anemic and dynamic typing, or its lack of high availability, take a look at PostgreSQL 🙂

The code is on GitHub

As usual, you can find the code on GitHub: github.com/skerkour/kerkour.com (please don’t forget to star the repo)
