vDPA: support for block devices in Linux and QEMU

A vDPA device is a type of device that follows the virtio specification
for its datapath but has a vendor-specific control path.
vDPA devices can be either physically located on the hardware or emulated by
software.
A small vDPA parent driver in the host kernel is required only for the control
path. The main advantage is the unified software stack for all vDPA
devices:
- vhost interface (vhost-vdpa) for userspace or guest virtio driver, like a
  VM running in QEMU
- virtio interface (virtio-vdpa) for bare-metal or containerized applications
  running in the host
- management interface (vdpa netlink) for instantiating devices and
  configuring virtio parameters
Useful Resources
Many blog posts and talks have been published in recent years that can help you
better understand vDPA and its use cases. We have collected some of them on
vdpa-dev.gitlab.io; I suggest you explore at least those.
Block devices
Most of the work in vDPA has been driven by network devices, but in recent years,
we have also developed support for block devices.
The main use case is definitely leveraging the hardware to directly emulate the
virtio-blk device and support different network backends such as Ceph RBD or
iSCSI. This is the goal of some SmartNICs or DPUs, which are able to emulate
virtio-net devices of course, but also virtio-blk for network storage.
The abstraction provided by vDPA also makes software accelerators possible,
similar to existing vhost or vhost-user devices.
We talked about that at KVM Forum 2021.
We covered the fast path and the slow path in that talk. When QEMU needs
to handle requests, like supporting live migration or executing I/O throttling,
it uses the slow path. During the slow path, the device exposed to the guest is
emulated in QEMU. QEMU intercepts the requests and forwards them to the vDPA
device by taking advantage of the driver implemented in libblkio.
On the other hand, when QEMU doesn't need to intervene, the fast path comes
into play. In this case, the vDPA device can be directly exposed to the guest,
bypassing QEMU's emulation.
libblkio exposes a common API for accessing
block devices in userspace. It supports several drivers. We will focus more
on the virtio-blk-vhost-vdpa driver, which is used by the virtio-blk-vhost-vdpa
block device in QEMU. It only supports the slow path for now, but in the future
it should be able to switch to the fast path automatically. Since QEMU 7.2, it
supports libblkio drivers, so you can use the following options to attach a
vDPA block device to a VM:
-blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
-device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1
Anyway, to fully leverage the performance of a vDPA hardware device, we can
always use the generic vhost-vdpa-device-pci device offered by QEMU that
supports any vDPA device and exposes it directly to the guest. Of course,
QEMU is not able to intercept requests in this scenario and therefore some
features offered by its block layer (e.g. live migration, disk format, etc.)
are not supported. Since QEMU 8.0, you can use the following options to attach
a generic vDPA device to a VM:
-device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0
At KVM Forum 2022, Alberto Faria and Stefan Hajnoczi presented libblkio,
while Kevin Wolf and I discussed its usage in the QEMU Storage Daemon (QSD).
Software devices
One of the significant benefits of vDPA is its strong abstraction, enabling
the implementation of virtio devices in both hardware and software, whether in
the kernel or user space. This unification under a single framework, where
devices appear identical to QEMU, facilitates the seamless integration of
hardware and software components.
Kernel devices
Regarding in-kernel devices, starting from Linux v5.13, there exists a simple
simulator designed for development and debugging purposes. It is available
through the vdpa-sim-blk kernel module, which emulates a 128 MB ramdisk.
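For a quick smoke test, you can load the module and check that the simulator
registered a management device (exact output may vary by iproute2 version):
$ modprobe vdpa-sim-blk
$ vdpa mgmtdev show
vdpasim_blk:
  supported_classes block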
As highlighted in the presentation at KVM Forum 2021, a future device in the
kernel (similar to the repeatedly proposed but never merged vhost-blk)
could potentially offer excellent performance. Such a device could be used
as an alternative when hardware is unavailable, for instance, facilitating
live migration in any system, regardless of whether the destination system
features a SmartNIC/DPU or not.
User space devices
Instead, regarding user space, we can use VDUSE. QSD supports it and thus
allows us to export any disk image supported by QEMU, such as a vDPA device
in this way:
qemu-storage-daemon \
  --blockdev file,filename=/path/to/disk.qcow2,node-name=file \
  --blockdev qcow2,file=file,node-name=qcow2 \
  --export type=vduse-blk,id=vduse0,name=vduse0,node-name=qcow2,writable=on
As mentioned in the introduction, vDPA supports different buses such as
vhost-vdpa and virtio-vdpa. This flexibility enables the use of
vDPA devices with virtual machines or user space drivers (e.g., libblkio)
through the vhost-vdpa bus. Additionally, it allows interaction with
applications running directly on the host or within containers through the
virtio-vdpa bus.
The vdpa tool in iproute2 facilitates the management of vdpa devices
through netlink, enabling the allocation and deallocation of these devices.
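As a rough sketch of that lifecycle (the device name `blk0` is arbitrary, and
the `vdpa dev show` output is abbreviated and may vary):
$ vdpa dev add name blk0 mgmtdev vdpasim_blk
$ vdpa dev show blk0
blk0: type block mgmtdev vdpasim_blk vendor_id 0 max_vqs 1 max_vq_size 256
$ vdpa dev del blk0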
Starting with Linux 5.17, vDPA drivers support driver_override.
This enhancement allows dynamic reconfiguration at runtime, permitting the
migration of a device from one bus to another in this way:
# load vdpa buses
$ modprobe -a virtio-vdpa vhost-vdpa
# load vdpa-blk in-kernel simulator
$ modprobe vdpa-sim-blk
# instantiate a new vdpasim_blk device called `vdpa0`
$ vdpa dev add mgmtdev vdpasim_blk name vdpa0
# `vdpa0` is attached to the first vDPA bus driver loaded
$ driverctl -b vdpa list-devices
vdpa0 virtio_vdpa
# change the `vdpa0` bus to `vhost-vdpa`
$ driverctl -b vdpa set-override vdpa0 vhost_vdpa
# `vdpa0` is now attached to the `vhost-vdpa` bus
$ driverctl -b vdpa list-devices
vdpa0 vhost_vdpa [*]
# Note: driverctl(8) integrates with udev so the binding is preserved.
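driverctl is convenient, but it is essentially a wrapper around sysfs; roughly
the same override can be done by hand (a sketch of what happens under the hood):
# pin the driver for `vdpa0`, then rebind it
$ echo vhost_vdpa > /sys/bus/vdpa/devices/vdpa0/driver_override
$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/unbind
$ echo vdpa0 > /sys/bus/vdpa/drivers_probe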
Examples
Below are several examples of how to use VDUSE and the QEMU Storage Daemon
with VMs (QEMU) or Containers (podman).
These steps are easily adaptable to any hardware that supports virtio-blk
devices via vDPA.
qcow2 image available for host applications and containers
# load vdpa buses
$ modprobe -a virtio-vdpa vhost-vdpa
# create an empty qcow2 image
$ qemu-img create -f qcow2 test.qcow2 10G
# load vduse kernel module
$ modprobe vduse
# launch QSD exposing the `test.qcow2` image as `vduse0` vDPA device
$ qemu-storage-daemon --blockdev file,filename=test.qcow2,node-name=file \
  --blockdev qcow2,file=file,node-name=qcow2 \
  --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on &
# instantiate the `vduse0` device (same name used in QSD)
$ vdpa dev add name vduse0 mgmtdev vduse
# attach it to the `virtio-vdpa` bus driver to use it with host applications
$ driverctl -b vdpa set-override vduse0 virtio_vdpa
# device exposed as a virtio device, but attached to the host kernel
$ lsblk -pv
NAME     TYPE TRAN   SIZE RQ-SIZE MQ
/dev/vda disk virtio  10G     256  1
# start a container with `/dev/vda` attached
podman run -it --rm --device /dev/vda --group-add keep-groups fedora:39 bash
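Inside the container (or directly on the host), the disk behaves like any other
block device; for example, assuming the image is empty and can be overwritten:
# create a filesystem on the new disk and mount it (as root)
$ mkfs.ext4 /dev/vda
$ mount /dev/vda /mnt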
Launch a VM using a vDPA device
# download Fedora cloud image (or use any other bootable image you want)
$ wget https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2
# launch QSD exposing the VM image as `vduse1` vDPA device
$ qemu-storage-daemon \
  --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
  --blockdev qcow2,file=file,node-name=qcow2 \
  --export vduse-blk,id=vduse1,name=vduse1,num-queues=1,node-name=qcow2,writable=on &
# instantiate the `vduse1` device (same name used in QSD)
$ vdpa dev add name vduse1 mgmtdev vduse
# initially it is attached to the host (`/dev/vdb`), because `virtio-vdpa`
# is the first kernel module we loaded
$ lsblk -pv
NAME     TYPE TRAN   SIZE RQ-SIZE MQ
/dev/vda disk virtio  10G     256  1
/dev/vdb disk virtio   5G     256  1
$ lsblk /dev/vdb
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vdb    251:16   0    5G  0 disk
├─vdb1 251:17   0    1M  0 part
├─vdb2 251:18   0 1000M  0 part
├─vdb3 251:19   0  100M  0 part
├─vdb4 251:20   0    4M  0 part
└─vdb5 251:21   0  3.9G  0 part
# and it is identified as `virtio1` in the host
$ ls /sys/bus/vdpa/devices/vduse1/
driver  driver_override  power  subsystem  uevent  virtio1
# attach it to the `vhost-vdpa` bus driver to use the device with VMs
$ driverctl -b vdpa set-override vduse1 vhost_vdpa
# `/dev/vdb` is not available anymore
$ lsblk -pv
NAME     TYPE TRAN   SIZE RQ-SIZE MQ
/dev/vda disk virtio  10G     256  1
# the device is identified as `vhost-vdpa-1` in the host
$ ls /sys/bus/vdpa/devices/vduse1/
driver  driver_override  power  subsystem  uevent  vhost-vdpa-1
$ ls -l /dev/vhost-vdpa-1
crw-------. 1 root root 511, 0 Feb 12 17:58 /dev/vhost-vdpa-1
# launch QEMU using the `/dev/vhost-vdpa-1` device with the
# `virtio-blk-vhost-vdpa` libblkio driver
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
  -object memory-backend-memfd,share=on,id=mem,size="512M" \
  -blockdev node-name=drive0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on \
  -device virtio-blk-pci,drive=drive0
# `virtio-blk-vhost-vdpa` blockdev can be used with any QEMU block layer
# features (e.g. live migration, I/O throttling).
# In this example we are using I/O throttling:
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
  -object memory-backend-memfd,share=on,id=mem,size="512M" \
  -blockdev node-name=drive0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on \
  -blockdev node-name=throttle0,driver=throttle,file=drive0,throttle-group=limits0 \
  -object throttle-group,id=limits0,x-iops-total=2000 \
  -device virtio-blk-pci,drive=throttle0
# Alternatively, we can use the generic `vhost-vdpa-device-pci` to take
# advantage of the full performance, but without having any QEMU block layer
# features available
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
  -object memory-backend-memfd,share=on,id=mem,size="512M" \
  -device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-1
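When you are done, the VDUSE devices can be removed with the same vdpa tool and
the QSD instances stopped; a minimal cleanup sketch (assuming both daemons were
started as background jobs of the current shell):
$ vdpa dev del vduse0
$ vdpa dev del vduse1
$ kill %1 %2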