Compare commits

...

75 Commits

Author SHA1 Message Date
Jake Correnti 7d3ff6c39c ci: do not exclude nitro crate
The main branch should never fail to compile. Checking the nitro crate in CI
should help prevent a similar breakage in the future.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-06-16 17:14:38 -04:00
Jake Correnti f901021254 nitro: libkrun: fix macOS compilation failure
macOS was failing to compile with EFI=1 because AWS Nitro-related code was
being compiled when it should not have been.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-06-16 17:14:38 -04:00
Jake Correnti 7e2239ae07 vmm: fix worker thread panic
If the worker thread panics when trying to convert memory to or from
private, it leaves the VMM process waiting indefinitely for the sender
to send some sort of message over the channel. Rather than panicking, we
should print an error and send a message back over the channel to stop
the VM.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-06-16 16:01:01 +01:00
Ruoqing He 0be2d439a9 arch: Remove `round_up` and `round_down`
We have replaced all uses of `round_up` and `round_down`; remove those
now-unused functions.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-06-16 15:16:36 +01:00
Ruoqing He 78d13aafff rutabaga_gfx: Introduce align module from `vmm-sys-util`
Use macros from `vmm-sys-util` for aligning in `round_up_to_page_size`
implementation.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-06-16 15:16:36 +01:00
Ruoqing He 5c90daa2f7 vmm: Introduce align module from `vmm-sys-util`
Use macros from `vmm-sys-util` for aligning in vmm crate.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-06-16 15:16:36 +01:00
Ruoqing He 0258af1295 arch: Introduce align module from `vmm-sys-util`
Use macros from `vmm-sys-util` for aligning in arch crate.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-06-16 15:16:36 +01:00
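The alignment macros round a value to a power-of-two boundary. As a rough C illustration of the arithmetic only (this is not `vmm-sys-util`'s Rust code; the helper names and the 4 KiB page size are assumptions for the example), rounding up and down reduce to masking with `align - 1`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
 * align must be a non-zero power of two. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Hypothetical helper: round addr up to a multiple of align.
 * Adding align - 1 pushes any unaligned address past the next boundary;
 * this assumes addr + align - 1 does not overflow. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* In the spirit of round_up_to_page_size, assuming 4 KiB pages. */
static uint64_t round_up_to_4k(uint64_t size)
{
    return align_up(size, 4096);
}
```

Restricting the alignment to powers of two is what lets a single AND replace a division and multiplication.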
Tyler Fanelli 885d642c43 .github/darwin: Exclude nitro crate
The nitro workspace member is not supported on macOS and has
Linux-exclusive dependencies. Exclude the crate when running
clippy on macOS.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli e81f9b73a0 nitro: Add API to configure enclave start flags
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli 13d0351806 nitro: Add IPC socket for enclave vsock output
To give more control to the user, allow the krun_add_vsock_port API
to forward enclave data to/from an IPC socket set up by a consumer of
the library.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli 01a748bbe6 nitro: Run preliminary (unconfigurable) enclave
Uses the nitro-enclaves crate to create and run an enclave in debug mode.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli e5434228c9 examples: Create nitro enclaves example
The nitro example will run a nitro enclave in debug mode and print the
vsock data from the enclave to the console.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli fd839ead34 nitro: Init module, collect enclave resources
The nitro module/crate will serve as the main management layer for nitro
enclaves. Collecting enclave resources from a krun context will allow us
to re-use the existing libkrun APIs for nitro enclaves.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli 73227fcca5 nitro: Require initial vsock connection from guest
The guest/libkrun process will initiate the communication of vsock data
from a nitro enclave.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli 0ba624a03b lib: Add API to set path of nitro enclaves image
Nitro enclaves' memory regions are defined by enclave image files.
Introduce a libkrun API to set an enclave's image from a file path as
well as the type of enclave image it is.

Currently, only Enclave Image Format (EIF) files are supported for nitro
enclaves.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Tyler Fanelli 6f8bc92b04 Introduce nitro flavor and feature
The nitro flavor introduces support for libkrun acting as a "driver" and
manager for AWS Nitro Enclaves.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-06-12 12:07:17 +01:00
Jake Correnti 11479de00f devices: vmm: Update IOAPIC IRQ routes
Update the IOAPIC IRQ routes to use the new wrapped
`kvm_bindings::KvmIrqRouting` instead of
`kvm_bindings::kvm_irq_routing`.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-06-06 11:16:17 -04:00
Jake Correnti fb5c6e2fbf Update rust-vmm/kvm dependencies
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-06-06 11:16:17 -04:00
Geoffrey Goodman b80a18ac0f makefile: do not couple gpu to efi feature
This tweaks the `EFI=1` make option so that it doesn't automatically bring
along the `gpu` feature. Since `EFI` resets `FEATURE_FLAGS`, the tests
for `GPU` and `SND` (which the `efi` Rust feature does not imply) are now
evaluated after it.

Signed-off-by: Geoffrey Goodman <geoff@goodman.dev>
2025-06-05 11:37:38 +01:00
Sergio Lopez 05bbf88e72 Bump version to 1.13.0
Set up the stage for a new release, including fixes for nested
virt in M4, aarch64 registers, and a new log API.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-06-03 15:04:53 -04:00
Sergio Lopez 052bd8ec1d vmm: bump kbs-types and drop tee-sev
Bump the kbs-types dependency to version 0.11 and drop the tee-sev
feature, which we no longer use (and which is broken in kbs-types).

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-06-03 18:26:18 +01:00
Sergio Lopez 6a3b27833c arch/aarch64: replace offset__of with a safe macro
We were using an unsafe macro with undefined behavior and, with
the latest compiler, it generates broken code. Replace it with
a cleaner alternative.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-06-03 16:59:31 +01:00
Sergio Lopez 136b6c49c6 hvf: enable EL2 and GICv3 in ID_AA64PFR0_EL1
For correctness' sake, enable the EL2 and GICv3 flags in ID_AA64PFR0_EL1.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-06-03 16:46:21 +01:00
Sergio Lopez c8c5185dab hvf: mask out SME in ID_AA64PFR1_EL1
If SME is present in ID_AA64PFR1_EL1 in the VM, the guest will break
after enabling the MMU. I don't see an architectural reason for this to
happen, so I suspect it's a deliberate action of some macOS component to
avoid exposing the feature to VMs.

In any case, let's just mask out SME from ID_AA64PFR1_EL1 to fix nested
virt on M4 devices.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-06-03 16:46:21 +01:00
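Masking a feature out of an ID register amounts to rewriting its 4-bit field in the value the guest sees. The C sketch below assumes the field positions from the Arm Architecture Reference Manual as we understand them (ID_AA64PFR1_EL1.SME at bits [27:24], ID_AA64PFR0_EL1.EL2 at [11:8] and .GIC at [27:24]); the helper names are hypothetical, and the hvf code itself is written in Rust:

```c
#include <stdint.h>

/* Field shifts; each ID register field is 4 bits wide (assumed per Arm ARM). */
#define ID_AA64PFR0_EL2_SHIFT 8
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR1_SME_SHIFT 24
#define ID_FIELD_MASK 0xfULL

/* Replace the 4-bit field at `shift` with `val`. */
static uint64_t id_reg_set_field(uint64_t reg, unsigned shift, uint64_t val)
{
    reg &= ~(ID_FIELD_MASK << shift);
    return reg | ((val & ID_FIELD_MASK) << shift);
}

/* Hide SME from the guest by reporting the field as 0 (not implemented). */
static uint64_t mask_sme(uint64_t pfr1)
{
    return id_reg_set_field(pfr1, ID_AA64PFR1_SME_SHIFT, 0);
}

/* Advertise EL2 and the GICv3 system register interface (value 1 in each). */
static uint64_t enable_el2_gicv3(uint64_t pfr0)
{
    pfr0 = id_reg_set_field(pfr0, ID_AA64PFR0_EL2_SHIFT, 1);
    return id_reg_set_field(pfr0, ID_AA64PFR0_GIC_SHIFT, 1);
}
```

A value of 1 in the EL2 field advertises EL2 in AArch64 state only, and 1 in the GIC field advertises the GICv3/v4 system register interface.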
Matej Hrica 22d7b61b2b Remove orphaned source files which are never used
Remove the vmm_config/console.rs file, which is never used (there is no corresponding
`mod console`), and device_manager/mmio.rs, which has been split into two versions:
device_manager/kvm/mmio.rs and device_manager/hvf/mmio.rs.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:45:56 +01:00
Matej Hrica 3d08c75533 Upgrade env_logger dependency
Use the newest version of env_logger. This is needed to fix a problem where output
to a pipe ignores the RUST_LOG_STYLE=always option and colors don't work.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:44:40 +01:00
Matej Hrica 826ffe0c94 Declare env_logger as dependency only in the top level crate
The correct way to use the env_logger crate is to depend on it only in the top-level
application crate. Other crates should just use the `log` crate facade
(which we already do).

Drop the env_logger dependency from all of our internal crates, and keep it only
in the `libkrun` crate.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:44:40 +01:00
Matej Hrica c226f2286a chroot_vm: Support redirecting libkrun log to a pipe
Use the newer krun_init_log to support redirecting the log to a pipe.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:44:40 +01:00
Matej Hrica 03924f4d6c chroot_vm: Set default log level to "warn"
chroot_vm is meant as an example program to showcase libkrun APIs, so it should
at least show error messages by default, even without the RUST_LOG environment variable.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:44:40 +01:00
Matej Hrica 2176075a54 Introduce a new krun_init_log() that replaces krun_set_log_level
Introduce a new krun_init_log public API function to allow much more
detailed configuration of logging by the application. The main improvement
is the ability to specify a file descriptor to write the log to.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-06-03 16:44:40 +01:00
Matej Hrica 1d54577077 examples/boot_efi+external_kernel: Make connect_to_passt const correct
The argument should be const (same as `cmdline.passt_socket_path` we pass in).

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-05-29 13:59:40 +02:00
Matej Hrica b805f3a1a0 examples/chroot_vm: Fix connect_to_passt function
The function needs to accept socket_path as an argument, as expected at the
call site. This used to compile with older gcc, but the code was wrong.

Signed-off-by: Matej Hrica <mhrica@redhat.com>
2025-05-29 13:59:40 +02:00
Sergio Lopez 5c3ecd66c6 Bump version 1.12.2
This release is intended to simplify packaging by dropping rangemap
as a dependency, and to include the security fix for crossbeam-channel.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-20 09:47:19 -04:00
Sergio Lopez 25a972e33e vmm: drop use of rangemap crate
It's not packaged in Fedora and I don't think it adds enough value
to justify its addition.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-19 16:07:32 -04:00
Sergio Lopez d645ced4bd Bump version to 1.12.1
This is a minor security release to address a vulnerability in
crossbeam-channel (CVE-2025-4574).

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-16 12:48:47 -04:00
Sergio Lopez 33773bee2b clippy: use std::ptr::eq
Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-16 15:12:30 +01:00
Sergio Lopez 1dfa1170c5 clippy: use std::io::Error::other
Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-16 15:12:30 +01:00
Sergio Lopez 7f08ebaeac Require crossbeam-channel 0.5.15 or higher
crossbeam-channel versions 0.5.12 through 0.5.14 are affected by
CVE-2025-4574 (https://bugzilla.redhat.com/show_bug.cgi?id=2358890).

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-16 15:12:30 +01:00
Sergio Lopez 45563e9c78 Bump version to 1.12.0
Set up the stage for a new release. Bumping minor since we're
extending the API with a function to check whether nested virt
is supported.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-05-12 17:54:41 +02:00
Jake Correnti 9041aaa4cb Update worker thread for TEE
For TEE VMs, use the same sender as we would for macOS or an x86 VM with
a split IRQCHIP. Additionally, use a channel for inter-process
communication instead of an EventFd.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-05-06 10:06:43 -04:00
Jake Correnti 52bcaffff2 Update worker thread sender on x86
On x86, use the same sender as we would for macOS. Additionally, rather
than using an EventFd to determine when the thread work is done, use a
response sender/receiver like macOS.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-05-06 10:06:43 -04:00
Jake Correnti f72429181c Generify worker thread on macOS
Rather than creating the worker thread on macOS differently than we do
for the other x86 tasks, use the same associated function by taking
advantage of the generic `WorkerMessage` enum.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-05-06 10:06:43 -04:00
Jake Correnti e3023e9ed2 Generify worker threads on x86
Rather than spawning a new worker thread every time some functionality
needs one, each with its own message type, create something generic
that can be used in a single thread.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-05-06 10:06:43 -04:00
Jake Correnti 73352c997a arch_gen: add Cargo edition
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-05-02 09:57:09 -04:00
Jake Correnti 3507730c98 hvf: Add API to verify Nested Virt is supported
Add an API to check if the current system supports Nested Virt on macOS.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-29 19:12:49 +02:00
Sergio Lopez a137a2f5b4 init: use the same exit codes as chroot/podman
For consistency in the container isolation use case, make init use
the same exit codes as chroot and podman do:

 125: "init" cannot set up the environment inside the microVM.
 126: "init" can find the executable to be run inside the microVM but cannot execute it.
 127: "init" cannot find the executable to be run inside the microVM.

Co-authored-by: Matej Hrica <30753707+mtjhrc@users.noreply.github.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-04-29 18:31:44 +02:00
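As an illustration of how init might choose between 126 and 127 after a failed exec, here is a minimal C sketch. The macro and function names are hypothetical, and treating only ENOENT and ENOTDIR as "not found" is our assumption:

```c
#include <errno.h>

/* Exit codes shared with chroot(1) and podman, as listed above. */
#define EXIT_SETUP_FAILED   125 /* cannot set up the environment */
#define EXIT_CANNOT_EXECUTE 126 /* executable found but cannot be executed */
#define EXIT_NOT_FOUND      127 /* executable not found */

/* Map the errno left by a failed execvp() to an exit code. */
static int exit_code_for_exec_errno(int err)
{
    if (err == ENOENT || err == ENOTDIR)
        return EXIT_NOT_FOUND;
    return EXIT_CANNOT_EXECUTE;
}
```

After `execvp()` returns, the child would call `exit(exit_code_for_exec_errno(errno))`; failures before that point would exit with 125.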
Sergio Lopez 78f0b744a0 init: record exit code of the entrypoint
Use the newly added exit code recording feature in virtio-fs to record
the exit code from the workload's entrypoint.

We need to stop ignoring SIGCHLD, as otherwise waitpid doesn't record
the exit code of our child, which also means we need to call waitpid
on every child to ensure we aren't leaving zombie processes.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-04-29 18:31:44 +02:00
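Once SIGCHLD is no longer ignored, init has to reap children itself. A minimal C sketch, assuming a hypothetical `wait_for_entrypoint()` helper that reaps any child with `waitpid(-1, ...)` and returns only when the entrypoint exits:

```c
#include <errno.h>
#include <sys/wait.h>
#include <unistd.h> /* pid_t, fork() */

/* Reap children until entrypoint_pid exits; return its exit code
 * (128 + signal number if it was killed by a signal), or -1 on error. */
static int wait_for_entrypoint(pid_t entrypoint_pid)
{
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            return -1; /* e.g. ECHILD: no children left */
        }
        if (pid != entrypoint_pid)
            continue; /* reaped another child, so it does not stay a zombie */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
    }
}
```

A real PID 1 would keep reaping after the entrypoint exits, since orphaned processes are reparented to it; encoding signal deaths as 128 plus the signal number follows the common shell convention and is our choice here.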
Sergio Lopez ef0b0fb01f virtio/fs/macos: fix INIT_BINARY read with offset
We already fixed this for Linux a while ago, do the same for the macOS
implementation.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-04-29 18:31:44 +02:00
Sergio Lopez 1e3c568090 virtio/fs: implement an ioctl to receive exit_code
For the container use case, we need to relay the exit code from
userspace in the microVM all the way to the process using libkrun to
launch the VMM.

Since ioctls are passed mostly unmodified from userspace in the guest
to the virtio-fs device, we can use them as a mechanism for transporting
this information without requiring any specific support in the guest's
kernel.

Here we create an AtomicI32 and wire it up between virtio-fs and Vmm,
using it as exit code if userspace has set it to some value other than
i32::MAX. Otherwise, we keep using the vCPU exit code, as we did before.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2025-04-29 18:31:44 +02:00
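The sentinel logic can be sketched with C11 atomics standing in for Rust's `AtomicI32` (`INT_MAX` plays the role of `i32::MAX`). The variable and function names are hypothetical:

```c
#include <limits.h>
#include <stdatomic.h>

/* Shared between the virtio-fs ioctl handler and the VMM;
 * INT_MAX means "not set by userspace in the guest". */
static atomic_int guest_exit_code = INT_MAX;

/* Called when the guest's ioctl reports the workload's exit code. */
static void record_guest_exit_code(int code)
{
    atomic_store(&guest_exit_code, code);
}

/* Called by the VMM on shutdown: prefer the guest-reported value,
 * otherwise fall back to the vCPU exit code. */
static int effective_exit_code(int vcpu_exit_code)
{
    int code = atomic_load(&guest_exit_code);
    return code != INT_MAX ? code : vcpu_exit_code;
}
```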
Tyler Fanelli 3722a253fe Fix clippy warnings
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-28 22:23:21 -04:00
Tyler Fanelli d8a1bc9eb1 vmm: Coalesce around pm_sender
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
Matias Ezequiel Vara Larsen f8bde30ef3 Add support for KVM_EXIT_MEMORY_FAULT
The KVM_EXIT_MEMORY_FAULT vmexit is triggered when the guest wants to switch
a region of memory from private to shared or vice versa. To support this
when tee is enabled, add an extra thread named sender_io that gets the
parameters from the vcpu thread and triggers
set_memory_properties(). The vcpu fd is owned only by this thread.

Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
2025-04-27 21:56:28 -04:00
Tyler Fanelli 97e6bd45a0 vmm/linux/tee: Handle KVM_EXIT_HYPERCALL exits
SEV-SNP guests use KVM_EXIT_HYPERCALL exits to signal to the hypervisor
that they would like some pages set to private or shared.

Implements a handler that manages guest memory and can set regions to
private or shared. vCPUs can send "memory properties" messages
to the handler indicating:

- Guest GPA
- Size of memory region
- Whether the region should be set to private or shared

The handler will read these messages and configure the memory regions
accordingly.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
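A minimal C sketch of such a handler, assuming a hypothetical fixed-size message sent over a pipe to a dedicated thread. The struct layout and the byte tally are placeholders for the real KVM calls, which are not shown:

```c
#include <pthread.h> /* the handler is started with pthread_create() */
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical "memory properties" message sent by a vCPU thread. */
struct mem_props_msg {
    uint64_t gpa;    /* guest physical address of the region */
    uint64_t size;   /* size of the region in bytes */
    bool to_private; /* true: make private, false: make shared */
};

struct handler_state {
    int read_fd;               /* read end of the request pipe */
    int64_t net_private_bytes; /* stands in for the real memory changes */
};

/* Handler thread: process requests until the write end is closed. */
static void *mem_props_handler(void *opaque)
{
    struct handler_state *st = opaque;
    struct mem_props_msg msg;

    while (read(st->read_fd, &msg, sizeof(msg)) == (ssize_t) sizeof(msg)) {
        /* A real VMM would update the region's attributes in KVM here. */
        if (msg.to_private)
            st->net_private_bytes += (int64_t) msg.size;
        else
            st->net_private_bytes -= (int64_t) msg.size;
    }
    return NULL;
}
```

Writes of a struct this small to a pipe are atomic (well under `PIPE_BUF`), so the handler always reads whole messages; closing the write end makes `read()` return 0 and stops the thread.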
Tyler Fanelli 4c234f6ecf amd-sev: Enable KVM_EXIT_HYPERCALL for SEV-SNP guests
SEV-SNP guests use KVM_EXIT_HYPERCALL to signal to the hypervisor that
they would like some memory shared or private. Enable the KVM capability
to allow the guests to use KVM_EXIT_HYPERCALL.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
Tyler Fanelli 9ec79fd58e vmm/linux: Make guest_memfd optional
guest_memfd is required by SEV-SNP guests, and will also be used for
other TEE architectures such as TDX and CCA. Probe if guest_memfd will
be used on the system (non-TEE workloads do not require guest_memfd),
and create/map them if so.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
Matias Ezequiel Vara Larsen 7afe3b7a99 Use create_guest_memfd() and set_user_memory_region2()
When tee feature is enabled, use create_guest_memfd() and
set_user_memory_region2(). The created regions are marked as private by
using set_memory_attributes() thus imitating QEMU behavior.

Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
2025-04-27 21:56:28 -04:00
Tyler Fanelli f20729899f amd-sev: Update sev library to 6.0.0
Updated SEV-SNP support in the Linux kernel requires an update of the
sev library version along with some modifications to the APIs provided by the
library.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
Tyler Fanelli 94dc1aef7a amd-sev: Create VM with KVM_X86_SNP_VM type
kvm_bindings does not yet expose the KVM_X86_SNP_VM value. Hard code the
value until it is available in kvm_bindings.

Taken from linux/arch/x86/include/uapi/asm/kvm.h

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
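For reference, such a fallback looks like this in C. The value 4 matches the `KVM_X86_SNP_VM` define in the kernel header cited above; the `#ifndef` guard lets newer headers take precedence:

```c
/* Fallback for headers that predate the SEV-SNP VM type.
 * Value from linux/arch/x86/include/uapi/asm/kvm.h. */
#ifndef KVM_X86_SNP_VM
#define KVM_X86_SNP_VM 4
#endif
```

The VM type is passed as the argument of the `KVM_CREATE_VM` ioctl, for example `ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM)`.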
Tyler Fanelli 3a6b263aa1 amd-sev: secure_virt_attest to secure_virt_measure
In the legacy AMD SEV implementation, the sev_secure_virt_attest
function performed pre-boot attestation for a VM. This implementation
was removed, and SEV-SNP uses post-boot attestation. As such, the
SEV-SNP implementation only measures each region for guests, and does
not attest anything, making the function name a bit misleading.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-27 21:56:28 -04:00
Tyler Fanelli 291a4e9b85 Update vmm-sys-util dependency
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-16 13:31:24 -04:00
Tyler Fanelli 08e1ef8ce2 Update KVM dependencies
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-16 13:31:24 -04:00
Tyler Fanelli 0237f1ba50 include: Add #ifndef header guard
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
2025-04-11 10:25:37 +02:00
Jake Correnti a014ac5a72 examples: add `krun_split_irqchip`
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 2632a978ee vmm: if IRQCHIP is split, create `IoApic` device
If the user calls `krun_split_irqchip` with `enabled` set to `true`,
then create an `IoApic` device in userspace rather than creating an
IOAPIC in the guest with KVM.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti bad060d1fe libkrun: start IRQ worker thread if IRQCHIP is split
If the user called `krun_split_irqchip` with `enabled` set to `true`,
start an IRQ worker thread to aid in servicing interrupts or committing
IRQ routes.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 0e3827e750 libkrun:vmm: add API to set split irqchip
Provide a user API, `krun_split_irqchip`, to specify that the VM should have
`KVM_CAP_SPLIT_IRQCHIP` enabled.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 031cb20f1c devices/legacy: implement `IrqChipT` trait for `IoApic`
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 17dd4cef5d devices/legacy: implement `BusDevice` trait for `IoApic`
Implement the `BusDevice` trait for `IoApic` to handle reads and writes
from the system bus.

When reading the `IO_WIN` register, it's important to verify the access is
only 32 bits wide: redirection table entries are 64 bits, but must be read
with two consecutive 32-bit accesses.

Like `read`, `write` needs to verify the access is 32 bits wide, so that a
64-bit entry is written in two halves. In `write`, the Version and Arbitration
registers must not be modified, because they are read-only on the device.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
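The write-side access rules can be sketched in C as follows. The register indices (ID 0x00, Version 0x01, Arbitration 0x02, redirection table from 0x10) and MMIO offsets (IOREGSEL at 0x00, IOWIN at 0x10) follow the classic 82093AA IOAPIC; the struct, the 24-pin size, and the function name are assumptions rather than libkrun's `IoApic`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IOAPIC_IOREGSEL    0x00 /* MMIO offset: register select */
#define IOAPIC_IOWIN       0x10 /* MMIO offset: data window */
#define IOAPIC_REG_ID      0x00
#define IOAPIC_REG_VER     0x01 /* read-only */
#define IOAPIC_REG_ARB     0x02 /* read-only */
#define IOAPIC_REDTBL_BASE 0x10
#define IOAPIC_NUM_PINS    24

/* Hypothetical minimal IOAPIC state. */
struct ioapic {
    uint32_t ioregsel;
    uint32_t id;
    uint64_t redirtbl[IOAPIC_NUM_PINS];
};

/* Handle a guest MMIO write. Only 32-bit accesses are accepted, so a 64-bit
 * redirection entry is written as two halves. Returns false if ignored. */
static bool ioapic_mmio_write(struct ioapic *s, uint64_t offset,
                              uint32_t val, size_t len)
{
    if (len != 4)
        return false;
    if (offset == IOAPIC_IOREGSEL) {
        s->ioregsel = val & 0xff;
        return true;
    }
    if (offset != IOAPIC_IOWIN)
        return false;

    uint32_t reg = s->ioregsel;
    if (reg == IOAPIC_REG_ID) {
        s->id = val;
    } else if (reg == IOAPIC_REG_VER || reg == IOAPIC_REG_ARB) {
        return false; /* read-only registers: drop the write */
    } else if (reg >= IOAPIC_REDTBL_BASE &&
               reg < IOAPIC_REDTBL_BASE + 2 * IOAPIC_NUM_PINS) {
        unsigned pin = (reg - IOAPIC_REDTBL_BASE) / 2;
        unsigned shift = (reg & 1) ? 32 : 0; /* odd index = high half */
        s->redirtbl[pin] &= ~(0xffffffffULL << shift);
        s->redirtbl[pin] |= (uint64_t) val << shift;
    } else {
        return false;
    }
    return true;
}
```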
Jake Correnti 51d0a7df2d devices/legacy: add function to service interrupts
Add a function to prepare the entry to be serviced and send a message to the
IRQ worker thread.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 3c3bc6476e devices/legacy: add functions to update IRQ routes
Add a function to update the MSI data for each IRQ route if it's not
masked.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 06180db92d devices/legacy: parse ioredtbl entry into struct
Each entry in the ioredtbl is a 64-bit bitfield. Make it easier to
access the information in the entry by parsing it into a struct and
determine the MSI information that can be gathered from the entry.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
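The entry layout and the MSI information derived from it can be sketched in C. Bit positions follow the 82093AA redirection entry and the standard x86 MSI address/data format; the struct and function names are hypothetical, and this MSI data omits the level-assert bit:

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 64-bit IOAPIC redirection table entry. */
struct redir_entry {
    uint8_t vector;        /* bits 7:0 */
    uint8_t delivery_mode; /* bits 10:8 */
    bool dest_mode;        /* bit 11: 0 = physical, 1 = logical */
    bool remote_irr;       /* bit 14 */
    bool trigger_level;    /* bit 15: 0 = edge, 1 = level */
    bool masked;           /* bit 16 */
    uint8_t dest_id;       /* bits 63:56 */
};

static struct redir_entry parse_redir_entry(uint64_t raw)
{
    struct redir_entry e = {
        .vector = raw & 0xff,
        .delivery_mode = (raw >> 8) & 0x7,
        .dest_mode = (raw >> 11) & 1,
        .remote_irr = (raw >> 14) & 1,
        .trigger_level = (raw >> 15) & 1,
        .masked = (raw >> 16) & 1,
        .dest_id = raw >> 56,
    };
    return e;
}

/* MSI address: 0xFEE in bits 31:20, destination ID in 19:12, destination
 * mode in bit 2. MSI data: vector, delivery mode (10:8), trigger (bit 15). */
static void redir_entry_to_msi(const struct redir_entry *e,
                               uint32_t *addr, uint32_t *data)
{
    *addr = 0xfee00000u | ((uint32_t) e->dest_id << 12) |
            ((uint32_t) e->dest_mode << 2);
    *data = e->vector | ((uint32_t) e->delivery_mode << 8) |
            ((uint32_t) e->trigger_level << 15);
}
```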
Jake Correnti d83ab4c338 devices/legacy: add helper function to send irq worker message
When triggering an interrupt in the kernel or committing the IRQ routes,
the `IoApic` struct needs to access the VM's FD. Provide a function to
send the specified message to the IRQ worker thread.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti fe3517e899 devices/legacy: set irr bit to 0 on edge trigger
When an interrupt is edge triggered, the Remote IRR bit needs to be set
to 0. `fix_edge_remote_irr` provides an easy way to set that bit.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
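A minimal C sketch of the idea behind `fix_edge_remote_irr`, operating on a raw 64-bit entry (trigger mode in bit 15, Remote IRR in bit 14); this is an illustration, not the Rust implementation:

```c
#include <stdint.h>

#define REDIR_TRIGGER_LEVEL (1ULL << 15) /* 1 = level, 0 = edge */
#define REDIR_REMOTE_IRR    (1ULL << 14)

/* Remote IRR is only meaningful for level-triggered interrupts,
 * so clear it whenever the entry is edge triggered. */
static uint64_t fix_edge_remote_irr(uint64_t entry)
{
    if (!(entry & REDIR_TRIGGER_LEVEL))
        entry &= ~REDIR_REMOTE_IRR;
    return entry;
}
```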
Jake Correnti 665eb861fb devices/legacy: introduce `IoApic` struct
When enabling `KVM_CAP_SPLIT_IRQCHIP`, the LAPIC is still emulated in the
kernel by KVM. However, the VMM needs to create an IOAPIC in userspace to
redirect interrupts from the system bus to the respective LAPIC.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
Jake Correnti 6e9b18accf utils: add sized_vec
From https://www.github.com/cloud-hypervisor/cloud-hypervisor:

The kvm API has many structs that resemble the following `Foo`
structure:

```rust
struct Foo {
   some_data: u32,
   entries: __IncompleteArrayField<__u32>,
}
```

In order to allocate such a structure, `size_of::<Foo>()` would be too
small because it would not include any space for `entries`. To make the
allocation large enough while still being aligned for `Foo`, a
`Vec<Foo>` is created. Only the first element of `Vec<Foo>` would
actually be used as a `Foo`. The remaining memory in the `Vec<Foo>` is
for `entries`, which must be contiguous with `Foo`.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
2025-04-10 12:00:09 -04:00
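In C, the same layout is normally expressed with a flexible array member, and `calloc()` already returns memory aligned for any object type, so the `Vec<Foo>` trick is not needed there. A short sketch with illustrative names (`struct foo`, `alloc_foo_with_entries`):

```c
#include <stdint.h>
#include <stdlib.h>

struct foo {
    uint32_t some_data;
    uint32_t entries[]; /* flexible array member, like __IncompleteArrayField */
};

/* Allocate a zeroed struct foo followed by room for n entries.
 * sizeof(struct foo) excludes entries, so the extra space is added
 * explicitly. Returns NULL on overflow or allocation failure; the
 * caller releases the result with free(). */
static struct foo *alloc_foo_with_entries(size_t n)
{
    if (n > (SIZE_MAX - sizeof(struct foo)) / sizeof(uint32_t))
        return NULL;
    return calloc(1, sizeof(struct foo) + n * sizeof(uint32_t));
}
```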
60 changed files with 2866 additions and 1201 deletions

965
Cargo.lock generated

File diff suppressed because it is too large.


@ -1,7 +1,7 @@
LIBRARY_HEADER = include/libkrun.h
ABI_VERSION=1
FULL_VERSION=1.11.2
FULL_VERSION=1.13.0
INIT_SRC = init/init.c
KBS_INIT_SRC = init/tee/kbs/kbs.h \
@ -27,9 +27,6 @@ ifeq ($(SEV),1)
INIT_SRC += $(SNP_INIT_SRC)
BUILD_INIT = 0
endif
ifeq ($(GPU),1)
FEATURE_FLAGS += --features gpu
endif
ifeq ($(VIRGL_RESOURCE_MAP2),1)
FEATURE_FLAGS += --features virgl_resource_map2
endif
@ -39,12 +36,20 @@ endif
ifeq ($(NET),1)
FEATURE_FLAGS += --features net
endif
ifeq ($(EFI),1)
VARIANT = -efi
FEATURE_FLAGS := --features efi # EFI Implies blk and net
BUILD_INIT = 0
endif
ifeq ($(GPU),1)
FEATURE_FLAGS += --features gpu
endif
ifeq ($(SND),1)
FEATURE_FLAGS += --features snd
endif
ifeq ($(EFI),1)
VARIANT = -efi
FEATURE_FLAGS := --features efi,gpu
ifeq ($(NITRO),1)
VARIANT = -nitro
FEATURE_FLAGS := --features nitro
BUILD_INIT = 0
endif
@ -91,6 +96,9 @@ $(LIBRARY_RELEASE_$(OS)): $(INIT_BINARY)
ifeq ($(SEV),1)
mv target/release/libkrun.so target/release/$(KRUN_BASE_$(OS))
endif
ifeq ($(NITRO),1)
mv target/release/libkrun.so target/release/$(KRUN_BASE_$(OS))
endif
ifeq ($(OS),Darwin)
ifeq ($(EFI),1)
install_name_tool -id libkrun-efi.dylib target/release/libkrun.dylib


@ -5,6 +5,7 @@ LDFLAGS_aarch64_Linux = -lkrun
LDFLAGS_arm64_Darwin = -L/opt/homebrew/lib -lkrun
LDFLAGS_sev = -lkrun-sev
LDFLAGS_efi = -L/opt/homebrew/lib -lkrun-efi
LDFLAGS_nitro = -lkrun-nitro
CFLAGS = -O2 -g -I../include
ROOTFS_DISTRO := fedora
ROOTFS_DIR = rootfs_$(ROOTFS_DISTRO)
@ -42,6 +43,9 @@ ifeq ($(OS),Darwin)
codesign --entitlements chroot_vm.entitlements --force -s - $@
endif
nitro: nitro.c
gcc -o $@ $< $(CFLAGS) $(LDFLAGS_nitro)
# Build the rootfs to be used with chroot_vm.
rootfs:
mkdir -p $(ROOTFS_DIR)
@ -50,4 +54,4 @@ rootfs:
podman rm libkrun_chroot_vm
clean:
rm -rf chroot_vm $(ROOTFS_DIR) launch-tee boot_efi external_kernel
rm -rf chroot_vm $(ROOTFS_DIR) launch-tee boot_efi external_kernel nitro


@ -90,7 +90,7 @@ bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
return false;
}
int connect_to_passt(char *socket_path)
int connect_to_passt(char const *socket_path)
{
struct sockaddr_un addr;
int socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);


@ -6,6 +6,7 @@
*/
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@ -34,6 +35,8 @@ static void print_help(char *const name)
"Usage: %s [OPTIONS] NEWROOT COMMAND [COMMAND_ARGS...]\n"
"OPTIONS: \n"
" -h --help Show help\n"
" --log=PATH Write libkrun log to file or named pipe at PATH\n"
" --color-log=PATH Write libkrun log to file or named pipe at PATH, use color\n"
" --net=NET_MODE Set network mode\n"
" --passt-socket=PATH Instead of starting passt, connect to passt socket at PATH"
"NET_MODE can be either TSI (default) or PASST\n"
@ -47,6 +50,8 @@ static void print_help(char *const name)
static const struct option long_options[] = {
{ "help", no_argument, NULL, 'h' },
{ "log", required_argument, NULL, 'L' },
{ "color-log", required_argument, NULL, 'C' },
{ "net_mode", required_argument, NULL, 'N' },
{ "passt-socket", required_argument, NULL, 'P' },
{ NULL, 0, NULL, 0 }
@ -54,12 +59,27 @@ static const struct option long_options[] = {
struct cmdline {
bool show_help;
int log_target;
uint32_t log_style;
enum net_mode net_mode;
char const *passt_socket_path;
char const *new_root;
char *const *guest_argv;
};
bool cmdline_set_log_target(struct cmdline *cmdline, const char *arg) {
int fd = open(arg, O_WRONLY);
if (fd < 0) {
perror(arg);
return false;
}
if (cmdline->log_target > 0) {
close(cmdline->log_target);
}
cmdline->log_target = fd;
return true;
}
bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
{
assert(cmdline != NULL);
@ -71,6 +91,8 @@ bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
.passt_socket_path = NULL,
.new_root = NULL,
.guest_argv = NULL,
.log_target = KRUN_LOG_TARGET_DEFAULT,
.log_style = KRUN_LOG_STYLE_AUTO
};
int option_index = 0;
@ -81,6 +103,14 @@ bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
case 'h':
cmdline->show_help = true;
return true;
case 'C':
cmdline->log_style = KRUN_LOG_STYLE_ALWAYS;
/* fall through */
case 'L':
if (!cmdline_set_log_target(cmdline, optarg)) {
return false;
}
break;
case 'N':
if (strcasecmp("TSI", optarg) == 0) {
cmdline->net_mode = NET_MODE_TSI;
@ -119,7 +149,7 @@ bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
return false;
}
int connect_to_passt()
int connect_to_passt(char const *socket_path)
{
struct sockaddr_un addr;
int socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
@ -130,7 +160,7 @@ int connect_to_passt()
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, "/tmp/passt_1.socket", sizeof(addr.sun_path) - 1);
strncpy(addr.sun_path, socket_path, sizeof(addr.sun_path) - 1);
if (connect(socket_fd, (const struct sockaddr *) &addr, sizeof(addr)) < 0) {
perror("Failed to bind passt socket");
@ -217,8 +247,8 @@ int main(int argc, char *const argv[])
return 0;
}
// Set the log level to "off".
err = krun_set_log_level(0);
// Set the log level to "warn".
err = krun_init_log(cmdline.log_target, KRUN_LOG_LEVEL_WARN, cmdline.log_style, 0);
if (err) {
errno = -err;
perror("Error configuring log level");
@ -301,6 +331,12 @@ int main(int argc, char *const argv[])
return -1;
}
if (err = krun_split_irqchip(ctx_id, false)) {
errno = -err;
perror("Error setting split IRQCHIP property");
return -1;
}
// Start and enter the microVM. Unless there is some error while creating the microVM
// this function never returns.
if (err = krun_start_enter(ctx_id)) {


@ -149,7 +149,7 @@ bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
return false;
}
int connect_to_passt(char *socket_path)
int connect_to_passt(char const *socket_path)
{
struct sockaddr_un addr;
int socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);

examples/nitro.c (new file, 224 lines)

@ -0,0 +1,224 @@
/*
* This example runs an AWS Nitro Enclave with libkrun.
*
* Given a nitro enclave image (EIF), run the image in a nitro enclave with
* 1 vCPU and 512 MiB of memory allocated.
*/
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <libkrun.h>
#include <getopt.h>
#include <stdbool.h>
#include <assert.h>
#include <pthread.h>
#define MAX_ARGS_LEN 4096
#ifndef MAX_PATH
#define MAX_PATH 4096
#endif
#define IPC_SOCK_PATH "/tmp/krun_nitro_example_ipc.sock"
static void print_help(char *const name)
{
fprintf(stderr,
"Usage: %s [OPTIONS] EIF_FILE\n"
"OPTIONS: \n"
" -h --help Show help\n"
"\n"
"EIF_FILE: The enclave image (EIF) to run\n",
name
);
}
static const struct option long_options[] = {
{ "help", no_argument, NULL, 'h' },
{ NULL, 0, NULL, 0 }
};
struct cmdline {
bool show_help;
const char *eif_path;
};
bool parse_cmdline(int argc, char *const argv[], struct cmdline *cmdline)
{
int c, option_index = 0;
assert(cmdline != NULL);
// set the defaults
*cmdline = (struct cmdline){
.show_help = false,
.eif_path = NULL,
};
// the '+' in optstring is a GNU extension that disables permutating argv
while ((c = getopt_long(argc, argv, "+h", long_options, &option_index)) != -1) {
switch (c) {
case 'h':
cmdline->show_help = true;
return true;
case '?':
return false;
default:
fprintf(stderr, "internal argument parsing error (returned character code 0x%x)\n", c);
return false;
}
}
if (optind < argc) {
cmdline->eif_path = argv[optind];
return true;
}
fprintf(stderr, "Missing EIF_FILE argument");
return false;
}
void *listen_enclave_output(void *opaque)
{
int ret, sock, fd = (int) (intptr_t) opaque;
char buf[512];
struct sockaddr_un client_sockaddr;
socklen_t len = sizeof(client_sockaddr);
sock = accept(fd, (struct sockaddr *) &client_sockaddr, &len);
if (sock < 0)
return (void *) -1;
for (;;) {
// Leave room for the NUL terminator so printf() never reads past buf.
ret = read(sock, buf, sizeof(buf) - 1);
if (ret <= 0)
break;
buf[ret] = '\0';
printf("%s", buf);
}
close(sock);
return NULL;
}
int main(int argc, char *const argv[])
{
int ret, ctx_id, err, sock_fd;
struct cmdline cmdline;
struct sockaddr_un addr;
pthread_t thread;
if (!parse_cmdline(argc, argv, &cmdline)) {
putchar('\n');
print_help(argv[0]);
return -1;
}
if (cmdline.show_help){
print_help(argv[0]);
return 0;
}
// Set the log level to "off".
err = krun_set_log_level(0);
if (err) {
errno = -err;
perror("Error configuring log level");
return -1;
}
// Create the configuration context.
ctx_id = krun_create_ctx();
if (ctx_id < 0) {
errno = -ctx_id;
perror("Error creating configuration context");
return -1;
}
// Configure the number of vCPUs (1) and the amount of RAM (512 MiB).
if (err = krun_set_vm_config(ctx_id, 1, 512)) {
errno = -err;
perror("Error configuring the number of vCPUs and/or the amount of RAM");
return -1;
}
// Set the nitro enclave image specified on the command line.
if (err = krun_nitro_set_image(ctx_id, cmdline.eif_path,
KRUN_NITRO_IMG_TYPE_EIF)) {
errno = -err;
perror("Error configuring nitro enclave image");
return -1;
}
// Configure the nitro enclave to run in debug mode.
if (err = krun_nitro_set_start_flags(ctx_id, KRUN_NITRO_START_FLAG_DEBUG)) {
errno = -err;
perror("Error configuring nitro enclave start flags");
return -1;
}
// Create and initialize UNIX IPC socket for reading enclave output.
sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sock_fd < 0) {
perror("Error creating UNIX IPC socket for enclave communication");
return -1;
}
memset(&addr, 0, sizeof(struct sockaddr_un));
addr.sun_family = AF_UNIX;
strcpy(addr.sun_path, IPC_SOCK_PATH);
// Listen on the socket for enclave output.
unlink(IPC_SOCK_PATH);
ret = bind(sock_fd, (struct sockaddr *) &addr, sizeof(addr));
if (ret < 0) {
perror("Error binding socket");
close(sock_fd);
exit(1);
}
ret = listen(sock_fd, 1);
if (ret < 0) {
perror("Error listening on socket");
close(sock_fd);
exit(1);
}
// Configure the IPC socket to read output from the enclave. The "port"
// argument is ignored.
if (err = krun_add_vsock_port(ctx_id, 0, IPC_SOCK_PATH)) {
close(sock_fd);
errno = -err;
perror("Error configuring enclave vsock");
return -1;
}
ret = pthread_create(&thread, NULL, listen_enclave_output,
(void *) (intptr_t) sock_fd);
if (ret != 0) {
// pthread_create() returns an error number instead of setting errno.
errno = ret;
perror("unable to create new listener thread");
close(sock_fd);
exit(1);
}
// Start and enter the microVM. Unless there is some error while creating the microVM
// this function never returns.
if (err = krun_start_enter(ctx_id)) {
close(sock_fd);
errno = -err;
perror("Error creating the microVM");
return -1;
}
ret = pthread_join(thread, NULL);
if (ret != 0) {
// pthread_join() returns an error number instead of setting errno.
errno = ret;
perror("unable to join listener thread");
close(sock_fd);
exit(1);
}
return 0;
}

Binary file not shown.

@@ -1,3 +1,10 @@
#ifndef _LIBKRUN_H
#define _LIBKRUN_H
#ifdef __cplusplus
extern "C" {
#endif
#include <inttypes.h>
#include <stdbool.h>
#include <unistd.h>
@@ -19,6 +26,44 @@
*/
int32_t krun_set_log_level(uint32_t level);
#define KRUN_LOG_TARGET_DEFAULT -1
#define KRUN_LOG_LEVEL_OFF 0
#define KRUN_LOG_LEVEL_ERROR 1
#define KRUN_LOG_LEVEL_WARN 2
#define KRUN_LOG_LEVEL_INFO 3
#define KRUN_LOG_LEVEL_DEBUG 4
#define KRUN_LOG_LEVEL_TRACE 5
#define KRUN_LOG_STYLE_AUTO 0
#define KRUN_LOG_STYLE_ALWAYS 1
#define KRUN_LOG_STYLE_NEVER 2
#define KRUN_LOG_OPTION_NO_ENV 1
/**
* Initializes logging for the library.
*
* Arguments:
* "target_fd" - File descriptor to write log to. Note that using a file descriptor pointing to a regular file on
* filesystem might slow down the VM.
* Use KRUN_LOG_TARGET_DEFAULT to use the default target for log output (stderr).
*
* "level" - An integer specifying the verbosity level; higher numbers produce more verbose log output.
* The log levels are described by the constants: KRUN_LOG_LEVEL_{OFF, ERROR, WARN, INFO, DEBUG, TRACE}
*
* "style" - Enable/disable usage of terminal escape sequences (to display colors)
* One of: KRUN_LOG_STYLE_{AUTO, ALWAYS, NEVER}.
*
* "options" - Bitmask of logging options; use 0 for the default options.
* Set KRUN_LOG_OPTION_NO_ENV to prevent environment variables from overriding these settings.
*
* Returns:
* Zero on success or a negative error number on failure.
*/
int32_t krun_init_log(int target_fd, uint32_t level, uint32_t style, uint32_t options);
/**
* Creates a configuration context.
*
@@ -553,6 +598,54 @@ int32_t krun_setgid(uint32_t ctx_id, gid_t gid);
*/
int32_t krun_set_nested_virt(uint32_t ctx_id, bool enabled);
/**
* Checks whether the system supports Nested Virtualization.
*
* Notes:
* This feature is only supported on macOS.
*
* Returns:
* - 1 : Success and Nested Virtualization is supported
* - 0 : Success and Nested Virtualization is not supported
* - <0: Failure
*/
int32_t krun_check_nested_virt(void);
/**
* Specify whether to split IRQCHIP responsibilities between the host and the guest.
*
* Arguments:
* "ctx_id" - the configuration context ID.
* "enable" - whether to enable the split IRQCHIP
*
* Returns:
* Zero on success or a negative error number on failure.
*/
int32_t krun_split_irqchip(uint32_t ctx_id, bool enable);
#define KRUN_NITRO_IMG_TYPE_EIF 1
/**
* Configure a Nitro Enclaves image.
*
* Arguments:
* "ctx_id" - the configuration context ID.
* "image_path" - a null-terminated string representing the path of the image
* in the host.
* "image_type" - the type of enclave image being provided.
*/
int32_t krun_nitro_set_image(uint32_t ctx_id, const char *image_path,
uint32_t image_type);
#define KRUN_NITRO_START_FLAG_DEBUG (1 << 0)
/**
* Configure a Nitro Enclave's start flags.
*
* Arguments:
* "ctx_id" - the configuration context ID.
* "start_flags" - Bitmask of start flags, such as KRUN_NITRO_START_FLAG_DEBUG.
*/
int32_t krun_nitro_set_start_flags(uint32_t ctx_id, uint64_t start_flags);
/**
* Starts and enters the microVM with the configured parameters. The VMM will attempt to take over
* stdin/stdout to manage them on behalf of the process running inside the isolated environment,
@@ -563,9 +656,24 @@ int32_t krun_set_nested_virt(uint32_t ctx_id, bool enabled);
* Arguments:
* "ctx_id" - the configuration context ID.
*
-* Returns:
* Notes:
* This function only returns if an error happens before starting the microVM. Otherwise, the
-* VMM assumes it has full control of the process, and will call to exit() once the microVM shuts
-* down.
* VMM assumes it has full control of the process, and will call to exit() with the workload's exit
* code once the microVM shuts down. If an error occurred before running the workload the process
* will exit() with an error exit code.
*
* Error exit codes:
* 125 - "init" cannot set up the environment inside the microVM.
* 126 - "init" can find the executable to be run inside the microVM but cannot execute it.
* 127 - "init" cannot find the executable to be run inside the microVM.
*
* Returns:
* -EINVAL - The VMM has detected an error in the microVM configuration.
*/
int32_t krun_start_enter(uint32_t ctx_id);
#ifdef __cplusplus
}
#endif
#endif // _LIBKRUN_H

@@ -28,6 +28,8 @@
#include "tee/snp_attest.h"
#endif
#define KRUN_EXIT_CODE_IOCTL 0x7602
#define KRUN_MAGIC "KRUN"
#define KRUN_FOOTER_LEN 12
#define CMDLINE_SECRET_PATH "/sfs/secrets/coco/cmdline"
@@ -954,10 +956,31 @@ int setup_redirects()
return 0;
}
void set_exit_code(int code)
{
int fd;
int ret;
fd = open("/", O_RDONLY);
if (fd < 0) {
perror("Couldn't open root filesystem to report exit code");
return;
}
ret = ioctl(fd, KRUN_EXIT_CODE_IOCTL, code);
if (ret < 0) {
perror("Error using the ioctl to set the exit code");
}
close(fd);
}
int main(int argc, char **argv)
{
struct ifreq ifr;
int sockfd;
int status;
int saved_errno;
char localhost[] = "localhost\0";
char *hostname;
char *krun_home;
@@ -1042,26 +1065,41 @@ int main(int argc, char **argv)
// We need to fork ourselves, because pid 1 cannot doesn't receive SIGINT
// signal
-int pid = fork();
-if (pid < 0) {
int child = fork();
if (child < 0) {
perror("fork");
-exit(-3);
set_exit_code(125);
exit(125);
}
-if (pid == 0) { // child
if (child == 0) { // child
if (setup_redirects() < 0) {
-exit(-4);
exit(125);
}
if (execvp(exec_argv[0], exec_argv) < 0) {
saved_errno = errno;
printf("Couldn't execute '%s' inside the vm: %s\n", exec_argv[0],
strerror(errno));
-exit(-3);
// Use the same exit code as chroot and podman do.
if (saved_errno == ENOENT) {
exit(127);
} else {
exit(126);
}
}
} else { // parent
-// tell the kernel we don't want to be notified on SIGCHLD so it'll reap
-// our children for us
-signal(SIGCHLD, SIG_IGN);
-// wait for children since we can't exit init
-waitpid(pid, NULL, 0);
// Wait until the workload's entrypoint has exited, ignoring any other
// children.
while (waitpid(-1, &status, 0) != child) {
// Not the first child, ignore it.
};
// The workload's entrypoint has exited, record its exit code and exit
// ourselves.
if (WIFEXITED(status)) {
set_exit_code(WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
set_exit_code(WTERMSIG(status) + 128);
}
}
return 0;
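The updated `init` reports the entrypoint's exit code when it exits normally and 128 plus the signal number when it is killed, while exec failures map to 127 (not found) or 126 (cannot execute). The same mapping can be sketched in Rust with the standard library's Unix `ExitStatus` extensions. This is a hedged illustration only: `workload_exit_code` and `exec_failure_code` are hypothetical names and are not part of libkrun.

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical helper: translate a workload's wait status into the code that
/// `init` would report, mirroring the WIFEXITED/WIFSIGNALED branches above.
fn workload_exit_code(status: ExitStatus) -> Option<i32> {
    match (status.code(), status.signal()) {
        // Normal termination: report the workload's own exit code.
        (Some(code), _) => Some(code),
        // Killed by a signal: follow the shell convention of 128 + signal number.
        (None, Some(sig)) => Some(128 + sig),
        // Neither exited nor signaled (e.g. stopped): nothing to report.
        (None, None) => None,
    }
}

/// Hypothetical helper: pick the exit code for a failed exec, following the
/// chroot/podman convention (127 = not found, 126 = found but not executable).
fn exec_failure_code(err: &io::Error) -> i32 {
    if err.kind() == io::ErrorKind::NotFound {
        127
    } else {
        126
    }
}
```

`ExitStatus::from_raw` takes a raw wait status, so `from_raw(3 << 8)` models a workload that called `exit(3)`, and `from_raw(9)` models one killed by SIGKILL, which this mapping reports as 137.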

@@ -12,14 +12,15 @@ efi = []
[dependencies]
libc = ">=0.2.39"
vm-memory = { version = ">=0.13", features = ["backend-mmap"] }
vmm-sys-util = ">= 0.14"
arch_gen = { path = "../arch_gen" }
smbios = { path = "../smbios" }
utils = { path = "../utils" }
[target.'cfg(target_os = "linux")'.dependencies]
-kvm-bindings = { version = ">=0.8", features = ["fam-wrappers"] }
-kvm-ioctls = ">=0.17"
kvm-bindings = { version = ">=0.11", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.21"
[dev-dependencies]
utils = { path = "../utils" }

@@ -5,11 +5,11 @@
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
-use std::{mem, num::TryFromIntError, result};
use std::{mem, mem::offset_of, num::TryFromIntError, result};
use super::super::get_fdt_addr;
use kvm_bindings::{
-user_pt_regs, KVM_REG_ARM64, KVM_REG_ARM64_SYSREG, KVM_REG_ARM64_SYSREG_CRM_MASK,
kvm_regs, user_pt_regs, KVM_REG_ARM64, KVM_REG_ARM64_SYSREG, KVM_REG_ARM64_SYSREG_CRM_MASK,
KVM_REG_ARM64_SYSREG_CRM_SHIFT, KVM_REG_ARM64_SYSREG_CRN_MASK, KVM_REG_ARM64_SYSREG_CRN_SHIFT,
KVM_REG_ARM64_SYSREG_OP0_MASK, KVM_REG_ARM64_SYSREG_OP0_SHIFT, KVM_REG_ARM64_SYSREG_OP1_MASK,
KVM_REG_ARM64_SYSREG_OP1_SHIFT, KVM_REG_ARM64_SYSREG_OP2_MASK, KVM_REG_ARM64_SYSREG_OP2_SHIFT,
@@ -42,24 +42,9 @@ const PSR_D_BIT: u64 = 0x0000_0200;
// Taken from arch/arm64/kvm/inject_fault.c.
const PSTATE_FAULT_BITS_64: u64 = PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT;
-// Following are macros that help with getting the ID of a aarch64 core register.
// This is a macro that helps with getting the ID of a aarch64 core register.
// The core register are represented by the user_pt_regs structure. Look for it in
// arch/arm64/include/uapi/asm/ptrace.h.
-// This macro gets the offset of a structure (i.e `str`) member (i.e `field`) without having
-// an instance of that structure.
-// It uses a null pointer to retrieve the offset to the field.
-// Inspired by C solution: `#define offsetof(str, f) ((size_t)(&((str *)0)->f))`.
-// Doing `offset__of!(user_pt_regs, pstate)` in our rust code will trigger the following:
-// unsafe { &(*(0 as *const user_pt_regs)).pstate as *const _ as usize }
-// The dereference expression produces an lvalue, but that lvalue is not actually read from,
-// we're just doing pointer math on it, so in theory, it should safe.
-macro_rules! offset__of {
-($str:ty, $field:ident) => {
-unsafe { &(*(std::ptr::null::<user_pt_regs>())).$field as *const _ as usize }
-};
-}
macro_rules! arm64_core_reg {
($reg: tt) => {
// As per `kvm_arm_copy_reg_indices`, the id of a core register can be obtained like this:
@@ -87,7 +72,7 @@ macro_rules! arm64_core_reg {
KVM_REG_ARM64 as u64
| KVM_REG_SIZE_U64 as u64
| u64::from(KVM_REG_ARM_CORE)
-| ((offset__of!(user_pt_regs, $reg) / mem::size_of::<u32>()) as u64)
| (((offset_of!(kvm_regs, regs) + offset_of!(user_pt_regs, $reg)) / mem::size_of::<u32>()) as u64)
};
}
@@ -126,14 +111,12 @@ arm64_sys_reg!(MPIDR_EL1, 3, 0, 0, 0, 5);
/// * `mem` - Reserved DRAM for current VM.
pub fn setup_regs(vcpu: &VcpuFd, cpu_id: u8, boot_ip: u64, mem: &GuestMemoryMmap) -> Result<()> {
// Get the register index of the PSTATE (Processor State) register.
-#[allow(deref_nullptr)]
vcpu.set_one_reg(arm64_core_reg!(pstate), &PSTATE_FAULT_BITS_64.to_le_bytes())
.map_err(Error::SetCoreRegister)?;
// Other vCPUs are powered off initially awaiting PSCI wakeup.
if cpu_id == 0 {
// Setting the PC (Program Counter) to the current program address (kernel address).
-#[allow(deref_nullptr)]
vcpu.set_one_reg(arm64_core_reg!(pc), &boot_ip.to_le_bytes())
.map_err(Error::SetCoreRegister)?;
@@ -141,7 +124,6 @@ pub fn setup_regs(vcpu: &VcpuFd, cpu_id: u8, boot_ip: u64, mem: &GuestMemoryMmap
// "The device tree blob (dtb) must be placed on an 8-byte boundary and must
// not exceed 2 megabytes in size." -> https://www.kernel.org/doc/Documentation/arm64/booting.txt.
// We are choosing to place it at the end of DRAM. See `get_fdt_addr`.
-#[allow(deref_nullptr)]
vcpu.set_one_reg(arm64_core_reg!(regs), &get_fdt_addr(mem).to_le_bytes())
.map_err(Error::SetCoreRegister)?;
}
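With `std::mem::offset_of!` (stable since Rust 1.77), the register ID no longer needs a null-pointer dereference. The sketch below recomputes arm64 core register IDs the same way the updated `arm64_core_reg!` does. `UserPtRegs` and `KvmRegs` are simplified stand-ins for the kvm-bindings types (an assumption: only `user_pt_regs` is modeled field by field, placed at the start of `kvm_regs` as in the arm64 UAPI header), and `core_reg_id!` is a hypothetical name.

```rust
use std::mem::{self, offset_of};

// Simplified stand-ins for the arm64 UAPI structs (assumption: layout matches
// `struct user_pt_regs` and the start of `struct kvm_regs`).
#[allow(dead_code)]
#[repr(C)]
struct UserPtRegs {
    regs: [u64; 31],
    sp: u64,
    pc: u64,
    pstate: u64,
}

#[allow(dead_code)]
#[repr(C)]
struct KvmRegs {
    regs: UserPtRegs,
    sp_el1: u64,
}

// Register-ID building blocks from the arm64 KVM UAPI.
const KVM_REG_ARM64: u64 = 0x6000_0000_0000_0000;
const KVM_REG_SIZE_U64: u64 = 0x0030_0000_0000_0000;
const KVM_REG_ARM_CORE: u64 = 0x0010 << 16;

// The core-register index is the field's byte offset inside `kvm_regs`,
// expressed in 32-bit words.
macro_rules! core_reg_id {
    ($field:ident) => {
        KVM_REG_ARM64
            | KVM_REG_SIZE_U64
            | KVM_REG_ARM_CORE
            | ((offset_of!(KvmRegs, regs) + offset_of!(UserPtRegs, $field))
                / mem::size_of::<u32>()) as u64
    };
}
```

For `pc` the byte offset is 31 * 8 + 8 = 256, i.e. word 64, giving `0x6030000000100040`, the ID commonly used for the arm64 program counter.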

@@ -17,8 +17,9 @@ pub use self::macos::*;
use std::fmt::Debug;
-use crate::{round_up, ArchMemoryInfo};
use crate::ArchMemoryInfo;
use vm_memory::{Address, GuestAddress, GuestMemory, GuestMemoryMmap};
use vmm_sys_util::align_upwards;
#[cfg(feature = "efi")]
use smbios;
@@ -44,7 +45,7 @@ pub fn arch_memory_regions(
initrd_size: u64,
) -> (ArchMemoryInfo, Vec<(GuestAddress, usize)>) {
let page_size: usize = unsafe { libc::sysconf(libc::_SC_PAGESIZE).try_into().unwrap() };
-let dram_size = round_up(size, page_size);
let dram_size = align_upwards!(size, page_size);
let ram_last_addr = layout::DRAM_MEM_START + (dram_size as u64);
let shm_start_addr = ((ram_last_addr / 0x4000_0000) + 1) * 0x4000_0000;
@@ -97,8 +98,9 @@ pub fn get_kernel_start() -> u64 {
/// Returns the memory address where the initrd could be loaded.
pub fn initrd_load_addr(guest_mem: &GuestMemoryMmap, initrd_size: usize) -> super::Result<u64> {
-let round_to_pagesize = |size| (size + (super::PAGE_SIZE - 1)) & !(super::PAGE_SIZE - 1);
-match GuestAddress(get_fdt_addr(guest_mem)).checked_sub(round_to_pagesize(initrd_size) as u64) {
match GuestAddress(get_fdt_addr(guest_mem))
.checked_sub(align_upwards!(initrd_size, super::PAGE_SIZE) as u64)
{
Some(offset) => {
if guest_mem.address_in_range(offset) {
Ok(offset.raw_value())

@@ -48,12 +48,3 @@ pub struct InitrdConfig {
/// Default (smallest) memory page size for the supported architectures.
pub const PAGE_SIZE: usize = 4096;
-pub fn round_up(size: usize, align: usize) -> usize {
-let page_mask = align - 1;
-(size + page_mask) & !page_mask
-}
-pub fn round_down(size: usize, align: usize) -> usize {
-let page_mask = !(align - 1);
-size & page_mask
-}
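The removed `round_up`/`round_down` helpers rely on `align` being a power of two, so clearing the low bits rounds to a multiple of it. A standalone sketch of that arithmetic, cross-checked against the standard library's `usize::next_multiple_of` (stable since Rust 1.73). It is assumed, not verified here, that vmm-sys-util's `align_upwards!` produces the same result for power-of-two alignments such as the page size, which is what makes the replacement behavior-preserving.

```rust
/// Round `size` up to a multiple of `align`, which must be a power of two.
fn round_up(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    let mask = align - 1;
    // Adding `mask` carries into the next multiple unless already aligned.
    (size + mask) & !mask
}

/// Round `size` down to a multiple of `align`, which must be a power of two.
fn round_down(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    size & !(align - 1)
}
```

For example, `round_up(4097, 4096)` is 8192, while an already aligned size such as 4096 is returned unchanged.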

@@ -17,12 +17,13 @@ pub mod msr;
/// Logic for configuring x86_64 registers.
pub mod regs;
-use crate::{round_up, ArchMemoryInfo, InitrdConfig};
use crate::{ArchMemoryInfo, InitrdConfig};
use arch_gen::x86::bootparam::{boot_params, E820_RAM};
use vm_memory::Bytes;
use vm_memory::{
Address, ByteValued, GuestAddress, GuestMemory, GuestMemoryMmap, GuestMemoryRegion,
};
use vmm_sys_util::align_upwards;
// This is a workaround to the Rust enforcement specifying that any implementation of a foreign
// trait (in this case `ByteValued`) where:
@@ -73,7 +74,7 @@ pub fn arch_memory_regions(
) -> (ArchMemoryInfo, Vec<(GuestAddress, usize)>) {
let page_size: usize = unsafe { libc::sysconf(libc::_SC_PAGESIZE).try_into().unwrap() };
-let size = round_up(size, page_size);
let size = align_upwards!(size, page_size);
// It's safe to cast MMIO_MEM_START to usize because it fits in a u32 variable
// (It points to an address in the 32 bit space).
@@ -155,7 +156,7 @@ pub fn arch_memory_regions(
) -> (ArchMemoryInfo, Vec<(GuestAddress, usize)>) {
let page_size: usize = unsafe { libc::sysconf(libc::_SC_PAGESIZE).try_into().unwrap() };
-let size = round_up(size, page_size);
let size = align_upwards!(size, page_size);
if let Some(kernel_load_addr) = kernel_load_addr {
if size < (kernel_load_addr + kernel_size as u64) as usize {
panic!("Kernel doesn't fit in RAM");

@@ -2,5 +2,6 @@
name = "arch_gen"
version = "0.1.0"
authors = ["Amazon Firecracker team <firecracker-devel@amazon.com>"]
edition = "2021"
[dependencies]

@@ -5,8 +5,8 @@ authors = ["Amazon Firecracker team <firecracker-devel@amazon.com>"]
edition = "2021"
[dependencies]
-vmm-sys-util = ">=0.11"
vmm-sys-util = ">= 0.14"
[target.'cfg(target_os = "linux")'.dependencies]
-kvm-bindings = { version = ">=0.8", features = ["fam-wrappers"] }
-kvm-ioctls = ">=0.17"
kvm-bindings = { version = ">=0.11", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.21"

@@ -13,11 +13,11 @@ efi = ["blk", "net"]
gpu = ["rutabaga_gfx", "thiserror", "zerocopy", "zerocopy-derive"]
snd = ["pw", "thiserror"]
virgl_resource_map2 = []
nitro = []
[dependencies]
bitflags = "1.2.0"
-crossbeam-channel = "0.5"
-env_logger = "0.9.0"
crossbeam-channel = ">=0.5.15"
libc = ">=0.2.39"
libloading = "0.8"
log = "0.4.0"
@@ -43,8 +43,8 @@ lru = ">=0.9"
[target.'cfg(target_os = "linux")'.dependencies]
rutabaga_gfx = { path = "../rutabaga_gfx", features = ["x"], optional = true }
caps = "0.5.5"
-kvm-bindings = { version = ">=0.8", features = ["fam-wrappers"] }
-kvm-ioctls = ">=0.17"
kvm-bindings = { version = ">=0.11", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.21"
[target.'cfg(target_arch = "aarch64")'.dependencies]
-vm-fdt = ">= 0.2.0"
vm-fdt = ">= 0.2.0"

@@ -135,10 +135,9 @@ impl IrqChipT for HvfGicV3 {
if let Some(irq_line) = irq_line {
let ret = unsafe { (self.bindings.hv_gic_set_spi)(irq_line, true) };
if ret != HV_SUCCESS {
-Err(DeviceError::FailedSignalingUsedQueue(io::Error::new(
-io::ErrorKind::Other,
-"HVF returned error when setting SPI",
-)))
Err(DeviceError::FailedSignalingUsedQueue(
std::io::Error::other("HVF returned error when setting SPI"),
))
} else {
Ok(())
}
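`io::Error::other` (stable since Rust 1.74) is shorthand for `io::Error::new(io::ErrorKind::Other, ...)`, and because it is a plain function it can also be passed straight to `map_err`, as the `PortOutputLog` change does. A small sketch comparing the two forms; the helper names are illustrative only.

```rust
use std::io;

// Old style: spell out the error kind explicitly.
fn spi_error_old() -> io::Error {
    io::Error::new(io::ErrorKind::Other, "HVF returned error when setting SPI")
}

// New style: `io::Error::other` implies `ErrorKind::Other`.
fn spi_error_new() -> io::Error {
    io::Error::other("HVF returned error when setting SPI")
}

// `io::Error::other` accepts any error convertible into
// `Box<dyn Error + Send + Sync>`, so it works directly as a `map_err` callback.
fn parse_port(s: &str) -> io::Result<u16> {
    s.parse::<u16>().map_err(io::Error::other)
}
```

Both constructors yield the same kind and message, so the change is purely a readability cleanup.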

@@ -0,0 +1,466 @@
use crossbeam_channel::unbounded;
use kvm_bindings::{
kvm_enable_cap, kvm_irq_routing_entry, kvm_irq_routing_entry__bindgen_ty_1,
kvm_irq_routing_msi, KvmIrqRouting, KVM_CAP_SPLIT_IRQCHIP, KVM_IRQ_ROUTING_MSI,
};
use kvm_ioctls::{Error, VmFd};
use utils::eventfd::EventFd;
use utils::worker_message::WorkerMessage;
use crate::bus::BusDevice;
use crate::legacy::irqchip::IrqChipT;
use crate::Error as DeviceError;
const IOAPIC_BASE: u32 = 0xfec0_0000;
const APIC_DEFAULT_ADDRESS: u32 = 0xfee0_0000;
const IOAPIC_NUM_PINS: usize = 24;
const IO_REG_SEL: u64 = 0x00;
const IO_WIN: u64 = 0x10;
const IO_EOI: u64 = 0x40;
const IO_APIC_ID: u8 = 0x00;
const IO_APIC_VER: u8 = 0x01;
const IO_APIC_ARB: u8 = 0x02;
const IOAPIC_LVT_DELIV_MODE_SHIFT: u64 = 8;
const IOAPIC_LVT_DEST_MODE_SHIFT: u64 = 11;
const IOAPIC_LVT_DELIV_STATUS_SHIFT: u64 = 12;
const IOAPIC_LVT_REMOTE_IRR_SHIFT: u64 = 14;
const IOAPIC_LVT_TRIGGER_MODE_SHIFT: u64 = 15;
const IOAPIC_LVT_MASKED_SHIFT: u64 = 16;
const IOAPIC_LVT_DEST_IDX_SHIFT: u64 = 48;
const IOAPIC_VER_ENTRIES_SHIFT: u64 = 16;
const IOAPIC_ID_SHIFT: u64 = 24;
const MSI_DATA_VECTOR_SHIFT: u64 = 0;
const MSI_ADDR_DEST_MODE_SHIFT: u64 = 2;
const MSI_ADDR_DEST_IDX_SHIFT: u64 = 4;
const MSI_DATA_DELIVERY_MODE_SHIFT: u64 = 8;
const MSI_DATA_TRIGGER_SHIFT: u64 = 15;
const IOAPIC_LVT_REMOTE_IRR: u64 = 1 << IOAPIC_LVT_REMOTE_IRR_SHIFT;
const IOAPIC_LVT_TRIGGER_MODE: u64 = 1 << IOAPIC_LVT_TRIGGER_MODE_SHIFT;
const IOAPIC_LVT_DELIV_STATUS: u64 = 1 << IOAPIC_LVT_DELIV_STATUS_SHIFT;
const IOAPIC_RO_BITS: u64 = IOAPIC_LVT_REMOTE_IRR | IOAPIC_LVT_DELIV_STATUS;
const IOAPIC_RW_BITS: u64 = !IOAPIC_RO_BITS;
const IOAPIC_DM_MASK: u64 = 0x7;
const IOAPIC_ID_MASK: u64 = 0xf;
const IOAPIC_VECTOR_MASK: u64 = 0xff;
const IOAPIC_DM_EXTINT: u64 = 0x7;
const IOAPIC_REG_REDTBL_BASE: u64 = 0x10;
const IOAPIC_TRIGGER_EDGE: u64 = 0;
/// 63:56 Destination Field (RW)
/// 55:17 Reserved
/// 16 Interrupt Mask (RW)
/// 15 Trigger Mode (RW)
/// 14 Remote IRR (RO)
/// 13 Interrupt Input Pin Polarity (INTPOL) (RW)
/// 12 Delivery Status (DELIVS) (RO)
/// 11 Destination Mode (DESTMOD) (RW)
/// 10:8 Delivery Mode (DELMOD) (RW)
/// 7:0 Interrupt Vector (INTVEC) (RW)
type RedirectionTableEntry = u64;
#[derive(Debug, Default)]
pub struct IoApicEntryInfo {
masked: u8,
trig_mode: u8,
_dest_idx: u16,
_dest_mode: u8,
_delivery_mode: u8,
_vector: u8,
addr: u32,
data: u32,
}
#[derive(Default)]
struct MsiMessage {
address: u64,
data: u64,
}
#[derive(Debug)]
pub struct IoApic {
id: u8,
ioregsel: u8,
irr: u32,
ioredtbl: [u64; IOAPIC_NUM_PINS],
version: u8,
irq_eoi: [i32; IOAPIC_NUM_PINS],
irq_routes: Vec<kvm_irq_routing_entry>,
irq_sender: crossbeam_channel::Sender<WorkerMessage>,
}
impl IoApic {
pub fn new(
vm: &VmFd,
_irq_sender: crossbeam_channel::Sender<WorkerMessage>,
) -> Result<Self, Error> {
let mut cap = kvm_enable_cap {
cap: KVM_CAP_SPLIT_IRQCHIP,
..Default::default()
};
cap.args[0] = 24;
vm.enable_cap(&cap)?;
let mut ioapic = Self {
id: 0,
ioregsel: 0,
irr: 0,
ioredtbl: [1 << IOAPIC_LVT_MASKED_SHIFT; IOAPIC_NUM_PINS],
version: 0x20,
irq_eoi: [0; IOAPIC_NUM_PINS],
irq_routes: Vec::with_capacity(IOAPIC_NUM_PINS),
irq_sender: _irq_sender,
};
(0..IOAPIC_NUM_PINS).for_each(|i| ioapic.add_msi_route(i));
let mut routing = KvmIrqRouting::new(ioapic.irq_routes.len()).unwrap();
let routing_entires = routing.as_mut_slice();
routing_entires.copy_from_slice(ioapic.irq_routes.as_slice());
vm.set_gsi_routing(&routing)?;
Ok(ioapic)
}
fn add_msi_route(&mut self, virq: usize) {
let msg = MsiMessage::default();
let kroute = kvm_irq_routing_entry {
gsi: virq as u32,
type_: KVM_IRQ_ROUTING_MSI,
flags: 0,
u: kvm_irq_routing_entry__bindgen_ty_1 {
msi: kvm_irq_routing_msi {
address_lo: msg.address as u32,
address_hi: (msg.address >> 32) as u32,
data: msg.data as u32,
..Default::default()
},
},
..Default::default()
};
// 4095 is the max irq number for kvm (MAX_IRQ_ROUTES - 1)
if self.irq_routes.len() < 4095 {
self.irq_routes.push(kroute);
} else {
error!("ioapic: not enough space for irq");
}
}
fn fix_edge_remote_irr(&mut self, index: usize) {
if self.ioredtbl[index] & IOAPIC_LVT_TRIGGER_MODE == IOAPIC_TRIGGER_EDGE {
self.ioredtbl[index] &= !IOAPIC_LVT_REMOTE_IRR;
}
}
fn parse_entry(&self, entry: &RedirectionTableEntry) -> IoApicEntryInfo {
let vector = (entry & IOAPIC_VECTOR_MASK) as u8;
let dest_idx = ((entry >> IOAPIC_LVT_DEST_IDX_SHIFT) & 0xffff) as u16;
let delivery_mode = ((entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK) as u8;
let trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1) as u8;
let dest_mode = ((entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1) as u8;
if delivery_mode as u64 == IOAPIC_DM_EXTINT {
panic!("ioapic: libkrun does not have PIC support");
}
IoApicEntryInfo {
masked: ((entry >> IOAPIC_LVT_MASKED_SHIFT) & 1) as u8,
trig_mode,
_dest_idx: dest_idx,
_dest_mode: dest_mode,
_delivery_mode: delivery_mode,
_vector: vector,
addr: ((APIC_DEFAULT_ADDRESS as u64)
| ((dest_idx as u64) << MSI_ADDR_DEST_IDX_SHIFT)
| ((dest_mode as u64) << MSI_ADDR_DEST_MODE_SHIFT)) as u32,
data: (((vector as u64) << MSI_DATA_VECTOR_SHIFT)
| ((trig_mode as u64) << MSI_DATA_TRIGGER_SHIFT)
| ((delivery_mode as u64) << MSI_DATA_DELIVERY_MODE_SHIFT))
as u32,
}
}
fn update_msi_route(&mut self, virq: usize, msg: &MsiMessage) {
let kroute = kvm_irq_routing_entry {
gsi: virq as u32,
type_: KVM_IRQ_ROUTING_MSI,
flags: 0,
u: kvm_irq_routing_entry__bindgen_ty_1 {
msi: kvm_irq_routing_msi {
address_lo: msg.address as u32,
address_hi: (msg.address >> 32) as u32,
data: msg.data as u32,
..Default::default()
},
},
..Default::default()
};
for entry in self.irq_routes.iter_mut() {
if entry.gsi == kroute.gsi {
*entry = kroute;
}
}
}
fn update_routes(&mut self) {
for i in 0..IOAPIC_NUM_PINS {
let info = self.parse_entry(&self.ioredtbl[i]);
if info.masked == 0 {
let msg = MsiMessage {
address: info.addr as u64,
data: info.data as u64,
};
self.update_msi_route(i, &msg);
}
}
let (response_sender, response_receiver) = unbounded();
self.irq_sender
.send(WorkerMessage::GsiRoute(
response_sender.clone(),
self.irq_routes.clone(),
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!("unable to set GSI Routes for IO APIC");
}
}
fn service(&mut self) {
for i in 0..IOAPIC_NUM_PINS {
let mask = 1 << i;
if self.irr & mask > 0 {
let mut coalesce = 0;
let entry = self.ioredtbl[i];
let info = self.parse_entry(&entry);
if info.masked == 0 {
if info.trig_mode as u64 == IOAPIC_TRIGGER_EDGE {
self.irr &= !mask;
} else {
coalesce = self.ioredtbl[i] & IOAPIC_LVT_REMOTE_IRR;
self.ioredtbl[i] |= IOAPIC_LVT_REMOTE_IRR;
}
if coalesce > 0 {
continue;
}
let (response_sender, response_receiver) = unbounded();
if info.trig_mode as u64 == IOAPIC_TRIGGER_EDGE {
self.irq_sender
.send(WorkerMessage::IrqLine(
response_sender.clone(),
i as u32,
true,
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!(
"unable to set IRQ LINE for IRQ {} with active set to {}",
i, true
);
}
self.irq_sender
.send(WorkerMessage::IrqLine(
response_sender.clone(),
i as u32,
false,
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!(
"unable to set IRQ LINE for IRQ {} with active set to {}",
i, false
);
}
} else {
self.irq_sender
.send(WorkerMessage::IrqLine(
response_sender.clone(),
i as u32,
true,
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!(
"unable to set IRQ LINE for IRQ {} with active set to {}",
i, true
);
}
}
}
}
}
}
}
impl IrqChipT for IoApic {
fn get_mmio_addr(&self) -> u64 {
IOAPIC_BASE as u64
}
fn get_mmio_size(&self) -> u64 {
0x1000
}
fn set_irq(
&self,
_irq_line: Option<u32>,
interrupt_evt: Option<&EventFd>,
) -> Result<(), DeviceError> {
if let Some(interrupt_evt) = interrupt_evt {
if let Err(e) = interrupt_evt.write(1) {
error!("Failed to signal used queue: {:?}", e);
return Err(DeviceError::FailedSignalingUsedQueue(e));
}
} else {
error!("EventFd not set up for irq line");
return Err(DeviceError::FailedSignalingUsedQueue(std::io::Error::new(
std::io::ErrorKind::NotFound,
"EventFd not set up for irq line",
)));
}
Ok(())
}
}
impl BusDevice for IoApic {
fn read(&mut self, _vcpuid: u64, offset: u64, data: &mut [u8]) {
let val = match offset {
IO_REG_SEL => {
debug!("ioapic: read: ioregsel");
self.ioregsel as u32
}
IO_WIN => {
// the data needs to be 32-bits in size
if data.len() != 4 {
error!("ioapic: bad read size {}", data.len());
return;
}
match self.ioregsel {
IO_APIC_ID | IO_APIC_ARB => {
debug!("ioapic: read: IOAPIC ID");
((self.id as u64) << IOAPIC_ID_SHIFT) as u32
}
IO_APIC_VER => {
debug!("ioapic: read: IOAPIC version");
self.version as u32
| ((IOAPIC_NUM_PINS as u32 - 1) << IOAPIC_VER_ENTRIES_SHIFT)
}
_ => {
let index = (self.ioregsel as u64 - IOAPIC_REG_REDTBL_BASE) >> 1;
debug!("ioapic: read: ioredtbl register {}", index);
let mut val = 0u32;
// we can only read from this register in 32-bit chunks.
// Therefore, we need to check if we are reading the
// upper 32 bits or the lower
if index < IOAPIC_NUM_PINS as u64 {
if self.ioregsel & 1 > 0 {
// read upper 32 bits
val = (self.ioredtbl[index as usize] >> 32) as u32;
} else {
// read lower 32 bits
val = (self.ioredtbl[index as usize] & 0xffff_ffffu64) as u32;
}
}
val
}
}
}
_ => unreachable!(),
};
// turn the value into native endian byte order and put that value into `data`
let out_arr = val.to_ne_bytes();
for i in 0..4 {
if i < data.len() {
data[i] = out_arr[i];
}
}
}
fn write(&mut self, _vcpuid: u64, offset: u64, data: &[u8]) {
// data needs to be 32-bits in size
if data.len() != 4 {
error!("ioapic: bad write size {}", data.len());
return;
}
// convert data into a u32 int with native endianness
let arr = [data[0], data[1], data[2], data[3]];
let val = u32::from_ne_bytes(arr);
match offset {
IO_REG_SEL => {
debug!("ioapic: write: ioregsel");
self.ioregsel = val as u8
}
IO_WIN => {
match self.ioregsel {
IO_APIC_ID => {
debug!("ioapic: write: IOAPIC ID");
self.id = ((val >> IOAPIC_ID_SHIFT) & (IOAPIC_ID_MASK as u32)) as u8
}
// NOTE: these are read-only registers, so they should never be written to
IO_APIC_VER | IO_APIC_ARB => debug!("ioapic: write: IOAPIC VERSION"),
_ => {
if self.ioregsel < (IO_WIN as u8) {
debug!("invalid write; ignore");
return;
}
let index = (self.ioregsel as u64 - IOAPIC_REG_REDTBL_BASE) >> 1;
debug!("ioapic: write: ioredtbl register {}", index);
if index >= IOAPIC_NUM_PINS as u64 {
warn!("ioapic: write: virq out of pin range {}", index);
return;
}
let ro_bits = self.ioredtbl[index as usize] & IOAPIC_RO_BITS;
// check if we are writing to the upper 32-bits of the
// register or the lower 32-bits
if self.ioregsel & 1 > 0 {
self.ioredtbl[index as usize] &= 0xffff_ffff;
self.ioredtbl[index as usize] |= (val as u64) << 32;
} else {
self.ioredtbl[index as usize] &= !0xffff_ffff;
self.ioredtbl[index as usize] |= val as u64;
}
// restore RO bits
self.ioredtbl[index as usize] &= IOAPIC_RW_BITS;
self.ioredtbl[index as usize] |= ro_bits;
self.irq_eoi[index as usize] = 0;
// if the trigger mode is EDGE, clear IRR bit
self.fix_edge_remote_irr(index as usize);
self.update_routes();
self.service();
}
}
}
IO_EOI => todo!(),
_ => unreachable!(),
}
}
}
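`parse_entry` turns each redirection table entry into the MSI address/data pair that `update_routes` forwards to the worker thread as GSI routes. Isolating that arithmetic makes it easy to check by hand. The sketch below copies the relevant constants from this file; `msi_from_entry` is a hypothetical helper, and the sample entries are made up for illustration. Because the 16-bit field at bits 63:48 is shifted left by 4, an APIC ID held in bits 63:56 ends up in bits 19:12 of the MSI address.

```rust
// Constants copied from the IoApic implementation above.
const APIC_DEFAULT_ADDRESS: u32 = 0xfee0_0000;
const IOAPIC_VECTOR_MASK: u64 = 0xff;
const IOAPIC_DM_MASK: u64 = 0x7;
const IOAPIC_LVT_DELIV_MODE_SHIFT: u64 = 8;
const IOAPIC_LVT_DEST_MODE_SHIFT: u64 = 11;
const IOAPIC_LVT_TRIGGER_MODE_SHIFT: u64 = 15;
const IOAPIC_LVT_MASKED_SHIFT: u64 = 16;
const IOAPIC_LVT_DEST_IDX_SHIFT: u64 = 48;
const MSI_DATA_VECTOR_SHIFT: u64 = 0;
const MSI_ADDR_DEST_MODE_SHIFT: u64 = 2;
const MSI_ADDR_DEST_IDX_SHIFT: u64 = 4;
const MSI_DATA_DELIVERY_MODE_SHIFT: u64 = 8;
const MSI_DATA_TRIGGER_SHIFT: u64 = 15;

/// Hypothetical helper: returns (masked, msi_address, msi_data) for one
/// redirection table entry, using the same bit arithmetic as `parse_entry`.
fn msi_from_entry(entry: u64) -> (bool, u32, u32) {
    let vector = entry & IOAPIC_VECTOR_MASK;
    let dest_idx = (entry >> IOAPIC_LVT_DEST_IDX_SHIFT) & 0xffff;
    let delivery_mode = (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
    let trig_mode = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
    let dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
    let masked = (entry >> IOAPIC_LVT_MASKED_SHIFT) & 1 == 1;
    let addr = (APIC_DEFAULT_ADDRESS as u64)
        | (dest_idx << MSI_ADDR_DEST_IDX_SHIFT)
        | (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
    let data = (vector << MSI_DATA_VECTOR_SHIFT)
        | (trig_mode << MSI_DATA_TRIGGER_SHIFT)
        | (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
    (masked, addr as u32, data as u32)
}
```

An unmasked, edge-triggered entry with vector `0x30` aimed at APIC ID 1 (`(1 << 56) | 0x30`) maps to address `0xfee0_1000` and data `0x30`; setting the trigger-mode bit adds `0x8000` to the data word.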

@@ -11,6 +11,8 @@ mod gicv3;
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
mod hvfgicv3;
mod i8042;
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
mod ioapic;
mod irqchip;
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
mod kvmgicv3;
@@ -39,6 +41,8 @@ pub use self::gpio::Gpio;
pub use self::hvfgicv3::HvfGicV3;
pub use self::i8042::Error as I8042DeviceError;
pub use self::i8042::I8042Device;
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub use self::ioapic::IoApic;
pub use self::irqchip::{IrqChip, IrqChipDevice, IrqChipT};
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
pub use self::kvmgicv3::KvmGicV3;

@@ -118,7 +118,7 @@ impl PortOutput for PortOutputFd {
VolatileMemoryError::IOError(e) => e,
e => {
log::error!("Unsuported error from write_volatile: {e:?}");
-io::Error::new(ErrorKind::Other, e)
io::Error::other(e)
}
})
}
@@ -173,9 +173,7 @@ impl PortOutputLog {
impl PortOutput for PortOutputLog {
fn write_volatile(&mut self, buf: &VolatileSlice) -> Result<usize, io::Error> {
-self.buf
-.write_volatile(buf)
-.map_err(|e| io::Error::new(ErrorKind::Other, e))?;
self.buf.write_volatile(buf).map_err(io::Error::other)?;
let mut start = 0;
for (i, ch) in self.buf.iter().cloned().enumerate() {

@@ -2,13 +2,13 @@
use crossbeam_channel::Sender;
use std::cmp;
use std::io::Write;
-use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::atomic::{AtomicI32, AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;
-#[cfg(target_os = "macos")]
-use hvf::MemoryMapping;
use utils::eventfd::{EventFd, EFD_NONBLOCK};
#[cfg(target_os = "macos")]
use utils::worker_message::WorkerMessage;
use virtio_bindings::{virtio_config::VIRTIO_F_VERSION_1, virtio_ring::VIRTIO_RING_F_EVENT_IDX};
use vm_memory::{ByteValued, GuestMemoryMmap};
@@ -54,14 +54,16 @@ pub struct Fs {
passthrough_cfg: passthrough::Config,
worker_thread: Option<JoinHandle<()>>,
worker_stopfd: EventFd,
exit_code: Arc<AtomicI32>,
#[cfg(target_os = "macos")]
-map_sender: Option<Sender<MemoryMapping>>,
map_sender: Option<Sender<WorkerMessage>>,
}
impl Fs {
pub(crate) fn with_queues(
fs_id: String,
shared_dir: String,
exit_code: Arc<AtomicI32>,
queues: Vec<VirtQueue>,
) -> super::Result<Fs> {
let mut queue_events = Vec::new();
@@ -97,17 +99,18 @@ impl Fs {
passthrough_cfg: fs_cfg,
worker_thread: None,
worker_stopfd: EventFd::new(EFD_NONBLOCK).map_err(FsError::EventFd)?,
exit_code,
#[cfg(target_os = "macos")]
map_sender: None,
})
}
-pub fn new(fs_id: String, shared_dir: String) -> super::Result<Fs> {
pub fn new(fs_id: String, shared_dir: String, exit_code: Arc<AtomicI32>) -> super::Result<Fs> {
let queues: Vec<VirtQueue> = defs::QUEUE_SIZES
.iter()
.map(|&max_size| VirtQueue::new(max_size))
.collect();
-Self::with_queues(fs_id, shared_dir, queues)
Self::with_queues(fs_id, shared_dir, exit_code, queues)
}
pub fn id(&self) -> &str {
@@ -132,7 +135,7 @@ impl Fs {
}
#[cfg(target_os = "macos")]
-pub fn set_map_sender(&mut self, map_sender: Sender<MemoryMapping>) {
pub fn set_map_sender(&mut self, map_sender: Sender<WorkerMessage>) {
self.map_sender = Some(map_sender);
}
}
@@ -226,6 +229,7 @@ impl VirtioDevice for Fs {
self.shm_region.clone(),
self.passthrough_cfg.clone(),
self.worker_stopfd.try_clone().unwrap(),
self.exit_code.clone(),
#[cfg(target_os = "macos")]
self.map_sender.clone(),
);
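The `Fs` device now carries an `Arc<AtomicI32>` that is cloned into its worker, so a value stored from the worker thread is visible through every other clone. A minimal sketch of that sharing pattern, assuming the worker simply stores the code it receives from the guest; `spawn_worker` and the `i32::MAX` sentinel are hypothetical.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the fs worker: it records the workload's exit code
/// in the shared atomic, as the ioctl handler now receives `exit_code` to do.
fn spawn_worker(exit_code: Arc<AtomicI32>, code_from_guest: i32) -> JoinHandle<()> {
    thread::spawn(move || {
        exit_code.store(code_from_guest, Ordering::SeqCst);
    })
}
```

The VMM keeps its own clone of the `Arc`, so after the worker has run it can read the stored code with `load`.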

@@ -5,7 +5,7 @@
#[cfg(target_os = "macos")]
use crossbeam_channel::Sender;
#[cfg(target_os = "macos")]
-use hvf::MemoryMapping;
use utils::worker_message::WorkerMessage;
use std::collections::BTreeMap;
use std::convert::TryInto;
@@ -13,6 +13,7 @@ use std::ffi::{CStr, CString};
use std::fs::File;
use std::io;
use std::mem;
use std::sync::atomic::AtomicI32;
use std::sync::{Arc, Mutex};
use std::time::Duration;
@@ -1133,7 +1134,7 @@ pub trait FileSystem {
moffset: u64,
host_shm_base: u64,
shm_size: u64,
-#[cfg(target_os = "macos")] map_sender: &Option<Sender<MemoryMapping>>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<WorkerMessage>>,
) -> io::Result<()> {
Err(io::Error::from_raw_os_error(libc::ENOSYS))
}
@ -1144,7 +1145,7 @@ pub trait FileSystem {
requests: Vec<RemovemappingOne>,
host_shm_base: u64,
shm_size: u64,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<MemoryMapping>>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<WorkerMessage>>,
) -> io::Result<()> {
Err(io::Error::from_raw_os_error(libc::ENOSYS))
}
@ -1160,6 +1161,7 @@ pub trait FileSystem {
arg: u64,
in_size: u32,
out_size: u32,
exit_code: &Arc<AtomicI32>,
) -> io::Result<Vec<u8>> {
Err(io::Error::from_raw_os_error(bindings::LINUX_ENOSYS))
}

@ -11,12 +11,12 @@ use std::io;
use std::mem::{self, size_of, MaybeUninit};
use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
use std::str::FromStr;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::atomic::{AtomicBool, AtomicI32, AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::time::Duration;
use caps::{has_cap, CapSet, Capability};
use nix::request_code_read;
use nix::{request_code_none, request_code_read};
use vm_memory::ByteValued;
@ -2030,7 +2030,7 @@ impl FileSystem for PassthroughFs {
0,
)
};
if ret == libc::MAP_FAILED {
if std::ptr::eq(ret, libc::MAP_FAILED) {
return Err(io::Error::last_os_error());
}
@ -2062,7 +2062,7 @@ impl FileSystem for PassthroughFs {
foffset as libc::off_t,
)
};
if ret == libc::MAP_FAILED {
if std::ptr::eq(ret, libc::MAP_FAILED) {
return Err(io::Error::last_os_error());
}
@ -2092,7 +2092,7 @@ impl FileSystem for PassthroughFs {
0_i64,
)
};
if ret == libc::MAP_FAILED {
if std::ptr::eq(ret, libc::MAP_FAILED) {
return Err(io::Error::last_os_error());
}
}
@ -2107,11 +2107,13 @@ impl FileSystem for PassthroughFs {
handle: Self::Handle,
_flags: u32,
cmd: u32,
_arg: u64,
arg: u64,
_in_size: u32,
out_size: u32,
exit_code: &Arc<AtomicI32>,
) -> io::Result<Vec<u8>> {
const VIRTIO_IOC_MAGIC: u8 = b'v';
const VIRTIO_IOC_TYPE_EXPORT_FD: u8 = 1;
const VIRTIO_IOC_EXPORT_FD_SIZE: usize = 2 * mem::size_of::<u64>();
const VIRTIO_IOC_EXPORT_FD_REQ: u32 = request_code_read!(
@ -2120,6 +2122,10 @@ impl FileSystem for PassthroughFs {
VIRTIO_IOC_EXPORT_FD_SIZE
) as u32;
const VIRTIO_IOC_TYPE_EXIT_CODE: u8 = 2;
const VIRTIO_IOC_EXIT_CODE_REQ: u32 =
request_code_none!(VIRTIO_IOC_MAGIC, VIRTIO_IOC_TYPE_EXIT_CODE) as u32;
match cmd {
VIRTIO_IOC_EXPORT_FD_REQ => {
if out_size as usize != VIRTIO_IOC_EXPORT_FD_SIZE {
@ -2150,6 +2156,10 @@ impl FileSystem for PassthroughFs {
ret.extend_from_slice(&handle.to_ne_bytes());
Ok(ret)
}
VIRTIO_IOC_EXIT_CODE_REQ => {
exit_code.store(arg as i32, Ordering::SeqCst);
Ok(Vec::new())
}
_ => Err(io::Error::from_raw_os_error(libc::EOPNOTSUPP)),
}
}
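
The two request codes matched in this `ioctl` follow the Linux `_IOC` encoding. The sketch below recomputes them by hand, assuming the asm-generic bit layout used on x86_64 and aarch64. It also illustrates why the macOS backend hard-codes `0x7602` instead of calling `request_code_none!`, which, as the diff's own comment notes, is system-dependent. The `ioc` helper is a hypothetical re-implementation of the kernel macro, not part of nix or libkrun.

```rust
// Sketch of the Linux ioctl request-number layout (asm-generic):
// nr in bits 0..8, type in 8..16, size in 16..30, direction in 30..32.
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;

const IOC_NONE: u32 = 0; // _IO: no data transfer
const IOC_READ: u32 = 2; // _IOR: data is copied back to the caller

/// Equivalent of the kernel's _IOC(dir, type, nr, size) macro.
const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | ((ty as u32) << IOC_TYPESHIFT)
        | ((nr as u32) << IOC_NRSHIFT)
}

const VIRTIO_IOC_MAGIC: u8 = b'v'; // 0x76
// _IOR('v', 1, 2 * size_of::<u64>())
const VIRTIO_IOC_EXPORT_FD_REQ: u32 = ioc(IOC_READ, VIRTIO_IOC_MAGIC, 1, 16);
// _IO('v', 2)
const VIRTIO_IOC_EXIT_CODE_REQ: u32 = ioc(IOC_NONE, VIRTIO_IOC_MAGIC, 2, 0);

fn main() {
    println!("export fd: {:#x}", VIRTIO_IOC_EXPORT_FD_REQ); // 0x80107601
    println!("exit code: {:#x}", VIRTIO_IOC_EXIT_CODE_REQ); // 0x7602
}
```

Because the guest kernel always issues the Linux encoding, the host must compare against the Linux value regardless of the host OS.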

@ -14,12 +14,12 @@ use std::mem::MaybeUninit;
use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
use std::ptr::null_mut;
use std::str::FromStr;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::atomic::{AtomicBool, AtomicI32, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
use crossbeam_channel::{unbounded, Sender};
use hvf::MemoryMapping;
use utils::worker_message::WorkerMessage;
use crate::virtio::fs::filesystem::SecContext;
@ -1238,7 +1238,15 @@ impl FileSystem for PassthroughFs {
debug!("read: {:?}", inode);
#[cfg(not(feature = "efi"))]
if inode == self.init_inode {
return w.write(&INIT_BINARY[offset as usize..(offset + (size as u64)) as usize]);
let off: usize = offset
.try_into()
.map_err(|_| io::Error::from_raw_os_error(libc::EINVAL))?;
let len = if off + (size as usize) < INIT_BINARY.len() {
size as usize
} else {
INIT_BINARY.len() - off
};
return w.write(&INIT_BINARY[off..(off + len)]);
}
let data = self
@ -1986,7 +1994,7 @@ impl FileSystem for PassthroughFs {
moffset: u64,
guest_shm_base: u64,
shm_size: u64,
map_sender: &Option<Sender<MemoryMapping>>,
map_sender: &Option<Sender<WorkerMessage>>,
) -> io::Result<()> {
if map_sender.is_none() {
return Err(linux_error(io::Error::from_raw_os_error(libc::ENOSYS)));
@ -2035,7 +2043,7 @@ impl FileSystem for PassthroughFs {
let sender = map_sender.as_ref().unwrap();
let (reply_sender, reply_receiver) = unbounded();
sender
.send(MemoryMapping::AddMapping(
.send(WorkerMessage::GpuAddMapping(
reply_sender,
host_addr as u64,
guest_addr,
@ -2062,7 +2070,7 @@ impl FileSystem for PassthroughFs {
requests: Vec<fuse::RemovemappingOne>,
guest_shm_base: u64,
shm_size: u64,
map_sender: &Option<Sender<MemoryMapping>>,
map_sender: &Option<Sender<WorkerMessage>>,
) -> io::Result<()> {
if map_sender.is_none() {
return Err(linux_error(io::Error::from_raw_os_error(libc::ENOSYS)));
@ -2085,7 +2093,7 @@ impl FileSystem for PassthroughFs {
let sender = map_sender.as_ref().unwrap();
let (reply_sender, reply_receiver) = unbounded();
sender
.send(MemoryMapping::RemoveMapping(
.send(WorkerMessage::GpuRemoveMapping(
reply_sender,
guest_addr,
req.len,
@ -2105,4 +2113,29 @@ impl FileSystem for PassthroughFs {
Ok(())
}
fn ioctl(
&self,
_ctx: Context,
_inode: Self::Inode,
_handle: Self::Handle,
_flags: u32,
cmd: u32,
arg: u64,
_in_size: u32,
_out_size: u32,
exit_code: &Arc<AtomicI32>,
) -> io::Result<Vec<u8>> {
// We can't use nix::request_code_none here since it's system-dependent
// and we need the value from Linux.
const VIRTIO_IOC_EXIT_CODE_REQ: u32 = 0x7602;
match cmd {
VIRTIO_IOC_EXIT_CODE_REQ => {
exit_code.store(arg as i32, Ordering::SeqCst);
Ok(Vec::new())
}
_ => Err(io::Error::from_raw_os_error(libc::EOPNOTSUPP)),
}
}
}
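
The new `read` path clamps the length so that a guest read near the end of `INIT_BINARY` no longer slices out of bounds. As written, however, if `off` is larger than `INIT_BINARY.len()`, the `else` branch computes `INIT_BINARY.len() - off`, which underflows (a panic in debug builds). Below is a minimal sketch of a clamp that also covers that case, assuming an offset past the end should simply return zero bytes, as a regular file read would. `bounded_read` is a hypothetical helper, not part of libkrun.

```rust
use std::io;

/// Returns the part of `data` that a read of `size` bytes at `offset`
/// should produce: possibly short at the end, and empty past the end.
fn bounded_read(data: &[u8], offset: u64, size: u32) -> io::Result<&[u8]> {
    // Reject offsets that cannot be represented on this host.
    let off = usize::try_from(offset)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "offset too large"))?;
    // Clamp both ends to the buffer, so no subtraction can underflow.
    let start = off.min(data.len());
    let end = start.saturating_add(size as usize).min(data.len());
    Ok(&data[start..end])
}

fn main() {
    let init = b"#!/bin/init";
    assert_eq!(bounded_read(init, 0, 2).unwrap(), b"#!");
    assert_eq!(bounded_read(init, 8, 100).unwrap(), b"nit");
    assert!(bounded_read(init, 500, 10).unwrap().is_empty());
}
```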

@ -5,14 +5,15 @@
#[cfg(target_os = "macos")]
use crossbeam_channel::Sender;
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use utils::worker_message::WorkerMessage;
use std::convert::TryInto;
use std::ffi::{CStr, CString};
use std::fs::File;
use std::io::{self, Read, Write};
use std::mem::size_of;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::atomic::{AtomicI32, AtomicU64, Ordering};
use std::sync::Arc;
use vm_memory::ByteValued;
@ -83,7 +84,8 @@ impl<F: FileSystem + Sync> Server<F> {
mut r: Reader,
w: Writer,
shm_region: &Option<VirtioShmRegion>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<MemoryMapping>>,
exit_code: &Arc<AtomicI32>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<WorkerMessage>>,
) -> Result<usize> {
let in_header: InHeader = r.read_obj().map_err(Error::DecodeMessage)?;
@ -132,7 +134,7 @@ impl<F: FileSystem + Sync> Server<F> {
x if x == Opcode::Interrupt as u32 => self.interrupt(in_header),
x if x == Opcode::Bmap as u32 => self.bmap(in_header, r, w),
x if x == Opcode::Destroy as u32 => self.destroy(),
x if x == Opcode::Ioctl as u32 => self.ioctl(in_header, r, w),
x if x == Opcode::Ioctl as u32 => self.ioctl(in_header, r, w, exit_code),
x if x == Opcode::Poll as u32 => self.poll(in_header, r, w),
x if x == Opcode::NotifyReply as u32 => self.notify_reply(in_header, r, w),
x if x == Opcode::BatchForget as u32 => self.batch_forget(in_header, r, w),
@ -1166,7 +1168,13 @@ impl<F: FileSystem + Sync> Server<F> {
Ok(0)
}
fn ioctl(&self, in_header: InHeader, mut r: Reader, w: Writer) -> Result<usize> {
fn ioctl(
&self,
in_header: InHeader,
mut r: Reader,
w: Writer,
exit_code: &Arc<AtomicI32>,
) -> Result<usize> {
let IoctlIn {
fh,
flags,
@ -1185,6 +1193,7 @@ impl<F: FileSystem + Sync> Server<F> {
arg,
in_size,
out_size,
exit_code,
) {
Ok(data) => {
let out = IoctlOut {
@ -1332,7 +1341,7 @@ impl<F: FileSystem + Sync> Server<F> {
w: Writer,
host_shm_base: u64,
shm_size: u64,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<MemoryMapping>>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<WorkerMessage>>,
) -> Result<usize> {
let SetupmappingIn {
fh,
@ -1367,7 +1376,7 @@ impl<F: FileSystem + Sync> Server<F> {
w: Writer,
host_shm_base: u64,
shm_size: u64,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<MemoryMapping>>,
#[cfg(target_os = "macos")] map_sender: &Option<Sender<WorkerMessage>>,
) -> Result<usize> {
let RemovemappingIn { count } = r.read_obj().map_err(Error::DecodeMessage)?;

@ -1,10 +1,10 @@
#[cfg(target_os = "macos")]
use crossbeam_channel::Sender;
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use utils::worker_message::WorkerMessage;
use std::os::fd::AsRawFd;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::atomic::{AtomicI32, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
@ -32,8 +32,9 @@ pub struct FsWorker {
shm_region: Option<VirtioShmRegion>,
server: Server<PassthroughFs>,
stop_fd: EventFd,
exit_code: Arc<AtomicI32>,
#[cfg(target_os = "macos")]
map_sender: Option<Sender<MemoryMapping>>,
map_sender: Option<Sender<WorkerMessage>>,
}
impl FsWorker {
@ -49,7 +50,8 @@ impl FsWorker {
shm_region: Option<VirtioShmRegion>,
passthrough_cfg: passthrough::Config,
stop_fd: EventFd,
#[cfg(target_os = "macos")] map_sender: Option<Sender<MemoryMapping>>,
exit_code: Arc<AtomicI32>,
#[cfg(target_os = "macos")] map_sender: Option<Sender<WorkerMessage>>,
) -> Self {
Self {
queues,
@ -63,6 +65,7 @@ impl FsWorker {
shm_region,
server: Server::new(PassthroughFs::new(passthrough_cfg).unwrap()),
stop_fd,
exit_code,
#[cfg(target_os = "macos")]
map_sender,
}
@ -170,6 +173,7 @@ impl FsWorker {
reader,
writer,
&self.shm_region,
&self.exit_code,
#[cfg(target_os = "macos")]
&self.map_sender,
) {
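
The hunks above thread a single `Arc<AtomicI32>` from `Fs` through `FsWorker` and `Server` down to `FileSystem::ioctl`, where `VIRTIO_IOC_EXIT_CODE_REQ` stores the guest's exit status. The sketch below reduces that plumbing to its essentials: one shared atomic, a clone moved into a worker thread, a store there, and a load by the owner afterwards. `handle_exit_code_ioctl` and the initial value `-1` are hypothetical; the diff does not show how libkrun initializes the atomic.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the VIRTIO_IOC_EXIT_CODE_REQ arm of `ioctl`:
/// the guest passes its exit status as the ioctl argument.
fn handle_exit_code_ioctl(arg: u64, exit_code: &Arc<AtomicI32>) {
    // `as i32` keeps only the low 32 bits of the argument.
    exit_code.store(arg as i32, Ordering::SeqCst);
}

fn main() {
    // Created once by the VMM and handed to the device (hypothetical initial value).
    let exit_code = Arc::new(AtomicI32::new(-1));

    // The FS worker thread owns its own clone of the Arc.
    let worker_code = Arc::clone(&exit_code);
    let worker = thread::spawn(move || handle_exit_code_ioctl(42, &worker_code));
    worker.join().unwrap();

    // After the guest shuts down, the VMM reads the value back.
    println!("guest exit code: {}", exit_code.load(Ordering::SeqCst)); // 42
}
```

Passing `&Arc<AtomicI32>` through each layer, rather than a global, keeps the value per-VM when several contexts exist in one process.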

@ -18,7 +18,7 @@ use super::worker::Worker;
use crate::legacy::IrqChip;
use crate::Error as DeviceError;
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use utils::worker_message::WorkerMessage;
// Control queue.
pub(crate) const CTL_INDEX: usize = 0;
@ -49,7 +49,7 @@ pub struct Gpu {
pub(crate) sender: Option<Sender<u64>>,
virgl_flags: u32,
#[cfg(target_os = "macos")]
map_sender: Sender<MemoryMapping>,
map_sender: Sender<WorkerMessage>,
export_table: Option<ExportTable>,
}
@ -57,7 +57,7 @@ impl Gpu {
pub(crate) fn with_queues(
queues: Vec<VirtQueue>,
virgl_flags: u32,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
) -> super::Result<Gpu> {
let mut queue_events = Vec::new();
for _ in 0..queues.len() {
@ -92,7 +92,7 @@ impl Gpu {
pub fn new(
virgl_flags: u32,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
) -> super::Result<Gpu> {
let queues: Vec<VirtQueue> = defs::QUEUE_SIZES
.iter()

@ -8,8 +8,6 @@ use std::sync::{Arc, Mutex};
#[cfg(target_os = "macos")]
use crossbeam_channel::{unbounded, Sender};
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use libc::c_void;
#[cfg(target_os = "macos")]
use rutabaga_gfx::RUTABAGA_MEM_HANDLE_TYPE_APPLE;
@ -28,6 +26,8 @@ use rutabaga_gfx::{
RUTABAGA_MAP_ACCESS_READ, RUTABAGA_MAP_ACCESS_RW, RUTABAGA_MAP_ACCESS_WRITE,
};
use utils::eventfd::EventFd;
#[cfg(target_os = "macos")]
use utils::worker_message::WorkerMessage;
use vm_memory::{GuestAddress, GuestMemory, GuestMemoryMmap, VolatileSlice};
use super::super::Queue as VirtQueue;
@ -107,7 +107,7 @@ pub struct VirtioGpu {
resources: BTreeMap<u32, VirtioGpuResource>,
fence_state: Arc<Mutex<FenceState>>,
#[cfg(target_os = "macos")]
map_sender: Sender<MemoryMapping>,
map_sender: Sender<WorkerMessage>,
}
impl VirtioGpu {
@ -182,7 +182,7 @@ impl VirtioGpu {
intc: Option<IrqChip>,
irq_line: Option<u32>,
virgl_flags: u32,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
export_table: Option<ExportTable>,
) -> Self {
let xdg_runtime_dir = match env::var("XDG_RUNTIME_DIR") {
@ -676,7 +676,7 @@ impl VirtioGpu {
let (reply_sender, reply_receiver) = unbounded();
self.map_sender
.send(MemoryMapping::AddMapping(
.send(WorkerMessage::GpuAddMapping(
reply_sender,
map_ptr,
guest_addr,
@ -756,7 +756,7 @@ impl VirtioGpu {
let (reply_sender, reply_receiver) = unbounded();
self.map_sender
.send(MemoryMapping::RemoveMapping(
.send(WorkerMessage::GpuRemoveMapping(
reply_sender,
guest_addr,
resource.size,
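
The FS and GPU devices now send `WorkerMessage` requests instead of `hvf::MemoryMapping`, each carrying a one-shot reply `Sender<bool>` on which the caller then blocks. The sketch below illustrates that request/reply pattern. It uses `std::sync::mpsc` in place of `crossbeam_channel` so it has no dependencies. The enum, the field order (inferred from the call sites above and the removed `MemoryMapping` variants), and the worker's success check are all assumptions for illustration. The worker replies on failure as well as success, so a requester blocked in `recv()` is never left waiting.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical mirror of the two GPU mapping variants; the real enum lives in
/// `utils::worker_message` and may carry more variants.
enum WorkerMessage {
    /// (reply, host address, guest address, length)
    GpuAddMapping(Sender<bool>, u64, u64, u64),
    /// (reply, guest address, length)
    GpuRemoveMapping(Sender<bool>, u64, u64),
}

fn start_worker(rx: Receiver<WorkerMessage>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // recv() fails once every Sender is dropped, which ends the loop.
        while let Ok(msg) = rx.recv() {
            match msg {
                WorkerMessage::GpuAddMapping(reply, host, _guest, len) => {
                    // Placeholder check standing in for the hypervisor map call.
                    let ok = host != 0 && len > 0;
                    // Always reply, success or not, so the requester never hangs.
                    let _ = reply.send(ok);
                }
                WorkerMessage::GpuRemoveMapping(reply, _guest, len) => {
                    let _ = reply.send(len > 0);
                }
            }
        }
    })
}

/// Requester side: send a message built around a fresh reply channel, then wait.
fn request(tx: &Sender<WorkerMessage>, make: impl FnOnce(Sender<bool>) -> WorkerMessage) -> bool {
    let (reply_tx, reply_rx) = channel();
    if tx.send(make(reply_tx)).is_err() {
        return false; // the worker is gone
    }
    // If the worker dropped the reply sender without answering, treat it as failure.
    reply_rx.recv().unwrap_or(false)
}

fn main() {
    let (tx, rx) = channel();
    let worker = start_worker(rx);
    let added = request(&tx, |r| WorkerMessage::GpuAddMapping(r, 0x1000, 0x8000_0000, 4096));
    let removed = request(&tx, |r| WorkerMessage::GpuRemoveMapping(r, 0x8000_0000, 4096));
    println!("added: {added}, removed: {removed}"); // added: true, removed: true
    drop(tx); // closing the channel stops the worker
    worker.join().unwrap();
}
```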

@ -6,13 +6,13 @@ use std::{result, thread};
use crossbeam_channel::Receiver;
#[cfg(target_os = "macos")]
use crossbeam_channel::Sender;
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use rutabaga_gfx::{
ResourceCreate3D, ResourceCreateBlob, RutabagaFence, Transfer3D,
RUTABAGA_PIPE_BIND_RENDER_TARGET, RUTABAGA_PIPE_TEXTURE_2D,
};
use utils::eventfd::EventFd;
#[cfg(target_os = "macos")]
use utils::worker_message::WorkerMessage;
use vm_memory::{GuestAddress, GuestMemoryMmap};
use super::super::descriptor_utils::{Reader, Writer};
@ -39,7 +39,7 @@ pub struct Worker {
shm_region: VirtioShmRegion,
virgl_flags: u32,
#[cfg(target_os = "macos")]
map_sender: Sender<MemoryMapping>,
map_sender: Sender<WorkerMessage>,
export_table: Option<ExportTable>,
}
@ -55,7 +55,7 @@ impl Worker {
irq_line: Option<u32>,
shm_region: VirtioShmRegion,
virgl_flags: u32,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
export_table: Option<ExportTable>,
) -> Self {
Self {

@ -21,7 +21,7 @@ pub mod console;
pub mod descriptor_utils;
pub mod device;
pub mod file_traits;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
pub mod fs;
#[cfg(feature = "gpu")]
pub mod gpu;
@ -42,7 +42,7 @@ pub use self::balloon::*;
pub use self::block::{Block, CacheType};
pub use self::console::*;
pub use self::device::*;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
pub use self::fs::*;
#[cfg(feature = "gpu")]
pub use self::gpu::*;

@ -1,6 +1,5 @@
use std::{
io::Error as IoError,
io::ErrorKind,
sync::{atomic::AtomicUsize, Arc, Mutex},
};
@ -144,7 +143,7 @@ pub enum Error {
impl From<Error> for IoError {
fn from(e: Error) -> Self {
Self::new(ErrorKind::Other, e)
Self::other(e)
}
}

@ -5,9 +5,8 @@ authors = ["Sergio Lopez <slp@sinrega.org>"]
edition = "2021"
[dependencies]
crossbeam-channel = "0.5"
crossbeam-channel = ">=0.5.15"
libloading = "0.8"
log = "0.4.0"
env_logger = "0.9.0"
arch = { path = "../arch" }
arch = { path = "../arch" }

@ -9,6 +9,9 @@
#[allow(deref_nullptr)]
pub mod bindings;
#[macro_use]
extern crate log;
use bindings::*;
#[cfg(target_arch = "aarch64")]
@ -21,7 +24,6 @@ use std::time::Duration;
#[cfg(all(target_arch = "aarch64", target_os = "macos"))]
use arch::aarch64::sysreg::{sys_reg_name, SYSREG_MASK};
use crossbeam_channel::Sender;
use log::debug;
extern "C" {
@ -86,6 +88,10 @@ const CNTHCTL_EL0PCTEN: u64 = 1 << 0;
// Trap accesses to both virtual and physical counter registers.
const CNTHCTL_EL2_BITS: u64 = CNTHCTL_EL0VCTEN | CNTHCTL_EL0PCTEN;
const AA64PFR0_EL1_EL2EN: u64 = 1 << 8;
const AA64PFR0_EL1_GIC3EN: u64 = 1 << 24;
const AA64PFR1_EL1_SMEMASK: u64 = 3 << 24;
const EC_WFX_TRAP: u64 = 0x1;
const EC_AA64_HVC: u64 = 0x16;
const EC_AA64_SMC: u64 = 0x17;
@ -146,12 +152,6 @@ impl Display for Error {
}
}
/// Messages for requesting memory maps/unmaps.
pub enum MemoryMapping {
AddMapping(Sender<bool>, u64, u64, u64),
RemoveMapping(Sender<bool>, u64, u64),
}
pub enum InterruptType {
Irq,
Fiq,
@ -206,11 +206,27 @@ pub fn vcpu_set_vtimer_mask(vcpuid: u64, masked: bool) -> Result<(), Error> {
}
}
pub struct HvfNestedBindings {
hv_vm_config_get_el2_supported:
libloading::Symbol<'static, unsafe extern "C" fn(*mut bool) -> hv_return_t>,
hv_vm_config_set_el2_enabled:
libloading::Symbol<'static, unsafe extern "C" fn(hv_vm_config_t, bool) -> hv_return_t>,
/// Checks if Nested Virtualization is supported on the current system. Only
/// M3 or newer chips on macOS 15+ will satisfy the requirements.
pub fn check_nested_virt() -> Result<bool, Error> {
type GetEL2Supported =
libloading::Symbol<'static, unsafe extern "C" fn(*mut bool) -> hv_return_t>;
let get_el2_supported: Result<GetEL2Supported, libloading::Error> =
unsafe { HVF.get(b"hv_vm_config_get_el2_supported") };
if get_el2_supported.is_err() {
info!("cannot find hv_vm_config_get_el2_supported symbol");
return Ok(false);
}
let mut el2_supported: bool = false;
let ret = unsafe { (get_el2_supported.unwrap())(&mut el2_supported) };
if ret != HV_SUCCESS {
error!("hv_vm_config_get_el2_supported failed: {:?}", ret);
return Err(Error::NestedCheck);
}
Ok(el2_supported)
}
pub struct HvfVm {}
@ -225,30 +241,16 @@ static HVF: LazyLock<libloading::Library> = LazyLock::new(|| unsafe {
impl HvfVm {
pub fn new(nested_enabled: bool) -> Result<Self, Error> {
let config = unsafe { hv_vm_config_create() };
if nested_enabled {
let bindings = unsafe {
HvfNestedBindings {
hv_vm_config_get_el2_supported: HVF
.get(b"hv_vm_config_get_el2_supported")
.map_err(Error::FindSymbol)?,
hv_vm_config_set_el2_enabled: HVF
.get(b"hv_vm_config_set_el2_enabled")
.map_err(Error::FindSymbol)?,
}
let set_el2_enabled: libloading::Symbol<
'static,
unsafe extern "C" fn(hv_vm_config_t, bool) -> hv_return_t,
> = unsafe {
HVF.get(b"hv_vm_config_set_el2_enabled")
.map_err(Error::FindSymbol)?
};
let mut el2_supported: bool = false;
let ret = unsafe { (bindings.hv_vm_config_get_el2_supported)(&mut el2_supported) };
if ret != HV_SUCCESS {
return Err(Error::NestedCheck);
}
if !el2_supported {
return Err(Error::NestedCheck);
}
let ret = unsafe { (bindings.hv_vm_config_set_el2_enabled)(config, true) };
let ret = unsafe { (set_el2_enabled)(config, true) };
if ret != HV_SUCCESS {
return Err(Error::EnableEL2);
}
@ -401,6 +403,53 @@ impl HvfVcpu<'_> {
if ret != HV_SUCCESS {
return Err(Error::VcpuInitialRegisters);
}
// Enable EL2 and GICv3 in ID_AA64PFR0_EL1
let val: u64 = 0;
let ret = unsafe {
hv_vcpu_get_sys_reg(
self.vcpuid,
hv_sys_reg_t_HV_SYS_REG_ID_AA64PFR0_EL1,
&val as *const _ as *mut _,
)
};
if ret != HV_SUCCESS {
return Err(Error::VcpuInitialRegisters);
}
let ret = unsafe {
hv_vcpu_set_sys_reg(
self.vcpuid,
hv_sys_reg_t_HV_SYS_REG_ID_AA64PFR0_EL1,
val | AA64PFR0_EL1_EL2EN | AA64PFR0_EL1_GIC3EN,
)
};
if ret != HV_SUCCESS {
return Err(Error::VcpuInitialRegisters);
}
// If SME is enabled in ID_AA64PFR1_EL1 in the VM, the guest will
// break after enabling the MMU. Mask it out.
let val: u64 = 0;
let ret = unsafe {
hv_vcpu_get_sys_reg(
self.vcpuid,
hv_sys_reg_t_HV_SYS_REG_ID_AA64PFR1_EL1,
&val as *const _ as *mut _,
)
};
if ret != HV_SUCCESS {
return Err(Error::VcpuInitialRegisters);
}
let ret = unsafe {
hv_vcpu_set_sys_reg(
self.vcpuid,
hv_sys_reg_t_HV_SYS_REG_ID_AA64PFR1_EL1,
val & !AA64PFR1_EL1_SMEMASK,
)
};
if ret != HV_SUCCESS {
return Err(Error::VcpuInitialRegisters);
}
} else {
let ret = unsafe {
hv_vcpu_set_reg(self.vcpuid, hv_reg_t_HV_REG_CPSR, PSTATE_EL1_FAULT_BITS_64)

@ -1,6 +1,6 @@
[package]
name = "libkrun"
version = "1.11.2"
version = "1.13.0"
authors = ["Sergio Lopez <slp@redhat.com>"]
edition = "2021"
build = "build.rs"
@ -14,10 +14,11 @@ efi = [ "blk", "net" ]
gpu = []
snd = []
virgl_resource_map2 = []
nitro = [ "dep:nitro", "dep:nitro-enclaves" ]
[dependencies]
crossbeam-channel = "0.5"
env_logger = "0.9.0"
crossbeam-channel = ">=0.5.15"
env_logger = "0.11"
libc = ">=0.2.39"
libloading = "0.8"
log = "0.4.0"
@ -31,6 +32,13 @@ vmm = { path = "../vmm" }
[target.'cfg(target_os = "macos")'.dependencies]
hvf = { path = "../hvf" }
[target.'cfg(target_os = "linux")'.dependencies]
kvm-bindings = { version = ">=0.11", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.21"
nitro = { path = "../nitro", optional = true }
nitro-enclaves = { version = "0.2.0", optional = true }
vm-memory = ">=0.13"
[lib]
name = "krun"
crate-type = ["cdylib"]

@ -8,11 +8,12 @@ use std::env;
use std::ffi::CStr;
#[cfg(target_os = "linux")]
use std::ffi::CString;
#[cfg(all(target_arch = "x86_64", not(feature = "tee")))]
use std::fs::File;
#[cfg(target_os = "linux")]
use std::os::fd::AsRawFd;
use std::os::fd::RawFd;
use std::os::fd::{FromRawFd, RawFd};
#[cfg(feature = "nitro")]
use std::os::unix::net::UnixStream;
use std::path::PathBuf;
use std::slice;
use std::sync::atomic::{AtomicI32, Ordering};
@ -20,7 +21,6 @@ use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::LazyLock;
use std::sync::Mutex;
#[cfg(target_os = "macos")]
use crossbeam_channel::unbounded;
#[cfg(feature = "blk")]
use devices::virtio::block::ImageType;
@ -28,9 +28,7 @@ use devices::virtio::block::ImageType;
use devices::virtio::net::device::VirtioNetBackend;
#[cfg(feature = "blk")]
use devices::virtio::CacheType;
use env_logger::Env;
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
use env_logger::{Env, Target};
#[cfg(not(feature = "efi"))]
use libc::size_t;
use libc::{c_char, c_int};
@ -54,6 +52,12 @@ use vmm::vmm_config::machine_config::VmConfig;
use vmm::vmm_config::net::NetworkInterfaceConfig;
use vmm::vmm_config::vsock::VsockDeviceConfig;
#[cfg(feature = "nitro")]
use nitro::enclaves::NitroEnclave;
#[cfg(feature = "nitro")]
use nitro_enclaves::launch::StartFlags;
// Value returned on success. We use libc's errors otherwise.
const KRUN_SUCCESS: i32 = 0;
// Maximum number of arguments/environment variables we allow
@ -155,6 +159,10 @@ struct ContextConfig {
console_output: Option<PathBuf>,
vmm_uid: Option<libc::uid_t>,
vmm_gid: Option<libc::gid_t>,
#[cfg(feature = "nitro")]
nitro_image_path: Option<PathBuf>,
#[cfg(feature = "nitro")]
nitro_start_flags: StartFlags,
}
impl ContextConfig {
@ -299,22 +307,143 @@ impl ContextConfig {
fn set_vmm_gid(&mut self, vmm_gid: libc::gid_t) {
self.vmm_gid = Some(vmm_gid);
}
#[cfg(feature = "nitro")]
fn set_nitro_image(&mut self, image_path: PathBuf) {
self.nitro_image_path = Some(image_path);
}
#[cfg(feature = "nitro")]
fn set_nitro_start_flags(&mut self, start_flags: StartFlags) {
self.nitro_start_flags = start_flags;
}
}
#[cfg(feature = "nitro")]
impl TryFrom<ContextConfig> for NitroEnclave {
type Error = i32;
fn try_from(ctx: ContextConfig) -> Result<Self, Self::Error> {
let vm_config = ctx.vmr.vm_config();
let Some(mem_size_mib) = vm_config.mem_size_mib else {
error!("memory size not configured");
return Err(-libc::EINVAL);
};
let Some(vcpus) = vm_config.vcpu_count else {
error!("vCPU count not configured");
return Err(-libc::EINVAL);
};
let Some(image_path) = ctx.nitro_image_path else {
error!("nitro image not configured");
return Err(-libc::EINVAL);
};
let Ok(image) = File::open(&image_path) else {
error!("unable to open {}", image_path.display());
return Err(-libc::EINVAL);
};
let Some(port_map) = ctx.unix_ipc_port_map else {
error!("enclave vsock not configured");
return Err(-libc::EINVAL);
};
if port_map.len() > 1 {
error!("too many nitro vsocks detected (max 1)");
return Err(-libc::EINVAL);
}
let ipc_stream = {
let mut vec = Vec::from_iter(port_map.values());
let Some((path, _)) = vec.pop() else {
error!("enclave vsock path not found");
return Err(-libc::EINVAL);
};
UnixStream::connect(path).unwrap()
};
Ok(Self {
image,
mem_size_mib,
vcpus,
ipc_stream,
start_flags: ctx.nitro_start_flags,
})
}
}
static CTX_MAP: Lazy<Mutex<HashMap<u32, ContextConfig>>> = Lazy::new(|| Mutex::new(HashMap::new()));
static CTX_IDS: AtomicI32 = AtomicI32::new(0);
#[no_mangle]
pub extern "C" fn krun_set_log_level(level: u32) -> i32 {
let log_level = match level {
fn log_level_to_filter_str(level: u32) -> &'static str {
match level {
0 => "off",
1 => "error",
2 => "warn",
3 => "info",
4 => "debug",
_ => "trace",
}
}
#[no_mangle]
pub extern "C" fn krun_set_log_level(level: u32) -> i32 {
let filter = log_level_to_filter_str(level);
env_logger::Builder::from_env(Env::default().default_filter_or(filter)).init();
KRUN_SUCCESS
}
mod log_defs {
pub const KRUN_LOG_STYLE_AUTO: u32 = 0;
pub const KRUN_LOG_STYLE_ALWAYS: u32 = 1;
pub const KRUN_LOG_STYLE_NEVER: u32 = 2;
pub const KRUN_LOG_OPTION_NO_ENV: u32 = 1;
}
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn krun_init_log(target: RawFd, level: u32, style: u32, options: u32) -> i32 {
let target = match target {
..-1 => return -libc::EINVAL,
-1 => Target::default(),
0 /* stdin */ => return -libc::EINVAL,
1 /* stdout */ => Target::Stdout,
2 /* stderr */ => Target::Stderr,
fd => Target::Pipe(Box::new(File::from_raw_fd(fd))),
};
env_logger::Builder::from_env(Env::default().default_filter_or(log_level)).init();
let filter = log_level_to_filter_str(level);
let write_style = match style {
log_defs::KRUN_LOG_STYLE_AUTO => "auto",
log_defs::KRUN_LOG_STYLE_ALWAYS => "always",
log_defs::KRUN_LOG_STYLE_NEVER => "never",
_ => return -libc::EINVAL,
};
let use_env = match options {
0 => true,
log_defs::KRUN_LOG_OPTION_NO_ENV => false,
_ => return -libc::EINVAL,
};
let mut builder = if use_env {
env_logger::Builder::from_env(
Env::new()
.default_filter_or(filter)
.default_write_style_or(write_style),
)
} else {
let mut builder = env_logger::Builder::new();
builder.parse_filters(filter).parse_write_style(write_style);
builder
};
builder.target(target).init();
KRUN_SUCCESS
}
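
`krun_init_log` validates three integer arguments before building the `env_logger` builder: the target fd (`-1` for the default, `0` rejected, `1`/`2` for stdout/stderr, anything else treated as a pipe), a level mapped through `log_level_to_filter_str`, and the style and option constants from `log_defs`. The sketch below pulls those checks into a pure, hypothetical `parse_log_options` function so that the accepted and rejected inputs are easy to see and test. It leaves `env_logger` out entirely, and it writes the `..-1` arm as the equivalent `i32::MIN..=-2`.

```rust
// Same numeric values as the `log_defs` module in the diff.
const KRUN_LOG_STYLE_AUTO: u32 = 0;
const KRUN_LOG_STYLE_ALWAYS: u32 = 1;
const KRUN_LOG_STYLE_NEVER: u32 = 2;
const KRUN_LOG_OPTION_NO_ENV: u32 = 1;
// EINVAL is 22 on both Linux and macOS; hard-coded here to avoid a libc dependency.
const EINVAL: i32 = 22;

#[derive(Debug, PartialEq)]
enum LogTarget {
    Default,
    Stdout,
    Stderr,
    Fd(i32),
}

#[derive(Debug, PartialEq)]
struct LogOptions {
    target: LogTarget,
    filter: &'static str,
    write_style: &'static str,
    use_env: bool,
}

/// Hypothetical pure version of the argument checks in krun_init_log.
fn parse_log_options(fd: i32, level: u32, style: u32, options: u32) -> Result<LogOptions, i32> {
    let target = match fd {
        i32::MIN..=-2 => return Err(-EINVAL),
        -1 => LogTarget::Default,
        0 => return Err(-EINVAL), // stdin is not a valid log sink
        1 => LogTarget::Stdout,
        2 => LogTarget::Stderr,
        fd => LogTarget::Fd(fd),
    };
    let filter = match level {
        0 => "off",
        1 => "error",
        2 => "warn",
        3 => "info",
        4 => "debug",
        _ => "trace",
    };
    let write_style = match style {
        KRUN_LOG_STYLE_AUTO => "auto",
        KRUN_LOG_STYLE_ALWAYS => "always",
        KRUN_LOG_STYLE_NEVER => "never",
        _ => return Err(-EINVAL),
    };
    let use_env = match options {
        0 => true,
        KRUN_LOG_OPTION_NO_ENV => false,
        _ => return Err(-EINVAL),
    };
    Ok(LogOptions { target, filter, write_style, use_env })
}

fn main() {
    println!("{:?}", parse_log_options(-1, 3, KRUN_LOG_STYLE_AUTO, 0));
    println!("{:?}", parse_log_options(0, 3, KRUN_LOG_STYLE_AUTO, 0)); // Err(-22)
}
```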
@ -933,6 +1062,11 @@ pub unsafe extern "C" fn krun_add_vsock_port2(
c_filepath: *const c_char,
listen: bool,
) -> i32 {
#[cfg(feature = "nitro")]
if listen {
return -libc::EINVAL;
}
let filepath = match CStr::from_ptr(c_filepath).to_str() {
Ok(f) => PathBuf::from(f.to_string()),
Err(_) => return -libc::EINVAL,
@ -1062,6 +1196,35 @@ pub unsafe extern "C" fn krun_set_nested_virt(ctx_id: u32, enabled: bool) -> i32
}
}
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn krun_check_nested_virt() -> i32 {
#[cfg(target_os = "macos")]
match hvf::check_nested_virt() {
Ok(supp) => supp as i32,
Err(_) => -libc::EINVAL,
}
#[cfg(not(target_os = "macos"))]
-libc::EOPNOTSUPP
}
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub extern "C" fn krun_split_irqchip(ctx_id: u32, enable: bool) -> i32 {
if enable && !cfg!(target_arch = "x86_64") {
return -libc::EINVAL;
}
match CTX_MAP.lock().unwrap().entry(ctx_id) {
Entry::Occupied(mut ctx_cfg) => {
let cfg = ctx_cfg.get_mut();
cfg.vmr.split_irqchip = enable;
KRUN_SUCCESS
}
Entry::Vacant(_) => -libc::ENOENT,
}
}
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn krun_set_smbios_oem_strings(
@ -1131,7 +1294,7 @@ fn map_kernel(ctx_id: u32, kernel_path: &PathBuf) -> i32 {
0_i64,
)
};
if kernel_host_addr == libc::MAP_FAILED {
if std::ptr::eq(kernel_host_addr, libc::MAP_FAILED) {
error!("Can't load kernel into process map");
return -libc::EINVAL;
}
@ -1320,7 +1483,52 @@ pub extern "C" fn krun_setgid(ctx_id: u32, gid: libc::gid_t) -> i32 {
KRUN_SUCCESS
}
#[cfg(feature = "nitro")]
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn krun_nitro_set_image(ctx_id: u32, c_image_filepath: *const c_char) -> i32 {
let filepath = match CStr::from_ptr(c_image_filepath).to_str() {
Ok(f) => PathBuf::from(f.to_string()),
Err(_) => return -libc::EINVAL,
};
match CTX_MAP.lock().unwrap().entry(ctx_id) {
Entry::Occupied(mut ctx_cfg) => {
let cfg = ctx_cfg.get_mut();
cfg.set_nitro_image(filepath);
}
Entry::Vacant(_) => return -libc::ENOENT,
}
KRUN_SUCCESS
}
#[cfg(feature = "nitro")]
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn krun_nitro_set_start_flags(ctx_id: u32, start_flags: u64) -> i32 {
let mut flags = StartFlags::empty();
// Only debug mode is supported at the moment. To avoid doing conversion and
// checking if the "start_flags" argument is valid, set the flags to debug mode
// if the "start_flags" argument is greater than zero.
if start_flags > 0 {
flags |= StartFlags::DEBUG;
}
match CTX_MAP.lock().unwrap().entry(ctx_id) {
Entry::Occupied(mut ctx_cfg) => {
let cfg = ctx_cfg.get_mut();
cfg.set_nitro_start_flags(flags);
}
Entry::Vacant(_) => return -libc::ENOENT,
}
KRUN_SUCCESS
}
#[no_mangle]
#[allow(unreachable_code)]
pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
#[cfg(target_os = "linux")]
{
@ -1331,6 +1539,9 @@ pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
unsafe { libc::prctl(libc::PR_SET_NAME, prname.as_ptr()) };
}
#[cfg(feature = "nitro")]
return krun_start_enter_nitro(ctx_id);
let mut event_manager = match EventManager::new() {
Ok(em) => em,
Err(e) => {
@ -1465,14 +1676,12 @@ pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
}
}
#[cfg(target_os = "macos")]
let (sender, receiver) = unbounded();
let (sender, _receiver) = unbounded();
let _vmm = match vmm::builder::build_microvm(
&ctx_cfg.vmr,
&mut event_manager,
ctx_cfg.shutdown_efd,
#[cfg(target_os = "macos")]
sender,
) {
Ok(vmm) => vmm,
@ -1482,29 +1691,19 @@ pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
}
};
#[cfg(target_os = "macos")]
let mapper_vmm = _vmm.clone();
#[cfg(target_os = "macos")]
if ctx_cfg.gpu_virgl_flags.is_some() {
std::thread::Builder::new()
.name("mapping worker".into())
.spawn(move || loop {
match receiver.recv() {
Err(e) => error!("Error in receiver: {:?}", e),
Ok(m) => match m {
MemoryMapping::AddMapping(s, h, g, l) => {
mapper_vmm.lock().unwrap().add_mapping(s, h, g, l)
}
MemoryMapping::RemoveMapping(s, g, l) => {
mapper_vmm.lock().unwrap().remove_mapping(s, g, l)
}
},
}
})
.unwrap();
vmm::worker::start_worker_thread(_vmm.clone(), _receiver).unwrap();
}
#[cfg(target_arch = "x86_64")]
if ctx_cfg.vmr.split_irqchip {
vmm::worker::start_worker_thread(_vmm.clone(), _receiver.clone()).unwrap();
}
#[cfg(feature = "amd-sev")]
vmm::worker::start_worker_thread(_vmm.clone(), _receiver.clone()).unwrap();
loop {
match event_manager.run() {
Ok(_) => {}
@ -1515,3 +1714,23 @@ pub extern "C" fn krun_start_enter(ctx_id: u32) -> i32 {
}
}
}
#[cfg(feature = "nitro")]
#[no_mangle]
fn krun_start_enter_nitro(ctx_id: u32) -> i32 {
let ctx_cfg = match CTX_MAP.lock().unwrap().remove(&ctx_id) {
Some(ctx_cfg) => ctx_cfg,
None => return -libc::ENOENT,
};
let Ok(mut enclave) = NitroEnclave::try_from(ctx_cfg) else {
return -libc::EINVAL;
};
if let Err(e) = enclave.run() {
error!("Error running nitro enclave: {e}");
return -libc::EINVAL;
}
KRUN_SUCCESS
}

src/nitro/Cargo.toml

@ -0,0 +1,15 @@
[package]
name = "nitro"
version = "0.1.0"
edition = "2021"
[features]
nitro = []
[dependencies]
libc = "0.2.171"
nix = { version = "0.26.0", features = ["ioctl", "poll"] }
vsock = "0.5.1"
[target.'cfg(target_os = "linux")'.dependencies]
nitro-enclaves = "0.2.0"

src/nitro/src/enclaves.rs

@ -0,0 +1,171 @@
// SPDX-License-Identifier: Apache-2.0
use super::error::NitroError;
use nitro_enclaves::{
launch::{ImageType, Launcher, MemoryInfo, PollTimeout, StartFlags},
Device,
};
use nix::{
poll::{poll, PollFd, PollFlags},
sys::{
socket::{connect, socket, AddressFamily, SockFlag, SockType, VsockAddr as NixVsockAddr},
time::{TimeVal, TimeValLike},
},
unistd::read,
};
use std::{
fs::File,
io::{Read, Write},
os::{
fd::{AsRawFd, RawFd},
unix::net::UnixStream,
},
};
use vsock::{VsockAddr, VsockListener};
type Result<T> = std::result::Result<T, NitroError>;
const ENCLAVE_READY_VSOCK_PORT: u32 = 9000;
const CID_TO_CONSOLE_PORT_OFFSET: u32 = 10000;
const VMADDR_CID_PARENT: u32 = 3;
const VMADDR_CID_HYPERVISOR: u32 = 0;
const SO_VM_SOCKETS_CONNECT_TIMEOUT: i32 = 6;
const HEART_BEAT: u8 = 0xb7;
/// Nitro Enclave data.
pub struct NitroEnclave {
/// Enclave image.
pub image: File,
/// Amount of RAM (in MiB).
pub mem_size_mib: usize,
/// Number of vCPUs.
pub vcpus: u8,
/// Stream connected to the configured vsock port path; enclave debug output is forwarded here.
pub ipc_stream: UnixStream,
/// Enclave start flags.
pub start_flags: StartFlags,
}
impl NitroEnclave {
/// Run the enclave.
pub fn run(&mut self) -> Result<()> {
let device = Device::open().map_err(NitroError::DeviceOpen)?;
let mut launcher = Launcher::new(&device).map_err(NitroError::VmCreate)?;
let mem = MemoryInfo::new(ImageType::Eif(&mut self.image), self.mem_size_mib);
launcher.set_memory(mem).map_err(NitroError::VmMemorySet)?;
for _ in 0..self.vcpus {
launcher.add_vcpu(None).map_err(NitroError::VcpuAdd)?;
}
let sockaddr = VsockAddr::new(VMADDR_CID_PARENT, ENCLAVE_READY_VSOCK_PORT);
let listener = VsockListener::bind(&sockaddr).map_err(NitroError::HeartbeatBind)?;
let cid = launcher
.start(self.start_flags, None)
.map_err(NitroError::VmStart)?;
// Vsock CIDs are 32-bit values, so this conversion is not expected to fail.
let cid: u32 = cid.try_into().unwrap();
let poll_timeout = PollTimeout::try_from((&self.image, self.mem_size_mib << 20))
.map_err(NitroError::PollTimeoutCalculate)?;
enclave_check(listener, poll_timeout.into(), cid)?;
self.listen(VMADDR_CID_HYPERVISOR, cid + CID_TO_CONSOLE_PORT_OFFSET)?;
Ok(())
}
fn listen(&mut self, cid: u32, port: u32) -> Result<()> {
let socket_fd = socket(
AddressFamily::Vsock,
SockType::Stream,
SockFlag::empty(),
None,
)
.map_err(|_| NitroError::VsockCreate)?;
let sockaddr = NixVsockAddr::new(cid, port);
vsock_timeout(socket_fd)?;
connect(socket_fd, &sockaddr).map_err(|_| NitroError::VsockConnect)?;
let mut buf = [0u8; 512];
loop {
// Read debug output from vsock.
if let Ok(sz) = read(socket_fd, &mut buf) {
// If there is enclave debug output read, write it to the IPC socket.
if sz > 0 {
self.ipc_stream
.write_all(&buf[..sz])
.map_err(NitroError::IpcWrite)?;
continue;
}
}
break;
}
Ok(())
}
}
fn enclave_check(listener: VsockListener, poll_timeout_ms: libc::c_int, cid: u32) -> Result<()> {
let mut poll_fds = [PollFd::new(listener.as_raw_fd(), PollFlags::POLLIN)];
let result = poll(&mut poll_fds, poll_timeout_ms);
if result == Ok(0) {
return Err(NitroError::PollNoSelectedEvents);
} else if result != Ok(1) {
return Err(NitroError::PollMoreThanOneSelectedEvent);
}
let mut stream = listener.accept().map_err(NitroError::HeartbeatAccept)?;
let mut buf = [0u8];
let bytes = stream.0.read(&mut buf).map_err(NitroError::HeartbeatRead)?;
if bytes != 1 || buf[0] != HEART_BEAT {
return Err(NitroError::EnclaveHeartbeatNotDetected);
}
stream
.0
.write_all(&buf)
.map_err(NitroError::HeartbeatWrite)?;
if stream.1.cid() != cid {
return Err(NitroError::HeartbeatCidMismatch);
}
Ok(())
}
fn vsock_timeout(socket_fd: RawFd) -> Result<()> {
// Set the timeout to 20 seconds.
let timeval = TimeVal::milliseconds(20000);
let ret = unsafe {
libc::setsockopt(
socket_fd,
libc::AF_VSOCK,
SO_VM_SOCKETS_CONNECT_TIMEOUT,
&timeval as *const _ as *const libc::c_void,
size_of::<TimeVal>() as u32,
)
};
if ret != 0 {
return Err(NitroError::VsockSetTimeout);
}
Ok(())
}
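The heartbeat handshake in `enclave_check` and the console-forwarding loop in `listen` do not depend on vsock itself, so their logic can be sketched over any `Read + Write` stream. This is a minimal illustration under stated assumptions. `check_heartbeat`, `forward_output` and `MockStream` are hypothetical names. The `poll` timeout and the CID comparison are omitted, and an in-memory stream stands in for the accepted vsock stream.

```rust
use std::io::{self, Read, Write};

/// Heartbeat byte, mirroring `HEART_BEAT` in enclaves.rs.
const HEART_BEAT: u8 = 0xb7;

/// Reads one byte, checks that it is the heartbeat value and echoes it back.
/// Returns Ok(false) if the byte is missing or has the wrong value.
fn check_heartbeat<S: Read + Write>(stream: &mut S) -> io::Result<bool> {
    let mut buf = [0u8; 1];
    let n = stream.read(&mut buf)?;
    if n != 1 || buf[0] != HEART_BEAT {
        return Ok(false);
    }
    stream.write_all(&buf)?;
    Ok(true)
}

/// Copies `src` to `dst` in 512-byte chunks until a zero-length read or a
/// read error, like the loop in `listen`. Returns the bytes forwarded.
fn forward_output<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<usize> {
    let mut buf = [0u8; 512];
    let mut total = 0;
    loop {
        match src.read(&mut buf) {
            Ok(0) | Err(_) => break,
            Ok(n) => {
                dst.write_all(&buf[..n])?;
                total += n;
            }
        }
    }
    Ok(total)
}

/// Hypothetical in-memory stand-in for the accepted vsock stream:
/// reads come from `input`, writes are collected in `output`.
struct MockStream {
    input: io::Cursor<Vec<u8>>,
    output: Vec<u8>,
}

impl Read for MockStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.input.read(buf)
    }
}

impl Write for MockStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.output.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

A stream that sends `0xb7` gets the same byte echoed back, while any other first byte is rejected without a reply.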

src/nitro/src/error.rs

@ -0,0 +1,71 @@
// SPDX-License-Identifier: Apache-2.0
use nitro_enclaves::launch::LaunchError;
use std::{fmt, io};
#[derive(Debug)]
pub enum NitroError {
DeviceOpen(io::Error),
VmCreate(LaunchError),
VmMemorySet(LaunchError),
VcpuAdd(LaunchError),
HeartbeatAccept(io::Error),
HeartbeatBind(io::Error),
HeartbeatRead(io::Error),
HeartbeatWrite(io::Error),
VmStart(LaunchError),
PollTimeoutCalculate(LaunchError),
PollNoSelectedEvents,
PollMoreThanOneSelectedEvent,
EnclaveHeartbeatNotDetected,
HeartbeatCidMismatch,
VsockCreate,
VsockSetTimeout,
VsockConnect,
IpcWrite(io::Error),
}
impl fmt::Display for NitroError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let msg = match self {
NitroError::DeviceOpen(e) => format!("unable to open nitro enclaves device: {e}"),
NitroError::VmCreate(e) => format!("unable to create enclave VM: {e}"),
NitroError::VmMemorySet(e) => format!("unable to set enclave memory regions: {e}"),
NitroError::VcpuAdd(e) => format!("unable to add vCPU to enclave: {e}"),
NitroError::HeartbeatAccept(e) => {
format!("unable to accept enclave heartbeat vsock: {e}")
}
NitroError::HeartbeatBind(e) => {
format!("unable to bind to enclave heartbeat vsock: {e}")
}
NitroError::HeartbeatRead(e) => format!("unable to read enclave heartbeat vsock: {e}"),
NitroError::HeartbeatWrite(e) => {
format!("unable to write to enclave heartbeat vsock: {e}")
}
NitroError::VmStart(e) => format!("unable to start enclave: {e}"),
NitroError::PollTimeoutCalculate(e) => {
format!("unable to calculate vsock poll timeout: {e}")
}
NitroError::PollNoSelectedEvents => {
"no selected poll fds for heartbeat vsock found".to_string()
}
NitroError::PollMoreThanOneSelectedEvent => {
"more than one selected pollfd for heartbeat vsock found".to_string()
}
NitroError::EnclaveHeartbeatNotDetected => {
"enclave heartbeat message not detected".to_string()
}
NitroError::HeartbeatCidMismatch => "enclave heartbeat vsock CID mismatch".to_string(),
NitroError::VsockCreate => "unable to create enclave vsock".to_string(),
NitroError::VsockSetTimeout => {
"unable to set connect timeout for enclave vsock".to_string()
}
NitroError::VsockConnect => "unable to connect to enclave vsock".to_string(),
NitroError::IpcWrite(e) => {
format!("unable to write enclave vsock data to UNIX IPC socket: {e}")
}
};
write!(f, "{}", msg)
}
}

src/nitro/src/lib.rs

@ -0,0 +1,5 @@
#[cfg(feature = "nitro")]
pub mod enclaves;
#[cfg(feature = "nitro")]
mod error;

@ -26,6 +26,7 @@ remain = "0.2"
thiserror = "1.0.23"
zerocopy = "0.6"
log = "0.4"
vmm-sys-util = ">=0.14"
[target.'cfg(unix)'.dependencies]
nix = "0.26.1"

@ -14,6 +14,7 @@ use nix::sys::memfd::MemFdCreateFlag;
use nix::unistd::ftruncate;
use nix::unistd::sysconf;
use nix::unistd::SysconfVar;
use vmm_sys_util::align_upwards;
use crate::rutabaga_os::descriptor::AsRawDescriptor;
use crate::rutabaga_os::descriptor::IntoRawDescriptor;
@ -72,8 +73,7 @@ impl IntoRawDescriptor for SharedMemory {
pub fn round_up_to_page_size(v: u64) -> RutabagaResult<u64> {
let page_size_opt = sysconf(SysconfVar::PAGE_SIZE)?;
if let Some(page_size) = page_size_opt {
let page_mask = (page_size - 1) as u64;
let aligned_size = (v + page_mask) & !page_mask;
let aligned_size = align_upwards!(v, page_size as u64);
Ok(aligned_size)
} else {
Err(RutabagaError::SpecViolation("no page size"))
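The removed lines in `round_up_to_page_size` computed `(v + page_mask) & !page_mask`, and the replacement delegates to `align_upwards!` from `vmm-sys-util`. For a power-of-two page size the two are expected to agree. This sketch reproduces the mask arithmetic in plain Rust so it can be checked in isolation. `align_up` is a hypothetical helper, not part of `vmm-sys-util`, and it adds overflow and power-of-two checks that the original expression does not perform.

```rust
/// Rounds `v` up to the next multiple of `align` with the mask trick
/// `(v + (align - 1)) & !(align - 1)` used by the removed lines.
/// Returns None if `align` is not a power of two (the trick is only valid
/// then) or if the addition would overflow.
fn align_up(v: u64, align: u64) -> Option<u64> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    v.checked_add(mask).map(|x| x & !mask)
}
```

On Rust 1.73 or later, `u64::next_multiple_of` yields the same result for power-of-two alignments without the mask.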

@ -6,7 +6,10 @@ edition = "2021"
[dependencies]
bitflags = "1.2.0"
env_logger = "0.9.0"
libc = ">=0.2.85"
log = "0.4.0"
vmm-sys-util = ">=0.11"
vmm-sys-util = ">= 0.14"
crossbeam-channel = ">=0.5.15"
[target.'cfg(target_os = "linux")'.dependencies]
kvm-bindings = { version = ">=0.10", features = ["fam-wrappers"] }

@ -19,6 +19,8 @@ pub use macos::eventfd;
pub mod rand;
#[cfg(target_os = "linux")]
pub mod signal;
pub mod sized_vec;
pub mod sm;
pub mod syscall;
pub mod time;
pub mod worker_message;

@ -0,0 +1,41 @@
// Copyright © 2024 Institute of Software, CAS. All rights reserved.
//
// Copyright © 2019 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0 OR BSD-3-Clause
//
// Copyright © 2020, Microsoft Corporation
//
// Copyright 2018-2019 CrowdStrike, Inc.
//
//
// Returns a `Vec<T>` with a size in bytes at least as large as `size_in_bytes`.
fn vec_with_size_in_bytes<T: Default>(size_in_bytes: usize) -> Vec<T> {
let rounded_size = size_in_bytes.div_ceil(size_of::<T>());
let mut v = Vec::with_capacity(rounded_size);
v.resize_with(rounded_size, T::default);
v
}
// The kvm API has many structs that resemble the following `Foo` structure:
//
// ```
// #[repr(C)]
// struct Foo {
// some_data: u32,
// entries: __IncompleteArrayField<__u32>,
// }
// ```
//
// In order to allocate such a structure, `size_of::<Foo>()` would be too small because it would not
// include any space for `entries`. To make the allocation large enough while still being aligned
// for `Foo`, a `Vec<Foo>` is created. Only the first element of `Vec<Foo>` would actually be used
// as a `Foo`. The remaining memory in the `Vec<Foo>` is for `entries`, which must be contiguous
// with `Foo`. This function is used to make the `Vec<Foo>` with enough space for `count` entries.
use std::mem::size_of;
pub fn vec_with_array_field<T: Default, F>(count: usize) -> Vec<T> {
let element_space = count * size_of::<F>();
let vec_size_bytes = size_of::<T>() + element_space;
vec_with_size_in_bytes(vec_size_bytes)
}
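To make the sizing rule of `vec_with_array_field` concrete: with a hypothetical 8-byte header type and three trailing `u32` entries, the function needs 8 + 3 × 4 = 20 bytes, and `div_ceil(20, 8)` gives 3 header-sized elements (24 bytes). The sketch copies the two functions from the diff and adds an illustrative `Header` struct. `Header` is an assumption standing in for a real kvm structure with a flexible array member.

```rust
use std::mem::size_of;

// Copied from sized_vec: a Vec<T> spanning at least `size_in_bytes` bytes.
fn vec_with_size_in_bytes<T: Default>(size_in_bytes: usize) -> Vec<T> {
    let rounded_size = size_in_bytes.div_ceil(size_of::<T>());
    let mut v = Vec::with_capacity(rounded_size);
    v.resize_with(rounded_size, T::default);
    v
}

// Copied from sized_vec: room for one T plus `count` trailing F entries.
fn vec_with_array_field<T: Default, F>(count: usize) -> Vec<T> {
    let element_space = count * size_of::<F>();
    let vec_size_bytes = size_of::<T>() + element_space;
    vec_with_size_in_bytes(vec_size_bytes)
}

/// Hypothetical 8-byte header standing in for a kvm FAM struct.
#[repr(C)]
#[derive(Default)]
#[allow(dead_code)]
struct Header {
    nent: u32,
    padding: u32,
}
```

Only `v[0]` would be used as the header. The remaining elements merely reserve contiguous, correctly aligned space for the trailing entries.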

@ -0,0 +1,22 @@
#[derive(Debug)]
pub struct MemoryProperties {
pub gpa: u64,
pub size: u64,
pub private: bool,
}
#[derive(Debug)]
pub enum WorkerMessage {
#[cfg(target_arch = "x86_64")]
GsiRoute(
crossbeam_channel::Sender<bool>,
Vec<kvm_bindings::kvm_irq_routing_entry>,
),
#[cfg(target_arch = "x86_64")]
IrqLine(crossbeam_channel::Sender<bool>, u32, bool),
#[cfg(target_os = "macos")]
GpuAddMapping(crossbeam_channel::Sender<bool>, u64, u64, u64),
#[cfg(target_os = "macos")]
GpuRemoveMapping(crossbeam_channel::Sender<bool>, u64, u64),
ConvertMemory(crossbeam_channel::Sender<bool>, MemoryProperties),
}
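`ConvertMemory` carries a `Sender<bool>` so the worker thread can report whether a private/shared memory conversion succeeded. According to the "vmm: fix worker thread panic" commit, a failed conversion should be reported back over that channel instead of panicking, so the requester is not left waiting indefinitely. The sketch illustrates that request/reply pattern under several assumptions. It uses `std::sync::mpsc` in place of `crossbeam_channel` and keeps only the `ConvertMemory` variant. `convert`, `spawn_worker` and `request_conversion` are hypothetical, and the fake `convert` simply rejects sizes that are not 4 KiB multiples.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

#[derive(Debug)]
struct MemoryProperties {
    gpa: u64,
    size: u64,
    private: bool,
}

enum WorkerMessage {
    ConvertMemory(Sender<bool>, MemoryProperties),
}

/// Hypothetical conversion step: rejects sizes that are not 4 KiB multiples.
fn convert(props: &MemoryProperties) -> Result<(), String> {
    if props.size % 4096 != 0 {
        let target = if props.private { "private" } else { "shared" };
        return Err(format!("cannot convert {:#x}+{:#x} to {target}", props.gpa, props.size));
    }
    Ok(())
}

fn spawn_worker() -> Sender<WorkerMessage> {
    let (tx, rx) = channel::<WorkerMessage>();
    thread::spawn(move || {
        for msg in rx {
            match msg {
                WorkerMessage::ConvertMemory(reply, props) => {
                    let ok = match convert(&props) {
                        Ok(()) => true,
                        Err(e) => {
                            // Log the failure instead of panicking.
                            eprintln!("memory conversion failed: {e}");
                            false
                        }
                    };
                    // Always reply so the requester is never left blocked on recv().
                    let _ = reply.send(ok);
                }
            }
        }
    });
    tx
}

fn request_conversion(worker: &Sender<WorkerMessage>, props: MemoryProperties) -> bool {
    let (reply_tx, reply_rx) = channel();
    if worker.send(WorkerMessage::ConvertMemory(reply_tx, props)).is_err() {
        return false;
    }
    // recv() returns Err if the reply sender is dropped without an answer.
    reply_rx.recv().unwrap_or(false)
}
```

The requester receives `false` on failure and can then decide to stop the VM, rather than hanging on a reply that never arrives.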

@ -12,15 +12,16 @@ blk = []
efi = [ "blk", "net" ]
gpu = []
snd = []
nitro = []
[dependencies]
crossbeam-channel = "0.5"
env_logger = "0.9.0"
crossbeam-channel = ">=0.5.15"
flate2 = "1.0.35"
libc = ">=0.2.39"
linux-loader = { version = "0.13.0", features = ["bzimage", "elf", "pe"] }
log = "0.4.0"
vm-memory = { version = ">=0.13", features = ["backend-mmap"] }
vmm-sys-util = ">=0.14"
arch = { path = "../arch" }
devices = { path = "../devices" }
@ -30,12 +31,12 @@ polly = { path = "../polly" }
# Dependencies for amd-sev
codicon = { version = "3.0.0", optional = true }
kbs-types = { version = "0.8.0", features = ["tee-sev", "tee-snp"], optional = true }
kbs-types = { version = "0.11.0", features = ["tee-snp"], optional = true }
procfs = { version = "0.12", optional = true }
rdrand = { version = "^0.8", optional = true }
serde = { version = "1.0.125", optional = true }
serde_json = { version = "1.0.64", optional = true }
sev = { version = "4.0.0", features = ["openssl"], optional = true }
sev = { version = "6.0.0", features = ["openssl"], optional = true }
curl = { version = "0.4", optional = true }
nix = "0.24.1"
@ -45,11 +46,8 @@ cpuid = { path = "../cpuid" }
zstd = "0.13"
[target.'cfg(target_os = "linux")'.dependencies]
kvm-bindings = { version = ">=0.10", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.17"
kvm-bindings = { version = ">=0.11", features = ["fam-wrappers"] }
kvm-ioctls = ">=0.21"
[target.'cfg(target_os = "macos")'.dependencies]
hvf = { path = "../hvf" }
[dev-dependencies]
vmm-sys-util = ">=0.11"

@ -4,7 +4,8 @@
//! Enables pre-boot setup, instantiation and booting of a Firecracker VMM.
#[cfg(target_os = "macos")]
use crossbeam_channel::{unbounded, Sender};
use crossbeam_channel::unbounded;
use crossbeam_channel::Sender;
use kernel::cmdline::Cmdline;
#[cfg(target_os = "macos")]
use std::collections::HashMap;
@ -14,6 +15,7 @@ use std::io::{self, Read};
#[cfg(target_os = "linux")]
use std::os::fd::AsRawFd;
use std::path::PathBuf;
use std::sync::atomic::AtomicI32;
use std::sync::{Arc, Mutex};
use super::{Error, Vmm};
@ -32,12 +34,12 @@ use devices::legacy::Serial;
use devices::legacy::VcpuList;
#[cfg(target_os = "macos")]
use devices::legacy::{GicV3, HvfGicV3};
#[cfg(target_arch = "x86_64")]
use devices::legacy::{IoApic, IrqChipT};
use devices::legacy::{IrqChip, IrqChipDevice};
#[cfg(feature = "net")]
use devices::virtio::Net;
use devices::virtio::{port_io, MmioTransport, PortDescription, Vsock};
#[cfg(target_os = "macos")]
use hvf::MemoryMapping;
#[cfg(feature = "tee")]
use kbs_types::Tee;
@ -53,7 +55,7 @@ use crate::terminal::term_set_raw_mode;
#[cfg(feature = "blk")]
use crate::vmm_config::block::BlockBuilder;
use crate::vmm_config::boot_source::DEFAULT_KERNEL_CMDLINE;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
use crate::vmm_config::fs::FsDeviceConfig;
#[cfg(target_os = "linux")]
use crate::vstate::KvmContext;
@ -62,7 +64,7 @@ use crate::vstate::MeasuredRegion;
use crate::vstate::{Error as VstateError, Vcpu, VcpuConfig, Vm};
use arch::{ArchMemoryInfo, InitrdConfig};
use device_manager::shm::ShmManager;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
use devices::virtio::{fs::ExportTable, VirtioShmRegion};
use flate2::read::GzDecoder;
#[cfg(feature = "tee")]
@ -73,14 +75,17 @@ use linux_loader::loader::{self, KernelLoader};
use nix::unistd::isatty;
use polly::event_manager::{Error as EventManagerError, EventManager};
use utils::eventfd::EventFd;
use utils::worker_message::WorkerMessage;
#[cfg(all(target_arch = "x86_64", not(feature = "efi"), not(feature = "tee")))]
use vm_memory::mmap::MmapRegion;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
use vm_memory::Address;
use vm_memory::Bytes;
#[cfg(not(feature = "nitro"))]
use vm_memory::GuestMemory;
#[cfg(all(target_arch = "x86_64", not(feature = "tee")))]
use vm_memory::GuestRegionMmap;
use vm_memory::{GuestAddress, GuestMemory, GuestMemoryMmap};
use vm_memory::{GuestAddress, GuestMemoryMmap};
#[cfg(feature = "efi")]
static EDK2_BINARY: &[u8] = include_bytes!("../../../edk2/KRUN_EFI.silent.fd");
@ -506,7 +511,7 @@ pub fn build_microvm(
vm_resources: &super::resources::VmResources,
event_manager: &mut EventManager,
_shutdown_efd: Option<EventFd>,
#[cfg(target_os = "macos")] _map_sender: Sender<MemoryMapping>,
_sender: Sender<WorkerMessage>,
) -> std::result::Result<Arc<Mutex<Vmm>>, StartMicrovmError> {
let payload = choose_payload(vm_resources)?;
@ -660,10 +665,23 @@ pub fn build_microvm(
// while on aarch64 we need to do it the other way around.
#[cfg(target_arch = "x86_64")]
{
let kvmioapic = KvmIoapic::new(vm.fd()).map_err(StartMicrovmError::CreateKvmIrqChip)?;
intc = Arc::new(Mutex::new(IrqChipDevice::new(Box::new(kvmioapic))));
let ioapic: Box<dyn IrqChipT> = if vm_resources.split_irqchip {
Box::new(
IoApic::new(vm.fd(), _sender.clone())
.map_err(StartMicrovmError::CreateKvmIrqChip)?,
)
} else {
Box::new(KvmIoapic::new(vm.fd()).map_err(StartMicrovmError::CreateKvmIrqChip)?)
};
intc = Arc::new(Mutex::new(IrqChipDevice::new(ioapic)));
attach_legacy_devices(&vm, &mut pio_device_manager)?;
attach_legacy_devices(
&vm,
vm_resources.split_irqchip,
&mut pio_device_manager,
&mut mmio_device_manager,
Some(intc.clone()),
)?;
vcpus = create_vcpus_x86_64(
&vm,
@ -672,6 +690,8 @@ pub fn build_microvm(
payload_config.entry_addr,
&pio_device_manager.io_bus,
&exit_evt,
#[cfg(feature = "tee")]
_sender,
)
.map_err(StartMicrovmError::Internal)?;
}
@ -738,6 +758,9 @@ pub fn build_microvm(
)?;
}
// We use this atomic to record the exit code set by init/init.c in the VM.
let exit_code = Arc::new(AtomicI32::new(i32::MAX));
let mut vmm = Vmm {
guest_memory,
arch_memory_info,
@ -745,6 +768,7 @@ pub fn build_microvm(
vcpus_handles: Vec::new(),
exit_evt,
exit_observers: Vec::new(),
exit_code: exit_code.clone(),
vm,
mmio_device_manager,
#[cfg(target_arch = "x86_64")]
@ -762,7 +786,7 @@ pub fn build_microvm(
vm_resources.console_output.clone(),
)?;
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
let export_table: Option<ExportTable> = if cfg!(feature = "gpu") {
Some(Default::default())
} else {
@ -780,10 +804,10 @@ pub fn build_microvm(
intc.clone(),
virgl_flags,
#[cfg(target_os = "macos")]
_map_sender.clone(),
_sender.clone(),
)?;
}
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
attach_fs_devices(
&mut vmm,
&vm_resources.fs,
@ -791,8 +815,9 @@ pub fn build_microvm(
#[cfg(not(feature = "tee"))]
export_table,
intc.clone(),
exit_code,
#[cfg(target_os = "macos")]
_map_sender,
_sender,
)?;
#[cfg(feature = "blk")]
attach_block_devices(&mut vmm, &vm_resources.block, intc.clone())?;
@ -845,7 +870,7 @@ pub fn build_microvm(
.map_err(VstateError::KvmCpuId)
.map_err(StartMicrovmError::SecureVirtAttest)?;
vmm.kvm_vm()
.snp_secure_virt_attest(
.snp_secure_virt_measure(
cpuid,
vmm.guest_memory(),
measured_regions,
@ -1329,13 +1354,23 @@ pub fn setup_serial_device(
#[cfg(target_arch = "x86_64")]
fn attach_legacy_devices(
vm: &Vm,
split_irqchip: bool,
pio_device_manager: &mut PortIODeviceManager,
mmio_device_manager: &mut MMIODeviceManager,
intc: Option<Arc<Mutex<IrqChipDevice>>>,
) -> std::result::Result<(), StartMicrovmError> {
pio_device_manager
.register_devices()
.map_err(Error::LegacyIOBus)
.map_err(StartMicrovmError::Internal)?;
if split_irqchip {
mmio_device_manager
.register_mmio_ioapic(intc)
.map_err(Error::RegisterMMIODevice)
.map_err(StartMicrovmError::Internal)?;
}
macro_rules! register_irqfd_evt {
($evt: ident, $index: expr) => {{
vm.fd()
@ -1422,6 +1457,7 @@ fn create_vcpus_x86_64(
entry_addr: GuestAddress,
io_bus: &devices::Bus,
exit_evt: &EventFd,
#[cfg(feature = "tee")] pm_sender: Sender<WorkerMessage>,
) -> super::Result<Vec<Vcpu>> {
let mut vcpus = Vec::with_capacity(vcpu_config.vcpu_count as usize);
for cpu_index in 0..vcpu_config.vcpu_count {
@ -1432,6 +1468,8 @@ fn create_vcpus_x86_64(
vm.supported_msrs().clone(),
io_bus.clone(),
exit_evt.try_clone().map_err(Error::EventFd)?,
#[cfg(feature = "tee")]
pm_sender.clone(),
)
.map_err(Error::Vcpu)?;
@ -1542,20 +1580,26 @@ fn attach_mmio_device(
Ok(())
}
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
fn attach_fs_devices(
vmm: &mut Vmm,
fs_devs: &[FsDeviceConfig],
shm_manager: &mut ShmManager,
#[cfg(not(feature = "tee"))] export_table: Option<ExportTable>,
intc: IrqChip,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
exit_code: Arc<AtomicI32>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
) -> std::result::Result<(), StartMicrovmError> {
use self::StartMicrovmError::*;
for (i, config) in fs_devs.iter().enumerate() {
let fs = Arc::new(Mutex::new(
devices::virtio::Fs::new(config.fs_id.clone(), config.shared_dir.clone()).unwrap(),
devices::virtio::Fs::new(
config.fs_id.clone(),
config.shared_dir.clone(),
exit_code.clone(),
)
.unwrap(),
));
let id = format!("{}{}", String::from(fs.lock().unwrap().id()), i);
@ -1825,7 +1869,7 @@ fn attach_gpu_device(
#[cfg(not(feature = "tee"))] mut export_table: Option<ExportTable>,
intc: IrqChip,
virgl_flags: u32,
#[cfg(target_os = "macos")] map_sender: Sender<MemoryMapping>,
#[cfg(target_os = "macos")] map_sender: Sender<WorkerMessage>,
) -> std::result::Result<(), StartMicrovmError> {
use self::StartMicrovmError::*;
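With `split_irqchip` set, the builder now boxes a userspace `IoApic` instead of `KvmIoapic`, and `attach_legacy_devices` additionally registers the interrupt controller on the MMIO bus through `register_mmio_ioapic`. The following is a minimal sketch of that selection. It assumes a simplified trait and uses a `Vec` as a stand-in for `devices::Bus`. The `IrqChipT` here is a hypothetical stand-in with a made-up `mmio_window` method, and the `0xfec0_0000` window is the conventional x86 IOAPIC base, not a value taken from this diff.

```rust
/// Hypothetical, simplified stand-in for the crate's IrqChipT trait.
trait IrqChipT {
    fn name(&self) -> &'static str;
    /// (address, size) when the chip must be placed on the userspace MMIO bus.
    fn mmio_window(&self) -> Option<(u64, u64)>;
}

struct InKernelIoapic;
struct SplitIoapic;

impl IrqChipT for InKernelIoapic {
    fn name(&self) -> &'static str {
        "kvm-ioapic"
    }
    // KVM emulates this IOAPIC, so nothing goes on the userspace bus.
    fn mmio_window(&self) -> Option<(u64, u64)> {
        None
    }
}

impl IrqChipT for SplitIoapic {
    fn name(&self) -> &'static str {
        "userspace-ioapic"
    }
    fn mmio_window(&self) -> Option<(u64, u64)> {
        Some((0xfec0_0000, 0x1000))
    }
}

/// Vec standing in for devices::Bus: (address, size, device name).
type MmioBus = Vec<(u64, u64, &'static str)>;

fn build_ioapic(split_irqchip: bool, bus: &mut MmioBus) -> Box<dyn IrqChipT> {
    let chip: Box<dyn IrqChipT> = if split_irqchip {
        Box::new(SplitIoapic)
    } else {
        Box::new(InKernelIoapic)
    };
    // Mirrors attach_legacy_devices: only the split IOAPIC is put on the MMIO bus.
    if split_irqchip {
        if let Some((addr, size)) = chip.mmio_window() {
            bus.push((addr, size, chip.name()));
        }
    }
    chip
}
```

Returning a `Box<dyn IrqChipT>` lets the rest of the builder wrap either implementation in the same `IrqChipDevice` without further branching.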

@ -88,6 +88,23 @@ impl MMIODeviceManager {
}
}
/// Register a MMIO IOAPIC device.
#[cfg(target_arch = "x86_64")]
pub fn register_mmio_ioapic(
&mut self,
intc: Option<Arc<Mutex<devices::legacy::IrqChipDevice>>>,
) -> Result<()> {
if let Some(intc) = intc {
let (addr, size) = {
let intc = intc.lock().unwrap();
(intc.get_mmio_addr(), intc.get_mmio_size())
};
self.bus.insert(intc, addr, size).map_err(Error::BusError)?;
}
Ok(())
}
/// Register an already created MMIO device to be used via MMIO transport.
pub fn register_mmio_device(
&mut self,
@ -117,6 +134,8 @@ impl MMIODeviceManager {
vm.register_irqfd(mmio_device.locked_device().interrupt_evt(), self.irq)
.map_err(Error::RegisterIrqFd)?;
mmio_device.locked_device().set_irq_line(self.irq);
self.bus
.insert(Arc::new(Mutex::new(mmio_device)), self.mmio_base, MMIO_LEN)
.map_err(Error::BusError)?;

@ -1,553 +0,0 @@
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::{fmt, io};
#[cfg(target_arch = "aarch64")]
use arch::aarch64::DeviceInfoForFDT;
use arch::DeviceType;
use devices;
use devices::BusDevice;
use kernel::cmdline as kernel_cmdline;
use kvm_ioctls::{IoEventAddress, VmFd};
#[cfg(target_arch = "aarch64")]
use utils::eventfd::EventFd;
/// Errors for MMIO device manager.
#[derive(Debug)]
pub enum Error {
/// Failed to perform an operation on the bus.
BusError(devices::BusError),
/// Appending to kernel command line failed.
Cmdline(kernel_cmdline::Error),
/// Failure in creating or cloning an event fd.
EventFd(io::Error),
/// No more IRQs are available.
IrqsExhausted,
/// Registering an IO Event failed.
RegisterIoEvent(kvm_ioctls::Error),
/// Registering an IRQ FD failed.
RegisterIrqFd(kvm_ioctls::Error),
/// The device couldn't be found
DeviceNotFound,
/// Failed to update the mmio device.
UpdateFailed,
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
Error::BusError(ref e) => write!(f, "failed to perform bus operation: {}", e),
Error::Cmdline(ref e) => {
write!(f, "unable to add device to kernel command line: {}", e)
}
Error::EventFd(ref e) => write!(f, "failed to create or clone event descriptor: {}", e),
Error::IrqsExhausted => write!(f, "no more IRQs are available"),
Error::RegisterIoEvent(ref e) => write!(f, "failed to register IO event: {}", e),
Error::RegisterIrqFd(ref e) => write!(f, "failed to register irqfd: {}", e),
Error::DeviceNotFound => write!(f, "the device couldn't be found"),
Error::UpdateFailed => write!(f, "failed to update the mmio device"),
}
}
}
type Result<T> = ::std::result::Result<T, Error>;
/// This represents the size of the mmio device specified to the kernel as a cmdline option.
/// It has to be larger than 0x100 (the offset where the configuration space starts from
/// the beginning of the memory mapped device registers) plus the size of the configuration space.
/// Currently hardcoded to 4K.
const MMIO_LEN: u64 = 0x1000;
/// Manages the complexities of registering a MMIO device.
pub struct MMIODeviceManager {
pub bus: devices::Bus,
mmio_base: u64,
irq: u32,
last_irq: u32,
id_to_dev_info: HashMap<(DeviceType, String), MMIODeviceInfo>,
}
impl MMIODeviceManager {
/// Create a new DeviceManager handling mmio devices (virtio net, block).
pub fn new(mmio_base: &mut u64, irq_interval: (u32, u32)) -> MMIODeviceManager {
if cfg!(target_arch = "aarch64") {
*mmio_base += MMIO_LEN;
}
MMIODeviceManager {
mmio_base: *mmio_base,
irq: irq_interval.0,
last_irq: irq_interval.1,
bus: devices::Bus::new(),
id_to_dev_info: HashMap::new(),
}
}
/// Register an already created MMIO device to be used via MMIO transport.
pub fn register_mmio_device(
&mut self,
vm: &VmFd,
mmio_device: devices::virtio::MmioTransport,
type_id: u32,
device_id: String,
) -> Result<(u64, u32)> {
if self.irq > self.last_irq {
return Err(Error::IrqsExhausted);
}
for (i, queue_evt) in mmio_device
.locked_device()
.queue_events()
.iter()
.enumerate()
{
let io_addr = IoEventAddress::Mmio(
self.mmio_base + u64::from(devices::virtio::NOTIFY_REG_OFFSET),
);
vm.register_ioevent(queue_evt, &io_addr, i as u32)
.map_err(Error::RegisterIoEvent)?;
}
vm.register_irqfd(mmio_device.locked_device().interrupt_evt(), self.irq)
.map_err(Error::RegisterIrqFd)?;
self.bus
.insert(Arc::new(Mutex::new(mmio_device)), self.mmio_base, MMIO_LEN)
.map_err(Error::BusError)?;
let ret = (self.mmio_base, self.irq);
self.id_to_dev_info.insert(
(DeviceType::Virtio(type_id), device_id),
MMIODeviceInfo {
addr: self.mmio_base,
len: MMIO_LEN,
irq: self.irq,
},
);
self.mmio_base += MMIO_LEN;
self.irq += 1;
Ok(ret)
}
/// Append a registered MMIO device to the kernel cmdline.
#[cfg(target_arch = "x86_64")]
pub fn add_device_to_cmdline(
&mut self,
cmdline: &mut kernel_cmdline::Cmdline,
mmio_base: u64,
irq: u32,
) -> Result<()> {
// as per doc, [virtio_mmio.]device=<size>@<baseaddr>:<irq> needs to be appended
// to kernel commandline for virtio mmio devices to get recognized
// the size parameter has to be transformed to KiB, so the value in bytes is divided
// by 1024; further, the '{}' formatting rust construct will automatically
// transform it to decimal
cmdline
.insert(
"virtio_mmio.device",
&format!("{}K@0x{:08x}:{}", MMIO_LEN / 1024, mmio_base, irq),
)
.map_err(Error::Cmdline)
}
#[cfg(target_arch = "aarch64")]
/// Register an early console at some MMIO address.
pub fn register_mmio_serial(
&mut self,
vm: &VmFd,
cmdline: &mut kernel_cmdline::Cmdline,
serial: Arc<Mutex<devices::legacy::Serial>>,
) -> Result<()> {
if self.irq > self.last_irq {
return Err(Error::IrqsExhausted);
}
vm.register_irqfd(&serial.lock().unwrap().interrupt_evt(), self.irq)
.map_err(Error::RegisterIrqFd)?;
self.bus
.insert(serial, self.mmio_base, MMIO_LEN)
.map_err(|err| Error::BusError(err))?;
cmdline
.insert("earlycon", &format!("uart,mmio,0x{:08x}", self.mmio_base))
.map_err(Error::Cmdline)?;
let ret = self.mmio_base;
self.id_to_dev_info.insert(
(DeviceType::Serial, DeviceType::Serial.to_string()),
MMIODeviceInfo {
addr: ret,
len: MMIO_LEN,
irq: self.irq,
},
);
self.mmio_base += MMIO_LEN;
self.irq += 1;
Ok(())
}
#[cfg(target_arch = "aarch64")]
/// Register a MMIO RTC device.
pub fn register_mmio_rtc(&mut self, vm: &VmFd) -> Result<()> {
if self.irq > self.last_irq {
return Err(Error::IrqsExhausted);
}
// Attaching the RTC device.
let rtc_evt = EventFd::new(utils::eventfd::EFD_NONBLOCK).map_err(Error::EventFd)?;
let device = devices::legacy::RTC::new(rtc_evt.try_clone().map_err(Error::EventFd)?);
vm.register_irqfd(&rtc_evt, self.irq)
.map_err(Error::RegisterIrqFd)?;
self.bus
.insert(Arc::new(Mutex::new(device)), self.mmio_base, MMIO_LEN)
.map_err(|err| Error::BusError(err))?;
let ret = self.mmio_base;
self.id_to_dev_info.insert(
(DeviceType::RTC, "rtc".to_string()),
MMIODeviceInfo {
addr: ret,
len: MMIO_LEN,
irq: self.irq,
},
);
self.mmio_base += MMIO_LEN;
self.irq += 1;
Ok(())
}
#[cfg(target_arch = "aarch64")]
/// Gets the information of the devices registered up to some point in time.
pub fn get_device_info(&self) -> &HashMap<(DeviceType, String), MMIODeviceInfo> {
&self.id_to_dev_info
}
/// Gets the specified device.
pub fn get_device(
&self,
device_type: DeviceType,
device_id: &str,
) -> Option<&Mutex<dyn BusDevice>> {
if let Some(dev_info) = self
.id_to_dev_info
.get(&(device_type, device_id.to_string()))
{
if let Some((_, device)) = self.bus.get_device(dev_info.addr) {
return Some(device);
}
}
None
}
}
/// Private structure for storing information about the MMIO device registered at some address on the bus.
#[derive(Clone, Debug)]
pub struct MMIODeviceInfo {
addr: u64,
irq: u32,
len: u64,
}
#[cfg(target_arch = "aarch64")]
impl DeviceInfoForFDT for MMIODeviceInfo {
fn addr(&self) -> u64 {
self.addr
}
fn irq(&self) -> u32 {
self.irq
}
fn length(&self) -> u64 {
self.len
}
}
#[cfg(test)]
mod tests {
use super::super::super::builder;
use super::*;
use arch;
use devices::virtio::{ActivateResult, Queue, VirtioDevice};
use std::sync::atomic::AtomicUsize;
use std::sync::Arc;
use utils::errno;
use utils::eventfd::EventFd;
use vm_memory::{GuestAddress, GuestMemoryMmap};
const QUEUE_SIZES: &[u16] = &[64];
impl MMIODeviceManager {
fn register_virtio_device(
&mut self,
vm: &VmFd,
guest_mem: GuestMemoryMmap,
device: Arc<Mutex<dyn devices::virtio::VirtioDevice>>,
cmdline: &mut kernel_cmdline::Cmdline,
type_id: u32,
device_id: &str,
) -> Result<u64> {
let mmio_device = devices::virtio::MmioTransport::new(guest_mem, device);
let (mmio_base, _irq) =
self.register_mmio_device(vm, mmio_device, type_id, device_id.to_string())?;
#[cfg(target_arch = "x86_64")]
self.add_device_to_cmdline(cmdline, mmio_base, _irq)?;
Ok(mmio_base)
}
}
#[allow(dead_code)]
struct DummyDevice {
dummy: u32,
queues: Vec<Queue>,
queue_evts: [EventFd; 1],
interrupt_evt: EventFd,
}
impl DummyDevice {
pub fn new() -> Self {
DummyDevice {
dummy: 0,
queues: QUEUE_SIZES.iter().map(|&s| Queue::new(s)).collect(),
queue_evts: [EventFd::new(utils::eventfd::EFD_NONBLOCK).expect("cannot create eventFD")],
interrupt_evt: EventFd::new(utils::eventfd::EFD_NONBLOCK).expect("cannot create eventFD"),
}
}
}
impl devices::virtio::VirtioDevice for DummyDevice {
fn avail_features(&self) -> u64 {
0
}
fn acked_features(&self) -> u64 {
0
}
fn set_acked_features(&mut self, _: u64) {}
fn device_type(&self) -> u32 {
0
}
fn queues(&self) -> &[Queue] {
&self.queues
}
fn queues_mut(&mut self) -> &mut [Queue] {
&mut self.queues
}
fn queue_events(&self) -> &[EventFd] {
&self.queue_evts
}
fn interrupt_evt(&self) -> &EventFd {
&self.interrupt_evt
}
fn interrupt_status(&self) -> Arc<AtomicUsize> {
Arc::new(AtomicUsize::new(0))
}
fn ack_features_by_page(&mut self, page: u32, value: u32) {
let _ = page;
let _ = value;
}
fn read_config(&self, offset: u64, data: &mut [u8]) {
let _ = offset;
let _ = data;
}
fn write_config(&mut self, offset: u64, data: &[u8]) {
let _ = offset;
let _ = data;
}
fn activate(&mut self, _: GuestMemoryMmap) -> ActivateResult {
Ok(())
}
fn is_activated(&self) -> bool {
false
}
}
#[test]
fn test_register_virtio_device() {
let start_addr1 = GuestAddress(0x0);
let start_addr2 = GuestAddress(0x1000);
let guest_mem =
GuestMemoryMmap::from_ranges(&[(start_addr1, 0x1000), (start_addr2, 0x1000)]).unwrap();
let mut vm = builder::setup_kvm_vm(&guest_mem).unwrap();
let mut device_manager =
MMIODeviceManager::new(&mut 0xd000_0000, (arch::IRQ_BASE, arch::IRQ_MAX));
let mut cmdline = kernel_cmdline::Cmdline::new(4096);
let dummy = Arc::new(Mutex::new(DummyDevice::new()));
#[cfg(target_arch = "x86_64")]
assert!(builder::setup_interrupt_controller(&mut vm).is_ok());
#[cfg(target_arch = "aarch64")]
assert!(builder::setup_interrupt_controller(&mut vm, 1).is_ok());
assert!(device_manager
.register_virtio_device(vm.fd(), guest_mem, dummy, &mut cmdline, 0, "dummy")
.is_ok());
}
#[test]
fn test_register_too_many_devices() {
let start_addr1 = GuestAddress(0x0);
let start_addr2 = GuestAddress(0x1000);
let guest_mem =
GuestMemoryMmap::from_ranges(&[(start_addr1, 0x1000), (start_addr2, 0x1000)]).unwrap();
let mut vm = builder::setup_kvm_vm(&guest_mem).unwrap();
let mut device_manager =
MMIODeviceManager::new(&mut 0xd000_0000, (arch::IRQ_BASE, arch::IRQ_MAX));
let mut cmdline = kernel_cmdline::Cmdline::new(4096);
#[cfg(target_arch = "x86_64")]
assert!(builder::setup_interrupt_controller(&mut vm).is_ok());
#[cfg(target_arch = "aarch64")]
assert!(builder::setup_interrupt_controller(&mut vm, 1).is_ok());
for _i in arch::IRQ_BASE..=arch::IRQ_MAX {
device_manager
.register_virtio_device(
vm.fd(),
guest_mem.clone(),
Arc::new(Mutex::new(DummyDevice::new())),
&mut cmdline,
0,
"dummy1",
)
.unwrap();
}
assert_eq!(
format!(
"{}",
device_manager
.register_virtio_device(
vm.fd(),
guest_mem,
Arc::new(Mutex::new(DummyDevice::new())),
&mut cmdline,
0,
"dummy2"
)
.unwrap_err()
),
"no more IRQs are available".to_string()
);
}
#[test]
fn test_dummy_device() {
let dummy = DummyDevice::new();
assert_eq!(dummy.device_type(), 0);
assert_eq!(dummy.queues().len(), QUEUE_SIZES.len());
}
#[test]
fn test_error_messages() {
let device_manager =
MMIODeviceManager::new(&mut 0xd000_0000, (arch::IRQ_BASE, arch::IRQ_MAX));
let mut cmdline = kernel_cmdline::Cmdline::new(4096);
let e = Error::Cmdline(
cmdline
.insert(
"virtio_mmio=device",
&format!(
"{}K@0x{:08x}:{}",
MMIO_LEN / 1024,
device_manager.mmio_base,
device_manager.irq
),
)
.unwrap_err(),
);
assert_eq!(
format!("{}", e),
format!(
"unable to add device to kernel command line: {}",
kernel_cmdline::Error::HasEquals
),
);
assert_eq!(
format!("{}", Error::UpdateFailed),
"failed to update the mmio device"
);
assert_eq!(
format!("{}", Error::BusError(devices::BusError::Overlap)),
format!(
"failed to perform bus operation: {}",
devices::BusError::Overlap
)
);
assert_eq!(
format!("{}", Error::IrqsExhausted),
"no more IRQs are available"
);
assert_eq!(
format!("{}", Error::RegisterIoEvent(errno::Error::new(0))),
format!("failed to register IO event: {}", errno::Error::new(0))
);
assert_eq!(
format!("{}", Error::RegisterIrqFd(errno::Error::new(0))),
format!("failed to register irqfd: {}", errno::Error::new(0))
);
}
#[test]
fn test_device_info() {
let start_addr1 = GuestAddress(0x0);
let start_addr2 = GuestAddress(0x1000);
let guest_mem =
GuestMemoryMmap::from_ranges(&[(start_addr1, 0x1000), (start_addr2, 0x1000)]).unwrap();
let vm = builder::setup_kvm_vm(&guest_mem).unwrap();
let mut device_manager =
MMIODeviceManager::new(&mut 0xd000_0000, (arch::IRQ_BASE, arch::IRQ_MAX));
let mut cmdline = kernel_cmdline::Cmdline::new(4096);
let dummy = Arc::new(Mutex::new(DummyDevice::new()));
let type_id = 0;
let id = String::from("foo");
if let Ok(addr) = device_manager.register_virtio_device(
vm.fd(),
guest_mem,
dummy,
&mut cmdline,
type_id,
&id,
) {
assert!(device_manager
.get_device(DeviceType::Virtio(type_id), &id)
.is_some());
assert_eq!(
addr,
device_manager.id_to_dev_info[&(DeviceType::Virtio(type_id), id.clone())].addr
);
assert_eq!(
arch::IRQ_BASE,
device_manager.id_to_dev_info[&(DeviceType::Virtio(type_id), id.clone())].irq
);
}
let id = "bar";
assert!(device_manager
.get_device(DeviceType::Virtio(type_id), &id)
.is_none());
}
}
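The removed `MMIODeviceManager::register_mmio_device` gives each device the current `mmio_base` and `irq`, then advances them by `MMIO_LEN` and by one, respectively. It fails with `IrqsExhausted` once `irq` passes `last_irq`. `add_device_to_cmdline` then encodes the slot as a `virtio_mmio.device=<size>K@<base>:<irq>` value. The following sketch isolates that bookkeeping without KVM. `MmioAllocator` and `virtio_mmio_cmdline_value` are hypothetical names, and ioeventfd/irqfd registration is left out.

```rust
const MMIO_LEN: u64 = 0x1000;

#[derive(Debug, PartialEq)]
enum AllocError {
    IrqsExhausted,
}

/// Hypothetical allocator reproducing the base/IRQ bookkeeping.
struct MmioAllocator {
    mmio_base: u64,
    irq: u32,
    last_irq: u32,
}

impl MmioAllocator {
    fn new(mmio_base: u64, irq_interval: (u32, u32)) -> Self {
        MmioAllocator { mmio_base, irq: irq_interval.0, last_irq: irq_interval.1 }
    }

    /// Returns the (MMIO base, IRQ) pair for the next device.
    fn allocate(&mut self) -> Result<(u64, u32), AllocError> {
        if self.irq > self.last_irq {
            return Err(AllocError::IrqsExhausted);
        }
        let slot = (self.mmio_base, self.irq);
        self.mmio_base += MMIO_LEN;
        self.irq += 1;
        Ok(slot)
    }
}

/// Value for the `virtio_mmio.device` kernel parameter: size in KiB,
/// base as zero-padded 8-digit hex, IRQ in decimal.
fn virtio_mmio_cmdline_value(mmio_base: u64, irq: u32) -> String {
    format!("{}K@0x{:08x}:{}", MMIO_LEN / 1024, mmio_base, irq)
}
```

With a base of `0xd000_0000` and IRQ 5, the first device is described as `4K@0xd0000000:5`, and the next one lands 4 KiB higher on IRQ 6.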

@ -1,7 +1,8 @@
use std::collections::BTreeMap;
use arch::{round_up, ArchMemoryInfo};
use arch::ArchMemoryInfo;
use vm_memory::GuestAddress;
use vmm_sys_util::align_upwards;
#[derive(Debug)]
pub enum Error {
@ -46,7 +47,7 @@ impl ShmManager {
regions
}
#[cfg(not(feature = "tee"))]
#[cfg(not(any(feature = "tee", feature = "nitro")))]
pub fn fs_region(&self, index: usize) -> Option<&ShmRegion> {
self.fs_regions.get(&index)
}
@ -57,7 +58,7 @@ impl ShmManager {
}
fn create_region(&mut self, size: usize) -> Result<ShmRegion, Error> {
let size = round_up(size, self.page_size);
let size = align_upwards!(size, self.page_size);
let region = ShmRegion {
guest_addr: GuestAddress(self.next_guest_addr),

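The `ShmManager` hunk swaps `arch::round_up` for `vmm_sys_util::align_upwards!` when rounding a region size up to the page size. A minimal sketch of what rounding up to a power-of-two alignment computes, written as a plain function rather than the crate's macro. The name `align_up` is hypothetical, and the sketch assumes `align` is a non-zero power of two; the exact checks `align_upwards!` performs are documented by `vmm-sys-util`, not here.

```rust
/// Round `size` up to the next multiple of `align`.
/// Assumes `align` is a non-zero power of two, as page sizes are.
fn align_up(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    // Adding `align - 1` carries any partial page over the boundary;
    // masking with `!(align - 1)` then clears the low bits.
    (size + align - 1) & !(align - 1)
}
```

For example, with a 4096-byte page, a 1-byte request becomes 4096 and a 4097-byte request becomes 8192. On Rust 1.73 and later, `usize::next_multiple_of` gives the same result without the power-of-two restriction.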

@@ -31,15 +31,15 @@ use crate::linux::vstate;
#[cfg(target_os = "macos")]
mod macos;
mod terminal;
pub mod worker;
#[cfg(target_os = "macos")]
pub use hvf::MemoryMapping;
#[cfg(target_os = "macos")]
use macos::vstate;
use std::fmt::{Display, Formatter};
use std::io;
use std::os::unix::io::AsRawFd;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::{Arc, Mutex};
#[cfg(target_os = "linux")]
use std::time::Duration;
@@ -202,6 +202,7 @@ pub struct Vmm {
exit_evt: EventFd,
vm: Vm,
exit_observers: Vec<Arc<Mutex<dyn VmmExitObserver>>>,
exit_code: Arc<AtomicI32>,
// Guest VM devices.
mmio_device_manager: MMIODeviceManager,
@@ -394,7 +395,10 @@ impl Subscriber for Vmm {
// If the exit_code can't be found on any vcpu, it means that the exit signal
// has been issued by the i8042 controller in which case we exit with
// FC_EXIT_CODE_OK.
let exit_code = self
//
// The exit code set up by the guest takes preference over the one reported
// by either a vcpu or the i8042 controller.
let vcpu_exit_code = self
.vcpus_handles
.iter()
.find_map(|handle| match handle.response_receiver().try_recv() {
@@ -402,7 +406,15 @@ impl Subscriber for Vmm {
_ => None,
})
.unwrap_or(FC_EXIT_CODE_OK);
self.stop(i32::from(exit_code));
let vmm_exit_code = self.exit_code.load(Ordering::SeqCst);
let exit_code = if vmm_exit_code != i32::MAX {
debug!("using vmm exit code: {vmm_exit_code}");
vmm_exit_code
} else {
debug!("using vcpu exit code: {vcpu_exit_code}");
vcpu_exit_code as i32
};
self.stop(exit_code);
} else {
error!("Spurious EventManager event for handler: Vmm");
}

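The `Vmm` hunks add an `exit_code: Arc<AtomicI32>` field and make a code stored there take precedence over the one a vcpu reports, with `i32::MAX` meaning "nothing recorded". A minimal sketch of that selection logic, assuming vcpu exit codes are `u8` (consistent with the old `i32::from(exit_code)` call); `select_exit_code` and `EXIT_CODE_UNSET` are hypothetical names.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// "No exit code recorded yet", mirroring the `i32::MAX` check in the hunk.
const EXIT_CODE_UNSET: i32 = i32::MAX;

/// A code stored in the shared atomic wins; otherwise use the vcpu-reported
/// code, or `default_code` if no vcpu reported one.
fn select_exit_code(shared: &AtomicI32, vcpu_exit_code: Option<u8>, default_code: u8) -> i32 {
    let recorded = shared.load(Ordering::SeqCst);
    if recorded != EXIT_CODE_UNSET {
        recorded
    } else {
        i32::from(vcpu_exit_code.unwrap_or(default_code))
    }
}
```

Using a sentinel outside the range of real codes lets a recorded `0` be told apart from "never set", which a plain `0` default could not do.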

@@ -6,8 +6,11 @@ use std::{
use crate::vstate::MeasuredRegion;
use arch::x86_64::layout::*;
use sev::firmware::{guest::GuestPolicy, host::Firmware};
use sev::launch::snp::*;
use sev::{
error::FirmwareError,
firmware::{guest::GuestPolicy, host::Firmware},
launch::snp::*,
};
use kvm_bindings::{kvm_enc_region, CpuId, KVM_CPUID_FLAG_SIGNIFCANT_INDEX};
use kvm_ioctls::VmFd;
@@ -19,12 +22,12 @@ use vm_memory::{
pub enum Error {
CpuIdWrite,
CpuIdFull,
CreateLauncher(std::io::Error),
CreateLauncher(FirmwareError),
GuestMemoryWrite(vm_memory::GuestMemoryError),
GuestMemoryRead(vm_memory::GuestMemoryError),
LaunchStart(std::io::Error),
LaunchUpdate(std::io::Error),
LaunchFinish(std::io::Error),
LaunchStart(FirmwareError),
LaunchUpdate(FirmwareError),
LaunchFinish(FirmwareError),
MemoryEncryptRegion,
OpenFirmware(std::io::Error),
}
@@ -105,9 +108,9 @@ impl AmdSnp {
}
let mut policy = GuestPolicy(0);
policy.set_smt_allowed(1);
policy.set_smt_allowed(true);
let start = Start::new(None, policy, false, [0; 16]);
let start = Start::new(policy, [0; 16]);
let launcher = launcher.start(start).map_err(Error::LaunchStart)?;
@@ -281,7 +284,6 @@ impl AmdSnp {
launcher: &mut Launcher<Started, RawFd, RawFd>,
page_type: PageType,
) -> Result<(), Error> {
let dp = VmplPerms::empty();
let ga = GuestAddress(region.guest_addr);
/*
@@ -296,15 +298,11 @@ impl AmdSnp {
let ptr = bytes.ptr_guard().as_ptr();
let slice: &[u8] = unsafe { slice::from_raw_parts(ptr, region.size) };
let update = Update::new(
region.guest_addr >> 12,
slice,
false,
page_type,
(dp, dp, dp),
);
let update = Update::new(region.guest_addr >> 12, slice, page_type);
launcher.update_data(update).map_err(Error::LaunchUpdate)
launcher
.update_data(update, region.guest_addr, region.size as u64)
.map_err(Error::LaunchUpdate)
}
pub fn vm_measure(

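The updated `Update::new` call passes `region.guest_addr >> 12` as its first argument, which reads as a guest frame number: the guest physical address divided by a 4 KiB page. A small sketch of that conversion and its inverse, assuming 4 KiB pages; `gfn`, `gfn_to_gpa` and `PAGE_SHIFT` are hypothetical names, and the `sev` crate's documentation is the authority on what `Update::new` expects.

```rust
/// log2 of the 4 KiB page size implied by the `>> 12` in the hunk.
const PAGE_SHIFT: u32 = 12;

/// Guest frame number (page index) of a guest physical address.
/// Any offset within the page is discarded by the shift.
fn gfn(gpa: u64) -> u64 {
    gpa >> PAGE_SHIFT
}

/// Inverse mapping: the guest physical address where frame `gfn` starts.
fn gfn_to_gpa(gfn: u64) -> u64 {
    gfn << PAGE_SHIFT
}
```

Because the shift drops the low 12 bits, a region that is not page-aligned would silently start at the beginning of its page, so the measured regions are presumably page-aligned.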

@@ -10,8 +10,8 @@ use libc::{c_int, c_void, siginfo_t};
use std::cell::Cell;
use std::fmt::{Display, Formatter};
use std::io;
use std::ops::Range;
#[cfg(feature = "tee")]
use std::os::unix::io::RawFd;
use std::result;
@@ -41,14 +41,21 @@ use kvm_bindings::{
KVM_MAX_CPUID_ENTRIES,
};
use kvm_bindings::{
kvm_userspace_memory_region, KVM_API_VERSION, KVM_SYSTEM_EVENT_RESET, KVM_SYSTEM_EVENT_SHUTDOWN,
kvm_create_guest_memfd, kvm_memory_attributes, kvm_userspace_memory_region,
kvm_userspace_memory_region2, KVM_API_VERSION, KVM_MEMORY_ATTRIBUTE_PRIVATE,
KVM_MEM_GUEST_MEMFD, KVM_SYSTEM_EVENT_RESET, KVM_SYSTEM_EVENT_SHUTDOWN,
};
use kvm_ioctls::*;
#[cfg(feature = "tee")]
use kvm_bindings::{kvm_enable_cap, KVM_CAP_EXIT_HYPERCALL, KVM_MEMORY_EXIT_FLAG_PRIVATE};
use kvm_ioctls::{Cap::*, *};
use utils::eventfd::EventFd;
use utils::signal::{register_signal_handler, sigrtmin, Killable};
use utils::sm::StateMachine;
#[cfg(feature = "tee")]
use utils::worker_message::{MemoryProperties, WorkerMessage};
use vm_memory::{
Address, GuestAddress, GuestMemory, GuestMemoryError, GuestMemoryMmap, GuestMemoryRegion,
GuestRegionMmap,
};
#[cfg(feature = "amd-sev")]
@@ -63,6 +70,8 @@ pub enum Error {
#[cfg(target_arch = "x86_64")]
/// A call to cpuid instruction failed.
CpuId(cpuid::Error),
/// Unable to create a KVM guest_memfd.
CreateGuestMemfd(kvm_ioctls::Error),
#[cfg(target_arch = "x86_64")]
/// Error configuring the floating point related registers
FPUConfiguration(arch::x86_64::regs::Error),
@@ -73,6 +82,9 @@ pub enum Error {
GuestMSRs(arch::x86_64::msr::Error),
/// Hyperthreading flag is not initialized.
HTNotInitialized,
/// Unable to enable KVM hypercall exits.
#[cfg(feature = "tee")]
HypercallExitEnable(kvm_ioctls::Error),
/// Cannot configure the IRQ.
Irq(kvm_ioctls::Error),
/// The host kernel reports an invalid KVM API version.
@@ -99,6 +111,8 @@ pub enum Error {
#[cfg(target_arch = "x86_64")]
/// Error configuring the general purpose registers
REGSConfiguration(arch::x86_64::regs::Error),
/// Cannot set memory region attributes.
SetMemoryAttributes(kvm_ioctls::Error),
/// Cannot set the memory regions.
SetUserMemoryRegion(kvm_ioctls::Error),
/// Error creating memory map for SHM region.
@@ -197,6 +211,9 @@ pub enum Error {
VcpuTlsNotPresent,
/// Unexpected KVM_RUN exit reason
VcpuUnhandledKvmExit,
/// Unsupported KVM_EXIT_HYPERCALL.
#[cfg(feature = "tee")]
VcpuUnsupportedHypercall,
/// Cannot open the VM file descriptor.
VmFd(kvm_ioctls::Error),
#[cfg(target_arch = "x86_64")]
@@ -228,10 +245,13 @@ impl Display for Error {
match self {
#[cfg(target_arch = "x86_64")]
CpuId(e) => write!(f, "Cpuid error: {e:?}"),
CreateGuestMemfd(e) => write!(f, "Unable to create KVM guest_memfd: {e:?}"),
GuestMemoryMmap(e) => write!(f, "Guest memory error: {e:?}"),
#[cfg(target_arch = "x86_64")]
GuestMSRs(e) => write!(f, "Retrieving supported guest MSRs fails: {e:?}"),
HTNotInitialized => write!(f, "Hyperthreading flag is not initialized"),
#[cfg(feature = "tee")]
HypercallExitEnable(e) => write!(f, "Unable to enable KVM hypercall exits: {e}"),
KvmApiVersion(v) => {
write!(f, "The host kernel reports an invalid KVM API version: {v}")
}
@@ -252,6 +272,7 @@ impl Display for Error {
f,
"Cannot set the local interruption due to bad configuration: {e:?}"
),
SetMemoryAttributes(e) => write!(f, "Cannot set memory region attributes: {e}"),
SetUserMemoryRegion(e) => write!(f, "Cannot set the memory regions: {e}"),
ShmMmap(e) => write!(f, "Error creating memory map for SHM region: {e}"),
#[cfg(feature = "tee")]
@@ -333,6 +354,8 @@ impl Display for Error {
VcpuTlsInit => write!(f, "Cannot clean init vcpu TLS"),
VcpuTlsNotPresent => write!(f, "Vcpu not present in TLS"),
VcpuUnhandledKvmExit => write!(f, "Unexpected KVM_RUN exit reason"),
#[cfg(feature = "tee")]
VcpuUnsupportedHypercall => write!(f, "Unsupported KVM_EXIT_HYPERCALL"),
#[cfg(target_arch = "x86_64")]
VmGetPit2(e) => write!(f, "Failed to get KVM vm pit state: {e}"),
#[cfg(target_arch = "x86_64")]
@@ -378,7 +401,6 @@ pub struct KvmContext {
impl KvmContext {
pub fn new() -> Result<Self> {
use kvm_ioctls::Cap::*;
let kvm = Kvm::new().expect("Error creating the Kvm object");
// Check that KVM has the correct version.
@@ -433,6 +455,8 @@ pub struct Vm {
#[cfg(feature = "amd-sev")]
pub tee_config: Tee,
pub guest_memfds: Vec<(Range<u64>, RawFd)>,
}
impl Vm {
@@ -457,13 +481,16 @@ impl Vm {
supported_cpuid,
#[cfg(target_arch = "x86_64")]
supported_msrs,
guest_memfds: Vec::new(),
})
}
#[cfg(feature = "amd-sev")]
pub fn new(kvm: &Kvm, tee_config: &TeeConfig) -> Result<Self> {
//create fd for interacting with kvm-vm specific functions
let vm_fd = kvm.create_vm().map_err(Error::VmFd)?;
let vm_fd = kvm
.create_vm_with_type(4 /* KVM_X86_SNP_VM */)
.map_err(Error::VmFd)?;
let supported_cpuid = kvm
.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES)
@@ -472,6 +499,15 @@ impl Vm {
let supported_msrs =
arch::x86_64::msr::supported_guest_msrs(kvm).map_err(Error::GuestMSRs)?;
let cap = kvm_enable_cap {
cap: KVM_CAP_EXIT_HYPERCALL,
flags: 0,
args: [1 << 12 /* KVM_HC_MAP_GPA_RANGE */, 0, 0, 0],
..Default::default()
};
vm_fd.enable_cap(&cap).map_err(Error::HypercallExitEnable)?;
let tee = match tee_config.tee {
Tee::Snp => Some(AmdSnp::new().map_err(Error::SnpSecVirtInit)?),
_ => return Err(Error::InvalidTee),
@@ -484,6 +520,7 @@ impl Vm {
supported_msrs,
tee,
tee_config: tee_config.tee,
guest_memfds: Vec::new(),
})
}
@@ -508,17 +545,47 @@ impl Vm {
if guest_mem.num_regions() > kvm_max_memslots {
return Err(Error::NotEnoughMemorySlots);
}
for region in guest_mem.iter() {
// It's safe to unwrap because the guest address is valid.
let host_addr = guest_mem.get_host_address(region.start_addr()).unwrap();
debug!("Guest memory starts at {:x?}", host_addr);
self.memory_region_set(guest_mem, region)?;
}
#[cfg(target_arch = "x86_64")]
self.fd
.set_tss_address(arch::x86_64::layout::KVM_TSS_ADDRESS as usize)
.map_err(Error::VmSetup)?;
Ok(())
}
pub fn guest_memfd_get(&self, gpa: u64) -> Option<(RawFd, u64)> {
for (range, rawfd) in self.guest_memfds.iter() {
if range.contains(&gpa) {
return Some((*rawfd, range.start));
}
}
None
}
#[allow(unused_mut)]
fn memory_region_set(
&mut self,
guest_mem: &GuestMemoryMmap,
region: &GuestRegionMmap,
) -> Result<()> {
let host_addr = guest_mem.get_host_address(region.start_addr()).unwrap();
let start = region.start_addr().raw_value();
let end = start + region.len();
if !self.fd.check_extension(GuestMemfd) {
let memory_region = kvm_userspace_memory_region {
slot: self.next_mem_slot,
guest_phys_addr: region.start_addr().raw_value(),
guest_phys_addr: start,
memory_size: region.len(),
userspace_addr: host_addr as u64,
flags: 0,
};
// Safe because we mapped the memory region, we made sure that the regions
// are not overlapping.
unsafe {
@@ -526,13 +593,52 @@ impl Vm {
.set_user_memory_region(memory_region)
.map_err(Error::SetUserMemoryRegion)?;
};
self.next_mem_slot += 1;
} else {
// Create a guest_memfd and set the region.
let guest_memfd = self
.fd
.create_guest_memfd(kvm_create_guest_memfd {
size: region.size() as u64,
flags: 0,
reserved: [0; 6],
})
.map_err(Error::CreateGuestMemfd)?;
let memory_region = kvm_userspace_memory_region2 {
slot: self.next_mem_slot,
flags: KVM_MEM_GUEST_MEMFD,
guest_phys_addr: start,
memory_size: region.len(),
userspace_addr: host_addr as u64,
guest_memfd_offset: 0,
guest_memfd: guest_memfd as u32,
pad1: 0,
pad2: [0; 14],
};
// Safe because we mapped the memory region, we made sure that the regions
// are not overlapping.
unsafe {
self.fd
.set_user_memory_region2(memory_region)
.map_err(Error::SetUserMemoryRegion)?;
};
let attr = kvm_memory_attributes {
address: start,
size: region.len(),
attributes: KVM_MEMORY_ATTRIBUTE_PRIVATE as u64,
flags: 0,
};
self.fd
.set_memory_attributes(attr)
.map_err(Error::SetMemoryAttributes)?;
self.guest_memfds.push((Range { start, end }, guest_memfd));
}
#[cfg(target_arch = "x86_64")]
self.fd
.set_tss_address(arch::x86_64::layout::KVM_TSS_ADDRESS as usize)
.map_err(Error::VmSetup)?;
self.next_mem_slot += 1;
Ok(())
}
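`memory_region_set` records each guest_memfd with the GPA range it backs, and `guest_memfd_get` later finds the fd covering a GPA with a linear scan. Because every region is registered with `guest_memfd_offset: 0`, the byte offset inside the memfd is `gpa - region_start`, which is what the worker passes to `fallocate`. A minimal sketch of that lookup; `GuestMemfds` and `offset_of` are hypothetical stand-ins for the `guest_memfds` field and the worker's offset arithmetic.

```rust
use std::ops::Range;
use std::os::unix::io::RawFd;

/// Hypothetical stand-in for the `guest_memfds: Vec<(Range<u64>, RawFd)>` field on `Vm`.
struct GuestMemfds {
    regions: Vec<(Range<u64>, RawFd)>,
}

impl GuestMemfds {
    /// Return the guest_memfd backing `gpa` and the GPA where its region starts.
    fn get(&self, gpa: u64) -> Option<(RawFd, u64)> {
        self.regions
            .iter()
            .find(|(range, _)| range.contains(&gpa)) // half-open: end is excluded
            .map(|(range, fd)| (*fd, range.start))
    }

    /// Byte offset of `gpa` inside its guest_memfd, assuming each memfd
    /// is mapped at offset 0 of its region.
    fn offset_of(&self, gpa: u64) -> Option<u64> {
        self.get(gpa).map(|(_, start)| gpa - start)
    }
}
```

A linear scan is reasonable for the handful of memory regions a VM typically has; a `BTreeMap` keyed by region start would be the usual choice if the count grew.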
@@ -551,7 +657,7 @@ impl Vm {
}
#[cfg(feature = "amd-sev")]
pub fn snp_secure_virt_attest(
pub fn snp_secure_virt_measure(
&self,
cpuid: CpuId,
guest_mem: &GuestMemoryMmap,
@@ -687,6 +793,9 @@ pub struct Vcpu {
response_receiver: Option<Receiver<VcpuResponse>>,
// The transmitting end of the responses channel owned by the vcpu side.
response_sender: Sender<VcpuResponse>,
#[cfg(feature = "tee")]
pm_sender: Sender<WorkerMessage>,
}
impl Vcpu {
@@ -718,7 +827,7 @@ impl Vcpu {
// _before_ running this, then there is nothing we can do.
Self::TLS_VCPU_PTR.with(|cell: &VcpuCell| {
if let Some(vcpu_ptr) = cell.get() {
if vcpu_ptr == self as *mut Vcpu {
if std::ptr::eq(vcpu_ptr, self) {
Self::TLS_VCPU_PTR.with(|cell: &VcpuCell| cell.take());
return Ok(());
}
@@ -790,6 +899,7 @@ impl Vcpu {
msr_list: MsrList,
io_bus: devices::Bus,
exit_evt: EventFd,
#[cfg(feature = "tee")] pm_sender: Sender<WorkerMessage>,
) -> Result<Self> {
let kvm_vcpu = vm_fd.create_vcpu(id as u64).map_err(Error::VcpuFd)?;
let (event_sender, event_receiver) = unbounded();
@@ -808,6 +918,8 @@ impl Vcpu {
event_sender: Some(event_sender),
response_receiver: Some(response_receiver),
response_sender,
#[cfg(feature = "tee")]
pm_sender,
})
}
@@ -1068,9 +1180,11 @@ impl Vcpu {
self.fd
.set_sregs(&state.sregs)
.map_err(Error::VcpuSetSregs)?;
self.fd
.set_xsave(&state.xsave)
.map_err(Error::VcpuSetXsave)?;
unsafe {
self.fd
.set_xsave(&state.xsave)
.map_err(Error::VcpuSetXsave)?;
}
self.fd.set_xcrs(&state.xcrs).map_err(Error::VcpuSetXcrs)?;
self.fd
.set_debug_regs(&state.debug_regs)
@@ -1091,6 +1205,35 @@ impl Vcpu {
fn run_emulation(&mut self) -> Result<VcpuEmulation> {
match self.fd.run() {
Ok(run) => match run {
#[cfg(feature = "tee")]
VcpuExit::Hypercall(hypercall) => {
if hypercall.nr != 12
/* KVM_HC_MAP_GPA_RANGE */
{
return Err(Error::VcpuUnsupportedHypercall);
}
let gpa = hypercall.args[0];
let size = hypercall.args[1] * 0x1000; /* TARGET_PAGE_SIZE */
let attributes = hypercall.args[2];
let private = !matches!(attributes, 0);
let mem_properties = MemoryProperties { gpa, size, private };
let (response_sender, response_receiver) = unbounded();
self.pm_sender
.send(WorkerMessage::ConvertMemory(
response_sender.clone(),
mem_properties,
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!("Unable to convert memory with properties: gpa: 0x{:x} size: 0x{:x} to_private: {}", gpa, size, private);
return Err(Error::VcpuUnhandledKvmExit);
}
Ok(VcpuEmulation::Handled)
}
#[cfg(target_arch = "x86_64")]
VcpuExit::IoIn(addr, data) => {
self.io_bus.read(0, u64::from(addr), data);
@@ -1101,6 +1244,26 @@ impl Vcpu {
self.io_bus.write(0, u64::from(addr), data);
Ok(VcpuEmulation::Handled)
}
#[cfg(feature = "tee")]
VcpuExit::MemoryFault { gpa, size, flags } => {
let private = (flags & (KVM_MEMORY_EXIT_FLAG_PRIVATE as u64)) != 0;
let mem_properties = MemoryProperties { gpa, size, private };
let (response_sender, response_receiver) = unbounded();
self.pm_sender
.send(WorkerMessage::ConvertMemory(
response_sender.clone(),
mem_properties,
))
.unwrap();
if !response_receiver.recv().unwrap() {
error!("Unable to convert memory with properties: gpa: 0x{:x} size: 0x{:x} to_private: {}", gpa, size, private);
return Err(Error::VcpuUnhandledKvmExit);
}
Ok(VcpuEmulation::Handled)
}
VcpuExit::MmioRead(addr, data) => {
if let Some(ref mmio_bus) = self.mmio_bus {
mmio_bus.read(0, addr, data);

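With the `tee` feature, the VM enables `KVM_CAP_EXIT_HYPERCALL` with `args[0] = 1 << 12`, a bitmask in which bit N asks KVM to exit to userspace for hypercall N (12 being `KVM_HC_MAP_GPA_RANGE`). The vcpu then turns either a `KVM_HC_MAP_GPA_RANGE` hypercall or a memory fault into `MemoryProperties`. A sketch of that decoding, mirroring the hunk's rules: `args[1]` is a page count multiplied by `0x1000`, and any non-zero `args[2]` means private. `MemoryProperties` here is a hypothetical mirror of the `utils` type, the function names are hypothetical, and the value of `KVM_MEMORY_EXIT_FLAG_PRIVATE` is assumed to match the Linux uapi header (bit 3).

```rust
/// Hypercall number the hunk hard-codes as `12`.
const KVM_HC_MAP_GPA_RANGE: u64 = 12;
/// Page size the hunk assumes when converting a page count to bytes.
const TARGET_PAGE_SIZE: u64 = 0x1000;
/// Assumed value of KVM_MEMORY_EXIT_FLAG_PRIVATE (bit 3 in the uapi header).
const KVM_MEMORY_EXIT_FLAG_PRIVATE: u64 = 1 << 3;

/// Hypothetical mirror of `utils::worker_message::MemoryProperties`.
#[derive(Debug, PartialEq)]
struct MemoryProperties {
    gpa: u64,
    size: u64,
    private: bool,
}

/// `kvm_enable_cap.args[0]` for KVM_CAP_EXIT_HYPERCALL: bit N set for each hypercall N.
fn exit_hypercall_mask(hypercalls: &[u64]) -> u64 {
    hypercalls.iter().fold(0, |mask, &nr| mask | (1u64 << nr))
}

/// Decode a hypercall exit: args[0] = GPA, args[1] = page count,
/// args[2] = attributes (non-zero treated as "make private", as in the hunk).
fn decode_map_gpa_range(nr: u64, args: [u64; 6]) -> Option<MemoryProperties> {
    if nr != KVM_HC_MAP_GPA_RANGE {
        return None; // the hunk reports VcpuUnsupportedHypercall here
    }
    Some(MemoryProperties {
        gpa: args[0],
        size: args[1] * TARGET_PAGE_SIZE,
        private: args[2] != 0,
    })
}

/// A memory-fault exit asks for private memory when the PRIVATE flag is set.
fn fault_wants_private(flags: u64) -> bool {
    (flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) != 0
}
```

Both exit paths end in the same `WorkerMessage::ConvertMemory` request, so keeping the decoding separate from the channel round-trip makes each path easy to check in isolation.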

@@ -232,7 +232,7 @@ impl Vcpu {
// _before_ running this, then there is nothing we can do.
Self::TLS_VCPU_PTR.with(|cell: &VcpuCell| {
if let Some(vcpu_ptr) = cell.get() {
if vcpu_ptr == self as *const Vcpu {
if std::ptr::eq(vcpu_ptr, self) {
Self::TLS_VCPU_PTR.with(|cell: &VcpuCell| cell.take());
return Ok(());
}

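Both vcpu hunks replace a cast-and-compare (`vcpu_ptr == self as *mut Vcpu` / `*const Vcpu`) with `std::ptr::eq(vcpu_ptr, self)`. The behaviour is the same address comparison; `ptr::eq` just lets `&self` coerce to a raw pointer without an explicit cast. A minimal sketch of the thread-local "reset only if it is me" pattern, where `Vcpu`, `init_tls` and `reset_tls` are simplified, hypothetical stand-ins for the real types and methods.

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical stand-in for `Vcpu`; the field keeps it non-zero-sized,
/// so two distinct instances never share an address.
#[allow(dead_code)]
struct Vcpu {
    id: u64,
}

thread_local! {
    /// Per-thread pointer to the vcpu running on this thread (mirrors TLS_VCPU_PTR).
    static TLS_VCPU_PTR: Cell<Option<*const Vcpu>> = Cell::new(None);
}

impl Vcpu {
    fn init_tls(&self) {
        TLS_VCPU_PTR.with(|cell| cell.set(Some(self as *const Vcpu)));
    }

    /// Clear the TLS slot only if it points at `self`; returns whether it did.
    fn reset_tls(&self) -> bool {
        TLS_VCPU_PTR.with(|cell| match cell.get() {
            // `ptr::eq` compares addresses; `&Vcpu` coerces to `*const Vcpu`.
            Some(p) if ptr::eq(p, self) => {
                cell.set(None);
                true
            }
            _ => false,
        })
    }
}
```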

@@ -120,6 +120,8 @@ pub struct VmResources {
pub smbios_oem_strings: Option<Vec<String>>,
/// Whether to enable nested virtualization.
pub nested_enabled: bool,
/// Whether to enable split irqchip
pub split_irqchip: bool,
}
impl VmResources {
@@ -344,6 +346,7 @@ mod tests {
console_output: None,
smbios_oem_strings: None,
nested_enabled: false,
split_irqchip: false,
}
}

@@ -1,52 +0,0 @@
use std::collections::VecDeque;
use std::fmt;
use std::sync::{Arc, Mutex};
use devices::virtio::{Console, ConsoleError};
#[derive(Debug)]
pub enum ConsoleConfigError {
/// Failed to create the console device.
CreateConsoleDevice(ConsoleError),
}
impl fmt::Display for ConsoleConfigError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use self::ConsoleConfigError::*;
match *self {
CreateConsoleDevice(ref e) => write!(f, "Cannot create console device: {:?}", e),
}
}
}
type Result<T> = std::result::Result<T, ConsoleConfigError>;
#[derive(Clone, Debug, PartialEq)]
pub struct ConsoleDeviceConfig {
pub fs_id: String,
pub shared_dir: String,
}
#[derive(Default)]
pub struct FsBuilder {
pub list: VecDeque<Arc<Mutex<Fs>>>,
}
impl FsBuilder {
pub fn new() -> Self {
Self {
list: VecDeque::<Arc<Mutex<Fs>>>::new(),
}
}
pub fn insert(&mut self, config: FsDeviceConfig) -> Result<()> {
let fs_dev = Arc::new(Mutex::new(Self::create_fs(config)?));
self.list.push_back(fs_dev);
Ok(())
}
pub fn create_fs(config: FsDeviceConfig) -> Result<Fs> {
Ok(devices::virtio::Fs::new(config.fs_id, config.shared_dir)
.map_err(FsConfigError::CreateFsDevice)?)
}
}

src/vmm/src/worker.rs (new file, +157 lines)

@@ -0,0 +1,157 @@
use std::io;
use std::sync::{Arc, Mutex};
#[cfg(feature = "tee")]
use utils::worker_message::MemoryProperties;
use utils::worker_message::WorkerMessage;
use crossbeam_channel::Receiver;
#[cfg(feature = "tee")]
use crossbeam_channel::Sender;
#[cfg(feature = "tee")]
use kvm_bindings::{kvm_memory_attributes, KVM_MEMORY_ATTRIBUTE_PRIVATE};
#[cfg(feature = "tee")]
use libc::{fallocate, madvise, FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE, MADV_DONTNEED};
#[cfg(feature = "tee")]
use std::ffi::c_void;
#[cfg(feature = "tee")]
use vm_memory::{
guest_memory::GuestMemory, Address, GuestAddress, GuestMemoryRegion, MemoryRegionAddress,
};
pub fn start_worker_thread(
vmm: Arc<Mutex<super::Vmm>>,
receiver: Receiver<WorkerMessage>,
) -> io::Result<()> {
std::thread::Builder::new()
.name("vmm worker".into())
.spawn(move || loop {
match receiver.recv() {
Err(e) => error!("error receiving message from vmm worker thread: {:?}", e),
#[cfg(target_os = "macos")]
Ok(message) => vmm.lock().unwrap().match_worker_message(message),
#[cfg(target_os = "linux")]
Ok(message) => vmm.lock().unwrap().match_worker_message(message),
}
})?;
Ok(())
}
impl super::Vmm {
fn match_worker_message(&self, msg: WorkerMessage) {
match msg {
#[cfg(target_os = "macos")]
WorkerMessage::GpuAddMapping(s, h, g, l) => self.add_mapping(s, h, g, l),
#[cfg(target_os = "macos")]
WorkerMessage::GpuRemoveMapping(s, g, l) => self.remove_mapping(s, g, l),
#[cfg(target_arch = "x86_64")]
WorkerMessage::GsiRoute(sender, entries) => {
let mut routing = kvm_bindings::KvmIrqRouting::new(entries.len()).unwrap();
let routing_entries = routing.as_mut_slice();
routing_entries.copy_from_slice(&entries);
sender
.send(self.vm.fd().set_gsi_routing(&routing).is_ok())
.unwrap();
}
#[cfg(target_arch = "x86_64")]
WorkerMessage::IrqLine(sender, irq, active) => {
sender
.send(self.vm.fd().set_irq_line(irq, active).is_ok())
.unwrap();
}
WorkerMessage::ConvertMemory(_sender, _properties) =>
{
#[cfg(feature = "tee")]
self.convert_memory(_sender, _properties)
}
}
}
#[cfg(feature = "tee")]
fn convert_memory(&self, sender: Sender<bool>, properties: MemoryProperties) {
let Some((guest_memfd, region_start)) = self.kvm_vm().guest_memfd_get(properties.gpa)
else {
error!(
"unable to find KVM guest_memfd for memory region corresponding to GPA 0x{:x}",
properties.gpa
);
sender.send(false).unwrap();
return;
};
let attributes: u64 = if properties.private {
KVM_MEMORY_ATTRIBUTE_PRIVATE as u64
} else {
0
};
let attr = kvm_memory_attributes {
address: properties.gpa,
size: properties.size,
attributes,
flags: 0,
};
if self.kvm_vm().fd().set_memory_attributes(attr).is_err() {
error!("unable to set memory attributes for memory region corresponding to guest address 0x{:x}", properties.gpa);
sender.send(false).unwrap();
return;
}
let region = self
.guest_memory()
.find_region(GuestAddress(properties.gpa));
if region.is_none() {
error!(
"guest memory region corresponding to GPA 0x{:x} not found",
properties.gpa
);
sender.send(false).unwrap();
return;
}
let offset = properties.gpa - region_start;
if properties.private {
let region_addr = MemoryRegionAddress(offset);
let Ok(host_startaddr) = region.unwrap().get_host_address(region_addr) else {
error!(
"host address corresponding to memory region address 0x{:x} not found",
region_addr.raw_value()
);
sender.send(false).unwrap();
return;
};
let ret = unsafe {
madvise(
host_startaddr as *mut c_void,
properties.size.try_into().unwrap(),
MADV_DONTNEED,
)
};
if ret < 0 {
error!("unable to advise kernel that memory region corresponding to GPA 0x{:x} will likely not be needed (madvise)", properties.gpa);
sender.send(false).unwrap();
}
} else {
let ret = unsafe {
fallocate(
guest_memfd,
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
offset as i64,
properties.size as i64,
)
};
if ret < 0 {
error!("unable to allocate space in guest_memfd for shared memory (fallocate)");
sender.send(false).unwrap();
}
}
sender.send(true).unwrap();
}
}

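`ConvertMemory` is a request/reply exchange: the vcpu thread sends a `Sender<bool>` along with the request and then blocks on the matching receiver, so the worker has to answer every request, on failure as well as success, or the vcpu waits forever. A minimal sketch of that pattern using `std::sync::mpsc` in place of `crossbeam_channel`; `WorkerMessage`, `convert`, `spawn_worker` and `request_convert` are hypothetical, trimmed-down names, and `convert` fails for GPA 0 purely so the error path can be exercised.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Trimmed-down request: a reply channel plus the GPA to convert.
enum WorkerMessage {
    ConvertMemory(Sender<bool>, u64),
}

/// Stand-in for the real conversion; hypothetically fails for GPA 0.
fn convert(gpa: u64) -> Result<(), String> {
    if gpa == 0 {
        Err("no guest_memfd for GPA 0".into())
    } else {
        Ok(())
    }
}

/// Worker loop: every request gets exactly one reply, success or failure.
fn spawn_worker(rx: Receiver<WorkerMessage>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // `recv()` returns Err once every Sender is dropped, ending the loop.
        while let Ok(msg) = rx.recv() {
            match msg {
                WorkerMessage::ConvertMemory(reply, gpa) => {
                    let ok = match convert(gpa) {
                        Ok(()) => true,
                        Err(e) => {
                            eprintln!("convert failed: {e}");
                            false
                        }
                    };
                    // Ignore a send error: the requester may already be gone.
                    let _ = reply.send(ok);
                }
            }
        }
    })
}

/// Requester side, as a vcpu would do it: send, then block for the answer.
fn request_convert(tx: &Sender<WorkerMessage>, gpa: u64) -> bool {
    let (reply_tx, reply_rx) = channel();
    if tx.send(WorkerMessage::ConvertMemory(reply_tx, gpa)).is_err() {
        return false; // worker has exited
    }
    // If the worker drops `reply_tx` without answering, recv() errors instead of hanging.
    reply_rx.recv().unwrap_or(false)
}
```

Computing a single `ok` value and sending it in one place makes "exactly one reply per request" easy to see, compared with sending from each early-return branch.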

@@ -5,7 +5,7 @@ edition = "2021"
[dependencies]
test_cases = { path = "../test_cases", features = ["host"] }
anyhow = "1.0.95"
nix = { version = "0.29.0", features = ["resource"] }
nix = { version = "0.29.0", features = ["resource", "fs"] }
macros = { path = "../macros" }
clap = { version = "4.5.27", features = ["derive"] }
tempdir = "0.3.7"