virtio_netmap: Use half the vring's size as netmap slots#726
Open
awelzel wants to merge 1 commit intoluigirizzo:masterfrom
Open
virtio_netmap: Use half the vring's size as netmap slots#726awelzel wants to merge 1 commit intoluigirizzo:masterfrom
awelzel wants to merge 1 commit intoluigirizzo:masterfrom
Conversation
For every slot in a netmap ring, two descriptors in the virtqueue's
vring are used. Therefore, we can only use half the vring's size as
slots for netmap rings. Otherwise, the vring and the netmap ring are
out of sync. This currently corrupts proper operation and produces
bad data (see below).
This patch also sets an explicit nm_config to ensure we keep the
values for num_rx_desc/num_tx_desc from attach time and marks the
VQ_FULL case in virtio_netmap_init_buffers() as an issue that should
never be triggered.
Background: Experiment / Observation
Running a qemu-guest with a virtio_net NIC (rx_queue_size=256) attached
to a bridge on the host. Sending known/predictable packets in a loop
from the host via scapy. On the guest side, running tcpdump with native
netmap support and inspecting the packet data:
tcpdump -i netmap:enp8s0 -Xn#
The first 128 packets look as expected, but thereafter come 128 packets
of "null data":
128 15:59:46.592378 IP 10.0.0.128.128 > 127.0.0.1.256: UDP, length 141
0x0000: 4500 00a9 0001 0000 4011 f0c2 0a00 0080 E.......@.......
0x0010: 7f00 0001 0080 0100 0095 7f07 3030 3132 ............0012
0x0020: 3820 3031 3330 2046 4646 4646 4646 4646 8.0130.FFFFFFFFF
0x0030: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0040: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0050: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0060: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0070: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0080: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0090: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x00a0: 4646 4646 4646 4646 46 FFFFFFFFF
129 15:59:46.712936 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 121
0x0000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0070: 0000 0000 00 .....
At packet 256 tcpdump receives non-null data, but the payloads represents
what should've been received for packet 129 and later. Because the length
is taken from the vring, some of the slots also expose stale/corrupt data:
281 16:03:51.846490 IP 10.0.0.153.153 > 127.0.0.1.306: UDP, length 74
0x0000: 4500 0066 0001 0000 4011 f0ec 0a00 0099 E..f....@.......
0x0010: 7f00 0001 0099 0132 0052 d633 3030 3135 .......2.R.30015
0x0020: 3320 3030 3633 2046 4646 4646 4646 4646 3.0063.FFFFFFFFF
0x0030: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0040: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0050: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0060: 4646 4646 4646 5757 5757 5757 5757 5757 FFFFFFWWWWWWWWWW
0x0070: 5757 5757 5757 5757 5757 5757 5757 5757 WWWWWWWWWWWWWWWW
0x0080: 5757 5757 5757 5757 5757 5757 5757 5757 WWWWWWWWWWWWWWWW
0x0090: 5757 5757 5757 5757 5757 5757 0000 0000 WWWWWWWWWWWW....
0x00a0: 0000 0000 0000 0000 ........
Contributor
Author
|
As an additional comment, FreeBSD uses only half the slots, too, unless indirect descriptors are in use. Linux virtio_net doesn't seem to use indirect descriptors (checked 4.4, 4,19), so it seems reasonable to do the same. https://github.com/freebsd/freebsd/blob/master/sys/dev/netmap/if_vtnet_netmap.h#L386 |
jhk098
reviewed
Nov 22, 2020
| * num_rx_desc descriptors and so the queue is not | ||
| * allowed to become full. | ||
| */ | ||
| if (vq->num_free == 0) { |
Contributor
There was a problem hiding this comment.
Use the VQ_FULL macro for portability
jhk098
reviewed
Nov 22, 2020
| @@ -677,6 +685,20 @@ virtio_netmap_intr(struct netmap_adapter *na, int onoff) | |||
| } | |||
| } | |||
|
|
|||
Contributor
There was a problem hiding this comment.
Defining this function is unnecessary, because nothing is going to change the values listed in struct nm_config_info...
Here you are not updating them indeed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
For every slot in a netmap ring, two descriptors in the virtqueue's
vring are used. Therefore, we can only use half the vring's size as
slots for netmap rings. Otherwise, the vring and the netmap ring are
out of sync. This currently corrupts proper operation and produces
bad data (see below).
This patch also sets an explicit nm_config to ensure we keep the
values for num_rx_desc/num_tx_desc from attach time and marks the
VQ_FULL case in virtio_netmap_init_buffers() as an issue that should
never be triggered.
Background: Experiment / Observation
Running a qemu-guest with a virtio_net NIC (rx_queue_size=256) attached
to a bridge on the host. Sending known/predictable packets in a loop
from the host via scapy. On the guest side, running tcpdump with native
netmap support and inspecting the packet data:
The first 128 packets look as expected, but thereafter come 128 packets
of "null data":
At packet 256 tcpdump receives non-null data, but the payloads represents
what should've been received for packet 129 and later. Because the length
is taken from the vring, some of the slots also expose stale/corrupt data:
This seems like a biggie and I wonder why this hasn't come up before, maybe there's something off in the analysis?
Scapy code to produce packets