Page MenuHomeFreeBSD

Convert vmx(4) to iflib
ClosedPublic

Authored by pkelsey on Jan 6 2019, 5:24 AM.
Tags
None
Referenced Files
F81969505: D18761.diff
Sat, Dec 14, 7:35 AM
Unknown Object (File)
Apr 8 2024, 1:12 PM
Unknown Object (File)
Mar 30 2024, 7:01 PM
Unknown Object (File)
Mar 22 2024, 2:18 AM
Unknown Object (File)
Mar 4 2024, 9:50 AM
Unknown Object (File)
Jan 28 2024, 2:53 PM
Unknown Object (File)
Jan 25 2024, 3:30 AM
Unknown Object (File)
Jan 16 2024, 3:59 AM
Subscribers

Details

Summary

This is a conversion of vmx(4) to being an iflib driver.

In this version, the receive ring configuration for each receive queue is different. Instead of the first receive ring in a receive queue alternating head and body type descriptors, it contains only head type descriptors. The second receive ring in a receive queue contains only body type descriptors. Also, only a single receive buffer size is used across all rings in all receive queues on an interface, with the size depending on the mtu that has been configured. These changes were necessary to conform to the iflib model.

This patch also contains minor iflib changes required by the new vmx(4) implementation:

  • IFLIB_MAX_RX_SEGS has been moved from iflib.c to iflib.h so that drivers can make use of it
  • iflib_dma_alloc_align() has been added to the iflib API so that drivers can specify dmamem alignment as required

Note that for vmx(4) to be able to use multiple queues, MSI-X has to work, and for MSI-X to work, one must set the tunable hw.pci.honor_msi_blacklist=0, as for historical reasons that no longer apply, the VMware emulated PCIe bridges are blacklisted for MSI-X functionality in the kernel (background at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203874).

Also, performance is generally better with tx abdication enabled in iflib for the interface, as it allows for work to be scheduled onto more CPUs than otherwise. To do so for vmx0, set the tunable/sysctl dev.vmx.0.iflib.tx_abdicate=1.

Test Plan

Tested on VMWare Workstation 15 Pro and ESXi 6.7.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

My only concern here is making IFLIB_MAX_RX_SEGS available to the driver...

IFLIB_MAX_RX_SEGS is more as an upper limit on what a sane driver would want rather than an upper limit of what a driver can use. If the driver actually wants (and can/will use) more, this may be the time to move ifr_frags to the end of struct iflib_rxq and use a dynamic size so the driver can have as many as it actually wants instead... that would allow other drivers to use a smaller rxq struct. ixgbe for example appears to be limited to 8 via an MPASS(), and it looks like no e1000 devices support larger than 16k frame sizes either. Dropping this to 8 for most drivers and having it at 32 or 64 for ones that support hardware 64k frames or LRO/GRO then setting this driver to use whatever is best for it could be worth the effort.

Would this driver actually be happier with a larger max? I note that the old driver used 17, for 16 MJUMPAGESIZE plus a MCLBYTES... with everything at MCLBYTES, do we want to just bump this to 65 and cap vmx there?

My only concern here is making IFLIB_MAX_RX_SEGS available to the driver...

IFLIB_MAX_RX_SEGS is more as an upper limit on what a sane driver would want rather than an upper limit of what a driver can use. If the driver actually wants (and can/will use) more, this may be the time to move ifr_frags to the end of struct iflib_rxq and use a dynamic size so the driver can have as many as it actually wants instead... that would allow other drivers to use a smaller rxq struct. ixgbe for example appears to be limited to 8 via an MPASS(), and it looks like no e1000 devices support larger than 16k frame sizes either. Dropping this to 8 for most drivers and having it at 32 or 64 for ones that support hardware 64k frames or LRO/GRO then setting this driver to use whatever is best for it could be worth the effort.

In the current iflib implementation, the reality of course is that IFLIB_MAX_RX_SEGS is the upper limit of what a driver can use. Making this per-driver dynamic I think is a question that should be taken up separately from this patch (and the change would be more than just putting ifr_frags at the end of iflib_rxq, as ifc_rxqs in iflib_ctx could no longer be an array of iflib_rxq, it would have to be an array of pointers to dynamically allocated iflib_rxq).

Would this driver actually be happier with a larger max? I note that the old driver used 17, for 16 MJUMPAGESIZE plus a MCLBYTES... with everything at MCLBYTES, do we want to just bump this to 65 and cap vmx there?

I have no idea if this driver would be measurably happier with a larger max, but I don't see any point to bumping IFLIB_MAX_RX_SEGS to 65. 64 * MCLBYTES = 128K, which is already greater than 16 * MJUMPAGESIZE + MCLBYTES = 66K. Also, I haven't ever seen 64 segments be consumed in my testing, so I've seen no indication in practice that we're bumping up against the limit we're giving it. My approach is to just give it as much runway as the current iflib implementation allows.

The existing version of this driver, as you point out, supports 16*4K + 2K max size. There is no public documentation for VMXNET3 that I am aware of other than the Linux driver implementation, and that's where that 16*4K + 2K max size comes from. There is some indication as to that limit being determined by the definition of MAX_SKB_FRAGS in the Linux kernel and not necessarily by the implementation of the VMXNET3 device in the hypervisor, but the take away is that we don't have any indication as to what the maximum number of segments (or the maximum total packet size) the VMXNET3 device can produce is.

My only concern here is making IFLIB_MAX_RX_SEGS available to the driver...

In the current iflib implementation, the reality of course is that IFLIB_MAX_RX_SEGS is the upper limit of what a driver can use. My approach is to just give it as much runway as the current iflib implementation allows.

Yeah, I suppose the "why arbitrarily limit things" argument is more compelling than the "why expose more details" one.

sys/dev/vmware/vmxnet3/if_vmx.c
1447 ↗(On Diff #52603)

It may be worth adding:

if (avail > budget)
    break;

here since iflib really only cares if the return value is > budget, not the exact value, and a budget of 1 is used to check if there's more to process.

This revision is now accepted and ready to land.Jan 8 2019, 5:12 PM
sys/dev/vmware/vmxnet3/if_vmx.c
1447 ↗(On Diff #52603)

I did become of aware of only needing to go as far as budget when seeing what is available after I had written this and had that code in there at one point. I think I wound up discarding it along with some other speculative changes I made during testing that did not seem to change performance.

I will add this back in prior to commit though as having it in there is really keeping in line with how the API works.

This revision was automatically updated to reflect the committed changes.