Page MenuHomeFreeBSD

Establish 4GB boundaries for busdma in the mpr(4) and mps(4) driver
ClosedPublic

Authored by ken on Dec 12 2023, 8:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 17, 4:36 AM
Unknown Object (File)
Sun, Nov 17, 2:31 AM
Unknown Object (File)
Sun, Nov 17, 12:37 AM
Unknown Object (File)
Apr 8 2024, 4:37 AM
Unknown Object (File)
Feb 13 2024, 1:55 AM
Unknown Object (File)
Jan 11 2024, 10:09 PM
Unknown Object (File)
Dec 28 2023, 4:18 PM
Unknown Object (File)
Dec 14 2023, 8:17 PM
Subscribers
None

Details

Reviewers
imp
mav
kadesai
slm
Summary

Most all of the memory used by the cards in the mpr(4) and mps(4) drivers is required, according to the specs and Broadcom developers, to be within a 4GB segment of memory.

This includes:

  • System Request Message Frames pool
  • Reply Free Queues pool
  • ReplyDescriptorPost Queues pool
  • Chain Segments pool
  • Sense Buffers pool
  • SystemReply message pool

We got a bug report from Dwight Engen, who ran into data corruption in the BAE port of FreeBSD:

https://www.baesystems.com/en/product/stop-os

We have a port of the FreeBSD mpr driver to our
kernel and recently I found an issue under heavy load where a DMA may
go to the wrong address. The test system is a Supermicro X10SRH-CLN4F
with the onboard SAS3008 controller setup with 2 enterprise Micron SSDs
in RAID 0 (striped). I have debugged the issue and narrowed down that
the errant DMA is one that has a segment that crosses a 4GB physical
boundary. There are more details I can provide if you'd like, but with
the attached patch in place I can no longer re-create the issue.

I'm not sure if this is a known limit of the card (have not found a
datasheet/programming docs for the chip) or our system is just doing
something a bit different. Any helpful info or insight would be welcome.

Anyway, just thought this might be helpful info if you want to apply a
similar fix to FreeBSD. You can ignore/discard the commit message as it
is my internal commit (blkio is our own tool we use to write/read every
block of a device with CRC verification which is how I found the problem).

The commit message was:

[PATCH 8/9] mpr: fix memory corrupting DMA when sg segment crosses
4GB boundary

Test case was two SSD's in RAID 0 (stripe). The logical disk was then
partitioned into two partitions. One partition had lots of filesystem I/O
and the other was initially filled using blkio with CRCable data and then
read back with blkio CRC verify in a loop. Eventually blkio would report
a bad CRC block because the physical page being read-ahead into didn't
contain the right data. If the physical address in the arq/segs was
for example 0x500003000 the data would actually be DMAed to 0x400003000.

The original patch was against mpr(4) before busdma templates were introduced, and only affected the buffer pool (sc->buffer_dmat) in the mpr(4) driver. After some discussion with Dwight and the LSI/Broadcom developers and looking through the driver, it looks like most of the queues in the driver are ok, because they limit the memory used to memory below 4GB. The buffer queue and the chain frames seem to be the exceptions.

This is pretty much the same between the mpr(4) and mps(4) drivers.

Comments welcome here, is this the right thing to do?

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

ken requested review of this revision.Dec 12 2023, 8:51 PM
ken edited the summary of this revision. (Show Details)
ken edited the summary of this revision. (Show Details)

I think this is right.
I ran into issues similar to this in mpr3mr. But there the problem was that all I/O was specified as having a 4GB max address, leading to lots of bouncing and many deferrals because we ran out of bounce pages. I don't thjink any of these will hit that, though.

sys/dev/mpr/mpr.c
1504

Isn't there a constant for 4G we can use here? And if not, we should create one because it's too easy to add / loose a zero.

This revision is now accepted and ready to land.Dec 12 2023, 9:15 PM

Use BUS_SPACE_MAXSIZE_32BIT+1 to represent the 4GB boundary instead of 0x100000000.

This revision now requires review to proceed.Dec 13 2023, 7:20 PM
sys/dev/mpr/mpr.c
1504

There are a couple of macros for 0xffffffff, and I figured the BUS_SPACE version might fit here.

thanks. looks better

This revision is now accepted and ready to land.Dec 13 2023, 9:00 PM

This is already merged.