Looks fine to me, just a couple of nits.
Jul 3 2024
Jul 2 2024
Have you looked at the similar Linux code? It would be good to be consistent, or at least similar. I haven't looked deeply, but foreach_nfs_host_cb() seems to support multiple hosts.
Jun 27 2024
Jun 24 2024
Looks odd to me, but OK.
In D45660#1042742, @ken wrote: So here is what the debugging log message in isp_getpdb() shows. isp0 and isp1 are connected to LTO-6 tape drives via an 8Gb switch. isp2 is directly connected to an LTO-6 in loop mode:
isp0: Chan 0 handle 0x0 Port 0xfffc01 flags 0x0 curstate 77 laststate 77
isp0: Chan 0 handle 0x1 Port 0x011b26 flags 0x40a0 curstate 46 laststate 46
isp0: Chan 0 handle 0x7fe Port 0xfffffe flags 0x0 curstate 44 laststate 44
isp0: Chan 0 handle 0x7fe Port 0xfffffe flags 0x0 curstate 44 laststate 44
isp1: Chan 0 handle 0x0 Port 0xfffc01 flags 0x0 curstate 77 laststate 77
isp1: Chan 0 handle 0x1 Port 0x011a26 flags 0x40a0 curstate 46 laststate 46
isp1: Chan 0 handle 0x7fe Port 0xfffffe flags 0x0 curstate 44 laststate 44
isp1: Chan 0 handle 0x7fe Port 0xfffffe flags 0x0 curstate 44 laststate 44
isp2: Chan 0 handle 0x0 Port 0x000026 flags 0x40a0 curstate 46 laststate 46
It seems like a good tunable, except that I don't get the meaning of "only" there. Why not "always", "force", or something like that?
None of the QLogic documents I have mention anything about NVMe, and this state field is declared as a byte there. I have no objections to this patch, but I am a bit curious what NVMe status we see there for non-NVMe devices.
Jun 14 2024
Jun 7 2024
Jun 6 2024
May 29 2024
Differences of less than 4 (RQ_PPQ) are insignificant and are simply removed. No functional change (intended).
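For context, a minimal sketch of why, assuming the usual FreeBSD runqueue bucketing (names mirror sys/runq.h, but this is illustrative, not the exact source): priorities are mapped to run queues in buckets of RQ_PPQ, so threads whose priorities differ by less than the bucket width land in the same queue anyway.

#define RQ_NQS  64      /* number of run queues */
#define RQ_PPQ  4       /* priorities per queue (bucket width) */

/* Priorities that differ by less than RQ_PPQ select the same run queue. */
static inline int
runq_bucket(int pri)
{
        return ((pri / RQ_PPQ) % RQ_NQS);
}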
I suspect that the first thread was skipped to avoid stealing a thread that was just scheduled to a CPU but has not been able to run yet.
I am not fully sure about the motivation of this change, but it feels wrong to me to have per-namespace zones. On a big system under heavy load UMA does a lot of work for per-CPU and per-domain caching, and doing that also per-namespace would multiply the resource waste. Also, the last time I touched it, I remember it was difficult for UMA to operate in severely constrained environments, since eviction of per-CPU caches is quite expensive. I don't remember how reservation works in that context, but I suppose that having dozens of small zones with small reservations but huge per-CPU caches is not a very viable configuration.
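To make the concern concrete, a hedged sketch (the structure and zone names below are invented for illustration, not taken from the patch under review): every uma_zcreate() carries its own per-CPU bucket caches, so creating a zone per namespace multiplies that caching overhead by the namespace count, whereas a single shared zone pays for it once.

#include <sys/param.h>
#include <vm/uma.h>

struct ns_req {                 /* hypothetical per-request structure */
        uint64_t        lba;
        uint32_t        count;
};

static uma_zone_t shared_req_zone;

static void
req_zones_init(void)
{
        /* One zone shared by all namespaces: per-CPU caches exist once. */
        shared_req_zone = uma_zcreate("ns_req", sizeof(struct ns_req),
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);

        /*
         * The questioned alternative would call uma_zcreate() once per
         * namespace instead, duplicating the per-CPU and per-domain
         * caches for every namespace.
         */
}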
May 23 2024
May 14 2024
I see no problems, but I have difficulty believing that timeout handlers running 1-2 times per second per queue pair can have any visible effect. Also, I am not happy to see a second place where timeouts are calculated. And 99/100 also looks quite arbitrary.
Mechanically it seems to make sense. I missed when the original transition happened, but if you say it is right, so be it.
May 7 2024
I wonder if there is any real architecture where a pointer load/store is non-atomic. For things that are going to be executed somewhere between once and never, it feels like you are over-engineering it. :)
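For reference, a hedged sketch of what is at stake (the variable and function names are hypothetical, not the patch): on the architectures FreeBSD supports, an aligned pointer assignment is already a single machine store, so the explicit accessor mostly documents intent and keeps the compiler from tearing or caching the access.

#include <sys/types.h>
#include <machine/atomic.h>

static void *active_handler;

/* Variant A: plain assignment, a single aligned store on supported CPUs. */
static void
set_handler_plain(void *h)
{
        active_handler = h;
}

/*
 * Variant B: explicit accessor (assumed here to be atomic_store_ptr() from
 * <machine/atomic.h>); same store, but documents intent and forbids
 * compiler tearing.
 */
static void
set_handler_atomic(void *h)
{
        atomic_store_ptr(&active_handler, h);
}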
I have no objections, if it is useful.
May 3 2024
Apr 27 2024
Apr 26 2024
In D44961#1025280, @asomers wrote: What is an "OOA queue"?
I wonder what your queue depth is, such that one message per request per 90 seconds causes a noticeable storm. Also, per-system limiting makes the output not very useful, since by selecting the first message out of many it does not say anything specific about LUNs, ports, commands, etc., only that something is wrong. Thinking even more broadly, I find these messages, printed on actual completion, not very useful: if it is not just a delay but something is really wrong, the commands may never complete and so the messages may never get printed. I wonder whether removing all of this and instead checking the OOA queues once per second for stuck requests and printing some digest would be more useful.
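A hedged sketch of that last idea, with hypothetical structures (struct io_entry, struct stuck_scan) standing in for CTL's real OOA queue types and with locking omitted: a callout fires once per second, walks the order-of-arrival queue, counts requests older than a threshold, and prints one digest line instead of one line per command.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/callout.h>
#include <sys/queue.h>
#include <sys/time.h>

struct io_entry {                       /* hypothetical queued request */
        TAILQ_ENTRY(io_entry)   links;
        time_t                  start;  /* time_uptime at submission */
};

struct stuck_scan {
        struct callout          co;     /* callout_init()ed at setup */
        TAILQ_HEAD(, io_entry)  ooa;    /* assumed ordered oldest-first */
        int                     threshold;      /* seconds */
};

static void
stuck_scan_cb(void *arg)
{
        struct stuck_scan *ss = arg;
        struct io_entry *io;
        int stuck = 0;

        TAILQ_FOREACH(io, &ss->ooa, links) {
                if (time_uptime - io->start < ss->threshold)
                        break;          /* the rest arrived even later */
                stuck++;
        }
        if (stuck != 0)
                printf("OOA scan: %d request(s) outstanding for more than "
                    "%d seconds\n", stuck, ss->threshold);
        callout_reset(&ss->co, hz, stuck_scan_cb, ss);
}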
Apr 20 2024
Looks good to me, but if you wish, a couple of cosmetic thoughts.
Looks good to me, though it seems only cosmetic.
Apr 17 2024
Apr 10 2024
Mar 25 2024
Mar 21 2024
I don't have any chip documentation to know what is right here, so I just wonder whether unconditionally printing a bunch of raw hex numbers is expected. It feels like mpi3mr_print_fault_info() is another candidate for mpi3mr_dprint().
I am not a big fan of the kernel printing something in response to arbitrary user requests; it makes logs messy. Is the error reported to the user not enough here?
Mar 18 2024
Why not backport 506fe78c48 instead?
Mar 15 2024
My only complaint is that it puts this queue into the same cache line as the main queue, which may be modified by writers. But if you really need it for debugging, that is understandable.
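To illustrate the cache-line point with a hedged, made-up structure (not the code under review): if the debug queue head shares a line with the frequently written main queue head, its readers take false-sharing misses on every enqueue; aligning each head to its own line avoids that at the cost of a little memory.

#include <sys/param.h>
#include <sys/queue.h>

struct entry;

struct queues {
        /* Hot head, touched by writers on every enqueue/dequeue. */
        STAILQ_HEAD(, entry)    main_q __aligned(CACHE_LINE_SIZE);

        /*
         * Debug-only head on its own cache line, so reading it for
         * diagnostics does not false-share with the writers above.
         */
        STAILQ_HEAD(, entry)    debug_q __aligned(CACHE_LINE_SIZE);
};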
Mar 6 2024
Mar 5 2024
On failure we've already notified consumers that the controller has failed. What will report that it is back? And is there even a device to send the request IOCTL to?
If you say it helps, I have no objections, but I see nvme_sim_controller_fail() destroying the SIM, so I am not sure you actually get here.
I wonder whether there are any namespace-specific events. I remember the NVMe specs allow per-namespace SMART, but I don't remember many details now.
In D39620#1008905, @sean_rogue-research.com wrote: stable/13 has this patch;
releng/13.2 doesn't have this patch (yet). I'm not very familiar with FreeBSD's branching system... I see FreeBSD 13.3-RELEASE was released today; is this bug fix included?