In D54785#1252666, @kib wrote: I do not think that trying to schedule a task is very robust once a LOR has been detected. We are already in a situation that risks deadlock.
Tue, Jan 20
In D54785#1252125, @kib wrote: This should be extremely useful, I already anticipate it.
Mon, Jan 19
Wed, Jan 14
Updated to incorporate review feedback.
Dec 8 2025
Updated license and clarified this is only applicable to amd64 in main.
Dec 7 2025
Dec 6 2025
Dec 5 2025
Oct 28 2025
jtl committed rG40b2111cfaa1: x86: Add a way to inject artificial MCA events for testing (authored by lprylli_netflix.com).
jtl committed rGfc13cf3c5bd4: x86: Add a way to inject artificial MCA events for testing (authored by lprylli_netflix.com).
Oct 9 2025
Oct 8 2025
updates to incorporate review feedback
incorporate review feedback; rebase on top of changes to D52946
Reworked patch to use a large-ish buffer allocated from BSS and protected by a lock if called from an interrupt context. As noted in the commit message, this will result in increased lock contention during an interrupt storm which exceeds the capacity of the free list; however, overall lock contention should still be lower than it was when mca_log() was called with the mca_lock held.
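The trade-off described above can be illustrated with a minimal userland sketch. This is not the patch itself: `log_record()`, `emit()`, `LOG_BUF_SIZE`, and the C11 spin flag are hypothetical stand-ins for the kernel code and its lock, and the message format is invented for illustration. The interrupt path formats into one shared static buffer under a lock, while the normal path allocates its own buffer and takes no lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BUF_SIZE 256	/* hypothetical size, not taken from the patch */

static char static_buf[LOG_BUF_SIZE];	/* zero-initialized static storage (BSS) */
static atomic_flag static_buf_lock = ATOMIC_FLAG_INIT;
static unsigned long static_buf_uses;	/* trips through the locked path */

/* Stand-in for emitting the message: copy it, truncated, to the caller. */
static void
emit(const char *msg, char *out, size_t outlen)
{
	size_t len = strlen(msg);

	if (len >= outlen)
		len = outlen - 1;
	memcpy(out, msg, len);
	out[len] = '\0';
}

/* Format one record; returns 0 on success, -1 if allocation fails. */
static int
log_record(bool in_interrupt, int bank, unsigned long long status,
    char *out, size_t outlen)
{
	char *buf;

	assert(out != NULL && outlen > 0);
	if (in_interrupt) {
		/* Cannot allocate here: serialize on the shared static buffer. */
		while (atomic_flag_test_and_set_explicit(&static_buf_lock,
		    memory_order_acquire))
			;	/* spin until the buffer is free */
		static_buf_uses++;
		snprintf(static_buf, sizeof(static_buf),
		    "MCA: Bank %d, Status 0x%016llx", bank, status);
		emit(static_buf, out, outlen);
		atomic_flag_clear_explicit(&static_buf_lock,
		    memory_order_release);
		return (0);
	}
	/* Outside interrupt context a private buffer avoids the lock. */
	buf = malloc(LOG_BUF_SIZE);
	if (buf == NULL)
		return (-1);
	snprintf(buf, LOG_BUF_SIZE, "MCA: Bank %d, Status 0x%016llx", bank,
	    status);
	emit(buf, out, outlen);
	free(buf);
	return (0);
}
```

In this sketch, only callers that pass `in_interrupt = true` contend on the lock, which is the sense in which contention grows during an interrupt storm but stays out of the common path.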
Oct 7 2025
In D52946#1210071, @lprylli_netflix.com wrote: Not sure whether it needs to be addressed or not, but for completeness: some sequence of events could occasionally empty the mca_freelist. My understanding is that mca_record_entry() will then work in "degraded mode" and directly call mca_log() in an interrupt context (next to "MCA: Unable to allocate space for an event.\n"), and for as long as that situation persists, the corresponding MCA printf() won't be ratechecked any more.
It could be a very unlikely theoretical case with the mca_postscan() changes, but with the previous version of the code, a couple of machines in the Netflix fleet were able to trigger the "MCA: Unable to allocate space for an event" situation.
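The scenario in the quoted comment can be sketched in userland C. Everything here is an assumption made for illustration, not the mca code: `pool`, `record_entry()`, `rate_ok()`, the tick-based clock, and the interval of 10 ticks are hypothetical. The sketch shows a fixed free list that can run dry and one possible way to keep the fallback path rate-limited while it does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical free-list capacity */

struct event {
	struct event *next;
	int bank;
};

static struct event pool[POOL_SIZE];
static struct event *freelist;
static long last_direct_log = -1;	/* tick of last direct log, -1 = never */
static unsigned direct_logs, suppressed;

/* Thread every pool entry onto the free list. */
static void
pool_init(void)
{
	freelist = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool[i].next = freelist;
		freelist = &pool[i];
	}
}

/* True if at least min_interval ticks have passed since *last. */
static bool
rate_ok(long now, long *last, long min_interval)
{
	assert(min_interval > 0);
	if (*last >= 0 && now - *last < min_interval)
		return (false);
	*last = now;
	return (true);
}

/* Returns a queued event, or NULL when the degraded path was taken. */
static struct event *
record_entry(int bank, long now)
{
	struct event *ev = freelist;

	if (ev != NULL) {
		freelist = ev->next;
		ev->next = NULL;
		ev->bank = bank;
		return (ev);
	}
	/* Free list exhausted: log directly, but still rate-limited. */
	if (rate_ok(now, &last_direct_log, 10))
		direct_logs++;
	else
		suppressed++;
	return (NULL);
}
```

With the pool drained, calls at ticks 100, 101, and 110 would log, suppress, and log again, so a storm that outruns the free list cannot produce unbounded output in this model.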
jtl committed rG1c2fc62e4a96: x86: Add a way to inject artificial MCA events for testing (authored by lprylli_netflix.com).
Updating to apply cleanly on top of D52946, which introduces the use of sbuf to gather the log message.
Remove unneeded check for mca_startup_done
In D52943#1209843, @markj wrote: Note that I am not 100% sure this can actually happen, since I am not 100% sure we will ever enable interrupts before we run mca_setup().
I don't quite follow, MCEs will be delivered even if interrupts are disabled. _mca_init() enables delivery of MCEs at SI_SUB_CPU time; what's wrong with calling mca_record_entry() after that point?
Fix logic in case (mode != polled && (mca_startup_done || (rec->mr_status & MC_STATUS_UC)))
Oct 6 2025
In D52942#1209601, @jtl wrote: Updated to hide this code under the DIAGNOSTIC kernel config option.
jtl requested review of D52938: x86: Reduce amount of time the MCA lock is held while emitting records.
jtl added inline comments to D12275: x86: Defer non-fatal MCA message output from the hardware interrupt context.
Oct 3 2025
jtl updated the diff for D12275: x86: Defer non-fatal MCA message output from the hardware interrupt context.
update diff to account for 8 years of bitrot
AFAIK, no one has bemoaned the lack of this feature in the 8 years this has been sitting here. I think it is safe to abandon this.
Updating the diff to account for 8 years of bit rot.
incorporating review feedback
Thanks for the review! I agree on the documentation comment.
Oct 2 2025
In D52872#1207896, @tuexen wrote: In D52872#1207881, @jtl wrote: Is this worthy of a release note?
My main concern about these three revisions is that they are somewhat susceptible to remote manipulation, and that could make it easier to DoS a server. However, I view that as a tradeoff that the user needs to make, and think an appropriate release note should suffice to warn about these issues.
Is this worthy of a release note?
If anyone is planning to review this (or if I should ask more people), let me know. Otherwise, I will probably just commit this soon. AFAICT, this is fairly innocuous (new feature with low risk of breaking existing things). But, there is a reason we get these things reviewed...
Thanks for doing this! LGTM other than one nit.
Oct 1 2025
Sep 29 2025
Sep 24 2025
Sep 15 2025
Sep 12 2025
May 29 2025
jtl added inline comments to D50581: sendfile: don't hack sb_lowat for sockets that manage the watermark.
Jun 27 2024
Jun 26 2024
Overall, I think this approach has value. See my in-line comments for suggestions on things to review further.
Jun 21 2024
Thanks! I like this approach. I've added a few comments about potential enhancements.
May 31 2024
In D45411#1036082, @tuexen wrote: In D45411#1035980, @jtl wrote: Should we check for TCP_FUNC_BEING_REMOVED?
I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...
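The point about the reference count can be shown with a small model. This is a sketch under stated assumptions, not the TCP stack code: `struct func_block`, `fb_ref()`, `fb_unref()`, and `fb_start_removal()` are hypothetical names, and the `being_removed` flag only mirrors the idea behind TCP_FUNC_BEING_REMOVED. Removal can be requested at any time, but the block is not released while a reference is held.

```c
#include <assert.h>
#include <stdbool.h>

struct func_block {
	unsigned refcnt;	/* outstanding references */
	bool being_removed;	/* removal requested, block still usable */
	bool freed;		/* set once the last reference is gone */
};

/* Take a reference; deliberately does not look at being_removed. */
static void
fb_ref(struct func_block *fb)
{
	fb->refcnt++;
}

/* Request removal; the block goes away only if nobody holds it. */
static void
fb_start_removal(struct func_block *fb)
{
	fb->being_removed = true;
	if (fb->refcnt == 0)
		fb->freed = true;
}

/* Drop a reference; the last holder completes a pending removal. */
static void
fb_unref(struct func_block *fb)
{
	assert(fb->refcnt > 0);
	if (--fb->refcnt == 0 && fb->being_removed)
		fb->freed = true;
}
```

In this model a caller that already holds a reference can keep using the block after removal starts, which matches the reasoning given for why the original code had no check.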