
In D54785#1252666, @kib wrote:

I do not think that trying to schedule a task is very robust once a LOR has been detected. We are already in a situation that risks deadlock.

In D54785#1252125, @kib wrote:

This should be extremely useful, I already anticipate it.

Updated to incorporate review feedback.

Updated license and clarified this is only applicable to amd64 in main.

updates to incorporate review feedback

incorporate review feedback; rebase on top of changes to D52946

Reworked patch to use a large-ish buffer allocated from BSS and protected by a lock if called from an interrupt context. As noted in the commit message, this will result in increased lock contention during an interrupt storm which exceeds the capacity of the free list; however, overall lock contention should still be lower than it was when mca_log() was called with the mca_lock held.
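The buffering scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the patch: `log_record()`, `copy_out()`, `LOG_BUF_SIZE`, and the message format are hypothetical names, and a C11 `atomic_flag` spin lock stands in for the kernel lock. The point it shows is that the interrupt path avoids allocation by sharing one statically allocated (BSS) buffer, so concurrent interrupt-context callers serialize on the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define LOG_BUF_SIZE 4096   /* hypothetical "large-ish" size */
#define CAPTURE_SIZE 128    /* size expected by copy_out() */

/* Zero-initialized static storage lives in BSS. */
static char fallback_buf[LOG_BUF_SIZE];
/* Userspace stand-in for the lock protecting fallback_buf. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

typedef void (*emit_fn)(const char *msg, void *arg);

/* Example consumer: copies the message into a CAPTURE_SIZE-byte buffer. */
static void
copy_out(const char *msg, void *arg)
{
	snprintf(arg, CAPTURE_SIZE, "%s", msg);
}

/*
 * Format one record and pass it to emit().  In "interrupt context" we
 * must not allocate, so we use the shared static buffer under the lock;
 * otherwise we use a private heap buffer and take no lock.
 */
static int
log_record(bool in_interrupt, int bank, unsigned long long status,
    emit_fn emit, void *arg)
{
	char *buf;
	int len;

	if (in_interrupt) {
		while (atomic_flag_test_and_set_explicit(&fallback_lock,
		    memory_order_acquire))
			;	/* spin until the shared buffer is free */
		buf = fallback_buf;
	} else {
		buf = malloc(LOG_BUF_SIZE);
		if (buf == NULL)
			return (-1);
	}

	len = snprintf(buf, LOG_BUF_SIZE, "MCA: bank %d status 0x%016llx",
	    bank, status);
	emit(buf, arg);

	if (in_interrupt)
		atomic_flag_clear_explicit(&fallback_lock,
		    memory_order_release);
	else
		free(buf);
	return (len);
}
```

In this sketch the lock is held only while the shared buffer is in use, which is the source of the extra contention during an interrupt storm that the comment mentions.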

In D52946#1210071, @lprylli_netflix.com wrote:

Not sure whether it needs to be addressed or not, but for completeness there is the possibility of some sequence of events occasionally emptying the mca_freelist, and then my understanding is that mca_record_entry() will work in "degraded mode" and directly call mca_log() in an interrupt context (next to "MCA: Unable to allocate space for an event.\n"), and as long as that situation persists, the corresponding MCA printf() won't be rate-checked any more.

It could be a very unlikely theoretical case with the mca_postscan() changes, but with the previous version of the code, a couple of machines were able to trigger the "MCA: Unable to allocate space for an event" situation in the Netflix fleet.
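To make the rate-limiting concern concrete, here is a minimal userspace sketch of an interval-based rate check, similar in spirit to the kernel's ratecheck(9). The names `struct rate_limiter` and `rate_allow()` are hypothetical and the time source is passed in by the caller; nothing here is taken from the MCA code. A fallback path that prints without consulting such a check would emit one message per event for as long as the free list stays empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most one event per interval_ns. */
struct rate_limiter {
	int64_t	last_ns;	/* time of the last permitted event */
	int64_t	interval_ns;	/* minimum spacing between events */
	bool	primed;		/* false until the first event passes */
};

/*
 * Return true if an event at time now_ns may proceed, updating the
 * limiter; return false if it falls inside the current interval.
 */
static bool
rate_allow(struct rate_limiter *rl, int64_t now_ns)
{
	if (rl->primed && now_ns - rl->last_ns < rl->interval_ns)
		return (false);
	rl->last_ns = now_ns;
	rl->primed = true;
	return (true);
}
```

A caller would wrap each log call, on the normal and the degraded path alike, in `if (rate_allow(&rl, now))`.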

Updating to apply cleanly on top of D52946, which introduces the use of sbuf to gather the log message.

Remove unneeded check for mca_startup_done

In D52943#1209843, @markj wrote:

Note that I am not 100% sure this can actually happen, since I am not 100% sure we will ever enable interrupts before we run mca_setup().

I don't quite follow, MCEs will be delivered even if interrupts are disabled. _mca_init() enables delivery of MCEs at SI_SUB_CPU time; what's wrong with calling mca_record_entry() after that point?

Fix logic in case (mode != polled && (mca_startup_done || (rec->mr_status & MC_STATUS_UC)))

In D52942#1209601, @jtl wrote:

Updated to hide this code under the DIAGNOSTIC kernel config option.

Updated to hide this code under the DIAGNOSTIC kernel config option.

update diff to account for 8 years of bitrot

AFAIK, no one has bemoaned the lack of this feature in the 8 years this has been sitting here. I think it is safe to abandon this.

Updating the diff to account for 8 years of bit rot.

incorporating review feedback

Thanks for the review! I agree on the documentation comment.

In D52872#1207896, @tuexen wrote:

In D52872#1207881, @jtl wrote:

Is this worthy of a release note?

Yes, the combination of D52871, D52872, and D52873.

My main concern about these three revisions is that they are somewhat susceptible to remote manipulation, and that could make it easier to DoS a server. However, I view that as a tradeoff that the user needs to make, and think an appropriate release note should suffice to warn about these issues.

Is this worthy of a release note?

If anyone is planning to review this (or if I should ask more people), let me know. Otherwise, I will probably just commit this soon. AFAICT, this is fairly innocuous (new feature with low risk of breaking existing things). But, there is a reason we get these things reviewed...

Thanks for doing this! LGTM other than one nit.

Overall, I think this approach has value. See my in-line comments for suggestions on things to review further.

Thanks! I like this approach. I've added a few comments about potential enhancements.

In D45411#1036082, @tuexen wrote:

In D45411#1035980, @jtl wrote:

Should we check for TCP_FUNC_BEING_REMOVED?

I thought about this and did again after you raised the question. The original code didn't.
...
However, since we don't hold the tcp_function_lock, the tfb can be removed any time before or after we set the TCP function block in tcp_newtcpcb. It cannot go away, since we hold a reference count. I guess that is the reason why there was no check...
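The distinction drawn here, between a function block being marked for removal and its memory actually going away, can be sketched with C11 atomics. This is an illustrative model only: `struct func_block`, the `fb_*()` helpers, and `FB_BEING_REMOVED` are hypothetical stand-ins and do not reproduce the kernel's tcp_function_block handling.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FB_BEING_REMOVED 0x01	/* hypothetical removal-in-progress flag */

struct func_block {
	atomic_uint	refcnt;		/* outstanding references */
	atomic_uint	flags;
	bool		released;	/* set when the last reference drops */
};

/* Start with one reference, owned by the registry. */
static void
fb_init(struct func_block *fb)
{
	atomic_init(&fb->refcnt, 1);
	atomic_init(&fb->flags, 0);
	fb->released = false;
}

static void
fb_hold(struct func_block *fb)
{
	atomic_fetch_add(&fb->refcnt, 1);
}

/* Deregistration only sets a flag; it does not free anything. */
static void
fb_mark_removed(struct func_block *fb)
{
	atomic_fetch_or(&fb->flags, FB_BEING_REMOVED);
}

static bool
fb_being_removed(struct func_block *fb)
{
	return ((atomic_load(&fb->flags) & FB_BEING_REMOVED) != 0);
}

/* The block is released only when the final reference is dropped. */
static void
fb_rele(struct func_block *fb)
{
	if (atomic_fetch_sub(&fb->refcnt, 1) == 1)
		fb->released = true;
}
```

In this model a holder can observe the removal flag at any time after taking its reference, yet the block stays valid until that holder calls `fb_rele()`, which mirrors the reasoning in the comment above.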


Tue, Jan 20

Mon, Jan 19

Wed, Jan 14

Dec 8 2025

Dec 7 2025

Dec 6 2025

Dec 5 2025

Oct 28 2025

Oct 9 2025

Oct 8 2025

Oct 7 2025

Oct 6 2025

Oct 3 2025

Oct 2 2025

Oct 1 2025

Sep 29 2025

Sep 24 2025

Sep 15 2025

Sep 12 2025

May 29 2025

Jun 27 2024

Jun 26 2024

Jun 21 2024

May 31 2024
