Conversation

jo-carter

If a worker process (or similar) is terminated by an unexpected signal, such as SIGKILL or SIGSEGV, while it is updating a shared memory zone, or is partway through an operation whose result would update a shared memory zone, the zone's contents may be left corrupted or in an undefined state, and request handling and other nginx functionality may no longer behave correctly.

This directive instructs nginx to terminate itself rather than continue processing or accepting new requests while potentially in such an undefined state, by initiating either the graceful or the fast shutdown logic on itself.

The directive may be set to either "quit" or "stop", following the shutdown signal naming scheme of the CLI, corresponding to the SIGQUIT and SIGINT/SIGTERM shutdown logic respectively.
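For illustration, usage might look like the following sketch. The directive name and values are taken from this PR's description; its allowed context (here assumed to be the main context, alongside `master_process`) is an assumption, as the thread does not state it:

```nginx
# Hypothetical nginx.conf fragment based on this PR's description.
# "quit" = SIGQUIT-equivalent graceful shutdown on abnormal worker exit;
# "stop" = SIGINT/SIGTERM-equivalent fast shutdown.
master_process_auto_exit quit;
```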

@jo-carter force-pushed the auto_exit branch 6 times, most recently from 1f85193 to 1826127 on April 21, 2025 01:14
@arut

@jo-carter seems like this will cause at least service interruption. A better solution would be to gracefully restart/reload nginx without inheriting the zones. Not sure it's easy to do though. Another interesting task is detecting if the worker crashed under lock and only then taking measures.

@jo-carter

@arut

Thanks for the feedback.

seems like this will cause at least service interruption

Yes it will, and yes it's a blunt solution. The idea was that whatever high availability mechanism an advanced user has in place (such users being the primary audience for this feature, and generally running HA) would detect this instance as down and fail over, or mark down the terminating instance, just as with a shutdown of nginx for any other reason. Thus no major global service interruption would occur in that scenario. We then rely on the init system (or similar) to restart nginx.

better solution would be to gracefully restart/reload nginx without inheriting the zones. Not sure it's easy to do though.

It's a good idea. It seems feasible to mark all the zones as no_reuse (or add similar conditional logic in ngx_cycle.c) and then trigger the reload logic, which should accomplish this.
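As a toy model of the idea (not nginx code; the struct and function names here are illustrative): during a reload, a zone from the old cycle is reused only when it matches the new configuration and its noreuse flag is unset, so flagging every zone forces fresh allocation of all shared memory:

```c
#include <stddef.h>
#include <string.h>

/* Toy model: each shared zone carries a "noreuse" flag. On reload, a
 * zone is inherited from the old cycle only if name and size match AND
 * neither side is flagged noreuse. Marking every zone noreuse therefore
 * discards all (potentially corrupted) shared memory on reload. */
typedef struct {
    const char *name;
    size_t      size;
    int         noreuse;   /* 1 = never inherit from the old cycle */
} toy_zone;

/* Would the old zone's memory be carried over to the new cycle? */
static int
zone_reused(const toy_zone *old, const toy_zone *new)
{
    if (old->noreuse || new->noreuse) {
        return 0;          /* forced fresh allocation */
    }
    return strcmp(old->name, new->name) == 0 && old->size == new->size;
}

/* The "mark all zones" step proposed above. */
static void
mark_all_noreuse(toy_zone *zones, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        zones[i].noreuse = 1;
    }
}
```

The design choice being modeled: rather than trying to repair a possibly inconsistent zone in place, the reload simply refuses to inherit it.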

Another interesting task is detecting if the worker crashed under lock and only then taking measures.

At least for the shared memory zone mutexes, we already have that detection close to where the changes in this PR were made.

@jo-carter

@arut there is the problem of the finer-grained locks used in upstream_rr_peer/s, which live inside a shared memory zone's contents but aren't the standard mutexes of the zone itself. Those also don't appear to be force-unlocked at any point (although that is probably an issue in its own right).

And there is the general worry that, even when a worker is not under a lock (especially where fine-grained locking is used), we (or others) may rely on some logical operation completing whose result updates shared memory, and without which the shared memory is still in an undefined / unexpected state.

Therefore, performing the reload and marking all zones as noreuse whenever a child process is terminated by a signal, regardless of whether it was under a lock, may still be the safest option.

@arut

At least for the shared memory zone mutexes, we already have that detection close to where the changes in this PR were made.

Yes. But we could also somehow mark the zone as uninheritable and reload the configuration.

@arut

@arut there is the problem of the finer-grained locks used in upstream_rr_peer/s, which live inside a shared memory zone's contents but aren't the standard mutexes of the zone itself. Those also don't appear to be force-unlocked at any point (although that is probably an issue in its own right).

You're right, these are different and we need to think about a different solution for them. Going through all such locks after a crash is not an option. We can wait until the particular object is accessed and only then check for an inconsistent state. For example, when acquiring a lock, make sure the pid holding the lock is still alive. If not, the lock operation will return an error (a change in the API).
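One conventional way to implement the "is the lock holder still alive" check is `kill(pid, 0)`, which performs the existence and permission checks without sending a signal; `ESRCH` means the pid no longer exists. This is a sketch of the check being suggested, not nginx code:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process identified by pid still exists, 0 if it has
 * exited (and been reaped). kill(pid, 0) sends no signal; it only runs
 * the kernel's existence/permission checks. EPERM means the process
 * exists but belongs to someone else, so it still counts as alive. */
static int
pid_is_alive(pid_t pid)
{
    if (kill(pid, 0) == 0) {
        return 1;
    }
    return errno != ESRCH;
}
```

A lock-acquire path could call this on the stored holder pid and report an error (per the proposed API change) instead of spinning forever on a lock whose owner crashed.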

@jo-carter changed the title from "Core: master_process_auto_exit directive." to "Core: master_process_auto_reload directive." on Apr 27, 2025
@jo-carter

@arut

You're right, these are different and we need to think about a different solution for them. Going through all such locks after a crash is not an option. We can wait until the particular object is accessed and only then check for an inconsistent state. For example, when acquiring a lock, make sure the pid holding the lock is still alive. If not, the lock operation will return an error (a change in the API).

I fear this may be an undesirable (and incredibly complex) change to make. At present, the rwlocks are a one-writer-or-many-readers model, implemented with a single machine-word-sized atomic integer, which either reference-counts the (one or more) reader locks or is marked as write-locked by setting all bits high.
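The one-word model can be sketched with C11 atomics. This is a minimal illustration of the scheme described above (non-blocking "try" variants only, for brevity), not nginx's actual ngx_rwlock code; the point is that the entire lock state is one integer, with nowhere to record a holder pid:

```c
#include <stdatomic.h>
#include <stdint.h>

/* All lock state lives in one machine word: a reader count, or WLOCK
 * (all bits set) when write-locked. There is no room for holder pids. */
#define WLOCK ((uintptr_t) -1)

static _Atomic uintptr_t lock_word = 0;

static int
try_rlock(void)
{
    uintptr_t v = atomic_load(&lock_word);
    if (v == WLOCK) {
        return 0;                       /* a writer holds the lock */
    }
    /* bump the reader count if the word hasn't changed underneath us */
    return atomic_compare_exchange_strong(&lock_word, &v, v + 1);
}

static int
try_wlock(void)
{
    uintptr_t expected = 0;             /* writers need the lock free */
    return atomic_compare_exchange_strong(&lock_word, &expected, WLOCK);
}

static void
runlock(void)
{
    atomic_fetch_sub(&lock_word, 1);    /* drop one reader reference */
}

static void
wunlock(void)
{
    atomic_store(&lock_word, 0);
}
```

Extending this to track holder pids would mean replacing the single word with a structure plus an outer mutex, which is exactly the complexity being objected to.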

Storing a PID in this case would require storing a list of PIDs (for the readers) as part of the lock structure, updated as workers acquire the read lock and pruned as the lock is released. This list of PIDs, and thus the lock itself, would require an outer mutex.

Where the memory backing the list of PIDs in the lock would be stored is also unclear (surely not dynamically allocated, yet a statically allocated list would consume a lot of memory if it is to support NGX_MAX_PROCESSES worth of readers).

It would also require maintaining a list of "alive" worker PIDs in shared memory, and iterating through that list each time a lock is acquired (which can be slow). Again, this list would need its own mutex.

Finally, there would still be a significant performance impact from all this extra work, in a sensitive area already known for heavy locking.

I'm not sure there is a good solution for this issue.

@jo-carter

@arut
There are also cases, such as the reference counting in the limit_req module, that further demonstrate the "or is in the intermediate stage of performing processes that will result in the update of a shared memory zone" part of the original issue. If a worker is terminated while the shared memory zone is not under lock, but before a counter has been decremented to zero, the counter will never reach zero.

Although it is less catastrophic in this case, as the entry should eventually be cleared, I am sure there are more such cases lurking out there that result in undefined / unexpected behavior.
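A toy model of the hazard described above (the struct and function names are illustrative, not limit_req's actual code): a shared-memory node is reclaimable only once its reference count returns to zero, so a worker dying between the increment and the matching decrement strands the node forever, even though no lock was held at the time:

```c
/* Toy model of a reference-counted node in shared memory. The node may
 * be expired/reclaimed only when no in-flight request references it. */
typedef struct {
    unsigned count;   /* active references from in-flight requests */
} shm_node;

static void
begin_request(shm_node *n)
{
    n->count++;       /* taken when a request starts using the node */
}

static void
end_request(shm_node *n)
{
    n->count--;       /* never runs if the worker dies mid-request */
}

static int
expirable(const shm_node *n)
{
    return n->count == 0;
}
```

If the worker crashes after `begin_request()` but before `end_request()`, `expirable()` stays false indefinitely; no lock is involved, so lock-holder liveness checks would not catch it.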

Yes. But we could also somehow mark the zone as uninheritable and reload the configuration.

I've implemented this, please see the latest commit, albeit all zones are marked as not inheritable during the resulting reload/reconfiguration, given the above concerns.

When a child process exits on an abnormal signal, such as SIGKILL
or SIGSEGV, master process will perform reconfiguration.