WebSphere problems related to new default nproc limit in RHEL 6
We recently had an incident on one of our production systems running under Red Hat Enterprise Linux where under certain load conditions WebSphere Application Server would fail with an OutOfMemoryError
with the following message:
Failed to create a thread: retVal -1073741830, errno 11
Error number 11 corresponds to EAGAIN
and indicates that the C library function creating the thread fails because of insufficient resources. Often this is related to native memory starvation, but in our case it turned out that it was the nproc
limit that was reached. That limit puts an upper bound on the number of processes a given user can create. It may affect WebSphere because in this context, Linux counts each thread as a distinct process.
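Since each Java thread counts against the limit, a quick way to see how close a user is to it is to count that user's threads. This is a sketch; substitute the user that runs WebSphere for `id -un`:

```shell
# Count all threads (lightweight processes) owned by a user; on Linux
# each of these counts against that user's nproc limit.
user=$(id -un)   # placeholder: replace with the WebSphere user
ps -L -u "$user" -o lwp= | wc -l
```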
Starting with RHEL 6, the soft nproc
limit is set to 1024 by default, while in previous releases this was not the case. The corresponding configuration can be found in /etc/security/limits.d/90-nproc.conf
. Generally a WebSphere instance only uses a few hundred threads, so this problem may go unnoticed for some time before being triggered by an unusual load condition. You should also take into account that the limit applies to the sum of all threads created by all processes running as the same user as the WebSphere instance. In particular, it is not unusual to have IBM HTTP Server running as the same user on the same host. Since the WebSphere plug-in uses a multithreaded processing model (and not an asynchronous one), the nproc
limit may be reached if the number of concurrent requests increases too much.
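For reference, on a stock RHEL 6 system the file looks roughly like this (the exact comments may vary between minor releases):

```
# /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
*          soft    nproc     1024
```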
One solution is to remove or edit the 90-nproc.conf
file to increase the nproc
limit for all users. However, since the purpose of the new default value in RHEL 6 is to prevent accidental fork bombs, it may be better to define new hard and soft nproc
limits only for the user running the WebSphere instance. While this is easy to configure, there is one other problem that needs to be taken into account.
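To see the soft and hard nproc limits that a given shell (and the processes it spawns) currently has, you can use `ulimit`:

```shell
# Soft and hard limits on the number of user processes/threads
# for the current shell and its children.
ulimit -Su   # soft nproc limit
ulimit -Hu   # hard nproc limit
```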
For some unknown reason, sudo
(in contrast to su
) is unable to set the soft limit for the new process to a value larger than the hard limit set on the parent process. If that occurs, instead of failing, sudo
creates the new process with the same soft limit as the parent process. This means that if the hard nproc
limit for normal users is lower than the soft nproc
limit of the WebSphere user and an administrator uses sudo
to start a WebSphere instance, then that instance will not have the expected soft nproc
limit. To avoid this problem, you should do the following:
- Increase the soft nproc limit for the user running WebSphere.
- Increase the hard nproc limit for all users to the same (or a higher) value, keeping the soft limit unchanged (to avoid accidental fork bombs).
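Translated into limits.conf syntax, these two steps could look like the following sketch; the user name `wasadmin`, the value 8192, and the file name are placeholders:

```
# /etc/security/limits.d/91-websphere.conf (hypothetical file name)
# Raise the soft nproc limit only for the WebSphere user...
wasadmin   soft    nproc    8192
# ...and raise the hard limit for all users to the same value,
# leaving the default soft limit in place for everyone else.
*          hard    nproc    8192
```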
Note that you can verify that the limits are set correctly for a running WebSphere instance by determining the PID of the instance and checking the /proc/<pid>/limits
file.
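For example, using the current shell's PID as a stand-in for the PID of the WebSphere JVM:

```shell
# Inspect the effective limits of a running process; replace $$ with
# the PID of the WebSphere instance you want to check.
pid=$$
grep -i 'max processes' "/proc/$pid/limits"
```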