Tag systemd

Cannot create GC thread but a lot of memory

Written by gorki on 23 March 2022. No comments.

Problem :

When launching a JVM, I get the message : "Cannot create GC thread. Out of system resources"

Enough memory
Enough swap
Enough ulimit
Enough threads-max
Enough CPU

Even extending the PID limit did not help...

Important (it matters at the end) : Debian version = 10.11

Solution :

After hours of googling, I found :

https://stackoverflow.com/questions/18078859/java-run-out-of-memory-issue
- Set the number of GC threads explicitly (when many CPUs are available, more than 8)
https://serverfault.com/questions/662992/java-on-linux-insufficient-memory-even-though-there-is-plenty-of-available-memor
- Check ulimit -u
- Check ulimit -Hu
- Check open files (https://shaarli.hoab.fr/?8yFkBw)
https://www.cyberciti.biz/tips/howto-linux-increase-pid-limits.html
- Raise the PID maximum (current value : cat /proc/sys/kernel/pid_max)
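The checks listed above can be gathered in one small script. This is a minimal sketch for Linux; the function names `max_processes` and `show_limits` are made up for the example, and the process limit is read from /proc/self/limits so that it also works in shells whose `ulimit` has no `-u` option.

```shell
#!/bin/sh
# Print the limits discussed above in one pass (Linux only).

# Soft and hard "Max processes" limits, read from /proc
# (the values that `ulimit -u` and `ulimit -Hu` report in bash).
max_processes() {
    awk '/^Max processes/ { print "soft=" $3, "hard=" $4 }' /proc/self/limits
}

show_limits() {
    echo "max processes : $(max_processes)"
    echo "open files    : $(ulimit -n)"
    echo "pid_max       : $(cat /proc/sys/kernel/pid_max)"
    echo "threads-max   : $(cat /proc/sys/kernel/threads-max)"
}
```

Calling `show_limits` as the user that runs the JVM gives the numbers to compare against; limits are inherited per session, so running it as another user may show different values.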

But none of these solutions worked, and none of the limits matched the numbers I had :

number of open files < ulimit -n
number of processes/tasks < ulimit -u
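To put figures on that comparison, current usage can be counted directly. A minimal sketch for Linux follows; the function names are hypothetical, and `count_user_tasks` assumes the procps version of `ps`.

```shell
#!/bin/sh
# Count current usage to compare against the limits (Linux only).

# Number of file descriptors currently open by a process.
# Reading another user's process usually requires root.
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Number of tasks (processes and threads) owned by a user.
# -L lists one line per thread; "lwp=" prints the thread id without a header.
count_user_tasks() {
    ps -L -u "$1" -o lwp= | wc -l
}
```

For example, `count_open_fds 21890` is to be compared with `ulimit -n`, and `count_user_tasks` with the name of the user running the JVM is to be compared with `ulimit -u`.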

But in one discussion thread, I found something that worked : UserTasksMax.
I'm running systemd, and I have around 10805 tasks running for my user.
From https://manpages.debian.org/stretch/systemd/logind.conf.5.en.html :

UserTasksMax=

Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see systemd.resource-control(5) for details. If assigned the special value "infinity", no tasks limit is applied. Defaults to 33%, which equals 10813 with the kernel's defaults on the host, but might be smaller in OS containers.
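The figure of 10813 is consistent with 33% of 32768, the usual default of kernel.pid_max. That base value is an assumption made here, since the man page excerpt only says "the kernel's defaults". A small sketch of the arithmetic, with a hypothetical helper name.

```shell
#!/bin/sh
# Resolve a percentage-based task limit against a base value.
# Integer arithmetic, rounded down. $1 = base value, $2 = percentage.
tasks_max_from_percent() {
    echo $(( $1 * $2 / 100 ))
}

# 32768 is assumed to be the base (default kernel.pid_max).
tasks_max_from_percent 32768 33   # prints 10813
```

With about 10805 tasks already running, only a handful of threads can still be created before the limit is reached, which would explain why the JVM fails while memory, swap and ulimit all look fine.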

For my suspect PID (the one with a lot of open files) :

cat /proc/21890/status | grep Thread => 1 thread
ls /proc/21890/task | wc -l
confirmed by the usual command : ps -eLf | grep calrisk | wc -l

I have around 10805 threads running for a given JVM, very close to the limit.
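The thread count of a process can be read in two independent ways, and the task accounting of the per-user slice can be queried from systemd. This is a sketch for Linux with hypothetical function names; `user_slice_tasks` assumes that systemd runs with the usual `user-<UID>.slice` naming.

```shell
#!/bin/sh
# Count the threads of one process in two ways (Linux only).

# The kernel's own counter, from the "Threads:" line of /proc/<pid>/status.
threads_from_status() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# One directory per thread under /proc/<pid>/task.
threads_from_task_dir() {
    ls "/proc/$1/task" | wc -l
}

# Current and maximum number of tasks of a user's slice ($1 = user name).
user_slice_tasks() {
    systemctl show --property=TasksCurrent --property=TasksMax \
        "user-$(id -u "$1").slice"
}
```

If TasksCurrent is close to TasksMax for the user running the JVM, the per-user task limit is the likely culprit rather than memory.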

Complete guide :

https://www.journaldufreenaute.fr/nombre-maximal-de-threads-par-processus-sous-linux/

This parameter is not present in every version of the man page, and the default may have grown to 12288 in the latest versions.

To be checked !

Debian upgrade and lost network

Written by gorki on 08 October 2019. No comments.

Problem :

I manage a dedicated server at OVH and I upgraded its Debian from jessie to buster. The upgrade went quite well (or so it seemed...) and I rebooted.

After the reboot the server was unreachable; fortunately, OVH rescue mode let me log in.

I checked the error logs and first got lost in RAID error messages, but it was simpler than that.

Solution :

I checked the /etc/network/interfaces file : it was OK.

I checked the log files, cleaned up, rebooted, checked again : still OK, except that the network was unreachable for named.

I finally remembered that Debian switched to systemd in recent versions, so I tried to create the systemd networking files manually : too complicated, and it did not work.

In rescue mode, your files are only available as a mounted filesystem, so the usual commands such as systemctl do not work.

The solution was to chroot a shell :

mkdir /mnt/md2
mount /dev/md2 /mnt/md2
chroot /mnt/md2 bash
systemctl enable networking

And it works...
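The same steps can be wrapped in a function that also bind-mounts /dev, /proc and /sys into the chroot, which is commonly done so that tools inside it behave normally. This is a sketch, not the exact procedure used here: the helper names and the dry-run switch are made up for the example, and the bind mounts are an addition to the four commands above.

```shell
#!/bin/sh
# Rescue-mode sketch. Dry-run by default: commands are only printed.
# Set DRY_RUN=0 (as root, in rescue mode) to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_enable_networking() {
    dev=$1      # root device of the installed system, e.g. /dev/md2
    target=$2   # mount point in the rescue system, e.g. /mnt/md2
    run mkdir -p "$target"
    run mount "$dev" "$target"
    # Make the pseudo-filesystems visible inside the chroot.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs"
    done
    # "enable" only creates symlinks, so no running systemd is needed.
    run chroot "$target" systemctl enable networking
}
```

Running `rescue_enable_networking /dev/md2 /mnt/md2` without DRY_RUN=0 prints the six commands so they can be reviewed before anything is mounted.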

Now I have to check all the other services to make sure that everything is working...

Beginning with :

sudo apt-get update

sudo apt-get clean

sudo apt-get autoremove

sudo apt-get update && sudo apt-get upgrade

sudo dpkg --configure -a

SystemD and tomcat hang on startup

Written by gorki on 06 May 2019. No comments.

Problem :

I used robertdebock/ansible-role-tomcat to install a Tomcat instance using Ansible. It worked well until I deployed an application on it. Then the java process hung with 100% system CPU.

Starting Tomcat as the tomcat user, without systemd, worked correctly.

Solution :

I suspected :

SELinux
Linux limits
VM slow I/O

But after a while I ran strace, first from startup :

by modifying the systemd configuration
by modifying the catalina.sh configuration

All I got was a simple futex wait...

And then I read the manual, as simple as :

strace -f -e trace=all -p <PID>

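No need to trace from startup, and by default not everything is traced : without -f, strace only follows the thread it attaches to, not the other threads of the process.

After that it was easy : the process was reading recursively :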

/proc/self/task/81569/cwd/proc/self/task/81569/cwd/proc/self/task/81569/cwd/proc/self/task/81569/cwd/proc/self/task/81569/cwd/proc/self/task/8156...

Just fixing the working_directory in the Ansible role made everything work.
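A looping path of this kind can only appear when `cwd` resolves to a directory that itself contains `proc`, typically `/`. The working directory of a running process can be checked directly. A minimal sketch for Linux, with a hypothetical function name.

```shell
#!/bin/sh
# Print the current working directory of a running process (Linux only).
# /proc/<pid>/cwd is a symlink maintained by the kernel.
proc_cwd() {
    readlink "/proc/$1/cwd"
}
```

For example, `proc_cwd 81569` should no longer print `/` once the role sets a proper working directory. On the systemd side, `systemctl show --property=WorkingDirectory` followed by the unit name shows the configured value; the unit name depends on how the role names the service.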

Issue reported here.

Tomcat, NIO, hanging and CLOSE_WAIT

Written by gorki on 11 March 2019. No comments.

Problem :

We were testing a Spring Boot application in AWS with an ELB in front.

After a while of load testing, the application hung :

HTTP 504 error code from the JMeter client
HTTP 502 if we raised the ELB timeout
Once logged on to the server :
- telnet localhost 8080 was OK
- sending GET / on this socket got no response
- plenty of CLOSE_WAIT sockets
- wget was also hanging (as expected)
- the connection was established during the wget hang
- nothing in the log
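The number of CLOSE_WAIT sockets can be followed during a load test by reading the kernel's socket table. A minimal sketch for Linux, with a hypothetical function name; it relies on the /proc/net/tcp format, where the fourth column holds the state in hexadecimal and 08 stands for CLOSE_WAIT.

```shell
#!/bin/sh
# Count TCP sockets in CLOSE_WAIT from one or more socket tables.
# Column 4 ("st") is the socket state in hex; 08 is CLOSE_WAIT.
count_close_wait() {
    awk '$4 == "08" { n++ } END { print n + 0 }' "$@"
}

# Typical call (add /proc/net/tcp6 where it exists):
# count_close_wait /proc/net/tcp
```

Where `ss` is installed, `ss -tan state close-wait` lists the same sockets with their addresses. A count that keeps growing while the application stops answering points to sockets that the server never closes.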

Solution :

I initially suspected the keepAlive timeout and the Tomcat pool, but :

Spring Boot copies the connectionTimeout parameter to keepAliveTimeout
new sockets were accepted and established
the CLOSE_WAIT sockets were still not closed after an hour

After running the test many times, I finally saw a classic "Too many open files" in the log. That's why I could not see any more logs during the hang.

So we changed nproc and nofile in /etc/security/limits.conf

And ta-da ! Nothing changed in :

cat /proc/<PID>/limits

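Thanks to blogs around the world like this one :

the service is started by systemd, which does not read /etc/security/limits.conf (that file is applied by pam_limits to login sessions)
to override resource limits with systemd :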

[Service]
...
LimitNOFILE=500000
LimitNPROC=500000
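Whether the override really reached the process can be checked from /proc. A minimal sketch for Linux, with a hypothetical function name.

```shell
#!/bin/sh
# Soft "Max open files" limit of a running process, as the kernel sees it.
# Line format: "Max open files  <soft>  <hard>  files".
max_open_files() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

After editing the unit, run `systemctl daemon-reload` and restart the service; `max_open_files` with the new PID should then print 500000. If it still prints the old value, the override did not reach the unit that actually starts the process.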

Last but not least, the Tomcat NIO socket queue is around 10000, plus other files, plus other processes... choose your limit wisely.

HOAB

History of a bug
