
iceccd is not reaping dead processes in good time #619

Open
jimis opened this issue Apr 27, 2023 · 1 comment

Comments


jimis commented Apr 27, 2023

An issue recently appeared on my bleeding-edge OpenSUSE Tumbleweed system: all the jobs icecc sends to other nodes are failing. But the bigger problem is that they fail really slowly, essentially timing out before being re-run locally.

On my local system:

ICECC[5184] 2023-04-27 21:20:36: <Transfer Environment>
ICECC[5184] 2023-04-27 21:20:36: sent 32884804 bytes (99%)
ICECC[5184] 2023-04-27 21:20:36: Verified host 10.9.70.26 for environment 56e7bcc88b541ddf314dacc7a22a9e23 (x86_64)
ICECC[5184] 2023-04-27 21:20:36: </Transfer Environment: 318ms>
ICECC[5184] 2023-04-27 21:20:36: <send compile_file>
ICECC[5184] 2023-04-27 21:20:36: </send compile_file: 0ms>
ICECC[5184] 2023-04-27 21:20:36: <write_fd_to_server from cpp>
ICECC[6043] 2023-04-27 21:20:36: preparing source to send: /usr/bin/c++    [...]
ICECC[5184] 2023-04-27 21:20:36: sent 308295 bytes (18%)
ICECC[5184] 2023-04-27 21:20:36: </write_fd_to_server from cpp: 126ms>
ICECC[5184] 2023-04-27 21:20:36: <wait for cpp>
ICECC[5184] 2023-04-27 21:20:36: </wait for cpp: 0ms>
ICECC[5184] 2023-04-27 21:20:36: <wait for cs>
[...waiting...]
ICECC[5184] 2023-04-27 21:21:37: </wait for cs: 60103ms>
ICECC[5184] 2023-04-27 21:21:37: the server ran out of memory, recompiling locally
ICECC[5184] 2023-04-27 21:21:37: local build forced by remote exception: Error 101 - the server ran out of memory, recompiling locally

On the server side, it does not appear that it ran out of memory. In debug mode I capture the failure as follows:

Apr 27 21:20:36: remote compile arguments:    [...]
Apr 27 21:20:36: <parent, waiting>
[...waiting...]
Apr 27 21:21:37: timeout while reading preprocessed file
Apr 27 21:21:37: compiler produced stderr output:
Apr 27 21:21:37: cc1plus: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory
Apr 27 21:21:37: [23408] 2023-04-27 19:21:37: Remote compilation exited with exit code 1
Apr 27 21:21:37: [23408] 2023-04-27 19:21:37: </parent, waiting: 60228ms>

The cc1plus crash seems to happen very quickly, but iceccd does not reap the dead process immediately. Until the full minute elapses, I can see a defunct zombie process named [g++] in the process table.
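For reference, the usual POSIX pattern for reaping children promptly is a SIGCHLD handler combined with a non-blocking `waitpid()` loop, rather than noticing the exit only when a read timeout fires. Below is a minimal, self-contained sketch of that pattern; it is illustrative only and not iceccd's actual code (the `fork`/`_exit` child here is just a stand-in for the crashing cc1plus):

```cpp
// Sketch: reap exited children as soon as SIGCHLD arrives, instead of
// leaving them as zombies until some unrelated timeout fires.
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static volatile sig_atomic_t child_exited = 0;
static void on_sigchld(int) { child_exited = 1; }

int main() {
    struct sigaction sa {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, nullptr);

    // Keep SIGCHLD blocked except while waiting, so checking the flag
    // and going to sleep is race-free.
    sigset_t block, orig;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &orig);

    if (fork() == 0)   // child: stand-in for the compiler process
        _exit(1);      // e.g. cc1plus dying on a missing library

    for (;;) {
        while (!child_exited)
            sigsuspend(&orig);   // sleep until SIGCHLD is delivered
        child_exited = 0;

        int status;
        pid_t pid;
        // Drain every exited child without blocking the daemon.
        while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
            std::printf("reaped pid %d immediately, exit code %d\n",
                        (int)pid, WEXITSTATUS(status));
        if (pid == -1)
            break;               // ECHILD: no children left
    }
    return 0;
}
```

With this pattern the zombie disappears as soon as the child dies, instead of lingering in the process table for the whole 60-second timeout observed above.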

Version of icecream on both systems:

$ rpm -q icecream
icecream-1.4.0-2.3.x86_64

P.S. A secondary issue is the failure to find libz.so.1, which causes the crash. I'm assuming icecream fails to transfer all the dependencies that the binary needs. It could be related to the recently enabled hwcaps optimisations in OpenSUSE. I see that my cc1plus binary depends on:

$ ldd /usr/lib64/gcc/x86_64-suse-linux/13/cc1plus | grep libz.so
        libz.so.1 => /lib64/glibc-hwcaps/x86-64-v3/libz.so.1.2.13 (0x00007f932ddd7000)
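As an aside, one way to confirm where the dynamic loader actually resolves libz.so.1 at runtime (as opposed to what `ldd` prints) is glibc's `dlinfo()` extension. This standalone probe is just a sketch for checking the hwcaps hypothesis; it is not part of icecream:

```cpp
// Sketch: ask the dynamic loader where libz.so.1 really comes from.
// On a hwcaps-enabled system this can be a path under
// /lib64/glibc-hwcaps/x86-64-v3/ rather than the canonical
// /lib64/libz.so.1, so an environment transfer that only copies
// canonical paths would miss it.
// Build with: g++ probe.cpp -ldl   (dlinfo is a glibc extension)
#include <dlfcn.h>
#include <link.h>
#include <cstdio>

int main() {
    void* h = dlopen("libz.so.1", RTLD_LAZY);
    if (!h) {
        std::fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    struct link_map* lm = nullptr;
    if (dlinfo(h, RTLD_DI_LINKMAP, &lm) == 0 && lm)
        std::printf("libz.so.1 resolved to: %s\n", lm->l_name);

    dlclose(h);
    return 0;
}
```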

jimis commented May 4, 2023

> P.S. A secondary issue is the failure to find libz.so.1, which causes the crash. I'm assuming icecream fails to transfer all the dependencies that the binary needs. It could be related to the recently enabled hwcaps optimisations in OpenSUSE.

This is probably fixed by #602.

The primary issue is still valid: a crashed process is not reaped in time by iceccd.
