Thursday, February 22, 2007

LinuxThreads being used instead of NPTL (Native POSIX Thread Library)

I built a Debian machine that had an unusual problem: it was using LinuxThreads instead of NPTL (Native POSIX Thread Library) threads. As a result, threaded services such as nscd and Java showed up as multiple processes rather than as a single (threaded) process. The /lib/tls directory was present, and all the right packages were installed, but getconf showed that LinuxThreads was in use:

# getconf GNU_LIBPTHREAD_VERSION
linuxthreads-0.10

ldconfig -v listed the libraries in /lib/tls; the dynamic linker just wasn't using them.
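The getconf check is easy to script if you want to run it across several machines. This is only a rough sketch: thread_lib_kind is a made-up helper name, and it assumes an NPTL system reports a string starting with "NPTL" while a LinuxThreads system reports one starting with "linuxthreads", as in the output above.

```shell
# Classify a GNU_LIBPTHREAD_VERSION string as "nptl", "linuxthreads" or "unknown".
# thread_lib_kind is a hypothetical helper name, not a standard tool.
thread_lib_kind() {
    case "$1" in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}

# Report for the running system; prints "unknown" if getconf
# does not recognise the variable.
thread_lib_kind "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
```

On the broken machine described here this would print "linuxthreads".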

The clue was in an strace when starting nscd:

access("/etc/ld.so.nohwcap", F_OK) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f00000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=74936, ...}) = 0
mmap2(NULL, 74936, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7eed000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = 0
open("/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\345"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=263040, ...}) = 0
mmap2(NULL, 264196, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7eac000
mmap2(0xb7ee4000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x38) = 0xb7ee4000
mmap2(0xb7eec000, 2052, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7eec000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = 0
open("/lib/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\f\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=9592, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eab000
mmap2(NULL, 12404, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7ea7000
mmap2(0xb7ea9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1) = 0xb7ea9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = 0
open("/lib/libc.so.6", O_RDONLY) = 3


On a machine using NPTL:


execve("/etc/init.d/nscd", ["/etc/init.d/nscd", "start"], [/* 16 vars */]) = 0
uname({sys="Linux", node="ws-6", ...}) = 0
brk(0) = 0x80e6000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=19421, ...}) = 0
old_mmap(NULL, 19421, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\342"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=252592, ...}) = 0
old_mmap(NULL, 257868, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4001d000
old_mmap(0x40053000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x35000) = 0x40053000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\32"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=9872, ...}) = 0
old_mmap(NULL, 8632, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4005c000
old_mmap(0x4005e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x2000) = 0x4005e000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/libc.so.6", O_RDONLY) = 3
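Comparing two traces like these by eye is tedious, so a small filter that pulls out just the library paths can help. The sketch below assumes the trace lines look like the ones above, with the libraries opened via open("..."); opened_libs is a hypothetical helper name. On newer systems the loader may use openat() instead, in which case the pattern would need adjusting.

```shell
# Print the shared-library paths found in open() calls of an strace log
# read from stdin. Only paths that contain "/lib" and ".so" are kept,
# so /etc/ld.so.cache and /etc/ld.so.preload are skipped.
# opened_libs is a hypothetical helper name.
opened_libs() {
    sed -n 's|.*open("\([^"]*/lib[^"]*\.so[^"]*\)".*|\1|p'
}

# Small demonstration using two lines taken from the traces above.
printf '%s\n' \
    'open("/etc/ld.so.cache", O_RDONLY) = 3' \
    'open("/lib/tls/libc.so.6", O_RDONLY) = 3' | opened_libs
```

With a trace saved to a file (for example via strace -o), running opened_libs on that file and looking for /lib/tls/ in the result shows quickly whether the TLS libraries were picked up.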


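The presence of the file /etc/ld.so.nohwcap was the problem. According to the ld.so man page (on an Etch machine): "When this file is present the dynamic linker will load the non-optimized version of a library, even if the CPU supports the optimized version". The traces bear this out: on the affected machine the access() check for the file returns 0 and libc is loaded from /lib, whereas on the NPTL machine the check fails with ENOENT and libc is loaded from /lib/tls.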
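Checking for the marker file directly is quicker than reading an strace. A minimal sketch, assuming the default path /etc/ld.so.nohwcap seen in the traces; check_nohwcap is a hypothetical helper, and its optional argument exists only so it can be pointed at another path.

```shell
# Report whether the nohwcap marker file exists.
# check_nohwcap is a hypothetical helper; the path defaults to the
# location seen in the strace output.
check_nohwcap() {
    marker="${1:-/etc/ld.so.nohwcap}"
    if [ -e "$marker" ]; then
        echo "present: $marker"
    else
        echo "absent: $marker"
    fi
}

check_nohwcap
```

If it reports "present" on a machine that should be using NPTL, that is worth investigating before looking anywhere else.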

The file was there because glibc had been downgraded to an earlier version, which causes Debian to put this file in /etc. glibc was downgraded because the machine had mistakenly been installed with 'etch' instead of 'sarge', and at the time it seemed easier to downgrade via apt/aptitude than to do a fresh install. Perhaps a mistake :)
