Friday, November 25, 2011

broken ps

ps, w and a number of other commands were not working: they would hang and could not be interrupted. I looked in /proc. 'ls' worked fine there, so I decided to cat the cmdline of each process. I did this in a 'for' loop and found that it stopped at one particular PID. Because I could list the /proc directory, I could see that the permissions were OK and that the process was owned by someone other than the logged-in user or root. I could not find anything in /sys (not that I would really know what to look for, since that is mostly device information). I ran tcpdump in case there was a keylogger sending data back (netstat also hung).

cd /proc

ls | grep -E '^[0-9]+$' > /tmp/cm.txt

for pid in `cat /tmp/cm.txt`;do echo "checking PID $pid...";cat /proc/${pid}/cmdline;echo;done

I found that I could not kill the process either, even after trying all 64 signals in a for loop.
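
A process stuck in uninterruptible sleep (state 'D') does not act on any signal, SIGKILL included, until the kernel call it is blocked in returns. That would fit the symptoms here, though it is an assumption rather than something confirmed on this machine. A minimal sketch for reading the state out of a status file follows. The function name state_of is made up for illustration, and it assumes /proc/PID/status is still readable for the stuck process, which may not be the case.

```shell
# Print the one-letter state code (R, S, D, Z, T, ...) from a
# /proc/<pid>/status file. 'state_of' is a hypothetical helper name.
state_of() {
    awk '/^State:/ { print $2; exit }' "$1"
}
```

Usage, with a hypothetical PID: state_of /proc/1234/status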

Nothing in /var/run had a PID matching that one.

The machine gets loads of 'BUG: Bad page state in process' errors for a variety of processes. These don't seem to result in crashes.

Perhaps there is a region of memory that the process has written to that is unreadable?

Wednesday, August 31, 2011

initrd images created from kickstart

Update: initrd files are now 'xz' archives.



I wondered whether a kickstart of a RHEL/CentOS distro would provide a customised initrd image for the hardware it was being installed on. To check this, I compared the initrd from a server with a particular (Marvell) SAS disk controller with another server without that controller.

initrd images are gzipped cpio archives. To compare the two, I copied each initrd to another location so as not to interfere with the boot files, renamed each copy with a '.gz' extension, gunzipped it, and ran 'cpio -t < initrd_file_name' to list the files. The kernel module for the SAS disk controller was present in the initrd from the host with that hardware, but not in the one from the host without it. The module is not built into the kernel image (vmlinuz) on either host; the kernel config lists it as a loadable module. It therefore has to be in the initrd, presumably because the kernel needs the driver before it can reach the root filesystem it would otherwise load it from:

CONFIG_SCSI_MVSAS=m
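
The comparison described above can be sketched roughly as follows. The function names and the file paths in the usage note are hypothetical, and the sketch assumes gzip-compressed images (not the newer xz ones) and a cpio that accepts -it.

```shell
# List the contents of a gzip-compressed cpio initrd, sorted, without
# modifying or renaming the original file.
list_initrd() {
    gzip -dc "$1" | cpio -it 2>/dev/null | sort
}

# Print entries that appear in the first sorted listing but not the second.
only_in_first() {
    comm -23 "$1" "$2"
}
```

Usage, with made-up paths for the two copied images:

list_initrd /tmp/initrd-hostA.img > /tmp/hostA.txt
list_initrd /tmp/initrd-hostB.img > /tmp/hostB.txt
only_in_first /tmp/hostA.txt /tmp/hostB.txt | grep mvsas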

Wednesday, February 16, 2011

replacing a space with a newline

contents of file.txt:

abc def ghi

cat file.txt | tr ' ' '\012'

abc
def
ghi


I originally tried this with

cat file.txt | sed 's/ /\n/g'

and it didn't work. '\n' in the replacement part of a sed 's' command is a GNU extension, so it isn't always recognised as a newline by other versions of sed.
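
A short sketch of two forms that should behave the same on GNU and BSD userlands, since neither relies on '\n' in the sed replacement. The function names are made up for illustration.

```shell
# Split space-separated words onto separate lines using tr.
# tr interprets '\n' itself, so this does not depend on the sed variant.
split_with_tr() {
    tr ' ' '\n'
}

# The same with sed: a backslash followed by a literal newline in the
# replacement is the POSIX way to insert a newline.
split_with_sed() {
    sed 's/ /\
/g'
}
```

Usage: split_with_tr < file.txt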

Wednesday, February 02, 2011

Migrating (expanding) an existing array from RAID 1 to RAID 10 on HP SmartArray P400i

I added a couple of 72GB drives to an existing 2-drive array on an HP DL360 G5, which has a SmartArray P400i controller. The driver was the HP cciss driver, and the server was running CentOS 5.5 x86_64.

The expansion of the array can be done while the server is online, over ssh using hpacucli. While the underlying drive can be expanded, I have not yet found a way to notify the kernel about the changed drive size, so it appears a reboot is still required (or reloading the cciss driver, which in most cases means a reboot).

Using hpacucli:


=> controller all show

Smart Array P400i in Slot 0 (Embedded) (sn: PH81MQ6085 )

=> ctrl slot=0 show config

Smart Array P400i in Slot 0 (Embedded) (sn: PH81MQ6085 )

array A (SAS, Unused Space: 0 MB)


logicaldrive 1 (68.3 GB, RAID 1, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)

unassigned

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

=> controller slot=0 logicaldrive 1 add drives=1I:1:3,1I:1:4
=> ctrl slot=0 show config

Smart Array P400i in Slot 0 (Embedded) (sn: PH81MQ6085 )

array A (SAS, Unused Space: 139953 MB)


logicaldrive 1 (68.3 GB, RAID 1+0, Transforming, 0% complete)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

=>

Once the transformation has completed, you need to expand the logical drive into the unused space:

=> controller slot=0 show config

Smart Array P400i in Slot 0 (Embedded) (sn: PH81MQ6085 )

array A (SAS, Unused Space: 139953 MB)


logicaldrive 1 (68.3 GB, RAID 1+0, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

=>
=> controller slot=0 logicaldrive 1 modify size=max

Warning: Extension may not be supported on certain operating systems.
Performing extension on these operating systems can cause data to
become inaccessible. See ACU documentation for details. Continue?
(y/n) y

=> controller slot=0 show config

Smart Array P400i in Slot 0 (Embedded) (sn: PH81MQ6085 )

array A (SAS, Unused Space: 0 MB)


logicaldrive 1 (136.7 GB, RAID 1+0, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

=>

The size is correctly reported in /sys:

# cat /sys/block/cciss\!c0d0/size
286611840
#
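
The value in /sys/block/.../size is a count of 512-byte sectors, so it can be checked against the 136.7 GB that hpacucli reports (which only matches if hpacucli is counting in binary units here). A small sketch of the arithmetic; the function names are made up for illustration.

```shell
# /sys/block/<dev>/size is expressed in 512-byte sectors.
# Convert a sector count to bytes.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Convert a sector count to GiB (2^30 bytes), to one decimal place.
sectors_to_gib() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / 1073741824 }'
}

# sectors_to_bytes 286611840   -> 146745262080
# sectors_to_gib 286611840     -> 136.7
```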

However, fdisk does not see it, even after a partprobe, and I could find no way of re-scanning in /proc or /sys to get it to see the increased size.

One suggestion I haven't tried is to use 'sfdisk -R', which asks the kernel to re-read the partition table.

So for now it takes a reboot, after which the partition can be enlarged and the filesystem grown with resize2fs.