Linux

I've been using Linux/Unix for many years. I've always had a strong interest in technology in general and computing specifically.

These are my opinions. Opinions are like noses, everyone has one, and they all smell.

Enjoy your visit.

Replacing a failed software-RAID drive

I referenced these instructions to remind me how to replace a drive.

In my case the output of mdstat looks like this:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
33615936 blocks [2/2] [UU]

md1 : active raid1 sda3[2](F) sdb3[1]
2096384 blocks [2/1] [_U]

md0 : active raid1 sda1[0] sdb1[1]
128384 blocks [2/2] [UU]

unused devices: <none>

So I have three partitions on two drives raided together. And sda3 is failing. This is the message I received in email.

This is an automatically generated mail message from mdadm
running on host.domain.com

A Fail event had been detected on md device /dev/md1.

Faithfully yours, etc.
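At this scale you can spot the failed member by eye, but the mdstat output above can also be scanned mechanically: any array whose status field contains an underscore, like [_U], is degraded. A small sketch (the helper name is mine, and the here-doc stands in for /proc/mdstat):

```shell
# degraded_arrays: print the name of every md array whose status field
# (the last field of the "blocks" line, e.g. [_U]) shows a missing member.
degraded_arrays() {
  awk '/^md/ {name=$1} /blocks/ && $NF ~ /_/ {print name}' "$@"
}

# Sample input mirroring the mdstat above; on a live system you would
# feed it /proc/mdstat instead.
degraded_arrays <<'EOF'
md2 : active raid1 sda2[0] sdb2[1]
      33615936 blocks [2/2] [UU]
md1 : active raid1 sda3[2](F) sdb3[1]
      2096384 blocks [2/1] [_U]
md0 : active raid1 sda1[0] sdb1[1]
      128384 blocks [2/2] [UU]
EOF
# -> md1
```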

Here is the partition layout on the two drives, from fdisk -l:

Device Boot Start End Blocks Id System
/dev/sda1 * 1 16 128488+ fd Linux raid autodetect
/dev/sda2 17 4201 33616012+ fd Linux raid autodetect
/dev/sda3 4202 4462 2096482+ fd Linux raid autodetect

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 16 128488+ fd Linux raid autodetect
/dev/sdb2 17 4201 33616012+ fd Linux raid autodetect
/dev/sdb3 4202 4462 2096482+ fd Linux raid autodetect

Disk /dev/md0: 131 MB, 131465216 bytes
Disk /dev/md1: 2146 MB, 2146697216 bytes
Disk /dev/md2: 34.4 GB, 34422718464 bytes

Removing the failed partition(s) and disk:

I used the mdadm command to first fail the partitions on the failing drive:

mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda3
mdadm --manage /dev/md2 --fail /dev/sda2

then remove them from the RAID devices:

mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda2

Then I shut down the system

shutdown -h now

and replaced the drive with a new one. Then I tried to reboot. But because the failed drive was the first drive in the scsi chain, it failed to boot with the message.

No Operating System Present

Adding the new disk to the RAID Array:

So I ended up having to switch the drives, putting sdb in as sda and then proceeding. I used sfdisk to mirror the partitioning between the two drives.

sfdisk -d /dev/sda | sfdisk /dev/sdb
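Before re-adding partitions, it's worth confirming the clone actually took. A sketch, comparing saved sfdisk dumps (the file names and stand-in contents here are made up for illustration):

```shell
# Save a dump of each drive's partition table, then compare them.
# On the real system: sfdisk -d /dev/sda > /tmp/sda.dump, and likewise sdb.
tables_match() {
  if diff -q "$1" "$2" >/dev/null; then echo "match"; else echo "DIFFER"; fi
}

# Stand-in dumps for illustration:
printf '/dev/sda1 : start=63, size=256977, Id=fd\n' > /tmp/sda.dump
printf '/dev/sda1 : start=63, size=256977, Id=fd\n' > /tmp/sdb.dump
tables_match /tmp/sda.dump /tmp/sdb.dump   # -> match
```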

Add the partitions back into the RAID Arrays:

mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb3
mdadm --manage /dev/md2 --add /dev/sdb2

cat /proc/mdstat

I could see the drive rebuilding. When it finished, I hot-swapped out sda and did the whole process over again, this time without rebooting, since the system uses hot-swap drives. It worked fine, and I had both drives up and running. In hindsight, the entire replacement could have been done without ever rebooting the machine.
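While an array rebuilds, /proc/mdstat grows a recovery line with a percentage; something like watch cat /proc/mdstat is enough to follow it by eye, or the percentage can be pulled out for scripting. A sketch (the helper name is mine, and the sample line imitates real mdstat recovery output):

```shell
# rebuild_pct: print the percentage from an mdstat-style recovery line.
rebuild_pct() {
  awk '/recovery/ {for (i = 1; i <= NF; i++) if ($i ~ /%$/) print $i}' "$@"
}

echo '  [=>..........]  recovery =  9.5% (3191168/33615936) finish=13.8min' |
  rebuild_pct
# -> 9.5%
```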

Install Grub on new hard drive MBR:

# grub
grub> find /grub/stage1
(hd0,0)
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> find /grub/stage1
(hd0,0)
(hd1,0)

grub> quit

So now I have the boot manager mirrored on both drives. I can boot from either drive alone and it will work fine.


Three Virtual Machines, one host

Just a word about this post:

If this seems disjointed or illogical, with more than a few misspelled words, come back later. This started out as my notes on configuring a set of virtual servers inside a client’s network, so I started writing as I went along. Eventually I’ll have it edited and make it final, but for now it’s just my notes. When it’s finished, I’ll remove this note.

Just sayin.

I have been tasked with setting up three virtual machines on a single host. These guests will each have only one basic function on the company intranet: a cvs server, an ftp server and a samba server. This will be a place to keep my notes.

Since I’m working with x86-64 hardware and all guest OSes will be Linux, KVM seems the best choice. What I’ve read says it imposes very low overhead, and I like the ability to use a logical volume directly. I planned to install each virtual host into a separate logical volume, with the intention of being able to adjust the size of the hard drive inside the virtual machine as needed by a changing business. We chose CentOS 5.5, as it seems a good choice for the standard on all this company’s servers; most of their servers are RH or clones thereof.

Hardware

Dell PowerEdge R210: Intel® Core™ i3 540 (3.06 GHz, 4M cache, 2C/4T), 16 GB memory (4 x 4 GB), 2 x 2 TB 7.2K RPM SATA 3.5in cabled hard drives (mirrored), DVD-ROM drive and BMC.

On some Dell hardware, you also need to disable “Trusted Execution”, otherwise VT will not be enabled. That was not the case on this hardware. The CPU does have the VT extensions.

ftp server
200 Gb
1 CPU
4 Gb RAM

cvs server
300Gb
1 CPU
4 Gb RAM

samba server
800Gb
1 CPU
4 Gb RAM

The machine came with two drives so that we could mirror them. The system came with a software RAID controller, so I just chose to use Linux’s built-in software RAID. I configured most of the drive space as RAID 1. To install the system I needed to create a 100M partition outside the RAID, because /boot cannot be within the software RAID.

Raid Devices

/boot  100MB
/opt 100MB

After the install, I changed the options in fstab to ro,noauto,nouser,sync and then did a poor man’s mirror:

dd if=/dev/sda1 of=/dev/sdb1

The unexpected result was that this changed the label, so the partition would no longer mount on /opt. I’ll have to relabel the partition and then add the entry back to the fstab file. I wonder what happens when you have two disks with the same disk label?

I found one consequence later, when I got into a state where I needed to upgrade to fix some things. The boot CD found two partitions, /dev/sda1 and /dev/sdb1, with the disklabel /boot. It refused to continue, telling me to fix that first. Next time I do this I will use tune2fs to relabel the partition. If we ever lose the drive holding /boot, my hope is that the mirror will let the system keep running; with a bootable partition on each drive, recovery is simpler.
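The relabel itself is a one-liner with tune2fs. Since a typo here hits a raw device, a tiny dry-run helper that only prints the command is a cheap safeguard (the device and label below are hypothetical):

```shell
# relabel_cmd: print, rather than run, the tune2fs relabel command,
# so it can be reviewed before touching a real device.
relabel_cmd() {
  printf 'tune2fs -L %s %s\n' "$2" "$1"
}

relabel_cmd /dev/sdb1 boot2   # -> tune2fs -L boot2 /dev/sdb1
```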

The rest of the 2 terabyte drive became the volume group System. I’m not a fan of the default names used during the install. Which logical volume inside VolGroup00 contains the /usr partition? Is it VOL00 or VOL01 or VOL05? I override the names and give them names that will help me identify the data, for when I have to boot from a rescue CD and start copying the data off a failing system, or make a change to the fstab to get the system to boot from the still-good drive in a mirrored pair.

Inside that I created logical volumes

logical volume root mounted on /

logical volume swap

Since the installer always wants a mount point for each partition and logical volume, I wait until after the install of the host system to create:

logical volume ftp for the ftp server

lvcreate -L 200G -n ftp System

logical volume cvs for the cvs server

lvcreate -L 200G -n cvs System

logical volume samba for the samba server

lvcreate -L 200G -n samba System

Resizing logical volumes inside logical volumes

1. shut down the virtual machine, then use kpartx to map its logical volumes on the host

kpartx -a /dev/System/ftp

lvs

LV       VG         Attr   LSize   Origin Snap% Move Log Copy% Convert
samba    System     -wi-a-   1.00T
cvs      System     -wi-a- 198.00G
ftp      System     -wi-ao 248.00G
root     System     -wi-ao   3.91G
swap     System     -wi-ao   1.94G
LogVol00 VolGroup00 -wi--- 192.22G
LogVol01 VolGroup00 -wi---   5.66G

2. extend HOST logical volume

lvextend -L+50G /dev/mapper/System-ftp
Extending logical volume ftp to 248.00 GB
Logical volume ftp successfully resized

vgdisplay -v

3. resize the physical volume on the virtual machine

pvresize --setphysicalvolumesize 248G /dev/vda2

lvextend -L+50G /dev/VolGroup00/LogVol00

This failed. I was able to resize the logical volumes on the host, but I kept getting errors similar to:

device-mapper: reload ioctl failed: Invalid argument Failed to suspend LogVol00

I found plenty of links to other people who’ve encountered this problem, but no solution. So we decided to fix the size of the logical volumes for each host and move on.

So I set fixed sizes for the logical volumes.

400G for the ftp server

557G for the cvs server

900G for the samba server

Then I installed CentOS 5.5 on each.

Networking

Getting the machines to connect to the LAN with addresses on that LAN was another challenge. The default CentOS install set up a virtual network between the guests with an outbound NAT connection. I wanted each machine to have a discrete IP address and be as separated as possible, for the same reason I had them running in their own logical volumes.

I found lots of descriptions of what I needed to do, but they all seemed to be lacking a small piece of information. I discovered you must create the bridge device on the host first, before you install the virtual hosts. At least for me, I wasn’t able to install and then change the network configuration; I may not have understood the required modifications well enough.

I used the instructions from this site:

CentOS / Redhat: KVM Bridged Network Configuration

I first created the bridge device, making backups of the original files before I started. Remember to name backups with a prefix, not a suffix (bak.ifcfg-eth0, not ifcfg-eth0.bak): the scripts look for any file whose name starts with ifcfg and act upon it. First, a new file for the bridge device.

vi /etc/sysconfig/network-scripts/ifcfg-br0

DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=192.168.1.5
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
ONBOOT=yes

Then edit the ethernet configuration file, after making a backup:


vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
HWADDR=12:34:56:78:91:23
BRIDGE=br0
ONBOOT=yes
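With the bridge in place on the host, each guest’s libvirt definition can attach its NIC to it. A minimal interface stanza looks something like this (the virtio model is my assumption; libvirt will generate a MAC address if one isn’t given):

```
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
```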

Other interesting links:

Libvirt overwrites the existing iptables rules

Redhat Hypervisor Deployment Guide

Xen Cloud Platform

Setting guest network

VLAN bridge config

How to Get Maximum Network Performance using paravirtual drivers and bridged networking

Using bridged networking with Virt-manager

A Quick Guide to Using KVM with CentOS-5.1

Virtual Networking

KVM is interesting again.. and how I setup my virtual network…

KVM Bridged Network – solutions and problems

CentOS / Redhat: KVM Bridged Network Configuration

How to setup Windows guest paravirtual network drivers

CentOS 5.x Samba Domain Controller With LDAP Backend


Nagios monitoring mysql

I was asked by a client to configure nagios to monitor two database servers running on Redhat
Enterprise Linux 5. Here are the steps, including a couple of missteps, to get it working. Nagios
was already set up and running on a server called monitor, which is running CentOS 5.

I had two options: directly monitor the databases from monitor using the check_mysql plugin, or
run check_mysql on each database server and call it through check_by_ssh. I started out configuring
the method over ssh.

First, on each database server I created a user nagios and set a password for that user. I then
created a set of keys:

ssh-keygen -t dsa
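To keep such a key single-purpose, the matching authorized_keys entry on each database server can be pinned down to one command using sshd’s standard key options. A sketch of what that entry could look like (the plugin path matches the one used later in this post; the key material is elided):

```
command="/usr/local/nagios/libexec/check_mysql -d database -u nagios -p password",no-port-forwarding,no-pty,no-X11-forwarding ssh-dss AAAA... nagios@monitor
```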

I set no passphrase for the key, since I intended it to have a single purpose and limited
access to the database servers. I then tested the access to see if there were any glitches.
It worked on one server but not the other. After a quick once-over, I decided to proceed and
solve that problem later. Since I needed check_mysql, I compiled the plugins.

From the nagios plugins web site, I downloaded nagios-plugins-1.4.14.tar.gz to each of the
database servers. However, because not all the necessary mysql packages were in place, it
threw some errors during the configure stage.

./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-mysql=/usr

The main error that interfered with my plans was the failure to build the check_mysql plugin.
After some research I discovered that the failure was probably due to the absence of some
mysql libraries that would be in a development package. However, Red Hat doesn’t ship such a
package in the RHEL5 repository; it is available in some of the alternate repositories. That’s not really surprising when I think about it: RHEL isn’t intended to be a development platform, it’s a server platform. I didn’t want to add alternate repositories without permission from the client.

So I went for plan B. I decided to configure check_mysql to run on the monitor server
and attach to the mysql database over the network. There is a danger that, if not carefully configured, this could represent a security vulnerability for the database server. To make it as secure as possible,
I logged into mysql on each of the database servers and created special access rules for this
purpose. I created a special nagios user with its own password and gave it read-only permissions
on only one database.

grant select on database.* to nagios@monitor identified by 'password';

Now the user nagios can read that database and nothing more, so even if the monitor server
were compromised, the bad guys couldn’t use that compromise to damage the database server.

To test my work I issued the following command:

/usr/local/nagios/libexec/check_mysql -d database -u nagios -p password -H $HOSTNAME$

$HOSTNAME$ here stands for the IP of the database server.
The data came back:

Uptime: 528011 Threads: 61 Questions: 83799845 Slow queries: 38527 Opens:
11365 Flush tables: 1 Open tables: 1003 Queries per second avg: 158.709

To prevent the username and password from being exposed in the web interface I put some of the command values in resource.cfg

###########################################################################
#
# RESOURCE.CFG - Sample Resource File for Nagios 3.0b6
#
# Last Modified: 09-10-2003
#
# You can define $USERx$ macros in this file, which can in turn be used
# in command definitions in your host config file(s).  $USERx$ macros are
# useful for storing sensitive information such as usernames, passwords,
# etc.  They are also handy for specifying the path to plugins and
# event handlers - if you decide to move the plugins or event handlers to
# a different directory in the future, you can just update one or two
# $USERx$ macros, instead of modifying a lot of command definitions.
#
# The CGIs will not attempt to read the contents of resource files, so
# you can set restrictive permissions (600 or 660) on them.
#
# Nagios supports up to 32 $USERx$ macros ($USER1$ through $USER32$)
#
# Resource files may also be used to store configuration directives for
# external data sources like MySQL...
#
###########################################################################

# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/local/nagios/libexec
# Sets $USER2$ to be the path to event handlers

#$USER2$=/usr/local/nagios/libexec/eventhandlers
# Store some usernames and passwords (hidden from the CGIs)

$USER3$=nagios
$USER4$=password

###############################################################################################

commands.cfg is where I put my tested command

############################### commands.cfg ######################
define command{
command_name check-mysql
command_line $USER1$/check_mysql -d tracking -u $USER3$ -p $USER4$ -H $HOSTADDRESS$
}
####################################################################

Next I need to tell nagios where to send messages.

############################### contacts.cfg #######################
define contact{
contact_name pacneil
use generic-contact
alias Neil Schneider
email pacneil@linuxgeek.net
}

define contactgroup {
contactgroup_name pacneil
alias Test Group
members pacneil
}
######################################################################

What group of servers are we going to monitor?

############################### host_groups.cfg #####################
define hostgroup{
hostgroup_name db-host-group
alias Database Servers Host Group
hostgroup_members db-slave-host-group
}

define hostgroup{
hostgroup_name db-slave-host-group
alias Slave Database Servers Host Group
}
######################################################################

And we need to configure some parameters for how we want to display the hosts in the web interface.

############################### hosts.cfg ############################
define host{
use db-server
host_name db3.servers.pmc
hostgroups db-slave-host-group,lb1-host-group,rackspace-host-group
alias db3
display_name Db3
address 74.205.65.35
parents app2.servers.pmc
2d_coords 100,0
3d_coords -5,4,1
}

define host{
use db-server
host_name db4.servers.pmc
hostgroups db-slave-host-group,lb1-host-group,rackspace-host-group
alias db4
display_name Db4
address 74.205.65.36
parents app2.servers.pmc
2d_coords 200,0
3d_coords -5,4,-1
}

######################################################################

And I create a service group just for database servers.

########################## service_groups.cfg ########################

define servicegroup{
servicegroup_name db-server-service-group
alias Database Server Service Group
servicegroup_members server-service-group
}
######################################################################

Then I define the service

############################ services.cfg ############################
define service{
use server-service
name db-server-service
servicegroups db-server-service-group
hostgroup_name db-host-group
register 0
}

######################################################################


Apache with Syslog

I just updated the page for Remote Logging Apache with Syslog-ng.

I hope it saves you some time and you find it useful.


Is MeeGo Linux’ Answer to iPad?

Three weeks ago, the technology world was aflutter with buzz of the iPad. But with yesterday’s MeeGo announcement from Intel, the Linux Foundation and Nokia, it appears that Apple could have a Linux-based competitor for tablets, netbooks and other categories of devices.

… read more


Redhat Enterprise 5.4 Released

This is, at its heart, an update to RHEL 5. But Redhat is promoting it as “the first product to deliver commercial quality open source virtualization featuring Kernel-based Virtual Machine (KVM) hypervisor technology.” The kernel version is still 2.6.18, with backported patches. Redhat is promoting its upcoming Red Hat Enterprise Virtualization products along with RHEL 5.4 on the announcement page.

The server release is touted as providing “the most cost-effective, flexible, and scalable environment”. There are two flavors, Red Hat Enterprise Linux and Red Hat Enterprise Linux Advanced Platform. The number of guests on Advanced Platform is unlimited while the standard release is limited to four guests.

Redhat Enterprise Linux is certified as a guest OS on the following platforms:

  • VMware ESX and VMware ESXi
  • IBM POWER LPARs
  • IBM System z

Redhat Enterprise Linux supports three distributed system technologies:

  • Logical Volume Manager (LVM)
  • Global File System (GFS)
  • Distributed Lock Manager (DLM)

Redhat also has new management tools for managing virtualized environments. And of course it provides tools like Samba for integration into Windows environments, the Apache web server, and the MySQL and PostgreSQL databases. All the enterprise network services, such as DNS, DHCP, and firewall capabilities, are included as well.

Redhat is also promoting Redhat Enterprise Desktop as an alternative to “proprietary” desktop operating systems. They’re selling the “security” features and cost benefits of managing their system. And if you must run one of those “proprietary” systems, they offer virtualization to run it as a guest OS.

While not everything on RHEL 5.4 is the latest release, it does provide the kind of system and support that will make management comfortable.

Subscribers to RHEL 5 will get the updates automatically for free. New subscribers will pay about $349 for the server version and prices start at $80 for the desktop version.


XML Flaws disclosed

http://www.sdlinux.com/2009/08/xml-flaws-disclosed/


OpenSuSe hardware failure

I got a call from a customer asking for help with a subversion server that suddenly went off-line. The system was running SuSE 10.1, apache2 and subversion, and almost nothing else. It had been operating for two years in the server room, in a headless configuration. The developers that used it connected with their Windows machines, and everything worked flawlessly for two years. Suddenly, last week, it became unresponsive. They tried rebooting the server; nothing worked. Nothing had changed in the configuration, and since this machine was only accessible on the LAN and protected by a firewall, they hadn’t even done any upgrades. Their attitude could best be described as “if it’s not broken, don’t fix it”. 8-)

Before I was called in, the customer had installed a new ethernet card, on the assumption that the on-board card had failed. They came to that conclusion because they couldn’t ping out, and no other machines could ping the server. A good assumption, I think. However, the new ethernet card didn’t fix the problem. That’s when they called me in. ;-)

We went through the configuration with a fine-tooth comb. We disabled NetworkManager, since I’ve known it to cause issues on some systems. Everything seemed fine, but the network still didn’t work. The network configuration was static, so we changed to using dhcp to get an address. This worked: it got an address, but we still couldn’t ping in either direction, from the server or to the server. We looked for alternate kernels we could boot, to see if it was a kernel-specific problem; however, there was only one kernel available. As I recall there wasn’t even a failsafe option in the boot menu.

As part of the diagnostic process we put a laptop on the wired network and tried pinging it from the server. No response. We checked the ARP tables on the laptop and there was an entry for the server we were working on, with its IP and MAC addresses. We concluded that the server could “talk” but couldn’t “hear”. We checked for firewall rules that might be blocking; there were none. We checked the ARP tables on the server, and there were entries, without host names, for other machines on the network. None of us had seen this before. The plot thickens. 8-)

So next we decided to try a Knoppix live CD to test the hardware. We figured if Knoppix worked, then there was a software configuration problem. Sure enough, Knoppix booted and networking was working. We could ping, and other machines on the network could ping the server. So we concluded that it must be something in the software. :-(

Again we went through the network configuration files. Suspecting that there may have been some corruption in the files, we manually rewrote them from scratch; we thought there was a possibility of unprintable characters in the configuration files that were causing them not to work properly. We flushed the firewall rules for good measure with iptables -F, just to make sure there were no rules that were not being reported by iptables -L. Still no change. :-(

Still operating under the assumption that there was a configuration error somewhere, we decided to try an upgrade to OpenSuSE 11.1. We backed up all the configuration files and data to an external USB drive, then did the upgrade. Upgrading is a pretty slow process, relative to a fresh install. Nothing changed. :-(

Next we tried a fresh install; no change. Then I tried another distribution, CentOS 5. Still no change. Finally, as my frustration mounted, I did ifup eth0 and a root console popped up displaying an error message: “…. irq #66 disabled”. So we googled the error and found that this kind of error appears when there are spurious interrupts on the bus. After a couple more tests we concluded that there was some kind of hardware problem we hadn’t seen before. Since the system was under warranty, we contacted the manufacturer and they replaced the motherboard. After a fresh install, the system came up and ran perfectly. 8-)

This is one of the things I both love and hate about this business. Everyday there is a new problem to be solved that we’ve never seen before. I hope this saves someone else some time diagnosing their hardware problems.


Moving CentOS to new hardware

I’ve been working on a customer’s system, upgrading the motherboard and memory for better performance and reliability. We replaced an MSI motherboard with a SuperMicro server board. When I tried to boot up the system after installing the new hardware, I got the following message on the screen.


uncompressing linux
OK booting the kernel
Red Hat nash version 4.2.1.10 starting
Ext3-fs: unable to read superblock
mount error 22 mounting ext3
mount error 2 now mounting none
switchroot: mount failed 22
umount /initrd/dev failed 2
Kernel panic not syncing
Attempted to kill init

It’s been a while since I had issues like this, so I went looking through Google for answers. Isn’t that what everyone does these days? Everything the search showed seemed to indicate a problem with the RAID. So I tried to look for a problem with the RAID partitions. First I used Knoppix. I found a web site, Recover Data From RAID1 LVM Partitions With Knoppix Linux LiveCD, and followed the instructions, using mdadm --examine --scan to examine the partitions and manually set up the RAID. Everything worked fine. I could see the RAID partitions, I could mount them and view the data, and cat /proc/mdstat showed the proper RAIDed partitions. So what’s the problem?

Next I tried the rescue option of the CentOS CD. The system booted fine, and the rescue mode dutifully mounted all the partitions in the right order under /mnt/sysimage. But still no love when I tried to boot. Next I got on irc and talked to some of my friends from KPLUG.

One suggested trying to boot to the grub prompt and running boot options by hand, using tab completion. He suggested that perhaps the system wasn’t recognizing the drive. It seemed plausible, so I tested it. But the system seemed to be finding the /boot partition just fine and tab completion found the init ram disk fine.

Another suggested that the failure of the system to recognize the drives was caused by not loading the proper controller module. This had not occurred to me, because I thought I remembered that all the modules for hard drive host controllers were built into the kernel. At least that’s the way I used to build them. I remembered wrong. My friend suggested that I look at /etc/modprobe.conf and check the scsi_hostadapter parameter. So I booted into the CentOS CD rescue mode, and as the system booted I noticed the message “loading ata_piix”. This was the clue I had been missing. I edited the /etc/modprobe.conf file and changed the reference there. When I attempted to reboot the system, the RAID partitions were recognized and the system finished booting.
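For reference, the change amounts to one line in /etc/modprobe.conf; the module name comes from the boot message above, and whatever module the old MSI board used would have been there before:

```
# /etc/modprobe.conf
alias scsi_hostadapter ata_piix
```

Depending on how the system was set up, the initrd may also need regenerating (mkinitrd on CentOS) so the module is available before the root filesystem mounts.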

Once upon a time, modules for hard drives were built into the kernel, not loaded at boot time. Now, with kernels getting very large, nearly everything is loaded at boot. That’s fine for things like ethernet and sound cards. It’s always been a point of pride with me that we could take the hard drives from one system, install them into another system, and it would just run. I guess that’s no longer true.

IMNSHO this is not progress.


Lenny ldap problem solved

After some poking around I found the solution to my ldap problem. I started reading the log files, turned on debugging in the init.d file, and determined ldap was not starting. I started reviewing and checking the slapd.conf file and noticed that I was missing the samba.schema file, previously named samba3.schema.

So I started looking for it on my system. Not to be found: not where it belonged in /etc/ldap/schema, and not in /usr/share/doc/samba. I used find to search my system, looking for the schema file. In Debian, even if you install the packages that support the ldap authentication method for samba, you don’t get the samba.schema file necessary to make it work. You have to install samba-doc and then move or copy the samba.schema file to the right place to make it work.

I installed the samba-doc package and found the schema file in /usr/share/doc/samba-doc/examples/LDAP. I have to question what the Debian developers were thinking when they put the schema file in the samba-doc package. I didn’t install most of the optional doc packages, because my space is a little tight and I am trying to conserve it the best I can.

Once I found the samba.schema file and put it in the right place, I used ldapadd to install my ldif file and restarted slapd. After that everything worked fine, and samba is up and running as it was before I did the upgrade to Lenny.
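For anyone retracing this, the fix boils down to two pieces: copy samba.schema out of the samba-doc examples into /etc/ldap/schema, and make sure slapd.conf pulls it in. The include line (path as on my Lenny system) looks like:

```
# /etc/ldap/slapd.conf
include /etc/ldap/schema/samba.schema
```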

Onward ……………….
