Occasionally, nebdiskd, the Nebulous mount-monitoring daemon, will generate emails that look like this:

From: nobody <nobody@alala.ifa.hawaii.edu>
Date: Tue, 11 Aug 2009 12:50:18 -1000
To: ps-ipp-ops@ifa.hawaii.edu
Subject: [ps-ipp-ops] nebdiskd alert

2009-08-11 12:50:18 | alala | WARN | main::poll_mounts - retrying test of /data/ipp022.0
_______________________________________________
ps-ipp-ops mailing list
ps-ipp-ops@ifa.hawaii.edu
http://pan-starrs.ifa.hawaii.edu/mailman/listinfo/ps-ipp-ops

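The alert body is a single pipe-delimited line: a timestamp, the host running nebdiskd, the severity, the emitting function, and a message. The following Python sketch splits such a line into fields and pulls out the mount point, which may help when tallying NFS glitches from the mailing list archive. The field layout is inferred from the one example above; the names `parse_alert`, `NebdiskdAlert` and `exporting_host` are hypothetical, as is the assumption that the exporting host is the part of the mount name before the first dot.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class NebdiskdAlert(NamedTuple):
    timestamp: datetime
    host: str              # host running nebdiskd, e.g. alala
    level: str             # severity, e.g. WARN
    source: str            # emitting function, e.g. main::poll_mounts
    message: str
    mount: Optional[str]   # mount point named in the message, if any


# Layout inferred from a single example alert (an assumption, not a spec).
_ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<host>\S+) \| (?P<level>\S+) \| "
    r"(?P<source>\S+) - (?P<message>.*)$"
)
_MOUNT_RE = re.compile(r"(/data/\S+)")


def parse_alert(line: str) -> Optional[NebdiskdAlert]:
    """Parse one nebdiskd alert line; return None if it does not match."""
    m = _ALERT_RE.match(line.strip())
    if m is None:
        return None
    mount = _MOUNT_RE.search(m["message"])
    return NebdiskdAlert(
        timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        host=m["host"],
        level=m["level"],
        source=m["source"],
        message=m["message"],
        mount=mount.group(1) if mount else None,
    )


def exporting_host(mount: str) -> str:
    """Guess the exporting host from a mount point such as /data/ipp022.0."""
    # Hypothetical: assumes the /data/<host>.<n> naming seen in the example.
    name = mount.rstrip("/").split("/")[-1]
    return name.split(".")[0]
```

For the example alert, the parsed `host` is `alala` (where nebdiskd runs) and `exporting_host` returns `ipp022`, the host that actually exports the volume.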
These emails mean that the nebdiskd daemon running on the host alala called statfs(2) on the autofs/NFS mount point /data/ipp022.0 and the system call returned an error. This means either that the system exporting the volume is down or (more often) that some sort of transient NFS glitch has occurred. By default, nebdiskd retries a mount point 3 times before giving up on it (note: the number of retries is configurable in the .nebdiskdrc file).

There is currently no mechanism to limit the total number of warnings or errors sent by the daemon. This is intentional, so that we have a trace of NFS glitches. If a system goes down, these messages will be sent once every nebdiskd poll interval. To silence them when a host has been down for a long period of time, the host must be set to available == 0 in Nebulous. An example of doing this with the neb-voladm utility is:

neb-voladm --user nebulous --pass XXXXXXXX --db nebulous -vhost ipp022 --available 0
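The retry behavior described above can be sketched in Python. This is a hypothetical illustration, not nebdiskd's actual code: `os.statvfs` stands in for statfs(2), one "retrying test of ..." warning is emitted before each retry, and a mount point is reported as failed only after the initial attempt plus `retries` retries all raise `OSError`. Reading "retry 3 times" as three retries after the first attempt is an assumption, and the names `check_mount` and `poll_mounts` are made up for this sketch.

```python
import os
from typing import Callable, Iterable, List

# Hypothetical default mirroring nebdiskd's 3 retries (.nebdiskdrc in the real daemon).
RETRIES = 3


def check_mount(path: str,
                retries: int = RETRIES,
                warn: Callable[[str], None] = print,
                statfs: Callable[[str], object] = os.statvfs) -> bool:
    """Return True once statfs(path) succeeds; False after all retries fail.

    os.statvfs is a stand-in for statfs(2); one warning is emitted before
    each retry, like the "retrying test of ..." alerts.
    """
    for attempt in range(retries + 1):  # initial try plus `retries` retries
        try:
            statfs(path)
            return True
        except OSError:
            if attempt < retries:
                warn(f"retrying test of {path}")
    return False


def poll_mounts(mounts: Iterable[str],
                retries: int = RETRIES,
                warn: Callable[[str], None] = print,
                statfs: Callable[[str], object] = os.statvfs) -> List[str]:
    """Test every mount point once and return those that never responded."""
    return [m for m in mounts if not check_mount(m, retries, warn, statfs)]
```

Calling `poll_mounts` once per poll interval reproduces the pattern described above: a host that stays down produces a fresh batch of warnings every interval. Passing `statfs` and `warn` as parameters is only a convenience so the logic can be exercised without a real NFS failure.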

Current locations where nebdiskd is running:

  • MHPCC -> ippdb02
  • Manoa -> alala

Nebulous procedure when a Degraded Array is detected

While the array is rebuilding, the host should be placed in the repair state for the IPP:

> neb-host ipp<num> repair

and returned to regular service once the rebuild is complete:

> neb-host ipp<num> up

These commands can be run as user ipp from any ipp node.
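The repair/up transition can be wrapped in a small helper, sketched here in Python. This is only an illustration: `neb_host_command` and `set_host_state` are hypothetical names, the caller is assumed to already know whether the rebuild is still running (detecting that depends on the RAID setup and is not covered here), and by default the helper returns the command line instead of running it.

```python
import re
import shlex
import subprocess
from typing import List


def neb_host_command(host: str, rebuilding: bool) -> List[str]:
    """Build the neb-host invocation: repair while rebuilding, up afterwards."""
    if not re.fullmatch(r"ipp\d+", host):
        raise ValueError(f"expected a host name like ipp022, got {host!r}")
    state = "repair" if rebuilding else "up"
    return ["neb-host", host, state]


def set_host_state(host: str, rebuilding: bool, dry_run: bool = True) -> str:
    """Return the command line; only execute it when dry_run is False."""
    cmd = neb_host_command(host, rebuilding)
    if not dry_run:
        # Must be run as user ipp on an ipp node.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

Keeping `dry_run=True` as the default means the helper can be tried safely; the command it prints is the same one you would type by hand.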