PS1 IPP Czar Logs for the week 2014.10.27 - 2014.11.02

Monday : 2014.10.27

  • 08:35 EAM : ipp035 crashed, rebooting (no message on console)
  • 08:45 EAM : stopping and restarting stdlocal
  • 11:00 CZW : restarted stdlanl.

Tuesday : 2014.10.28

  • 08:20 EAM : restarting stdlocal
  • 09:20 Bill : moved pstamp pantasks from the ipptest user to ipp. Started it up.
    • there is a large backlog of 316 old requests pending cleanup. I have stopped the parser for now to allow cleanup to catch up.
  • 12:50 CZW: ipps04 out of processing for Haydn to replace failed disks.
  • 13:35 CZW: ipps04 available again.
  • 17:30 CZW: I've set ipp082 and ipp077 to 'up' in nebulous. ipp082 is a bit cranky about this change, with a load in the 130s. However, it seems to be responding well. Prior to this change, ipp075 and ipp076 were having high load issues. The system seems stable at the moment.
  • Earlier today CZW: I sent 71k PV2 warps to cleanup. My calculation says this translates to 214 TB of space that will become available in the next few days.
  • 18:05 CZW: ipp082 never dropped in load as I was hoping. I've put it back to repair for the evening.

Wednesday : 2014.10.29

  • 12:15 EAM : stdlocal sluggish, preparing to restart.
  • 13:33 EAM : after restarting stdlocal, Bill noted the stdstar reprocessing was done, so I took back the himem (ippsXX) nodes for stdlocal.

Thursday : 2014.10.30

  • HAF 20:00 - ipp008 is down - I set it to neb-host down, emailed gene + ipp-dev, I need help to reboot it
  • 20:30 MEH: disk access lost, Haydn rebooted -- set to neb-host repair

Friday : 2014.10.31

  • HAF 6:30 registration is stuck. not the usual way. investigating
  • HAF 6:40 ipp008 is doooowwwwnnn again (I set to neb-host down)
  • HAF 6:45 gremlins fixed registration.... seems to be moving again -- maybe related to neb-host down command? I don't know.
  • 11:10 EAM : stdlocal is getting slow (150k+ chips), so I'm restarting...
  • 14:00 CZW: stdlocal restarted.
  • HAF 17:00 registration stuck on first exposure, seems related to ipp008 being stupid again - I've removed ipp008 from registration, seems to be going again. I've also emailed to ask for help on ipp008
  • 22:10 MEH: notice ipp008 still on in stdsci.. since it has been behaving badly, setting them off

Saturday : 2014.11.01

  • 16:50 HAF : got a txt that ipp013 is down -- by the time I checked, it was already up. Thanks, mystery rebooter! :-)

Sunday : 2014.11.02