PS1 IPP Czar Logs for the week 2015.06.29 - 2015.07.05


Monday : 2015.06.29

Tuesday : 2015.06.30

Wednesday : 2015.07.01

  • 16:20 CZW: Power outage at MRTC-B. Still recovering the hosts that refused to reboot cleanly. Hosts with issues, from Haydn's email:
    ippdb02
    ipp026
    ipp034
    ipp038
    ipp040: is missing since 2015-07-01 07:42:00 HST.
    ipp044: is missing since 2015-07-01 07:41:59 HST.
    ipp052: is missing since 2015-07-01 07:40:33 HST.
    ipp067: is missing since 2015-07-01 07:41:11 HST.
    ipp082: is missing since 2015-07-01 07:40:52 HST.
    ippc06: is missing since 2015-07-01 07:32:03 HST.
    ippc13: is missing since 2015-07-01 07:31:33 HST.
    ippc14: is missing since 2015-07-01 07:31:51 HST.
    ippc16: is missing since 2015-07-01 07:31:13 HST.
    ippc31: is missing since 2015-07-01 07:30:38 HST.
    ippc32: is missing since 2015-07-01 07:31:53 HST.
    ippc33: is missing since 2015-07-01 07:31:18 HST.
    ippc34: is missing since 2015-07-01 07:31:45 HST.
    ippc35: is missing since 2015-07-01 07:31:36 HST.
    ippc36: is missing since 2015-07-01 07:30:46 HST.
    ippc37: is missing since 2015-07-01 07:31:56 HST.
    ippc38: is missing since 2015-07-01 07:31:22 HST.
    ippc39: is missing since 2015-07-01 07:31:58 HST.
    ippc40: is missing since 2015-07-01 07:30:53 HST.
    ippc41: is missing since 2015-07-01 07:30:20 HST.
    ippc42: is missing since 2015-07-01 07:30:39 HST.
    ippc44: is missing since 2015-07-01 07:30:52 HST.
    ippc45: is missing since 2015-07-01 07:30:56 HST.
    ippc46: is missing since 2015-07-01 07:31:46 HST.
    ippc47: is missing since 2015-07-01 07:31:47 HST.
    ippc48: is missing since 2015-07-01 07:31:08 HST.
    ippc49: is missing since 2015-07-01 07:31:01 HST.
    ippc50: is missing since 2015-07-01 07:30:57 HST.
    ippc51: is missing since 2015-07-01 07:30:35 HST.
    ippc52: is missing since 2015-07-01 07:31:48 HST.
    ippc54: is missing since 2015-07-01 07:31:27 HST.
    ippc55: is missing since 2015-07-01 07:31:54 HST.
    ippc56: is missing since 2015-07-01 07:30:39 HST.
    ippc57: is missing since 2015-07-01 07:30:19 HST.
    ippc58: is missing since 2015-07-01 07:31:13 HST.
    ippc59: is missing since 2015-07-01 07:31:32 HST.
    ippc60: is missing since 2015-07-01 07:31:36 HST.
    ippc61: is missing since 2015-07-01 07:30:39 HST.
    ippc62: is missing since 2015-07-01 07:30:46 HST.
    ippc63: is missing since 2015-07-01 07:31:34 HST.
    ippx013: is missing since 2015-07-01 07:50:06 HST.
    ippx016: is missing since 2015-07-01 07:50:32 HST.
    ippx040: is missing since 2015-07-01 08:00:28 HST.
    stare00: is missing since 2015-07-01 07:31:00 HST.
    stare01: is missing since 2015-07-01 07:31:10 HST.
    stare04: is missing since 2015-07-01 07:30:41 HST.
    stsci16: is missing since 2015-07-01 07:40:35 HST.
    stsci17: is missing since 2015-07-01 07:31:01 HST.
    stsci18: is missing since 2015-07-01 07:31:56 HST.
    stsci19: is missing since 2015-07-01 07:40:13 HST.
    
  • 17:00 CZW: I've set ipp026, ipp034, and ipp038 to down in nebulous, as they are unlikely to come back up today.
  • 18:00 CZW: I've set ipp082 to down, as its RAID does not appear to be visible. I've set ipp071 to repair, as it is serving NFS files but not accepting logins.

Thursday : 2015.07.02

  • 07:39 Bill: started ~ippsky's staticsky pantasks. It is running the SAS updates with a fix for the bad-source problems.
  • 11:38 Bill: Earlier I shut down this pantasks because it was running into faults: it required data on /data/ipp082.0, which was not available. Haydn has replaced the RAID card and ipp082 is back online, so I have restarted this pantasks. Once the SAS staticsky runs finish, I will start skycal.
  • 13:16 Bill: staticsky and skycal updates for SAS have finished. The pantasks has been shut down.
  • 13:30 CZW: restarted the replication pantasks to run the shuffle to the stsci nodes. ipp082 is back online, so the error rate should be minimal.

Friday : 2015.07.03

  • 10:20 MEH: restarted pstamp so QUB can get their stamps.
  • 10:30 MEH: since pstamp needed a restart for high Njobs, also checked on and restarted stdsci, which will need the same for tonight as it has ~100K Njobs.

Saturday : 2015.07.04

Sunday : 2015.07.05