PS1 IPP Czar Logs for the week 2014.07.07 - 2014.07.13


Monday : 2014.07.07

  • 12:00 Bill: turned parser and finish tasks off in pstamp in preparation for restart and cleanup of ps_ud labels
    • 12:04 all jobs waiting for updates have finished. Set chip, warp, and diff runs with label like ps_ud% to be cleaned
    • 12:23 pstamp pantasks restarted
  • 14:40 EAM : had a failing warp, set to bad quality:
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 3006 -warp_id 976390 -skycell_id skycell.0698.007
  • 20:40 CZW: issued "run" command to all pantasks to get them going for nightly processing.
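
The 12:04 entry selects runs "with label like ps_ud%", which reads as an SQL LIKE pattern. As a rough illustration of which labels such a pattern would catch, here is a minimal Python sketch. The label names in it are hypothetical, and the matching is case-sensitive, which may differ from the database's collation. Note that in LIKE syntax the underscore is itself a single-character wildcard, so the pattern is slightly broader than it looks.

```python
import re


def like_to_regex(pattern):
    """Translate an SQL LIKE pattern into a compiled regex.

    '%' matches any run of characters, '_' matches exactly one character.
    Everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_labels(labels, pattern):
    """Return the labels that match the LIKE pattern in full."""
    rx = like_to_regex(pattern)
    return [label for label in labels if rx.fullmatch(label)]


# Hypothetical label names, for illustration only.
labels = ["ps_ud_QUB", "ps_ud_WEB.BIG", "psXud", "MD.nightlyscience", "ps_ud"]
print(select_labels(labels, "ps_ud%"))
```

Here "psXud" is selected as well, because the underscore in the pattern matches any single character.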

Tuesday : 2014.07.08

  • 07:50 EAM : the chiller at MRTC-B failed in the night and Haydn had to shut down the whole cluster. machines are slowly coming back up now.
  • 13:00 EAM : Gavin brought back up the machines which could be power-cycled from Manoa. ippdb00 had trouble starting mysql: it needed to recover from the binlogs, and this was slow, but eventually succeeded. jaws machines needed to be manually booted by Haydn with some difficulty. ipp030 also gave some trouble. I initially proposed that we move ippdb01 as well today, but it took so long for Haydn to get past security that we gave up on that plan for today. At this point, processing is running, but there are lots of faults, probably due to block nebulous files. I need to clear these out before we will be in reasonable shape.

Wednesday : 2014.07.09

  • 02:00 Bill : stdscience pantasks has apparently not been running since 22:45 or so last night. Nothing in pantasks.stderr.log except the startup messages. pantasks.stdout.log is dated 22:45. Last line in pcontrol log is "caught parent shutdown"

  • 11:30 EAM : setting bad quality for repeated failure:
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.1143.040 -diff_id 576076 -fault 0
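
The two set-quality commands recorded this week (warptool on Monday, difftool today) share the same shape. The following is a minimal Python sketch that only assembles such a command line as a string, using nothing but the flags that appear in the entries above. The function name and its parameters are made up for illustration, it does not run anything, and it assumes the order of the flags does not matter to the tools.

```python
import shlex


def set_quality_command(tool, mode_flag, id_flag, run_id, skycell_id,
                        quality, dbname="gpc1"):
    """Build a set-quality command line as a shell-quoted string.

    Hypothetical helper: every flag is copied from the log entries,
    nothing here is taken from the tools' own documentation.
    """
    args = [
        tool,
        "-dbname", dbname,
        mode_flag,
        "-fault", "0",
        "-set_quality", str(quality),
        id_flag, str(run_id),
        "-skycell_id", skycell_id,
    ]
    return shlex.join(args)


# Monday's warp entry.
print(set_quality_command("warptool", "-updateskyfile", "-warp_id",
                          976390, "skycell.0698.007", 3006))

# Wednesday's diff entry (same flags as logged, in a different order).
print(set_quality_command("difftool", "-updatediffskyfile", "-diff_id",
                          576076, "skycell.1143.040", 14006))
```

Printing the string for review before pasting it into a shell keeps the run id, skycell id and quality code visible in one place.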

Thursday : 2014.07.10

  • 20:15 MEH: using compute3 for a little while since mostly idle except for pstamp jobs

Friday : 2014.07.11

Mark is czar

  • 10:30 MEH: cleared, through updates, some long-stalled (late June) 3PI diffims that were stuck because their inputs had been cleaned
  • 15:55 Haydn replacing/adding power supply in ippb03 -- down for just a few minutes
  • 23:30 MEH: using the idle compute3/c2+stare nodes for MD staticsky+stack reprocessing

Saturday : 2014.07.12

Sunday : 2014.07.13