PS1 IPP Czar Logs for the week 2014-01-13 - 2014-01-19

Monday : 2014-01-13

Tuesday : 2014-01-14

  • 11:00 - 11:45 Bill: restarted the pstamp and update pantasks; cleared some dependents that were not updating for various reasons.

Wednesday : 2014-01-15

Thursday : 2014-01-16

  • 16:20 Bill: ippsky is running staticsky on the sas fields. ~ippsky/staticsky pantasks is on ippc04.
    • 1x compute3 hosts set to off in stdscience to account for the staticsky usage
  • 22:50 MEH: looks like something is wrong, no new nightly is moving forward; chip stage is taking >10 ks to fault:
    • detrends look to be missing from /data/ipp032.0/ because it was remapped onto the stsci nodes, which must be down; same for /data/ipp031.0
    • set neb-host down for both, and now some are making it through

Friday : 2014-01-17

Mark is czar

  • 05:00 Bill: set ippc47 - ippc62 to off in ~ippsky/staticsky
  • 05:30 Bill: distribution jobs are mostly failing because they need files on the stsci nodes that have been shut down. Set the revert tasks there to off to avoid retrying.
  • 07:30 MEH: nightly mostly through, ippc47-c63 off in all pantasks
    • remaining diffs also trying to find files on stsci03-05, set
  • 13:00 Bill: started skycal processing in staticsky
  • 13:22 Bill: survey.add.lapgroup LAP.ThreePi.20130717 in stdscience. This will find the sets of LAP runs that are ready for staticsky processing.
  • 19:50 MEH: cleaning up the host on/off usage for rsyncs and rawOTA scans
  • 20:18 Bill: set stsci03-05 to repair, turned on distribution revert, and reverted the faulted diffs
    • MEH: as well as diff.revert.on
  • 21:40 MEH: looks like ipp046 crashed ~10 min ago; nothing on console, and multiple power-cycle attempts got no response. Set power off, set neb-host from repair to down, and took it out of processing.

Saturday : 2014-01-18

  • 10:55 MEH: MOPS stamps could use some more nodes, so restarting pstamp and adding compute3 (shouldn't be a problem with the extra staticsky, 2x running now)
    • update had hanging jobs from ipp046 going down last night; cleared
  • 11:10 MEH: clearing diffims
    • two fault 5s from input PSF NaN:
      515304  skycell.2606.047  ThreePi.WS.nightlyscience
      515319  skycell.2606.047  ThreePi.WS.nightlyscience
    • 3PI WWdiff diff_id 514772 stuck advancing, 3PI WSdiff diff_id 513916 stuck in distribution; both because cleaned/PSS updated from 1/10-11

Sunday : 2014-01-19