PS1 IPP Czar Logs for the week 2015.06.22 - 2015.06.28

Monday : 2015.06.22

  • 01:30 MEH: processing rate is terrible, ~20/hr.. not sure of the cause, since stdsci looks like it needed to be restarted for Sunday night.. doing that now, and pstamp as it needs it too. Then will see if there is another problem, if there is time before observing...
    • better but not normal, ipp079 seems to be harassed by high cpu wait conditions so trying setting it to repair.. is something extra over-running unmonitored?
  • 02:30 MEH: ~10% of summitcopy jobs are faults from 404 files from the conductor issue at the start of the night on Saturday -- probably just need to set to drop but no time to deal with it
    --> then most to all of 0010

Tuesday : 2015.06.23

  • 09:35 Bill: restarted pstamp pantasks
  • 11:23 HAF: repeat diff failure, fixed: difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -fault 0 -skycell_id skycell.1326.001 -diff_id 1166562
  • 16:08 HAF: restarting pantasks for tonight

Wednesday : 2015.06.24

  • 12:15 CZW: set ipp094 and ipp097 off in stdscience. They seem to know how to automount ippb06.2, they have an automounter running, but they do not automount ippb06.2. I suspect this is some transient weirdness, as this would have been noticed before now. The symptom is that they are attempting to do a WS diff, failing to find the primary copy of the stack that is on b06, and failing with fault 2. I've also set WS diffs that were repeatably not completing with fault 5 to a bad quality to clear them out.
  • 15:00 CZW: Pushing the phase 1 shuffle (off stsci) somewhat harder to try and make sure that we have free space for the incoming data.
  • 20:50 EAM: stopping and restarting ipp pantaskses.

Thursday : 2015.06.25

  • 08:37 Bill: started up staticsky as ~ippsky using x0 and x1 hosts

Friday : 2015.06.26

  • 06:45 MEH: once nightly finishes, will start regular restart of nightly pantasks including pstamp
  • 08:45 MEH: two broken exposures since last saturday still failing and wasting ~10% of the summitcopy cycles.. dropping..
    pztool -dbname gpc1 -updatepzexp -inst gpc1 -telescope ps1 -set_state drop -summit_id 927403 -exp_name o7194g0010d
    pztool -dbname gpc1 -updatepzexp -inst gpc1 -telescope ps1 -set_state drop -summit_id 927402 -exp_name o7194g0009d
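
The two drops above follow one pattern, so a sketch like the following could generate them from a list of summit_id / exp_name pairs. It is a dry run that only prints the commands and does not execute them. The function name drop_cmds is hypothetical, and the flags are copied from the commands above, not taken from any pztool documentation.

```shell
# Dry-run sketch: print (do not execute) one pztool drop command per exposure.
# drop_cmds is a hypothetical helper name; the flags mirror the commands
# logged in the 08:45 entry and are otherwise unverified assumptions.
drop_cmds() {
  while read -r summit_id exp_name; do
    echo "pztool -dbname gpc1 -updatepzexp -inst gpc1 -telescope ps1 -set_state drop -summit_id ${summit_id} -exp_name ${exp_name}"
  done <<'EOF'
927403 o7194g0010d
927402 o7194g0009d
EOF
}

drop_cmds
```

Review the printed lines before running any of them by hand.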

Saturday : 2015.06.27

  • 03:00 MEH: ipp077 is being overloaded by something -- putting into repair
    • then ipp072, then ipp081.. something is over-running again during nightly processing..
  • 03:55 MEH: ipp017 unresponsive.. powercycle and turn off in processing -- will try to deal with it while observing...
    • stalled with -- F1 seemed to clear
      Event Log messages, enter Setup to view
      0211: Keyboard error
      Press <F1> to resume,  <F2> to Setup
  • 07:35 MEH: clear stalled warp
    warptool -dbname gpc1 -updateskyfile -set_quality 42 -skycell_id skycell.1219.087 -warp_id 1597779  -fault 0
  • 16:55 MEH: while stdsci may be ~okay for night with ~50k jobs, doesn't hurt to spend a minute to just restart so queue remains fully loaded for nightly processing

Sunday : 2015.06.28