PS1 IPP Czar Logs for the week 2015.11.09 - 2015.11.15

Continuing HAF's suggestion to improve communication: a list in the czar pages of additional (non-standard) processing, so that we all know what's going on.

Daily Czaring:

  • Currently a modified ops tag is running diffs (WS labels only) as ippqub (was ippmops) under ~ippqub/src/stdscience_ws on ippc06. If there are problems (Njobs > 100k, power loss on ippc06, etc.), it will need to be restarted like a normal nightly processing pantasks (or for any other issue that would require restarting the ~ipp pantasks).
    ./ stdscience_ws
    • Even with the modified ops tag, various files are still MIA in nebulous, so the daily czar needs to check for and clear them, as discussed before.
  • ps_ud_QUB has also been moved to the ippqub:stdscience_ws pantasks to support updates that may be broken by missing cmf files; chip and warp updates will also be done in that pantasks.


Monday : 2015.11.09

  • 13:57 MEH: looks like ipp017 has been down for ~10 hrs. Set neb-host from repair to down so jobs can continue without faulting; will try a power cycle. Back up again, same unknown crash reason. Set neb-host back to repair.

Tuesday : 2015.11.10

Wednesday : 2015.11.11

Thursday : 2015.11.12

Friday : 2015.11.13

  • 01:45 MEH: ipp017 has been down for the past 14.5 hrs; seems like the same unknown crash. Power cycled and back up.
    • Not sure if the crash and power cycle (not a reboot) are the same case as leaving the power on and the fire danger.
    • There didn't seem to be an email/nagios report of it being down; are some machines still missing that?
    • neb-host still in repair. If this system goes down, will it stall nightly processing that needs detrends? If so, probably want neb-host down at night.
  • 09:23 MEH: doing the necessary regular restart of the nightly pantasks
  • 10:00 MEH: clearing a stalled diff from 11/08:
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1711.043 -diff_id 1276400 -fault 0

Saturday : 2015.11.14

Sunday : 2015.11.15