PS1 IPP Czar Logs for the week 2013.03.18 - 2013.03.24

Monday : 2013.03.18

mark is czar

  • 01:00 MEH: putting 1x compute3 back on in deepstack for the staticsky outside of the GP for now. leaving normal allocation of 1x compute3 in stack now that mostly caught up, will have nightly science stacks to do. only 2/4x compute3 in stdscience manually for now.
  • Serge looked at mysql on ipp0012 and ippdb02, restarted and slowly catching up.
  • 15:20 MEH: odd, processing almost came to halt.. looks like stdscience pantasks is underloaded.. slowly coming back up. might as well do regular pantasks restart

Tuesday : 2013.03.19

mark is czar

  • 00:30 MEH: datastore having intermittent proxy errors
  • 07:05 MEH: nightly science still processing, turning to push the nightly warps out. looks like an OSS and a few other publishing jobs had faults and revert is not automatic.
  • 09:20 MEH: warps+diffs only running, something limiting processing (stsci06/07 over-targeted?) -- 3PI nightly now mostly finished, 3PI WSdiff a good number to do still
  • 11:15 MEH: 3PI WSdiff finished
  • 11:20 MEH: ippc17 is down? appears so, no info on console. power-cycled and restarted pantasks
  • 11:50 MEH: stopping stdscience for normal restart and will do ipp020 reboot now nightly science is finished
  • 12:00 MEH: ipp020 has a note in neb-host marking it as a problem machine to be kept in repair, but it was set to up in neb-host. the crash on 3/15 was similar to earlier ones when it was not in repair, so placing it back into repair
  • 15:30 Serge: ippc05 and ippc06 are both down. Power-cycled. Nothing but the prompt on the console.
  • 15:35 Serge: pantasks: stop, shutdown, start.server, setup, hostoff ipp020, run. Missing stack and detrend since c05 and c06 are recovering
  • 16:10 Serge: stack and detrend restarted
  • 18:30 MEH: stdscience pantasks in odd state, restarting. update in same state, setup not run?
  • 22:15 MEH: looks like ippc17 crashed again ~1hr ago, looking into -- nothing on console, rebooting again..
    • date came up wrong again on power-cycle. going to move pstamp and update to another host -- pstamp+update on ippc06 for now, detrend off and deepstack off
    • manually set date to w/in ~1sec (as per Bill's email in the morning, need to do this before starting pstamp wherever it is)
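
A minimal sketch of the clock check implied by the note above: before starting pstamp on a host, confirm that its clock is within about 1 second of a trusted reference. The helper name and the sample timestamps are hypothetical; only the ~1 second tolerance comes from the log.

```python
from datetime import datetime, timedelta, timezone

# Tolerance taken from the log entry ("w/in ~1sec").
TOLERANCE = timedelta(seconds=1)


def clock_within_tolerance(host_time, reference_time, tolerance=TOLERANCE):
    """Return True if host_time differs from reference_time by at most tolerance."""
    return abs(host_time - reference_time) <= tolerance


# Hypothetical sample values, for illustration only.
reference = datetime(2013, 3, 19, 22, 15, 0, tzinfo=timezone.utc)
good = reference + timedelta(milliseconds=400)  # clock set by hand, close enough
bad = reference - timedelta(hours=3)            # clock that came up wrong after a power-cycle

print(clock_within_tolerance(good, reference))
print(clock_within_tolerance(bad, reference))
```

How the host and reference times are obtained (ntp query, ssh, console) is left open here, since the log does not say.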

Wednesday : 2013.03.20

  • 09:30 and later Bill has disabled dvodist access because it is somehow overloading the apache server and causing data store errors.
  • 09:50 Bill started pantasks as bills from ~bills/relgroup that is simply looking for completed lapGroups and queuing staticsky runs. The pantasks is running on ippc17
  • 10:15 Bill started deepstack pantasks
  • 11:05 Bill restarted apache on ipp017 with the number of clients reduced to 127. This is configured in /etc/apache2/modules.d/00_mpm.conf
  • 11:50 Bill stopped apache on ippc02 to move the /tmp/nebulous log to a partition that is not full
  • 15:50 Heather added ssp to summitcopy
  • 16:33 restarted update and pstamp pantasks because they were down: ippc06 crashed and was rebooted.
  • 22:38 The postage stamp server pantasks is not running. Looks like ippc06 crashed again. Moving pstamp and update back to c17.

Thursday : 2013.03.21

Bill is czar today

  • 07:28 Bill recovered missing raw file o5516g0679o.ota26.fits using
  • 07:34 stopping stdscience for periodic restart
  • 07:49 stdscience restarted
  • ~10:30 some LAP runs are stalled because some chipRuns have been set to be cleaned but are still needed for open LAP runs. Changing the state to 'full' seems not to have caused problems.
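
The conflict described in the ~10:30 entry can be sketched as a simple cross-check: for each open LAP run, list the chip runs it needs that are not in state 'full'. This is only an illustration on hypothetical in-memory data. The run ids, the "to_be_cleaned" state name, and the helper are assumptions; the real states live in the IPP database, whose schema is not shown in this log.

```python
# Hypothetical chip run states keyed by chip run id. Only 'full' is a state
# name taken from the log; the others are placeholders.
chip_runs = {
    101: "cleaned",
    102: "to_be_cleaned",
    103: "full",
}

# Hypothetical open LAP runs and the chip runs each one still needs.
open_lap_runs = {
    "lap_a": [101, 103],
    "lap_b": [102],
}


def stalled_chip_runs(chip_runs, open_lap_runs, usable_state="full"):
    """Map each open LAP run to the needed chip runs that are not in usable_state."""
    conflicts = {}
    for lap, needed in open_lap_runs.items():
        blocked = [c for c in needed if chip_runs.get(c) != usable_state]
        if blocked:
            conflicts[lap] = blocked
    return conflicts


print(stalled_chip_runs(chip_runs, open_lap_runs))
```

A recorded state of 'full' does not by itself guarantee that the files are still on disk, so a check like this only catches conflicts that are visible in the recorded state.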

Friday : 2013.03.22

  • 02:35 Bill set 495 LAP chip runs to be updated. label is bills.20130322
  • 17:12 Bill queued 100 r band stack runs with label bills.20130322. Set priority to be less than LAP so they won't run until LAP stacks are done.
  • 17:55 MEH: re-reprocessing for MD starting for exposures before 4/2012 faintend bias fix. MD04.pv1.20130322 first
  • 23:04 Bill set 14 LAP chip runs that had warp runs in state update to be updated. (They've been that way since this afternoon)

Saturday : 2013.03.23

  • 12:10 MEH: stdscience could use its regular refreshing restart
  • 14:50 MEH: LAP will not finish stacking because warps are still causing trouble (some set to full were actually already cleaned) -- cleanup off and fixing..
  • 17:15 Bill: set my g, z, and y band chips to correct label. Queued i band warps.
  • 17:40 MEH: still slowly working through the conflicting LAP states
  • 18:00 MEH: LAP queue list cleared -- cleanup on
  • 21:20 MEH: continuing with MD07.pv1.20130323

Sunday : 2013.03.24

  • 13:10 MEH: again with the stdscience regular restart. fixing MD07 missing chip problems..
  • 15:25 chip.revert.on
  • 17:30 MEH: MD10.pvt.20130324 ready to start