PS1 IPP Czar Logs for the week 2013.01.14 - 2013.01.20


Monday : 2013.01.14

  • 00:25 MEH: removing the stare nodes from stdscience for lap_staticsky_notgp may have taken away too many nodes for stdscience to keep up with observations
  • 09:05 Bill: set pantasks lap_staticsky_notgp to stop. The galactic plane skycells are done and we can resume standard operation of deepstack
  • 09:?? Bill: set label for the staticsky jobs with label LAP.ThreePi.20120706.gp back to LAP.ThreePi.20120706
  • 11:48 Bill: there are 6 staticsky jobs that are at the "write the outputs" step but haven't made progress since ~9:15-9:30
  • 13:40 Bill: the staticsky jobs were all stuck in a soap call trying to create their output cmf files. Killed off the processes
  • 13:45 Bill: started deepstack pantasks with 3 x compute3 nodes
  • 15:30 MEH: throwing in the 2x wave4 to deepstack for staticsky as well
  • 20:25 MEH: removing 2x wave4 from deepstack to run in other pantasks to redo MD staticsky
  • 23:05 MEH: looks like ipp013 crashed ~21:33 with GPF ipp013-crash-20130114, power cycle

Tuesday : 2013.01.15

  • 10:30 MEH: stealing stare nodes from stdscience (turned off), since it isn't doing anything, for MD staticsky. might as well do all of wave4 as well..
  • 14:25 MEH: some odd missing files from ipp028 found today, Gene looking at links to fix. stdscience+cleanup stopped, chip.revert.off for a bit. Gene fixed and MOPSreq.czw chips moving through
  • 15:30 Serge: Stopped cleanup so that ippdb02 can catch up with its master
  • 16:30 Serge: Queued a bunch of diffs to be reprocessed for mops (warp label = czw.mopstest.rerun)
  • 18:50 MEH: returning wave4,stare to stdscience -- keeping the 2x wave4 from stack however for continued running of MD staticsky

Wednesday : 2013.01.16

Bill is czar today

  • 06:55 Serge: ippdb02 caught up -> Restarted cleanup
  • 09:30 set deepstack pantasks to stop. There are a couple of skycells in 267 < RA < 303, abs(glat) < 15 that need to run, so I'm going to reduce the number of hosts and change the label.
  • 10:40 Determined that there are also runs with abs(glat) > 15 that need to run, so decided to let those run first (along with the others queued with ra < 267). We'll rerun the galactic plane skycells later, hopefully after the psLib performance problem gets fixed.
  • 11:50 set all pantasks to stop. Updated psImageInterpolate.c. Rebuilding tag starting with psLib
  • 12:01 restarted stdscience, distribution, pstamp, and update pantasks. Set others to run
  • 14:31 restarted deepstack with 1 x compute3 and label LAP.ThreePi.20120706.gp; 409 skycells to process
  • 14:40 Gene,Serge fixing some access issues on ipp028-030 data
  • 20:00 Bill restarted summit copy and registration pantasks
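
Note on the 09:30 and 10:40 entries: the split between galactic plane skycells (267 < RA < 303, abs(glat) < 15) and everything else can be sketched in a few lines of Python. This is only an illustration, not the IPP tooling used here. The function names and the (name, ra, dec) tuple layout are hypothetical, and it assumes J2000 coordinates in degrees with the standard north galactic pole position.

```python
import math

# J2000 equatorial position of the north galactic pole, in degrees.
NGP_RA_DEG = 192.85948
NGP_DEC_DEG = 27.12825


def galactic_latitude_deg(ra_deg, dec_deg):
    """Return galactic latitude b in degrees for a J2000 RA/Dec in degrees."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    ngp_ra = math.radians(NGP_RA_DEG)
    ngp_dec = math.radians(NGP_DEC_DEG)
    sin_b = (math.sin(dec) * math.sin(ngp_dec)
             + math.cos(dec) * math.cos(ngp_dec) * math.cos(ra - ngp_ra))
    # Clamp against rounding error before taking the arcsine.
    sin_b = max(-1.0, min(1.0, sin_b))
    return math.degrees(math.asin(sin_b))


def in_galactic_plane_window(ra_deg, dec_deg,
                             ra_min=267.0, ra_max=303.0, glat_max=15.0):
    """True if a position falls inside the RA and |glat| window from the log."""
    if not (ra_min < ra_deg < ra_max):
        return False
    return abs(galactic_latitude_deg(ra_deg, dec_deg)) < glat_max


def split_skycells(skycells):
    """Split hypothetical (name, ra_deg, dec_deg) tuples into plane and other."""
    plane, other = [], []
    for name, ra_deg, dec_deg in skycells:
        if in_galactic_plane_window(ra_deg, dec_deg):
            plane.append(name)
        else:
            other.append(name)
    return plane, other
```

Calling split_skycells on a list of skycell centers returns two lists of names: those to hold for the later galactic plane rerun, and those that can run first.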

Thursday : 2013.01.17

  • 16:34 Bill added 2 x compute3 hosts to pstamp to try and help with the MOPS backlog. Since pstamp jobs generally don't use much memory this shouldn't bother the peak culling - I mean psphotStack - jobs running on those nodes.
  • 20:25 Bill reset the big pstamp users to priority 495 and bumped up WEB.UP to its nominal value (higher than everyone but WEB). MOPS is being a pig today
  • 22:30 Bill removed compute3 nodes from pstamp

Friday : 2013.01.18

  • 09:20 nightly science is a bit behind because summit copy is a bit behind. The last science exposures just finished downloading
  • 09:25 Bill turned off the skycal survey task. We need to rerun all of the LAP skycals.
  • 11:44 Bill stopped processing in preparation for rebuilding psastro, ippconfig, and ippTools
  • 11:50 Bill processing restarted including new LAP skycal runs
  • 12:48 Bill restarted deepstack pantasks with 3 x compute3 nodes. Now working on the area between 8 and 10 hours RA, dec < 70. 1077 skycells to do. Restarting pstamp pantasks without the compute3 nodes
  • 13:04 Bill stdsci06 is being a bit sluggish today. Very slow to respond to nfs file operations. Only cleanup, pstamp, and deepstack are doing much at the moment. Set cleanup pantasks to stop.

Saturday : 2013.01.19

  • 20:05 Bill restarted stdscience because it hasn't been done in a while

Sunday : 2013.01.20