PS1 IPP Czar Logs for the week 2011-08-15 - 2011-08-21

Monday : 2011-08-15

  • 15:39 Serge: dropped 157 exposures that had been queued for publishing by mistake:
    mysql> SELECT count(*) FROM publishRun WHERE state='new' AND label='ThreePi.nightlyscience';
    mysql> UPDATE publishRun SET state='drop' WHERE state='new' AND label='ThreePi.nightlyscience';
    mysql> SELECT count(*) FROM publishRun WHERE state='new' AND label='ThreePi.nightlyscience';

Bill is czar today

  • Investigated why the publishRuns were stuck (before Serge dropped them). The cause is related to the code that does "off night diffs".
  • Investigated a faulted MD10.GR0 publishRun; the fault comes from a bug in the destreak script. Dropped the run.
  • 17:18: No faults on czartool. M31 reference stacks are taking > 8000 seconds per skycell.

Tuesday : 2011-08-16

Bill is czar today

  • Gene reports that some of the M31 stacks are using huge amounts of memory. These are i-band skycells near the M31 core. We've killed off a few of them.
  • Turned off hosts ippc22, ippc24, ippc27, and ippc28 in stdscience to dedicate them to stacking.
  • added stare nodes to stack
  • 11:05 fixed corrupted chip --chip_id 271038 --class_id XY53
  • 11:16 fixed corrupted diff --diff_id 154025 --skycell_id skycell.7.41
  • 13:15 removed M31.refstack.20110812 label from stack. We're going to run the 4 bloated skycells in a separate pantasks with only the stare nodes enabled.
  • 13:39 set stare hosts to off in stack
  • 14:23 removed stare hosts from stack. The last M31 refstack process is swapping in its 19GB of memory; checking it for leaks.
  • 15:27 M31 stacking moved to new pantasks ~ipp/stack.m31. only the 4 stare nodes are enabled.

Wednesday : 2011-08-17

  • 09:05 Serge: end of IPP-MOPS-TEST datastore cleaning: 107GB recovered, 19344 objects deleted.
  • 14:25 heather turned chip off to let nightlyscience finish.
  • 15:20 Serge: end of IPP-MOPS datastore cleaning: 131GB recovered, 31207 objects deleted.
  • 17:37 heather ran various warptool -revert and warp.reset commands until the nightlyscience warps finished. Chip is now back on.

Thursday : 2011-08-18

  • 08:52 Bill reran --diff_id 155053 --skycell_id skycell.1399.083 because one of the output files was corrupted.
  • 08:52 Bill reverted some M31 stack faults and publish faults from various labels.
  • 09:15 heather: all nightlyscience is through chip, but the other (non-nightlyscience) labels are not.
  • 14:30 heather ran chip.on and reverted some failed 3PI publishing.

Friday : 2011-08-19

  • 17:23 Mark: the last LAP chip kept reverting because of a zero-size file. Copied over the good copy from ipp053:
    ls -l /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
    -rw-rw-r-- 1 apache 0 Jul 11 11:15 /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
    cp /data/ipp053.0/nebulous/96/a2/984881980.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
  • 21:45 Mark: there seemed to be a pileup at warp with no warps running. Ran warp.reset and warp.on; warps are going.
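
The 17:23 fix on Friday boils down to "replace a zero-size replica with a known-good one." Below is a minimal guarded sketch of that copy step, assuming the path of the good replica is already known (here, the ipp053 copy). The function name `restore_if_empty` is hypothetical and is not part of the IPP tools. It refuses to copy from an empty source and only overwrites a target that exists and has zero size, so a mistyped path cannot clobber a healthy file.

```sh
# Hypothetical helper (not an IPP tool): overwrite a zero-size replica
# with a known-good copy.
# Usage: restore_if_empty GOOD_COPY ZERO_SIZE_COPY
restore_if_empty() {
    good=$1
    bad=$2
    # Refuse if the "good" replica is missing or itself empty.
    if [ ! -s "$good" ]; then
        echo "refusing: $good is missing or empty" >&2
        return 1
    fi
    # Only overwrite a target that exists and has zero size.
    if [ -e "$bad" ] && [ ! -s "$bad" ]; then
        # Plain cp onto an existing file keeps the target's owner and mode.
        cp "$good" "$bad"
        return $?
    fi
    echo "skipping: $bad is not a zero-size file" >&2
    return 2
}
```

Called with the two paths from the 17:23 entry, it would perform the same cp as above, but only after the `ls -l` condition (size 0) has been checked automatically.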

Saturday : 2011-08-20

  • 18:00 Mark: added a new reduction class for unconvolved deep stacks (DEEP_QUICKSTACK) to make a set of unconvolved-only refstacks for MD01.
  • 18:30 Mark: MD01.GR0 warps seem stuck, with no warps running. Ran warp.reset and warp.on; the remaining 3PI warps are running now, but MD01.GR0 is not. The first missing 3PI exposure noted by Denver, o5793g0063o, seems to have a skycell stuck in warp as well.

Sunday : 2011-08-21

  • 22:00 Mark: the publishing pantasks server was not running; restarted it.