PS1 IPP Czar Logs for the week 2011-08-15 - 2011-08-21

Monday : 2011-08-15

  • 15:39 Serge. Dropped 157 exposures that had been set to be published by mistake:
    mysql> SELECT count(*) FROM publishRun where state='new' AND label='ThreePi.nightlyscience';
    157
    mysql> UPDATE publishRun SET state='drop' WHERE state='new' AND label='ThreePi.nightlyscience';
    mysql> SELECT count(*) FROM publishRun where state='new' AND label='ThreePi.nightlyscience';
    0
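
The count, update, recount sequence above can be wrapped in a guard so that the UPDATE only runs when the number of matching rows is the number expected. This is a minimal sketch, not the IPP tooling: it uses Python's sqlite3 with an in-memory table as a stand-in for the real MySQL database, and the table layout (only `state` and `label` come from the log; `pub_id` is hypothetical), the function names, and the demo rows are all assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for publishRun (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publishRun ("
        "pub_id INTEGER PRIMARY KEY, state TEXT, label TEXT)"
    )
    rows = [("new", "ThreePi.nightlyscience")] * 3
    rows += [("new", "MD10.GR0"), ("full", "ThreePi.nightlyscience")]
    conn.executemany(
        "INSERT INTO publishRun (state, label) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def drop_new_publish_runs(conn, label, expected):
    """Set state='drop' on 'new' rows for one label.

    The update is skipped (and an error raised) unless the number of
    matching rows equals `expected`, mirroring the manual SELECT count(*)
    check done before the UPDATE in the log.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT count(*) FROM publishRun WHERE state = 'new' AND label = ?",
        (label,),
    )
    found = cur.fetchone()[0]
    if found != expected:
        raise RuntimeError(
            f"expected {expected} rows, found {found}; nothing changed"
        )
    cur.execute(
        "UPDATE publishRun SET state = 'drop' "
        "WHERE state = 'new' AND label = ?",
        (label,),
    )
    conn.commit()
    # Number of rows the UPDATE touched.
    return cur.rowcount
```

Against the demo table, `drop_new_publish_runs(make_demo_db(), "ThreePi.nightlyscience", 3)` changes the three matching rows and leaves other labels and states untouched. Passing the expected count explicitly is a design choice: it forces the operator to look at the SELECT result before anything is modified.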
    

Bill is czar today

  • Investigated why some publishRuns were stuck (before Serge dropped them). It is related to the code that does "off night diffs".
  • Investigated a faulted MD10.GR0 publishRun. Bug in destreak script. Dropped the run.
  • 17:18: No faults on czartool. M31 reference stacks are taking > 8000 seconds per skycell.

Tuesday : 2011-08-16

Bill is czar today

  • Gene reports that some of the M31 stacks are using huge amounts of memory. These are i-band skycells near the M31 core. We've killed off a few.
  • turned hosts off in stdscience for ippc22 ippc24 ippc27 ippc28 to dedicate them to stacking
  • added stare nodes to stack
  • 11:05 fixed corrupted chip --chip_id 271038 --class_id XY53
  • 11:16 fixed corrupted diff --diff_id 154025 --skycell_id skycell.7.41
  • 13:15 removed M31.refstack.20110812 label from stack. We're going to run the 4 bloated skycells in a separate pantasks with only the stare nodes enabled.
  • 13:39 set stare hosts to off in stack
  • 14:23 removed stare hosts from stack. The last M31 refstack process is swapping in its 19GB of memory to check for leaks
  • 15:27 M31 stacking moved to new pantasks ~ipp/stack.m31. only the 4 stare nodes are enabled.

Wednesday : 2011-08-17

  • 09:05 Serge End of IPP-MOPS-TEST datastore cleaning: 107GB recovered, 19344 objects deleted
  • 14:25 heather turned off chip to finish nightlyscience
  • 15:20 Serge End of IPP-MOPS datastore cleaning: 131GB recovered, 31207 objects deleted
  • 17:37 heather did various warptool -revert and warp.reset until the nightlyscience warps finished. chips are now back on.

Thursday : 2011-08-18

  • 08:52 Bill reran --diff_id 155053 --skycell_id skycell.1399.083 because one of the output files was corrupted.
  • 08:52 Bill reverted some M31 stack faults and publish faults from various labels.
  • 09:15 heather did chip.off - all nightlyscience is through chip, but other non-nightlyscience labels are not.
  • 14:30 heather did chip.on and reverted some failed 3pi publishing

Friday : 2011-08-19

  • 17:23 Mark: last LAP chip kept reverting because of a zero-size file. Copied over a good version.
     ls -l /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
    -rw-rw-r-- 1 apache 0 Jul 11 11:15 /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
    
    cp /data/ipp053.0/nebulous/96/a2/984881980.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits  /data/ipp016.0/nebulous/96/a2/1083178717.gpc1:20110607:o5719g0290o:o5719g0290o.ota24.fits
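
The manual `ls -l` check followed by `cp` can be sketched as a small helper that only overwrites the target when it really is zero bytes and the replacement is not. This is an illustrative sketch under assumptions, not an IPP or nebulous tool: the function name and its arguments are hypothetical, and it works on plain file paths like the ones shown above.

```python
import os
import shutil


def replace_if_empty(bad_path, good_path):
    """Copy good_path over bad_path only if bad_path is a zero-size file.

    Returns True when a copy was made, False when bad_path already has
    content. Raises ValueError if the supposed good copy is itself empty.
    """
    if os.path.getsize(bad_path) != 0:
        # Target has data; leave it alone.
        return False
    if os.path.getsize(good_path) == 0:
        raise ValueError(f"replacement is also empty: {good_path}")
    # copyfile copies contents only, so the target keeps its own
    # ownership and permission bits.
    shutil.copyfile(good_path, bad_path)
    return True
```

Called as `replace_if_empty(bad, good)` with the two instance paths, it refuses to touch a target that already holds data, which guards against copying over the wrong instance.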
    
  • 21:45 Mark: seemed to be a pileup at warp. warp.off, no warps running, warp.reset, warp.on. warps going.

Saturday : 2011-08-20

  • 18:00 Mark: added new reduction class for unconvolved, deep stacks (DEEP_QUICKSTACK) to make set of just unconvolved refstacks for MD01.
  • 18:30 Mark: MD01.GR0 warps seem stuck. warp.off, no warps running, warp.reset, warp.on. The remaining 3PI warps are running now, but not MD01.GR0. The first missing 3PI exposure noted by Denver, o5793g0063o, seems to have a skycell stuck in warp as well.

Sunday : 2011-08-21

  • 22:00 Mark: publishing pantasks server not running, restarted.