PS1 IPP Czar Logs for the week 2012.04.23 - 2012.04.29


Monday : 2012.04.23

Mark is czar, tagging in for Roy

  • 08:08 Bill: two chip_imfile processes had been running since around midnight making no progress. Killed the processes and reverted the faults; they quickly finished. These were the two ThreePi exposures that hadn't finished.
  • 09:20 Mark: covering for Roy; taking stare03 out of processing for a disk install
  • 11:15 stare03 disks installed, back into processing.
  • 12:00 with ganglia down (DOA), manually keeping an eye on the systems while MD08.GR0-z chip->warp runs, using the command below (a sketch of this kind of per-node check follows this list):
    ~ipp/who_uses_the_cluster/sshToNodes.py ipp 'uptime'
    
  • 17:10 going to restart stdscience once MD08.GR0-z finishes; pcontrol is running at 100% CPU but still peaked at ~100 chip exposures/hr. Will queue up MD08.GR0-y afterwards to run alongside nightly science and finish up.
  • 20:30 stdscience still dragging even after the restart; trying another restart. Looked to be a pileup of jobs wanting ipp039.
  • 22:30 with the new nodes, we were able to run MD08.GR0-y processing alongside nightly science and finish ~500 exposures through chip->warp before morning.
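
For reference, a minimal sketch of the kind of per-node check sshToNodes.py was being used for above, while ganglia was down. The node list and ssh options here are illustrative only, not the actual ipp host group or the real tool:

    #!/usr/bin/env python
    # Minimal sketch of per-node load monitoring while ganglia is down.
    # NODES and the ssh options are assumptions for illustration.
    import subprocess

    NODES = ["ipp039", "ipp049", "ippc53"]   # hypothetical subset of the ipp group

    def node_uptime(host):
        """Run 'uptime' on a node over ssh and return its output (or the error)."""
        try:
            return subprocess.check_output(
                ["ssh", "-o", "ConnectTimeout=5", host, "uptime"],
                stderr=subprocess.STDOUT, universal_newlines=True).strip()
        except subprocess.CalledProcessError as err:
            return "ERROR: " + err.output.strip()

    if __name__ == "__main__":
        for host in NODES:
            print("%-8s %s" % (host, node_uptime(host)))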

Tuesday : 2012.04.24

Mark is czar

  • 07:00 nightly science still downloading; a full night of observations, but it seems slower than normal. Have been seeing the chip_imfile/chip stage take >1 ks on various machines, which also seems longer than normal -- looks like some crowded fields.
  • 09:30 nightly science 99.9% done
  • 10:45 destreak_restore has been holding up the remaining 3PI nightly science and time-sensitive MD SSdiffs from distribution; setting destreak.revert.off until nightly science is out.
  • 11:15 nightly science distribution finished, flipping destreak.revert.on
  • 14:30 MD08.GR0 is supposed to have >3k stacks running.

Wednesday : 2012.04.25

Serge is czar

  • 06:00 Mark: registration was stuck for ~half the night; had to run the following (a scripted form is sketched after this list):
    regtool -updateprocessedimfile -exp_id 480743 -class_id XY03 -set_state pending_burntool -dbname gpc1
    
  • 09:20 Serge: Started shuffling in the replication pantasks. Still about 50 exposures to go before warp.
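
A minimal sketch of scripting the registration fix above, simply wrapping the regtool invocation from the 06:00 entry. The wrapper itself is hypothetical; the stuck exp_id/class_id still have to be identified from the registration checks:

    #!/usr/bin/env python
    # Sketch: push a stuck registration imfile back to pending_burntool by
    # wrapping the regtool command from the log entry above.
    import subprocess
    import sys

    def reset_to_pending_burntool(exp_id, class_id, dbname="gpc1"):
        """Reset a stuck imfile's state so burntool picks it up again."""
        cmd = ["regtool", "-updateprocessedimfile",
               "-exp_id", str(exp_id),
               "-class_id", class_id,
               "-set_state", "pending_burntool",
               "-dbname", dbname]
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        # e.g. reset_burntool.py 480743 XY03
        reset_to_pending_burntool(sys.argv[1], sys.argv[2])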

Thursday : 2012.04.26

  • 07:50 Mark lending a hand: pileup in distribution again, so set destreak.revert.off. ipp021 is suffering with load >100; ppSub looks to be using 90% of RAM, so killing it. The node may also be a bit overloaded with dvo_client running. An MD05 chip on ippc53 and a 3PI chip on ipp021 have been running for a while, so killing those as well -- the MD05 field looks shallow, the 3PI field is somewhat crowded:
      0    ippc53    RESP  40730.40 0.0.0.711  0 chip_imfile.pl --threads @MAX_THREADS@ --exp_id 481050 --chip_id 444542 --chip_imfile_id 26497741 --class_id XY72 --uri neb://ipp049.0/gpc1/20120426/o6043g0052o/o6043g0052o.ota72.fits --camera GPC1 --run-state new --deburned 0 --outroot neb://ipp049.0/gpc1/MD05.nt/2012/04/26//o6043g0052o.481050/o6043g0052o.481050.ch.444542 --redirect-output --dbname gpc1 --verbose
      1    ipp021    BUSY   8301.13 0.0.1.8827  0 chip_imfile.pl --threads @MAX_THREADS@ --exp_id 481497 --chip_id 444973 --chip_imfile_id 26523571 --class_id XY33 --uri neb://ipp021.0/gpc1/20120426/o6043g0499o/o6043g0499o.ota33.fits --camera GPC1 --run-state new --deburned 0 --outroot neb://ipp021.0/gpc1/ThreePi.nt/2012/04/26//o6043g0499o.481497/o6043g0499o.481497.ch.444973 --redirect-output --dbname gpc1 --verbose 
    
  • 09:00 now ipp006 is being punished with load >100; ppSub is using >90% of RAM running neb://ipp006.0/gpc1/ThreePi.nt/2012/04/26/RINGS.V3/skycell.1392.070/RINGS.V3.skycell.1392.070.dif.234951, so it was killed -- the diffim is fairly poor and I suspect many bad-subtraction detections. (A sketch of this kind of load/memory check follows this list.)
  • 09:30 Serge: Restarted all pantasks but addstar
  • 12:27 Bill turned destreak revert off
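
A rough sketch of the manual load/memory check behind the 07:50 and 09:00 entries: flag a host whose load average is past the "punished" level and list the processes hogging RAM, plus long-lived chip_imfile.pl jobs, so the czar can decide what to kill. The hosts and thresholds are taken from the entries above but are otherwise illustrative:

    #!/usr/bin/env python
    # Sketch of a manual load/memory check; thresholds and host list are
    # illustrative, mirroring the figures quoted in the log entries above.
    import subprocess

    LOAD_LIMIT = 100.0   # 1-minute load considered "punished" above
    MEM_LIMIT = 90.0     # percent of RAM for a single process

    def check_host(host):
        """Print heavy processes on a host whose load average is past the limit."""
        loadavg = subprocess.check_output(
            ["ssh", host, "cat /proc/loadavg"], universal_newlines=True)
        load = float(loadavg.split()[0])
        if load < LOAD_LIMIT:
            return
        print("%s: load %.1f" % (host, load))
        ps = subprocess.check_output(
            ["ssh", host, "ps -eo pid,pmem,etime,args"], universal_newlines=True)
        for line in ps.splitlines()[1:]:
            pid, pmem, etime, args = line.split(None, 3)
            # report anything hogging memory, plus any chip_imfile.pl jobs
            if float(pmem) > MEM_LIMIT or "chip_imfile.pl" in args:
                print("  pid %s  %%mem %s  elapsed %s  %s" % (pid, pmem, etime, args[:80]))

    if __name__ == "__main__":
        for host in ("ipp006", "ipp021", "ippc53"):   # hosts named in the entries above
            check_host(host)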

Friday : 2012.04.27

Saturday : 2012.04.28

Sunday : 2012.04.29

  • 16:40 Mark: see that MD08-z was observed last night; will set it up for processing with the new refstack later tonight.
  • 18:30 restarting standard science.