PS1 IPP Czar Logs for the week 2011.10.10 - 2011.10.16


Monday : 2011.10.10

  • 18:41 CZW: replication is stopped as I'm not done debugging the shuffle to deal with ippb02 being full. This should be done tomorrow.

Tuesday : 2011.10.11

  • 10:00 Bill: set a lot of data with labels like ps_ud% to be cleaned
  • 10:35 Added another set of hosts to the update pantasks. There are several thousand postage stamp requests outstanding and the cluster is quiet.

Wednesday : 2011.10.12

  • 11:00 Mark: stealing one compute2 group from distribution to run a deepstack pantasks and observe whether it affects LAP/nightlyscience throughput over the next day. Restarted distribution.

Thursday : 2011.10.13

  • 08:55 Serge: registration looked stuck. I ran the failing command
    ipp_apply_burntool_single.pl --exp_id 407102 --class_id XY06 --this_uri neb://ipp007.0/gpc1/20111013/o5847g0636o/o5847g0636o.ota06.fits --continue 10 --previous_uri neb://ipp007.0/gpc1/20111013/o5847g0635o/o5847g0635o.ota06.fits --dbname gpc1 --verbose
  • 09:20 Serge: still stuck.
  • 09:30 Mark: seems to be running after I kicked it with
    regtool -updateprocessedimfile -exp_id 407104 -class_id XY06 -set_state pending_burntool -dbname gpc1
    regtool -updateprocessedimfile -exp_id 407103 -class_id XY06 -set_state pending_burntool -dbname gpc1
    
  • 09:45 Mark: stealing the compute group from distribution now for deep stack test.
  • 11:00 the condor reprocessing of all MD04 looked to be a success, so setting the MD04.GR0 run for refstacks to goto_cleaned to free up disk space.

Friday : 2011.10.14

  • 07:50 Mark: registration/burntool stuck, kicked with
    o5848g0198o  XY31 -1 check_burntool neb://ipp020.0/gpc1/20111014/o5848g0198o/o5848g0198o.ota31.fits
    
    regtool -updateprocessedimfile -exp_id 407351 -class_id XY31 -set_state pending_burntool -dbname gpc1
    
  • 10:50 czarpoll exited. restarted.
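
The burntool "kick" used on Thursday and Friday repeats the same regtool invocation with a different exp_id and class_id each time. A minimal sketch of a helper that only builds that command line is given here. The function name burntool_reset_command is hypothetical and not part of the IPP tools; the regtool flags are copied from the entries above, and nothing is executed.

```python
import shlex


def burntool_reset_command(exp_id, class_id, dbname="gpc1"):
    """Build the regtool command line seen in this log for one imfile.

    Hypothetical helper: it only assembles the string and does not run
    regtool. All flags are taken verbatim from the log entries.
    """
    args = [
        "regtool", "-updateprocessedimfile",
        "-exp_id", str(exp_id),
        "-class_id", class_id,
        "-set_state", "pending_burntool",
        "-dbname", dbname,
    ]
    # Quote each argument so the result can be pasted into a shell safely.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # The two exposures that were reset on Thursday at 09:30.
    for exp_id in (407104, 407103):
        print(burntool_reset_command(exp_id, "XY06"))
```

Printing the command instead of running it leaves the decision to reset an imfile with the czar on duty.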

Saturday : 2011.10.15

  • 08:00 Mark: set deepstacks faulted after 17ks. This will be a problem.
  • 08:30 LAP warp stuck, so re-ran the camera stage
    Reading FITS file /data/ipp019.0/nebulous/4b/cd/1453782697.gpc1:LAP.ThreePi.20110809:2011:10:15:o5424g0264o.207615:o5424g0264o.207615.cm.301319.XY02.mk.fits
    
    perl ~ipp/src/ipp-20110622/tools/runcameraexp.pl --redirect-output --cam_id 301319
    
  • 09:00 OTA in LAP processing stuck at chip stage: no valid instance for neb://ipp040.0/gpc1/20100823/o5431g0465o/o5431g0465o.ota15.fits. Following the LAP_Czar guide, ran
    chiptool -updateprocessedimfile -set_state full -chip_id 324912 -class_id XY15
    chiptool -updateprocessedimfile -fault 0 -set_quality 42 -chip_id 324912 -class_id XY15
    regtool -updateprocessedimfile -set_state corrupt -class_id XY15 -exp_id 211862
    
  • 17:37 looks like ippc11 is crashing while sitting idle now, or did someone reboot it?
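
In the entries this week, the class_id passed to regtool and chiptool matches the OTA number in the failing file name (ota06 with XY06, ota31 with XY31, ota15 with XY15). A small sketch that pulls the exposure name and class_id out of a nebulous URI is given here. The function name and the regular expression are assumptions inferred from the URIs in this log only, not an IPP utility.

```python
import re

# Assumed file-name pattern, inferred from the URIs in this log:
#   .../<exposure name>.ota<two digits>.fits
_OTA_RE = re.compile(r"/(?P<exp_name>o\d+g\d+o)\.ota(?P<ota>\d{2})\.fits$")


def exp_name_and_class_id(uri):
    """Return (exposure name, class_id) for a raw OTA file URI.

    Hypothetical helper. The "XY" + OTA number mapping is read off the
    log entries and is not taken from IPP documentation.
    """
    match = _OTA_RE.search(uri)
    if match is None:
        raise ValueError("URI does not look like a raw OTA file: %s" % uri)
    return match.group("exp_name"), "XY" + match.group("ota")


if __name__ == "__main__":
    uri = "neb://ipp040.0/gpc1/20100823/o5431g0465o/o5431g0465o.ota15.fits"
    print(exp_name_and_class_id(uri))
```

The exp_id and chip_id values still have to be looked up in the database; only the class_id is recoverable from the file name.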

Sunday : 2011.10.16