PS1 IPP Czar Logs for the week 2014.05.19 - 2014.05.25

Monday : 2014.05.19

  • 07:30 EAM : setting warp off while nightly science finished diffs; also bumping up the poll to 600. actually, things are running slow, so I'm restarting stdscience.
  • 12:20 EAM : for a test of relastro, I am turning off ipp04x for processing in stdscience
  • 14:20 HAF: heather stopped stack for ippc05 replacement drive (shouldn't take too long)
  • 17:05 HAF: haydn repaired 42/36/c05 - only 42/c05 are up (he's still working on 36)
  • 17:10 HAF: postage stamp server was down. restarted, not sure why it died.
  • 19:33 HAF: turned off Lap.* labels in stdsci and stack, as suggested by gene for nightly LAP processing. will turn on again in the AM

Tuesday : 2014.05.20

  • 8:ish HAF/GENE added labels back into stack/stdsci, reverted stuff
  • 09:31 Bill: regenerated a couple of missing burntool tables
    ipp@ippc18:/home/panstarrs/ipp>perl ~bills/ipp/tools/fixburntool --exp_id 144139 --class_id XY74
    /home/panstarrs/ipp/psconfig/ipp-20130712.lin64/bin/ --dbname gpc1 --camera GPC1 --exp_id 144139 --class_id XY74 --this_uri neb://ipp015.0/gpc1/20100302/o5257g0090o/o5257g0090o.ota74.fits --previous_uri neb://ipp015.0/gpc1/20100302/o5257g0089o/o5257g0089o.ota74.fits
    Running [/home/panstarrs/ipp/psconfig/ipp-20130712.lin64/bin/ --dbname gpc1 --camera GPC1 --exp_id 144139 --class_id XY74 --this_uri neb://ipp015.0/gpc1/20100302/o5257g0090o/o5257g0090o.ota74.fits --previous_uri neb://ipp015.0/gpc1/20100302/o5257g0089o/o5257g0089o.ota74.fits]...
          Deleted 1 chipProcessedImfiles
          Updated 0 chipProcessedImfiles
    ipp@ippc18:/home/panstarrs/ipp>perl ~bills/ipp/tools/fixburntool --exp_id 175176 --class_id XY15
    /home/panstarrs/ipp/psconfig/ipp-20130712.lin64/bin/ --dbname gpc1 --camera GPC1 --exp_id 175176 --class_id XY15 --this_uri neb://ipp040.0/gpc1/20100530/o5346g0436o/o5346g0436o.ota15.fits --previous_uri neb://ipp040.0/gpc1/20100530/o5346g0435o/o5346g0435o.ota15.fits
    Running [/home/panstarrs/ipp/psconfig/ipp-20130712.lin64/bin/ --dbname gpc1 --camera GPC1 --exp_id 175176 --class_id XY15 --this_uri neb://ipp040.0/gpc1/20100530/o5346g0436o/o5346g0436o.ota15.fits --previous_uri neb://ipp040.0/gpc1/20100530/o5346g0435o/o5346g0435o.ota15.fits]...
          Deleted 1 chipProcessedImfiles
          Updated 0 chipProcess
  • 09:54 Bill: restarted pstamp pantasks. It was running slowly because of a tired pcontrol.
  • 14:54 Bill: ippc30 is getting nfs overload from pstamp jobs. set one set of c2 nodes to off.
    • 15:00 Bill installed new version of which does not recompute the file size and md5sum and uses the values calculated in
  • 17:00 HAF is restarting the pantasks at the request of Bill.
  • 17:38 HAF: ok stuff is restarted and same as last night I pulled out the labels for LAP. Will re-add them tomorrow
  • 22:30 MEH: adding LAP.ThreePi.20130717.pole back into stack since it uses nodes independent of stdsci -- will watch for a couple of hours to verify rate is similar

Wednesday : 2014.05.21

mark is czar

  • 01:10 MEH: stdsci rate seems similar w/ stack running (as well as staticsky) on compute nodes. leaving in -- though it seems the default stack allocation of nodes is a bit too high on the c2 group w/ staticsky, someone has modified things and they are not initially compatible -- warning whenever restarting pantasks...
  • 08:00 MEH: leftover stuck dark -- o6798g0020d
  • 09:50 MEH: restart summitcopy+registration with ipp036,042 back in processing
    • recent restarts of pantasks_servers have not been rolling over log files --
  • 11:30 MEH: tasking some s3 nodes since unused currently -- taking all s3 nodes from stdsci for other tasks..
  • 16:00 MEH: restarting stdsci and stack -- turning MD06 back on, fixing auto-overload of c2 since being used by staticsky, forcing days old logs to actually rotate in order to track problems easier
     -- giving all of c2 to staticsky (3x) since out of plane 
  • 19:50 MEH: reminder LAP label out of stdsci for nightly processing once it starts
  • 23:20 MEH: fault 5 diffims to clear, poor weather so LAP label back in until data or sleep

Thursday : 2014.05.22

mark is czar

  • 06:30 MEH: appears to be stuck files from last night's power glitch
  • 08:46 Bill: in the process of restarting the postage stamp server. Waiting for a huge request to finish parsing.
  • 09:45 MEH: summit files available again, LAP label out until nightly finished -- nope, now timeouts.. bad md5sums and bad files triggering retry in wget.. now for attempted cleanup
  • 10:56 Bill: postage stamp pantasks restarted.
  • 13:30 MEH: was going to drop all partial exposures from last night, but can try to squeeze some through by setting imfiles to the number of valid entries. may be questionable.. but there they are
    update summitExp set imfiles=59 where exp_name = 'o6799g0277o';
    --> proceeded through to make diffim (visit 4)
    update summitExp set imfiles=57 where exp_name = 'o6799g0278o';
    update summitExp set imfiles=58 where exp_name = 'o6799g0279o';
    update summitExp set imfiles=53 where exp_name = 'o6799g0280o';
    update summitExp set imfiles=52 where exp_name = 'o6799g0281o';
    update summitExp set imfiles=51 where exp_name = 'o6799g0282o';
    update summitExp set imfiles=51 where exp_name = 'o6799g0283o';
    update summitExp set imfiles=52 where exp_name = 'o6799g0284o';
    --> missing ota not matched so burntool is going to get stalled
    update summitExp set imfiles=51 where exp_name = 'o6799g0300d';
    update summitExp set imfiles=51 where exp_name = 'o6799g0301d';
    • no clear way, short of doing it manually, to get burntool through these OBJECT exposures, so just setting them to broken
          update summitExp set exp_type = 'broken', imfiles=0, fault =0 where exp_name = ''; 
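    A read-only check along these lines could confirm what the updates above left in place. This is a sketch only, not something that was run for this log: it assumes the same database session and uses only the table and column names that appear in the statements above, and the exp_name pattern is illustrative.

```sql
-- List the state of the affected exposures after the manual edits.
-- The pattern 'o6799g%' is an assumption chosen to cover the exposures named above.
select exp_name, exp_type, imfiles, fault
  from summitExp
 where exp_name like 'o6799g%'
 order by exp_name;
```

    Comparing the imfiles column against the number of OTA files actually downloaded for each exposure would show whether any of the forced values need revisiting.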
  • 13:50 MEH: stack needs nodes since c2 in staticsky, use s2 until nightly starts -- 3x running close to mem limit (CNP) so back down to 2x -- actually way too many CNP inputs for normal LAP/3PI stack..
  • 16:10 MEH: restart stdsci and stack for nightly processing -- MD06 will be observed tonight so need regular stack config (c2 can stay in staticsky however)
  • 17:10 MEH: new day and summitcopy is similarly unhappy.. timeouts -- appears one of the summit servers may have a problem.. leaving summitcopy to just try and get as many OTA as possible until fixed, but no complete exposures possible and thus no processing possible
    • was the same machine (t20) that had power glitch issue last night -- Sidik fixed and downloads finishing. may want to retry some of the ones from last night.
  • 16:51 Bill: set staticsky to stop. Once jobs finish up I will run a short fullforce test and then set back to run.
  • 19:28 Bill test started.
    • 19:38 test completed staticsky set back to run

Friday : 2014.05.23

  • 01:30 MEH: quick restart of stdsci to fix a MD06 error for distribution
  • 09:40 MEH: MD06 SSdiffs for last night -- only 2z so manually made a NS for it just for comparison, looks like there is saturation on source in the warps
  • 14:30 MEH: taking all s3 group from stdsci to do sideband processing for Gene

Saturday : 2014.05.24

Sunday : 2014.05.25

  • 12:05 MEH: stealing s3 nodes from stdsci for the afternoon to run sample WSdiff using LAP PV2 stacks