PS1 IPP Czar Logs for the week 2017.02.13 - 2017.02.19

Monday : 2017.02.13

  • 11:00 EAM: ippmonitor shows 2 camera runs that are not yet done. Looking at pantasks.stdout.log, they have been failing since Friday night. I do not immediately see the cause in the job logs. I'll note the commands here so we can re-run them manually, but I will mark these cam_ids as bad quality:
    camera_exp.pl --exp_tag o7795g0123o.1202244 --cam_id 1868106 --camera GPC1 --outroot neb://any/gpc1/OSS.nt/2017/02/11//o7795g0123o.1202244/o7795g0123o.1202244.cm.1868106 --redirect-output --run-state new --reduction SWEETSPOT --dbname gpc1 --verbose
    camera_exp.pl --exp_tag o7795g0124o.1202246 --cam_id 1868107 --camera GPC1 --outroot neb://any/gpc1/OSS.nt/2017/02/11//o7795g0124o.1202246/o7795g0124o.1202246.cm.1868107 --redirect-output --run-state new --reduction SWEETSPOT --dbname gpc1 --verbose
    
    Update: since the camRuns had already been reverted and the chips have been cleaned, I've instead set the camRuns to state 'drop' to remove them from the ippMonitor list:
    camtool -dbname gpc1 -updaterun -exp_name o7795g0123o -set_state drop
    camtool -dbname gpc1 -updaterun -exp_name o7795g0124o -set_state drop
    
  • 19:30 EAM: stopping and restarting pantasks.

Tuesday : 2017.02.14

Wednesday : 2017.02.15

  • 15:15 CZW: Haydn is taking ippc65 down to fix a bad drive.
  • 16:10 CZW: Restarting IPP pantasks.
  • 17:00 CZW: Haydn is taking down ippc69, not ippc65, for the bad drive fix.

Thursday : 2017.02.16

  • 9:20 HAF: requeued these diffs at Serge's request. Still needs debugging: why were they not handled by nightlyscience?
    difftool -dbname gpc1 -definewarpwarp -exp_id 1204352 -template_exp_id 1204358 -backwards -set_workdir neb://@HOST@.0/gpc1/ESS.nt/2017/02/16 -set_dist_group SweetSpot -set_label ESS.nightlyscience -set_data_group ESS.20170216 -set_reduction SWEETSPOT -simple -rerun
    difftool -dbname gpc1 -definewarpwarp -exp_id 1204351 -template_exp_id 1204361 -backwards -set_workdir neb://@HOST@.0/gpc1/ESS.nt/2017/02/16 -set_dist_group SweetSpot -set_label ESS.nightlyscience -set_data_group ESS.20170216 -set_reduction SWEETSPOT -simple -rerun
    difftool -dbname gpc1 -definewarpwarp -exp_id 1204355 -template_exp_id 1204362 -backwards -set_workdir neb://@HOST@.0/gpc1/ESS.nt/2017/02/16 -set_dist_group SweetSpot -set_label ESS.nightlyscience -set_data_group ESS.20170216 -set_reduction SWEETSPOT -simple -rerun
    difftool -dbname gpc1 -definewarpwarp -exp_id 1204354 -template_exp_id 1204360 -backwards -set_workdir neb://@HOST@.0/gpc1/ESS.nt/2017/02/16 -set_dist_group SweetSpot -set_label ESS.nightlyscience -set_data_group ESS.20170216 -set_reduction SWEETSPOT -simple -rerun
    
  • 12:26 CZW: I'm setting ipp105 to 'up' in nebulous. This should have no effect on processing, but should spread the shuffle load slightly.
  • 14:30 MEH: once cleanup finishes, shutting down all nightly pantasks to roll out and test the new ops tag ipp-20170121 for tonight.
  • 15:30 CZW: I will be starting HSC processing on the stare nodes today. For tonight it will run entirely on the stare nodes, reading and writing data only there, so it should not impact anything else.
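
The four difftool requeue commands logged at 9:20 differ only in the exp_id / template_exp_id pair, so a small helper can regenerate them. This is a minimal Python sketch, assuming the flag set copied verbatim from those commands applies to any similar ESS.nightlyscience requeue. `difftool_cmd` and `PAIRS` are hypothetical names introduced here, and the script only prints the commands for review; it does not run them:

```python
import shlex

# (exp_id, template_exp_id) pairs requeued on 2017-02-16.
PAIRS = [
    (1204352, 1204358),
    (1204351, 1204361),
    (1204355, 1204362),
    (1204354, 1204360),
]


def difftool_cmd(exp_id, template_exp_id,
                 workdir="neb://@HOST@.0/gpc1/ESS.nt/2017/02/16",
                 data_group="ESS.20170216"):
    """Return the argv list for one backwards warp-warp diff requeue.

    Hypothetical helper: the flags are copied from the logged commands,
    and only the workdir/data_group defaults are parameterized.
    """
    return [
        "difftool", "-dbname", "gpc1", "-definewarpwarp",
        "-exp_id", str(exp_id),
        "-template_exp_id", str(template_exp_id),
        "-backwards",
        "-set_workdir", workdir,
        "-set_dist_group", "SweetSpot",
        "-set_label", "ESS.nightlyscience",
        "-set_data_group", data_group,
        "-set_reduction", "SWEETSPOT",
        "-simple", "-rerun",
    ]


if __name__ == "__main__":
    # Print the commands for review rather than executing them.
    for exp_id, template_exp_id in PAIRS:
        print(shlex.join(difftool_cmd(exp_id, template_exp_id)))
```

Printing rather than executing keeps the czar in the loop, in the same way these requeues were checked and run by hand.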

Friday : 2017.02.17

  • MEH: pstamp will remain offline until the odd incompatibility between tag ipp-20170121 and updates is sorted out.
  • 10:40 MEH: reverting ops tag back to ipp-20141024
    • cleanup should remain stopped for now -- update: ok to run
  • MEH: running large update cycles to clean up the false quality 8006 values silently set by the incompatible code in ipp-20170121 during yesterday's updates.

Saturday : 2017.02.18

  • 00:55 MEH: ipp056 appears to be overloaded -- manually took it out of processing and set it to neb-host repair -- ipp054, 058, 060, 063, and 066 all had issues as well, possibly from too much data going to them without BBU.
  • MEH: QUB targeted followup needs the upper ippx and ipps nodes as well as the nightly nodes in the AM -- finished
  • MEH: added more files for cleanup, but there is still no space on ipp082-097 -- adjusting data targeting somewhat to help avoid excess load on ipp054-066 (no BBU).
  • MEH: summitcopy is in an odd state and is not reporting the issue to ippmonitor or exposures at summit -- it is running ~100 exposures behind.

Sunday : 2017.02.19

  • MEH: QUB targeted followup needs the upper ippx and ipps nodes as well as the nightly nodes in the AM -- finished