PS1 IPP Czar Logs for the week 2016.11.14 - 2016.11.20


Monday : 2016.11.14

  • MEH: Haydn replacing disks in ipp005, ippx020, and ipp115. ipp005 is in repair and manually out of processing until finished (~24 hrs); ippx020 is a compute node and is back in summitcopy as normal; ipp115 is not part of nightly processing.
  • 11:45 CZW: Restarting ITC transfer test pantasks (/data/ippc19.0/home/watersc1/itc_sync_script.20161026/test_pant). This should have minimal impact on the cluster.

Tuesday : 2016.11.15

  • 07:05 MEH: stdscience pantasks segfaulted at ~05:09. Restarted; it will take ~1 hr to finish nightly processing.
    [2016-11-15 05:09:39] pantasks_server[23034]: segfault at 491a228 ip 0000000000408a4e sp 000000004120ef20 error 4 in pantasks_server[400000+16000]
  • 12:15 CZW: Restarting ITC transfer test pantasks (/data/ippc19.0/home/watersc1/itc_sync_script.20161026/test_pant) with 40 active jobs (all running on stare nodes, but with IO to other hosts). This should have minimal impact on the cluster.

Wednesday : 2016.11.16

  • 16:55 CZW: Restarting IPP pantasks.
  • 17:30 CZW: Added ipp118 and ipp119 to nebulous. They are in repair.

Thursday : 2016.11.17

  • 09:00 CZW: Generated additional diffs for MOPS with new code. For some reason, no morning darks were taken last night, which is problematic because NS then doesn't know when observing has finished. In addition, exposure c7696g0016f from 2016-11-04 is repeatedly failing to register. I'm not sure exactly how to fix this, as critical header data appears to be missing (it is detected as camera SIMPLE rather than GPC1).
  • 12:30 CZW: Restarted ITC shuffle as ipptest with pantasks server ~ipptest/replication/ (server on stare04). It seems to have a higher error rate than expected; one possible cause is trouble getting md5sums from the ippb nodes. Documentation for this process is here.
  • 14:30 CZW: I have added ipp120 and ipp121 to nebulous, but have set them to repair for now. They should be clear for use, but I don't want to enable them before leaving the island.
  • 16:20 CZW: Notes on the new nightly science script are available here.
  • 16:30 CZW: Restarting IPP pantasks.
  • 16:40 EAM: Running relastro -update-objects for the region dec > 70. It should be done before nighttime.
  • 18:00 MEH: SNIaF updates running on ipps nodes for next few hours

Friday : 2016.11.18

  • MEH: MOPS test processing on ipps nodes today

Saturday : 2016.11.19

  • 17:00 HAF: Restarted pantasks.

Sunday : 2016.11.20

  • 17:00 HAF: Restarted pantasks.