PS1 IPP Czar Logs for the week 2017.04.17 - 2017.04.23


Monday : 2017.04.17

Nebulous/pantasks was turned off so that the copy (db01) could be rsynced somewhere safe. The rsync is still ongoing; we anticipate it will be done tomorrow morning. HAF will check status and restart when it is complete. No pantasks for tonight. Nightly processing will start tomorrow morning (if all goes well).

Tuesday : 2017.04.18

  • MEH: Serge is running transfer of nebulous mysql files from ipp122->ipp115 -- ipp115 will have high load until finished, nothing else should be using ipp115/ipp122
    • ippdb01 (master) and ipp115 (slave) setup and active again -- restart all processing, stamps+stdscience updates seem fine, summitcopy+registration darks ok, cleanup ok
    • weather clearing so allowing nightly to process and leaving things running -- rate is lower than normal, chips seem to get stuck for 400-600s at times

Wednesday : 2017.04.19

  • Weryk using ippc76-ippc126 for processing during day per Gene's comment at the group meeting
  • HAF restarted pantasks at 17:30
  • MEH: add/fix ippdb06(neb) and ippdb05(gpc1) for ippMonitor in ippMonitor/site.php, czartool_labels.php

Thursday : 2017.04.20

  • MEH: ippdb06 nebulous replication rsync -- disk writing rate much lower than expected, Haydn triggered raid initialization (won't wipe but need to suspend transfer so it can finish faster) -- still seems to have a reduced disk write rate ~50-60MB/s
    • rebuild mysql 5.6 for use on new raid and modify configs to use ippdb06.1 instead
  • MEH: ipp121 back online after Haydn and Gavin swapped out disks and restored the OS -- /etc/exports empty, copied and modified ipp122 version; key problem, cleared for ipp,ippmops,ippqub -- leave in repair for a while (probably should until remote power access is set up)

Friday : 2017.04.21

  • MEH: Gavin updated all remaining added nodes at ITC to do hard mounts now, all seem to be using that except for ippc18 (everyone will need to log off) and ippdb06-ipp122 for ippdb06 nebulous replication rsync
  • MEH: put pod 2+3 nodes at ITC from down to repair -- ipp090, 032, 054-066 -- should leave all in repair until remote power management online
    • ipp056 autofs not running -- started ok
    • ipp090 slow to mount, but after a bit seems fine
  • MEH: ipp097 needs to be down for Haydn to repair SSD -- back up and fine
  • MEH: ippx041-x044 being used for power use testing -- we have no network access until the extended network is deployed
  • MEH: ippdb06 sync finished, Serge restarting replication from ippdb01 -- 226094s behind master, caught up in ~1.5 hrs so even with degraded disk writes, replication seems ok -- unlikely it could be used as primary until fixed.
    • ippMonitor needed GRANT REPLICATION CLIENT on ippdb06 so replication can be monitored in ippMonitor
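
For scale, the ippdb06 replication numbers in Friday's entry can be turned into a rough replay rate. This is only a back-of-the-envelope sketch: it assumes the master kept producing changes at a steady rate during the catch-up and treats "~1.5 hrs" as exactly 5400 s. The variable names are illustrative and not taken from any IPP tool.

```python
# Rough arithmetic on the replication lag reported for ippdb06.
# Assumptions (not from the log): the master keeps writing at a steady
# rate during catch-up, and "~1.5 hrs" is taken as exactly 5400 s.

lag_seconds = 226094          # reported seconds behind master
catchup_seconds = 1.5 * 3600  # reported catch-up time, ~1.5 hrs

lag_hours = lag_seconds / 3600
lag_days = lag_seconds / 86400

# While catching up, the slave must replay the backlog plus whatever
# the master produced in the meantime.
replayed_seconds = lag_seconds + catchup_seconds
replay_rate = replayed_seconds / catchup_seconds

print(f"lag: {lag_hours:.1f} h ({lag_days:.2f} days)")
print(f"approximate replay rate: {replay_rate:.0f}x real time")
```

Under these assumptions the backlog was a bit over two and a half days, replayed at roughly forty times real time, which is consistent with the note that replication keeps up despite the reduced disk write rate.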

Saturday : 2017.04.22

  • MEH: network access into ITC down -- ~0700-0715, all processing seemed to be fine within the ITC
  • MEH: ippc111 appears to have crashed ~1700 -- taking out of processing and commenting out in ~ippitc/ippconfig/pantasks_hosts.input

Sunday : 2017.04.23

  • MEH: bad exposure stalled in camera stage -- o7866g0067o -- fails to find solution... set quality 42
  • MEH: Rob reports missing exposures -- case of no or poor visit 4 -- manually queued up v2-v3 since nightly_processing script still reporting none to do even after the stalled camera through warp and others in email (v2 bad camera stage had v1-v3 auto queued)...
    o7866g0045o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1611 visit 3
    o7866g0046o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1457 visit 3
    o7866g0047o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1652 visit 3
    o7866g0056o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_3104 visit 3
    o7866g0057o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1655 visit 3
    o7866g0060o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1640 visit 3
    o7866g0061o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1459 visit 3
    o7866g0062o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1638 visit 3
    o7866g0063o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1458 visit 3
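
If a report like the one above needs to be turned into a list of exposure names (for example, to queue them by hand), a small parser along these lines may help. It is a sketch only: the line layout is inferred from the nine lines above, and the field names `label` and `tag` are hypothetical placeholders for the two middle columns, not names used by the nightly_processing script.

```python
import re

# Failure lines copied from the report above.
REPORT = """\
o7866g0045o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1611 visit 3
o7866g0046o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1457 visit 3
o7866g0047o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1652 visit 3
o7866g0056o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_3104 visit 3
o7866g0057o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1655 visit 3
o7866g0060o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1640 visit 3
o7866g0061o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1459 visit 3
o7866g0062o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1638 visit 3
o7866g0063o FAIL (Diff stage) OSSR.R11S1.8.Q.w ps1_24_1458 visit 3
"""

# Layout assumed from the lines above:
#   <exposure> FAIL (<stage> stage) <label> <tag> visit <n>
# "label" and "tag" are placeholder names for the two middle columns.
LINE_RE = re.compile(
    r"^(?P<exposure>o\d{4}g\d{4}o) FAIL \((?P<stage>\w+) stage\) "
    r"(?P<label>\S+) (?P<tag>\S+) visit (?P<visit>\d+)$"
)


def parse_failures(text):
    """Return one dict per report line that matches the assumed layout."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            record = match.groupdict()
            record["visit"] = int(record["visit"])
            records.append(record)
    return records


failures = parse_failures(REPORT)
exposures = [record["exposure"] for record in failures]
print(len(failures), "failed exposures:", " ".join(exposures))
```

Lines that do not match the assumed layout are skipped silently, so the count printed at the end is worth comparing against the original report.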