PS1 IPP Czar Logs for the week 2017-02-06 - 2017-02-13

Monday : 2017.02.06

  • 13:28 CZW: Restarted ipptest/replication due to a laggy pantasks server. Set npend to 400, which hopefully won't overload any particular host; 300 seemed to be working fine over the weekend.
  • 17:15 CZW: Started a script on stare04 to replicate a copy of the detrend data to the ITC cluster (randomly selected host/volume) and to the new ATRC B nodes (randomly selected from ippb07-ippb15). It runs single-threaded and is not pushing very hard, and the number of files in this transfer is fairly small (34 individual det_id values, ~2000 total files), so this shouldn't be an issue. A rough sketch of this kind of copy loop is below.
  • 19:00 EAM : PS1 is down for mirror cleaning
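  A minimal sketch of the kind of single-threaded, randomly-distributed copy loop described in the 17:15 entry above. The destination host/volume lists, the file-list input, and the rsync invocation are assumptions for illustration, not the actual script run on stare04:

    #!/usr/bin/env python
    # Sketch only: copy each detrend file to one randomly chosen ITC volume and
    # one randomly chosen ATRC B volume.  Host/volume names, the input file list,
    # and the rsync flags are assumptions, not the actual stare04 script.
    import random
    import subprocess

    ITC_VOLUMES = ["itc01:/data/itc01.0", "itc02:/data/itc02.0"]                  # assumed names
    ATRC_B_VOLUMES = ["ippb%02d:/data/ippb%02d.0" % (n, n) for n in range(7, 16)] # ippb07-ippb15, assumed layout

    def copy_file(src, dest_pool):
        """Copy one file to a randomly selected host/volume via rsync."""
        host, volume = random.choice(dest_pool).split(":", 1)
        # -R keeps the full source path below the destination volume
        subprocess.check_call(["rsync", "-aR", src, "%s:%s/" % (host, volume)])

    def main(file_list):
        # Deliberately single-threaded: one file at a time keeps the load on any
        # particular destination host low.
        for path in file_list:
            copy_file(path, ITC_VOLUMES)
            copy_file(path, ATRC_B_VOLUMES)

    if __name__ == "__main__":
        with open("detrend_files.txt") as fh:   # ~2000 files covering 34 det_id values
            main([line.strip() for line in fh if line.strip()])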

Tuesday : 2017.02.07

  • 21:00 EAM : PS1 is down due to the dome (and weather)
  • 21:00 EAM : I rsynced both the gpc1 and nebulous MySQL databases from the ipp116 & ipp117 replicants to ipp114 & ipp115 today. The gpc1 copy (ipp114) finished in the afternoon, so I restarted ipp116 and started mysql on ipp114, with replication running on both. The nebulous copy just finished; I have restarted ipp117 and started replication there, but I need to build the 5.6 version of mysql before starting ipp115. A rough outline of the sequence is sketched below.
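  A rough outline of the rsync-and-restart sequence described above, as a sketch only: the datadir location, init-script paths, and the replication restart commands are assumptions, not a transcript of what was actually run; only the host pairings come from the log.

    #!/usr/bin/env python
    # Sketch of cloning a stopped MySQL replicant's data directory to a new host
    # and restarting replication on both.  Paths, service commands, and the
    # START SLAVE step are assumptions; only the hostnames come from the log.
    import subprocess

    def run(host, cmd):
        """Run a shell command on a remote host via ssh."""
        subprocess.check_call(["ssh", host, cmd])

    def clone_replica(src_host, dst_host, datadir="/var/lib/mysql"):
        run(src_host, "/etc/init.d/mysql stop")     # stop the source so the files are consistent
        # Pull the datadir onto the new host (rsync is run on the destination).
        run(dst_host, "rsync -a --delete %s:%s/ %s/" % (src_host, datadir, datadir))
        run(src_host, "/etc/init.d/mysql start")
        run(src_host, "mysql -e 'START SLAVE'")     # source resumes replicating from the master
        run(dst_host, "/etc/init.d/mysql start")
        run(dst_host, "mysql -e 'START SLAVE'")     # new replica continues from the copied position

    if __name__ == "__main__":
        clone_replica("ipp116", "ipp114")  # gpc1
        clone_replica("ipp117", "ipp115")  # nebulous (ipp115 still needs a 5.6 mysqld build first)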

Wednesday : 2017.02.08

  • MEH: ipp001 disk full per the regular email warning -- running the gpc1 db dump cleanup script
  • MEH: ipp089 overloaded by excessive remote_md5sum.pl jobs since ~1700 yesterday -- if this was expected, the host probably should have been taken out of processing and put to neb-host repair to isolate it, with a notice sent to the group; if it was not expected, it should be checked on rather than fired off and forgotten
    • after ~1 hour of no response, have stopped ~ipptest/replication in order to get other work done
  • MEH: a possible source of the leaking free disk space may be error_cleaned runs on the large number of recent LAP.PV3 warp updates -- a few TB were freed up, but many still fail due to missing .mdc files, leaving large .fits pixel files on disk... (see the first sketch after this list)
  • MEH: ippc19 /home disk warning @97% -- archived pantasks logs as described for the czars on the Processing page -- ~7GB freed (see the second sketch after this list)
  • MEH: Haydn notes the ipp056/ipp058 non-optimal config emails are not correct and have just been ignored for a while now -- put ipp056 and ipp058 neb-host up. Example of the email:
    Controller ID: 0 Reminder: Potential non-optimal configuration due, PD commissioned as Emergency spare: --:--:16
    Generated on:Wed Feb  8 11:31:59 2017
    
  • MEH: ipp121 has seemed OK for the past few days and is in use today -- neb-host up
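  For the warp-update cleanup issue above, a minimal sketch of how one might look for large .fits pixel files whose .mdc companion is missing. The directory root and the .fits/.mdc pairing by filename are assumptions for illustration, not the actual IPP cleanup tooling:

    #!/usr/bin/env python
    # Sketch only: walk an (assumed) warp directory tree and report .fits files
    # with no matching .mdc file, i.e. candidates left behind by failed
    # error_cleaned updates.  Root path and naming convention are assumptions.
    import os
    import sys

    def orphaned_fits(root):
        """Yield (path, size_in_bytes) for .fits files lacking an .mdc sibling."""
        for dirpath, _dirnames, filenames in os.walk(root):
            names = set(filenames)
            for name in filenames:
                if name.endswith(".fits") and name[:-len(".fits")] + ".mdc" not in names:
                    path = os.path.join(dirpath, name)
                    yield path, os.path.getsize(path)

    if __name__ == "__main__":
        root = sys.argv[1] if len(sys.argv) > 1 else "."
        total = 0
        for path, size in orphaned_fits(root):
            total += size
            print("%12d  %s" % (size, path))
        print("total orphaned bytes: %d" % total)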
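  And for the ippc19 /home cleanup, a sketch of the sort of log-archiving pass the Processing page describes; the log and archive directories and the 30-day cutoff are assumptions, not the actual czar procedure:

    #!/usr/bin/env python
    # Sketch: gzip pantasks log files older than a cutoff into an archive area
    # and remove the originals to free space on /home.  Directories and the
    # 30-day cutoff are assumptions.
    import gzip
    import os
    import shutil
    import time

    LOG_DIR = os.path.expanduser("~ipp/pantasks/logs")    # assumed location
    ARCHIVE_DIR = "/data/ippc19.0/pantasks_log_archive"   # assumed location
    CUTOFF_SECONDS = 30 * 86400

    def archive_old_logs():
        cutoff = time.time() - CUTOFF_SECONDS
        if not os.path.isdir(ARCHIVE_DIR):
            os.makedirs(ARCHIVE_DIR)
        for name in os.listdir(LOG_DIR):
            path = os.path.join(LOG_DIR, name)
            if not os.path.isfile(path) or os.path.getmtime(path) > cutoff:
                continue
            gz_path = os.path.join(ARCHIVE_DIR, name + ".gz")
            with open(path, "rb") as src:
                with gzip.open(gz_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            os.remove(path)    # free the space on /home

    if __name__ == "__main__":
        archive_old_logs()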

Thursday : 2017.02.09

  • MEH: MOPS test chunk updates are running today in normal stdscience

Friday : 2017.02.10

  • 02:20 MEH: ipp121 non-responsive -- another kernel panic -- power cycled -- back to neb-host repair since it is not reliable (it was not back in normal processing anyway)

Saturday : 2017.02.11

Sunday : 2017.02.12