PS1 IPP Czar Logs for the week 2016.05.02 - 2016.05.08

Monday : 2016.05.02

  • 11:35 CZW: Started batch 4 rsyncs for wave 3 nodes (ipp048-ipp053).
  • 16:05 CZW: Daily pantasks restart.
  • 18:22 CZW: Stopped processing to remove ippc43 from active status while Haydn repairs a failed drive in it.
  • 18:31 CZW: Haydn has revived ipp017, but I will leave it down in nebulous until tomorrow. I'd like to test that it's fully functional before throwing it back to nebulous.

Tuesday : 2016.05.03

  • 16:40 CZW: Restarting ipp pantasks, which will pick up the loading changes Gene sent an email about.
  • 16:50 CZW: Starting a dummy rsync of ipp017 to ipp104. This should put some load on ipp017 to determine if it's stable enough to be put back to repair (with the previous hardware, rsyncs had a 100% crash rate).

Wednesday : 2016.05.04

  • 00:46 MEH: ipp006 looks to be having troubles -- turning off in processing (looks like 1x sum+reg, 5 on+1 off already for 6x stdscience) and setting neb-repair (red already)
  • 10:20 EAM: modified the pantasks hosts to reduce stdsci to ~300 connections (all s-nodes). removed the c0, c1a, and c1b entries (to be retired). restarted pantasks.

Thursday : 2016.05.05

  • 15:00 CZW: I've set ipp017 to repair in nebulous and cancelled the rsyncs that were running on it. It looks to be stable and usable with the new hardware. I'll plan on deleting the rsynced copy next week.
  • 19:50 EAM: Stopping and restarting the pantasks.

Friday : 2016.05.06

  • 14:30 EAM: stopping and restarting the pantasks. last night's loading was good, pushing just a bit more from 300 to 265 connections (8*s3 + 6*s4 + 6*s5 + 3*s6)
  • 16:26 MEH: large number of QUB stamp requests, bumping pstamp to push through them before nightly -- mostly diff updates (since warps, being smaller, aren't cleaned as much) so ippqub:stdscience_ws needs more nodes -- adding 4-6x m0+1 since those nodes are idle
    • QUB updates finished, restarting stdscience_ws back to default for the night -- now adding ippsXX nodes to pstamp to finish stamps
    • QUB stamps finished, restarting pstamp to default for the night

Saturday : 2016.05.07

Sunday : 2016.05.08

  • 21:40 EAM: ipp078 is down, attempting to reboot.
    • 21:45 EAM: reboot succeeded, leaving in repair