PS1 IPP Czar Logs for the week YYYY.MM.DD - YYYY.MM.DD


Monday : 2016.02.22

  • 11:28 MEH: Haydn may be able to replace ipp087:BBU in the afternoon -- will wait on the regular pantasks restart until then if raid WT enabled (however, suspect it will need a day to cycle setup). If WT, then can make targeted again -- if not targeted, can still have possible problems if most 80T data nodes go full red, and may need to go back to repair
    • ipp087 new BBU also degraded and ipp103 not triggering on manual relearn -- Haydn will need to work on it later. Neither should be set up for targeting, and they may need to be put into repair if too many of the data targeted nodes shut off for data -- reminder, manually removed ipp087 from data target in
      ~ipp/psconfig/ipp-20141024.lin64/share/pantasks/modules/ipphosts.mhpcc.config
    • also seemed to help to keep ipp087 out of processing as well
  • 17:36 MEH: unless I hear differently, will put ipp072,076 and ipp056,061,062,063,066 to up as well.
    • looks like all ipp054,055,056,057,058,059,060,061,062,063,065,066 set have failed BBU and should not be targeted -- may also have issues if too many data targeted nodes are full
    • a future czar may want to try singular target on all 40TB nodes to limit random overload while the 80TB disks are full along with 6 of the new 86TB node volumes
  • 23:52 MEH: telescope problems for late start tonight so only putting repair->up ipp072,076 since they have working BBU

Tuesday : 2016.02.23

  • 11:00 CZW: Setting up replication as ipptest to see if that version of pantasks supports large numbers of fast jobs better.
  • 16:20 CZW: I've installed the new nightly_science.pro module, and am restarting ipp pantasks for that change to take effect. This should fix stacks not automatically generating.
  • 18:08 CZW: I just had to do reset.advance in the summitcopy pantasks. This was stuck on o7442g0010d.
  • 18:43 MEH: see the note above about manually removing ipp087 from data target... restarting pantasks to use local changes (hopefully temporary changes)

Wednesday : YYYY.MM.DD

  • 17:22 CZW: Restarting ipp pantasks.

Thursday : YYYY.MM.DD

  • 15:00 HAF: stopped pantasks, set ipp087 to down so HH can replace BBU in ipp087.
  • 15:56 HAF: restarting pantasks, set ipp087 to up

Friday : YYYY.MM.DD

  • 17:00 HAF: restarting pantasks for tonight

Saturday : 2016.02.27

  • 17:00 EAM: restarted pantaskses. removed CFA.nightlyscience label from stdscience for tonight. put ( ipp008 ipp012 ipp013 ipp014 ipp016 ipp018 ipp037 ) into repair.
  • 22:20 HAF: registration hung, I restarted pantasks, not sure why it fixed it, but it is moving along now.
  • 22:30 HAF: no, it was summitcopy that was the problem. Some hung chips. Restarted summitcopy. It's slowly inching forward.

Sunday : YYYY.MM.DD