PS1 IPP Czar Logs for the week 2011-12-12 - 2011-12-18


Monday : 2011-12-12

  • No observation last night
  • 11:37 Restarted pstamp
  • 11:55 ippc18 unexpectedly closed all of its connections. I'm investigating.
  • 13:45 I quote Gavin: "In the network switch logs, I see that the uplink that provides network connectivity to the cluster switch went down and immediately came back up. I don't have any info why it went down. I'm wondering if this has anything to do with Hawaiian Telcom network outage yesterday in Kihei."
  • 13:50 14 diffs are failing repeatedly because of files on ipp064, so I turned their reverts off (diff.revert.off). We'll discuss them at the IPP meeting.
  • 13:55 Same for the 118 destreak failures.
  • 14:28 I quote Gavin again (see 13:45 entry): "It appears that our switch (ippcore) is reporting that it was a neighboring UH ITS switch that was flapping."

Tuesday : 2011-12-13

  • No observation last night
  • 09:35: neb-host ippb02 down (Gavin's request: ethernet bonding deployment)
  • 09:49: We (HF & SC) turned off dist.revert for now: a lot of dist failures (related to ippb02?) keep getting reverted.
  • 09:56: neb-host ippb02 up
  • 10:40: Bill shut distribution down while debugging/fixing the nomagic distribution.

Wednesday : 2011-12-14

  • No observation last night
  • Distribution and replication were restarted.
  • 09:06 Bill: removed all stages from distribution except for stack and raw, so that the M31 raw data goes through more quickly.
  • 15:38 Bill: restarted distribution with the nominal setup

Thursday : 2011-12-15

  • 09:30: CZW: nebulous seemed to be having problems. Restarted apache on the nebulous apache servers. The problems may have been caused by issues with ipp013/nfs, which has since been rebooted.

Friday : 2011-12-16

Saturday : 2011-12-17

Sunday : 2011-12-18