Monday : 2012-09-10

  • 09:45 (Serge): LAP
    • Recovered:
      • gpc1/20100814/o5422g0050o/o5422g0050o.ota33.fits
      • gpc1/20100814/o5422g0038o/o5422g0038o.ota33.fits
      • gpc1/20100814/o5422g0047o/o5422g0047o.ota33.fits
      • gpc1/20100823/o5431g0038o/o5431g0038o.ota31.fits
      • gpc1/20100616/o5363g0181o/o5363g0181o.ota26.fits
      • gpc1/20100607/o5354g0207o/o5354g0207o.ota65.fits
      • gpc1/20100619/o5366g0300o/o5366g0300o.ota26.fits
      • gpc1/20100607/o5354g0119o/o5354g0119o.ota65.fits
      • gpc1/20100619/o5366g0320o/o5366g0320o.ota26.fits
    • Fixed:
      • gpc1/20110428/o5679g0510o/o5679g0510o.ota64.burn.tbl
      • gpc1/20100607/o5354g0208o/o5354g0208o.ota13.burn.tbl
  • 12:20 CZW: daily stdscience restart
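
The files listed in the 09:45 entry appear to share one key layout, and several of them fall on the same OTA. A minimal sketch for tallying such a list by OTA is given here; the layout `<camera>/<YYYYMMDD>/<exposure>/<exposure>.ota<XY>.<suffix>` is inferred from the entries themselves, and `parse_key` and `count_by_ota` are hypothetical helper names, not IPP tools.

```python
import re
from collections import Counter

# Keys copied from the 09:45 entry (recovered and fixed files).
PATHS = [
    "gpc1/20100814/o5422g0050o/o5422g0050o.ota33.fits",
    "gpc1/20100814/o5422g0038o/o5422g0038o.ota33.fits",
    "gpc1/20100814/o5422g0047o/o5422g0047o.ota33.fits",
    "gpc1/20100823/o5431g0038o/o5431g0038o.ota31.fits",
    "gpc1/20100616/o5363g0181o/o5363g0181o.ota26.fits",
    "gpc1/20100607/o5354g0207o/o5354g0207o.ota65.fits",
    "gpc1/20100619/o5366g0300o/o5366g0300o.ota26.fits",
    "gpc1/20100607/o5354g0119o/o5354g0119o.ota65.fits",
    "gpc1/20100619/o5366g0320o/o5366g0320o.ota26.fits",
    "gpc1/20110428/o5679g0510o/o5679g0510o.ota64.burn.tbl",
    "gpc1/20100607/o5354g0208o/o5354g0208o.ota13.burn.tbl",
]

# Assumed layout, inferred from the list above:
#   <camera>/<YYYYMMDD>/<exposure>/<exposure>.ota<XY>.<suffix>
KEY_RE = re.compile(
    r"^(?P<camera>[^/]+)/(?P<date>\d{8})/(?P<exp>[^/]+)/"
    r"(?P=exp)\.ota(?P<ota>\d{2})\.(?P<suffix>.+)$"
)


def parse_key(path):
    """Split one key into camera, date, exposure, OTA and suffix."""
    match = KEY_RE.match(path)
    if match is None:
        raise ValueError(f"unexpected key layout: {path}")
    return match.groupdict()


def count_by_ota(paths):
    """Count how many keys fall on each OTA."""
    return Counter(parse_key(p)["ota"] for p in paths)


if __name__ == "__main__":
    for ota, n in count_by_ota(PATHS).most_common():
        print(f"ota{ota}: {n}")
```

A tally like this only shows where the affected files cluster; it says nothing about why those files were lost.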

Tuesday : 2012-09-11

  • 09:18 Bill is restarting pstamp and update pantasks. Their pcontrols are spinning. Also doubled the number of hosts working on each of those tasks since there is work to do.
  • 09:24 Bill reverted stacks with fault == 2 to recover from last night's server overload
  • 10:15 (Serge): Stopped replication on ippc63. Started rsync to ippc61.
  • 11:15 CZW: daily stdscience restart
  • 13:53 CZW: Added test diffs for MOPS. Raised the priority of the ecliptic.rp label to attempt to force the remaining ~800 stacks to finish.
  • 15:50 MEH: rebuilding ippMonitor with cam simple plot FWHM_major query option.
  • 16:00 (Serge): All test diffs for MOPS published to IPP-MOPS-TEST-2 (label: WSS.test)
  • 17:30 MEH: added parts to ippMonitor to make simple sky(cell) plot for stacks (built on Manoa ippMonitor and checked in but doing more testing before rebuilding production ippMonitor)

Wednesday : 2012-09-12

  • 06:54 Bill reverted chips; many had failed to find detrend files. Why has NFS been cranky (less reliable) this week?
  • 07:08 We have a number of jobs (stacks and chips) stuck on ipp066. Stopping stdscience and stack to sort this out. Will restart stdscience as well, since throughput has dropped.
  • 07:15 restarted stdscience pantasks. Killed 2 stack jobs running for > 100,000 seconds on ipp066. Didn't find any current mount problems. Will need to watch that node.
  • 07:29 rebooting ipp066. Any jobs we try to run on it hang.
  • 08:17 All queued stacks completed. Reverted fault 4s and set stack to run.
  • 08:54 Fixed about 20 lost burntool tables and missing raw images. 2 were actually lost; unfortunately my shell history is too short to show which ones.
  • 11:10 (Serge): rsync of nebulous mysql from ippc63 to ippc61 complete (in 18 hours).
  • 11:30 MEH: rebuilding ippMonitor on production cluster to add stack/staticsky/skycal simple plots

Thursday : 2012-09-13

  • 14:24 (Serge): At Gene's request, processing restarted; check_system.sh run.
  • 22:00 MEH: starting deepstack pantasks to run replacement MD09 staticsky/skycal

Friday : 2012-09-14

  • 13:00 (Bill) restarted pstamp and update pantasks.
  • 16:00 (Bill) ipp020 has been having nfs problems. It got stuck trying to talk to ipp049. Couldn't get it reset. Rebooted the machine.
  • 16:20 pantasks set back to run. Stopped cleanup pantasks. It should not be running right now.

Saturday : YYYY.MM.DD

Sunday : YYYY.MM.DD