PS1 IPP Czar Logs for the week 2013.04.22 - 2013.04.28

Monday : 2013.04.22

  • 10:40 Bill: ippc21 is having autofs issues; turned it off in stdscience. Gavin will reboot it.
  • 10:47 Bill: stdscience is sluggish; set to stop for restart.
  • 10:53 Bill: stdscience pantasks restarted. Camera set to off so that the camera runs still finishing up in the previous pantasks can complete.
  • 11:08 camera.on
  • 21:55 MEH: sending distribution MD08.GR0 stacks (a year old) and MD05.GR0 stacks (6 months old) to cleanup to recover some disk space

Tuesday : 2013.04.23

  • 06:05 Bill: distribution has a lot of data outstanding yet seems to be backed up. Restarted the pantasks. Increased poll limit from 200 to 400.
  • 17:45 CZW: home directories are temporarily hard to get to because I accidentally DDoSed them with some stacking tests that still have a testing flag enabled. I'm killing jobs, fixing that testing flag issue, and trying to restore service.
  • 17:58 CZW: ippc18 is back to normal.

Wednesday : 2013.04.24

  • 11:25 Bill: added 2 sets of compute2 and compute3 to pstamp pantasks since the cluster isn't too busy
  • 13:25 Bill: removed compute3 hosts from pstamp pantasks and added them to update

Thursday : 2013.04.25

  • 09:00 MEH: it appears a large number of MD07.GR0 night stacks failed to be made. Will need to use the system heavily for updates and stacking when nightly processing is not running...
  • 11:54 CZW: running MD09 deep stack test to check how much my ppStack changes impact that reduction. These are running on stare00-03. I expect it will take a day or so to finish, but if these need to be stopped before then, let me know.
  • 16:55 MEH: stdscience really could use its regular restart, may help speed up the MD07 updates as well

Friday : 2013.04.26

  • 09:05 lazy czar (Heather) kicks registration using regpeek.pl; some exposures not yet registered. Repeated a few more times.
  • 11:11 Serge: Reprocessing PI night (20110130). Label PI_20110130.reprocessing.20130426
  • 11:30 MEH: stdscience needs its regular restart, in progress

Saturday : 2013.04.27

  • 01:10 MEH: ipp032 unhappy, blocking nightly data for registration for almost an hour. Can't log in, but it hasn't crashed. Is it the rsyncs, too many for nightly processing and relastro? Load ~20 but wait >80%.
    • ipp031 and ipp032 were thought to be out of processing for the rsyncs, but it looks like summitcopy and diffims are there. Stopping them from processing; seems to be slowly coming back.

Sunday : 2013.04.28