Monday : 2012.12.31

  • 11:40 MEH: no MD SSdiffs made; oddly, MD nightly stacks weren't made last night. ns.stacks.run shows many errors, restarting stdscience. Still getting a failure for: nightly_science.pl --queue_stacks --date 2012-12-31 --dbname gpc1
    • ESS.nightlyscience label is having trouble and blocking? Removing the label did not help, so it must be overridden/hardcoded somewhere.
    • stacktool command from nightly_science.pl is using -skycell and not -select_skycell. Looks like the skycell selection option hadn't been used before?
    • fixing nightly_science.pl to use -select_skycell_id..
    • many timeouts, but stacks finally queued @13:55. Will take ~1-2 hrs before MD SSdiffims are ready, with 300 ESS stacks to make first. tweak_ssdiff for 1500-1700 before nightly science.

Tuesday : 2013.01.01

  • 11:35 MEH: again no MD SSdiffims; looks like nightly_science.pl is still getting hung up on nightly stacks (and the time-critical SSdiffims), now with a timeout
    • raised timeout for ns.stack.run to 480 and queued stacks -- finished @14:00
    • tweak_ssdiff for 1400-1600 before nightly science again

Wednesday : 2013.01.02

  • 16:00 MEH: queuing up new MD07 refstack exposures to keep the nodes busy

Thursday : 2013.01.03

  • 08:35 MEH: chip.off to throw all nodes at remaining nightly warps.
  • 09:30 MEH: chip.revert.off to fix MD07 problems
  • 09:50 MEH: need to restart mysql on ippb machines to search for missing/0-size files. ipp037 was down and needed a zap to start.
  • 18:20 MEH: MD07 exposures finished, setting up wave4 from stack pantasks to run refstacks/tests again

Friday : 2013.01.04

  • Bill fixed gpc1 mysql replication
  • 10:47 Bill restarted update and pstamp pantasks. Doubled number of hosts working on update
  • 14:36 Bill restarted update with standard setup.

Saturday : 2013.01.05

Sunday : 2013.01.06

  • 21:50 MEH: looks like ipp027 has been down for 3.5 hours. It is in neb-host repair; setting to neb-host down.
    • many/all ESS/OSS 3PI nightly processing jobs are faulting; looks like detrends on ipp027 are needed.
    • restarted stdscience after neb-host down to more easily watch for problems -- 150 ESS+OSS moving through now @22:20