PS1 IPP Czar Logs for the week 2014.07.21 - 2014.07.27

Monday : 2014.07.21

  • PS1 still down. skycal and MD running.

Tuesday : 2014.07.22

  • 11:20 EAM : I re-labeled the LAP.ThreePi.20130717 staticsky runs in dense fields back to the standard name (from variants with .dense, .100k, etc. appended) and queued the unqueued skycal skycells from these regions. I've also stopped and restarted the skycal pantasks (running as ipptest@ippc04) since pcontrol was getting sluggish.
  • 18:00 MEH: MOPS needs another sample reprocessing diffim set -- using the c2/compute3 group of nodes as ippmd in a local pantasks. Should run fine alongside 3PI skycal.

Wednesday : 2014.07.23

Mark is czar

  • PS1 still down, no data last night
  • 09:35 MEH: pstamp has been running for a while, doing regular restart
  • 16:40 MEH: looks like ipp071 has been down for 3ks -- new machine, not in service yet

Thursday : 2014.07.24

Mark is czar

  • no data last night
  • 10:00 MEH: possibly data tonight, doing regular restart of pantasks in preparation
  • 16:50 MEH: large set of biases downloading
  • 20:00 MEH: yay data
  • 23:00 MEH: downloads seem to be slipping (43 incomplete) -- are ipp033-036 still transferring data?

Friday : 2014.07.25

  • morning Bill: started up distribution of the LAP skycal runs. Needed to change labels for all but 100,000 to LAP.ThreePi.20130717.wait because having 900k+ runs to do made the query take a very long time.

Saturday : 2014.07.26

  • 06:01 Bill: first batch of skycal runs has finished distribution. Changed labels for another 200,000. The pending query takes up to 3 minutes with this setting.
    • 06:05 The distribution pantasks is a little tired. pcontrol is spinning and runs are getting queued slowly. I'm going to restart and change the list of stages being distributed to only include skycal for now.
  • 12:00 MEH: MOPS was missing some diffims, warp memory fault 4 cleared after a few manual reverts: o6864g0148o, 983502, skycell.1024.093

Sunday : 2014.07.27

  • 11:30 MEH: c2/compute3 group not fully utilized, running some stacks there as well
  • 21:00 MEH: ippc38 crashed, power cycled
    /dev/md2 has gone 186 days without being checked, check forced.
    ippc38.0 has gone 186 days without being checked, check forced.
    --> not sure it should be doing check on the local raid.. taking a while..
  • 21:20 MEH: still waiting for disk check to finish.. noticed 3PI warp w/ mem error (fault 4) needs manual reverting -- o6866g0081o, 984317, skycell.2521.004
  • 22:30 ippc38 now checking ippc38.1...
  • 23:30 ippc38 back up