PS1 IPP Czar Logs for the week 2013.02.18 - 2013.02.24

Monday : 2013.02.18

  • 08:00 MEH: regular restart of stdscience to keep LAP rate up
  • 22:00 MEH: running low on chip/warp/diff (pile-up of stacks in GP) so adding a few more runs 43->50

Tuesday : 2013.02.19

mark is czar

  • 08:00 MEH: again restarting stdscience to keep the LAP rate up
  • 08:40 MEH: ipp008-ipp010 taken out of processing in ppconfig/pantasks_hosts.input for Gene to use with ipptopsps runs
  • 08:50 MEH: chip.revert.off while fixing LAP chips
  • 09:30 MEH: giving stack an extra allocation of compute3 to try to push the pile of 3000 through more quickly -- even without nightly science to process, stacks never caught up over the weekend
  • 10:50 MEH: out of chips/warps.. adding more LAP runs -- should've used the ~ipp/lap/trickle_add.pl script to slowly add them 1x1, but was manually doing much the same so should be okay. Chris has noted in the past that some exposures get dupe processed.
  • 10:57 Bill: ran 'queuessky.lap --ra_min 60 --ra_max 90 --go' which queued about 5800 five filter staticsky runs
    • MEH: set poll 30->10 until some of the large backlog of stacks move through since staticsky running in stack now
  • 16:30 MEH: ippc03 has had warnings about disk space; there is/was an old 2011 nebulous_server.old.log hanging onto 24G of disk
  • 16:40 turning skycal back on, including in the stdscience/input file
  • 17:50 Chris setup next RA hour slice for LAP
  • 18:30 MEH: ippdb01 went into heavy WAIT_CPU state and all processing piled-up. stopping processing until worked out
  • 19:40 MEH: things settling down, slowly restarting
  • 23:25 MEH: LAP stacks mostly caught up so reallocating all back to default
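
A minimal sketch of how an oversized stale log like the one noted for ippc03 at 16:30 could be spotted. This is not part of the IPP tools: the function name, thresholds, and the example path are hypothetical.

```python
import os
import stat
import time

def find_large_old_files(root, min_bytes, min_age_days, now=None):
    """Return (size, path) pairs for regular files under root that are
    at least min_bytes in size and were last modified at least
    min_age_days ago, largest first. All names here are illustrative."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size >= min_bytes and st.st_mtime <= cutoff:
                hits.append((st.st_size, path))
    hits.sort(reverse=True)
    return hits

# Example (hypothetical path and thresholds): files over 1 GiB that
# have not been modified in a year.
# find_large_old_files("/var/log", 1 << 30, 365)
```

This only sees files still present in the directory tree; space held by a deleted file that a process still has open would need a different check.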

Wednesday : 2013.02.20

heather is czar. woke up this morning and stdscience was down. I'm restarting it.

  • 13:30 Bill: regenerated 2 missing burntool tables and recovered o5520g0235o.ota26.fits

Thursday : 2013.02.21

Begin updates by Bill

  • 11:32 set stack pantasks to stop in preparation for a rebuild of ippTools and a database change.
  • 12:00 all ~ipp pantasks shut down. rebuilding ipp-20121218
  • 12:12 ippc17 has become unresponsive. Ganglia says that it happened about 900 seconds ago. Nothing on the console. Power cycling.
  • 12:24 restarted all pantasks servers except cleanup (which is off to reduce load on nebulous) and deepstack (which is not being used)
  • 12:30 chip.revert is off in stdscience so we can find files that need to be repaired.
  • 12:35 recovered o5499g0668o.ota26.fits and o5498g0681o.ota26.fits; regenerated burntool tables for o5255g0082o.ota02.fits and o5255g0079o.ota11.fits
  • 12:37 ippc17 can't see /data/ippb02.1. Reported this to Gavin.

End entries by Bill

Friday : 2013.02.22

  • 17:00 : ipp012 has been running on kernel 3.7.6 since Feb 8 and has been on in nebulous since Feb 14. I have rebooted ipp011 and ipp013 to use the new kernel as well (but have not turned them on in nebulous yet). On these machines the new kernel is NOT the default boot option -- whoever reboots them needs to press the up-arrow key to select it. We will switch them to on in nebulous on Monday.

Saturday : 2013.02.23

Sunday : 2013.02.24

  • ipp001 replication stopped and backup crashed because of full disk. Moved last backups made in 2013-01 to Archives/2013 and deleted the other ones.
