PS1 IPP Czar Logs for the week 2015.09.07 - 2015.09.13

Monday : 2015.09.07

  • 07:45 MEH: phantom reboot of ipp008,12,13,14,16,18 again last night ~0210 HST -- they seem to have come back up okay
  • 10:05 MEH: restarting nightly pantasks
  • 10:20 MEH: oddly seeing >400 chips w/ label goto_cleaned for data_group x.20150728 -- don't see in my cut/paste script notes so not sure who from.. -- then seeing few MD.PV3 here and there.. looks like a pstamp bug?
  • 20:25 MEH: looks like obs stalled from weather, clearing stalled WS diffs -- 37 skycells lost from 18 exposures
  • 21:00 MEH: MOPS finished w/ x.20150905, all-clear for data to be cleaned up

Tuesday : 2015.09.08

  • 01:04 MEH: clearing bad warp -- cannot build growth curve (psf model is invalid everywhere)
    warptool -dbname gpc1 -updateskyfile -set_quality 42 -fault 0 -skycell_id skycell.1129.012 -warp_id 1619498  
  • 01:11 MEH: cleared WS diffs again -- 601 skycells lost from 78 exp
  • 06:00 MEH: ipp017 crashed ~2hrs ago, nothing on console and not unusual load -- power cycling
    • keyboard error -- had to skip w/ F1
    • many ypbind failures -- finally connected and jobs clearing from stdsci as well; @0615 back to normal processing w/ ipp017 out and back to repair
  • 07:55 MEH: clearing ws diffs -- 664 skycells lost for 208 exp
  • 16:40 MEH: Haydn rebooted ipp017 to try and stop the PCI warning -- seems to be fine again, neb-host repair again but still manually out of processing
    2015-9-9  6:30:58 Controller#1(PCI) SATA PHY +2.5V   Over Voltage
  • 22:00 MEH: clearing bad warp -- cannot build growth curve (psf model is invalid everywhere)
    warptool -dbname gpc1 -updateskyfile -set_quality 42 -fault 0 -skycell_id skycell.1221.028 -warp_id 1619871

Wednesday : 2015.09.09

  • 14:00 CZW: restarting pantasks servers.

  • 12:52am (HAF / Saturday night?): problem with registration. After stopping addstars because of various timeouts, and restarting summit/registration because sometimes that helps, discovered regtool peek didn't show anything wrong. It was trying to burntool o7277g0262o but failing. Eventually discovered that 0261 was not fully downloaded (was missing 2 files); had errors like these in the summitcopy logs:
Unable to perform /home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/pztool -copydone -row_lock -summit_id 961792 -exp_name o7277g0261o -inst gpc1 -telescope ps1 -class chip -class_id ota44 -uri neb://ipp092.0/gpc1/20150912/o7277g0261o/o7277g0261o.ota44.fits -hostname ipp092 -dbname gpc1 -md5sum 4b9c44a681cddf4a96c415f67d54c69b -bytes 23230080: 3
config error for: --uri --filename neb://ipp092.0/gpc1/20150912/o7277g0261o/o7277g0261o.ota44.fits --summit_id 961792 --exp_name o7277g0261o --inst gpc1 --telescope ps1 --class chip --class_id ota44 --bytes 49432320 --md5 fd7d8426c150b1280b8821326506703a --dbname gpc1 --timeout 600 --verbose --copies 2 --compress --nebulous
Running [/home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/dsget --uri --filename neb://ipp087.0/gpc1/20150912/o7277g0261o/o7277g0261o.ota51.fits --compress --bytes 49432320 --nebulous --md5 0abbb949b3810f5af70bb312cba7df5c --timeout 600 --copies 2]...
downloading file to /tmp/o7277g0261o.ota51.fits.1ckkJZ89.tmp

After I had discovered this, registration/summitcopy somehow magically fixed itself. I'm hoping stopping addstars and restarting summitcopy/registration kicked whatever had been stuck, but it makes no sense to me...
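
For next time an exposure looks partially downloaded: a minimal sketch of a standalone check that compares a local copy of a file against an expected byte count and md5. This is not part of IPP or summitcopy; `file_md5` and `check_copy` are hypothetical helper names, and the path and expected values are whatever you pull by hand from the logs. Note that the pztool and dsget lines above carry different byte counts and md5s for the same file, so compare like with like.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks to keep memory use small."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_copy(path, expected_bytes, expected_md5):
    """Return a list of problems found; an empty list means the copy matches.

    Hypothetical helper: expected_bytes / expected_md5 are supplied by the
    caller (e.g. copied from a log line), nothing is looked up in a database.
    """
    if not os.path.exists(path):
        return ["missing file"]
    problems = []
    size = os.path.getsize(path)
    if size != expected_bytes:
        problems.append("size mismatch: got %d, expected %d" % (size, expected_bytes))
    md5 = file_md5(path)
    if md5 != expected_md5.lower():
        problems.append("md5 mismatch: got %s, expected %s" % (md5, expected_md5))
    return problems
```

Usage would be something like `check_copy("/path/to/local/copy.fits", 23230080, "4b9c44a681cddf4a96c415f67d54c69b")` on a file path you can actually read; a non-empty result lists what differs.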

