PS1 IPP Czar Logs for the week 2015.04.20 - 2015.04.26

Monday : 2015.04.20

  • 12:06 Bill: restarted pstamp pantasks. It was pretty sluggish.
  • 12:33 CZW: restarted pv3diff, pv3diffleft, and now doing pv3skycal.
  • 13:34 MEH: again, more OSS diffims to clear manually: cannot build growth curve (psf model is invalid everywhere) -- looks like the same skycell each time
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1055084 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1055125 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1055139 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1055154 -fault 0
  • 16:31 CZW: restarting ipp user pantasks.
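
    Note on the 13:34 entry: the four difftool calls differ only in -diff_id, so they can be generated from a list. A minimal dry-run sketch, assuming a POSIX shell; build_cmds is a hypothetical helper name, and the flags are copied from the commands above. It only prints the commands for review, it does not run difftool.

```shell
#!/bin/sh
# Dry-run sketch: print one difftool command per diff_id for a single skycell.
# build_cmds is a hypothetical helper, not part of the IPP tools.
skycell="skycell.0687.072"

build_cmds() {
    for diff_id in "$@"; do
        echo "difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id $skycell -diff_id $diff_id -fault 0"
    done
}

# The four diff_ids cleared on 2015.04.20
build_cmds 1055084 1055125 1055139 1055154
```

    After checking the printed lines, the output can be piped to sh to execute them.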

Tuesday : 2015.04.21

  • 15:15 CZW: I'm going to stop and restart the pv3diff/left/skycal pantasks.

Wednesday : 2015.04.22

  • 14:39 CZW: Restarting pv3diff/left and pv3skycal, which apparently crashed at midnight last night.

Thursday : 2015.04.23

  • 08:10 EAM: stopping and restarting stdscience, pv3diff, pv3diffleft. adding the compute machines to stdscience and setting pv3diff to stop using compute machines if stdscience is too far behind.
  • 21:34 HAF: registration jammed - nothing obvious - I am restarting registration
  • 21:50 HAF: I have suspicions that summitcopy is jammed too, restarting it
  • 21:50 seeing errors like this (never seen before??)
    Running [/home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/dsget --uri --filename neb://ipp085.0/gpc1/20150424/o7136g0111o/o7136g0111o.ota34.fits --compress --bytes 49432320 --nebulous --md5 a12834282ce89b53f350665f8beec3da --timeout 600 --copies 2]...
    downloading file to /tmp/o7136g0111o.ota34.fits.653F0K9j.tmp
    Running [/home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/neb-locate --path --all neb://ipp085.0/gpc1/20150424/o7136g0111o/o7136g0111o.ota34.fits]...
    Running [/home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/pztool -copydone -row_lock -summit_id 901577 -exp_name o7136g0111o -inst gpc1 -telescope ps1 -class chip -class_id ota34 -uri neb://ipp085.0/gpc1/20150424/o7136g0111o/o7136g0111o.ota34.fits -hostname ipp046 -dbname gpc1 -md5sum c629fc93e29dadddc051715db378eef2 -bytes 22109760]...
    *** stderr ***
     -> psDBAlloc (psDB.c:166): Database error originated in the client library
         Failed to connect to database.  Error: Lost connection to MySQL server at 'reading authorization packet', system error: 0
     -> pztoolConfig (pztoolConfig.c:212): unknown psLib error
         Can't configure database
     -> main (pztool.c:71): (null)
         failed to configure
    Unable to perform /home/panstarrs/ipp/psconfig/ipp-20141024.lin64/bin/pztool -copydone -row_lock -summit_id 901577 -exp_name o7136g0111o -inst gpc1 -telescope ps1 -class chip -class_id ota34 -uri neb://ipp085.0/gpc1/20150424/o7136g0111o/o7136g0111o.ota34.fits -hostname ipp046 -dbname gpc1 -md5sum c629fc93e29dadddc051715db378eef2 -bytes 22109760: 3
  • 21:54 HAF: ok, I see it now. stuck exposures in summitcopy, all from around an hour or 2 ago (I can get exact time later). What the hell happened? there were a handful of chips not copied, not faulted from o g o 95 /96 /97, as well as some o h o problems as well. Do we do burntool etc on gpc2? do we have similar tools for checking that?
  • 22:01 HAF: there was a jammed up process on stdsci as well - these seem to be related to database stuffs... I am going to take a look at mysql processlist. Ok, it doesn't look like there are other stupidly long ones - I think I am leaving it alone. I think stdsci / summit/ reg is ok now?
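
    Note on the 22:01 entry: a minimal sketch of how the processlist check could be narrowed to long-running statements. It assumes the tab-separated output of mysql -B -e 'SHOW FULL PROCESSLIST' (columns Id, User, Host, db, Command, Time, State, Info); long_queries is a hypothetical helper name and the 600-second threshold is an arbitrary example, not a value used by the czars.

```shell
#!/bin/sh
# Sketch: keep processlist rows that are not idle (Command != Sleep) and
# have been running longer than a threshold in seconds.
# long_queries is a hypothetical helper; it reads tab-separated
# SHOW FULL PROCESSLIST output on stdin and skips the header row.
long_queries() {
    awk -F'\t' -v t="$1" 'NR > 1 && $5 != "Sleep" && $6 + 0 > t'
}

# Example use (connection options omitted; they depend on the site setup):
#   mysql -B -e 'SHOW FULL PROCESSLIST' | long_queries 600
```

    Filtering out Sleep rows keeps idle client connections from drowning out the statements that are actually holding things up.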

Friday : 2015.04.24

  • 06:10 EAM: restarting pv3skycal -- pantasks apparently crashed at 05:22 (or someone shut it down)
  • 06:20 EAM: also restarted pv3diff and pv3diffleft
  • 17:15 EAM: We only have a single (r-band) flat for gpc2. I had been lying to the system in camera.config to pretend that all filters (eg, g.00002) were in fact r-band. this allowed all images to be processed, albeit with a sub-optimal flat. However, we have known for a while that psastro uses this wrong value (r) to select the reference stars. It also seems to mess up MOPS somehow. To get around this problem, I have registered the chips for the single flat as each of g,i,z,y,w flats as well. I have therefore stopped lying to IPP in gpc2/camera.config, so hopefully this will allow MOPS to get better photometry starting tonight.

Saturday : 2015.04.25

  • 08:20 EAM: I queued the ps2 diffs, but ipp054 got overworked (check_nightly does not check ps2 data). I have had to stop things and clear out jobs while I fix these issues. I've put ipp054 into repair for the moment.
  • 08:40 EAM: ipp054 is happy again. however, the check_nightly_queue scripts need to be fixed. for today, I'm disabling the test in pv3diff and pv3diffleft. I will remove the compute and storage nodes manually tonight.
  • 16:45 MEH: doing restart of stdsci to test if something broke the warpstack.trange macro

