PS1 IPP Czar Logs for the week 2015.04.13 - 2015.04.19

Monday : 2015.04.13

  • 05:20 Bill: Richard reports that the results from two exposures have not been received by MOPS. The problem was that one of the warps had fault 3, which doesn't get reverted; the underlying cause was a database connection problem. We should probably change the fault code for this case from 3 (config error) to 2 (data error) so that it gets reverted.
  • 13:15 CZW: Restarting stdlanl pantasks and removing the auto-queue queue file entry from the input file to cull the number of active lapRuns. The cleanup was timing out due to the large number of input exposures, and that launched 37 runs that should not be active yet.
  • 22:00 EAM: restarted pv3diff & pv3diffleft
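The fault-code change suggested in the 05:20 entry can be sketched as a small revert policy. This is a minimal illustration, not IPP's actual revert logic: only the codes 2 (data error) and 3 (config error) come from the log, while `REVERTABLE_FAULTS`, `classify_fault`, `should_revert`, and the error-kind strings are hypothetical.

```python
# Minimal sketch of a revert policy keyed on fault code.
# Codes 2 and 3 follow the log entry; all names here are hypothetical.
DATA_ERROR = 2    # transient/data problems: worth retrying
CONFIG_ERROR = 3  # configuration problems: retrying will not help

# Hypothetical set of fault codes that the automatic revert retries.
REVERTABLE_FAULTS = {DATA_ERROR}


def classify_fault(error_kind: str) -> int:
    """Map a (hypothetical) error kind to a fault code.

    A dropped database connection is transient, so it is reported as a
    data error rather than a config error, which lets it be reverted.
    """
    if error_kind == "db_connection":
        return DATA_ERROR
    return CONFIG_ERROR


def should_revert(fault: int) -> bool:
    """Return True if an entry with this fault code should be reverted."""
    return fault in REVERTABLE_FAULTS


if __name__ == "__main__":
    for kind in ("db_connection", "missing_recipe"):
        fault = classify_fault(kind)
        print(kind, fault, should_revert(fault))
```

The design point is that the revert decision depends only on the fault code, so fixing the classification at the source (database errors reported as 2) is enough to get such warps retried.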

Tuesday : 2015.04.14

  • 06:15 EAM: stsci10 is down, rebooting it now.
  • 10:20 EAM: started pv3skycal under ~ippsky/pv3skycal (1 x storage nodes for now).
  • 10:30 EAM: restarting pv3diffleft.

Wednesday : 2015.04.15

  • 03:58 Bill: MOPS reports two exposures stalled. Each had a skycell with fault 4. One succeeded after reverting. The other was cleared with warptool -updateskyfile -warp_id 1531927 -skycell_id skycell.0938.019 -fault 0 -set_quality 42
  • 07:38 Bill: restarted pstamp pantasks
  • 21:40 MEH: stdsci needs its regular restart
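The two outcomes in the 03:58 entry (revert once; if the skycell still faults, clear it with fault 0 and quality 42) can be written as a small helper that builds the `warptool` command line. A sketch only: the flags are copied from the command in that entry, while `next_action`, the `revert_count` bookkeeping, and the one-retry limit are assumptions.

```python
from typing import List, Optional


def clear_command(warp_id: int, skycell_id: str) -> List[str]:
    """Build the warptool call used in the log to clear a stuck skycell."""
    return [
        "warptool", "-updateskyfile",
        "-warp_id", str(warp_id),
        "-skycell_id", skycell_id,
        "-fault", "0",
        "-set_quality", "42",
    ]


def next_action(fault: int, revert_count: int, max_reverts: int = 1) -> Optional[str]:
    """Decide what to do with a faulted skycell (hypothetical policy).

    Returns "revert" while retries remain, "clear" once they are used up,
    and None if the skycell is not faulted.
    """
    if fault == 0:
        return None
    if revert_count < max_reverts:
        return "revert"
    return "clear"


if __name__ == "__main__":
    print(next_action(fault=4, revert_count=0))  # revert
    print(next_action(fault=4, revert_count=1))  # clear
    print(" ".join(clear_command(1531927, "skycell.0938.019")))
```

Returning the argv as a list rather than a single string keeps the IDs as separate arguments if the command is later passed to a process launcher.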

Thursday : 2015.04.16

  • 03:45 MEH: cleared OSS diffim fault:
    Error in subtraction:
     -> VectorFitPolynomial1DOrd (psMinimizePolyFit.c:633): unknown psLib error
         Could not solve linear equations.

    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0764.041 -diff_id 1013770 -fault 0
  • 09:55 MEH: doing the regular restart of all pantasks; this may also clear up the problem summitcopy is having
  • 10:25 EAM: queued 06h for skycal

Friday : 2015.04.17

  • 08:00 EAM: restarted pv3diffleft
  • 08:20 EAM: queued new jobs for sas diff (SAS.20141118.d8) and full-force (SAS.20141118.ff8). I added the diff label to pv3diff (NOT in the input file) and the fforce label to pv3skycal
  • 09:55 EAM: after 1.5h, none of the SAS diffs have been queued, which I do not understand. I have removed the lap label from pv3diff to debug.
    • the infinite depth queue for pv3diff meant the difftool -toskyfile query never included any SAS entries in the limited list. I'm going to leave pv3diff with just the SAS label until it is done with SAS (should be quite fast).
  • 12:30 EAM: restarted pv3diffleft again -- already in oscillation
  • 20:30 MEH: stdsci already has ~75k chips, half of them faults. The faults look like they are from GPC2. May require a restart of stdsci soon.
  • 21:00 EAM: The source of the errors was incorrect filter information in the headers (CLEAR.00002 instead of r or i); as a result, the flat-field lookups failed. I've adjusted the config to map that filter name to r-band for flats (the only band we have a flat for).
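The SAS diffs never being queued is a starvation effect: if the `-toskyfile` query returns only the first N pending entries and the lap label has far more pending work than N, entries with other labels never reach the returned list. The toy model below illustrates this; the ordering by ID, the limit of 5, and the sample IDs are assumptions, not the real difftool query.

```python
from typing import List, Set, Tuple

# (diff_id, label) pairs; hypothetical data in which the lap backlog
# has lower IDs than the newly queued SAS work.
Pending = List[Tuple[int, str]]


def to_skyfile(pending: Pending, labels: Set[str], limit: int) -> Pending:
    """Toy stand-in for a limited 'what is ready to run' query:
    filter by the active labels, order by ID, keep the first `limit`."""
    rows = sorted(p for p in pending if p[1] in labels)
    return rows[:limit]


if __name__ == "__main__":
    lap = [(i, "lap") for i in range(1, 101)]  # deep lap backlog
    sas = [(i, "SAS.20141118.d8") for i in range(1001, 1011)]
    pending = lap + sas

    both = to_skyfile(pending, {"lap", "SAS.20141118.d8"}, limit=5)
    print([label for _, label in both])  # only lap entries
    sas_only = to_skyfile(pending, {"SAS.20141118.d8"}, limit=5)
    print([diff_id for diff_id, _ in sas_only])  # [1001, 1002, 1003, 1004, 1005]
```

Dropping the lap label from pv3diff corresponds to the second call: with only the SAS label active, the limited list is filled entirely with SAS entries.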
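The config change in the 21:00 entry can be pictured as an alias table consulted before the flat-field lookup. This is a hypothetical sketch: the real change was made in the IPP config, whose format is not shown here, and `FLAT_FILTER_ALIASES`, `AVAILABLE_FLATS`, and `flat_filter` are illustrative names. Only the mapping of CLEAR.00002 to r comes from the log.

```python
# Hypothetical alias table: header filter name -> filter used for flats.
FLAT_FILTER_ALIASES = {"CLEAR.00002": "r"}

# Hypothetical set of filters for which a flat exists (only r, per the log).
AVAILABLE_FLATS = {"r"}


def flat_filter(header_filter: str) -> str:
    """Return the filter name to use when looking up the flat field.

    Raises KeyError if no flat is available for the (aliased) name, which
    is the kind of failed lookup that produced the faults before the
    alias was added.
    """
    name = FLAT_FILTER_ALIASES.get(header_filter, header_filter)
    if name not in AVAILABLE_FLATS:
        raise KeyError(f"no flat for filter {header_filter!r}")
    return name


if __name__ == "__main__":
    print(flat_filter("CLEAR.00002"))  # r
```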

Saturday : 2015.04.18

  • 08:10 MEH: cleared OSS diffim faults (curve of growth invalid everywhere):
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1037212 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1037227 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1037239 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0687.072 -diff_id 1037283 -fault 0
  • 08:45 MEH: regular restart of stdsci for tonight's nightly processing
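When the same skycell faults in several diffs, as on Saturday, the clearing commands differ only in `-diff_id`, so they can be generated rather than typed. A sketch: the flags are copied from the commands in the log, and `clear_diff_commands` is a hypothetical helper that only builds the command strings (it does not run them), leaving them to be reviewed and pasted.

```python
import shlex
from typing import Iterable, List


def clear_diff_commands(skycell_id: str, diff_ids: Iterable[int],
                        dbname: str = "gpc1") -> List[str]:
    """Build one difftool clear command per diff_id (not executed)."""
    cmds = []
    for diff_id in diff_ids:
        argv = [
            "difftool", "-dbname", dbname, "-updatediffskyfile",
            "-set_quality", "42",
            "-skycell_id", skycell_id,
            "-diff_id", str(diff_id),
            "-fault", "0",
        ]
        # shlex.join (Python 3.8+) quotes any argument that needs it.
        cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    for cmd in clear_diff_commands("skycell.0687.072",
                                   [1037212, 1037227, 1037239, 1037283]):
        print(cmd)
```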

Sunday : 2015.04.19

  • 20:47 HAF: conductor died yesterday. Sidik and Craig fixed it up, but we got a summit fault sometime while they were fixing it. I deleted the offending entry in summitExp to see if that fixes it:
    delete from summitExp where summit_id = 899573 and fault = 250;
  • It looks like it did. This is another case where we get a fault but NULL for the number of imfiles. We need to handle this case gracefully rather than manually deleting entries in summitExp.
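A first step toward handling this case without manual deletes is a query that finds faulted summit exposures with no imfile count. The sketch below uses an in-memory SQLite table as a stand-in for summitExp; the `imfiles` column name and the three-column schema are assumptions (only `summit_id` and `fault` appear in the log), and the sample rows other than summit_id 899573 with fault 250 are made up.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical stand-in for summitExp: only summit_id and fault come from
# the log; the imfiles column name is an assumption.
SCHEMA = "CREATE TABLE summitExp (summit_id INTEGER, fault INTEGER, imfiles INTEGER)"

# Faulted exposures whose imfile count was never filled in.
FIND_ORPHANS = """
    SELECT summit_id, fault FROM summitExp
    WHERE fault != 0 AND imfiles IS NULL
    ORDER BY summit_id
"""


def find_faulted_without_imfiles(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (summit_id, fault) for faulted rows with a NULL imfile count."""
    return conn.execute(FIND_ORPHANS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO summitExp VALUES (?, ?, ?)",
        [(899572, 0, 60), (899573, 250, None), (899574, 0, 60)],  # made-up rows
    )
    print(find_faulted_without_imfiles(conn))  # [(899573, 250)]
```

Note that `imfiles IS NULL` is required here; `imfiles = NULL` never matches in SQL.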
