PS1 IPP Czar Logs for the week 2013-12-23 - 2013-12-29

Monday : 2013-12-23

  • 09:30 EAM : things are running a bit slowly; I'm stopping all processing and will restart from scratch
  • 15:45 EAM : deactivated FilesMonitoring by adding an exit statement to the Python script.

Tuesday : 2013-12-24

  • 11:10 EAM : a lot of LAP runs were blocked by chip runs in state error_cleaned. I conferred with Bill; he set them all to state goto_cleaned and they are now cleaning up.
  • 15:30 EAM : a number of lapExp entries in the active lap runs were in an inconsistent state: lapExp.data_state was 'full' but there were outstanding chips or warps to be updated. I set these lapExp entries to a data_state of 'pending_update' and things now seem to be running. Query to list the inconsistent states:
    select lap_id,
           lapExp.data_state as lap_data_state,
           lapExp.exp_id,
           chipRun.chip_id,
           chipRun.state as chip_data_state,
           chipRun.label,
           camRun.state as cam_data_state,
           camRun.label,
           warpRun.state as warp_data_state,
           warpRun.label
    from lapExp
    join chipRun using (chip_id)
    join camRun using (chip_id)
    join fakeRun using (cam_id)
    join warpRun using (fake_id)
    where lap_id >= 20800
      and lap_id < 21000
      and warpRun.state != 'full';

I used commands like this to set the data_state:
    laptool -dbname gpc1 -updateexp -lap_id 20958 -exp_id 78569 -set_data_state pending_update

  • 22:00 EAM : I am running a big dvomerge from stsci00 -> ipp064 (parallel). I've set stsci00 to neb repair (and meanwhile have set stsci19 to neb up since it is no longer being stressed as before)
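
The 15:30 fix above was applied one lapExp entry at a time. As a rough sketch only, the same command line could be generated for a list of (lap_id, exp_id) pairs taken from the query output. The function name `laptool_update_commands` is hypothetical; the flags are copied verbatim from the command in the log, and the only pair shown is the one recorded there. The script prints the commands for review and does not run them.

```python
import shlex


def laptool_update_commands(pairs, dbname="gpc1", data_state="pending_update"):
    """Build one laptool command line per (lap_id, exp_id) pair.

    Hypothetical helper: the flags mirror the single command recorded
    in the log; nothing is executed here.
    """
    commands = []
    for lap_id, exp_id in pairs:
        args = [
            "laptool", "-dbname", dbname, "-updateexp",
            "-lap_id", str(lap_id),
            "-exp_id", str(exp_id),
            "-set_data_state", data_state,
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # The one pair that appears in the log entry.
    for line in laptool_update_commands([(20958, 78569)]):
        print(line)
```

Printing rather than executing keeps a manual check between the query result and the database update.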

Wednesday : 2013-12-25

  • 08:20 Bill: Changed some recurring faults to quality errors:
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 509163 -skycell_id skycell.2582.087
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 509177 -skycell_id skycell.2608.044
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 509182 -skycell_id skycell.2582.087
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 509193 -skycell_id skycell.2608.044
    stacktool -updatesumskyfile -fault 0 -set_quality 13006 -stack_id 3002800
    stacktool -updatesumskyfile -fault 0 -set_quality 13006 -stack_id 3003582
  • 08:42 Bill: set all pantasks to stop to prepare for restart. Restart completed sometime later.
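
The difftool commands in the 08:20 entry differ only in diff_id and skycell_id. A minimal sketch, assuming the same flags and quality code as logged, that builds those lines from a list of (diff_id, skycell_id) pairs; `difftool_quality_commands` is a hypothetical name, and the script only prints the commands.

```python
def difftool_quality_commands(entries, quality=14006):
    """Build difftool command lines that clear a fault and set a quality code.

    Hypothetical helper: flags and the default quality value are copied
    from the logged commands; nothing is executed here.
    """
    return [
        "difftool -updatediffskyfile -fault 0 "
        f"-set_quality {quality} -diff_id {diff_id} -skycell_id {skycell_id}"
        for diff_id, skycell_id in entries
    ]


if __name__ == "__main__":
    # The four diff skyfiles listed in the log entry.
    logged = [
        (509163, "skycell.2582.087"),
        (509177, "skycell.2608.044"),
        (509182, "skycell.2582.087"),
        (509193, "skycell.2608.044"),
    ]
    for line in difftool_quality_commands(logged):
        print(line)
```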

Thursday : 2013-12-26

Friday : 2013-12-27

Saturday : 2013-12-28

  • 15:15 Bill: set existing STS.rp.2013 data to be cleaned; MPG has downloaded the current set. Queuing new batch from 2010.
    • removed goto_cleaned.rerun label from cleanup so that the sts data gets cleaned. More bytes for the run there.
  • 15:20 Bill: set stdscience to stop to prepare for restart.
    • restarted at 15:29 and over the next 10 minutes restarted most of the rest of them.

Sunday : 2013-12-29

  • 05:55 EAM : started rsync jobs for non-raw data on ipp033-ipp040 yesterday. The load on these machines is a bit high now, but the rsyncs need to go a bit faster. I'm boosting the number of threads in the rsync jobs, so I'm removing these 8 machines from stdscience.
  • 07:39 Bill : added the "rerun" labels back to cleanup
  • 16:42 Bill : stdscience could use a restart; setting to stop.
    • dropped sts chip runs 929337 and 929442 (filter = 'Not available')
    • 16:55 restarted stdscience. Setting for 60 minutes to let the warps catch up a little bit
    • 18:02 chip.on