PS1 IPP Czar Logs for the week 2012.01.16 - 2012.01.22

(Up to PS1 IPP Czar Logs)

Monday : 2012.01.16

Bill is acting czar this holiday morning

  • Fixed several bad instances (ipp064 - XY26)
  • Fixed a stuck LAP stack caused by a bad warp skyfile by rerunning warp_id 334956 skycell.1915.080
  • Enabled distribution interest for M31.V4 warps. It turns out that Johannes needs the background-corrected warps for the variance images.
  • 11:00 Restarted stdscience and distribution; pcontrol was spinning. Shut off magic and destreak in the distribution pantasks since they are no longer needed.
  • 11:01 dist.revert is off because the M31 warps are affected by the ipp064 lost file problem. Once all of the good files are done I will attempt to regenerate the missing ones.
  • 11:57 dropped two chips that were popping assertions in psphot: exp_name o5942g0110o, chip_id 383027, XY60 and XY76. There appear to be no stars in the rawImages. The observing summary says that the dome was closed during the STD observations and that this is one of the exposures that should be ignored. Filed ticket # 1502.
  • 12:32 Investigated outstanding ppStack failures. They are all instances of the problem reported in ticket # 1427. (MEH: the majority are from MD06.GR0 nightly stacks running with a modified/non-standard min_num of 2 and are ones with only 2-3 input warps)
  • 12:45 restarted summitcopy and registration pantasks.
  • 13:00 Increased the value of DECONV.LIMIT from 2.0 to 3.0. Reverting LAP staticsky runs that have faulted.
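
For reference, the fix for the stuck skyfile noted above (rerunning warp_id 334956 skycell.1915.080) generally amounts to reverting the faulted component so that pantasks re-queues it. A minimal sketch of assembling such a command; the warptool flag names here are assumptions for illustration, not verified usage:

```python
# Hypothetical helper that assembles a revert command for one faulted
# warp skycell. The flag names (-revertwarped, -warp_id, -skycell_id)
# are assumptions for illustration, not verified warptool usage.
def revert_warp_cmd(warp_id, skycell_id, dbname="gpc1"):
    return ["warptool", "-dbname", dbname, "-revertwarped",
            "-warp_id", str(warp_id), "-skycell_id", skycell_id]

cmd = revert_warp_cmd(334956, "skycell.1915.080")
print(" ".join(cmd))
```

Building the command as an argument list (rather than one shell string) avoids quoting problems if this is ever fed to a process launcher.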

Tuesday : 2012.01.17

  • 09:00 Mark: started MD04.GR0 processing of all possible nightly stacks; as with MD06.GR0, this runs on the deepstack pantasks, independent of nightly science and LAP. MD06.GR0 nightly stacks are now set up for distribution; the few remaining ones are being investigated.
  • 12:51 CZW: Attempted to get the stuck warp/updates running again, but this did not seem to work. I then attempted to re-run the camera stage products, but these did not complete correctly and faulted. These updates are from the region where we had data loss/problems due to ipp064 going down, so since I can't get the updates to complete cleanly, I'm going to mark these LAP runs as "full" and skip them to get processing moving on other data that shouldn't have this problem.
  • 16:45 CZW: Re-allocated hosts in pantasks, removing most hosts from distribution and pushing them into stdscience and stack. Since we no longer need to run magic/destreak, we can use those hosts to get other things done.
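
The reallocation above is essentially bookkeeping: distribution keeps a skeleton crew and the freed hosts are split between stdscience and stack. A sketch of that arithmetic, with made-up pool names and host counts (not the actual cluster configuration):

```python
# Hypothetical sketch of the host re-allocation: all but `keep` hosts
# leave the donor pool and are dealt round-robin to the recipients.
# Pool membership below is illustrative, not the real cluster layout.
def reallocate(pools, donor, recipients, keep=2):
    freed = pools[donor][keep:]
    pools[donor] = pools[donor][:keep]
    for i, host in enumerate(freed):
        pools[recipients[i % len(recipients)]].append(host)
    return pools

pools = {"distribution": [f"ipp{n:03d}" for n in range(10, 20)],
         "stdscience": [], "stack": []}
reallocate(pools, "distribution", ["stdscience", "stack"])
print(len(pools["distribution"]),
      len(pools["stdscience"]),
      len(pools["stack"]))  # → 2 4 4
```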

Wednesday : 2012.01.18

  • 08:00 After watching for a bit, it looks like the host re-allocation has oddly slowed the deepstack pantasks (using the wave4 machines) by a factor of 2-3x in processing quick nightly stacks. (12:50) The rate is now back up to normal, with M31 updates mostly running and cluster processing/network load at a similar level. (14:50) The same holds when LAP stacks are also running heavily along with the M31 updates; the rate seems to suffer when a heavy load of camera-stage jobs is running.
  • 15:00 CZW: Restarted stdscience.
  • 16:45 CZW: Restarted stack to take advantage of new host allocation. Not completely convinced that this isn't overtaxing some hosts (due to threading, etc).
  • 22:25 Bill restarted the registration pantasks, which had died. Nothing in the logs except that pcontrol caught a server shutdown.

Thursday : 2012.01.19

  • 10:20 Bill noticed that some nodes are getting overloaded. For example, on ippc29, four ppStacks and a ppImage are a bit much: very little free memory.
  • 11:30 Bill is doing some development using the mysqld on ipp049. Turned off that node in stdscience for a little while to allow some big mysql alter table commands to complete.
  • 11:56 ipp049 set to on in stdscience
  • 14:50 CZW: restarted stdscience, as it did not seem to be queuing as many jobs as possible and the rates were slower than expected.
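
Per the 10:20 note above, overload shows up as very little free memory on a node. One way to spot this is to collect `free -m` output from each host and flag nodes below a free-memory threshold; a small sketch of the parsing side, with made-up sample output and an arbitrary 10% threshold:

```python
def parse_free_mb(free_output):
    """Return (total_mb, free_mb) from the Mem: line of `free -m` output."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            fields = line.split()
            return int(fields[1]), int(fields[3])
    raise ValueError("no Mem: line found")

def is_overloaded(free_output, min_free_frac=0.10):
    """Flag a node whose free memory is below min_free_frac of total."""
    total, free = parse_free_mb(free_output)
    return free < min_free_frac * total

# Made-up sample: a ~32 GB node with only ~610 MB free, like ippc29 above.
sample = "             total  used   free\nMem:         32110  31500  610\n"
print(is_overloaded(sample))  # → True
```

The threshold and column positions are assumptions; older `free` versions lay out columns slightly differently, so a real monitor would want to key off the header line.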

Friday : 2012.01.20

Saturday : 2012.01.21

Sunday : 2012.01.22