PS1 IPP Czar Logs for the week 2014-03-31 - 2014-04-06

Monday : 2014-03-31

Tuesday : 2014-04-01

  • 12:30 CZW: restarted stdscience
  • 14:00 CZW: resumed replication processing with the test.40 macro. This should give the optimum balance between the number of simultaneous replications and md5sum disk wait on ippb03.

Wednesday : 2014-04-02

  • 13:25 Bill: restarted postage stamp pantasks, adding a task that queues requests older than 14 days for cleanup.

Thursday : 2014-04-03

  • 12:13 Bill: restarted postage stamp pantasks with an optimization that should speed up request parsing somewhat.
  • 21:05 MEH: stdscience is way past its regular restart and the polls are barely loaded. Restarting so nightly processing doesn't get backed up.

Friday : 2014-04-04

Bill is czar today

  • 10:28 The postage stamp jobs are taking too long to process. Ran the macro hosts.detrend to add storage nodes to the list of pstamp hosts.
  • 14:07 restarted stdscience, pstamp, summitcopy, distribution and registration pantasks.
  • 14:15 started up pantasks ~ippsky/sas-pantasks using 2 x hosts.detrend. Set 2 sets of s0, s1, s2, and s3 to off in stdscience.
  • 20:07 pstamp processing is overloading ippc30 (8 x finish running together). Turned dependency checking and cleanup off for now. Removed the storage nodes as well.
  • 20:40 ippc30 stabilized. Reduced the number of cleanup jobs from 5 to 2 and lowered the dependency poll value to 32.
  • 22:23 well, the sas-pantasks died about an hour ago. The pantasks.stderr and pcontrol log files each claim that the other died. Will go ahead and leave it off for the night. Putting nodes s0, s1, s2, and s3 back to on in stdscience.

Saturday : 2014-04-05

  • 04:30 Bill: nightly processing is proceeding smoothly. The warp rate is 40 per hour while chip and camera are around 130 per hour, so we now have a 200 exposure backlog at the warp stage. Removing the LAP label from stdscience so that once the chips finish, the processing power will go to the nightly warps.
  • 06:25 since LAP is off there are no stacks so I added 2 x c2 to stdscience
  • 06:35 summit copy is done. registration complete for science exposures.
  • 08:33 restarted stdscience pantasks with default host configuration. This starts up LAP again.
  • NOTED: When MOPS submits its postage stamp jobs the number of faults increases greatly. Serge's special processes are running on ipp032 as well. Until the stamp jobs started, processing proceeded very smoothly last night. Another thing that happened is that a number of the stsci nodes have run out of free space, which may put stress on some of the other nodes.
  • 08:55 set one set of storage nodes to off in stdscience and to on in pstamp
  • 14:01 pstamp overloading ippc30 again. Too many request finish jobs running dsreg doing md5sums. Set to stop to let things stabilize
  • 14:30 restarted pstamp. Set to only run 2 request finish jobs at a time.
  • started sas-pantasks
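
Note on the 04:30 backlog figures: a rough sketch of the arithmetic, assuming the quoted rates (warp 40 per hour, chip and camera about 130 per hour) stay constant and that no new exposures reach the warp stage once the chips finish. The function names are made up for illustration and are not part of any IPP tool.

```python
# Back-of-envelope backlog estimate for two pipeline stages.
# Assumption: rates are constant, in exposures per hour.

def backlog_growth_per_hour(upstream_rate, downstream_rate):
    """Rate at which exposures pile up in front of the slower stage."""
    return max(upstream_rate - downstream_rate, 0)


def hours_to_drain(backlog, downstream_rate, upstream_rate=0):
    """Hours needed to clear a backlog at the given net processing rate."""
    net_rate = downstream_rate - upstream_rate
    if net_rate <= 0:
        raise ValueError("backlog never drains: downstream is not faster than upstream")
    return backlog / net_rate


# Figures from the 04:30 entry.
growth = backlog_growth_per_hour(upstream_rate=130, downstream_rate=40)
drain = hours_to_drain(backlog=200, downstream_rate=40)

print(f"backlog grows by {growth} exposures per hour while chips are running")
print(f"200 exposures drain in {drain:.1f} hours at the current warp rate")
```

Under these assumptions the backlog builds at about 90 exposures per hour, and the 200 exposure backlog would take about 5 hours to clear at the unchanged warp rate. Any processing power freed up by removing the LAP label would shorten that.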

Sunday : 2014-04-06

  • 10:20 Bill: sas-pantasks is done. MPE's postage stamp requests requiring pv1 updates are done. Restarting stdscience and pstamp pantasks with their normal host configurations.