PS1 IPP Czar Logs for the week 2011.02.07 - 2011.02.13

Monday : 2011.02.07

Tuesday : 2011.02.08

Serge is czar.

  • 06:59: No science exposures last night. Only STS data in stdscience. The day should be quiet...
  • 07:41: Bill turned stack.on in stdscience
  • 08:57: removed ipp045 from stdscience (controller host off ipp045)
  • 09:15: STS.201009 (queued by Bill) finished distribution. Set chips, warps, and diffs to goto_cleaned
  • 11:50: Bill set stack.off
  • 11:53: Bill doubled the number of hosts working on update to 55
  • 12:10: EAM : turned off ipp045 from cleanup, pstamp, update, distribution
  • 13:15: EAM : stopped all pantasks for NFS mount switch
  • 13:45: EAM : restarted all pantasks
  • 15:10: Heather : processing magictest%20110208 in her own build.
  • 15:35: CZW: rebuilt ippconfig to push in change to dqstatstool requested by OTIS.
  • 15:40: Bill removed MOPS label from pstamp in order to fix a bug.
  • 18:16: CZW: add label MOPSreq.czw.20110208 to reprocess 12 exposures for MOPS.
  • 19:57: Bill added STS.refstack.20110207 to the survey tasks to queue it for diff through distribution.
  • 20:23: Bill added MUGGLE.20110203 to stdscience. chip-warp reprocessing of certain data from 2011-02-03.
  • 21:30: Serge restarted summitcopy (crashed at 20:47 according to czar tool).

Wednesday : 2011.02.09

  • 07:45: Bill: ipp005, ipp006, ipp007, and ipp025 are scheduled for motherboard upgrades today. Set them to repair state in nebulous.
  • Queued diffs for 136.3PI._.A-NE-Q_% and 136.3PI._.A-NE-P_% exposures from 5595 (2011-02-03)
  • uncensored detections have been published to IPP-MOPS-TEST
  • 09:00 In preparation for the shutdown, stopped all pantasks except for pstamp and update
  • 09:30 Set all data with label ps_ud% to goto_cleaned. Cleanup is not running though.
  • 09:38 Restarted update to have clean job counts.
  • 10:58 Heather restarted her stdsci to push along the magictest/photfest goodies.
  • 15:22 Serge restarted pantasks servers:
    pantasks server cleanup is active (host: ippc07)
     Scheduler is running
     Controller is running
    
    pantasks server distribution is active (host: ippc15)
     Scheduler is running
     Controller is running
    
    pantasks server pstamp is active (host: ippc17)
     Scheduler is running
     Controller is running
    
    pantasks server publishing is active (host: ippc08)
     Scheduler is running
     Controller is running
    
    pantasks server registration is active (host: ipp052)
     Scheduler is running
     Controller is running
    
    pantasks server stdscience is active (host: ippc16)
     Scheduler is running
     Controller is running
    
    pantasks server summitcopy is active (host: ipp051)
     Scheduler is running
     Controller is running
    
    pantasks server update is active (host: ippc13)
     Scheduler is running
     Controller is running
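
A status dump like the one above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the text format is exactly as pasted in this entry; the function names `parse_status` and `unhealthy` are hypothetical and are not part of pantasks or the czar tool.

```python
import re

# Two of the server entries pasted in the 15:22 log entry, used as sample input.
STATUS_TEXT = """\
pantasks server cleanup is active (host: ippc07)
 Scheduler is running
 Controller is running

pantasks server distribution is active (host: ippc15)
 Scheduler is running
 Controller is running
"""

# Patterns are inferred from the pasted output only (an assumption).
SERVER_RE = re.compile(r"pantasks server (\S+) is (\S+) \(host: ([^)\s]+)\)")
COMPONENT_RE = re.compile(r"(Scheduler|Controller) is (.+)")


def parse_status(text):
    """Return {server: {"state", "host", "scheduler", "controller"}}."""
    servers = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SERVER_RE.match(line)
        if match:
            name, state, host = match.groups()
            current = {"state": state, "host": host}
            servers[name] = current
            continue
        match = COMPONENT_RE.match(line)
        if match and current is not None:
            # Record the scheduler/controller state under a lowercase key.
            current[match.group(1).lower()] = match.group(2)
    return servers


def unhealthy(servers):
    """List servers that are not active with both components running."""
    return sorted(
        name
        for name, info in servers.items()
        if info["state"] != "active"
        or info.get("scheduler") != "running"
        or info.get("controller") != "running"
    )


if __name__ == "__main__":
    status = parse_status(STATUS_TEXT)
    for name, info in sorted(status.items()):
        print(name, info["host"], info["state"])
    print("needs attention:", unhealthy(status))
```

An empty "needs attention" list corresponds to the situation logged here, where every server reports an active scheduler and controller.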
    

Thursday : 2011.02.10

Bill is czar for the next 2 days

  • 07:22 burntool stuck. Chris figured out that the date didn't get added because registration was shut off yesterday. He fixed it.
  • 11:43 (heather) - dropped the isps that will never register (exp_id < 40000, all from before April 2010).
  • 11:43 (heather) - removed 'heather' labels from distribution and publishing, and added in 3 more new 'heather' labels into distribution (so that the magictests will run magic)
  • 11:43 (heather) - heather is still running her stdscience for magictest processing.
  • 13:50 - queued STS.20100516 r band sts data. Priority is below stdscience labels.
  • 13:54 still have ~200 exposures to burntool.
  • 14:04 fixed corrupted warp file; see http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/wiki/PS1_Operations/broken_files
  • 14:16 added more hosts to pstamp pantasks
  • 15:45 Bill is now running the postage stamp pantasks off of his build.
  • 16:40 pstamp is back running the production stuff
  • 16:55 stdscience was sluggish. pcontrol using full cpu. Restarted stdscience.
  • 17:00 restarted distribution
  • 22:48 quiet LASER night. Queued the last of the STS.2010 data for processing. Many IFA postage stamp requests.

Friday : 2011.02.11

  • 06:45 Updates for the postage stamp server are stuck. I adjusted the poll limits yesterday in an attempt to prevent updates for the same label from later requests blocking earlier requests. Unfortunately I checked in a bug to difftool -setskyfiletoupdate that prevented skycells from being set to update. Since QUB had some requests at the top of the queue no other dependents got processed. Fixed that bug and QUB's requests finished quickly.
  • 08:00 STS.2010 entry in WSdiff survey task had the i-band template listed. The exposures are r. Fixed that.
  • 09:30 Johannes reports that 166 i band STS exposures never got distributed. I investigated and found that there are actually 193 that never made it to magic. This is quite likely the result of premature cleanup on my part. Queued them with label STS.2010.i.missed.
  • 09:48 The postage stamp server seems stuck again. The problem is some requests want data in goto_cleaned state and the cleanup pantasks died last night.
  • 09:55 restarted cleanup
  • 14:00 updated warp_skycell.pl to the new version that replicates outputs.
  • 16:51 turned magic.revert.off
  • 19:43 magic.revert back on. All ps_ud data set to goto_cleaned (except MOPS)
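
Label patterns such as ps_ud% in this week's entries look like SQL LIKE wildcards, where % matches any run of characters and _ matches exactly one character. That reading is an assumption; the log does not say how the tools match labels. Under that assumption, the sketch below shows which labels a pattern would select, which is worth checking before a bulk state change such as setting data to goto_cleaned. The helper names `like_to_regex` and `like` are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(pattern, value):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    # Labels taken from this week's log entries.
    labels = ["STS.2010", "STS.2010.i.missed", "MOPSreq.czw.20110208"]
    print([lab for lab in labels if like("STS.2010%", lab)])
    print([lab for lab in labels if like("STS.2010", lab)])
```

Note that a trailing % turns an exact label into a prefix match: STS.2010% would also select STS.2010.i.missed, while the bare label STS.2010 would not.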

Saturday : 2011.02.12

  • 06:35 Added another set of hosts to distribution pantasks since stdscience is done with my STS data.
  • 09:54 took ippc15 (the distribution pantasks server) out of the host list for distribution.
  • 10:06 turned destreak off (it is working on STS.2010.i.missed) so that STS.2010 will finish distribution faster so that I can clean up the data.
  • 10:14 increased poll limit in dist from 128 to 200
  • 10:30 restarted distribution. Noticed that the timing parameters are off for dist.process.load: -exec was 5 but -poll was 20. Left STS.2010.i.missed out of label list for now.
  • 12:11 all distribution bundles for label STS.2010 stage chip and warp are done. Set data to be cleaned. 241 diff bundles to go. Added STS.2010.i.missed label back in.
  • 19:18 make fileset faults: ipp018 couldn't see ipp021. force.umount fixed it. I wonder if we have had other troublesome connections.

Sunday : 2011.02.13