PS1 IPP Czar Logs for the week 2014.01.20 - 2014.01.26


Monday : 2014.01.20


  • 07:30 MEH: nightly finishing up okay

Tuesday : 2014.01.21

Mark is czar

  • 07:00 MEH: nightly slowly finishing, stdsci needs a regular restart later today, raising poll to 500 to help keep fully loaded
  • 09:00 MEH: regular restart of many pantasks for the week -- registration,summitcopy,pstamp,update,distribution,publishing,stack,stdsci
  • 09:30 QUB stamp 330715 appears to be hanging
    • 11:15 Bill: modified to detect that input warp had quality error on update and fault the job
    • 14:00 MEH: manually set warp 879292 skycell.0803.024 quality to 0 and tried to update again, sets quality 8006 again
    • 15:00 MEH: then manually set diff_id 509887 skycell.0803.024 to quality 42, since it will never update either
  • 17:00 Haydn fixed mobo for ipp046, neb-host repair but still out of processing
  • 17:05 MEH: update neb-host status
    • ippb01 back to repair
    • stsci00,01,02 repair in prep for move tomorrow
    • stsci03-09 back up since checkout okay
    • ipp031,032 must remain down until all stsci machines moved

Wednesday : 2014.01.22

Bill is czar today

  • 05:45 set staticsky and cleanup to stop in preparation for the ippc18 shutdown and move
  • 06:15 shut down all pantasks except staticsky, pstamp, and update
  • 09:00 all processes shut down.
  • 10:00 machines shut down
  • 11:35 ippc18 booted
  • 12:28 started roboczar and czarpoll scripts on ippc11
  • ippc17 and the data store are up; c30 is not, so the postage stamp server apache is unavailable
  • 14:32 started up the pantasks, including ~ippsky/staticsky. fullforce jobs are falling over because stsci00-02 are down for moving. pstamp set to stop while waiting for the working directories to become available.

Thursday : 2014.01.23

Bill is czar today

  • 08:30 prepared for cab7 move. Down are ippc10, c12-c16, ippc19, db03, and ipp049-053
  • 10:19 restarted pantasks. Moved stdscience to ippc02, and the servers usually on ippc15 to ippc03
  • 10:35 changed czarpoll config file to point to scidbm (db01) instead of scidbs(db03)
    • note ippmonitor is on db03, so it is not available. There is an old version on ippdb01 that mostly works, but it doesn't show staticsky
  • 11:18 ippsky/staticsky pantasks restarted
  • 14:20 turned off staticsky tasks in the staticsky pantasks. I want to allow my fullforce test to finish.
  • 14:30 cab7 machines are back up. Set ipp049-053 to nebulous repair state. (A large number of chips were failing because both copies of a detrend image were in that cabinet.)
  • 14:42 restarted czarpoll with nominal configuration
  • 16:00 fixed about 40 chips whose burtool tables have gone missing. All class_ids were XY60 and XY61
  • 16:15 set pantasks to stop
  • 16:25 restarted distribution, summitcopy, and publishing on their regular system ippc15
  • 16:30 restarted stdscience on its usual system ippc16

Friday : 2014.01.24

The cabinet with the nebulous apache servers is being moved today.

  • 07:40 Bill shut down pstamp pantasks and set ippsky/staticsky and stack to stop
    • removed LAP label from stdscience and added a set of compute3 nodes to work on the remaining nightly science processing
  • 07:57 Bill: all pantasks stopped except for stdscience, publishing, and distribution, which each have a bit of work left
  • 08:25 CZW shut down all pantasks.
  • 17:00 CZW brought all pantasks back up. ippc02 is commented out in the .tcsh for ipp, as it did not come up cleanly after the move.
  • 17:40 MEH: tweak_ssdiff to catch up with the backlog of missing SSdiffs before nightly starts

Saturday : 2014.01.25

Sunday : 2014.01.26

  • 17:30 MEH: tweak_ssdiff to do the SSdiffs missed during backlogged data processing before the new nightly starts
