PS1 IPP Czar Logs for the week 2017.07.24 - 2017.07.30

Monday : 2017.07.24

  • MEH: cleanup of PSNSC.wref.20170707 warps to free up space for next set of warp updates and stacks
  • MEH: Haydn wants to replace drives in machines -- many will need to be rebooted (actually only c65-127 aren't hot swappable)
    ippc101 sdb and sdc (1TB and 1TB WD)
    ippc103 sda (1TB WD)
    ippc108 sda (1TB WD)
    ippc114 sdb (1TB WD)
    ippc119 sdc (1TB WD)
    ippc126 sda (1TB WD)
    ippc32 sdd (3TB HGST) Note: we'll lose the data on /export/ippc32.1
    ippc52 sdd (3TB HGST) Note: we'll lose the data on /export/ippc52.1
    ippc65 sdb (1TB WD)
    ippc73 sdb (1TB WD)
    ippc80 sdd (1TB WD)
    ippc85 sdd (1TB WD)
    ippc91 sda (1TB WD)
    ippc94 sdd (1TB WD)
    ippc96 sda (1TB WD)
    ippdb03 sdc (500GB WD)
    ippx020 sdb (500GB WD)
    
    • ippx020 needs Heather's all clear -- ok
    • ippc73 is an apache server -- will need to remove from nebserver list
    • only ippc101,c32,c52,db03 needed to have disks swapped -- ippdb03 not working right, will be kept down since not actively used
      • ippc101 will need additional work later
      • ippc32,c52 had disk wiped on .1 partition -- need to restore /local/ipp/tmp links and permissions...
  • MEH: Curt needs to switch over to 3x10G LAG between itc-i04-sw1 and itc-i10-sw1 -- might be a small network glitch when done -- sounds like tomorrow is a better day to do it

Tuesday : 2017.07.25

  • MEH: publishing taking >2hr for o7959g0298o-o7959g0316o -- looks like o7959g0298o had telescope jump so field full of junk/residual -- manually queued o7959g0280o-o7959g0316o (v2-4)
    • v2-4 took >1hr to publish so recommended to MOPS to just drop -- also looks like elongated sources in a dense star field and mostly useless diffims
  • MEH: manually removed duplicate queue of OSS.20170725 published on datastore -- done in error for the v2-4 manual diffim
    • also removed many broken entries somehow added to ps1-3piWS-cat when datastore moved to ITC, plus >2 month old catalog bundles
    • removed >3 month old publish in IPP-MOPS-TEST to also reduce listing (and subdirectories by ~50k)
  • MEH: Curt changed setup to 3x10G LAG between itc-i04-sw1 and itc-i10-sw1 IPP main/core switches -- watch for dropped connections (should improve)
  • MEH: Haydn+Ming finished working on ippc101 -- both raids should finish rebuilding around when nightly science starts so will leave in for processing since just a compute node
  • MEH: odd OSS.20170726 chip fault in afternoon -- mislabeled/misqueued exposure -- set exp_type="bad", obs_mode="ENGINEERING", chip label="badoss", data_group="badoss.20170726", quality=42
    | ENGINEERING | 1273652 | o7960g0002d | 2017-07-26 00:09:33 | y.00000 |       30 | Testing                              | 
    | OSS         | 1273653 | o7960g0003o | 2017-07-26 02:02:27 | i.00000 |       45 | OSSR.R11N7.6.Q.i ps1_22_5178 visit 1 | 
    | ENGINEERING | 1273654 | o7960g0004d | 2017-07-26 02:05:48 | OPEN    |       30 | video dark for first cell dev0, 30s  | 
    

Wednesday : 2017.07.26

  • 17:15 CZW: Restarting ippitc pantasks servers.

Thursday : 2017.07.27

  • 12:45 CZW: I'm going to add x3 to my HSC pantasks (/data/stare04.1/watersc1/hsc/stdsci2/) to see if I can improve the rate at which I'm making stacks. If someone else needs these machines, feel free to turn it off and kill jobs. I'm going to check on them in an hour or so, and confirm that it's not running into NFS issues (the stare nodes do not seem to cooperate very well).
    • 13:45 CZW: NFS problems, so I've turned them off, and have attempted to umount -f the problematic stare node mounts. I'll try that again before leaving for the day to make sure they're clear.
  • 15:55 CZW: Restarting ippitc pantasks.

Friday : 2017.07.28

  • 16:35 CZW: Restarting ippitc pantasks servers.

Saturday : 2017.07.29

Sunday : 2017.07.30