PS1 IPP Czar Logs for the week YYYY.MM.DD - YYYY.MM.DD

Monday : 2017.06.19

  • MEH: ippc18 seems to have stopped system logging since 6/8 -- /var/log/messages stalled (and ~8GB) but dmesg is reporting recent info; unable to create new screen sessions
    • dmesg now reporting many errors like:
      [4052191.611865] sd 8:0:0:0: [sda] Device not ready: Sense Key : Not Ready [current] 
      [4052191.611869] sd 8:0:0:0: [sda] Device not ready: Add. Sense: Logical unit not ready, cause not reportable
      
      [4052215.133126] sd 8:0:0:0: [sda] Device not ready: Sense Key : Not Ready [current] 
      [4052215.133133] sd 8:0:0:0: [sda] Device not ready: Add. Sense: Logical unit not ready, cause not reportable
      [4052215.133138] end_request: I/O error, dev sda, sector 121605593
      [4052215.133187] EXT3-fs error (device sda3): ext3_get_inode_loc: unable to read inode block - inode=3584007, block=7176285
      
      [4052556.736328] Write-error on swap-device (8:0:1870561)
      [4052556.808632] sd 8:0:0:0: [sda] Device not ready: Sense Key : Not Ready [current] 
      [4052556.808639] sd 8:0:0:0: [sda] Device not ready: Add. Sense: Logical unit not ready, cause not reportable
      [4052556.808643] end_request: I/O error, dev sda, sector 1870561
      [4052556.808649] Write-error on swap-device (8:0:1870569)
      
    • also losing some binary commands like top -- basically non-functional now
    • homedir now switched to ippc19 (OLD version from March) --
  • MEH: nightly processing check with ippitc (stand alone/isolated homedir on ippc64) --
    • nebulous issue since nebdiskd running as ipp and not ippitc -- Gene flipped back
    • ippqub:stdscience_ws -- functioning snapshot from a few days ago rsync'd over old ippc19 version for processing
    • Serge swapped MOPS code needed to run as ippitc in place of ipp
    • ganglia not available
  • MEH: ipp107-ipp116 mostly full now (actually overly full with ipptopsps loading running) -- regularly past nebulous limit so set repair to allow ipptopsps etc to run -- ipp105,117-120,122 now specifically targeted for ippitc pantasks
    • turning off ipp105 (like ipp117 appears to already be) in nightly processing to avoid overloads -- adding 1x ci0 group to stdscience to help balance
    • moving WSdiff to start @0600 after nightly mostly finishes to avoid any overloads
  • MEH: regular cleanup of manually processed and missed products
  • MEH: bzipped ippitc pantasks logs so that a snapshot of that user setup can be made easier/faster
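
The dmesg excerpt above can be condensed into a per-device list of failing sectors. The following is only a minimal sketch, assuming the `end_request: I/O error, dev ..., sector ...` line format shown in the excerpt (other kernel versions word this message differently); the function name `failed_sectors` and the `SAMPLE` text are illustrative, not part of any IPP tool.

```python
import re
from collections import defaultdict

# Matches kernel lines of the form seen in the ippc18 excerpt, e.g.
#   [4052215.133138] end_request: I/O error, dev sda, sector 121605593
IO_ERROR_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+end_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def failed_sectors(dmesg_text):
    """Map device name -> list of sectors reported in I/O error lines."""
    sectors = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = IO_ERROR_RE.match(line)
        if match:
            sectors[match.group("dev")].append(int(match.group("sector")))
    return dict(sectors)


# A few lines copied from the excerpt above, used as illustrative input.
SAMPLE = """\
[4052215.133126] sd 8:0:0:0: [sda] Device not ready: Sense Key : Not Ready [current]
[4052215.133138] end_request: I/O error, dev sda, sector 121605593
[4052556.736328] Write-error on swap-device (8:0:1870561)
[4052556.808643] end_request: I/O error, dev sda, sector 1870561
"""

if __name__ == "__main__":
    for dev, found in failed_sectors(SAMPLE).items():
        print(f"{dev}: {len(found)} I/O error line(s), sectors {found}")
```

On a host where commands still run, the same function could be fed live output, for example from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.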

Tuesday : 2017.06.20

  • MEH: Richard reports pstamp stalling and old web interface basically not functional --
    • ippops1 proxy seems fairly unresponsive to commands --
      • /var/log/messages seems to have a possible drive error (for swap partition) -- reboot ippops1 to force a disk check (>200d since last) -- all ok, still problem
      • many failed attempts from China -- add to /etc/hosts.deny -- no help
        Jun 20 14:22:14 ippops1 sshd[17155]: Failed password for root from 116.31.116.7 port 49611 ssh2
        Jun 20 14:22:14 ippops1 sshd[17155]: Failed password for root from 116.31.116.7 port 49611 ssh2
        Jun 20 14:22:15 ippops1 sshd[17155]: Failed password for root from 116.31.116.7 port 49611 ssh2
        Jun 20 14:22:15 ippops1 sshd[17155]: Received disconnect from 116.31.116.7: 11:  [preauth]
        Jun 20 14:22:37 ippops1 sshd[17163]: Failed password for root from 59.45.175.4 port 16409 ssh2
        Jun 20 14:22:37 ippops1 sshd[17163]: Failed password for root from 59.45.175.4 port 16409 ssh2
        Jun 20 14:22:38 ippops1 sshd[17163]: Failed password for root from 59.45.175.4 port 16409 ssh2
        
      • stopping apache and system is more responsive --
      • lots of dvo downloads from Taiwan -- causing some load on ippops1 but does not seem to impact proxy to pstamp
        211.79.51.70 - - [02/Jul/2017:19:27:59 -1000] "GET /3pi.pv3.20170216/n4500/3313.02.cps HTTP/1.0" 200 21767040
        
      • /etc/apache2/vhosts.d/08_pstamp02_vhost.conf setup for both pstamp and pstamp02 -- put back in 03_pstamp_vhost.conf -- restart proxy apache after changes to clear any possible unseen issues -- still stalling issue
    • ipp113 serving up stamps is stalling but datastore is not -- restart apache also to clear and see logs --
      • /data/ippc18.0 is still mounted on ipp113 -- Gene killed md5sum jobs and still not improved
      • ipp113 /etc/resolv.conf had ippc18 (10.10.20.16) as first entry so was a timeout delay -- Gene removed so 10.10.20.17 (ippc19) is now first entry and things are much better
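
The root cause on ipp113 was a dead host listed first among the resolver's nameservers, so every lookup waited for a timeout before falling through to the next entry. A check for that condition might look like the sketch below. It assumes the standard `nameserver <address>` line syntax of /etc/resolv.conf; the `BEFORE` and `AFTER` texts are hypothetical reconstructions built from the two addresses in the log, not copies of the real file, and the function names are illustrative.

```python
def nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in the file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or ';' in the first column).
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def first_nameserver_is_retired(resolv_conf_text, retired):
    """True if the first nameserver entry is in the set of retired addresses."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] in retired


# Hypothetical file contents: ippc18 (10.10.20.16) first, then ippc19.
BEFORE = "nameserver 10.10.20.16\nnameserver 10.10.20.17\n"
# Hypothetical contents after the ippc18 entry was removed.
AFTER = "nameserver 10.10.20.17\n"

if __name__ == "__main__":
    retired = {"10.10.20.16"}  # ippc18, per the log entry above
    print("before:", first_nameserver_is_retired(BEFORE, retired))
    print("after: ", first_nameserver_is_retired(AFTER, retired))
```

Running the same check over the other storage and compute nodes would show whether any of them still list ippc18 first.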

Wednesday : 2017.06.21

  • MEH: ippc19.0 homedir filled up last night -- cleared ~2GB of duplicated files staged elsewhere so required nightly processing WSdiff can proceed
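
A full home directory volume like this one could be caught earlier with a periodic free-space check. This is a rough sketch using only the Python standard library; the 5% threshold and the function names are arbitrary assumptions, and the path to check is supplied by the caller rather than taken from the actual ippc19 mount layout.

```python
import shutil
import sys


def free_fraction(path):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as full.
        return 0.0
    return usage.free / usage.total


def is_nearly_full(path, min_free_fraction=0.05):
    """True if less than min_free_fraction of the filesystem is free."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # Path comes from the command line; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    state = "NEARLY FULL" if is_nearly_full(target) else "ok"
    print(f"{target}: {free_fraction(target):.1%} free ({state})")
```

Run from cron ahead of nightly processing, a non-"ok" result would give time to clear staged duplicates before WSdiff needs the space.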

Thursday : YYYY.MM.DD

Friday : 2017.06.23

  • 15:30 EAM : restarting pantasks.
  • 20:03 MEH: nightly pantasks not running, unclear why -- no email response so restarting all from scratch @2045 -- 130 exposures behind now..

Saturday : YYYY.MM.DD

Sunday : YYYY.MM.DD