PS1 IPP Czar Logs for the week YYYY.MM.DD - YYYY.MM.DD


Monday : YYYY.MM.DD

Tuesday : YYYY.MM.DD

* 8:00 HAF: pantasks registration died in the middle of the night; restarted it. The outage caused 148 delayed registrations.

Wednesday : YYYY.MM.DD

Thursday : YYYY.MM.DD

Friday : 2016.06.24

  • MEH: for some reason 13 WS nightly diffims from last Friday are still stalled -- clearing...
  • MEH: ps-ipp-ops email from the other day -- looks like it is time to reset the nebulous logs; ippc25 seems to be getting hit much more than the other nodes
    Notification Type: PROBLEM
    
    Service: Root Partition
    Host: ippc25
    Address: 10.10.20.104
    State: CRITICAL
    
    Date/Time: Thu Jun 23 02:12:26 HST 2016
    
    Additional Info:
    
    DISK CRITICAL - free space: / 3211 MB (10% inode=82%)
    
    ------ running on ippc19 ----- checking ippc20 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   17G   13G  57% /
    ------ running on ippc19 ----- checking ippc21 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   20G  9.5G  68% /
    ------ running on ippc19 ----- checking ippc22 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   17G   12G  59% /
    ------ running on ippc19 ----- checking ippc23 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   21G  8.1G  72% /
    ------ running on ippc19 ----- checking ippc24 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   21G  8.3G  71% /
    ------ running on ippc19 ----- checking ippc25 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   26G  3.0G  90% /
    ------ running on ippc19 ----- checking ippc26 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G   19G  9.6G  67% /
    ------ running on ippc19 ----- checking ippc27 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G  7.9G   21G  28% /
    ------ running on ippc19 ----- checking ippc28 ------
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               31G  7.8G   21G  28% /
    
    • the apache2 access_log now seems to be larger than nebulous_server.log -- is log rotation not happening for apache, or is there extra verbosity in the setup?
      -rw-r--r-- 1 root  11G Jun 24 09:23 access_log
      -rw-r--r-- 1 root 3.6G Jun 17 05:07 error_log
      
      -rw-rw-r-- 1 apache 4.1G Jun 24 09:23 nebulous_server.log
      
    • after resetting nebulous_server.log and compressing access_log/error_log, nominal value is 22GB free, except for ippc25 at 21GB -- may be similar once log rotation is set up for the other logs
    • /var/log/messages isn't rolling either -- looks like the root crontab on ippc20-c26 (and possibly others) is missing the logrotate entry:
      # Logrotate syslog-ng
      1 0 * * 0 /etc/cron.daily/logrotate.cron 2>&1
      
    • added the entry to the root crontab on those nodes; the logrotate config looks like it is on a 4 week rotation
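
The per-node root partition survey above was read by eye. As a rough sketch only (not the script that produced that output), something like the following could pick out the nodes that are filling up. It assumes the survey text keeps the "checking <host>" separator lines and the standard df columns shown above. The names root_usage and nodes_over and the 85% threshold are hypothetical, chosen for illustration. The sample text is copied from three of the nodes in the survey.

```python
import re

# Sample copied from the survey in this log (three of the nine nodes).
SURVEY = """\
------ running on ippc19 ----- checking ippc24 ------
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               31G   21G  8.3G  71% /
------ running on ippc19 ----- checking ippc25 ------
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               31G   26G  3.0G  90% /
------ running on ippc19 ----- checking ippc27 ------
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               31G  7.9G   21G  28% /
"""

# Separator line written before each node's df output.
HOST_RE = re.compile(r"checking\s+(\S+)")
# A df data row for the root mount: "... 90% /" at the end of the line.
USE_RE = re.compile(r"\s(\d+)%\s+/\s*$")


def root_usage(survey_text):
    """Return {host: use_percent} for the root partition of each host."""
    usage = {}
    host = None
    for line in survey_text.splitlines():
        m = HOST_RE.search(line)
        if m:
            host = m.group(1)
            continue
        m = USE_RE.search(line)
        if m and host is not None:
            usage[host] = int(m.group(1))
    return usage


def nodes_over(usage, threshold_percent=85):
    """Return the sorted hosts whose root usage is at or above the threshold.

    The 85% default is an assumption, not the Nagios setting.
    """
    return sorted(h for h, pct in usage.items() if pct >= threshold_percent)


if __name__ == "__main__":
    for host in nodes_over(root_usage(SURVEY)):
        print(host)
```

Parsing the saved text, rather than contacting the nodes again, means the check can be rerun on any pasted survey. The threshold should be set to whatever the monitoring actually alerts on.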

Saturday : YYYY.MM.DD

Sunday : YYYY.MM.DD