PS1 IPP Czar Logs for the week 2017.05.22 - 2017.05.28

Monday : 2017.05.22

Tuesday : 2017.05.23

Wednesday : 2017.05.24

  • MEH: adding further cleanup attempts from 2016 that failed during the shuffle/move, so that MOPS+QUB can get more of the stamps they need
  • 16:00 CZW: The migration to the updated czartool (r40008) with summit, download, new, and raw Exp stage handlers is complete. This update also ensures that temp plot files are cleaned.

Thursday : 2017.05.25

  • MEH: ipps11 appears to have been down/unresponsive on ganglia since ~noon yesterday -- can barely log in, I/O errors on commands
    -- multiple entries on console
    [252083.738963] Uhhuh. NMI received for unknown reason 3d on CPU 0.
    [252083.745002] Do you have a strange power saving mode enabled?
    [252083.750774] Dazed and confused, but trying to continue
    [252086.206723] Uhhuh. NMI received for unknown reason 2d on CPU 0.
    [252086.212754] Do you have a strange power saving mode enabled?
    [252086.218526] Dazed and confused, but trying to continue
    -- after power cycle boot -- odd message, if it matters
     * Starting local .../etc/conf.d/local.start: line 13: /sys/devices/system/cpu/sched_mc_power_savings: No such file or directory
    -- seems ok for now
    

Friday : 2017.05.26

  • MEH: ipp121.1 not mountable, setting neb-host down --volume; probably needs an xfs_check run
    [1486417.010902] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1607 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff8121ecf9
    ...
    [1486417.010970] XFS (sdb1): xfs_do_force_shutdown(0x8) called from line 3734 of file fs/xfs/xfs_bmap.c.  Return address = 0xffffffff8122a4e7
    [1486417.011119] XFS (sdb1): Corruption of in-memory data detected.  Shutting down filesystem
    [1486417.011121] XFS (sdb1): Please umount the filesystem and rectify the problem(s)
    ...
    [1518433.309029] XFS (sdb1): xfs_log_force: error 5 returned.
    
  • 13:30 EAM : Haydn rebooted and cleared the xfs log for ipp121.1. It is behaving OK now, so I've put it in repair.
  • 16:55 EAM : restarting pantasks
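
    Note: the ipp121.1 console excerpt above follows a recognizable pattern (an XFS internal error followed by xfs_do_force_shutdown on the device). Below is a minimal sketch, assuming the kernel log text is available as a string, of flagging devices that hit a forced shutdown. The function name find_xfs_shutdowns and the regular expression are illustrative assumptions, not part of the IPP tools.

```python
import re

# Matches kernel log lines such as:
#   [1486417.010970] XFS (sdb1): xfs_do_force_shutdown(0x8) called from line ...
# The pattern is an assumption based on the excerpt in this log.
SHUTDOWN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+XFS \((?P<dev>[^)]+)\): xfs_do_force_shutdown\("
)


def find_xfs_shutdowns(log_text):
    """Return {device: timestamp of first forced shutdown, in seconds since boot}."""
    shutdowns = {}
    for line in log_text.splitlines():
        match = SHUTDOWN_RE.search(line)
        if match:
            # Keep only the first shutdown seen per device.
            shutdowns.setdefault(match.group("dev"), float(match.group("ts")))
    return shutdowns
```

    Run against saved dmesg output, this would report sdb1 for the excerpt above; it only detects the shutdown message and says nothing about whether the filesystem needs repair.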

Saturday : 2017.05.27

Sunday : 2017.05.28

  • 12:00 EAM : restarting pantasks