PS1 IPP Czar Logs for the week 2016.01.11 - 2016.01.17


Monday : 2016.01.11

Tuesday : 2016.01.12

Wednesday : 2016.01.13

  • 23:34 MEH: MOPS reports that NO data has been processed tonight; all are stuck at chip stage. A quick look at the log shows chips faulting at detrends on ipp036, and a quick scan of email shows ipp036 had a hiccup today before noon that prevented disk access and needed neb-host set to down until/if rebooted.
    • ipp036 console showed kernel panic -- power cycled ok and back up -- leaving neb-host down until morning though
    • looks like the nightly pantasks are also well in need of their regular restart; otherwise nodes won't be fully loaded -- just doing the main problem, stdscience, now; others in the morning
    • also looks like the cab w/ ipp013 did a power cycle just before Richard's email that triggered the stalled registration email -- all seem up and ok

Thursday : 2016.01.14

  • 07:14 MEH: registration jammed around 0430, cannot revert
     -> p_psDBRunQuery (psDB.c:812): Database error generated by the server
         Failed to execute SQL query.  Error: Cannot delete or update a parent row: a foreign key constraint fails (`gpc1/chipProcessedImfile`, CONSTRAINT `chipProcessedImfile_ibfk_2` FOREIGN KEY (`exp_id`, `class_id`) REFERENCES `rawImfile` (`exp_id`, `class_id`))
     -> revertprocessedimfileMode (regtool.c:872): unknown psLib error
    
    • manually cleared
      update rawImfile set fault=0 where exp_id=1020611 and class_id="XY60";
      update rawImfile set fault=0 where exp_id=1020838 and class_id="XY35";
      
    • and needed this manually cleared as well
      regtool -updateprocessedimfile -exp_id 1020838 -class_id XY35 -set_state pending_burntool -dbname gpc1
      
  • 09:10 MEH: looks like ipp067 has been unresponsive for the past 20 minutes; power cycled and back up
  • 11:07 MEH: regular restart of nightly science pantasks
  • 11:10 MEH: ipp036 back to neb-host repair after raidstatus check ok
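
The manual rawImfile updates from the 07:14 entry can be sketched as a small generator, useful when more than a couple of imfiles need the same treatment. This is a minimal sketch under stated assumptions: the helper name `clear_fault_sql` and its input format are hypothetical, while the table and column names (`rawImfile`, `fault`, `exp_id`, `class_id`) are taken from the statements logged above. It only builds and prints the SQL text; it does not connect to gpc1.

```python
# Hypothetical helper (not an IPP tool): builds the same UPDATE statements
# that were run by hand at 07:14. Table and column names come from the log;
# the function name and the (exp_id, class_id) input format are assumptions.

def clear_fault_sql(imfiles):
    """Return one UPDATE statement per (exp_id, class_id) pair."""
    statements = []
    for exp_id, class_id in imfiles:
        # Keep the generated SQL safe: exp_id must be an integer and
        # class_id must be purely alphanumeric (e.g. XY60).
        if not str(class_id).isalnum():
            raise ValueError(f"unexpected class_id: {class_id!r}")
        statements.append(
            f'update rawImfile set fault=0 '
            f'where exp_id={int(exp_id)} and class_id="{class_id}";'
        )
    return statements


if __name__ == "__main__":
    # The two imfiles cleared in the 07:14 entry.
    for stmt in clear_fault_sql([(1020611, "XY60"), (1020838, "XY35")]):
        print(stmt)
```

Printing the statements rather than executing them leaves the czar to review and paste them into the database session by hand, as was done here.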

Friday : 2016.01.15

Saturday : 2016.01.16

  • 10:35 MEH: clearing fault 5
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0809.024 -diff_id 1313005 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1368.042 -diff_id 1314040 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1370.039 -diff_id 1314045 -fault 0
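
Since the fault 5 clearing repeats the same difftool call with only the skycell and diff_id changing, the command lines above can be sketched as a dry-run generator. This is a minimal sketch, assuming the flags exactly as logged (`-dbname`, `-updatediffskyfile`, `-set_quality`, `-skycell_id`, `-diff_id`, `-fault`); the function name `fault5_commands` is hypothetical, and the code only builds command strings, it does not run difftool.

```python
import shlex

# Hypothetical dry-run helper (not an IPP tool): reproduces the difftool
# command lines from the log. Only flags that appear in the log are used;
# the function name and defaults are assumptions for illustration.

def fault5_commands(entries, dbname="gpc1", quality=42):
    """Return one difftool command string per (skycell_id, diff_id) pair."""
    commands = []
    for skycell_id, diff_id in entries:
        args = [
            "difftool", "-dbname", dbname, "-updatediffskyfile",
            "-set_quality", str(quality),
            "-skycell_id", skycell_id,
            "-diff_id", str(int(diff_id)),
            "-fault", "0",
        ]
        # shlex.quote leaves plain tokens untouched and quotes anything odd.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # The three diff skyfiles cleared in the 10:35 entry.
    for cmd in fault5_commands([
        ("skycell.0809.024", 1313005),
        ("skycell.1368.042", 1314040),
        ("skycell.1370.039", 1314045),
    ]):
        print(cmd)
```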
    

Sunday : 2016.01.17

  • 07:50 MEH: more fault 5 fixing, restarting nightly pantasks
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0725.076 -diff_id 1314125 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0646.018 -diff_id 1314127 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0647.053 -diff_id 1314128 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0646.018 -diff_id 1314139 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0647.053 -diff_id 1314141 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0647.053 -diff_id 1314182 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0646.018 -diff_id 1314184 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0647.053 -diff_id 1314199 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.0646.018 -diff_id 1314200 -fault 0