PS1 IPP Czar Logs for the week 2014.09.29 - 2014.10.05

Monday : 2014.09.29

  • 07:27 Bill: Set quality flag on several diffs that were failing
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 598546 -skycell_id skycell.2163.039
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 598547 -skycell_id skycell.2162.028
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 598547 -skycell_id skycell.2162.057
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 598607 -skycell_id skycell.1155.020
    difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id 598617 -skycell_id skycell.1246.074
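
The repeated difftool calls above differ only in diff_id and skycell_id, so they could be generated from a short list of pairs. The following is a minimal dry-run sketch, not an existing IPP script: the helper name clear_diff_faults is hypothetical, the flags are copied exactly as logged, and the function only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print one difftool command per "diff_id skycell_id" pair
# read from stdin. clear_diff_faults is a hypothetical helper name; the
# difftool flags are copied from the log entry, nothing is executed here.
clear_diff_faults() {
    while read -r diff_id skycell_id; do
        # Skip blank lines in the input list.
        [ -n "$diff_id" ] || continue
        echo "difftool -updatediffskyfile -fault 0 -set_quality 14006 -diff_id $diff_id -skycell_id $skycell_id"
    done
}

# The five pairs from the 07:27 entry.
clear_diff_faults <<EOF
598546 skycell.2163.039
598547 skycell.2162.028
598547 skycell.2162.057
598607 skycell.1155.020
598617 skycell.1246.074
EOF
```

Printing first keeps the list reviewable; once the output looks right, it could be piped to sh to actually run the commands.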

Tuesday : 2014.09.30

  • 12:00 MEH: using ippsXX for MOPS test processing -- done

Wednesday : 2014.10.01

Thursday : 2014.10.02

  • 10:25 EAM : ipp013 is back up after a motherboard replacement. I've reverted the failed chips, and I've also run neb-repair on the ipp013-only burntables that were needed.
  • 11:20 CZW : restarted stdlanl pantasks, as it had a lot of timeout/failure counts that made reading the number of active jobs tricky.

Friday : 2014.10.03

  • 06:10 EAM : ipp036 crashed in the night, no messages on the console. I power cycled and it came back fine.
  • 10:55 EAM : burntool was having trouble with a single chip (o6933g0296o.ota14). It turns out that an array in burntool is not forced to be consistent with the loop variable used to access it. I made a hackish fix, but have asked JT for help.
  • 23:10 MEH: noticing the fault 2 build up again -- lanl stdlocal running ~230 jobs alongside nightly science
     -> psFitsOpen (psFits.c:217): I/O error
         Failed to delete a previously-existing file (/data/ipp082.0/nebulous/54/bb/5186700861.gpc1:OSS.nt:2014:10:04:o6934g0265o.804031:o6934g0265o.804031.ch.1117466.XY27.mdl.fits), error 2: No such file or directory
     -> pmFPAfileOpen (pmFPAfileIO.c:816): I/O error
         error opening file /data/ipp082.0/nebulous/54/bb/5186700861.gpc1:OSS.nt:2014:10:04:o6934g0265o.804031:o6934g0265o.804031.ch.1117466.XY27.mdl.fits
     -> pmFPAfileWrite (pmFPAfileIO.c:415): I/O error
    
  • 23:20 MEH: some fault 5 diffims to clear so MOPS gets them in the morning
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2616.002 -diff_id 599551  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2617.028 -diff_id 599551  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2616.060 -diff_id 599556  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2617.065 -diff_id 599556  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2617.068 -diff_id 599556  -fault 0
    

Saturday : 2014.10.04

  • 00:17 MEH: no problems w/ remote connection now after Gavin's email about the firewall fix. A few more fault 5 diffims to clear:
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.1234.088 -diff_id 599564 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.1235.079 -diff_id 599565 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.1324.032 -diff_id 599571 -fault 0
    

Sunday : 2014.10.05