PS1 IPP Czar Logs for the week 2014.12.29 - 2015.01.04

Monday : 2014.12.29

  • 06:20 MEH: fixing an odd row entry for registration. When register_imfile.pl --exp_id 845005 --tmp_class_id ota04 --tmp_exp_name o7020g0159o... was run manually last night, it left a fault=2 entry in place. That caused regtool -revertprocessedimfile to fault with a DB error, so new faults couldn't be cleared. Manually set fault=0 in rawImfile; registration is moving again.
  • 09:25 MEH: setting stdlocal to stop until nightly is finished.
  • 09:40 MEH: can now see OSS warps that fault regularly and won't finish due to "cannot build growth curve". Clearing them:
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1317112 -skycell_id skycell.2406.012
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1317117 -skycell_id skycell.2227.006
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1317413 -skycell_id skycell.2406.012
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1318054 -skycell_id skycell.0821.059
    
  • 20:18 HAF: Registration stalled again. It seems to be moving (I didn't do it, I swear), but it hung on a later file. Here's the magic:
    regtool -updateprocessedimfile -exp_id 845754 -class_id XY50 -set_state pending_burntool -dbname gpc1
    
  • 20:20 HAF: And again... is it going to be one of those nights?
    regtool -updateprocessedimfile -exp_id 845763 -class_id XY50 -set_state pending_burntool -dbname gpc1
    

Tuesday : 2014.12.30

  • 06:10 MEH: registration is ~320 exposures behind and stdlocal is running, so stopping stdlocal until nightly is finished.
    regtool -updateprocessedimfile -exp_id 846064 -class_id XY15 -set_state pending_burntool -dbname gpc1
    
  • 06:20 MEH: cleanup was off?? No note about it, so set it back to run.
  • 06:50 MEH: registration still catching up. pstamp has been running for a while, so doing a regular restart of it as well.
  • 14:20 EAM: turned stdlocal on.
  • 15:15 MEH: regular stdsci restart before nightly starts.

Wednesday : 2014.12.31

Thursday : 2015.01.01

  • 06:50 MEH: clearing "cannot build growth curve" faults to help nightly finish up:
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1325779 -skycell_id skycell.2349.067
    warptool -dbname gpc1 -updateskyfile -fault 0 -set_quality 42 -warp_id 1326202 -skycell_id skycell.2241.063
    
  • 06:55 MEH: also clearing a diff VectorFitPolynomial1DOrd fault:
    difftool -updatediffskyfile -fault 0 -set_quality 42 -diff_id 629740 -skycell_id skycell.1880.056 -dbname gpc1
    
  • 11:00 MEH: ippmd is now focused on stacks (~ippmd/deepstack/ptolemy.rc, ~ippmd/stack/ptolemy.rc). These are 10+ hr jobs, so if they need to be stopped, the ppStack jobs for ippmd will have to be killed.
  • 20:40 MEH: looks like summitcopy could use a regular restart, as could registration.
    • ipp090, ipp093 and ipp095 are not responding well with deep stacks plus nightly summitcopy and registration running.

Friday : 2015.01.02

  • 05:40 EAM : stdscience is sluggish (only ~100 jobs, but there should be many more); restarting.
  • 10:05 EAM : stdlocal is sluggish (~200k chip runs); restarting.

Saturday : 2015.01.03

  • 20:48 EAM : Last night, lightning from the Great Storm took out power in Kihei and cooling at MRTC-B. Haydn & Gavin shut the cluster down around 8pm. Today, they brought the cluster back up over the course of the afternoon, and all machines are now up. ippdb02 is 104 ksec behind and ippdb06 is 1042 ksec behind. I'm starting up stdlocal, but ippc38 is having some trouble: it is not mounting /export/ippc18.0 (homedir).
  • 22:30 MEH: will attempt to restart the MD stacks (~ippmd/deepstack/ptolemy.rc), which are long-running.
    • ipps00 required a password to log in once, then was okay.
    • ipp071 prompts for a password but does not accept it; will not use it.

Sunday : 2015.01.04

  • 06:40 EAM : ipp071 is exporting its disk over NFS (looks fine), but since it cannot do logins, nebdiskd thinks it is down. It looks like it had some YP problems on boot; I'm rebooting it now. UPDATE: no such luck, same problem still. I cannot get in since I do not know the root password.
  • 06:43 EAM : looks like ippc38 is working again (it must have sorted out whatever NFS problem it had).
  • 19:20 EAM : everything is currently shut down except stdlocal, which has run out of things to do (ipp071 being offline is a problem). I'm starting up the stdscience pantasks so that nightly processing is ready.