PS1 IPP Czar Logs for the week 2014-04-07 - 2014-04-13

Monday : 2014-04-07

Bill is czar today

  • 13:00 restarted stdscience, pstamp, summitcopy, and registration pantasks
  • 06:50 restarted stdscience with new warp.pro that uses 2 x POLL_LIMIT for warp limit

Tuesday : 2014-04-08

  • 03:30 Bill: we still have a warp backlog, but it may not be growing as quickly as usual. Removed LAP and ps_ud labels from stdscience.
    • definitely better balance between chip and warp than before. warp rate seems to be limited by number of jobs requiring stsci nodes.
  • 05:30 status near end of night
    • 36 incomplete downloads
    • 21 new chip runs
    • 16 new camRuns
    • 19 new warpRuns
  • 22:15 MEH: looks like stdsci crashed or was shut down ~10 mins ago. No note, so assuming it crashed and restarting now

Wednesday : 2014-04-09

  • 09:35 Bill: turned off chip stage to allow 3pi warps and diffs to finish up
    • 09:38 MOPS needs some updates. Turned chip on but set LAP label to inactive
  • 10:02 nightly warps are done; setting LAP label back to active
  • 12:10 MEH: MD observations made monday night 4/7 -- adding labels and such back into stdsci for processing
  • 13:09 bill: removed LAP label from stdscience and stack. This afternoon we are going to use the horsepower for SAS.
    • reduced poll limit for sas 20 150 while stdscience finishes MD diffs from Monday.
    • 14:00 poll limit back up
  • 14:04 added MD labels to distribution pantasks
  • 15:06 bill : neb-host ippb02 down :(
  • 16:43 bill : sas warps are done. Will queue stacks and run them now using hosts.stack. Since lap is off and we aren't getting much MD data the stack pantasks shouldn't be doing much.

Thursday : 2014-04-10

mark is czar

  • 06:55 bill: nightly science warps are done. Adding LAP label to stdscience.
  • 07:15 MEH: ipp060.0 has only 75M space available. All systems are low; will be watching some manual cleanup of nightly processing
  • 08:25 MEH: restarting long running pantasks -- summitcopy, stack (LAP removed), distribution, stdsci
  • 08:35 MEH: clearing fault 5 OSS.WS diffim (539861, skycell.2371.010,012,042 -- all stamp issue)
  • 11:15 bill: added 2 x c2 nodes to sas pantasks to help with the stacks
  • 11:30 MEH: RETENTION_TIME for chip,warp,diff now 7->5 days, will clean up any missed dates after normal cleanup runs
  • 15:35 MEH: ippb03 down, set neb-host down like ippb02
  • 15:50 bill: sas pantasks is now running staticsky using hosts.staticsky

Friday : 2014-04-11

mark is czar

  • 07:20 MEH: 4/7 MD SSdiff missed due to small 1hr timeframe to run -- queued after nightly finished
  • 14:10 bill: added another hosts.staticsky to sas pantasks (which is running full force)

Saturday : 2014-04-12

  • 00:30 MEH: clearing some fault 5 diffim with stamps issue for OSS --
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2113.002 -diff_id 541139  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2113.003 -diff_id 541139  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2114.037 -diff_id 541154  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2041.088 -diff_id 541158  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2040.097 -diff_id 541159  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.2114.025 -diff_id 541159  -fault 0
    
  • 05:23 bill: an OSS warp is repeatedly getting memory corruption doing FITS compression on output. It is near the edge of the FOV (otas 06 16 and 17) so I dropped it with warptool -updateskyfile -fault 0 -set_quality 42 -warp_id 918462 -skycell_id skycell.2065.054
  • sas has finished with the non-faulting fullForce warps. I will debug the 1104 faulted ones later today.
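
Note on the 00:30 fault-clearing entry: the six difftool calls differ only in diff_id and skycell_id, so they could be generated from a short list of pairs. The following is a minimal dry-run sketch, not something that was run that night. The helper name print_clear_cmds is hypothetical; the difftool flags and the id pairs are copied from the entry above. It only prints the commands so they can be reviewed before anything touches gpc1.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one difftool call per faulted skycell.
# print_clear_cmds is a hypothetical helper, not part of the IPP tools.
# Input: lines of "<diff_id> <skycell_id>" on stdin.
print_clear_cmds() {
    while read -r diff_id skycell_id; do
        # Skip blank lines in the input.
        [ -n "$diff_id" ] || continue
        echo "difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id $skycell_id -diff_id $diff_id -fault 0"
    done
}

# The diff_id / skycell_id pairs from the 00:30 entry.
print_clear_cmds <<'EOF'
541139 skycell.2113.002
541139 skycell.2113.003
541154 skycell.2114.037
541158 skycell.2041.088
541159 skycell.2040.097
541159 skycell.2114.025
EOF
```

After checking the printed lines, they could be piped to sh to actually run them.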

Sunday : 2014-04-13