
Monday : 2012.07.02

Mark is czar

  • 07:20 no data last night, PSS is still working on some large MPIA jobs.
  • 10:10 restarted update+pstamp to see if that clears the old/stalled(?) MPIA job (req_id=184540); it hasn't cleared by 18:30, will look into it in more detail in a bit
  • 18:20 commented out MD10 for WS and SSdiffims in stdscience/input until the new restack is ready
  • 23:00 for PSS, manually set 8 dep_ids to fault=25 three times to get them to fault; they seem to come from a request for warp updates of date_group testing.g.1771.20110503 (a request that was also dropped back in Feb.). This finally cleared the MPIA request from yesterday.

Tuesday : 2012.07.03

Mark czar

  • 07:30 no data last night, PSS active
  • 10:00 Serge found a pantasks_client running amok on ipp009, doing a publishing query for czartool; killed it. Also, czarpoll.pl looks to be running at 20-30% CPU; is that normal?
  • 13:00 ippMonitor@manoa had lost the ps1sc user from the ippadmin DB when it was rebuilt; it is back now.
  • 15:20 The postage stamp server has completed all outstanding jobs. Bill set chip, warp, and diff data with labels like ps_ud% to be cleaned.

Wednesday : 2012-07-04

Serge is czar

  • No data last night (rotator)

Thursday : 2012-07-05

Serge is czar

  • Still no data because of rotator concerns.
  • 14:15 EAM: stopped all systems (except addstar & deepstack), switched to ipp-20120626, restarted all systems
  • 16:00 Mark: starting MD10.refstack.20120705 chip->warp processing. Added the stsci group 2x to deepstack (55 nodes total with compute3) and restarting deepstack to run with the new tag as well.
  • 16:30 Mark: stsci node problems
    • of course, forgot something, since nodes don't leave the DOWN state in deepstack..
    • when logging into the systems, get "@: Expression Syntax."; this looks to come from the nebserver definition in .tcshrc, which splits the stsci hostnames strangely with its s/c0// and s/c// substitutions. Adding s/stsci// at the start of the search list seems to work
    • /local/ipp needs tmp and gpc1 copied -- done
  • 17:30 turning compute3 off in deepstack to run test stacks on just the stsci nodes
  • 20:00 Mark: looks like ipp008 has been down for ~20 min; gave it 30 min, but it is faulting/stalling processing, so power cycling it. Console messages are at Ipp008-crash-20120705

Friday : 2012-07-06

  • Yesterday, when we restarted, the label STS.rerun.20120703 got dropped from distribution. Added it back and removed all stages except warp from the DIST_STAGE list.

Saturday : 2012-07-07

Sunday : 2012-07-08