PS1 IPP Czar Logs for the week 2015.10.26 - 2015.11.01

Continuing HAF's suggestion to improve communication: below is a list of additional (non-standard) processing kept in the czar pages, so that we all know what's going on.

Daily Czaring:

  • Currently a modified ops tag is running diffs (WS labels only) as ippqub (was ippmops) under ~ippqub/src/stdscience_ws on ippc06 (was ippc29). If there are problems (Njobs > 100k, power loss on ippc06, etc.), it will need to be restarted like a normal nightly processing pantasks:
    ./ stdscience_ws
    • Even with the modified ops tag, various files are still missing (MIA) in nebulous, so the daily czar needs to check for and clear them, as discussed before.
  • ps_ud_QUB has also been moved to the ippqub:stdscience_ws pantasks to support updates that may be broken by missing cmf files; chip and warp updates will also be done in that pantasks.


Monday : 2015.10.26

  • 11:30 CZW: restarting ipp/pantasks.

Tuesday : 2015.10.27

  • 23:00 CZW: One of the nebulous apache servers has a full /tmp/ partition. I don't remember whether an apache restart clears this, so I've pulled that host (ippc09) from the valid list in ~ipp/.tcshrc and am now restarting the pantasks servers to ensure that change is respected everywhere. I'm going to stop the replication and cleanup pantasks for the night, since they do nebulous interactions that aren't strictly necessary right now, to try to get things caught up. After the restart, the register pantasks no longer shows the failure rate it had before, so I think things will get sorted out.

Wednesday : 2015.10.28

Thursday : 2015.10.29

Friday : 2015.10.30

  • 06:30 MEH: increasing the number of jobs doing file scans for missing MD.PV3 products; this will raise the load on the ippc0x machines somewhat.
  • 20:05 EAM: stopping ipp pantasks for restart.

Saturday : 2015.10.31

Sunday : 2015.11.01