PS1 IPP Czar Logs for the week 2015.11.02 - 2015.11.08

Continuing HAF's suggestion to improve communication: keep a list in the czar pages of additional (non-standard) processing, so that we all know what's going on.

Daily Czaring:

  • currently there is a modified ops tag running diffs (WS labels only) as ippqub (was ippmops) under ~ippqub/src/stdscience_ws on ippc06 -- if there are problems (Njobs>100k, power loss on ippc06, etc.), or any other issues like those seen with the ~ipp pantasks, it will need to be restarted like a normal nightly processing pantasks
    ./ stdscience_ws
    • even with the modified ops tag, there are still various files MIA in nebulous, so the daily czar needs to check and clear them, as has been discussed before
  • ps_ud_QUB has also been moved to the ippqub:stdscience_ws pantasks to support updates possibly broken by missing cmf files; chip and warp updates will be done in that pantasks as well

Monday : 2015.11.02

  • 07:40 MEH: clearing 2 warps with fault 3 using a simple revert --
     -> psDBAlloc (psDB.c:166): Database error originated in the client library
         Failed to connect to database.  Error: Unknown MySQL server host 'scidbm' (1)
     -> warptoolConfig (warptoolConfig.c:499): unknown psLib error
         Can't configure database
     -> main (warptool.c:84): (null)
         failed to configure
  • 08:32 MEH: nightly finished now, doing regular nightly pantasks restart

Tuesday : 2015.11.03

  • 04:40 MEH: ipp087 under large load, replication running at a high 40 overnight. Set to stop; ipp087 and nightly processing returning to normal.
  • 07:23 MEH: manually faulted two dependents for a QUB stamp request wanting ThreePi.nt/2013/06/19 warp data, which PSS could not update in the current system, apparently because of the PV2.cleanup done on the camera stage
    pst -stopdependentjob -set_fault 25 -dep_id 5884667
    pst -stopdependentjob -set_fault 25 -dep_id 5884668
  • 07:30 MEH: setting chip back to cleaned: it failed to update for MOPS due to a missing mdc file, and then failed to get detrends when using the current recipe and file rules:
    stsci17.2 not available
    stsci13.0 not available
  • 11:57 MEH: doing regular restart of nightly pantasks
  • 17:44 CZW: ipp017 appears to have crashed. Nothing on the console other than that it was rebooted on Oct 14. Power cycling. It shows ERROR 0211: Keyboard error while booting (?); I hit F1 to resume boot. It seems to be back online correctly.
  • 18:03 HAF: started diff processing on ipptest account, on ippc12, in pv3diff directory.
  • 20:30 MEH: summitcopy having timeouts and backing up, replication at 40 so set to stop -- catching up now

Wednesday : YYYY.MM.DD

Thursday : YYYY.MM.DD

Friday : YYYY.MM.DD

Saturday : 2015.11.07

  • 23:20 MEH: Richard alerted that the nightly processing rate is low -- at first look the nightly pantasks haven't been restarted in a while, restarting now

Sunday : YYYY.MM.DD