PS1 IPP Czar Logs for the week 2015.12.28 - 2016.01.03


Monday : 2015.12.28

  • 06:28 Bill: pstamp server was idle, so set all ps_ud% chip, warp, and diff data to be cleaned. Restarted pstamp server pantasks since it was using CPU time for no apparent reason.
  • 16:45 EAM: restarting pantasks to be ready for the night.
    • everything is up and running, ready for the night.

Tuesday : 2015.12.29

  • 07:00 MEH: QUB processing still running, setting up MOPS diffims using the QUB data -- probably finish by ~1100
  • 09:26 MEH: clearing some OSS/3PI fault 5 WSdiffs
    difftool -dbname gpc1 -updatediffskyfile -fault 0 -set_quality 42 -diff_id 1297397 -skycell_id skycell.2193.048
    difftool -dbname gpc1 -updatediffskyfile -fault 0 -set_quality 42 -diff_id 1297561 -skycell_id skycell.0756.084
    difftool -dbname gpc1 -updatediffskyfile -fault 0 -set_quality 42 -diff_id 1297564 -skycell_id skycell.0840.017
    difftool -dbname gpc1 -updatediffskyfile -fault 0 -set_quality 42 -diff_id 1297601 -skycell_id skycell.0838.020
    
  • 10:16 MEH: looking at some common fault nodes -- on ippc54 it looks like /export/ippc54.1 isn't mounted, so the tmp dir is not available for processing there
  • 17:22 MEH: restarting all pantasks to clear changes from last night and auto-remove ippc54 from processing until there is time to get the ippc54.1 issue resolved
  • 19:51 MEH: many faults w/ error: mkdir /data/ipp024.0/nebulous: No such file or directory -- set neb-host repair
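
    The four difftool calls in the 09:26 entry differ only in the diff_id and skycell_id, so the same clearing could be written as a loop. This is a rough sketch only, not how it was actually run: clear_fault is a hypothetical helper name, and it only echoes each command (a dry run) so the list can be checked first. The difftool flags are copied unchanged from the entry above.

```shell
#!/bin/sh
# Dry-run sketch: print one difftool command per (diff_id, skycell_id) pair.
# clear_fault is a hypothetical helper; the flags are copied from the log entry.
clear_fault() {
    # $1 = diff_id, $2 = skycell_id
    echo difftool -dbname gpc1 -updatediffskyfile -fault 0 -set_quality 42 \
        -diff_id "$1" -skycell_id "$2"
}

# The pairs below are the four listed in the 09:26 entry.
while read -r diff_id skycell_id; do
    clear_fault "$diff_id" "$skycell_id"
done <<'EOF'
1297397 skycell.2193.048
1297561 skycell.0756.084
1297564 skycell.0840.017
1297601 skycell.0838.020
EOF
```

    Dropping the echo inside clear_fault would run the commands instead of printing them.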

Wednesday : 2015.12.30

  • 12:25 MEH: test boosting pstamp pantasks to increase the rate QUB stamps are provided from last night
  • 13:34 MEH: data nodes on path to filling up, need to start looking at products to clean up soon -- may also need to redo product targeting to use the older nodes with space as well
  • 14:23 MEH: doing restart of nightly pantasks to return to default configurations now that cleanup is finished

Thursday : 2015.12.31

  • 23:33 EAM: stopping & restarting the pantasks

Friday : 2016.01.01

  • 15:06 MEH: restarting summitcopy+registration to use manual test changes in ipphosts.mhpcc.config for raw data to go to the 40T nodes w/ space to help balance space usage since monitoring processing again tonight
  • 19:48 MEH: noticing higher load on ipp046.. it seems to have only 1 CPU after the repair on 12/23 to get it back online -- since this is a datanode and needs to be online more than it needs to be processing, it needed to be removed (or at least reduced) in loaded use in pantasks (ippconfig/pantasks_hosts.input)...

Saturday : 2016.01.02

  • 11:53 MEH: restarting nightly pantasks to restore normal operations
  • 12:00 MEH: manually running WWdiff for MOPS and boosting pstamp for QUB -- w/ cleanup, something overloading systems ~1400.. turning down WWdiff

Sunday : 2016.01.03

  • 20:15 EAM: stopping and restarting pantasks