PS1 IPP Czar Logs for the week 2016.10.17 - 2016.10.23

(Up to PS1 IPP Czar Logs)

Monday : 2016.10.17

  • MEH: on 10/19 adding a crontab to clear diff faults through the night -- see Processing#Difffailures
  • 19:30 EAM : stopping & restarting pantasks
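
The diff-fault crontab mentioned above might look like the sketch below; the script path and schedule are hypothetical (the actual fault-clearing procedure is the one described on Processing#Difffailures):

```shell
# hypothetical crontab entry: every 30 minutes, revert faulted diff
# components so pantasks retries them; script path is a placeholder
*/30 * * * * /path/to/clear_diff_faults.sh >> /tmp/clear_diff_faults.log 2>&1
```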

Tuesday : 2016.10.18

  • 23:50 MEH: nightly looking fairly sluggish and falling behind ~10 exp/hr in download -- not sure what rebalancing was done after P2 and the start of relastro... ipp094 looks to be in a moderate wait-CPU state, so setting it to neb-host repair may help there..
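
The wait-CPU suspicion on a node like ipp094 can be confirmed with stock Linux tools before changing its nebulous state; a minimal sketch (only the hostname comes from the note above):

```shell
# on the node: the "wa" column of vmstat shows CPU time stalled on I/O;
# iostat -x (sysstat package) shows per-device utilization.
# Sample every 5 seconds, three times each.
vmstat 5 3
iostat -x 5 3
```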

Wednesday : 2016.10.19

  • MEH: restoring nodes ipp067,068,091,092,086,087,088 since P2 ipptopsps finished
  • MEH: cleaning up old runs (ps_ud_%) to make space for NCU processing
    • also cleaning up roughly a full night's worth of uncompressed .multi data again... whoever hardcoded that mode needs to fix it or keep on top of cleaning up that product

Thursday : 2016.10.20

  • MEH: another segfault for the registration pantasks
    [2016-10-20 01:16:39] pantasks_server[7998]: segfault at a5e3d8 ip 0000000000408a4e sp 0000000042fbef20 error 4 in pantasks_server[400000+16000]
  • MEH: adjusting nightly processing during the relastro run -- change made in pantasks_hosts.input; a future czar will need to reallocate once relastro is finished
    • ipp054 - ipp069 have high memory use, so taking them out of stdsci (roughly 8x s3, 6x s4 group)
    • replacing those nodes with 3x x0,x1 to start with
  • 18:00 CZW: Started HSC camera processing on stare00-stare04.
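
The kernel segfault line above already contains what is needed to locate the crash: the faulting instruction pointer and the binary's mapping base/size ("pantasks_server[400000+16000]"). A minimal sketch of turning that into an offset for a symbolizer such as addr2line, assuming the mapping base equals the link-time base (i.e. a non-PIE binary):

```python
# values copied from the segfault log line
ip = 0x0000000000408a4e          # faulting instruction pointer
base, size = 0x400000, 0x16000   # mapping base and size of pantasks_server

assert base <= ip < base + size  # ip falls inside the reported mapping
offset = ip - base               # offset to feed addr2line/objdump
print(hex(offset))               # → 0x8a4e
```

With debug symbols present, `addr2line -f -e pantasks_server 0x8a4e` would then name the faulting function.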

Friday : 2016.10.21

  • MEH: high-priority NCU updates+reprocessing running from ~ippmops/stdscience, on ipps nodes to start -- shouldn't need to be stopped, but if it is, add a note with time+reason
    • adding x2+3 (x0/1b) (access as discussed at the meeting) -- x2 (x0/1b) appear to have a wait-CPU issue for I/O demand >10 MB/s
  • 17:25 CZW: Restarting standard IPP pantasks.

Saturday : 2016.10.22

  • MEH: NCU processing for stacks with ~ippqub/stack, using similar nodes (ipps, upper ippx)

Sunday : 2016.10.23

  • MEH: requeued a WWdiff for MOPS; the nightly science code is deficient in that it doesn't deal with duplicate field/target visit observations properly -- needed o7684g0210o-o7684g0222o (o7684g0209o-o7684g0210o was made instead)
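
The pairing failure can be illustrated with a small sketch; the exposure names are from the note above, but the field/visit assignments and both pairing functions are assumptions for illustration only, not the actual nightly science code:

```python
# Exposure records: (exp_name, field, visit). With a duplicated observation
# of visit 1, pairing consecutive same-field exposures matches the two
# copies of visit 1 against each other instead of visit 1 vs visit 2.
exposures = [
    ("o7684g0209o", "F1", 1),  # first observation of visit 1
    ("o7684g0210o", "F1", 1),  # duplicate of the same field/visit
    ("o7684g0222o", "F1", 2),  # the second visit actually wanted
]

def naive_pair(exps):
    # pair the first two same-field exposures, ignoring visit
    a, b = exps[0], exps[1]
    return (a[0], b[0])

def visit_aware_pair(exps):
    # keep one exposure per (field, visit), preferring the later duplicate,
    # then pair the two distinct visits
    uniq = {}
    for name, field, visit in exps:
        uniq[(field, visit)] = name
    n1, n2 = list(uniq.values())[:2]
    return (n1, n2)

print(naive_pair(exposures))        # → ('o7684g0209o', 'o7684g0210o')
print(visit_aware_pair(exposures))  # → ('o7684g0210o', 'o7684g0222o')
```

The naive pairing reproduces the bad diff from the log; the visit-aware version yields the pair that had to be requeued by hand.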