PS1 IPP Czar Logs for the week 2016.08.01 - 2016.08.07

Monday : 2016.08.01

  • 04:44 MEH: registration pantask crash ~01:30, restarting.. -- catchup rate reached ~80 exp/hr which is good
  • MEH: spontaneous reboots of nodes ipp008,012,013,014,016,018 in cab1 still a problem -- ipp013 was in normal stdsci loading and others just light summit+registration -- take all back out of processing, ipp013 takes a long time to reboot so leave in repair, others leave up to data space
  • MEH: nightly processing hosts update -- pantasks_hosts.input -- restoring commented out nodes w/ past problems that should be ok now (guessing mostly since notes are very uninformative)
  • MEH: data targeting summary -- manually modified in builds ~ipp/psconfig/ipp-20141024.lin64/share/pantasks/modules/ipphosts.mhpcc.config and ~ippqub equivalent
    • previous was ipphosts.mhpcc.config.20160730avoidfullv2
    • current is ipphosts.mhpcc.config.20160730avoidfullv2
    • attempted to set current data load to ~60% on ipp100-104 -- io on those machines looks to be in ~60-80 / out ~20-40 MB/s during nightly processing (w/o ipptopsps etc running)
    • chips to ipp005-066, skycells (warp+diff) to ipp054-104 (mostly 100-104), raw to ipp017/028, 054-066, 074/076/077, 100-104 (mostly 100-104)
    • could add skycell set to ipp005-031 from ipp100-104 to help lower ipp10x data load, or maybe from ipp067-097. not sure if ipp005-031 can handle it with the space available (warps are kept on disk longer than chip+diff, and skycells include warp+diff where a large number of diffs are uncompressed)
    • likely need more raw to ipp100-104 from ipp054-066 soon, and later when 017/028, 074/077 fill up
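
The host shorthand used in the targeting summary above (e.g. "ipp005-066", "017/028") can be expanded into explicit node names when cross-checking a host list by hand. The following is only a sketch: `expand_hosts` is a hypothetical helper, not part of the IPP tools or of ipphosts.mhpcc.config, and it assumes every number in a range corresponds to a real node, which may not hold for the actual cluster.

```python
import re


def expand_hosts(spec, prefix="ipp", width=3):
    """Expand log shorthand such as 'ipp005-007' or '017/028' into host names.

    Hypothetical helper for illustration only. Assumes ranges are contiguous
    and that host numbers are zero-padded to `width` digits.
    """
    hosts = []
    for part in re.split(r"[,/]", spec):
        part = part.strip()
        if not part:
            continue
        # Split an optional 'lo-hi' range, dropping the prefix on either end.
        ends = [p.strip() for p in part.split("-", 1)]
        ends = [p[len(prefix):] if p.startswith(prefix) else p for p in ends]
        lo = int(ends[0])
        hi = int(ends[-1])  # same as lo when there is no '-'
        hosts.extend(f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1))
    return hosts


if __name__ == "__main__":
    # Raw-data targets as written in the log entry.
    raw_targets = expand_hosts("017/028,054-066,074/76/77,100-104")
    print(len(raw_targets), raw_targets[:3], raw_targets[-1])
```

Unpadded entries such as "76" are normalized to the padded form ("ipp076"), so the shorthand can be compared directly against node names in a hosts file.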

Tuesday : 2016.08.02

  • 16:16 CZW: Daily restart.

Wednesday : 2016.08.03

  • 08:38 MEH: started pstamp pantasks, looks like the same kind of segfault issue registration has been experiencing
    [2016-08-03 05:05:02] pantasks_server[25722]: segfault at 86c108 ip 0000000000408a4e sp 00000000426daf20 error 4 in pantasks_server[400000+16000]
  • 16:22 CZW: Restarting pantasks. This will switch it back to the standard storage node processing.
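
For comparing the pstamp crash with the registration crashes, the kernel segfault line quoted in the 08:38 entry can be broken into its fields. This is a rough sketch, not an IPP tool: `parse_segfault` and its regular expression are assumptions based on the one line shown above, and the error-code decoding uses the standard x86 page-fault bits (bit 0: protection violation rather than page not present, bit 1: write, bit 2: user mode).

```python
import re

# Pattern modelled on the single log line in this entry (assumed format).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"))
    return {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
        "protection_fault": bool(err & 1),  # False: page was not present
        "write": bool(err & 2),             # False: the access was a read
        "user_mode": bool(err & 4),
    }


LINE = (
    "[2016-08-03 05:05:02] pantasks_server[25722]: segfault at 86c108 "
    "ip 0000000000408a4e sp 00000000426daf20 error 4 "
    "in pantasks_server[400000+16000]"
)

if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["prog"], hex(info["offset"]), info["user_mode"], info["write"])
```

If the registration crashes report the same instruction offset, that would suggest the same code path is failing in both servers; differing offsets would point to separate problems.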

Thursday : 2016.08.04

  • 11:00 HAF: stuck warp, retrying it: warptool -revertwarped -fault 4 -label OSS.nightlyscience -dbname gpc1
  • 16:00 HAF: restart pantasks -- a few diffs/registrations were stuck, assumed due to high io on dvo machines
  • 16:40 CZW: Running HSC processing on the stare nodes. They were not listed on the IPP_NodeUse page, so I've claimed them.

Friday : 2016.08.05

  • 17:05 HAF: restart pantasks
  • 19:15 MEH: DVO running on nightly nodes, turning off to avoid conflicts --
    • s3 8x stdsci, 1x summit+reg --
    • s4 6x stdsci, 1x summit+reg --
    • replace w/ 6x m0+1, 1x x1 ; 1x m0+1; 1x m0+1

Saturday : 2016.08.06

Sunday : 2016.08.07