PS1 IPP Czar Logs for the week 2015.07.13 - 2015.07.20

Monday : 2015.07.13

  • 20:00 MEH: no nightly due to humidity, starting ipp-20141024 baseline MOPS test chunk for new ops tag comparison in stdsci -- OSS.20140603.mehtest141024

Tuesday : 2015.07.14

  • 08:53 MEH: nightly looks to finally be done, queuing up the baseline diffims for MOPS test chunk -- OSS.20140603.mehtest141024
  • 13:30 MEH: using stdscience nodes for MOPS test chunk
  • 20:40 EAM: stopping ipp pantaskses to restart.

Wednesday : 2015.07.15

  • 10:25 EAM: running rsync of ipp051.0 data from stsci nodes to /data/ipp080.0/ipp051.0 -- this should take 24 hours, but only impacts ipp080 significantly
  • 14:20 CZW: staticsky update processing seems a bit sluggish. It's crossed the 100k boundary, so I'm going to stop and restart the pantasks server.

Thursday : 2015.07.16

  • 09:17 CZW: power cycle ipp078, which had become unresponsive.
  • 10:16 CZW: moved staticsky updates still undone with galactic latitude less than 20 and ra < 300 to .bulge label. This should allow things that don't need all the memory to complete.
  • 15:30 MEH: restarting MOPS test chunk in ippmops using the stdscience nodes
  • 16:15 HAF: restart of pantasks
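
The 10:16 relabeling rule can be sketched as a simple predicate. This is only an illustration, not the query that was actually run: the function names and the example base label are hypothetical, and "galactic latitude less than 20" is read here as |b| < 20 degrees, which is an assumption.

```python
def is_bulge(glat_deg, ra_deg, glat_max=20.0, ra_max=300.0):
    """True if a pending update falls in the region moved to the .bulge label.

    Assumption: the latitude cut is on |b| (degrees), not on signed b.
    """
    return abs(glat_deg) < glat_max and ra_deg < ra_max


def relabel(label, glat_deg, ra_deg):
    """Return the label with a .bulge suffix for bulge-region entries.

    'label' is whatever label the entry currently carries; no real
    label names are assumed here.
    """
    if is_bulge(glat_deg, ra_deg) and not label.endswith(".bulge"):
        return label + ".bulge"
    return label


# Hypothetical entries: (label, galactic latitude, ra), all in degrees.
pending = [("example.label", 5.0, 270.0), ("example.label", 45.0, 100.0)]
for label, glat, ra in pending:
    print(relabel(label, glat, ra))
```

Splitting on a predicate like this keeps the memory-hungry low-latitude entries under their own label, so the remaining entries can be scheduled without waiting on them.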

Friday : 2015.07.17

Saturday : 2015.07.18

  • 20:50 EAM: stopping and restarting all ipp pantasks for night-time efficiency
  • 21:00 EAM: the following disttool command is taking a very long time to complete (>30 min). this is probably blocking some of the gpc1 db.
    disttool -toadvance -dbname gpc1 -limit 200 -label ThreePi.WS.nightlyscience -label OSS.WS.nightlyscience -label SNIa.nightlyscience -label SNIa.WS.nightlyscience
  • 21:15 EAM: nightly science pantasks are all running. it was again challenging to get distribution running. the pantasks server started OK, but it was not responding to client requests. My best guess is that, although the old server was no longer listed in the process list, the kernel still maintained something to keep a connection with the (still running) disttool command above. as a result, there was a socket connection hanging around which the client would connect to, but not be able to do any real interactions with. things were fine once I killed off the disttool above. This probably explains the failure to restart distribution the other night as well.
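
The 21:15 symptom (a client that can connect but gets no useful reply) can be told apart from a server that is simply down with a quick socket probe. The following is a minimal sketch under stated assumptions: it treats the server as a plain TCP endpoint, the `probe` helper is hypothetical and not part of pantasks, and the single newline it sends is a placeholder rather than a real pantasks command.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify a TCP endpoint as 'unreachable', 'silent', 'closed', or 'responsive'.

    'silent' means the connection was accepted but no data arrived before
    the timeout, which is how a stale socket would look to a client.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"  # refused, timed out, or host not reachable
    with conn:
        conn.settimeout(timeout)
        try:
            conn.sendall(b"\n")  # placeholder nudge, not a pantasks command
            data = conn.recv(1)
        except socket.timeout:
            return "silent"
        except OSError:
            return "unreachable"
        # An empty read means the peer closed the connection cleanly.
        return "responsive" if data else "closed"
```

A result of "silent" on the server port, while the server process itself looks healthy, would be consistent with a leftover connection like the one held by the long-running disttool command.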

Sunday : 2015.07.19

  • 23:00 HAF: gpc1 is sloooow, processing is stalling... not sure how to fix. Emailed ippdev, texted Gene. Stopped addstars (not sure how helpful that will be, they have been the same since Tuesday).