PS1 IPP Czar Logs for the week 2012.02.27 - 2012.03.04

Monday : 2012.02.27

  • 08:30 Mark: stdscience struggling, restarting. some misc camera runs @20ks

Tuesday : 2012.02.28

  • 15:35 CZW: ipp039 was unresponsive and causing large numbers of faults. Cycled the power via the console (no messages).
  • 16:09 CZW: ipp009 refused to communicate with anyone as well, although this time there were messages to be saved. I'm cycling the power on it now.
  • 16:25 CZW: I think everything is back online now and working ok.

Wednesday : 2012.02.29

  • 06:13 Bill repaired 4 XY26 rawImfiles that had a bad copy, and replicated one that had only a single instance on ippb02 (found accidentally by entering the wrong exp_id).
  • 06:20 Bill killed a couple of processes that were stuck for thousands of seconds. It looks like there may be something wrong with the psphot threading: in psphotGuessModels all threads were sleeping, waiting for something to happen and not doing any work. One was a ppSub, the other a ppImage. After killing and reverting, the jobs completed.
  • 06:30 There are many incorrect entries in the stdscience books. Pantasks has lost track of several jobs: according to the database they are full, but they still have entries in the books. Time to restart stdscience.
  • 06:37 Set all data with the ps_ud labels to be cleaned.
  • 09:15 Serge: mysqldump of ippdb03 has repeatedly failed. It seems that two dumps were running at the same time, hence a race condition between the table locks. I killed both running instances. We'll see if we get an error for the noonish dump. I also deleted old databases on ippdb03: nebulous (?, incomplete, Nov. 2011), gpc1_<date in 2011>, as well as my new raw image monitoring stuff.
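
The overlapping dumps in the 09:15 entry are the kind of problem an exclusive lock around the dump job can prevent. The following is a minimal sketch only, not the actual ippdb03 cron setup: the lock file path, the helper names, and the idea of wrapping the dump in a Python script are all assumptions made for illustration.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file location; not taken from the real dump setup.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "ippdb03_mysqldump.lock")


@contextmanager
def single_instance(lock_path=LOCK_PATH):
    """Yield True if this caller holds the lock, False if someone else does."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fail at once instead of queueing.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor also releases the lock if we held it.
        os.close(fd)


def run_exclusive(job, lock_path=LOCK_PATH):
    """Run job() only if no other holder has the lock; return (ran, result)."""
    with single_instance(lock_path) as ok:
        if not ok:
            return False, None
        return True, job()
```

A second dump started while the first still holds the lock would get `(False, None)` back and could simply exit, rather than competing for table locks.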

Thursday : 2012.03.01

Friday : 2012.03.02

  • 08:20 Mark: pausing stare download to start nightly science, using (as ipp in gpc1 on ippdb01):
    update pzDownloadExp set state = 'wait' where state = 'run' and exp_name like '%a';
  • 09:42 Bill is restarting stdscience. pcontrol is spinning.
  • 10:00 Restarted registration. The old errors had fouled up the job counts so status was noisy.
  • 10:15 We've been getting random nebulous lookup failures and nagios messages about free space in / on the c nodes. Bill asked Gavin to roll the Apache log files.
  • 15:20 Mark: nightly science finished downloading, flipping stare data back to download
  • 15:50 also using tweak_ssdiff in stdscience so MD SSdiffs run @1600
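
The pause at 08:20 and the flip back at 15:20 can be sketched as a pair of updates. The pause statement is the one recorded in the log; the resume statement is an assumption, since the log does not record what was actually run. The in-memory SQLite table is only a stand-in for pzDownloadExp in gpc1, and the exposure names and the 'stop' state are made up for the demonstration.

```python
import sqlite3

# Statement from the 08:20 entry.
PAUSE_SQL = (
    "update pzDownloadExp set state = 'wait' "
    "where state = 'run' and exp_name like '%a'"
)
# Assumed inverse (not in the log). Note that it would also release any
# matching exposure that had been put in 'wait' for some other reason.
RESUME_SQL = (
    "update pzDownloadExp set state = 'run' "
    "where state = 'wait' and exp_name like '%a'"
)


def demo():
    """Run pause then resume on a toy table; return counts and states."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table pzDownloadExp (exp_name text, state text)")
    conn.executemany(
        "insert into pzDownloadExp values (?, ?)",
        [
            ("x0001a", "run"),   # made-up name: matches '%a', gets paused
            ("x0002o", "run"),   # made-up name: no match, keeps downloading
            ("x0003a", "stop"),  # made-up state: matches '%a' but is not 'run'
        ],
    )
    paused = conn.execute(PAUSE_SQL).rowcount
    after_pause = dict(conn.execute("select exp_name, state from pzDownloadExp"))
    resumed = conn.execute(RESUME_SQL).rowcount
    after_resume = dict(conn.execute("select exp_name, state from pzDownloadExp"))
    conn.close()
    return paused, after_pause, resumed, after_resume
```

Checking the affected row count after each update, as `demo()` does, is a cheap way to confirm that only the intended exposures changed state.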

Saturday : 2012.03.03

Sunday : 2012.03.04