PS1 IPP Czar Logs for the week 2015.09.28 - 2015.10.04

Extra/Non-standard Processing

Continuing HAF's suggestion to improve communication: a list in the czar pages of additional (non-standard) processing, so that we all know what's going on.

Daily Czaring:

  • currently a modified ops tag is running diffs (WS labels only) as ippmops under ~ippmops/src/stdscience_ws on ippc29. If there are problems (Njobs > 100k, power loss on ippc29, etc.), it needs to be restarted like a normal nightly processing pantasks:
    ./start_server.pl stdscience_ws
    
    • even with the modified ops tag, various files are still MIA in nebulous, so the daily czar needs to check for and clear them, as discussed before

MD processing:

  • ippmd/stdscience is running WS diffs without writing images, using ippx065-x100 (hosts_xmd); stop as necessary, but always communicate when doing so


Monday : 2015.09.28

  • 09:00 EAM: restarted mysql on ippc19 with replication from ippc17 (235k seconds behind)
  • 13:15 MEH: using the stdsci 4x-host c2 nodes (turned off in stdsci)
    • stopping cleanup pantasks for a few hours to avoid any large disk changes while watching this processing
  • 17:45 MEH: returning the c2 nodes to stdscience (turned back on) for nightly processing; switching MD PV3 WS diffs to the x065-x100 nodes (see the extra processing note above once setup is finalized)

Tuesday : 2015.09.29

  • 07:35 MEH: 5 faulted diffs to clear (NaN input and reference FWHM; VectorFitPolynomial1DOrd (psMinimizePolyFit.c:633): unknown psLib error)
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1943.064 -diff_id 1206556 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1864.067 -diff_id 1206566 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1945.045 -diff_id 1206572 -fault 0
    
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1944.003 -diff_id 1206544 -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 42 -skycell_id skycell.1945.039 -diff_id 1206583 -fault 0
    
  • 11:10 MEH: clearing old WS diffs; diff revert turned off (diff.revert.off) while doing so
    • necessary files were incompletely updated, with the warp faulted (fault 3)
    • faulting diff skycells did not have quality set to 66; setting now
    • 139 skycells lost, 73 exposures finally cleared
  • 11:15 MEH: making modifications to ippmops:stdscience and working on restarting all pantasks
  • 12:15 MEH: removing addstar/LAP and detrend from the servers listed in ippMonitor, as requested at the IPP meeting; trying to add stdscience_ws

Wednesday : 2015.09.30

  • 09:00 MEH: new ippqub account set up by Gavin for QUB work; in the near future, the WS pantasks will move from ippmops to this account
  • 12:00 MEH: ipp017 down again; power cycled and back up. If it goes down again, it will be left down and off.
    • down again ~12:50; cannot connect to console (stalls at the password prompt), asked Haydn to power it off
  • 14:45 MEH: Haydn rebooting ipp096 to move it onto 10G

Thursday : 2015.10.01

  • 15:00 CZW: I'm running various diskspace/nebulous checks on stare04. They should not impact processing significantly, but if the nebulous database gets unhappy, feel free to kill them. They are perl jobs named "measure_stacks.pl".

Friday : 2015.10.02

  • 14:05 MEH: stopped ippmops:stdscience_ws while using the c2 nodes in a local pantasks to finish some extra processing from last night; mostly done, set to run again

Saturday : 2015.10.03

  • 15:15 EAM: restarted gmond on ipp079

Sunday : 2015.10.04