PS1 IPP Czar Logs for the week 2014.06.16 - 2014.06.22

Monday : 2014.06.16

Mark is czar

  • 09:10 MEH: duplicate warp_id for a couple of exposures is confusing the OSS.WS diffims -- Bill notes that the timeout in the faketool errors in the stdsci logs is the probable cause
  • 09:15 Serge also reports that the required WW diffims for MOPS are messed up (made as v2-v3 and not as v1-v2, v3-v4) -- will need to requeue manually
  • 11:00 MEH: requeued diffims for stdsci processing after turning off all s0,s1 nodes being used for other processing by Gene
    | pub_id | client_id | stage_id | label              | state |
    +--------+-----------+----------+--------------------+-------+
    | 897818 |         5 |   566155 | OSS.nightlyscience | full  | 
    | 897819 |         5 |   566156 | OSS.nightlyscience | full  | 
    | 897820 |         5 |   566157 | OSS.nightlyscience | full  | 
    | 897821 |         5 |   566158 | OSS.nightlyscience | full  | 
    
  • 12:00 MEH: regular restart of pantasks for the week
  • 12:25 MEH: ippc02 /tmp had ~6.5G available, flushed nebulous_server.log and restarted apache. Others have ~13G available
  • 12:40 MEH: ippc18 homedir needs cleanup (~18G available) -- starting archive of pantasks logs
  • 17:45 CZW: Promoted images in state compressed to state goto_lossy and enabled lossy compression in cleanup in an attempt to recover disk space.
  • 20:10 MEH: system overloaded: 4x 4-threaded loading of stacks on systems w/ 8 cores that are also needed for summitcopy+registration+stdscience. Registration taking >1ks, stopping to clear. stdsci stopped until stacks clear
  • 20:30 back on

Tuesday : 2014.06.17

  • 16:11 HAF: I stopped the slave on ipp001 to ingest a copy of gpc1 (this is for multiple purposes: to have a spare gpc1 database to play with for forcedwarp addstar, and to recover from my own stupidity involving updating a minidvodb table yesterday...)

Wednesday : 2014.06.18

  • 10:20 MEH: starting MOPS updates for diffim tests using stdsci -- finished
  • 11:30 MEH: using compute3 nodes (group c2?) for local processing of MOPS diffims since I want a very low S/N cut and suspect it will clobber memory -- only 1/3 pass; changes related to the stats code between ipp-20130712 and ipp-20140506 are causing a massive number of fault 5 -- finished (for now)
  • 14:45 MEH: publishing for MOPS test may use quite a bit of memory, will keep an eye on it since it is running on s0,1 nodes.. no good if 2-3 land on one node (4x is possible, which will be bad); moving to compute3 nodes later in the afternoon.
  • 15:45 CZW: restarting pantasks servers/processing after Haydn rebooted ipp033 to try to get the memory recognized.
  • 16:30 MEH: attempting to publish MOPS test diffims using compute3
  • 17:00 CZW: pushing poll up to 120 in cleanup. The lossycomp jobs are fast, and so pantasks spends a lot of time idle.
  • 21:00 MEH: continuing to use compute3 nodes for publishing

Thursday : 2014.06.19

  • 10:30 MEH: will need to use the compute3 nodes to redo test for MOPS
  • 13:00 MEH: finished using compute3 for now

Friday : 2014.06.20

Gene is czar

  • 10:15 MEH: at the request of the czar, cleared fault 5 diffims
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.0771.031 -diff_id 568346  -fault 0
    difftool -dbname gpc1 -updatediffskyfile -set_quality 14006 -skycell_id skycell.0772.034 -diff_id 568362  -fault 0
    

Saturday : 2014.06.21

  • 06:00 EAM: needed to clear a burntool db error (check_burntool --> pending_burntool)
  • 10:30 EAM: ipp052 crashed: wiki:ipp052.crash.20140621, rebooting

Sunday : 2014.06.22