PS1 IPP Czar Logs for the week 2013.06.17 - 2013.06.23

Monday : 2013.06.17

  • 16:10 Bill has a pantasks running out of ~bills/m31/test using the compute3 nodes. There are 40 exposures, so it should finish in a few hours. If not, I will stop it once we start downloading data.

Tuesday : 2013.06.18

mark is czar

  • 07:30 MEH: appears to be a summit fault; trying to clear it, 68 incomplete summitcopy exposures to go. summitcopy is getting 503 errors; email sent to ps-camera and ps-obs asking if this is due to the camera-stuck problem noted in the night summary.
    • summitcopy stopped until fixed; restarting it as well as registration, since it has been a while
    • Sidik is calling the summit to try to get someone to reboot pixel server t30; unknown how long until nightly data will be available
    • he notes o6461g0487o will also be messed up
  • 07:40 MEH: doing a needed restart of stdscience while waiting for downloads; also restarting the other long-running pantasks (distribution, stack, pstamp, publishing)
  • 08:30 MEH: archiving pantasks logs (relocating to ippc18.1 and bzipping) to try to recover space on home disk ippc18.0; not much recovered, only 80G free (9%)
  • 11:10 MEH: summitcopy back on, downloading remaining nightly data
  • 13:00 Gene taking ipp034 out of processing for a few hours of DVO work
  • 13:40 MEH:
    • o6461g0486o is missing at the summit; asking whether it is recoverable
    • o6461g0487o is a junk image, as Sidik noted, and is failing the chip stage, of course, so setting chip to drop and the exposure to ignored
  • 15:00 MEH: many misc files and directories in ~ipp; relocated many of them to /export/ippc18.1/ipp/all_misc_junk_files_should_clean_when_ready. More cleanup should be done, or users should just rm their files when done with tests etc.

Wednesday : 2013.06.19

mark is czar

  • 01:30 MEH: summitcopy seems to be further behind than normal, ~100 exposures, so by dawn likely ~150 still to download.
  • 07:10 MEH: yes, clearly slower, with 140 still left to download. Turning some wave1 hosts back on to try to speed things up.
  • 10:45 MEH: nightly data downloaded and running through (all 3PI), will take <1hr to finish
  • people are looking into the download-rate problem, to be resolved for tonight
  • 15:40 Gene testing bias downloads from the summit after Craig swapped the uplink from the 100Mb/s port to the 1Gb/s port; he is seeing ~39s/imfile.
    • MEH watching the exposure counts over time; looks to be ~1 exposure/minute.
  • 20:00 MEH: cannot monitor processing as czar until 2am

Thursday : 2013.06.20

  • 02:00 MEH: looks like registration has a hanging process on ipp050. There have been similar issues recently on ipp040, 050, 055, 057, and 058; DVO related? Taking 2x ipp050 out of stdscience.
  • 05:00 MEH: adding ipp053 to the list, and ipp040 is at a heavy CPU wait % (ippdvo rsync?), so taking all of them out of stdscience. Also putting it into neb-host repair to help reduce NFS traffic to it, since some burntool files were trying to be written there.
    • registration backlog is now below the summitcopy backlog. Nightly processing needs to be rebalanced with wave3 when running extra tasks.
  • 12:19 Bill: restarted pstamp and update pantasks since they are busy. Added two sets of compute3 to pstamp. Need to remember to undo that before nightly observations start.
  • 14:35 Bill: shut down update pantasks and added the ps_ud labels to stdscience. I will remove them this evening.
  • 18:21 Bill: restarted stdscience, pstamp, and update pantasks with the standard label configuration.
  • 18:26 set pstamp requests with dates prior to June 1 to be cleaned. This will increase load on ippc31, which is where the workdirs are located.

Friday : 2013.06.21

Saturday : 2013.06.22

Sunday : 2013.06.23

  • 06:00 MEH: doing a regular restart of stdscience, now including MD09 nightly diffims with the new refstack
  • 06:40 MEH: setting late-May/early-June MD09 to update and make diffims

