Data Transfer Summaries and Experiences -- Release 2009.07.07

  • CfA/Harvard: using the IPP data distribution protocol with a local Python client on Odyssey, 1.4 TB in about 1 day. The limiting factor has been issues with the receiving disk.
  • JHU: using the IPP data distribution client and 6-7 controllers, obtained ~750 GB in 24 hours (~2.5 days to complete 2 TB of data). Typical total network throughput to the system was ~6-7 MB/s, often peaking at the ~12.7 MB/s maximum expected on the normal JHU building network. This is similar to what we see with transfers from Harvard/Odyssey and currently meets the needs at JHU.
  • Garching: 2 TB in less than 2 days using the IPP data distribution client.
  • Edinburgh: awaiting delivery of 20 TB of disk.
  • Durham: 290 GB in about 1.5 days, or roughly 2 MB/s, using a recursive wget (no IPP client, no Python script); a minimal sketch of this approach follows the list.
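
The Durham transfer above used nothing more than a recursive wget against the data store. The sketch below wraps the same kind of wget invocation in Python; the base URL, destination directory, and the assumption that the release is exposed as plain HTTP directory listings are hypothetical placeholders, not actual data store paths.

    #!/usr/bin/env python
    """Sketch of the Durham-style approach: mirror a release over plain HTTP
    without the IPP distribution client.  BASE_URL and DEST_DIR are
    hypothetical -- substitute the real data store address and a local disk
    with enough free space."""

    import os
    import subprocess

    BASE_URL = "http://datastore.example.edu/ps1/release_2009.07.07/"  # hypothetical
    DEST_DIR = "/data/ps1/incoming"                                    # hypothetical

    def mirror(base_url, dest_dir):
        """Recursively mirror base_url into dest_dir with wget.
        -r   recursive retrieval
        -np  never ascend to the parent directory
        -nH  do not create a host-name directory level
        -c   resume partially downloaded files after an interruption
        -P   write everything under dest_dir
        """
        os.makedirs(dest_dir, exist_ok=True)
        subprocess.check_call(
            ["wget", "-r", "-np", "-nH", "-c", "-P", dest_dir, base_url]
        )

    if __name__ == "__main__":
        mirror(BASE_URL, DEST_DIR)

The -c flag is the main convenience over a one-shot wget: an interrupted transfer can be restarted without re-fetching the files already on disk.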

WARNINGS from Bill 2009 July 14

  • Some runs have inadvertently been posted to the data store multiple times, so some errors at import time are to be expected.
  • The receive client's script receive_file.pl isn't very careful about cleaning up temporary files, and some people have run out of disk space as a result; a cleanup sketch follows this list.
  • There were some errors in the code that reports fileset errors. Update ippScripts/scripts/receive_fileset.pl.
  • There was a bug in the fileset workflow that could cause errors to be reported. Fixes are in ippTools/share/receivetool_pendingfilest.sql and ippTools/share/receivetool_pendingfile.sql.
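
Until receive_file.pl cleans up after itself, a periodic sweep of the receive area can keep the disk from filling. The sketch below is one way to do that; the receive directory, the ".tmp" suffix, and the one-day age cutoff are assumptions, not the names or conventions receive_file.pl actually uses, so check what the script really leaves behind before deleting anything.

    #!/usr/bin/env python
    """Sketch of a stale temporary-file sweep for a receive area."""

    import os
    import time

    RECEIVE_DIR = "/data/ps1/incoming"   # hypothetical receive area
    TMP_SUFFIX = ".tmp"                  # hypothetical temporary-file suffix
    MAX_AGE_SECONDS = 24 * 3600          # only touch files idle for a day

    def sweep(root, suffix, max_age):
        """Walk root and remove files ending in suffix older than max_age.
        Returns the number of bytes freed."""
        now = time.time()
        freed = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(suffix):
                    continue
                path = os.path.join(dirpath, name)
                if now - os.path.getmtime(path) > max_age:
                    freed += os.path.getsize(path)
                    os.remove(path)
        return freed

    if __name__ == "__main__":
        freed = sweep(RECEIVE_DIR, TMP_SUFFIX, MAX_AGE_SECONDS)
        print("freed %.1f MB of stale temporary files" % (freed / 1e6))

The age cutoff is there so the sweep never removes a temporary file that an in-progress transfer is still writing.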