DVO and Large Database Files

2010.03.21 : I have upgraded the Ohana software to handle large (>2^32 byte) files. This mostly involved changing the FITS I/O structures to use fseeko(), ftello(), and off_t types to define file sizes and offsets. I extended the table and image structures to use off_t for NAXIS1,2 as well, for fear that someone will want to define a 2G vector or image.
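A minimal sketch of the large-file pattern described above (this is illustrative, not the actual Ohana FITS I/O code): positioning within a file beyond the 32-bit limit requires fseeko()/ftello() with off_t offsets, and off_t itself must be 64 bits, which on 32-bit platforms means defining _FILE_OFFSET_BITS=64 before any system header.

```c
/* Sketch: large-file positioning with fseeko()/ftello() and off_t.
 * The feature-test macros must precede every #include so that off_t
 * is 64 bits and the *o() functions are declared. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <assert.h>   /* used by the verification below */

/* Seek to an absolute byte offset that may exceed 2^32 and report the
 * resulting position; returns (off_t)-1 on failure.  Note that plain
 * fseek()/ftell() take/return long, which overflows past 2 GB on
 * 32-bit systems; the off_t variants do not. */
off_t seek_big(FILE *fp, off_t offset)
{
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(fp);
}
```

Seeking past EOF on a writable stream is permitted, so the 64-bit path can be exercised without actually writing gigabytes of data.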

* As part of the large-file testing, I generated a very large DVO database catalog with 21M measurements of 141k stars (3.2GB). One interesting data point: running addstar -resort on that file took 85 seconds, of which 37 were spent on the actual resort process and the rest was I/O time (on ipp022).

* Extrapolating the above numbers to the expected PS1 3pi survey (100G detections) implies something like 405k sec (~5 days) to run resort on the entire sky. This is not completely terrible, but it does suggest that we should upgrade addstar -resort to be able to run in parallel. Since much of the time (56%) is spent on I/O, parallelism will only help if the I/O can also be done faster. One (easy) strategy would be to stripe the Declination bands across multiple machines and have each band handled by its host machine. There are 16 Dec bands for the normal 3pi region (-30 : +90), so that could potentially reduce the resort time to ~7 hours instead of ~5 days.
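The scaling arithmetic above can be written out explicitly. This is just a back-of-envelope sketch using the numbers quoted in this note (85 s for 21M measurements, 100G detections, 16 Dec bands), and it assumes resort cost scales linearly with detection count:

```c
/* Back-of-envelope scaling of the addstar -resort timing.
 * Assumes runtime is linear in the number of detections. */
#include <assert.h>   /* used by the verification below */

/* Scale a measured runtime to a larger catalog. */
double scale_runtime_sec(double t_ref_sec, double n_ref, double n_target)
{
    return t_ref_sec * (n_target / n_ref);
}

/* Divide the serial estimate across independent Dec-band hosts,
 * assuming each band is resorted entirely on its own machine. */
double per_band_runtime_sec(double t_serial_sec, int n_bands)
{
    return t_serial_sec / n_bands;
}
```

With the quoted inputs, scale_runtime_sec(85.0, 21e6, 100e9) comes out near 405k seconds (about 4.7 days), and dividing by 16 bands gives roughly 25k seconds, i.e. about 7 hours, matching the estimates above.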