Version 19 (modified by heather, 6 years ago)


Ingest of LAP into PSPS

This wiki page describes the ingest of LAP processing (specifically, LAP.ThreePi.20120706) into PSPS. There are several steps, not all of which are complete.

Processing of LAP cam / skycal

This is currently ongoing. The first slice to do is between 18hr and 20hr (which contains Kepler); this is called slice18 (see the description of the slices below). Once all the projection cells within slice18 (or slicexx) are done, we start building the minidvodbs.

Queue LAP cam / skycal

Once all the cam/skycals are processed within slice18, they will be queued to be built as a pair of minidvodbs:

These are not typical minidvodbs in the sense that they will not be restricted by the 500-entries or <1-day-old rules. This may change, however, depending on how things go.

Some minor tasks:

  • add a feature to addtool to queue by area of sky (this is apparently already done)
  • tweak the minidvodb script so that we don't create new minidvodbs every day or every 500 entries

Build LAP cam skycal by slices

  • once queued, this is straightforward. One possible tweak is to build the minidvodbs across the cluster (not just on ipp004/ipp005)

Merge dvodb into big dvodb

  • once the slices are built, we merge them into a big parallel dvodb and run the resort/averages/relphot commands; whether to run them on the slice in question or on the whole thing is still an open question:
    setphot -D CATDIR $catdir -update -ubercal -no-metadata ~ipp/ucalqw_noref.fits
    relphot -D CATDIR $catdir -statmode WT_MEAN -max-density 2000 -basic-image-search -update g,r,i,z,y -nloop 16 -boundary-tree ~ipp/RINGS.V3.tree.v2.fits -D CAMERA gpc1
    relastro -D CATDIR $catdir -update-objects -reset -update -D CAMERA gpc1
    

IPPtoPSPS creates batches based on slices

  • this happens once the merge is done for that slice and, at a minimum, the relphot/addstar commands have been run on that slice. To avoid boundary problems, the ippToPSPS range will be 3 degrees smaller on all edges (see the slices below)
  • the intent is to load neighboring slices. So, after slice18 is loaded, we load slice16, and the ranges within ipptopsps would be:
    • ra: 243 - 297 deg
    • dec: -33 to 75 deg
    • this is the combination of slice16 and slice18 with a 3 degree border, and I propose that we be very conservative with the cap.
  • task: need ipptopsps to handle parallel dvodb
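The 3-degree inset arithmetic above can be sketched as follows (a minimal illustration; `ipptopsps_ra_range` is a hypothetical helper, not part of the ippToPSPS tooling, and the real ranges are set by hand in the slice table):

```python
# Sketch of the ippToPSPS RA range calculation: take the RA extent of two
# adjacent slices and pull each edge in by the 3-degree border.
BORDER = 3  # degrees trimmed from each RA edge to avoid boundary problems

def ipptopsps_ra_range(ra_min_deg, ra_max_deg, border=BORDER):
    """Inset an RA range by `border` degrees on each edge."""
    return (ra_min_deg + border, ra_max_deg - border)

# slice16 (240-270 deg) plus slice18 (270-300 deg) span 240-300 deg;
# insetting by 3 deg gives the 243-297 deg range quoted above.
print(ipptopsps_ra_range(240, 300))  # -> (243, 297)
```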

PSPS loads batches

  • nothing to say here; Conrad or Thomas can fill in details

Slices

The slices are listed below, labeled by Heather. The goal is to load connected slices (i.e. if slice18 is first, then load either slice20 or slice16 next).

| slice     | load order | RA (hr)  | Dec (deg) | RA (deg)   | Dec (deg) | ippToPSPS RA (deg) | ippToPSPS Dec (deg) | notes                                          |
|-----------|------------|----------|-----------|------------|-----------|--------------------|---------------------|------------------------------------------------|
| slice0    |            | 0 to 2   | -30 to 90 | 0 to 30    | -30 to 90 | 3 to 27            | -33 to 75           |                                                |
| slice2    |            | 2 to 4   | -30 to 90 | 30 to 60   | -30 to 90 | 33 to 57           | -33 to 75           |                                                |
| slice4    |            | 4 to 6   | -30 to 90 | 60 to 90   | -30 to 90 | 63 to 87           | -33 to 75           |                                                |
| slice6    |            | 6 to 8   | -30 to 90 | 90 to 120  | -30 to 90 | 93 to 117          | -33 to 75           |                                                |
| slice8    | 8*         | 8 to 10  | 20 to 90  | 120 to 150 | -30 to 90 | 123 to 147         | -33 to 75           |                                                |
| slice8.2  | 7*         | 8 to 10  | -30 to 20 | 120 to 150 | -30 to 90 | 123 to 147         | -33 to 75           |                                                |
| slice10   |            | 10 to 12 | 20 to 90  | 150 to 180 | -30 to 90 | 153 to 177         | -33 to 75           |                                                |
| slice10.2 | 6*         | 10 to 12 | -30 to 20 | 150 to 180 | -30 to 90 | 153 to 177         | -33 to 75           |                                                |
| slice12   |            | 12 to 14 | 20 to 90  | 180 to 210 | -30 to 90 | 183 to 207         | -33 to 75           |                                                |
| slice12.2 | 5*         | 12 to 14 | -30 to 20 | 180 to 210 | -30 to 90 | 183 to 207         | -33 to 75           |                                                |
| slice14   |            | 14 to 16 | 20 to 90  | 210 to 240 | -30 to 90 | 213 to 237         | -33 to 75           |                                                |
| slice14.2 | 4*         | 14 to 16 | -30 to 20 | 210 to 240 | -30 to 90 | 213 to 237         | -33 to 75           |                                                |
| slice16   |            | 16 to 18 | 20 to 90  | 240 to 270 | 20 to 90  | 243 to 267         | 23 to 75            |                                                |
| slice16.2 | 3*         | 16 to 18 | -30 to 20 | 240 to 270 | -30 to 20 | 243 to 267         | -33 to 17           |                                                |
| slice18   | 1          | 18 to 20 | 20 to 90  | 270 to 300 | 20 to 90  | 273 to 297         | 23 to 75            | Kepler; below is the galactic center           |
| slice18.2 | 2*         | 18 to 20 | -30 to 20 | 270 to 300 | -30 to 20 | 273 to 297         | -33 to 17           | worry about the overlap between the 2 slice18s |
| slice20   |            | 20 to 22 | -30 to 90 | 300 to 330 | -30 to 90 | 303 to 327         | -33 to 75           |                                                |
| slice22   |            | 22 to 24 | -30 to 90 | 330 to 360 | -30 to 90 | 333 to 357         | -33 to 75           |                                                |

* means this is what is currently planned; it can change prior to the actual load. The entries without a * are the slices already loaded (and their load order).

The idea is that Heather will load into the dvodb all the exposures (cam stage) with a boresight within the ra/dec range, plus all the projection cells within the ra/dec range for a given slice. This means there will certainly be data outside the edges of the slices within a minidvodb slice; this is ok.
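That selection rule can be sketched as a simple box test (hypothetical helper names; the real selection is done by the addtool queries shown below):

```python
def boresight_in_slice(ra_deg, dec_deg, slice_bounds):
    """Return True if an exposure boresight falls inside a slice's
    (ra_min, ra_max, dec_min, dec_max) box, all in degrees."""
    ra_min, ra_max, dec_min, dec_max = slice_bounds
    return ra_min <= ra_deg < ra_max and dec_min <= dec_deg < dec_max

SLICE18 = (270, 300, 20, 90)  # the Kepler slice, from the table above

# An exposure at RA 285, Dec 45 is inside slice18; one at RA 265 is not.
print(boresight_in_slice(285, 45, SLICE18))  # -> True
print(boresight_in_slice(265, 45, SLICE18))  # -> False
```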

Here is an image of the proposed slice order (with the current skycal progress from 2013-01-15):

First slice

The first slice is slice18, but not all of it (we will only queue from dec 20 deg up):

The command to queue:

addtool -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice18.cam -set_minidvodb -set_label LAP.slice18.cam 
-stage cam -uncensored -ra_min 270 -ra_max 300 -decl_min 20 -decl_max 90 -dbname gpc1

and then I added the minidvodb:

add.minidvodb    LAP.slice18.cam LAP.slice18.cam LAP.20120706 5 GPC1
add.label LAP.slice18.cam

The rest takes care of itself. I will probably add skycal tomorrow, once I verify that this part works fine.

Here is the progress of the ingest into addstar (cam stage):

| time               | todo | done |
|--------------------|------|------|
| 12/05/2012 1:30 pm | 7117 | 1932 |

I set things outside of Kepler to state 'wait':

update addRun left join addProcessedExp using (add_id) join camRun on stage_id = cam_id  join chipRun using (chip_id) join rawExp using (exp_id) set addRun.state = 'wait'  where stage = 'cam' and minidvodb_group = 'LAP.slice18.cam'  and dtime_addstar is null and addRun.state  = 'new' and (decl < 33/180*3.1415 or decl > 58/180*3.1415);
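Note that ra/decl are stored in radians in the gpc1 database, which is why the query converts the degree cut-offs with `decl < X/180*3.1415`. A quick check of that conversion (plain Python sketch; `deg_limit_to_rad` is just an illustrative name):

```python
import math

def deg_limit_to_rad(deg):
    """Convert a degree cut-off to radians, as the SQL does with X/180*pi."""
    return deg / 180.0 * math.pi

# The query's approximation 3.1415 differs from math.pi only in the 5th
# decimal place, which is negligible for a 33-58 degree declination window.
approx = 33 / 180 * 3.1415
exact = deg_limit_to_rad(33)
print(abs(approx - exact) < 1e-4)  # -> True
```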

Kepler (cam) is done, so I set those to process.

I have queued the skycal after some modifications to addstar:

 addtool -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 -set_minidvodb_group LAP.slice18.skycal -set_minidvodb -set_label LAP.slice18.skycal -stage skycal -ra_min 270 -ra_max 300 -decl_min 20 -decl_max 90 -dbname gpc1 -uncensored -simple

That is 33k skycells. I add that to addstarlap:

add.minidvodb    LAP.slice18.skycal LAP.slice18.skycal LAP.20120706 5 GPC1
add.label LAP.slice18.skycal

Statistics on slice18

Here are the statistics on slice18. I wanted to see how ingesting slice18 compares to ingesting the same area of sky with various older surveys.

The SQL query:

mysql> select minidvodb_group, stage,count(*), sum(dtime_addstar)/60/60/24., avg(dtime_addstar) from addProcessedExp join addRun using (add_id) join camRun on cam_id = stage_id join chipRun using (chip_id) join rawExp using (exp_id) where stage = 'cam' and addRun.state = 'full' and ra > 270*3.14/180 and ra < 300*3.14/180 and decl > 20*3.14/180 and decl < 90*3.14/180 group by minidvodb_group;

The result

+-----------------------------+-------+----------+------------------------------+--------------------+
| minidvodb_group             | stage | count(*) | sum(dtime_addstar)/60/60/24. | avg(dtime_addstar) |
+-----------------------------+-------+----------+------------------------------+--------------------+
| (null)                      | cam   |      100 |            0.066631944444444 |              57.57 | 
| CNP.V2                      | cam   |      154 |            0.026168981376621 |    14.681818122988 | 
| LAP.slice18.cam             | cam   |     8951 |              7.0359143898167 |    67.914535055319 | 
| LAP.ThreePi.20110809        | cam   |     5288 |              4.6303240664607 |    75.654311524622 | 
| LAP.ThreePi.20110809.V2.Cam | cam   |     6532 |              4.4052546205013 |    58.269136437739 | 
| ThreePi                     | cam   |     2547 |              1.3785300813919 |    46.762857884672 | 
| ThreePi.V2                  | cam   |      639 |             0.27396990687759 |    37.043818394716 | 
| ThreePi.V3                  | cam   |     6095 |              3.0291319420299 |    42.939622607282 | 
+-----------------------------+-------+----------+------------------------------+--------------------+

So what can we take from this? First of all, it reports ~7 days to ingest almost all of LAP.slice18 (about 40 exposures are still pending). It feels longer than that, more like 1.5 weeks. Heather suspects this is because the addProcessedExp table only records dtime_addstar, not dtime_total; so the true time should be longer, though I'm not sure by how much. However, this effect applies to all groups equally (they are all processed the same way). What we can see is that slice18 has so far taken the longest total time, but its average time is similar to LAP (V2 and 20110809). So it is a bit slower than ThreePi.V3 (I'm not sure why), but not that much slower. One thing that might cause problems is that ipp005 was both creating the dvodb and serving out files for ipptopsps - does this slow it down?
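As a quick sanity check on the table (not part of the ingest tooling): the sum and average columns are consistent with each other, since count x avg / 86400 reproduces the sum-in-days column.

```python
# Check the LAP.slice18.cam row: 8951 exposures at ~67.9 s each.
count = 8951
avg_dtime_s = 67.914535055319   # avg(dtime_addstar), in seconds
sum_days = 7.0359143898167      # sum(dtime_addstar)/60/60/24

recomputed = count * avg_dtime_s / (60 * 60 * 24)
print(abs(recomputed - sum_days) < 1e-4)  # -> True
```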

slice 18.2

This is the bottom half of slice 18. To queue the different parts:

addtool -pretend -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice18.part2.cam -set_minidvodb -set_label LAP.slice18.part2.cam 
-stage cam -uncensored -ra_min 270 -ra_max 300 -decl_min -35 -decl_max 20 -dbname gpc1 

Warning: I queued way more skycal than I intended for slice18.skycal - if you use decl instead of dec, addtool doesn't complain, but it also doesn't respect it. I set that label to drop.

This is ok because the 'spiffing' happens on the merged dvodb, so it will have all parts for the lower half of slice18.

addtool -pretend -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice18.part2.skycal -set_minidvodb -set_label LAP.slice18.part2.skycal 
-stage skycal -uncensored -ra_min 270 -ra_max 300 -dec_min -35 -dec_max 20 -dbname gpc1

I am queueing LAP.slice18.part2.skycal (15:00 on 2013-01-15). First, though, I am asking -pretend how many there are... it's taking more than 3 seconds, so I'm going to go to the colloquium.

OK, it worked, after some problems. First: use skycal as the stage, not staticsky. Second: I had already queued it before, oops. So this time there are only 6000 to process; that's probably ok.

slice 16.2

This is the next slice to be worked on. First step: queue the camera stage, as staticsky is not yet done.

addtool -pretend -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice16.part2.cam -set_minidvodb -set_label LAP.slice16.part2.cam 
-stage cam -uncensored -ra_min 240 -ra_max 270 -decl_min -35 -decl_max 20 -dbname gpc1 

slice 8

Queue the cam stage for slice 8 (2013-01-18):

addtool -pretend -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice08.cam -set_minidvodb -set_label LAP.slice08.cam 
-stage cam -uncensored -ra_min 120 -ra_max 150 -decl_min 20 -decl_max 75 -dbname gpc1 

slice 8.2

Queue the cam stage for slice 8.2 (2013-01-18):

addtool -pretend -definebyquery -label LAP.ThreePi.20120706 -set_dvodb LAP.20120706 
-set_minidvodb_group LAP.slice08.part2.cam -set_minidvodb -set_label LAP.slice08.part2.cam 
-stage cam -uncensored -ra_min 120 -ra_max 150 -decl_min -35 -decl_max 20 -dbname gpc1 
