Changeset 40130

Timestamp:
09/09/17 15:20:24
Author:
eugene
Message:

incorporate comments from Chris W

Files:
1 modified

  • trunk/doc/release.2015/ps1.datasystem/datasystem.tex

    r40071 r40130  
    9393\label{sec:intro} 
    9494 
     95\note{missing figures: analysis elements, DVO schema} 
     96 
    9597The 1.8m Pan-STARRS\,1 telescope is located on the summit of Haleakala 
    9698on the Hawaiian island of Maui.  The wide-field optical design of the 
     
    104106The \PSONE\ camera \citep{2009amos.confE..40T}, known as GPC1, consists of a 
    105107mosaic of 60 back-illuminated CCDs manufactured by Lincoln Laboratory. 
    106 The CCDs each consist of an $8\times8$ grid of $\sim 600\times 600$ 
    107 pixel readout regions, yielding an effective $4800\times4800$ 
     108The CCDs each consist of an $8\times8$ grid of $590 \times 598$ 
     109pixel readout regions, yielding an effective $4846 \times 4868$ 
    108110detector.  Initial performance assessments are presented in 
    109111\cite{2008SPIE.7014E..0DO}.  Routine observations are conducted remotely from the 
    110112Advanced Technology Research Center in Kula, the main facility of the 
    111 University of Hawaii's Institute for Astronomy operations on Maui. 
     113University of Hawaii's Institute for Astronomy (IfA) operations on Maui. 
    112114The Pan-STARRS1 filters and photometric system have already been 
    113115described in detail in \cite{2012ApJ...750...99T}. 
     
    167169%Pan-STARRS Pixel Analysis : Source Detection  
    168170\citet[][Paper IV]{magnier2017.analysis} 
    169 describes the details of the source detection and photometry, including point-spread-function and extended source fitting models, and the techniques for ``forced" photometry measurements.  
     171describes the details of the source detection and photometry, including point-spread-function and extended source fitting models, and the techniques for ``forced'' photometry measurements.  
    170172 
    171173%Magnier et al. 2017 (Paper V)  
     
    202204reducing data from other cameras and telescopes. 
    203205 
    204 \note{overview discussion of Pan-STARRS: the telescope, survey time 
    205   period, surveys.  2 paragraphs.} 
    206  
    207 The Pan-STARRS Image Processing Pipeline consists of a suite of 
    208 software programs and data systems that are designed to reduce 
    209 astronomical images, with a focus on parallelization necessary to 
    210 speed the processing of the large images produced by the GPC1 camera. 
    211 Part of this parallelization is derived from the fact that this camera 
    212 consists of 60 independent orthogonal transfer array (OTA) devices, 
    213 and can therefore be processed simultaneously.  Although there are 
    214 multiple stages that operate on an entire exposure at once, the 
    215 majority of stages operate only on smaller segments of a full exposure 
    216 to allow the processing tasks to be spread over the machines in the 
    217 processing cluster. 
    218  
    219  
    220 \note{fix this summary once outline is solidified} 
    221  
    222 This paper presents a description of the IPP data handling system. 
    223 Section \ref{sec:subsystems} describes the major IPP subsystems that 
    224 underlie the main pipeline, providing a set of common interfaces and 
    225 tools used at multiple stages.  The main processing stages of the 
    226 pipeline are described in Section \ref{sec:stages}, although all 
    227 exposures may not necessarily pass through each of these stages.  The 
    228 hardware systems that have done the processing for the PV3 data 
    229 release are listed in Section \ref{sec:hardware}, with some details 
    230 on the scale of computing needed to reduce this large number of 
    231 exposures.  Finally, Section \ref{sec:discussion} presents a 
    232 discussion of some of the lessons learned in the creation of the IPP, 
    233 and its utility in reducing data from other cameras and telescopes. 
    234  
    235206{\color{red} {\em Note: These papers are being placed on arXiv.org to 
    236207    provide crucial support information at the time of the public 
     
    244215\label{sec:overview} 
    245216 
    246 The Pan-STARRS Data Analysis system consists of many elements to 
    247 support the wide range of activities: archiving and management of the 
     217\subsection{Elements of the Pan-STARRS Data Processing System} 
     218 
     219The Pan-STARRS data analysis system consists of many elements to 
     220support a wide range of activities: archiving and management of the 
    248221raw and processed image files; real-time nightly processing of images 
    249222for transient and moving object science; large-scale re-processing and 
    250223calibration to produce measurements for the science collaboration and 
    251 the wider public; specialized image processing tasks to facilitate 
    252 research and development of the analysis system itself; distribution 
    253 of the resulting data products to various consumers in a variety of 
    254 formats and modes. 
    255  
    256 The Pan-STARRS Data Analysis system is divided internally into several major 
     224the wider public; specialized image processing to facilitate research 
     225and development of the analysis system itself; and distribution of the 
     226resulting data products to various consumers in a variety of formats 
     227and modes. 
     228 
     229The Pan-STARRS data analysis system is divided internally into several major 
    257230components: 
    258231\begin{itemize} 
     
    260233  data analysis tasks needed to support the on-going observations. 
    261234  In this article, we focus only on those aspects used by the off-summit 
    262   analysis stages.  \note{is summit processing discussed anywhere?} 
     235  analysis stages. 
    263236\item Image Processing Pipeline (IPP) : this portion of the data 
    264237  analysis system takes the data from raw pixels on the summit 
     
    295268the summit systems are described by \note{REF?}. 
    296269 
     270\begin{figure*}[htbp] 
     271  \begin{center} 
     272 \includegraphics[width=\hsize,clip]{PS1_Data_Analysis_System_Overview.pdf} 
     273  \caption{\label{fig:analysis.elements} Elements of the Pan-STARRS\,1 
     274    Data Analysis System.  Rectangles represent data analysis steps; 
     275    ellipses represent databases; rounded rectangles represent 
     276    external groups (``customers'').  The arrows show a simplified representation 
     277  of the major flow of data between the analysis stages and data 
     278  processing elements.} 
     279  \end{center} 
     280\end{figure*} 
     281 
     282\subsection{Nightly Processing Analysis Stages} 
     283 
    297284Data analysis to support nightly science operations is driven by two 
    298285main goals: 1) rapid detection of the moving and transient sources to 
     
    309296(\IPPstage{warp}).  Warped images may either be added together 
    310297(\IPPstage{stack}) or used in an image subtraction (\IPPstage{diff}). 
    311 For nightly science operations, images for certain fields such as the 
    312 Medium Deep survey fields \citep[see][]{MDref}, are stacked together 
    313 in nightly chunks, providing deeper detection capability on 1-day 
    314 timescales.  Depending on the survey mode, difference images are 
    315 generated for the nightly stack images (vs a deep stack template) or 
    316 for individual warp images.  In the later case, the warp images may be 
    317 difference against another warp from the same night or against a 
      298As part of nightly science processing, images for certain fields, such 
     299as the Medium Deep survey fields \citep[see][]{MDref}, are stacked 
     300together in nightly chunks, providing deeper detection capability on 
     3011-day timescales.  Depending on the survey mode, difference images are 
     302generated for the nightly stack images (using a deep stack template) 
      303or for individual warp images.  In the latter case, the warp images may 
     304be differenced against another warp from the same night or against a 
    318305reference stack from the appropriate part of the sky. 
    319306 
     307\subsection{Re-processing Analysis Stages} 
     308 
    320309Pan-STARRS has performed several large-scale reprocessings of both the 
    321 Medium Deep and 3pi Survey data for internal consumption.  For the 3pi 
    322 Survey data, we identify these large-scale reprocessings as PV1, PV2, 
    323 and PV3, with PV3 the analysis used for the first public data release, 
    324 DR1.  We also refer to the nightly science analysis of the data as 
    325 PV0.  For these reprocessing stages, the standard steps of chip 
    326 through warp, plus stack and diff are performed, starting from raw 
    327 data, usually using a single homogenous version of the data analysis 
    328 procedures.  PV2 was a special case in which we started from the 
    329 camera level products of PV1 to speed up the turn-around to the 
    330 community.  In addition to the analysis stages listed above which are 
    331 shared with the nightly processing, these large-scale reprocessing 
    332 analyses include additional processing.  A more detailed photometric 
    333 analysis is performed on the stacks, including morphological analysis 
    334 appropriate to galaxies.  The results of the stack photometry analysis 
    335 are used to drive a forced-photometry analysis of the warp images. 
    336 The data products from the camera, stack photometry, and forced-warp 
    337 photometry analysis stages are ingested into the internal calibration 
    338 database (DVO, the Desktop Virtual Observatory) and used for 
    339 photometric and astrometric calibrations. 
     310Medium Deep and $3\pi$ Survey data for internal consumption.  For the 
     311$3\pi$ Survey data, we identify these large-scale reprocessings as 
     312PV1, PV2, and PV3, with PV3 the analysis used for the first public 
     313data release, DR1.  We also refer to the nightly science analysis of 
     314the data as PV0.  For these reprocessing stages, the standard steps of 
     315\ippstage{chip} through \ippstage{warp}, plus \ippstage{stack} and 
     316\ippstage{diff} are performed, starting from raw data, usually using a 
      317single homogeneous version of the data analysis procedures.  PV2 was a 
     318special case in which we started from the camera level products of PV1 
     319to speed up the turn-around to the community.  In addition to the 
      320analysis stages listed above, which are shared with the nightly 
     321processing, these large-scale reprocessing analyses include additional 
     322processing.  A more detailed photometric analysis is performed on the 
     323stacks, including morphological analysis appropriate to galaxies.  The 
     324results of the stack photometry analysis are used to drive a 
     325forced-photometry analysis of the warp images.  The data products from 
     326the camera, stack photometry, and forced-warp photometry analysis 
     327stages are ingested into the internal calibration database (DVO, the 
     328Desktop Virtual Observatory) and used for photometric and astrometric 
     329calibrations (see Section~\ref{sec:DVO}). 
    340330 
    341331\subsection{Data Access and Distribution} 
     
    371361{\bf Stage} & {\bf Primary Table} & {\bf Secondary Table(s)} & {\bf Key} & {\bf Notes} \\ 
    372362\hline 
    373   \ippstage{addstar}      & \ippdbtable{addRun}       & \ippdbtable{addProcessedExp}     & \ippdbcolumn{add_id} & \\ 
     363  \ippstage{summitcopy}   & \ippdbtable{pzDataStore}  &                                  & & Lists locations to check for new exposures.\\ 
     364                          & \ippdbtable{summitExp}    & \ippdbtable{summitImfile}        & \ippdbcolumn{summit_id} & Exposures available at the telescope.\\ 
     365                          & \ippdbtable{pzDownloadExp}& \ippdbtable{pzDownloadImfile}    & & Exposures that are being downloaded.\\ 
     366                          & \ippdbtable{newExp}       & \ippdbtable{newImfile}           & \ippdbcolumn{exp_id} & Exposures that have been saved to IPP cluster.\\ 
     367 
     368  \ippstage{registration} & \ippdbtable{rawExp}       & \ippdbtable{rawImfile}           & \ippdbcolumn{exp_id} & \\ 
     369  \ippstage{chip}         & \ippdbtable{chipRun}      & \ippdbtable{chipProcessedImfile} & \ippdbcolumn{chip_id} & \\ 
    374370  \ippstage{camera}       & \ippdbtable{camRun}       & \ippdbtable{camProcessedExp}     & \ippdbcolumn{cam_id} & \\ 
    375   \ippstage{chip}         & \ippdbtable{chipRun}      & \ippdbtable{chipProcessedImfile} & \ippdbcolumn{chip_id} & \\ 
     371  \ippstage{fake}         & \ippdbtable{fakeRun}      & \ippdbtable{fakeProcessedImfile} & \ippdbcolumn{fake_id} & \\ 
     372  \ippstage{warp}         & \ippdbtable{warpRun}      & \ippdbtable{warpImfile}          & \ippdbcolumn{warp_id} & \\ 
     373                          &                           & \ippdbtable{warpSkyCellMap}      & & Mapping of input chips to projection skycells.\\ 
     374                          &                           & \ippdbtable{warpSkyfile}         & & \\ 
     375  \ippstage{stack}        & \ippdbtable{stackRun}     & \ippdbtable{stackInputSkyfile}   & \ippdbcolumn{stack_id} & \\ 
     376                          &                           & \ippdbtable{stackSumSkyfile}     & & \\ 
     377  \ippstage{staticsky}    & \ippdbtable{staticskyRun} & \ippdbtable{staticskyInput}      & \ippdbcolumn{sky_id} & \\ 
     378                          &                           & \ippdbtable{staticskyResult}     & & \\ 
     379  \ippstage{skycal}       & \ippdbtable{skycalRun}    & \ippdbtable{skycalResult}        & \ippdbcolumn{skycal_id} & \\ 
     380  \ippstage{fullforce}    & \ippdbtable{fullForceRun} & \ippdbtable{fullForceInput}      & \ippdbcolumn{ff_id} & \\ 
     381                          &                           & \ippdbtable{fullForceResult}     & & \\ 
      382                          &                           & \ippdbtable{fullForceSummary}    & & Summary of average parameters over all results.\\ 
     383  \ippstage{diff}         & \ippdbtable{diffRun}      & \ippdbtable{diffSkyfile}         & \ippdbcolumn{diff_id} & \\ 
     384                          &                           & \ippdbtable{diffInputSkyfile}    & & \\ 
    376385  \ippstage{detrend}      & \ippdbtable{detRun}       & \ippdbtable{detRunSummary}       & \ippdbcolumn{det_id} & \\ 
    377386                          &                           & \ippdbtable{detInputExp}         & & \\ 
     
    381390                          & \ippdbtable{detResidExp}  & \ippdbtable{detResidImfile}      & & \\ 
    382391                          & \ippdbtable{detNormalizedExp} & \ippdbtable{detNormalizedImfile} & & \\ 
    383   \ippstage{diff}         & \ippdbtable{diffRun}      & \ippdbtable{diffSkyfile}         & \ippdbcolumn{diff_id} & \\ 
    384                           &                           & \ippdbtable{diffInputSkyfile}    & & \\ 
     392  \ippstage{addstar}      & \ippdbtable{addRun}       & \ippdbtable{addProcessedExp}     & \ippdbcolumn{add_id} & \\ 
    385393  \ippstage{distribution} & \ippdbtable{distRun}      & \ippdbtable{distComponent}       & \ippdbcolumn{dist_id} & \\ 
    386394                          &                           & \ippdbtable{distTarget}          & & \\ 
    387   \ippstage{fake}         & \ippdbtable{fakeRun}      & \ippdbtable{fakeProcessedImfile} & \ippdbcolumn{fake_id} & \\ 
    388   \ippstage{fullforce}    & \ippdbtable{fullForceRun} & \ippdbtable{fullForceInput}      & \ippdbcolumn{ff_id} & \\ 
    389                           &                           & \ippdbtable{fullForceResult}     & & \\ 
    390                           &                           & \ippdbtable{fullForceSummary}    & & Properties about average parameters from all results.\\ 
     395  \ippstage{publish}      & \ippdbtable{publishRun}   & \ippdbtable{publishDone}         & \ippdbcolumn{pub_id} & \\ 
     396                          &                           & \ippdbtable{publishClient}       & & \\ 
    391397  \ippstage{lap}          & \ippdbtable{lapSequence}  & \ippdbtable{lapRun}              & \ippdbcolumn{seq_id} & Sequence of full reprocessing\\ 
    392398                          & \ippdbtable{lapRun}       & \ippdbtable{lapExp}              & \ippdbcolumn{lap_id} & \\ 
    393   \ippstage{publish}      & \ippdbtable{publishRun}   & \ippdbtable{publishDone}         & \ippdbcolumn{pub_id} & \\ 
    394                           &                           & \ippdbtable{publishClient}       & & \\ 
    395   \ippstage{summitcopy}   & \ippdbtable{pzDataStore}  &                                  & & Lists locations to check for new exposures.\\ 
    396                           & \ippdbtable{summitExp}    & \ippdbtable{summitImfile}        & \ippdbcolumn{summit_id} & Exposures available at the telescope.\\ 
    397                           & \ippdbtable{pzDownloadExp}& \ippdbtable{pzDownloadImfile}    & & Exposures that are being downloaded.\\ 
    398                           & \ippdbtable{newExp}       & \ippdbtable{newImfile}           & \ippdbcolumn{exp_id} & Exposures that have been saved to IPP cluster.\\ 
    399  
    400   \ippstage{registration} & \ippdbtable{rawExp}       & \ippdbtable{rawImfile}           & \ippdbcolumn{exp_id} & \\ 
    401399  \ippstage{remote}       & \ippdbtable{remoteRun}    & \ippdbtable{remoteComponent}     & \ippdbcolumn{remote_id} & \\ 
    402   \ippstage{skycal}       & \ippdbtable{skycalRun}    & \ippdbtable{skycalResult}        & \ippdbcolumn{skycal_id} & \\ 
    403   \ippstage{stack}        & \ippdbtable{stackRun}     & \ippdbtable{stackInputSkyfile}   & \ippdbcolumn{stack_id} & \\ 
    404                           &                           & \ippdbtable{stackSumSkyfile}     & & \\ 
    405   \ippstage{staticsky}    & \ippdbtable{staticskyRun} & \ippdbtable{staticskyInput}      & \ippdbcolumn{sky_id} & \\ 
    406                           &                           & \ippdbtable{staticskyResult}     & & \\ 
    407   \ippstage{warp}         & \ippdbtable{warpRun}      & \ippdbtable{warpImfile}          & \ippdbcolumn{warp_id} & \\ 
    408                           &                           & \ippdbtable{warpSkyCellMap}      & & Mapping of input chips to projection skycells.\\ 
    409                           &                           & \ippdbtable{warpSkyfile}         & & \\ 
    410400\hline 
    411401\end{tabular} 
     
    424414successive processing stages to begin their own tasks. 
    425415 
    426 The processing database is colloquially referred to as the `gpc1' 
     416The processing database is colloquially referred to as the ``gpc1'' 
    427417database, since a single instance of the database is used to track the 
    428418processing of images and data products related to the PS1 GPC1 camera. 
    429419This same database engine also has instances (same schema, different 
    430420data) for other cameras processed by the IPP, e.g., GPC2, the test 
    431 cameras TC1, TC3, and the Imaging Sky Probe (ISP). 
     421cameras TC1, TC3, and the Imaging Sky Probe (ISP).  In general, 
      422processing information for different cameras is kept in separate 
      423processing databases; merging of output products takes place in DVO. 
    432424 
    433425Within the processing database, the various processing stages are 
     
    435427primary table which defines the conceptual list of processing items 
    436428either to be done, in progress, or completed.  An associated secondary 
    437 table (or set of tables) lists the details of elements which have been 
    438 processed.  Table \ref{tab: database schema} contains an outline of 
    439 the database schema, showing the relations between tables organized by 
    440 processing stage.  As an example, one critical stage is the 
    441 \ippstage{chip} processing stage (see \S\ref{sec:chip}) in which the 
    442 individual chips from an exposure are detrended and sources are 
    443 detected.  Within the gpc1 database, the primary table is called 
    444 \ippdbtable{chipRun} in which each exposure has a single entry. 
    445 Associated with this table is the \ippdbtable{chipProcessedImfile} 
    446 table, which contains one row for each of the chips 
    447 associated with the exposure (up to 60 for gpc1).  The primary tables, such as 
    448 \ippdbtable{chipRun}, are populated once the system has decided that a 
    449 specific item (e.g., an exposure) should be processed at that stage. 
    450 Initially, the entry is given a state of ``run'', denoting that the 
    451 exposure is ready to be processed.  The low-level table entries, such 
    452 as the \ippdbtable{chipProcessedImfile} entries, are only populated 
    453 once the element (e.g., the chip) has been processed by the analysis 
    454 system.  Once all elements for a given stage, e.g., chips in this 
    455 case, are completed, then the status of the top-level table entry 
    456 (\ippdbtable{chipRun}) are switched from ``run'' to ``full''. 
     429table (or set of tables) lists the details of component elements which 
     430have been processed for each top-level item.  Table \ref{tab: database 
     431  schema} contains an outline of the database schema, showing the 
     432relations between tables organized by processing stage.  As an 
     433example, one critical stage is the \ippstage{chip} processing stage 
     434(see \S\ref{sec:chip}) in which the individual chips from an exposure 
     435are detrended and sources are detected.  Within the gpc1 database, the 
     436primary table is called \ippdbtable{chipRun} in which each exposure 
     437has a single entry.  Associated with this table is the 
     438\ippdbtable{chipProcessedImfile} table, which contains one row for 
     439each of the chips associated with the exposure (up to 60 for gpc1). 
     440The primary tables, such as \ippdbtable{chipRun}, are populated once 
     441the system has decided that a specific item (e.g., an exposure) should 
     442be processed at that stage.  Initially, the entry is given a state of 
     443``run'', denoting that the exposure is ready to be processed.  The 
     444low-level table entries, such as the \ippdbtable{chipProcessedImfile} 
     445entries, are only populated once the element (e.g., the chip) has been 
      446processed by the analysis system.  Once all elements for a given 
      447stage (e.g., chips in this case) are completed, the status of the 
      448top-level table entry (\ippdbtable{chipRun}) is switched from ``run'' 
      449to ``full''. 
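As a purely illustrative sketch of this bookkeeping (the production
processing database is a MySQL instance with a much richer schema, and
the \code{ota} column below is our own shorthand), the primary/secondary
table pair for the \ippstage{chip} stage can be modeled as:
\begin{verbatim}
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Primary table: one row per exposure queued for chip processing.
CREATE TABLE chipRun (
    chip_id INTEGER PRIMARY KEY,
    exp_id  INTEGER NOT NULL,
    state   TEXT NOT NULL DEFAULT 'run'    -- 'run' -> 'full'
);
-- Secondary table: one row per OTA that has actually been processed.
CREATE TABLE chipProcessedImfile (
    chip_id INTEGER NOT NULL REFERENCES chipRun(chip_id),
    ota     TEXT    NOT NULL,
    fault   INTEGER NOT NULL DEFAULT 0,
    quality INTEGER NOT NULL DEFAULT 0
);
""")
\end{verbatim}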
    457450 
    458451If the analysis of an element (e.g., the individual OTA chip) 
     
    467460other hand, if the analysis failed because of a problem with the input 
    468461data, this is noted by setting a non-zero value in a different table 
    469 field, \ippdbcolumn{quality}.  For example, if the chip analysis 
     462field, \ippdbcolumn{quality}.  For example, if the \ippstage{chip} analysis 
    470463failed to discover any stars because the image was completely 
    471464saturated, the analysis can complete successfully (\ippdbcolumn{fault} 
     
    483476of the \ippdbcolumn{fault}s which occur are ephemeral due to current 
    484477conditions of the processing cluster, the processing stages are set up 
    485 to occasionally clear and re-try the faulted entries.  Some faults 
     478to occasionally clear and re-try the faulted entries.  Some \ippdbcolumn{fault}s 
    486479represent software bugs and in the early stages of processing were 
    487480accumulated until the corresponding software issue could be addressed; 
    488481since the start of the PS1 Science Consortium Surveys, these types of 
    489 faults have largely been eliminated.  Thus, automatic processing is 
     482\ippdbcolumn{fault}s have largely been eliminated.  Thus, automatic processing is 
    490483able to keep the data flowing even in the face of occasional network 
    491484glitches or hardware crashes. 
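For illustration, in the toy schema above, the periodic re-try reduces
to clearing the ephemeral fault codes so that those elements are
re-queued (the specific codes shown are hypothetical):
\begin{verbatim}
EPHEMERAL_FAULTS = (2, 3)   # hypothetical codes, e.g. network, disk I/O

def retry_ephemeral_faults(con):
    # Clearing 'fault' makes the scheduler re-run these elements.
    # Rows whose analysis completed but hit a data problem carry
    # fault = 0 with 'quality' set, so they are unaffected here.
    con.execute("UPDATE chipProcessedImfile SET fault = 0 "
                "WHERE fault IN (?, ?)", EPHEMERAL_FAULTS)
\end{verbatim}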
     
    496489As exposures are taken by the PS1 telescope \& GPC1 camera system, the 
    497490data from the 60 OTA devices are read out by the camera software 
    498 wsystem and written to disk on a collection of computers at the summit 
     491system and written to disk on a collection of computers at the summit 
    499492in the PS1 facility called ``pixel servers.'' After the images are 
    500493written to disk, a summary listing of the information about the 
    501 exposure and the chip images are added to the summit datastore. 
     494exposure and the chip images are added to the summit datastore (an 
     495internal http-based data sharing tool, see 
     496Section~\ref{sec:datastore}). 
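Since the datastore interface is plain HTTP, the copy stage reduces to
polling an index and fetching anything new.  A minimal sketch of the
polling idea (the URL and index format here are hypothetical, not the
actual datastore protocol):
\begin{verbatim}
import urllib.request

DATASTORE_URL = "http://summit.example/datastore/index.txt"  # hypothetical

def new_filesets(already_seen):
    """Fetch the datastore index; return fileset names not yet copied."""
    with urllib.request.urlopen(DATASTORE_URL) as resp:
        listed = resp.read().decode().split()
    return [name for name in listed if name not in already_seen]
\end{verbatim}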
    502497 
    503498During night-time operations, while the summit datastore is being 
     
    531526 
    532527Once the chips for an exposure have all been downloaded, the exposure 
    533 is ready to be registered.  In this context, `registration' refers to 
     528is ready to be registered.  In this context, ``registration'' refers to 
    534529the process of adding them to the database listing of known, raw 
    535 exposures (not to be confused with `registration' in the sense of 
    536 pixel re-alignment).  The result of the registration analysis is an 
     530exposures (not to be confused with ``registration'' in the sense of 
     531pixel re-alignment).  The result of the \ippstage{registration} analysis is an 
    537532entry for each exposure in the \ippdbtable{rawExp} table, and one for 
    538533each chip in the \ippdbtable{rawImfile} table.  These tables are 
    539534critical for downstream processing to identify what exposures are 
    540 available for processing in any other stage.  At the registration 
     535available for processing in any other stage.  At the \ippstage{registration} 
    541536stage, a large amount of descriptive metadata for each chip is added 
    542537to the \ippdbtable{rawImfile} table, the majority of which is 
     
    552547 
    553548Unlike most of the other IPP stages, the raw exposures may only 
    554 have a single entry in the registration tables of the processing 
     549have a single entry in the \ippstage{registration} tables of the processing 
    555550database tables (\ippdbtable{rawExp} and \ippdbtable{rawImfile}). 
    556551 
    557 For GPC1, the image registration stage is also the stage at which the 
     552For GPC1, the \ippstage{registration} stage is also the stage at which the 
    558553\ippprog{burntool} analysis is run.  This analysis is more completely 
    559554described in \citet{waters2017}.  In brief, the \ippprog{burntool} 
     
    564559observation date and time listed in the headers, with the results 
    565560stored in a text table.  As a result of the sequential nature of this 
    566 analysis, the registration of exposures is blocked until the 
     561analysis, the \ippstage{registration} of exposures is blocked until the 
    567562\ippprog{burntool} has been run on the previous exposures. 
    568563 
    569 Once the registration process has finished, new science exposures that 
    570 have an \ippdbcolumn{obs_mode} value that indicates they are part of 
    571 a particular science survey are automatically launched into the 
    572 science analysis by defining entries for the \ippstage{chip} 
    573 processing stage, as described above.  This analysis can be relaunched 
    574 multiple times, such as for the large scale PV3 reprocessing. 
    575 However, this automatic process ensures the shortest time between 
    576 observation and analysis, which is particularly important in the 
    577 search for transient sources. 
     564Once the \ippstage{registration} process has finished, new science 
      565exposures with an \ippdbcolumn{obs_mode} value indicating that 
      566they are part of a particular science survey are automatically 
     567launched into the science analysis by defining entries for the 
     568\ippstage{chip} processing stage, as described above.  The science 
      569analysis of a given exposure can be relaunched multiple times, such as 
      570for the large-scale PV3 reprocessing.  The automatically-launched 
      571analysis process ensures the shortest time between observation and 
      572analysis, which is particularly important in the search for transient sources. 
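Schematically, this hand-off from \ippstage{registration} to
\ippstage{chip} is a conditional insert into the primary table of the
toy schema above (the \ippdbcolumn{obs_mode} values shown are
hypothetical survey labels):
\begin{verbatim}
SURVEY_OBS_MODES = {"3PI", "MD"}    # hypothetical survey mode labels

def queue_chip_if_survey(con, exp_id, obs_mode):
    # Survey exposures go straight into the chip stage, minimizing
    # the latency between observation and analysis.
    if obs_mode in SURVEY_OBS_MODES:
        con.execute("INSERT INTO chipRun (exp_id, state) "
                    "VALUES (?, 'run')", (exp_id,))
\end{verbatim}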
    578573 
    579574\subsection{Chip Processing} 
     
    619614%% attempts to target the processing for each OTA to the machine on which 
    620615%% the data for that detector is stored.  The output products are then 
    621 %% primarily saved back to the same machine.  This `targetted' processing 
      616%% primarily saved back to the same machine.  This ``targeted'' processing 
    622617%% was an early design choice to minimize the system wide network load 
    623618%% during processing.  In practice, as computer disks filled up at 
     
    647642 
    648643The results of the image processing are then written to disk, 
    649 including the science, mask, and variance images, the background model 
    650 subtracted, the PSF model used in the photometry process, and a FITS 
    651 catalog of detected sources.  Additional binned images of the full OTA 
    652 are also saved, providing $16\times{}16$ and $256\times{}256$ pixel 
    653 binning scales for quick visualization.  The processing log and a 
    654 selection of summary metadata describing the processing results are 
    655 also written to disk.  This metadata is used to populate a row in the 
    656 \ippdbtable{chipProcessedImfile} table (linked to the 
    657 \ippdbtable{chipRun} entry by a shared \ippdbcolumn{chip_id} value) 
    658 to indicate that the processing of this OTA is complete. 
      644including the science, mask, and variance images, the binned 
      645background model that was subtracted, the PSF model used in the 
      646photometry process, and a FITS catalog of detected sources.  Additional binned 
     647images of the full OTA are also saved, using $16\times{}16$ and 
     648$256\times{}256$ pixel binning scales for quick visualization.  The 
     649processing log and a selection of summary metadata describing the 
     650processing results are also written to disk.  This metadata is used to 
     651populate a row in the \ippdbtable{chipProcessedImfile} table to 
     652indicate that the processing of this OTA is complete. 
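The binned quick-look images are simple block averages.  A minimal
sketch with NumPy (the IPP implementation also propagates the mask,
which is omitted here):
\begin{verbatim}
import numpy as np

def block_bin(image, factor):
    """Block-average an image by an integer factor, e.g. 16 or 256."""
    ny, nx = image.shape
    ny, nx = ny - ny % factor, nx - nx % factor    # trim to a multiple
    blocks = image[:ny, :nx].reshape(ny // factor, factor,
                                     nx // factor, factor)
    return blocks.mean(axis=(1, 3))
\end{verbatim}
Applied to a full $4846\times4868$ pixel OTA, factors of 16 and 256
yield thumbnails of roughly $300\times300$ and $19\times19$ pixels.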
    659653 
    660654As each OTA is processed independently of the others across a number 
    661 of computers, the \ippprog{pantasks} managing the jobs periodically 
    662 runs an \ippmisc{advance} task that checks that the number of rows in 
    663 \ippdbtable{chipProcessedImfile} with \ippdbcolumn{fault} equal to 
    664 zero matches the associated number of rows in \ippdbtable{rawImfile}. 
    665 If this condition is met, than all processing for that exposure is 
    666 finished, and the \ippdbcolumn{state} field is set to ``full''.  If 
    667 the \ippdbtable{chipRun}.\ippdbcolumn{end_stage} field is set to 
     655of computers, the \ippprog{pantasks} server managing the jobs 
     656periodically runs an \ippmisc{advance} task that checks that the 
     657number of rows in \ippdbtable{chipProcessedImfile} with 
     658\ippdbcolumn{fault} equal to zero matches the associated number of 
      659rows in \ippdbtable{rawImfile}.  If this condition is met, then all 
     660processing for that exposure is finished, and the \ippdbcolumn{state} 
     661field is set to ``full''.  If the 
     662\ippdbtable{chipRun}.\ippdbcolumn{end_stage} field is set to 
    668663\ippstage{chip}, then no further action is taken.  However, this field 
    669664is usually set to a subsequent stage (most often \ippstage{warp}), 
    670 then an entry for this exposure is added to the \ippdbtable{camRun} 
     665in which case an entry for this exposure is added to the \ippdbtable{camRun} 
    671666table, and processing continues. 
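In the same illustrative schema as above (assuming analogous
\ippdbtable{rawImfile} and \ippdbtable{camRun} tables), the
\ippmisc{advance} logic amounts to:
\begin{verbatim}
def advance_chip(con, chip_id, exp_id, end_stage):
    """Promote a chipRun to 'full' once every raw chip has a
    fault-free result row, then queue the camera stage."""
    (n_raw,) = con.execute(
        "SELECT COUNT(*) FROM rawImfile WHERE exp_id = ?",
        (exp_id,)).fetchone()
    (n_ok,) = con.execute(
        "SELECT COUNT(*) FROM chipProcessedImfile "
        "WHERE chip_id = ? AND fault = 0", (chip_id,)).fetchone()
    if n_ok != n_raw:
        return                                  # work still outstanding
    con.execute("UPDATE chipRun SET state = 'full' WHERE chip_id = ?",
                (chip_id,))
    if end_stage != "chip":                     # usually 'warp'
        con.execute("INSERT INTO camRun (exp_id, state) "
                    "VALUES (?, 'run')", (exp_id,))
\end{verbatim}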
    672667 
     
    710705to help guarantee a solution in the case of a modest pointing error. 
    711706The guess astrometry is used to match the reference catalog to the 
    712 observed stellar positions in the focal plane coordinate system.  Once 
    713 an acceptable match is found, the astrometric calibration of the 
     707observed stellar positions in the focal plane coordinate system 
      708\citep[see][]{magnier2017.calibration}.   
     709 
     710Once an acceptable match is found, the astrometric calibration of the 
    714711individual chips is performed, including a fit to a single model for 
    715712the distortion introduced by the camera optics.  After the astrometic 
     
    720717used to generate synthetic w-band photometry for areas where no 
    721718PS1-based calibrated w-band photometry is available.  For more 
    722 details, see \cite{magnier2017.calibration}.  The result of these calibrations is 
    723 stored as a single multi-extension FITS table containing the results 
    724 from each OTA as a separate extension. 
     719details, see \cite{magnier2017.calibration}.  The result of these 
     720calibrations is stored as a single multi-extension FITS table 
     721containing the results from each OTA as a separate extension. 
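A sketch of writing such a multi-extension table with \code{astropy}
(the column names and toy data are illustrative, not the IPP schema):
\begin{verbatim}
from astropy.io import fits

per_ota = {"XY01": {"ra": [10.12], "dec": [-3.45], "mag": [18.2]}}  # toy

hdus = [fits.PrimaryHDU()]
for ota, cat in per_ota.items():
    cols = [fits.Column(name="RA",      format="D", array=cat["ra"]),
            fits.Column(name="DEC",     format="D", array=cat["dec"]),
            fits.Column(name="CAL_MAG", format="E", array=cat["mag"])]
    hdus.append(fits.BinTableHDU.from_columns(cols, name=ota))

fits.HDUList(hdus).writeto("camera_calibrated.fits", overwrite=True)
\end{verbatim}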
    725722 
    726723In addition to the astrometric and photometric calibrations, the 
     
    740737processed all at once, this step also updates the associated 
    741738\ippdbtable{camRun} entry, linked by the \ippdbcolumn{cam_id}.  As 
    742 with the \ippstage{chip} stage, the 
     739with the \ippstage{chip} stage, if the 
    743740\ippdbtable{camRun}.\ippdbcolumn{end_stage} is for a subsequent 
    744741stage, an appropriate entry is added to the \ippdbtable{fakeRun} 
    745 table. 
    746  
    747 %% \subsection{Fake Analysis} 
    748 %% \label{sec:fake} 
    749 %%  
    750 %% The \ippstage{fake} stage was originally designed to do false source 
    751 %% injection and recovery, in order to determine the detection efficiency 
    752 %% of sources on the exposure.  However, early in the design of the IPP, 
    753 %% this task was moved to the rest of the photometry analysis done at the 
    754 %% \ippstage{chip} stage.  Removing the stage would require significant 
    755 %% changes to the database schema.  As a result, this conveniently named 
    756 %% stage generally does no actual data processing, and consists mainly of 
    757 %% database operations to move the exposure on to the \ippstage{warp} 
    758 %% stage.  The operations mimic the \ippstage{chip} stage, with 
    759 %% individual jobs run for each OTA that update rows in the 
    760 %% \ippdbtable{fakeProcessedImfile}, and an \ippmisc{advance} task that 
    761 %% updates the \ippdbtable{fakeRun} table and promotes the exposure to 
    762 %% the next stage by adding a row to the \ippdbtable{warpRun} table. 
     742table.   
     743 
     744\subsection{Fake Analysis} 
     745\label{sec:fake} 
     746 
     747The \ippstage{fake} stage was originally designed to do false source 
     748injection and recovery, in order to determine the detection efficiency 
     749of sources on the exposure.  However, early in the design of the IPP, 
      750this task was merged into the rest of the photometry analysis done at the 
     751\ippstage{chip} stage.  Removing the stage would require significant 
     752changes to the database schema.  As a result, this conveniently named 
     753stage generally does no actual data processing, and consists mainly of 
     754database operations to move the exposure on to the \ippstage{warp} 
     755stage.  The operations mimic the \ippstage{chip} stage, with 
     756individual jobs run for each OTA that update rows in the 
     757\ippdbtable{fakeProcessedImfile}, and an \ippmisc{advance} task that 
     758updates the \ippdbtable{fakeRun} table and promotes the exposure to 
     759the next stage by adding a row to the \ippdbtable{warpRun} table. 
    763760 
    764761\subsection{Image Warping} 
     
    776773described by a single tangent plane projection, or for larger regions 
    777774which have multiple projection centers.  For the $3\pi$ survey, the 
    778 \ippmisc{RINGS.V3} tessellation was used that used projection centers 
      775\ippmisc{RINGS.V3} tessellation was used, which arranges projection centers 
    779776spaced every four degrees in both RA and DEC, with $0\farcs{}25$ 
    780777pixels.  These projections are further broken down into ``skycells'' 
     
    822819\label{sec:stack} 
    823820 
    824 The skycell images generated by the \ippstage{warp} process are added 
    825 together to make deeper, higher signal-to-noise images in the 
     821The skycell images generated by the \ippstage{warp} process can be 
     822added together to make deeper, higher signal-to-noise images in the 
    826823\ippstage{stack} stage.  These stacked images also fill in coverage 
    827824gaps between different exposures, resulting in an image of the sky 
     
    831828input images.  During nightly science processing, the 8 exposures per 
    832829filter for each Medium Deep field are combined into a set of stacks 
    833 for that field.  These so-called `nightly stacks' are used by the 
     830for that field.  These so-called ``nightly stacks'' are used by the 
    834831transient survey projects to detect faint supernovae, among other 
    835832transient events.  For the PV3 $3\pi$ analysis, all images in each 
     
    840837For the PV3 processing of the Medium Deep fields, stacks have been 
    841838generated for the nightly groups and for the full depth using all 
    842 exposures, producing ``deep stacks''.  In addition, a `best seeing' 
     839exposures, producing ``deep stacks''.  In addition, a ``best seeing'' 
    843840set of stacks have been produced \note{using image quality cuts to be 
    844841  described: need input from MEH}.  We have also generated 
    845 out-of-season stacks for the Medium Deep fields, in which all image 
     842out-of-season stacks for the Medium Deep fields, in which all images 
    846843not from a particular observing season for a field are combined into a 
    847844stack.  These latter stacks are useful as deep templates when studying 
     
    850847season. 
    851848 
    852 When a given set of \ippstage{stack} stage are defined, exposures with 
    853 existing \ippstage{warp} entries that match the filter, position, and 
    854 other criteria such as seeing are grouped by their skycell.  An entry 
     849When a given set of \ippstage{stack} stage processing is defined, 
     850exposures with existing \ippstage{warp} entries that match the filter, 
     851position, and other criteria such as seeing are identified.  An entry 
    855852is then added for each skycell in the \ippdbtable{stackRun} table, 
    856853with the \ippdbcolumn{warp_id} entries for the exposures added to the 
    857854\ippdbtable{stackInputSkyfile} table, linked to the 
    858 \ippdbtable{stackRun} entry by the \ippdbcolumn{stack_id} field. 
    859 This defines the mapping for which exposures contribute to the 
    860 \ippstage{stack}.  This breaks exposures into single skycells, but as 
    861 adjacent \ippstage{stack} skycells may contain inputs from different 
    862 exposures, there is no simple way to group the processing at the 
    863 \ippstage{stack} stage into exposures. 
     855\ippdbtable{stackRun} entry by the \ippdbcolumn{stack_id} field.  This 
     856defines the mapping for which exposures contribute to the 
     857\ippstage{stack}.  The \ippstage{stack} stage processing is performed 
     858at the skycell level. 
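Conceptually, queueing a set of stacks amounts to grouping the
accepted warps by skycell; a sketch of the bookkeeping (field names
here are illustrative):
\begin{verbatim}
from collections import defaultdict

def stack_inputs(warp_rows):
    """Group warp rows ({'warp_id': ..., 'skycell': ...}) by skycell:
    one stackRun entry per key, with each warp_id list supplying the
    associated stackInputSkyfile rows."""
    by_skycell = defaultdict(list)
    for row in warp_rows:
        by_skycell[row["skycell"]].append(row["warp_id"])
    return by_skycell
\end{verbatim}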
    864859 
    865860The \ippstage{stack} jobs pass the information about the input images 
     
    867862image combinations.  See~\cite{waters2017} for details on the stack 
    868863combination algorithm.  In addition to the standard image, mask, and 
    869 variance produced at other stage, additional images are constructed 
     864variance produced at other stages, additional images are constructed 
    870865with information about the contributions to each pixel.  A number 
    871866image contains the number of input exposures used for each pixel, 
     
    887882deferred to the \ippstage{staticsky} stage.  This separation is 
    888883maintained because the photometry analysis of the \ippstage{stack} 
    889 images is performed on all 5 filters simultaneously.  By deferring 
    890 this analysis, the processing system may also decouple the generation 
    891 of the pixels from the source detection.  This makes the sequencing of 
    892 analysis somewhat easier and less subject to blocks due to a failure 
    893 in the stacking analysis.  Similar to the \ippstage{stack} stage, an 
    894 entry is created in the \ippdbtable{staticskyRun} table, linked to a 
    895 series of rows in the \ippdbtable{staticskyInput} table by a common 
    896 \ippdbcolumn{sky_id}, each of which also contains the appropriate 
    897 \ippdbcolumn{stack_id} entries for the skycell under consideration. 
     884images, including convolved galaxy model fitting, is performed on all 
     8855 filters simultaneously.  By deferring this analysis, the processing 
     886system may also decouple the generation of the pixels from the source 
     887detection.  This makes the sequencing of analysis somewhat easier and 
     888less subject to blocks due to a failure in the stacking analysis. 
     889Similar to the \ippstage{stack} stage, an entry is created in the 
     890\ippdbtable{staticskyRun} table, linked to a series of rows in the 
     891\ippdbtable{staticskyInput} table by a common \ippdbcolumn{sky_id}, 
     892each of which also contains the appropriate \ippdbcolumn{stack_id} 
     893entries for the skycell under consideration. 
    898894 
    899895The input images are passed to the \ippprog{psphotStack} program, 
     
    927923The stack photometry output catalogs are re-calibrated for both 
    928924photometry and astrometry in a process very similar to the 
    929 \ippstage{camera} calibration stage.  In the case of this 
    930 \ippstage{skycal} stage, each skycell is processed independently. 
    931 Because of this independence, when queued for processing, the entries 
    932 in the \ippdbtable{skycalRun} table contain the \IPPdbcolumn{sky_id} 
    933 and \ippdbcolumn{stack_id} entries of the parent data directly.  As 
    934 in the \ippstage{camera} stage, the \ippprog{psastro} program reads in 
    935 the stack photometry catalog, and produces a calibrated output, with 
    936 format matching the input.  A different processing recipe is supplied 
    937 to \ippprog{psastro}, which controls for the different data.  The same 
    938 reference catalog is used for the \ippstage{camera} and 
    939 \ippstage{stack} calibration stages.  Upon completion, the analysis 
    940 statistics are written to the \ippdbtable{skycalResult} table. 
     925\ippstage{camera} calibration stage.  Although the individual warps 
     926which go into the stack are calibrated based on the \ippstage{camera} 
     927stage analysis, there was some concern that these calibrations might 
     928not be sufficiently well-defined for some of the input warps, biasing 
     929the photometry of the stack.  By re-calibrating the stacks, we can be 
     930sure that the stack photometry as measured is tied to the photometric 
     931reference system. 
     932 
     933In the case of this \ippstage{skycal} stage, each skycell is processed 
     934independently.  Because of this independence, when queued for 
     935processing, the entries in the \ippdbtable{skycalRun} table contain 
     936the \ippdbcolumn{sky_id} and \ippdbcolumn{stack_id} entries of the 
     937parent data directly.  As in the \ippstage{camera} stage, the 
     938\ippprog{psastro} program reads in the stack photometry catalog, and 
     939produces a calibrated output, with format matching the input.  A 
      940different processing recipe is supplied to \ippprog{psastro}, which 
      941accounts for the differences in the input data.  The same reference catalog is used 
     942for the \ippstage{camera} and \ippstage{stack} calibration stages. 
     943Upon completion, the analysis statistics are written to the 
     944\ippdbtable{skycalResult} table. 
    941945 
    942946\subsection{Forced Warp Photometry} 
     
    995999individual warp images used to generate the stack.  This 
    9961000\ippstage{fullforce} analysis is performed on all warps for a single 
    997 skycell and filter as a single unit, as this matches the arrangement 
    998 of the input source catalog from the \ippstage{skycal} stage.  When 
    999 processing is queued for this stage, an entry is added to the 
    1000 \ippdbtable{fullForceRun} primary database table linking to the 
    1001 specific \ippdbcolumn{skycal_id} entry that will be used as the 
    1002 catalog for the photometry.  The \ippdbcolumn{warp_id} values for the 
    1003 input \ippstage{warp} stage images that contributed to the 
    1004 \ippstage{stack} associated with that \ippdbcolumn{skycal_id} are 
     1001skycell and filter as a single unit within the processing database, 
      1002while the individual warps are processed in parallel as 
     1003separate processing jobs. 
     1004 
     1005When processing is queued for this stage, an entry is added to the 
     1006\ippdbtable{fullForceRun} primary database table with a reference to 
     1007the corresponding stack and \ippdbcolumn{skycal_id} entry that is the 
     1008input source of detections to be measured.  The \ippdbcolumn{warp_id} 
     1009values for the input \ippstage{warp} stage images that contributed to 
     1010the \ippstage{stack} associated with that \ippdbcolumn{skycal_id} are 
    10051011then added to the \ippdbtable{fullForceInput} table, linked to the 
    10061012primary table by the \ippdbcolumn{ff_id} identifier.  The individual 
     
    10081014stage image products along with the \ippstage{skycal} catalog to the 
    10091015\ippprog{psphotFullForce} program. 
     1016 
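The statistical value of the forced measurements comes from combining
many low-significance epochs.  As a minimal illustration (the actual
averaging of the warp measurements is performed later, within DVO):
\begin{verbatim}
import numpy as np

def combine_forced(flux, flux_err):
    """Inverse-variance mean of per-warp forced fluxes.  Single epochs
    may be low-significance or even negative, but the combined S/N
    grows roughly as the square root of the number of epochs."""
    flux, flux_err = np.asarray(flux), np.asarray(flux_err)
    weights = 1.0 / flux_err**2
    mean = np.sum(weights * flux) / np.sum(weights)
    return mean, np.sqrt(1.0 / np.sum(weights))
\end{verbatim}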
     1017%% In this program, the positions of sources are loaded from the input 
     1018%% catalog.  PSF stars are pre-identified from the stack image and a PSF 
     1019%% model generated for each \ippstage{warp} image based on those stars, 
     1020%% using the same stars for all warps to the extent possible (PSF stars 
     1021%% which are excessively masked on a particular image are not used to 
     1022%% model the PSF).  The PSF model is fitted to all of the known source 
     1023%% positions in the warp images.  Aperture magnitudes, Kron magnitudes, 
     1024%% and moments are also measured at this stage for each warp.  Note that 
     1025%% the flux measurement for a faint, but significant, source from the 
     1026%% stack image may be at a low significance (less than the $5\sigma$ 
     1027%% criterion used when the photometry is not run in this forced mode) in 
     1028%% any individual warp image; the flux may even be negative for specific 
     1029%% warps.  When combined together, these low-significance measurements 
      1030%% will result in a significant measurement as the signal-to-noise 
     1031%% increases by the square root of the number of measurements.  The 
     1032%% individual warp measurements are combined together to generate 
      1033%% average values within DVO. 
    10101034 
    10111035The convolved galaxy models are also re-measured on the 
     
    10531077images are matched.  \note{discuss Alard-Lupton}.  
    10541078 
    1055 In the \ippstage{diff} stage, the IPP generates diffferece images for 
     1079In the \ippstage{diff} stage, the IPP generates difference images for 
    10561080appropriately specified pairs of images.  It is possible for the 
    10571081difference image to be generated from a pair of \ippstage{warp} stage 
    10581082images, from a \ippstage{warp} and a \ippstage{stack} of some variety, 
    10591083or from a pair of \ippstage{stack} stage images.  During the PS1 
    1060 survey, pairs of exposures, call TTI pairs (see~\note{Survey 
     1084survey, pairs of exposures, called TTI pairs (see~\note{Survey 
    10611085  Strategy in Chambers et al}), were obtained for each pointing within a $\approx$ 1 
    10621086hour period in the same filter, and to the extent possible with the 
     
    10741098\ippdbtable{diffRun} table, and the appropriate input images are added 
    10751099to the \ippdbtable{diffInputSkyfile} table, with one entry for each 
    1076 skycell that are covered by the images.  For a \ippstage{diff} 
     1100skycell that is covered by the images.  For a \ippstage{diff} 
    10771101generated from two \ippstage{warp} stage products, the input images 
    10781102have their \ippdbcolumn{warp_id} values recorded in the 
     
    10951119catalogs passed to the \ippprog{ppSub} program.  This does the 
    10961120subtraction, as well as the photometry of any sources detected in the 
    1097 \ippstage{diff} image.  The algorithm used for PSF matching is 
    1098 described in \citet{waters2017}.  Upon completion of these jobs, 
    1099 statistics about the processing are written to an entry in the 
     1121\ippstage{diff} image.  Sources may be detected as a positive source 
     1122(flux in the minuend is higher than the subtrahend) or as a negative 
     1123source (flux in the subtrahend is higher).  The algorithm used for PSF 
     1124matching is described in \citet{waters2017}.  Upon completion of these 
     1125jobs, statistics about the processing are written to an entry in the 
    11001126\ippdbtable{diffSkyfile} table.  An \ippmisc{advance} task checks for the 
    11011127completion of all of the components listed in 
     
    11111137\begin{table}[hb] 
    11121138\begin{center} 
    1113 \caption{DVO Database Tables\label{tab:DVO_schema}} 
     1139\caption{DVO Database Tables\label{tab:DVO_schema} \note{fix order, 
     1140    drop invalid tables}} 
    11141141\begin{tabular}{ll} 
    11151142\hline 
     
    11551182DVO tracks three main classes of information: 1) average properties of 
    11561183astronomical objects; 2) measurements of those objects (from which the 
    1157 average properties are derived); 3) properties of image which provided 
     1184average properties are derived); 3) properties of the images which provided 
    11581185some or all of the measurements.  Figure~\ref{fig:DVO_schema} 
    11591186illustrates the schematic relationship between these types of 
     
    11821209measurements; those which store information about the images; those 
    11831210which store supporting information (metadata). 
    1184  
    1185 \subsubsubsection{Photcodes} 
    1186  
    1187 % photcodes 
    1188 DVO has a special metadata table called \ippdbcolumn{photcode} which 
    1189 identifies the photometry filter systems.  Entries in this table are 
    1190 used to identify the source of measurements and images.  Each row in 
    1191 the \ippdbcolumn{photcode} table includes a \ippdbcolumn{photcode} 
    1192 name, a unique numerical ID, and information about that photometry 
    1193 system.   
    11941211 
    11951212DVO includes two major classes of database tables: those containing 
     
    12081225levels each containing a finer mesh of regions covering the sky. 
    12091226 
     1227\subsubsubsection{Photcodes} 
     1228 
     1229% photcodes 
     1230DVO has a special metadata table called \ippdbtable{photcode} which 
     1231identifies the photometry filter systems.  Entries in this table are 
     1232used to identify the source of measurements and images.  Each row in 
     1233the \ippdbtable{photcode} table includes a \ippdbtable{photcode} 
     1234name, a unique numerical ID, and information about that photometry 
     1235system.   
     1236 
      1237There are three classes of photcodes defined within the DVO system.  One 
      1238class of photcodes defines the filter systems for the average 
      1239photometry measurements; these are called \ippmisc{SEC} photcodes.  A 
      1240second class, associated with measurements from a 
      1241specific camera for which image metadata is available, is called 
      1242\ippmisc{DEP} photcodes.  Finally, some measurements come 
      1243from external data sources for which DVO does not have the information 
      1244needed to determine a calibration (e.g., instrumental magnitudes and detector 
      1245coordinates); these measurements are treated as reference values and are 
      1246assigned \ippmisc{REF} photcodes. 
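For illustration, the three classes might be represented as follows
(the names and numerical IDs shown are hypothetical):
\begin{verbatim}
PHOTCODES = {
    # name        (ID,  class)
    "g":          (100, "SEC"),  # averaged filter-system photometry
    "GPC1.g":     (210, "DEP"),  # camera data with image metadata
    "2MASS.J":    (310, "REF"),  # external reference measurements
}
\end{verbatim}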
     1247 
    12101248The names for \ippmisc{SEC} photcodes are the names of filter systems, 
    12111249such as $g,r,i$ or $J,H,K$.  For \ippmisc{DEP} and \ippmisc{REF} 
     
    12291267properties derived from multiple measurements, and for which the 
    12301268measurement-to-image relationship is not provided.  Ingest methods 
    1231 have been defined for example for 2MASS, WISE, Gaia, USNO-B.  In each 
     1269have been defined, for example, for 2MASS, WISE, Gaia, USNO-B.  In each 
    12321270of these cases, the astrometric and photometric measurements are 
    12331271stored in the \ippdbtable{Measure} table, with the data source 
     
    12581296discussed below) and the astrometrically calibrated position. 
    12591297Astrometric offsets for several systematic corrections discussed below 
    1260 are also defined for each measurement.  Photometry from chip, warp, 
    1261 and stack are all placed in the same table with photcodes 
     1298are also defined for each measurement.  Photometry from \ippstage{chip}, \ippstage{warp}, 
      1299and \ippstage{stack} is placed in the same table, with photcodes 
    12621300distinguishing the source \note{show example of stack and warp 
    12631301  photcodes}.  Since stacks and forced warp fluxes may have 
     
    12691307For the warp images, we also measure the weak lensing KSB parameters 
    12701308related to the shear and smear tensors \citep{1995ApJ...449..460K}. 
    1271 These measurements are stored in the \ippdbcolumn{Lensing} table, 
     1309These measurements are stored in the \ippdbtable{Lensing} table, 
    12721310along with the radial aperture fluxes for radii numbers 5, 6, \& 7 
    12731311(respectively 3.0, 4.63, and 7.43 arcsec).  This table contains one 
     
    12811319sorted \ippdbtable{Lensing} table entries.  \note{discuss failure of 
    12821320  the Lensing to Measure indexing} 
     1321 
     1322\note{Average used above but defined below} 
    12831323 
    12841324\subsubsubsection{Object Tables} 
     
    13591399these photometric distance modulus measurements are not extremely 
    13601400precise (see below), they provide a constraint on the distance that is used 
    1361 in our analysis of the astrometry \citep[][see]{magnier2017.calibration}. 
     1401in our analysis of the astrometry \citep[see][]{magnier2017.calibration}. 
    13621402 
    13631403In the \ippdbtable{Measure} table, there are three fields which 
     
    14161456determined by the photometry calibration analysis and the astrometric 
    14171457flat-field corrections determined by the astrometry calibration 
    1418 analysis \citep[][see]{magnier2017.calibration}. 
     1458analysis \citep[see][]{magnier2017.calibration}. 
     1459\note{use names and match DVO schema table} 
    14191460 
    14201461\subsubsection{Sky Partition} 
    14211462 
    1422 DVO includes two major classes of database tables: those containing 
      1463As noted above, DVO includes two major classes of database tables: those containing 
    14231464information about astronomical objects in the sky and those containing 
    14241465other supporting information.  The object-related tables are 
     
    14381479on the one used by the Hubble Space Telescope Guide Star Catalog 
    14391480files.  \note{add figure} Level 0 is a single region covering the full 
    1440 sky.  Level 1 divides the sky in Declination into bands 
    1441 7.5\degree\ high.  Level 2 subdivides these Declination bands in the 
     1481sky.  Level 1 divides the sky in declination into bands 
     14827.5\degree\ high.  Level 2 subdivides these declination bands in the 
    14421483RA direction, with spacing related to the stellar density.  Level 3 
    14431484divides these RA chunks into 4--8 smaller partitions.  This level 
     
    14591500astronomical objects in the database files, with an associated maximum 
    14601501of \approx 30 million measurements in these files.  With the compression 
    1461 scheme described above, the largest database files are \approx 
     1502scheme described below, the largest database files are \approx 
    146215033GB, which can be loaded into memory in 30 seconds on the processing 
    14631504machines that contain partition data. 
     
    14991540tables are compressed using the (to date) experimental FITS binary 
    15001541table compression strategy outlined by \note{REF}.  Table compression 
    1501 is in general an option in DVO; for the PV3 database, the large data 
     1542is an option in DVO; for the PV3 database, the large data 
    15021543volume (70TB compressed) drove the decision to compress the tables. 
    15031544 
     
    15051546The FITS binary table compression scheme uses a strategy similar to 
    15061547that used for FITS image compression (\note{REF}).  The binary tabular 
    1507 data is compressed and stored in the `HEAP' section of the FITS table 
     1548data is compressed and stored in the ``HEAP'' section of the FITS table 
    15081549extension, with pointers to the compressed data stored in the regular 
    15091550data section.  Each column in the FITS table is compressed as one (or 
     
    15111552column format (e.g., TFORM1) are replaced with keywords which describe 
    15121553the location and size of the compressed data in the HEAP section; the 
    1513 information about the uncompressed data is moved to a keyword with `Z' 
     1554information about the uncompressed data is moved to a keyword with ``Z'' 
    15141555prepended (e.g., ZFORM1) and an additional field is added to define 
    15151556the compression algorithm (e.g., ZCTYP1).  The column names (e.g., 
     
    15331574in the tables.  In practice, we have chosen a default in which 
    15341575floating point numbers use \code{GZIP_2}, character strings use 
    1535 \code{GZIP_1}, integers use \code{RICE}. 
     1576\code{GZIP_1}, and integers use \code{RICE}. 
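
As an illustration of these defaults (a sketch in Python, not the
actual DVO implementation), the algorithm choice can be expressed as a
simple mapping from the FITS TFORM type letter of each column:
\begin{verbatim}
# Illustrative sketch: pick a compression algorithm per column
# following the DVO defaults described above.  TFORM letters follow
# the FITS standard (D/E floats, A strings, B/I/J/K integers).
def default_compression(tform):
    kind = tform.rstrip()[-1]         # e.g. '16A' -> 'A', '1J' -> 'J'
    if kind in ('D', 'E'):            # floating point columns
        return 'GZIP_2'               # gzip of byte-shuffled data
    if kind == 'A':                   # character strings
        return 'GZIP_1'
    if kind in ('B', 'I', 'J', 'K'):  # 8/16/32/64-bit integers
        return 'RICE'
    return 'GZIP_1'                   # conservative fallback

print(default_compression('1J'))      # -> RICE
\end{verbatim}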
    15361577 
    15371578\subsubsection{Addstar : DVO Ingest} 
     
    15401581Upon completion of the processing of each stage, the results of the 
    15411582photometry analysis are stored in a large number of individual catalog 
    1542 files as described in~\ref{XXX}.  The data from these files are loaded 
    1543 into a DVO database to define the astronomical objects and to allow 
    1544 for calibration analysis.  The program which loads the data into the 
    1545 DVO database is called \ippprog{addstar}, and is associated with the 
    1546 the \ippstage{addstar} processing stage.  The measurement catalogs 
    1547 generated by the \ippstage{camera}, \ippstage{staticsky}, 
    1548 \ippstage{skycal}, \ippstage{fullforce}, and \ippstage{diff} stages 
    1549 are processed loaded into DVOs in this fashion, although not every 
    1550 measurement in each catalog are included in the master DVO that is 
    1551 constructed.  For a particular re-processing version, a single master 
    1552 DVO is constructed for the positive image stages (\ippstage{camera}, 
    1553 \ippstage{staticsky}, \ippstage{skycal}, \ippstage{fullforce}) and a 
    1554 separate one is constructed for the difference image analysis stage 
    1555 results. 
     1583files as described in \cite{magnier2017.analysis}.  The data from 
     1584these files are loaded into a DVO database to define the astronomical 
     1585objects and to allow for calibration analysis.  The program which 
     1586loads the data into the DVO database is called \ippprog{addstar}, and 
     1587is associated with the \ippstage{addstar} processing stage.  The 
     1588measurement catalogs generated by the \ippstage{camera}, 
     1589\ippstage{staticsky}, \ippstage{skycal}, \ippstage{fullforce}, and 
     1590\ippstage{diff} stages are loaded into DVOs in this fashion, 
     1591although not every measurement in each catalog is included in the 
     1592master DVO that is constructed.  For a particular re-processing 
     1593version, a single master DVO is constructed for the positive image 
     1594stages (\ippstage{camera}, \ippstage{staticsky}, \ippstage{skycal}, 
     1595\ippstage{fullforce}) and a separate one is constructed for the 
     1596difference image analysis stage results. 
    15561597 
    15571598The construction of the master DVO is performed in a hierarchical 
     
    15641605databases together.  In the merge, astronomical objects are joined 
    15651606together using essentially the same rules as those used to associate 
    1566 detections into objects.  One exception: the match radius may be 
     1607detections into objects with one exception: the match radius may be 
    15671608chosen to be a different size depending on the data source.  For 
    15681609example, when WISE data is merged with PS1 data, as discussed below, a 
     
    16121653a function of position in the camera (essentially an astrometric 
    16131654flat-field correction), as a function of the brightness of the star 
    1614 (the so-called Koppenh\"offer effect, see~\ref{magnier2017.calibration}), and as 
    1615 a function of airmass and color (Differential chromatic refraction). 
     1655(the so-called Koppenh\"offer effect, see~\citealt{magnier2017.calibration}), and as 
     1656a function of airmass and color (differential chromatic refraction). 
    16161657Once the systematic errors have been measured, they are applied back 
    16171658to the measurements in the database.  Within the DVO 
     
    16241665astrometry is again performed, this time using the corrected positions. 
    16251666 
     1667\note{have eddie suggest wording here?} 
     1668 
    16261669Photometric calibration consists of determination of zero points for 
    16271670each exposure along with corrections for systematic effects.  In this 
    16281671case, we rely on the efforts of our external collaborators for the initial 
    16291672zero point determination.  The team at CfA downloaded the per-exposure 
    1630 catalog files (`smf files') and determined the zero points of those 
     1673catalog files (``smf files'') and determined the zero points of those 
    16311674exposures which were believed to be obtained in photometric 
    1632 conditions.  This process, called `\"ubercal', is described in detail 
     1675conditions.  This process, called ``\"ubercal'', is described in detail 
    16331676by \cite{2012ApJ...756..158S} for the first (PV1) version.  In brief, photometric 
    16341677periods, with time-scales of at least \note{half of a night}, are 
     
    16381681parameters in this solution consist of a single zero point and airmass 
    16391682slope for each photometric period along with a collection of 
    1640 flat-field offsets for several large time range (`flat-field 
    1641 seasons').  For the PV3 \"ubercal analysis, the flat-field offsets 
     1683flat-field offsets for several large time ranges (``flat-field 
     1684seasons'').  For the PV3 \"ubercal analysis, the flat-field offsets 
    16421685were determined on a $2\times2$ grid for each chip and 5 flat-field 
    16431686seasons were chosen (listed in Table~\ref{tab:flat-field-seasons}). 
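
Schematically, and in our notation rather than that of
\cite{2012ApJ...756..158S}, the model fitted by the \"ubercal analysis
for a measurement taken in photometric period $p$ and flat-field
season $s$ can be written as
\begin{equation}
  m_{\rm cal} = m_{\rm inst} + Z_p - k_p\,X + f_s(c, i, j),
\end{equation}
where $m_{\rm inst}$ is the instrumental magnitude, $Z_p$ and $k_p$
are the zero point and airmass slope for the photometric period, $X$
is the airmass, and $f_s(c, i, j)$ is the flat-field offset for cell
$(i,j)$ of the $2\times2$ grid on chip $c$ in season $s$ (the sign
conventions here are illustrative).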
     
    16731716Telescope Science Institute through their Mikulski Archive for Space 
    16741717Telescopes (MAST).  The underlying database at MAST is a copy of a 
    1675 database generated at the Institute for Astronomy by the subsystem 
     1718database generated at the IfA by the subsystem 
    16761719called PSPS: the Published Science Products Subsystem.  The construction of the PSPS 
    16771720version of the PS1 database starts once the PS1 photometry and 
     
    16811724 
    16821725The first stage of constructing the PSPS database consists of the 
    1683 generation of small files called `batches' which contain a complete 
     1726generation of small files called ``batches'' which contain a complete 
    16841727set of measurements for a small chunk of the database tables.  The 
    16851728program which is responsible for the construction of these batches is 
     
    16901733One type of batch consists of measurements from the individual 
    16911734exposures.  These batches are generated based on the output catalog 
    1692 files generated at the \ippstage{camera} stage (`smf files').  The 
     1735files generated at the \ippstage{camera} stage (``smf files'').  The 
    16931736\ippprog{ipptopsps} program loads the complete set of measurements and 
    16941737metadata from the smf catalog file, then queries the DVO database for 
     
    17571800might be run and to regularly generate new commands based on that 
    17581801concept.  The ``tasks'' are defined using the opihi scripting language 
    1759 (also shared by DVO and other user-interative programs within the 
     1802(also shared by DVO and other user-interactive programs within the 
    17601803IPP). 
    17611804 
    1762 Pantasks repeatedly checks each task in an attempt to generate a new 
    1763 command: we say pantasks attempts to `execute' the task in each of 
     1805\ippprog{Pantasks} repeatedly checks each task in an attempt to generate a new 
     1806command: we say \ippprog{pantasks} attempts to ``execute'' the task in each of 
    17641807these attempts.  Tasks may specify the time between execution 
    17651808attempts, with a 1 second default. 
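
A minimal sketch of this execution cycle (in Python for clarity; the
real implementation is the opihi-driven \ippprog{pantasks} itself) is:
\begin{verbatim}
import time

class Task:
    def __init__(self, name, exec_fn, period=1.0):
        self.name = name        # task name
        self.exec_fn = exec_fn  # returns a command string or None
        self.period = period    # seconds between execution attempts
        self.last_try = 0.0

def run_tasks(tasks, submit):
    while True:
        now = time.time()
        for task in tasks:
            if now - task.last_try < task.period:
                continue        # not yet due for another attempt
            task.last_try = now
            command = task.exec_fn()
            if command is not None:
                submit(command) # a successful execution yields a job
        time.sleep(0.1)
\end{verbatim}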
     
    17731816opihi language) which is run each time the task is executed.  The 
    17741817\code{task.exec} code may refer to variables or other data structures 
    1775 defined by the opihi language within the pantasks environment.  Within 
     1818defined by the opihi language within the \ippprog{pantasks} environment.  Within 
    17761819a single \ippprog{pantasks} instance, all opihi variables and data 
    17771820structures have global context (\ie, all are visible to all tasks). 
     
    17821825 
    17831826Within the \ippprog{task.exec} macro, the command to be run must be 
    1784 defined with the function `command'.  Once the \ippprog{task.exec} 
    1785 macro exits successfully, the defined command is the added to the list of jobs 
     1827defined with the function ``command''.  Once the \ippprog{task.exec} 
     1828macro exits successfully, the defined command is then added to the list of jobs 
    17861829to be run within the UNIX environment.  Jobs may be run in one of two 
    17871830ways: locally or via the parallel processing system.  The task, or the 
    1788 \ippprog{task.exec} macro, uses the `host' command to define how to 
    1789 run the job.  If the host is set to `local', then the job is run in 
    1790 the background by pantasks itself (using the C \code{execvp} 
     1831\ippprog{task.exec} macro, uses the ``host'' command to define how to 
     1832run the job.  If the host is set to ``local'', then the job is run in 
     1833the background by \ippprog{pantasks} itself (using the C \code{execvp} 
    17911834function).  Otherwise, the job is sent to the parallel processing 
    17921835system to be run on another machine within the cluster.  If the host 
    1793 is set to the special value `anyhost', then the parallel processing 
     1836is set to the special value ``anyhost'', then the parallel processing 
    17941837system is allowed to choose the processing computer arbitrarily.  Any 
    17951838other value is taken to be the DNS name of the computer on which this 
     
    17981841that the job only runs on the specifically named computer.  Otherwise, 
    17991842the parallel processing system may choose to redirect the command to 
    1800 another computer (based on whatever rules are defined for the parallel 
    1801 processing system). 
     1843another computer using its own rules, e.g., to balance processing load 
     1844across the cluster. 
    18021845 
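As a sketch of these routing rules (hypothetical names; the actual
mechanism is the job submission machinery described below), the host
setting acts as follows:
\begin{verbatim}
import subprocess

class JobQueue:
    # stand-in for the parallel processing system
    def submit(self, command, host=None):
        print('queued %r for host=%s' % (command, host))

def dispatch(command, host, queue):
    if host == 'local':
        # run in the background by pantasks itself (cf. execvp)
        return subprocess.Popen(command, shell=True)
    if host == 'anyhost':
        # let the parallel processing system pick any machine
        return queue.submit(command, host=None)
    # otherwise 'host' names a machine; whether the job is pinned
    # there or merely preferred depends on rules not shown here
    return queue.submit(command, host=host)

dispatch('ls /tmp', 'local', JobQueue())
\end{verbatim}
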
    18031846When the \ippprog{task.exec} macro is run, the code may choose (e.g., 
    18041847based on tests of some global variables) to exit the macro with an 
    1805 error condition, e.g., with the `break' command.  In this 
     1848error condition, e.g., with the ``break'' command.  In this 
    18061849circumstance, no job is produced by the task.  The task will be tried 
    18071850again the next time it is executed.  This feature allows for the user 
     
    18181861  online user guide?} 
    18191862 
    1820 The option `npending' may be used to limit the number of jobs which 
     1863The option ``npending'' may be used to limit the number of jobs which 
    18211864are simultaneously executed for a specific task.  For example, some 
    18221865classes of jobs should only be run one-at-a-time because they are not 
    18231866protected against collisions or they may overload a resource.  The use 
    1824 of `npending' allows these situations to be handled cleanly within 
    1825 pantasks (avoiding cumbersome coding within with program or supporting 
     1867of ``npending'' allows these situations to be handled cleanly within 
     1868\ippprog{pantasks} (avoiding cumbersome coding within the program or supporting 
    18261869script). 
    18271870 
    1828 The option `nmax' limits the total number of jobs which a task 
     1871The option ``nmax'' limits the total number of jobs which a task 
    18291872generates.  This option may be useful in cases where 
    18301873\ippprog{pantasks} is used to perform a limited set of operations. 
    18311874\note{do we actually use this in IPP?} 
    18321875 
    1833 The option `trange' allows the user to restrict the time period during 
     1876The option ``trange'' allows the user to restrict the time period during 
    18341877which the specific task is executed.  This option is given with a 
    18351878start and an end time for the limiting time range.  These times may be 
     
    18461889ranges may be specified. \note{how are they evaluated?} 
    18471890 
    1848 The option \code{nice} specifies the `nice' level at which the job is 
     1891The option \code{nice} specifies the ``nice'' level at which the job is 
    18491892run when it is executed.  The parallel processing system is 
    18501893responsible for honoring this setting. 
    18511894 
    18521895The option \code{active} can be used to turn on and off a task for 
    1853 periods.  Since a user command or a macro run by pantasks can 
     1896periods.  Since a user command or a macro run by \ippprog{pantasks} can 
    18541897re-define task options, the \code{active} state may be changed 
    18551898independently of the task's execution.  This is useful for keeping tasks 
     
    18571900prevent them from running for some reason. 
    18581901 
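Collecting the options above, the gates checked before a task is
allowed to generate a job can be sketched as follows (illustrative
Python; hour-of-day values for \code{trange} are an assumption of the
sketch):
\begin{verbatim}
task_options = {
    'npending': 1,            # at most one job in flight
    'nmax':     100,          # stop after 100 jobs in total
    'trange':   (22.0, 6.0),  # only execute from 22:00 to 06:00
    'nice':     10,           # UNIX nice level for the job
    'active':   True,         # task may be switched off wholesale
}

def may_execute(opts, hour, pending, generated):
    if not opts['active']:
        return False
    if pending >= opts['npending'] or generated >= opts['nmax']:
        return False
    start, end = opts['trange']
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end  # range wraps midnight
\end{verbatim}
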
    1859 \subsubsection{pantasks passes jobs to pcontrol} 
     1902\subsubsection{pcontrol} 
    18601903 
    18611904Jobs which are generated by \ippprog{pantasks} may be run locally on 
     
    18831926Similarly, the hosts may also have one of several states: off, down, 
    18841927busy, idle, etc.  A single host can accept a single job at a time. 
    1885 Multiple hosts instances corresponding to the same machine may be 
     1928Multiple host instances corresponding to the same machine may be 
    18861929specified, allowing a single computer to run more than one simultaneous 
    18871930job.   
    18881931 
    1889 During operation, pcontrol accepts new jobs from pantasks and adds 
    1890 them to the list of jobs to execute.  It also accepts from pantasks 
     1932During operation, \ippprog{pcontrol} accepts new jobs from \ippprog{pantasks} and adds 
     1933them to the list of jobs to execute.  It also accepts from \ippprog{pantasks} 
    18911934the names of computers on which it is allowed to run those jobs. 
    18921935 
    1893 \subsubsection{pcontrol passes jobs to pclient} 
    1894  
    1895 When pcontrol is provided with the name of a computer, it will attempt 
     1936\subsubsection{pclient} 
     1937 
     1938When \ippprog{pcontrol} is provided with the name of a computer, it will attempt 
    18961939to make a connection to that machine via ssh \note{or rsh?}.  When a 
    18971940connection is made, the remote shell is used to run a special 
    18981941interface program called \ippprog{pclient}.  This program accepts 
    1899 command lines from pcontrol and is responsible for executing the 
     1942command lines from \ippprog{pcontrol} and is responsible for executing the 
    19001943individual commands in the local shell environment.  A single ssh 
    1901 connection to a remote host keeps a single pclient shell running for a 
     1944connection to a remote host keeps a single \ippprog{pclient} shell running for a 
    19021945somewhat arbitrarily long time, executing many shell commands as needed. 
    19031946This architecture avoids the overhead of making the ssh connection to 
     
    19061949architecture is allowed to be very light and short-running if needed. 
    19071950 
    1908 After pcontrol sends a job (commands) to a specific pclient, it checks 
     1951After \ippprog{pcontrol} sends a job (commands) to a specific \ippprog{pclient}, it checks 
    19091952back occasionally to see whether the command has finished.  If 
    1910 it has finished, then pcontrol will query for the exit status, the 
     1953it has, then \ippprog{pcontrol} will query for the exit status and the 
    19111954standard output and standard error streams from the command \note{where 
    1912 do these go, back to pantasks?), with the results associated with the 
    1913 job statistics.  At that point, the pclient on the remote machine is 
    1914 ready to accept a new job from pcontrol.  If any jobs are pending in 
    1915 the list of jobs known to pcontrol, it will send those jobs to any 
     1955do these go, back to \ippprog{pantasks}?}, with the results associated with the 
     1956job statistics.  At that point, the \ippprog{pclient} on the remote machine is 
     1957ready to accept a new job from \ippprog{pcontrol}.  If any jobs are pending in 
     1958the list of jobs known to \ippprog{pcontrol}, it will send those jobs to any 
    19161959machines which are idle. 
    19171960 
    1918 While pcontrol interacts with the many remote machines, it 
    1919 occasionally interacts with pantasks to report the results from the 
    1920 jobs it has been monitoring.  Pantasks occasionally requests a list of 
     1961While \ippprog{pcontrol} interacts with the many remote machines, it 
     1962occasionally interacts with \ippprog{pantasks} to report the results from the 
     1963jobs it has been monitoring.  \ippprog{Pantasks} occasionally requests a list of 
    19211964the completed jobs.  It then requests the status information for each 
    19221965completed job, including the standard error and standard output.  As 
    1923 pantasks receives this completion information, the jobs are removed 
    1924 from the list managed by pcontrol.  Thus pcontrol maintains at most a 
    1925 modest list of jobs which are `in flight', leaving all interpretation 
    1926 work to pantasks. 
    1927  
    1928 At the pantasks level, the tasks define how pantasks should use the 
     1966\ippprog{pantasks} receives this completion information, the jobs are removed 
     1967from the list managed by \ippprog{pcontrol}.  Thus \ippprog{pcontrol} maintains at most a 
     1968modest list of jobs which are ``in flight'', leaving all interpretation 
     1969work to \ippprog{pantasks}. 
     1970 
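The division of labor can be sketched as follows (illustrative Python
with plain lists and dictionaries, not the real C implementation):
\begin{verbatim}
def pcontrol_cycle(pending, hosts, finished):
    # each host entry is one job slot on some machine
    for host in hosts:
        if host['job'] is None and pending:
            host['job'] = pending.pop(0)     # farm out a job
        elif host['job'] is not None and host['job'].get('done'):
            # exit status and stdout/stderr stay with the job record
            finished.append(host['job'])
            host['job'] = None               # slot is idle again

def pantasks_poll(finished, handle):
    # pantasks drains completed jobs so pcontrol stays light
    while finished:
        handle(finished.pop(0))
\end{verbatim}
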
     1971At the \ippprog{pantasks} level, the tasks define how \ippprog{pantasks} should use the 
    19291972exit status and output products from each job.  For example, the 
    19301973stderr and stdout may be specified to go to a file (with static name 
     
    19361979started.  This mode is useful for testing as all errors are reported 
    19371980back to the opihi shell.  However, when the user exits the shell, the 
    1938 pantasks instance exits, shutting down pcontrol and all remote client 
    1939 connections.  In standard operations, pantasks is run in a client 
     1981\ippprog{pantasks} instance exits, shutting down \ippprog{pcontrol} and all remote client 
     1982connections.  In standard operations, \ippprog{pantasks} is run in a 
    19401983client-server mode.  The server runs continuously in the background and 
    19411984multiple users may connect via the \ippprog{pantasks_client} program. 
    19421985Users can then send commands to the server to load scripts, add 
    1943 parallel hosts, check status, and start or stop the pantasks 
     1986parallel hosts, check status, and start or stop the \ippprog{pantasks} 
    19441987operations.  
    19451988 
     
    19561999end   
    19572000\end{verbatim} 
    1958  \caption{\label{fig:task_example} Example of a simple static 
    1959    task in the opihi-based scripting language used by pantasks.  In 
    1960    this example, pantasks would run a single instance of the command 
    1961    ({\tt ls /tmp}) every 5 seconds, sending the stdout and stderr to 
    1962    the listed files. } 
     2001\caption{\label{fig:task_example} Example of a simple static 
     2002  task in the opihi-based scripting language used by \ippprog{pantasks}.  In 
     2003  this example, \ippprog{pantasks} would run a single instance of the command 
     2004  ({\tt ls /tmp}) every 5 seconds, sending the stdout and stderr to 
     2005  the listed files. } 
    19632006  \end{center} 
    19642007\end{figure} 
     
    19682011\subsubsection{Pantasks scripts: ippTasks} 
    19692012 
    1970 Pantasks provides an environment in which commands can be generated 
     2013\ippprog{Pantasks} provides an environment in which commands can be generated 
    19712014and extensive parallel processing managed.  The details of how to 
    19722015implement the different stages of IPP processing are captured in a 
    1973 collection of scripts written for pantasks in the \code{opihi} 
     2016collection of scripts written for \ippprog{pantasks} in the \code{opihi} 
    19742017language.  In general, each stage is defined by an associated script; 
    19752018these scripts are collected together under \ippmisc{ippTasks}.  While 
     
    20012044row in the result set, each column in the row is stored as a separate 
    20022045line on the \ippmisc{page}, identified by the database column name.  An 
    2003 additional line, the \ippdbcolumn{pantasksState}, is added so pantasks 
     2046additional line, the \ippdbcolumn{pantasksState}, is added so \ippprog{pantasks} 
    20042047can manage the processing of the job which will be generated by this 
    2005 page.  When the page is first generate, the 
     2048page.  When the page is first generated, the 
    20062049\ippdbcolumn{pantasksState} is set to \ippmisc{INIT}, indicating that 
    20072050this \ippmisc{page} is a new addition to the \ippmisc{book}.  Once all 
     
    20182061construct the appropriate command-line (e.g., lines in the page may 
    20192062include input file names and output file names for the specific item 
    2020 in the database).  The resulting command becomes a job in the pantasks 
     2063in the database).  The resulting command becomes a job in the \ippprog{pantasks} 
    20212064collection of jobs.  Most IPP analysis stages specify that the jobs 
    2022 are then sent to pcontrol for parallel process.  Before task generates 
     2065are then sent to \ippprog{pcontrol} for parallel processing.  Before the task generates 
    20232066the job, the \ippdbcolumn{pantasksState} is set to \ippmisc{RUN} so a 
    20242067future execution of the task will not attempt to re-run this specific job. 
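
The state bookkeeping can be sketched as follows (a Python toy model;
the real books are internal \ippprog{pantasks} data structures):
\begin{verbatim}
book = {}                             # page_id -> page contents

def add_page(page_id, row):
    page = dict(row)                  # one entry per database column
    page['pantasksState'] = 'INIT'    # newly added to the book
    book[page_id] = page

def queue_job(page_id, make_command, submit):
    page = book[page_id]
    if page['pantasksState'] != 'INIT':
        return                        # already queued or running
    page['pantasksState'] = 'RUN'     # never re-queue this item
    submit(make_command(page))        # command built from the page
\end{verbatim}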
     
    20292072this responsibility is left to the program which ran the analysis. 
    20302073IPP analysis steps normally consist of two main elements: a C-language 
    2031 program to do the data analysis work and a supporting perl script 
     2074program to do the data analysis work and a supporting Perl script 
    20322075which performs the database update upon completion.  When the job finishes, 
    2033 the pantasks \ippmisc{RUN} tasks is responsible for updating the 
     2076the \ippprog{pantasks} \ippmisc{RUN} task is responsible for updating the 
    20342077status within the book, but not within the processing database.  This 
    2035 split keeps the interactions at the pantasks level relatively light, 
     2078split keeps the interactions at the \ippprog{pantasks} level relatively light, 
    20362079leaving the overhead of the database interaction within the job 
    20372080running on one of the computing machines in the cluster. 
     
    20422085clear jobs which have failed with one of the ephemeral failure modes 
    20432086(see the discussion in Section~\ref{sec:processing.database}).  This 
    2044 step allows these failures to be cleared from the system, and schedule 
    2045 those jobs again for a retry 
     2087step clears these failures from the system, allowing those jobs to be 
     2088scheduled again for a retry. 
    20462089 
    20472090Similarly, some stages have \ippmisc{advance} tasks that update the 
     
    20662109discussed above, the query to the processing database for new items is 
    20672110restricted to a set of user-defined labels.  A given instance of 
    2068 pantasks will be supplied a set of labels which are then applied to 
    2069 all tasks managed by that pantasks.  For example, the pantasks which 
     2111\ippprog{pantasks} will be supplied a set of labels which are then applied to 
     2112all tasks managed by that \ippprog{pantasks}.  For example, the \ippprog{pantasks} which 
    20702113manages the nightly processing of the basic science analysis stages 
    2071 (chip - warp, stack, diff) is supplied with several labels which 
     2114(\ippstage{chip} through \ippstage{warp}, \ippstage{stack}, \ippstage{diff}) is supplied with several labels which 
    20722115correspond to the different kinds of observations being performed.  In 
    20732116this way, the analysis of the nightly observations is kept separate 
     
    20832126\note{then discuss the addstar sequences with manual triggering} 
    20842127 
    2085 Outside of the basic sequence of chip to warp, there is no single 
     2128Outside of the basic sequence of \ippstage{chip} to \ippstage{warp}, there is no single 
    20862129natural next step.  For example: a stack can be generated with any 
    20872130number of input warps; a difference image can be generated between a 
     
    21032146significantly reduced from the arbitrary case.   
    21042147 
    2105 {\em Queuing the diffs} is done by first examining the set of all 
     2148Queuing the diffs is done by first examining the set of all 
    21062149exposures that have been taken at the summit on the current night of 
    21072150observing, and querying information from each stage up through 
     
    21112154group are then sorted by increasing observation date 
    21122155(\ippdbcolumn{dateobs}).  The database results for each stage 
    2113 (chip-warp) are checked to ensure that the selected exposures have 
     2156(\ippstage{chip}-\ippstage{warp}) are checked to ensure that the selected exposures have 
    21142157been successfully processed for all stages through \ippstage{warp}. 
    21152158Exposure groups are ignored until all exposures have either been 
     
    21292172that were excluded due to an odd number of exposures to be paired with 
    21302173the exposure closest in time (with the exposure that was previously 
    2131 first ignored).  Exposure pairs in which at least one exposures does 
     2174first ignored).  Exposure pairs in which at least one exposure does 
    21322175not have a pre-existing difference image are queued for difference 
    21332176image analysis. 
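
One reading of this pairing rule, as a sketch in Python (the exact
tie-breaking in the real scripts may differ):
\begin{verbatim}
def pair_exposures(exposures):
    # exposures: list of (dateobs, exposure_name) tuples
    exps = sorted(exposures)          # increasing dateobs
    pairs = [(exps[i], exps[i + 1])
             for i in range(0, len(exps) - 1, 2)]
    if len(exps) % 2 == 1:
        # odd one out: re-pair it with its neighbor in time
        pairs.append((exps[-2], exps[-1]))
    return pairs
\end{verbatim}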
     
    21382181exposures, as this is the number of exposures taken for each field. 
    21392182Once this number is reached, no more exposures are expected, so 
    2140 \ippstage{stack} database entries can be queued with the 
     2183\ippstage{stack} database entries can be queued from the 
    21412184\ippstage{warp} entries.  Again, failures and weather can reduce the 
    21422185number of usable exposures.  If no stack could be made for a given MD 
    21432186field with the minimum number of inputs by the time of the 
    2144 end-of-night darks, stacks are generated using using whatever 
     2187end-of-night darks, stacks are generated using whatever 
    21452188exposures are available. 
    21462189 
     
    21612204\ippdbtable{lapRun} entries can be queued that define a 
    21622205\ippdbcolumn{filter} and a \ippdbcolumn{projection_cell} to be 
    2163 considered.  A \ippdbcolumn{projection_cell} is a unit of sky defined 
    2164 to be a square four degrees on each side which has a single tangent 
    2165 plane projection \citep[][see]{waters2017}.  \note{does waters2017 
    2166   discuss RINGS.V3? if not, where?}  Once this entry is defined, is is 
    2167 populated with exposures (stored in the \ippdbtable{lapExp} table in 
    2168 the database), with any exposure located within 5 degrees of the 
    2169 center of the projection cell included.  This radius ensures that any 
    2170 exposure that overlaps the projection cell will be included.  Once the 
    2171 exposures have been added, the other exposures within the same 
    2172 sequence are checked to see if a \ippstage{chip} stage entry has been 
    2173 generated, and if so, the \ippdbcolumn{chip_id} for that entry is 
    2174 saved into the \ippdbtable{lapExp} as well.  This linkage ensures that 
    2175 each exposure is only processed once.  If no entry is found, a new 
    2176 \ippstage{chip} entry is queued for processing.  The task periodically 
    2177 checks the status of the exposures in each \ippdbtable{lapRun} entry, 
    2178 and if they have all completed the \ippstage{warp} stage, then a 
    2179 \ippstage{stack} is queued for each skycell contained within the 
     2206considered.  These projection cells match the tangent plane centers 
     2207used for the warp tessellation.  A \ippdbcolumn{projection_cell} is a 
     2208unit of sky defined to be a square four degrees on each side which has 
     2209a single tangent plane projection \citep[see][]{waters2017}. 
     2210\note{does waters2017 discuss RINGS.V3? if not, where?}  Once this 
     2211entry is defined, it is populated with all exposures (stored in the 
     2212\ippdbtable{lapExp} table in the database) that are located 
     2213within 5 degrees of the center of the projection cell.  This 
     2214radius ensures that any exposure that overlaps the projection cell 
     2215will be included.  Once the exposures have been added, the other 
     2216exposures within the same sequence are checked to see if a 
     2217\ippstage{chip} stage entry has been generated, and if so, the 
     2218\ippdbcolumn{chip_id} for that entry is saved into the 
     2219\ippdbtable{lapExp} as well.  This linkage ensures that each exposure 
     2220is only processed once.  If no entry is found, a new \ippstage{chip} 
     2221entry is queued for processing.  The task periodically checks the 
     2222status of the exposures in each \ippdbtable{lapRun} entry, and if they 
     2223have all completed the \ippstage{warp} stage, then a \ippstage{stack} 
     2224is queued for each skycell contained within the 
    21802225\ippdbcolumn{projection_cell}. 
    21812226 
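The 5-degree selection is a plain angular-separation cut around the
projection cell center; a sketch (the standard spherical formula, not
the IPP code itself) is:
\begin{verbatim}
from math import radians, degrees, sin, cos, acos

def angular_sep(ra1, dec1, ra2, dec2):
    # angular separation in degrees between two sky positions
    a1, d1, a2, d2 = map(radians, (ra1, dec1, ra2, dec2))
    c = sin(d1)*sin(d2) + cos(d1)*cos(d2)*cos(a1 - a2)
    return degrees(acos(min(1.0, max(-1.0, c))))

def select_exposures(exposures, cell_ra, cell_dec, radius=5.0):
    return [e for e in exposures
            if angular_sep(e['ra'], e['dec'],
                           cell_ra, cell_dec) <= radius]
\end{verbatim}
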
     
    21922237system per se, but only a method of tracking the locations of files 
    21932238within the file system, and of tracking duplicate copies of the same 
    2194 file.  The core of \ippprog{Nebulous} is a dedicated database engine 
    2195 which tracks ``storage objects'', the concept of a file exists in the 
     2239file.  The core of \ippprog{Nebulous} is a MySQL database which tracks 
     2240``storage objects'', the equivalent of a file within the 
    21962241system.  Each storage object may be associated with a number of copies 
    21972242of the actual files on the disks in the storage system (called 
     
    22132258stored on a specific computer (for at least one of the instances). 
    22142259All of the analysis stages which interact with that chip could then be 
    2215 preferentially targetted to be run on that computer.  The localization 
    2216 in \ippprog{Nebulous} and the host targetted processing in pantasks 
     2260preferentially targeted to be run on that computer.  The localization 
     2261in \ippprog{Nebulous} and the host targeted processing in \ippprog{pantasks} 
    22172262can therefore work together to encourage processing to require only 
    22182263local disk access, reducing the I/O load on the network 
     
    22212266practice, the as-built IPP has had sufficient network bandwidth that 
    22222267this targeting was not required.  Moreover, due to the timing of 
    2223 hardware aquisition, occasional hardware failures, and other 
    2224 organizational details, targetted processing has only been used to a 
     2268hardware acquisition, occasional hardware failures, and other 
     2269organizational details, targeted processing has only been used to a 
    22252270moderate degree within the Pan-STARRS cluster. \note{can we get a 
    22262271  number here?} 
     
    22292274 
    22302275The user interfaces to Nebulous consist of command-line programs as 
    2231 well as APIs in both C and Perl.  The basic user commands to interact 
    2232 with Nebulous are to 1) create a new storage object and associated 
    2233 instance; 2) add a new instance to an existing storage object; 3) 
    2234 remove (cull) an instance; 4) delete a storage object; and 5) find a 
    2235 file associated with a given storage objects.  Note that these user 
    2236 commands do not affect the files on disk \note{true for cull?} 
    2237 (exception: the create function will create an empty file if one does 
    2238 not exist).  They only change the state of the Nebulous database; it 
    2239 is the responsibility of the user program to read and write data to a 
    2240 file and to create the copies, etc. 
     2276well as APIs in both C and Perl.   
     2277 
     2278The basic user commands to interact with Nebulous are to 1) query the 
     2279database for an existing storage object, and find a valid file 
     2280instance associated with that object; 2) create a new storage object, 
     2281which instantiates an empty file that can be opened for writing; 3) 
     2282replicate an existing storage object to create more file instances; 4) 
     2283cull a single file instance of a storage object from the cluster; and 5) 
     2284remove a storage object, and ensure that all file instances are 
     2285removed.  The file handle returned for a newly created instance can 
     2286then be opened for reading and writing data to that instance. 
     2287 
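These semantics can be summarized with a self-contained toy model (not
the real Nebulous API; the dictionary stands in for the database):
\begin{verbatim}
class ToyNebulous:
    def __init__(self):
        self.objects = {}             # ext_id -> list of instances

    def create(self, ext_id):         # 2) new object + empty file
        self.objects.setdefault(ext_id, ['copy0'])
        return self.objects[ext_id][0]

    def query(self, ext_id):          # 1) find a valid instance
        return self.objects[ext_id][0]

    def replicate(self, ext_id):      # 3) add another instance
        copies = self.objects[ext_id]
        copies.append('copy%d' % len(copies))

    def cull(self, ext_id, i):        # 4) drop a single instance
        del self.objects[ext_id][i]

    def remove(self, ext_id):         # 5) drop object + instances
        del self.objects[ext_id]
\end{verbatim}
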
    22412298 
    22422299For Nebulous users, the identifier of a storage object is a unique 
     
    22472304computer (HOST) and disk (VOL).  The path and filename portions become 
    22482305the identifier and are recorded in the \ippmisc{storage_object} table 
    2249 in the \ippmisc{extern_id} field.  A storage object entry is then 
    2250 created in the database for this id, and an instance of the file 
    2251 created on the specified node (or at random from available nodes if 
    2252 left empty). 
     2306in the \ippmisc{ext_id} field.  A storage object entry is then created 
     2307in the database for this id, and an instance of the file created on 
     2308the specified node.  If the host is unspecified, or if the specified 
     2309volume is full, then a host is chosen at random from available nodes. 
    22532310 
    22542311Files are stored on specific computers in a \ippprog{Nebulous} 
     
    22582315\code{nebulous}.  Beneath the top-level directory are 256 
    22592316subdirectories with names of the form 00--ff (i.e., 2-digit 
    2260 hexadecimate number).  Each subdirectory again as 256 subdirectories 
    2261 with the same naming scheme.   
     2317hexadecimal numbers).  Each subdirectory has 256 subdirectories with 
     2318the same naming scheme.   
    22622319 
    22632320The filename of an instance in Nebulous is deterministic and derived 
    2264 from the \ippmisc{extern_id}: the \ippmisc{extern_id} is hashed using 
     2321from the \ippmisc{ext_id}: the \ippmisc{ext_id} is hashed using 
    22652322the SHA-1 function, and the first four hexadecimal digits of this hash 
    22662323are separated into two two-digit strings and used as the top and 
     
    22692326provide a unique SQL ID for each instance.  Under the subdirectory 
    22702327identified above, the disk file name is formed by appending the database 
    2271 instance id with a string derived from the \code{extern_id}: forward 
     2328instance id with a string derived from the \code{ext_id}: forward 
    22722329slash characters are replaced in the name with colons so the string 
    22732330can represent a file in the UNIX filesystem.  For the example URI 
     
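Our reading of this path construction, as a sketch (the top-level
mount point and the separator between the instance id and the
transformed \ippmisc{ext_id} are assumptions of the sketch):
\begin{verbatim}
import hashlib

def instance_path(ext_id, instance_id, top='nebulous'):
    h = hashlib.sha1(ext_id.encode()).hexdigest()
    d1, d2 = h[0:2], h[2:4]   # two levels of 256 directories
    fname = '%d.%s' % (instance_id, ext_id.replace('/', ':'))
    return '/'.join([top, d1, d2, fname])

print(instance_path('example/path/chip.fits', 42))
\end{verbatim}
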
    23332390using only the low-latency SOAP communications. 
    23342391 
    2335 \note{need a paragraph or two on stats: how many objects, how many 
    2336   instances?} 
     2392The Nebulous database currently (2017 July) contains information about 
     23935,560,533,654 file instances for 3,543,240,981 storage objects.  All 
     2394raw data, along with permanent products such as catalogs and the 
     2395current versions of full-sky stacks, are replicated to ensure at least 
     2396two copies exist in case of hardware failure.  Based on the most 
     2397recent database ID values (which are unique and never reused), this 
     2398corresponds to roughly half of all the storage objects and file 
     2399instances ever created, due to the transient nature of many pipeline 
     2400products. 
     2401 
     2402% those numbers are so_id 6758205602 ins_id 9971666505, with ratios 
     2403% 0.5242, 0.5576) 
    23372404 
    23382405\subsection{Datastore repositories} 
     
    23432410that exposes data in a common form.  \note{add Isani / Hoblitt 
    23442411  reference?}  One of the main datastores used by the IPP is the one 
    2345 located at the summit.  This datastore exposes, a list of the 
     2412located at the summit.  This datastore exposes a list of the 
    23462413exposures obtained since the start of PS1 operations.  Requests to 
    23472414this server may be restricted to the most recent exposures by time.  Each row in the 
     
    23532420associated with that exposure.  This listing includes a link to the 
    23542421individual chip FITS files as well as an md5 checksum.  Systems which 
    2355 are allowed access may download chip FITS files via http requests to 
     2422are allowed access may download the raw chip FITS files via http requests to 
    23562423the provided links. 
    23572424 
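A download client therefore needs nothing beyond http and md5; a
sketch (hypothetical URL handling, not the IPP downloader) is:
\begin{verbatim}
import hashlib, urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as r:
        return r.read()

def download_chip(link, md5sum, dest):
    data = fetch(link)        # http request to the listed link
    if hashlib.md5(data).hexdigest() != md5sum:
        raise IOError('checksum mismatch for %s' % link)
    with open(dest, 'wb') as f:
        f.write(data)
\end{verbatim}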
     
    25092576These storage nodes are not fully capable of completing all processing 
    25102577on the short timescale necessary for each night's worth of data.  To 
    2511 increase the processing capability, we have a large number 
    2512 \note{actual number?} of ``compute'' nodes, that have small amounts of 
    2513 local storage, but are able to add processing power.  In addition to 
    2514 the direct processing of image data, these nodes are also used to 
    2515 manage the \ippprog{Nebulous} file interface, as well as controlling 
    2516 the job scheduling for the processing. 
     2578increase the processing capability, we have 212 ``compute'' nodes that 
     2579have small amounts of local storage, but are able to provide 
     2580additional processing power.  In addition to the direct processing of 
     2581image data, these nodes are also used to manage the \ippprog{Nebulous} 
     2582file interface, as well as controlling the job scheduling for the 
     2583processing. 
    25172584 
    25182585The final class of computers in the cluster comprises the database servers. 
     
    26312698products are present. 
    26322699 
    2633 Approximately half of the chip through warp processing for the PV3 
    2634 reduction was performed on Mustang, with 201,040 / 375,573 of the 
    2635 \ippstage{camera} stage products reduced there.  Only processing 
    2636 through the \ippstage{stack} stage was attempted, although with a 
    2637 smaller fraction of the total compared to the \ippstage{camera} stage, 
    2638 with 290,257 / 998,886 being produced at Los Alamos.  One reason for 
    2639 this decrease is that due to the memory constraints on the Mustang 
    2640 processing nodes, we were unable to run stacks with more than 25 
    2641 inputs there.  Stacks with this larger number of inputs overflow the 
    2642 memory of the processing node, and as they do not have disk space 
    2643 available for use as virtual memory, cause the machine to hang until 
    2644 the job time limit is reached.  These stacks were instead processed on 
    2645 the regular IPP cluster, where hosts with sufficent memory were 
    2646 available. 
     2700Approximately half of the \ippstage{chip} through \ippstage{warp} 
     2701processing for the PV3 reduction was performed on Mustang, with 
     2702201,040 / 375,573 of the \ippstage{camera} stage products reduced 
     2703there.  Only processing through the \ippstage{stack} stage was 
     2704attempted, although with a smaller fraction of the total compared to 
     2705the \ippstage{camera} stage, with 290,257 / 998,886 being produced at 
     2706Los Alamos.  One reason for this decrease is that due to the memory 
     2707constraints on the Mustang processing nodes, we were unable to run 
     2708stacks with more than 25 inputs there.  Stacks with larger numbers of 
     2709inputs overflow the memory of the processing node, and as they do not 
     2710have disk space available for use as virtual memory, cause the machine 
     2711to hang until the job time limit is reached.  These stacks were 
     2712instead processed on the regular IPP cluster, where hosts with 
     2713sufficient memory were available. 
    26472714 
    26482715\subsection{UH Cray Cluster}