Science and Project Sizing Inputs Explanation
Gregory Dubois-Felsmann and Kian-Tat Lim
LSE-82
Latest Revision: October 10, 2013
This LSST document has been approved as a Content-Controlled Document by the LSST DM Technical Control Team. If this document is changed or superseded, the new document will retain the Handle designation shown above. The control is on the most recent digital document with this Handle in the LSST digital archive and not printed versions. Additional information may be found in the LSST DM TCT minutes.
Change Record | ||
Version | Date | Description | Owner name
1 | | Original document created to match LSE-81 | G. Dubois-Felsmann
1.1 | 7/17/2011 | Updated to match version 6 of LSE-81 | G. Dubois-Felsmann
1.2 | 8/15/2011 | Updated to match version 14 of LSE-81 | G. Dubois-Felsmann
1.3 | 10/4/2013 | Updated to match version 24 of LSE-81 | Kian-Tat Lim
1.4 | 10/10/2013 | TCT approved | R Allsman
Table of Contents
Change Record
1 Introduction
2 Science Estimates
3 Camera Specifications
4 Survey/Cadence Specifications
5 Engineering & Facility Database Specifications
6 Network Requirements
7 Image Storage Requirements
8 Data Release Production Specifications
9 Calibration Products Production Specifications
10 User Image Access Specifications
11 User Catalog Query Specifications
12 L3 Processing Specifications
13 EPO Specifications
14 Common Constants and Derived Values
1 Introduction
The Data Management Sizing Model (see the DM Technical Baseline Collection-2511) begins with a collection of key parameters that drive the scale of the design. These come from several sources:
· The Science Requirements Document (LPM-17) and additional estimates of the science content of the survey
· The two levels of system requirements documents, i.e., the LSR (LSE-29) and OSS (LSE-30)
· Additional estimates of the sizes of a variety of elements of the system, beyond those captured as system requirements
· Key assumptions about the design of the Data Management data processing model.
These inputs are collected in the spreadsheet “LSST Science and Project Sizing Inputs”, LSE-81. They are sorted into the categories that follow.
The values in the tab “SciReq” in the workbook then form an interface that is respected by the compute and storage/IO/network requirements estimation spreadsheets, LDM-138 and LDM-141 respectively. To update those models, the contents of the interface tab are simply copied to the corresponding tab in the destination workbook.
The present document will in the future be extended with detailed footnotes for the provenance of numbers obtained from previous surveys, detailed analyses, etc.
2 Science Estimates
The centerpiece of this section is a set of estimates of the numbers of detectable stars and galaxies in the full planned survey. These estimates rely on the cadence specifications below, particularly the total area surveyed, the total number of visits, and the resultant mean number of visits whose footprint includes any given point on the sky (referred to as epochs).
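A minimal Python sketch of how the mean epoch count follows from these cadence parameters; the visit count, field-of-view area, and survey area below are placeholders, not the controlled LSE-81 values:

    # Mean number of visits covering any given point on the sky ("epochs"),
    # assuming uniform coverage; all numbers are placeholders, not LSE-81 inputs.
    total_visits = 2_750_000         # placeholder: total visits over the survey
    fov_area_sq_deg = 9.6            # placeholder: field-of-view area per visit
    survey_area_sq_deg = 30_000      # placeholder: nominal survey area
    mean_epochs = total_visits * fov_area_sq_deg / survey_area_sq_deg  # ~880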
The visits in the cadence are assumed to belong to two categories: ones away from the galactic confusion zone (referred to as "Universal Sky" for historical reasons) and ones close to the galactic confusion zone (referred to as "Galactic Plane" for historical reasons). These categories are necessary because the density of observable stars and galaxies is dramatically different in each category.
The galaxy estimates are based on previous surveys and on available deep-field data to explore the faint limit. The total number of observable galaxies is the number seen in the r band times a correction factor of 1.04 derived from an assumed color-magnitude distribution. This quantity is further adjusted for the ratio of the expected survey coverage area to the nominal area of 30,000 square degrees used in the estimate computation.
The growth curve for galaxies is estimated as:
Galaxies_at_survey_end * (year / surveyYears)^(k_gal * 1.25)
where k_gal is the growth rate parameter for galaxies from the Science Book and 1.25 is a combination of the square-root scaling with time and the standard magnitude exponent. A single-frame galaxy count is determined from the per-band galaxy counts using the same formula but substituting (1 visit / epochs) for the time ratio.
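A short sketch of this growth-curve formula; the default parameter value is a placeholder, not the controlled Science Book value:

    # Cumulative detectable galaxies after `year` years of the survey.
    # `k_gal` is the Science Book growth-rate parameter; the default below is a
    # placeholder, not the controlled value.
    def galaxies_at_year(galaxies_at_survey_end, year, survey_years=10.0, k_gal=0.5):
        # Depth grows as sqrt(time), i.e. the limiting magnitude deepens by
        # 2.5 * log10(sqrt(t)) = 1.25 * log10(t); combined with a 10**(k_gal * dm)
        # count-versus-depth law this gives a power law in the elapsed-time fraction.
        return galaxies_at_survey_end * (year / survey_years) ** (k_gal * 1.25)

    # The single-frame count uses the same formula with (1 / epochs) substituted
    # for the elapsed-time fraction.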
The object count for stars is estimated based on Milky Way structure models derived primarily from SDSS data and extrapolated to the faint limit expected for the survey based on observations of nearby stars. The models give the number of observable stars in each band at a given magnitude limit as well as the number observable in a deep panchromatic coadd given a particular r-band limit. These tables are given both for the sky excluding the galactic confusion zone and for the entire sky including it; we use only the latter. We interpolate in the per-filter tables using the LSST single-frame limits from the SRD. The r-band limit derived from the number of r-band epochs after each survey year is used to define the growth curve for the total number of Objects due to stars. Again, a correction is applied for the survey coverage area. Note that we conservatively assume that fields in the confusion zone receive the same number of visits in each band (epochs) as those outside.
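A minimal sketch of the table interpolation described above; the two-column table is hypothetical, standing in for the per-filter star-count tables in LSE-81:

    import numpy as np

    # Hypothetical table: single-frame limiting magnitude vs. cumulative star count.
    mag_limit = np.array([22.0, 23.0, 24.0, 25.0, 26.0])
    star_count = np.array([2.0e9, 3.5e9, 5.5e9, 8.0e9, 1.1e10])  # placeholders

    def stars_at_limit(m_lim):
        # Linear interpolation at the SRD single-frame limit for this band.
        return np.interp(m_lim, mag_limit, star_count)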
Both the star and galaxy count estimates are subject to considerable and difficult-to-quantify uncertainty because of the unprecedented combination of breadth and depth of the LSST survey and the range of filter bands available (e.g., the SDSS did not have a y filter). The star estimates are further complicated by the expected challenges of imaging and deblending in the crowded fields around the direction of the Galactic center (due to dust distribution, the true bulge profile, luminosity functions, confusion limits, etc.) and because the survey design in that region is less fully explored than out of the plane. The use of existing narrow-angle deep fields is essential to the estimates, but the resulting precision is limited by unknown cosmic variance effects. For these reasons, we added a 30% uncertainty factor on top of the original estimates, for both stars and galaxies, in single frames and at the end of the survey. Larger variations in the star/galaxy counts are expected to be handled through margin and contingency.
The numbers of sources (single-frame detections) for both stars and galaxies are estimated by multiplying the single-frame object numbers derived above by the number of epochs in each filter. These grow linearly with time.
The count of forced sources at any given time is simply calculated as:
number_of_objects_at_any_given_time * number_of_visits_observed_so_far.
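These two relations can be summarized in a short sketch; all inputs are placeholders, with the controlled values in LSE-81:

    # Sources are single-frame detections: one per single-frame object per epoch,
    # summed over filters.
    def source_count(single_frame_objects_per_band, epochs_per_band):
        return sum(n * e for n, e in zip(single_frame_objects_per_band, epochs_per_band))

    # Forced photometry is performed on every object in every visit obtained so far.
    def forced_source_count(objects_so_far, visits_so_far):
        return objects_so_far * visits_so_far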
In addition, estimates are given for:
· the peak number of alerts issued in a single visit,
· the average number of alerts issued due to variables and old transients,
· the average number of alerts issued due to new transients,
· an estimate for the average number of alerts issued due to false positive detections,
· the number of moving Solar System objects detectable by the end of the survey,
· the fraction of all difference imaging sources that are not associated with previously-known objects (of any type) and that are true observations of moving Solar System objects, and
· the size of the time window in which forced photometry is performed for transient alerts, including an assumption that the average (true) transient produces detections for one month.
3 Camera Specifications
These are very basic parameters of the camera (and telescope) design that drive the raw image size and the unit of sky coverage. They derive primarily from the LSR and OSS.
4 Survey/Cadence Specifications
Starting with parameters from the OSS, this section quantifies the scope of the full survey in sky coverage. Based on operations simulations, and documented in the SRD, the total number of visits expected in a 10-year survey (with all losses accounted for) is provided. The exact distribution of these visits across the sky, and among filters, has a direct impact on source and object counts, and that distribution is still uncertain.
We make conservative (i.e., trending toward larger requirements) estimates of the number of nights and hours of observing time.
The average number of calibration exposures per observing night is derived from the OSS specification (which in turn originates in the Calibration Plan) for the total number of calibration exposures per year. Since this number includes calibration exposures taken during both daytime and non-observing, cloudy nights, it is a conservative overestimate of the peak rate of such images as well as an appropriate estimate for deriving their total number.
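The derivation is simple division, sketched here with placeholder numbers rather than the OSS values:

    # Average calibration exposures per observing night, derived from the
    # per-year specification; both numbers below are placeholders.
    calib_exposures_per_year = 4500
    observing_nights_per_year = 300
    calib_exposures_per_night = calib_exposures_per_year / observing_nights_per_year  # 15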
Specifications that take into account the length of the commissioning period are found later, under image storage.
5 Engineering & Facility Database Specifications
This section contains simple estimates of the (relatively small) size requirements for the storage of telemetry from the Observatory and images and spectra from the auxiliary telescope. The telemetry estimates are taken from the OCS telemetry channel lists. They will be updated as the design of the OCS and the subsystem interfaces to it progress.
6 Network Requirements
The network requirements are based on the raw data volume, estimates of the availability of network bandwidth between sites and the reliability of the networks, and a model that requires excess capacity on all long-distance network links for catching up after outages.
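A toy version of such a model, not the actual LDM-138/LDM-141 formulation, might look like the following; the availability and catch-up factor are assumed values:

    # Required capacity on a long-distance link: the sustained raw data rate,
    # inflated for imperfect availability and for an excess-capacity factor that
    # lets the link clear a backlog after an outage. All values are placeholders.
    def required_link_bandwidth_gbps(raw_data_rate_gbps, availability=0.98,
                                     catchup_factor=2.0):
        return raw_data_rate_gbps / availability * catchup_factor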
7 Image Storage Requirements
This section begins the modeling of the processing components of DM. Here we specify assumptions for the sizes of various image caches that allow processing to proceed without keeping all of the image data on spinning disk. Notably, we assume that we will maintain 30-day sliding windows both of recent calibrated images, with the remainder recreated on demand, and of raw calibration image data to be used as inputs for the calibration data products production.
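A rough sizing of such a sliding-window cache, with placeholder inputs rather than the controlled LSE-81 values:

    # 30-day cache of calibrated visit images; every number below is a placeholder.
    visits_per_night = 1000
    calibrated_visit_size_gb = 12.0     # assumed size of one calibrated visit
    cache_days = 30
    cache_size_tb = visits_per_night * calibrated_visit_size_gb * cache_days / 1000.0  # 360 TB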
Images, both science and calibration, taken during commissioning need to be archived as well, so specifications for them are included here.
We also document the assumptions for the construction of deep coadds (stacks of all exposures taken) and templates (coadds of exposures taken in particularly good seeing, for the purpose of transient detection and high-spatial-resolution measurement of brighter objects). This includes an assumption that separate templates will be maintained for different airmasses, to minimize the need for PSF-matching across large variations in seeing and extinction.
8 Data Release Production Specifications
This section expresses a model for the operation of Data Release Production and its validation with simulated data. It assumes a phased sequencing of an instance of DRP and expresses a budget of time allocated to each phase. It also includes a basic assessment of the size of the numeric data products derived from the data for each observed object.
9 Calibration Products Production Specifications
Estimates the quantity of calibration data products to be produced.
10 User Image Access Specifications
Estimates the rate at which users will require access to image data, both small cutouts around objects and larger requests. The specifications for query size and load are taken from the SDSS experience.
Note that requests for calibrated science images trigger recomputations unless the images are in the assumed cache, mentioned above.
11 User Catalog Query Specifications
These estimates of query size and rate are based on the SDSS experience, the Science Collaboration survey of expected queries, and the Community Access White Paper, where they are discussed in more detail. We assume the high-volume database query response time will be driven by the speed of shared scans. The speeds of the shared scans for the different catalogs (Object, Source, ForcedSource) have been selected to maximize science return while keeping the cost at a reasonable level.
We expect the majority of the database access will involve the Object Catalog, but we also include simultaneous queries against other Level 2 catalogs, the Level 1 database, and the Engineering and Facilities Database.
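As a back-of-the-envelope illustration of the shared-scan assumption above; the table size and scan rate below are placeholders, not the selected values:

    # With shared scans, the response time for high-volume queries is set by the
    # time to sweep the catalog once, since concurrent queries ride the same scan.
    def shared_scan_time_hours(table_size_tb, scan_rate_gb_per_s):
        return table_size_tb * 1000.0 / scan_rate_gb_per_s / 3600.0

    # Example: a 100 TB catalog scanned at 10 GB/s completes in about 2.8 hours.
    scan_hours = shared_scan_time_hours(100.0, 10.0)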
12 L3 Processing Specifications
This expresses the fiat that a flat 10% increment on the computing and storage resources required to perform the survey and the required Data Release computing will be supplied for the generation and storage of user (“Level 3”) data products. We also ensure that sufficient storage is available for the expected population of scientists, even if that requires more than the 10% allocation.
13 EPO Specifications
These specifications bound the network data transfer load from DM to EPO.
14 Common Constants and Derived Values
This section contains the definitions of certain standard conversion constants, as well as the computation of certain commonly used sizing parameters derived from the above specifications, e.g., the size of a raw image in bytes.
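For example, the raw image size in bytes can be sketched from basic camera parameters; the sensor count and pixel format below are stated as assumptions, and overscan and metadata overheads are ignored:

    # Approximate raw image size from camera parameters (assumed values; the
    # controlled computation is in LSE-81).
    science_ccds = 189                  # assumed number of science sensors
    pixels_per_ccd = 4096 * 4096        # assumed 4k x 4k sensors
    bytes_per_pixel = 2                 # assumed 16-bit raw pixels
    raw_image_bytes = science_ccds * pixels_per_ccd * bytes_per_pixel  # ~6.3e9 bytes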