LSST Science User Interface & Science User Tools Conceptual Design LDM-131 10/11/2013
Large Synoptic Survey Telescope (LSST)
Science User Interface & Science User Tools Conceptual Design
Schuyler Van Dyk and Deborah Levine
LDM-131
Latest Revision: October 11, 2013
Change Record
Version | Date | Description | Owner name
0.9 | 5/20/2011 | Initial draft | Van Dyk and Levine
1.0 | 8/23/2013 | First revision | Van Dyk
1.1 | 10/11/2013 | Updates to the first revision; TCT approved | Van Dyk
Table of Contents
Change Record
LSST Science User Interface & Science User Tools (SUI/T) Conceptual Design
1. Scope of SUI/T and the Conceptual Design
2. Scope and Context of this Document
3. Overview of Requirements
4. Conceptual Design
4.1. Assumptions and Approach to Design
4.1.1. LSST SUI and the "Brave New World"
4.1.2. Use Patterns in Astronomy
4.1.2.1. Protocols and the Virtual Observatory
4.2. Overview of Conceptual Design
4.3. The SUI Infrastructure
4.3.1. Data Access and Processing Control Protocols
4.3.2. Job Handling and Resource Management
4.3.3. Data Co-resident Service Engines
4.4. The Backend
4.5. In the Middle – External LSST-developed Services
4.6. SUI/T User Interfaces and Tools
4.6.1. Helpdesk/User Documentation
4.6.2. User Workspace User Interface
4.6.3. Web-Hosted Tools and Downloaded/Installed Tools
4.6.4. Alert Subscription Interface
4.6.5. "Front Door" Search and Retrieve Interface
Appendix A: Representative Use Cases
A.1. Mapping the Milky Way Use Case
A.2. Exploring The Transient Optical Sky Use Case
A.3. Taking an Inventory of the Solar System Use Case
A.4. Constraining Dark Energy and Dark Matter Use Case
Appendix B: Existing Tools & Technology Analysis
LSST Science User Interface & Science User Tools (SUI/T) Conceptual Design
1. Scope of SUI/T and the Conceptual Design
The Science User Interface and Tools effort covers the development and integration necessary to ensure that the LSST Science Data residing at one of the Data Access Centers are accessible to the Scientific Community. The Scientific Community is considered to include US users and all LSST international partners. Education and Public Outreach is a separate effort, but significant overlap in underlying archive interfaces is anticipated. Support for science data quality analysis and Level 3 pipeline processing is separate within LSST and not under the SUI/T umbrella.
The Science User Interface and Tools is a part of the LSST Data Management Applications effort and is contained within the WBS element 02C.05 “Science Data Quality Assessment and Analysis Tools”. This WBS element contains the software, tools, and user interfaces necessary to assess the scientific quality of the data products via documentation, quality metrics/ratings, statistical reports, and experimental results using the data products; provide coherent and intuitive mechanisms for scientists to access the data products (images, catalogs, alerts) and federate the data products with external data; provide automated and human assistance in working with the DMS and data products; enable the scientists to develop analytical codes by reusing existing DMS codes and integrating externally developed codes; execute the analytical codes on local or external platforms; capture and analyze the results of those codes.
The SUI contains five main functional areas:
· Archive Browsing and Data Query
· Data Visualization
· Alert Subscription and Analysis
· DAC-hosted User Workspace
· Helpdesk and User Support
However, it is reasonable to expect that the science data quality analysis tools may have significant overlap with the archive access and visualization utilized by researchers and that the pipeline-processing toolkit may share the User Workspace.
The science data accessed by these systems are assumed to be housed at a Data Access Center (DAC). Currently two DACs are scoped, one co-located with the LSST Archive at NCSA and one co-located with the Base facility in Chile. All DACs are assumed to be architecturally identical in terms of database schema, infrastructure and middleware, although different subsets of data might be housed at different DACs. The User Workspace storage and computationally heavy backend software are assumed to be co-located with the science data at a DAC. The Helpdesk and other tools are assumed to be web-accessible; no assumption is made about the server location.
2. Scope and Context of this Document
This document seeks to present a conceptual architecture that is independent of specific technology, protocol or platform choices to be made during the detailed design phase. The Science User Interface and Tools as described are to be released near the end of the Construction phase, around 2019, and the specific technologies available are likely to evolve significantly between now and then. For this reason, generic terminology is used in a number of cases. Where specific technology choices are present or can be inferred, they represent a case where existing technology or protocols can be used as an example or model for a component that provides the required functionality. In addition, these are choices that could be used for early prototyping of a system. However, they should not be viewed as binding choices going into the detailed design phase.
3. Overview of Requirements
The science requirements are documented in LSST Science User Interface/Science User Tools Requirements (LDM-130) and were the result of consultation with the SUI Focus Group and other scientists and science tool developers. The resulting list of requirements was relatively straightforward for the Archive Browser/Query Tools, Helpdesk, Alert Subscription and User workspace areas, although the Archive Browser/Query Tools performance requirements are potentially a risk area depending heavily on the database design and basic DBMS query performance. The requirements for Data Visualization are varied and some are quite aggressive, but they also become increasingly useful as the survey progresses and as increasing volumes of data, and increasing temporal information, accumulate.
Implicit requirements are levied by the rate and volume of the survey: the focal plane contains 3.2 Gpixels, and the survey produces 15 TB of raw data nightly. The archive is expected to grow by 5.6 PB of images and 0.6 PB of catalog data per year.
Currently the requirements are divided into two levels of priority called Tier 1 and Tier 2. The Tier 1 requirements contain the basic functionality needed at the time the survey begins, and a significant subset of those are needed for commissioning.
The science requirements were externally reviewed in 2013 February. The review committee issued a report and a response to the report by the project is in process.
4. Conceptual Design
4.1. Assumptions and Approach to Design
LSST will begin its Construction phase in an epoch where software in general, and astronomical software in particular, is evolving towards increasingly distributed, cooperative and open access approaches. At the same time, LSST will generate an extremely large coherent dataset with unique characteristics and much of the community-shared development is in its infancy at the time of this design. Therefore, the design presented aims to strike a balance between facilitating maximal compatibility with externally-developed resources and ensuring control over a sufficient portion of the system, so that the project can ensure that a useable interface to the data is available to the community when the telescope and camera have been commissioned, without relying on development it does not control.
With that in mind, a design philosophy has been established. It includes the following:
· Utilize program-friendly interfaces as a communications layer between LSST-specific software running at the DAC and end-user applications. Include application programming interfaces (APIs) for any tool or service developed.
· Utilize common protocols (e.g., VO protocols) in the API layer to facilitate making the data available through any compatible service. Develop new protocols only if needed, and whenever possible, strive to do so within the context of generalized utility and community discussion.
· Use a development model in which service components can be located either with user interface components in a distributed way or co-located with the data at a DAC, so that an optimum balance can be struck between utilizing local resources and co-locating tasks that access a large fraction of the LSST data with the data.
· Facilitate access through existing tools; use COTS where targeted development is not needed (e.g., Helpdesk software).
· Concentrate development effort where most needed. We believe this to include:
  o The APIs;
  o A SUI infrastructure which sits between the DM-developed middleware and the APIs that enable access to the LSST data;
  o Any LSST-unique functionality;
  o A "front door" data search and retrieval interface that provides the Tier 1 functionality to the community;
  o An Alert subscription and notification service.
The conceptual design comprises a layered architecture, which is responsive to the requirements and to the project’s philosophy of working with shareable and extensible architectures and tools. The requirements and design are both informed by the Example Use Cases documented in Appendix A.
To facilitate concentrating the effort where it is most needed, we will compare the required functionality with existing tools that might provide a model, a base for customization, or plug-and-play functionality, given suitable detailed design of the underlying layers of the architecture. The beginnings of this effort appear in Appendix B. That analysis is not complete as of this version, and this type of technology evaluation will need to continue throughout the design and development of the SUI/T.
4.1.1. LSST SUI and the "Brave New World"
Classic science information systems design has involved building up from data access through higher-level data processing constructs, wrapped by graphical user interface (GUI) components, to provide a uniform and integrated system for the end user. However, data volumes are increasing geometrically, processing is becoming ever more complex and distributed, and more and more tools are being developed specific to a narrow discipline. It is beyond the capability of any one organization to capture all the functionality desired by a diverse astronomical community in a single monolithic and opaque construct.
There is an enormous wealth of skill and existing software tools in the community that can be brought to bear on LSST data handling and analysis problems if the right framework is in place. The LSST SUI must therefore be largely about interconnectivity and facilitating development of solutions, rather than building a single top-down system.
4.1.2. Use Patterns in Astronomy
The breadth of potential use cases is daunting, but closer inspection of the work habits of astronomers unveils clear patterns.
For example, many researchers prefer to work within a computational environment like IDL, Matlab, or the astronomy-specific IRAF. Some even build their own integration frameworks using scripting languages, such as Python. In both of these cases the key is to provide direct access to low- and intermediate-level search, retrieval, and processing capabilities using standard protocols (e.g., Virtual Observatory protocols). Similarly, some researchers will prefer to access LSST data in the context of some other archive, or vice versa; for instance, they may want to obtain LSST image data for a galaxy they are investigating in NED. In addition, many users prefer tools they are familiar with and balk at learning something new. These preferences change with time, but the process is evolutionary rather than revolutionary. Since many search interfaces and data visualization tools are already in use in the community, this demographic can be readily addressed by making modules "plug-and-play": for example, by providing infrastructure to support a user who wants to compose catalog queries through the CDS VizieR interface, or one who wants image results sent to SAOImage DS9 or Aladin.
On the other hand, a casual user who is seeking a single image to use as supporting material for a poster or proposal may be best served by an intuitive "one-stop shopping" interface that folds in ready access to basic documentation. At the other end of the spectrum a "power user" whose research is heavily invested in a large, rich and complex dataset may gravitate towards highly specific tools and interfaces, if their use improves the speed or ease with which the data can be accessed.
A common paradigm in archive interface design is similar to online shopping and based on the idea of locating and downloading (or "ordering") products. As data volumes increase, local storage of query results becomes more and more infeasible for many problems. In these cases, the developed toolset will need to be a broker or flow manager between the data source, the processing and the user interface. For very large datasets this may even include negotiating the transfer of software to the data, rather than data to the software. This type of flow-and-process management, as well as careful consideration of what types of computation can be done at the user interface level and which must be done close to the data source, will be key to the development of a detailed design which permits the distillation of a vast dataset to something which, first, a commonly-available network and, second, a human user can handle. This last challenge can be seen as the key one in creating a workable SUI for LSST.
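The flow-management decision described above can be illustrated with a minimal Python sketch. It is not part of the design; the class name, the function name, the three outcome labels and the thresholds are all hypothetical, and a real broker would weigh many more factors (compute load, user quotas, data locality).

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """Hypothetical size estimates for one user request."""
    input_bytes: int    # data the task must read
    product_bytes: int  # distilled product the task produces


def choose_flow(task, link_bytes_per_s, max_transfer_s):
    """Pick where a task runs and where its product ends up.

    Assumption: the only criterion is whether a transfer fits within
    max_transfer_s on the user's network link.
    """
    def transfer_s(nbytes):
        return nbytes / link_bytes_per_s

    if transfer_s(task.input_bytes) <= max_transfer_s:
        # Small enough: classic "locate and download" model.
        return "ship-data-to-user"
    if transfer_s(task.product_bytes) <= max_transfer_s:
        # Compute next to the data, send back only the distilled product.
        return "run-at-dac-return-product"
    # Even the product is too large: leave it in the DAC-hosted workspace.
    return "run-at-dac-keep-in-workspace"
```

With a 10 MB/s link and a ten-minute budget, a 50 MB cutout would be shipped to the user, while a task reading several terabytes would run at the DAC and return (or retain) only its product.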
4.1.2.1. Protocols and the Virtual Observatory
The SUI portion of the LSST project seeks to solve many of the same problems as the international Virtual Observatory effort and the U.S. Virtual Astronomical Observatory Project in particular. The VO protocols include data location (registry and inventory services), retrieval (Simple Image Access Protocol, Table Access Protocol, etc.), process control (Universal Worker Service, etc.) and security, all of which will be needed for the LSST SUI. The VAO mission includes promoting and extending the general model of mid-level protocols. While it is important for LSST to limit dependencies on products and processes it does not control, it is a goal that the SUI be based on this work wherever possible. Where new protocols are found to be needed or old protocols extended, LSST will seek to work within the IVOA standards process.
4.2. Overview of Conceptual Design
The architecture has been designed as a set of backend components with cleanly defined, common interfaces that manipulate data so that it can be accessed by a wide variety of tools according to common protocols. This is referred to as the "SUI Infrastructure." These are presumed to use VO standards whenever they are stable and compatible with the functionality needed. The philosophy is to work upward from a solid core of enabling technology, concentrating the high-level user interface development where early functionality is critical and/or where existing tools do not provide the functionality or performance required. This allows for optimal leveraging of existing tools and facilitates a prioritized development plan. Given a limited budget, prioritization of the development during the Construction phase is absolutely imperative to ensure that the data will be accessible when the survey gets underway.
Nonetheless, development of a "front door" interface for basic archive browsing, querying and supporting visualization has been made part of the design. Significant existing development at IPAC can be leveraged to provide an LSST-customized user interface that integrates the minimum functionality needed to support efficient identification of datasets of interest for a relatively small investment, compared to what will be needed to develop the data access and computation layers. Building such an interface in no way precludes being fully compatible with the full range of VO tools available, and detailed design will be done in a loosely-coupled modular-layered manner that is readily compatible with the VO approach. We consider that the same rationale also applies to a basic Alert Subscription service, although if external brokers provide sufficient functionality by the time the service is needed, nothing in the design prevents using them.
Figure 1 is an overview of the system from a user's perspective.
Figure 1 -- Overview of Conceptual System from a User Perspective
The items in white are expected to be new development done or coordinated by the IPAC-led team as part of the SUI effort. The blue-coded items represent development by LSST with which we expect to interface or build upon. The blue-green items are areas where it is likely software may be useable "off-the-shelf" or may only need to be minimally customized; they also include items that may be COTS or even end-user developed. "Downloaded/Installed Tools" and "Web-hosted Tools" represent the full spectrum of software available to the professional astronomer, including VAO/IVOA tools and commercial tools, such as IDL and Matlab. No tools development in these areas is planned during Construction, unless significant gaps in functionality are identified, but effort will be made to support the interfaces needed to make these tools work with the LSST Science Data.
The bulk of the development effort will be concentrated in the SUI infrastructure and the APIs (“program friendly interfaces”) to it.
4.3. The SUI Infrastructure
The SUI infrastructure is the foundation of the design presented. Its components provide and manage the interface between the LSST data housed at a DAC and the services, user interfaces, and users elsewhere which need to interact with them. The infrastructure has three components that handle incoming requests for data access or computation: common data access protocols; service engines that provide computation or query management that needs to be handled close to the data due to the volume of data; and job handling to manage the load on co-resident services and communicate status to the user-side services and interfaces.
Figure 2 provides an overview of the conceptual architecture of the Infrastructure.
Figure 2 -- SUI/T Infrastructure Architecture
4.3.1. Data Access and Processing Control Protocols
These protocols are either existing VO protocols or comparable protocols developed to fill a gap. Some examples of existing low-level standards are those which define interfaces that allow uniform searching and retrieval of tables/catalogs (TAP; Table Access Protocol), images (SIAP; Simple Image Access Protocol), spectra (SSAP; Simple Spectral Access Protocol), time series data (STSP; Simple Time Series Access Protocol), and so on.
In the detailed design phase, existing standards will be mapped onto the needed functionality for each known case and either an appropriate standard will be adapted, or a gap will be identified and work will proceed to fill the gap.
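As an illustration of what such a protocol looks like from a program's point of view, the following Python sketch builds a synchronous TAP request URL carrying an ADQL cone-search query. The request parameters (REQUEST, LANG, FORMAT, QUERY) follow the TAP 1.0 recommendation; the base URL, table name and column names are hypothetical placeholders, since no LSST service endpoints or schema are defined in this document.

```python
from urllib.parse import urlencode


def cone_search_adql(table, ra_deg, dec_deg, radius_deg, columns=("ra", "dec")):
    """Return an ADQL query selecting rows within radius_deg of a position.

    Assumption: the table exposes ICRS coordinates in columns 'ra' and 'dec'.
    """
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


def build_tap_sync_url(base_url, adql, fmt="votable"):
    """Return the URL of a synchronous TAP query (TAP 1.0 parameter names)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical endpoint and table name, for illustration only.
example_url = build_tap_sync_url(
    "https://dac.example.org/tap",
    cone_search_adql("lsst.object", 150.0, 2.0, 0.05),
)
```

Because the request is an ordinary URL, the same query can be issued from a script, a desktop tool or another archive's interface, which is the interoperability the protocol layer is meant to provide.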
4.3.2. Job Handling and Resource Management
These components handle SUI-related requests that impact resources at the DACs (analogs may also exist on external servers hosting computational engines external to the DACs). Some examples of existing systems which would be considered during detailed design in this area include Condor, developed for distributed use of under-utilized compute nodes; the ROME queue management, a job framework used for the Montage mosaic service, which has the ability to keep track of real-time application messages and to reorder the execution queue; bgTools, a simple framework that keeps "foreground" browser-initiated jobs from timing out; and existing Cloud and Grid tools.
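The two capabilities highlighted above, tracking messages from running jobs and reordering the execution queue, can be sketched in a few lines of Python. This is an illustrative toy, not a model of Condor, ROME or bgTools; the class and method names are hypothetical, and the status strings are borrowed from the phase names of the IVOA Universal Worker Service pattern.

```python
import heapq
import itertools


class JobQueue:
    """Toy priority queue with per-job status and message tracking."""

    def __init__(self):
        self._heap = []                # entries are (priority, job_id)
        self._ids = itertools.count()
        self._jobs = {}

    def submit(self, func, priority=10):
        """Queue func(report) for execution; lower priority runs first."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"func": func, "priority": priority,
                              "status": "QUEUED", "messages": [], "result": None}
        heapq.heappush(self._heap, (priority, job_id))
        return job_id

    def reprioritize(self, job_id, priority):
        """Reorder the queue by giving a queued job a new priority."""
        job = self._jobs[job_id]
        if job["status"] != "QUEUED":
            raise ValueError("only queued jobs can be reordered")
        job["priority"] = priority
        # The old heap entry becomes stale and is skipped in run_next().
        heapq.heappush(self._heap, (priority, job_id))

    def run_next(self):
        """Run the highest-priority queued job; return its id, or None."""
        while self._heap:
            priority, job_id = heapq.heappop(self._heap)
            job = self._jobs[job_id]
            if job["status"] != "QUEUED" or job["priority"] != priority:
                continue  # stale entry left behind by reprioritize()
            job["status"] = "EXECUTING"
            try:
                # The job reports progress by calling report(message).
                job["result"] = job["func"](job["messages"].append)
                job["status"] = "COMPLETED"
            except Exception as exc:
                job["messages"].append(str(exc))
                job["status"] = "ERROR"
            return job_id
        return None

    def status(self, job_id):
        """Return (status, messages) so a user-side interface can poll."""
        job = self._jobs[job_id]
        return job["status"], list(job["messages"])
```

A user-side interface would call `status()` periodically instead of holding a browser connection open, which is one way to avoid the timeout problem mentioned above.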
4.3.3. Data Co-resident Service Engines
These components provide the computation or search functions that operate directly on the LSST data through the middleware for activities where the job is too big to consider transporting the data to an external service. Examples might include producing a color-color density map for all galaxies in the survey, cross-correlation with other large datasets, and producing a reduced mosaic of the focal plane for a given epoch of observation. During detailed design, each task will need to be examined carefully, the common services that need to be co-located with the data identified, and a corresponding engine designed for each.
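The color-color density map is a useful example of why such an engine belongs next to the data: the input is every galaxy in the catalog, but the output is only a small grid of counts. The Python sketch below shows the reduction step under simplifying assumptions; the function name, the choice of g-r versus r-i colors, and the input format (an iterable of g, r, i magnitude triples) are hypothetical.

```python
def color_color_density(rows, g_r_range, r_i_range, nbins):
    """Bin (g-r, r-i) colors into an nbins x nbins grid of source counts.

    rows is any iterable of (g, r, i) magnitudes, so it can be streamed
    from the database without holding the catalog in memory. Only the
    small grid is returned to the caller.
    """
    (x0, x1), (y0, y1) = g_r_range, r_i_range
    grid = [[0] * nbins for _ in range(nbins)]
    for g, r, i in rows:
        x, y = g - r, r - i
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # outside the requested color window
        # min() guards against floating-point rounding at the upper edge.
        ix = min(int((x - x0) / (x1 - x0) * nbins), nbins - 1)
        iy = min(int((y - y0) / (y1 - y0) * nbins), nbins - 1)
        grid[iy][ix] += 1
    return grid
```

However many rows are scanned, the result transferred to the user interface is nbins squared integers, which is well within what a browser can render.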
4.4. The Backend
The SUI/T will run on dedicated web/application servers, which provide the web services framework supporting data access queries and data product retrieval, as well as computational capability, at a DAC. These servers will control and interact with the User Workspace disk storage via cluster nodes and file servers, and this overall infrastructure is networked to the database nodes. The SUI/T interface layer will handle user queries and communicate/translate these to the database via the database middleware. Files produced as query results will be returned to the User Workspace cluster, where they are immediately accessible to the users. The files will be stored in workspace directories for a set period of time, after which they will be automatically purged. The compute servers will provide capacity for user tools, as well as engines for data visualization.
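The automatic purge of workspace files can be pictured with a short Python sketch. It assumes, purely for illustration, that file age is judged by modification time and that a single retention period applies to every file; the function name and parameters are hypothetical, and the actual retention policy is left to detailed design.

```python
import os
import time


def purge_expired(workspace_dir, retention_seconds, now=None):
    """Delete files older than retention_seconds; return the removed paths.

    Assumption: a file's age is measured from its last modification time.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(workspace_dir):
        for name in files:
            path = os.path.join(root, name)
            if now - os.path.getmtime(path) > retention_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

A scheduled task at the DAC could call this once a day on each user's workspace directory, for example with a retention of seven days (`7 * 86400` seconds).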
4.5. In the Middle – External LSST-developed Services
These would be LSST-specific services necessary to ensure compatibility with existing tools and services that do not need to be co-resident with the data. Most of these would be developed in conjunction with tools development.
4.6. SUI/T User Interfaces and Tools
These are the components on the left-hand side of Figure 1 that are within the scope of the IPAC-led development.
4.6.1. Helpdesk/User Documentation
This comprises the software needed to "provide automated and human assistance in working with the DMS and data products" per the WBS. No custom development is anticipated here; the bulk of the task is to choose and configure a COTS Helpdesk system (e.g., Kayako). The remainder of the task falls into the areas of providing and cataloging documentation and website design.
4.6.2. User Workspace User Interface
The User Workspace Interface provides access to data housed remotely, notably at a DAC, in conjunction with time awarded to a researcher on the heavy compute resources at, e.g., NCSA. Conceptually this interface allows the user to interact with the data as if it were on a local file system, and supports uploading, downloading, read/write protection, etc.
It is anticipated that the User Workspace will either be VOSpace or will be modeled closely after VOSpace.
4.6.3. Web-Hosted Tools and Downloaded/Installed Tools
These are standalone, COTS, VAO, or purpose-developed tools which are either commonly used by the science community or which have been developed to meet a specific requirement not covered elsewhere. Examples that will need to be evaluated to ensure that the protocols cover their use include IDL, Matlab, and IRAF, in addition to available VAO tools.
A number of the Tier 2 requirements, especially visualization requirements, as well as data federation and special-purpose analysis tool requirements, may be met by externally-developed tools. A comprehensive exploration of existing tools and what requirements they meet is in progress as part of the conceptual design, and initial results are presented in Appendix B.
From a design standpoint, the use of common or negotiated protocols addresses the issue of providing the connectivity needed for users to work with the LSST data using external tools and services. During detailed design any Tier 1 requirements that are not being met by the remainder of the design will need to be designed and developed; they will likely fall into this category.
4.6.4. Alert Subscription Interface
This is the basic interface used to subscribe to alerts and to manage the subscription to alerts. It includes functionality to subscribe and unsubscribe, to design custom filters and to receive notification through specified means.
The development related to raising alerts and publishing them rests with the DM Pipeline development, and the presumptive mechanism is VOAlert. Some event brokers already exist and it has been argued that LSST need not develop its own brokerage service. At this point in time, this item represents a placeholder; if externally developed broker services meet the needs of the LSST community, it may not be necessary to build a tool. However, alert handling is key to many of the science goals.
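The subscribe, unsubscribe, custom-filter and notification functions listed above can be sketched in Python as follows. This is a minimal illustration, not a broker design: the class name, the representation of an alert as a dictionary, and the field names used in the usage note are all hypothetical, and no alert schema is defined in this document.

```python
class AlertSubscriptions:
    """Toy registry mapping subscriptions to user-defined alert filters."""

    def __init__(self):
        self._subs = {}
        self._next_id = 1

    def subscribe(self, user, predicate, notify):
        """Register a filter (predicate) and a delivery callback (notify)."""
        sub_id = self._next_id
        self._next_id += 1
        self._subs[sub_id] = (user, predicate, notify)
        return sub_id

    def unsubscribe(self, sub_id):
        """Remove a subscription; return True if it existed."""
        return self._subs.pop(sub_id, None) is not None

    def publish(self, alert):
        """Deliver an alert to every subscription whose filter accepts it."""
        delivered = []
        for sub_id, (user, predicate, notify) in list(self._subs.items()):
            if predicate(alert):
                notify(user, alert)
                delivered.append(sub_id)
        return delivered
```

A user interested only in large brightness changes might subscribe with a filter such as `lambda alert: alert["delta_mag"] >= 1.0`, with `notify` standing in for whichever delivery means (e-mail, message queue, and so on) the user has specified.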
4.6.5. "Front Door" Search and Retrieve Interface
This is an LSST-developed User Interface that meets the bulk of the Tier 1 requirements in the archive query and represents the fundamental "Science User Interface" to the LSST data. This capability is needed early and must be robust and user friendly. Because some degree of data visualization is a necessary part of working with the archive, both for catalog type queries and for image queries, the "Front door" incorporates both query and data visualization functionality in a way that appears seamless to the user.
The basic interface would initially support by-position CCD or sub-CCD level image browsing and retrieval and searches of the source and object catalogs (including forced sources, etc.), applying selection criteria against a first-tier subset of the catalog fields and image metadata identified as being of primary importance. This would include searches by target coordinates, either individually or by uploading a list, and other comparable "basic" criteria. This level of functionality is needed at the start of Commissioning. The expectation is that this will be extended to include second-tier catalog field data and image metadata, an SQL-like expert query function, and hierarchical multi-scale image viewing and retrieval. This extended functionality could be delivered after the start of Commissioning; i.e., some functionality would be delivered during the Commissioning phase and some during the first year to two years of Operations.
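A search by target coordinates with an uploaded list reduces to parsing the list and testing angular separation against a search radius. The Python sketch below illustrates this with a brute-force scan; the function names, the list format (one "RA Dec" pair in decimal degrees per line, with "#" comments) and the catalog row layout are assumptions, and a production service would rely on spatial indexing in the database rather than scanning rows.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def parse_target_list(text):
    """Parse an uploaded list of 'ra dec' lines (decimal degrees)."""
    targets = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ra, dec = (float(v) for v in line.replace(",", " ").split()[:2])
        targets.append((ra, dec))
    return targets


def search_by_targets(catalog, targets, radius_deg):
    """Return {target: [matching rows]} for rows within radius_deg."""
    return {
        (ra, dec): [row for row in catalog
                    if angular_separation_deg(ra, dec, row["ra"], row["dec"])
                    <= radius_deg]
        for ra, dec in targets
    }
```

A single-target search is the same call with a one-element list, so the individual and list-upload cases can share one code path.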
IPAC has a heritage in developing and deploying successful systems of this type. The Spitzer, WISE and Planck archives were developed using a powerful, configurable system for building web-based user interfaces; it is configured through an XML file so that very little code must be written to implement a new archive and is designed to interface with pre-existing services, such as VAO services. It focuses on presentation, user interaction, and visualization. Although it is unlikely that this system (Firefly) or the technologies it relies upon will remain static and available during the LSST Construction phase, it provides a good model for the type of functionality the "Front Door" interface would provide.
In this paradigm searches are performed using "drop-in" components that connect external data services to the rest of the system. Data results can be visualized in multiple ways using interactive tables and true FITS image visualization. External data visualization and analysis services can also be supported in a "drop-in" fashion, which allows the interface to leverage the resources available as data co-resident services to break down large-volume problems to a browser-manageable size.
Technologies and approaches that we might consider during detailed design of a "Front Door" user interface include Canvas, for 2D graphics, and WebGL, which takes significant advantage of GPUs, for 3D graphics in the browser.
Appendix A: Representative Use Cases
This Appendix contains representative science use cases from the LSST Science Drivers as examples of paths through the software set. They are not in any way intended to be complete but rather illustrative.
A.1. Mapping the Milky Way Use Case
A.2. Exploring The Transient Optical Sky Use Case
A.3. Taking an Inventory of the Solar System Use Case
A.4. Constraining Dark Energy and Dark Matter Use Case
Appendix B: Existing Tools & Technology Analysis
Name | Type | Category of Requirements Addressed | Scalability | Comments
VizieR | Tool | Catalog Query Tool | | Web accessed. Search by catalog type, column description, position (radius or box) or object name (deselecting the type designation is clunky). Upon selection of a catalog, constrain by column; lacks min/max. Can also search for catalogs by clicking on a density map of holdings. Can query multiple catalogs. Appears to require a target or target list; no search on global properties. Multiple result formats, including VOTable.
Aladin Sky Atlas | Tool | Image/Catalog Visualizer | | Java applet or download. Displays images; can use multiple windows. Zoom, pan, etc. Overlays catalogs. Slow in the evaluated configuration; not good at indicating busy. Can import data readily. Some image arithmetic. SAMP. Limited support for FITS cubes. Pixel "pick" function. Program interface. Open source.
WISE Image Archive | Tool | Image Browser/Visualizer | | Web based. WISE-data specific. Search by position. Coverage displayed on selectable background. Multiband display, catalog overlay, fully integrated pixel pick function. 3-color synthesis. All basic image manipulation.
Gator | Tool | Catalog Query Tool, Catalog Visualization | | Web based. Select catalog from list; one catalog at a time. Search by position, list or global properties (all sky). Constrain by column or use SQL mode. Source count mode. Can signal completion by email. Program interface. Simple visualization of search result. Integrated 2D plotting of columns.
Datascope | Tool | Image/Catalog Query Tool | | Position-based; access to a large number of catalogs and image archives. Appears to rely on sending data to Aladin to do anything much with it other than download it.
TOPCAT | Tool | Table Manipulation | | Java "Webstart" or standalone. Search using selected VO protocols (TAP, SIA, etc.). XY plotting.
The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.