These design principles were first stated in an email from David Maidment, and then edited by HIS team members.
- The system should use standard software components and services with as little custom programming as possible. Our focus is on the information model and how information moves from one location to another. Our goal is, as much as possible, to define the content that gets transmitted by existing software systems rather than requiring that new systems be built. There should be multiple ways of implementing the information model – in other words, multiple physical models that correspond to a common logical model. It’s the logical model that we need to be clear about; we should not fixate on the physical models, at least not yet.
- If we have to have a few customized services, then that is acceptable, but let’s avoid them if we can.
- Our system should simultaneously support search by spatial location and by information type.
- We push all the concept tagging back to the data providers. There is no tagging at HIS Central.
- The CUAHSI version of this catalog should exist “in the cloud” so that it can be owned and operated by CUAHSI itself in a routine, regular manner. This should not be a research system. It should be an operational system. It may be that we prototype this CUAHSI catalog on a local server but I want the migration path “to the cloud” to be clear.
- Before we ingest any data into this system, we must have a fully worked-out testing and validation plan to ensure quality control.
- HIS Central must support a 'base service' that provides access to data at the resolution of individual time series, with values that are as close to measured values as possible; i.e., it supports QC levels of data but is generally not an interpreted product. The metadata includes concept (as we are currently doing with the ontology), sensor/method/instrumentation to support the provenance of the data, and the temporal and spatial support of the measurement. The use case for this 'base service' is for experts to be able to assemble data into themes for a specific purpose. A spectrum of interpreted products, from themes as currently described through modeled products (e.g., risk maps), may also be published at HIS Central, but these will have product-level metadata that focuses on the intended use of the product. Provenance information for interpreted products will also look different from that for 'base' data, as it will focus on the method by which the product was derived rather than on the base data. Different discovery clients may be necessary for the base product (HydroDesktop) and the derived products because of the differences in the metadata and the potentially different ontologies supporting that metadata.
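As a sketch of the distinction described above, a base-series record and an interpreted-product record might carry metadata along the following lines. All field names and values here are illustrative assumptions, not an HIS Central schema:

```python
# Illustrative sketch only: field names and values are hypothetical,
# not an actual HIS Central schema.

# A 'base service' series record: as close to measured values as possible,
# with provenance describing the measurement itself.
base_series = {
    "concept": "streamflow",             # concept tagged against the ontology
    "qc_level": 1,                       # QC level, but not an interpreted product
    "sensor_method": "acoustic Doppler", # sensor/method/instrumentation provenance
    "temporal_support": "15 min",        # temporal support of the measurement
    "spatial_support": "point gauge",    # spatial support of the measurement
}

# An interpreted-product record: metadata focuses on intended use and on
# the derivation method rather than on the base measurements.
interpreted_product = {
    "product_type": "flood risk map",
    "intended_use": "county-level planning",
    "derivation_method": "hydraulic model",  # provenance = how it was derived
    "source_concepts": ["streamflow", "stage"],
}
```

The point of the sketch is that the two record types share almost no fields, which is why different discovery clients, and potentially different ontologies, may be needed for each.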
- HIS Central must support a catalog that permits searching at the resolution of the individual time series and by keyword. This data catalog may be assembled and maintained using data carts, but data carts are a means to an end (the catalog), not an end in themselves.
In the context of the HIS Central redesign, the concept of a data cart has often been part of the conversation. If data carts are to be formally integrated into HIS Central as a means of submitting, extracting, or storing descriptions of time series, then the following additional principles apply.
- The data cart model, in which a data cart is simply the metadata that defines a set of time series with one record per series, is a very simple, clean model. It can describe the heterogeneous content of one service, or homogeneous content spanning several services. It can describe both the input to a catalog from harvesting a service and the output from a catalog selected for a particular purpose. The content of the catalog is simply the aggregation of all the series it references. This is a “series-oriented” model, as I’ve heard you speak of before.
A series = a time-indexed sequence of values. (A graph is needed to show this.)
A data cart = a spatially-indexed set of series. (A map is needed to show this.)
A catalog = a set of data carts, searchable by concept, space, and time.
I think the simpler and cleaner the design is, the easier it is to maintain and test.
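A minimal sketch of this series-oriented model, assuming hypothetical class and field names (the source defines only the concepts, not an implementation):

```python
# Hypothetical sketch of the series / data cart / catalog model.
# Class and field names are illustrative assumptions, not an HIS API.
from dataclasses import dataclass, field

@dataclass
class Series:
    """A time-indexed sequence of values: one metadata record per series."""
    concept: str   # concept tag, e.g. from the ontology
    lat: float
    lon: float
    begin: str     # ISO 8601 start of the series
    end: str       # ISO 8601 end of the series

@dataclass
class DataCart:
    """A spatially-indexed set of series describing one service."""
    service: str
    series: list = field(default_factory=list)

@dataclass
class Catalog:
    """A set of data carts; its content is just the union of their series."""
    carts: list = field(default_factory=list)

    def search(self, concept=None, bbox=None):
        """Search by concept and/or bounding box (min_lon, min_lat, max_lon, max_lat)."""
        results = []
        for cart in self.carts:
            for s in cart.series:
                if concept and s.concept != concept:
                    continue
                if bbox:
                    min_lon, min_lat, max_lon, max_lat = bbox
                    if not (min_lon <= s.lon <= max_lon and min_lat <= s.lat <= max_lat):
                        continue
                results.append(s)
        return results
```

Note how the simultaneous search by spatial location and information type called for earlier falls out of the model directly, which supports the point that a simpler, cleaner design is easier to maintain and test.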
- We quality-control every data cart before it is accepted into the system.
- Any dataset can be removed from the system simply by removing its data cart. The catalog could be a “virtual layer” over all the data carts in a folder. If a service is updated, its data cart is replaced.
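The “virtual layer” idea can be sketched as follows, assuming a hypothetical on-disk format of one JSON cart file per service (the source does not specify a file format):

```python
# Sketch only: one JSON file per service is an assumed on-disk format,
# not something the source prescribes.
import json
from pathlib import Path

def load_catalog(folder):
    """Assemble the catalog as a 'virtual layer': simply the union of the
    series records in all data cart files currently present in the folder."""
    series = []
    for cart_file in sorted(Path(folder).glob("*.json")):
        series.extend(json.loads(cart_file.read_text()))
    return series
```

Under this sketch, updating a service means overwriting its cart file, and removing a dataset means deleting its cart file; the catalog reflects the change on the next load, with no separate delete or update logic.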
- Public agencies are encouraged to publish their per-series metadata in the data cart structure so that there are no more “data dumps” that we have to filter through.