This is based on the concept that there is a set of core properties that can be used for generalized discovery. Generalized discovery does not require data values. In fact, it is a requirement that the structure functions without holding data values.
In Progress
Background
In addition, the domain model should serve as a demonstration of a model that can be used as a catalog, and of a model that can manage data values from a variety of sources.
- The semantics shall be based on the Observations Data Model (ODM) specification
- e.g. site, value type, data type, time support, etc.
- The rules will diverge from the ODM specification.
- shall be functional without data values.
- shall be focused on series as the central concept.
Shall allow for
- Security and authentication, at the series level
- This would be easier with MS Entity Framework. With NHibernate, we will need to choose an additional framework (RhinoSecurity, Sculpture 2.0) - https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/
- Representation of site types
- Internationalization
- This needs to be better defined. Do we mean multiple languages in concepts, names, etc., or extensibility to country-specific attributes (e.g. addresses)?
- inclusion of organization specific information for Site, Variable, Series without needing additional columns (Allow for Name-value properties)
Shall be loosely coupled with:
- Variable names and ontology concepts tied together more effectively
- Note: loosely coupled, not implemented. The ontology needs to be independent of the model. Ontology is its own domain.
- Service Metadata catalog
- (Dec 2009) the utility has now been proved out by the Hydrodesktop MetaCatalog plug-in
Shall allow for experimentation without adversely altering the model
- Representation of additional data forms (grids, space-time, biology, geology, soils, social science)
- Support for other languages, Java, Python, etc.
- Themes
General Definitions
- ID - Internal Database Identifier. Unique in a database
- Code - External Identifier, Assigned by agency. Unique in a data source
- GUID - External Identifier, Assigned by software. Intended to be used as a cross-datasource identifier. Unique across all services.
- e.g. when a user loads data from a service into another service, the GUID for that data series should be preserved. If a user locally edits a data series, that is a new series, with a new GUID.
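The GUID rule above can be sketched as follows. This is a minimal Python illustration and not part of the specification: the class `SeriesIdentity`, the functions `copy_to_service` and `edit_locally`, and the sample code `EXAMPLE-001` are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class SeriesIdentity:
    """Hypothetical holder for the three identifier kinds."""
    code: str                                             # external, assigned by the agency
    guid: uuid.UUID = field(default_factory=uuid.uuid4)   # external, assigned by software
    id: Optional[int] = None                              # internal, unique in one database


def copy_to_service(series: SeriesIdentity, new_id: int) -> SeriesIdentity:
    """Loading into another service: new database ID, same GUID."""
    return replace(series, id=new_id)


def edit_locally(series: SeriesIdentity) -> SeriesIdentity:
    """A local edit yields a new series, so a fresh GUID is issued."""
    return replace(series, id=None, guid=uuid.uuid4())


original = SeriesIdentity(code="EXAMPLE-001", id=17)
copied = copy_to_service(original, new_id=3)
edited = edit_locally(copied)

print(copied.guid == original.guid)   # True: GUID preserved across services
print(edited.guid == original.guid)   # False: edited series has a new GUID
```

The ID is the only identifier that changes when a series moves between databases, which is why it is kept separate from the GUID.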
Initial Scope: Catalog Domain Definition
- The initial scope is focused on two issues:
- A catalog of the series from the data services, and additional data sources.
- Proving ground that data series, as defined by ODM, can be stored in the catalog, and additional types of data series can be managed.
- Domain objects that will be fully defined and coded shall be Series, Sites, and Variables (and associated objects) for the HIS central catalog.
- Series, Sites, and Variables
- shall be trackable through globally unique identifiers
- GUIDs will initially be managed by the Catalog
- Shall be associated with Data Source
Data Source
Each "data service" shall be distinctly identified by a short data source code, ideally an acronym for the data source.
The WaterOneFlow network and vocabulary codes shall be stored as datasource codes.
- has a unique PrefixCode
- optionally, has a reference to a source to identify the responsible party.
- Details of the data source shall be externally managed, and should include:
- Responsible Party
- Webservice(s)
- Abstract
- in the data source manager, a data source shall have one or more General Categories from a controlled vocabulary
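A data source with a unique prefix code might be modeled as in the sketch below. The names `DataSource` and `DataSourceRegistry`, the case-insensitive comparison of codes, and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataSource:
    """Hypothetical data source entry; details are managed externally."""
    prefix_code: str                              # short, unique code (ideally an acronym)
    responsible_party: Optional[str] = None       # optional reference to a source
    general_categories: List[str] = field(default_factory=list)


class DataSourceRegistry:
    """Hypothetical in-memory registry that enforces unique prefix codes."""

    def __init__(self) -> None:
        self._by_code: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        key = source.prefix_code.upper()   # assumption: codes compare case-insensitively
        if key in self._by_code:
            raise ValueError(f"duplicate data source code: {source.prefix_code}")
        self._by_code[key] = source

    def get(self, prefix_code: str) -> DataSource:
        return self._by_code[prefix_code.upper()]


registry = DataSourceRegistry()
registry.register(DataSource("EX1", general_categories=["Hydrology"]))
print(registry.get("ex1").prefix_code)  # EX1
```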
Site
- Shall have a reference to Data Source Code, called network (e.g. formerly, a network code)
- shall have a GUID.
- a distinct Code, defined by the responsible party
- Shall optionally have a Name
- Shall have a description
- shall have a geographic location
- Shall optionally be associated with one or more site types. Default=Unknown
- controlled vocabulary of Site Type.
- Shall optionally have Local Location Definition
- LocalX, LocalY, LocalZ, with associated projection information (as a string)
- Have ability to associate alternate codes
- data sources occasionally change codes. This is a tracking mechanism
- Data sources may share locations. In such a case, this would be stored as an alternate reference.
- Shall have a set of properties to contain information such as Elevation, State, County, HUC codes, and comments.
- Do we want a site history?
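The Site requirements above could translate into an object like the one sketched here. Field names, the latitude/longitude representation of the geographic location, and the sample values are assumptions; only the requirements themselves come from this page.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical Site object; field names are illustrative."""
    network: str                      # reference to the Data Source Code
    code: str                         # defined by the responsible party
    description: str
    latitude: float                   # assumption: location stored as lat/lon
    longitude: float
    name: Optional[str] = None
    guid: uuid.UUID = field(default_factory=uuid.uuid4)
    site_types: List[str] = field(default_factory=lambda: ["Unknown"])
    alternate_codes: List[Tuple[str, str]] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)

    def matches(self, network: str, code: str) -> bool:
        """True for the current code or any recorded alternate code."""
        key = (network, code)
        return key == (self.network, self.code) or key in self.alternate_codes


site = Site("EX", "S-100", "Example gauge", 40.0, -111.0)
site.alternate_codes.append(("EX", "OLD-7"))       # code changed by the source
site.properties["County"] = "Example County"       # no extra column needed

print(site.site_types)               # ['Unknown']
print(site.matches("EX", "OLD-7"))   # True
```

Keeping organization-specific values in the `properties` dictionary is one way to satisfy the name-value requirement without adding columns per organization.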
Variable
- Shall have a reference to Data Source, called vocabulary code.
- shall have a GUID.
- a distinct Code, defined by the responsible party
- Shall have a name. The names shall be stored in a controlled vocabulary
- As a best practice, these names should not include the units, time description, or method.
- Shall have a brief description: a combination of attributes that can be used to identify the variable in text form
- Need business rules on how to create a brief description.
- Shall have a reference to an associated unit
- Shall have a reference to an associated SampleMedium, from a controlled vocabulary. Default=Unknown
- Shall have a reference to a datatype from a controlled vocabulary. Default=Unknown
- Shall have a reference to a valuetype from a controlled vocabulary. Default=Unknown
- shall have a reference to a general category from a controlled vocabulary. Default=Unknown
- Shall have the ability to reference a list of associated alternate names and codes.
- Shall have a set of properties to contain information such as Speciation, methods, and comments
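Several Variable fields reference a controlled vocabulary with Default=Unknown. One possible reading is sketched below: a missing term resolves to "Unknown", while a term outside the vocabulary is rejected. The class `ControlledVocabulary`, the case-insensitive lookup, the rejection behavior, and the sample terms are assumptions, not part of the specification.

```python
from typing import Iterable, Optional


class ControlledVocabulary:
    """Hypothetical controlled vocabulary with an 'Unknown' default term."""

    DEFAULT = "Unknown"

    def __init__(self, name: str, terms: Iterable[str]) -> None:
        self.name = name
        # Case-insensitive lookup is an assumption of this sketch.
        self._terms = {t.lower(): t for t in terms}
        self._terms.setdefault(self.DEFAULT.lower(), self.DEFAULT)

    def resolve(self, term: Optional[str]) -> str:
        """Return the canonical term, or 'Unknown' if none was given."""
        if term is None or not term.strip():
            return self.DEFAULT
        try:
            return self._terms[term.strip().lower()]
        except KeyError:
            raise ValueError(f"{term!r} is not in the {self.name} vocabulary") from None


# Illustrative terms only, not an official term list.
sample_medium = ControlledVocabulary("SampleMedium", ["Surface Water", "Groundwater"])
print(sample_medium.resolve(None))             # Unknown
print(sample_medium.resolve("surface water"))  # Surface Water
```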
Series Core
The series is the management unit for information collection.
A series provides a summary of observational data for a variable at a defined location. For series to work across a large set of domains, a simple summary record, which lacks the data values, will be defined.
Series Record
Is a summary record, without data values.
- Shall have a reference to a Data Source Code, called network
- shall have a GUID.
- Shall have one feature that provides a location
- shall have one variable
- Have description of the time range of the observations
- begin time
- end time
- Time zone of the observations
- As a discovery tool, details of daylight saving time are not needed.
- have a count of observational values
- an estimated value count is fine. Need to add this
- Shall have the time support information
- is regular
- time support
- time support unit, that has a unittype of time
- Observational spacing as a UnitValue (a combination of a value, and a unit)
- Shall have zero or more
- sources
- Methods
- quality control levels
- Shall have a set of properties to contain information such as comments, and properties as determined by responsible parties.
- have associated provenance history containing the data loading and data processing history of the "series"
Shall have the ability to be associated with zero or more data access restrictions. A series does not know what its access restriction is; otherwise access control would be needed on most objects. The implemented authentication system manages and determines access.
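To show that discovery can work from the summary record alone, here is a minimal sketch of a series record and a catalog search. All names (`SeriesRecord`, `discover`, the field names) and the sample data are hypothetical; sources, methods, quality control levels, and provenance are omitted for brevity.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class SeriesRecord:
    """Hypothetical summary record: describes a series, holds no data values."""
    network: str                       # data source code
    site_code: str                     # the one feature providing a location
    variable_code: str                 # the one variable
    begin_time: datetime
    end_time: datetime
    utc_offset_hours: float = 0.0      # time zone of the observations
    value_count: Optional[int] = None  # may be an estimate
    is_regular: bool = False
    time_support: Optional[float] = None
    time_support_unit: Optional[str] = None
    guid: uuid.UUID = field(default_factory=uuid.uuid4)
    properties: Dict[str, str] = field(default_factory=dict)

    def overlaps(self, start: datetime, end: datetime) -> bool:
        """True if the series' time range intersects [start, end]."""
        return self.begin_time <= end and self.end_time >= start


def discover(catalog: List[SeriesRecord], variable_code: str,
             start: datetime, end: datetime) -> List[SeriesRecord]:
    """Find candidate series using only the summary fields."""
    return [s for s in catalog
            if s.variable_code == variable_code and s.overlaps(start, end)]


catalog = [
    SeriesRecord("EX", "site-1", "flow", datetime(2000, 1, 1), datetime(2005, 1, 1)),
    SeriesRecord("EX", "site-2", "flow", datetime(2008, 1, 1), datetime(2009, 1, 1)),
    SeriesRecord("EX", "site-1", "temp", datetime(2000, 1, 1), datetime(2009, 1, 1)),
]
hits = discover(catalog, "flow", datetime(2004, 6, 1), datetime(2006, 6, 1))
print([s.site_code for s in hits])  # ['site-1']
```

Note that the record carries no access restriction field, in line with the requirement that a series does not know its own restrictions.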
Source
A source has a
- Code
- Organization
- Description
Method
Quality Control Level
Controlled Vocabularies
Properties
Properties can be constrained by type
Property Types:
Provenance
General Provenance
Processing Provenance. Where series information is linked together.
- more than one input
- more than one output
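Because a processing step can have more than one input and more than one output, provenance forms a graph rather than a chain. The sketch below shows one way to record steps and trace what contributed to a series. `ProcessingStep`, `ancestors`, and the single-letter identifiers (standing in for GUIDs) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class ProcessingStep:
    """Hypothetical provenance entry linking input series to output series."""
    description: str
    inputs: Tuple[str, ...]    # identifiers (e.g. GUIDs) of the series consumed
    outputs: Tuple[str, ...]   # identifiers of the series produced


def ancestors(series_id: str, steps: Iterable[ProcessingStep]) -> Set[str]:
    """Collect every series that contributed, directly or not, to series_id."""
    steps = list(steps)
    found: Set[str] = set()
    frontier = [series_id]
    while frontier:
        current = frontier.pop()
        for step in steps:
            if current in step.outputs:
                for parent in step.inputs:
                    if parent not in found:
                        found.add(parent)
                        frontier.append(parent)
    return found


history = [
    ProcessingStep("merge two gauges", inputs=("A", "B"), outputs=("C",)),
    ProcessingStep("split by season", inputs=("C",), outputs=("D", "E")),
]
print(sorted(ancestors("D", history)))  # ['A', 'B', 'C']
```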
Examples with Data Value
Data Value Core
Compatibility Layers
ODM
ODM Site
- shall expose the ODM columns as properties
- Exposed SiteProperties as Class Properties
- Local Location
- LocalX,LocalY,Local Projection
- Elevation
- verticalDatum
- State
- county
- comment
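Exposing name-value properties as class properties could look like the following sketch. The wrapper `OdmSiteView`, the property key spellings (`"Elevation"`, `"VerticalDatum"`, and so on), and the sample values are assumptions; the local location fields are left out to keep it short.

```python
from typing import Dict, Optional


class OdmSiteView:
    """Hypothetical ODM-style view over a site's name-value properties."""

    def __init__(self, code: str, properties: Dict[str, str]) -> None:
        self.code = code
        self._properties = properties

    def _get(self, name: str) -> Optional[str]:
        return self._properties.get(name)

    @property
    def elevation(self) -> Optional[float]:
        raw = self._get("Elevation")
        return float(raw) if raw is not None else None

    @property
    def vertical_datum(self) -> Optional[str]:
        return self._get("VerticalDatum")

    @property
    def state(self) -> Optional[str]:
        return self._get("State")

    @property
    def county(self) -> Optional[str]:
        return self._get("County")

    @property
    def comment(self) -> Optional[str]:
        return self._get("Comment")


site = OdmSiteView("site-1", {"Elevation": "1345.0", "State": "Utah"})
print(site.elevation, site.state, site.county)  # 1345.0 Utah None
```

The core model keeps a generic property bag, while the compatibility layer gives ODM consumers the typed columns they expect.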
ODM Variable
- shall have a time support/scale reference
- if possible, validation rules for ODM series should enforce that series and variable time/scale references are the same.
- Exposed VariableProperties as Class Properties
ODM Series
- Exposed SiteProperties as Class Properties
- others
Data Source Domain
The Data Source Domain shall be used to manage information about the data sources, and their data harvesting/loading history
Code Standard
- All Objects should have internal identifiers. This is best done by defining a base entity object
- we need to identify what is an equivalent object, for each of these.
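A base entity object might look like the sketch below. Treating two objects as equivalent when they have the same type and the same internal ID is only one possible answer to the open question above, and is an assumption of this example; the class names are hypothetical.

```python
from typing import Optional


class Entity:
    """Hypothetical base class giving every domain object an internal ID."""

    def __init__(self, id: Optional[int] = None) -> None:
        self.id = id   # None until the object has been saved

    def __eq__(self, other: object) -> bool:
        if type(other) is not type(self):
            return NotImplemented
        if self.id is None or other.id is None:
            return self is other          # unsaved objects equal only themselves
        return self.id == other.id

    def __hash__(self) -> int:
        # Assumes the ID is not reassigned while the object sits in a set or dict.
        if self.id is None:
            return object.__hash__(self)
        return hash((type(self).__name__, self.id))


class Site(Entity):
    pass


class Variable(Entity):
    pass


print(Site(1) == Site(1))       # True: same type, same ID
print(Site(1) == Variable(1))   # False: different types
print(Site() == Site())         # False: neither has been saved
```

An alternative would be to define equivalence per object by business key (for example network plus code), which is the kind of rule this section still needs to settle.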