CUAHSI-HIS
The 2008 HIS Overview Report identifies as a limitation the time series orientation of the indexing methods in the CUAHSI HIS Observations Data Model (ODM). At present ODM has only one collection, the series catalog, which lists each unique combination of SiteID, VariableID, QualityControlLevelID, SourceID and MethodID. The purpose of this note is to explore some approaches to overcoming this limitation through other ways of organizing data collections.
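To make the idea of the series catalog concrete, here is a minimal sketch in Python of how such a catalog could be derived from individual data values. The rows are made up and the structure is heavily simplified; only the five identifiers named above are taken from ODM, and the function name `build_series_catalog` is hypothetical.

```python
from collections import Counter

# The five identifiers whose unique combinations define a series.
SERIES_KEY = ("SiteID", "VariableID", "QualityControlLevelID", "SourceID", "MethodID")

# Made-up, simplified data values (illustrative only, not real ODM content).
data_values = [
    {"SiteID": 1, "VariableID": 10, "QualityControlLevelID": 0, "SourceID": 1,
     "MethodID": 1, "Time": "2008-01-01", "Value": 2.5},
    {"SiteID": 1, "VariableID": 10, "QualityControlLevelID": 0, "SourceID": 1,
     "MethodID": 1, "Time": "2008-01-02", "Value": 2.7},
    {"SiteID": 2, "VariableID": 10, "QualityControlLevelID": 0, "SourceID": 1,
     "MethodID": 1, "Time": "2008-01-01", "Value": 4.1},
    {"SiteID": 1, "VariableID": 10, "QualityControlLevelID": 0, "SourceID": 1,
     "MethodID": 2, "Time": "2008-01-01", "Value": 2.6},
]


def build_series_catalog(rows):
    """Map each unique combination of the series identifiers to its value count."""
    return Counter(tuple(row[k] for k in SERIES_KEY) for row in rows)


catalog = build_series_catalog(data_values)
for key, count in sorted(catalog.items()):
    print(dict(zip(SERIES_KEY, key)), "values:", count)
```

Each entry in the result corresponds to one time series; time is the only dimension left free.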

ODM records data at an atomic level; each data value is its own record in the DataValues table. Each data value is annotated by a number of metadata attributes, some recorded in columns of the DataValues table and others in the adjoining metadata tables, following ODM's star relational schema. This was developed with the concept of a data cube (hypercube) in mind, where slices of the data other than time series slices may be of interest. Indeed, the OLAP cube provides much of this capability. We should therefore consider defining other collections that represent other logical organizations of information in ODM. These might be represented as additional catalogs or views. The particular technological solution is not the focus of this note. Rather, I provide suggestions about logical organization from a user's perspective that, if agreed upon, could be explored technically.

Conceptually, in ODM we have a six-dimensional cube for the data values table, with the dimensions being Site, Time, Variable, Method, Source and QualityControlLevel (QC). There are other attributes on data values that could be used as dimensions, potentially increasing this number. Offset (or OffsetType) in particular often plays the role of a dimension. Presently we have time series defined by fixing five of these dimensions and allowing time to vary. In principle we could let each of the others vary individually, with the rest fixed, to obtain:

1. Time series: All times with Variable, Method, QC, Source and Site fixed.

2. Variable snapshot: All sites with Variable, Method, QC, Source, Time fixed.

3. Method set: All methods with Site, Variable, QC, Source, Time fixed.

4. Variable types: All Variables with Site, Method, QC, Source, Time fixed.

5. Sources set: All sources with Site, Method, QC, Variable, Time fixed.

6. Quality set: All QC with Site, Method, Variable, Source and Time fixed.
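The six single-dimension slices listed above all follow the same pattern: group the data values by the fixed dimensions and let the free one vary. A minimal sketch in Python, assuming data values are held as dictionaries keyed by dimension name (the rows, the source name and the `slices` helper are all made up for illustration):

```python
from collections import defaultdict

DIMENSIONS = ("Site", "Time", "Variable", "Method", "Source", "QC")

# Made-up data values; ISO date strings stand in for observation times.
rows = [
    {"Site": "A", "Time": "2008-01-01", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 3.2},
    {"Site": "A", "Time": "2008-01-02", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 3.5},
    {"Site": "B", "Time": "2008-01-01", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 7.1},
]


def slices(rows, free):
    """Group rows by every dimension not in `free`; each group is one slice."""
    fixed = [d for d in DIMENSIONS if d not in free]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in fixed)].append(row)
    return dict(groups)


time_series = slices(rows, free={"Time"})         # item 1: time varies
variable_snapshots = slices(rows, free={"Site"})  # item 2: site varies

print(len(time_series), "time series")
print(len(variable_snapshots), "variable snapshots")
```

Passing more than one name in `free` gives the multi-dimension collections, for example `slices(rows, free={"Time", "Variable", "Method", "Source", "QC"})` for everything available at each site.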

We have 1 already in the series catalog. I can see utility in 2. I think 3 will amount to a list of methods for measuring a variable, repeated over the sites, which is not really that useful. 4 will be degenerate, I think, because once method and site are fixed there is likely only one variable. 5 will amount to a list of sources for very site-specific measurements, which is also not that useful. 6 will give the QC history of a data value.

One can of course look at other combinations, defining sets based on more than one dimension being free. Counting every choice of two or more free dimensions out of the six gives 15 + 20 + 15 + 6 + 1 = 57 such combinations, the last being the case where nothing is fixed. Clearly not all have utility. From this I deduce that being exhaustive in enumerating these is probably not helpful, but that there may be merit in considering a few high-priority ones. I suggest that in ODM we consider (for discussion purposes at this point) the following:
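The count of multi-dimension combinations can be checked by enumeration. The sketch below assumes one particular counting convention: any choice of two or more of the six dimensions as free counts as one combination, including the case where all six are free. A different convention would give a different number.

```python
from itertools import combinations

DIMENSIONS = ("Site", "Time", "Variable", "Method", "Source", "QC")


def free_sets(min_free=2):
    """List every choice of at least `min_free` dimensions to leave free."""
    return [
        combo
        for k in range(min_free, len(DIMENSIONS) + 1)
        for combo in combinations(DIMENSIONS, k)
    ]


all_sets = free_sets()
print(len(all_sets))
```

Under this convention the enumeration yields 57 sets (15 pairs, 20 triples, 15 quadruples, 6 quintuples and 1 with everything free).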

1. Time series: All times with Variable, Method, QC, Source and Site fixed.

2. Variable snapshot: All sites with Variable, Method, QC, Source, Time fixed.

3. Site snapshot: Site and time fixed but all other dimensions free.

4. Site collection: Only site fixed, all other dimensions free. All the data available at a site.

5. Variable latest: All sites with Variable, Method, QC, Source, Latest observation. This would serve to organize one-time measurements as well as provide ready access to the most recent measurements.
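As an illustration of the "Variable latest" collection, the following Python sketch keeps only the most recent value for each combination of Site, Variable, Method, QC and Source. The rows and the `variable_latest` helper are hypothetical, and times are assumed to be ISO-formatted strings so that string comparison matches chronological order.

```python
# Made-up data values (illustrative only).
observations = [
    {"Site": "A", "Time": "2008-01-01", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 3.2},
    {"Site": "A", "Time": "2008-01-02", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 3.5},
    {"Site": "B", "Time": "2007-06-15", "Variable": "Flow", "Method": "Gauge",
     "Source": "AgencyX", "QC": 1, "Value": 7.1},
]


def variable_latest(rows):
    """Return the most recent row for each (Site, Variable, Method, QC, Source)."""
    latest = {}
    for row in rows:
        key = (row["Site"], row["Variable"], row["Method"], row["QC"], row["Source"])
        # ISO date strings sort chronologically, so plain comparison is enough.
        if key not in latest or row["Time"] > latest[key]["Time"]:
            latest[key] = row
    return latest


latest = variable_latest(observations)
for key, row in sorted(latest.items()):
    print(key[0], row["Time"], row["Value"])
```

A site with a single one-time measurement, like site B here, still appears in the result, which is what makes this collection useful for organizing such measurements.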

The question then arises as to whether we should provide catalogs for these.

Then, of course, there is the catch-all organizing concept of a group that we have in ODM. This is general and can collect together any set of values that need to be grouped. Groups can be small or large. The generality comes at a price, though: users need to define groups. The database cannot define them automatically because it can never anticipate all the groupings that may be possible.
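The contrast with the automatically derived collections can be sketched as follows: a group is whatever set of values a user chooses, expressed here as an arbitrary selection rule. This is an illustration only; the `ValueID` field, the sample rows and the `define_group` helper are assumptions and do not describe how ODM actually stores groups.

```python
# Made-up data values, each with a hypothetical unique ValueID.
values = [
    {"ValueID": 1, "Site": "A", "Time": "2008-01-01", "Variable": "Flow", "Value": 3.2},
    {"ValueID": 2, "Site": "A", "Time": "2008-01-02", "Variable": "Flow", "Value": 9.8},
    {"ValueID": 3, "Site": "B", "Time": "2008-01-02", "Variable": "Stage", "Value": 1.4},
]


def define_group(rows, predicate):
    """Collect the ValueIDs of every row the user-supplied predicate selects."""
    return {row["ValueID"] for row in rows if predicate(row)}


# A user-defined grouping that no fixed/free dimension pattern would produce:
# everything observed on one particular day, regardless of site or variable.
storm_event = define_group(values, lambda r: r["Time"] == "2008-01-02")
print(sorted(storm_event))
```

The selection rule has to come from the user, which is the price of generality noted above.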
