Request for scenarios needed for improving data retrieval
David Valentine
Others?
The web services will need improved support for data retrieval, that is, query capabilities. We are close to the limits of the initial web services. Because the web services are generic and meant to be used across many platforms, applications, and languages, parameter overloading is not possible. Basically, we can't just keep adding parameters. We should also avoid overloading the inputs, as we have done by packing geometries into the location parameter and series information such as VariableID into the variable parameter.
Put simply, this means that the CUAHSI web services need to accept a submitted data structure that describes the data to retrieve. Basically, we need to agree on the vocabulary for a set of filters that can be used to query the web services and ODM databases. WaterML already has the capabilities to serve as the response to such queries.
There are several different filters:
- Combinatorial operators: AND, OR, AND-NOT.
- Location: support retrieval by spatial operations, or by known geocoding datasets (HUC codes, networks, place names).
- Station: support retrieval by station characteristics
- Series: support retrieval by series characteristics
- Time: support retrieval by time, including date and time ranges, with spatial-like operators: within, overlaps, before, after. Add the capability to use an event gazetteer to support retrieval by named event.
- Model: support retrieval of model information
The first filter defines the operators that we can use to combine the other filters. Filters two through five support our standard parameters: location, station, variable (vocabulary), and date range. The sixth would be an additional capability that extends the discovery process.
While the terminology may be familiar, the capabilities would be greatly extended. Location would support query by geometries using a spatial operator and, where desirable, selection by known geometric objects such as HUC boundaries and network relationships. Site and variable would support retrieval by ID and by extended attributes or properties. Variable would also support standard attributes such as Sample Medium, Value Type, General Category, and Units.
Filters
Combinatorial Operators
Q: Do we support just basic functionality (all filters are ANDed or ORed together), or “full functionality” where components can be combined using these operators?
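The “full functionality” option amounts to a recursive filter tree whose interior nodes are the combinatorial operators and whose leaves are individual filters. A minimal Python sketch, assuming records are plain dictionaries and leaf conditions are arbitrary predicates; the classes `Leaf`, `And`, `Or`, `AndNot` and the `state` attribute are hypothetical, not part of WaterML or any CUAHSI schema:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical record shape: a station is a dict of attribute name -> value.
Record = Dict[str, object]


@dataclass
class Leaf:
    """One condition, e.g. a Location, Station, or DateRange test."""
    test: Callable[[Record], bool]


@dataclass
class And:
    children: List["Filter"]


@dataclass
class Or:
    children: List["Filter"]


@dataclass
class AndNot:
    """Matches records that satisfy `include` but not `exclude`."""
    include: "Filter"
    exclude: "Filter"


Filter = Union[Leaf, And, Or, AndNot]


def matches(f: Filter, record: Record) -> bool:
    """Recursively evaluate a nested filter tree against one record."""
    if isinstance(f, Leaf):
        return f.test(record)
    if isinstance(f, And):
        return all(matches(c, record) for c in f.children)
    if isinstance(f, Or):
        return any(matches(c, record) for c in f.children)
    if isinstance(f, AndNot):
        return matches(f.include, record) and not matches(f.exclude, record)
    raise TypeError(f"unknown filter node: {f!r}")


stations = [
    {"code": "A1", "huc": "123456", "state": "UT"},
    {"code": "B2", "huc": "654321", "state": "UT"},
    {"code": "C3", "huc": "123456", "state": "ID"},
]
# Nested query: state is UT AND (huc is 123456 OR code is B2).
query = And([
    Leaf(lambda r: r["state"] == "UT"),
    Or([Leaf(lambda r: r["huc"] == "123456"), Leaf(lambda r: r["code"] == "B2")]),
])
print([s["code"] for s in stations if matches(query, s)])  # ['A1', 'B2']
```

Basic functionality would correspond to a tree limited to a single And or Or node whose children are all leaves; the recursive form adds little cost on the server side.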
Location
Spatial
- Within (box or polygon)
- Overlaps (box or polygon)
- Near (Point)
- Named
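For point stations, the spatial operators reduce to simple geometric tests. A minimal sketch, assuming station coordinates are WGS84 longitude/latitude in decimal degrees, polygons are simple rings without holes, and Near uses a spherical Earth of radius 6371 km; a real service would more likely delegate these tests to a spatial database. Overlaps only differs from Within for features with extent (such as a watershed polygon), so it is omitted here:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def within_box(p: Point, west: float, south: float, east: float, north: float) -> bool:
    """True if the point lies inside (or on the edge of) a lon/lat bounding box."""
    lon, lat = p
    return west <= lon <= east and south <= lat <= north


def within_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray-casting test against a simple polygon ring (no holes)."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray to the right of p cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def near(p: Point, center: Point, radius_km: float) -> bool:
    """Great-circle (haversine) distance test on a spherical Earth."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, center)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_km = 2 * 6371.0 * math.asin(math.sqrt(a))
    return distance_km <= radius_km
```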
Known Features (geometry objects that can be used as polygons or points):
These are for convenience. A user could do a query, get a geometry, and submit it, but it is useful to support some operations on common base datasets. Basically, these are the geocoding references that the hydrologic community uses.
Station, e.g. a specific location
- stationCode, network
- stationID
- stationAttribute
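A station filter can treat each of these as an optional criterion. A sketch under assumptions: the `Station` and `StationFilter` types are hypothetical, unset criteria are ignored, and a station code is only compared together with its network:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Station:
    station_id: int
    code: str
    network: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class StationFilter:
    """Hypothetical station filter; any criterion left as None is not tested."""
    code: Optional[str] = None
    network: Optional[str] = None
    station_id: Optional[int] = None
    attribute: Optional[Dict[str, str]] = None  # attribute name -> required value

    def matches(self, s: Station) -> bool:
        # A code is only meaningful within its network, so both must agree
        # (a code given without a network matches nothing here).
        if self.code is not None and (s.code != self.code or s.network != self.network):
            return False
        if self.station_id is not None and s.station_id != self.station_id:
            return False
        if self.attribute is not None:
            for name, value in self.attribute.items():
                if s.attributes.get(name) != value:
                    return False
        return True
```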
Series
Variable
- VariableCode, vocabulary
- VariableID
- VariableAttribute
DateRange:
- Within
- Overlaps
- Before
- After
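The four DateRange operators map onto interval comparisons between a series' period of record and the query. A sketch, assuming inclusive end points; the readings of Before and After (the whole series lies before or after the date) are assumptions, since Scenario 3's "After Date" could instead mean "has any data after the date", which is Overlaps with an open-ended range:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Period:
    begin: datetime
    end: datetime  # inclusive


def within(series: Period, query: Period) -> bool:
    """The whole series period falls inside the query period."""
    return query.begin <= series.begin and series.end <= query.end


def overlaps(series: Period, query: Period) -> bool:
    """The series and query periods share at least one instant."""
    return series.begin <= query.end and query.begin <= series.end


def before(series: Period, date: datetime) -> bool:
    """Assumed reading: the series ends strictly before the date."""
    return series.end < date


def after(series: Period, date: datetime) -> bool:
    """Assumed reading: the series begins strictly after the date."""
    return series.begin > date
```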
Method, Source
- ID
- Code
- equals, starts with
- code should not be a free text search.
- Name
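Restricting codes to exact and prefix matches keeps them behaving like identifiers. A small sketch; the function name, the operator strings, and the example code "USGS-NWIS" are hypothetical, and matching is assumed to be case-sensitive:

```python
def code_matches(code: str, pattern: str, op: str = "equals") -> bool:
    """Match a Method or Source code using only 'equals' or 'startswith'.

    Codes are identifiers, so substring or free-text search is rejected.
    """
    if op == "equals":
        return code == pattern
    if op == "startswith":
        return code.startswith(pattern)
    raise ValueError(f"unsupported operator for codes: {op!r}")


print(code_matches("USGS-NWIS", "USGS", "startswith"))  # True
```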
Quality control level
Offset
Must be a combination of value and unit
- OffsetUnits, OffsetValue
- OffsetUnits: equals only
- OffsetValue: between, less than, greater than
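Requiring the unit alongside the value keeps an offset comparison meaningful (1.5 m and 1.5 ft are different offsets). A sketch with a hypothetical `OffsetFilter`; bounds are assumed exclusive, "between" is simply both bounds set, and no unit conversion is attempted:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OffsetFilter:
    """Hypothetical offset filter: the unit is mandatory and compared with equals only."""
    units: str
    minimum: Optional[float] = None  # 'greater than' bound (exclusive)
    maximum: Optional[float] = None  # 'less than' bound (exclusive)

    def matches(self, offset_value: float, offset_units: str) -> bool:
        if offset_units != self.units:
            return False
        if self.minimum is not None and not offset_value > self.minimum:
            return False
        if self.maximum is not None and not offset_value < self.maximum:
            return False
        return True
```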
DataValue
- between, less than, greater than
- equals is avoided, since floating-point rounding could make it difficult to match certain data values exactly
- censorCode
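The floating-point concern is easy to demonstrate, and a narrow between range can stand in for equals. A sketch with a hypothetical `value_matches` function; censor codes are assumed to be short strings such as "nc" or "lt":

```python
from typing import Optional, Set


def value_matches(value: float, censor_code: str,
                  greater_than: Optional[float] = None,
                  less_than: Optional[float] = None,
                  censor_codes: Optional[Set[str]] = None) -> bool:
    """Hypothetical DataValue filter: range tests plus an optional censor-code set.

    There is deliberately no 'equals' operator; an exact match is
    approximated by a narrow between range instead.
    """
    if greater_than is not None and not value > greater_than:
        return False
    if less_than is not None and not value < less_than:
        return False
    if censor_codes is not None and censor_code not in censor_codes:
        return False
    return True


stored = 0.1 + 0.2     # e.g. a value that went through arithmetic or a unit conversion
print(stored == 0.3)   # False: 0.3 has no exact binary floating-point representation
print(value_matches(stored, "nc", greater_than=0.2999, less_than=0.3001))  # True
```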
Instrument or Model
Registered models will need to have attributes.
Winging it here:
- Model Name
- Model Type
- Model Description contains
- Model Parameters
- Model Output
Scenarios/Discussion on needed capabilities
First, to avoid confusion with the present services, redefine the methods:
- QueryValues = GetValues using a query filter
- QuerySiteInfo = GetSiteInfo using a query filter
- QueryVariableInfo = GetVariableInfo using a query filter
Scenario 1: We want to find all stations in a region:
Submit to QuerySiteInfo
And
  Location
    Within BOX(XXXX,XXX)
Scenario 2: We want to find values for variables in a certain area for a certain time period:
Submit to QueryValues
And
  Location
    Within BOX(XXXX,XXX)
  Series
    Variable
      VariableCode=ABCDE vocabulary=CUAHSI
  DateRange
    Overlaps
      BeginDate, EndDate
Note that the above query would return multiple stations, each with a time series. This is supported by the present TimeSeriesResponse.
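Because the filter is meant to be submitted as a data structure, Scenario 2 could travel as a small XML document. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names simply mirror the scenario tree and are hypothetical (an existing schema such as the OGC Filter Encoding, listed at the end of this page, might be adopted instead):

```python
import xml.etree.ElementTree as ET


def scenario2_filter(west: float, south: float, east: float, north: float,
                     variable_code: str, vocabulary: str,
                     begin: str, end: str) -> str:
    """Build a hypothetical QueryValues filter document for Scenario 2."""
    root = ET.Element("And")

    # Location: stations within a lon/lat bounding box.
    within = ET.SubElement(ET.SubElement(root, "Location"), "Within")
    ET.SubElement(within, "Box", west=str(west), south=str(south),
                  east=str(east), north=str(north))

    # Series: one variable identified by its code within a vocabulary.
    series = ET.SubElement(root, "Series")
    ET.SubElement(series, "Variable", VariableCode=variable_code,
                  vocabulary=vocabulary)

    # DateRange: series that overlap the requested period.
    overlaps = ET.SubElement(ET.SubElement(root, "DateRange"), "Overlaps")
    ET.SubElement(overlaps, "BeginDate").text = begin
    ET.SubElement(overlaps, "EndDate").text = end

    return ET.tostring(root, encoding="unicode")


print(scenario2_filter(-112.0, 40.0, -111.0, 41.0,
                       "ABCDE", "CUAHSI", "2000-01-01", "2005-12-31"))
```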
Scenario 3: We want to find stations in a defined watershed that measure at least one of these two variables for a given period: variable code ABCDE from the CUAHSI vocabulary, OR a local variable with ID 6789:
Submit to QuerySiteInfo
And
  Location
    Within HUC(XXXX,XXX)
  And
    Series
      OR
        Variable
          VariableCode=ABCDE vocabulary=CUAHSI
        Variable
          VariableID=6789
    DateRange
      After
        Date
Scenario 4: We want to find all stations that are tagged with HUC code “123456”:
Submit to QuerySiteInfo
And
  Station
    StationAttribute Name="HUC" Value="123456"
Scenario 5: We want to find all stations that lie within HUC 123456 but are NOT tagged with HUC code “123456”:
Submit to QuerySiteInfo
And
  Location
    Within HUC(123456)
  ANDNOT
    Station
      StationAttribute Name="HUC" Value="123456"
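Scenario 5 is effectively a data-quality check: stations inside the watershed boundary whose attributes lack the matching HUC tag. A server could evaluate each sub-filter to a set of station IDs and implement ANDNOT as a set difference. A sketch with hypothetical data, where the HUC boundary is simplified to a bounding box:

```python
from typing import Dict, Set, Tuple

# Hypothetical station table: id -> (longitude, latitude, attributes).
STATIONS: Dict[int, Tuple[float, float, Dict[str, str]]] = {
    1: (-111.5, 40.5, {"HUC": "123456"}),  # inside the HUC and tagged
    2: (-111.4, 40.6, {}),                 # inside the HUC but missing its tag
    3: (-110.0, 39.0, {"HUC": "123456"}),  # tagged, yet outside the boundary
}

# Simplification: the HUC 123456 boundary is approximated by a lon/lat box.
HUC_BOX = (-112.0, 40.0, -111.0, 41.0)  # west, south, east, north


def within_huc() -> Set[int]:
    """Location sub-filter: stations whose coordinates fall in the boundary."""
    west, south, east, north = HUC_BOX
    return {sid for sid, (lon, lat, _) in STATIONS.items()
            if west <= lon <= east and south <= lat <= north}


def tagged(value: str) -> Set[int]:
    """Station sub-filter: StationAttribute Name="HUC" Value=value."""
    return {sid for sid, (_, _, attrs) in STATIONS.items()
            if attrs.get("HUC") == value}


# ANDNOT evaluated as a set difference of the two sub-query results.
untagged = within_huc() - tagged("123456")
print(sorted(untagged))  # [2]
```

Swapping the operands (tagged minus within) would instead flag station 3, whose tag disagrees with its coordinates.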
Extending to support transformation:
We might want to allow some transformations to be applied to the retrieved values:
Transform Values
  Output units
  Compute
    Add
    Subtract
    Divide
    Scale
    Transform
We want to find stations in a defined watershed that measure at least these two variables, then add the two values together.
Submit to QueryValues
And
  Location
    Within HUC(XXXX,XXX)
  And
    Series
      OR
        Variable
          VariableCode=ABCDE vocabulary=CUAHSI
        Variable
          VariableID=6789
    DateRange
      After
        Date
Compute
  Add
    Series
      Variable
        VariableCode=ABCDE vocabulary=CUAHSI
    Series
      Variable
        VariableID=6789
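Compute/Add implies that the two series must first be aligned in time. A sketch, assuming series are dictionaries from timestamp to value already in the same units, and choosing an inner join on timestamps; interpolation or other gap rules are equally plausible and would need to be agreed on. The function names are hypothetical:

```python
from datetime import datetime
from typing import Dict

Series = Dict[datetime, float]  # timestamp -> value


def add_series(a: Series, b: Series) -> Series:
    """Compute/Add: sum two series at the timestamps they share.

    Timestamps present in only one series are dropped (an inner join).
    Both series are assumed to be in the same units already.
    """
    return {t: a[t] + b[t] for t in sorted(a.keys() & b.keys())}


def scale_series(a: Series, factor: float) -> Series:
    """Compute/Scale: multiply every value by a constant, e.g. a unit factor."""
    return {t: v * factor for t, v in a.items()}


t1, t2, t3 = datetime(2008, 1, 1), datetime(2008, 1, 2), datetime(2008, 1, 3)
abcde = {t1: 1.0, t2: 2.0}       # values for VariableCode=ABCDE
local = {t2: 0.5, t3: 4.0}       # values for VariableID=6789
print(add_series(abcde, local))  # only t2 is shared, so a single value of 2.5
```

Because the filter uses OR, some returned stations may measure only one of the two variables; under an inner join those stations would yield no summed values.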
OGC Filters for Web Feature Services (the end of the page has sample filters)
- OGC Filter Specification
- Example OGC starter framework in Java on Google Code
- USGS Schemas for NHD?