The CUAHSI HIS data services are designed as a geographically distributed network of data sources and functions that are integrated using web services so that they operate as a connected whole. This requires a clear and consistent model for naming and identifying distributed data providers and for managing this system at HIS Central. Experience with the present system has revealed some difficulties, which I describe below along with two alternatives to address them.
Terminology
- Central registry – catalog of information about HIS servers.
- Server – single computer from which web services are provided.
- Observation network – single ODM (Observations Data Model) database instance from which observation data are served. A single server may host multiple such databases.
Present system.
A separate installation of the web services software is required for each network.
A network on a server is identified by:
- WSDL (e.g. http://his02.usu.edu/littlebearriver/cuahsi_1_0.asmx?WSDL)
- Network and Vocabulary (e.g. LittleBearRiver and LBR). Network identifies the observation network to which a site belongs. Vocabulary identifies the context to which a given variable code belongs.
Registering a web service that provides access to an observation network in the central registry requires the following information:
- Unique name for the web service within the registry
- Location of the WSDL for the web service (i.e., the WSDL's URI)
Calling a web service involves:
- Identifying the web service by its WSDL
- Identifying a site by its network and code, e.g., getSiteInfo(LittleBearRiver:sitecode), and a variable by its vocabulary and code, e.g., getVariableInfo(LBR:variablecode)
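To make the present naming model concrete, here is a minimal Python sketch. It assumes the central registry is nothing more than a table from a unique service name to a WSDL URI, and that clients qualify identifiers with a `Prefix:code` string. The class `CentralRegistry`, the function `split_identifier`, and the registered service name `LittleBearRiver` are hypothetical illustrations, not part of the HIS software; only the WSDL URI and the identifier examples come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class CentralRegistry:
    """Hypothetical model of the present registry: unique service name -> WSDL URI."""
    services: dict = field(default_factory=dict)

    def register(self, name, wsdl):
        # Only the service name is checked; the network and vocabulary names
        # configured on the server are never submitted here.
        if name in self.services:
            raise ValueError(f"service name already registered: {name}")
        self.services[name] = wsdl


def split_identifier(qualified):
    """Split 'Prefix:code' (network:sitecode or vocabulary:variablecode)."""
    prefix, sep, code = qualified.partition(":")
    if not sep or not prefix or not code:
        raise ValueError(f"expected 'Prefix:code', got {qualified!r}")
    return prefix, code


registry = CentralRegistry()
# Hypothetical service name; the WSDL URI is the example from the text.
registry.register("LittleBearRiver",
                  "http://his02.usu.edu/littlebearriver/cuahsi_1_0.asmx?WSDL")

# The client selects the service by its WSDL, then qualifies arguments:
# sites by network name, variables by vocabulary name.
wsdl = registry.services["LittleBearRiver"]
print(split_identifier("LittleBearRiver:sitecode"))  # ('LittleBearRiver', 'sitecode')
print(split_identifier("LBR:variablecode"))          # ('LBR', 'variablecode')
```

Note that the prefix in each identifier is resolved entirely by the server behind the chosen WSDL; the registry never sees it.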
I see the following problems with this system:
- The requirement for a separate software installation for each network on the same server is inefficient and makes upgrading repetitive (the software has to be upgraded for each network). This creates potential for inconsistencies, e.g., when some, but not all, instances are upgraded.
- Network and vocabulary names are assigned at the server level (each web service gets exactly one network name and one vocabulary name), so nothing prevents the same names from being used on different servers. This could lead to confusion if networks with the same names, but hosted on different servers, are submitted to the central registry.
- The network and vocabulary names are redundant, because each web service has its own WSDL, which already identifies the network and vocabulary.
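The duplication problem can be illustrated with a small Python sketch. It assumes a hypothetical second server (at example.org) whose web service was also configured with the network name LittleBearRiver and the vocabulary LBR. The service names `USU_LBR` and `OtherLBR` and the helper `duplicate_names` are likewise hypothetical. Because the registry only checks that service names are unique, both registrations would succeed, yet a network or vocabulary name no longer points to a single service.

```python
from collections import defaultdict

# Hypothetical registrations. The central registry itself only stores
# service name -> WSDL; the network/vocabulary fields are configured on each
# server and are shown here only to expose the collision.
services = {
    "USU_LBR": {"wsdl": "http://his02.usu.edu/littlebearriver/cuahsi_1_0.asmx?WSDL",
                "network": "LittleBearRiver", "vocabulary": "LBR"},
    # Hypothetical second server that happened to pick the same names.
    "OtherLBR": {"wsdl": "http://example.org/lbr/cuahsi_1_0.asmx?WSDL",
                 "network": "LittleBearRiver", "vocabulary": "LBR"},
}


def duplicate_names(services, key):
    """Return {name: [service names]} for every value of `key` used by more than one service."""
    owners = defaultdict(list)
    for service_name, info in services.items():
        owners[info[key]].append(service_name)
    return {value: sorted(names) for value, names in owners.items() if len(names) > 1}


# The service names are distinct, so the present registry accepts both,
# yet "LittleBearRiver:site5" could now refer to a site on either server.
print(duplicate_names(services, "network"))     # {'LittleBearRiver': ['OtherLBR', 'USU_LBR']}
print(duplicate_names(services, "vocabulary"))  # {'LBR': ['OtherLBR', 'USU_LBR']}
```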
Alternative 1. Server naming.
- There is a single WSDL for each server. This is a URL, so by internet conventions it is unique.
- Registration in the central registry consists of providing the WSDL and a single name for the server, plus any other required information, such as the responsible party. However, this information should be kept to a minimum, because all the metadata should be in ODM.
- Each server has one installation of the web services that can be configured to attach to multiple networks (ODM instances).
- The web services configuration interface lets the server administrator specify, for each attached ODM database:
  - the ODM connection string required to attach the database
  - the network name
  - the network code
- A new GetNetworks web service returns the list of network names and codes supported by the server.
- Calls to the other web services use the syntax NetworkCode:identifier, e.g., getsiteinfo(LBR:site5).
- Federated functions that operate off the central registry, e.g., Hydroseek, would identify servers and their WSDLs from the registry and then use GetNetworks to discover the networks hosted by each server.
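The Alternative 1 design can be sketched in Python as follows, assuming the server keeps an in-memory list of attached networks built from the configuration interface. The names `AttachedNetwork`, `ServerWebServices`, `get_networks`, `get_site_info`, and `discover`, the LoganRiver/LR network, the server name `USU_HIS02`, and the connection strings are all hypothetical, and no database is actually opened. The sketch shows per-network configuration, a GetNetworks-style listing, dispatch of NetworkCode:identifier calls, and the Hydroseek-style discovery step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttachedNetwork:
    """One ODM database attached to the single web services installation."""
    connection_string: str  # hypothetical ODM connection string
    name: str               # network name, e.g. "LittleBearRiver"
    code: str               # network code used as the call prefix, e.g. "LBR"


class ServerWebServices:
    """Hypothetical sketch of Alternative 1: one installation, many networks."""

    def __init__(self, networks):
        self._by_code = {}
        for net in networks:
            # Codes only need to be unique within this server.
            if net.code in self._by_code:
                raise ValueError(f"duplicate network code on this server: {net.code}")
            self._by_code[net.code] = net

    def get_networks(self):
        """Counterpart of the proposed GetNetworks service: (name, code) pairs."""
        return [(net.name, net.code) for net in self._by_code.values()]

    def resolve(self, qualified_id):
        """Map 'NetworkCode:identifier' to the attached network and local identifier."""
        code, sep, local_id = qualified_id.partition(":")
        if not sep or code not in self._by_code:
            raise KeyError(f"unknown network code in {qualified_id!r}")
        return self._by_code[code], local_id

    def get_site_info(self, qualified_id):
        net, site = self.resolve(qualified_id)
        # A real server would query the ODM database behind net.connection_string;
        # this sketch only reports where the query would go.
        return {"network": net.name, "site": site, "database": net.connection_string}


def discover(server_registry):
    """Hydroseek-style discovery: the registry lists servers, GetNetworks lists their networks.

    `server_registry` maps a registered server name to a service object; a real
    client would instead call GetNetworks through the server's WSDL.
    """
    return {name: svc.get_networks() for name, svc in server_registry.items()}


server = ServerWebServices([
    AttachedNetwork("Server=localhost;Database=LittleBearRiverODM", "LittleBearRiver", "LBR"),
    AttachedNetwork("Server=localhost;Database=LoganRiverODM", "LoganRiver", "LR"),
])
print(server.get_networks())  # [('LittleBearRiver', 'LBR'), ('LoganRiver', 'LR')]
print(server.get_site_info("LBR:site5")["site"])  # site5
print(discover({"USU_HIS02": server}))
```

One consequence of this design is that network codes are checked only within a server, so two servers could still reuse the same code; the server name in the central registry is what disambiguates them.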
Alternative 2. Central registry naming
- There is a single WSDL for each network (ODM instance). This is a URL, so by internet conventions it is unique.
- A server may host multiple WSDLs. This may require multiple installations of the web services software, although it may be possible to have a single installation respond to different WSDLs, with the server maintaining a table that associates each WSDL it hosts with a network and the corresponding ODM database connection string.
- Network names and network codes are registered at the central registry, which can therefore ensure that they are unique and acceptable, and use them in its overall federation functions (e.g., Hydroseek).
- Web service calls do not need to specify a network code, because the network is implicit in the URL that hosts the service. Web service calls are consequently simpler, e.g., GetSiteInfo(siteid).
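Alternative 2 can be sketched the same way in Python, under stated assumptions. `CentralRegistry`, `register_network`, `Server`, and `get_site_info` are hypothetical names. The LoganRiver WSDL URI and the connection strings are made up. A request is also assumed to reach the server tagged with the WSDL URI it was made against. The registry enforces federation-wide uniqueness of network names and codes, and a single installation uses the WSDL-to-database table suggested above, so calls carry no network code.

```python
from dataclasses import dataclass, field


@dataclass
class CentralRegistry:
    """Hypothetical Alternative 2 registry: it assigns and checks network names and codes."""
    by_wsdl: dict = field(default_factory=dict)  # WSDL URI -> (network name, network code)

    def register_network(self, wsdl, name, code):
        if wsdl in self.by_wsdl:
            raise ValueError(f"WSDL already registered: {wsdl}")
        taken_names = {n for n, _ in self.by_wsdl.values()}
        taken_codes = {c for _, c in self.by_wsdl.values()}
        # Uniqueness is enforced federation-wide, because the registry assigns the names.
        if name in taken_names or code in taken_codes:
            raise ValueError(f"network name or code already taken: {name}/{code}")
        self.by_wsdl[wsdl] = (name, code)


class Server:
    """One web services installation answering several WSDLs via a lookup table."""

    def __init__(self, table):
        self._table = dict(table)  # WSDL URI -> ODM connection string (hypothetical)

    def get_site_info(self, wsdl, site_id):
        # No network prefix in the call: the WSDL the request arrived on
        # selects the ODM database. A real server would query that database.
        return {"site": site_id, "database": self._table[wsdl]}


LBR_WSDL = "http://his02.usu.edu/littlebearriver/cuahsi_1_0.asmx?WSDL"
LR_WSDL = "http://his02.usu.edu/loganriver/cuahsi_1_0.asmx?WSDL"  # hypothetical

registry = CentralRegistry()
registry.register_network(LBR_WSDL, "LittleBearRiver", "LBR")
registry.register_network(LR_WSDL, "LoganRiver", "LR")

server = Server({LBR_WSDL: "Database=LittleBearRiverODM",
                 LR_WSDL: "Database=LoganRiverODM"})
print(server.get_site_info(LR_WSDL, "site5"))  # {'site': 'site5', 'database': 'Database=LoganRiverODM'}
```

The trade-off shows up in the call signature: the site identifier is bare, whereas under Alternative 1 the network code travels inside the identifier (NetworkCode:identifier).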
Both alternatives follow the principle that whoever registers the names also uses them: the server in Alternative 1, and the central registry in Alternative 2. This seems like an important principle, and the current system does not follow it. Alternative 1 seems simpler for software deployment, but it requires network codes in web service calls and naming at both the network and server levels. Alternative 2 requires only network naming and has simpler web service calls, but its software deployment is more complex.