Custom data service harvesting approach
Harvesting the large, national hybrid data services (NWIS, EPA) will require writing a custom application that connects to both the HIS Central database and the hybrid data service database simultaneously. The key purpose of this program is to populate the series catalog information in the HIS Central database for the newly added variables. The program will be a simple console application that requires a configuration file and a variables file. Since the list of variables to be harvested is a subset of all the variables, the variables will not be harvested from the database; instead, they will be provided as an Excel or CSV file. The configuration file will contain:
- The HIS Central database connection string
- The hybrid data service's database connection string
- The HIS Central sourceID for the registered data service
- The name of the file containing the list of variables to harvest
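As an illustration of the two input files, the sketch below parses a configuration with the four settings listed above and reads variable codes from a CSV file. The INI layout, the [harvest] section, the key names, and the VariableCode column header are all assumptions made for this example; the design above does not specify a file format.

```python
import configparser
import csv
from dataclasses import dataclass


@dataclass
class HarvestConfig:
    hiscentral_conn: str  # HIS Central database connection string
    hybrid_conn: str      # hybrid data service database connection string
    source_id: int        # HIS Central sourceID of the registered data service
    variables_file: str   # name of the file listing the variables to harvest


def parse_config(text: str) -> HarvestConfig:
    """Parse the configuration from INI text (section and key names are assumed)."""
    # Interpolation is disabled so '%' inside a connection string is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["harvest"]
    return HarvestConfig(
        hiscentral_conn=section["hiscentral_connection"],
        hybrid_conn=section["hybrid_connection"],
        source_id=section.getint("source_id"),
        variables_file=section["variables_file"],
    )


def read_variable_codes(lines) -> list[str]:
    """Read variable codes from CSV lines with a 'VariableCode' header (assumed)."""
    return [row["VariableCode"].strip() for row in csv.DictReader(lines)]
```

An open file object can be passed directly to read_variable_codes, since csv.DictReader accepts any iterable of lines. An Excel variables file would need a separate reader, which is not shown here.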
When the program starts, it first reads the configuration file, then the variables file, and then connects to both databases. It then adds each variable to the HIS Central Variables table. For every new variable added, the hybrid database is queried and returns a list of seriesCatalog records. For every seriesCatalog record, the corresponding siteCode is used to look up the HIS Central siteID. With the siteID and the series information from the hybrid series catalog, a new series catalog record is ready to be entered into HIS Central.
Pseudocode view:
For each variable in the list {
    Add a new record to the HIS Central Variables table
    Query the seriesCatalog table in the hybrid DB using the matching variableCode
    For each seriesCatalog record returned {
        Get the siteCode
        Query the HIS Central Sites table to get the siteID by matching siteCode
        Using this siteID, add a new series catalog record to the HIS Central series catalog
    }
}
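The loop above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual implementation: it uses sqlite3 connections as a stand-in for the real database servers, and the table and column names (Variables, Sites, SeriesCatalog, ValueCount, and so on) are hypothetical simplifications of the HIS Central and hybrid schemas. Skipping a series whose siteCode is not found in HIS Central is also an assumed policy.

```python
import sqlite3


def harvest(hiscentral, hybrid, source_id, variable_codes):
    """Add variables and their series catalog records to HIS Central.

    Both connections are sqlite3 connections here (a stand-in for the real
    servers). Returns the number of series catalog records inserted.
    """
    inserted = 0
    for code in variable_codes:
        # Add the new variable to HIS Central and keep its generated ID.
        cur = hiscentral.execute(
            "INSERT INTO Variables (VariableCode, SourceID) VALUES (?, ?)",
            (code, source_id),
        )
        variable_id = cur.lastrowid

        # Fetch every series for this variable from the hybrid database.
        series_rows = hybrid.execute(
            "SELECT SiteCode, ValueCount FROM SeriesCatalog WHERE VariableCode = ?",
            (code,),
        ).fetchall()

        for site_code, value_count in series_rows:
            # Resolve the HIS Central siteID from the siteCode.
            site = hiscentral.execute(
                "SELECT SiteID FROM Sites WHERE SiteCode = ? AND SourceID = ?",
                (site_code, source_id),
            ).fetchone()
            if site is None:
                continue  # assumed policy: skip sites unknown to HIS Central

            # Insert the new series catalog record into HIS Central.
            hiscentral.execute(
                "INSERT INTO SeriesCatalog (SourceID, SiteID, VariableID, ValueCount) "
                "VALUES (?, ?, ?, ?)",
                (source_id, site[0], variable_id, value_count),
            )
            inserted += 1

    # Commit once at the end so a failed run leaves no partial harvest.
    hiscentral.commit()
    return inserted
```

Issuing one site lookup per series record mirrors the pseudocode directly. For services with very many series, loading the siteCode to siteID mapping into a dictionary once before the loop would avoid the repeated queries.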