CUAHSI-HIS

Handling Categorical Data

Modified on 2008/07/21 16:04 by valentinedwv Categorized as Uncategorized
Question:
  • Present Status:
    • Previous EPA output had categories represented as text values. These were converted into a value with attributes.
      E.g., we had no idea, and still would not, what the official numeric values could be.
    • An example DateTime-value:

<value dateTime="2007-02-03T09:03:52.38" 
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P" 
   censorCode="nc" 
   qualityControlLevel="Raw Data" 
   codedVocabularyTerm="Flood" 
   codedVocabulary="true">2114.354905803945093</value>

  • What is the data structure needed by software clients?
    • Ilya notes that SAS wants a conversion table, so a two-structure layout (coded values plus a separate code-to-term table) is fine.
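To make the two-structure layout concrete, here is a minimal Python sketch: coded values and a code-to-term conversion table kept apart, each serialized separately. CSV is only an assumed hand-off format for a tool such as SAS (this page does not say what SAS expects), and the codes and terms are borrowed from the WaterML examples further down this page.

```python
import csv
import io

# Coded values as a client might receive them: (dateTime, code).
values = [
    ("2007-02-03T09:03:52.38", 1),
    ("2007-02-03T09:03:52.38", 2),
    ("2007-02-03T09:03:52.38", 3),
]

# The second structure: a conversion table from numeric code to categorical term.
conversion_table = {1: "INTERFERES", 2: "NO INTER", 3: "NO ACTIVITY"}


def to_csv(header, rows):
    """Serialize rows to CSV text (assumed hand-off format, not one the page specifies)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


values_csv = to_csv(["dateTime", "code"], values)
table_csv = to_csv(["code", "term"], sorted(conversion_table.items()))
print(table_csv)
```

The values file never repeats the terms; a client joins the two on `code`, which is the conversion-table shape Ilya describes.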





Discussion for Handling Categorical Data

ODM Approach:

  1. Determine whether a variable is categorical from its DataType
    • Three tables are involved:
      • Variables table: if the row's DataType column is "Categorical", retrieve its categories.
      • Categories table: retrieve all records for the VariableID.
      • DataValues table: retrieve all DataValues for the SiteID-VariableID-TimePeriod tuple.
    • Use example:
  2. Issues

    • Need a business rule for when a VariableID-DataValue pair is not found in the Categories table. Options:
      • throw an error, or
      • return the message "Error-Unknown Category".
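The three-table lookup and the open business rule can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not ODM's real schema: the tables keep only the columns the steps above mention (names modeled on ODM's Variables, Categories and DataValues tables), the sample rows and SiteID are invented, and the `strict` flag and `UnknownCategoryError` are hypothetical names for switching between the two candidate rules.

```python
import sqlite3


class UnknownCategoryError(LookupError):
    """Hypothetical error for a categorical DataValue with no Categories row."""


def build_demo_db():
    # Simplified, hypothetical subset of ODM; real tables have many more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, DataType TEXT);
        CREATE TABLE Categories (VariableID INTEGER, DataValue REAL, CategoryDescription TEXT);
        CREATE TABLE DataValues (SiteID INTEGER, VariableID INTEGER,
                                 LocalDateTime TEXT, DataValue REAL);
        INSERT INTO Variables VALUES (10, 'Categorical');
        INSERT INTO Categories VALUES
            (10, 1, 'INTERFERES'), (10, 2, 'NO INTER'), (10, 3, 'NO ACTIVITY');
        INSERT INTO DataValues VALUES
            (1, 10, '2007-02-03T09:03:52', 1),
            (1, 10, '2007-02-03T10:03:52', 3),
            (1, 10, '2007-02-03T11:03:52', 7);
    """)
    return conn


def categorical_values(conn, site_id, variable_id, start, end, strict=False):
    # Step 1: the Variables table decides whether this variable is categorical.
    row = conn.execute("SELECT DataType FROM Variables WHERE VariableID = ?",
                       (variable_id,)).fetchone()
    if row is None or row[0] != "Categorical":
        raise ValueError(f"variable {variable_id} is not categorical")
    # Step 2: all Categories records for the variable, as a DataValue -> term map.
    categories = dict(conn.execute(
        "SELECT DataValue, CategoryDescription FROM Categories WHERE VariableID = ?",
        (variable_id,)))
    # Step 3: DataValues for the SiteID-VariableID-TimePeriod tuple.
    rows = conn.execute(
        "SELECT LocalDateTime, DataValue FROM DataValues "
        "WHERE SiteID = ? AND VariableID = ? AND LocalDateTime BETWEEN ? AND ? "
        "ORDER BY LocalDateTime",
        (site_id, variable_id, start, end))
    result = []
    for when, value in rows:
        term = categories.get(value)
        if term is None:
            # The undecided business rule: throw an error, or return a marker message.
            if strict:
                raise UnknownCategoryError(
                    f"no category for VariableID={variable_id}, DataValue={value}")
            term = "Error-Unknown Category"
        result.append((when, value, term))
    return result
```

With `strict=False` the unmatched DataValue 7 comes back tagged "Error-Unknown Category"; with `strict=True` the same query raises instead. Which of the two becomes the default is the business rule still to be settled.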


Implement in WaterML

1. Present approach: attach the term as an attribute
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers=""
   censorCode="nc"
   qualityControlLevel="Raw Data"
   codedVocabularyTerm="INTERFERES"
   codedVocabulary="true">1</value>
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P"
   censorCode="nc"
   qualityControlLevel="Raw Data"
   codedVocabularyTerm="NO INTER"
   codedVocabulary="true">2</value>
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P"
   censorCode="nc"
   qualityControlLevel="Raw Data"
   codedVocabularyTerm="NO ACTIVITY"
   codedVocabulary="true">3</value>
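With the present approach the code-to-term mapping is implicit in every value, so a client can rebuild a conversion table from it. A minimal sketch with Python's standard xml.etree.ElementTree, assuming integer codes and un-namespaced elements; real WaterML responses are namespaced, so `iter("value")` would need the qualified tag name there. The wrapping `<values>` root is added only so the trimmed fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the present-approach values (other attributes omitted).
PRESENT_XML = """<values>
<value dateTime="2007-02-03T09:03:52.38" codedVocabularyTerm="INTERFERES" codedVocabulary="true">1</value>
<value dateTime="2007-02-03T09:03:52.38" codedVocabularyTerm="NO INTER" codedVocabulary="true">2</value>
<value dateTime="2007-02-03T09:03:52.38" codedVocabularyTerm="NO ACTIVITY" codedVocabulary="true">3</value>
</values>"""


def table_from_attributes(xml_text):
    """Build a code -> term map from values carrying codedVocabularyTerm attributes."""
    table = {}
    for value in ET.fromstring(xml_text).iter("value"):
        if value.get("codedVocabulary") != "true":
            continue  # not a categorical value; nothing to record
        code = int(value.text)  # assumption: codes are integers
        term = value.get("codedVocabularyTerm")
        # The same code must always carry the same term, or the data is inconsistent.
        if table.setdefault(code, term) != term:
            raise ValueError(f"code {code} maps to both {table[code]!r} and {term!r}")
    return table


print(table_from_attributes(PRESENT_XML))
# {1: 'INTERFERES', 2: 'NO INTER', 3: 'NO ACTIVITY'}
```

The cost of this approach shows up in the consistency check: because the term is repeated on every value, nothing in the format stops two values from disagreeing about what a code means.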

2. Other approach: add a categorical translation table
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P"
   censorCode="nc"
   qualityControlLevel="Raw Data"
   >1</value>
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P"
   censorCode="nc"
   qualityControlLevel="Raw Data"
   >2</value>
<value dateTime="2007-02-03T09:03:52.38"
   sourceID="5897"
   methodID="9652"
   qualifiers="usgs_dv:P"
   censorCode="nc"
   qualityControlLevel="Raw Data"
   >3</value>
<!-- snip -->
<categories>
  <category>
    <code>1</code> <!-- number... ID? -->
    <term>INTERFERES</term>
  </category>
  <category>
    <code>2</code> 
    <term>NO INTER</term>
  </category>
  <category>
    <code>3</code>
    <term>NO ACTIVITY</term>
  </category>
</categories>
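With the translation-table approach the client does the join itself. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the values and the `<categories>` block arrive in one document under a hypothetical `<timeSeries>` root, no XML namespaces, and the code/term pairs of the present-approach example (1 = INTERFERES, 2 = NO INTER, 3 = NO ACTIVITY). Unmatched codes reuse the "Error-Unknown Category" message from the ODM discussion.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the other-approach fragments (attributes shortened).
OTHER_XML = """<timeSeries>
<value dateTime="2007-02-03T09:03:52.38">1</value>
<value dateTime="2007-02-03T09:03:52.38">2</value>
<value dateTime="2007-02-03T09:03:52.38">3</value>
<categories>
  <category><code>1</code><term>INTERFERES</term></category>
  <category><code>2</code><term>NO INTER</term></category>
  <category><code>3</code><term>NO ACTIVITY</term></category>
</categories>
</timeSeries>"""


def translate(xml_text, unknown="Error-Unknown Category"):
    """Return (dateTime, code, term) for each value, resolved via <categories>."""
    root = ET.fromstring(xml_text)
    # Build the code -> term lookup once from the <categories> block.
    lookup = {
        cat.findtext("code").strip(): cat.findtext("term")
        for cat in root.iter("category")
    }
    # Codes are compared as trimmed strings, so "1" and "1.0" would NOT match.
    return [
        (v.get("dateTime"), v.text.strip(), lookup.get(v.text.strip(), unknown))
        for v in root.iter("value")
    ]


for row in translate(OTHER_XML):
    print(row)
```

Matching codes as strings sidesteps the open "number... ID?" question in the XML comment; if codes are settled as numbers, converting both sides before the lookup would be the safer choice.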

EPA Sample Site and Codes

Code with some results: EPA:14485-1
