How to read a simple csv file

  1. Make sure you understand the structure of the data file you wish to read: the delimiter used, the variable held in each column, its units, the acceptable limits for each variable, and so on.

  2. As described in the section on how to build a data model, create a valid directory tree in which the model you are creating (mymodel) will be saved. It can be placed under any user-defined path, which is passed to mdf_reader at a later step.

  3. Create the schema file. As an example, take a data file without sections, stored in comma-delimited CSV format, with 8 columns (year, month, day, longitude, latitude, wind speed, sea surface temperature and sea level pressure, in that order), like the following:

YR    MO  DY  LON      LAT     W      SST      SLP
2007  2   3   255.708  9.829   5.902  301.323  1008.57
2007  2   4   255.682  12.691  5.222  300.971  1009.14
2007  7   2   242.764  32.707  1.3    296.556  1010.37
2007  7   3   242.158  32.943  4.792  294.558  1009.92
2007  7   4   240.329  34.218  3.739  290.405  1008.32
2007  7   5   239.889  34.377  5.629  288.273  1007.38
2007  7   6   240.054  34.38   4.322  289.752  1008.54
2007  7   7   240.011  34.229  4.717  288.624  1010.59
2007  7   8   240.054  34.394  3.924  290.584  1008.85
2007  7   9   241.352  33.8    2.852  293.248  1010.51
2007  7   10  241.499  33.804  2.308  293.03   1011.51
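Before writing the schema, it can help to confirm the delimiter and the number of columns programmatically (step 1). The following is a minimal sketch using only the Python standard library; it is not part of mdf_reader. The helper name `inspect_csv` and the in-memory `SAMPLE` text (the first three records of the table above, assumed to be stored without a header row) are illustrative assumptions.

```python
import csv
import io

# First three records of the sample file, assumed to have no header row.
# A real file would be read with open(path, newline="") instead.
SAMPLE = """\
2007,2,3,255.708,9.829,5.902,301.323,1008.57
2007,2,4,255.682,12.691,5.222,300.971,1009.14
2007,7,2,242.764,32.707,1.3,296.556,1010.37
"""


def inspect_csv(text, candidates=",;\t"):
    """Return the guessed delimiter and the set of column counts found."""
    # csv.Sniffer is a heuristic; restricting the candidates keeps it honest.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), delimiter=dialect.delimiter))
    return dialect.delimiter, {len(row) for row in rows}


delimiter, widths = inspect_csv(SAMPLE)
print(repr(delimiter), widths)
```

A single column count in `widths` suggests that every record has the same number of fields, which is what a schema without sections expects.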

The basic schema would look like this:

{
  "header":
  {
   "field_layout":"delimited",
   "delimiter":","
 },
   "elements":
 {
     "YR": {
     "description": "year UTC",
     "column_type": "uint16",
     "valid_max": 2008,
     "valid_min": 2006,
     "units": "year",
     "missing_value":"MSNG"
     },
     "MO": {
     "description": "month UTC",
     "field_length": 2,
     "column_type": "uint8",
     "valid_max": 12,
     "valid_min": 1,
     "units": "month",
     "missing_value":"MSNG"
     },
     "DY": {
     "description": "day UTC",
     "field_length": 2,
     "column_type": "uint8",
     "valid_max": 31,
     "valid_min": 1,
     "units": "day",
     "missing_value":"MSNG"
     },
     "lon": {
     "description":"LON",
     "field_length": 6,
     "column_type": "float32",
     "valid_max": 359.99,
     "valid_min": 0.0,
     "scale": 1,
     "decimal_places": 2,
     "units": "degrees",
     "missing_value":"MSNG"
     },
     "lat": {
     "description": "LAT",
     "field_length": 5,
     "column_type": "float32",
     "valid_max": 90.0,
     "valid_min": -90.0,
     "scale": 1,
     "decimal_places": 2,
     "units": "degrees",
     "missing_value":"MSNG"
     },
     "W": {
     "description": "wind speed",
     "field_length": 4,
     "column_type": "float32",
     "valid_max": 99.9,
     "valid_min": 0.0,
     "scale": 1,
     "decimal_places": 2,
     "units": "metres per second",
     "missing_value":"MSNG"
     },
     "SST": {
     "description": "sea surface temperature",
     "field_length": 5,
     "column_type": "float32",
     "valid_max": 999.9,
     "valid_min": -999.9,
     "scale": 1,
     "decimal_places": 2,
     "units": "degree Kelvin",
     "missing_value":"MSNG"
     },
     "SLP": {
     "description": "sea level pressure",
     "field_length": 6,
     "column_type": "float32",
     "valid_max": 1074.6,
     "valid_min": 870.0,
     "scale": 1,
     "decimal_places": 2,
     "units": "hectopascal",
     "missing_value":"MSNG"
     }
 }
}

In this schema, the file format information is given in the header block and the information about the data in each column is given in the elements block; details on setting up the element blocks are given in Schema element block. Note that the elements in the data are parsed in the order in which they are declared in the schema.
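To illustrate what order-based parsing means, the sketch below pairs each value in a record with the element declared at the same position in the schema. It uses only the Python standard library and is not how mdf_reader is implemented; `label_rows` is a hypothetical helper, and the schema is a trimmed copy that keeps only the delimiter and three elements.

```python
import csv
import io
import json

# Trimmed copy of the schema: only the delimiter and three elements,
# with most attributes left out for brevity.
SCHEMA = json.loads("""
{
  "header": {"delimiter": ","},
  "elements": {
    "YR": {"column_type": "uint16"},
    "MO": {"column_type": "uint8"},
    "DY": {"column_type": "uint8"}
  }
}
""")


def label_rows(text, schema):
    """Pair each value with the element declared at the same position."""
    names = list(schema["elements"])  # dicts keep declaration order
    delimiter = schema["header"]["delimiter"]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(names, row)) for row in reader]


records = label_rows("2007,2,3\n2007,7,10\n", SCHEMA)
print(records[0])
```

Because the pairing is purely positional, swapping two element blocks in the schema would swap the names attached to the corresponding columns, so the declaration order has to match the file.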

If an element expects a numeric value but receives alphabetic input, the value is set to missing. Numeric input is read in, however, even when it is given as a string.
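The behaviour described here can be pictured with a small standard-library sketch. It only mimics the rule and makes no claim about mdf_reader's internals; `to_number` is a hypothetical helper, and using NaN to stand for "missing" is an assumption made for the illustration.

```python
import math


def to_number(raw):
    """Return raw as a float, or NaN when it cannot be read as a number."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # Non-numeric input such as "MSNG" or "abc" is treated as missing.
        return math.nan


values = [to_number(v) for v in ["301.323", "MSNG", "abc", 5]]
print(values)
```

The string "301.323" is converted because its content is numeric, while "MSNG" and "abc" end up as missing.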

To skip a column (element), set ignore in its element block, for example:

"SST": {
    "description": "sea surface temperature",
    "ignore": "True"
},
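As a rough picture of the effect of ignore, the sketch below drops flagged elements after pairing values with element names by position. It is an illustration under stated assumptions, not mdf_reader code: `kept_elements` is a hypothetical helper, and the flag is compared as the string "True" because that is how it is written in the schema.

```python
def kept_elements(elements):
    """Names of elements not flagged with "ignore": "True", in order."""
    return [
        name
        for name, attrs in elements.items()
        if str(attrs.get("ignore", "False")).lower() != "true"
    ]


# Three elements from the example, with SST flagged to be skipped.
ELEMENTS = {
    "W": {"description": "wind speed"},
    "SST": {"description": "sea surface temperature", "ignore": "True"},
    "SLP": {"description": "sea level pressure"},
}

# The ignored column must still be declared so positions stay aligned.
row = dict(zip(ELEMENTS, ["5.902", "301.323", "1008.57"]))
kept = {name: row[name] for name in kept_elements(ELEMENTS)}
print(kept)
```

Note that the ignored element still occupies its position in the schema; removing the block entirely would shift every later column by one.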