Overview of the cdm_reader_mapper.DataBundle class

Reading original meteorological/marine data

Reading meteorological/marine data such as ICOADS or C-RAID with cdm_reader_mapper.read_mdf() returns a so-called cdm_reader_mapper.DataBundle. The function requires two inputs: a string representing the path to the original data and the name of the data model (imodel). The original data is stored as cdm_reader_mapper.DataBundle.data. Alongside the data there is a validation mask, cdm_reader_mapper.DataBundle.mask, which validates the input data against the input data model schema. For more information, see the chapter Data Models.

from cdm_reader_mapper import read_mdf, test_data

data_path = test_data.test_icoads_r300_d714.source
imodel = "icoads_r300_d714"

db = read_mdf(source=data_path, imodel=imodel)

# Original MDF data
db.data

# Validation mask
db.mask
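
To make the relationship between the data and its mask more concrete, here is a minimal, library-independent sketch. It assumes a hypothetical two-field schema with valid ranges and builds a boolean mask with the same layout as the data. The names SCHEMA and build_mask, the field names, the ranges, and the choice to mark missing values as invalid are all assumptions for illustration; they are not part of cdm_reader_mapper and may differ from how the library builds its mask.

```python
# Illustrative sketch only: SCHEMA, the field names, the ranges, and
# build_mask are hypothetical and not part of cdm_reader_mapper.
SCHEMA = {
    "air_temperature": (-99.9, 99.9),  # assumed valid range
    "latitude": (-90.0, 90.0),         # assumed valid range
}


def build_mask(records, schema):
    """Return one dict of booleans per record: True where the value is valid."""
    mask = []
    for record in records:
        row = {}
        for field, (low, high) in schema.items():
            value = record.get(field)
            # Assumption of this sketch: missing values count as invalid.
            row[field] = value is not None and low <= value <= high
        mask.append(row)
    return mask


records = [
    {"air_temperature": 12.5, "latitude": 54.3},
    {"air_temperature": 150.0, "latitude": 95.0},
    {"air_temperature": None, "latitude": -12.0},
]
mask = build_mask(records, SCHEMA)
```

Each entry of mask mirrors the corresponding record, so an invalid value can be located by field and row without modifying the data itself.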

Validate cdm_reader_mapper.DataBundle.data

After reading the data, the method cdm_reader_mapper.DataBundle.validate_datetime() validates the datetime information in cdm_reader_mapper.DataBundle.data:

val_dt = db.validate_datetime()
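
As a rough illustration of what a datetime validation can involve, the sketch below checks whether year, month, day, and hour components form a real calendar datetime. It uses only the Python standard library; the helper is_valid_datetime and the sample reports are hypothetical, and the sketch does not claim to reproduce the checks performed by validate_datetime().

```python
from datetime import datetime


def is_valid_datetime(year, month, day, hour):
    """Return True if the four components form a real calendar datetime."""
    try:
        datetime(year, month, day, hour)
    except (TypeError, ValueError):
        # TypeError: a component is missing (None); ValueError: out of range.
        return False
    return True


# Hypothetical sample reports as (year, month, day, hour) tuples.
reports = [
    (1990, 2, 28, 12),
    (1990, 2, 30, 12),   # 30 February does not exist
    (1990, 6, 15, 24),   # hour must be in 0..23
    (1990, None, 1, 0),  # missing month
]
valid = [is_valid_datetime(*report) for report in reports]
```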

Another validation method, cdm_reader_mapper.DataBundle.validate_id(), validates cdm_reader_mapper.DataBundle.data against station ID names:

val_id = db.validate_id()
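
One common way to check identifiers is to match them against a pattern. The sketch below assumes, purely for illustration, that a valid station ID consists of 4 to 7 uppercase letters or digits. The pattern ID_PATTERN, the helper is_valid_id, and the sample IDs are hypothetical and are not taken from cdm_reader_mapper or from any data model definition.

```python
import re

# Hypothetical pattern: 4 to 7 uppercase letters or digits.
ID_PATTERN = re.compile(r"[A-Z0-9]{4,7}")


def is_valid_id(station_id):
    """Return True if station_id is a string that fully matches ID_PATTERN."""
    return isinstance(station_id, str) and ID_PATTERN.fullmatch(station_id) is not None


# Hypothetical sample IDs, including an empty and a missing value.
ids = ["DBBH", "ab12", "SHIP123", "", None]
valid_ids = [is_valid_id(station_id) for station_id in ids]
```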

Correct cdm_reader_mapper.DataBundle.data

After reading the data, it is sometimes desired that the final CDM set of tables is composed of a combination of different data models/sources. Based on the IMMA1 reprocessing experience so far, this can be the case when adding data elements from a different data source (like adding WMO PUB 47 metadata). It is recommended to map both sources separately and then make the appropriate replacements/additions based on the corresponding CDM element matching (i.e. primary_station_id).
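
The recommendation above can be sketched as a simple key-based merge. The example assumes two already mapped sources reduced to plain Python structures: a list of observation rows and a metadata lookup keyed by primary_station_id. The helper add_metadata, the station_name element, and all values are hypothetical and only illustrate the matching step; they are not part of cdm_reader_mapper.

```python
# Hypothetical, simplified stand-ins for two separately mapped sources.
observations = [
    {"primary_station_id": "DBBH", "station_name": None},
    {"primary_station_id": "XYZ1", "station_name": None},
]
metadata = {
    "DBBH": {"station_name": "Example Vessel"},
}


def add_metadata(rows, lookup, key="primary_station_id"):
    """Return new rows with elements from `lookup` added where the key matches."""
    merged = []
    for row in rows:
        extra = lookup.get(row[key], {})  # empty dict if there is no match
        merged.append({**row, **extra})   # values from `extra` take precedence
    return merged


combined = add_metadata(observations, metadata)
```

Rows without a matching key are passed through unchanged, and the input rows themselves are not modified.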

Note

Correcting data in the CDM format is only necessary for ICOADS data.

cdm_reader_mapper.DataBundle provides two functions for correcting data in the CDM format:

  1. DataBundle.correct_pt()

  2. DataBundle.correct_datetime()

The first function applies ICOADS deck-specific platform ID corrections to the data; the second applies ICOADS deck-specific datetime corrections.

db_cor_pt = db.correct_pt()

db_cor_dt = db.correct_datetime()

Manipulate cdm_reader_mapper.DataBundle.data and select subsets

For more details on how to manipulate a cdm_reader_mapper.DataBundle or select subsets of it, see DataBundle.

Map cdm_reader_mapper.DataBundle.data to the CDM

Now the meteorological data can be mapped to the Common Data Model (CDM) using the method DataBundle.map_model():

db_cdm = db.map_model()

cdm_tables = db_cdm.data

Note

Set inplace to True to overwrite cdm_reader_mapper.DataBundle.data:

db.map_model(inplace=True)

cdm_tables = db.data
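
The difference between the two call styles can be illustrated with a toy class. ToyBundle is a hypothetical stand-in, its "mapping" merely upper-cases the keys, and returning None for inplace=True follows the common pandas convention; none of this is a statement about the internals or return values of cdm_reader_mapper.

```python
import copy


class ToyBundle:
    """Hypothetical stand-in for a DataBundle-like object."""

    def __init__(self, data):
        self.data = data

    def map_model(self, inplace=False):
        # Toy "mapping": upper-case every key. Not the real CDM mapping.
        mapped = {key.upper(): value for key, value in self.data.items()}
        if inplace:
            self.data = mapped  # overwrite the data of this object
            return None
        new = copy.copy(self)   # leave this object untouched
        new.data = mapped
        return new


bundle = ToyBundle({"lat": 54.3})
mapped_bundle = bundle.map_model()       # returns a new object
unchanged = dict(bundle.data)            # snapshot: original data still intact
result = bundle.map_model(inplace=True)  # overwrites bundle.data
```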

For more information on how the mapping works, see Overview of the mapping to the Common Data Model (CDM) and/or How to register a new data model mapping.

DupDetect

After mapping to the CDM format, it is useful to check whether the CDM tables contain any duplicates. The duplicate checker included in the cdm_reader_mapper toolbox is based on the Python Record Linkage Toolkit (RecordLinkage).

The first step is to call the method DataBundle.duplicate_check(), which scans the CDM tables for duplicates.

db_dup = db.duplicate_check()

Afterwards, there are two options for dealing with the detected duplicates:

  1. DataBundle.flag_duplicates()

  2. DataBundle.remove_duplicates()

The first function flags the detected duplicates. For more information about the flags see CDM code tables for duplicate_status and CDM code tables for report_quality. The second function removes the detected duplicates.
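
The difference between flagging and removing can be sketched without the library. The example below deliberately uses a much simpler technique than record linkage: it treats two reports as duplicates when they agree exactly on a chosen set of fields. The field names, the flag labels "unique" and "duplicate", and all helper functions are hypothetical; they do not correspond to the CDM code tables or to the cdm_reader_mapper implementation.

```python
# Illustrative sketch only: exact-key matching instead of record linkage;
# field names and flag labels are hypothetical.
def find_duplicates(reports, fields):
    """Return indices of reports that repeat an earlier report on `fields`."""
    seen = set()
    duplicates = set()
    for index, report in enumerate(reports):
        key = tuple(report[field] for field in fields)
        if key in seen:
            duplicates.add(index)
        else:
            seen.add(key)
    return duplicates


def flag_duplicates(reports, duplicates):
    """Keep every report and add a flag describing its duplicate status."""
    return [
        {**report, "flag": "duplicate" if index in duplicates else "unique"}
        for index, report in enumerate(reports)
    ]


def remove_duplicates(reports, duplicates):
    """Drop every report whose index was detected as a duplicate."""
    return [report for index, report in enumerate(reports) if index not in duplicates]


reports = [
    {"station": "DBBH", "time": "1990-01-01T12", "temp": 5.0},
    {"station": "DBBH", "time": "1990-01-01T12", "temp": 5.1},
    {"station": "XYZ1", "time": "1990-01-01T12", "temp": 7.2},
]
dupes = find_duplicates(reports, fields=("station", "time"))
flagged = flag_duplicates(reports, dupes)
cleaned = remove_duplicates(reports, dupes)
```

Flagging keeps all reports and records the decision, so it can be revisited later; removing yields a smaller table but discards the information about which reports were dropped.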