Overview of the cdm_reader_mapper.DataBundle class#

Reading original meteorological/marine data#

Reading meteorological/marine data such as ICOADS or C-RAID with cdm_reader_mapper.read_mdf returns a so-called cdm_reader_mapper.DataBundle. The function requires two inputs: a string giving the path to the original data and the name of the data model (imodel). The original data is stored as DataBundle.data. Alongside the data there is a validation mask, DataBundle.mask, which holds the result of validating the input data against the schema of the input data model. For more information, see the chapter Data Models.

from cdm_reader_mapper import read_mdf, test_data

data_path = test_data.test_icoads_r300_d714.source
imodel = "icoads_r300_d714"

db = read_mdf(source=data_path, imodel=imodel)

# Original MDF data
db.data

# Validation mask
db.mask
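To make the idea of a validation mask concrete, here is a minimal sketch in plain Python. It is not the cdm_reader_mapper implementation: the schema, the column names (YR, MO, AT) and the rule that a missing value counts as invalid are all assumptions chosen for illustration. It only shows how a mask can mirror the shape of the data with one boolean per element.

```python
# Toy illustration only: this schema and these records are hypothetical,
# not the actual icoads_r300_d714 data model.
SCHEMA = {
    "YR": {"type": int, "min": 1600, "max": 2100},
    "MO": {"type": int, "min": 1, "max": 12},
    "AT": {"type": float, "min": -99.9, "max": 99.9},
}


def validate_value(value, rule):
    """Return True if value has the expected type and lies within the range."""
    if not isinstance(value, rule["type"]):
        return False
    return rule["min"] <= value <= rule["max"]


def build_mask(records, schema):
    """Build one dict of booleans per record, one entry per schema column."""
    return [
        {col: validate_value(rec.get(col), rule) for col, rule in schema.items()}
        for rec in records
    ]


records = [
    {"YR": 1995, "MO": 7, "AT": 18.4},
    {"YR": 1995, "MO": 13, "AT": 18.4},  # month out of range
    {"YR": 1995, "MO": 7, "AT": None},   # missing value (treated as invalid here)
]

mask = build_mask(records, SCHEMA)
for row in mask:
    print(row)
```

Each row of the mask lines up with the record at the same position, so invalid elements can be located without modifying the data itself.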

Validate DataBundle.data#

After reading the data, the method DataBundle.validate_datetime() validates the date-time information in DataBundle.data:

val_dt = db.validate_datetime()
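As a rough picture of what a date-time check involves, the following sketch tests whether year, month, day and hour components form a real calendar date. It uses only the standard library; the function name is_valid_datetime and the tuple layout are hypothetical and do not reflect how DataBundle.validate_datetime() is implemented.

```python
from datetime import datetime


def is_valid_datetime(year, month, day, hour):
    """Return True if the components form a real calendar date and hour."""
    try:
        datetime(year, month, day, hour)
    except (TypeError, ValueError):
        # TypeError: a component is missing (None); ValueError: out of range.
        return False
    return True


# Hypothetical reports given as (year, month, day, hour)
reports = [
    (1995, 7, 14, 12),    # valid
    (1995, 2, 30, 6),     # 30 February does not exist
    (1995, 7, 14, None),  # hour is missing
]

valid = [is_valid_datetime(*report) for report in reports]
print(valid)
```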

Another validation option is DataBundle.validate_id(), which checks DataBundle.data against station ID names:

val_id = db.validate_id()
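One simple way to think about ID validation is pattern matching. The sketch below assumes that valid IDs can be described by regular expressions; the two patterns and the helper is_valid_id are invented for illustration and are not the rules that DataBundle.validate_id() applies.

```python
import re

# Hypothetical patterns; real ID conventions depend on the data model.
ID_PATTERNS = [
    re.compile(r"[A-Z]{4}\d?"),  # a call-sign-like ID, e.g. "DBBH"
    re.compile(r"\d{5}"),        # a five-digit numeric ID, e.g. "41001"
]


def is_valid_id(station_id):
    """Return True if the ID fully matches at least one known pattern."""
    if not isinstance(station_id, str):
        return False
    candidate = station_id.strip()
    return any(p.fullmatch(candidate) is not None for p in ID_PATTERNS)


ids = ["DBBH", "41001", "??", None]
valid_ids = [is_valid_id(i) for i in ids]
print(valid_ids)
```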

Correct DataBundle.data#

After reading the data, it is sometimes desirable to compose the final set of CDM tables from a combination of different data models or sources. Based on the IMMA1 reprocessing experience so far, this can be the case when adding data elements from a different data source (such as WMO PUB 47 metadata). It is recommended to map each source separately and then make the appropriate replacements or additions based on the matching CDM element (i.e. primary_station_id).
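The "map separately, then combine on a matching element" approach can be sketched in plain Python as follows. Only primary_station_id comes from the text above; the records, the station_name field, the metadata lookup and the add_metadata helper are hypothetical stand-ins for two separately mapped sources.

```python
# Hypothetical records that were already mapped from the main source.
header_rows = [
    {"report_id": "R1", "primary_station_id": "DBBH", "station_name": None},
    {"report_id": "R2", "primary_station_id": "ZZZZ", "station_name": None},
]

# Hypothetical metadata mapped separately from a second source,
# keyed by the matching element primary_station_id.
metadata_by_id = {
    "DBBH": {"station_name": "EXAMPLE VESSEL"},
}


def add_metadata(rows, metadata):
    """Return new rows with metadata fields replaced or added where the ID matches."""
    merged = []
    for row in rows:
        extra = metadata.get(row["primary_station_id"], {})
        merged.append({**row, **extra})  # values from extra override the row
    return merged


combined = add_metadata(header_rows, metadata_by_id)
for row in combined:
    print(row)
```

Rows without a match in the second source are passed through unchanged, and the original rows are not modified.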

Note

Correcting data in the CDM format is only necessary for ICOADS data.

cdm_reader_mapper.DataBundle provides two functions for correcting data in the CDM format:

  1. DataBundle.correct_pt()

  2. DataBundle.correct_datetime()

The first function applies ICOADS deck-specific platform ID corrections to the data; the second applies ICOADS deck-specific datetime corrections.

db_cor_pt = db.correct_pt()

db_cor_dt = db.correct_datetime()

Manipulate DataBundle.data and select subsets#

For more details on how to manipulate a cdm_reader_mapper.DataBundle, see Manipulation. For more details on how to select subsets of a cdm_reader_mapper.DataBundle, see Selection.

Map DataBundle.data to the CDM#

Now the meteorological data can be mapped to the Common Data Model (CDM) using the method DataBundle.map_model():

db_cdm = db.map_model()

cdm_tables = db_cdm.data

Note

Set inplace to True to overwrite DataBundle.data:

db.map_model(inplace=True)

cdm_tables = db.data
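The difference between the two call styles can be illustrated with a toy class. ToyBundle and its transform method are hypothetical and unrelated to the real DataBundle internals; in particular, returning None from the in-place call is an assumption borrowed from the common pandas convention, not something stated for map_model().

```python
class ToyBundle:
    """Hypothetical container used only to illustrate the inplace pattern."""

    def __init__(self, data):
        self.data = data

    def transform(self, func, inplace=False):
        """Apply func to every element; overwrite self.data only if inplace."""
        new_data = [func(value) for value in self.data]
        if inplace:
            self.data = new_data
            return None  # assumption: nothing is returned when modifying in place
        return ToyBundle(new_data)


def double(value):
    return value * 2


original = ToyBundle([1, 2, 3])

# Default: a new object is returned and the original stays untouched.
mapped = original.transform(double)
print(original.data, mapped.data)

# inplace=True: the original object itself is overwritten.
result = original.transform(double, inplace=True)
print(original.data, result)
```

Working with the returned copy keeps the original data available for comparison; the in-place form avoids holding two objects at once.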

For more information on how the mapping works, please see Overview of the mapping to the Common Data Model (CDM) and/or How to register a new data model mapping.

DupDetect#

After mapping to the CDM format, it is useful to check whether the CDM tables contain any duplicates. The duplicate checker included in the cdm_reader_mapper toolbox is based on the Python record linkage toolkit RecordLinkage.

The first step is to call the method function DataBundle.duplicate_check(). This function scans the CDM tables for any duplicates.

db_dup = db.duplicate_check()

Afterwards, there are two options for dealing with the detected duplicates:

  1. DataBundle.flag_duplicates()

  2. DataBundle.remove_duplicates()

The first function flags the detected duplicates. For more information about the flags see CDM code tables for duplicate_status and CDM code tables for report_quality. The second function removes the detected duplicates.
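The detect, then flag or remove workflow can be sketched with the standard library alone. This is a deliberately simplified stand-in: it treats two reports as duplicates only when selected fields match exactly, whereas a record-linkage approach can compare fields more flexibly. The records, the choice of comparison fields and the is_duplicate key are assumptions for illustration and do not correspond to the duplicate_status or report_quality code tables.

```python
def find_duplicates(reports, keys):
    """Return indices of reports whose key fields repeat an earlier report."""
    first_seen = {}
    duplicates = set()
    for idx, report in enumerate(reports):
        signature = tuple(report[k] for k in keys)
        if signature in first_seen:
            duplicates.add(idx)
        else:
            first_seen[signature] = idx
    return duplicates


def flag_duplicates(reports, duplicates):
    """Keep every report and mark duplicates with a hypothetical boolean flag."""
    return [{**r, "is_duplicate": i in duplicates} for i, r in enumerate(reports)]


def remove_duplicates(reports, duplicates):
    """Drop the reports identified as duplicates."""
    return [r for i, r in enumerate(reports) if i not in duplicates]


# Hypothetical header-like records
reports = [
    {"report_id": "R1", "primary_station_id": "DBBH", "report_timestamp": "1995-07-14 12:00"},
    {"report_id": "R2", "primary_station_id": "DBBH", "report_timestamp": "1995-07-14 12:00"},
    {"report_id": "R3", "primary_station_id": "DBBH", "report_timestamp": "1995-07-14 18:00"},
]

dups = find_duplicates(reports, keys=("primary_station_id", "report_timestamp"))
flagged = flag_duplicates(reports, dups)
cleaned = remove_duplicates(reports, dups)
print(sorted(dups), len(flagged), len(cleaned))
```

Flagging preserves every report so the decision can be reviewed later, while removing yields a smaller table with only the first occurrence of each report.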