API reference#

This page provides an auto-generated summary of the cdm_reader_mapper API.

Read data from disk#

read(source[, mode])

Read original marine-meteorological data, MDF data, or CDM tables from disk.

read_data(data_file[, mask_file, info_file, ...])

Read MDF data that already conforms to a pre-defined data model.

read_mdf(source[, imodel, ext_schema_path, ...])

Read data files compliant with a user-specific data model.

read_tables(source[, data_format, prefix, ...])

Read CDM-table-like files from the file system into a pandas.DataFrame.

DataBundle#

DataBundle(*args, **kwargs)

Class for manipulating the MDF data and mapping it to the CDM.

DataBundle’s method functions#

Information#

DataBundle.unique(**kwargs)

Get unique values of data.
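As a rough illustration of what per-column unique counts look like, the following plain-Python sketch uses a list of dicts in place of a pandas.DataFrame. The helper name `count_unique` and the sample values are hypothetical and are not part of the cdm_reader_mapper API.

```python
from collections import Counter


def count_unique(rows, columns=None):
    """Count how often each value occurs, per column, in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # Skip columns the caller did not ask for.
            if columns is not None and column not in columns:
                continue
            counts.setdefault(column, Counter())[value] += 1
    return {column: dict(counter) for column, counter in counts.items()}


# Made-up sample rows standing in for a DataFrame.
rows = [
    {"deck": 721, "platform": "ship"},
    {"deck": 721, "platform": "buoy"},
    {"deck": 730, "platform": "ship"},
]

unique_counts = count_unique(rows)
deck_counts = count_unique(rows, columns=["deck"])
print(unique_counts)
```

Restricting the count to selected columns mirrors the optional `columns` argument of the `unique` function listed under Useful functions; how the library itself shapes its return value is not shown here.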

Manipulation#

DataBundle.add(addition[, inplace])

Add information to a DataBundle.

DataBundle.copy()

Make a deep copy of a DataBundle.
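Why a deep copy matters can be shown with the standard library alone. In this sketch a plain dict stands in for a DataBundle (an assumption for illustration only; the real class holds pandas objects), and the nested data is modified after copying.

```python
import copy

# A plain dict standing in for a bundle; keys and values are made up.
bundle = {"data": [{"id": "A1", "sst": 18.2}], "imodel": "example_model"}

shallow = copy.copy(bundle)      # new outer dict, shared nested rows
deep = copy.deepcopy(bundle)     # fully independent nested rows

# Modify the original after both copies were taken.
bundle["data"][0]["sst"] = -99.0

shallow_value = shallow["data"][0]["sst"]  # follows the original
deep_value = deep["data"][0]["sst"]        # keeps the old value
print(shallow_value, deep_value)
```

Only the deep copy is isolated from later changes to the original, which is the property one usually wants before running in-place corrections.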

DataBundle.replace_columns(df_corr[, ...])

Replace columns in data.

DataBundle.stack_h(other[, datasets, inplace])

Stack multiple DataBundles horizontally.

DataBundle.stack_v(other[, datasets, inplace])

Stack multiple DataBundles vertically.
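The difference between the two stacking directions can be sketched with lists of dicts standing in for DataFrames. The helper names below are hypothetical and this is not the library's implementation; it only illustrates that vertical stacking adds rows while horizontal stacking adds columns.

```python
def stack_vertically(top, bottom):
    """Append the rows of `bottom` below the rows of `top`."""
    return top + bottom


def stack_horizontally(left, right):
    """Join rows position by position, adding the columns of `right`.

    If both sides share a column name, the value from `right` wins.
    """
    if len(left) != len(right):
        raise ValueError("horizontal stacking needs the same number of rows")
    return [{**a, **b} for a, b in zip(left, right)]


# Made-up sample data.
base = [{"id": "A1", "sst": 18.2}, {"id": "B2", "sst": 17.9}]
extra_rows = [{"id": "C3", "sst": 16.4}]
extra_columns = [{"wind": 5.1}, {"wind": 7.3}]

vertical = stack_vertically(base, extra_rows)
horizontal = stack_horizontally(base, extra_columns)
print(len(vertical), horizontal[0])
```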

Selection#

DataBundle.split_by_boolean_true([do_mask])

Split data by rows where all column entries in mask are True.

DataBundle.split_by_boolean_false([do_mask])

Split data by rows where all column entries in mask are False.
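One point worth illustrating is that "all entries True" and "all entries False" are not complements: a row with a mixed mask matches neither condition. The sketch below uses lists of dicts in place of the data and mask DataFrames, and assumes that a split yields the matching rows plus the remainder; the helper `split_by_mask` is hypothetical.

```python
def split_by_mask(rows, mask, boolean=True):
    """Return (selected, rejected) rows.

    A row is selected when every entry in its mask row equals `boolean`.
    """
    selected, rejected = [], []
    for row, flags in zip(rows, mask):
        if all(flag is boolean for flag in flags.values()):
            selected.append(row)
        else:
            rejected.append(row)
    return selected, rejected


# Made-up sample data and validation mask.
rows = [{"id": "A1"}, {"id": "B2"}, {"id": "C3"}]
mask = [
    {"sst": True, "wind": True},    # all True
    {"sst": True, "wind": False},   # mixed
    {"sst": False, "wind": False},  # all False
]

all_true, not_all_true = split_by_mask(rows, mask, boolean=True)
all_false, not_all_false = split_by_mask(rows, mask, boolean=False)
print(all_true, all_false)
```

The mixed row `B2` ends up in the remainder of both splits.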

DataBundle.split_by_column_entries(selection)

Split data by rows where column entries are in a specific value list.

DataBundle.split_by_index(index[, do_mask])

Split data by rows within a specific index list.

DataBundle.select_where_all_true([inplace, ...])

Select rows from data where all column entries in mask are True.

DataBundle.select_where_all_false([inplace, ...])

Select rows from data where all column entries in mask are False.

DataBundle.select_where_entry_isin(selection)

Select rows from data where column entries are in a specific value list.
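Selection by value list can be sketched as a membership test per column. The shape of `selection` used here, a dict mapping a column name to its allowed values, is an assumption made for illustration, as is the helper name `rows_with_entries_in`.

```python
def rows_with_entries_in(rows, selection):
    """Keep rows whose value in each selected column is in the allowed list."""
    return [
        row
        for row in rows
        if all(row.get(column) in allowed for column, allowed in selection.items())
    ]


# Made-up sample rows.
rows = [
    {"id": "A1", "deck": 721},
    {"id": "B2", "deck": 730},
    {"id": "C3", "deck": 992},
]

selected = rows_with_entries_in(rows, {"deck": [721, 992]})
print(selected)
```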

Validation#

DataBundle.validate_datetime([imodel])

Validate datetime information in data.
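A minimal sketch of what datetime validation can mean in practice: try to build a calendar date from the components and report whether that succeeds. The column names (`year`, `month`, `day`, `hour`) and the helper are assumptions for illustration; the library's own checks depend on the input model.

```python
from datetime import datetime


def is_valid_datetime(row):
    """Return True if the row's components form a real calendar datetime."""
    try:
        datetime(row["year"], row["month"], row["day"], row["hour"])
    except (KeyError, TypeError, ValueError):
        # Missing column, missing value, or impossible date/time.
        return False
    return True


# Made-up sample records.
records = [
    {"year": 2020, "month": 2, "day": 29, "hour": 12},    # leap day
    {"year": 2021, "month": 2, "day": 29, "hour": 12},    # no such day
    {"year": 2021, "month": None, "day": 1, "hour": 0},   # missing month
    {"year": 2021, "month": 6, "day": 1, "hour": 24},     # hour out of range
]

datetime_mask = [is_valid_datetime(record) for record in records]
print(datetime_mask)
```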

DataBundle.validate_id([imodel])

Validate station ID information in data.
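ID validation against a pattern can be sketched with a regular expression. The pattern below is purely hypothetical (real deck-specific patterns live in the library), and treating `blank` as "accept empty IDs" is an assumption about that parameter's meaning.

```python
import re

# Hypothetical pattern: 4 to 7 upper-case letters or digits.
ID_PATTERN = re.compile(r"[A-Z0-9]{4,7}")


def validate_ids(ids, pattern=ID_PATTERN, blank=False):
    """Return one boolean per ID; empty or missing IDs get the `blank` value."""
    result = []
    for identifier in ids:
        if not identifier:
            result.append(blank)
        else:
            result.append(pattern.fullmatch(identifier) is not None)
    return result


# Made-up sample IDs.
ids = ["ABCD1", "ab", "", None]

strict = validate_ids(ids)
lenient = validate_ids(ids, blank=True)
print(strict, lenient)
```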

Map data to CDM tables#

DataBundle.map_model([imodel, inplace])

Map data to the Common Data Model.
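Conceptually, mapping to a target model means deriving each target field from one or more source columns through a rule. The sketch below shows that idea with a small rule table; the source column names, the target field names, and the rules themselves are illustrative assumptions, not the mapping tables shipped with cdm_reader_mapper.

```python
# Hypothetical mapping: target field -> rule computing it from a source row.
MAPPING = {
    "report_id": lambda row: f"{row['deck']}-{row['id']}",
    "latitude": lambda row: row["lat"],
    # Convert longitudes from the 0..360 range to the -180..180 range.
    "longitude": lambda row: row["lon"] - 360.0 if row["lon"] > 180.0 else row["lon"],
}


def map_row(row, mapping=MAPPING):
    """Apply every rule in the mapping to one source row."""
    return {field: rule(row) for field, rule in mapping.items()}


# Made-up source rows.
source = [
    {"deck": 721, "id": "A1", "lat": 54.5, "lon": 350.0},
    {"deck": 721, "id": "B2", "lat": -12.0, "lon": 20.0},
]

header_rows = [map_row(row) for row in source]
print(header_rows)
```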

Correction#

DataBundle.correct_datetime([imodel, inplace])

Correct datetime information in data.

DataBundle.correct_pt([imodel, inplace])

Correct platform type information in data.

Duplicate check#

DataBundle.duplicate_check([inplace])

Run a duplicate check on data.

DataBundle.flag_duplicates([inplace])

Flag detected duplicates in data.

DataBundle.get_duplicates(**kwargs)

Get duplicate matches in data.

DataBundle.remove_duplicates([inplace])

Remove detected duplicates in data.
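The check, get, flag, and remove steps can be sketched as a small workflow. The library is described as using recordlinkage for the comparison; this sketch swaps in a much simpler exact match on key columns, so it illustrates the sequence of steps only and not the actual matching method. All helper names and the `is_duplicate` column are hypothetical.

```python
def find_duplicates(rows, keys):
    """Map each duplicated row position to the position of its first match."""
    first_seen = {}
    matches = {}
    for position, row in enumerate(rows):
        signature = tuple(row[key] for key in keys)
        if signature in first_seen:
            matches[position] = first_seen[signature]
        else:
            first_seen[signature] = position
    return matches


def flag_duplicates(rows, matches):
    """Return copies of the rows with an added boolean duplicate flag."""
    return [
        {**row, "is_duplicate": position in matches}
        for position, row in enumerate(rows)
    ]


def remove_duplicates(rows, matches):
    """Return the rows without the detected duplicates."""
    return [row for position, row in enumerate(rows) if position not in matches]


# Made-up sample rows; the third repeats the first.
rows = [
    {"id": "A1", "date": "2020-01-01", "sst": 18.2},
    {"id": "B2", "date": "2020-01-01", "sst": 17.9},
    {"id": "A1", "date": "2020-01-01", "sst": 18.2},
]

matches = find_duplicates(rows, keys=["id", "date"])
flagged = flag_duplicates(rows, matches)
deduplicated = remove_duplicates(rows, matches)
print(matches, len(deduplicated))
```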

Write data to disk#

DataBundle.write([dtypes, parse_dates, ...])

Write data to disk.
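The `dtypes` and `parse_dates` arguments hint at a general issue with text-based storage: values come back as strings unless type information is kept alongside the data. The sketch below demonstrates this with the standard `csv` module and an in-memory buffer. The pipe delimiter, the helper names, and the column names are assumptions for illustration and do not describe the library's file layout.

```python
import csv
import io


def write_table(rows, handle, delimiter="|"):
    """Write a list of dict rows as delimited text with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=list(rows[0]), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_table(handle, delimiter="|"):
    """Read delimited text back into a list of dict rows (all values are str)."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# Made-up sample rows and their intended types.
original = [
    {"report_id": "R1", "latitude": 54.5},
    {"report_id": "R2", "latitude": -12.0},
]
dtypes = {"report_id": str, "latitude": float}

buffer = io.StringIO()
write_table(original, buffer)
text = buffer.getvalue()

buffer.seek(0)
restored = read_table(buffer)

# Re-apply the stored types to recover the original values.
typed = [
    {column: dtypes[column](value) for column, value in row.items()}
    for row in restored
]
print(text)
```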

DataBundle’s property attributes#

DataBundle.columns

Column labels of data.

DataBundle.data

MDF pandas.DataFrame data.

DataBundle.dtypes

Dictionary of data types in data.

DataBundle.encoding

A string representing the encoding to use in the data.

DataBundle.imodel

Name of the MDF/CDM input model.

DataBundle.mask

MDF pandas.DataFrame validation mask.

DataBundle.mode

Data mode.

DataBundle.parse_dates

Information on how to parse dates in data.

Useful functions#

correct_datetime(data, imodel[, log_level, ...])

Apply ICOADS deck-specific datetime corrections.

correct_pt(data, imodel[, log_level, _base])

Apply ICOADS deck-specific platform type corrections.

duplicate_check(data[, method, ...])

Run a duplicate check on a dataset using recordlinkage.

map_model(data, imodel[, cdm_subset, ...])

Map a pandas DataFrame to the CDM header and observational tables.

read(source[, mode])

Read original marine-meteorological data, MDF data, or CDM tables from disk.

read_data(data_file[, mask_file, info_file, ...])

Read MDF data that already conforms to a pre-defined data model.

read_mdf(source[, imodel, ext_schema_path, ...])

Read data files compliant with a user-specific data model.

read_tables(source[, data_format, prefix, ...])

Read CDM-table-like files from the file system into a pandas.DataFrame.

replace_columns(df_l, df_r[, pivot_c, ...])

Replace columns in one DataFrame using row-matching from another.
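Row-matched column replacement can be sketched as a keyed lookup: for each row on the left, find the row on the right with the same key and take over the chosen columns. The helper `replace_by_key` is hypothetical, and reading `pivot_c` as the shared key column is an assumption.

```python
def replace_by_key(left, right, key, columns):
    """Return copies of `left` rows with `columns` taken from matching `right` rows."""
    lookup = {row[key]: row for row in right}
    result = []
    for row in left:
        updated = dict(row)  # leave the input rows untouched
        match = lookup.get(row[key])
        if match is not None:
            for column in columns:
                updated[column] = match[column]
        result.append(updated)
    return result


# Made-up sample data: one corrected value for row B2.
left = [
    {"id": "A1", "sst": 18.2, "flag": 0},
    {"id": "B2", "sst": 17.9, "flag": 0},
]
corrections = [{"id": "B2", "sst": 18.4}]

replaced = replace_by_key(left, corrections, key="id", columns=["sst"])
print(replaced)
```

Rows without a match on the right keep their original values, and columns not named in `columns` are never touched.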

split_by_boolean(data, mask, boolean[, ...])

Split a DataFrame using a boolean mask via split_dataframe_by_boolean.

split_by_boolean_true(data, mask[, ...])

Split rows where all mask columns are True.

split_by_column_entries(data, selection[, ...])

Split a DataFrame based on matching values in a given column.

split_by_index(data, index[, reset_index, ...])

Split a DataFrame by selecting specific index labels.

unique(data[, columns])

Count unique values per column in a DataFrame or an iterable of DataFrames.

validate_datetime(data, imodel[, blank, ...])

Validate datetime columns in a dataset according to the specified model.

validate_id(data, imodel[, blank, log_level])

Validate ID column(s) in a dataset against deck-specific patterns.

write(data[, mode])

Write MDF data or CDM tables to disk.

write_data(data[, mask, data_format, ...])

Write a pandas.DataFrame to an MDF file on the file system.

write_tables(data[, data_format, out_dir, ...])

Write a pandas.DataFrame to a CDM-table file on the file system.

DupDetect#

DupDetect(data, compared, method, ...)

Class to detect, flag, and remove duplicate entries in a DataFrame using a comparison matrix from recordlinkage.

DupDetect.flag_duplicates([keep, limit, ...])

Get result dataset with flagged duplicates.

DupDetect.get_duplicates([keep, limit, ...])

Identify duplicate matches based on the comparison matrix.

DupDetect.remove_duplicates([keep, limit, ...])

Remove duplicate entries from the dataset.