API reference#

This page provides an auto-generated summary of the cdm_reader_mapper API.

Read data from disk#

`read`(source[, mode])	Read original marine-meteorological data, MDF data, or CDM tables from disk.
`read_data`(data_file[, mask_file, info_file, ...])	Read MDF data that already conforms to a pre-defined data model.
`read_mdf`(source[, imodel, ext_schema_path, ...])	Read data files compliant with a user-specific data model.
`read_tables`(source[, data_format, prefix, ...])	Read CDM-table-like files from the file system into a pandas.DataFrame.

DataBundle#

DataBundle(*args, **kwargs)

Class for manipulating the MDF data and mapping it to the CDM.

DataBundle’s method functions#

Information#

DataBundle.unique(**kwargs)

Get unique values of data.
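
As a rough illustration of what counting unique values per column means, the following sketch uses plain Python, with a list of dicts standing in for the rows of `data`. The helper `count_unique` and the sample column names are hypothetical, and the shape of the returned mapping is an assumption, not the library's documented return type.

```python
from collections import Counter


def count_unique(rows, columns=None):
    """Count how often each distinct value occurs, per column (hypothetical helper)."""
    # Default to every column that appears in at least one row.
    columns = columns or sorted({key for row in rows for key in row})
    return {
        col: dict(Counter(row[col] for row in rows if col in row))
        for col in columns
    }


# Made-up sample rows, used only for illustration.
rows = [
    {"deck": 700, "pt": 5},
    {"deck": 700, "pt": 7},
    {"deck": 714, "pt": 5},
]

counts = count_unique(rows)
deck_only = count_unique(rows, columns=["deck"])
print(counts)
print(deck_only)
```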

Manipulation#

`DataBundle.add`(addition[, inplace])	Add information to a `DataBundle`.
`DataBundle.copy`()	Make a deep copy of a `DataBundle`.
`DataBundle.replace_columns`(df_corr[, ...])	Replace columns in `data`.
`DataBundle.stack_h`(other[, datasets, inplace])	Stack multiple `DataBundle` objects horizontally.
`DataBundle.stack_v`(other[, datasets, inplace])	Stack multiple `DataBundle` objects vertically.
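
The difference between vertical and horizontal stacking can be sketched in plain Python, with lists of dicts standing in for DataFrame rows. The two helper functions below are hypothetical and are not the library's implementation. In particular, the requirement that both inputs have the same number of rows for horizontal stacking is an assumption made for this sketch.

```python
def stack_rows_vertically(left, right):
    """Vertical stack: the rows of `right` are appended below the rows of `left`."""
    return left + right


def stack_rows_horizontally(left, right):
    """Horizontal stack: the columns of `right` are placed next to those of `left`."""
    # Assumption for this sketch: rows are paired by position.
    if len(left) != len(right):
        raise ValueError("horizontal stacking needs the same number of rows")
    return [{**l_row, **r_row} for l_row, r_row in zip(left, right)]


# Made-up sample rows, used only for illustration.
left = [{"id": "A1", "sst": 12.3}, {"id": "B2", "sst": 13.0}]
more_rows = [{"id": "C3", "sst": 14.1}]           # same columns, new rows
more_columns = [{"at": 10.5}, {"at": 11.2}]       # same rows, new column

vertical = stack_rows_vertically(left, more_rows)
horizontal = stack_rows_horizontally(left, more_columns)
print(len(vertical), sorted(horizontal[0]))
```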

Selection#

`DataBundle.split_by_boolean_true`([do_mask])	Split `data` by rows where all column entries in `mask` are True.
`DataBundle.split_by_boolean_false`([do_mask])	Split `data` by rows where all column entries in `mask` are False.
`DataBundle.split_by_column_entries`(selection)	Split `data` by rows where column entries are in a specific value list.
`DataBundle.split_by_index`(index[, do_mask])	Split `data` by rows within a specific index list.
`DataBundle.select_where_all_true`([inplace, ...])	Select rows from `data` where all column entries in `mask` are True.
`DataBundle.select_where_all_false`([inplace, ...])	Select rows from `data` where all column entries in `mask` are False.
`DataBundle.select_where_entry_isin`(selection)	Select rows from `data` where column entries are in a specific value list.
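
The "all column entries in `mask` are True" rule used by the methods above can be sketched in plain Python. Lists of dicts stand in for `data` and `mask`, and the helper `split_where_all_true` is hypothetical. The sketch assumes that a split returns both the matching and the non-matching rows, while a select would keep only the first part.

```python
def split_where_all_true(rows, flags):
    """Return (selected, rejected): rows whose mask entries are all True, and the rest."""
    selected, rejected = [], []
    for row, row_flags in zip(rows, flags):
        # A row is selected only if every one of its mask entries is True.
        target = selected if all(row_flags.values()) else rejected
        target.append(row)
    return selected, rejected


# Made-up sample rows; mask entries mark whether a value passed validation.
data = [
    {"id": "A1", "sst": 12.3},
    {"id": "B2", "sst": 99.9},
    {"id": "C3", "sst": 14.1},
]
mask = [
    {"id": True, "sst": True},
    {"id": True, "sst": False},
    {"id": True, "sst": True},
]

selected, rejected = split_where_all_true(data, mask)
print([row["id"] for row in selected])
print([row["id"] for row in rejected])
```

The "all False" variants follow the same pattern with the condition inverted to `not any(row_flags.values())`.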

Validation#

`DataBundle.validate_datetime`([imodel])	Validate datetime information in `data`.
`DataBundle.validate_id`([imodel])	Validate station id information in `data`.

Map data to CDM tables#

DataBundle.map_model([imodel, inplace])

Map data to the Common Data Model.

Correction#

`DataBundle.correct_datetime`([imodel, inplace])	Correct datetime information in `data`.
`DataBundle.correct_pt`([imodel, inplace])	Correct platform type information in `data`.

Duplicate check#

`DataBundle.duplicate_check`([inplace])	Run a duplicate check on `data`.
`DataBundle.flag_duplicates`([inplace])	Flag detected duplicates in `data`.
`DataBundle.get_duplicates`(**kwargs)	Get duplicate matches in `data`.
`DataBundle.remove_duplicates`([inplace])	Remove detected duplicates in `data`.

Write data to disk#

DataBundle.write([dtypes, parse_dates, ...])

Write data to disk.

DataBundle’s property attributes#

`DataBundle.columns`	Column labels of `data`.
`DataBundle.data`	MDF pandas.DataFrame data.
`DataBundle.dtypes`	Dictionary of data types in `data`.
`DataBundle.encoding`	A string representing the encoding to use for `data`.
`DataBundle.imodel`	Name of the MDF/CDM input model.
`DataBundle.mask`	MDF pandas.DataFrame validation mask.
`DataBundle.mode`	Data mode.
`DataBundle.parse_dates`	Information on how to parse dates in `data`.

Useful functions#

`correct_datetime`(data, imodel[, log_level, ...])	Apply ICOADS deck-specific datetime corrections.
`correct_pt`(data, imodel[, log_level, _base])	Apply ICOADS deck-specific platform type corrections.
`duplicate_check`(data[, method, ...])	Run a duplicate check on a dataset using recordlinkage.
`map_model`(data, imodel[, cdm_subset, ...])	Map a pandas DataFrame to the CDM header and observational tables.
`read`(source[, mode])	Read original marine-meteorological data, MDF data, or CDM tables from disk.
`read_data`(data_file[, mask_file, info_file, ...])	Read MDF data that already conforms to a pre-defined data model.
`read_mdf`(source[, imodel, ext_schema_path, ...])	Read data files compliant with a user-specific data model.
`read_tables`(source[, data_format, prefix, ...])	Read CDM-table-like files from the file system into a pandas.DataFrame.
`replace_columns`(df_l, df_r[, pivot_c, ...])	Replace columns in one DataFrame using row-matching from another.
`split_by_boolean`(data, mask, boolean[, ...])	Split a DataFrame using a boolean mask via `split_dataframe_by_boolean`.
`split_by_boolean_true`(data, mask[, ...])	Split rows where all mask columns are `True`.
`split_by_column_entries`(data, selection[, ...])	Split a DataFrame based on matching values in a given column.
`split_by_index`(data, index[, reset_index, ...])	Split a DataFrame by selecting specific index labels.
`unique`(data[, columns])	Count unique values per column in a DataFrame or an iterable of DataFrames.
`validate_datetime`(data, imodel[, blank, ...])	Validate datetime columns in a dataset according to the specified model.
`validate_id`(data, imodel[, blank, log_level])	Validate ID column(s) in a dataset against deck-specific patterns.
`write`(data[, mode])	Write either MDF data or CDM tables to disk.
`write_data`(data[, mask, data_format, ...])	Write a pandas.DataFrame to an MDF file on the file system.
`write_tables`(data[, data_format, out_dir, ...])	Write a pandas.DataFrame to a CDM-table file on the file system.

DupDetect#

DupDetect(data, compared, method, ...)

Class to detect, flag, and remove duplicate entries in a DataFrame using a comparison matrix from recordlinkage.

`DupDetect.flag_duplicates`([keep, limit, ...])	Get result dataset with flagged duplicates.
`DupDetect.get_duplicates`([keep, limit, ...])	Identify duplicate matches based on the comparison matrix.
`DupDetect.remove_duplicates`([keep, limit, ...])	Remove duplicate entries from the dataset.
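
The get, flag, and remove steps can be illustrated with a simplified plain-Python sketch. Note that it swaps in exact matching on a few key columns for the recordlinkage comparison matrix that `DupDetect` uses, so it shows only the workflow, not the library's matching logic. All helper names, the `is_duplicate` column, and the choice to keep the first occurrence are assumptions made for this sketch.

```python
def find_duplicates(rows, keys):
    """Map the index of each first occurrence to the indices of its later matches."""
    first_seen, matches = {}, {}
    for idx, row in enumerate(rows):
        key = tuple(row[k] for k in keys)
        if key in first_seen:
            matches.setdefault(first_seen[key], []).append(idx)
        else:
            first_seen[key] = idx
    return matches


def flag_rows(rows, keys):
    """Return copies of the rows with a hypothetical boolean `is_duplicate` column."""
    later = {i for dup_list in find_duplicates(rows, keys).values() for i in dup_list}
    return [{**row, "is_duplicate": idx in later} for idx, row in enumerate(rows)]


def drop_duplicates(rows, keys):
    """Keep the first occurrence of each key combination and drop later matches."""
    later = {i for dup_list in find_duplicates(rows, keys).values() for i in dup_list}
    return [row for idx, row in enumerate(rows) if idx not in later]


# Made-up sample reports, used only for illustration.
reports = [
    {"id": "A1", "date": "1990-01-01", "sst": 12.3},
    {"id": "B2", "date": "1990-01-01", "sst": 13.0},
    {"id": "A1", "date": "1990-01-01", "sst": 12.3},
    {"id": "A1", "date": "1990-01-02", "sst": 12.5},
]
keys = ("id", "date")

matches = find_duplicates(reports, keys)
flagged = flag_rows(reports, keys)
cleaned = drop_duplicates(reports, keys)
print(matches, len(cleaned))
```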
