cdm_reader_mapper.DataBundle

class cdm_reader_mapper.DataBundle(*args, **kwargs)

Class for manipulating MDF data and mapping it to the Common Data Model (CDM).

Parameters:
  • data (pd.DataFrame or Iterable[pd.DataFrame], optional) – MDF data.

  • columns (pd.Index, pd.MultiIndex or list, optional) – Column labels of data.

  • dtypes (pd.Series or dict, optional) – Data types of data.

  • parse_dates (list or bool, optional) – Information on how to parse dates in data.

  • mask (pandas.DataFrame, optional) – MDF validation mask.

  • imodel (str, optional) – Name of the MDF/CDM data model.

  • mode (str, optional) – Data mode, either “data” or “tables”. Default: “data”.
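The data and mask parameters are pandas objects that are expected to line up row by row and column by column. The sketch below builds a toy data frame and a boolean mask of the same shape using pandas only. It assumes a two-level (section, element) column MultiIndex; the section and element names are hypothetical and not taken from any real imodel. It also derives values of the kind accepted by the columns and dtypes parameters.

```python
import pandas as pd

# Hypothetical (section, element) column labels; real imodels define their own.
columns = pd.MultiIndex.from_tuples(
    [("core", "YR"), ("core", "MO"), ("core", "AT")]
)

# Toy MDF-like data: one row per report, with one missing value.
data = pd.DataFrame(
    [[2010, 1, 12.5], [2010, 13, 11.0], [2011, 2, float("nan")]],
    columns=columns,
)

# Toy validation mask with the same index and columns as data:
# True where an entry passed validation (month 13 and the missing value fail).
mask = pd.DataFrame(
    [[True, True, True], [True, False, True], [True, True, False]],
    columns=columns,
)

# Values of the kind accepted by the columns and dtypes parameters.
column_labels = data.columns
dtypes = data.dtypes.to_dict()
```

Keeping the mask aligned with the data is what makes the mask-based selection methods listed below meaningful.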

Examples

Getting a DataBundle while reading data from disk.

>>> from cdm_reader_mapper import read_mdf
>>> db = read_mdf(source="file_on_disk", imodel="custom_model_name")

Constructing a DataBundle from MDF data that has already been read.

>>> from cdm_reader_mapper import DataBundle, read_mdf
>>> read = read_mdf(source="file_on_disk", imodel="custom_model_name")
>>> data_ = read.data
>>> mask_ = read.mask
>>> db = DataBundle(data=data_, mask=mask_)

Constructing a DataBundle from CDM tables that have already been read.

>>> from cdm_reader_mapper import DataBundle, read_tables
>>> tables = read_tables("path_to_files").data
>>> db = DataBundle(data=tables, mode="tables")

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

add(addition[, inplace])

Add information to a DataBundle.

copy()

Make a deep copy of a DataBundle.

correct_datetime([imodel, inplace])

Correct datetime information in data.

correct_pt([imodel, inplace])

Correct platform type information in data.

duplicate_check([inplace])

Check data for duplicates.

flag_duplicates([inplace])

Flag detected duplicates in data.

get_duplicates(**kwargs)

Get duplicate matches in data.

map_model([imodel, inplace])

Map data to the Common Data Model.

remove_duplicates([inplace])

Remove detected duplicates in data.

replace_columns(df_corr[, subset, inplace])

Replace columns in data.

select_where_all_false([inplace, do_mask])

Select rows from data where all column entries in mask are False.

select_where_all_true([inplace, do_mask])

Select rows from data where all column entries in mask are True.

select_where_entry_isin(selection[, ...])

Select rows from data where column entries are in a specific value list.

select_where_index_isin(index[, inplace, ...])

Select rows from data where indexes are within a specific index list.

split_by_boolean_false([do_mask])

Split data by rows where all column entries in mask are False.

split_by_boolean_true([do_mask])

Split data by rows where all column entries in mask are True.

split_by_column_entries(selection[, do_mask])

Split data by rows where column entries are in a specific value list.

split_by_index(index[, do_mask])

Split data by rows within a specific index list.

stack_h(other[, datasets, inplace])

Stack multiple DataBundles horizontally.

stack_v(other[, datasets, inplace])

Stack multiple DataBundles vertically.

unique(**kwargs)

Get unique values of data.

validate_datetime([imodel])

Validate datetime information in data.

validate_id([imodel])

Validate station id information in data.

write([dtypes, parse_dates, encoding, mode])

Write data to disk.
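Several of the selection and splitting methods above are described in terms of the validation mask or of value and index lists. The sketch below shows one plain-pandas reading of those descriptions on toy data: rows where every mask entry is True, rows where every mask entry is False, rows whose entries are in a value list, rows whose index is in an index list, a split into selected and rejected parts, and horizontal and vertical stacking with pd.concat. It illustrates the semantics under these assumptions and is not the library's implementation; the helpers select_entries and split_rows, the {column: values} selection format, and the tuple return of the split are hypothetical.

```python
import pandas as pd

# Toy data and a boolean validation mask with matching index and columns.
data = pd.DataFrame({"YR": [2010, 2010, 2011, 2012], "MO": [1, 13, 2, 99]})
mask = pd.DataFrame(
    {"YR": [True, True, True, False], "MO": [True, False, True, False]}
)

# Rows where all mask entries are True (cf. select_where_all_true).
all_true = data[mask.all(axis=1)]

# Rows where all mask entries are False (cf. select_where_all_false).
all_false = data[~mask.any(axis=1)]


def select_entries(df, selection):
    """Hypothetical helper: keep rows whose entries match every
    {column: allowed_values} pair (cf. select_where_entry_isin)."""
    keep = pd.Series(True, index=df.index)
    for column, values in selection.items():
        keep &= df[column].isin(values)
    return df[keep]


def split_rows(df, condition):
    """Hypothetical helper: return the (selected, rejected) parts of df."""
    return df[condition], df[~condition]


by_entry = select_entries(data, {"MO": [1, 2]})

# Rows whose index is in an index list (cf. select_where_index_isin).
by_index = data[data.index.isin([1, 3])]

# Rows that passed every check versus the rest (cf. split_by_boolean_true).
selected, rejected = split_rows(data, mask.all(axis=1))

# Horizontal stacking adds columns (cf. stack_h); vertical adds rows (cf. stack_v).
extra = pd.DataFrame({"AT": [12.5, 11.0, 9.8, 7.1]})
wide = pd.concat([data, extra], axis=1)
tall = pd.concat([data, data], axis=0, ignore_index=True)
```

On this toy input, rows 0 and 2 pass every check, row 3 fails every check, and row 1 is mixed, so it appears in neither all_true nor all_false but ends up in rejected after the split.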

Attributes

columns

Column labels of data.

data

MDF pandas.DataFrame data.

dtypes

Dictionary of the data types of data.

encoding

A string representing the encoding to use in the data.

imodel

Name of the MDF/CDM input model.

mask

MDF pandas.DataFrame validation mask.

mode

Data mode.

parse_dates

Information on how to parse dates in data.
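The dtypes and parse_dates attributes hold the kind of information needed to restore column types when data are read back from a delimited text file. The sketch below shows how values of this shape can be passed to pandas.read_csv after a round trip through CSV. The column names are hypothetical, and it is an assumption that DataBundle.write stores these values in exactly this form.

```python
import io

import pandas as pd

# Toy data with a string, an integer and a datetime column.
data = pd.DataFrame({
    "station_id": ["A1", "B2"],
    "report_count": [3, 5],
    "report_timestamp": pd.to_datetime(["2010-01-01 12:00", "2010-01-02 06:00"]),
})

# Datetime columns are restored through parse_dates; the remaining
# columns are described by a {column: dtype_name} dictionary.
parse_dates = ["report_timestamp"]
dtypes = {col: str(dt) for col, dt in data.dtypes.items() if col not in parse_dates}

# Write to an in-memory CSV buffer, then read it back with the stored information.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, dtype=dtypes, parse_dates=parse_dates)
```

Without the dtype and parse_dates arguments, read_csv would still infer report_count as an integer column, but report_timestamp would come back as plain strings.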