cdm_reader_mapper package¶

Returns:

DataBundle or None – DataBundle with added information or None if “inplace=True”.

Examples

>>> tables = read_tables("path_to_files")
>>> db = db.add({"data": tables})

property columns: Index | MultiIndex¶

Column labels of data.

Returns:: pd.Index or pd.MultiIndex – Column labels of the underlying MDF data.

copy()[source]¶

Make deep copy of a DataBundle.

Return type:: DataBundle
Returns:: DataBundle – Copy of a DataBundle.

Examples

>>> db2 = db.copy()

correct_datetime(imodel=None, inplace=False, **kwargs)[source]¶

Correct datetime information in data.

Parameters:

imodel (str, optional) – Name of the MDF/CDM data model.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with datetime-corrected values in data.
**kwargs (Any) – Additional keyword-arguments for correcting datetime.

Return type:

Returns:

DataBundle or None – DataBundle with corrected datetime information or None if “inplace=True”.

See also

DataBundle.correct_pt: Correct platform type information in data.
DataBundle.validate_datetime: Validate datetime information in data.
DataBundle.validate_id: Validate station id information in data.

Notes

For more information see correct_datetime()

Examples

>>> df_dt = db.correct_datetime()
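The `inplace` convention shared by this and the following methods can be sketched in plain Python. `MiniBundle` and `shift_values` are hypothetical names used only for illustration and are not part of cdm_reader_mapper; the sketch assumes nothing beyond what the parameter description states, namely that `inplace=True` overwrites the object and returns None, while the default returns a modified copy.

```python
import copy


class MiniBundle:
    """Hypothetical stand-in for a DataBundle that holds a list of values."""

    def __init__(self, data):
        self.data = list(data)

    def shift_values(self, offset, inplace=False):
        """Add ``offset`` to every value, following the ``inplace`` convention."""
        # Work on this object when inplace=True, otherwise on a deep copy.
        target = self if inplace else copy.deepcopy(self)
        target.data = [value + offset for value in target.data]
        # In-place calls return None; otherwise the modified copy is returned.
        return None if inplace else target


db = MiniBundle([1, 2, 3])
db_new = db.shift_values(10)                # db is left untouched
result = db.shift_values(10, inplace=True)  # db itself is overwritten
```

A practical consequence is that `db = db.method(inplace=True)` would rebind `db` to None, so the return value should only be assigned when `inplace` is False.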

correct_pt(imodel=None, inplace=False, **kwargs)[source]¶

Correct platform type information in data.

Parameters:

imodel (str, optional) – Name of the MDF/CDM data model.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with platform-corrected values in data.
**kwargs (Any) – Additional keyword-arguments for correcting platform type.

Return type:

Returns:

DataBundle or None – DataBundle with corrected platform type information or None if “inplace=True”.

See also

DataBundle.correct_datetime: Correct datetime information in data.
DataBundle.validate_id: Validate station id information in data.
DataBundle.validate_datetime: Validate datetime information in data.

Notes

For more information see correct_pt()

Examples

>>> df_pt = db.correct_pt()

property data: DataFrame | ParquetStreamReader¶

Underlying MDF data.

Returns:: pd.DataFrame or ParquetStreamReader – Underlying MDF data.

property dtypes: Series | dict[str, Any] | None¶

Dictionary of data types on data.

Returns:: pd.Series or dict or None – Data types of underlying MDF data.

property encoding: str | None¶

A string representing the encoding to use in the data.

Returns:: str or None – String representing the encoding to use in the underlying MDF data.

See also

pd.to_csv(): Write data with encoding to CSV file.

property imodel: str | None¶

Name of the MDF/CDM input model.

Returns:: str or None – Name of the MDF/CDM input model if available.

map_model(imodel=None, inplace=False, **kwargs)[source]¶

Map data to the Common Data Model.

Parameters:

imodel (str, optional) – Name of the MDF/CDM data model.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with data as CDM tables.
**kwargs (Any) – Additional keyword-arguments for mapping to CDM.

Return type:

Returns:

DataBundle or None – DataBundle containing data mapped to the CDM or None if inplace=True.

Notes

For more information see map_model()

Examples

>>> cdm_tables = db.map_model()

property mask: DataFrame | ParquetStreamReader¶

MDF validation mask.

Returns:: pd.DataFrame or ParquetStreamReader – Validation mask of the underlying MDF data.

property mode: str¶

Data mode.

Returns:: str – Current data mode.
Raises:: TypeError – If mode of the underlying data is not a string.

property parse_dates: list[Any] | bool | None¶

Information on how to parse dates in data.

Returns:: list or bool or None – Information on how to parse dates in underlying MDF data.

See also

pd.read_csv(): Read CSV file using pandas.

replace_columns(df_corr, subset=None, inplace=False, **kwargs)[source]¶

Replace columns in data.

Parameters:

df_corr (pd.DataFrame) – Data used to replace columns.
subset (str or list of str, optional) – Select subset by columns. This option is useful for multi-indexed data.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with replaced column names in data.
**kwargs (Any) – Additional keyword-arguments for replacing columns.

Return type:

Returns:

DataBundle or None – DataBundle with replaced column names or None if “inplace=True”.

Notes

For more information see replace_columns()

Examples

>>> import pandas as pd
>>> df_corr = pd.read_csv("correction_file_on_disk")
>>> df_repl = db.replace_columns(df_corr)

select_where_all_false(inplace=False, do_mask=True, **kwargs)[source]¶

Select rows from data where all column entries in mask are False.

Parameters:

inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with invalid values only in data.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where all entries are False.

Return type:

Returns:

DataBundle or None – DataBundle containing rows where all column entries in mask are False or None if inplace=True.

See also

DataBundle.select_where_all_true: Select rows from data where all entries in mask are True.
DataBundle.select_where_entry_isin: Select rows from data where column entries are in a specific value list.
DataBundle.select_where_index_isin: Select rows from data within specific index list.

Notes

For more information see split_by_boolean_false()

Examples

Select without overwriting the old data.

>>> db_selected = db.select_where_all_false()

Select invalid values only with overwriting the old data.

>>> db.select_where_all_false(inplace=True)
>>> df_selected = db.data

select_where_all_true(inplace=False, do_mask=True, **kwargs)[source]¶

Select rows from data where all column entries in mask are True.

Parameters:

inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with valid values only in data.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where all entries are True.

Return type:

Returns:

DataBundle or None – DataBundle containing rows where all column entries in mask are True or None if inplace=True.

See also

DataBundle.select_where_all_false: Select rows from data where all entries in mask are False.
DataBundle.select_where_entry_isin: Select rows from data where column entries are in a specific value list.
DataBundle.select_where_index_isin: Select rows from data within specific index list.

Notes

For more information see split_by_boolean_true()

Examples

Select without overwriting the old data.

>>> db_selected = db.select_where_all_true()

Select with overwriting the old data.

>>> db.select_where_all_true(inplace=True)
>>> df_selected = db.data
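The row selection described for `select_where_all_true` and `select_where_all_false` can be illustrated with plain Python structures. This is a sketch under assumptions, not the library's implementation: `rows_where_all` is a hypothetical helper, and lists of dictionaries stand in for the data and mask DataFrames.

```python
def rows_where_all(data_rows, mask_rows, flag=True):
    """Keep the data rows whose mask entries all equal ``flag``."""
    selected = []
    for row, mask in zip(data_rows, mask_rows):
        if all(entry == flag for entry in mask.values()):
            selected.append(row)
    return selected


# Illustrative data and its validation mask (same shape, boolean entries).
data_rows = [
    {"temp": 12.5, "wind": 3.0},
    {"temp": 99.9, "wind": 4.0},
    {"temp": -999.0, "wind": -1.0},
]
mask_rows = [
    {"temp": True, "wind": True},
    {"temp": False, "wind": True},
    {"temp": False, "wind": False},
]

all_valid = rows_where_all(data_rows, mask_rows, flag=True)
all_invalid = rows_where_all(data_rows, mask_rows, flag=False)
```

In this sketch the second row, which mixes True and False mask entries, is returned by neither call.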

select_where_entry_isin(selection, inplace=False, do_mask=True, **kwargs)[source]¶

Select rows from data where column entries are in a specific value list.

Parameters:

selection (dict) – Keys: Column names in data. Values: Specific value list.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with selected rows only in data.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where entries are within a specific value list.

Return type:

Returns:

DataBundle or None – DataBundle containing rows where column entries are in a specific value list or None if inplace=True.

See also

DataBundle.select_where_index_isin: Select rows from data within specific index list.
DataBundle.select_where_all_true: Select rows from data where all entries in mask are True.
DataBundle.select_where_all_false: Select rows from data where all entries in mask are False.

Notes

For more information see split_by_column_entries()

Examples

Select without overwriting the old data.

>>> db_selected = db.select_where_entry_isin(
...     selection={("c1", "B1"): [26, 41]},
... )

Select with overwriting the old data.

>>> db.select_where_entry_isin(selection={("c1", "B1"): [26, 41]}, inplace=True)
>>> df_selected = db.data
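The shape of the `selection` argument (column label mapped to a list of accepted values) can be sketched as follows. `rows_where_entry_isin` is a hypothetical helper operating on a list of dictionaries; combining several columns with a logical AND is an assumption of this sketch, since the text above does not say how multiple keys are combined.

```python
def rows_where_entry_isin(rows, selection):
    """Keep rows whose value in every selected column is in that column's list."""
    return [
        row
        for row in rows
        # Assumption: all selected columns must match (logical AND).
        if all(row[column] in allowed for column, allowed in selection.items())
    ]


# Tuple keys mimic multi-indexed column labels such as ("c1", "B1").
rows = [
    {("c1", "B1"): 26, ("c1", "B2"): "a"},
    {("c1", "B1"): 30, ("c1", "B2"): "b"},
    {("c1", "B1"): 41, ("c1", "B2"): "c"},
]

selected = rows_where_entry_isin(rows, {("c1", "B1"): [26, 41]})
```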

select_where_index_isin(index, inplace=False, do_mask=True, **kwargs)[source]¶

Select rows from data where indexes are within a specific index list.

Parameters:

index (list of int) – Specific index list.
inplace (bool, default: False) – If True overwrite data in DataBundle else return a copy of DataBundle with selected rows only in data.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where indexes are within a specific index list.

Return type:

Returns:

DataBundle or None – DataBundle containing rows where indexes are within a specific index list or None if inplace=True.

See also

DataBundle.select_where_entry_isin: Select rows from data where column entries are in a specific value list.
DataBundle.select_where_all_true: Select rows from data where all entries in mask are True.
DataBundle.select_where_all_false: Select rows from data where all entries in mask are False.

Notes

For more information see split_by_index()

Examples

Select without overwriting the old data.

>>> db_selected = db.select_where_index_isin([0, 2, 4])

Select with overwriting the old data.

>>> db.select_where_index_isin(index=[0, 2, 4], inplace=True)
>>> df_selected = db.data

split_by_boolean_false(do_mask=True, **kwargs)[source]¶

Split data by rows where all column entries in mask are False.

Parameters:

do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where mask is False.

Return type:

Returns:

tuple – First DataBundle including rows where all column entries in mask are False. Second DataBundle including rows where all column entries in mask are True.

See also

DataBundle.split_by_boolean_true: Split data by rows where all entries in mask are True.
DataBundle.split_by_column_entries: Split data by rows where column entries are in a specific value list.
DataBundle.split_by_index: Split data by rows within specific index list.

Notes

For more information see split_by_boolean_false()

Examples

Split DataBundle.

>>> db_false, db_true = db.split_by_boolean_false()

split_by_boolean_true(do_mask=True, **kwargs)[source]¶

Split data by rows where all column entries in mask are True.

Parameters:

do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data where mask is True.

Return type:

Returns:

tuple – First DataBundle including rows where all column entries in mask are True. Second DataBundle including rows where all column entries in mask are False.

See also

DataBundle.split_by_boolean_false: Split data by rows where all entries in mask are False.
DataBundle.split_by_column_entries: Split data by rows where column entries are in a specific value list.
DataBundle.split_by_index: Split data by rows within specific index list.

Notes

For more information see split_by_boolean_true()

Examples

Split DataBundle.

>>> db_true, db_false = db.split_by_boolean_true()

split_by_column_entries(selection, do_mask=True, **kwargs)[source]¶

Split data by rows where column entries are in a specific value list.

Parameters:

selection (dict) – Keys: Column names in data. Values: Specific value list.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data by column entries.

Return type:

Returns:

tuple – First DataBundle including rows where column entries are in a specific value list. Second DataBundle including rows where column entries are not in a specific value list.

See also

DataBundle.split_by_index: Split data by rows within specific index list.
DataBundle.split_by_boolean_true: Split data by rows where all entries in mask are True.
DataBundle.split_by_boolean_false: Split data by rows where all entries in mask are False.

Notes

For more information see split_by_column_entries()

Examples

Split DataBundle.

>>> db_isin, db_isnotin = db.split_by_column_entries(
...     selection={("c1", "B1"): [26, 41]},
... )

split_by_index(index, do_mask=True, **kwargs)[source]¶

Split data by rows within specific index list.

Parameters:

index (list of int) – Specific index list.
do_mask (bool, default: True) – If True also do selection on mask.
**kwargs (Any) – Additional keyword-arguments for splitting data by index.

Return type:

Returns:

tuple – First DataBundle including rows within specific index list. Second DataBundle including rows outside specific index list.

See also

DataBundle.split_by_column_entries: Split data by rows where column entries are in a specific value list.
DataBundle.split_by_boolean_true: Split data by rows where all entries in mask are True.
DataBundle.split_by_boolean_false: Split data by rows where all entries in mask are False.

Notes

For more information see split_by_index()

Examples

Split DataBundle.

>>> db_isin, db_isnotin = db.split_by_index([0, 2, 4])
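Unlike the `select_*` methods, the `split_*` methods return both parts as a tuple. A minimal plain-Python sketch of that contract, with `split_rows_by_index` as a hypothetical helper and a dictionary keyed by row index standing in for the data:

```python
def split_rows_by_index(rows, index):
    """Return (rows inside the index list, rows outside the index list)."""
    wanted = set(index)
    inside = {i: row for i, row in rows.items() if i in wanted}
    outside = {i: row for i, row in rows.items() if i not in wanted}
    return inside, outside


rows = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
isin, isnotin = split_rows_by_index(rows, [0, 2, 4])
```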

stack_h(other, datasets=('data', 'mask'), inplace=False, **kwargs)[source]¶

Stack multiple DataBundles horizontally.

Parameters:

other (DataBundle or Sequence of DataBundle) – List of other DataBundle to stack horizontally.
datasets (str or Sequence of str, default: (data, mask)) – List of datasets to be stacked.
inplace (bool, default: False) – If True overwrite datasets in DataBundle else return a copy of DataBundle with stacked datasets.
**kwargs (Any) – Additional keyword-arguments for stacking DataFrames horizontally.

Return type:

Returns:

DataBundle or None – Horizontally stacked DataBundle or None if inplace=True.

See also

DataBundle.stack_v: Stack multiple DataBundles vertically.

Notes

This only works with pd.DataFrames, not with iterables of pd.DataFrames!
The DataFrames in the DataBundle may have different data columns!

Examples

>>> db = db1.stack_h(db2, datasets=["data", "mask"])

stack_v(other, datasets=('data', 'mask'), inplace=False, **kwargs)[source]¶

Stack multiple DataBundles vertically.

Parameters:

other (DataBundle or Sequence of DataBundle) – List of other DataBundle to stack vertically.
datasets (str or Sequence of str, default: (data, mask)) – List of datasets to be stacked.
inplace (bool, default: False) – If True overwrite datasets in DataBundle else return a copy of DataBundle with stacked datasets.
**kwargs (Any) – Additional keyword-arguments for stacking DataFrames vertically.

Return type:

DataFrame | ParquetStreamReader

Returns:

DataBundle or None – Vertically stacked DataBundle or None if “inplace=True”.

See also

DataBundle.stack_h: Stack multiple DataBundles horizontally.

Notes

This only works with pd.DataFrames, not with iterables of pd.DataFrames!
The DataFrames in the DataBundle must have the same data columns!

Examples

>>> db = db1.stack_v(db2, datasets=["data", "mask"])
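The difference between the two stacking directions, including the column requirements stated in the notes, can be sketched with lists of dictionaries in place of DataFrames. Both helper functions are hypothetical illustrations and do not reflect how the library performs the stacking internally.

```python
def stack_vertically(first, second):
    """Append the rows of ``second`` below ``first``; columns must be identical."""
    if first and second and set(first[0]) != set(second[0]):
        raise ValueError("vertical stacking needs identical columns")
    return first + second


def stack_horizontally(first, second):
    """Join rows side by side; the two tables may have different columns."""
    return [{**left, **right} for left, right in zip(first, second)]


table_a = [{"id": 1, "temp": 10.0}, {"id": 2, "temp": 11.5}]
table_b = [{"id": 3, "temp": 9.0}]    # same columns as table_a
table_c = [{"wind": 3.0}, {"wind": 4.5}]  # different columns, same row count

stacked_v = stack_vertically(table_a, table_b)
stacked_h = stack_horizontally(table_a, table_c)
```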

unique(**kwargs)[source]¶

Get unique values of data.

Parameters:: **kwargs (Any) – Additional keyword-arguments for getting unique values.
Return type:: dict[str | tuple[str, str], dict[Any, int]]
Returns:: dict – Dictionary with unique values.

Notes

For more information see unique()

Examples

>>> db.unique(columns=("c1", "B1"))
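Judging by the documented return type, `dict[str | tuple[str, str], dict[Any, int]]`, the result appears to map each column label to the number of occurrences of each value. The following sketch reproduces that shape in plain Python; `unique_counts` is a hypothetical helper, and the library's actual output may differ in detail.

```python
from collections import Counter


def unique_counts(rows, columns):
    """Map each column label to a {value: number of occurrences} dictionary."""
    return {
        column: dict(Counter(row[column] for row in rows))
        for column in columns
    }


rows = [
    {("c1", "B1"): 26, ("c1", "B2"): "x"},
    {("c1", "B1"): 41, ("c1", "B2"): "y"},
    {("c1", "B1"): 26, ("c1", "B2"): "y"},
]

counts = unique_counts(rows, [("c1", "B1")])
```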

validate_datetime(imodel=None, **kwargs)[source]¶

Validate datetime information in data.

Parameters:

imodel (str, optional) – Name of the MDF/CDM data model.
**kwargs (Any) – Additional keyword-arguments for validating datetime.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame containing True and False values for each index in data. True: All datetime information in the data row is valid. False: At least one piece of datetime information in the data row is invalid.

See also

DataBundle.validate_id: Validate station id information in data.
DataBundle.correct_datetime: Correct datetime information in data.
DataBundle.correct_pt: Correct platform type information in data.

Notes

For more information see validate_datetime()

Examples

>>> val_dt = db.validate_datetime()
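The result is one boolean per row. How the library decides validity is not described here, so the following is only a sketch of the idea under a simple assumption: a row counts as valid if its date components form a real calendar datetime. `is_valid_datetime` and the sample reports are hypothetical.

```python
from datetime import datetime


def is_valid_datetime(year, month, day, hour):
    """Return True if the components form a valid calendar datetime."""
    try:
        datetime(year, month, day, hour)
    except (ValueError, TypeError):
        return False
    return True


# (year, month, day, hour) per report; the last two are deliberately invalid.
reports = [(2000, 2, 29, 12), (2001, 2, 29, 12), (1999, 13, 1, 0)]
flags = [is_valid_datetime(*report) for report in reports]
```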

validate_id(imodel=None, **kwargs)[source]¶

Validate station id information in data.

Parameters:

imodel (str, optional) – Name of the MDF/CDM data model.
**kwargs (Any) – Additional keyword-arguments for validating station id.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame containing True and False values for each index in data. True: All station ID information in the data row is valid. False: At least one piece of station ID information in the data row is invalid.

See also

DataBundle.validate_datetime: Validate datetime information in data.
DataBundle.correct_pt: Correct platform type information in data.
DataBundle.correct_datetime: Correct datetime information in data.

Notes

For more information see validate_id()

Examples

>>> val_id = db.validate_id()

write(dtypes=None, parse_dates=None, encoding=None, mode=None, **kwargs)[source]¶

Write data to disk.

Parameters:

dtypes (dict, optional) – Data types of data.
parse_dates (list or bool, optional) – Information on how to parse dates in data.
encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.
mode ({data, tables}, optional) – Data mode.
**kwargs (Any) – Additional keyword-arguments for writing data to disk.

See also

write_data: Write MDF data and validation mask to disk.
write_tables: Write CDM tables to disk.
read: Read original marine-meteorological data as well as MDF data or CDM tables from disk.
read_data: Read MDF data and validation mask from disk.
read_mdf: Read original marine-meteorological data from disk.
read_tables: Read CDM tables from disk.

Return type:: None

Notes

If mode is “data” write data using write_data(). If mode is “tables” write data using write_tables().

Examples

>>> db.write()
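The mode-based dispatch described in the notes for `write` can be sketched as a lookup table. This is an illustration under assumptions, not the library's code: `write_bundle`, the recording lambdas, and the `out_dir` keyword are hypothetical stand-ins for the real `write_data()` and `write_tables()` functions and their arguments.

```python
def write_bundle(mode, writers, **kwargs):
    """Call the writer registered for ``mode`` and forward the keyword arguments."""
    try:
        writer = writers[mode]
    except KeyError:
        raise ValueError(
            f"mode must be one of {sorted(writers)}, got {mode!r}"
        ) from None
    return writer(**kwargs)


# Record which writer would be called instead of touching the disk.
calls = []
writers = {
    "data": lambda **kw: calls.append(("write_data", kw)),
    "tables": lambda **kw: calls.append(("write_tables", kw)),
}

write_bundle("tables", writers, out_dir=".")
```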

cdm_reader_mapper.correct_datetime(data, imodel, log_level='INFO', base=None)[source]¶

Apply ICOADS deck specific datetime corrections.

Parameters:

data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for datetime correction metadata. If None use internal correction path.

Return type:

DataFrame | Iterable[DataFrame]

Returns:

pandas.DataFrame or Iterable[pd.DataFrame] – A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.

Raises:

ValueError – If _correct_dt raises an error during correction.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame]. If data is a pd.Series.

cdm_reader_mapper.correct_pt(data, imodel, log_level='INFO', base=None)[source]¶

Apply ICOADS deck specific platform ID corrections.

Parameters:

data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for datetime correction metadata. If None use internal correction path.

Return type:

DataFrame | Iterable[DataFrame]

Returns:

pandas.DataFrame or Iterable[pd.DataFrame] – A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.

Raises:

ValueError – If _correct_pt raises an error during correction. If platform column is not defined in properties file.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame]. If data is a pd.Series.

cdm_reader_mapper.map_model(data, imodel, cdm_subset=None, codes_subset=None, cdm_complete=True, drop_missing_obs=True, drop_duplicates=True, log_level='INFO')[source]¶

Map a pandas DataFrame to the CDM header and observational tables.

Parameters:

data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input data to map.
imodel (str) – A specific mapping from a generic data model to the CDM, such as mapping a SID-DCK from IMMA1’s core and attachments to the CDM in a specific way, e.g. icoads_r300_d704.
cdm_subset (str or list, optional) – Subset of CDM model tables to map. Defaults to the full set of CDM tables defined for the imodel.
codes_subset (str or list, optional) – Subset of code mapping tables to map. Defaults to the full set of code mapping tables defined for the imodel.
cdm_complete (bool, default: True) – If True map entire CDM tables list.
drop_missing_obs (bool, default: True) – If True drop observations without a valid observation value, e.g. no air_temperature value.
drop_duplicates (bool, default: True) – If True drop duplicated rows.
log_level (str, default: INFO) – Level of logging information to save.

Return type:

Returns:

cdm_tables (pandas.DataFrame) – DataFrame with MultiIndex columns (cdm_table, column_name).

Raises:

ValueError –
- If imodel is not defined.
- If first split entry (‘_’) of imodel is not defined.
- If mapping does not return a DataFrame.
TypeError –
- If type of imodel is not supported.
- If anything during mapping fails.
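The returned columns are labelled by `(cdm_table, column_name)` pairs. As a sketch of what that layout implies, the following plain-Python helper regroups one such row into per-table dictionaries. `group_by_table` is hypothetical, and the table and column names in the sample row are illustrative only.

```python
from collections import defaultdict


def group_by_table(row):
    """Regroup a row keyed by (table, column) pairs into one dictionary per table."""
    tables = defaultdict(dict)
    for (table, column), value in row.items():
        tables[table][column] = value
    return dict(tables)


# One mapped report with (cdm_table, column_name) keys; names are illustrative.
row = {
    ("header", "report_id"): "R1",
    ("header", "latitude"): 54.2,
    ("observations-at", "report_id"): "R1",
    ("observations-at", "observation_value"): 283.15,
}

per_table = group_by_table(row)
```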

cdm_reader_mapper.read(source, mode='mdf', **kwargs)[source]¶

Read either original marine-meteorological data or MDF data or CDM tables from disk.

Parameters:

source (str) – Source of the input data.
mode (str, {mdf, data, tables}, default: mdf) –

Read data mode:
- “mdf” to read original marine-meteorological data from disk and convert them to MDF data
- “data” to read MDF data from disk
- “tables” to read CDM tables from disk. Map MDF data to CDM tables with DataBundle.map_model().
**kwargs (Any) – Additional keyword-arguments passed to reader function.

Return type:

Returns:

DataBundle – Containing read data as pd.DataFrame or Iterable of pd.DataFrames.

See also

read_mdf: Read original marine-meteorological data from disk.
read_data: Read MDF data and validation mask from disk.
read_tables: Read CDM tables from disk.
write: Write either MDF data or CDM tables on disk.
write_data: Write MDF data and validation mask to disk.
write_tables: Write CDM tables to disk.

Notes

kwargs are the keyword arguments for the specific mode reader.

cdm_reader_mapper.read_data(data_file, mask_file=None, info_file=None, data_format='parquet', imodel=None, col_subset=None, encoding=None, delimiter=None, **kwargs)[source]¶

Read MDF data that already conforms to a pre-defined data model.

Parameters:

data_file (str) – The data file (including path) to be read.
mask_file (str, optional) – The validation file (including path) to be read.
info_file (str, optional) – The information file (including path) to be read.
data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).
imodel (str, optional) – Name of internally available input data model, e.g. icoads_r300_d704.
col_subset (str, tuple or list, optional) – Specify the section or sections of the file to read.
- For multiple sections of the tables: e.g. col_subset = [columns0,…,columnsN]
- For a single section: e.g. list type object col_subset = [columns]
Column labels can be either strings or tuples.
encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.
delimiter (str, optional) – The delimiter used in the input file. Overrides the value in the imodel schema file.
**kwargs (Any) – Keyword arguments that will be passed to the read function.

Return type:

Returns:

cdm_reader_mapper.DataBundle – DataBundle containing MDF data.

See also

read: Read original marine-meteorological data as well as MDF data or CDM tables from disk.
read_mdf: Read original marine-meteorological data from disk.
read_tables: Read CDM tables from disk.
write: Write either MDF data or CDM tables to disk.
write_data: Write MDF data and validation mask to disk.
write_tables: Write CDM tables to disk.

cdm_reader_mapper.read_mdf(source, imodel=None, ext_schema_path=None, ext_schema_file=None, ext_table_path=None, year_init=None, year_end=None, encoding=None, chunksize=None, skiprows=None, convert_flag=True, converter_dict=None, converter_kwargs=None, decode_flag=True, decoder_dict=None, validate_flag=True, sections=None, excludes=None, pd_kwargs=None, xr_kwargs=None)[source]¶

Read data files compliant with a user specific data model.

Reads a data file to a pandas DataFrame using a pre-defined data model. Read data is validated against its data model, producing a boolean mask on output.

The data model needs to be input to the module as a named model (included in the module) or as the path to a valid data model.

Parameters:

source (str) – The file (including path) to be read.
imodel (str, optional) – Name of internally available input data model, e.g. icoads_r300_d704.
ext_schema_path (str or Path-like, optional) – The path to the external input data model schema file. The schema file must have the same name as the directory. One of imodel and ext_schema_path or ext_schema_file must be set.
ext_schema_file (str or Path-like, optional) – The external input data model schema file. One of imodel and ext_schema_path or ext_schema_file must be set.
ext_table_path (str or Path-like, optional) – The path to the external table file. The table file must have the same name as the directory.
year_init (str or int, optional) – Left border of time axis.
year_end (str or int, optional) – Right border of time axis.
encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.
chunksize (int, optional) – Number of reports per chunk.
skiprows (int, optional) – Number of initial rows to skip from file.
convert_flag (bool, default: True) – If True convert entries by using a pre-defined data model.
converter_dict (dict of {Hashable: func}, optional) – Functions for converting values in specific columns. If None use information from a pre-defined data model.
converter_kwargs (dict of {Hashable: kwargs}, optional) – Key-word arguments for converting values in specific columns. If None use information from a pre-defined data model.
decode_flag (bool, default: True) – If True decode entries by using a pre-defined data model.
decoder_dict (dict of {Hashable: func}, optional) – Functions for decoding values in specific columns. If None use information from a pre-defined data model.
validate_flag (bool, default: True) – Validate data entries by using a pre-defined data model.
sections (list, optional) – List with subset of data model sections to output. If None read pre-defined data model sections.
excludes (str or list of str, optional) – MDF Sections to exclude.
pd_kwargs (dict, optional) – Additional pandas arguments.
xr_kwargs (dict, optional) – Additional xarray arguments.

Return type:

Returns:

cdm_reader_mapper.DataBundle – DataBundle containing MDF data.

See also

read: Read either original marine-meteorological or MDF data or CDM tables from disk.
read_data: Read MDF data and validation mask from disk.
read_tables: Read CDM tables from disk.
write: Write either MDF data or CDM tables to disk.
write_data: Write MDF data and validation mask to disk.
write_tables: Write CDM tables to disk.

cdm_reader_mapper.read_tables(source, data_format='parquet', prefix=None, suffix=None, extension=None, separator='-', cdm_subset=None, col_subset=None, delimiter='|', na_values=None, null_label='null', from_str=None, to_str=None, imodel=None, **kwargs)[source]¶

Read CDM-table-like files from file system to a pandas.DataFrame.

Parameters:

source (str) – The file (including path) or the path to the file(s) to be read.
data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).
prefix (str, optional) – Prefix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.
suffix (str, optional) – Suffix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.
extension (str, optional) – Extension of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.
separator (str, default: -) – Separator to join the file name pattern components.
cdm_subset (str or list, optional) – Specifies a subset of tables or a single table.
- For multiple subsets of tables: This function returns a pandas.DataFrame that is multi-indexed on the columns, with (table-name, field) as column names. Tables are merged via the report_id field.
- For a single table: This function returns a pandas.DataFrame with a simple indexing for the columns.
Required if source is a valid file name.
col_subset (str, list or dict, optional) – Specify the section or sections of the file to read.
- For multiple sections of the tables: e.g. col_subset = {table0:[columns0],…tableN:[columnsN]}
- For a single section: e.g. list type object col_subset = [columns]
This variable assumes that all column names conform to the CDM field names.
delimiter (str, default: |) – Character or regex pattern to treat as the delimiter while reading with pandas.read_csv.
na_values (hashable, iterable of hashable or dict of {Hashable: Iterable}, optional) – Additional strings to recognize as Na/NaN while reading input file with pandas.read_csv. For more details see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
null_label (str, default: null) – String used to label invalid values in data.
from_str (bool, optional) – If True convert original string data to imodel-specific data types.
to_str (bool, optional) – If True convert original imodel-specific data types to strings.
imodel (str, optional) – Name of data model, e.g. icoads. Must be set if either from_str or to_str is set.
**kwargs (Any) – Additional keyword-arguments passed to the data reader.
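The file name structure `<prefix>-<table>-*<suffix>.<extension>` described for `prefix`, `suffix`, `extension` and `separator` can be sketched as a glob pattern. `table_pattern` is a hypothetical helper, the file names and the `psv` extension are invented for illustration, and the sketch requires every component to be given because the handling of omitted components is not described above.

```python
import fnmatch


def table_pattern(table, prefix, suffix, extension, separator="-"):
    """Build the glob pattern <prefix>-<table>-*<suffix>.<extension>."""
    return f"{prefix}{separator}{table}{separator}*{suffix}.{extension}"


pattern = table_pattern("header", prefix="icoads", suffix="2020", extension="psv")

# Hypothetical directory listing to match against.
files = [
    "icoads-header-r300-2020.psv",
    "icoads-observations_at-r300-2020.psv",
    "icoads-header-r300-2021.psv",
]
matches = fnmatch.filter(files, pattern)
```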

Return type: