cdm_reader_mapper.mdf_reader package

Common Data Model (CDM) MDF reader package.

Submodules

cdm_reader_mapper.mdf_reader.properties module

Common Data Model (CDM) reader properties.

cdm_reader_mapper.mdf_reader.reader module

Common Data Model (CDM) MDF reader.

cdm_reader_mapper.mdf_reader.reader.read_data(data_file, mask_file=None, info_file=None, data_format='parquet', imodel=None, col_subset=None, encoding=None, delimiter=None, **kwargs)

Read MDF data which is already on a pre-defined data model.

Parameters:
  • data_file (str) – The data file (including path) to be read.

  • mask_file (str, optional) – The validation file (including path) to be read.

  • info_file (str, optional) – The information file (including path) to be read.

  • data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).

  • imodel (str, optional) – Name of internally available input data model, e.g. icoads_r300_d704.

  • col_subset (str, tuple or list, optional) – Specify the section or sections of the file to read.

    • For multiple sections of the tables: e.g. col_subset = [columns0,…,columnsN]

    • For a single section: e.g. list type object col_subset = [columns]

    Column labels can be either strings or tuples.

  • encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.

  • delimiter (str, optional) – The delimiter used in the input file. Overrides the value in the imodel schema file.

  • **kwargs (Any) – Keyword arguments that are passed to the read function.

Return type:

DataBundle

Returns:

cdm_reader_mapper.DataBundle – DataBundle containing MDF data.
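
A minimal usage sketch for read_data, based only on the signature documented above. The file names (data.<format>, mask.<format>, info.json) and the helper names build_read_data_kwargs and load_mdf_bundle are hypothetical and chosen for illustration. The import is deferred so that the helpers can be defined without the package installed.

```python
from pathlib import Path


def build_read_data_kwargs(out_dir, data_format="parquet", col_subset=None):
    """Assemble keyword arguments for read_data.

    The file names used here are hypothetical placeholders.
    """
    out_dir = Path(out_dir)
    kwargs = {
        "data_file": str(out_dir / f"data.{data_format}"),
        "mask_file": str(out_dir / f"mask.{data_format}"),
        "info_file": str(out_dir / "info.json"),
        "data_format": data_format,
    }
    # Only pass col_subset when a subset was actually requested.
    if col_subset is not None:
        kwargs["col_subset"] = col_subset
    return kwargs


def load_mdf_bundle(out_dir, **options):
    """Call read_data with the assembled arguments and return its result."""
    # Deferred import: only required when the function is called.
    from cdm_reader_mapper.mdf_reader.reader import read_data

    return read_data(**build_read_data_kwargs(out_dir, **options))
```

Keeping the argument assembly separate from the call makes it easy to inspect which files would be read before touching the disk.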

See also

read

Read either original marine-meteorological data or MDF data or CDM tables from disk.

read_mdf

Read original marine-meteorological data from disk.

read_tables

Read CDM tables from disk.

write

Write either MDF data or CDM tables to disk.

write_data

Write MDF data and validation mask to disk.

write_tables

Write CDM tables to disk.

cdm_reader_mapper.mdf_reader.reader.read_mdf(source, imodel=None, ext_schema_path=None, ext_schema_file=None, ext_table_path=None, year_init=None, year_end=None, encoding=None, chunksize=None, skiprows=None, convert_flag=True, converter_dict=None, converter_kwargs=None, decode_flag=True, decoder_dict=None, validate_flag=True, sections=None, excludes=None, pd_kwargs=None, xr_kwargs=None)

Read data files compliant with a user-specific data model.

Reads a data file into a pandas DataFrame using a pre-defined data model. The data read is validated against its data model, producing a boolean mask on output.

The data model needs to be input to the module as a named model (included in the module) or as the path to a valid data model.

Parameters:
  • source (str) – The file (including path) to be read.

  • imodel (str) – Name of internally available input data model, e.g. icoads_r300_d704.

  • ext_schema_path (str or Path-like, optional) – The path to the external input data model schema file. The schema file must have the same name as the directory. Either imodel or one of ext_schema_path/ext_schema_file must be set.

  • ext_schema_file (str or Path-like, optional) – The external input data model schema file. Either imodel or one of ext_schema_path/ext_schema_file must be set.

  • ext_table_path (str or Path-like, optional) – The path to the external table file. The table file must have the same name as the directory.

  • year_init (str or int, optional) – Left border of time axis.

  • year_end (str or int, optional) – Right border of time axis.

  • encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.

  • chunksize (int, optional) – Number of reports per chunk.

  • skiprows (int, optional) – Number of initial rows to skip from file.

  • convert_flag (bool, default: True) – If True, convert entries by using a pre-defined data model.

  • converter_dict (dict of {Hashable: func}, optional) – Functions for converting values in specific columns. If None, use information from a pre-defined data model.

  • converter_kwargs (dict of {Hashable: kwargs}, optional) – Keyword arguments for converting values in specific columns. If None, use information from a pre-defined data model.

  • decode_flag (bool, default: True) – If True, decode entries by using a pre-defined data model.

  • decoder_dict (dict of {Hashable: func}, optional) – Functions for decoding values in specific columns. If None, use information from a pre-defined data model.

  • validate_flag (bool, default: True) – If True, validate data entries by using a pre-defined data model.

  • sections (list, optional) – List with subset of data model sections to output. If None, read pre-defined data model sections.

  • excludes (str or list of str, optional) – MDF Sections to exclude.

  • pd_kwargs (dict, optional) – Additional pandas arguments.

  • xr_kwargs (dict, optional) – Additional xarray arguments.

Return type:

DataBundle

Returns:

cdm_reader_mapper.DataBundle – DataBundle containing MDF data.
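
A minimal usage sketch for read_mdf with a built-in data model, using only parameters listed above. The helper name read_icoads_subset is hypothetical; the model name icoads_r300_d704 is the example given in the parameter list. The import is deferred so the helper can be defined without the package installed.

```python
def read_icoads_subset(source, year_init=None, year_end=None, sections=None):
    """Read a file with a built-in data model and return the resulting DataBundle."""
    # Deferred import: only required when the function is called.
    from cdm_reader_mapper.mdf_reader.reader import read_mdf

    return read_mdf(
        source,
        imodel="icoads_r300_d704",  # example model name from the parameter list
        year_init=year_init,        # left border of the time axis
        year_end=year_end,          # right border of the time axis
        sections=sections,          # None reads the pre-defined sections
    )
```

To use an external data model instead, ext_schema_path or ext_schema_file would be passed in place of imodel.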

See also

read

Read either original marine-meteorological or MDF data or CDM tables from disk.

read_data

Read MDF data and validation mask from disk.

read_tables

Read CDM tables from disk.

write

Write either MDF data or CDM tables to disk.

write_data

Write MDF data and validation mask to disk.

write_tables

Write CDM tables to disk.

cdm_reader_mapper.mdf_reader.reader.validate_read_mdf_args(*, source, imodel=None, ext_schema_path=None, ext_schema_file=None, year_init=None, year_end=None, chunksize=None, skiprows=None)

Validate arguments for reading an MDF file.

This function performs validation on file paths and numeric arguments required for reading an MDF dataset.

Parameters:
  • source (str or Path-like) – Source of input dataset.

  • imodel (str, optional) – Name of data model, e.g. icoads_r300_d721.

  • ext_schema_path (str or Path-like, optional) – Directory of external schema file.

  • ext_schema_file (str or Path-like, optional) – Path of external schema file.

  • year_init (int, optional) – Initial valid year.

  • year_end (int, optional) – End valid year.

  • chunksize (int, optional) – Number of lines to read from the file per chunk.

  • skiprows (int, optional) – Number of lines to skip at the start of the file.

Raises:
  • FileNotFoundError – If the source file does not exist.

  • ValueError

    • If neither imodel nor ext_schema_path/ext_schema_file is provided.

    • If chunksize is 0 or negative.

    • If skiprows is negative.

    • If year_init is greater than year_end.

    • If any input parameter does not match the required type.

Return type:

None
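
The documented checks can be restated as a short standalone sketch. This is not the library's implementation: the function name check_read_mdf_args, the order of the checks, the error messages and the exact type rules (plain integers only) are assumptions made for illustration.

```python
from pathlib import Path


def check_read_mdf_args(*, source, imodel=None, ext_schema_path=None,
                        ext_schema_file=None, year_init=None, year_end=None,
                        chunksize=None, skiprows=None):
    """Illustrative restatement of the documented checks (hypothetical helper)."""
    if not Path(source).exists():
        raise FileNotFoundError(f"Source not found: {source}")

    # At least one way of selecting a data model must be given.
    if imodel is None and ext_schema_path is None and ext_schema_file is None:
        raise ValueError("Provide imodel or ext_schema_path/ext_schema_file.")

    # Numeric arguments must be integers when given (bool is rejected here).
    numeric = {"year_init": year_init, "year_end": year_end,
               "chunksize": chunksize, "skiprows": skiprows}
    for name, value in numeric.items():
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {type(value).__name__}.")

    if chunksize is not None and chunksize <= 0:
        raise ValueError("chunksize must be a positive integer.")
    if skiprows is not None and skiprows < 0:
        raise ValueError("skiprows must not be negative.")
    if year_init is not None and year_end is not None and year_init > year_end:
        raise ValueError("year_init must not be greater than year_end.")
```

Like the documented function, the sketch returns None when all checks pass and signals problems only through exceptions.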

cdm_reader_mapper.mdf_reader.writer module

Common Data Model (CDM) MDF writer.

cdm_reader_mapper.mdf_reader.writer.write_data(data, mask=None, data_format='parquet', dtypes=None, parse_dates=False, encoding='utf-8', out_dir='.', prefix=None, suffix=None, extension=None, filename=None, separator='_', col_subset=None, delimiter=',', **kwargs)

Write pandas.DataFrame to MDF file on file system.

Parameters:
  • data (pandas.DataFrame or Iterable[pd.DataFrame]) – Data to export.

  • mask (pandas.DataFrame or Iterable[pd.DataFrame], optional) – Validation mask to export.

  • data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of output data file(s).

  • dtypes (dict, optional) – Dictionary of data types of the data. dtypes and parse_dates are dumped to a JSON information file.

  • parse_dates (list | bool, default: False) – Information on how to parse dates in data. dtypes and parse_dates are dumped to a JSON information file. For more information see pandas.read_csv().

  • encoding (str, default: "utf-8") – The encoding to use in the output file.

  • out_dir (str, default: ".") – Path to the output directory.

  • prefix (str, optional) – Prefix of file name structure: <prefix>-data-*<suffix>.<extension>.

  • suffix (str, optional) – Suffix of file name structure: <prefix>-data-*<suffix>.<extension>.

  • extension (str, optional) – Extension of file name structure: <prefix>-data-*<suffix>.<extension>. By default, extension depends on data_format.

  • filename (str or dict, optional) – Name(s) of the output file(s). To name the data and mask files separately, pass a dict ({"data": <filenameD>, "mask": <filenameM>}). By default, the file name is created automatically from table name, prefix and suffix.

  • separator (str, optional) – Separator to join the file name pattern components (default “_”).

  • col_subset (str, tuple or list, optional) – Specify the section or sections of the file to write.

    • For multiple sections of the tables: e.g. col_subset = [columns0,…,columnsN]

    • For a single section: e.g. list type object col_subset = [columns]

    Column labels can be either strings or tuples.

  • delimiter (str, default: ",") – Character to use as the delimiter when writing with df.to_csv.

  • **kwargs (Any) – Additional keyword arguments passed to to_csv when data_format is ‘csv’.

Raises:

ValueError – If data_format is not one of ‘csv’, ‘parquet’ or ‘feather’, or if the type of data and the type of mask do not match.

See also

write

Write either MDF data or CDM tables to disk.

write_tables

Write CDM tables to disk.

read

Read either original marine-meteorological data or MDF data or CDM tables from disk.

read_data

Read MDF data and validation mask from disk.

read_mdf

Read original marine-meteorological data from disk.

read_tables

Read CDM tables from disk.

Notes

Use this function after reading MDF data.

Return type:

None
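
As the note above suggests, write_data is typically called on data that has just been read. The following sketch shows one way to do that. It assumes that the DataBundle returned by read_mdf exposes its data and validation mask as attributes named data and mask, which this page does not document; the helper name export_bundle is hypothetical. The format check mirrors the documented ValueError so that an unsupported format fails before anything is imported or written.

```python
SUPPORTED_FORMATS = ("csv", "parquet", "feather")


def export_bundle(bundle, out_dir=".", data_format="parquet", prefix=None):
    """Write the data and mask of a DataBundle-like object to disk (sketch)."""
    # Fail early on an unsupported format, mirroring the documented ValueError.
    if data_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"data_format must be one of {SUPPORTED_FORMATS}, got {data_format!r}"
        )

    # Deferred import: only required when the function is called.
    from cdm_reader_mapper.mdf_reader.writer import write_data

    write_data(
        bundle.data,        # assumption: attribute holding the DataFrame(s)
        mask=bundle.mask,   # assumption: attribute holding the validation mask
        data_format=data_format,
        out_dir=out_dir,
        prefix=prefix,
    )
```

Files written this way can later be loaded again with read_data, using the same data_format.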