cdm_reader_mapper.mdf_reader package
Common Data Model (CDM) MDF reader package.
Subpackages
- cdm_reader_mapper.mdf_reader.codes package
- cdm_reader_mapper.mdf_reader.schemas package
- cdm_reader_mapper.mdf_reader.utils package
  - Submodules
    - cdm_reader_mapper.mdf_reader.utils.convert_and_decode module
    - cdm_reader_mapper.mdf_reader.utils.filereader module
    - cdm_reader_mapper.mdf_reader.utils.parser module
    - cdm_reader_mapper.mdf_reader.utils.utilities module
    - cdm_reader_mapper.mdf_reader.utils.validators module
Submodules
cdm_reader_mapper.mdf_reader.properties module
Common Data Model (CDM) reader properties.
cdm_reader_mapper.mdf_reader.reader module
Common Data Model (CDM) MDF reader.
- cdm_reader_mapper.mdf_reader.reader.read_data(data_file, mask_file=None, info_file=None, data_format='parquet', imodel=None, col_subset=None, encoding=None, delimiter=None, **kwargs)
Read MDF data that already conforms to a pre-defined data model.
- Parameters:
  - data_file (str) – The data file (including path) to be read.
  - mask_file (str, optional) – The validation file (including path) to be read.
  - info_file (str, optional) – The information file (including path) to be read.
  - data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).
  - imodel (str, optional) – Name of internally available input data model, e.g. icoads_r300_d704.
  - col_subset (str, tuple or list, optional) – Specify the section or sections of the file to read. For multiple sections of the tables, e.g. col_subset = [columns0,…,columnsN]. For a single section, e.g. a list-type object col_subset = [columns]. Column labels can be either strings or tuples.
  - encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.
  - delimiter (str, optional) – The delimiter used in the input file. Overrides the value in the imodel schema file.
  - **kwargs (Any) – Keyword arguments that will be passed to the read function.
- Returns:
  cdm_reader_mapper.DataBundle – DataBundle containing MDF data.
See also
- read – Read either original marine-meteorological data or MDF data or CDM tables from disk.
- read_mdf – Read original marine-meteorological data from disk.
- read_tables – Read CDM tables from disk.
- write – Write either MDF data or CDM tables to disk.
- write_data – Write MDF data and validation mask to disk.
- write_tables – Write CDM tables to disk.
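The shapes accepted by col_subset can be illustrated with a small sketch. The helper select_columns below is hypothetical and is not part of cdm_reader_mapper; it works on a plain list of labels rather than on a DataFrame. It assumes that a bare string selects a whole section, that a tuple names one (section, element) column, and that a tuple passed on its own is a single label. The section and element names are made up for illustration.

```python
def select_columns(columns, col_subset):
    """Return the labels in `columns` selected by `col_subset` (illustrative only)."""
    # Accept a single label (str or tuple) as well as a list of labels.
    if isinstance(col_subset, (str, tuple)):
        col_subset = [col_subset]

    selected = []
    for label in columns:
        for wanted in col_subset:
            # Exact match: a plain string label or a full (section, element) tuple.
            exact = label == wanted
            # Assumption: a bare string also selects every tuple label in that section.
            in_section = (
                isinstance(wanted, str)
                and isinstance(label, tuple)
                and label[0] == wanted
            )
            if exact or in_section:
                selected.append(label)
                break
    return selected


# Hypothetical column labels: three (section, element) tuples and one plain string.
columns = [("core", "YR"), ("core", "MO"), ("c1", "BSI"), "source_id"]

single_section = select_columns(columns, ["core"])
two_columns = select_columns(columns, [("core", "YR"), "source_id"])
```

Here single_section holds both "core" columns, while two_columns holds exactly the two labels that were requested.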
- cdm_reader_mapper.mdf_reader.reader.read_mdf(source, imodel=None, ext_schema_path=None, ext_schema_file=None, ext_table_path=None, year_init=None, year_end=None, encoding=None, chunksize=None, skiprows=None, convert_flag=True, converter_dict=None, converter_kwargs=None, decode_flag=True, decoder_dict=None, validate_flag=True, sections=None, excludes=None, pd_kwargs=None, xr_kwargs=None)
Read data files compliant with a user-specified data model.
Reads a data file into a pandas DataFrame using a pre-defined data model. The data read is validated against its data model, producing a boolean mask on output.
The data model must be passed either as a named model (included in the module) or as the path to a valid data model.
- Parameters:
  - source (str) – The file (including path) to be read.
  - imodel (str) – Name of internally available input data model, e.g. icoads_r300_d704.
  - ext_schema_path (str or Path-like, optional) – The path to the external input data model schema file. The schema file must have the same name as the directory. One of imodel, ext_schema_path or ext_schema_file must be set.
  - ext_schema_file (str or Path-like, optional) – The external input data model schema file. One of imodel, ext_schema_path or ext_schema_file must be set.
  - ext_table_path (str or Path-like, optional) – The path to the external table file. The table file must have the same name as the directory.
  - year_init (str or int, optional) – Left border of time axis.
  - year_end (str or int, optional) – Right border of time axis.
  - encoding (str, optional) – The encoding of the input file. Overrides the value in the imodel schema file.
  - chunksize (int, optional) – Number of reports per chunk.
  - skiprows (int, optional) – Number of initial rows to skip from file.
  - convert_flag (bool, default: True) – If True, convert entries by using a pre-defined data model.
  - converter_dict (dict of {Hashable: func}, optional) – Functions for converting values in specific columns. If None, use information from a pre-defined data model.
  - converter_kwargs (dict of {Hashable: kwargs}, optional) – Keyword arguments for converting values in specific columns. If None, use information from a pre-defined data model.
  - decode_flag (bool, default: True) – If True, decode entries by using a pre-defined data model.
  - decoder_dict (dict of {Hashable: func}, optional) – Functions for decoding values in specific columns. If None, use information from a pre-defined data model.
  - validate_flag (bool, default: True) – If True, validate data entries by using a pre-defined data model.
  - sections (list, optional) – List with subset of data model sections to output. If None, read the pre-defined data model sections.
  - excludes (str or list of str, optional) – MDF sections to exclude.
  - pd_kwargs (dict, optional) – Additional pandas arguments.
  - xr_kwargs (dict, optional) – Additional xarray arguments.
- Returns:
  cdm_reader_mapper.DataBundle – DataBundle containing MDF data.
See also
- read – Read either original marine-meteorological data or MDF data or CDM tables from disk.
- read_data – Read MDF data and validation mask from disk.
- read_tables – Read CDM tables from disk.
- write – Write either MDF data or CDM tables to disk.
- write_data – Write MDF data and validation mask to disk.
- write_tables – Write CDM tables to disk.
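A call that uses only parameters documented for read_mdf might look like the following sketch. The wrapper name load_icoads_file and the choice of arguments are assumptions made for illustration; the model name is the example given in the parameter list. The import sits inside the function so the snippet can be loaded even where the package is not installed.

```python
def load_icoads_file(source, year_init=None, year_end=None, chunksize=None):
    """Read one file with the built-in icoads_r300_d704 model (illustrative wrapper)."""
    # Imported lazily: the package is only needed once the function is called.
    from cdm_reader_mapper.mdf_reader.reader import read_mdf

    return read_mdf(
        source,
        imodel="icoads_r300_d704",  # named model, so no ext_schema_* argument is needed
        year_init=year_init,        # left border of the time axis
        year_end=year_end,          # right border of the time axis
        chunksize=chunksize,        # number of reports per chunk, None for the default
    )
```

Per the Returns entry, the result is a cdm_reader_mapper.DataBundle containing the MDF data.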
- cdm_reader_mapper.mdf_reader.reader.validate_read_mdf_args(*, source, imodel=None, ext_schema_path=None, ext_schema_file=None, year_init=None, year_end=None, chunksize=None, skiprows=None)
Validate arguments for reading an MDF file.
This function performs validation on file paths and numeric arguments required for reading an MDF dataset.
- Parameters:
  - source (str or Path-like) – Source of input dataset.
  - imodel (str, optional) – Name of data model, e.g. icoads_r300_d721.
  - ext_schema_path (str or Path-like, optional) – Directory of external schema file.
  - ext_schema_file (str or Path-like, optional) – Path of external schema file.
  - year_init (int, optional) – Initial valid year.
  - year_end (int, optional) – End valid year.
  - chunksize (int, optional) – Number of lines to read from the file per chunk.
  - skiprows (int, optional) – Number of lines to skip at the start of the file.
- Raises:
  FileNotFoundError – If the source file does not exist.
  Errors are also raised in the following cases:
  - If neither imodel nor ext_schema_path/ext_schema_file is provided.
  - If chunksize is 0 or negative.
  - If skiprows is negative.
  - If year_init is greater than year_end.
  - If any input parameter does not match the requested types.
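The documented checks can be restated as a short standalone sketch. The function check_read_args below is hypothetical and is not the library's implementation. This reference names only FileNotFoundError, so the use of ValueError for invalid values and TypeError for type mismatches is an assumption, as is the restriction of the type check to the four integer arguments.

```python
from pathlib import Path


def check_read_args(*, source, imodel=None, ext_schema_path=None,
                    ext_schema_file=None, year_init=None, year_end=None,
                    chunksize=None, skiprows=None):
    """Illustrative restatement of the documented checks (not the library code)."""
    if not Path(source).exists():
        raise FileNotFoundError(f"Source file not found: {source}")

    # At least one way of locating the data model must be given.
    if imodel is None and ext_schema_path is None and ext_schema_file is None:
        raise ValueError("Set imodel, ext_schema_path or ext_schema_file.")

    # Assumption: type mismatches raise TypeError, invalid values raise ValueError.
    numeric = {"year_init": year_init, "year_end": year_end,
               "chunksize": chunksize, "skiprows": skiprows}
    for name, value in numeric.items():
        if value is None:
            continue
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")

    if chunksize is not None and chunksize <= 0:
        raise ValueError("chunksize must be a positive integer.")
    if skiprows is not None and skiprows < 0:
        raise ValueError("skiprows must not be negative.")
    if year_init is not None and year_end is not None and year_init > year_end:
        raise ValueError("year_init must not be greater than year_end.")
```

Checking the file first means a missing source is reported before any argument problem, which matches the order in which the conditions are listed above.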
cdm_reader_mapper.mdf_reader.writer module
Common Data Model (CDM) MDF writer.
- cdm_reader_mapper.mdf_reader.writer.write_data(data, mask=None, data_format='parquet', dtypes=None, parse_dates=False, encoding='utf-8', out_dir='.', prefix=None, suffix=None, extension=None, filename=None, separator='_', col_subset=None, delimiter=',', **kwargs)
Write a pandas.DataFrame to an MDF file on the file system.
- Parameters:
  - data (pandas.DataFrame or Iterable[pd.DataFrame]) – Data to export.
  - mask (pandas.DataFrame or Iterable[pd.DataFrame], optional) – Validation mask to export.
  - data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of output data file(s).
  - dtypes (dict, optional) – Dictionary of data types in data. dtypes and parse_dates are dumped to a JSON information file.
  - parse_dates (list or bool, default: False) – Information on how to parse dates in data. dtypes and parse_dates are dumped to a JSON information file. For more information see pandas.read_csv().
  - encoding (str, default: "utf-8") – The encoding to use in the output file.
  - out_dir (str, default: ".") – Path to the output directory.
  - prefix (str, optional) – Prefix of file name structure: <prefix>-data-*<suffix>.<extension>.
  - suffix (str, optional) – Suffix of file name structure: <prefix>-data-*<suffix>.<extension>.
  - extension (str, optional) – Extension of file name structure: <prefix>-data-*<suffix>.<extension>. By default, the extension depends on data_format.
  - filename (str or dict, optional) – Name(s) of the output file(s). To set a file name for both data and mask, pass a dict ({"data": <filenameD>, "mask": <filenameM>}). By default, the file name is created automatically from table name, prefix and suffix.
  - separator (str, optional) – Separator to join the file name pattern components (default "_").
  - col_subset (str, tuple or list, optional) – Specify the section or sections of the file to write. For multiple sections of the tables, e.g. col_subset = [columns0,…,columnsN]. For a single section, e.g. a list-type object col_subset = [columns]. Column labels can be either strings or tuples.
  - delimiter (str, default: ",") – Field delimiter to use when writing with df.to_csv.
  - **kwargs (Any) – Additional keyword arguments passed to to_csv when data_format is "csv".
- Raises:
  ValueError – If data_format is not one of "csv", "parquet" or "feather", or if the type of data and the type of mask do not match.
See also
- write – Write either MDF data or CDM tables to disk.
- write_tables – Write CDM tables to disk.
- read – Read either original marine-meteorological data or MDF data or CDM tables from disk.
- read_data – Read MDF data and validation mask from disk.
- read_mdf – Read original marine-meteorological data from disk.
- read_tables – Read CDM tables from disk.
Notes
Use this function after reading MDF data.
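The dtypes and parse_dates parameters are described as being dumped to a JSON information file. The following sketch shows one way such a round trip could work using only the standard library. The helper names dump_info and load_info, the file name, and the two top-level keys are assumptions for illustration; the layout the package actually writes is not specified in this reference.

```python
import json
import tempfile
from pathlib import Path


def dump_info(path, dtypes=None, parse_dates=False):
    """Write dtypes and parse_dates to a JSON file (assumed layout)."""
    # Assumption: these two top-level keys are chosen for this sketch only.
    info = {"dtypes": dtypes or {}, "parse_dates": parse_dates}
    Path(path).write_text(json.dumps(info, indent=2), encoding="utf-8")


def load_info(path):
    """Read the JSON file back and return (dtypes, parse_dates)."""
    info = json.loads(Path(path).read_text(encoding="utf-8"))
    return info["dtypes"], info["parse_dates"]


# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    info_path = Path(tmp) / "example-info.json"
    dump_info(info_path, dtypes={"YR": "int64", "AT": "float64"}, parse_dates=["date"])
    dtypes, parse_dates = load_info(info_path)
```

Keeping the types as plain strings makes the file readable by any JSON parser; column labels that are tuples would need to be converted to strings first, since JSON object keys must be strings.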