cdm_reader_mapper.read_tables

cdm_reader_mapper.read_tables(source, data_format='parquet', prefix=None, suffix=None, extension=None, separator='-', cdm_subset=None, col_subset=None, delimiter='|', na_values=None, null_label='null', imodel=None, from_str=None, to_str=None, **kwargs)

Read CDM-table-like files from the file system into a pandas.DataFrame.

Parameters:
  • source (str) – The file (including path) or the path to the file(s) to be read.

  • data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).

  • prefix (str, optional) – Prefix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.

  • suffix (str, optional) – Suffix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.

  • extension (str, optional) – Extension of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path. Default: “psv”

  • separator (str, optional) – Separator to join the file name pattern components. Default: “-”

  • cdm_subset (str or list, optional) – Specifies a subset of tables or a single table.

    • For multiple tables: The function returns a pandas.DataFrame with a MultiIndex on the columns, using (table-name, field) as column names. Tables are merged on the report_id field.

    • For a single table: The function returns a pandas.DataFrame with a simple (single-level) column index.

    Required if source is a valid file name.

  • col_subset (str, list or dict, optional) – Specify the section or sections of the file to read.

    • For multiple sections of the tables: a dict, e.g. col_subset = {table0:[columns0],…tableN:[columnsN]}

    • For a single section: a list, e.g. col_subset = [columns]. The column names are assumed to conform to the CDM field names.

  • delimiter (str) – Character or regex pattern to treat as the delimiter while reading with pandas.read_csv. Default: ‘|’

  • na_values (hashable, iterable of hashable or dict of {Hashable: Iterable}, optional) – Additional strings to recognize as NA/NaN while reading the input file with pandas.read_csv. For more details see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

  • null_label (str) – String used to label invalid values in the data. Default: null
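
The file name structure <prefix>-<table>-*<suffix>.<extension> described for prefix, suffix, extension and separator can be illustrated with a small standalone sketch. It does not call cdm_reader_mapper: the helper table_pattern and the example file names are hypothetical, and skipping an omitted prefix or suffix is an assumption rather than confirmed library behaviour.

```python
from fnmatch import fnmatchcase


def table_pattern(table, prefix=None, suffix=None, extension="psv", separator="-"):
    """Build a glob pattern of the form <prefix>-<table>-*<suffix>.<extension>.

    Hypothetical helper for illustration only. Omitted components are
    simply skipped, which is an assumption about how the library behaves.
    """
    # Join the leading components that were actually supplied.
    leading = [part for part in (prefix, table) if part]
    # The wildcard stands for the free part of the name (e.g. a date).
    stem = separator.join(leading) + separator + "*" + (suffix or "")
    return f"{stem}.{extension}"


# Illustrative names only: "icoads" and "subset" are made up for this sketch.
pattern = table_pattern("header", prefix="icoads", suffix="subset")
matches = fnmatchcase("icoads-header-2020-01-subset.psv", pattern)
```

Under these assumptions, pattern is "icoads-header-*subset.psv" and the example file name matches it, while a file belonging to another table would not.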

Return type:

DataBundle

Returns:

cdm_reader_mapper.DataBundle
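
A call might look like the following minimal sketch. Only the parameter names come from the signature above; the table names ("header", "observations-sst"), the column names, and the load_tables wrapper are assumptions chosen for illustration and should be checked against the actual data. The import sits inside the function so the block can be loaded even where the package is not installed.

```python
# Keyword arguments taken from the documented signature; the values are
# illustrative assumptions, not defaults recommended by the library.
READ_KWARGS = {
    "data_format": "csv",
    "extension": "psv",
    "delimiter": "|",
    # Two tables: per the parameter description, the result then has
    # (table-name, field) column names, merged on report_id.
    "cdm_subset": ["header", "observations-sst"],
    # Assumed CDM field names; adjust to the fields present in the files.
    "col_subset": {
        "header": ["report_id", "latitude", "longitude"],
        "observations-sst": ["report_id", "observation_value"],
    },
    "null_label": "null",
}


def load_tables(source_dir):
    """Read the selected CDM tables from a directory (hypothetical wrapper)."""
    # Imported lazily: requires cdm_reader_mapper to be installed.
    from cdm_reader_mapper import read_tables

    return read_tables(source_dir, **READ_KWARGS)
```

Passing a directory as source lets the prefix, suffix and extension parameters select the files; when source is a single file name, cdm_subset is required, as noted in the parameter list.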

See also

read

Read either original marine-meteorological data or MDF data or CDM tables from disk.

read_data

Read MDF data and validation mask from disk.

read_mdf

Read original marine-meteorological data from disk.

write

Write either MDF data or CDM tables to disk.

write_tables

Write CDM tables to disk.

write_data

Write MDF data and validation mask to disk.