cdm_reader_mapper.read_tables

cdm_reader_mapper.read_tables(source, data_format='parquet', prefix=None, suffix=None, extension=None, separator='-', cdm_subset=None, col_subset=None, delimiter='|', na_values=None, null_label='null', imodel=None, from_str=None, to_str=None, **kwargs)

Read CDM-table-like files from file system to a pandas.DataFrame.

Parameters:

source (str) – The file (including path) or the path to the file(s) to be read.
data_format ({"csv", "parquet", "feather"}, default: "parquet") – Format of input data file(s).
prefix (str, optional) – Prefix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.
suffix (str, optional) – Suffix of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path.
extension (str, optional) – Extension of file name structure: <prefix>-<table>-*<suffix>.<extension>. Can be used if source is a valid directory path. Default: “psv”
separator (str, optional) – Separator to join the file name pattern components. Default: “-”
cdm_subset (str or list, optional) – Specifies a subset of tables or a single table.
- For multiple subsets of tables: This function returns a pandas.DataFrame whose columns form a MultiIndex, with (table-name, field) as column names. Tables are merged via the report_id field.
- For a single table: This function returns a pandas.DataFrame with simple (single-level) column indexing.
Required if source is a valid file name.
col_subset (str, list or dict, optional) – Specify the section or sections of the file to read.
- For multiple sections of the tables: a dict, e.g. col_subset = {table0:[columns0],…tableN:[columnsN]}
- For a single section: a list-type object, e.g. col_subset = [columns]. This variable assumes that all column names conform to the CDM field names.
delimiter (str) – Character or regex pattern to treat as the delimiter while reading with pandas.read_csv. Default: ‘|’
na_values (hashable, iterable of hashable or dict of {Hashable: Iterable}, optional) – Additional strings to recognize as NA/NaN when reading the input file with pandas.read_csv. For more details see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
null_label (str) – String used to label invalid values in the data. Default: “null”
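
The file name structure described for prefix, suffix, extension and separator can be sketched as a glob pattern. The helper below, build_pattern, is hypothetical and not part of cdm_reader_mapper; it only illustrates how the components might combine. Skipping a missing prefix or suffix is an assumption of this sketch, and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def build_pattern(table, prefix=None, suffix=None, extension="psv", separator="-"):
    """Hypothetical helper, not part of cdm_reader_mapper.

    Builds a glob pattern following <prefix>-<table>-*<suffix>.<extension>.
    Skipping a missing prefix or suffix is an assumption of this sketch.
    """
    # Join only the leading components that are actually set.
    leading = separator.join(part for part in (prefix, table) if part)
    return f"{leading}{separator}*{suffix or ''}.{extension}"


# Illustrative (made-up) file names for a "header" table.
pattern = build_pattern("header", prefix="test", suffix="2010")
matches = fnmatchcase("test-header-icoads-2010.psv", pattern)
no_match = fnmatchcase("test-observations-at-2010.psv", pattern)
```

Here the wildcard absorbs whatever sits between the table name and the suffix, so only files belonging to the requested table match the pattern.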

Return type:

DataBundle

Returns:

cdm_reader_mapper.DataBundle
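
The column layout described under cdm_subset can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the table contents are invented, and aligning the tables on report_id with an outer join is an assumption about how the merge might behave.

```python
import pandas as pd

# Invented sample tables; real CDM tables carry many more fields.
header = pd.DataFrame({"report_id": ["r1", "r2"], "platform_type": ["2", "5"]})
obs = pd.DataFrame({"report_id": ["r1", "r2"], "observation_value": ["281.3", "279.9"]})

tables = {"header": header, "observations-at": obs}

# Align every table on report_id and place them side by side.
# The dict keys become the outer level of the column MultiIndex.
merged = pd.concat(
    {name: df.set_index("report_id") for name, df in tables.items()},
    axis=1,
    join="outer",  # assumption: keep reports found in any table
)

# Selecting one table yields a frame with single-level columns.
header_only = merged["header"]
```

Columns of the merged frame are addressed as (table-name, field) tuples, for example merged[("observations-at", "observation_value")].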
