cdm_reader_mapper.cdm_mapper.utils package

Common Data Model (CDM) mapper utilities.

Submodules

cdm_reader_mapper.cdm_mapper.utils.conversions module

Convert Common Data Model (CDM) mapping table elements from/to string types.

class cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter(converters, args=None)[source]

Bases: object

Base class for managing type conversion functions.

Parameters:
  • converters (dict) – Mapping of type names to conversion functions.

  • args (dict) – Optional mapping of type names to argument names for converters.

get_args(key)[source]

Retrieve the argument name associated with a converter type.

Parameters:

key (str) – The type name to look up.

Return type:

str | None

Returns:

str or None – The argument name if found, else None.
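The registry described above can be pictured as two dictionaries keyed by type name. The sketch below is a hypothetical stand-in, not the library class: the name SketchConverter, the type names "int" and "numeric", and the pairing of "numeric" with decimal_places are assumptions made for illustration.

```python
class SketchConverter:
    """Hypothetical stand-in for a converter registry (not the library class)."""

    def __init__(self, converters, args=None):
        self.converters = converters  # type name -> conversion function
        self.args = args or {}        # type name -> argument name

    def get_args(self, key):
        # Return the argument name registered for this type, or None.
        return self.args.get(key)


registry = SketchConverter(
    converters={"int": int, "numeric": float},
    args={"numeric": "decimal_places"},
)

print(registry.get_args("numeric"))  # decimal_places
print(registry.get_args("int"))      # None
```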

class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertFromStr[source]

Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter

Converter class for converting string representations into Python types.

Provides default converters for integers, floats, timestamps, and strings, including array variants.

class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertToStr[source]

Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter

Converter class for converting Python types to string representations.

Provides default converters for integers, floats, timestamps, and strings, including array variants. Supports optional arguments for certain types (e.g., decimal_places for numeric types).

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_df(data, imodel, cdm_subset=None, null_label='null')[source]

Convert string-encoded values in a DataFrame to native pandas dtypes.

Parameters:
  • data (pd.DataFrame) – Input DataFrame containing string representations of values.

  • imodel (str) – Input data model identifier used to determine column types.

  • cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.

  • null_label (str or None, default "null") – Label representing null values in the input data.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame with values converted from string representations to appropriate pandas dtypes.

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_series(series, column_atts, null_label='null')[source]

Convert a Series of string values to a native pandas dtype.

Parameters:
  • series (pd.Series) – Input Series containing string representations of values.

  • column_atts (dict) – Dictionary defining column metadata, including the “data_type” used to select the appropriate converter.

  • null_label (str or None, default "null") – Label representing null values in the input data.

Return type:

Series

Returns:

pd.Series – Series with values converted to the appropriate pandas dtype.
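As a rough illustration of how "data_type" and null_label interact, the sketch below converts a plain list of strings instead of a pandas Series. The function name from_str_values, the type names in the converter table, and the use of None for missing values are assumptions, not the library's behaviour.

```python
def from_str_values(values, column_atts, null_label="null"):
    """Convert string values according to column_atts["data_type"]."""
    # Assumed type names; the real converter table may differ.
    converters = {"int": int, "numeric": float, "varchar": str}
    convert = converters[column_atts["data_type"]]
    # Entries equal to the null label become None instead of being converted.
    return [None if v == null_label else convert(v) for v in values]


result = from_str_values(["1.5", "null", "3"], {"data_type": "numeric"})
print(result)  # [1.5, None, 3.0]
```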

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_to_str_df(data, imodel, cdm_subset=None, null_label='null')[source]

Convert DataFrame values to string representations based on a data model.

Parameters:
  • data (pd.DataFrame) – Input DataFrame containing values to convert.

  • imodel (str) – Input data model identifier used to determine column types.

  • cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.

  • null_label (str or None, default "null") – Label to use for null or missing values in the output.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame with values converted to string representations.

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_to_str_series(series, column_atts, null_label='null')[source]

Convert a Series to string representations based on column metadata.

Parameters:
  • series (pd.Series) – Input Series containing values to convert.

  • column_atts (dict) – Dictionary defining column metadata, including the “data_type” used to select the appropriate converter.

  • null_label (str or None, default "null") – Label to use for null or missing values in the output.

Return type:

Series

Returns:

pd.Series – Series with values converted to string representations.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions module

Common Data Model (CDM) mappings.

Created on Wed Apr 3 10:31:18 2019

imodel: imma1

Functions to map imodel elements to CDM elements

Main functions are those invoked in the mapping files (table_name.json)

Main functions need to be part of class mapping_functions()

Main functions get:
  • 1 positional argument (pd.Series or pd.DataFrame with imodel data or imodel element name)

  • Optionally, keyword arguments

Main functions return: pd.Series, np.array or scalars

Auxiliary functions can be used and defined in or outside class mapping_functions

@author: iregon

class cdm_reader_mapper.cdm_mapper.utils.mapping_functions.MappingFunctions(imodel)[source]

Bases: object

Class for mapping Common Data Model (CDM) elements from IMMA1, GDAC, ICOADS, C-RAID, MAROB, Pub47, and IMMT datasets.

Parameters:

imodel (str) – Name of the input data model, e.g. icoads_r302_d992.

datetime_cmems(series, format='%Y-%m-%d %H:%M:%S')[source]

Convert CMEMS date strings to pandas datetime.

Parameters:
  • series (pd.Series) – Series of date strings.

  • format (str, optional) – Datetime format string (default: “%Y-%m-%d %H:%M:%S”).

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted dates.

datetime_craid(series, format='%Y-%m-%d %H:%M:%S.%f')[source]

Convert C-RAID date strings to pandas datetime.

Parameters:
  • series (pd.Series) – Series of date strings.

  • format (str, optional) – Datetime format string (default: “%Y-%m-%d %H:%M:%S.%f”).

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted dates.

datetime_decimalhour_to_hm(row)[source]

Convert a decimal hour to hours and minutes.

Parameters:

row (pd.Series) – A Series containing a decimal hour value at index 4.

Return type:

Series

Returns:

pd.Series – A Series with ‘HR’ (hour) and ‘M’ (minute).
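The split into hours and minutes can be sketched for a single value as follows. The helper name decimal_hour_to_hm and the choice to round (rather than truncate) the minutes are assumptions; the library method works on a row and returns a Series.

```python
import math


def decimal_hour_to_hm(decimal_hour):
    """Split a decimal hour into whole hours and whole minutes."""
    hours = math.floor(decimal_hour)
    # Round so that small float errors do not drop a minute.
    minutes = round((decimal_hour - hours) * 60)
    if minutes == 60:  # e.g. 13.9999 rounds up to the next hour
        hours, minutes = hours + 1, 0
    return hours, minutes


print(decimal_hour_to_hm(13.5))  # (13, 30)
print(decimal_hour_to_hm(6.25))  # (6, 15)
```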

datetime_imma1(df)[source]

Convert IMMA1 dataset to pandas datetime object.

Parameters:

df (pd.DataFrame) – IMMA1 dataset with columns for year, month, day, and decimal hour.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted timestamps.

datetime_imma1_701(df)[source]

Convert IMMA1 deck 701 dataset to pandas datetime object with UTC fallback.

Parameters:

df (pd.DataFrame) – IMMA1 deck 701 dataset with columns for date and time.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex with converted timestamps.

datetime_imma1_to_utc(df)[source]

Convert to pandas datetime object for IMMA1 deck 701 format.

Set missing hour to 12 and use latitude and longitude information to convert local midday to UTC time.

Parameters:

df (pd.DataFrame) – IMMA1 deck 701 dataset containing year, month, day, latitude, and longitude.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex with timestamps converted to UTC.

datetime_immt(df)[source]

Convert IMMT dataset to pandas datetime object.

Parameters:

df (pd.DataFrame) – IMMT dataset containing year, month, day, hour.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted timestamps.

datetime_marob(series, format='%Y-%m-%dT%H:%M:%S')[source]

Convert MAROB date strings to pandas datetime.

Parameters:
  • series (pd.Series) – Series of date strings.

  • format (str, optional) – Datetime format string (default: “%Y-%m-%dT%H:%M:%S”).

Return type:

Series

Returns:

pd.Series – Series of converted dates.

datetime_utcnow(df)[source]

Return the current UTC datetime.

Parameters:

df (pd.DataFrame) – Ignored. Present for API consistency.

Return type:

datetime

Returns:

datetime.datetime – Current UTC datetime.

df_col_join(df, sep)[source]

Join all columns of a pandas DataFrame into a single Series of strings.

Parameters:
  • df (pd.DataFrame) – Input DataFrame.

  • sep (str) – Separator to use between column values.

Return type:

Series

Returns:

pd.Series – Series with joined string values from each row.

feet_to_m(series)[source]

Convert values from feet to meters.

Parameters:

series (pd.Series) – Series of values in feet.

Return type:

Series

Returns:

pd.Series – Series of values in meters, rounded to 2 decimals.
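For a single value, the conversion amounts to the following sketch. The constant 0.3048 m per foot is the international definition; the helper name feet_to_m_scalar is hypothetical, and the library method operates on a whole Series.

```python
FEET_TO_METERS = 0.3048  # exact by international definition


def feet_to_m_scalar(value_ft):
    """Convert feet to meters, rounded to 2 decimals."""
    return round(value_ft * FEET_TO_METERS, 2)


print(feet_to_m_scalar(100))  # 30.48
print(feet_to_m_scalar(10))   # 3.05
```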

float_opposite(series)[source]

Return the opposite (negation) of a numeric Series.

Parameters:

series (pd.Series) – Input numeric Series.

Return type:

Series

Returns:

pd.Series – Series with negated values.

float_scale(series, factor=1)[source]

Multiply a numeric Series by a scale factor.

Parameters:
  • series (pd.Series) – Numeric Series to scale.

  • factor (float, default 1) – Scale factor to multiply by.

Return type:

Series

Returns:

pd.Series – Scaled Series, or empty float Series if input is non-numeric.

gdac_latitude(df)[source]

Adjust latitude sign based on quadrant.

Parameters:

df (pd.DataFrame) – Input DataFrame with columns ‘Qc’ and ‘LaLaLa’.

Return type:

Series

Returns:

pd.Series – Series of latitude values with adjusted sign.

Raises:

KeyError – If required columns are missing.

gdac_longitude(df)[source]

Adjust longitude sign based on quadrant.

Parameters:

df (pd.DataFrame) – Input DataFrame with columns ‘Qc’ and ‘LoLoLoLo’.

Return type:

Series

Returns:

pd.Series – Series of longitude values with adjusted sign.

Raises:

KeyError – If required columns are missing.

gdac_pressure(df)[source]

Decode or re-encode the non-standard pressure representation used by IMMT.

IMMT stores pressure as a scaled integer with an implicit offset: values below 1_000 represent readings above 1_000 hPa (e.g. raw 0025 → 1_002.5 hPa after adding 10_000 and multiplying by 0.1). Values ≥ 1_000 need only the scale factor applied.

Parameters:

df (pd.DataFrame) – Input DataFrame with column ‘PPPP’.

Return type:

Series

Returns:

pd.Series – Series of converted pressure values.

Raises:

KeyError – If required columns are missing.
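The offset-and-scale rule described above can be written out for one raw value. This is a sketch of the arithmetic only: the helper name decode_pressure and the final rounding to one decimal are assumptions, and the library method works on the ‘PPPP’ column of a DataFrame.

```python
def decode_pressure(raw):
    """Decode a scaled pressure integer into hPa."""
    if raw < 1_000:
        raw += 10_000  # restore the implicit leading digit
    # Rounding to one decimal only hides float noise (an assumption).
    return round(raw * 0.1, 1)


print(decode_pressure(25))    # 1002.5
print(decode_pressure(9985))  # 998.5
```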

gdac_uid(df, prepend='', append='')[source]

Generate a unique identifier (UID) based on timestamp and ship’s callsign (ID).

Parameters:
  • df (pd.DataFrame) – Input DataFrame with columns ‘AAAA’, ‘MM’, ‘YY’, ‘GG’.

  • prepend (str, default "") – String to prepend to UID.

  • append (str, default "") – String to append to UID.

Return type:

Series

Returns:

pd.Series – Series of generated unique IDs.

icoads_wd_conversion(series)[source]

Convert ICOADS wind direction codes.

Codes 361 -> 0, 362 -> NaN.

Parameters:

series (pd.Series) – Input ICOADS wind direction Series.

Return type:

Series

Returns:

pd.Series – Converted wind direction Series.
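The two code replacements can be sketched for a single value. The helper name wd_conversion_scalar is hypothetical, and passing other values through unchanged as floats is an assumption about the remaining range.

```python
import math


def wd_conversion_scalar(code):
    """Map ICOADS wind direction codes 361 and 362 as described above."""
    if code == 361:  # 361 -> 0
        return 0.0
    if code == 362:  # 362 -> NaN
        return math.nan
    return float(code)  # assumed: all other values pass through


print(wd_conversion_scalar(361))  # 0.0
print(wd_conversion_scalar(270))  # 270.0
```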

icoads_wd_integer_to_float(series)[source]

Convert ICOADS wind direction integer Series to float, applying conversion rules.

Parameters:

series (pd.Series) – ICOADS wind direction integer Series.

Return type:

Series

Returns:

pd.Series – Float wind direction Series.

integer_to_float(s)[source]

Convert a numeric or integer Series to float. A non-numeric Series returns an empty float Series.

Parameters:

s (pd.Series) – Input Series.

Return type:

Series

Returns:

pd.Series – Float Series.

Raises:

TypeError – If input is not a pandas Series.

lineage(df)[source]

Get the lineage string for a dataset, combining timestamp and model lineage.

Parameters:

df (pd.DataFrame) – Input dataset (used for context, not data manipulation).

Return type:

str

Returns:

str – Lineage string including timestamp and imodel entry.

location_accuracy(df)[source]

Compute location accuracy based on two columns (li_array, lat_array).

Parameters:

df (pd.DataFrame) – Input DataFrame with at least two columns.

Return type:

Series

Returns:

pd.Series – Series of location accuracy values.

longitude_360to180(series)[source]

Convert longitudes from 0-360 to -180 to 180 range.

Parameters:

series (pd.Series) – Input longitude Series.

Return type:

Series

Returns:

pd.Series – Converted longitude Series.

observing_programme(series)[source]

Map observing programme codes to lists.

Parameters:

series (pd.Series) – Series of programme codes (string or int).

Return type:

Series

Returns:

pd.Series – Series of mapped observing programme lists.

pressue_hpa_in_pa(series)[source]

Convert pressure from hPa to Pa.

Parameters:

series (pd.Series) – Series of pressure in hPa.

Return type:

Series

Returns:

pd.Series – Series of pressure in Pa.

select_column(df)[source]

Select the last column with non-null values, prioritizing the rightmost column.

Parameters:

df (pd.DataFrame) – Input DataFrame.

Return type:

Series

Returns:

pd.Series – Series with selected column values.

string_add(series, prepend='', append='', separator='')[source]

Add strings to Series elements with optional zero-fill.

Parameters:
  • series (pd.Series) – Series to modify.

  • prepend (str, default "") – String to prepend.

  • append (str, default "") – String to append.

  • separator (str, default "") – Separator between series values.

Return type:

Series

Returns:

pd.Series – Series with modified string values.

string_join_add(df, prepend=None, append=None, separator='', zfill_col=None, zfill=None)[source]

Join DataFrame columns into a single string and optionally prepend/append strings.

Parameters:
  • df (pd.DataFrame) – Input DataFrame with string or numeric columns.

  • prepend (str or None, optional) – String to prepend to each joined value, by default None.

  • append (str or None, optional) – String to append to each joined value, by default None.

  • separator (str, default "") – Separator to use when joining columns.

  • zfill_col (list, optional) – List of column indices to apply zero-fill.

  • zfill (list, optional) – List of widths for zero-fill, corresponding to zfill_col.

Return type:

Series

Returns:

pd.Series – Series of joined and modified strings.

temperature_celsius_to_kelvin(df)[source]

Convert temperatures from Celsius to Kelvin using the model-specific method.

Parameters:

df (pd.DataFrame) – Input DataFrame with temperature data.

Return type:

Series

Returns:

pd.Series – Series of temperatures in Kelvin.

time_accuracy(series)[source]

Map time accuracy codes to seconds.

Parameters:

series (pd.Series) – Series of time accuracy codes as strings.

Return type:

Series

Returns:

pd.Series – Series with time accuracy in seconds.

velocity_kmh_in_ms(series)[source]

Convert velocity from kilometers per hour to meters per second.

Parameters:

series (pd.Series) – Series of velocity in kilometers per hour.

Return type:

Series

Returns:

pd.Series – Series of velocity in meters per second.

velocity_kn_in_ms(series)[source]

Convert velocity from knots to meters per second.

Parameters:

series (pd.Series) – Series of velocity in knots.

Return type:

Series

Returns:

pd.Series – Series of velocity in meters per second.
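Both velocity conversions reduce to a single multiplication or division. The sketch below uses scalar helpers with hypothetical names; it relies only on the definitions 1 km/h = 1/3.6 m/s and 1 knot = 1852 m per hour, and does not reproduce any rounding the library may apply.

```python
KNOT_IN_MS = 1852 / 3600  # one nautical mile (1852 m) per hour


def kmh_to_ms(velocity_kmh):
    """Kilometers per hour to meters per second."""
    return velocity_kmh / 3.6


def kn_to_ms(velocity_kn):
    """Knots to meters per second."""
    return velocity_kn * KNOT_IN_MS


print(round(kmh_to_ms(36), 3))  # 10.0
print(round(kn_to_ms(10), 3))   # 5.144
```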

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_str(a)[source]

Convert a value to string.

Parameters:

a (str or None) – Input value.

Return type:

str | None

Returns:

str or None – Converted string or None if input is None or empty.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_utc_i(date, zone)[source]

Convert a pandas datetime series from local timezone to UTC.

Parameters:
  • date (pd.Series) – Datetime series.

  • zone (str) – Timezone string.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – Datetime series converted to UTC.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_360_to_180i(lon)[source]

Convert longitude from 0-360 to -180 to 180 degrees.

Parameters:

lon (float) – Longitude in degrees (0-360).

Return type:

float

Returns:

float – Longitude in decimal degrees (-180 to 180).
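A minimal sketch of the range conversion for one value, assuming that exactly 180 is left unchanged (the treatment of that boundary is not stated here). The helper name lon_360_to_180 is hypothetical.

```python
def lon_360_to_180(lon):
    """Shift longitudes above 180 degrees into the negative range."""
    return lon - 360 if lon > 180 else lon


print(lon_360_to_180(350))  # -10
print(lon_360_to_180(90))   # 90
```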

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_dmh_to_90i(deg, min, hemis)[source]

Convert latitude from degrees, minutes, hemisphere to decimal degrees.

Parameters:
  • deg (float) – Degrees.

  • min (float) – Minutes (0 <= min < 60).

  • hemis (str) – Hemisphere, “N” or “S”.

Return type:

float

Returns:

float – Latitude in decimal degrees (-90 to 90).
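The conversion adds the minutes as a fraction of a degree and applies a negative sign for the southern hemisphere. The sketch below assumes no rounding and uses a hypothetical helper name.

```python
def dmh_to_decimal(deg, minutes, hemis):
    """Degrees, minutes, hemisphere ("N" or "S") to signed decimal degrees."""
    sign = -1 if hemis.upper() == "S" else 1
    return sign * (deg + minutes / 60)


print(dmh_to_decimal(45, 30, "N"))  # 45.5
print(dmh_to_decimal(10, 15, "S"))  # -10.25
```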

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.find_entry(imodel, d)[source]

Find entry in a dictionary, handling imodel suffix stripping.

Parameters:
  • imodel (str or None) – Imodel element name.

  • d (dict) – Dictionary to search.

Return type:

str | None

Returns:

str or None – Corresponding value if found, otherwise None.
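One plausible reading of "suffix stripping" is that the lookup retries with the last underscore-separated part of the name removed until a key matches. The sketch below illustrates that reading only; the helper name, the example dictionary, and the stripping rule itself are assumptions.

```python
def find_entry_sketch(imodel, d):
    """Look up imodel in d, dropping trailing "_suffix" parts until a key matches."""
    while imodel:
        if imodel in d:
            return d[imodel]
        # Drop the last underscore-separated part and try again.
        imodel = "_".join(imodel.split("_")[:-1])
    return None


lookup = {"icoads": "generic", "icoads_r302": "release-specific"}

print(find_entry_sketch("icoads_r302_d992", lookup))  # release-specific
print(find_entry_sketch("icoads_r300", lookup))       # generic
print(find_entry_sketch("gdac", lookup))              # None
```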

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.location_accuracy_i(li, lat)[source]

Compute approximate location accuracy in km based on ICOADS code.

Parameters:
  • li (int or float) – Location index code.

  • lat (float) – Latitude.

Return type:

float

Returns:

float – Location accuracy in km.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.longitude_360to180_i(lon)[source]

Convert longitude from 0-360 to -180 to 180 degrees.

Parameters:

lon (float) – Longitude in degrees.

Return type:

float

Returns:

float – Longitude in decimal degrees (-180 to 180).

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.series_strptime(series, format)[source]

Convert series with strings to series with datetime.

Parameters:
  • series (pd.Series) – Series with strings.

  • format (str) – String time format.

Return type:

Series

Returns:

pd.Series – Series with datetime.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.string_add_i(a, b, c, sep)[source]

Concatenate strings a, b, c with separator, ignoring None values.

Parameters:
  • a, b, c (any) – Input values.

  • sep (str) – Separator string.

Return type:

str | None

Returns:

str or None – Concatenated string.
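A sketch of the None-skipping concatenation follows. Which inputs lead to a None result is not stated here, so the sketch assumes None is returned only when nothing remains to join; the helper name is hypothetical.

```python
def string_add_sketch(a, b, c, sep):
    """Join a, b, c with sep, skipping None values."""
    parts = [str(x) for x in (a, b, c) if x is not None]
    # Assumption: return None only when every input was None.
    return sep.join(parts) if parts else None


print(string_add_sketch("ICOADS", "30", None, "-"))  # ICOADS-30
print(string_add_sketch(None, None, None, "-"))      # None
```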

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.time_zone_i(lat, lon)[source]

Get timezone for latitude and longitude.

Parameters:
  • lat (float) – Latitude (-90 to 90).

  • lon (float) – Longitude (-180 to 180).

Return type:

str | None

Returns:

str or None – Timezone name if available, otherwise None.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.to_int(value)[source]

Convert a value to integer, return pd.NA for invalid input.

Parameters:

value (any) – Input value.

Return type:

int | pd.NA

Returns:

int or pd.NA – Converted integer or NA if invalid.

cdm_reader_mapper.cdm_mapper.utils.utilities module

Utility functions for reading and writing CDM tables.

cdm_reader_mapper.cdm_mapper.utils.utilities.adjust_filename(filename, table='', extension='psv')[source]

Adjust a filename by optionally prepending a table name and appending an extension.

Parameters:
  • filename (str) – Original filename.

  • table (str, optional) – Table name to prepend if not already present in the filename (default is “”).

  • extension (str, optional) – File extension to append if not already present (default is “psv”).

Return type:

str

Returns:

str – Adjusted filename with optional table prefix and file extension.

Notes

  1. If table is not already part of the filename, it will be prepended with a dash.

  2. If the filename does not contain an extension (no ‘.’), the specified extension is appended. Default extension is ‘psv’.

Examples

>>> adjust_filename("data", table="header")
'header-data.psv'
>>> adjust_filename("header-data.psv", table="header")
'header-data.psv'
>>> adjust_filename("data.txt", table="header")
'header-data.txt'

cdm_reader_mapper.cdm_mapper.utils.utilities.dict_to_tuple_list(dic)[source]

Convert a dictionary with scalar or list values into a list of (key, value) tuples.

If a value is a list, each item in the list will produce its own tuple. If a value is a scalar, a single tuple is produced.

Parameters:

dic (dict) – Dictionary containing keys and values. Values may be scalars or lists.

Return type:

list[tuple[Any, Any]]

Returns:

list of tuple – List of (key, value) tuples. If a dictionary value is a list, each list item becomes a separate tuple.

Examples

>>> dict_to_tuple_list({"A": [1, 2], "B": 3})
[('A', 1), ('A', 2), ('B', 3)]
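The expansion rule can be reproduced in a few lines. This is an independent sketch with a hypothetical name, not the library's source; it treats only list values as expandable, as the description states.

```python
def dict_to_tuple_list_sketch(dic):
    """Expand list values into one (key, item) tuple per item."""
    pairs = []
    for key, value in dic.items():
        # Wrap scalars so both cases share one code path.
        values = value if isinstance(value, list) else [value]
        pairs.extend((key, item) for item in values)
    return pairs


print(dict_to_tuple_list_sketch({"A": [1, 2], "B": 3}))
# [('A', 1), ('A', 2), ('B', 3)]
```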

cdm_reader_mapper.cdm_mapper.utils.utilities.get_cdm_subset(cdm_subset)[source]

Normalize and validate a CDM subset specification.

This function ensures that the returned value is always a list of valid CDM table names (as defined in properties.cdm_tables). It accepts:

  • None: returns the full list of CDM tables.

  • A single string: validated and returned as a one-element list.

  • An iterable of strings: each entry is validated and returned unchanged.

Parameters:

cdm_subset (str, iterable of str or None) – CDM subset input to normalize. May be:

  • None: the full list of CDM tables is returned.

  • str: returned as a list containing that string.

  • Any iterable (e.g., list) of strings: returned unchanged after validation.

Return type:

list[str]

Returns:

list of str – A list of CDM table names that are guaranteed to exist in properties.cdm_tables.

Raises:

ValueError – If any provided table name is not in properties.cdm_tables.
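The normalization and validation steps can be sketched as below. CDM_TABLES is a short hypothetical stand-in for properties.cdm_tables, and the function name and error message are likewise illustrative.

```python
# Hypothetical stand-in for properties.cdm_tables.
CDM_TABLES = ["header", "observations-at", "observations-sst"]


def get_cdm_subset_sketch(cdm_subset):
    """Return a validated list of CDM table names."""
    if cdm_subset is None:
        return list(CDM_TABLES)
    if isinstance(cdm_subset, str):
        cdm_subset = [cdm_subset]
    tables = list(cdm_subset)
    unknown = [t for t in tables if t not in CDM_TABLES]
    if unknown:
        raise ValueError(f"Unknown CDM tables: {unknown}")
    return tables


print(get_cdm_subset_sketch("header"))  # ['header']
print(get_cdm_subset_sketch(None))      # ['header', 'observations-at', 'observations-sst']
```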

cdm_reader_mapper.cdm_mapper.utils.utilities.get_usecols(tb, col_subset)[source]

Normalize a column subset specification for use with pandas.read_csv.

This function converts various forms of column subset input into a standardized list of column names suitable for the usecols argument in pandas.read_csv.

Parameters:
  • tb (str) – Table name. Only used if col_subset is a dictionary.

  • col_subset (str, iterable of str, dict, or None) – Column subset specification. Acceptable formats:

      • A single column name as a string.

      • An iterable of column names (list, tuple, set, etc.).

      • A dictionary mapping table names to column lists.

      • None (read all columns).

Return type:

list[str] | None

Returns:

list of str or None – Normalized list of column names suitable for pandas usecols, or None if no restriction is applied.

Raises:

TypeError – If col_subset is not a string, iterable, dict, or None.

Notes

  1. If col_subset is a string, it is returned as a single-element list.

  2. If col_subset is an iterable of strings (e.g., list, tuple, set), it is converted to a list.

  3. If col_subset is a dictionary, it is interpreted as a mapping {table_name: list_of_columns} and returns the entry corresponding to the given table tb (or None if missing).

  4. If col_subset is None, the function returns None, meaning all columns should be read.
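The four cases in the notes above map onto a short chain of type checks. The following sketch uses a hypothetical function name and error message and is not the library's source; the order of the checks matters because strings and dictionaries are themselves iterable.

```python
from collections.abc import Iterable


def get_usecols_sketch(tb, col_subset):
    """Normalize col_subset into a list of column names or None."""
    if col_subset is None:
        return None  # no restriction: read all columns
    if isinstance(col_subset, str):
        return [col_subset]
    if isinstance(col_subset, dict):
        return col_subset.get(tb)  # None if the table has no entry
    if isinstance(col_subset, Iterable):
        return list(col_subset)
    raise TypeError("col_subset must be a str, iterable, dict, or None")


print(get_usecols_sketch("header", "report_id"))                        # ['report_id']
print(get_usecols_sketch("header", {"header": ["report_id", "date"]}))  # ['report_id', 'date']
```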