cdm_reader_mapper.cdm_mapper.utils package¶

Common Data Model (CDM) mapper utilities.

Submodules¶

cdm_reader_mapper.cdm_mapper.utils.conversions module¶

Convert Common Data Model (CDM) mapping table elements from/to string types.

class cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter(converters, args=None)[source]¶

Bases: object

Base class for managing type conversion functions.

Parameters:

converters (dict) – Mapping of type names to conversion functions.
args (dict) – Optional mapping of type names to argument names for converters.

get_args(key)[source]¶

Retrieve the argument name associated with a converter type.

Parameters:: key (str) – The type name to look up.
Return type:: str | None
Returns:: str or None – The argument name if found, else None.
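
The lookup described above can be pictured with a small stand-in class. This is only a sketch under assumptions: `SketchConverter`, the type names `"int"` and `"numeric"`, and the example registry are hypothetical and are not taken from the library source.

```python
class SketchConverter:
    """Hypothetical stand-in for a converter registry; not the library class."""

    def __init__(self, converters, args=None):
        # Mapping of type names to conversion functions.
        self.converters = converters
        # Treat a missing ``args`` mapping as empty so lookups stay simple.
        self.args = args or {}

    def get_args(self, key):
        # dict.get returns None when the type name has no argument entry.
        return self.args.get(key)


# Assumed example registry: only "numeric" takes an extra argument.
registry = SketchConverter(
    converters={"int": int, "numeric": float},
    args={"numeric": "decimal_places"},
)
```

With this registry, `registry.get_args("numeric")` gives the argument name, while a type without an entry gives None.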

class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertFromStr[source]¶

Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter

Converter class for converting string representations into Python types.

Provides default converters for integers, floats, timestamps, and strings, including array variants.

class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertToStr[source]¶

Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter

Converter class for converting Python types to string representations.

Provides default converters for integers, floats, timestamps, and strings, including array variants. Supports optional arguments for certain types (e.g., decimal_places for numeric types).

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_df(data, imodel, cdm_subset=None, null_label='null')[source]¶

Convert string-encoded values in a DataFrame to native pandas dtypes.

Parameters:

data (pd.DataFrame) – Input DataFrame containing string representations of values.
imodel (str) – Input data model identifier used to determine column types.
cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.
null_label (str or None, default "null") – Label representing null values in the input data.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame with values converted from string representations to appropriate pandas dtypes.

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_series(series, column_atts, null_label='null')[source]¶

Convert a Series of string values to a native pandas dtype.

Parameters:

series (pd.Series) – Input Series containing string representations of values.
column_atts (dict) – Dictionary defining column metadata, including the “data_type” used to select the appropriate converter.
null_label (str or None, default "null") – Label representing null values in the input data.

Return type:

Series

Returns:

pd.Series – Series with values converted to the appropriate pandas dtype.
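
As a rough illustration of how a “data_type” entry and a null label can drive the conversion, the sketch below works on a plain list instead of a pandas Series. The helper name `from_str_values`, the type names in `SKETCH_CONVERTERS`, and the use of None for nulls are all assumptions made for this example, not the library's behavior.

```python
# Hypothetical converter table keyed by data type name.
SKETCH_CONVERTERS = {"int": int, "numeric": float, "varchar": str}


def from_str_values(values, column_atts, null_label="null"):
    """Convert string values using the converter named in column_atts."""
    convert = SKETCH_CONVERTERS[column_atts["data_type"]]
    # Entries equal to the null label become None instead of being converted.
    return [None if v == null_label else convert(v) for v in values]


converted = from_str_values(["1", "null", "3"], {"data_type": "int"})
```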

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_to_str_df(data, imodel, cdm_subset=None, null_label='null')[source]¶

Convert DataFrame values to string representations based on a data model.

Parameters:

data (pd.DataFrame) – Input DataFrame containing values to convert.
imodel (str) – Input data model identifier used to determine column types.
cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.
null_label (str or None, default "null") – Label to use for null or missing values in the output.

Return type:

DataFrame

Returns:

pd.DataFrame – DataFrame with values converted to string representations.

cdm_reader_mapper.cdm_mapper.utils.conversions.convert_to_str_series(series, column_atts, null_label='null')[source]¶

Convert a Series to string representations based on column metadata.

Parameters:

series (pd.Series) – Input Series containing values to convert.
column_atts (dict) – Dictionary defining column metadata, including the “data_type” used to select the appropriate converter.
null_label (str or None, default "null") – Label to use for null or missing values in the output.

Return type:

Series

Returns:

pd.Series – Series with values converted to string representations.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions module¶

Common Data Model (CDM) mappings.

Created on Wed Apr 3 10:31:18 2019

imodel: imma1

Functions to map imodel elements to CDM elements

Main functions are those invoked in the mapping files (table_name.json)

Main functions need to be part of class mapping_functions()

Main functions get:

1 positional argument (pd.Series or pd.DataFrame with imodel data or imodel element name)
Optionally, keyword arguments

Main functions return: pd.Series, np.array or scalars

Auxiliary functions can be used and defined in or outside class mapping_functions

@author: iregon

class cdm_reader_mapper.cdm_mapper.utils.mapping_functions.MappingFunctions(imodel)[source]¶

Bases: object

Class for mapping Common Data Model (CDM) elements from IMMA1, GDAC, ICOADS, C-RAID, MAROB, Pub47, and IMMT datasets.

Parameters:: imodel (str) – Name of the input data model, e.g. icoads_r302_d992.

datetime_cmems(series, format='%Y-%m-%d %H:%M:%S')[source]¶

Convert CMEMS date strings to pandas datetime.

Parameters:

series (pd.Series) – Series of date strings.
format (str, optional) – Datetime format string (default: “%Y-%m-%d %H:%M:%S”).

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted dates.

datetime_craid(series, format='%Y-%m-%d %H:%M:%S.%f')[source]¶

Convert C-RAID date strings to pandas datetime.

Parameters:

series (pd.Series) – Series of date strings.
format (str, optional) – Datetime format string (default: “%Y-%m-%d %H:%M:%S.%f”).

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – DatetimeIndex of converted dates.

datetime_decimalhour_to_hm(row)[source]¶

Convert a decimal hour to hours and minutes.

Parameters:: row (pd.Series) – A Series containing a decimal hour value at index 4.
Return type:: Series
Returns:: pd.Series – A Series with ‘HR’ (hour) and ‘M’ (minute).
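
The split of a decimal hour into hours and minutes can be sketched as follows. The helper takes a bare float rather than a row, and it rounds to the nearest minute; both choices are assumptions for illustration, and the library may truncate or handle missing values differently.

```python
def decimal_hour_to_hm(decimal_hour):
    """Split a decimal hour into whole hours and minutes (sketch)."""
    # Work in whole minutes so that rounding can carry into the hour.
    total_minutes = round(decimal_hour * 60)
    hours, minutes = divmod(total_minutes, 60)
    return {"HR": hours, "M": minutes}


half_past_one_pm = decimal_hour_to_hm(13.5)
```

Note that with rounding, a value such as 23.999 yields hour 24 and minute 0, so callers of a helper like this would need to decide how to treat that case.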

datetime_imma1(df)[source]¶

Convert IMMA1 dataset to pandas datetime object.

Parameters:: df (pd.DataFrame) – IMMA1 dataset with columns for year, month, day, and decimal hour.
Return type:: DatetimeIndex
Returns:: pd.DatetimeIndex – DatetimeIndex of converted timestamps.

datetime_imma1_701(df)[source]¶

Convert IMMA1 deck 701 dataset to pandas datetime object with UTC fallback.

Parameters:: df (pd.DataFrame) – IMMA1 deck 701 dataset with columns for date and time.
Return type:: DatetimeIndex
Returns:: pd.DatetimeIndex – DatetimeIndex with converted timestamps.

datetime_imma1_to_utc(df)[source]¶

Convert pandas datetime object to UTC time.

Set missing hour to 12 and use latitude and longitude information to convert local midday to UTC time.

Parameters:: df (pd.DataFrame) – IMMA1 dataset containing year, month, day, latitude, and longitude.
Return type:: DatetimeIndex
Returns:: pd.DatetimeIndex – DatetimeIndex with timestamps converted to UTC.

datetime_immt(df)[source]¶

Convert IMMT dataset to pandas datetime object.

Parameters:: df (pd.DataFrame) – IMMT dataset containing year, month, day, hour.
Return type:: DatetimeIndex
Returns:: pd.DatetimeIndex – DatetimeIndex of converted timestamps.

datetime_marob(series, format='%Y-%m-%dT%H:%M:%S')[source]¶

Convert MAROB date strings to pandas datetime.

Parameters:

series (pd.Series) – Series of date strings.
format (str, optional) – Datetime format string (default: “%Y-%m-%dT%H:%M:%S”).

Return type:

Series

Returns:

pd.Series – Series of converted dates.

datetime_utcnow(df)[source]¶

Return the current UTC datetime.

Parameters:: df (pd.DataFrame) – Ignored. Present for API consistency.
Return type:: datetime
Returns:: datetime.datetime – Current UTC datetime.

df_col_join(df, sep)[source]¶

Join all columns of a pandas DataFrame into a single Series of strings.

Parameters:

df (pd.DataFrame) – Input DataFrame.
sep (str) – Separator to use between column values.

Return type:

Series

Returns:

pd.Series – Series with joined string values from each row.

feet_to_m(series)[source]¶

Convert values from feet to meters.

Parameters:: series (pd.Series) – Series of values in feet.
Return type:: Series
Returns:: pd.Series – Series of values in meters, rounded to 2 decimals.
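
For reference, the conversion rests on the exact definition 1 ft = 0.3048 m. The sketch below applies it to a plain list and rounds to two decimals as described above; the helper name is hypothetical and the real method operates on a pandas Series.

```python
FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def feet_to_m_values(values):
    """Convert feet to meters, rounded to 2 decimals (list-based sketch)."""
    return [round(v * FEET_TO_METERS, 2) for v in values]


heights_m = feet_to_m_values([100, 0, 33])
```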

float_opposite(series)[source]¶

Return the opposite (negation) of a numeric Series.

Parameters:: series (pd.Series) – Input numeric Series.
Return type:: Series
Returns:: pd.Series – Series with negated values.

float_scale(series, factor=1)[source]¶

Multiply a numeric Series by a scale factor.

Parameters:

series (pd.Series) – Numeric Series to scale.
factor (float, default 1) – Scale factor to multiply by.

Return type:

Series

Returns:

pd.Series – Scaled Series, or empty float Series if input is non-numeric.

gdac_latitude(df)[source]¶

Adjust latitude sign based on quadrant.

Parameters:: df (pd.DataFrame) – Input DataFrame with columns ‘Qc’ and ‘LaLaLa’.
Return type:: Series
Returns:: pd.Series – Series of latitude values with adjusted sign.
Raises:: KeyError – If required columns are missing.

gdac_longitude(df)[source]¶

Adjust longitude sign based on quadrant.

Parameters:: df (pd.DataFrame) – Input DataFrame with columns ‘Qc’ and ‘LoLoLoLo’.
Return type:: Series
Returns:: pd.Series – Series of longitude values with adjusted sign.
Raises:: KeyError – If required columns are missing.

gdac_pressure(df)[source]¶

Decode or re-encode the non-standard pressure representation used by IMMT.

IMMT stores pressure as a scaled integer with an implicit offset: values below 1_000 represent readings above 1_000 hPa (e.g. raw 0025 → 1_002.5 hPa after adding 10_000 and multiplying by 0.1). Values ≥ 1_000 need only the scale factor applied.

Parameters:: df (pd.DataFrame) – Input DataFrame with column ‘PPPP’.
Return type:: Series
Returns:: pd.Series – Series of converted pressure values.
Raises:: KeyError – If required columns are missing.
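
The decoding rule in the description can be written out as a short worked example. This is a scalar sketch with a hypothetical helper name; it divides by 10 (equivalent to the 0.1 scale factor) to avoid floating-point noise, and it does not cover the re-encoding direction or missing values.

```python
def decode_scaled_pressure(raw):
    """Decode a scaled integer pressure reading into hPa (sketch)."""
    # Values below 1_000 stand for readings above 1_000 hPa:
    # restore the implicit 10_000 offset before scaling.
    if raw < 1_000:
        raw += 10_000
    # Apply the 0.1 scale factor (tenths of hPa -> hPa).
    return raw / 10


high = decode_scaled_pressure(25)     # raw 0025 from the description
low = decode_scaled_pressure(9_980)   # no offset needed
```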

gdac_uid(df, prepend='', append='')[source]¶

Generate a unique identifier (UID) from the timestamp and the ship’s callsign (ID).

Parameters:

df (pd.DataFrame) – Input DataFrame with columns ‘AAAA’, ‘MM’, ‘YY’, ‘GG’.
prepend (str, default "") – String to prepend to UID.
append (str, default "") – String to append to UID.

Return type:

Series

Returns:

pd.Series – Series of generated unique IDs.

icoads_wd_conversion(series)[source]¶

Convert ICOADS wind direction codes.

Codes 361 -> 0, 362 -> NaN.

Parameters:: series (pd.Series) – Input ICOADS wind direction Series.
Return type:: Series
Returns:: pd.Series – Converted wind direction Series.
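
The two code substitutions can be illustrated on a single value. The helper below is a hypothetical scalar sketch; it assumes every other code is passed through unchanged, which the description does not state explicitly.

```python
import math


def wd_code_to_direction(code):
    """Apply the 361 -> 0 and 362 -> NaN substitutions (sketch)."""
    if code == 361:
        return 0
    if code == 362:
        return math.nan
    # Assumption: all remaining codes are kept as they are.
    return code


directions = [wd_code_to_direction(c) for c in (180, 361, 362)]
```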

icoads_wd_integer_to_float(series)[source]¶

Convert ICOADS wind direction integer Series to float, applying conversion rules.

Parameters:: series (pd.Series) – ICOADS wind direction integer Series.
Return type:: Series
Returns:: pd.Series – Float wind direction Series.

integer_to_float(s)[source]¶

Convert a numeric or integer Series to float. A non-numeric Series returns an empty float Series.

Parameters:: s (pd.Series) – Input Series.
Return type:: Series
Returns:: pd.Series – Float Series.
Raises:: TypeError – If input is not a pandas Series.

lineage(df)[source]¶

Get the lineage string for a dataset, combining timestamp and model lineage.

Parameters:: df (pd.DataFrame) – Input dataset (used for context, not data manipulation).
Return type:: str
Returns:: str – Lineage string including timestamp and imodel entry.

location_accuracy(df)[source]¶

Compute location accuracy based on two columns (li_array, lat_array).

Parameters:: df (pd.DataFrame) – Input DataFrame with at least two columns.
Return type:: Series
Returns:: pd.Series – Series of location accuracy values.

longitude_360to180(series)[source]¶

Convert longitudes from 0-360 to -180 to 180 range.

Parameters:: series (pd.Series) – Input longitude Series.
Return type:: Series
Returns:: pd.Series – Converted longitude Series.
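
A common way to express this wrap is to subtract 360 from values above 180. The scalar sketch below uses a hypothetical helper name and assumes that exactly 180 stays at 180; the library's treatment of that boundary is not documented here.

```python
def lon_360_to_180(lon):
    """Map a longitude in [0, 360] onto the -180 to 180 range (sketch)."""
    # Assumption: 180 itself is left unchanged rather than mapped to -180.
    return lon - 360 if lon > 180 else lon


wrapped = [lon_360_to_180(v) for v in (0, 90, 270, 359.5)]
```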

observing_programme(series)[source]¶

Map observing programme codes to lists.

Parameters:: series (pd.Series) – Series of programme codes (string or int).
Return type:: Series
Returns:: pd.Series – Series of mapped observing programme lists.

pressue_hpa_in_pa(series)[source]¶

Convert pressure from hPa to Pa.

Parameters:: series (pd.Series) – Series of pressure in hPa.
Return type:: Series
Returns:: pd.Series – Series of pressure in Pa.

select_column(df)[source]¶

Select the last column with non-null values, prioritizing the rightmost column.

Parameters:: df (pd.DataFrame) – Input DataFrame.
Return type:: Series
Returns:: pd.Series – Series with selected column values.

string_add(series, prepend='', append='', separator='')[source]¶

Add strings to Series elements with optional zero-fill.

Parameters:

series (pd.Series) – Series to modify.
prepend (str, default "") – String to prepend.
append (str, default "") – String to append.
separator (str, default "") – Separator between series values.

Return type:

Series

Returns:

pd.Series – Series with modified string values.

string_join_add(df, prepend=None, append=None, separator='', zfill_col=None, zfill=None)[source]¶

Join DataFrame columns into a single string and optionally prepend/append strings.

Parameters:

df (pd.DataFrame) – Input DataFrame with string or numeric columns.
prepend (str or None, optional) – String to prepend to each joined value, by default None.
append (str or None, optional) – String to append to each joined value, by default None.
separator (str, default "") – Separator to use when joining columns.
zfill_col (list, optional) – List of column indices to apply zero-fill.
zfill (list, optional) – List of widths for zero-fill, corresponding to zfill_col.

Return type:

Series

Returns:

pd.Series – Series of joined and modified strings.

temperature_celsius_to_kelvin(df)[source]¶

Convert temperatures from Celsius to Kelvin using the model-specific method.

Parameters:: df (pd.DataFrame) – Input DataFrame with temperature data.
Return type:: Series
Returns:: pd.Series – Series of temperatures in Kelvin.

time_accuracy(series)[source]¶

Map time accuracy codes to seconds.

Parameters:: series (pd.Series) – Series of time accuracy codes as strings.
Return type:: Series
Returns:: pd.Series – Series with time accuracy in seconds.

velocity_kmh_in_ms(series)[source]¶

Convert velocity from kilometers per hour to meters per second.

Parameters:: series (pd.Series) – Series of velocity in kilometers per hour.
Return type:: Series
Returns:: pd.Series – Series of velocity in meters per second.

velocity_kn_in_ms(series)[source]¶

Convert velocity from knots to meters per second.

Parameters:: series (pd.Series) – Series of velocity in knots.
Return type:: Series
Returns:: pd.Series – Series of velocity in meters per second.
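
Both velocity conversions follow from unit definitions: 1 km/h is 1000 m per 3600 s, and 1 international knot is 1852 m per 3600 s. The scalar helpers below are hypothetical sketches of that arithmetic; any rounding the library methods apply is not reproduced.

```python
SECONDS_PER_HOUR = 3600
METERS_PER_KM = 1000
METERS_PER_NAUTICAL_MILE = 1852  # exact by definition


def kmh_to_ms(speed_kmh):
    """Kilometers per hour to meters per second (sketch)."""
    return speed_kmh * METERS_PER_KM / SECONDS_PER_HOUR


def kn_to_ms(speed_kn):
    """Knots to meters per second (sketch)."""
    return speed_kn * METERS_PER_NAUTICAL_MILE / SECONDS_PER_HOUR
```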

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_str(a)[source]¶

Convert a value to string.

Parameters:: a (str or None) – Input value.
Return type:: str | None
Returns:: str or None – Converted string or None if input is None or empty.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_utc_i(date, zone)[source]¶

Convert a pandas datetime series from local timezone to UTC.

Parameters:

date (pd.Series) – Datetime series.
zone (str) – Timezone string.

Return type:

DatetimeIndex

Returns:

pd.DatetimeIndex – Datetime series converted to UTC.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_360_to_180i(lon)[source]¶

Convert longitude from 0-360 to -180 to 180 degrees.

Parameters:: lon (float) – Longitude in degrees (0-360).
Return type:: float
Returns:: float – Longitude in decimal degrees (-180 to 180).

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_dmh_to_90i(deg, min, hemis)[source]¶

Convert latitude from degrees, minutes, hemisphere to decimal degrees.

Parameters:

deg (float) – Degrees.
min (float) – Minutes (0 <= min < 60).
hemis (str) – Hemisphere, “N” or “S”.

Return type:

float

Returns:

float – Latitude in decimal degrees (-90 to 90).
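
The degrees, minutes, hemisphere conversion is deg + min / 60, negated for the southern hemisphere. The sketch below uses a hypothetical helper name and does no validation or rounding, which the library function may do.

```python
def dm_hemisphere_to_decimal(deg, minutes, hemis):
    """Degrees, minutes and hemisphere to signed decimal degrees (sketch)."""
    value = deg + minutes / 60
    # Southern latitudes are negative; anything else is treated as north.
    return -value if hemis == "S" else value


north = dm_hemisphere_to_decimal(45, 30, "N")
south = dm_hemisphere_to_decimal(10, 15, "S")
```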

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.find_entry(imodel, d)[source]¶

Find entry in a dictionary, handling imodel suffix stripping.

Parameters:

imodel (str or None) – Imodel element name.
d (dict) – Dictionary to search.

Return type:

str | None

Returns:

str or None – Corresponding value if found, otherwise None.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.location_accuracy_i(li, lat)[source]¶

Compute approximate location accuracy in km based on ICOADS code.

Parameters:

li (int or float) – Location index code.
lat (float) – Latitude.

Return type:

float

Returns:

float – Location accuracy in km.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.longitude_360to180_i(lon)[source]¶

Convert longitude from 0-360 to -180 to 180 degrees.

Parameters:: lon (float) – Longitude in degrees.
Return type:: float
Returns:: float – Longitude in decimal degrees (-180 to 180).

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.series_strptime(series, format)[source]¶

Convert series with strings to series with datetime.

Parameters:

series (pd.Series) – Series with strings.
format (str) – String time format.

Return type:

Series

Returns:

pd.Series – Series with datetime.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.string_add_i(a, b, c, sep)[source]¶

Concatenate strings a, b, c with separator, ignoring None values.

Parameters:

a, b, c (any) – Input values.
sep (str) – Separator string.

Return type:

str | None

Returns:

str or None – Concatenated string.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.time_zone_i(lat, lon)[source]¶

Get timezone for latitude and longitude.

Parameters:

lat (float) – Latitude (-90 to 90).
lon (float) – Longitude (-180 to 180).

Return type:

str | None

Returns:

str or None – Timezone name if available, otherwise None.

cdm_reader_mapper.cdm_mapper.utils.mapping_functions.to_int(value)[source]¶

Convert a value to integer, return pd.NA for invalid input.

Parameters:: value (any) – Input value.
Return type:: int | pd.NA
Returns:: int or pd.NA – Converted integer or NA if invalid.

cdm_reader_mapper.cdm_mapper.utils.utilities module¶

Utility functions for reading and writing CDM tables.

cdm_reader_mapper.cdm_mapper.utils.utilities.adjust_filename(filename, table='', extension='psv')[source]¶

Adjust a filename by optionally prepending a table name and appending an extension.

Parameters:

filename (str) – Original filename.
table (str, optional) – Table name to prepend if not already present in the filename (default is “”).
extension (str, optional) – File extension to append if not already present (default is “psv”).

Return type:

str

Returns:

str – Adjusted filename with optional table prefix and file extension.

Notes

If table is not already part of the filename, it will be prepended with a dash.
If the filename does not contain an extension (no ‘.’), the specified extension is appended. Default extension is ‘psv’.

Examples

>>> adjust_filename("data", table="header")
'header-data.psv'

>>> adjust_filename("header-data.psv", table="header")
'header-data.psv'

>>> adjust_filename("data.txt", table="header")
'header-data.txt'
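
One way to read the two rules in the Notes is sketched below. The helper name is hypothetical and the logic is inferred only from the Notes and the examples above, so the library function may differ in edge cases (for instance, filenames that include directory paths).

```python
def adjust_filename_sketch(filename, table="", extension="psv"):
    """Prepend a table name and append an extension when missing (sketch)."""
    # Prepend "<table>-" only if a table is given and not already present.
    if table and table not in filename:
        filename = f"{table}-{filename}"
    # Append the extension only if the name has no "." at all.
    if "." not in filename:
        filename = f"{filename}.{extension}"
    return filename
```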

cdm_reader_mapper.cdm_mapper.utils.utilities.dict_to_tuple_list(dic)[source]¶

Convert a dictionary with scalar or list values into a list of (key, value) tuples.

If a value is a list, each item in the list will produce its own tuple. If a value is a scalar, a single tuple is produced.

Parameters:: dic (dict) – Dictionary containing keys and values. Values may be scalars or lists.
Return type:: list[tuple[Any, Any]]
Returns:: list of tuple – List of (key, value) tuples. If a dictionary value is a list, each list item becomes a separate tuple.

Examples

>>> dict_to_tuple_list({"A": [1, 2], "B": 3})
[('A', 1), ('A', 2), ('B', 3)]

cdm_reader_mapper.cdm_mapper.utils.utilities.get_cdm_subset(cdm_subset)[source]¶

Normalize and validate a CDM subset specification.

This function ensures that the returned value is always a list of valid CDM table names (as defined in properties.cdm_tables). It accepts:

None: returns the full list of CDM tables.
A single string: validated and returned as a one-element list.
An iterable of strings: each entry is validated and returned unchanged.

Parameters:: cdm_subset (str, iterable of str or None) – CDM subset input to normalize. May be None (the full list of CDM tables is returned), a str (returned as a list containing that string), or any iterable of strings, e.g. a list (returned unchanged after validation).
Return type:: list[str]
Returns:: list of str – A list of CDM table names that are guaranteed to exist in properties.cdm_tables.
Raises:: ValueError – If any provided table name is not in properties.cdm_tables.
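
The normalization rules above can be sketched in a few lines. `SKETCH_TABLES` is a hypothetical placeholder for properties.cdm_tables, and `normalize_subset` is an illustrative name; neither is taken from the library source.

```python
# Hypothetical placeholder for properties.cdm_tables.
SKETCH_TABLES = ["header", "observations-at", "observations-sst"]


def normalize_subset(cdm_subset, valid=SKETCH_TABLES):
    """Return a validated list of table names (sketch)."""
    if cdm_subset is None:
        # No restriction: hand back a copy of the full table list.
        return list(valid)
    if isinstance(cdm_subset, str):
        cdm_subset = [cdm_subset]
    else:
        cdm_subset = list(cdm_subset)
    unknown = [name for name in cdm_subset if name not in valid]
    if unknown:
        raise ValueError(f"Unknown CDM tables: {unknown}")
    return cdm_subset
```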

cdm_reader_mapper.cdm_mapper.utils.utilities.get_usecols(tb, col_subset)[source]¶

Normalize a column subset specification for use with pandas.read_csv.

This function converts various forms of column subset input into a standardized list of column names suitable for the usecols argument in pandas.read_csv.

Parameters:

tb (str) – Table name. Only used if col_subset is a dictionary.
col_subset (str, iterable of str, dict, or None) – Column subset specification. Acceptable formats: a single column name as a string; an iterable of column names (list, tuple, set, etc.); a dictionary mapping table names to column lists; or None (read all columns).

Return type:

list[str] | None

Returns:

list of str or None – Normalized list of column names suitable for pandas usecols, or None if no restriction is applied.

Raises:

TypeError – If col_subset is not a string, iterable, dict, or None.

Notes

If col_subset is a string, it is returned as a single-element list.
If col_subset is an iterable of strings (e.g., list, tuple, set), it is converted to a list.
If col_subset is a dictionary, it is interpreted as a mapping {table_name: list_of_columns} and returns the entry corresponding to the given table tb (or None if missing).
If col_subset is None, the function returns None, meaning all columns should be read.
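
The four cases in the Notes can be sketched as a small dispatch on the input type. The helper name `normalize_usecols` is hypothetical, and the order of the checks (dictionary before generic iterable) is a design choice of this sketch, made because a dict is itself iterable.

```python
from collections.abc import Iterable


def normalize_usecols(tb, col_subset):
    """Normalize a column subset for a ``usecols``-style argument (sketch)."""
    if col_subset is None:
        return None  # no restriction: read all columns
    if isinstance(col_subset, str):
        return [col_subset]
    if isinstance(col_subset, dict):
        # Pick the entry for this table; None if the table is not listed.
        return col_subset.get(tb)
    if isinstance(col_subset, Iterable):
        return list(col_subset)
    raise TypeError("col_subset must be a str, iterable, dict, or None")
```

The resulting list (or None) could then be passed as `usecols` to pandas.read_csv.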