cdm_reader_mapper.cdm_mapper.utils package
Common Data Model (CDM) mapper utilities.
Submodules
cdm_reader_mapper.cdm_mapper.utils.conversions module
Convert Common Data Model (CDM) mapping table elements to and from string types.
- class cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter(converters, args=None)
Bases: object
Base class for managing type conversion functions.
- class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertFromStr
Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter
Converter class for converting string representations into Python types.
Provides default converters for integers, floats, timestamps, and strings, including array variants.
- class cdm_reader_mapper.cdm_mapper.utils.conversions.ConvertToStr
Bases: cdm_reader_mapper.cdm_mapper.utils.conversions.BaseConverter
Converter class for converting Python types to string representations.
Provides default converters for integers, floats, timestamps, and strings, including array variants. Supports optional arguments for certain types (e.g., decimal_places for numeric types).
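Both converter classes are described as holding a set of per-type conversion functions, some of which accept extra arguments such as decimal_places. The following is a minimal sketch of that registry pattern, not the library's implementation; the class name TinyConverter, the type keys ("int", "float", "str"), and the way arguments are stored per type are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Optional


class TinyConverter:
    """Hypothetical stand-in for a BaseConverter-style registry."""

    def __init__(
        self,
        converters: Dict[str, Callable[..., Any]],
        args: Optional[Dict[str, dict]] = None,
    ):
        # converters: type name -> conversion function
        # args: type name -> keyword arguments passed to that function
        self.converters = converters
        self.args = args or {}

    def convert(self, type_name: str, value: Any) -> Any:
        # Look up the function for this type and call it with its stored kwargs.
        func = self.converters[type_name]
        return func(value, **self.args.get(type_name, {}))


def float_to_str(value: float, decimal_places: int = 2) -> str:
    # Format a float with a fixed number of decimal places.
    return f"{value:.{decimal_places}f}"


# "To string" direction: floats take an optional decimal_places argument.
to_str = TinyConverter(
    {"int": str, "float": float_to_str, "str": str},
    args={"float": {"decimal_places": 1}},
)
# "From string" direction: plain constructors are enough.
from_str = TinyConverter({"int": int, "float": float, "str": str})
```

With this setup, to_str.convert("float", 3.14159) gives "3.1" and from_str.convert("int", "42") gives 42. Keeping the arguments next to the function table lets one converter object serve every column of a given type.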
- cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_df(data, imodel, cdm_subset=None, null_label='null')
Convert string-encoded values in a DataFrame to native pandas dtypes.
- Parameters:
data (pd.DataFrame) – Input DataFrame containing string representations of values.
imodel (str) – Input data model identifier used to determine column types.
cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.
null_label (str or None, default "null") – Label representing null values in the input data.
- Returns:
pd.DataFrame – DataFrame with values converted from string representations to appropriate pandas dtypes.
- cdm_reader_mapper.cdm_mapper.utils.conversions.convert_from_str_series(series, column_atts, null_label='null')
Convert a Series of string values to a native pandas dtype.
- Returns:
pd.Series – Series with values converted to the appropriate pandas dtype.
- cdm_reader_mapper.cdm_mapper.utils.conversions.convert_to_str_df(data, imodel, cdm_subset=None, null_label='null')
Convert DataFrame values to string representations based on a data model.
- Parameters:
data (pd.DataFrame) – Input DataFrame containing values to convert.
imodel (str) – Input data model identifier used to determine column types.
cdm_subset (str or list, optional) – Subset of CDM tables to process. If None, all tables are considered.
null_label (str or None, default "null") – Label to use for null or missing values in the output.
- Returns:
pd.DataFrame – DataFrame with values converted to string representations.
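Both DataFrame functions use null_label as the textual stand-in for missing values. The sketch below shows one way such a label could round-trip for a single float column; the helper names column_to_str and column_from_str are hypothetical, they work on plain lists rather than pandas objects, and the real functions take column types from the imodel instead of assuming float.

```python
import math
from typing import List, Optional


def column_to_str(
    values: List[Optional[float]],
    decimal_places: int = 2,
    null_label: str = "null",
) -> List[str]:
    # Missing values (None or NaN) become the null label;
    # other values are formatted with a fixed number of decimals.
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(null_label)
        else:
            out.append(f"{v:.{decimal_places}f}")
    return out


def column_from_str(
    values: List[str], null_label: str = "null"
) -> List[Optional[float]]:
    # The null label maps back to None; everything else is parsed as a float.
    return [None if v == null_label else float(v) for v in values]
```

For example, column_to_str([1.0, None, 2.345], decimal_places=1) yields ["1.0", "null", "2.3"], and passing that back through column_from_str restores [1.0, None, 2.3].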
cdm_reader_mapper.cdm_mapper.utils.mapping_functions module
Common Data Model (CDM) mappings.
Created on Wed Apr 3 10:31:18 2019
imodel: imma1
Functions to map imodel elements to CDM elements.
Main functions are those invoked in the mapping files (table_name.json).
Main functions need to be methods of the class MappingFunctions.
- Main functions get:
1 positional argument (pd.Series or pd.DataFrame with imodel data, or an imodel element name)
Optionally, keyword arguments
Main functions return: pd.Series, np.array, or scalars.
Auxiliary functions can be defined inside or outside the class MappingFunctions.
@author: iregon
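The description above implies a simple dispatch: a mapping file names a main function and optional keyword arguments, and the function is then called with the imodel data as its single positional argument. The sketch below illustrates that idea with getattr. The DemoMappingFunctions class, the "transform" and "kwargs" entry keys, and the sample data are hypothetical; the real mapping files may use different keys and pass pandas objects.

```python
import json


class DemoMappingFunctions:
    """Hypothetical stand-in for MappingFunctions with two main functions."""

    def __init__(self, imodel):
        self.imodel = imodel

    def float_opposite(self, values):
        # Main function: one positional argument, returns negated values.
        return [-v for v in values]

    def string_add(self, values, prepend="", append=""):
        # Main function that also accepts optional keyword arguments.
        return [f"{prepend}{v}{append}" for v in values]


def apply_mapping(entry_json, data, imodel="icoads_r302_d992"):
    # Parse one (hypothetical) mapping entry, look up the main function
    # by name, and call it with the data plus any keyword arguments.
    entry = json.loads(entry_json)
    func = getattr(DemoMappingFunctions(imodel), entry["transform"])
    return func(data, **entry.get("kwargs", {}))
```

Calling apply_mapping('{"transform": "float_opposite"}', [1.5, -2.0]) returns [-1.5, 2.0]. Resolving functions by name is what lets the JSON mapping files stay declarative while the Python class holds the logic.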
- class cdm_reader_mapper.cdm_mapper.utils.mapping_functions.MappingFunctions(imodel)
Bases: object
Class for mapping elements of IMMA1, GDAC, ICOADS, C-RAID, MAROB, Pub47, and IMMT datasets to Common Data Model (CDM) elements.
- Parameters:
imodel (str) – Name of the input data model, e.g. icoads_r302_d992.
- datetime_cmems(series, format='%Y-%m-%d %H:%M:%S')
Convert CMEMS date strings to pandas datetime.
- Parameters:
series (pd.Series) – Series of date strings.
format (str, optional) – Datetime format string (default: "%Y-%m-%d %H:%M:%S").
- Returns:
pd.DatetimeIndex – DatetimeIndex of converted dates.
- datetime_craid(series, format='%Y-%m-%d %H:%M:%S.%f')
Convert C-RAID date strings to pandas datetime.
- Parameters:
series (pd.Series) – Series of date strings.
format (str, optional) – Datetime format string (default: "%Y-%m-%d %H:%M:%S.%f").
- Returns:
pd.DatetimeIndex – DatetimeIndex of converted dates.
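The CMEMS and C-RAID defaults differ only in the trailing ".%f" (fractional seconds). The standard-library sketch below shows what each format string accepts; the helper parse_all is hypothetical, and the library methods themselves operate on pandas Series.

```python
from datetime import datetime

CMEMS_FORMAT = "%Y-%m-%d %H:%M:%S"      # default format of datetime_cmems
CRAID_FORMAT = "%Y-%m-%d %H:%M:%S.%f"   # default format of datetime_craid


def parse_all(strings, fmt):
    # Parse each string with the given format, element by element.
    return [datetime.strptime(s, fmt) for s in strings]


cmems = parse_all(["2020-01-31 06:30:00"], CMEMS_FORMAT)
craid = parse_all(["2020-01-31 06:30:00.250000"], CRAID_FORMAT)
```

A string with fractional seconds such as "2020-01-31 06:30:00.25" raises ValueError under the CMEMS format, because the trailing ".25" is left unconverted; the C-RAID format accepts it, since %f takes one to six digits.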
- datetime_decimalhour_to_hm(row)
Convert a decimal hour to hours and minutes.
- Parameters:
row (pd.Series) – A Series containing a decimal hour value at index 4.
- Returns:
pd.Series – A Series with 'HR' (hour) and 'M' (minute).
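Splitting a decimal hour is plain arithmetic. The scalar sketch below rounds to the nearest minute before splitting, so that a value such as 12.9999 does not produce 60 minutes; that rounding rule is an assumption made here, and the library's exact behaviour may differ.

```python
def decimal_hour_to_hm(decimal_hour):
    """Split a decimal hour (e.g. 13.5) into whole hours and minutes.

    Rounds to the nearest minute first, so 12.9999 becomes (13, 0)
    rather than (12, 60).
    """
    total_minutes = round(decimal_hour * 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes
```

For example, decimal_hour_to_hm(13.5) returns (13, 30) and decimal_hour_to_hm(0.25) returns (0, 15).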
- datetime_imma1(df)
Convert the date and time columns of an IMMA1 dataset to pandas datetimes.
- Parameters:
df (pd.DataFrame) – IMMA1 dataset with columns for year, month, day, and decimal hour.
- Returns:
pd.DatetimeIndex – DatetimeIndex of converted timestamps.
- datetime_imma1_701(df)
Convert the date and time columns of an IMMA1 deck 701 dataset to pandas datetimes, with a UTC fallback.
- Parameters:
df (pd.DataFrame) – IMMA1 deck 701 dataset with columns for date and time.
- Returns:
pd.DatetimeIndex – DatetimeIndex with converted timestamps.
- datetime_imma1_to_utc(df)
Convert IMMA1 deck 701 date columns to pandas datetimes in UTC.
Missing hours are set to 12 (local midday), and latitude and longitude are used to convert local midday to UTC.
- Parameters:
df (pd.DataFrame) – IMMA1 deck 701 dataset containing year, month, day, latitude, and longitude.
- Returns:
pd.DatetimeIndex – DatetimeIndex with timestamps converted to UTC.
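To illustrate converting local midday to UTC, the sketch below uses the mean-solar-time approximation of 15 degrees of longitude per hour. This is a swapped-in technique, not the library's method: the library uses both latitude and longitude and likely resolves a time zone (convert_to_utc_i takes a timezone string), so results can differ, especially near time-zone boundaries. The function name local_midday_to_utc is hypothetical.

```python
from datetime import datetime, timedelta


def local_midday_to_utc(year, month, day, lon, hour=None):
    """Approximate the UTC time of a local observation time.

    A missing hour defaults to 12 (local midday). The offset is the
    mean-solar-time approximation lon / 15 hours, east positive.
    lon must be in the -180 to 180 range.
    """
    if hour is None:
        hour = 12
    local = datetime(year, month, day, hour)
    offset = timedelta(hours=lon / 15.0)
    # East of Greenwich local time is ahead of UTC, so subtract the offset.
    return local - offset
```

At 90° E, local midday maps to 06:00 UTC on the same day; at 90° W it maps to 18:00 UTC.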
- datetime_immt(df)
Convert IMMT dataset to pandas datetime object.
- Parameters:
df (pd.DataFrame) – IMMT dataset containing year, month, day, hour.
- Returns:
pd.DatetimeIndex – DatetimeIndex of converted timestamps.
- datetime_marob(series, format='%Y-%m-%dT%H:%M:%S')
Convert MAROB date strings to pandas datetime.
- datetime_utcnow(df)
Return the current UTC datetime.
- Parameters:
df (pd.DataFrame) – Ignored. Present for API consistency.
- Returns:
datetime.datetime – Current UTC datetime.
- df_col_join(df, sep)
Join all columns of a pandas DataFrame into a single Series of strings.
- feet_to_m(series)
Convert values from feet to meters.
- Parameters:
series (pd.Series) – Series of values in feet.
- Returns:
pd.Series – Series of values in meters, rounded to 2 decimals.
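The conversion relies on the international foot, which is exactly 0.3048 m. A list-based sketch of the same calculation, assuming plain numeric input rather than a pandas Series:

```python
FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


def feet_to_m(values):
    # Convert each value from feet to metres and round to 2 decimal places.
    return [round(v * FEET_TO_METRES, 2) for v in values]
```

For instance, feet_to_m([100, 1]) gives [30.48, 0.3].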
- float_opposite(series)
Return the opposite (negation) of a numeric Series.
- Parameters:
series (pd.Series) – Input numeric Series.
- Returns:
pd.Series – Series with negated values.
- gdac_pressure(df)
Decode or re-encode the non-standard pressure representation used by IMMT.
IMMT stores pressure as a scaled integer with an implicit offset: raw values below 1000 represent readings of 1000 hPa or more (e.g. raw 0025 → 1002.5 hPa after adding 10000 and multiplying by 0.1). Raw values of 1000 or more need only the scale factor applied (e.g. raw 9985 → 998.5 hPa).
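The decoding rule can be written as a scalar sketch. The function name decode_immt_pressure is hypothetical, and only the decode direction is shown, not the re-encode path.

```python
def decode_immt_pressure(raw):
    """Decode an IMMT-style scaled pressure value to hPa.

    Raw values below 1000 carry an implicit offset of 10000;
    values of 1000 or more only need the 0.1 scale factor.
    """
    if raw < 1000:
        raw += 10000
    # Dividing by 10 equals multiplying by 0.1, but avoids rounding noise
    # from the inexact binary constant 0.1.
    return raw / 10
```

With this rule, decode_immt_pressure(25) returns 1002.5 and decode_immt_pressure(9985) returns 998.5.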
- gdac_uid(df, prepend='', append='')
Generate a UID based on the timestamp and the ship's call sign (ID).
- icoads_wd_conversion(series)
Convert ICOADS wind direction codes.
Codes 361 → 0, 362 → NaN.
- Parameters:
series (pd.Series) – Input ICOADS wind direction Series.
- Returns:
pd.Series – Converted wind direction Series.
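In ICOADS, wind direction code 361 denotes calm and 362 variable wind, which explains the mapping above. A scalar sketch of the rule follows; the function name icoads_wd is hypothetical, and the library applies the rule to a whole pandas Series.

```python
import math


def icoads_wd(code):
    """Map an ICOADS wind direction code to degrees.

    361 (calm) becomes 0 and 362 (variable) becomes NaN;
    all other codes pass through unchanged.
    """
    if code == 361:
        return 0
    if code == 362:
        return math.nan
    return code
```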
- icoads_wd_integer_to_float(series)
Convert ICOADS wind direction integer Series to float, applying conversion rules.
- Parameters:
series (pd.Series) – ICOADS wind direction integer Series.
- Returns:
pd.Series – Float wind direction Series.
- integer_to_float(s)
Convert a numeric Series to float. A non-numeric Series yields an empty float Series.
- location_accuracy(df)
Compute location accuracy based on two columns (li_array, lat_array).
- Parameters:
df (pd.DataFrame) – Input DataFrame with at least two columns.
- Returns:
pd.Series – Series of location accuracy values.
- longitude_360to180(series)
Convert longitudes from the 0 to 360 range to the -180 to 180 range.
- Parameters:
series (pd.Series) – Input longitude Series.
- Returns:
pd.Series – Converted longitude Series.
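The range conversion only needs to shift values above 180 by one full turn. A scalar sketch follows; keeping 180 itself as 180 (rather than -180) is an assumption about the boundary, and the library may handle it differently.

```python
def longitude_360_to_180(lon):
    """Map a longitude in [0, 360] to the -180 to 180 convention.

    Values above 180 wrap to negative (western) longitudes;
    180 itself is kept as 180.
    """
    return lon - 360 if lon > 180 else lon
```

For example, 350 becomes -10, while 10 is unchanged.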
- observing_programme(series)
Map observing programme codes to lists.
- Parameters:
series (pd.Series) – Series of programme codes (string or int).
- Returns:
pd.Series – Series of mapped observing programme lists.
- pressue_hpa_in_pa(series)
Convert pressure from hPa to Pa.
- Parameters:
series (pd.Series) – Series of pressure values in hPa.
- Returns:
pd.Series – Series of pressure values in Pa.
- select_column(df)
Select the last (rightmost) column with non-null values.
- Parameters:
df (pd.DataFrame) – Input DataFrame.
- Returns:
pd.Series – Series with selected column values.
- string_add(series, prepend='', append='', separator='')
Add strings to Series elements with optional zero-fill.
- string_join_add(df, prepend=None, append=None, separator='', zfill_col=None, zfill=None)
Join DataFrame columns into a single string and optionally prepend/append strings.
- Parameters:
df (pd.DataFrame) – Input DataFrame with string or numeric columns.
prepend (str or None, optional) – String to prepend to each joined value, by default None.
append (str or None, optional) – String to append to each joined value, by default None.
separator (str, default "") – Separator to use when joining columns.
zfill_col (list, optional) – List of column indices to apply zero-fill.
zfill (list, optional) – List of widths for zero-fill, corresponding to zfill_col.
- Returns:
pd.Series – Series of joined and modified strings.
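A list-of-rows sketch of the documented parameters follows. It assumes that zero-fill is applied to each listed column before joining and that prepend and append are attached without the separator; both are assumptions, and the real function operates on a DataFrame.

```python
def string_join_add(rows, prepend=None, append=None, separator="",
                    zfill_col=None, zfill=None):
    """Join each row's fields into one string, with optional zero-fill.

    rows is a list of rows (lists of values); zfill_col lists the column
    indices to pad and zfill gives the matching widths.
    """
    widths = dict(zip(zfill_col or [], zfill or []))
    joined = []
    for row in rows:
        parts = []
        for i, value in enumerate(row):
            text = str(value)
            if i in widths:
                text = text.zfill(widths[i])  # pad with leading zeros
            parts.append(text)
        result = separator.join(parts)
        if prepend is not None:
            result = prepend + result
        if append is not None:
            result = result + append
        joined.append(result)
    return joined
```

For example, string_join_add([[1850, 3, 7]], separator="-", zfill_col=[1, 2], zfill=[2, 2]) yields ["1850-03-07"].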
- temperature_celsius_to_kelvin(df)
Convert temperatures from Celsius to Kelvin using the model-specific method.
- Parameters:
df (pd.DataFrame) – Input DataFrame with temperature data.
- Returns:
pd.Series – Series of temperatures in Kelvin.
- time_accuracy(series)
Map time accuracy codes to seconds.
- Parameters:
series (pd.Series) – Series of time accuracy codes as strings.
- Returns:
pd.Series – Series with time accuracy in seconds.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_str(a)
Convert a value to string.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.convert_to_utc_i(date, zone)
Convert a pandas datetime series from local timezone to UTC.
- Parameters:
date (pd.Series) – Datetime series.
zone (str) – Timezone string.
- Returns:
pd.DatetimeIndex – Datetime series converted to UTC.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_360_to_180i(lon)
Convert a longitude from the 0 to 360 range to the -180 to 180 range.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.coord_dmh_to_90i(deg, min, hemis)
Convert a latitude given as degrees, minutes, and hemisphere to decimal degrees.
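The conversion adds the minutes as a fraction of a degree and applies a sign from the hemisphere. The sketch below assumes the hemisphere is given as the letter "N" or "S"; the encoding used in the source data may differ, and the function name dmh_to_decimal is hypothetical.

```python
def dmh_to_decimal(deg, minutes, hemis):
    """Convert degrees, minutes and a hemisphere letter to decimal degrees.

    Southern latitudes are returned as negative values.
    """
    value = deg + minutes / 60.0
    return -value if hemis.upper() == "S" else value
```

For example, 45° 30' S becomes -45.5 and 10° 15' N becomes 10.25.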
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.find_entry(imodel, d)
Find an entry in a dictionary, stripping imodel suffixes when the full name is not found.
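One plausible suffix-stripping strategy is to try the full imodel name and then drop underscore-separated suffixes from the right until a key matches. The sketch below implements that strategy; the fallback order and returning None when nothing matches are assumptions, not confirmed library behaviour.

```python
def find_entry(imodel, d):
    """Return d[imodel], falling back to shorter prefixes of imodel.

    For "icoads_r302_d992" the keys tried are "icoads_r302_d992",
    "icoads_r302" and "icoads"; None is returned if none match.
    """
    key = imodel
    while True:
        if key in d:
            return d[key]
        if "_" not in key:
            return None
        key = key.rsplit("_", 1)[0]  # drop the last "_suffix"
```

Trying the most specific name first lets a deck-specific entry override a more general model-level entry.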
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.location_accuracy_i(li, lat)
Compute approximate location accuracy in km based on ICOADS code.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.longitude_360to180_i(lon)
Convert a longitude from the 0 to 360 range to the -180 to 180 range.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.series_strptime(series, format)
Convert a Series of strings to a Series of datetimes using the given format.
- cdm_reader_mapper.cdm_mapper.utils.mapping_functions.string_add_i(a, b, c, sep)
Concatenate strings a, b, c with separator, ignoring None values.
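A one-line sketch of the described behaviour follows. It skips only None values; empty strings are kept (and therefore still receive a separator), which is an assumption about how the library treats them.

```python
def string_add_i(a, b, c, sep):
    # Join the non-None parts with the separator, preserving their order.
    return sep.join(str(x) for x in (a, b, c) if x is not None)
```

For example, string_add_i("ID", "A1", None, "-") returns "ID-A1".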
cdm_reader_mapper.cdm_mapper.utils.utilities module
Utility functions for reading and writing CDM tables.
- cdm_reader_mapper.cdm_mapper.utils.utilities.adjust_filename(filename, table='', extension='psv')
Adjust a filename by optionally prepending a table name and appending an extension.
- Returns:
str – Adjusted filename with optional table prefix and file extension.
Notes
If table is not already part of the filename, it is prepended to the filename, separated by a dash.
If the filename does not contain an extension (no '.'), the specified extension is appended. Default extension is 'psv'.
Examples
>>> adjust_filename("data", table="header")
'header-data.psv'
>>> adjust_filename("header-data.psv", table="header")
'header-data.psv'
>>> adjust_filename("data.txt", table="header")
'header-data.txt'
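The two rules in the Notes can be sketched directly, and the sketch reproduces the three documented examples. It treats any "." as an extension marker, as the Notes describe, so it may differ from the library in edge cases such as paths with dots in directory names.

```python
def adjust_filename(filename, table="", extension="psv"):
    # Prepend "<table>-" unless the table name already appears in the filename.
    if table and table not in filename:
        filename = f"{table}-{filename}"
    # Append the extension only when the name contains no "." at all.
    if "." not in filename:
        filename = f"{filename}.{extension}"
    return filename
```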
- cdm_reader_mapper.cdm_mapper.utils.utilities.dict_to_tuple_list(dic)
Convert a dictionary with scalar or list values into a list of (key, value) tuples.
If a value is a list, each item in the list produces its own tuple. If a value is a scalar, a single tuple is produced.
- Parameters:
dic (dict) – Dictionary containing keys and values. Values may be scalars or lists.
- Returns:
list of tuple – List of (key, value) tuples. If a dictionary value is a list, each list item becomes a separate tuple.
Examples
>>> dict_to_tuple_list({"A": [1, 2], "B": 3})
[('A', 1), ('A', 2), ('B', 3)]
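The expansion rule is short enough to sketch in full. This version reproduces the documented example; treating only list values (not tuples or other iterables) as expandable follows the wording above and is otherwise an assumption.

```python
def dict_to_tuple_list(dic):
    # Expand list values into one (key, item) tuple each;
    # scalar values produce a single (key, value) tuple.
    result = []
    for key, value in dic.items():
        if isinstance(value, list):
            result.extend((key, item) for item in value)
        else:
            result.append((key, value))
    return result
```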
- cdm_reader_mapper.cdm_mapper.utils.utilities.get_cdm_subset(cdm_subset)
Normalize and validate a CDM subset specification.
This function ensures that the returned value is always a list of valid CDM table names (as defined in properties.cdm_tables). It accepts:
None: returns the full list of CDM tables.
A single string: validated and returned as a one-element list.
An iterable of strings: each entry is validated and returned unchanged.
- Parameters:
cdm_subset (str, iterable of str, or None) – CDM subset input to normalize. May be None (the full list of CDM tables is returned), a str (returned as a list containing that string), or any iterable of strings such as a list (returned unchanged after validation).
- Returns:
list of str – A list of CDM table names that are guaranteed to exist in properties.cdm_tables.
- Raises:
ValueError – If any provided table name is not in properties.cdm_tables.
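The normalization rules can be sketched as below. CDM_TABLES is a placeholder standing in for properties.cdm_tables, whose actual contents are not reproduced here, and the sketch converts iterables to a list rather than returning them unchanged.

```python
CDM_TABLES = [  # placeholder; the library uses properties.cdm_tables
    "header",
    "observations-at",
    "observations-sst",
    "observations-slp",
]


def get_cdm_subset(cdm_subset, cdm_tables=CDM_TABLES):
    """Normalize a CDM subset to a validated list of table names."""
    if cdm_subset is None:
        return list(cdm_tables)
    if isinstance(cdm_subset, str):
        # A single table name becomes a one-element list.
        cdm_subset = [cdm_subset]
    subset = list(cdm_subset)
    unknown = [t for t in subset if t not in cdm_tables]
    if unknown:
        raise ValueError(f"Unknown CDM tables: {unknown}")
    return subset
```

Checking for str before treating the input as an iterable matters, because a string is itself iterable and would otherwise be split into characters.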
- cdm_reader_mapper.cdm_mapper.utils.utilities.get_usecols(tb, col_subset)
Normalize a column subset specification for use with pandas.read_csv.
This function converts various forms of column subset input into a standardized list of column names suitable for the usecols argument of pandas.read_csv.
- Parameters:
tb (str) – Table name. Only used if col_subset is a dictionary.
col_subset (str, iterable of str, dict, or None) – Column subset specification. Acceptable formats: a single column name as a string; an iterable of column names (list, tuple, set, etc.); a dictionary mapping table names to column lists; or None (read all columns).
- Returns:
list of str or None – Normalized list of column names suitable for pandas usecols, or None if no restriction is applied.
- Raises:
TypeError – If col_subset is not a string, iterable, dict, or None.
Notes
If col_subset is a string, it is returned as a single-element list.
If col_subset is an iterable of strings (e.g., list, tuple, set), it is converted to a list.
If col_subset is a dictionary, it is interpreted as a mapping {table_name: list_of_columns} and returns the entry corresponding to the given table tb (or None if missing).
If col_subset is None, the function returns None, meaning all columns should be read.
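The rules in the Notes translate into a chain of type checks whose order matters: str must be tested before the generic iterable case (a string is iterable), and dict before it too (a dict iterates over its keys). The sketch below follows the Notes; it is an illustration, not the library's code.

```python
from collections.abc import Iterable


def get_usecols(tb, col_subset):
    """Normalize a column subset for pandas.read_csv(usecols=...)."""
    if col_subset is None:
        return None  # no restriction: read all columns
    if isinstance(col_subset, str):
        return [col_subset]
    if isinstance(col_subset, dict):
        # Mapping {table_name: columns}; a missing table gives None.
        return col_subset.get(tb)
    if isinstance(col_subset, Iterable):
        return list(col_subset)
    raise TypeError(f"Unsupported col_subset type: {type(col_subset).__name__}")
```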