Mapping data from ICOADS deck 704 to the Common Data Model (CDM)
Here we extract supplemental metadata from ICOADSv3.0 stored in the IMMA version 1 format. We then map these data (including the supplemental data) to the Common Data Model (CDM) format defined in the CDM Documentation.
The supplementary data are mapped to the CDM using the tables and codes specific to deck 704. The generic ICOADS tables are used to map the common ICOADS data components.
We are analysing deck 704, the US Marine Meteorological Journals Collection.
from __future__ import annotations
import pandas as pd
from cdm_reader_mapper import read_mdf, test_data
We first read the supplemental data from the c99 section of the IMMA format for a subset of the data (e.g. 1878/10). For this we need the "icoads_r300_d704" schema. Schema names follow the convention "format_version_deck":
format/data model: “icoads”
version/release: “r300” (release 3.0.0)
deck: “d704”
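As a minimal illustration of this naming convention, the sketch below splits a schema name into its three parts. The helper `split_schema_name` is hypothetical and not part of cdm_reader_mapper; it only assumes that the parts are joined by underscores as described above.

```python
from __future__ import annotations


def split_schema_name(schema: str) -> dict[str, str]:
    """Split a schema name of the form 'format_version_deck' into its parts.

    Hypothetical helper for illustration only; not a cdm_reader_mapper function.
    """
    parts = schema.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'format_version_deck', got {schema!r}")
    data_model, release, deck = parts
    return {"format": data_model, "version": release, "deck": deck}


parts = split_schema_name("icoads_r300_d704")
print(parts)  # {'format': 'icoads', 'version': 'r300', 'deck': 'd704'}
```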
In this notebook we load the icoads r3.0.0 deck 704 test file to use as an example.
schema = "icoads_r300_d704"
data_file_path = test_data.test_icoads_r300_d704["source"] # Load the example file from the cdm_reader_mapper test data
data_bundle = read_mdf(data_file_path, imodel=schema)
data_raw = data_bundle.data
WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'
WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'
WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'
WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'
WARNING:root:Unknown column_type 'object' for column '('c99_sentinel', 'BLK')'
/home/docs/checkouts/readthedocs.org/user_builds/cdm-reader-mapper/conda/latest/lib/python3.13/site-packages/cdm_reader_mapper/mdf_reader/utils/validators.py:240: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
to_bool = data[validated_columns].applymap(convert_str_boolean)
/home/docs/checkouts/readthedocs.org/user_builds/cdm-reader-mapper/conda/latest/lib/python3.13/site-packages/cdm_reader_mapper/mdf_reader/utils/validators.py:241: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
false_mask = to_bool.applymap(_is_false)
/home/docs/checkouts/readthedocs.org/user_builds/cdm-reader-mapper/conda/latest/lib/python3.13/site-packages/cdm_reader_mapper/mdf_reader/utils/validators.py:242: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
true_mask = to_bool.applymap(_is_true)
The data from the c99 column for this deck are separated into the following subsections:
c99_sentinel
c99_journal
c99_voyage
c99_daily
c99_data4
c99_data5
data_raw.c99_sentinel.head()
| ATTI | ATTL | BLK | |
|---|---|---|---|
| 0 | 99 | 0 | None |
| 1 | 99 | 0 | None |
| 2 | 99 | 0 | None |
| 3 | 99 | 0 | None |
| 4 | 99 | 0 | None |
pd.options.display.max_columns = None
data_raw.c99_journal.head()
| sentinel | reel_no | journal_no | frame_no | ship_name | journal_ed | rig | ship_material | vessel_type | vessel_length | vessel_beam | commander | country | screw_paddle | hold_depth | tonnage | baro_type | baro_height | baro_cdate | baro_loc | baro_units | baro_cor | thermo_mount | SST_I | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | 37 | S.P.Bray,Jr | 01 | 3 | 23 | 1190 | 2 | 14 | None | Bulkhead of cabin | 1 | - .102 | 2 | None |
| 1 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | 37 | S.P.Bray,Jr | 01 | 3 | 23 | 1190 | 2 | 14 | None | Bulkhead of cabin | 1 | - .102 | 2 | None |
| 2 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | 37 | S.P.Bray,Jr | 01 | 3 | 23 | 1190 | 2 | 14 | None | Bulkhead of cabin | 1 | - .102 | 2 | None |
| 3 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | 37 | S.P.Bray,Jr | 01 | 3 | 23 | 1190 | 2 | 14 | None | Bulkhead of cabin | 1 | - .102 | 2 | None |
| 4 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | 37 | S.P.Bray,Jr | 01 | 3 | 23 | 1190 | 2 | 14 | None | Bulkhead of cabin | 1 | - .102 | 2 | None |
data_raw.c99_voyage.head()
| sentinel | reel_no | journal_no | frame_start | from_city | to_city | |
|---|---|---|---|---|---|---|
| 0 | 2 | 002 | 0018 | 0014 | Boston | Rio de Janeiro |
| 1 | 2 | 002 | 0018 | 0014 | Boston | Rio de Janeiro |
| 2 | 2 | 002 | 0018 | 0014 | Boston | Rio de Janeiro |
| 3 | 2 | 002 | 0018 | 0014 | Boston | Rio de Janeiro |
| 4 | 2 | 002 | 0018 | 0014 | Boston | Rio de Janeiro |
data_raw.c99_daily.head()
| sentinel | reel_no | journal_no | frame_start | frame | year | month | day | distance | lat_deg_an | lat_min_an | lat_hemis_an | lon_deg_an | lon_min_an | lon_hemis_an | lat_deg_on | lat_min_on | lat_hemis_on | lon_deg_of | lon_min_of | lon_hemis_of | current_speed | current_direction | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | NaN | <NA> | <NA> | None | <NA> | <NA> | None | 42 | 20 | N | 66 | 30 | W | 0.1 | E |
| 1 | 3 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | NaN | <NA> | <NA> | None | <NA> | <NA> | None | 42 | 20 | N | 66 | 30 | W | 0.1 | E |
| 2 | 3 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | NaN | <NA> | <NA> | None | <NA> | <NA> | None | 42 | 20 | N | 66 | 30 | W | 0.1 | E |
| 3 | 3 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | NaN | <NA> | <NA> | None | <NA> | <NA> | None | 42 | 20 | N | 66 | 30 | W | 0.1 | E |
| 4 | 3 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | NaN | <NA> | <NA> | None | <NA> | <NA> | None | 42 | 20 | N | 66 | 30 | W | 0.1 | E |
data_raw.c99_data4.head()
| sentinel | reel_no | journal_no | frame_start | frame | year | month | day | time_ind | hour | ship_speed | compass_ind | ship_course_compass | compass_correction | ship_course_true | wind_dir_mag | wind_dir_true | wind_force | barometer | temp_ind | attached_thermometer | air_temperature | wet_bulb_temperature | sea_temperature | present_weather | clouds | sky_clear | sea_state | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | 1 | 2 | 8.5 | None | EXS | <NA> | None | WSW | None | 06 | 2960 | 1 | 5.8 | NaN | NaN | NaN | BOC | CU | 5 | R |
| 1 | 4 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | 1 | 4 | 8.5 | None | EXS | <NA> | None | WSW | None | 06 | 2960 | 1 | 5.6 | NaN | NaN | NaN | BOC | SC | 3 | R |
| 2 | 4 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | 1 | 6 | 8.5 | None | EXS | <NA> | None | W | None | 06 | 2962 | 1 | 5.6 | 4.8 | NaN | 5.2 | OCG | SC | 0 | R |
| 3 | 4 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | 1 | 8 | 8.0 | None | EXS | <NA> | None | W | None | 06 | 2964 | 1 | 5.6 | 4.8 | NaN | 5.2 | CG | SC | 0 | R |
| 4 | 4 | 002 | 0018 | 0014 | 0015 | 1878 | 10 | 20 | 1 | 10 | 8.5 | None | EXS | <NA> | None | W | None | 06 | 2969 | 1 | 5.7 | 4.8 | NaN | 5.0 | BC | SC | 2 | L |
data_raw.c99_data5.head()
| sentinel | reel_no | journal_no | frame_start | frame | year | month | day | time_ind | hour | ship_speed | compass_ind | ship_course_compass | blank | ship_course_true | wind_dir_mag | wind_dir_true | wind_force | barometer | temp_ind | attached_thermometer | air_temperature | wet_bulb_temperature | sea_temperature | present_weather | clouds | sky_clear | sea_state | compass_correction_ind | compass_correction | compass_correction_dir | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | None | None | None | None | <NA> | <NA> | <NA> | None | <NA> | NaN | None | None | None | None | None | None | None | None | None | NaN | NaN | NaN | NaN | None | None | <NA> | None | None | NaN | None |
| 1 | None | None | None | None | None | <NA> | <NA> | <NA> | None | <NA> | NaN | None | None | None | None | None | None | None | None | None | NaN | NaN | NaN | NaN | None | None | <NA> | None | None | NaN | None |
| 2 | None | None | None | None | None | <NA> | <NA> | <NA> | None | <NA> | NaN | None | None | None | None | None | None | None | None | None | NaN | NaN | NaN | NaN | None | None | <NA> | None | None | NaN | None |
| 3 | None | None | None | None | None | <NA> | <NA> | <NA> | None | <NA> | NaN | None | None | None | None | None | None | None | None | None | NaN | NaN | NaN | NaN | None | None | <NA> | None | None | NaN | None |
| 4 | None | None | None | None | None | <NA> | <NA> | <NA> | None | <NA> | NaN | None | None | None | None | None | None | None | None | None | NaN | NaN | NaN | NaN | None | None | <NA> | None | None | NaN | None |
Now that we have separated the c99 data into its sections, we see that this deck contains two data sections that hold the same kind of data:
- c99_data4
- c99_data5
Both sections use the same variable names. To map the correct section to the CDM, we need to filter out the section that contains only NaN data. The problem is that we don't know in advance which years in the time series will have a c99_data4 section and which will have a c99_data5 section.
Note that excluding one section in this way only works for decks whose sections are mutually exclusive: of the sections listed above, only one appears in each report.
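One way to picture such a filter is sketched below with pandas. It assumes the sections are exposed as first-level column groups, as the `data_raw.c99_data4` access above suggests. The toy frame stands in for `data_raw`, and `populated_sections` is a hypothetical helper, not the filter used inside cdm_reader_mapper.

```python
import numpy as np
import pandas as pd


def populated_sections(df: pd.DataFrame, candidates: list[str]) -> list[str]:
    """Return the candidate sections that hold at least one non-missing value."""
    return [name for name in candidates if df[name].notna().any().any()]


# Toy frame with two-level columns (section, field), mimicking the raw data layout.
columns = pd.MultiIndex.from_tuples(
    [
        ("c99_data4", "hour"),
        ("c99_data4", "barometer"),
        ("c99_data5", "hour"),
        ("c99_data5", "barometer"),
    ]
)
toy = pd.DataFrame(
    [[2, "2960", np.nan, None], [4, "2960", np.nan, None]],
    columns=columns,
)

kept = populated_sections(toy, ["c99_data4", "c99_data5"])
print(kept)  # ['c99_data4']
```

Because the check is made on the data rather than on the year, it does not need to know beforehand which section a given period uses.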
We can now use the "icoads_r300_d704" model to map the raw data to the Common Data Model (glamod/common_data_model). The map_model method contains all the functions the model needs to convert variables to the correct units and/or specification following the CDM Documentation.
To run the data model we need three things:
raw data (the data we just read above)
attributes of the raw data (sections and column names)
the name of the model
cdm_tables = data_bundle.map_model()
2026-06-04 12:08:20,355 - root - INFO - Initialized basic logging configuration successfully
/home/docs/checkouts/readthedocs.org/user_builds/cdm-reader-mapper/conda/latest/lib/python3.13/site-packages/cdm_reader_mapper/cdm_mapper/mapper.py:75: FutureWarning: 'any' with datetime64 dtypes is deprecated and will raise in a future version. Use (obj != pd.Timestamp(0)).any() instead.
list_cols = [col for col in df.columns if df[col].apply(lambda x: isinstance(x, list)).any()]
Have we succeeded in writing the data to the CDM format?
We wanted to write the following data:
Header section
Platform type and sub type
Primary station id: original ship names
Longitude and latitude: converted from degrees, minutes and hemisphere to decimal degrees
Location accuracy
Observations tables
Observations-at: latitude, longitude and location precision
Observations-dpt: latitude, longitude and location precision
Observations-slp: latitude, longitude and location precision
  z_coordinate: barometer height in feet converted to m
  original units: written in the CDM code format
Observations-sst: latitude, longitude and location precision
Observations-wbt: latitude, longitude and location precision
Observations-wd: latitude, longitude and location precision
Observations-ws: latitude, longitude and location precision
data = cdm_tables["header"]
data.head()
| report_id | region | sub_region | application_area | observing_programme | report_type | station_name | station_type | platform_type | platform_sub_type | primary_station_id | station_record_number | primary_station_id_scheme | longitude | latitude | location_accuracy | location_method | location_quality | crs | station_speed | station_course | station_heading | height_of_station_above_local_ground | height_of_station_above_sea_level | height_of_station_above_sea_level_accuracy | sea_level_datum | report_meaning_of_timestamp | report_timestamp | report_duration | report_time_accuracy | report_time_quality | report_time_reference | profile_id | events_at_station | report_quality | duplicate_status | duplicates | record_timestamp | history | processing_level | processing_codes | source_id | source_record_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ICOADS-300-020N16 | <NA> | <NA> | [1, 7, 10, 11] | [5, 7, 56] | 0 | Panay | 2 | 2 | 26 | Panay | 1 | 8 | -68.410004 | 42.28 | <NA> | <NA> | 0 | 0 | 4.11552 | 90.0 | <NA> | 0.0 | 0.0 | <NA> | <NA> | 2 | 1878-10-20 06:00:00 | 11 | 3600.0 | 2 | <NA> | <NA> | <NA> | 0 | 4 | <NA> | 2026-06-04 12:08:20.391796+00:00 | 2026-06-04 12:08:20. Initial conversion from I... | <NA> | <NA> | ICOADS-3-0-0T-125-704-1878-10 | 020N16 |
| 1 | ICOADS-300-020N1P | <NA> | <NA> | [1, 7, 10, 11] | [5, 7, 56] | 0 | Panay | 2 | 2 | 26 | Panay | 1 | 8 | -68.029999 | 42.31 | <NA> | <NA> | 0 | 0 | 4.11552 | 90.0 | <NA> | 0.0 | 0.0 | <NA> | <NA> | 2 | 1878-10-20 08:00:00 | 11 | 3600.0 | 2 | <NA> | <NA> | <NA> | 0 | 4 | <NA> | 2026-06-04 12:08:20.391796+00:00 | 2026-06-04 12:08:20. Initial conversion from I... | <NA> | <NA> | ICOADS-3-0-0T-125-704-1878-10 | 020N1P |
| 2 | ICOADS-300-020N25 | <NA> | <NA> | [1, 7, 10, 11] | [5, 7, 56] | 0 | Panay | 2 | 2 | 26 | Panay | 1 | 8 | -67.639999 | 42.33 | <NA> | <NA> | 0 | 0 | 4.11552 | 90.0 | <NA> | 0.0 | 0.0 | <NA> | <NA> | 2 | 1878-10-20 10:00:00 | 11 | 3600.0 | 2 | <NA> | <NA> | <NA> | 0 | 4 | <NA> | 2026-06-04 12:08:20.391796+00:00 | 2026-06-04 12:08:20. Initial conversion from I... | <NA> | <NA> | ICOADS-3-0-0T-125-704-1878-10 | 020N25 |
| 3 | ICOADS-300-020N2Q | <NA> | <NA> | [1, 7, 10, 11] | [5, 7, 56] | 0 | Panay | 2 | 2 | 26 | Panay | 1 | 8 | -67.290001 | 42.35 | <NA> | <NA> | 0 | 0 | 4.11552 | 90.0 | <NA> | 0.0 | 0.0 | <NA> | <NA> | 2 | 1878-10-20 12:00:00 | 11 | 3600.0 | 2 | <NA> | <NA> | <NA> | 0 | 4 | <NA> | 2026-06-04 12:08:20.391796+00:00 | 2026-06-04 12:08:20. Initial conversion from I... | <NA> | <NA> | ICOADS-3-0-0T-125-704-1878-10 | 020N2Q |
| 4 | ICOADS-300-020N3A | <NA> | <NA> | [1, 7, 10, 11] | [5, 7, 56] | 0 | Panay | 2 | 2 | 26 | Panay | 1 | 8 | -66.900002 | 42.37 | <NA> | <NA> | 0 | 0 | 4.11552 | 90.0 | <NA> | 0.0 | 0.0 | <NA> | <NA> | 2 | 1878-10-20 14:00:00 | 11 | 3600.0 | 2 | <NA> | <NA> | <NA> | 0 | 4 | <NA> | 2026-06-04 12:08:20.391796+00:00 | 2026-06-04 12:08:20. Initial conversion from I... | <NA> | <NA> | ICOADS-3-0-0T-125-704-1878-10 | 020N3A |
We now show an example of the latitude and longitude:
data.latitude.head(), data.longitude.head()
(0 42.28
1 42.31
2 42.33
3 42.35
4 42.37
Name: latitude, dtype: Float64,
0 -68.410004
1 -68.029999
2 -67.639999
3 -67.290001
4 -66.900002
Name: longitude, dtype: Float64)
data_raw.c99_daily[
[
"lat_deg_on",
"lat_min_on",
"lat_hemis_on",
"lon_deg_of",
"lon_min_of",
"lon_hemis_of",
]
].head()
| lat_deg_on | lat_min_on | lat_hemis_on | lon_deg_of | lon_min_of | lon_hemis_of | |
|---|---|---|---|---|---|---|
| 0 | 42 | 20 | N | 66 | 30 | W |
| 1 | 42 | 20 | N | 66 | 30 | W |
| 2 | 42 | 20 | N | 66 | 30 | W |
| 3 | 42 | 20 | N | 66 | 30 | W |
| 4 | 42 | 20 | N | 66 | 30 | W |
The positions have been converted to decimal degrees with the correct sign for each hemisphere (negative for S and W).
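The arithmetic behind such a conversion can be sketched as follows, using the c99_daily values shown above (42° 20' N, 66° 30' W). The function `dms_to_decimal` is a hypothetical illustration of the standard formula, not the conversion routine used by cdm_reader_mapper.

```python
def dms_to_decimal(degrees: int, minutes: int, hemisphere: str) -> float:
    """Convert degrees, minutes and a hemisphere letter to signed decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60.0
    # Southern latitudes and western longitudes are negative by convention.
    return -value if hemisphere in {"S", "W"} else value


lat = dms_to_decimal(42, 20, "N")
lon = dms_to_decimal(66, 30, "W")
print(round(lat, 2), lon)  # 42.33 -66.5
```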
For the SLP we have additional information:
data_raw.c99_journal[["baro_type", "baro_height", "baro_units"]].head()
| baro_type | baro_height | baro_units | |
|---|---|---|---|
| 0 | 2 | 14 | 1 |
| 1 | 2 | 14 | 1 |
| 2 | 2 | 14 | 1 |
| 3 | 2 | 14 | 1 |
| 4 | 2 | 14 | 1 |
Original code table for the barometer type:
{
"1":"aneroid",
"2":"mercurial"
}
Original code table for the barometer units, which has been left unchanged:
{
"1":"inches",
"2":"millimeters",
"3":"millibars",
"4":"unable to determine",
"5":"Paris inches"
}
The corresponding CDM code table is:
{
"1":1001,
"2":1002,
"3":1003,
"4":9999,
"5":1005
}
Here 9999 is the fill value ("fill_value": 9999), which tells the CDM mapper to treat these entries as NaN.
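The effect of this code table can be illustrated with a plain dictionary lookup. This is only a sketch: `map_baro_units` is a hypothetical helper, and treating unknown codes like the fill value is an assumption made here, not documented behaviour of the CDM mapper.

```python
from __future__ import annotations

# Original deck 704 barometer-unit codes mapped to CDM unit codes (tables above).
BARO_UNITS_TO_CDM = {"1": 1001, "2": 1002, "3": 1003, "4": 9999, "5": 1005}
FILL_VALUE = 9999


def map_baro_units(code: object) -> int | None:
    """Return the CDM unit code, or None when the entry maps to the fill value.

    Assumption: codes missing from the table are treated like the fill value.
    """
    mapped = BARO_UNITS_TO_CDM.get(str(code), FILL_VALUE)
    return None if mapped == FILL_VALUE else mapped


print([map_baro_units(c) for c in ("1", "4", "5")])  # [1001, None, 1005]
```

In the example data, baro_units is 1 (inches), which corresponds to the value 1001 that appears in the observations-slp table.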
data_obs = cdm_tables["observations-slp"]
data_obs.head()
| observation_id | report_id | data_policy_licence | date_time | date_time_meaning | observation_duration | longitude | latitude | crs | z_coordinate | z_coordinate_type | observation_height_above_station_surface | observed_variable | secondary_variable | observation_value | value_significance | secondary_value | units | code_table | conversion_flag | location_method | location_precision | z_coordinate_method | bbox_min_longitude | bbox_max_longitude | bbox_min_latitude | bbox_max_latitude | spatial_representativeness | quality_flag | numerical_precision | sensor_id | sensor_automation_status | exposure_of_sensor | original_precision | original_units | original_code_table | original_value | conversion_method | processing_code | processing_level | adjustment_id | traceability | advanced_qc | advanced_uncertainty | advanced_homogenisation | source_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ICOADS-300-020N16-SLP | ICOADS-300-020N16 | 0 | 1878-10-20 06:00:00 | 2 | 8 | -68.410004 | 42.28 | 0 | 4.27 | 0 | 4.27 | 58 | <NA> | 99610.0 | 2 | <NA> | 32 | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3 | 2 | <NA> | <NA> | 5 | 3 | <NA> | 1001 | <NA> | 996.1 | 7 | <NA> | 3 | <NA> | 2 | 0 | 0 | 0 | ICOADS-3-0-0T-125-704-1878-10 |
| 1 | ICOADS-300-020N1P-SLP | ICOADS-300-020N1P | 0 | 1878-10-20 08:00:00 | 2 | 8 | -68.029999 | 42.31 | 0 | 4.27 | 0 | 4.27 | 58 | <NA> | 99630.0 | 2 | <NA> | 32 | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3 | 2 | <NA> | <NA> | 5 | 3 | <NA> | 1001 | <NA> | 996.3 | 7 | <NA> | 3 | <NA> | 2 | 0 | 0 | 0 | ICOADS-3-0-0T-125-704-1878-10 |
| 2 | ICOADS-300-020N25-SLP | ICOADS-300-020N25 | 0 | 1878-10-20 10:00:00 | 2 | 8 | -67.639999 | 42.33 | 0 | 4.27 | 0 | 4.27 | 58 | <NA> | 99690.0 | 2 | <NA> | 32 | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3 | 2 | <NA> | <NA> | 5 | 3 | <NA> | 1001 | <NA> | 996.9 | 7 | <NA> | 3 | <NA> | 2 | 0 | 0 | 0 | ICOADS-3-0-0T-125-704-1878-10 |
| 3 | ICOADS-300-020N2Q-SLP | ICOADS-300-020N2Q | 0 | 1878-10-20 12:00:00 | 2 | 8 | -67.290001 | 42.35 | 0 | 4.27 | 0 | 4.27 | 58 | <NA> | 99760.0 | 2 | <NA> | 32 | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3 | 2 | <NA> | <NA> | 5 | 3 | <NA> | 1001 | <NA> | 997.6 | 7 | <NA> | 3 | <NA> | 2 | 0 | 0 | 0 | ICOADS-3-0-0T-125-704-1878-10 |
| 4 | ICOADS-300-020N3A-SLP | ICOADS-300-020N3A | 0 | 1878-10-20 14:00:00 | 2 | 8 | -66.900002 | 42.37 | 0 | 4.27 | 0 | 4.27 | 58 | <NA> | 99920.0 | 2 | <NA> | 32 | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3 | 2 | <NA> | <NA> | 5 | 3 | <NA> | 1001 | <NA> | 999.2 | 7 | <NA> | 3 | <NA> | 2 | 0 | 0 | 0 | ICOADS-3-0-0T-125-704-1878-10 |
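The z_coordinate of 4.27 in this table is consistent with the baro_height of 14 read from c99_journal, taken as feet and converted to metres. A minimal check of that arithmetic is shown below; `feet_to_metres` is a hypothetical helper, and rounding to two decimals is an assumption made to match the displayed value.

```python
FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


def feet_to_metres(feet: float, ndigits: int = 2) -> float:
    """Convert a height in feet to metres, rounded to `ndigits` decimals."""
    return round(float(feet) * FEET_TO_METRES, ndigits)


print(feet_to_metres(14))  # 4.27
```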