How to read meteorological data with the `read_mdf` function

from __future__ import annotations

import pandas as pd

from cdm_reader_mapper import properties, read_mdf, test_data

The cdm_reader_mapper.read_mdf function is a tool designed to read data files that comply with a user-specified data model.

It was originally developed to read the IMMA data format and was later extended to handle other meteorological data formats.

Let's look at an example of a typical file from ICOADSv3.0. We pick a specific monthly output for a Source/Deck, in this case data from the Marine Meteorological Journals data set, SID/DCK 125-704, for October 1878.

The .imma file looks like this:

data_path = test_data.test_icoads_r300_d704["source"]

data_ori = pd.read_table(data_path)
data_ori.head()

2025-08-13 11:12:54,043 - root - INFO - Attempting to fetch remote file: icoads/r300/d704/input/icoads_r300_d704_1878-10-01_subset.imma.md5

	18781020 600 4228 29159 130623 10Panay 12325123 9961 4 165 17128704125 5 0 1 1FF111F11AAA1AAAA1AAA 9815020N163002199 0 100200180003Panay 78011118737S.P.Bray,Jr 013231190214 Bulkhead of cabin 1- .1022200200180014Boston Rio de Janeiro 300200180014001518781020 4220N 6630W 10 E 400200180014001518781020102 85 EXS WSW 0629601 58 BOC CU05R
0	18781020 800 4231 29197 130623 10Panay 1...
1	187810201000 4233 29236 130623 10Panay 1...
2	187810201200 4235 29271 130623 10Panay 1...
3	187810201400 4237 29310 130623 10Panay 1...

This is very messy to read directly into Python!
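Part of the mess comes from `pd.read_table` treating the first line of a file as the header by default, so the first record above ends up as a column name. A minimal sketch for looking at the raw records without any parsing is shown below. The helper `peek_records` and the demo file are hypothetical, and the demo lines are only short prefixes of the records shown above, not complete IMMA records.

```python
import tempfile
from itertools import islice
from pathlib import Path


def peek_records(path, n=3):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Demo on a small temporary file (truncated records, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"
    demo_path.write_text(
        "18781020 600 4228\n"
        "18781020 800 4231\n"
        "187810201000 4233\n"
        "187810201200 4235\n",
        encoding="utf-8",
    )
    records = peek_records(demo_path, n=3)

# Every record has the same width, which hints at a fixed-width layout.
for rec in records:
    print(len(rec), repr(rec))
```

Seeing that all records share one width is a quick way to confirm that the file must be sliced by position rather than split on a delimiter.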

This is why we need the mdf_reader tool: it helps us put those .imma files into a pandas.DataFrame format. For that we need a schema.

A schema file gathers a collection of descriptors that enable the mdf_reader tool to access the content of a data model/schema and extract the sections of the raw data file that contain meaningful information. These schema files are the bones of the data model: essentially .json files that outline the structure of the incoming raw data.

The mdf_reader takes this information and translates the characteristics of the data into a Python pandas DataFrame.
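To make the idea of schema-driven reading concrete, here is a minimal sketch of how a set of descriptors could drive the slicing of a fixed-width record. It is not the library's implementation. The descriptor keys `field_length` and `column_type` mirror those used in the schema files, but the element names, widths, the `"int"` type label and the `parse_record` helper are assumptions made for illustration.

```python
# Hypothetical descriptors, shaped like the "elements" entries of a schema file.
ELEMENTS = {
    "YR": {"field_length": 4, "column_type": "int"},
    "MO": {"field_length": 2, "column_type": "int"},
    "DY": {"field_length": 2, "column_type": "int"},
    "HR": {"field_length": 4, "column_type": "int"},
}

# Map the (assumed) type labels onto Python converters.
CONVERTERS = {"int": int, "str": str}


def parse_record(record, elements):
    """Slice a fixed-width record field by field, following the descriptors."""
    row = {}
    pos = 0
    for name, spec in elements.items():
        raw = record[pos:pos + spec["field_length"]].strip()
        pos += spec["field_length"]
        # Blank fields are treated as missing values.
        row[name] = CONVERTERS[spec["column_type"]](raw) if raw else None
    return row


# Leading characters of two of the records shown earlier.
rows = [parse_record(rec, ELEMENTS) for rec in ["18781020 600", "187810201000"]]
for row in rows:
    print(row)
```

The values are kept as raw integers here. Any scaling, decoding or validation that a real schema describes is left out of this sketch.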

The tool has several schema templates built in.

properties.supported_data_models

['craid', 'gdac', 'icoads', 'pub47']
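Before calling the reader, it can be handy to check that a model name starts with one of these supported data models. The sketch below assumes that deck-specific names such as `icoads_r300_d704` are built from underscore-separated parts with the data model first. That naming convention and the helpers `base_model` and `is_supported` are assumptions, not part of the library's API.

```python
# List copied from the output of properties.supported_data_models above.
SUPPORTED_DATA_MODELS = ["craid", "gdac", "icoads", "pub47"]


def base_model(imodel):
    """Return the leading component of a name such as 'icoads_r300_d704'."""
    return imodel.split("_")[0]


def is_supported(imodel, supported=SUPPORTED_DATA_MODELS):
    """True if the leading component is one of the supported data models."""
    return base_model(imodel) in supported


for name in ["icoads_r300_d704", "icoads", "unknown_model"]:
    print(name, is_supported(name))
```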

Schemas can be designed to be deck-specific, like the example below:

schema = "icoads_r300_d704"

data = read_mdf(data_path, imodel=schema)

2025-08-13 11:12:54,483 - root - INFO - READING DATA MODEL SCHEMA FILE...
2025-08-13 11:12:54,489 - root - INFO - EXTRACTING DATA FROM MODEL: icoads_r300_d704
2025-08-13 11:12:54,489 - root - INFO - Getting data string from source...
2025-08-13 11:12:54,491 - root - INFO - Reading with encoding = utf-8
2025-08-13 11:12:54,509 - root - INFO - Extracting and reading sections
2025-08-13 11:12:54,687 - root - WARNING - Data numeric elements with missing upper or lower threshold: ('c1', 'BSI'),('c1', 'AQZ'),('c1', 'AQA'),('c1', 'UQZ'),('c1', 'UQA'),('c1', 'VQZ'),('c1', 'VQA'),('c1', 'PQZ'),('c1', 'PQA'),('c1', 'DQZ'),('c1', 'DQA'),('c5', 'OS'),('c5', 'OP'),('c5', 'FM'),('c5', 'IMMV'),('c5', 'IX'),('c5', 'W2'),('c5', 'WMI'),('c5', 'SD2'),('c5', 'SP2'),('c5', 'IS'),('c5', 'RS'),('c5', 'IC1'),('c5', 'IC2'),('c5', 'IC3'),('c5', 'IC4'),('c5', 'IC5'),('c5', 'IR'),('c5', 'RRR'),('c5', 'TR'),('c5', 'NU'),('c5', 'QCI'),('c5', 'QI1'),('c5', 'QI2'),('c5', 'QI3'),('c5', 'QI4'),('c5', 'QI5'),('c5', 'QI6'),('c5', 'QI7'),('c5', 'QI8'),('c5', 'QI9'),('c5', 'QI10'),('c5', 'QI11'),('c5', 'QI12'),('c5', 'QI13'),('c5', 'QI14'),('c5', 'QI15'),('c5', 'QI16'),('c5', 'QI17'),('c5', 'QI18'),('c5', 'QI19'),('c5', 'QI20'),('c5', 'QI21'),('c5', 'QI22'),('c5', 'QI23'),('c5', 'QI24'),('c5', 'QI25'),('c5', 'QI26'),('c5', 'QI27'),('c5', 'QI28'),('c5', 'QI29'),('c5', 'RHI'),('c5', 'AWSI'),('c6', 'FBSRC'),('c6', 'MST'),('c7', 'OPM'),('c7', 'LOT'),('c9', 'CCe'),('c9', 'WWe'),('c9', 'Ne'),('c9', 'NHe'),('c9', 'He'),('c9', 'CLe'),('c9', 'CMe'),('c9', 'CHe'),('c9', 'SBI'),('c95', 'DPRO'),('c95', 'DPRP'),('c95', 'UFR'),('c95', 'ASIR'),('c96', 'ASII'),('c97', 'ASIE'),('c99_journal', 'vessel_length'),('c99_journal', 'vessel_beam'),('c99_journal', 'hold_depth'),('c99_journal', 'tonnage'),('c99_journal', 'baro_height'),('c99_daily', 'year'),('c99_daily', 'month'),('c99_daily', 'day'),('c99_daily', 'distance'),('c99_daily', 'lat_deg_an'),('c99_daily', 'lat_min_an'),('c99_daily', 'lon_deg_an'),('c99_daily', 'lon_min_an'),('c99_daily', 'lat_deg_on'),('c99_daily', 'lat_min_on'),('c99_daily', 'lon_deg_of'),('c99_daily', 'lon_min_of'),('c99_daily', 'current_speed'),('c99_data4', 'year'),('c99_data4', 'month'),('c99_data4', 'day'),('c99_data4', 'hour'),('c99_data4', 'ship_speed'),('c99_data4', 'compass_correction'),('c99_data4', 'attached_thermometer'),('c99_data4', 
'air_temperature'),('c99_data4', 'wet_bulb_temperature'),('c99_data4', 'sea_temperature'),('c99_data4', 'sky_clear'),('c99_data5', 'year'),('c99_data5', 'month'),('c99_data5', 'day'),('c99_data5', 'hour'),('c99_data5', 'ship_speed'),('c99_data5', 'attached_thermometer'),('c99_data5', 'air_temperature'),('c99_data5', 'wet_bulb_temperature'),('c99_data5', 'sea_temperature'),('c99_data5', 'sky_clear'),('c99_data5', 'compass_correction')
2025-08-13 11:12:54,687 - root - WARNING - Corresponding upper and/or lower bounds set to +/-inf for validation
2025-08-13 11:12:55,183 - root - INFO - Create an output DataBundle object
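The warnings in the log above say that numeric elements without an upper or lower threshold get bounds of +/-inf for validation. The effect of that choice can be sketched as follows. The function `in_range` and the threshold values are hypothetical and only illustrate the principle, not the library's validation code.

```python
import math


def in_range(value, lower=None, upper=None):
    """Check lower <= value <= upper, treating a missing bound as unbounded."""
    lo = -math.inf if lower is None else lower
    hi = math.inf if upper is None else upper
    return lo <= value <= hi


# Hypothetical thresholds: 'tonnage' has no bounds, 'hour' is limited to 0..23.
THRESHOLDS = {"tonnage": {}, "hour": {"lower": 0, "upper": 23}}

checks = {
    "tonnage=1190": in_range(1190, **THRESHOLDS["tonnage"]),
    "hour=14": in_range(14, **THRESHOLDS["hour"]),
    "hour=31": in_range(31, **THRESHOLDS["hour"]),
}
print(checks)
```

With infinite bounds every finite value passes the range check, so elements listed in the warning are effectively not range-validated until thresholds are added to the schema.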

A new schema can be built for a particular deck and source as shown in this notebook. The imma1_d704 schema was built upon the imma1 schema/data model, but extra sections have been added to the .json files to include supplemental data from ICOADS documentation. This is a snapshot of the data inside imma1_d704.json.

"c99_journal": {
            "header": {"sentinal": "1", "field_layout":"fixed_width","length": 117},
            "elements": {
              "sentinal":{
                  "description": "Journal header record identifier",
                  "field_length": 1,
                  "column_type": "str"
              },
              "reel_no":{
                  "description": "Microfilm reel number. See if we want the zero padding or not...",
                  "field_length": 3,
                  "column_type": "str",
                  "LMR6": true
              }
            ...
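A fragment like this can be inspected with the standard `json` module. The sketch below wraps the two elements shown above in a standalone JSON document, which is a simplification of the real file, and lists each element with its width and type. Reading the header `length` as the total width of the section is an assumption here.

```python
import json

# Simplified, standalone copy of the fragment shown above (two elements only).
SCHEMA_FRAGMENT = """
{
  "c99_journal": {
    "header": {"sentinal": "1", "field_layout": "fixed_width", "length": 117},
    "elements": {
      "sentinal": {
        "description": "Journal header record identifier",
        "field_length": 1,
        "column_type": "str"
      },
      "reel_no": {
        "description": "Microfilm reel number",
        "field_length": 3,
        "column_type": "str",
        "LMR6": true
      }
    }
  }
}
"""

schema = json.loads(SCHEMA_FRAGMENT)
section = schema["c99_journal"]

# One (name, width, type) tuple per element, in file order.
summary = [
    (name, spec["field_length"], spec["column_type"])
    for name, spec in section["elements"].items()
]
described_length = sum(width for _, width, _ in summary)

# Assumption: the header 'length' is the total width of the section.
remaining = section["header"]["length"] - described_length

print(summary)
print("described:", described_length, "remaining:", remaining)
```

A check like this helps when writing a new schema: under the assumption above, the widths of all elements should add up to the section length once every element is described.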

Now the metadata can be extracted as a component of the pandas DataFrame.

data.data.c99_journal

	sentinel	reel_no	journal_no	frame_no	ship_name	journal_ed	rig	ship_material	vessel_type	vessel_length	...	hold_depth	tonnage	baro_type	baro_height	baro_cdate	baro_loc	baro_units	baro_cor	thermo_mount	SST_I
0	1	002	0018	0003	Panay	78	01	1	1	187	...	23	1190	2	14	None	Bulkhead of cabin	1	- .102	2	None
1	1	002	0018	0003	Panay	78	01	1	1	187	...	23	1190	2	14	None	Bulkhead of cabin	1	- .102	2	None
2	1	002	0018	0003	Panay	78	01	1	1	187	...	23	1190	2	14	None	Bulkhead of cabin	1	- .102	2	None
3	1	002	0018	0003	Panay	78	01	1	1	187	...	23	1190	2	14	None	Bulkhead of cabin	1	- .102	2	None
4	1	002	0018	0003	Panay	78	01	1	1	187	...	23	1190	2	14	None	Bulkhead of cabin	1	- .102	2	None

5 rows × 24 columns

To learn how to construct a schema or data model for a particular deck/source, visit this other tutorial notebook.
