{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# How to read meteorological data with the `read_mdf` function"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "from __future__ import annotations\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "from cdm_reader_mapper import properties, read_mdf, test_data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `cdm_reader_mapper.read_mdf` function is a tool designed to read data files compliant with a user-specified [data\n",
        "model](https://cds.climate.copernicus.eu/toolbox/doc/how-to/15_how_to_understand_the_common_data_model/15_how_to_understand_the_common_data_model.html).\n",
        "\n",
        "It was initially developed to read the [IMMA](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf) data format and was later extended to handle other meteorological data formats.\n",
        "\n",
        "Let's look at an example of a typical file from [ICOADSv3.0](https://icoads.noaa.gov/r3.html). We pick a specific monthly output for one Source/Deck, in this case data from the Marine Meteorological Journals data set, SID/DCK **125-704 for Oct 1878**.\n",
        "\n",
        "The `.imma` file looks like this:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>18781020 600 4228 29159 130623  10Panay      12325123       9961                         4                   165 17128704125 5 0  1                1FF111F11AAA1AAAA1AAA     9815020N163002199 0 100200180003Panay                     78011118737S.P.Bray,Jr    013231190214        Bulkhead of cabin        1- .1022200200180014Boston              Rio de Janeiro      300200180014001518781020               4220N 6630W 10 E      400200180014001518781020102 85 EXS             WSW           0629601 58             BOC  CU05R</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>18781020 800 4231 29197 130623  10Panay      1...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>187810201000 4233 29236 130623  10Panay      1...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>187810201200 4235 29271 130623  10Panay      1...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>187810201400 4237 29310 130623  10Panay      1...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "  18781020 600 4228 29159 130623  10Panay      12325123       9961                         4                   165 17128704125 5 0  1                1FF111F11AAA1AAAA1AAA     9815020N163002199 0 100200180003Panay                     78011118737S.P.Bray,Jr    013231190214        Bulkhead of cabin        1- .1022200200180014Boston              Rio de Janeiro      300200180014001518781020               4220N 6630W 10 E      400200180014001518781020102 85 EXS             WSW           0629601 58             BOC  CU05R\n",
              "0  18781020 800 4231 29197 130623  10Panay      1...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
              "1  187810201000 4233 29236 130623  10Panay      1...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
              "2  187810201200 4235 29271 130623  10Panay      1...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \n",
              "3  187810201400 4237 29310 130623  10Panay      1...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   "
            ]
          },
          "execution_count": 2,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data_path = test_data.test_icoads_r300_d704[\"source\"]\n",
        "\n",
        "data_ori = pd.read_table(data_path)\n",
        "data_ori.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This is very messy to read directly into Python!\n",
        "\n",
        "This is why we need the `mdf_reader` tool: it helps us put those IMMA files into a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) format. For that we need a **schema**.\n",
        "\n",
        "A **schema** file gathers a collection of descriptors that enable the `mdf_reader` tool to access the content\n",
        "of a `data model/ schema` and extract the sections of the raw data file that contain meaningful information. These **schema files** are the `bones` of the data model: basically `.json` files outlining the structure of the incoming raw data.\n",
        "\n",
        "The `mdf_reader` takes this information and translates the characteristics of the data into a pandas DataFrame.\n",
        "\n",
        "The tool has several **schema** templates built in."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "typing.Literal['craid', 'gdac', 'icoads', 'pub47', 'marob', 'cmems']"
            ]
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "properties.SupportedDataModels"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Schemas** can be designed to be deck-specific, like the example below:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c99_sentinel', 'BLK')'\n",
            "C:\\Users\\llierham\\mobaxterm\\github_ll\\cdm_reader_mapper\\src\\cdm_reader_mapper\\mdf_reader\\utils\\validators.py:240: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n",
            "  to_bool = data[validated_columns].applymap(convert_str_boolean)\n",
            "C:\\Users\\llierham\\mobaxterm\\github_ll\\cdm_reader_mapper\\src\\cdm_reader_mapper\\mdf_reader\\utils\\validators.py:241: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n",
            "  false_mask = to_bool.applymap(_is_false)\n",
            "C:\\Users\\llierham\\mobaxterm\\github_ll\\cdm_reader_mapper\\src\\cdm_reader_mapper\\mdf_reader\\utils\\validators.py:242: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n",
            "  true_mask = to_bool.applymap(_is_true)\n"
          ]
        }
      ],
      "source": [
        "schema = \"icoads_r300_d704\"\n",
        "\n",
        "data = read_mdf(data_path, imodel=schema)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A new **schema** can be built for a particular deck and source, as shown in this notebook. The `imma1_d704` schema was built upon the `imma1` schema/data model, but extra sections have been added to the `.json` files to include supplemental data from the ICOADS documentation. This is a snapshot of the data inside `imma1_d704.json`:\n",
        "\n",
        "```\n",
        "\"c99_journal\": {\n",
        "            \"header\": {\"sentinal\": \"1\", \"field_layout\":\"fixed_width\",\"length\": 117},\n",
        "            \"elements\": {\n",
        "              \"sentinal\":{\n",
        "                  \"description\": \"Journal header record identifier\",\n",
        "                  \"field_length\": 1,\n",
        "                  \"column_type\": \"str\"\n",
        "              },\n",
        "              \"reel_no\":{\n",
        "                  \"description\": \"Microfilm reel number. See if we want the zero padding or not...\",\n",
        "                  \"field_length\": 3,\n",
        "                  \"column_type\": \"str\",\n",
        "                  \"LMR6\": true\n",
        "              }\n",
        "            ...\n",
        "```"
      ]
    },
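    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the role of `field_length` concrete, here is a minimal sketch of how a fixed-width section description could be applied to a raw string. It is not the `mdf_reader` implementation: the `section` dictionary is a simplified, hypothetical stand-in for the schema file, and `parse_fixed_width` is an illustrative helper that does not exist in the library.\n",
        "\n",
        "```python\n",
        "# Hypothetical, simplified section description (not the shipped schema file).\n",
        "section = {\n",
        "    \"header\": {\"field_layout\": \"fixed_width\", \"length\": 8},\n",
        "    \"elements\": {\n",
        "        \"sentinel\": {\"field_length\": 1, \"column_type\": \"str\"},\n",
        "        \"reel_no\": {\"field_length\": 3, \"column_type\": \"str\"},\n",
        "        \"journal_no\": {\"field_length\": 4, \"column_type\": \"str\"},\n",
        "    },\n",
        "}\n",
        "\n",
        "\n",
        "def parse_fixed_width(raw, section):\n",
        "    # Walk the elements in order and cut one slice per field_length.\n",
        "    record, start = {}, 0\n",
        "    for name, spec in section[\"elements\"].items():\n",
        "        end = start + spec[\"field_length\"]\n",
        "        record[name] = raw[start:end].strip() or None  # blank field -> None\n",
        "        start = end\n",
        "    return record\n",
        "\n",
        "\n",
        "# Leading characters of the journal data seen in the raw record above.\n",
        "record = parse_fixed_width(\"10020018\", section)\n",
        "print(record)\n",
        "```\n",
        "\n",
        "The only point of the sketch is that the ordered `field_length` values determine where each element starts and ends; the actual reader also handles things such as validation, which are ignored here."
      ]
    },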
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now metadata information can be extracted as a component of the pandas DataFrame."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>sentinel</th>\n",
              "      <th>reel_no</th>\n",
              "      <th>journal_no</th>\n",
              "      <th>frame_no</th>\n",
              "      <th>ship_name</th>\n",
              "      <th>journal_ed</th>\n",
              "      <th>rig</th>\n",
              "      <th>ship_material</th>\n",
              "      <th>vessel_type</th>\n",
              "      <th>vessel_length</th>\n",
              "      <th>...</th>\n",
              "      <th>hold_depth</th>\n",
              "      <th>tonnage</th>\n",
              "      <th>baro_type</th>\n",
              "      <th>baro_height</th>\n",
              "      <th>baro_cdate</th>\n",
              "      <th>baro_loc</th>\n",
              "      <th>baro_units</th>\n",
              "      <th>baro_cor</th>\n",
              "      <th>thermo_mount</th>\n",
              "      <th>SST_I</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>002</td>\n",
              "      <td>0018</td>\n",
              "      <td>0003</td>\n",
              "      <td>Panay</td>\n",
              "      <td>78</td>\n",
              "      <td>01</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>187</td>\n",
              "      <td>...</td>\n",
              "      <td>23</td>\n",
              "      <td>1190</td>\n",
              "      <td>2</td>\n",
              "      <td>14</td>\n",
              "      <td>None</td>\n",
              "      <td>Bulkhead of cabin</td>\n",
              "      <td>1</td>\n",
              "      <td>- .102</td>\n",
              "      <td>2</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>002</td>\n",
              "      <td>0018</td>\n",
              "      <td>0003</td>\n",
              "      <td>Panay</td>\n",
              "      <td>78</td>\n",
              "      <td>01</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>187</td>\n",
              "      <td>...</td>\n",
              "      <td>23</td>\n",
              "      <td>1190</td>\n",
              "      <td>2</td>\n",
              "      <td>14</td>\n",
              "      <td>None</td>\n",
              "      <td>Bulkhead of cabin</td>\n",
              "      <td>1</td>\n",
              "      <td>- .102</td>\n",
              "      <td>2</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>1</td>\n",
              "      <td>002</td>\n",
              "      <td>0018</td>\n",
              "      <td>0003</td>\n",
              "      <td>Panay</td>\n",
              "      <td>78</td>\n",
              "      <td>01</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>187</td>\n",
              "      <td>...</td>\n",
              "      <td>23</td>\n",
              "      <td>1190</td>\n",
              "      <td>2</td>\n",
              "      <td>14</td>\n",
              "      <td>None</td>\n",
              "      <td>Bulkhead of cabin</td>\n",
              "      <td>1</td>\n",
              "      <td>- .102</td>\n",
              "      <td>2</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1</td>\n",
              "      <td>002</td>\n",
              "      <td>0018</td>\n",
              "      <td>0003</td>\n",
              "      <td>Panay</td>\n",
              "      <td>78</td>\n",
              "      <td>01</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>187</td>\n",
              "      <td>...</td>\n",
              "      <td>23</td>\n",
              "      <td>1190</td>\n",
              "      <td>2</td>\n",
              "      <td>14</td>\n",
              "      <td>None</td>\n",
              "      <td>Bulkhead of cabin</td>\n",
              "      <td>1</td>\n",
              "      <td>- .102</td>\n",
              "      <td>2</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>1</td>\n",
              "      <td>002</td>\n",
              "      <td>0018</td>\n",
              "      <td>0003</td>\n",
              "      <td>Panay</td>\n",
              "      <td>78</td>\n",
              "      <td>01</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>187</td>\n",
              "      <td>...</td>\n",
              "      <td>23</td>\n",
              "      <td>1190</td>\n",
              "      <td>2</td>\n",
              "      <td>14</td>\n",
              "      <td>None</td>\n",
              "      <td>Bulkhead of cabin</td>\n",
              "      <td>1</td>\n",
              "      <td>- .102</td>\n",
              "      <td>2</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>5 rows × 24 columns</p>\n",
              "</div>"
            ],
            "text/plain": [
              "  sentinel reel_no journal_no frame_no ship_name journal_ed rig ship_material  \\\n",
              "0        1     002       0018     0003     Panay         78  01             1   \n",
              "1        1     002       0018     0003     Panay         78  01             1   \n",
              "2        1     002       0018     0003     Panay         78  01             1   \n",
              "3        1     002       0018     0003     Panay         78  01             1   \n",
              "4        1     002       0018     0003     Panay         78  01             1   \n",
              "\n",
              "  vessel_type  vessel_length  ...  hold_depth tonnage baro_type baro_height  \\\n",
              "0           1            187  ...          23    1190         2          14   \n",
              "1           1            187  ...          23    1190         2          14   \n",
              "2           1            187  ...          23    1190         2          14   \n",
              "3           1            187  ...          23    1190         2          14   \n",
              "4           1            187  ...          23    1190         2          14   \n",
              "\n",
              "   baro_cdate           baro_loc baro_units  baro_cor thermo_mount SST_I  \n",
              "0        None  Bulkhead of cabin          1    - .102            2  None  \n",
              "1        None  Bulkhead of cabin          1    - .102            2  None  \n",
              "2        None  Bulkhead of cabin          1    - .102            2  None  \n",
              "3        None  Bulkhead of cabin          1    - .102            2  None  \n",
              "4        None  Bulkhead of cabin          1    - .102            2  None  \n",
              "\n",
              "[5 rows x 24 columns]"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data.data.c99_journal"
      ]
    },
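    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The section access used above can be illustrated with plain pandas. The sketch below assumes that the reader output uses two-level `(section, element)` column labels, as the warning messages earlier in this notebook suggest (for example `('c8', 'PUID')`). The small frame is built by hand for illustration and is not produced by `read_mdf`.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Illustrative (section, element) column labels; the values are copied\n",
        "# from the record shown above.\n",
        "columns = pd.MultiIndex.from_tuples(\n",
        "    [\n",
        "        (\"core\", \"YR\"),\n",
        "        (\"core\", \"MO\"),\n",
        "        (\"c99_journal\", \"ship_name\"),\n",
        "        (\"c99_journal\", \"reel_no\"),\n",
        "    ]\n",
        ")\n",
        "df = pd.DataFrame(\n",
        "    [[1878, 10, \"Panay\", \"002\"], [1878, 10, \"Panay\", \"002\"]],\n",
        "    columns=columns,\n",
        ")\n",
        "\n",
        "journal = df[\"c99_journal\"]  # every element of one section\n",
        "ship_name = df[(\"c99_journal\", \"ship_name\")]  # one element as a Series\n",
        "print(journal.columns.tolist())\n",
        "```\n",
        "\n",
        "Selecting by the first level returns a DataFrame holding only that section's elements, while a full `(section, element)` tuple returns a single column."
      ]
    },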
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To learn how to construct a schema or data model for a particular deck/source, see this other [tutorial notebook](https://github.com/glamod/cdm_reader_mapper/blob/main/docs/example_notebooks/CLIWOC_datamodel.ipynb)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.13.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}