{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Generating a data model for CLIWOC\n",
        "\n",
        "The purpose of this notebook is to demonstrate the structure of data models used by the `cdm_reader_mapper` toolbox.\n",
        "\n",
        "## ICOADS IMMA\n",
        "\n",
        "A common format for marine observational records is the ICOADS IMMA format. This is a text format in which each line contains the data (including metadata) for an individual record. The format is _attachment_-based: each record is constructed from a selection of (typically) fixed-width sections, called attachments, that contain different subsets of the data or metadata associated with the record. Documentation on the format and the available attachments can be found at [https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf).\n",
        "\n",
        "Records within the same file can contain different attachments, so line lengths vary between records and the IMMA format as a whole is not fixed-width. Each record must, however, contain a certain subset of the attachments (here the `core` (or `c0`), `c1`, and `c98` attachments).\n",
        "\n",
        "## Supplementary Data\n",
        "\n",
        "Additional data or metadata can be provided in the `c99` attachment. This attachment has no fixed-width layout common to all records, because different sources or decks can provide different collections of supplementary data.\n",
        "\n",
        "## CLIWOC\n",
        "\n",
        "In this example we use a subset of ICOADS release 3.0.0 IMMA formatted data for deck 730, which is data from the Climatological Database for the World's Oceans (CLIWOC). There is a large amount of supplementary data available in the `c99` attachment, which for deck 730 can be split into multiple sections. Here, we will start with the standard schema for the ICOADS IMMA format (included in `cdm_reader_mapper` as the `\"icoads\"` `imodel`), and extend the schema with fields for a subset of the `c99` attachment. We will add fields for the _logbook_ section of the `c99` attachment for this deck.\n",
        "\n",
        "An internal schema already exists for this deck (`\"icoads_r300_d730\"`); the purpose of this notebook is to demonstrate how one can extend the `\"icoads\"` data model to parse `c99` data.\n",
        "\n",
        "## Overview\n",
        "\n",
        "* An initial read of the data subset using the `\"icoads\"` data model, which does not parse the `c99` attachment.\n",
        "* Extension of the `\"icoads\"` schema to add fields for the logbook section of the `c99` attachment for deck 730.\n",
        "* Construction of a code table for a categorical field in the `c99` attachment.\n",
        "* Comparison with the internal schema for deck 730."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "from __future__ import annotations\n",
        "\n",
        "import json\n",
        "import pathlib\n",
        "import shutil\n",
        "import warnings\n",
        "from collections import OrderedDict\n",
        "from tempfile import TemporaryDirectory\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "from cdm_reader_mapper import read_mdf, test_data\n",
        "from cdm_reader_mapper.mdf_reader.properties import _base as base\n",
        "\n",
        "try:\n",
        "    from importlib.resources import files as get_files\n",
        "except ImportError:\n",
        "    # Fallback for Python < 3.9, where importlib.resources.files is unavailable\n",
        "    from importlib_resources import files as get_files\n",
        "\n",
        "# Silence Python warnings; log messages such as WARNING:root:... are still shown\n",
        "warnings.filterwarnings(\"ignore\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## The Data\n",
        "\n",
        "For this example we use a subset of ICOADS data for deck 730 from the `cdm_reader_mapper` test data; `test_data.test_icoads_r300_d730[\"source\"]` gives the path to the raw data file. This is the data that will be used throughout this notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_file_path = test_data.test_icoads_r300_d730[\"source\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initial Read\n",
        "\n",
        "First we read the data using the basic `\"icoads\"` data model. This step is not necessary for extending the schema; it only serves to show the raw, unparsed `c99` data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'\n"
          ]
        }
      ],
      "source": [
        "data_bundle = read_mdf(data_file_path, imodel=\"icoads\")\n",
        "data_raw = data_bundle.data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Supplementary (`c99`) data\n",
        "\n",
        "Looking at the `c99` section, we can see that the supplementary data has not been parsed: each record holds a single raw string."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "0    99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...\n",
              "1    99 0 CARAN   CENTRE D'ACCUEIL ET DE RECHERCHE ...\n",
              "2    99 0 RAZ     RIJKSARCHIEF ZEELAND             ...\n",
              "3    99 0 NMM     NATIONAL MARITIME MUSEUM         ...\n",
              "4    99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...\n",
              "Name: c99, dtype: object"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data_raw[\"c99\"].head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'99 0 NMM     NATIONAL MARITIME MUSEUM                          GREENWICH UNITED KINGDOM                                                                                                      NMM ADM/L/R13                 ENGLISH                       0492500N 405000E                1 1BERMUDA                                    LIZARD                                            N87:17E            230                                                                                                                                                0 21771100112TUESDAY                   12             RAINBOW                       BRITISH 5TH RATE       RN                                THOMAS COLLINGWOOD            CAPTAIN                  CHARLES WARREN                2ND OFFICER/LIEUTENANT                                                          BERMUDA                                      SPITHEAD                                          0                                                        17710829S25E                  39.00                                                                                                        UNKNOWN        UNKNOWN         -22                    LEAGUESNM     180 DEGREES                                                                                                                                                                                                 ESE, E                                                                                                                                                                         FRESH GALES AND SQUALLY                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                       00000000CLIWOC VERSION 2.0'"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data_raw[\"c99\"].iloc[3]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Creating a data model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Custom Schema\n",
        "\n",
        "To use a custom schema we pass its location to `read_mdf` via the `ext_schema_path` argument (custom code tables are passed via `ext_table_path`). The structure of the data model directory is:\n",
        "\n",
        "```\n",
        "name_of_model/\n",
        "    name_of_model.json\n",
        "    code_tables/\n",
        "        ...\n",
        "```\n",
        "\n",
        "The `code_tables` sub-directory contains the code tables that map the values of key (categorical) columns in the data to their descriptions.\n",
        "\n",
        "In this example we create a temporary directory for the data model, so that it is cleaned up after the notebook is finished; in practice you would store the data model in a permanent directory!\n",
        "\n",
        "We start from the basic `\"icoads\"` model. The `c99` section will be based on the `\"icoads_r300_d730\"` schema and code tables.\n",
        "\n",
        "#### Copy the `\"icoads\"` schema\n",
        "\n",
        "First we create a copy of the `\"icoads\"` schema (located at `mdf_reader/schemas/icoads/icoads.json`). NOTE: `_base`, imported from `cdm_reader_mapper.mdf_reader.properties`, is the base module path, so the original schema and code tables can be located relative to the installed package."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Temporary directory for the data model (cleaned up automatically)\n",
        "tmp_dir = TemporaryDirectory()\n",
        "my_model_name = \"cliwoc\"\n",
        "my_model_path = pathlib.Path(tmp_dir.name) / my_model_name\n",
        "my_model_path.mkdir(exist_ok=True)\n",
        "\n",
        "# Locate the \"icoads\" schema inside the installed package\n",
        "icoads_schema_dir = get_files(f\"{base}.schemas.icoads\")\n",
        "icoads_schema_path = pathlib.Path(icoads_schema_dir) / \"icoads.json\"\n",
        "\n",
        "# Copy it to <my_model_path>/<my_model_name>.json\n",
        "my_schema_path = my_model_path / (my_model_name + \".json\")\n",
        "_ = shutil.copyfile(icoads_schema_path, my_schema_path)  # assigned to suppress the returned path"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Copy the code tables\n",
        "\n",
        "We now copy each of the generic `\"icoads\"` code tables (located in `mdf_reader/codes/icoads`) into the `code_tables` directory of our model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Get code tables and copy to the directory\n",
        "my_code_tables_path = my_model_path / \"code_tables\"\n",
        "my_code_tables_path.mkdir(exist_ok=True)\n",
        "\n",
        "# Original code table directory (general ICOADS)\n",
        "icoads_code_tables_path = get_files(f\"{base}.codes.icoads\")\n",
        "\n",
        "# Get filenames for each of the code tables\n",
        "code_table_files = list(icoads_code_tables_path.glob(\"ICOADS.*.json\"))\n",
        "\n",
        "# Copy each file\n",
        "for file in code_table_files:\n",
        "    basename = pathlib.Path(file).name\n",
        "    out_path = my_code_tables_path / basename\n",
        "    shutil.copyfile(file, out_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Extending the schema: CLIWOC logbook information\n",
        "\n",
        "For this example we load the schema into memory as a dictionary. We use an `OrderedDict` to guarantee that the order of the fields is maintained, since the fields of a fixed-width section are read in the order in which they are listed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [],
      "source": [
        "with my_schema_path.open() as f:\n",
        "    schema = json.load(f, object_pairs_hook=OrderedDict)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now add the contents for section `c99`. First there are some standard (`\"header\"`) fields we need to supply. The `\"sentinal\"` is the prefix for the attachment: it is printed at the start of the raw supplementary data and identifies the start of the attachment. Setting `\"disable_read\"` to `False` enables reading of the section.\n",
        "\n",
        "We also need to specify the layout of the attachment (`\"fixed_width\"`) and its total length, including the 5-character sentinal.\n",
        "\n",
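        "As a rough illustration of what a `\"fixed_width\"` layout means, here is a minimal sketch (not the `cdm_reader_mapper` implementation). The hypothetical helper `split_fixed_width` slices a record into consecutive pieces whose widths are the `field_length` values, assuming the fields follow each other in schema order. The record is built by hand from the first three fields of the raw `c99` string shown earlier.\n",
        "\n",
        "```python\n",
        "# Hypothetical helper, NOT part of cdm_reader_mapper: slice a fixed-width\n",
        "# record into consecutive fields, assuming fields follow each other in order.\n",
        "def split_fixed_width(record, fields):\n",
        "    values = {}\n",
        "    pos = 0\n",
        "    for name, length in fields:\n",
        "        # Strip the space padding; empty fields become None\n",
        "        values[name] = record[pos : pos + length].strip() or None\n",
        "        pos += length\n",
        "    return values\n",
        "\n",
        "\n",
        "fields = [(\"sentinal\", 5), (\"InstAbbr\", 8), (\"InstName\", 50)]\n",
        "record = \"99 0 \" + \"NMM\".ljust(8) + \"NATIONAL MARITIME MUSEUM\".ljust(50)\n",
        "print(split_fixed_width(record, fields))\n",
        "```\n",
        "\n",
        "With the field lengths defined below, the twelve logbook fields add up to 245 characters; together with the 5-character sentinal this gives the section `length` of `245 + 5`.\n",
        "\n",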
        "We then add our data fields to the `elements` of the `c99` section. We only add the fields for the logbook component of the CLIWOC supplementary data; there are additional components that could be resolved in the same way, but we limit this example to the logbook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [],
      "source": [
        "schema[\"sections\"][\"c99\"][\"header\"][\"sentinal\"] = \"99 0 \"\n",
        "schema[\"sections\"][\"c99\"][\"header\"][\"disable_read\"] = False\n",
        "schema[\"sections\"][\"c99\"][\"header\"][\"field_layout\"] = \"fixed_width\"\n",
        "schema[\"sections\"][\"c99\"][\"header\"][\"length\"] = 245 + 5  # logbook fields (245) + sentinal (5)\n",
        "schema[\"sections\"][\"c99\"][\"elements\"] = OrderedDict(\n",
        "    {\n",
        "        \"sentinal\": {\n",
        "            \"description\": \"attachment sentinal\",\n",
        "            \"field_length\": 5,\n",
        "            \"column_type\": \"str\",\n",
        "            \"ignore\": True,\n",
        "        },\n",
        "        \"InstAbbr\": {\n",
        "            \"description\": \"Abbreviation of the Institute storing the original data\",\n",
        "            \"field_length\": 8,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"InstName\": {\n",
        "            \"description\": \"Full name of the Institute storing the original data\",\n",
        "            \"field_length\": 50,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"InstCity\": {\n",
        "            \"description\": \"City where the Institute storing the data is located\",\n",
        "            \"field_length\": 10,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"InstCountry\": {\n",
        "            \"description\": \"Country where the Institute storing the data is located\",\n",
        "            \"field_length\": 14,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"ArchiveID\": {\n",
        "            \"description\": \"Administrative number under which the data is found within the Institute storing the data\",\n",
        "            \"field_length\": 15,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"ArchiveName\": {\n",
        "            \"description\": \"Administrative name under which the data is found within the Institute storing the data\",\n",
        "            \"field_length\": 17,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"ArchivePart\": {\n",
        "            \"description\": \"Part of the archive set in which the data is found within the Institute storing the data\",\n",
        "            \"field_length\": 39,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"ArchivePartSpec\": {\n",
        "            \"description\": \"Specification of the part of the archive set in which the data is found within the Institute storing the data\",\n",
        "            \"field_length\": 31,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"LogbookID\": {\n",
        "            \"description\": \"Identification Number of the logbook containing the data\",\n",
        "            \"field_length\": 30,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"LogbookLang\": {\n",
        "            \"description\": \"Language of the logbook containing the data\",\n",
        "            \"field_length\": 7,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"ImageID\": {\n",
        "            \"description\": \"Identification Number of the original image of the logbook\",\n",
        "            \"field_length\": 23,\n",
        "            \"column_type\": \"str\",\n",
        "        },\n",
        "        \"IllustrationAvail\": {\n",
        "            \"description\": \"Illustration available on the current page of the logbook\",\n",
        "            \"field_length\": 1,\n",
        "            \"column_type\": \"key\",\n",
        "            \"codetable\": \"CLIWOC_ILLUSTRATION_I\",\n",
        "        },\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can now write the dictionary to the schema file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {},
      "outputs": [],
      "source": [
        "json_object = json.dumps(schema, indent=2)\n",
        "\n",
        "with pathlib.Path(my_schema_path).open(\"w\") as outfile:\n",
        "    outfile.write(json_object)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### `IllustrationAvail` Code Table\n",
        "\n",
        "One of the fields we have added has a `\"column_type\"` of `\"key\"`. This indicates categorical data, where each key value maps to a longer descriptive value. We also specified a code table for this field (`\"CLIWOC_ILLUSTRATION_I\"`), which describes that mapping. Let's create that table now. As with the schema, it must be JSON formatted.\n",
        "\n",
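        "Conceptually, a code table is a mapping from key to description. As a minimal sketch using pandas' `Series.map` (not necessarily how `cdm_reader_mapper` applies code tables internally), the values of a key column can be translated as follows; the key values are made up for illustration. JSON object keys are always strings, so the keys are written as `\"0\"` and `\"1\"`.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Code table for IllustrationAvail: key -> description\n",
        "illustration_avail_codes = {\n",
        "    \"0\": \"No illustration on the current logbook page.\",\n",
        "    \"1\": \"Illustration available on the current logbook page.\",\n",
        "}\n",
        "\n",
        "# Made-up key values, as they would be read from the 1-character field\n",
        "keys = pd.Series([\"0\", \"1\", \"0\"])\n",
        "descriptions = keys.map(illustration_avail_codes)\n",
        "print(descriptions.tolist())\n",
        "```\n",
        "\n",
        "With this approach, keys that are missing from the table map to `NaN`.\n",
        "\n",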
        "This field has two possible values. We save the dictionary to a JSON file in the `code_tables` directory; the name of the file must match the `\"codetable\"` value for the field, plus the `\".json\"` extension."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {},
      "outputs": [],
      "source": [
        "illustration_avail_codes = {\n",
        "    \"0\": \"No illustration on the current logbook page.\",\n",
        "    \"1\": \"Illustration available on the current logbook page.\",\n",
        "}\n",
        "illustration_avail_path = my_code_tables_path / \"CLIWOC_ILLUSTRATION_I.json\"\n",
        "\n",
        "json_object = json.dumps(illustration_avail_codes, indent=2)\n",
        "\n",
        "with pathlib.Path(illustration_avail_path).open(\"w\") as outfile:\n",
        "    outfile.write(json_object)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Reading\n",
        "\n",
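        "We can now read the data file with the schema we have just created by copying and extending the `\"icoads\"` schema. We pass the path to the data model (the directory containing the schema JSON file) as `ext_schema_path` and the path to the code tables as `ext_table_path`. The `imodel is not defined` message is logged because no internal `imodel` is given; the external schema is used instead, as the parsed output below shows."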
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "ERROR:root:imodel is not defined.\n"
          ]
        }
      ],
      "source": [
        "my_bundle = read_mdf(\n",
        "    data_file_path,  # Path to the data file\n",
        "    ext_schema_path=my_model_path,  # Path to the directory containing the schema json file\n",
        "    ext_table_path=my_code_tables_path,  # Path to the directory containing the json code tables\n",
        ")\n",
        "my_data = my_bundle.data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Analysing the output\n",
        "\n",
        "We can now inspect the parsed fields of the `c99` section."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead tr th {\n",
              "        text-align: left;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th colspan=\"12\" halign=\"left\">c99</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th>InstAbbr</th>\n",
              "      <th>InstName</th>\n",
              "      <th>InstCity</th>\n",
              "      <th>InstCountry</th>\n",
              "      <th>ArchiveID</th>\n",
              "      <th>ArchiveName</th>\n",
              "      <th>ArchivePart</th>\n",
              "      <th>ArchivePartSpec</th>\n",
              "      <th>LogbookID</th>\n",
              "      <th>LogbookLang</th>\n",
              "      <th>ImageID</th>\n",
              "      <th>IllustrationAvail</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>AGI</td>\n",
              "      <td>ARCHIVO GENERAL DE INDIAS</td>\n",
              "      <td>SEVILLE</td>\n",
              "      <td>SPAIN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>None</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>CORREOS, 275A R11</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CARAN</td>\n",
              "      <td>CENTRE D'ACCUEIL ET DE RECHERCHE DES ARCH. NAT...</td>\n",
              "      <td>PARIS</td>\n",
              "      <td>FRANCE</td>\n",
              "      <td>NaN</td>\n",
              "      <td>None</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>COTE - 4/JJ/39</td>\n",
              "      <td>FRENCH</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>RAZ</td>\n",
              "      <td>RIJKSARCHIEF ZEELAND</td>\n",
              "      <td>MIDDELBURG</td>\n",
              "      <td>NEDERLAND</td>\n",
              "      <td>20</td>\n",
              "      <td>None</td>\n",
              "      <td>MCC</td>\n",
              "      <td>1391</td>\n",
              "      <td>MCC_20_1391</td>\n",
              "      <td>DUTCH</td>\n",
              "      <td>MCC_20_1391_0032</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>NMM</td>\n",
              "      <td>NATIONAL MARITIME MUSEUM</td>\n",
              "      <td>GREENWICH</td>\n",
              "      <td>UNITED KINGDOM</td>\n",
              "      <td>NaN</td>\n",
              "      <td>None</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NMM ADM/L/R13</td>\n",
              "      <td>ENGLISH</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>AGI</td>\n",
              "      <td>ARCHIVO GENERAL DE INDIAS</td>\n",
              "      <td>SEVILLE</td>\n",
              "      <td>SPAIN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>None</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>CORREOS, 193B R3</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       c99                                                                 \\\n",
              "  InstAbbr                                           InstName    InstCity   \n",
              "0      AGI                          ARCHIVO GENERAL DE INDIAS     SEVILLE   \n",
              "1    CARAN  CENTRE D'ACCUEIL ET DE RECHERCHE DES ARCH. NAT...       PARIS   \n",
              "2      RAZ                               RIJKSARCHIEF ZEELAND  MIDDELBURG   \n",
              "3      NMM                           NATIONAL MARITIME MUSEUM   GREENWICH   \n",
              "4      AGI                          ARCHIVO GENERAL DE INDIAS     SEVILLE   \n",
              "\n",
              "                                                                     \\\n",
              "      InstCountry ArchiveID ArchiveName ArchivePart ArchivePartSpec   \n",
              "0           SPAIN       NaN        None         NaN             NaN   \n",
              "1          FRANCE       NaN        None         NaN             NaN   \n",
              "2       NEDERLAND        20        None         MCC            1391   \n",
              "3  UNITED KINGDOM       NaN        None         NaN             NaN   \n",
              "4           SPAIN       NaN        None         NaN             NaN   \n",
              "\n",
              "                                                                      \n",
              "           LogbookID LogbookLang           ImageID IllustrationAvail  \n",
              "0  CORREOS, 275A R11     SPANISH               NaN                 0  \n",
              "1     COTE - 4/JJ/39      FRENCH               NaN                 0  \n",
              "2        MCC_20_1391       DUTCH  MCC_20_1391_0032                 0  \n",
              "3      NMM ADM/L/R13     ENGLISH               NaN                 0  \n",
              "4   CORREOS, 193B R3     SPANISH               NaN                 0  "
            ]
          },
          "execution_count": 13,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "my_data[[\"c99\"]].head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead tr th {\n",
              "        text-align: left;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th colspan=\"12\" halign=\"left\">c99</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th>InstAbbr</th>\n",
              "      <th>InstName</th>\n",
              "      <th>InstCity</th>\n",
              "      <th>InstCountry</th>\n",
              "      <th>ArchiveID</th>\n",
              "      <th>ArchiveName</th>\n",
              "      <th>ArchivePart</th>\n",
              "      <th>ArchivePartSpec</th>\n",
              "      <th>LogbookID</th>\n",
              "      <th>LogbookLang</th>\n",
              "      <th>ImageID</th>\n",
              "      <th>IllustrationAvail</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>unique</th>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>top</th>\n",
              "      <td>AGI</td>\n",
              "      <td>ARCHIVO GENERAL DE INDIAS</td>\n",
              "      <td>SEVILLE</td>\n",
              "      <td>SPAIN</td>\n",
              "      <td>20</td>\n",
              "      <td>NaN</td>\n",
              "      <td>MCC</td>\n",
              "      <td>1391</td>\n",
              "      <td>CORREOS, 275A R11</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>MCC_20_1391_0032</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>freq</th>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "            c99                                                            \\\n",
              "       InstAbbr                   InstName InstCity InstCountry ArchiveID   \n",
              "count         5                          5        5           5         1   \n",
              "unique        4                          4        4           4         1   \n",
              "top         AGI  ARCHIVO GENERAL DE INDIAS  SEVILLE       SPAIN        20   \n",
              "freq          2                          2        2           2         1   \n",
              "\n",
              "                                                                               \\\n",
              "       ArchiveName ArchivePart ArchivePartSpec          LogbookID LogbookLang   \n",
              "count            0           1               1                  5           5   \n",
              "unique           0           1               1                  5           4   \n",
              "top            NaN         MCC            1391  CORREOS, 275A R11     SPANISH   \n",
              "freq           NaN           1               1                  1           2   \n",
              "\n",
              "                                            \n",
              "                 ImageID IllustrationAvail  \n",
              "count                  1                 5  \n",
              "unique                 1                 1  \n",
              "top     MCC_20_1391_0032                 0  \n",
              "freq                   1                 5  "
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "my_data[[\"c99\"]].describe(include=\"all\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Internal Schema\n",
        "\n",
        "`cdm_reader_mapper` already includes a data model for the CLIWOC deck. This model parses all sections of the supplementary data in the `c99` attachment and provides all of the required code tables. Let's now read the data using the `\"icoads_r300_d730\"` model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'\n",
            "WARNING:root:Unknown column_type 'object' for column '('c99_sentinel', 'BLK')'\n"
          ]
        }
      ],
      "source": [
        "all_data = read_mdf(\n",
        "    data_file_path,\n",
        "    imodel=\"icoads_r300_d730\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `c99` section has been split into multiple sections. There is no longer a `c99` section in the output; instead, we now have:\n",
        "\n",
        "* `c99_logbook`\n",
        "* `c99_voyage`\n",
        "* `c99_data`\n",
        "\n",
        "We can compare the `c99_logbook` section with the `c99` section produced by our own model. The summary statistics are identical, which indicates that both models extracted the same data, although we chose different names for most of the columns."
      ]
    },
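    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The comparison can also be checked programmatically. The following is a minimal sketch. It assumes that the column pairs line up in the order shown by the two `describe()` outputs; for example, `InstCity` corresponds to `InstPlace` and `ArchivePartSpec` corresponds to `Specification`. `COLUMN_MAP` and `sections_match` are hypothetical helpers written for illustration; they are not part of `cdm_reader_mapper`.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Hypothetical mapping: column names in our custom \"c99\" section -> names used\n",
        "# in the \"c99_logbook\" section of the built-in \"icoads_r300_d730\" model.\n",
        "COLUMN_MAP = {\n",
        "    \"InstAbbr\": \"InstAbbr\",\n",
        "    \"InstName\": \"InstName\",\n",
        "    \"InstCity\": \"InstPlace\",\n",
        "    \"InstCountry\": \"InstLand\",\n",
        "    \"ArchiveID\": \"NumArchiveSet\",\n",
        "    \"ArchiveName\": \"NameArchiveSet\",\n",
        "    \"ArchivePart\": \"ArchivePart\",\n",
        "    \"ArchivePartSpec\": \"Specification\",\n",
        "    \"LogbookID\": \"Logbook_id\",\n",
        "    \"LogbookLang\": \"Logbook_language\",\n",
        "    \"ImageID\": \"Image_No\",\n",
        "    \"IllustrationAvail\": \"Illustr\",\n",
        "}\n",
        "\n",
        "\n",
        "def sections_match(custom: pd.DataFrame, internal: pd.DataFrame) -> bool:\n",
        "    \"\"\"Return True if both sections hold the same values once renamed.\n",
        "\n",
        "    Both frames are expected to have single-level columns (the section\n",
        "    level already dropped) and the same row index.\n",
        "    \"\"\"\n",
        "    names = list(COLUMN_MAP.values())\n",
        "    left = custom.rename(columns=COLUMN_MAP)[names]\n",
        "    right = internal[names]\n",
        "    # Normalise missing values (NaN, None, pd.NA) to one marker and compare\n",
        "    # as strings, so differing dtypes do not hide identical content.\n",
        "    left = left.astype(\"object\").fillna(\"<missing>\").astype(str)\n",
        "    right = right.astype(\"object\").fillna(\"<missing>\").astype(str)\n",
        "    return left.equals(right)\n",
        "```\n",
        "\n",
        "In this notebook it could be called as `sections_match(my_data[\"c99\"], all_data.data[\"c99_logbook\"])`. Selecting a single top-level section drops that level from the column headers. Because values are compared as strings, a field parsed as a number by one model and as text by the other (e.g. `20` vs `20.0`) would be reported as a mismatch."
      ]
    },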
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead tr th {\n",
              "        text-align: left;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th colspan=\"12\" halign=\"left\">c99_logbook</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th>InstAbbr</th>\n",
              "      <th>InstName</th>\n",
              "      <th>InstPlace</th>\n",
              "      <th>InstLand</th>\n",
              "      <th>NumArchiveSet</th>\n",
              "      <th>NameArchiveSet</th>\n",
              "      <th>ArchivePart</th>\n",
              "      <th>Specification</th>\n",
              "      <th>Logbook_id</th>\n",
              "      <th>Logbook_language</th>\n",
              "      <th>Image_No</th>\n",
              "      <th>Illustr</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>unique</th>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>top</th>\n",
              "      <td>AGI</td>\n",
              "      <td>ARCHIVO GENERAL DE INDIAS</td>\n",
              "      <td>SEVILLE</td>\n",
              "      <td>SPAIN</td>\n",
              "      <td>20</td>\n",
              "      <td>NaN</td>\n",
              "      <td>MCC</td>\n",
              "      <td>1391</td>\n",
              "      <td>CORREOS, 275A R11</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>MCC_20_1391_0032</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>freq</th>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       c99_logbook                                                \\\n",
              "          InstAbbr                   InstName InstPlace InstLand   \n",
              "count            5                          5         5        5   \n",
              "unique           4                          4         4        4   \n",
              "top            AGI  ARCHIVO GENERAL DE INDIAS   SEVILLE    SPAIN   \n",
              "freq             2                          2         2        2   \n",
              "\n",
              "                                                               \\\n",
              "       NumArchiveSet NameArchiveSet ArchivePart Specification   \n",
              "count              1              0           1             1   \n",
              "unique             1              0           1             1   \n",
              "top               20            NaN         MCC          1391   \n",
              "freq               1            NaN           1             1   \n",
              "\n",
              "                                                                      \n",
              "               Logbook_id Logbook_language          Image_No Illustr  \n",
              "count                   5                5                 1       5  \n",
              "unique                  5                4                 1       1  \n",
              "top     CORREOS, 275A R11          SPANISH  MCC_20_1391_0032       0  \n",
              "freq                    1                2                 1       5  "
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "all_data.data[[\"c99_logbook\"]].describe(include=\"all\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead tr th {\n",
              "        text-align: left;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th colspan=\"12\" halign=\"left\">c99</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th>InstAbbr</th>\n",
              "      <th>InstName</th>\n",
              "      <th>InstCity</th>\n",
              "      <th>InstCountry</th>\n",
              "      <th>ArchiveID</th>\n",
              "      <th>ArchiveName</th>\n",
              "      <th>ArchivePart</th>\n",
              "      <th>ArchivePartSpec</th>\n",
              "      <th>LogbookID</th>\n",
              "      <th>LogbookLang</th>\n",
              "      <th>ImageID</th>\n",
              "      <th>IllustrationAvail</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>unique</th>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>top</th>\n",
              "      <td>AGI</td>\n",
              "      <td>ARCHIVO GENERAL DE INDIAS</td>\n",
              "      <td>SEVILLE</td>\n",
              "      <td>SPAIN</td>\n",
              "      <td>20</td>\n",
              "      <td>NaN</td>\n",
              "      <td>MCC</td>\n",
              "      <td>1391</td>\n",
              "      <td>CORREOS, 275A R11</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>MCC_20_1391_0032</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>freq</th>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "            c99                                                            \\\n",
              "       InstAbbr                   InstName InstCity InstCountry ArchiveID   \n",
              "count         5                          5        5           5         1   \n",
              "unique        4                          4        4           4         1   \n",
              "top         AGI  ARCHIVO GENERAL DE INDIAS  SEVILLE       SPAIN        20   \n",
              "freq          2                          2        2           2         1   \n",
              "\n",
              "                                                                               \\\n",
              "       ArchiveName ArchivePart ArchivePartSpec          LogbookID LogbookLang   \n",
              "count            0           1               1                  5           5   \n",
              "unique           0           1               1                  5           4   \n",
              "top            NaN         MCC            1391  CORREOS, 275A R11     SPANISH   \n",
              "freq           NaN           1               1                  1           2   \n",
              "\n",
              "                                            \n",
              "                 ImageID IllustrationAvail  \n",
              "count                  1                 5  \n",
              "unique                 1                 1  \n",
              "top     MCC_20_1391_0032                 0  \n",
              "freq                   1                 5  "
            ]
          },
          "execution_count": 17,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "my_data[[\"c99\"]].describe(include=\"all\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Additional Sections\n",
        "\n",
        "We can also look at the additional sections that our model did not parse.\n",
        "\n",
        "Looking at this extra data reveals some remaining issues with the model; most of them relate to language translations."
      ]
    },
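    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To see which sections the built-in model provides beyond our own schema, we can compare the top level of the column headers. This is a minimal sketch. `extra_sections` is a hypothetical helper, and it assumes that both frames use two-level `(section, element)` column headers, as in the outputs above.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "\n",
        "def extra_sections(internal: pd.DataFrame, custom: pd.DataFrame) -> list:\n",
        "    \"\"\"Return sections present in `internal` but missing from `custom`.\"\"\"\n",
        "    custom_sections = set(custom.columns.get_level_values(0))\n",
        "    # Preserve the order in which sections appear in `internal`.\n",
        "    return [\n",
        "        section\n",
        "        for section in internal.columns.get_level_values(0).unique()\n",
        "        if section not in custom_sections\n",
        "    ]\n",
        "```\n",
        "\n",
        "For example, `extra_sections(all_data.data, my_data)` would list the sections, such as `c99_voyage` and `c99_data`, that are only available from the `\"icoads_r300_d730\"` model."
      ]
    },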
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead tr th {\n",
              "        text-align: left;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th colspan=\"58\" halign=\"left\">c99_voyage</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th></th>\n",
              "      <th>drLatDeg</th>\n",
              "      <th>drLatMin</th>\n",
              "      <th>drLatSec</th>\n",
              "      <th>drLatHem</th>\n",
              "      <th>drLonDeg</th>\n",
              "      <th>drLonMin</th>\n",
              "      <th>drLonSec</th>\n",
              "      <th>drLonHem</th>\n",
              "      <th>LatDeg</th>\n",
              "      <th>LatMin</th>\n",
              "      <th>LatSec</th>\n",
              "      <th>LatHem</th>\n",
              "      <th>LonDeg</th>\n",
              "      <th>LonMin</th>\n",
              "      <th>LonSec</th>\n",
              "      <th>LonHem</th>\n",
              "      <th>LatInd</th>\n",
              "      <th>LonInd</th>\n",
              "      <th>ZeroMeridian</th>\n",
              "      <th>LMname1</th>\n",
              "      <th>LMdirection1</th>\n",
              "      <th>LMdistance1</th>\n",
              "      <th>LMname2</th>\n",
              "      <th>LMdirection2</th>\n",
              "      <th>LMdistance2</th>\n",
              "      <th>LMname3</th>\n",
              "      <th>LMdirection3</th>\n",
              "      <th>LMdistance4</th>\n",
              "      <th>PosCoastal</th>\n",
              "      <th>Calendar_type</th>\n",
              "      <th>logbook_date</th>\n",
              "      <th>TimeOB</th>\n",
              "      <th>Day_of_the_week</th>\n",
              "      <th>PartDay</th>\n",
              "      <th>Watch</th>\n",
              "      <th>Glasses</th>\n",
              "      <th>Start_day</th>\n",
              "      <th>ShipName</th>\n",
              "      <th>Nationality</th>\n",
              "      <th>Ship_type</th>\n",
              "      <th>Company</th>\n",
              "      <th>Name1</th>\n",
              "      <th>Rank1</th>\n",
              "      <th>Name2</th>\n",
              "      <th>Rank2</th>\n",
              "      <th>Name3</th>\n",
              "      <th>Rank3</th>\n",
              "      <th>voyage_from</th>\n",
              "      <th>voyage_to</th>\n",
              "      <th>Anchored_ind</th>\n",
              "      <th>AnchorPlace</th>\n",
              "      <th>DASno</th>\n",
              "      <th>VoyageIni</th>\n",
              "      <th>Course_ship</th>\n",
              "      <th>Ship_speed</th>\n",
              "      <th>Distance</th>\n",
              "      <th>EncName</th>\n",
              "      <th>EncNat</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>4.000000</td>\n",
              "      <td>4.000000</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4</td>\n",
              "      <td>2.000000</td>\n",
              "      <td>2.00000</td>\n",
              "      <td>2.0</td>\n",
              "      <td>2</td>\n",
              "      <td>2.000000</td>\n",
              "      <td>2.000000</td>\n",
              "      <td>2.0</td>\n",
              "      <td>2</td>\n",
              "      <td>3.000000</td>\n",
              "      <td>3.000000</td>\n",
              "      <td>3.0</td>\n",
              "      <td>3</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1.0</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>2</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>unique</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>&lt;NA&gt;</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>2</td>\n",
              "      <td>4</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>top</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>N</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>E</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>N</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>E</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>TENERIFE</td>\n",
              "      <td>LIZARD</td>\n",
              "      <td>N87:17E</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>17711001</td>\n",
              "      <td>NaN</td>\n",
              "      <td>TUESDAY</td>\n",
              "      <td>3</td>\n",
              "      <td>VM</td>\n",
              "      <td>&lt;NA&gt;</td>\n",
              "      <td>UNKNOWN</td>\n",
              "      <td>EL COLON</td>\n",
              "      <td>SPANISH</td>\n",
              "      <td>PAQUEBOTE</td>\n",
              "      <td>MCC</td>\n",
              "      <td>THOMAS D'ORVES</td>\n",
              "      <td>CAPITAN</td>\n",
              "      <td>CHARLES WARREN</td>\n",
              "      <td>2ND OFFICER/LIEUTENANT</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>LA HABANA</td>\n",
              "      <td>LA CORUÑA</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>17710819</td>\n",
              "      <td>WTZ</td>\n",
              "      <td>NaN</td>\n",
              "      <td>175.00</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>freq</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>3</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>3</td>\n",
              "      <td>2</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>&lt;NA&gt;</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>27.250000</td>\n",
              "      <td>24.250000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>26.500000</td>\n",
              "      <td>36.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>22.000000</td>\n",
              "      <td>9.500000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>121.666667</td>\n",
              "      <td>42.666667</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>21.884165</td>\n",
              "      <td>16.879475</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>19.091883</td>\n",
              "      <td>19.79899</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>29.698485</td>\n",
              "      <td>13.435029</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>195.208436</td>\n",
              "      <td>11.239810</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>&lt;NA&gt;</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>1.000000</td>\n",
              "      <td>5.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>13.000000</td>\n",
              "      <td>22.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>4.000000</td>\n",
              "      <td>33.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>13.750000</td>\n",
              "      <td>17.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>19.750000</td>\n",
              "      <td>29.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>11.500000</td>\n",
              "      <td>4.750000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>9.000000</td>\n",
              "      <td>36.500000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>29.500000</td>\n",
              "      <td>23.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>26.500000</td>\n",
              "      <td>36.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>22.000000</td>\n",
              "      <td>9.500000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>14.000000</td>\n",
              "      <td>40.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>43.000000</td>\n",
              "      <td>30.250000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>33.250000</td>\n",
              "      <td>43.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>32.500000</td>\n",
              "      <td>14.250000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>180.500000</td>\n",
              "      <td>47.500000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>49.000000</td>\n",
              "      <td>46.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>40.000000</td>\n",
              "      <td>50.00000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>43.000000</td>\n",
              "      <td>19.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>347.000000</td>\n",
              "      <td>55.000000</td>\n",
              "      <td>0.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>230.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>12.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>8.0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       c99_voyage                                                             \\\n",
              "         drLatDeg   drLatMin drLatSec drLatHem   drLonDeg  drLonMin drLonSec   \n",
              "count    4.000000   4.000000      4.0        4   2.000000   2.00000      2.0   \n",
              "unique        NaN        NaN      NaN        2        NaN       NaN      NaN   \n",
              "top           NaN        NaN      NaN        N        NaN       NaN      NaN   \n",
              "freq          NaN        NaN      NaN        3        NaN       NaN      NaN   \n",
              "mean    27.250000  24.250000      0.0      NaN  26.500000  36.00000      0.0   \n",
              "std     21.884165  16.879475      0.0      NaN  19.091883  19.79899      0.0   \n",
              "min      1.000000   5.000000      0.0      NaN  13.000000  22.00000      0.0   \n",
              "25%     13.750000  17.000000      0.0      NaN  19.750000  29.00000      0.0   \n",
              "50%     29.500000  23.000000      0.0      NaN  26.500000  36.00000      0.0   \n",
              "75%     43.000000  30.250000      0.0      NaN  33.250000  43.00000      0.0   \n",
              "max     49.000000  46.000000      0.0      NaN  40.000000  50.00000      0.0   \n",
              "\n",
              "                                                                            \\\n",
              "       drLonHem     LatDeg     LatMin LatSec LatHem      LonDeg     LonMin   \n",
              "count         2   2.000000   2.000000    2.0      2    3.000000   3.000000   \n",
              "unique        1        NaN        NaN    NaN      2         NaN        NaN   \n",
              "top           E        NaN        NaN    NaN      N         NaN        NaN   \n",
              "freq          2        NaN        NaN    NaN      1         NaN        NaN   \n",
              "mean        NaN  22.000000   9.500000    0.0    NaN  121.666667  42.666667   \n",
              "std         NaN  29.698485  13.435029    0.0    NaN  195.208436  11.239810   \n",
              "min         NaN   1.000000   0.000000    0.0    NaN    4.000000  33.000000   \n",
              "25%         NaN  11.500000   4.750000    0.0    NaN    9.000000  36.500000   \n",
              "50%         NaN  22.000000   9.500000    0.0    NaN   14.000000  40.000000   \n",
              "75%         NaN  32.500000  14.250000    0.0    NaN  180.500000  47.500000   \n",
              "max         NaN  43.000000  19.000000    0.0    NaN  347.000000  55.000000   \n",
              "\n",
              "                                                                      \\\n",
              "       LonSec LonHem LatInd LonInd ZeroMeridian LMname1 LMdirection1   \n",
              "count     3.0      3      5      5            5       1            1   \n",
              "unique    NaN      2      2      3            4       1            1   \n",
              "top       NaN      E      1      2     TENERIFE  LIZARD      N87:17E   \n",
              "freq      NaN      2      3      2            2       1            1   \n",
              "mean      0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "std       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "min       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "25%       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "50%       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "75%       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "max       0.0    NaN    NaN    NaN          NaN     NaN          NaN   \n",
              "\n",
              "                                                                          \\\n",
              "       LMdistance1 LMname2 LMdirection2 LMdistance2 LMname3 LMdirection3   \n",
              "count          1.0       0            0         0.0       0            0   \n",
              "unique         NaN       0            0         NaN       0            0   \n",
              "top            NaN     NaN          NaN         NaN     NaN          NaN   \n",
              "freq           NaN     NaN          NaN         NaN     NaN          NaN   \n",
              "mean         230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "std            NaN     NaN          NaN         NaN     NaN          NaN   \n",
              "min          230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "25%          230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "50%          230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "75%          230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "max          230.0     NaN          NaN         NaN     NaN          NaN   \n",
              "\n",
              "                                                                 \\\n",
              "       LMdistance4 PosCoastal Calendar_type logbook_date TimeOB   \n",
              "count          0.0          5             5            5    5.0   \n",
              "unique         NaN          1             1            1    NaN   \n",
              "top            NaN          0             2     17711001    NaN   \n",
              "freq           NaN          5             5            5    NaN   \n",
              "mean           NaN        NaN           NaN          NaN   12.0   \n",
              "std            NaN        NaN           NaN          NaN    0.0   \n",
              "min            NaN        NaN           NaN          NaN   12.0   \n",
              "25%            NaN        NaN           NaN          NaN   12.0   \n",
              "50%            NaN        NaN           NaN          NaN   12.0   \n",
              "75%            NaN        NaN           NaN          NaN   12.0   \n",
              "max            NaN        NaN           NaN          NaN   12.0   \n",
              "\n",
              "                                                                              \\\n",
              "       Day_of_the_week PartDay Watch Glasses Start_day  ShipName Nationality   \n",
              "count                1       1     1     1.0         5         5           5   \n",
              "unique               1       1     1    <NA>         2         5           4   \n",
              "top            TUESDAY       3    VM    <NA>   UNKNOWN  EL COLON     SPANISH   \n",
              "freq                 1       1     1    <NA>         4         1           2   \n",
              "mean               NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "std                NaN     NaN   NaN    <NA>       NaN       NaN         NaN   \n",
              "min                NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "25%                NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "50%                NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "75%                NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "max                NaN     NaN   NaN     8.0       NaN       NaN         NaN   \n",
              "\n",
              "                                                                    \\\n",
              "        Ship_type Company           Name1    Rank1           Name2   \n",
              "count           4       2               4        4               1   \n",
              "unique          4       2               4        3               1   \n",
              "top     PAQUEBOTE     MCC  THOMAS D'ORVES  CAPITAN  CHARLES WARREN   \n",
              "freq            1       1               1        2               1   \n",
              "mean          NaN     NaN             NaN      NaN             NaN   \n",
              "std           NaN     NaN             NaN      NaN             NaN   \n",
              "min           NaN     NaN             NaN      NaN             NaN   \n",
              "25%           NaN     NaN             NaN      NaN             NaN   \n",
              "50%           NaN     NaN             NaN      NaN             NaN   \n",
              "75%           NaN     NaN             NaN      NaN             NaN   \n",
              "max           NaN     NaN             NaN      NaN             NaN   \n",
              "\n",
              "                                                                   \\\n",
              "                         Rank2 Name3 Rank3 voyage_from  voyage_to   \n",
              "count                        1     0     0           5          5   \n",
              "unique                       1     0     0           5          4   \n",
              "top     2ND OFFICER/LIEUTENANT   NaN   NaN   LA HABANA  LA CORUÑA   \n",
              "freq                         1   NaN   NaN           1          2   \n",
              "mean                       NaN   NaN   NaN         NaN        NaN   \n",
              "std                        NaN   NaN   NaN         NaN        NaN   \n",
              "min                        NaN   NaN   NaN         NaN        NaN   \n",
              "25%                        NaN   NaN   NaN         NaN        NaN   \n",
              "50%                        NaN   NaN   NaN         NaN        NaN   \n",
              "75%                        NaN   NaN   NaN         NaN        NaN   \n",
              "max                        NaN   NaN   NaN         NaN        NaN   \n",
              "\n",
              "                                                                        \\\n",
              "       Anchored_ind AnchorPlace DASno VoyageIni Course_ship Ship_speed   \n",
              "count             5           0     0         5           2          0   \n",
              "unique            1           0     0         5           2          0   \n",
              "top               0         NaN   NaN  17710819         WTZ        NaN   \n",
              "freq              5         NaN   NaN         1           1        NaN   \n",
              "mean            NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "std             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "min             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "25%             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "50%             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "75%             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "max             NaN         NaN   NaN       NaN         NaN        NaN   \n",
              "\n",
              "                                \n",
              "       Distance EncName EncNat  \n",
              "count         4       0      0  \n",
              "unique        4       0      0  \n",
              "top      175.00     NaN    NaN  \n",
              "freq          1     NaN    NaN  \n",
              "mean        NaN     NaN    NaN  \n",
              "std         NaN     NaN    NaN  \n",
              "min         NaN     NaN    NaN  \n",
              "25%         NaN     NaN    NaN  \n",
              "50%         NaN     NaN    NaN  \n",
              "75%         NaN     NaN    NaN  \n",
              "max         NaN     NaN    NaN  "
            ]
          },
          "execution_count": 18,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pd.options.display.max_columns = None\n",
        "all_data.data[[\"c99_voyage\"]].describe(include=\"all\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "0     TENERIFE\n",
              "1    GREENWICH\n",
              "2      NL_0_01\n",
              "3      BERMUDA\n",
              "4     TENERIFE\n",
              "Name: ZeroMeridian, dtype: object"
            ]
          },
          "execution_count": 19,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "all_data.data[\"c99_voyage\"][\"ZeroMeridian\"].head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Ship types and languages\n",
        "\n",
        "For example, ship types on this deck are recorded in many different languages, and the CLIWOC website provides no code table for this variable."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "0    PAQUEBOTE\n",
              "2        SNAUW\n",
              "3     5TH RATE\n",
              "4     PAQUEBOT\n",
              "Name: Ship_type, dtype: object"
            ]
          },
          "execution_count": 20,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "all_data.data[\"c99_voyage\"][\"Ship_type\"].dropna().head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>AT_reading_units</th>\n",
              "      <th>SST_reading_units</th>\n",
              "      <th>AP_reading_units</th>\n",
              "      <th>BART_reading_units</th>\n",
              "      <th>ReferenceCourse</th>\n",
              "      <th>ReferenceWindDirection</th>\n",
              "      <th>Decl</th>\n",
              "      <th>Distance_units</th>\n",
              "      <th>Distance_units_to_landmark</th>\n",
              "      <th>Distance_units_travelled</th>\n",
              "      <th>Longitude_units</th>\n",
              "      <th>units_of_measurement</th>\n",
              "      <th>humidity_units</th>\n",
              "      <th>water_at_pump_units</th>\n",
              "      <th>wind_scale</th>\n",
              "      <th>BARO_type</th>\n",
              "      <th>BARO_brand</th>\n",
              "      <th>API</th>\n",
              "      <th>Humidity_method</th>\n",
              "      <th>compas_error</th>\n",
              "      <th>compas_correction</th>\n",
              "      <th>AT_outside</th>\n",
              "      <th>SST</th>\n",
              "      <th>AP</th>\n",
              "      <th>wind_dir</th>\n",
              "      <th>current_dir</th>\n",
              "      <th>current_speed</th>\n",
              "      <th>attached_tem</th>\n",
              "      <th>pump_water</th>\n",
              "      <th>Humidity</th>\n",
              "      <th>wind_force</th>\n",
              "      <th>weather</th>\n",
              "      <th>prcp_descriptor</th>\n",
              "      <th>sea_state</th>\n",
              "      <th>shape_coulds</th>\n",
              "      <th>dir_coulds</th>\n",
              "      <th>Clearness</th>\n",
              "      <th>cloud_fraction</th>\n",
              "      <th>gusts</th>\n",
              "      <th>Rain</th>\n",
              "      <th>Fog</th>\n",
              "      <th>Snow</th>\n",
              "      <th>Thunder</th>\n",
              "      <th>Hail</th>\n",
              "      <th>Sea_ice</th>\n",
              "      <th>Trivial_correction</th>\n",
              "      <th>Release</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>count</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>4</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>unique</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>4</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>5</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>top</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>UNKNOWN</td>\n",
              "      <td>UNKNOWN</td>\n",
              "      <td>-20</td>\n",
              "      <td>NaN</td>\n",
              "      <td>LEAGUES</td>\n",
              "      <td>MILLAS</td>\n",
              "      <td>360 DEGREES</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>S</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>EN REFREGONES FUERTES Y DESPUES BONANCIBLE</td>\n",
              "      <td>MUY MALOS CARICES. AGUACEROS, RELAMPAGOS Y TRU...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>GRANDE DEL O Y DEL ENE</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>CLIWOC VERSION 2.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>freq</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>2</td>\n",
              "      <td>5</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>1</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>mean</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>std</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>min</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25%</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>50%</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>75%</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>max</th>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       AT_reading_units SST_reading_units AP_reading_units BART_reading_units  \\\n",
              "count                 0                 0                0                  0   \n",
              "unique                0                 0                0                  0   \n",
              "top                 NaN               NaN              NaN                NaN   \n",
              "freq                NaN               NaN              NaN                NaN   \n",
              "mean                NaN               NaN              NaN                NaN   \n",
              "std                 NaN               NaN              NaN                NaN   \n",
              "min                 NaN               NaN              NaN                NaN   \n",
              "25%                 NaN               NaN              NaN                NaN   \n",
              "50%                 NaN               NaN              NaN                NaN   \n",
              "75%                 NaN               NaN              NaN                NaN   \n",
              "max                 NaN               NaN              NaN                NaN   \n",
              "\n",
              "       ReferenceCourse ReferenceWindDirection Decl Distance_units  \\\n",
              "count                2                      5    5              0   \n",
              "unique               1                      1    5              0   \n",
              "top            UNKNOWN                UNKNOWN  -20            NaN   \n",
              "freq                 2                      5    1            NaN   \n",
              "mean               NaN                    NaN  NaN            NaN   \n",
              "std                NaN                    NaN  NaN            NaN   \n",
              "min                NaN                    NaN  NaN            NaN   \n",
              "25%                NaN                    NaN  NaN            NaN   \n",
              "50%                NaN                    NaN  NaN            NaN   \n",
              "75%                NaN                    NaN  NaN            NaN   \n",
              "max                NaN                    NaN  NaN            NaN   \n",
              "\n",
              "       Distance_units_to_landmark Distance_units_travelled Longitude_units  \\\n",
              "count                           1                        4               5   \n",
              "unique                          1                        4               2   \n",
              "top                       LEAGUES                   MILLAS     360 DEGREES   \n",
              "freq                            1                        1               3   \n",
              "mean                          NaN                      NaN             NaN   \n",
              "std                           NaN                      NaN             NaN   \n",
              "min                           NaN                      NaN             NaN   \n",
              "25%                           NaN                      NaN             NaN   \n",
              "50%                           NaN                      NaN             NaN   \n",
              "75%                           NaN                      NaN             NaN   \n",
              "max                           NaN                      NaN             NaN   \n",
              "\n",
              "       units_of_measurement humidity_units water_at_pump_units wind_scale  \\\n",
              "count                     0              0                   0          0   \n",
              "unique                    0              0                   0          0   \n",
              "top                     NaN            NaN                 NaN        NaN   \n",
              "freq                    NaN            NaN                 NaN        NaN   \n",
              "mean                    NaN            NaN                 NaN        NaN   \n",
              "std                     NaN            NaN                 NaN        NaN   \n",
              "min                     NaN            NaN                 NaN        NaN   \n",
              "25%                     NaN            NaN                 NaN        NaN   \n",
              "50%                     NaN            NaN                 NaN        NaN   \n",
              "75%                     NaN            NaN                 NaN        NaN   \n",
              "max                     NaN            NaN                 NaN        NaN   \n",
              "\n",
              "       BARO_type BARO_brand  API Humidity_method compas_error  \\\n",
              "count          0          0    0               0            0   \n",
              "unique         0          0    0               0            0   \n",
              "top          NaN        NaN  NaN             NaN          NaN   \n",
              "freq         NaN        NaN  NaN             NaN          NaN   \n",
              "mean         NaN        NaN  NaN             NaN          NaN   \n",
              "std          NaN        NaN  NaN             NaN          NaN   \n",
              "min          NaN        NaN  NaN             NaN          NaN   \n",
              "25%          NaN        NaN  NaN             NaN          NaN   \n",
              "50%          NaN        NaN  NaN             NaN          NaN   \n",
              "75%          NaN        NaN  NaN             NaN          NaN   \n",
              "max          NaN        NaN  NaN             NaN          NaN   \n",
              "\n",
              "       compas_correction  AT_outside  SST   AP wind_dir current_dir  \\\n",
              "count                  0         0.0  0.0    0        5           0   \n",
              "unique                 0         NaN  NaN    0        5           0   \n",
              "top                  NaN         NaN  NaN  NaN        S         NaN   \n",
              "freq                 NaN         NaN  NaN  NaN        1         NaN   \n",
              "mean                 NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "std                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "min                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "25%                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "50%                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "75%                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "max                  NaN         NaN  NaN  NaN      NaN         NaN   \n",
              "\n",
              "       current_speed  attached_tem pump_water Humidity  \\\n",
              "count              0           0.0          0        0   \n",
              "unique             0           NaN          0        0   \n",
              "top              NaN           NaN        NaN      NaN   \n",
              "freq             NaN           NaN        NaN      NaN   \n",
              "mean             NaN           NaN        NaN      NaN   \n",
              "std              NaN           NaN        NaN      NaN   \n",
              "min              NaN           NaN        NaN      NaN   \n",
              "25%              NaN           NaN        NaN      NaN   \n",
              "50%              NaN           NaN        NaN      NaN   \n",
              "75%              NaN           NaN        NaN      NaN   \n",
              "max              NaN           NaN        NaN      NaN   \n",
              "\n",
              "                                        wind_force  \\\n",
              "count                                            5   \n",
              "unique                                           5   \n",
              "top     EN REFREGONES FUERTES Y DESPUES BONANCIBLE   \n",
              "freq                                             1   \n",
              "mean                                           NaN   \n",
              "std                                            NaN   \n",
              "min                                            NaN   \n",
              "25%                                            NaN   \n",
              "50%                                            NaN   \n",
              "75%                                            NaN   \n",
              "max                                            NaN   \n",
              "\n",
              "                                                  weather prcp_descriptor  \\\n",
              "count                                                   2               0   \n",
              "unique                                                  2               0   \n",
              "top     MUY MALOS CARICES. AGUACEROS, RELAMPAGOS Y TRU...             NaN   \n",
              "freq                                                    1             NaN   \n",
              "mean                                                  NaN             NaN   \n",
              "std                                                   NaN             NaN   \n",
              "min                                                   NaN             NaN   \n",
              "25%                                                   NaN             NaN   \n",
              "50%                                                   NaN             NaN   \n",
              "75%                                                   NaN             NaN   \n",
              "max                                                   NaN             NaN   \n",
              "\n",
              "                     sea_state shape_coulds dir_coulds Clearness  \\\n",
              "count                        4            0          0         0   \n",
              "unique                       4            0          0         0   \n",
              "top     GRANDE DEL O Y DEL ENE          NaN        NaN       NaN   \n",
              "freq                         1          NaN        NaN       NaN   \n",
              "mean                       NaN          NaN        NaN       NaN   \n",
              "std                        NaN          NaN        NaN       NaN   \n",
              "min                        NaN          NaN        NaN       NaN   \n",
              "25%                        NaN          NaN        NaN       NaN   \n",
              "50%                        NaN          NaN        NaN       NaN   \n",
              "75%                        NaN          NaN        NaN       NaN   \n",
              "max                        NaN          NaN        NaN       NaN   \n",
              "\n",
              "       cloud_fraction gusts Rain  Fog Snow Thunder Hail Sea_ice  \\\n",
              "count               0     5    5    5    5       5    5       5   \n",
              "unique              0     1    2    1    1       2    1       1   \n",
              "top               NaN     0    0    0    0       0    0       0   \n",
              "freq              NaN     5    4    5    5       4    5       5   \n",
              "mean              NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "std               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "min               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "25%               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "50%               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "75%               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "max               NaN   NaN  NaN  NaN  NaN     NaN  NaN     NaN   \n",
              "\n",
              "       Trivial_correction             Release  \n",
              "count                   5                   5  \n",
              "unique                  1                   2  \n",
              "top                     0  CLIWOC VERSION 2.0  \n",
              "freq                    5                   4  \n",
              "mean                  NaN                 NaN  \n",
              "std                   NaN                 NaN  \n",
              "min                   NaN                 NaN  \n",
              "25%                   NaN                 NaN  \n",
              "50%                   NaN                 NaN  \n",
              "75%                   NaN                 NaN  \n",
              "max                   NaN                 NaN  "
            ]
          },
          "execution_count": 21,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "all_data.data[\"c99_data\"].describe(include=\"all\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Wind force scales and languages\n",
        "\n",
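        "The CLIWOC logbooks were kept in several languages. The `wind_force` field records the original free-text description rather than a value on a common scale, so the terminology varies from record to record, as the first few values show:"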
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "0    EN REFREGONES FUERTES Y DESPUES BONANCIBLE\n",
              "1                                        FOIBLE\n",
              "2               STIJVE GEREEFDE MARSZEILSKOELTE\n",
              "3                       FRESH GALES AND SQUALLY\n",
              "4                                    BONANCIBLE\n",
              "Name: wind_force, dtype: object"
            ]
          },
          "execution_count": 22,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "all_data.data[\"c99_data\"][\"wind_force\"].head()"
      ]
    },
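    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `wind_force` values are free-text descriptions in the language of the original logbook. The values above appear to be in Spanish, French, Dutch and English, so they cannot be compared until they are put on a common scale. A reasonable first step is to tabulate the distinct descriptions and the words they contain, which could then seed a lookup table from each term to a common scale. The cell below is a minimal sketch of that step using plain pandas. So that it runs on its own, it works on a copy of the five values displayed above. In this notebook the full column is `all_data.data[\"c99_data\"][\"wind_force\"]`. The lookup table it prepares for is an assumption for illustration and is not part of the `cdm_reader_mapper` schema."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# A copy of the five wind_force values displayed above, so the cell runs on its own.\n",
        "# To use the full column instead: wind_force = all_data.data[\"c99_data\"][\"wind_force\"]\n",
        "wind_force = pd.Series(\n",
        "    [\n",
        "        \"EN REFREGONES FUERTES Y DESPUES BONANCIBLE\",\n",
        "        \"FOIBLE\",\n",
        "        \"STIJVE GEREEFDE MARSZEILSKOELTE\",\n",
        "        \"FRESH GALES AND SQUALLY\",\n",
        "        \"BONANCIBLE\",\n",
        "    ],\n",
        "    name=\"wind_force\",\n",
        ")\n",
        "\n",
        "# Normalise case and surrounding whitespace so that trivially different\n",
        "# spellings of the same description are counted together.\n",
        "terms = wind_force.dropna().str.strip().str.upper()\n",
        "\n",
        "# How often each distinct free-text description occurs.\n",
        "term_counts = terms.value_counts()\n",
        "\n",
        "# Every individual word used across the descriptions: a starting vocabulary\n",
        "# for a (hypothetical) lookup table from each term to a common wind force scale.\n",
        "vocabulary = sorted(set(terms.str.split().explode()))\n",
        "\n",
        "print(term_counts)\n",
        "print(vocabulary)"
      ]
    },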
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.13.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}