Generating a data model for CLIWOC

The purpose of this notebook is to demonstrate the structure of data models used by the cdm_reader_mapper toolbox.

ICOADS IMMA

A common format for marine observational records is the ICOADS IMMA format. This is a text format in which each line contains the data (including metadata) for an individual record. The format is attachment based: each record is constructed from a selection of (typically) fixed-width sections, called attachments, each containing a different subset of the data or metadata associated with the record. Documentation on the format and the available attachments can be found at https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf.

Records within the same file can contain different attachments, meaning that the IMMA format is not a fixed-width format, as line lengths will vary between records. Each record, however, must contain a certain subset of the attachments (in this case the core (or c0), c1, and c98 attachments).

Supplementary Data

Additional data or metadata can be provided in the c99 attachment. This attachment is not fixed-width as different sources or decks can provide different collections of supplementary data.

CLIWOC

In this example we use a subset of ICOADS release 3.0.0 IMMA formatted data for deck 730, which is data from the Climatological Database for the World’s Oceans (CLIWOC). There is a large amount of supplementary data available in the c99 attachment, which for deck 730 can be split into multiple sections. Here, we will start with the standard schema for the ICOADS IMMA format (included in cdm_reader_mapper as the "icoads" imodel), and extend the schema with fields for a subset of the c99 attachment. We will add fields for the logbook section of the c99 attachment for this deck.

An internal schema already exists for this deck ("icoads_r300_d730"). The purpose of this notebook is to demonstrate how one can extend the "icoads" data model to parse c99 data.

Overview

  • An initial read of the data subset using the "icoads" data model which does not parse the c99 attachment.

  • Extension of the "icoads" schema to add fields for the logbook section of the c99 attachment for deck 730.

  • Construction of a code table for a categorical field in the c99 attachment.

  • Comparison with the internal schema for deck 730.

from __future__ import annotations
import json
import shutil
import warnings

import pandas as pd

from cdm_reader_mapper import read_mdf, test_data
from cdm_reader_mapper.mdf_reader.properties import _base as base


try:
    from importlib.resources import files as get_files
except ImportError:
    from importlib_resources import files as get_files

import pathlib
from collections import OrderedDict
from tempfile import TemporaryDirectory


warnings.filterwarnings("ignore")

The Data

For this example we load a subset of ICOADS data for deck 730 from the cdm_reader_mapper test data. This is the data that will be used throughout this notebook.

data_file_path = test_data.test_icoads_r300_d730["source"]

Initial Read

First we read the data using the basic "icoads" data model. This step isn’t necessary for extending the schema; it simply highlights the raw c99 data.

data_bundle = read_mdf(data_file_path, imodel="icoads")
data_raw = data_bundle.data
WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'
WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'
WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'
WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'

Supplementary (c99) data

By looking at the c99 section we can see that the supplementary data has not been parsed.

data_raw["c99"].head()
0    99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...
1    99 0 CARAN   CENTRE D'ACCUEIL ET DE RECHERCHE ...
2    99 0 RAZ     RIJKSARCHIEF ZEELAND             ...
3    99 0 NMM     NATIONAL MARITIME MUSEUM         ...
4    99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...
Name: c99, dtype: object
data_raw["c99"].iloc[3]
'99 0 NMM     NATIONAL MARITIME MUSEUM                          GREENWICH UNITED KINGDOM                                                                                                      NMM ADM/L/R13                 ENGLISH                       0492500N 405000E                1 1BERMUDA                                    LIZARD                                            N87:17E            230                                                                                                                                                0 21771100112TUESDAY                   12             RAINBOW                       BRITISH 5TH RATE       RN                                THOMAS COLLINGWOOD            CAPTAIN                  CHARLES WARREN                2ND OFFICER/LIEUTENANT                                                          BERMUDA                                      SPITHEAD                                          0                                                        17710829S25E                  39.00                                                                                                        UNKNOWN        UNKNOWN         -22                    LEAGUESNM     180 DEGREES                                                                                                                                                                                                 ESE, E                                                                                                                                                                         FRESH GALES AND SQUALLY                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                        00000000CLIWOC VERSION 2.0'

Creating a data model

Custom Schema

To use a custom schema we pass the ext_schema_path argument to read_mdf. It points to a directory with the following structure:

name_of_model/
    name_of_model.json
    code_tables/
        ...

The code_tables sub-directory contains the code tables that map the key columns in the data to their values.
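
As a minimal sketch of the layout above, the following stand-alone snippet builds an empty model directory in a temporary location and checks that it has the expected shape. The model name "my_model" and the helper check_model_layout are hypothetical, and the check reflects only the structure described here, not any validation performed by cdm_reader_mapper itself.

```python
import pathlib
import tempfile


def check_model_layout(model_path):
    """Return a list of problems found in a data model directory (hypothetical helper)."""
    model_path = pathlib.Path(model_path)
    problems = []
    # The schema file is expected to share its name with the directory.
    schema_file = model_path / f"{model_path.name}.json"
    if not schema_file.is_file():
        problems.append(f"missing schema file: {schema_file.name}")
    if not (model_path / "code_tables").is_dir():
        problems.append("missing code_tables directory")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    model = pathlib.Path(tmp) / "my_model"
    model.mkdir()
    before = check_model_layout(model)  # nothing created yet

    (model / "my_model.json").write_text("{}")
    (model / "code_tables").mkdir()
    after = check_model_layout(model)  # layout now complete

print(before)
print(after)
```

A check like this can be useful before calling read_mdf, since a misnamed schema file is an easy mistake to make when copying a model.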

In this example we create a temporary directory for the data model, so that it is cleaned up after the notebook is finished; in reality you would want to store the data model in a permanent directory!

We start from the basic "icoads" model. The c99 section will be based on the "icoads_r300_d730" schema and code tables.

Copy the "icoads" schema

First we create a copy of the "icoads" schema (located at mdf_reader/schemas/icoads/icoads.json). NOTE: cdm_reader_mapper.mdf_reader.properties._base is used so that we have a relative path to the original schema and code tables.

tmp_dir = TemporaryDirectory()
my_model_name = "cliwoc"
my_model_path = pathlib.Path(tmp_dir.name) / my_model_name
my_model_path.mkdir(exist_ok=True)

# Locate the packaged "icoads" schema
icoads_schema_dir = get_files(f"{base}.schemas.icoads")
icoads_schema_path = pathlib.Path(icoads_schema_dir) / "icoads.json"

my_schema_path = my_model_path / (my_model_name + ".json")
copy = shutil.copyfile(icoads_schema_path, my_schema_path)

Copy the code tables

We now copy each of the "icoads" code tables. This includes generic icoads code tables (located in mdf_reader/codes/icoads).

# Get code tables and copy to the directory
my_code_tables_path = my_model_path / "code_tables"
my_code_tables_path.mkdir(exist_ok=True)

# Original code table directory (generic ICOADS tables)
icoads_code_tables_path = pathlib.Path(get_files(f"{base}.codes.icoads"))

# Get filenames for each of the code tables
code_table_files = list(icoads_code_tables_path.glob("ICOADS.*.json"))

# Copy each file
for file in code_table_files:
    basename = pathlib.Path(file).name
    out_path = my_code_tables_path / basename
    shutil.copyfile(file, out_path)

Extending the schema: CLIWOC logbook information

For this example we’ll load the schema into the environment as a dictionary (we use an ordered dictionary to guarantee that the ordering of the fields is maintained!).

with pathlib.Path(my_schema_path).open() as io:
    schema = json.load(io, object_pairs_hook=OrderedDict)
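
The snippet below is a small stand-alone illustration of why field order matters for a fixed-width layout: the start position of each field is the running sum of the lengths before it. The JSON string and the field names in it are made up for the example. Note that a plain dict also preserves insertion order from Python 3.7 onwards, so OrderedDict mainly makes the intent explicit.

```python
import json
from collections import OrderedDict
from itertools import accumulate

# Made-up elements; only "field_length" mirrors the schema key used in this notebook.
raw = (
    '{"b_field": {"field_length": 3},'
    ' "a_field": {"field_length": 5},'
    ' "c_field": {"field_length": 2}}'
)
elements = json.loads(raw, object_pairs_hook=OrderedDict)

names = list(elements)  # order as written in the JSON, not alphabetical
lengths = [spec["field_length"] for spec in elements.values()]

# Each field starts where the previous one ended.
starts = [0, *accumulate(lengths)][:-1]
offsets = dict(zip(names, starts))
print(offsets)
```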

We now add the contents for section c99. There are some standard (“header”) fields we need to supply. The "sentinal" is the prefix for the attachment; it appears at the start of the raw supplementary data and identifies where the attachment begins.

We also need to specify the length of the attachment and the layout.

We then add our data fields to the elements field for the c99 section. We’ll add the fields for the logbook component of the CLIWOC supplementary data. There are additional components we could resolve, but we’ll keep to the logbook for this example.

schema["sections"]["c99"]["header"]["sentinal"] = "99 0 "
schema["sections"]["c99"]["header"]["disable_read"] = False
schema["sections"]["c99"]["header"]["field_layout"] = "fixed_width"
schema["sections"]["c99"]["header"]["length"] = 245 + 5  # 245 characters of data fields + 5 for the sentinal
schema["sections"]["c99"]["elements"] = OrderedDict(
    {
        "sentinal": {
            "description": "attachment sentinal",
            "field_length": 5,
            "column_type": "str",
            "ignore": True,
        },
        "InstAbbr": {
            "description": "Abbreviation of the Institute storing the original data",
            "field_length": 8,
            "column_type": "str",
        },
        "InstName": {
            "description": "Full name of the Institute storing the original data",
            "field_length": 50,
            "column_type": "str",
        },
        "InstCity": {
            "description": "City where the Institute storing the data is located",
            "field_length": 10,
            "column_type": "str",
        },
        "InstCountry": {
            "description": "Country where the Institute storing the data is located",
            "field_length": 14,
            "column_type": "str",
        },
        "ArchiveID": {
            "description": "Administrative number under which the data is found within the Institute storing the data",
            "field_length": 15,
            "column_type": "str",
        },
        "ArchiveName": {
            "description": "Administrative name under which the data is found within the Institute storing the data",
            "field_length": 17,
            "column_type": "str",
        },
        "ArchivePart": {
            "description": "Part of the archive set in which the data is found within the Institute storing the data",
            "field_length": 39,
            "column_type": "str",
        },
        "ArchivePartSpec": {
            "description": "Specification of the part of the archive set in which the data is found within the Institute storing the data",
            "field_length": 31,
            "column_type": "str",
        },
        "LogbookID": {
            "description": "Identification Number of the logbook containing the data",
            "field_length": 30,
            "column_type": "str",
        },
        "LogbookLang": {
            "description": "Language of the logbook containing the data",
            "field_length": 7,
            "column_type": "str",
        },
        "ImageID": {
            "description": "Identification Number of the original image of the logbook",
            "field_length": 23,
            "column_type": "str",
        },
        "IllustrationAvail": {
            "description": "Illustration available on the current page of the logbook",
            "field_length": 1,
            "column_type": "key",
            "codetable": "CLIWOC_ILLUSTRATION_I",
        },
    }
)
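
As a sanity check on the header length, the field lengths above should sum to 250. The sketch below confirms this and shows how a fixed-width record can be sliced using those widths. It is a stand-alone illustration, not the parsing code used by cdm_reader_mapper: the helper split_fixed_width is hypothetical, and the record is a synthetic one padded to the widths, using values from the fourth record shown earlier.

```python
# Field widths copied from the c99 elements defined above (sentinal first).
FIELD_WIDTHS = [
    ("sentinal", 5),
    ("InstAbbr", 8),
    ("InstName", 50),
    ("InstCity", 10),
    ("InstCountry", 14),
    ("ArchiveID", 15),
    ("ArchiveName", 17),
    ("ArchivePart", 39),
    ("ArchivePartSpec", 31),
    ("LogbookID", 30),
    ("LogbookLang", 7),
    ("ImageID", 23),
    ("IllustrationAvail", 1),
]

total_length = sum(width for _, width in FIELD_WIDTHS)


def split_fixed_width(line, widths):
    """Slice a fixed-width string into named, whitespace-stripped fields."""
    fields = {}
    start = 0
    for name, width in widths:
        fields[name] = line[start : start + width].strip()
        start += width
    return fields


# Synthetic record: each value is left-justified and padded to its field width.
values = [
    "99 0", "NMM", "NATIONAL MARITIME MUSEUM", "GREENWICH", "UNITED KINGDOM",
    "", "", "", "", "NMM ADM/L/R13", "ENGLISH", "", "0",
]
record = "".join(value.ljust(width) for value, (_, width) in zip(values, FIELD_WIDTHS))

parsed = split_fixed_width(record, FIELD_WIDTHS)
print(total_length, len(record))
print(parsed["InstName"], "|", parsed["LogbookID"], "|", parsed["IllustrationAvail"])
```

If a width is wrong, every field after it is shifted, so comparing the sum against the declared "length" is a cheap way to catch mistakes before reading real data.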

We can now write the dictionary to the schema file.

json_object = json.dumps(schema, indent=2)

with pathlib.Path(my_schema_path).open("w") as outfile:
    outfile.write(json_object)

IllustrationAvail Code Table

One of the fields we have added has a "column_type" of "key". This is used to indicate categorical data, where the key value maps to a longer descriptive value. We also specified a code table for this field, which should describe that mapping. Let’s create that table now. As with the schema, it should be JSON formatted.

For this field, we have two possible values. We save the dictionary to a JSON file in the code_tables directory. The name of the file must match the "codetable" value for the field (plus the ".json" extension).

illustration_avail_codes = {
    "0": "No illustration on the current logbook page.",
    "1": "Illustration available on the current logbook page.",
}
illustration_avail_path = my_code_tables_path / "CLIWOC_ILLUSTRATION_I.json"

json_object = json.dumps(illustration_avail_codes, indent=2)

with pathlib.Path(illustration_avail_path).open("w") as outfile:
    outfile.write(json_object)
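
To illustrate what a code table of this kind provides, the sketch below looks up raw key values in the table and flags any value that is not a known key. This is only an illustration of the idea; the decode helper and the sample values are hypothetical, and the way cdm_reader_mapper applies code tables internally may differ.

```python
import json

# Same content as the code table written above, round-tripped through JSON.
code_table_json = json.dumps(
    {
        "0": "No illustration on the current logbook page.",
        "1": "Illustration available on the current logbook page.",
    }
)
code_table = json.loads(code_table_json)


def decode(key, table):
    """Return the description for a key, or None if the key is not in the table."""
    return table.get(str(key).strip())


# Hypothetical raw values, including one with padding and one unknown key.
raw_values = ["0", "1", " 0", "7"]
decoded = [decode(value, code_table) for value in raw_values]
invalid = [value for value, label in zip(raw_values, decoded) if label is None]
print(invalid)
```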

Reading

We can now read the data file with the schema we have just created (by copying and extending the "icoads" schema). We specify the path to the data model (the directory containing the schema JSON file) and the path to the code tables.

my_bundle = read_mdf(
    data_file_path,  # Path to the data file
    ext_schema_path=my_model_path,  # Path to the directory containing the schema json file
    ext_table_path=my_code_tables_path,  # Path to the directory containing the json code tables
)
my_data = my_bundle.data
ERROR:root:imodel is not defined.

Analysing the output

We can now investigate components of the c99 section.

my_data[["c99"]].head()
c99
| | InstAbbr | InstName | InstCity | InstCountry | ArchiveID | ArchiveName | ArchivePart | ArchivePartSpec | LogbookID | LogbookLang | ImageID | IllustrationAvail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AGI | ARCHIVO GENERAL DE INDIAS | SEVILLE | SPAIN | NaN | None | NaN | NaN | CORREOS, 275A R11 | SPANISH | NaN | 0 |
| 1 | CARAN | CENTRE D'ACCUEIL ET DE RECHERCHE DES ARCH. NAT... | PARIS | FRANCE | NaN | None | NaN | NaN | COTE - 4/JJ/39 | FRENCH | NaN | 0 |
| 2 | RAZ | RIJKSARCHIEF ZEELAND | MIDDELBURG | NEDERLAND | 20 | None | MCC | 1391 | MCC_20_1391 | DUTCH | MCC_20_1391_0032 | 0 |
| 3 | NMM | NATIONAL MARITIME MUSEUM | GREENWICH | UNITED KINGDOM | NaN | None | NaN | NaN | NMM ADM/L/R13 | ENGLISH | NaN | 0 |
| 4 | AGI | ARCHIVO GENERAL DE INDIAS | SEVILLE | SPAIN | NaN | None | NaN | NaN | CORREOS, 193B R3 | SPANISH | NaN | 0 |
my_data[["c99"]].describe(include="all")
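c99
| | InstAbbr | InstName | InstCity | InstCountry | ArchiveID | ArchiveName | ArchivePart | ArchivePartSpec | LogbookID | LogbookLang | ImageID | IllustrationAvail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5 | 5 | 5 | 5 | 1 | 0 | 1 | 1 | 5 | 5 | 1 | 5 |
| unique | 4 | 4 | 4 | 4 | 1 | 0 | 1 | 1 | 5 | 4 | 1 | 1 |
| top | AGI | ARCHIVO GENERAL DE INDIAS | SEVILLE | SPAIN | 20 | NaN | MCC | 1391 | CORREOS, 275A R11 | SPANISH | MCC_20_1391_0032 | 0 |
| freq | 2 | 2 | 2 | 2 | 1 | NaN | 1 | 1 | 1 | 2 | 1 | 5 |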

Internal Schema

cdm_reader_mapper already includes a data model for the CLIWOC deck. The model parses all sections of supplementary data and provides all required code tables. Let’s now read in the data using the "icoads_r300_d730" model.

all_data = read_mdf(
    data_file_path,
    imodel="icoads_r300_d730",
)
WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'
WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'
WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'
WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'
WARNING:root:Unknown column_type 'object' for column '('c99_sentinel', 'BLK')'

The c99 section has been split into multiple sections. There is no c99 section in the output; instead we now have:

  • c99_logbook

  • c99_voyage

  • c99_data

We can compare the c99_logbook section to the output of our model. We see that we have extracted the same data, although we chose different column names for the elements.

all_data.data[["c99_logbook"]].describe(include="all")
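c99_logbook
| | InstAbbr | InstName | InstPlace | InstLand | NumArchiveSet | NameArchiveSet | ArchivePart | Specification | Logbook_id | Logbook_language | Image_No | Illustr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5 | 5 | 5 | 5 | 1 | 0 | 1 | 1 | 5 | 5 | 1 | 5 |
| unique | 4 | 4 | 4 | 4 | 1 | 0 | 1 | 1 | 5 | 4 | 1 | 1 |
| top | AGI | ARCHIVO GENERAL DE INDIAS | SEVILLE | SPAIN | 20 | NaN | MCC | 1391 | CORREOS, 275A R11 | SPANISH | MCC_20_1391_0032 | 0 |
| freq | 2 | 2 | 2 | 2 | 1 | NaN | 1 | 1 | 1 | 2 | 1 | 5 |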
my_data[["c99"]].describe(include="all")
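c99
| | InstAbbr | InstName | InstCity | InstCountry | ArchiveID | ArchiveName | ArchivePart | ArchivePartSpec | LogbookID | LogbookLang | ImageID | IllustrationAvail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5 | 5 | 5 | 5 | 1 | 0 | 1 | 1 | 5 | 5 | 1 | 5 |
| unique | 4 | 4 | 4 | 4 | 1 | 0 | 1 | 1 | 5 | 4 | 1 | 1 |
| top | AGI | ARCHIVO GENERAL DE INDIAS | SEVILLE | SPAIN | 20 | NaN | MCC | 1391 | CORREOS, 275A R11 | SPANISH | MCC_20_1391_0032 | 0 |
| freq | 2 | 2 | 2 | 2 | 1 | NaN | 1 | 1 | 1 | 2 | 1 | 5 |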
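
The correspondence between the two sets of column names can be made explicit with a simple mapping. The sketch below assumes the columns correspond by position, which is what the matching summaries above suggest; the column lists and "count" rows are copied from those outputs rather than read from the data.

```python
# Column names as printed in the two summaries above.
my_columns = [
    "InstAbbr", "InstName", "InstCity", "InstCountry", "ArchiveID", "ArchiveName",
    "ArchivePart", "ArchivePartSpec", "LogbookID", "LogbookLang", "ImageID",
    "IllustrationAvail",
]
internal_columns = [
    "InstAbbr", "InstName", "InstPlace", "InstLand", "NumArchiveSet", "NameArchiveSet",
    "ArchivePart", "Specification", "Logbook_id", "Logbook_language", "Image_No",
    "Illustr",
]

# "count" rows copied from the two describe() outputs.
my_counts = [5, 5, 5, 5, 1, 0, 1, 1, 5, 5, 1, 5]
internal_counts = [5, 5, 5, 5, 1, 0, 1, 1, 5, 5, 1, 5]

# Assumption: the i-th column of one model corresponds to the i-th of the other.
rename_map = dict(zip(my_columns, internal_columns))
renamed = {old: new for old, new in rename_map.items() if old != new}
counts_match = my_counts == internal_counts

print(renamed)
print(counts_match)
```

A mapping like this could be passed to pandas' DataFrame.rename to align the two outputs before comparing them value by value.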

Additional Sections

We can also look at the additional components we did not parse in our model.

Looking at the extra data reveals some remaining issues with the model, most of which relate to the different languages used in the logbooks.

pd.options.display.max_columns = None
all_data.data[["c99_voyage"]].describe(include="all")
c99_voyage
drLatDeg drLatMin drLatSec drLatHem drLonDeg drLonMin drLonSec drLonHem LatDeg LatMin LatSec LatHem LonDeg LonMin LonSec LonHem LatInd LonInd ZeroMeridian LMname1 LMdirection1 LMdistance1 LMname2 LMdirection2 LMdistance2 LMname3 LMdirection3 LMdistance4 PosCoastal Calendar_type logbook_date TimeOB Day_of_the_week PartDay Watch Glasses Start_day ShipName Nationality Ship_type Company Name1 Rank1 Name2 Rank2 Name3 Rank3 voyage_from voyage_to Anchored_ind AnchorPlace DASno VoyageIni Course_ship Ship_speed Distance EncName EncNat
count 4.000000 4.000000 4.0 4 2.000000 2.00000 2.0 2 2.000000 2.000000 2.0 2 3.000000 3.000000 3.0 3 5 5 5 1 1 1.0 0 0 0.0 0 0 0.0 5 5 5 5.0 1 1 1 1.0 5 5 5 4 2 4 4 1 1 0 0 5 5 5 0 0 5 2 0 4 0 0
unique NaN NaN NaN 2 NaN NaN NaN 1 NaN NaN NaN 2 NaN NaN NaN 2 2 3 4 1 1 NaN 0 0 NaN 0 0 NaN 1 1 1 NaN 1 1 1 <NA> 2 5 4 4 2 4 3 1 1 0 0 5 4 1 0 0 5 2 0 4 0 0
top NaN NaN NaN N NaN NaN NaN E NaN NaN NaN N NaN NaN NaN E 1 2 TENERIFE LIZARD N87:17E NaN NaN NaN NaN NaN NaN NaN 0 2 17711001 NaN TUESDAY 3 VM <NA> UNKNOWN EL COLON SPANISH PAQUEBOTE MCC THOMAS D'ORVES CAPITAN CHARLES WARREN 2ND OFFICER/LIEUTENANT NaN NaN LA HABANA LA CORUÑA 0 NaN NaN 17710819 WTZ NaN 175.00 NaN NaN
freq NaN NaN NaN 3 NaN NaN NaN 2 NaN NaN NaN 1 NaN NaN NaN 2 3 2 2 1 1 NaN NaN NaN NaN NaN NaN NaN 5 5 5 NaN 1 1 1 <NA> 4 1 2 1 1 1 2 1 1 NaN NaN 1 2 5 NaN NaN 1 1 NaN 1 NaN NaN
mean 27.250000 24.250000 0.0 NaN 26.500000 36.00000 0.0 NaN 22.000000 9.500000 0.0 NaN 121.666667 42.666667 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
std 21.884165 16.879475 0.0 NaN 19.091883 19.79899 0.0 NaN 29.698485 13.435029 0.0 NaN 195.208436 11.239810 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN NaN <NA> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 1.000000 5.000000 0.0 NaN 13.000000 22.00000 0.0 NaN 1.000000 0.000000 0.0 NaN 4.000000 33.000000 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25% 13.750000 17.000000 0.0 NaN 19.750000 29.00000 0.0 NaN 11.500000 4.750000 0.0 NaN 9.000000 36.500000 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
50% 29.500000 23.000000 0.0 NaN 26.500000 36.00000 0.0 NaN 22.000000 9.500000 0.0 NaN 14.000000 40.000000 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
75% 43.000000 30.250000 0.0 NaN 33.250000 43.00000 0.0 NaN 32.500000 14.250000 0.0 NaN 180.500000 47.500000 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
max 49.000000 46.000000 0.0 NaN 40.000000 50.00000 0.0 NaN 43.000000 19.000000 0.0 NaN 347.000000 55.000000 0.0 NaN NaN NaN NaN NaN NaN 230.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.0 NaN NaN NaN 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
all_data.data["c99_voyage"]["ZeroMeridian"].head()
0     TENERIFE
1    GREENWICH
2      NL_0_01
3      BERMUDA
4     TENERIFE
Name: ZeroMeridian, dtype: object

Ship types and languages

For example, the ship types on this deck are given in several different languages, and there is no code table for this variable on the CLIWOC website.

all_data.data["c99_voyage"]["Ship_type"].dropna().head()
0    PAQUEBOTE
2        SNAUW
3     5TH RATE
4     PAQUEBOT
Name: Ship_type, dtype: object
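
One way to handle values like these would be a hand-built lookup that maps each raw ship type to a common label. The sketch below is purely illustrative: the lookup table and its English labels are hypothetical and do not come from CLIWOC or from cdm_reader_mapper, and values missing from the table are passed through unchanged.

```python
# Hypothetical lookup: the English labels are illustrative and do not come from CLIWOC.
SHIP_TYPE_LOOKUP = {
    "PAQUEBOTE": "packet boat",  # Spanish
    "PAQUEBOT": "packet boat",  # French
    "SNAUW": "snow",  # Dutch
    "5TH RATE": "fifth rate",  # English
}


def harmonise_ship_type(value):
    """Map a raw ship type to a common label, leaving unknown values unchanged."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    return SHIP_TYPE_LOOKUP.get(cleaned, cleaned)


# Raw values in record order; the second record has no ship type.
raw_ship_types = ["PAQUEBOTE", None, "SNAUW", "5TH RATE", "PAQUEBOT"]
harmonised = [harmonise_ship_type(value) for value in raw_ship_types]
print(harmonised)
```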
all_data.data["c99_data"].describe(include="all")
AT_reading_units SST_reading_units AP_reading_units BART_reading_units ReferenceCourse ReferenceWindDirection Decl Distance_units Distance_units_to_landmark Distance_units_travelled Longitude_units units_of_measurement humidity_units water_at_pump_units wind_scale BARO_type BARO_brand API Humidity_method compas_error compas_correction AT_outside SST AP wind_dir current_dir current_speed attached_tem pump_water Humidity wind_force weather prcp_descriptor sea_state shape_coulds dir_coulds Clearness cloud_fraction gusts Rain Fog Snow Thunder Hail Sea_ice Trivial_correction Release
count 0 0 0 0 2 5 5 0 1 4 5 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0 5 0 0 0.0 0 0 5 2 0 4 0 0 0 0 5 5 5 5 5 5 5 5 5
unique 0 0 0 0 1 1 5 0 1 4 2 0 0 0 0 0 0 0 0 0 0 NaN NaN 0 5 0 0 NaN 0 0 5 2 0 4 0 0 0 0 1 2 1 1 2 1 1 1 2
top NaN NaN NaN NaN UNKNOWN UNKNOWN -20 NaN LEAGUES MILLAS 360 DEGREES NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN S NaN NaN NaN NaN NaN EN REFREGONES FUERTES Y DESPUES BONANCIBLE MUY MALOS CARICES. AGUACEROS, RELAMPAGOS Y TRU... NaN GRANDE DEL O Y DEL ENE NaN NaN NaN NaN 0 0 0 0 0 0 0 0 CLIWOC VERSION 2.0
freq NaN NaN NaN NaN 2 5 1 NaN 1 1 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN 1 1 NaN 1 NaN NaN NaN NaN 5 4 5 5 4 5 5 5 4
mean NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
50% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
75% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
max NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Wind force scales and languages

The wind force descriptions raise a similar problem: they are recorded as free text in the language of the logbook, so the scale used differs between languages.

all_data.data["c99_data"]["wind_force"].head()
0    EN REFREGONES FUERTES Y DESPUES BONANCIBLE
1                                        FOIBLE
2               STIJVE GEREEFDE MARSZEILSKOELTE
3                       FRESH GALES AND SQUALLY
4                                    BONANCIBLE
Name: wind_force, dtype: object