Generating a data model for CLIWOC¶

The purpose of this notebook is to demonstrate the structure of data models used by the cdm_reader_mapper toolbox.

ICOADS IMMA¶

A common format for marine observational records is the ICOADS IMMA format. This is a text format in which each line contains the data (including metadata) for an individual record. The format is attachment based: each record is constructed from a selection of (typically) fixed-width sections, called attachments, each containing a different subset of the data or metadata associated with the record. Documentation on the format and the available attachments can be found at https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf.

Records within the same file can contain different attachments, meaning that the IMMA format is not a fixed-width format, as line lengths will vary between records. Each record, however, must contain a certain subset of the attachments (in this case the core (or c0), c1, and c98 attachments).

Supplementary Data¶

Additional data or metadata can be provided in the c99 attachment. This attachment is not fixed-width as different sources or decks can provide different collections of supplementary data.

CLIWOC¶

In this example we use a subset of ICOADS release 3.0.0 IMMA formatted data for deck 730, which is data from the Climatological Database for the World’s Oceans (CLIWOC). There is a large amount of supplementary data available in the c99 attachment, which for deck 730 can be split into multiple sections. Here, we will start with the standard schema for the ICOADS IMMA format (included in cdm_reader_mapper as the "icoads" imodel), and extend the schema with fields for a subset of the c99 attachment. We will add fields for the logbook section of the c99 attachment for this deck.

An internal schema already exists for this deck ("icoads_r300_d730"). The purpose of this notebook is to demonstrate how one can extend the "icoads" data model to parse c99 data.

Overview¶

An initial read of the data subset using the "icoads" data model, which does not parse the c99 attachment.
Extension of the "icoads" schema to add fields for the logbook section of the c99 attachment for deck 730.
Construction of a code table for a categorical field in the c99 attachment.
Comparison with the internal schema for deck 730.

from __future__ import annotations

import json
import pathlib
import shutil
import warnings
from collections import OrderedDict
from tempfile import TemporaryDirectory

import pandas as pd

from cdm_reader_mapper import read_mdf, test_data
from cdm_reader_mapper.mdf_reader.properties import _base as base

try:
    from importlib.resources import files as get_files
except ImportError:
    # Fall back to the importlib_resources backport on older Python versions
    from importlib_resources import files as get_files


warnings.filterwarnings("ignore")

The Data¶

For this example we load a subset of ICOADS data for deck 730 from the cdm_reader_mapper test data. This is the data that will be used throughout this notebook.

data_file_path = test_data.test_icoads_r300_d730["source"]

Initial Read¶

First we read the data using the basic "icoads" data model. This step isn't necessary for extending the schema; it simply highlights the raw c99 data.

data_bundle = read_mdf(data_file_path, imodel="icoads")
data_raw = data_bundle.data

WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'

WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'

WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'

WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'

Supplementary (`c99`) data¶

By looking at the c99 section we can see that the supplementary data has not been parsed.

data_raw["c99"].head()

  99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...
  99 0 CARAN   CENTRE D'ACCUEIL ET DE RECHERCHE ...
  99 0 RAZ     RIJKSARCHIEF ZEELAND             ...
  99 0 NMM     NATIONAL MARITIME MUSEUM         ...
  99 0 AGI     ARCHIVO GENERAL DE INDIAS        ...
Name: c99, dtype: object

data_raw["c99"].iloc[3]

'99 0 NMM     NATIONAL MARITIME MUSEUM                          GREENWICH UNITED KINGDOM                                                                                                      NMM ADM/L/R13                 ENGLISH                       0492500N 405000E                1 1BERMUDA                                    LIZARD                                            N87:17E            230                                                                                                                                                0 21771100112TUESDAY                   12             RAINBOW                       BRITISH 5TH RATE       RN                                THOMAS COLLINGWOOD            CAPTAIN                  CHARLES WARREN                2ND OFFICER/LIEUTENANT                                                          BERMUDA                                      SPITHEAD                                          0                                                        17710829S25E                  39.00                                                                                                        UNKNOWN        UNKNOWN         -22                    LEAGUESNM     180 DEGREES                                                                                                                                                                                                 ESE, E                                                                                                                                                                         FRESH GALES AND SQUALLY                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                        00000000CLIWOC VERSION 2.0'

Creating a data model¶

Custom Schema¶

To use a custom schema we pass the ext_schema_path argument to read_mdf. The directory it points to must have the following structure:

name_of_model/
    name_of_model.json
    code_tables/
        ...

The code_tables sub-directory contains the code tables that map the key columns in the data to their values.

In this example we create a temporary directory for the data model, so that it is cleaned up after the notebook is finished; in reality you would want to store the data model in a permanent directory!
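The layout described above can be sketched with the standard library alone. This is a minimal illustration, not part of cdm_reader_mapper: the model name "my_model" and the placeholder schema content are assumptions made only for the example.

```python
import json
import pathlib
from tempfile import TemporaryDirectory

# "my_model" is a hypothetical model name used only for this illustration
model_name = "my_model"

with TemporaryDirectory() as tmp:
    model_dir = pathlib.Path(tmp) / model_name
    code_tables_dir = model_dir / "code_tables"
    # parents=True creates the model directory and code_tables/ in one call
    code_tables_dir.mkdir(parents=True)

    # The schema file carries the same name as the model directory.
    # The content here is a placeholder, not a usable schema.
    schema_file = model_dir / f"{model_name}.json"
    schema_file.write_text(json.dumps({"sections": {}}, indent=2))

    # Collect the layout relative to the model directory
    layout = sorted(p.relative_to(model_dir).as_posix() for p in model_dir.rglob("*"))

print(layout)
```

Using TemporaryDirectory as a context manager removes the directory as soon as the block exits, which suits a demonstration but not a model you intend to reuse.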

We start from the basic "icoads" model. The c99 section will be based on the "icoads_r300_d730" schema and code tables.

Copy the `"icoads"` schema¶

First we create a copy of the "icoads" schema (located at mdf_reader/schemas/icoads/icoads.json). NOTE: cdm_reader_mapper.mdf_reader.properties._base is used so that we have a relative path to the original schema and code tables.

tmp_dir = TemporaryDirectory()
my_model_name = "cliwoc"
my_model_path = pathlib.Path(tmp_dir.name) / my_model_name
my_model_path.mkdir(exist_ok=True)

# Get a copy of the "icoads" schema
icoads_schema_dir = get_files(f"{base}.schemas.icoads")
icoads_schema_path = pathlib.Path(icoads_schema_dir) / "icoads.json"

my_schema_path = my_model_path / (my_model_name + ".json")
copy = shutil.copyfile(icoads_schema_path, my_schema_path)

Copy the code tables¶

We now copy each of the "icoads" code tables. These are the generic ICOADS code tables (located in mdf_reader/codes/icoads).

# Get code tables and copy to the directory
my_code_tables_path = my_model_path / "code_tables"
my_code_tables_path.mkdir(exist_ok=True)

# Original code table directory (general ICOADS code tables)
icoads_code_tables_path = get_files(f"{base}.codes.icoads")

# Get filenames for each of the code tables
code_table_files = list(icoads_code_tables_path.glob("ICOADS.*.json"))

# Copy each file
for file in code_table_files:
    basename = pathlib.Path(file).name
    out_path = my_code_tables_path / basename
    shutil.copyfile(file, out_path)
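To see what the glob pattern in the copy step selects, here is a small self-contained sketch that runs the same loop over a throwaway directory. The file names are hypothetical and chosen only to show which ones match "ICOADS.*.json".

```python
import pathlib
import shutil
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "src"
    dst = pathlib.Path(tmp) / "dst"
    src.mkdir()
    dst.mkdir()

    # Hypothetical file names, for illustration only
    for name in ("ICOADS.C0.XX.json", "ICOADS.C1.YY.json", "OTHER.C0.XX.json", "notes.txt"):
        (src / name).write_text("{}")

    # Same pattern as in the notebook: only files starting with "ICOADS."
    # and ending with ".json" are copied
    for file in sorted(src.glob("ICOADS.*.json")):
        shutil.copyfile(file, dst / file.name)

    copied = sorted(p.name for p in dst.iterdir())

print(copied)
```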

Extending the schema: CLIWOC logbook information¶

For this example we’ll load the schema into the environment as a dictionary (we use an ordered dictionary to guarantee that the ordering of the fields is maintained!).

with pathlib.Path(my_schema_path).open() as io:
    schema = json.load(io, object_pairs_hook=OrderedDict)
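As a quick check of the ordering behaviour, the sketch below parses a small JSON string (a made-up fragment, not the real schema) with and without object_pairs_hook. Since Python 3.7 a plain dict also preserves insertion order, so the OrderedDict mainly makes the intent explicit.

```python
import json
from collections import OrderedDict

# A made-up JSON fragment whose key order matters
text = '{"sentinal": 5, "InstAbbr": 8, "InstName": 50}'

ordered = json.loads(text, object_pairs_hook=OrderedDict)
plain = json.loads(text)

# Both containers iterate over the keys in file order
field_order = list(ordered)
print(field_order)
print(list(plain) == field_order)
```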

We now add the contents for section c99. There are some standard ("header") fields we need to supply. The "sentinal" is the prefix for the attachment; it is printed in the raw supplementary data and identifies the start of the attachment.

We also need to specify the length of the attachment and the layout.

We then add our data fields to the elements field for the c99 section. We'll add the fields for the logbook component of the supplementary data for CLIWOC. There are additional components we could resolve, but we'll keep to the logbook for this example.

schema["sections"]["c99"]["header"]["sentinal"] = "99 0 "
schema["sections"]["c99"]["header"]["disable_read"] = False
schema["sections"]["c99"]["header"]["field_layout"] = "fixed_width"
schema["sections"]["c99"]["header"]["length"] = 245 + 5  # 245 characters of logbook fields + 5 for the sentinal
schema["sections"]["c99"]["elements"] = OrderedDict(
    {
        "sentinal": {
            "description": "attachment sentinal",
            "field_length": 5,
            "column_type": "str",
            "ignore": True,
        },
        "InstAbbr": {
            "description": "Abbreviation of the Institute storing the original data",
            "field_length": 8,
            "column_type": "str",
        },
        "InstName": {
            "description": "Full name of the Institute storing the original data",
            "field_length": 50,
            "column_type": "str",
        },
        "InstCity": {
            "description": "City where the Institute storing the data is located",
            "field_length": 10,
            "column_type": "str",
        },
        "InstCountry": {
            "description": "Country where the Institute storing the data is located",
            "field_length": 14,
            "column_type": "str",
        },
        "ArchiveID": {
            "description": "Administrative number under which the data is found within the Institute storing the data",
            "field_length": 15,
            "column_type": "str",
        },
        "ArchiveName": {
            "description": "Administrative name under which the data is found within the Institute storing the data",
            "field_length": 17,
            "column_type": "str",
        },
        "ArchivePart": {
            "description": "Part of the archive set in which the data is found within the Institute storing the data",
            "field_length": 39,
            "column_type": "str",
        },
        "ArchivePartSpec": {
            "description": "Specification of the part of the archive set in which the data is found within the Institute storing the data",
            "field_length": 31,
            "column_type": "str",
        },
        "LogbookID": {
            "description": "Identification Number of the logbook containing the data",
            "field_length": 30,
            "column_type": "str",
        },
        "LogbookLang": {
            "description": "Language of the logbook containing the data",
            "field_length": 7,
            "column_type": "str",
        },
        "ImageID": {
            "description": "Identification Number of the original image of the logbook",
            "field_length": 23,
            "column_type": "str",
        },
        "IllustrationAvail": {
            "description": "Illustration available on the current page of the logbook",
            "field_length": 1,
            "column_type": "key",
            "codetable": "CLIWOC_ILLUSTRATION_I",
        },
    }
)
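To make the "fixed_width" layout concrete, the following sketch slices a record by cumulative field widths in plain Python. It is an illustration of the idea only and is not how cdm_reader_mapper parses the data internally. The widths are those defined above, and the sample values come from the NATIONAL MARITIME MUSEUM record shown earlier; the helper split_fixed_width is a hypothetical name.

```python
# Field widths copied from the elements defined above
widths = {
    "sentinal": 5, "InstAbbr": 8, "InstName": 50, "InstCity": 10,
    "InstCountry": 14, "ArchiveID": 15, "ArchiveName": 17, "ArchivePart": 39,
    "ArchivePartSpec": 31, "LogbookID": 30, "LogbookLang": 7, "ImageID": 23,
    "IllustrationAvail": 1,
}


def split_fixed_width(line, widths):
    """Slice `line` into named fields; blank fields become None."""
    fields = {}
    start = 0
    for name, width in widths.items():
        raw = line[start:start + width]
        fields[name] = raw.strip() or None
        start += width
    return fields


# Rebuild a padded record from known values so the spacing is exact
values = {
    "sentinal": "99 0 ",
    "InstAbbr": "NMM",
    "InstName": "NATIONAL MARITIME MUSEUM",
    "InstCity": "GREENWICH",
    "InstCountry": "UNITED KINGDOM",
    "LogbookID": "NMM ADM/L/R13",
    "LogbookLang": "ENGLISH",
    "IllustrationAvail": "0",
}
sample = "".join(values.get(name, "").ljust(width) for name, width in widths.items())

parsed = split_fixed_width(sample, widths)
print(len(sample))
print(parsed["InstCity"], parsed["LogbookID"], parsed["ArchiveID"])
```

Because every field occupies a fixed number of characters, an empty field is simply a run of spaces, which is why ArchiveID comes back as None here.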

We can now write the dictionary to the schema file.

json_object = json.dumps(schema, indent=2)

with pathlib.Path(my_schema_path).open("w") as outfile:
    outfile.write(json_object)
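A useful sanity check after writing a fixed-width section is that the declared header length equals the sum of the field lengths. The sketch below does this on a cut-down stand-in for the c99 section (only "length" and "field_length" are kept); check_section_length is a hypothetical helper, not a cdm_reader_mapper function.

```python
import json

# Field lengths as defined for the c99 section in this notebook
field_lengths = {
    "sentinal": 5, "InstAbbr": 8, "InstName": 50, "InstCity": 10,
    "InstCountry": 14, "ArchiveID": 15, "ArchiveName": 17, "ArchivePart": 39,
    "ArchivePartSpec": 31, "LogbookID": 30, "LogbookLang": 7, "ImageID": 23,
    "IllustrationAvail": 1,
}

# Cut-down stand-in for the schema, keeping only what the check needs
schema = {
    "sections": {
        "c99": {
            "header": {"length": 245 + 5},
            "elements": {name: {"field_length": n} for name, n in field_lengths.items()},
        }
    }
}


def check_section_length(section):
    """Compare the declared header length with the summed field lengths."""
    total = sum(element["field_length"] for element in section["elements"].values())
    declared = section["header"]["length"]
    return total == declared, total, declared


# Round-trip through JSON, as happens when the schema is written and re-read
reloaded = json.loads(json.dumps(schema, indent=2))
ok, total, declared = check_section_length(reloaded["sections"]["c99"])
print(ok, total, declared)
```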

`IllustrationAvail` Code Table¶

One of the fields we have added has a "column_type" of "key". This indicates categorical data, where the key value maps to a longer descriptive value. We also specified a code table for this field, which should describe that mapping. Let's create that table now. As with the schema, it should be JSON formatted.

For this field, we have two possible values. We save the dictionary to a JSON file in the code_tables directory. The name of the file must match the "codetable" value for the field (plus the ".json" extension).

illustration_avail_codes = {
    "0": "No illustration on the current logbook page.",
    "1": "Illustration available on the current logbook page.",
}
illustration_avail_path = my_code_tables_path / "CLIWOC_ILLUSTRATION_I.json"

json_object = json.dumps(illustration_avail_codes, indent=2)

with pathlib.Path(illustration_avail_path).open("w") as outfile:
    outfile.write(json_object)
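The role of a code table can be illustrated without the toolbox: the sketch below writes the same table to a temporary file, reloads it, and looks up a few raw key values. The key "7" is a made-up value included to show a key that the table does not define. How cdm_reader_mapper itself treats such keys is not covered here.

```python
import json
import pathlib
from tempfile import TemporaryDirectory

codes = {
    "0": "No illustration on the current logbook page.",
    "1": "Illustration available on the current logbook page.",
}

with TemporaryDirectory() as tmp:
    # File name = "codetable" value from the schema + ".json"
    table_path = pathlib.Path(tmp) / "CLIWOC_ILLUSTRATION_I.json"
    table_path.write_text(json.dumps(codes, indent=2))
    loaded = json.loads(table_path.read_text())

# Raw key values as they might appear in the column ("7" is hypothetical)
raw_keys = ["0", "0", "1", "7"]

# Map each key to its description; keys missing from the table give None
decoded = [loaded.get(key) for key in raw_keys]
unknown = sorted({key for key in raw_keys if key not in loaded})

print(decoded)
print(unknown)
```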

Reading¶

We can now read the data file with the schema we have just created (by copying and extending the "icoads" schema). We specify the path to the data model (the directory containing the schema JSON file) and the path to the code tables.

my_bundle = read_mdf(
    data_file_path,  # Path to the data file
    ext_schema_path=my_model_path,  # Path to the directory containing the schema json file
    ext_table_path=my_code_tables_path,  # Path to the directory containing the json code tables
)
my_data = my_bundle.data

ERROR:root:imodel is not defined.

Analysing the output¶

We can now investigate components of the c99 section.

my_data[["c99"]].head()

	c99
	InstAbbr	InstName	InstCity	InstCountry	ArchiveID	ArchiveName	ArchivePart	ArchivePartSpec	LogbookID	LogbookLang	ImageID	IllustrationAvail
0	AGI	ARCHIVO GENERAL DE INDIAS	SEVILLE	SPAIN	None	None	None	None	CORREOS, 275A R11	SPANISH	None	0
1	CARAN	CENTRE D'ACCUEIL ET DE RECHERCHE DES ARCH. NAT...	PARIS	FRANCE	None	None	None	None	COTE - 4/JJ/39	FRENCH	None	0
2	RAZ	RIJKSARCHIEF ZEELAND	MIDDELBURG	NEDERLAND	20	None	MCC	1391	MCC_20_1391	DUTCH	MCC_20_1391_0032	0
3	NMM	NATIONAL MARITIME MUSEUM	GREENWICH	UNITED KINGDOM	None	None	None	None	NMM ADM/L/R13	ENGLISH	None	0
4	AGI	ARCHIVO GENERAL DE INDIAS	SEVILLE	SPAIN	None	None	None	None	CORREOS, 193B R3	SPANISH	None	0

my_data[["c99"]].describe(include="all")

	c99
	InstAbbr	InstName	InstCity	InstCountry	ArchiveID	ArchiveName	ArchivePart	ArchivePartSpec	LogbookID	LogbookLang	ImageID	IllustrationAvail
count	5	5	5	5	1	0	1	1	5	5	1	5
unique	4	4	4	4	1	0	1	1	5	4	1	1
top	AGI	ARCHIVO GENERAL DE INDIAS	SEVILLE	SPAIN	20	NaN	MCC	1391	CORREOS, 275A R11	SPANISH	MCC_20_1391_0032	0
freq	2	2	2	2	1	NaN	1	1	1	2	1	5

Internal Schema¶

cdm_reader_mapper already includes a data model for the CLIWOC deck. The model parses all sections of supplementary data and provides all required code tables. Let’s now read in the data using the "icoads_r300_d730" model.

all_data = read_mdf(
    data_file_path,
    imodel="icoads_r300_d730",
)

WARNING:root:Unknown column_type 'object' for column '('c8', 'PUID')'

WARNING:root:Unknown column_type 'object' for column '('c95', 'ARCR')'

WARNING:root:Unknown column_type 'object' for column '('c96', 'ARCI')'

WARNING:root:Unknown column_type 'object' for column '('c97', 'ARCE')'

WARNING:root:Unknown column_type 'object' for column '('c99_sentinel', 'BLK')'

The c99 section has been split into multiple sections. There is no c99 section in the output; instead we now have:

c99_logbook
c99_voyage
c99_data

We can compare the c99_logbook section to the output of our model. We see that we have extracted the same data, although we chose different column names for the elements.

all_data.data[["c99_logbook"]].describe(include="all")

	c99_logbook
	InstAbbr	InstName	InstPlace	InstLand	NumArchiveSet	NameArchiveSet	ArchivePart	Specification	Logbook_id	Logbook_language	Image_No	Illustr
count	5	5	5	5	1	0	1	1	5	5	1	5
unique	4	4	4	4	1	0	1	1	5	4	1	1
top	AGI	ARCHIVO GENERAL DE INDIAS	SEVILLE	SPAIN	20	NaN	MCC	1391	CORREOS, 275A R11	SPANISH	MCC_20_1391_0032	0
freq	2	2	2	2	1	NaN	1	1	1	2	1	5

my_data[["c99"]].describe(include="all")

	c99
	InstAbbr	InstName	InstCity	InstCountry	ArchiveID	ArchiveName	ArchivePart	ArchivePartSpec	LogbookID	LogbookLang	ImageID	IllustrationAvail
count	5	5	5	5	1	0	1	1	5	5	1	5
unique	4	4	4	4	1	0	1	1	5	4	1	1
top	AGI	ARCHIVO GENERAL DE INDIAS	SEVILLE	SPAIN	20	NaN	MCC	1391	CORREOS, 275A R11	SPANISH	MCC_20_1391_0032	0
freq	2	2	2	2	1	NaN	1	1	1	2	1	5
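Since the two models differ only in column names, a name mapping is enough to compare them directly. The sketch below pairs the names by their position in the two tables above, which is an assumption, and checks one record (the RIJKSARCHIEF ZEELAND row) written out under both naming schemes for illustration. With DataFrames the same mapping could be passed to the pandas rename method via its columns argument.

```python
# Our column names -> internal "icoads_r300_d730" names, paired by position
# in the two describe() tables (an assumption, not taken from the schema files)
name_map = {
    "InstAbbr": "InstAbbr",
    "InstName": "InstName",
    "InstCity": "InstPlace",
    "InstCountry": "InstLand",
    "ArchiveID": "NumArchiveSet",
    "ArchiveName": "NameArchiveSet",
    "ArchivePart": "ArchivePart",
    "ArchivePartSpec": "Specification",
    "LogbookID": "Logbook_id",
    "LogbookLang": "Logbook_language",
    "ImageID": "Image_No",
    "IllustrationAvail": "Illustr",
}

# One record under our names (values from the output shown earlier)
ours = {
    "InstAbbr": "RAZ", "InstName": "RIJKSARCHIEF ZEELAND",
    "InstCity": "MIDDELBURG", "InstCountry": "NEDERLAND",
    "ArchiveID": "20", "ArchiveName": None, "ArchivePart": "MCC",
    "ArchivePartSpec": "1391", "LogbookID": "MCC_20_1391",
    "LogbookLang": "DUTCH", "ImageID": "MCC_20_1391_0032",
    "IllustrationAvail": "0",
}

# The same record written with the internal names, for illustration
internal = {
    "InstAbbr": "RAZ", "InstName": "RIJKSARCHIEF ZEELAND",
    "InstPlace": "MIDDELBURG", "InstLand": "NEDERLAND",
    "NumArchiveSet": "20", "NameArchiveSet": None, "ArchivePart": "MCC",
    "Specification": "1391", "Logbook_id": "MCC_20_1391",
    "Logbook_language": "DUTCH", "Image_No": "MCC_20_1391_0032",
    "Illustr": "0",
}

# Rename our keys and compare with the internal record
renamed = {name_map[key]: value for key, value in ours.items()}
print(renamed == internal)
```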

Additional Sections¶

We can also look at the additional components we did not parse in our model.

We can note some remaining issues with the model as we look at the extra data. Most of the challenges relate to language translations.

pd.options.display.max_columns = None
all_data.data[["c99_voyage"]].describe(include="all")

	c99_voyage
	drLatDeg	drLatMin	drLatSec	drLatHem	drLonDeg	drLonMin	drLonSec	drLonHem	LatDeg	LatMin	LatSec	LatHem	LonDeg	LonMin	LonSec	LonHem	LatInd	LonInd	ZeroMeridian	LMname1	LMdirection1	LMdistance1	LMname2	LMdirection2	LMdistance2	LMname3	LMdirection3	LMdistance4	PosCoastal	Calendar_type	logbook_date	TimeOB	Day_of_the_week	PartDay	Watch	Glasses	Start_day	ShipName	Nationality	Ship_type	Company	Name1	Rank1	Name2	Rank2	Name3	Rank3	voyage_from	voyage_to	Anchored_ind	AnchorPlace	DASno	VoyageIni	Course_ship	Ship_speed	Distance	EncName	EncNat
count	4.000000	4.000000	4.0	4	2.000000	2.00000	2.0	2	2.000000	2.000000	2.0	2	3.000000	3.000000	3.0	3	5	5	5	1	1	1.0	0	0	0.0	0	0	0.0	5	5	5	5.0	1	1	1	1.0	5	5	5	4	2	4	4	1	1	0	0	5	5	5	0	0	5	2	0	4	0	0
unique	NaN	NaN	NaN	2	NaN	NaN	NaN	1	NaN	NaN	NaN	2	NaN	NaN	NaN	2	2	3	4	1	1	NaN	0	0	NaN	0	0	NaN	1	1	1	NaN	1	1	1	<NA>	2	5	4	4	2	4	3	1	1	0	0	5	4	1	0	0	5	2	0	4	0	0
top	NaN	NaN	NaN	N	NaN	NaN	NaN	E	NaN	NaN	NaN	N	NaN	NaN	NaN	E	1	2	TENERIFE	LIZARD	N87:17E	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	2	17711001	NaN	TUESDAY	3	VM	<NA>	UNKNOWN	EL COLON	SPANISH	PAQUEBOTE	MCC	THOMAS D'ORVES	CAPITAN	CHARLES WARREN	2ND OFFICER/LIEUTENANT	NaN	NaN	LA HABANA	LA CORUÑA	0	NaN	NaN	17710819	WTZ	NaN	175.00	NaN	NaN
freq	NaN	NaN	NaN	3	NaN	NaN	NaN	2	NaN	NaN	NaN	1	NaN	NaN	NaN	2	3	2	2	1	1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	5	5	5	NaN	1	1	1	<NA>	4	1	2	1	1	1	2	1	1	NaN	NaN	1	2	5	NaN	NaN	1	1	NaN	1	NaN	NaN
mean	27.250000	24.250000	0.0	NaN	26.500000	36.00000	0.0	NaN	22.000000	9.500000	0.0	NaN	121.666667	42.666667	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
std	21.884165	16.879475	0.0	NaN	19.091883	19.79899	0.0	NaN	29.698485	13.435029	0.0	NaN	195.208436	11.239810	0.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
min	1.000000	5.000000	0.0	NaN	13.000000	22.00000	0.0	NaN	1.000000	0.000000	0.0	NaN	4.000000	33.000000	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25%	13.750000	17.000000	0.0	NaN	19.750000	29.00000	0.0	NaN	11.500000	4.750000	0.0	NaN	9.000000	36.500000	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
50%	29.500000	23.000000	0.0	NaN	26.500000	36.00000	0.0	NaN	22.000000	9.500000	0.0	NaN	14.000000	40.000000	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
75%	43.000000	30.250000	0.0	NaN	33.250000	43.00000	0.0	NaN	32.500000	14.250000	0.0	NaN	180.500000	47.500000	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
max	49.000000	46.000000	0.0	NaN	40.000000	50.00000	0.0	NaN	43.000000	19.000000	0.0	NaN	347.000000	55.000000	0.0	NaN	NaN	NaN	NaN	NaN	NaN	230.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	12.0	NaN	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

all_data.data[["c99_voyage"]].c99_voyage.ZeroMeridian.head()

   TENERIFE
  GREENWICH
    NL_0_01
    BERMUDA
   TENERIFE
Name: ZeroMeridian, dtype: object

Ship types and languages¶

For example, the ship types on this deck are given in many different languages. There is no code table for this variable on the CLIWOC website.

all_data.data[["c99_voyage"]].c99_voyage.Ship_type.dropna().head()

  PAQUEBOTE
      SNAUW
   5TH RATE
   PAQUEBOT
Name: Ship_type, dtype: object
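In the absence of an official code table, one option is a hand-built alias table that maps each spelling to a common label. The sketch below is purely illustrative: the English labels are rough translations assumed for this example, not values from CLIWOC documentation, and harmonise_ship_type is a hypothetical helper.

```python
# Hypothetical alias table; the target labels are rough translations
# assumed for illustration and are NOT an official CLIWOC code table
SHIP_TYPE_ALIASES = {
    "PAQUEBOTE": "PACKET BOAT",  # Spanish spelling
    "PAQUEBOT": "PACKET BOAT",   # French spelling
    "SNAUW": "SNOW",             # Dutch spelling
    "5TH RATE": "FIFTH RATE",    # English, written with a numeral
}


def harmonise_ship_type(value):
    """Return a common label for a ship type, or the cleaned input if unknown."""
    if value is None:
        return None
    key = value.strip().upper()
    return SHIP_TYPE_ALIASES.get(key, key)


# The four ship types shown in the output above
observed = ["PAQUEBOTE", "SNAUW", "5TH RATE", "PAQUEBOT"]
harmonised = [harmonise_ship_type(value) for value in observed]
print(harmonised)
```

Unknown spellings pass through unchanged, so new values remain visible and can be added to the table later.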

all_data.data[["c99_data"]].c99_data.describe(include="all")

	AT_reading_units	SST_reading_units	AP_reading_units	BART_reading_units	ReferenceCourse	ReferenceWindDirection	Decl	Distance_units	Distance_units_to_landmark	Distance_units_travelled	Longitude_units	units_of_measurement	humidity_units	water_at_pump_units	wind_scale	BARO_type	BARO_brand	API	Humidity_method	compas_error	compas_correction	AT_outside	SST	AP	wind_dir	current_dir	current_speed	attached_tem	pump_water	Humidity	wind_force	weather	prcp_descriptor	sea_state	shape_coulds	dir_coulds	Clearness	cloud_fraction	gusts	Rain	Fog	Snow	Thunder	Hail	Sea_ice	Trivial_correction	Release
count	0	0	0	0	2	5	5	0	1	4	5	0	0	0	0	0	0	0	0	0	0	0.0	0.0	0	5	0	0	0.0	0	0	5	2	0	4	0	0	0	0	5	5	5	5	5	5	5	5	5
unique	0	0	0	0	1	1	5	0	1	4	2	0	0	0	0	0	0	0	0	0	0	NaN	NaN	0	5	0	0	NaN	0	0	5	2	0	4	0	0	0	0	1	2	1	1	2	1	1	1	2
top	NaN	NaN	NaN	NaN	UNKNOWN	UNKNOWN	-20	NaN	LEAGUES	MILLAS	360 DEGREES	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	S	NaN	NaN	NaN	NaN	NaN	EN REFREGONES FUERTES Y DESPUES BONANCIBLE	MUY MALOS CARICES. AGUACEROS, RELAMPAGOS Y TRU...	NaN	GRANDE DEL O Y DEL ENE	NaN	NaN	NaN	NaN	0	0	0	0	0	0	0	0	CLIWOC VERSION 2.0
freq	NaN	NaN	NaN	NaN	2	5	1	NaN	1	1	3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1	NaN	NaN	NaN	NaN	NaN	1	1	NaN	1	NaN	NaN	NaN	NaN	5	4	5	5	4	5	5	5	4
mean	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
std	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
min	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25%	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
50%	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
75%	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
max	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Wind force scales and languages¶

What about the different scales for the wind force, given different languages?

all_data.data[["c99_data"]].c99_data.wind_force.head()

  EN REFREGONES FUERTES Y DESPUES BONANCIBLE
                                      FOIBLE
             STIJVE GEREEFDE MARSZEILSKOELTE
                     FRESH GALES AND SQUALLY
                                  BONANCIBLE
Name: wind_force, dtype: object