cdm_reader_mapper.metmetpy package
Internal metmetpy information package.
- cdm_reader_mapper.metmetpy.correct_datetime(data, imodel, log_level='INFO', base=None)
Apply ICOADS deck-specific datetime corrections.
- Parameters:
data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for datetime correction metadata. If None, use the internal correction path.
- Returns:
A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.
- Return type:
pandas.DataFrame or Iterable[pd.DataFrame]
- Raises:
ValueError – If _correct_dt raises an error during correction.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame], or if data is a pd.Series.
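The input contract above (a single DataFrame, or an iterable of DataFrames such as the chunks of a TextParser, with a Series rejected) can be illustrated with a minimal sketch. apply_to_chunks and wrap_hours are hypothetical names, not part of cdm_reader_mapper, and wrap_hours is only a stand-in for a real deck-specific correction; the library's own dispatch may differ.

```python
from collections.abc import Iterable, Iterator
from typing import Callable, Union

import pandas as pd

Correction = Callable[[pd.DataFrame], pd.DataFrame]


def apply_to_chunks(data, func: Correction) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """Hypothetical helper: apply func to a DataFrame, or lazily to each chunk."""
    # A Series is iterable too, so it has to be rejected before the Iterable check.
    if isinstance(data, pd.Series):
        raise TypeError("data must be a DataFrame or an iterable of DataFrames, not a Series")
    # DataFrames are also iterable (over column labels), so test for them first.
    if isinstance(data, pd.DataFrame):
        return func(data)
    if not isinstance(data, Iterable):
        raise TypeError("data must be a DataFrame or an iterable of DataFrames")

    def _corrected() -> Iterator[pd.DataFrame]:
        for chunk in data:
            if not isinstance(chunk, pd.DataFrame):
                raise TypeError("every chunk must be a pandas DataFrame")
            yield func(chunk)

    return _corrected()


def wrap_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in correction: wrap an HR column into [0, 24)."""
    out = df.copy()  # the input is left untouched; the full dataset is returned
    out["HR"] = out["HR"] % 24
    return out


df = pd.DataFrame({"HR": [25.0, 12.0]})
print(apply_to_chunks(df, wrap_hours)["HR"].tolist())  # [1.0, 12.0]
for chunk in apply_to_chunks([df, df], wrap_hours):
    print(len(chunk))  # 2, printed once per chunk
```

Returning a generator for iterable input keeps chunked reads lazy; that is a design choice of this sketch, not a documented property of the library.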
- cdm_reader_mapper.metmetpy.correct_pt(data, imodel, log_level='INFO', base=None)
Apply ICOADS deck-specific platform type (PT) corrections.
- Parameters:
data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for the correction metadata. If None, use the internal correction path.
- Returns:
A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.
- Return type:
pandas.DataFrame or Iterable[pd.DataFrame]
- Raises:
ValueError – If _correct_pt raises an error during correction, or if the platform column is not defined in the properties file.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame], or if data is a pd.Series.
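The ValueError raised when the platform column is not defined can be pictured as a lookup in a registry that maps each data model's fields to either a column name or a (section, column) pair. The registry content, the prefix rule used to derive the data model from imodel, and get_field_column are all hypothetical; the real metadata_datamodels in properties.py may be organised differently.

```python
from typing import Tuple, Union

import pandas as pd

ColumnKey = Union[str, Tuple[str, str]]

# Hypothetical registry: data model -> field -> column or (section, column).
METADATA_DATAMODELS = {
    "icoads": {"platform": ("c1", "PT"), "id": ("core", "ID")},
    "example": {"platform": "PT"},
}


def get_field_column(imodel: str, field: str) -> ColumnKey:
    """Return the column key of field in imodel, or raise ValueError."""
    model = imodel.split("_")[0]  # assumption: the data model is the imodel prefix
    try:
        return METADATA_DATAMODELS[model][field]
    except KeyError:
        raise ValueError(f"{field} column is not defined for data model {imodel}") from None


# Tuple keys in the dict give the DataFrame (section, column) MultiIndex columns.
df = pd.DataFrame({("c1", "PT"): [5, 4], ("core", "ID"): ["SHIP1", "12345"]})
print(df[get_field_column("icoads_r300_d704", "platform")].tolist())  # [5, 4]
```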
- cdm_reader_mapper.metmetpy.validate_datetime(data, imodel, blank=False, log_level='INFO')
Validate datetime columns in a dataset according to the specified model.
- Parameters:
data (pd.DataFrame, pd.Series, or Iterable[pd.DataFrame, pd.Series]) – Input dataset or series containing datetime values.
imodel (str) – Name of internally available data model, e.g. “icoads_r300_d201”.
blank (bool, optional) – If True, empty values are considered valid. Default is False.
log_level (str, optional) – Logging level. Default is “INFO”.
- Returns:
Boolean Series indicating whether each datetime is valid. Returns None if validation cannot be performed due to missing data, columns, or deck definitions.
- Return type:
pd.Series or None
- Raises:
TypeError – If data is not a pd.DataFrame, a pd.Series, or an Iterable[pd.DataFrame | pd.Series].
ValueError – If no columns are found for datetime conversion.
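Because the result may be None, a caller should check it before using it as a row mask. The helper below is a hypothetical sketch, not part of cdm_reader_mapper; it assumes the boolean Series shares the index of the data it was computed from.

```python
from typing import Optional

import pandas as pd


def keep_valid(df: pd.DataFrame, mask: Optional[pd.Series]) -> pd.DataFrame:
    """Hypothetical helper: keep the rows flagged True by a validation mask."""
    if mask is None:
        # Validation could not be performed; fail loudly instead of guessing.
        raise ValueError("validation returned None; there is no mask to apply")
    # Align on the index; rows missing from the mask are treated as invalid.
    aligned = mask.reindex(df.index, fill_value=False).astype(bool)
    return df[aligned]


df = pd.DataFrame({"YR": [2000, 2001, 2002]})
mask = pd.Series([True, False, True])
print(keep_valid(df, mask)["YR"].tolist())  # [2000, 2002]
```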
- cdm_reader_mapper.metmetpy.validate_id(data, imodel, blank=False, log_level='INFO')
Validate ID column(s) in a dataset against deck-specific patterns.
- Parameters:
data (pd.DataFrame, pd.Series, or Iterable[pd.DataFrame, pd.Series]) – Input dataset or series containing ID values.
imodel (str) – Name of internally available data model, e.g. “icoads_r300_d201”.
blank (bool, optional) – If True, empty values are considered valid. Default is False.
log_level (str, optional) – Logging level. Default is “INFO”.
- Returns:
Boolean Series indicating whether each ID is valid. Returns None if validation cannot be performed due to missing data, columns, or deck definitions.
- Return type:
pd.Series or None
- Raises:
TypeError – If data is not a pd.DataFrame, a pd.Series, or an Iterable[pd.DataFrame | pd.Series].
ValueError – If the dataset imodel has no deck information, if no ID conversion columns are found, or if the input deck is not defined in the ID library files.
FileNotFoundError – If the dataset imodel has no ID deck library.
Notes
Uses _get_id_col to determine which column(s) contain IDs.
Uses _get_patterns to get the regex patterns for the deck.
Empty values match the “^$” pattern if blank=True.
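The pattern matching described in these notes can be sketched as follows, assuming the deck's patterns are already available as a non-empty list of anchored regular expressions (in the library they are obtained through _get_patterns). validate_ids and the example pattern are hypothetical.

```python
import re
from typing import List

import pandas as pd


def validate_ids(ids: pd.Series, patterns: List[str], blank: bool = False) -> pd.Series:
    """Hypothetical sketch: True where an ID matches at least one deck pattern."""
    if blank:
        patterns = [*patterns, "^$"]  # empty values become valid
    # Patterns are assumed to carry their own ^...$ anchors, like the blank pattern.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    # Missing values are compared as "", so they can only match the blank pattern.
    return ids.fillna("").astype(str).map(lambda s: combined.match(s) is not None)


ids = pd.Series(["12345", "SHIP1", None])
print(validate_ids(ids, [r"^[0-9]{5}$"]).tolist())              # [True, False, False]
print(validate_ids(ids, [r"^[0-9]{5}$"], blank=True).tolist())  # [True, False, True]
```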
Subpackages
Submodules
cdm_reader_mapper.metmetpy.correct module
Initial metmetpy correction package.
Created on Tue Jun 25 09:00:19 2019
Corrects datetime fields from a given deck in a data model.
To support dataframes stored in TextParsers, and the possible use of data columns other than those being fixed in this or other metmetpy modules, the input and output are the full dataset.
Corrections are data model and deck specific and are registered in ./lib/data_model.json; multiple decks in the same input data are not supported.
Reference names of the metadata fields used in the metmetpy modules, and their locations in a data model (column or (section, column)), are registered in ../properties.py in metadata_datamodels.
If the data model is not available in ./lib, it is assumed that no corrections are needed. If the data model is not available in metadata_models, the module logs an error and returns no output, which breaks any full processing downstream of its invocation.
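The registry logic described above, where a data model missing from ./lib means that no correction is applied, can be sketched with an in-memory JSON string standing in for a ./lib file. The JSON layout (deck mapped to a correction name), the correction itself and correct_for_deck are hypothetical; the real files may be structured differently.

```python
import json
import logging
from typing import Callable, Dict, Optional

import pandas as pd

logger = logging.getLogger("correct_sketch")

# Hypothetical correction functions, keyed by the names used in the registry.
CORRECTIONS: Dict[str, Callable[[pd.DataFrame], pd.DataFrame]] = {
    "shift_one_day": lambda df: df.assign(date=df["date"] + pd.Timedelta(days=1)),
}

# Hypothetical content of a ./lib/<data_model>.json file: deck -> correction name.
EXAMPLE_REGISTRY = '{"deck_a": "shift_one_day"}'


def correct_for_deck(df: pd.DataFrame, deck: str, registry_json: Optional[str]) -> pd.DataFrame:
    """Apply the correction registered for deck; return df unchanged otherwise."""
    if registry_json is None:
        # Data model not available in ./lib: assume no corrections are needed.
        logger.info("no correction file for this data model; nothing to do")
        return df
    name = json.loads(registry_json).get(deck)
    if name is None:
        return df  # deck not registered: no correction
    return CORRECTIONS[name](df)


df = pd.DataFrame({"date": pd.to_datetime(["2000-01-01"])})
print(correct_for_deck(df, "deck_a", EXAMPLE_REGISTRY)["date"].iloc[0])  # 2000-01-02 00:00:00
```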
Corrects the platform type field of data from a given data model. To support dataframes stored in TextParsers, and the possible use of data columns other than those being fixed (dependencies) in this or other metmetpy modules, the input and output are the full dataset.
The correction to apply is data model and deck specific and is registered in ./lib/data_model.json; multiple decks in the input data are not supported.
The corrections for imma1 (the only data model available so far) come from Liz’s construct_monthly_files.R. PT corrections are simple, with no dependencies other than dck, and fall into three classes:
- For a set of decks, set missing PT to the known type 5.
- For a set of decks, set PT=4,5 to 99 (NaN). These decks are mainly buoys and miscellaneous platforms (rigs, etc.). The purpose is unclear: it may only be meant to filter ship data out of decks where ships are not expected. Since this is not an error in the metadata itself, and PT is selected on a deck-specific basis here, this correction is not applied.
- For a set of two sid-dck combinations with ship data, numeric IDs are thought to be buoys (moored, 6, or drifting, 7): set PT to 6 or 7. Which of the two is not important so far; the aim is only to make sure these reports are not flagged as ships.
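The two correction classes that are applied can be sketched as below. The deck and sid-dck sets are hypothetical placeholders for the lists registered in ./lib, the column names sid, dck, ID and PT are assumptions, and because the text leaves open whether these buoys get PT 6 or 7, the sketch takes that value as a parameter. The PT=4,5 to 99 class is deliberately left out, as in the text.

```python
import pandas as pd

# Hypothetical placeholders for the deck lists registered per data model.
DECKS_MISSING_PT_TO_5 = {"deck_a"}
SID_DCK_NUMERIC_ID_IS_BUOY = {("sid_x", "deck_b")}


def correct_pt_sketch(df: pd.DataFrame, buoy_pt: int) -> pd.DataFrame:
    """Apply the two PT correction classes that are used; return a new DataFrame."""
    out = df.copy()
    # Class 1: for the listed decks, missing PT becomes the known type 5.
    missing = out["dck"].isin(DECKS_MISSING_PT_TO_5) & out["PT"].isna()
    out.loc[missing, "PT"] = 5
    # Class 3: in the listed sid-dck pairs, numeric IDs are taken to be buoys.
    in_pair = pd.Series(
        [(s, d) in SID_DCK_NUMERIC_ID_IS_BUOY for s, d in zip(out["sid"], out["dck"])],
        index=out.index,
    )
    numeric_id = out["ID"].astype(str).str.fullmatch(r"[0-9]+")
    out.loc[in_pair & numeric_id, "PT"] = buoy_pt
    # Class 2 (PT 4 or 5 set to 99) is intentionally not applied.
    return out


df = pd.DataFrame({
    "sid": ["s1", "sid_x", "sid_x"],
    "dck": ["deck_a", "deck_b", "deck_b"],
    "ID": ["SHIP1", "12345", "ABC"],
    "PT": [None, 1.0, 1.0],
})
print(correct_pt_sketch(df, buoy_pt=7)["PT"].tolist())  # [5.0, 7.0, 1.0]
```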
For platform type corrections, if the data model is not available in ./lib or in metadata_models, the module logs an error and returns no output, which breaks any full processing downstream of its invocation.
@author: iregon
- cdm_reader_mapper.metmetpy.correct.correct_datetime(data, imodel, log_level='INFO', base=None)
Apply ICOADS deck-specific datetime corrections.
- Parameters:
data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for datetime correction metadata. If None, use the internal correction path.
- Returns:
A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.
- Return type:
pandas.DataFrame or Iterable[pd.DataFrame]
- Raises:
ValueError – If _correct_dt raises an error during correction.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame], or if data is a pd.Series.
- cdm_reader_mapper.metmetpy.correct.correct_pt(data, imodel, log_level='INFO', base=None)
Apply ICOADS deck-specific platform type (PT) corrections.
- Parameters:
data (pandas.DataFrame or Iterable[pd.DataFrame]) – Input dataset.
imodel (str) – Name of internally available data model, e.g. icoads_r300_d704.
log_level (str, default: INFO) – Level of logging information to save.
base (str, optional) – Base path for the correction metadata. If None, use the internal correction path.
- Returns:
A pandas.DataFrame or Iterable[pd.DataFrame] with the adjusted data.
- Return type:
pandas.DataFrame or Iterable[pd.DataFrame]
- Raises:
ValueError – If _correct_pt raises an error during correction, or if the platform column is not defined in the properties file.
TypeError – If data is not a pd.DataFrame or an Iterable[pd.DataFrame], or if data is a pd.Series.
cdm_reader_mapper.metmetpy.properties module
Internal metmetpy properties.
Created on Wed Jul 10 09:18:41 2019
@author: iregon
cdm_reader_mapper.metmetpy.validate module
Internal metmetpy validation package.
Created on Tue Jun 25 09:00:19 2019
Validates the datetime fields of a data model:
1. Extracts or creates the datetime field of a data model as defined in submodule model_datetimes.
2. Validates to False where the result is NaT (no datetime, or the conversion to datetime failed).
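The two steps can be sketched as follows. The ICOADS-style component columns YR, MO, DY and HR (hours as a decimal) and the helper names are assumptions made for illustration; the real extraction is defined per data model in model_datetimes.

```python
import math
from datetime import datetime, timedelta

import pandas as pd


def build_datetime(yr, mo, dy, hr):
    """Step 1 (sketch): assemble one datetime, or return NaT if that fails."""
    try:
        stamp = datetime(int(yr), int(mo), int(dy))
        hours = float(hr)
    except (TypeError, ValueError):  # missing value or impossible date
        return pd.NaT
    if math.isnan(hours):
        return pd.NaT
    return pd.Timestamp(stamp + timedelta(hours=hours))


def validate_datetime_sketch(df: pd.DataFrame) -> pd.Series:
    """Step 2 (sketch): True where a datetime was built, False where it is NaT."""
    built = pd.Series(
        [build_datetime(*row) for row in zip(df["YR"], df["MO"], df["DY"], df["HR"])],
        index=df.index,
    )
    return built.notna()


df = pd.DataFrame({"YR": [2000, 2000, 2000], "MO": [1, 2, 3], "DY": [31, 30, 1],
                   "HR": [12.5, 0.0, float("nan")]})
print(validate_datetime_sketch(df).tolist())  # [True, False, False]
```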
Validation is data model specific.
Output is a boolean series.
Input dataframes or series stored in TextParsers are not accounted for: unlike the correction modules, the output is only a boolean series that is external to the input data.
If the datetime conversion (or extraction) for a given data model is not available in submodule model_datetimes, the module logs an error and returns no output, which breaks any full processing downstream of its invocation.
Reference names of the metadata fields used in the metmetpy modules, and their locations in a data model (column or (section, column)), are registered in ../properties.py in metadata_datamodels.
NaN and NaT validate to False.
Validates the ID field in a pandas DataFrame against a list of regex patterns. The output is a boolean series.
Validations are dataset and deck specific, following patterns stored in ./lib/dataset.json; multiple decks in the input data are not supported.
If the dataset is not available in the lib, the module logs an error and returns no output, which breaks any full processing downstream of its invocation.
ID validation assumes that the ID field read from the source has been whitespace-stripped. Care must be taken that the way a data model is read before being passed to this module is consistent with the way the patterns are defined for that data model.
NaN validates to True if the blank pattern (‘^$’) is in the list, otherwise to False.
If patterns is {} for a dck (empty, but defined in the data model file), the module warns and validates all values to True, with NaN to False.
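The handling of an empty pattern set and of NaN described above can be sketched as follows, assuming the deck's patterns arrive as a dict of anchored regular expressions. validate_with_patterns, the logger name and the example patterns are hypothetical.

```python
import logging
import re

import pandas as pd

logger = logging.getLogger("validate_id_sketch")


def validate_with_patterns(ids: pd.Series, patterns: dict) -> pd.Series:
    """Validate IDs against the values of a (possibly empty) pattern dict."""
    if not patterns:
        # Defined but empty: warn, then accept every non-missing ID.
        logger.warning("no ID patterns defined for this deck; all non-missing IDs pass")
        return ids.notna()
    regexes = [re.compile(p) for p in patterns.values()]
    blank_ok = "^$" in patterns.values()

    def _check(value) -> bool:
        if pd.isna(value):
            return blank_ok  # NaN is valid only if the blank pattern is listed
        return any(r.match(str(value)) for r in regexes)

    return ids.map(_check)


ids = pd.Series(["AB123", "xyz", None])
print(validate_with_patterns(ids, {}).tolist())  # [True, True, False]
```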
@author: iregon
- cdm_reader_mapper.metmetpy.validate.validate_datetime(data, imodel, blank=False, log_level='INFO')
Validate datetime columns in a dataset according to the specified model.
- Parameters:
data (pd.DataFrame, pd.Series, or Iterable[pd.DataFrame, pd.Series]) – Input dataset or series containing datetime values.
imodel (str) – Name of internally available data model, e.g. “icoads_r300_d201”.
blank (bool, optional) – If True, empty values are considered valid. Default is False.
log_level (str, optional) – Logging level. Default is “INFO”.
- Returns:
Boolean Series indicating whether each datetime is valid. Returns None if validation cannot be performed due to missing data, columns, or deck definitions.
- Return type:
pd.Series or None
- Raises:
TypeError – If data is not a pd.DataFrame, a pd.Series, or an Iterable[pd.DataFrame | pd.Series].
ValueError – If no columns are found for datetime conversion.
- cdm_reader_mapper.metmetpy.validate.validate_id(data, imodel, blank=False, log_level='INFO')
Validate ID column(s) in a dataset against deck-specific patterns.
- Parameters:
data (pd.DataFrame, pd.Series, or Iterable[pd.DataFrame, pd.Series]) – Input dataset or series containing ID values.
imodel (str) – Name of internally available data model, e.g. “icoads_r300_d201”.
blank (bool, optional) – If True, empty values are considered valid. Default is False.
log_level (str, optional) – Logging level. Default is “INFO”.
- Returns:
Boolean Series indicating whether each ID is valid. Returns None if validation cannot be performed due to missing data, columns, or deck definitions.
- Return type:
pd.Series or None
- Raises:
TypeError – If data is not a pd.DataFrame, a pd.Series, or an Iterable[pd.DataFrame | pd.Series].
ValueError – If the dataset imodel has no deck information, if no ID conversion columns are found, or if the input deck is not defined in the ID library files.
FileNotFoundError – If the dataset imodel has no ID deck library.
Notes
Uses _get_id_col to determine which column(s) contain IDs.
Uses _get_patterns to get the regex patterns for the deck.
Empty values match the “^$” pattern if blank=True.