cdm_reader_mapper.unique

cdm_reader_mapper.unique(data, columns=None)

Count unique values per column in a DataFrame or an iterable of DataFrames.

Parameters:
  • data (pandas.DataFrame or Iterable[pandas.DataFrame]) – Input dataset: either a single DataFrame or an iterable of DataFrame chunks.

  • columns (str, list or tuple, optional) – Name(s) of the column(s) to count. If None, all columns are used.

Return type:

dict[str, dict[Any, int]]

Returns:

dict[str, dict[Any, int]] – Dictionary keyed by column name. Each value is a dictionary mapping the column's unique values to their counts; missing values (NaN) are counted under the key 'nan'.
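To illustrate the shape of the return value, here is a minimal sketch for a single DataFrame built on pandas' `Series.value_counts(dropna=False)`. The function name `count_unique_sketch` is hypothetical and this is not the library's implementation. It also assumes that a list selects several columns while any other value (a string, or a tuple used as a MultiIndex key) selects one column; how `cdm_reader_mapper.unique` itself interprets a tuple may differ.

```python
from typing import Any

import pandas as pd


def count_unique_sketch(df: pd.DataFrame, columns: Any = None) -> dict[str, dict[Any, int]]:
    """Hypothetical sketch of per-column unique-value counting (not the library code)."""
    # Assumption: None -> all columns, list -> several columns, anything else -> one column key.
    if columns is None:
        selected = list(df.columns)
    elif isinstance(columns, list):
        selected = columns
    else:
        selected = [columns]

    result: dict[str, dict[Any, int]] = {}
    for col in selected:
        counts: dict[Any, int] = {}
        # dropna=False keeps missing values so they are counted too.
        for value, n in df[col].value_counts(dropna=False).items():
            # Store every missing value under the string key 'nan';
            # accumulating also merges None and NaN if pandas lists them separately.
            key = "nan" if pd.isna(value) else value
            counts[key] = counts.get(key, 0) + int(n)
        result[col] = counts
    return result
```

For example, a DataFrame with columns `a = ["x", "y", "x"]` and `b = [1.0, None, 1.0]` yields `{"a": {"x": 2, "y": 1}, "b": {1.0: 2, "nan": 1}}`.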

Notes

  • Supports large files read through a ParquetStreamReader: the data is processed chunk by chunk, so the full dataset does not have to be loaded at once.
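When data arrives in chunks, counts can be computed per chunk and then summed. The following sketch assumes each chunk has already been reduced to the per-column dictionary described under Returns; the helper name `merge_chunk_counts` is hypothetical and only illustrates the idea, not how the library combines chunks internally.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def merge_chunk_counts(
    chunk_counts: Iterable[dict[str, dict[Any, int]]],
) -> dict[str, dict[Any, int]]:
    """Hypothetical helper: sum per-chunk {column: {value: count}} dictionaries."""
    totals: dict[str, Counter] = {}
    for counts in chunk_counts:
        for column, value_counts in counts.items():
            # Counter.update with a mapping adds counts instead of replacing them.
            totals.setdefault(column, Counter()).update(value_counts)
    # Convert back to plain dicts to match the documented return type.
    return {column: dict(counter) for column, counter in totals.items()}
```

Because only the running totals are kept, memory use depends on the number of distinct values per column rather than on the number of rows.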