cdm_reader_mapper.split_by_column_entries

cdm_reader_mapper.split_by_column_entries(data, selection, reset_index=False, inverse=False, return_rejected=False)

Split a DataFrame into the rows whose values in a given column match a set of allowed values and the rows that do not.

Parameters:
  • data (pandas.DataFrame) – DataFrame to be split.

  • selection (dict) – Mapping of a column name to an iterable of allowed values. Example: {"city": ["London", "Berlin"]}.

  • reset_index (bool, optional) – Whether to reset the index of the returned DataFrames.

  • inverse (bool, optional) – If True, invert the selection, so that rows whose values are not among the allowed values are selected.

  • return_rejected (bool, optional) – If True, return the rejected rows as the second output. If False, the second output is empty but keeps the original dtypes.
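The way `selection` and `inverse` might translate into a row mask can be sketched in plain pandas. This is a hypothetical helper, not the library's code: it assumes each key in `selection` produces one boolean mask column via `Series.isin`, that a row is kept only when all mask columns are True, and that `inverse` negates the combined mask.

```python
import pandas as pd


def build_mask(data: pd.DataFrame, selection: dict, inverse: bool = False) -> pd.Series:
    """Hypothetical helper: True for rows whose value in every selected column
    is one of the allowed values (negated when inverse=True)."""
    # One boolean mask column per key in `selection` (assumed behaviour).
    mask_columns = pd.DataFrame(
        {col: data[col].isin(list(allowed)) for col, allowed in selection.items()},
        index=data.index,
    )
    # A row counts as selected only if all of its mask columns are True.
    mask = mask_columns.all(axis=1)
    return ~mask if inverse else mask


df = pd.DataFrame({"city": ["London", "Paris", "Berlin"], "temp": [11.0, 14.5, 9.0]})
print(build_mask(df, {"city": ["London", "Berlin"]}).tolist())                # [True, False, True]
print(build_mask(df, {"city": ["London", "Berlin"]}, inverse=True).tolist())  # [False, True, False]
```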

Return type:

tuple[DataFrame | ParquetStreamReader, DataFrame | ParquetStreamReader, Index | MultiIndex, Index | MultiIndex]

Returns:

(pandas.DataFrame or ParquetStreamReader, pandas.DataFrame or ParquetStreamReader, pandas.Index or pandas.MultiIndex, pandas.Index or pandas.MultiIndex) – A 4-tuple of the selected rows (those for which every mask column is True), the rejected rows, the original index of the selected rows, and the original index of the rejected rows.
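A pandas-only sketch of how the four outputs could relate to one another follows. `split_sketch` is a hypothetical stand-in, not the library's implementation. It covers only in-memory DataFrames (no ParquetStreamReader). It assumes the original indexes are captured before any `reset_index`, and that the rejected index is still returned when `return_rejected=False`.

```python
import pandas as pd


def split_sketch(data, selection, reset_index=False, inverse=False, return_rejected=False):
    """Hypothetical sketch of the documented outputs; not the library code."""
    # Combined mask: every selected column must hold an allowed value (assumed).
    mask = pd.DataFrame(
        {col: data[col].isin(list(allowed)) for col, allowed in selection.items()},
        index=data.index,
    ).all(axis=1)
    if inverse:
        mask = ~mask
    selected, rejected = data[mask], data[~mask]
    # Capture the original indexes before any reset.
    selected_idx, rejected_idx = selected.index, rejected.index
    if not return_rejected:
        # Empty frame that keeps the columns and dtypes of `data`.
        rejected = data.iloc[0:0]
    if reset_index:
        selected = selected.reset_index(drop=True)
        rejected = rejected.reset_index(drop=True)
    return selected, rejected, selected_idx, rejected_idx


df = pd.DataFrame(
    {"city": ["London", "Paris", "Berlin"], "temp": [11.0, 14.5, 9.0]},
    index=[10, 20, 30],
)
sel, rej, sel_idx, rej_idx = split_sketch(
    df, {"city": ["London", "Berlin"]}, reset_index=True, return_rejected=True
)
print(sel["city"].tolist())                  # ['London', 'Berlin']
print(rej["city"].tolist())                  # ['Paris']
print(sel_idx.tolist(), rej_idx.tolist())    # [10, 30] [20]
```

Keeping the original indexes separate from the returned frames lets callers map rows back to `data` even after `reset_index=True` has renumbered them.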