cdm_reader_mapper.DupDetect.remove_duplicates


DupDetect.remove_duplicates(keep='first', limit='default', equal_musts=None)

Remove duplicate entries from the dataset.

Parameters:
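  • keep (str or int) – Which entry to keep when duplicates are found (‘first’ or ‘last’). Defaults to ‘first’.

  • limit (str or float, optional) – Minimum similarity score at which two entries are declared duplicates. Defaults to ‘default’.

  • equal_musts (str or list[str], optional) – Column name(s) whose values must match exactly for two entries to count as duplicates. Defaults to None.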

Return type:

DataFrame

Returns:

pd.DataFrame – Dataset without duplicates.
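
The interplay of the three parameters can be illustrated with a small standard-library sketch. This is not the cdm_reader_mapper implementation: the function remove_duplicates_sketch, the sample reports, the numeric default for limit, and the use of difflib.SequenceMatcher as the similarity score are all assumptions made for illustration. It also operates on a list of dicts rather than a DataFrame, and it treats a score equal to limit as a duplicate, which the documentation above does not confirm.

```python
from difflib import SequenceMatcher


def remove_duplicates_sketch(rows, keep="first", limit=0.9, equal_musts=None):
    """Illustrative stand-in only; NOT the cdm_reader_mapper implementation."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Accept a single column name or a list of names, as the docs describe.
    if isinstance(equal_musts, str):
        equal_musts = [equal_musts]
    equal_musts = equal_musts or []

    def similarity(a, b):
        # Hypothetical score: mean string similarity over the columns
        # that are not already required to match exactly.
        cols = [c for c in a if c not in equal_musts]
        if not cols:
            return 1.0
        scores = [SequenceMatcher(None, str(a[c]), str(b[c])).ratio() for c in cols]
        return sum(scores) / len(scores)

    def is_duplicate(a, b):
        # Columns in equal_musts must match exactly ...
        if any(a[c] != b[c] for c in equal_musts):
            return False
        # ... and the remaining columns must reach the similarity limit.
        return similarity(a, b) >= limit

    # Walk the rows in the order that puts the entry to keep first.
    ordered = rows if keep == "first" else rows[::-1]
    kept = []
    for row in ordered:
        if not any(is_duplicate(row, other) for other in kept):
            kept.append(row)
    # Restore the original ordering of the surviving rows.
    return kept if keep == "first" else kept[::-1]


# Hypothetical sample data: the first two reports differ only slightly.
reports = [
    {"station": "A1", "date": "2020-01-01", "temp": "10.1"},
    {"station": "A1", "date": "2020-01-01", "temp": "10.2"},
    {"station": "B2", "date": "2020-01-01", "temp": "10.1"},
]

first_kept = remove_duplicates_sketch(reports, keep="first", limit=0.8, equal_musts="station")
last_kept = remove_duplicates_sketch(reports, keep="last", limit=0.8, equal_musts="station")
```

In this sketch the two A1 reports are treated as duplicates of each other, so keep decides which of them survives, while the B2 report is never merged with them because its station value differs.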