cdm_reader_mapper.DupDetect.flag_duplicates

DupDetect.flag_duplicates(keep='first', limit='default', equal_musts=None)

Get result dataset with flagged duplicates.

Parameters:
  • keep (str, ["first", "last"]) – Which entry to keep in the result dataset. Default: "first"

  • limit (float, optional) – Limit that the total score has to exceed for an entry to be declared a duplicate. Default: 0.991

  • equal_musts (str or list, optional) – Column name(s) whose values must be exactly equal for an entry to be declared a duplicate. Default: All column names found in method_kwargs.

Return type:

DataFrame

Returns:

pandas.DataFrame – Input DataFrame with flagged duplicates. Flags for duplicate_status: see duplicate_status. Flags for report_quality: see quality_flag.
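
Example:

The following is a minimal, standard-library sketch of how keep, limit and equal_musts might interact, based only on the parameter descriptions above. It does not call cdm_reader_mapper and is not the library's implementation. The function name flag_duplicates_sketch, the pair_scores mapping and the "flag" labels ("unique", "kept", "duplicate") are hypothetical; the real method writes the duplicate_status and report_quality flags instead.

```python
def flag_duplicates_sketch(records, pair_scores, keep="first",
                           limit=0.991, equal_musts=None):
    """Return copies of `records` with a hypothetical 'flag' key added.

    records     : list of dicts, one per report (assumed input layout)
    pair_scores : dict mapping (i, j) index pairs to a total score in [0, 1]
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    if isinstance(equal_musts, str):
        equal_musts = [equal_musts]
    equal_musts = equal_musts or []

    flags = ["unique"] * len(records)
    for (i, j), score in sorted(pair_scores.items()):
        # The total score has to exceed the limit.
        if score <= limit:
            continue
        # Every equal_musts column has to match exactly.
        if any(records[i][col] != records[j][col] for col in equal_musts):
            continue
        first, last = min(i, j), max(i, j)
        kept, dropped = (first, last) if keep == "first" else (last, first)
        # An entry already flagged as a duplicate is never promoted again.
        if flags[kept] != "duplicate":
            flags[kept] = "kept"
        flags[dropped] = "duplicate"
    return [dict(rec, flag=flag) for rec, flag in zip(records, flags)]


records = [
    {"id": "A", "date": "2020-01-01"},
    {"id": "B", "date": "2020-01-01"},
    {"id": "C", "date": "2020-01-02"},
]
pair_scores = {(0, 1): 0.995, (0, 2): 0.995, (1, 2): 0.5}


def flags_of(rows):
    return [row["flag"] for row in rows]


print(flags_of(flag_duplicates_sketch(records, pair_scores)))
# ['kept', 'duplicate', 'duplicate']
print(flags_of(flag_duplicates_sketch(records, pair_scores, equal_musts="date")))
# ['kept', 'duplicate', 'unique']
print(flags_of(flag_duplicates_sketch(records, pair_scores, keep="last",
                                      equal_musts=["date"])))
# ['duplicate', 'kept', 'unique']
```

In this sketch, raising limit above every pair score leaves all entries flagged as "unique", and adding a column to equal_musts can only reduce the number of flagged duplicates, never increase it.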