cdm_reader_mapper.DupDetect.flag_duplicates
- DupDetect.flag_duplicates(keep='first', limit='default', equal_musts=None)
Get result dataset with flagged duplicates.
- Parameters:
keep (str, ["first", "last"]) – Which entry should be kept in the result dataset.
limit (float, optional) – Limit of the total score that has to be exceeded for an entry to be declared a duplicate. Default: .991
equal_musts (str or list, optional) – Hashable of column name(s) that must be exactly equal for an entry to be declared a duplicate. Default: All column names found in method_kwargs.
- Return type:
pandas.DataFrame
- Returns:
Input DataFrame with flagged duplicates. Flags for duplicate_status: see duplicate_status. Flags for report_quality: see quality_flag.
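
The following is a minimal, standard-library sketch of how keep, limit and equal_musts interact, as read from the parameter descriptions above. It is not the cdm_reader_mapper implementation. The function name flag_duplicates_sketch, the compare_keys argument, the averaged difflib similarity score and the flag labels "unique", "kept" and "duplicate" are all assumptions made for illustration. The real flag values are those documented under duplicate_status and quality_flag, and the documented default for equal_musts (column names from method_kwargs) is not modelled here.

```python
from difflib import SequenceMatcher


def flag_duplicates_sketch(records, compare_keys, keep="first",
                           limit=0.991, equal_musts=()):
    """Flag likely duplicates in a list of dict records (illustrative only).

    All names and flag labels here are hypothetical, not the library's.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Accept a single column name as well as a list of names.
    if isinstance(equal_musts, str):
        equal_musts = [equal_musts]

    flags = ["unique"] * len(records)
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            # Columns in equal_musts must match exactly, else no duplicate.
            if any(a[key] != b[key] for key in equal_musts):
                continue
            # Assumed score: mean string similarity over compared columns.
            score = sum(
                SequenceMatcher(None, str(a[key]), str(b[key])).ratio()
                for key in compare_keys
            ) / len(compare_keys)
            # The score has to exceed the limit (strictly greater).
            if score > limit:
                kept, dropped = (i, j) if keep == "first" else (j, i)
                flags[dropped] = "duplicate"
                if flags[kept] == "unique":
                    flags[kept] = "kept"
    return flags


# Hypothetical example data: two identical reports and one near match.
records = [
    {"station_id": "SHIP1", "date": "2001-01-01", "air_temp": "10.1"},
    {"station_id": "SHIP1", "date": "2001-01-01", "air_temp": "10.1"},
    {"station_id": "SHIP2", "date": "2001-01-01", "air_temp": "10.1"},
]
columns = ["station_id", "date", "air_temp"]

print(flag_duplicates_sketch(records, columns, keep="first"))
print(flag_duplicates_sketch(records, columns, keep="last"))
print(flag_duplicates_sketch(records, columns, limit=0.9))
print(flag_duplicates_sketch(records, columns, limit=0.9,
                             equal_musts="station_id"))
```

In this sketch, lowering limit lets the near match count as a duplicate, while listing station_id in equal_musts excludes it again because that column differs.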