cdm_reader_mapper.DupDetect.get_duplicates
- DupDetect.get_duplicates(keep='first', limit='default', equal_musts=None, overwrite=True)
Identify duplicate matches based on the comparison matrix.
- Parameters:
keep (str or int) – Which entry to keep: ‘first’, ‘last’, -1, or 0.
limit (str or float, optional) – Threshold of total similarity score to consider as duplicate.
equal_musts (str or list[str], optional) – Columns that must exactly match.
overwrite (bool) – Whether to recompute matches if already calculated.
- Return type:
pd.DataFrame
- Returns:
pd.DataFrame – DataFrame containing matched duplicates.
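
The selection logic described by the parameters above can be illustrated with a small, self-contained pandas sketch. This is not the library's implementation: the helper names `pick_duplicates` and `records_to_drop`, the toy comparison matrix, the use of a row mean as the total similarity score, the 0.75 default, and the reading of `keep` as 'first'/0 versus 'last'/-1 are all assumptions made for illustration.

```python
import pandas as pd


def pick_duplicates(compared, limit=0.75, equal_musts=None):
    """Hypothetical helper: select pairs from a pairwise comparison matrix."""
    if isinstance(equal_musts, str):
        equal_musts = [equal_musts]
    # Assumption: the total similarity score is the mean of the column scores.
    score = compared.mean(axis=1)
    mask = score >= limit
    # Columns listed in equal_musts must match exactly (score of 1.0).
    for col in equal_musts or []:
        mask &= (compared[col] == 1.0)
    return compared[mask]


def records_to_drop(matches, keep="first"):
    """Hypothetical helper: list the record of each matched pair to discard."""
    # Assumption: 'first'/0 keeps the earlier record, 'last'/-1 the later one.
    level = "rec_b" if keep in ("first", 0) else "rec_a"
    return sorted(set(matches.index.get_level_values(level)))


# Toy comparison matrix: one row per candidate pair, one column per compared field.
index = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (1, 2)], names=["rec_a", "rec_b"]
)
compared = pd.DataFrame(
    {"lat": [1.0, 1.0, 0.0], "lon": [1.0, 0.5, 1.0], "date": [1.0, 1.0, 1.0]},
    index=index,
)

loose = pick_duplicates(compared, limit=0.8)
strict = pick_duplicates(compared, limit=0.8, equal_musts="lon")
dropped = records_to_drop(strict, keep="first")
```

In this sketch, raising `limit` or adding columns to `equal_musts` can only shrink the set of matched pairs, which is the behaviour the parameter descriptions suggest. The actual scoring and defaults used by `DupDetect` may differ.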