cdm_reader_mapper.DupDetect

class cdm_reader_mapper.DupDetect(data, compared, method, method_kwargs, compare_kwargs)

Class to detect, flag, and remove duplicate entries in a DataFrame using a comparison matrix from recordlinkage.

Parameters:
  • data (pd.DataFrame) – Original dataset.

  • compared (pd.DataFrame) – Comparison matrix of the dataset.

  • method (str) – Duplicate detection method used for recordlinkage indexing.

  • method_kwargs (dict) – Keyword arguments for recordlinkage indexing method.

  • compare_kwargs (dict) – Keyword arguments used for recordlinkage.Compare.
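To make the `compared` parameter more concrete, the following is a minimal sketch of a comparison matrix of the kind recordlinkage produces: one row per candidate record pair (a two-level index) and one column per compared field, holding a similarity score between 0 and 1. It uses plain pandas only. The helper `build_comparison_matrix`, the column names, and the sample values are hypothetical and not part of cdm_reader_mapper.

```python
from itertools import combinations

import pandas as pd


def build_comparison_matrix(data, columns):
    """Hypothetical helper: compare every pair of rows field by field.

    Each cell is 1.0 if the two records agree on that column, else 0.0.
    """
    pairs = list(combinations(data.index, 2))
    index = pd.MultiIndex.from_tuples(pairs)
    rows = [
        [float(data.at[i, col] == data.at[j, col]) for col in columns]
        for i, j in pairs
    ]
    return pd.DataFrame(rows, index=index, columns=columns)


# Made-up sample records; rows 0 and 1 are identical.
data = pd.DataFrame(
    {
        "station_id": ["A1", "A1", "B2"],
        "latitude": [54.0, 54.0, 10.5],
        "report_time": ["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 06:00"],
    }
)

compared = build_comparison_matrix(data, ["station_id", "latitude", "report_time"])
print(compared)
```

In practice the matrix would come from recordlinkage itself, which also supports partial (non-binary) similarity scores; the exhaustive pairing used here is only practical for very small datasets.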

__init__(data, compared, method, method_kwargs, compare_kwargs)

Methods

  • __init__(data, compared, method, ...)

  • flag_duplicates([keep, limit, equal_musts]) – Get result dataset with flagged duplicates.

  • get_duplicates([keep, limit, equal_musts, ...]) – Identify duplicate matches based on the comparison matrix.

  • remove_duplicates([keep, limit, equal_musts]) – Remove duplicate entries from the dataset.
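The three methods share the arguments `keep`, `limit`, and `equal_musts`, whose exact semantics are not spelled out on this page. The sketch below illustrates one plausible reading in plain pandas, and every part of it is an assumption: `limit` is treated as a threshold on the mean similarity score of a pair, `equal_musts` as a list of columns that must match exactly, and `keep="first"` as retaining the earlier record of each matched pair. The helper names and the `duplicate_flag` column are hypothetical; this is not the library's implementation.

```python
import pandas as pd


def find_duplicate_pairs(compared, limit=0.99, equal_musts=None):
    """Assumed logic: a pair matches if its mean score reaches `limit`
    and every column in `equal_musts` agrees exactly."""
    mask = compared.mean(axis=1) >= limit
    for col in equal_musts or []:
        mask &= compared[col] == 1.0
    return compared.index[mask]


def flag_pairs(data, pairs, keep="first"):
    """Return a copy of `data` with a boolean `duplicate_flag` column."""
    flagged = data.copy()
    flagged["duplicate_flag"] = False
    for first, second in pairs:
        # With keep="first" the later record of the pair is the duplicate.
        duplicate = second if keep == "first" else first
        flagged.loc[duplicate, "duplicate_flag"] = True
    return flagged


def drop_flagged(flagged, columns):
    """Return only the unflagged rows, restricted to the original columns."""
    return flagged.loc[~flagged["duplicate_flag"], columns]


# Made-up records and a matching comparison matrix (rows 0 and 1 agree).
data = pd.DataFrame(
    {"station_id": ["A1", "A1", "B2"], "latitude": [54.0, 54.0, 10.5]}
)
compared = pd.DataFrame(
    {"station_id": [1.0, 0.0, 0.0], "latitude": [1.0, 0.0, 0.0]},
    index=pd.MultiIndex.from_tuples([(0, 1), (0, 2), (1, 2)]),
)

pairs = find_duplicate_pairs(compared, limit=0.99, equal_musts=["station_id"])
flagged = flag_pairs(data, pairs, keep="first")
cleaned = drop_flagged(flagged, data.columns)
print(flagged)
print(cleaned)
```

Separating pair detection from flagging and removal mirrors the split between `get_duplicates`, `flag_duplicates`, and `remove_duplicates` listed above, so that the same set of matches can drive either outcome.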