cdm_reader_mapper.DupDetect
- class cdm_reader_mapper.DupDetect(data, compared, method, method_kwargs, compare_kwargs)
Class to detect, flag, and remove duplicate entries in a DataFrame using a comparison matrix from recordlinkage.
- Parameters:
data (pd.DataFrame) – Original dataset.
compared (pd.DataFrame) – Comparison matrix of the dataset.
method (str) – Duplicate detection method used for recordlinkage indexing.
method_kwargs (dict) – Keyword arguments for the recordlinkage indexing method.
compare_kwargs (dict) – Keyword arguments used for recordlinkage.Compare.
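To make the idea of a comparison matrix concrete, the following is a minimal, library-free sketch. It is not how cdm_reader_mapper or recordlinkage build the matrix; it only illustrates the shape of the result: one row per candidate record pair and one score per compared field. The record contents, the field names, and the helper `compare_records` are all hypothetical. In recordlinkage the equivalent result is a pd.DataFrame indexed by a MultiIndex of record pairs.

```python
from itertools import combinations

# Hypothetical toy records; ids and field names are illustrative only.
records = {
    0: {"station": "A1", "date": "2001-01-01", "temp": 12.5},
    1: {"station": "A1", "date": "2001-01-01", "temp": 12.5},
    2: {"station": "B7", "date": "2001-01-02", "temp": 3.0},
}


def compare_records(records, fields):
    """Score every unordered record pair field by field.

    Returns {(i, j): {field: 1.0 if the values are equal else 0.0}}.
    Enumerating all pairs mimics a "full" index; real indexing methods
    usually restrict the candidate pairs to keep the matrix small.
    """
    matrix = {}
    for i, j in combinations(sorted(records), 2):
        matrix[(i, j)] = {
            field: 1.0 if records[i][field] == records[j][field] else 0.0
            for field in fields
        }
    return matrix


compared = compare_records(records, ["station", "date", "temp"])
for pair, scores in compared.items():
    print(pair, scores)
```

Records 0 and 1 agree on every field, so their row contains only 1.0 scores; the two pairs involving record 2 score 0.0 throughout.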
Methods
__init__(data, compared, method, ...)
flag_duplicates([keep, limit, equal_musts]) – Get result dataset with flagged duplicates.
get_duplicates([keep, limit, equal_musts, ...]) – Identify duplicate matches based on the comparison matrix.
remove_duplicates([keep, limit, equal_musts]) – Remove duplicate entries from the dataset.
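The three methods follow a common detect, flag, remove pattern. The sketch below shows that pattern on a hand-written comparison matrix using plain Python. It is an illustration under assumptions, not the class's implementation: the page does not define `keep` or `limit`, so the code assumes that `limit` is a similarity threshold on the mean field score and that `keep="first"` retains the earlier record of a matching pair. `equal_musts` is left out entirely. The helper names `duplicate_pairs`, `flag_records`, and `drop_flagged` are hypothetical.

```python
# Hypothetical comparison scores per record pair (1.0 = field values agree).
compared = {
    (0, 1): {"station": 1.0, "date": 1.0, "temp": 1.0},
    (0, 2): {"station": 0.0, "date": 0.0, "temp": 0.0},
    (1, 2): {"station": 0.0, "date": 0.0, "temp": 0.0},
}
record_ids = [0, 1, 2]


def duplicate_pairs(compared, limit=0.99):
    """Return pairs whose mean field score reaches the ASSUMED threshold."""
    return [
        pair
        for pair, scores in compared.items()
        if sum(scores.values()) / len(scores) >= limit
    ]


def flag_records(record_ids, pairs, keep="first"):
    """Mark the member of each pair that is NOT kept as a duplicate.

    ASSUMPTION: keep="first" retains the lower-positioned record of a pair;
    any other value retains the second one.
    """
    flags = {rid: False for rid in record_ids}
    for first, second in pairs:
        dropped = second if keep == "first" else first
        flags[dropped] = True
    return flags


def drop_flagged(record_ids, flags):
    """Return the record ids that were not flagged as duplicates."""
    return [rid for rid in record_ids if not flags[rid]]


pairs = duplicate_pairs(compared)
flags = flag_records(record_ids, pairs)
kept = drop_flagged(record_ids, flags)
print(pairs, flags, kept)
```

Splitting detection, flagging, and removal into separate steps mirrors the method list above: flagging keeps every record and annotates it, while removal returns a reduced dataset.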