excel - removing duplicates using 2 columns -
i trying remove duplicate id's in sheet. example here few rows of data
id | department | sales | update date 1 | sales | 100 | 2 | marketing | 100 | 2 | marketing | 200 | 30/06/2015 2 | marketing | 300 | 05/07/2015 i want remove duplicate id's base on update date column. want have following remaining:
id | department | sales | update date 1 | sales | 100 | 2 | marketing | 300 | 05/07/2015 so checks latest update row of id , removes others.
any advice on using vba or macro great form part of automated script.
one way achieve want read rows , iterate through each duplicate row , find keep, delete based on finding highest update_date.
i have wrote macro that. here code:
first: create blank module in vba editor , paste following code:
public type row id string updated date row_number integer 'to know rows delete later is_duplicate boolean 'to mark if current row duplicate to_keep boolean 'to decide whether keep or delete verified boolean 'needed avoid evaluating rows same id end type sub removeduplicates() range("a2").select 'go first row dim cnt integer 'keep record of how many rows cnt = 0 'begin empty array dim rows() row 'declared without count '== step 1: read data , store in array =============== while activecell.value <> "" cnt = cnt + 1 redim preserve rows(cnt) 'expand size of array 1 rows(cnt - 1).row_number = activecell.row 'keep record of current row address if activecell.offset(1, 0).value = activecell.value or _ activecell.offset(-1, 0).value = activecell.value 'if cell above/below has samve id current cell, it's duplicates rows(cnt - 1).is_duplicate = true else rows(cnt - 1).is_duplicate = false end if rows(cnt - 1).id = activecell.value 'store id rows(cnt - 1).updated = activecell.offset(0, 3).value 'store date activecell.offset(1, 0).select 'move next row below loop '=== step 2: iterating throw array , deciding keep, delete ========= = 0 cnt - 1 if rows(i).is_duplicate , not rows(i).verified 'the current id duplicated, , of other rows same id has not been verified find_to_keep rows, rows(i).id, cnt 'helper sub analyze each row end if next '==== step 3: iterating throw array delete ones marked delete ========== = cnt - 1 0 step -1 'we have reverse order because deleted rows contain data other valid rows if rows(i).is_duplicate , not rows(i).to_keep 'if current row duplicate , not marked (to keep) must deleted dim r integer r = rows(i).row_number 'get rows number (address) of row range(r & ":" & r).entirerow.delete shift:=xlshiftup 'delete row , shift other rows below end if next end sub sub find_to_keep(byref rows() row, byval id string, byval cnt integer) dim max_date date 'temparary variable hold maximum date dim to_keep integer 'temporary variable hold location of row keep ' -- step a: go throw array , find rows id specified in sub parameter = 0 cnt - 1 if rows(i).id = id 'if row has date higher our current max_date, read date if rows(i).updated > max_date max_date = rows(i).updated to_keep = end if end if next '-- step b: know row keep, need do: ' 1- mark other rows having same id verified (to avoid looping through them again) ' 2- mark row highest date (to_keep) = true = 0 cnt - 1 if rows(i).id = id if = to_keep rows(i).to_keep = true else rows(i).to_keep = false end if rows(i).verified = true end if next end sub here looks like: 
and if like, have attached entire workbook reference: remove_duplicates.xlsm
Comments
Post a Comment