• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Can't shake off duplicates efficiently using macro

shahin

Active Member
I've written a macro to remove duplicate values by deleting the entire row. The macro is supposed to compare the value of Range A1 to any of the Range underneath within the same column to be sure that they are not the same. If it is then it will delete the entire row in which duplicate value is found. My macro can do that but I need to run the macro twice to get the whole thing done because after first run i can still see some duplicates. However, when i run it twice or thrice, it can shake off all duplicate values. One more thing: it takes almost 3 minutes to shake off 2500 duplicates out of 6000. That means it works slowly as well. What change should i make in my macro to do the whole stuff in a single run and within fewer times possible?

Here is what I was trying with:

Code:
    Sub Dup_removal()
        Application.ScreenUpdating = False
        Dim i As Long
        Dim j As Long
        i = 2
        Do Until Cells(i, 1).Value = ""
            j = i + 1
            Do Until Cells(j, 1).Value = ""
                If Cells(i, 1).Value = Cells(j, 1).Value Then
                    Cells(j, 1).Select
                    Selection.EntireRow.Delete
                End If
                j = j + 1
            Loop
            i = i + 1
        Loop
        Application.ScreenUpdating = True
    End Sub
 
Hi ,

The fastest way to weed out duplicates would be to use a dictionary to store the keys , in this case the values in column A.

For each value , use the Exists method to check whether the key already exists in the dictionary ; if it does , then delete that particular row and move on to the next.

In deleting rows , start from the last row of data and work your way upwards towards the first row of data.

Another option is to first sort the entire data on the key column ; this way all duplicates will be clustered together , and you can do a wholesale delete for each set of duplicates.

Narayan
 
Last edited:
It is slow because you are working directly off the Worksheet Cells

Load the whole area into an array in VBA

Then check for duplicates as Narayan mentioned using a dictionary

Super fast
 
Hi ,

Try this :
Code:
Public Sub RemoveDuplicates()
          Debug.Print Time
          Dim dict As Scripting.Dictionary
          Dim cell As Range
          Dim currval As Variant
        
          Set dict = New Scripting.Dictionary
          lastrow = Me.Range("A" & Rows.Count).End(xlUp).Row
          j = 1
        
          For i = lastrow To 2 Step -1
              With Me.Cells(i, 1)
                    currval = .Value
            
                    If dict.Exists(currval) Then
                      .EntireRow.Delete
                    Else
                      dict.Add currval, j
                      j = j + 1
                    End If
                End With
          Next
          Debug.Print Time
End Sub
Narayan
 
You are awesome, Narayan. It's a great way to accomplish this type of thing. Thanks a lot.
 
@Narayan, It took 3 seconds only to do the whole operation. Btw, why "Me" keyword was putting a barrier? I ran the macro after taking it out, though! Forgive my ignorance.
 
Last edited:
Hi ,

You might have copied the code into a Module ; the Me keyword works in the Worksheet section , where it refers to the worksheet which has the code ; in a Module , it will generate an error.

Narayan
 
What if there are several duplicates in several columns and I would like to delete them all. The other day, the below code Narayan provided me to remove duplicates which is indeed a damn efficient one. However, when i try to give it a little twitch to satisfy my need, it neither works nor throws any error. Definitely I'm doing something wrong here but unable to figure it out.
Code:
Sub RemoveDuplicates()
      Dim dict As Scripting.Dictionary
      Dim cell As Range
      Dim currval As Variant
  
      Set dict = New Scripting.Dictionary
      lastrow = Range("A" & Rows.Count).End(xlToRight).End(xlUp).Row
      j = 1 
      For i = lastrow To 2 Step -1
          With Cells(i, 1) Or Cells(i, 2) Or Cells(i, 3)
                currval = .Value
      
                If dict.Exists(currval) Then
                  .EntireRow.Delete
                Else
                  dict.Add currval, j
                  j = j + 1
                End If
            End With
      Next
End Sub
 
Last edited:
Hi ,

This is where the Debug mode and the Immediate window come in handy.

In the Immediate window , type in the following and see what is displayed :

?Range("A" & Rows.Count).Address

?Range("A" & Rows.Count).End(xlToRight).Address

?Range("A" & Rows.Count).End(xlToRight).End(xlUp).Address

You will be able to recognize the problem when you see the value of lastrow , and how it will impact the For ... Next loop.

Narayan
 
Hi ,

However , this will also not work :

With Cells(i, 1) Or Cells(i, 2) Or Cells(i, 3)

You will have to rewrite the code as follows :
Code:
Sub RemoveDuplicates()
      Dim dict As Scripting.Dictionary
      Dim cell As Range
      Dim currval As Variant
      Dim found As Long

      Application.ScreenUpdating = False
    
      Set dict = New Scripting.Dictionary
      lastrow = Range("A" & Rows.Count).End(xlUp).Row
      j = 1
      For i = lastrow To 1 Step -1
          found = 0
          currval1 = Cells(i, 1).Value

          If dict.Exists(currval1) Then
            found = 1
          End If
        
          currval2 = Cells(i, 2).Value
          If dict.Exists(currval2) Then
            found = found + 2
          End If
              
          currval3 = Cells(i, 3).Value
          If dict.Exists(currval3) Then
            found = found + 4
          End If
          
          If found Mod 2 = 0 Then
            dict.Add currval1, j
            j = j + 1
          End If
          
          If (found mod 4) <= 1 Then
            dict.Add currval2, j
            j = j + 1
          End If
          
          If found < 4 Then
            dict.Add currval3, j
            j = j + 1
          End If
        
'        The following check deletes the row if all three cells have duplicates
'        If you wish to delete a row if even one of the cells has a duplicate then
'        check for :
'        If found <> 0 Then

          If found = 7 Then
            Cells(i, 1).EntireRow.Delete
          End If
      Next
    
      Application.ScreenUpdating = True
End Sub
Narayan
 
Last edited:
Hi ,

I just noticed that the logic for adding items to the dictionary is wrong.

The second check should be :

If (found mod 4) <= 1 Then

instead of :

If (found mod 4) = 0 Then


Narayan
 
Last edited:
Back
Top