• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Identifying Row Duplicates Based on Columns

willetts1

New Member
I am doing an analysis on strength of schedule for some sports teams and instead of having the actual results data, I have to rely on pulling the data from the teams' schedules via Web scrape.

For example, I look at the Oklahoma schedule they play Texas on October 24th and win 31-17. Conversely the Texas schedule reflects the game on the 24th with a loss of 17-31. I only want the results once to avoid double counting. I have 200+ teams * ~20 games per team with multiple overlaps.

I have also uploaded a small sample workbook. Thanks.
 

Attachments

  • CHANDOO_SAMPLE.xlsx
    9.1 KB · Views: 2
Something like this?
=COUNTIFS($A$2:A2,A2,$B$2:B2,B2,$C$2:C2,C2)+COUNTIFS($A$2:A2,A2,$B$2:B2,C2,$C$2:C2,B2)

Copy down. Any row indicated with number greater than 1 is duplicate.

Filter on the column and delete anything above 1.
 

Attachments

  • CHANDOO_SAMPLE.xlsx
    10.1 KB · Views: 5
If 'non-formula'-sample would be possible ...
then test this...
Copy Your data as in sample and ...
Click [Date] to see all rows and
Click [Team] to see less rows.
> Ideas? ... Questions?
 

Attachments

  • CHANDOO_SAMPLE.xlsb
    17 KB · Views: 3
@Chichiro thanks, I should've been a little more detailed. I added the game # to show the related games. That field does not exist in the data pull. I have Date, Time, Location, Team, Opponent and Results.

@vletm thanks as well, however, I have over 1,500 games to cull through. I'd like something a little more automated.
 
? Can't you just add the column after data pull and clean?

If you want to remove "duplicate" during the import process. I'd need to see how the info is actually imported to Excel.

Is it via code (MSXML2), PowerQuery, or some other method?
 
? Can't you just add the column after data pull and clean?

If you want to remove "duplicate" during the import process. I'd need to see how the info is actually imported to Excel.

Is it via code (MSXML2), PowerQuery, or some other method?

Web scrape to CSV
 
Right... so is that via code? How is info written? What conditions are used to pull info? Without it, only thing I can suggest is to clean the data after it's been imported.
 
@willetts1
... a little more automated?
1) I don't have Your data ... so You have to copy Yourself.
2) Of course, that could work at once after You have copied that data.
3) Something else?
Which part would be 'a little more automated'?
 
@Chihiro, @vletm Thanks guys.

I'll continue to play around with this myself. The issue is not on the data pull nor do I need that automated. I've got that down and don't need help there.

What I was looking for was a formula (or VBA) to identify duplicate matchups as was shown in the sample file. I am not pulling results but rather from team schedules and since there's overlap in those schedules you will have dupes.

Look at a football schedule, e.g., Chelsea at Man City, Man City vs Chelsea, etc. I only want a single instance and result. Not both.
 
Last edited:
Look at a football schedule, e.g., Chelsea at Man City, Man City vs Chelsea, etc. I only want a single instance and result. Not both.
... You get single results after press [Team]-button, did You press it?
 
... my formula also does exactly that. It will identify any match that's duplicate. Just use autofilter on the column and delete any showing 2+.
 
Look at a football schedule, e.g., Chelsea at Man City, Man City vs Chelsea, etc. I only want a single instance and result. Not both.
... You get single results after press [Team]-button, did You press it?

That works with one caveat--some teams play 2 games in a day. I suspect there I could expand the VBA to include Time and the score.

Thanks again.
 
Do You mean that
if Chelsea - Man City play from 10:00 with result 1-2 and
same day Man City - Chelsea from 17:30 with result 2-1?
or Chelsea play the 2nd afternoon game with other team than Man City?
That my 'sample' works with Your original case with those informations You have given. It's possible to modify, but it's challenge without even idea of that and some missing information.
>> just copy+paste and press [Team] >> Ready
 
Back
Top