• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Finding out what's missing

David A Rainey

New Member
Every month I download a listing of credit card purchases from our processor. The listing includes date, amount, order number, name, and partial card number. This list is usually about 10,000 lines long. The next report I download is from our storefront operator. This list also contains the same data. The problem is they will never match for various reasons. This could be due to shipping date etc.

In order to reconcile this list I need to find out which orders are unique, appear on one list and not the other, and vice versa. What would be the best way to do this?
 
Without dummy data, hard to say.

Most often culprits are one or more of following:
1. Invisible character
2. DateTime formatting difference
3. Special Character
4. Trailing or preceding space(s)

For credit card transaction, there should be unique transaction ID that you should be able to match on.

Edit: woops I miss-read your post. See link below for method(s) you can use.
http://chandoo.org/forum/threads/finding-the-differences-between-two-sheets.31436/#post-187006
 
Can you post some dummy or sanitized data so that folks can get a better idea of what you're comparing?
Anything you can provide prevents us making assumptions, which tend to be incorrect, or at least, misleading ...
 
Here is a sample file. It is two separate downloads, pasted side by side, but the actual download would be about 6,000 records each. Ultimately I need to verify the credit card deposit (right hand sight) to the output data from the store (left hand side). They will never match due to different posting dates, shipping credits, numerous other issues. So ultimately I need to find out which data on the left column does not have a matching entry on the right column, and which data on the right column does not have a matching entry on the left column.

MS Query Subtract/Join will probably work if I can figure out how to do the query. I couldn't open the sample file, the format couldn't be recognized. I have Excel 2000.

For some reason I can't attach the actual file, so I had to paste it in the message.
order # order date amount name cc type cc number order # order date amount name cc type cc number
125445 10/20/2016 325 joe e. amex xxxx1023 125445 10/20/2016 325 joe e. amex xxxx1023
125446 10/20/2016 170 joe f amex xxxx1045 125446 10/20/2016 170 joe f amex xxxx1045
125449 10/20/2016 35 name amex xxxx6543 125449 10/20/2016 35 name amex xxxx6543
125450 10/20/2016 30 name amex xxxx9301 125450 10/20/2016 30 name amex xxxx9301
125451 10/20/2016 65 name amex xxxx4503 125451 10/25/2016 65 name amex xxxx4503
125452 10/20/2016 65 name amex xxxx4503 125453 10/1/2016 65 name amex xxxx4500
125453 10/20/2016 65 name amex xxxx4500 125454 10/20/2016 78 name amex xxxx1023
125454 10/20/2016 78 name amex xxxx1023 125455 10/20/2016 60 name amex xxxx1045
125455 10/20/2016 60 name amex xxxx1045 125456 10/20/2016 90 name amex xxxx6543
125456 10/20/2016 90 name amex xxxx6543 125457 10/20/2016 123 name amex xxxx9301
125457 10/20/2016 123 name amex xxxx9301 125449 10/26/2016 -18 name amex xxxx6543
125458 10/20/2016 65 name amex xxxx4503 125458 10/26/2016 65 name amex xxxx4503
125459 10/20/2016 65 name amex xxxx4503 125460 10/20/2016 65 name amex xxxx4500
125460 10/20/2016 65 name amex xxxx4500 125461 10/20/2016 78 name amex xxxx4503
125461 10/20/2016 78 name amex xxxx4503 125462 10/20/2016 60 name amex xxxx4503
125462 10/20/2016 60 name amex xxxx4503 125463 10/20/2016 90 name amex xxxx4500
125463 10/20/2016 90 name amex xxxx4500 125464 10/20/2016 123 name amex xxxx4503
125464 10/20/2016 123 name amex xxxx4503 125465 10/20/2016 65 name amex xxxx4503
125465 10/20/2016 65 name amex xxxx4503 125444 10/20/2016 100 name amex xxxx4500
125466 10/20/2016 100 name amex xxxx4500 125467 10/15/2016 325 name visa xxxx3356
125490 9/20/2016 200 name visa xxxx6577
 
Hi ,

This posting of data does not help much.

Can you clarify the following ?

1. Is the data in two different files , or do you download the data into one file ? How is it that the data when downloaded from two different sources at different times is appearing side by side ?

If we wish to carry out a match using Excel functions , there is no necessity for the data to be side by side ; they can easily be in two different worksheets of a workbook , or even in two different workbooks.

Having the data in two different worksheets in one workbook would make things easiest.

2. If we are to compare based on the order number , two things need to be fulfilled :
  • there should be exactly one entry per order number in each source of data
  • the data pertaining to an order number should appear in one line ; there should not be multiple rows of data per order number
  • there should not be any mistakes in the order number in either source of data ; we should be able to take the order number as it appears in the data.
Can you clarify / confirm the above ?

Narayan
 
Hi ,

This posting of data does not help much.

Can you clarify the following ?

1. Is the data in two different files , or do you download the data into one file ? How is it that the data when downloaded from two different sources at different times is appearing side by side ?

If we wish to carry out a match using Excel functions , there is no necessity for the data to be side by side ; they can easily be in two different worksheets of a workbook , or even in two different workbooks.

Having the data in two different worksheets in one workbook would make things easiest.

2. If we are to compare based on the order number , two things need to be fulfilled :
  • there should be exactly one entry per order number in each source of data
  • the data pertaining to an order number should appear in one line ; there should not be multiple rows of data per order number
  • there should not be any mistakes in the order number in either source of data ; we should be able to take the order number as it appears in the data.
Can you clarify / confirm the above ?
Hi ,

This posting of data does not help much.


Can you clarify the following ?

1. Is the data in two different files , or do you download the data into one file ? How is it that the data when downloaded from two different sources at different times is appearing side by side ?

If we wish to carry out a match using Excel functions , there is no necessity for the data to be side by side ; they can easily be in two different worksheets of a workbook , or even in two different workbooks.

Having the data in two different worksheets in one workbook would make things easiest.

2. If we are to compare based on the order number , two things need to be fulfilled :
  • there should be exactly one entry per order number in each source of data
  • the data pertaining to an order number should appear in one line ; there should not be multiple rows of data per order number
  • there should not be any mistakes in the order number in either source of data ; we should be able to take the order number as it appears in the data.
Can you clarify / confirm the above ?

Narayan


Narayan

The data comes from two separate files. One is a download from our website, the other is a download from our credit card processor. I wasn't able upload a file for some reason, so I posted them both in the message. All of the data from each download is on a single line for each order number. There are no multiple rows,but there is at times multiple entries for with the same order number coming from the credit card download, due to credits, etc., that can't be processed on our website.

On the pasted data, the information on the left side is part of one download, the information starting with the order number to the right is part of another download. I need to record all of the differences between the two sets.
 
Here's sample using PowerQuery. This has been set up to read from 2 source files.

In Consolidate workbook, "SetUp" sheet contains 4 named ranges. Which is used to pass parameter to PowerQuery.

Change Path, File & Sheet to suite your need, FullPath is calculated. If you just want to test how it works without changing parameter. Just create folder "C:\Test" on your computer and place all files there.

I've used Narayan's sheet name and data (just added column headers).

You can paste new data into Web & Processor workbook's existing sheet and query will update when you refresh table/connection.

Note that there is custom M function "replaceCCN" added to Merge1 & Merge2 query (just for aesthetics to remove credit card number when match is found, it's not a requirement).

Below is the custom M and the line using custom M.
Code:
    replaceCCN = (inputText, old, new) =>

    if Text.Contains(inputText, old) then
        new
    else
        inputText,
   
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","xxxx",null,replaceCCN,{"CardNumber.1"})

Alternate to what I did here, you can also do Left Anti join of initial queries to show only the missing items.
 

Attachments

  • Consolidate.xlsb
    22.6 KB · Views: 6
  • Processor.xlsb
    9.5 KB · Views: 4
  • Web.xlsb
    9 KB · Views: 3
Here's sample using PowerQuery. This has been set up to read from 2 source files.

In Consolidate workbook, "SetUp" sheet contains 4 named ranges. Which is used to pass parameter to PowerQuery.

Change Path, File & Sheet to suite your need, FullPath is calculated. If you just want to test how it works without changing parameter. Just create folder "C:\Test" on your computer and place all files there.

I've used Narayan's sheet name and data (just added column headers).

You can paste new data into Web & Processor workbook's existing sheet and query will update when you refresh table/connection.

Note that there is custom M function "replaceCCN" added to Merge1 & Merge2 query (just for aesthetics to remove credit card number when match is found, it's not a requirement).

Below is the custom M and the line using custom M.
Code:
    replaceCCN = (inputText, old, new) =>

    if Text.Contains(inputText, old) then
        new
    else
        inputText,
  
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","xxxx",null,replaceCCN,{"CardNumber.1"})

Alternate to what I did here, you can also do Left Anti join of initial queries to show only the missing items.

Thanks for the input, but I think a Power Query is way over my head. I created the directory and put the files there, added new data, and tried to refresh, but kept receiving error messages. I desperately need classes. I was able to get the pivot table to work, however. It won't help when there are duplicate order numbers (such as an order, and the same order refunding shipping), but it will take away hours of manually pasting and matching data. I find it refreshing that people will take the time out their day to help someone they don't know with a problem, so thank you again.
 
Hi ,

I am uploading a file which contains the data you have posted ; see if you can apply this same formula to your workbook.

Narayan

The pivot table did in fact work. It will free up hours of time trying to manually match the two sets of number against each other. I can probably write a formula to find out which might have duplicate entries (due to non posted credits). I have one quick question for you. I know this may sound stupid, but what does the "ISNA" part of the Vlookup formula refer to?

Thank you for your help!
 
When VLOOKUP can't find a match, it will return #N/A! error.

By nesting VLOOKUP function within ISNA() whenever #N/A! error is returned by VLOOKUP it will evaluate to TRUE. Which is then used as logical operator for IF function.
 
Back
Top