• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Excel File size

Monty

Well-Known Member
Hello Everyone!

Am facing a big issue at work as my excel file consists of 8 lakhs of data while opening/closing and saving file take hell out of time...Problem is i can have data only in excel no other choice.

What i did...Converted my file to .xlsb to make file size reduce...but the problem still exists.

Any comments/Suggestions on how to handle files with huge data...

Monty!
 
That's... 800,000 rows? How many columns?

What Version of OS, Ram & CPU along with Excel version (32/64 bit) do you have?
 
Hello Chihiro.

Thaks' for the quick response.

Details

OS : Win 7
RAM : 4gb
XlVer : 2013 (64 Bit)
Columns : 36

But still no escape need to work on these requirements with excel file.
 
Last edited:
Hmm Ram is bit lacking. You'd want 8gb, preferably 16gb.

Tested 26 column 1mil row file, with 8 char length data in each cell.

Took about 10 sec on my machine to save and open.
OS - Win7 (64 bit)
CPU - i5-4200M @ 2.5GHz
RAM - 16GB
Excel - 2016 (64 Bit)

If you have any add-in try disabling it to speed up things.

Unfortunately there aren't much else you can do to speed it up, if you need to directly interact with the workbook.

How is the workbook used?
 
Hello Chihiro.

Thanks testing file for me...i have to live with it at work...Can not change at system front.

Am running a project with two sheets

Sheet Names:

Tempate: User will input the rows which may go upto 50k

Database : We have some 8 laks rows.


1) Need to loop through rows and create a concatenation of row as do not have provision of creating a helper column called combination in the template sheet so need to handle with vba.
2) that combination should be checked with "Database sheet" if found not action to be taken but not found there are certain validation to check.

Please find the simple test file.

Thanks
Monty!
 

Attachments

  • Test.xlsb
    11.1 KB · Views: 2
Last edited:
Hello Chihiro.

This is how i just started with!

Code:
Sub Test()
Dim Tempsht As Worksheet
Dim DataSht As Worksheet
Dim Lrow As Long
Dim Concat As String

Set Tempsht = Worksheets("Template")
Set DataSht = Worksheets("Database")

Tempsht.Activate

Lrow = Tempsht.Range("A" & Rows.Count).End(xlUp).Row

    For i = 2 To Lrow
   'Creating concatination from template to check with Database sheet.
        Concat = Range("A" & i).Value & "|" & Range("B" & i).Value
   
   
    Next i

End Sub
 
I'm bit confused with your description.

While you state if combination is found in database no action is to be taken... But Case 1 to 3 state that there is action needed.

Also in your sample file, every combination in Template is found in Database.
 
Yes chihiro.

We are not just checking the combination, if the combination exist need to check further below columns
Approve 1 Approve 2

Valid Entry

If combination found and if you have some names in Approve 1 and Approve 2 then it is valid entry

What is not valid entry

1) If combination found and if you have Approve 1 or Approve 2 "Blank" then it is Invalid entry.

2) If combination found and if you have Approve 1 or Approve 2 as some text as "Testing" then it is Invalid entry.

3) If combination found and if you have Approve 1 or Approve 2 as some text as "Inprogress" then it is Invalid entry.
Above three validation makes an entry in the template sheet as invalid and need to high light with red color

Example:
Database sheet
upload_2017-5-12_0-32-8.png

Template sheet
upload_2017-5-12_0-33-2.png


As per the screen shot only "E" is valid combination when checking with database sheet as Approver 1 and Approver 2 has some names apart from "Blank", "Inprogress" and "Testing".

Hope am not confused you..Please let me know any questions.
 
Last edited:
Hi, Monty!
Just because of the 800k+ rows, with formulas, I'd suggest you to move to Access or any version of SQL database.
Regards!
 
Hello Sir.

I was Testing with Access initially then SQL server database so that 150 users can access the data for validation..But unfortunately we did't get the access for server as per policy .

So trying to do the same validation in Within excel with same data dumb in Tab "Database".

Am sure this would become more easy working for access or Sql server but no option for me.

Thanks
Monty!
 
I was Testing with Access initially then SQL server database so that 150 users can access the data for validation..But unfortunately we did't get the access for server as per policy .

Wait... so each user is to have the database sheet in the workbook that they will be working in?
 
Every user will hold separate file as there is no server concept to access.

This macro file will be passed on to every user.
 
Every file will have Two tabs "Template sheet" and "Database sheet" which will used by every use standalone.
 
Testing with Access initially then SQL server database so that 150 users can access the data for validation..But unfortunately we did't get the access for server as per policy .
Hi, Monty!
IMHO you shouldn't go on with Excel. If corporate policies don't allow you to use the right tool, then someone doesn't think that it has to be implemented... What's this person opinion?
Regards!
 
Hello Sir.

This project started with expecting for SQL server so that total dump will be in the tables and user will hold only template sheet on clicking on the button this entry should go and check with the sql server tables...But now we end up using excel individually.
 
Ouch. Whomever is responsible for shooting down DB option should be held responsible for creating the application :p
 
Hi, Monty!
Then that guy is the system owner... Once found the father of the beast, everybody else is responsibility free... Hence, enjoy the future as far as you can from that monster... Unless you want to share or be charged with the delegated responsibility.
Regards!
 
Thanks for the feedback.

But still i wanted to check with one user working performance after writing the code.Please let me know any help.

Code:
Sub Test()
Dim Tempsht As Worksheet
Dim DataSht As Worksheet
Dim Lrow As Long
Dim Concat As String

Set Tempsht = Worksheets("Template")
Set DataSht = Worksheets("Database")

Tempsht.Activate

Lrow = Tempsht.Range("A" & Rows.Count).End(xlUp).Row

    For i = 2 To Lrow
  'Creating concatination from template to check with Database sheet.
      Concat = Range("A" & i).Value & "|" & Range("B" & i).Value
 
 
    Next i

End Sub
 
Hi !
Converted my file to .xlsb to make file size reduce...but the problem still exists.
.xlsb file format is just to reduce size on disk,
nothing with how Excel "works" !

Excel needs to load all in memory : application, add-ins, ribbon,
workbooks & worksheets, formulas, data, …
As it is just how Excel works so there is almost nothing to do
except to increase the RAM of computer !

Excel can create and manage some Access database.
But for big data, Access is far away the best choice.

On a recent thread on my local forum, a guy asked for a help to end
its project. An helper warned him again like several weeks ago
for this kind of database project Excel is under level, better is to move
to Access. The guy did'nt want that way again as it was so closed to ending.
After some help, two days later the guy was happy to finish but
complained about a long time execution for a main procedure.
Helper tested on his side : 53 minutes to finish the process.
The guy said : « Ok it's normal, almost same time on my side. »

Few hours later, the helper tried on Access : less than a minute !
The guy was surprised and disappointed o_O and wrote
« Ok but 'cause your are an expert too on Access. »
The helper answered his code was like a beginner but yes,
as an expert he optimized his code and his new version lasts
around 20 seconds ! Versus 53 minutes under Excel … :eek:

Access is the right choice for big data, safety of data,
shared work between users, work on a lan, …

Often database and maintenance procedures are under Access
but some data can be exported to Excel
(or Excel can directly grab Access data via SQL)
for analysis for example …
 
Hi Monty ,

Execution time of code , or recalculation time of a worksheet is solely a factor of the complexity of data processing or calculations that are taking place. Opening and closing a workbook is also a factor of these , since if recalculation mode is Automatic , Excel will do this operation both before saving a workbook , and after opening it.

This in turn is to some extent dependent on the algorithm that is used to carry out the data processing or calculations.

Can you mention your working file size in MB ?

Narayan
 
@Monty

Did bit of test. Using approx 800k rows, takes about 8.4 sec to just find matching row# for 8 items from template.

There may be faster way... but I wouldn't pursue this avenue if I were you.
Code:
Sub Test()
Dim TempArr
Dim cel As Range, x
Dim sTime As Single, eTime As Single

sTime = Timer
myArray = Sheets("DataBase").Range("A1").CurrentRegion

ReDim TempArr(1 To UBound(myArray, 1), 1 To 3)

For i = 1 To UBound(myArray, 1)
    TempArr(i, 1) = myArray(i, 1) & "|" & myArray(i, 2)
    TempArr(i, 2) = myArray(i, 3)
    TempArr(i, 3) = myArray(i, 4)
Next

For Each cel In Range("A1:A8")
    x = Application.Match(cel.Value & "|" & cel.Offset(, 1).Value, Application.Index(TempArr, , 1), 0)
    Debug.Print x
Next
eTime = Timer

Debug.Print eTime - sTime
End Sub

Edit: Also did test with Scripting.Dictionary, but just adding 800k unique keys took about 28 sec.
 
Last edited:
Hi ,

What are the operations involved here ?

Can we assume that the Database tab is not updated often ? If so , then the concatenation of the fields in each record in this tab can be done using formulae , as a one-time or at least infrequent occurrence.

The concatenation of the fields in each record in the Template tab needs to be done using VBA ; for 50,000 records , with 36 fields , this is bound to take time. If this were to be done using VBA and Access , what would be the difference ?

Matching the 50,000 concatenated values with the values in the Database tab can be made fast if we sort the Database tab on the concatenated field ; this can be done in code or manually as a one-time activity.

Repeating the above match operation over 50,000 records is again going to take time. The fact that all of this is happening in computer RAM means there is not going to be any great advantage if this were to be done using Access.

The basic point is that Monty needs to mention what the present benchmark time is before we conclude that Excel is at fault.

Narayan
 
Concat can be done for 800K rows in about 3 sec. I initially tested with code and formula. Didn't make much difference so I pasted them as values for second test.

Checking if the matching record exist in database will be the bottle neck here.

Code:
Sub Test()
Dim TempArr
Dim cel As Range, i
Dim sTime As Single, eTime As Single

sTime = Timer
myArray = Sheets("DataBase").Range("A1").CurrentRegion

For Each cel In Range("A1:A8")
    i = Application.Match(cel.Value & "|" & cel.Offset(, 1).Value, Application.Index(myArray, , 5), 0)
    If IsError(i) Then
        cel.Resize(, 2).Interior.Color = vbRed
        cel.Resize(, 2).Font.Color = vbWhite
        GoTo Skip
    End If
    If myArray(i, 3) = "" Or myArray(i, 4) = "" Then
        cel.Resize(, 2).Interior.Color = vbRed
        cel.Resize(, 2).Font.Color = vbWhite
    ElseIf myArray(i, 3) = "Testing" Or myArray(i, 4) = "Testing" Then
        cel.Resize(, 2).Interior.Color = vbRed
        cel.Resize(, 2).Font.Color = vbWhite
    ElseIf myArray(i, 3) = "InProgress" Or myArray(i, 4) = "InProgress" Then
        cel.Resize(, 2).Interior.Color = vbRed
        cel.Resize(, 2).Font.Color = vbWhite
    End If
Skip:
Next
eTime = Timer
Debug.Print eTime - sTime
End Sub

It really depends on how many record needs to be checked from Template sheet.

But added issue is when Database needs any change. Updating for all 150 users is going to be a pain.

If using DB, you can ensure everyone is using same database table and perform checking operation using temp table in under 1 sec.
 
Back
Top