We are using the Infragistics.Documents.Excel.Workbook class to create an Excel workbook with 10,000+ rows. This works fine without formulas (takes about 1 minute), but seriously slows down when adding 5 formula columns (about 26 minutes). The formula columns don't contain anything complicated, so I don't think it has to do with the complexity of the formulas (we are just doing addition and multiplication mostly).
We are using the ApplyFormulas method to add the columns. E.g.:
worksheet.Rows[rowIndex].ApplyCellFormula(columnIndex, newFormula, CellReferenceMode.R1C1); (where newFormula equals "=RC[-1]*RC[-7]")
We've also tried using the SuspendCalculations() and ResumeCalculations() methods; however, these don't appear to help much. The same goes when programmatically setting the Workbook.CalculationMode to manual. This also does not help. It appears that writing the data gets slower and slower as more rows are added, so I'm pretty sure the formulas are being re-calculated. Is there something we are doing wrong when applying the formulas?
Thanks,
Paul
Hello Paul,
Thank you for contacting Infragistics support!
I’ve created a code sample in order to investigate your case. I’m creating a worksheet with 10000 rows. There are five formulas depending on each other. The workbook has been created for less than 1 minute. Please test it on your side and feel free to modify it in order to represent the issue.
Furthermore, if you don’t need the values of the cells with formulas, you could use SuspendCalculations without resuming them.
You are mentioning that you tried CalculationMode to be set to manual. Actually, this option is for the UI of Excel. It will not help you with the calculations of formulas.
Please let me know if you have any more questions regarding this topic.
Regards,
Aneta Gicheva,
Infragistics
Hi Aneta,
Thank you for the code sample. It seems that combining the ApplyCellFormula method with calling Rows.Insert is producing the slow results. I've attached an updated code sample with using the Rows.Insert method. Rows 1- 1000 takes 10 seconds, Rows 1001 - 2000 takes 32 seconds, Rows 2001 - 3000 takes 55s.
The results were pretty similar when using SuspendCalculations. Rows 1- 1000 takes 10 seconds, Rows 1001 - 2000 takes 32 seconds, Rows 2001 - 3000 takes 53s.
If I remove the ApplyCellFormula calls, but still use Rows.Insert, then 3000 rows takes about a second.
So I believe this is caused when combining calls to ApplyCellFormula and Rows.Insert. I do need to use Rows.Insert because we are using the Infragistics Excel classes to insert data into an existing Excel workbook.
Let me make sure I understand you clearly - you would like to ‘insert’ new rows between already existing rows or at the beginning of the worksheet. When you are calling Insert method in the loop, the preexisting rows are moved downward on every iteration. That can definitely cause excessive overhead, especially if you are moving thousands of rows on every iteration and that is true even without formula columns. My suggestion is to insert all the needed rows in advance.
ws.Rows.Insert(0, 3000 + 1);
for (int i = 0; i < 3000; i++)
{
ws.Rows[i].Cells[0].Value = 11 +i;
ws.Rows[i].Cells[1].Value = 21 + i;
ws.Rows[i].Cells[2].Value = 31 + i;
ws.Rows[i].Cells[3].Value = 41 + i;
…
}
This will move the already existing rows and after that fill the data in the empty rows. I modified the sample. When I test it and the time to add rows is less than a second. Please test it on your side and let me know how this works for you.
Hi Damyan,
I haven't had a chance to implement this yet; however, there is no reason that it shouldn't work. Thanks for discussing this internally. I've verified your answer.
Just wanted to follow up – have you been able to update your project to use the more bulk approach to inserting rows?
Generally speaking, some penalty is to be expected for doing separate inserts as each requires some validation tasks to be performed (like making sure tables are not split, end of worksheet) as well as check and potentially adjust formula references – not even the actual calculation yet mind you, the latter also applies across all worksheets in case they have formulas that reference the affected area. The last part is what you saw even doing inserts in a separate worksheet, which should not be affected as much.We’ve discussed this internally and there are potential improvements we’ve set to investigate further, but in any case we can all agree it’s always advisable to insert large blocks of rows at once where possible.
Regards,Damyan PetevSoftware DeveloperInfragistics, Inc.
Thank you for the detailed information and testing! From the few tests I ran I can tell the bottleneck is indeed the Insert method as it seems to be very costly on the object model/rows collection itself when moving thousands of rows at a time each iteration. As noted above, this seems to be true even without data or little data and before including formulas in the mix and a drastic difference after that as well. We’ll definitely look into this.
In the meantime though, it makes perfect sense to perform the insert operation in bulk as it will always be faster and I’m hoping your current implementation can accommodate such flow as it seems fairly reasonable to be able to determine the amount of rows required directly before filling them in with data.
Will post updates as we have them.
Damyan Petev
Software Developer
Infragistics, Inc.
Let me explain my business use case; I think this will help. We have an Excel "Templating" feature which allows our clients to take an existing Excel workbook and template out headers and footers and then our system fills in the data in the middle. The client also has the ability to take a single datasource and parse it out into as many sheets as they want. So we don't necessarily know how many rows to insert into each sheet until we actually add in the data. This works perfectly fine with the Infragistics excel classes when there are not many rows or Excel formulas are not used in each row, but seriously slows down when many rows are used with formulas. We don't typically consider 10k rows to be a lot of data (I could see a client writing up to 100k rows of data) and Excel can inherently handle inserting a row at a time without being too slow (more on this below).
I don't think pushing down the data in the cells is the issue. Try creating two separate sheets and writing the data in one, while inserting the rows in the other. This takes about the same amount of time as doing this in a single sheet:
I ran another test to try to figure out where the bottleneck was. I create an excel macro that does pretty much the same thing as the code above. This took about 17 seconds for 3000 rows and 148 seconds for 10,000 rows. If I set excel to Manual calculations, this changes to 8 seconds and 90 seconds respectively. This is compared to the code above that took 92 seconds for 3000 rows. I ran the C# test again for 10,000 rows and it took 17 minutes 50 seconds (1070 seconds). So in both cases, the Infragistics classes are over 10x slower than Excel itself.
I'm in no way saying that the Infragistics classes should be as fast as an Excel macro. I do think there is room for improvement though because the Infragistics classes get much slower as more data is added, and this only happens when combining formulas with the Insert method.
I do see the benefit of inserting all the rows that are needed at the same time, so I'll see if I can get this to work; however, it may also be worth it to get a developer to look at this to see why inserting rows is 10x slower than Excel when formulas are used.