
No matter where it is going, it has to run fast by Hamed Saber
Most developers I know worry about the performance of their software, to the point of absurdity. With every decision they face, whether it is an architectural decision, a technology choice or a nested loop, they immediately weigh the arguments in terms of performance.
This does not make sense on many levels. First of all, performance is usually overrated. Slow software that is complete and correct is always better than fast software that is buggy or incomplete. Clients often ask for high-performance software, but in most cases they are satisfied with a fully working version even if it does not run as fast as they first thought.
Developing with a performance mindset often creates solutions that contain more bugs and are harder to read, understand and maintain. Here is an example. Compare these two C code snippets:
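/* Version 1: add the integers from 1 to N one at a time */
int i, sum = 0;
for (i = 1; i <= N; i++)
    sum += i;
printf("sum: %d\n", sum);

/* Version 2: closed form N * (N + 1) / 2, with the division written as a shift */
int sum = (N * (N + 1)) >> 1;
printf("sum: %d\n", sum);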
They both perform the same task. The second one is optimized for performance. Which one would you prefer when it comes to reading, understanding and maintaining the code? The first one, of course.
Another reason not to think about optimization too much is that you cannot know in advance which part of your application is or will be the performance bottleneck. You might put a lot of effort into a specific section of code that is only called once in a while and is already fast enough.
I am not saying you should never optimize for performance, but it should be the last thing you think of. In my mind, there are only two reasons for doing it:
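If you do keep an optimized version around, it helps to check it against the readable one. Here is a minimal sketch of that idea in Python rather than C; the function names sum_loop and sum_formula are made up for this illustration and simply mirror the two snippets.

```python
def sum_loop(n):
    """Add 1..n one term at a time, like the first C snippet."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    """Closed form n * (n + 1) / 2 written with a shift, like the second C snippet."""
    return (n * (n + 1)) >> 1


# Check that both versions agree on a range of inputs.
for n in range(0, 1000):
    assert sum_loop(n) == sum_formula(n)

print("sum:", sum_formula(100))
```

One thing this check cannot show: Python integers do not overflow, while in C the product N * (N + 1) is twice the final sum, so the second snippet overflows a fixed-width int at a smaller N than the loop does.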
- Customers are complaining about speed
- You want to gain a competitive advantage (just like the current crop of upcoming browsers)
If you know any other good reason for doing it, let me know.
When you do have to optimize your code, the first thing to do is use a code profiler to find where the problem is. There are many tools available for this, and I am pretty sure there is one for your language. Here is a simple profile output from a run I did a few days ago on Mercurial.
1239498 function calls (1237075 primitive calls) in 18.604 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 18.604 18.604 :1()
1 0.000 0.000 0.000 0.000 ConfigParser.py:106(Error)
1 0.000 0.000 0.000 0.000 ConfigParser.py:118(NoSectionError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:125(DuplicateSectionError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:132(NoOptionError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:141(InterpolationError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:149(InterpolationMissingOptionError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:162(InterpolationSyntaxError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:166(InterpolationDepthError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:177(ParsingError)
1 0.000 0.000 0.000 0.000 ConfigParser.py:189(MissingSectionHeaderError)
1 0.000 0.000 0.001 0.001 ConfigParser.py:203(RawConfigParser)
5 0.000 0.000 0.000 0.000 ConfigParser.py:204(__init__)
3 0.000 0.000 0.000 0.000 ConfigParser.py:211(defaults)
3 0.000 0.000 0.000 0.000 ConfigParser.py:214(sections)
11 0.000 0.000 0.000 0.000 ConfigParser.py:219(add_section)
20 0.000 0.000 0.000 0.000 ConfigParser.py:229(has_section)
[...]
10 0.000 0.000 0.000 0.000 {setattr}
2 0.000 0.000 0.000 0.000 {signal.signal}
1 0.000 0.000 0.000 0.000 {sys._getframe}
1 0.000 0.000 0.000 0.000 {sys.exit}
2 0.000 0.000 0.000 0.000 {sys.getwindowsversion}
2 0.000 0.000 0.000 0.000 {thread.allocate_lock}
1 0.000 0.000 0.000 0.000 {win32api.GetCurrentProcess}
1 0.000 0.000 0.000 0.000 {win32api.GetFullPathName}
1 0.000 0.000 0.000 0.000 {win32api.RegOpenKey}
1 0.000 0.000 0.000 0.000 {win32api.RegQueryValue}
5 0.020 0.004 0.020 0.004 {win32file.CreateFile}
25 0.474 0.019 0.474 0.019 {win32file.ReadFile}
1 0.000 0.000 0.000 0.000 {win32file.SetFilePointer}
1 0.000 0.000 0.000 0.000 {win32process.GetModuleFileNameEx}
2 0.000 0.000 0.014 0.007 {zip}
By comparing different runs with a profiler, you can find out which function is called the most, which one uses the most CPU time and which one is the most expensive. Once you know that, you can start optimizing the most problematic sections.
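As a rough illustration, here is how the same run can be looked at from those three angles with Python's standard cProfile and pstats modules. The workload and slow_sum functions are hypothetical stand-ins for a real program, and profile_report is a helper invented for this sketch, not something Mercurial provides.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop so that it shows up in the profile.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def workload():
    # Hypothetical stand-in for the real program being profiled.
    return [slow_sum(2000) for _ in range(200)]


def profile_report(func, sort_key, limit=5):
    """Profile func once and return the top `limit` rows sorted by sort_key."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(sort_key).print_stats(limit)
    return out.getvalue()


# Most called, most time spent in the function itself, most time including callees.
for key in ("ncalls", "tottime", "cumulative"):
    print(profile_report(workload, key))
```

Sorting by tottime and by cumulative time often points at different functions: the first highlights code that is slow by itself, the second highlights code that is slow because of what it calls.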
Once you are done with your optimizations, you have to run your profiler again with the new code and compare the results with the unoptimized version. That way, you can see if you are making progress or not.
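A before-and-after comparison could look something like the sketch below. It assumes the two versions of the code can be called side by side; sum_before, sum_after and profiled_seconds are names invented for this example, and total_tt is the total time that pstats.Stats records for the profiled run.

```python
import cProfile
import pstats


def sum_before(n):
    # Unoptimized version: straightforward loop.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_after(n):
    # Optimized version: closed form.
    return n * (n + 1) // 2


def profiled_seconds(func, *args, repeat=200):
    """Run func under cProfile `repeat` times and return the total time recorded."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func(*args)
    profiler.disable()
    return pstats.Stats(profiler).total_tt


n = 5000
# Optimizing must not change the result.
assert sum_before(n) == sum_after(n)

before = profiled_seconds(sum_before, n)
after = profiled_seconds(sum_after, n)
print("before: %.6f s, after: %.6f s" % (before, after))
```

The exact numbers depend on the machine and on profiler overhead, so it is the relative difference between the two runs that matters, not the absolute values.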
I have come to think that, when managing your code, you should keep the optimized and unoptimized versions in separate branches. The unoptimized version is probably going to be easier to read and understand, so it will also be easier to maintain, and that is a benefit worth keeping.
Stop worrying about performance and enjoy clean code.