My rant about Team Foundation Server

Many developers all tied together in an endless loop by Jonathan Caves

This is my first rant and it is probably not the last one.

Team Foundation Server (TFS) is the Microsoft way of managing source code and doing some project tracking at the same time. It integrates so many Microsoft technologies that you are totally tied to one of Bill Gates’ legs when you are using it.

Let’s get started with the good points. TFS is not such a bad piece of software. It is at least five times better than Microsoft’s previous offering in the same field, Visual SourceSafe. It is well integrated with Visual Studio, one of the most widely used IDEs.

I have to use TFS on a daily basis, mainly through Visual Studio Team System (VSTS), which is the add-on for Visual Studio to access TFS. Getting used to it was a challenge since I had tasted flexible, lightweight and fully distributed source control management systems before. I am still trying to figure out some of the details, but I am slowly merging with the Borg. Here is what I dislike about TFS.

You have to “Check out” a file before modifying it

After getting the source code of your application, you have to perform a “Check out” on a file before you can modify it. Why? Well, that is a good question. It might be to keep up with the good old-fashioned way of doing things with Visual SourceSafe. It might be to inform the central server of who is working on the file. It might be to prevent developers from modifying a file that has been locked. I should probably not mention that you can remove the read-only attribute of the file to start messing with this scheme.

The whole concept of “Check out” is my biggest complaint about TFS, and it is useless to say the least.

It is centrally based

Source control management should not be centrally based anymore. This is 2008 and the first distributed version control system was created in 1997. There is no reason to be tied to a central server and be locked out during outages.

A developer should be able to work with his local repository the way he wants without being tied to a central server.

It uses weird concepts like shelves

Shelves are one of those things that are only really useful in some very specific situations. In my opinion, they should not have been implemented in the first place, or only as an optional add-on. A shelf in TFS is a group of files that is stored in a temporary place under a given name. It is like a lightweight branch where you can easily share your modifications, for instance to get them approved by others, without having to create a branch for them.

The problem with this concept is that it is not a branch. You are actually storing files on the TFS server and not changes. When you recover those shelves, or “unshelve” them, you are restoring those files and not the changes you made. If one of your colleagues modified a file while it was sitting on the server, you have to apply your colleague’s changes to the file you restored before checking it in. It is a real mess and quite error prone.

Merging is not trivial

I had to do my first merge between branches today. I must say that it was quite painful. After making Visual Studio crash twice in a row, I resorted to using the command line client. It took me a while just to find out where it was. I had to read some online documentation to figure it all out because there is no command line help. Trying to get help from the command line will actually launch a Windows help file.

My first try failed because I forgot to specify the recursive switch. While waiting for my second failed try to complete, I searched the blogosphere for some concrete examples. I figured out I had to specify a changeset range instead of just the changeset I wanted. Finally, I succeeded, but it took a while even when I had the right command.

When you are finally doing the merge, it shows a bunch of windows specifying which files need to get merged without mentioning if there are any conflicting merges or if it can all be done without your intervention.

It is trying to do too many things at the same time

TFS is a you-will-not-need-any-other-tool-to-manage-your-project kind of application. It is trying to be a source control management software, a bug tracking software, a collaborative application, a reporting server, a continuous integration server, a portal server and some others. At best, it is succeeding in only one or two of those areas.

If you do not know anything else, it is probably the best thing in the world. Once you have tried a few specialized tools for each of those aspects, you might stop relying on it.

It is not bundled with what you expect from a VCS

I expect my version control system to be bundled at least with a blame or annotate tool to find out who wrote which line in a file. You will not find this in the normal client package for TFS. You will need to install the power tools for TFS. Not to mention that in order to install the power tools, I had to install a bunch of other packages that all refused to install before I had installed some other prerequisites.

My suggestion is to stay away from this monster.

Continuous integration

A small robot giving you a hand by woordenaar

The time has come to get out of the integrated development environment mindset. Software development should not be based on a monolithic piece of software like the IDE. In the beginning, I thought it was great to have all those tools just a click away, but the more I think about it, the more I am repulsed by it. I want to be independent from that easy build button. It seems like it does not fit with that mindset.

In my never-ending quest for better software quality, I got interested in the continuous integration process. Before getting too deep into the matter, let’s try to define what the integration process is in software development. Integration happens when you make a change to your software and you want to make sure everything works with the software you had before and that your change conforms to some quality standards. A change can be a lot of different things like adding a simple message, adding a module, changing the colour of a box or refactoring your whole application. It can happen while starting up your project and developing it, it can be a change during the beta phase or it can be an emergency bug fix for a running application. Whatever is changed, however it is done and wherever it goes, you want to make sure it works.

Integration can be quite easy for small projects with small changes while being quite hard for large projects with big changes. Continuous integration is about making that process incremental, routine, automated and simple.

How does it work? You will often have one or many dedicated servers waiting for changes to be included in the source control management system. Changes can be detected with a hook script or with a pull process. Often, you will also see some of those continuous integration servers set up to run some tasks at predetermined intervals, like every midnight. Whether it is event based or scheduled, they will most likely run one or more of the following tasks:

  • Building the source
  • Running unit tests
  • Running a code coverage analysis
  • Running a code analysis for standards conformity
  • Running a performance analysis
  • Generating code from a model
  • Building the documentation
  • Deploying the results on a test server

For each task, the server will keep a report on what happened, how it went and what the results were. For instance, for the unit tests task, you might want to have a report with how many tests succeeded, how many failed and which ones failed. A concrete example would be the waterfall view for the Google Chrome continuous integration process.
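To make the idea of tasks and reports concrete, here is a minimal sketch in Python (written for Python 3) of a runner that executes a list of tasks in order and keeps one report per task. It is not how BuildBot or CruiseControl work internally. The function name `run_tasks`, the report fields and the placeholder commands are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_tasks(tasks):
    """Run each (name, command) pair in order and return one report per task."""
    reports = []
    for name, command in tasks:
        started = time.time()
        result = subprocess.run(command, capture_output=True, text=True)
        reports.append({
            "task": name,
            "succeeded": result.returncode == 0,
            "seconds": round(time.time() - started, 2),
            "output": result.stdout + result.stderr,
        })
        if result.returncode != 0:
            break  # assume later tasks depend on earlier ones: stop at the first failure
    return reports


if __name__ == "__main__":
    # Placeholder commands: a real setup would call the build tool and the test runner.
    example_tasks = [
        ("build", [sys.executable, "-c", "print('compiled')"]),
        ("unit tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
        ("documentation", [sys.executable, "-c", "print('docs built')"]),
    ]
    for report in run_tasks(example_tasks):
        status = "OK" if report["succeeded"] else "FAILED"
        print(report["task"], status)
```

A hook script or a scheduled job could call something like `run_tasks` after each commit and publish the reports. Stopping at the first failure is a design choice; a server could just as well run every task and report them all.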

There are many advantages to using a continuous integration process. The most important one is to find bugs early and correct them early.

Let’s say you configured your continuous integration server to build your source whenever you commit a change in your source control management system. If you commit a change that breaks the build process, you can be notified quickly that you did something wrong and correct it. The cost of correcting a bug is often proportional to the length of time between the moment it was introduced and the moment it was found.

There are many solutions available to fill your needs, whether open source, free or commercial. BuildBot and CruiseControl are some of the popular ones. A simple script might also be the best thing for you if not much is needed.

With the quality goal in mind, a proper continuous integration process is a must.

Automation and you

Keeping the gears running by Curious Expeditions

Many new business software projects are about automating some repetitive and boring tasks. Instead of having a big spreadsheet in which we all enter our time log and that we share by email to get some reports at the end of the month, we create a centralized client/server application where everyone can enter his time log whenever he wants and where the report is generated and sent automatically at the end of the month. Instead of manually copying our past time logs from our old spreadsheets into our new system, we create a simple script that reads those spreadsheets and imports that data into our new centralized application. Tasks that could have taken days are completed in minutes instead.

In many situations, I think we could push even more for automation. In the software development business, I feel like we are working on automating tasks for our clients but we are not thinking enough about automating our own tasks. I guess it differs from place to place, but in each of my job experiences, I have found processes that could get an easy productivity boost simply by creating a simple script.

The reasons for not automating more are numerous. We do not have time for it, we are not using the right tools, we do not know how to create the right tools, we fear changes, we do not know how to validate our automations, we are using applications or processes which do not have any automation entry points, or we are plainly lazy and we do not want to learn how to automate things. I include myself in this because I have used each of those reasons at least once for not doing it.

If you do not have time to automate your tasks, you are just wasting your time. Just think about it. Automating is all about saving time. The exception would be a task that is quick to execute or rarely executed and that would take an enormous amount of effort to automate. I will let you be the judge on this one, but do not pretend it always requires huge efforts to automate a process.
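One rough way to judge whether a task is worth automating is to estimate how many runs it takes before the time saved covers the time spent writing the script. The function below is only a back-of-the-envelope sketch. Its name, its parameters and the numbers in the example are assumptions, not measurements.

```python
def break_even_runs(automation_minutes, manual_minutes, automated_minutes=0.0):
    """Return how many runs it takes for the automation to pay for itself.

    Returns None when the automated run is not faster than the manual one.
    """
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        return None
    return automation_minutes / saved_per_run


# Hypothetical numbers: a script that takes 4 hours to write and replaces
# a 15-minute manual task with a 1-minute automated one.
print(break_even_runs(automation_minutes=240, manual_minutes=15, automated_minutes=1))
```

If the task will be run fewer times than the break-even count, the exception mentioned above probably applies.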

Using the wrong tool is a frequent reason. It is hard to overcome because in many cases, you just do not know that there are better tools, applications or ways to automate your task. One way is to try to stay up to date with the latest technologies and to learn new things like a scripting language (Python is probably a good start). Automating a task that requires you to use a GUI application with no alternative such as a command line interface, a library interface or a public API is obviously hard. Even then, there are tools which can help you out in those situations.

If you are like me, with a strong Windows background where the GUI is king, you might want to see how things are done in the Unix/Linux world, where everything is a small command line program that is often used in a long tool chain to automate many tasks. If you are shy of installing a full-blown free Linux operating system to toy with those tools, you can do as I do and install a Linux-like environment for Windows.

To end this post, I will show you some examples and tools that I have been using recently to automate boring tasks.

Mass image manipulation

To automate the creation of thumbnails from a folder of a thousand pictures, I have been using ImageMagick and a command line similar to this one:

mogrify -format jpg -size 600x600 -auto-orient *.jpg

XML manipulation

To update large and complex XML configuration files, I have been using different XSLT templates with Saxon, a free and open source XSLT processor.
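For small, targeted edits, a scripting language can sometimes do the job without an XSLT template. The sketch below swaps in a different technique from the Saxon approach described above: it uses the ElementTree module from the Python standard library. The `config`/`setting` layout and the function name `set_setting` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def set_setting(xml_text, key, value):
    """Return the XML text with the content of <setting name="key"> replaced."""
    root = ET.fromstring(xml_text)
    for setting in root.iter("setting"):
        if setting.get("name") == key:
            setting.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical configuration layout, used only for illustration.
config = (
    '<config>'
    '<setting name="timeout">30</setting>'
    '<setting name="host">localhost</setting>'
    '</config>'
)
print(set_setting(config, "timeout", "60"))
```

For large structural transformations, an XSLT template is likely to remain the better fit.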

Data transformation, importation and exportation

I often had to import or export data from and to text files, spreadsheets, different databases, LDAP repositories or Outlook/Exchange. I used to create small C# command line programs, but I have switched to Python recently for the productivity gain. My strategy is always the same: find a component that reads the raw data from the input source, find another component that can write to the destination, and transform the data in the middle.
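The read, transform and write strategy can be sketched in a few lines of Python (written for Python 3). This is only an illustration: the CSV format, the field names and the three function names are assumptions, and in-memory files stand in for the real source and destination.

```python
import csv
import io


def read_rows(source):
    """Reader component: yield each CSV record as a dict."""
    yield from csv.DictReader(source)


def transform(rows):
    """Middle step: map the source fields onto the destination layout."""
    for row in rows:
        yield {
            "employee": row["name"].strip().title(),
            "hours": float(row["hours"]),
        }


def write_rows(rows, destination):
    """Writer component: emit the transformed records as CSV."""
    writer = csv.DictWriter(destination, fieldnames=["employee", "hours"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# In-memory files stand in for the real input source and destination.
source = io.StringIO("name,hours\n alice smith ,7.5\nbob jones,8\n")
destination = io.StringIO()
write_rows(transform(read_rows(source)), destination)
print(destination.getvalue())
```

Because each stage only consumes and produces rows, the reader or the writer can be replaced (a spreadsheet library, a database driver, an LDAP client) without touching the transformation in the middle.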

Python, compilation and software quality

The big dreaded spaghetti monster by St. Murse

Software quality is an interesting topic to me. Its absence is what transforms many projects into bloated, unmaintainable, big spaghetti monsters. Every programmer with some kind of experience knows what it is like to work on such software. If you do not, it is unpleasant to say the least.

I have been reading different articles and blogs about my new favorite language, Python. Many of them come to the conclusion that Python is a weaker language because it does not report compilation errors like many other languages do. According to them, this makes Python a less suitable candidate to build quality software.

First, let me say that it is possible to build high quality software with Python. This also applies to other similar programming languages supporting a fully dynamic type system. It is true that Python will not return type related errors when compiled. The reason is that Python code is mainly interpreted and does not enforce a static type system. It is possible to compile Python code, but you will likely end up with byte code that will have to be interpreted later on (with a possible speed gain). For instance, let’s take the following code:

if __name__ == "__main__":
    z = 2 + "22"
    z.addtwelve()
    print z

You can compile that code by placing it in a file called add.py and running the following statements:

import py_compile
py_compile.compile("add.py")

With CPython, which is a common Python implementation, you would get a file named add.pyc in the same directory that would contain a byte code representation of the add.py file. The compilation would not report any error. In any case, trying to run either add.py or add.pyc would result in the following error:

Traceback (most recent call last):
  File "add.py", line 4, in <module>
    z = 2 + "22"
TypeError: unsupported operand type(s) for +: 'int' and 'str'

This example demonstrates that Python does enforce a strong type system but you have to actually execute your code to find out about type errors.
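The same point can be shown without writing any file, using the built-in `compile` and `exec` functions. This is a small sketch written for Python 3, unlike the Python 2 listings in this post, and the helper name `compile_then_run` is an assumption.

```python
def compile_then_run(source):
    """Compile the source, then execute it.

    Return the message of the TypeError raised at run time, or None.
    """
    # Compiling to byte code only checks the syntax, so this step
    # succeeds even when the code contains a type error.
    code = compile(source, "<example>", "exec")
    try:
        exec(code, {})
    except TypeError as error:
        return str(error)
    return None


print(compile_then_run('z = 2 + "22"\n'))
```

The call to `compile` raises nothing for this source. The error only surfaces once `exec` actually runs the byte code.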

Type errors are one kind of error that languages like Java, C# or C++ usually find at compile time and that Python does not. There are also missing imports, undefined classes, undefined methods, undefined functions, undefined operations, undefined variables and many others. In order to build quality software, you should be able to find out about those errors before using it.

With Python, the solution to this problem seems too simple to be true. You have to run your code and you have to run every statement. Not only will you find out about those errors, but you will also find many runtime errors.

Running your whole application before using it

A great deal of software quality is made possible by using different tools. I will try to present some of those tools that I wish were part of every new software project.

A good way to run your code is to write unit tests for it. Running those tests will make you run your application without having to actually use it which does simplify the task of running your code. If you do not know about unit testing, I strongly suggest that you have a look around the web to find out what it is and how it can help you. You can also have a look at an introduction I wrote a few weeks ago.

Another question that will probably come to your mind is how you can know that you have run your whole application, or every statement in it. This is made possible by code coverage tools.

Code coverage tools are able to analyze how much of your code, and which parts of it, have been executed by a given run. While it is possible to use a code coverage tool on one or more runs of your application without using unit tests, it makes more sense to use it with your tests. As you are creating your unit tests, you will want to run 100% of your code with them. Knowing that 100% of your code can be executed without errors will give you a good foundation for quality assurance.
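To give an idea of how such a tool can know which lines were executed, here is a toy tracer built on `sys.settrace` from the Python standard library (written for Python 3). It is a simplified sketch, not the implementation of any real coverage tool, and the names `executed_lines` and `describe` are made up for the example.

```python
import sys


def executed_lines(func, *args):
    """Call func and return the set of line offsets (from its def line) that ran."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record only "line" events that belong to the function under study.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return seen


def describe(n):
    if n < 0:
        return "negative"
    return "non-negative"


print(sorted(executed_lines(describe, 5)))
print(sorted(executed_lines(describe, -5)))
```

Each call exercises a different `return` statement, so neither run alone covers the whole function. Reaching full coverage of `describe` takes both.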

I will show you an example with Python using nose, a unittest extension and the coverage module. Let’s say that I have a module named mymodule.py which contains:

class Counter():
    def __init__(self):
        self.x = 1
    def add(self):
        self.x = self.x + 1
    def show(self):
        print self.x

Let’s say that I have a test file named tests.py which contains:

def test():
    import mymodule
    c = mymodule.Counter()
    c.add()

I could run the following line to get a coverage report of my module with my test function:

nosetests --with-coverage tests.py

The results would be:

.
Name       Stmts   Exec  Cover   Missing
----------------------------------------
mymodule       7      6    85%   9
----------------------------------------------------------------------
Ran 1 test in 0.010s

OK

If I change my tests.py file to this:

def test():
    import mymodule
    c = mymodule.Counter()
    c.add()
    c.show()

If I then run the same command line again, I would get:

.
Name       Stmts   Exec  Cover   Missing
----------------------------------------
mymodule       7      7   100%
----------------------------------------------------------------------
Ran 1 test in 0.009s

OK

If I rename my show method in the mymodule.py module to display and run the test again, I would get the following:

E
======================================================================
ERROR: tests.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/home/remy/projects/test-python/tests.py", line 7, in test
    c.show()
AttributeError: Counter instance has no attribute 'show'

Name       Stmts   Exec  Cover   Missing
----------------------------------------
mymodule       7      6    85%   9
----------------------------------------------------------------------
Ran 1 test in 0.014s

FAILED (errors=1)

Using unit tests and code coverage, you can build high quality software with Python too. To be fair, most commonly used languages also have their own tools to create and run unit tests and to get a code coverage report. If they do not, you can probably craft your own tools for this purpose.

Quality often requires effort, but sometimes all you have to do is integrate a few tools to get a good start at the job.

Knowledge transfer

Teacher and student statue by Paul

A few months ago, I wrote about how to introduce new programmers to your project. Back then, I was the one introducing new programmers. This time, I am in the learning seat.

Since the beginning of September, I have been mostly dedicated to the knowledge transfer for a current project at my new job. This is quite exciting to me since I have been learning a lot in the process. Today, I am close to the end of it.

The project is a big one. It is mainly based around Microsoft technologies, the current deployment process involves multiple different quality environments, there are multiple dedicated system administrators for it, it has to work with many different data stores including more than six databases located in many geographically different places and it is used in more than five countries in different time zones with different languages. It is a great work of architecture and design.

I have been taught how to develop, debug, find and correct bugs for it since I will probably be maintaining it for a while. My coworker, who has been my mentor in this process, already had a draft list of what should be done in order to get me going. That included the software I had to install on my computer, the order in which I had to install it, any potential pitfalls in the installation process and the general sections of the application which he had to show me around.

Over the last few weeks, my coworker showed me my way around the application. I had to learn quite a lot on each day of his training. While doing this knowledge transfer, I completed his draft list to include more details and more points about the whole thing. This document will form the base of the next knowledge transfer whenever a new member joins the team.

Between sessions where he would be sitting at his computer or at mine and teaching me how to find the important configuration files or how to debug a remote web service, I was assigned a few change requests to complete. They were all quite easy to do. What was more interesting is that I learned more about the application by completing those tasks than by listening to my coworker. I still have great respect for him, but working directly on a new feature or a client request for the application had a tremendous effect on my capacity to understand and learn about the application.

I do not know if this is related to my personality or my abilities, but it seems to me that learning by doing is far more efficient than learning by observing when it comes to software development. Rereading my first article on the subject, I come to the same conclusion.

Contributing to open projects

I have started using more and more open source software recently, for many different reasons. I am cheap. Whenever I try something that is free and does the job, I just keep using it. As most open source software is also free, I tend to use it. There are also some gems in the open source world that are just better than any commercial offering. I also feel like open source software tends to be more secure since there are more eyes that can see the potential problems and propose a solution.

While using this software, I often encountered bugs, problems or things I thought were just wrong. My first reaction was to go on the project’s website and check for a known issue and its solution. When I could not find the solution, my second option was to drop by the project’s IRC channel, if it had one, or to send a quick message to its mailing list.

Recently, I had a problem with Mercurial, a nice distributed version control system. I did what I usually do, but there was no solution for it. It was a problem regarding the use of Mercurial on Windows with paths that contain extended characters. Since I had some free time, I told myself: “I should be able to correct that bug on my own and help the project by submitting a patch.” That is what I did.

First, I had to download the source, get it to build and start testing it. That task took me about 5 hours to complete because Mercurial has some parts that are written in C while the main sections are in Python. Building a Python module or extension on Windows requires a bit of know-how. Anyway, I managed to find the problem and get started with a simple patch to correct it.

I went to Mercurial’s website to find out how I could contribute to the project. The first thing that struck me was the how-to-contribute page. It is huge. I do understand the rationale behind it, but I guess it can easily turn off any casual contributor. I wanted to go all the way, so I went by their rules.

In the end, I submitted my patch and the reasoning behind it. It was not accepted, but that is fine. The bug was corrected and, hopefully, no other related bugs will come out of the correction.

Contributing to an open project can be frustrating sometimes. I still remember the story of Con Kolivas with the Linux kernel. He was trying to modify the scheduler to make it more fair, but his contribution was rejected. After reading his story, I doubt I will ever contribute to the Linux kernel. I guess you must have a good feeling about it.

I think contributing is first and foremost about passion. If you have a strong passion for a project and you would like it to grow and get better, go for it. Get yourself to contribute. There are many different ways you can serve a project. Testing, documenting, translating, designing and programming are only a fraction of what you can do. Get yourself known and you might gain some solid experience.