Code of conduct

Ancient philosophers by Lawrence OP

Being a software developer gives you a whole lot of possibilities. You are often given confidential documents. You have access to confidential data that most people would not have access to.

I was once maintaining an online shop where every credit card was stored with a simple symmetric cipher, with the key in plain text in the code. I could have used all those credit card records if I were unethical, but I did not. You should probably not trust at least half of all online shops with your private data these days.

I recently created a personal training plan to get better at software development. I searched for some references on the web and I found this document. It is called the Software Engineering Code of Ethics and Professional Practice. It is a highly interesting text which I endorse. I suggest you read it too and endorse it in your own way.

Since I am not part of the official software engineering community, I never had to follow any code of conduct. This code is well written. It expresses so many good ideas that I can only make it my own by approving it. If you are in the software development field and you do not have any ethical guidelines, have a look and pass it on to your colleagues.

Building up social skills

Communication problems by Tania Paz

Some weeks ago, I realized something important. Software development is first and foremost a task about communication. It is not the kind of communication happening on network links. It is about human communication.

A programmer will typically have to speak or communicate with his fellow programmers, the analyst, the architect, the project manager, the customer, the tester, the documentation writer, the graphic designer, his boss, his coworkers’ bosses, the customer’s boss, the project owner and many others. There are many potential pitfalls.

Analyzing it from my point of view, I think many programmers have social skill problems. When I say “many”, I do include myself in that group. I think certain kinds of personality make you like computer science, or at least make it more enjoyable. The environment in which you grow up is also a factor which may drive you to this discipline.

On the Myers-Briggs personality scale, I am an INTP, which can be summarized as follows:

  • I for introversion, as opposed to extroversion
  • N for intuition, as opposed to sensing
  • T for thinking, as opposed to feeling
  • P for perceiving, as opposed to judging

In short, I did not interact easily with people. This has led to slow progress in my social skills: the skills which make you good at interacting and communicating with people, the same skills which let you understand how people are feeling and what they are going through, and adjust yourself to that reality. Those same skills are essential in software development.

There is only one way out: trying to get better.

This is what I have been trying to do since then. I think so far it is working. Even my girlfriend has noticed.

My first goal was simple. “I will try to start a friendly discussion with whoever I meet.” It seems simple, but for someone who has tried to stay in his bubble for as long as possible, it can be hard. So far, I think I have met that goal, even though I still have moments of weakness when I just do not want to speak with anyone.

Recently, I bought How to Win Friends & Influence People. This book was recommended in a software development book. I thought it would make sense to buy it and have a look. I did not buy it to make more friends or manipulate people. I bought it for the social lessons it contains. I have read a few pages so far and I will probably post an update when I finish it.

Another possibility I am currently thinking about is taking some communication classes, the kind of open courses where you can drop in whenever you want.

Social and communication skills are often disregarded by software developers. I think it is time we change our view and see just how useful they are.

Encoding problems

A nice paper typography by Caro Wallis

Storing, managing and displaying characters should be easy. It seems like this is not the case yet.

Encodings are a way to store and recall a sequence of symbols, like a string, in a binary format. For instance, the ASCII encoding maps the sequence “abc” to the binary format 0110 0001 0110 0010 0110 0011 so we can send it through a network link, store it in a file or save it in memory.
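As a small illustration of that mapping, here is a minimal Python sketch. The helper name `to_bits` is made up for this example; it only relies on the standard `str.encode` method.

```python
def to_bits(text: str, encoding: str = "ascii") -> str:
    """Encode text and show each byte as two groups of four bits."""
    data = text.encode(encoding)
    # Split every byte into its high and low nibble for readability.
    return " ".join(f"{byte >> 4:04b} {byte & 0x0F:04b}" for byte in data)


print(to_bits("abc"))  # 0110 0001 0110 0010 0110 0011
```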

Encodings are absolutely needed. We would not be able to work with text in a computerized way without them. They pose a problem because they are misunderstood and misused.

Encoding problems are not rare. They are endemic. They are so present in my daily life that I do not even pay attention to them anymore. They show up everywhere, from small programs up to multimillion-dollar projects.

Here is the latest example of this problem.

One of my Amazon Canada bills. They cannot correctly spell my name.

One of my Amazon US bills. They know how to spell my name.

Even Amazon has problems with encodings in different systems. No wonder it is a plague among other software too. Having a character outside the English symbol set in your name is a common way to spot this kind of problem.

In the Good Old Days, when computers had to run with quite limited memory, storage and bandwidth, there were many different encodings for different languages. In order to display a given binary representation, or to find out which symbols it stood for, you had to know which encoding was used to encode it.

Nowadays, we have Unicode. Unicode is an attempt at mapping every known character symbol for every language currently in use. It even maps the symbols of ancient and mostly extinct languages.
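To see why the decoder has to know the encoding, here is a short Python sketch. The name used is a made-up example, not the one on my bill; the bytes are written as UTF-8 and then read back with the wrong encoding.

```python
name = "Frédéric"  # hypothetical name, chosen because it contains a non-ASCII letter

raw = name.encode("utf-8")          # bytes as one system would store them
wrong = raw.decode("iso-8859-1")    # another system guesses the wrong encoding
right = raw.decode("utf-8")         # decoding with the encoding actually used

print(wrong)  # FrÃ©dÃ©ric
print(right)  # Frédéric
```

The bytes never changed. Only the assumption about how to read them did.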

Unicode has many ways to encode its big table in a binary format. The common ones are UTF-8 and UTF-16. UTF-8 maximizes compatibility with ASCII. A text containing strictly ASCII characters will have the same binary representation using the ASCII encoding or the UTF-8 encoding.

A common mistake is assuming that the world lives in English. My recommendation is to take your next vacation week in a small town in Japan to find out if they live in English. Another common mistake is just plain not knowing that character symbols are encoded to a binary format and decoded from a binary format. I have known many programmers who did not know. I did not know either, until I ran into the same kind of problems that Amazon Canada currently has. Colleges and universities should teach this kind of stuff, but that is another discussion.

The most common mistake is going from one encoding to another without thinking about the possible loss of information. You cannot go from Unicode to ISO-8859-1 without possibly losing many characters in the process. Think about it. Unicode version 5.1 maps 100713 symbols while ISO-8859-1 maps 191 symbols. There is a big gap between these two.

Unicode is no silver bullet, but when in doubt, you should use it. There are not many reasons not to. Applying it in every case in every system should be the norm.
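The ASCII compatibility is easy to check in Python. This is only a sketch with sample strings of my own choosing:

```python
ascii_text = "Hello"
accented = "é"

# Pure ASCII text yields identical bytes under ASCII and UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# UTF-16 (little-endian, no byte order mark) uses two bytes per character here.
utf16_length = len(ascii_text.encode("utf-16-le"))

# A non-ASCII character takes more than one byte in UTF-8.
accented_utf8 = accented.encode("utf-8")
accented_utf16 = accented.encode("utf-16-le")

print(same_bytes, utf16_length, accented_utf8, accented_utf16)
```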
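Here is a minimal Python sketch of that loss, using a sample string of my own. By default Python refuses the conversion. Asking it to replace what it cannot encode shows what silently disappears in systems that are less strict.

```python
text = "日本語 café"

# Strict conversion fails because ISO-8859-1 has no Japanese characters.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lossy conversion replaces every unsupported character with "?".
lossy = text.encode("iso-8859-1", errors="replace")
round_trip = lossy.decode("iso-8859-1")

print(strict_failed)  # True
print(round_trip)     # ??? café
```

Note that the "é" survives, since ISO-8859-1 does contain it, while the three Japanese characters are gone for good.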

Central control, central power

2009 is here. My new year's resolution is actually a wish. I wish we would move towards a more decentralized way of doing things, whether it is within the political system, within companies, within open source projects, or in the way we create things.

The more I think about it, the more it makes sense to do it. How can a central unit, person or group know what is the right thing for all the other people?

Some situations require a decision that goes against the will of the majority, or one that a decentralized model cannot easily settle on, even though it is just the right thing to do. A good example is the plastic bag tax. In many countries, people were against the tax when it was first announced as a way to reduce the use of plastic bags. A few months later, plastic bag usage had dropped significantly, people had changed their habits and their minds, and they were actually promoting the use of reusable bags.

We should find a way to isolate those cases and resolve them with a strong decision making process like a benevolent dictator, but the normal way of resolving problems should come from the bottom and not the top.

By the way, happy new year!

My failed attempt at building an XML diff library

A few weeks ago, I had to manage some scenarios involving XML files. One of the problems I had was to compare some large XML files that had small content and schema changes.

My first attempt was to use a regular text diff tool like KDiff3. I did not get the result I wanted. Since the whitespace was different in each file, I could not easily pinpoint where the meaningful differences were.

For example, take these two XML files, where [TAB] is the tab character:

<client id="30" name="Georges">
 <phone>
  <number>
   555-555-5555
  </number>
 </phone>
 <phone />
</client>

<client name="Georges" id="30">
[TAB]<phone></phone>
[TAB]<phone>
[TAB][TAB]<number>
[TAB][TAB][TAB]555-555-5555
[TAB][TAB]</number>
[TAB]</phone>
</client>

They can be exactly the same XML file with the same meaning, but to a text diff utility, they can be quite different. Whitespace, empty tags with or without an end tag, attribute order and tag order can all make it hard to find the real differences between XML files with simple tools.
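To make the idea of "same meaning" concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The `canonical` helper is my own invention. It assumes that surrounding whitespace, attribute order and sibling order do not matter, which is true for my files but not for every XML vocabulary. The first sample assumes its inner phone element is properly closed.

```python
import xml.etree.ElementTree as ET

FIRST = """<client id="30" name="Georges">
 <phone>
  <number>
   555-555-5555
  </number>
 </phone>
 <phone />
</client>"""

SECOND = (
    '<client name="Georges" id="30">\n'
    "\t<phone></phone>\n"
    "\t<phone>\n\t\t<number>\n\t\t\t555-555-5555\n\t\t</number>\n\t</phone>\n"
    "</client>"
)


def canonical(elem):
    """Return a comparable form that ignores whitespace and ordering."""
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    # Assumption: sibling order is not significant, so children are sorted.
    children = tuple(sorted(canonical(child) for child in elem))
    return (elem.tag, attrs, text, children)


same = canonical(ET.fromstring(FIRST)) == canonical(ET.fromstring(SECOND))
print(same)  # True
```

This only answers "equal or not". It says nothing about where two documents differ, which is the hard part.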

I began searching for some real XML diff tools. I found a few commercial offerings including DeltaXML and a few open source projects including xmldiff and XMLUnit that could be of help.

The commercial products were pretty good. I had a hard time using the open source ones. They all had some problems that I could not get past. I decided to build my own library to do what I wanted. I thought it would be easy to build.

I started with these goals in mind:

  1. It should work fine on large streams using a read-only forward-only access interface.
  2. It should support namespaces.
  3. It should detect added, removed, renamed and moved actions for elements, attributes, namespaces and data.
  4. It should be accessible as a library and as a command line tool.

Goals 2 and 4 were quite easy. Many XML interfaces support namespaces these days, and building a command line tool over a well-defined library is a simple matter.

Goals 1 and 3 were another matter. There are many APIs to access an XML file in a read-only forward-only manner. I settled for pulldom with Python.

I began to draw my algorithm on a blank sheet of paper after creating some code to read the files. I often like to step back and look at where I am going on paper before investing more time coding. My first scenario was the simplest one: finding out if an element has been added or removed from either file.

After a few sketches, I found out it would be impossible to handle even that simple case in a forward-only manner without storing some part of the file in memory and, in the worst case, nearly the whole file. That pretty much killed my first goal.

After that, I chose to search the web for some tips on how to compare hierarchical structures like XML files using random access. My new goal was to load both trees in memory and compare them. I found many graduate papers on different algorithms to perform that kind of comparison, but there was nothing simple about them.

Finally, I chose not to build it. XML diff is hard and it is not for me. I will use the tools already available instead of building mine.
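For readers who have never seen pulldom, here is a minimal sketch of forward-only reading. It is not the code I wrote at the time. The `element_paths` helper and the sample document are made up for illustration.

```python
from xml.dom import pulldom

SAMPLE = (
    '<client id="30"><phone><number>555-555-5555</number></phone>'
    "<phone/></client>"
)


def element_paths(xml_text):
    """Yield the path of every element while reading events forward only."""
    stack = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            stack.append(node.tagName)
            yield "/" + "/".join(stack)
        elif event == pulldom.END_ELEMENT:
            stack.pop()


paths = list(element_paths(SAMPLE))
print(paths)
```

Only the stack of open elements is kept in memory, which is what makes this style attractive for large files.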
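To give an idea of the in-memory approach, here is a deliberately naive Python sketch. It is much cruder than the algorithms in those papers: it only counts element paths in each tree and reports the surplus on either side, so it cannot detect moved or renamed elements. The function names and sample documents are my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_counts(root):
    """Count how many times each element path occurs in a tree."""
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def added_and_removed(old_xml, new_xml):
    """Return (added, removed) element paths between two documents."""
    old = path_counts(ET.fromstring(old_xml))
    new = path_counts(ET.fromstring(new_xml))
    # Counter subtraction keeps only the positive differences.
    return dict(new - old), dict(old - new)


added, removed = added_and_removed(
    "<client><phone/><phone/></client>",
    "<client><phone/><email/></client>",
)
print(added)    # {'/client/email': 1}
print(removed)  # {'/client/phone': 1}
```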

How it all started

My first moment of awe working with a computer was when I was 10. It was after I completed my first “real” program.

I was toying with the different commands available on my father’s personal computer, which ran a version of MS-DOS. I quickly got tired of using “dir” and “copy”, and I discovered “qbasic”.

It was a wonderful discovery, full of menus, with an editor, some basic windows and the ability to run something. What in the world could I run, and how?

I knew absolutely nothing about QBasic. I went through the help documentation to find some basic statements that would produce something meaningful. After a while, I found out how to print some text on the screen and how to do some simple math. I knew nothing about algebra, but I eventually figured out I could store a value in a named area.

Printing text and doing simple math soon became boring. I searched my local BBS to find something else to do with this newly acquired power. I found some simple programs written by other people that did various things. I copied and pasted a few pieces from those sources into my program to try out new things. That is when I discovered I could create random things.

It took a few hours, but I managed to put all that knowledge together and create a working D&D character sheet creator (yes, I am a total nerd).

This is how it all started.