
How we reduced our codebase by n kloc by switching programming languages

You’ve seen them before: those stories of companies that switched from the language they started out with to some other one, did a complete re-write of their money maker, and ended up with a much smaller codebase. And as with scientific studies, especially in the field of nutrition, you can find an example for every combination showing that the new approach is “better”. So Rust delivers shorter code than Python, Go leads to a significant reduction in code size compared to Ruby, and so on. Oh, and the new code is always faster, more stable and easier to maintain.

My problem with that? Stay with one language and just re-write your code. The results will be (mostly) the same.

The code replaced in those stories usually “grew in an organic fashion”, that is, it was cobbled together haphazardly. It works? Ship it! No time for refactoring. No time for getting rid of that ugly-but-working timer or the Thread.Sleep() loop. If you take such a piece of code and just clean it up, it will shrink and become more robust, simply because you take the time to plan a little further ahead. You have a better understanding of the requirements now. You have all those nagging thoughts still etched into your brain: If only I could do this part that way. It would be oh so much better, but I can’t, because that fubar over there would have to be fixed first. And so on. By writing your code anew, you can address all those little points with your boss’s blessing instead of slipping in a tiny refactoring under the radar now and then.
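To make the Thread.Sleep() loop a little more concrete, here is a minimal sketch in Python, with time.sleep() standing in for C#’s Thread.Sleep(). The Job class and both wait functions are hypothetical, made up for illustration. The first function is the kind of polling loop that survives in organically grown code; the second is what a rewrite might turn it into: the worker signals a threading.Event, and the waiting side simply blocks on it with a timeout.

```python
import threading
import time
from typing import Optional


class Job:
    """Hypothetical background job that produces a result after some work."""

    def __init__(self) -> None:
        self.done_flag = False               # polled by the old version
        self.done_event = threading.Event()  # waited on by the rewrite
        self.result: Optional[int] = None

    def run(self) -> None:
        time.sleep(0.05)  # stand-in for the real work
        self.result = 42
        self.done_flag = True
        self.done_event.set()


def wait_polling(job: Job, timeout: float) -> bool:
    """Organically grown version: wake up every 10 ms and check a flag."""
    deadline = time.monotonic() + timeout
    while not job.done_flag:
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.01)  # the Thread.Sleep() loop
    return True


def wait_event(job: Job, timeout: float) -> bool:
    """Rewritten version: block until the job signals, or the timeout expires."""
    return job.done_event.wait(timeout)


if __name__ == "__main__":
    job = Job()
    worker = threading.Thread(target=job.run)
    worker.start()
    print(wait_event(job, timeout=1.0), job.result)  # prints: True 42
    worker.join()
```

Besides being shorter, the event-based version reacts as soon as the job finishes instead of up to one polling interval later, and it needs no hand-rolled deadline arithmetic.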

The verdict? I reduced a codebase of 250 kloc down to 69 kloc while staying with C#, and another one from 79 kloc of Java down to 23 kloc, still in Java. The language does play a part in this: the C# codebase, for example, might have come down to 10 kloc of Lisp or Python. But always keep in mind the difference between essential and accidental complexity: a rewrite in the same language already strips away much of the accidental kind, while the essential complexity stays, whatever language you pick.


Choosing the right language for the job

Quick: Which programming language would you use for a big project?

There is no single answer to that question without a few more specifics. The “right” language depends heavily on the type of problem at hand. How about a thought experiment?

Here is a list of all programming languages that I know at least a little:

Ada, Assembler, Awk, BASIC, C, C++, C#, COMAL, Common Lisp, D, Delphi, Erlang, Factor, Forth, Go, Haskell, Java, JavaScript, Lua, OCaml, Turbo Pascal, Perl, PHP, Python, Scheme, Shell, Tcl, Visual Basic Classic, Visual Basic .NET

Let’s skip the oldies that don’t run on modern hardware or operating systems anymore. Say goodbye to BASIC, COMAL, Turbo Pascal and Visual Basic Classic.

Now let’s fantasize about a program I want to write. It should be portable across today’s most important platforms, and it shouldn’t require a big setup step: just a single file to download and execute would be fine. And it should be as cheap as possible, as I don’t want to spend lots of money on a personal side project just yet. Those requirements aren’t that weird for a non-web app. Let’s go over them in detail:

I want my programs to deploy as a self-contained executable: lots of scripting languages and VM-based systems can’t do this, so we can strike out Awk, C#, Erlang, Java, JavaScript, Lua, Perl, PHP, Python, Shell and Visual Basic .NET. Ouch. As a heavy C# user at work, I’ll miss that language dearly. But even single-file .NET executables still need the .NET Framework installed for the time being, so it’s a no-go.

I want my sources to be platform independent, so we strike out Assembler. For developing larger applications I want either a sound type system with templates or generics, XOR duck typing: Goodbye, C, Forth and Go.

For my home-brewed projects I want free development tools, so Delphi has to go, too. D and Lazarus sadly still don’t make the stability cut. Ada and C++ drop out because I happen to like garbage collection as much as I dislike typing my fingers off just to sum up a list of numbers. And the tools that package Java programs into single-file executables are a little too pricey for my taste.

So what we end up with is a small but interesting list:

  • Common Lisp
  • Factor
  • Haskell
  • OCaml
  • Scheme
  • Tcl

Look Ma, no C-like syntax! Instead we have two Lisps, two full-blown functional languages (albeit differing in purity), a concatenative language and my beloved quirky little Tcl, which would be my weapon of choice unless we need maximum performance. Strange that a language many programmers have never heard of would be my top choice…

Tcl is great for quick GUI projects that don’t have to be as fast as possible. With a Tclkit you can pack your application code, libraries and configuration files, even media files, into a single executable. It’s great for quick-and-dirty programming, but when it comes to performance, it sucks.

Now if we change our requirements a little, the list looks entirely different. If we don’t mind a big installer and our target platform is Windows, but we need SPEED, suddenly C# is the tool to use, maybe with a C DLL thrown in. Or use Java with a nice installer. Or could we…? Yup: call that same C DLL, which encapsulates the performance-critical stuff, from a Tcl program, and you’re good.

As long as your program doesn’t get too big, that is. Now we face a different problem: while C# is a mostly statically typed language, Tcl is not. Static type systems are great if you work with other programmers in a big team, as you can use tools like ReSharper and FxCop to look for pitfalls. No such luck with Tcl source code. The same goes for the Lisps and Factor: formidable tools for a single hacker, but a disaster if you have a team of programmers with differing skill sets working on a big piece of code. Suddenly the interfaces between the chunks of your application, and a whole lot of tests, become paramount.

On the other hand, interfaces and tests should be what drives your development efforts anyway. Divide your application into separate modules, define sound interfaces, and let every team member pick a module to hack on. Why should the programming language matter here? You should use the same big test suite in a C# project that you would have for a Tcl code base. What is it that makes us think dynamic typing is bad for big teams? The practice of factoring a problem into smaller parts was described oh so well in Thinking Forth, a masterpiece of programming literature that few people read because it seems to be about Forth only. Hint: it’s not, except on the surface. Factoring works the same way in every language you’ll ever use. But here it is, described in terms of Forth, a language whose entire type system amounts to the difference between a single byte and a “cell” (read: machine word). Seems like static typing isn’t that important for big teams after all.
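Here is a minimal sketch of that idea in Python, a dynamically typed language: the interface between two modules is written down as a typing.Protocol (Python 3.8 or later), and a contract test checks any implementation against it. KeyValueStore, InMemoryStore and check_store_contract are hypothetical names chosen for illustration. Note that the implementation never declares that it implements the interface; it only has to have the right methods, which is plain duck typing. A static checker such as mypy can make use of the Protocol if you run one, but the contract test does its job either way.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical module interface the team agreed on."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """One team member's implementation. It does not inherit from
    KeyValueStore; it fits the interface simply by having the right methods."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def check_store_contract(store: KeyValueStore) -> None:
    """Contract test: every implementation of the interface must pass it,
    whether or not a static type checker is ever run on the code."""
    assert store.get("missing") is None
    store.put("answer", "42")
    assert store.get("answer") == "42"
    store.put("answer", "43")  # overwriting replaces the old value
    assert store.get("answer") == "43"


if __name__ == "__main__":
    check_store_contract(InMemoryStore())
    print("InMemoryStore satisfies the KeyValueStore contract")
```

Each team member can run the same check_store_contract against their own implementation, which is the role the big test suite plays in a C# project as well.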

On the other hand again, I LOVE ReSharper. Seriously. Nothing beats that feeling of being on top of things when you open a source file and see the most common issues in it highlighted within a second. No such luck in a dynamic language like Tcl.

I’ll need to think about this topic a little more.