Adventures in D programming

I recently wrote a bigger project in the D programming language, the appstream-generator (asgen). Since I rarely leave the C/C++/Python realm, and came to like many aspects of D, I thought blogging about my experience could be useful for people considering using D.

Disclaimer: I am not an expert on programming language design, and this is not universally valid criticism of D – just my personal opinion from building one project with it.

Why choose D in the first place?

The previous AppStream generator was written in Python, which wasn’t ideal for the task for multiple reasons, most notably multiprocessing and LMDB not working well together (and in general, multiprocessing being terrible to work with) and the need to reimplement some already existing C code in Python again.

So, I wanted a compiled language which would work well together with the existing C code in libappstream. Using C was an option, but my least favourite one (writing this in C would have been much more cumbersome). I looked at Go and Rust and wrote some small programs performing basic operations that I needed for asgen, to get a feeling for each language. Interfacing C code with Go was relatively hard – since libappstream is a GObject-based C library, I expected to be able to auto-generate Go bindings from the GIR, but only a few outdated projects did that. Rust, on the other hand, took the most time to learn, and since I only briefly looked into it, I still can’t write Rust code without having the language reference open.

I started to implement the same examples in D just for fun, as I didn’t plan to use D (I was aiming at Go back then), but the language looked interesting. D had the huge advantage of being very familiar to me as a C/C++ programmer, while also having a rich standard library, which includes great stuff like std.concurrency.Generator, std.parallelism, etc. Translating Python code into D was incredibly easy. Additionally, an actively maintained gir-d-generator exists (I created a small fork anyway, to be able to link directly against the libappstream library instead of dynamically loading it).

What is great about D?

This list is just a huge braindump of things I had on my mind at the time of writing 😉

Interfacing with C

There are multiple things which make D awesome. For example, interfacing with C code – and to a limited degree with C++ code – is really easy. Also, working with functions from C in D feels natural. Take these C functions imported into D:

extern(C):
nothrow:

struct _mystruct {}
alias mystruct_p = _mystruct*;

// return types of the last two functions are assumed to be void
mystruct_p mystruct_create ();
void mystruct_load_file (mystruct_p my, const(char) *filename);
void mystruct_free (mystruct_p my);

You can call them from D code in two ways:

auto test = mystruct_create ();
// treating "test" as function parameter
mystruct_load_file (test, "/tmp/example");
// treating the function as member of "test"
test.mystruct_load_file ("/tmp/example");
test.mystruct_free ();

This allows writing logically sane code, in case the C functions can really be considered member functions of the struct they are acting on. This property of the language is a general concept (Uniform Function Call Syntax, UFCS), so a function which takes a `string` as its first parameter can also be called like a member function of `string`.
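As a minimal sketch of that general rule, the following uses a made-up free function, `countVowels` (the name and its logic are purely illustrative), and calls it both ways:

```d
import std.stdio : writeln;

// Hypothetical free function whose first parameter is a string.
size_t countVowels(string s)
{
    size_t n;
    foreach (c; s)
    {
        switch (c)
        {
            case 'a', 'e', 'i', 'o', 'u':
                n++;
                break;
            default:
                break;
        }
    }
    return n;
}

void main()
{
    // Ordinary call syntax.
    writeln(countVowels("appstream"));
    // The same function, called as if it were a member of string.
    writeln("appstream".countVowels());
}
```

Both calls compile to exactly the same thing; the second form is only a different way of writing the first argument.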

Writing D bindings to existing C code is also really simple, and can even be automated using tools like dstep. Since D can also easily export C functions, calling D code from C is also possible.
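Both directions fit into a few lines. The sketch below hand-writes a binding for the C library's `strlen` and defines a function with C linkage; `example_add` is an invented name, and the C prototype mentioned in the comment is what I would expect a C caller to declare, not something generated by a tool.

```d
// Hand-written binding: declares a function from the C standard library.
extern (C) size_t strlen(const(char)* s) nothrow @nogc;

// Definition with C linkage and no name mangling. A C program could
// declare it as `int example_add(int, int);` and link against it.
extern (C) int example_add(int a, int b) nothrow @nogc
{
    return a + b;
}

void main()
{
    // D string literals are zero-terminated and convert to const(char)*.
    assert(strlen("asgen") == 5);
    assert(example_add(2, 3) == 5);
}
```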

Getting rid of C++ “cruft”

There are many things which are bad in C++, some of which are inherited from C. D kills pretty much all of the stuff I found annoying. Some cool stuff from D is now in C++ as well, which makes this point a bit less strong, but it’s still valid. E.g. getting rid of the `#include` preprocessor dance by using symbolic import statements makes sense, and there have IMHO been huge improvements over C++ when it comes to metaprogramming.

Incredibly powerful metaprogramming

Getting into detail about that would take way too long, but the metaprogramming abilities of D must be mentioned. You can do pretty much anything at compile time, for example compiling regular expressions to make them run faster at runtime, or mixing in additional code from string constants. The template system is also very well thought out, and never caused me as many headaches as C++ sometimes does.
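To give a small taste of the three things mentioned, here is a sketch combining compile-time function execution, a string mixin and `std.regex.ctRegex`. The function names and the pattern are made up for illustration.

```d
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;

// An ordinary function...
ulong factorial(uint n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// ...which the compiler evaluates itself, because the result initialises an enum.
enum fact10 = factorial(10);
static assert(fact10 == 3_628_800);

// String mixin: the string is compiled as if it had been written here.
mixin("int answer() { return 42; }");

void main()
{
    // ctRegex generates the matcher for this pattern at compile time.
    auto versionRe = ctRegex!(`^(\d+)\.(\d+)$`);
    auto m = matchFirst("2.71", versionRe);
    writeln(m[1], " ", m[2], " ", answer(), " ", fact10);
}
```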

Built-in unit-test support

Unit testing with D is really easy: You just add one or more `unittest { }` blocks to your code, in which you write your tests. When running the tests, the D compiler will collect the unittest blocks and build a test application out of them.

The `unittest` scope is useful, because you can keep the actual code and the tests close together, and it encourages writing tests and keeping them up-to-date. Additionally, D has built-in support for contract programming, which helps to further reduce bugs by validating input/output.
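A minimal sketch of both features together, using an invented helper `percentOf`. It assumes a reasonably recent compiler, where the function body after contracts is introduced with `do` (older compilers used `body`), and the tests only run when building with the `-unittest` switch.

```d
// Hypothetical helper, used only to show the syntax.
int percentOf(int part, int total)
in
{
    // Precondition: validates the input in non-release builds.
    assert(total > 0, "total must be positive");
    assert(part >= 0 && part <= total);
}
out (result)
{
    // Postcondition: validates the return value.
    assert(result >= 0 && result <= 100);
}
do
{
    return part * 100 / total;
}

// Lives right next to the code it tests.
unittest
{
    assert(percentOf(1, 4) == 25);
    assert(percentOf(0, 7) == 0);
    assert(percentOf(7, 7) == 100);
}

void main() {}
```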

Safe D

While D gives you the whole power of a low-level system programming language, it also allows you to write safer code and have the compiler check for that, while still being able to use unsafe functions when needed.

Unfortunately, `@safe` is not the default for functions though.
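The three safety levels can be sketched like this. All function names are invented for the example; the commented-out line shows the kind of call the compiler refuses inside `@safe` code.

```d
// @system (the default): anything goes, including pointer arithmetic.
int secondElement(const(int)* p) @system
{
    return *(p + 1);
}

// @trusted: the author vouches for it by hand; callable from @safe code.
int secondOf(const int[] values) @trusted
{
    if (values.length < 2)
        throw new Exception("need at least two values");
    return secondElement(values.ptr);
}

// @safe: checked by the compiler.
int sumPlusSecond(const int[] values) @safe
{
    int total;
    foreach (v; values)
        total += v;
    // return total + secondElement(&values[0]); // rejected: @system call from @safe code
    return total + secondOf(values);
}

void main()
{
    assert(sumPlusSecond([1, 2, 3]) == 8);
}
```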

Separate operators for addition and concatenation

D exclusively uses the `+` operator for addition, while the `~` operator is used for concatenation. This is likely a personal quirk, but I love it very much that this distinction exists. It’s nice for things like addition of two vectors vs. concatenation of vectors, and makes the whole language much more precise in its meaning.
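The vector case mentioned above looks roughly like this with plain arrays (a small sketch, nothing asgen-specific):

```d
void main()
{
    int[] a = [1, 2, 3];
    int[] b = [10, 20, 30];

    // ~ concatenates: the result has six elements.
    int[] joined = a ~ b;
    assert(joined == [1, 2, 3, 10, 20, 30]);

    // + on slices is element-wise addition (array operation syntax).
    int[] summed = new int[3];
    summed[] = a[] + b[];
    assert(summed == [11, 22, 33]);

    // Strings are arrays too, so ~ also joins strings.
    string name = "app" ~ "stream";
    assert(name == "appstream");
}
```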

Optional garbage collector

D has an optional garbage collector. Developing in D without GC is currently a bit cumbersome, but these issues are being addressed. If you can live with a GC though, having it active makes programming much easier.
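One way to see what working without the GC means in practice is the `@nogc` attribute, which makes the compiler reject GC allocations in a function. The sketch below is only an illustration with an invented function, using the C allocator instead:

```d
import core.stdc.stdlib : free, malloc;

// @nogc: anything that would allocate from the GC heap is a compile error.
int sumOfFirst(size_t n) @nogc nothrow
{
    if (n == 0)
        return 0;
    auto buf = cast(int*) malloc(n * int.sizeof);
    if (buf is null)
        return -1;
    scope (exit) free(buf);

    int total;
    foreach (i; 0 .. n)
    {
        buf[i] = cast(int) i;
        total += buf[i];
    }
    // auto arr = new int[n]; // would not compile here: `new` uses the GC
    return total;
}

void main()
{
    assert(sumOfFirst(5) == 10);
}
```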

Built-in documentation generator

This is almost a given for most new languages, but still something I want to mention: Ddoc is a standard tool to generate code documentation for D code, with a defined syntax for describing function parameters, classes, etc. It will even take the contents of a `unittest { }` scope to generate automatic examples for the usage of a function, which is pretty cool.
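A documented function with an attached example might look like the sketch below. `splitPkid` and its "name/version" format are invented for illustration; with DMD, the documentation is generated by passing the `-D` switch.

```d
/**
 * Splits a package identifier of the form "name/version" into its parts.
 *
 * Params:
 *     pkid = identifier such as "foo/1.0"
 *
 * Returns: a two-element array holding the name and the version string.
 */
string[2] splitPkid(string pkid)
{
    import std.string : indexOf;

    immutable idx = pkid.indexOf('/');
    if (idx < 0)
        return [pkid, ""];
    return [pkid[0 .. idx], pkid[idx + 1 .. $]];
}

/// A documented unittest: its body is shown as an example for splitPkid.
unittest
{
    assert(splitPkid("foo/1.0") == ["foo", "1.0"]);
}

void main() {}
```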

Scope blocks

The `scope` statement allows one to execute a bit of code before the function exits, either always, only when it failed, or only when it was successful. This is incredibly useful when working with C code, where a free statement needs to be issued when the function is exited, or some arbitrary cleanup needs to be performed on error. Yes, we do have smart pointers in C++ and – with some GCC/Clang extensions – a similar feature in C too. But the scopes concept in D is much more powerful. See Scope Guard Statement for details.
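The three guard kinds can be sketched as follows. Instead of freeing real resources, the hypothetical `process` function just records what happens, so the order becomes visible; guards run in reverse order of their declaration.

```d
void process(bool fail, ref string[] events)
{
    events ~= "acquire";
    // Runs however the scope is left: normal return or exception.
    scope (exit) events ~= "release";
    // Runs only if the scope is left because of a thrown exception.
    scope (failure) events ~= "rollback";
    // Runs only if the scope is left normally.
    scope (success) events ~= "commit";

    if (fail)
        throw new Exception("simulated error");
    events ~= "work done";
}

void main()
{
    string[] ok;
    process(false, ok);
    assert(ok == ["acquire", "work done", "commit", "release"]);

    string[] failed;
    try
    {
        process(true, failed);
    }
    catch (Exception e)
    {
        // expected in this example
    }
    assert(failed == ["acquire", "rollback", "release"]);
}
```

In real code, the `scope (exit)` line is where the C-style `free` call would go.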

Built-in syntax for parallel programming

Working with threads is so much more fun in D compared to C! I recommend taking a look at the parallelism chapter of the “Programming in D” book.
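As a small sketch of what `std.parallelism` offers (the function below is made up and unrelated to how asgen uses threads), a parallel `foreach` and a parallel reduce look like this:

```d
import std.algorithm : sum;
import std.parallelism : parallel, taskPool;

long sumOfSquares(size_t n)
{
    auto squares = new long[n];

    // The iterations are distributed over the worker threads of the default pool.
    foreach (i, ref sq; parallel(squares))
        sq = cast(long) (i * i);

    // Parallel reduction: the first argument is the seed, the second the range.
    return taskPool.reduce!"a + b"(0L, squares);
}

void main()
{
    auto squares = new long[1000];
    foreach (i, ref sq; squares)
        sq = cast(long) (i * i);

    // The parallel result matches the serial one.
    assert(sumOfSquares(1000) == squares.sum);
}
```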

“Pure” functions

D allows marking functions as purely functional, which allows the compiler to do optimizations on them, e.g. cache their return value. See pure-functions.
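A minimal sketch of the attribute (names invented); the commented-out function shows what the compiler rejects:

```d
// pure: the function may not read or write mutable global state.
int square(int x) pure nothrow @safe
{
    return x * x;
}

int counter;

// int impure(int x) pure { return x + counter; } // rejected: reads a mutable global

void main()
{
    // Such a function can also be evaluated at compile time.
    static assert(square(12) == 144);
    assert(square(-3) == 9);
}
```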

D is fast!

D matches the speed of C++ on almost all occasions, so you won’t lose performance when writing D code – that is, unless you have the GC run often in a threaded environment.

Very active and friendly community

The D community is very active and friendly – so far I have only had good experiences, and I basically came into the community asking some tough questions regarding distro-integration and ABI stability of D. The D community is very enthusiastic about pushing D and especially the metaprogramming features of D to its limits, and consists of very knowledgeable people. Most discussion happens at the forums/newsgroups at forum.dlang.org.

What is bad about D?

Half-proprietary reference compiler

This is probably the biggest issue. Not because the proprietary compiler is bad per se, but because of the implications this has for the D ecosystem.

For the reference D compiler, Digital Mars’ D (DMD), only the frontend is distributed under a free license (Boost), while the backend is proprietary. The FLOSS frontend is what the free compilers, LLVM D Compiler (LDC) and GNU D Compiler (GDC), are based on. But since DMD is the reference compiler, most features land there first, and the Phobos standard library and druntime are tuned to work with DMD first.

Since major Linux distributions can’t ship with DMD, and the free compilers GDC and LDC lag behind DMD in terms of language, runtime and standard-library compatibility, this creates a split world of code that compiles with LDC, GDC or DMD, but never with all D compilers, due to it relying on features not yet in e.g. GDC’s Phobos.

Especially for Linux distributions, there is no way to say “use this compiler to get the best and latest D compatibility”. Additionally, if people can’t simply `apt install latest-d`, they are less likely to try the language. This is probably mainly an issue on Linux, but since Linux is the place where web applications are usually written and people are likely to try out new languages, it’s really bad that the proprietary reference compiler is hurting D adoption in that way.

That being said, I want to make clear that DMD is a great compiler, which is very fast and builds efficient code. I only criticise the fact that it is the language reference compiler.

UPDATE: To clarify the half-proprietary nature of the compiler, let me quote the D FAQ:

The front end for the dmd D compiler is open source. The back end for dmd is licensed from Symantec, and is not compatible with open-source licenses such as the GPL. Nonetheless, the complete source comes with the compiler, and all development takes place publically on github. Compilers using the DMD front end and the GCC and LLVM open source backends are also available. The runtime library is completely open source using the Boost License 1.0. The gdc and ldc D compilers are completely open sourced.

Phobos (standard library) is deprecating features too quickly

This basically goes hand in hand with the compiler issue mentioned above. Each D compiler ships its own version of Phobos, which it was tested against. For GDC, which I used to compile my code due to LDC having bugs at that time, this means that it is shipping with a very outdated copy of Phobos. Due to the rapid evolution of Phobos, this meant that the documentation of Phobos and the actual code I was working with were not always in sync, leading to many frustrating experiences.

Furthermore, Phobos sometimes removes deprecated bits about a year after they have been deprecated. Together with the older-Phobos situation, you might find yourself in a place where a feature was dropped, but the cool replacement is not yet available. Or you are unable to import some 3rd-party code because it uses some deprecated-and-removed feature internally. Or you are unable to use other code, because it was developed with a D compiler shipping with a newer Phobos.

This is really annoying, and probably the biggest source of unhappiness I had while working with D – especially the documentation not matching the actual code is a bad experience for someone new to the language.

Incomplete free compilers with varying degrees of maturity

LDC and GDC have bugs, and for someone new to the language it’s not clear which one to choose. Both LDC and GDC have their own issues at the moment, but they are rapidly getting better, and I only encountered some actual compiler bugs in LDC (GDC worked fine, but with an incredibly out-of-date Phobos). All of these issues have been fixed in the meantime, but this was a frustrating experience. Some clear advice or explanation on which of the free compilers to prefer when you are new to D would be neat.

For GDC in particular, being developed outside of the main GCC project is likely a problem, because distributors need to manually add it to their GCC packaging, instead of having it readily available. I assume this is due to the DRuntime/Phobos not being subjected to the FSF CLA, but I can’t actually say anything substantial about this issue. Debian adds GDC to its GCC packaging, but e.g. Fedora does not do that.

No ABI compatibility

D has a defined ABI – too bad that in reality, the compilers are not interoperable. A binary compiled with GDC can’t call a library compiled with LDC or DMD. GDC actually doesn’t even support building shared libraries yet. For distributions, this is quite terrible, because it means that there must be one default D compiler, without any exception, and that users also need to use that specific compiler to link against distribution-provided D libraries. The different runtimes per compiler complicate that problem further.

The D package manager, dub, does not yet play well with distro packaging

This is an issue that is important to me, since I want my software to be easily packageable by Linux distributions. The issues causing packaging to be hard are reported as dub issue #838 and issue #839, with quite positive feedback so far, so this might soon be solved.

The GC is sometimes an issue

The garbage collector in D is quite dated (according to their own docs) and is currently being reworked. While working with asgen, which is a program creating a large amount of interconnected data structures in a threaded environment, I realized that the GC is significantly slowing down the application when threads are used (it also seems to use UNIX signals `SIGUSR1` and `SIGUSR2` to stop/resume threads, which I still find odd). Also, the GC performed poorly under memory pressure, which did get asgen killed by the OOM killer on some more memory-constrained machines. Triggering a manual collection run once a large amount of these interconnected data structures was no longer needed solved this problem for most systems, but it would of course have been better not to need to give the GC any hints. The stop-the-world behavior isn’t a problem for asgen, but it might be for other applications.

These issues are currently being worked on, with a GSoC project laying the foundation for further GC improvements.
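For readers wondering what such a hint looks like: the sketch below is not the actual asgen code, just an illustration using `GC.collect` and `GC.minimize` from `core.memory`, with an invented `processBatch` function standing in for the real work.

```d
import core.memory : GC;

// Hypothetical stand-in for one unit of work that builds a lot of data.
size_t processBatch(size_t nodes)
{
    auto graph = new int[][](nodes, 64);
    size_t cells;
    foreach (row; graph)
        cells += row.length;
    return cells;
}

void main()
{
    foreach (batch; 0 .. 4)
    {
        processBatch(10_000);
        // The batch data is unreachable now: request a full collection...
        GC.collect();
        // ...and ask the GC to return unused memory to the OS where it can.
        GC.minimize();
    }
}
```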

“version” is a reserved word

Okay, that is admittedly a very tiny nitpick, but when developing an app which works with packages and versions, it’s slightly annoying. The `version` keyword is used for conditional compilation, and needing to abbreviate it to `ver` in all parts of the code sucks a little (e.g. the “Package” interface can’t have a property “version”, but now has “ver” instead).
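For those who haven’t seen it, this is roughly what the keyword is used for, next to a sketch of the workaround. The `Package` struct here is a simplified stand-in, not the real asgen interface.

```d
import std.stdio : writeln;

// `version` selects code at compile time, based on predefined
// or user-supplied identifiers.
version (linux)
    enum platform = "linux";
else version (Windows)
    enum platform = "windows";
else
    enum platform = "other";

// Simplified, hypothetical package description.
struct Package
{
    string name;
    string ver; // `string version;` would be a syntax error
}

void main()
{
    auto pkg = Package("asgen", "0.6");
    writeln(pkg.name, " ", pkg.ver, " on ", platform);
}
```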

The ecosystem is not (yet) mature

In general it can be said that the D ecosystem, while existing for almost 9 years, is not yet that mature. There are various quirks you have to deal with when working with D code on Linux. It’s never anything major, and usually you can easily solve these issues and go on, but it’s annoying to have these papercuts.

This is not something which can be resolved by D itself; this point will solve itself as more people start to use D and D support in Linux distributions gets more polished.

Conclusion

I like to work with D, and I consider it to be a great language – the quirks it has in its toolchain are not bad enough to prevent writing great things with it.

At the moment, if I am not writing a shared library or something which uses much existing C++ code, I would prefer D for that task. If a garbage collector is a problem (e.g. for some real-time applications, or when the target architecture can’t run a GC), I would not recommend using D. Rust seems to be the much better choice then.

In any case, D’s flat learning curve (for C/C++ people) paired with the smart choices taken in language design, the powerful metaprogramming, the rich standard library and helpful community make it great to try out and to develop software for scenarios where you would otherwise choose C++ or Java. Quite honestly, I think D could be a great language for tasks where you would usually choose Python, Java or C++, and I am seriously considering replacing quite some Python code with D code. For very low-level stuff, C is IMHO still the better choice.

As always, choosing the right programming language is only 50% technical aspects, and 50% personal taste 😉

UPDATE: To get some idea of D, check out the D tour on the new website tour.dlang.org.

15 thoughts on “Adventures in D programming”

  1. D is my favourite language (and thus I also happen to be one of the D packagers for openSUSE), and I generally agree with what you said here.

    Another thing to mention in the “what’s great” part is the really powerful foreach statement, as well as the concept of true immutability (including for function parameters, so that the function guarantees not to change the input variables) and pure functions that also allows the compiler to do some interesting optimisations.
    Also, DMD is fast. Blazing fast. And also supports compile-time function execution.

    The interfacing with C part, well, it’s true (and I have made use of it in the past in my projects), but generally that ought to be only used by the developers of library bindings. Since otherwise, having to deal with the C way of programming (with pointers, manual memory allocation, etc.) defeats the purpose of using D for the most part. Also, bindings vary a lot with regards to quality. For instance, the Lua bindings (LuaD) are absolutely amazing, it’s all presented transparently, one can retrieve and set information within Lua scripts directly in the form of structs and arrays, no need to bother with the tedious stack paradigm that the Lua C bindings offer. On the other hand there’s Derelict, the SDL bindings to D, which is more of a thin binding layer and there’s still a lot of C-isms that are exposed through it.

    As for the downsides, it’s all true, but it’s also all known issues that are being worked on. Well, the DMD backend is a pain, but there are alternatives, as you mentioned; and they are getting much, much better as time goes. Not that long ago they were lagging behind DMD in terms of D versions by a whole lot, whereas now there are just some bugs to iron out. The ABI interoperability is a problem, but it’s being looked at, too, and eventually it should work out. Phobos features won’t be deprecated that quickly in the future, as there won’t be anything to deprecate after a while 😉 This ties in with the ABI issue too.
    I’m not too sure about dub, because I never used it. For all my needs, using (and packaging) separate projects has always been sufficient.
    And the GC is, as you mentioned, still a work in progress. Though personally I always found it sufficient for my needs.

    Oh, and by the way, D conferences are very nice as an overview of where D is and where it’s going.

  2. A neat trick is to extend C structures using alias this
    to wrap C structs inside D structs, allowing you to set up ranges for uniform data access.
    E.g.

    extern (C) struct CStruct
    {
        int payload;
        CStruct* next;
        void foo() {}
    }

    struct DStruct
    {
        CStruct* data;
        alias data this;
        // takes a pointer, matching the type of the data member
        this (CStruct* c) { data = c; }

        // range primitives, so a DStruct can be used in foreach
        @property bool empty() { return data is null; }
        @property int front() { return data.payload; }
        void popFront() { data = data.next; }
    }

    // cFunctionThatReturnsCStruct is assumed to return a CStruct*
    auto tmp = cFunctionThatReturnsCStruct();
    auto dstruct = DStruct(tmp);
    dstruct.foo();

    foreach (item; dstruct)
    {
        do_operation(item);
    }

  3. The standard solution for reserved word symbols is just to append a _ – eg int version_;

    Annoying in beginning but one gets used to it.

    You might clarify that DMD is proprietary in name only. The source is open, and if you want to redistribute commercially ask Walter and he will say yes. It’s just a technical constraint imposed by his former employer, and he can’t do anything about it.

    Also, it’s easy to prototype with the GC and use std.experimental.allocator (or malloc/free if you insist) later on. That’s what Weka, a company that deals with petabyte scale storage does. So it’s not much work to use free lists or regional heaps – there could just be a few more examples of how to use. And take a look at EMSI’s containers for one solution that goes on top of the allocator.

    1. I deliberately didn’t choose the “_” suffix, because I’ve also seen it in use for marking a variable as a private class-member – and it looks really odd if you only use that notation in one place 😉

      > You might clarify that DMD is proprietary in name only.
      It also is proprietary, not only in name, if it isn’t DFSG-free 😉
      But I see what you mean, I clarified that bit above by quoting the D FAQ on that matter. It’s still unfortunate that DMD is so close to being free software, but doesn’t quite make it ^^

      I haven’t done much D stuff without GC yet, but I’ll look into it. I can’t imagine programming a microcontroller in D, but I also can’t think of doing that in C++ either 😀
      That D as a systems programming language works has been proven by people writing experimental OS kernels with it.

  4. “so you won’t loose performance” – pet peeve, the word you want is *lose*. “loose” is the opposite of “tight”.

  5. Would it help if the D community (or maybe Walter) switched to making a different backend the reference one? LLVM would be a good choice, and then we could have a truly Free D ecosystem and distribute an up-to-date Phobos everywhere. I think that would be a big win for D.

    I enjoyed getting to know D a bit during my recent interactions with the appstream-generator source code. Previously I’d not bothered to look at D since I wasn’t sure what it added to what is already in C++. However, looking at some real D code, together with your insightful comments here, has been very illuminating.

    Personally, I’ve been switching to Haskell over the past several years, and I think it’s fantastic. However, it’s been a steep learning curve and many people won’t have the necessary motivation. So D’s ease of entry for C/C++ people is very important.
