This Week in D August 23, 2015

Welcome to This Week in D! Each week, we'll summarize what's been going on in the D community and write brief advice columns to help you get the most out of the D Programming Language.

The D Programming Language is a general purpose programming language that offers modern convenience, modeling power, and native efficiency with a familiar C-style syntax.

This Week in D has an RSS feed.

This Week in D is edited by Adam D. Ruppe. Contact me with any questions, comments, or contributions.


Major Changes

The switch to ddmd happened this week - the D compiler is now officially written in D! This was achieved through a semi-automated porting process, described here, carried out over the course of a couple of years.

All open pull requests will need to be updated, but the talk describes a process for doing that fairly painlessly.

The result, so far, has been slightly slower compiles: the ported D code needs a bit of optimization work, and the release needs to be built with gdc or ldc for maximum performance. But all the tests pass, and pull requests that leverage D's advantages over C++ have already started to be merged, making for a simpler, faster, and more reliable codebase than the original.

Building dmd now requires a bootstrap D compiler to already be installed on the system. You can do this by simply installing the previous pre-compiled version on your development computer.

In the community

Community announcements

See more at digitalmars.D.announce.

Significant Forum Threads

dmd codegen improvements has Walter asking for people to help him find low-hanging fruit in dmd's optimizer that he can quickly fix for big gains.

This received some pushback, including the usual call for him to abandon dmd and instead focus purely on gdc or ldc, and a more interesting objection: codegen optimizations are rarely free from regressions.

(The D parts of all three compilers - the front end - are 99% shared, and moving toward 100% shared - but the code-generating backend is different for each. dmd uses the dmc backend, which was primarily written in the 1990s; it has very fast compile times for non-optimized builds, but generates code that tends to be about 30% slower than gdc or ldc with optimizations enabled. dmd is also the reference compiler in great part because it is easier to hack on, with a smaller codebase (which Walter understands deeply, having written almost all of it himself) and a simpler build system. gdc uses the gcc backend and ldc uses the llvm backend, both of which optimize well but are more complicated than the dmc one.)

Walter disagreed on the regression point, arguing that the test suite catches them, but others pointed to the record of seemingly innocent codegen optimizations that passed the tests yet introduced bugs in production, saying it just isn't worth the risk: if you want dmd's compile speed and hackability, use it; otherwise, use gdc or ldc for optimized release builds.

Different compilers have been problematic for companies in the past, though, because bugs can manifest differently: if you use dmd for development and testing, then gdc for a final build, you might end up with a buggy final build. This editor is aware of at least one company now funding work on ldc to close that bug gap, and the open source community is slowly but surely working on sharing more code among the compilers to ensure gdc and ldc keep up with dmd in both features and bug fixes. IMO there's a lot of hope in this being successful.

This thread led to a new choose your compiler page, which briefly outlines the pros and cons of the three options as they stand now, and to a few new enhancement requests that may help dmd's performance, though the point about regression risk seems to stand as of this writing.

Object.factory() and exe file size bloat is another thread from Walter talking about optimizations. This time, it is about modifying the factory function in the standard library to enable more dead code elimination by the linker. This got pushback from the users of that function, and led to some discussion about shared library symbol exporting.

The basic problem is that since the factory function can construct any class given a runtime argument, it needs to reference the TypeInfo of ALL classes in the build, including libraries, regardless of whether or not they are used by the rest of the program. This, in turn, means all class methods are retained in the final binary, which means every function they call is also retained, and so on and so forth.

The linker will remove unreferenced functions, trimming the generated binary size, but with factory referencing everything, far fewer functions are left unreferenced. Walter proposed making factory an opt-in system using the export keyword instead of having it apply to everything. Systems with limited space could then choose not to make class factory available and trim out a lot of code, including the dependence on TypeInfo. This would be a win for embedded systems especially.
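For reference, this is the feature under discussion: Object.factory constructs a class instance from its fully qualified name at runtime, which only works if the TypeInfo of every candidate class survives into the binary. A minimal sketch (the module and class names here are made up for illustration):

```d
module app;

// A hypothetical class; for factory to find it, its TypeInfo (and
// therefore its methods, and everything they call) must be kept
// in the final binary even if nothing else references them.
class Widget
{
    string describe() { return "widget"; }
}

void main()
{
    // Construct by fully qualified name at runtime.
    // Returns null if no such class is registered.
    Object o = Object.factory("app.Widget");
    assert(o !is null);

    auto w = cast(Widget) o;
    assert(w !is null && w.describe() == "widget");
}
```

Because the string argument is only known at runtime, the compiler and linker cannot prove any particular class is unreachable - which is exactly the dead code elimination problem the thread is about.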

The thread went in three main directions: that this silently breaks working code, that export is a mistake and should be redesigned, and that TypeInfo is overused in the library in general, because most of the compiler/runtime glue predates templates in D (and secondarily, because templates might bloat the runtime, though the recently-added pragma(inline) can help with that, as well as with runtime performance).

The code breakage argument seems strong: if factory is to change, it should instead be removed outright, so that using it at least produces a compile failure rather than a runtime call that suddenly returns null where it previously worked. This killed Walter's original idea, but offers a way to mitigate the problems. A decision has not yet been made as to which route they will take.

The export argument is an old one from people working on DLL support. In D, export is a protection level rather than an independent attribute. This means it is not possible to export private methods, which at first seems like an oxymoron, but is actually quite useful for private helper functions in a DLL that are called by public template methods - methods which do not necessarily reside in the DLL binary themselves but still need access to that helper.
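A minimal sketch of that pattern, with hypothetical names (the DLL build machinery is omitted; this just shows the language-level shape of the problem):

```d
module mylib;

// Hypothetical private helper. If mylib were built as a DLL, the
// template below gets instantiated in *client* binaries, so those
// clients need helperImpl's symbol exported from the DLL - even
// though helperImpl is private at the language level, which D's
// current export-as-protection-level design cannot express.
private int helperImpl(int x)
{
    return x * 2;
}

// Public template: its body is compiled into whatever binary
// instantiates it, yet it still calls back into the library helper.
int process(T)(T value)
{
    return helperImpl(cast(int) value);
}

void main()
{
    // Within a single binary this just works; the trouble only
    // appears once mylib lives behind a DLL boundary.
    assert(process(21) == 42);
}
```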

Moreover, export is unimplemented on Linux. There, all symbols are exported regardless of their protection level (which is also the common behavior for C and C++ on Linux), causing binary bloat and load-time slowdowns - it is becoming increasingly common in C++ to use compiler flags to change this behavior, suggesting it will be a problem in D too. The critics want to change the Linux behavior to match the Windows behavior for consistency and performance. This argument is strong, though library authors would need to learn to start using export after the change.

While the experts who work in this area all agree these changes would be good, it is unclear whether they will actually happen, given worries about code breakage and the question of who would actually implement them.

The third significant thread this week was string <-> null/bool implicit conversion, with someone questioning what is up with the truthiness of arrays.

The way it works is fairly simple: it checks whether the pointer is null. But this can be surprising if you aren't used to it, since an empty array is not necessarily null: [] is null passes, since the literal avoids allocating for nothing, but [1][1..$] is null fails, despite the array being empty, because the slice's pointer points to the end of the input array.

Many people argue this distinction between empty and null is unnecessary and confusing, and the shortcut syntax of if(a) resulting in the null check instead of an empty check makes the problem even worse. On the other hand, if you do understand the rule behind it, the current behavior does make sense and is sometimes useful.

To avoid the whole problem, explicitly use if(a is null) or if(a.length == 0) to check for null and empty, respectively. These always work consistently and make it clear you wrote what you meant. You may also import std.array if you would like to use if(a.empty()).
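The distinction can be demonstrated directly in a small self-contained program (behavior as described in the thread):

```d
void main()
{
    // An empty literal is null - no allocation happens for [].
    int[] a = [];
    assert(a is null);
    assert(a.length == 0);

    // An empty slice of a real array is NOT null: its pointer ends
    // up just past the source array's data.
    int[] b = [1][1 .. $];
    assert(b !is null);
    assert(b.length == 0);

    // So if(b) - which checks the pointer - would be true here even
    // though b is empty. Say what you mean instead:
    assert(b.length == 0);   // "is it empty?"
    assert(b !is null);      // "is it non-null?"

    import std.array : empty;
    assert(b.empty);         // same emptiness check, range-style
}
```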

There was a compiler pull request to make the implicit conversion a warning or error. It was temporarily merged, but removed after pushback from a user of the feature who was annoyed that it broke a lot of his code for, in his eyes, little benefit. He wasn't against the change per se - he was willing to change his code to fit - but he was annoyed that people argued the change had zero cost when it obviously didn't. The final decision to revert, however, came from Walter and Andrei, who want to avoid any such breaks. (A stance which, in the opinion of several forum-goers, is too extreme and is hurting forward progress.)

Little is likely to change as a result of this thread, though it did get on Hacker News as an example of a "misfeature" in D. Regardless, it helps to understand the current behavior of the language.

Learn more about D

To learn more about D and what's happening in D: