This Week in D June 07, 2015

Welcome to This Week in D! Each week, we'll summarize what's been going on in the D community and write brief advice columns to help you get the most out of the D Programming Language.

The D Programming Language is a general purpose programming language that offers modern convenience, modeling power, and native efficiency with a familiar C-style syntax.

This Week in D has an RSS feed.

This Week in D is edited by Adam D. Ruppe. Contact me with any questions, comments, or contributions.

Statistics

Major Changes

Several Phobos functions have been "range-ified", which changes the signature but should mostly work the same way. The change means fewer memory allocations will happen in the Phobos library.

std.traits now has a hasUDA function, to make checking user-defined attributes easier.

Vladimir has released forum.dlang.org, version 2 (BETA) which will probably become the new web interface. Chime in now if you have comments!

DConf 2015

DConf 2015 happened recently! Over 30 men gathered in person at Utah Valley University for about nine hours a day over three days to discuss D, with the majority of the conference also being livestreamed over Youtube to many other people.

The conference was also professionally recorded and those videos will be made available later, once editing is finished.

This Week in D summarized the Wednesday morning session last week. This week, we'll continue our coverage.

Wednesday Morning Session

See last week's issue.

Wednesday Afternoon Session

After lunch, we reconvened for three additional talks and one long Q&A session.


The first talk after lunch was by Liran Zvibel discussing his company's, Weka.io, use of D. Weka is based in Israel and has been using D since early 2014 writing max performance, high availability primary storage software. His slides are here.

After introducing his company and work, Liran briefly described their old infrastructure: a mix of C and Python with a lot of auto-generated code. They wanted better and moved to D.

For maximum performance, they wrote memory efficient code with zero copying, no allocation, and no locks. For more understandable code, they used fibers, a reactor, and a RPC framework based on D's compile-time reflection.

They had a number of debugging challenges: bugs must be fixed but the system can't go down and reproducing errors is too expensive. They couldn't use extensive text tracing because writing to a log is too slow and too bloated. To solve this problem, they wrote a custom tracing framework that uses Brian Schott's libdparse and static code generation to instrument their code and make a binary storage system. For example, it would store ID integers instead of strings for maximum efficiency. A lockless move system to shared memory allows it to qyuckly log the useful information without pausing the system.

Weka uses a custom log viewer to give their developers access to the stored trace data. He noted that access to linker sections would be nice to remove a step in the custom trace process, but it is not necessary to get the system working.

Next, Liran described the IPC system, which uses compile time reflection to generate the code to communicate with remote nodes by looking at regular D interface declarations. Editor's note: chapters eight and nine of my D Cookbook describe techniques that can form the foundation of such a system, though Weka's implementation goes further than the book elaborated upon. I'd also add that while this creates some beautiful code that is easy to alter, it does come at a cost: in today's compiler, at least (the implementation is currently inefficient), such reflection and codegen can have a major effect on compile times - Liran mentions slow builds later in the talk - so it isn't necessarily the right solution all the time. You might want to use the code gen to make the interface, but instead of mixing it in immediately, use pragma(msg) to write the code out to stdout, and have your makefile save that to a file which you can cache between builds, or something along those lines.

Liran mentioned that this approach is much easier to use than implementing it in C with external tools, while remaining extremely efficient at runtime.

Next, we heard about some custom code they wrote: they wrote some of their own assertion and reflection helpers, building on what D has built in. They crated fiber-local storage and fiber debugging helpers. They also wrote a number of no-GC efficient data structures, gc_hacks for getting private GC statistics to help with their optimization, and reflection-based accessors which notify of changes to member variables.

Liran took a moment now to exclaim that "what you should take of [this talk] is: D is great" and that you should adopt D. He mentioned that one of the interview questions they ask is to give them 30 lines of C++ and see if they can tell what it does. Given the complexity of C++, this is harder than it seems, even for experienced C++ developers. When people see D, however, they tend to find it much easier to read, even as a newbie, but especially after they get used to it.

He then started to talk about some challenges they faced: many things stemming from their strict latency requirements being incompatible with garbage collection, so they had to avoid it, a problem with the compiler taking too much memory and not scaling to use all the cores on their dev machine (compiling all at once simply didn't work for them - this relates to a number of known bugs in the compiler, which generally have workarounds, but they don't work as well as the existing workarounds for C++'s slow builds, for example), a huge executable being built, and a few bugs in the compilers - especially gdc and ldc, meaning they stuck with dmd in production, despite the poorer code optimization.

Walter mentioned they could try .di files for the build, but this doesn't solve the fundamental problem they faced. Editor's note: when he said that on the mic, the thought that crossed my mind is "that is like microoptimizing a bubble sort".... Andrei mentioned trying package-at-a-time compilation, which can be better parallelized and avoid many of the rougher edges like template instantation bugs that plague module-at-a-time compilation. Liran said they haven't been able to try that in practice yet. I would also note that fixing the parallelizing and memory usage is slated to come somewhat soon after the move to ddmd, which should happen in a few months (see Daniel Murphy's talk from the Wednesday morning session, summarized last week.)

Liran also mentioned private imports in functions which are theoretically more efficient... but in practice, a pain to use, needing to reimport everything per function instead of per module, and thus not often used by developers.

Another challenge was that many C functions that need to be inlined for good performance are not inlined when called from D. Since the D compiler doesn't see the C source, the function is always called, never inlined. To solve this, Weka ported some important C functions to D. Editor's note: link time optimization in GDC and LDC may solve this, but dmd doesn't implement it. Weka didn't use gdc and ldc due to some unique bugs with those compilers, but for many projects, they work excellently and you might try them before rewriting functions.

He also fund that module constructors and destructors found limited use thanks to import cycles and ordering issues in practice, and that using integer types smaller than int (short, byte) were painful due to an explosion of casts - value range propagation doesn't go far enough to make these both correct and convenient to use yet.

Liran summarized by generally praising D: it gives them a single language for both paths, able to replace both C and Python, it has given them a huge productivity boost and they are heavily using D's unique features, and it is paying off. Their only main downside was that large real time projects could have better support... but, remember, this talk is about their success in building a large, real time project with D!

Finally, Liran said he is looking for D freelancers to help build better infrastructure and give back to the community at the same time. If you're interested, email him at liran@weka.io.


The second after-lunch speaker was David Nadlinger (aka klickverbot), talking about druntime's implementation. You can see his slides here.

David opened with an overview of the various packages that make up druntime and some of the fundamental classes such as TypeInfo and ModuleInfo. He then described exception handling, noting that it is both compiler- and platform-specific, and gave an overview of how D's mark-and-sweep garbage collector works.

At this point, the talk got more specific, discussing just how thread-local storage is implemented and a challenge in getting it to work with shared libraries - the fact that it won't necessarily have a static per-thread table indicating where the storage is found. (Shared libraries may be loaded dynamically into several different programs.)

Walter took to the mic briefly to mention that TLS globals are not actually very efficient due to these indirect lookups - local variables are far more efficient to access.

David went back to discussing how it works, including a custom TLS implementation on dmd on Mac OS X since that operating system didn't support it natively since version 10.7. This custom implementation used functions from the rt.sections_osx module along with special linker sections generated by the compiler to store the data. The LDC implementation of TLS on OS X uses the default LLVM implementation, but with some Apple-specific extensions for GC ranges.

David also described how fibers work (including noting that TLS and exception handling implementation with fibers is harder than it sounds), then briefly described the C startup model. He made the important note that a C program doesn't quite start at main - it actually starts at _start, which is found inside the C library. Editor's note: it can actually start at any symbol, you can override this with a linker script, but _start is the default one and this trivia doesn't actually change David's point. He clears up a common misconception that C programs don't have a runtime: they do, that's how global constructors and destructors are called, and how the environment is set up!

Next, he got into more details on shared libraries and module registration. He described out the _Dmodule_ref system worked - a linked list of modules created by the compiler using C global constructors editor's note: if you have ever played with kernel code in D, you've probably seen this before. The reason it doesn't work there though, the reason the reference is null, is that you are probably skipping the C runtime initialization by writing your own _start... meaning those constructors are never run!. This is a simple system that works portably... but does not work right with shared libraries.

The shared library support on Posix needs special effort. It uses a shared druntime and needs to detect module conflicts (two different versions of the same module from different libraries)... while remaining easy to use (so custom linker scripts are out) and working around myriad linker bugs and incompatibilities Editor's note: he is talking about the GNU and LLVM linkers here! i.e., not D specific.

The solution in druntime now is a function called _d_dso_registry, a compiler generated call mostly written by Martin Nowak, which handles these issues.

David also talked about --gc-sections, an option to the linker which, in theory, should lead to smaller executables. This, again, hit pain in ease of use, linker incompatibilities, and linker bugs, however when it does work, it can result in executables from ldc about 1/4 the size as the ones from dmd. David noted that gdc's executables tend to be huge just because it adds debugging info, which can be easily stripped out.

David's slides, page 30, has a number of good references to better understand the topics he talked about, and he mentioned that link time optimization may continue to improve going forward.


Amaury Sechet (aka deadalnix) was the next speaker and talked about memory, CPU caches, concurrency fencing, and more. His slides are here.

The first point he made is that memory is slow. It takes about 300 cpu cycles to read from main memory and this situation has hit a wall: latency happens because an electronic signal simply has to take time to travel across the circuit.

Editor's note: if you've been programming a while, you'll remember when memory was so much faster than CPUs that it'd make sense to store as much as you can there (though you didn't have infinite memory back then either!). The situation is different now: storing things in main memory is not necessarily a speedup compared to calculating it again on demand. It comes down to carefully managing the cpu cache, which is what deadalnix's talk is about.

Amaury then talked about the solution to slow memory: fast caches stored on the CPU itself, physically closer so the signal has less distance to travel. The CPU also prefetches data it expects to be used, so it is available without a wait. His slides list types of cpu cache.

Amaury listed a few big tips to help mitigate the slow memory problem, including: pack your data so it takes less memory, put what you can on the stack (which is likely to be in the hot cache), access data in a linear fashion avoiding indirections and code branching (to help the prefetcher predict the right stuff to grab), and size your data so it fits on a cache line.

He briefly explained the memory management unit in a CPU before moving on to discussing multicore processors' interaction with memory, which took up the bulk of the talk.

He opens this part by noting that multicore processors are everywhere, including in mobile devices, and their presence is very visible to programmers, unlike older faster serial processors, which programmers could basically reap benefit for free - their same programs would just run faster. With multicore, changing the program may be necessary to get benefits. Editor's note: though one free benefit you might see is from the operating system: your process could get a free core while other processes the user runs is on other ones. But, indeed, to really utilize it all, you do need to do some work!

Since multicore is visible to the programmer, old languages have trouble adapting to it and newer languages need to do something to help. This tends to be enforcing semantics to keep up a single-threaded illusion for most code, limiting where memory sharing happens.

In a multicore environment, each core has their own cache which works asynchronously. As a result, reads and writes back to main memory may happen out of order and one core may overwrite another core's write. The x86 architecture is kind to programmers in this respect, but other architectures expose you to all the gory details.

The basics behind cache coherency in the CPU is that a core takes ownership of a cache line and shares when it is not dirty. The tip from this fact is to avoid sharing and writing to memory. In D, this means share immutable data and write to thread local data whenever possible. Since thread local data is the default in D, this is encouraged by the language too.

Since it works on 64 byte cache lines, if you need to share mutable memory across many threads, you may want to pad the data - make sure the shared block takes up the full 64 bytes so two shared variables in the same block don't get thrashed across different threads.

While x86 tries to keep the memory consistency automatically, it also cannot compromise performance too much. One type of operation: the StoreLoad memory barrier, does need to be explicit on x86. This is triggered with the mfence instruction editor's note: which D exposes through inline assembly and I believe intrinsics. ARM requires you to explicitly specify all memory barriers.

Amaury then says we're all doomed as testing for these problems is difficult... but then goes into more detail about how D helps, with its thread local default and explicit sharing, and also the transitive immutable can also be relied upon in a multicore environment.

shared is kinda hard to use in D... but that's a good thing because sharing is hard to use in any case, so making it look easy would be a leaky and probably wrong abstraction. It ought to be used with care.

The next topic was the garbage collector and how it needs more work to be really multicore friendly, as well as the API between the druntime and the compiler.

Amaury's project, sdc, a rewrite of the D compiler, has a proof of concept inspired by jemalloc to make a thread-local heap. It uses write barriers to help with gc pauses, but it also uses more memory than the stock D gc. The key to making it good is to generally only share immutable values...

...but, there's still a few other problems with it: an immutable delegate might have a mutable context pointer in D, exceptions may cross thread lines when a thread is terminated (the Thread.join called from the parent may return a thrown exception that killed a thread), and pure functions can be promoted to immutable after allocation, so how would the GC know to allocate it in the immutable heap?

He proposes some possible solutions: make a generational gc for immutable info, add better ref escape checks and automatically free them, moving to the stack if possible, and improve the inliner for maximum effect of these other ideas.


Finally, we closed out Wednesday with a long ask us anything segment, where people grilled Walter and Andrei.

The following is my notes from the session, roughly formated. It is NOT a transcript and likely includes some mistakes by me, but should give you a jist of the discussion.

David: will we keep the whole race condition undefined behavior memory model?
Andrei: yes, but low pressure on it because of explicit sharing. Defining shared and shared/unshared interaction is the important bit to do. Shared is too restricted and poorly defined but an important start. David: Lock-free data structure is desired, the problem is still existing. Andrei: it is still a smaller problem.
Online: Allocators, when?
Andrei: probably two releases away. his talk will be about it more in "Generics must go".
Liran: back to memory model, shared isn't enough, we need volatile too.
Walter: volatile and sequential consistency is separate. volatile load and store in druntime is for stuff like mem mapped i/o and interrupts. Andrei: we can do in lib. Liran: it should be in the language for efficiency and simplicity. Walter: it might be magic intrinsic functions. Liran: that can be ok but there's finer things in the mix too to get every bit of cpu.
Liran: any plans of concurrent GC?
Walter: Martin was going to talk about it but he wasn't able to be here. Amaury: problem is we can't regen code like a VM, so write barrier is the tricky bit and he's not sure there is a good solution if most the data is not immutable.
Amuary: the Java volatile thing can help with all this stuff.
Robert: std.stream has been out of date for three years tomorrow....
Andrei: meh nobody really cares. We might just remove it
Online: What's the state of C++ interface.
Walter: might talk about it later. No change since NWCPP three months ago. Needs to work with STL still. Andrei: other things have taken priority.
Liran: Phobos forces GC. Do you plan to change this?
Walter: yes, the range stuff should really change this, not 100% but should get to 90%. Andrei: maybe refcounting and ownership management too but it is mostly just work to get done. Liran: ARC? Andrei: language solution of some sort with control.
idk: D foundation?
Andrei: by the end of the year. Want a subscription model for donations to forward work on D.
idk2: consume a range in a single pass while doing multiple operations, does the article fall apart?
Andrei: Spolsky wrote an article about leaky abstractions. Sometimes you still need to write a loop, ranges can't do everything.
Online: will tuple unpacking ever get pulled?
Walter: idk, I haven't looked that closely. Andrei: probably but is that it? Need to analyze whole design. Commenter: it has been sitting for a long time, what moves it forward? Andrei: this question will do it, but generally, we need to up the decisiveness and stop leaving things hanging or bogged down in giant discussions over small details. Walter: want to get ddmd done. Daniel: very very close. Walter: not good enough!
Liran: Python's keyword argument thing is nice. Can we have it?
Walter: named arguments is negatory cuz of overloading, it gets too confusing. Comment: can we do named instead of overloading? So they don't mix? Walter: should D do it all or just offer one way? How common is this? Probably not that common and the config struct is good enough. It isn't a blocker. Liran: it is really nice though. We could figure it out. If we wrote it, would you consider it? Andrei: is it that important? Is this more important than gc-less phobos? Liran: Well, we don't use phobos so..... Walter: is this the best thing we can spend time on? Liran: when I explain D to people, we call it "Python that compiles". This is an important selling point for high level stuff, sugar is good. Walter: every day people propose new features, I have to say no. Commenter: variadic templates and reflect, been written! EDITOR NOTE: I still want with(SomeStruct()) for easier config
Brian: I want to clean stuff up. Question tho: return ref examples return ref to inside struct. If you have a container that gives a struct with pointers back to the ocntainer, will this keep you safe?
Walter: the impl of the struct is responsible for encapsulating more things with private and such. Andrei: the language helps but it doesn't do it all, like the range should increase count of container if needed. Walter: it is not transitive, you still need to DIY
John: properties!
Andrei: this needs to be simplified, it should just do edge cases with @property. Walter: we never wrote it down, need to revisit. John: we just want a solution to expect.
Online: alias expressions to member to member implemented?
Walter: idk. Andrei: needs enhancement request. Going into it beats principle. Walter: it sounds possible but do we want to do that? It can get super complex. Probably answer is "no".
Robert: what kind of libraries do you hope in Phobos? Units?
Walter: can't we do what APL does? Andrei: he is hoping for like most useful Go libs impled in D. Batteries included phobos would be good. LIBRARY GAP!!!!!!!!! Liran: goroutines are just marketed fibers. Andrei: we want fibers to be streamlined to compete with goroutines... Walter: fibers need to be movable between threads.
Atila: when are we going to get vibe.d tasks in Phobos?
Walter: when someone writes it. Andrei: i also want to see lexing and parsing of D in Phobos and vibed too. Walter: ddmd can do multicore, we need to move over.
Daniel: dub was moved to DPL, what is the further plan?
Andrei: maybe package with the compiler. Want to package dfmt too. Brian: dfix too!
idk3: complicated generic code in signatures is scary. Can we get sugar for taking input range and such?
Walter: problem is constraints are often more complicated than that... and like 12 overloads with different constraints is super complex looking but it is actually kinda simple. Maybe refactoring it to be static if internally will help the external look too. Andrei: opposed to this! Agree it is a doc issue and specializations shouldn't spill into the interface but i think real fix is not static if which can break impl, but the doc needs overhaul for it.
Mark: can we tell if an exception is currently in flight, so like debug if exception is flying through debugger.
Andrei: that is just a runtime function but yes we should do it. Make an ER.
Online: will macro system be implemented?
Andrei: no. Walter: indeed, no.
Online: is C++17 a threat?
Walter: no. D will replace all languages. Andrei: C++ is playing catchup with D and growing by accretion. I want to destroy C++. It is too big for its own good and we are ahead. Walter: yes, C++ gets better all the time, but so does D.
idk4: used variadic template inputs and a struct, I can show the code.
Walter: cool post to ng. Andrei: lots of people wanted to quit C++ when Java was big in the 90's, but then C++ came back because there was good enough stuff people could use and libraries started to explode. Libraries need to be done and stopping the language can help with it. We gotta stop adding features and start exploring features.
Liran: libdparse needs to be in phobos but keep up with the right dmd.
Brian: lexer generator at least. Andrei: i like the lexer generator. I want libdparse to be a competitor, outside expert, that checks on ddmd. I want ddmd to be the library parser and libdparse checks on it independently. Brian: makes sense, without a separate project those bugs wouldn't have been found.
Liran: static if rox. where's static foreach? where's version with logical stuff?
Walter: no, i refuse because version gets so ugly like preprocessor #if. You can do better with simple stuff, I'll show how. Andrei: I agree but hate him for it. But static if rox. static foreach is needed.
Online: final should be default.
Online: Will bug 314 be fixed (module system)?
Walter: maybe eventually. Commentator: there is a PR for it that is considered good and Martin wants to merge. Breaks lots of code tho and wants deprecation strat. Andrei: these bugs need to be fixed. The holes in module system need to be fixed.

Thursday and Friday sessions

The remaining days of DConf will be summarized in later issues, and we'll revisit links to videos once the professionally recorded files are made available. Keep reading next week!

Open D Jobs

A new page has been added to the D Wiki listing open D jobs. Take a look if you're interested, and add yours if you know of one that is available!

In the community

Community announcements

See more at digitalmars.D.announce.

Significant Forum Discussions

See more at forum.dlang.org and keep up with community blogs at Planet D.

Learn more about D

To learn more about D and what's happening in D: