Talk given: May 23, 2014. Transcript written on July 18, 2014.
Notes last updated: July 19, 2014
This is an annotated transcript of my DConf 2014 talk. Watch it on YouTube.
Inline annotations will be added with a different background. Larger additions will appear in boxes or as links.
This document will also organize the topics and add a table of contents for easier digestion of the material.
MC: All right folks, ah if you can just take your seats. So, I know the talk at the end of yesterday was extremely well received talking about how to squeeze D onto smaller and smaller devices and I think today's talk, Adam's talk, is gonna take us a little bit closer to the metal. Um and a man with probably no introduction, certainly with no slides, take it away, Adam.
Adam: Fantastic. *applause* Wow, he just asked me, 'am I ready for this', the answer is 'no'. I scribbled down some ideas last night and that's as far as I got. So, first let me tell you a little bit about me. My name's Adam Ruppe (and that's the pronunciation by the way *unintelligible* It rhymes with 'poop'. LOL). I'm right now working for a company called Beyond Z doing web applications for educational purposes. Sadly, it is a Ruby on Rails job. I hoped to use D, but I can't always win.
Of course, in the past I've done a lot of D programming. In fact, over the last several years, for most of my uh, contracts I kinda insist on using D because it is so much more efficient. I can just write what I think. I wrote a whole lot of my own libraries. You can find them on my miscellaneous GitHub, including stuff like, like uh, just recently they were asking for SASS for CSSing and I, oh, in fact I wrote that in 2010, it is under misc stuff/html.d/CssMacroExpand. [Note: It is now also available as a stand-alone utility on GitHub and on the dub repository.] LOL, I am not well organized.
But, other than that, I've lately been working on what's called the "D Cookbook" for Packt Publishing. And in that, I take about 100 different random ideas, put them all together, and try to show you how to get stuff done. And I would have brought that today, but it will not be printed until next week. [Note: The book is now available via Packt and other sellers.] But, if you want to go to packtpub.com and search for D Cookbook, you can use a discount code 'dconf2014' and go ahead and purchase that, and I will be talking about a lot of the stuff we learned in there; primarily, Chapter 11 covers this same idea.
Anyway, the next question is basically one of philosophy. I spend a lot of time on the chat rooms, on the forums, and there's people who ask questions 'well, can I do this' or 'what happens if I do this' and my philosophy is to just do it. And first let's contrast that to the real world. A couple weeks ago, I was playing with some friends and we were walking over a bridge and there was a guard rail on this bridge and he said "I wonder if I can balance that" and I'm sitting there going "NOOOOOO" aand you look down and it must have been like 50 miles to the water below and there's rapids and sharks with laser beams on their heads. If you, if you can't do it there, there's going to be wailing, there's going to be weeping, there's going to be gnashing of teeth. It is not the thing you want to try.
Now, contrast that to the computer world. They say 'well, can I append something to a string'. Just try it! What's the worst that's gonna happen? It'll say 'Segmentation fault' and you pull your hair out? It's really not a big deal.
So what I encourage people to do is to just try it. Does it work? Who knows? Just do it! I've got 100 files on my desktop called test, I've got test1.d through about test110.d. When people ask me something on the chat room, I just immediately fire up the compiler and see what happens.
And now, in a similar vein, a question is 'well, is it possible to do this?' and the answer is almost always 'yes'. In the virtual world, you can do... almost anything you imagine. Like they say, well, can you make... one of my co-workers at Beyond Z was telling me a story where he wanted to do fancy graphics on the web, and this was a good 10 years ago, before you had HTML canvas or anything. So, what he did was make an array of divs and change the CSS background color on the individual one-by-one-pixel items to make himself a frame buffer. And it's absurd! But it got the job done.
And that's really my feeling whenever I go into D. They say, well, you can't write websites in D. And, well, yes I can. You just have to sit down and /do it/.
And that brings me to the bare metal question. They say, well, can you write a kernel in D? Sure. Can you do it well? Eh, if you have the time. But basically everything C can do, D can do too. And it's really... that's actually literally true: you could always write a C compiler that generates an inline asm string and mixes it in. You can do it! And you don't have to feel limited. People say 'well, dmd has bugs, I can't get work done with that', but the subset that works like C and the subset that works like Java both work exceptionally well and have for many years. In the worst-case scenario, you just write it the same way you would in C or Java, you haven't lost a whole lot, and you still have a handful of advantages to use to work around the bugs.
In my book, they asked Andrei Alexandrescu to write a foreword for it, and as I read this, my ego just expanded until it burst. And at the end, there's this really cool AK-47 tip, and if you buy the book for nothing else, get that; it's worth it. But, he was talking about how I talk about language limitations. For example, you cannot have a virtual template function, because it won't fit in the vtable; the compiler doesn't know how many slots to make.
So, with the new operator overloading, these are templated functions; how do you put them in a class? And the answer is very simple: just forward it to a virtual function, and that way you have these named functions and you get the job done. It might not be pretty, you might be duplicating some code where you'd prefer to alias, but the job's done, you can move on.
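A minimal sketch of that workaround in D, using a made-up `Counter` class (not code from the talk or the book): the templated `opBinary` can never be virtual, so it stays thin and forwards to a named virtual function, `add`, which a subclass can override.

```d
import std.stdio : writeln;

class Counter
{
    int value;
    this(int v) { value = v; }

    // Templated member functions are never virtual: the vtable can't know
    // how many instantiations there will be. So keep the template thin...
    Counter opBinary(string op)(int rhs) if (op == "+" || op == "-")
    {
        // ...and forward to a named virtual function.
        static if (op == "+")
            return add(rhs);
        else
            return add(-rhs);
    }

    // An ordinary virtual function; subclasses can override this one.
    Counter add(int amount) { return new Counter(value + amount); }
}

// Hypothetical subclass that clamps at 100 by overriding only the virtual hook.
class ClampedCounter : Counter
{
    this(int v) { super(v); }

    override Counter add(int amount)
    {
        const v = value + amount;
        return new ClampedCounter(v > 100 ? 100 : v);
    }
}

void main()
{
    Counter c = new ClampedCounter(95);
    auto d = c + 10;  // static type is Counter, but ClampedCounter.add runs
    writeln(d.value); // 100
}
```

The operator stays a compile-time construct; only the named function takes up a vtable slot.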
And he pointed that out as an example of where my language advocacy basically goes: you can't do it that way, but what is the big picture you're trying to accomplish? Step back, find another approach, and it will be good enough.
So now, with that philosophy in mind, the main thing they want me to talk about here is doing a kernel in D. And I submitted this not really as a useful thing; I just think it is kinda cool to play around with. There's a certain feeling of power you get when you know every line of code running on that processor right now is yours.
Back in the day, I would do a floppy disk load that had my little bootloader, which, I don't write those anymore; 512 bytes is all you had. But you would load up from the floppy disk and you don't wanna use DOS, you don't wanna use the BIOS, you just wanna get there and do it yourself.
So what we need to do is first get a hello world compiled. And of course, the talk yesterday did that on ARM processors. It's more or less the same thing: you start with an empty object.d, you strip out the runtime with the linker, and then you just try it and see what happens. What's the worst it'll say? "Undefined reference to whatever"? It's not a big deal.
So then what you do, for each one of those compile and linker errors, you go back and see about adding what you can add. Like, in object.d it'll complain about, for example, TypeInfo_Struct, and then, OK, you add one, and now it complains about a compiler mismatch: the size is not correct. So what did I do? I searched the dmd source code, I grepped the *.c files for that error message, and it comes up in, I think, typinf.c, where it tells you, oh, this needs to be 52 bytes. And then you just go ahead and... in my case, I used a static array of void*, and Michael yesterday used a ubyte array, same thing. You put it in, the compiler says 'oh, the size is right, I don't care, I'm moving on', so there you go.
Then, the next thing is, as you move on and use more and more of the language, this kind of stuff comes up. And you just keep searching, you keep pulling it back in. For example, an array literal is lowered into a function called _d_arrayliteralSOMETHING and, well, how can you make that work? Go ahead and search the D runtime, you'll find it in, I think, lifetime.d, and you can copy/paste it or you can write your own. And the beauty of the machine is that if the names match, the linker finds the symbol. The types don't necessarily need to match. You just need to get the size good enough to move on. And if you do that, you start to have a little bit of fun.
So, before long, you end up with an object.d that defines a minimal class Object, class Throwable, Exception, struct ModuleInfo, and a handful of TypeInfos that don't actually do anything except make the compiler shut up and you can move on.
Then, the next thing I wanted to do is say, 'well, now I have a minimal runtime, let's get the program running'. I was using dmd on Linux so I would change the linker command instead of the default link of Phobos and everything, just simply -nostdlib, no C library either, link it that way and you end up with a very small executable. It is about a 3 KB ELF file when you've defined absolutely nothing, but you get it compiled and it runs.
And then you need to have an entry point, because you're not using the C library, and what works there? Inline assembly! The most useful thing... that's a big reason why I use the Digital Mars D compiler. The inline assembly is so easy to use. When you look at gcc, you've got these bizarre strings and colons, and I look at that and say "What were they thinking?!"
When you look at the Digital Mars inline assembly, it's the same kind of thing you would write with a standalone assembler. It might not be able to do register clobbers, it might not be able to automatically infer those enregistered C variables, but you know what? It works and it is very easy to follow what's going on.
So, the next thing I did was define my entry point. Which is... like, ah, the compiler now expects _d_run_main, and this is a... it automatically generates a C main function that references this, so if you don't use that name, it is going to complain about an undefined symbol. So go ahead and define it! And then the next thing to do is to get it running in the Linux environment. So I defined that function and said do the system call exit(2). It ran; it did not segfault. Huge, huge accomplishment at that point.
And this brings me to another point about process. When you're doing things like CTFE, it is difficult to really figure out what's going on at compile time. You don't have a compile-time writeln, you don't really have a debugger. So, what I lean on is: let's get it working in a regular environment. Take your compile-time functions, just write them as regular functions, run them as a regular program. See what happens. Once it's debugged and you can print it to stdout, then go back and change it to a mixin. Then go back and use pragma(msg). Since CTFE runs such a large subset of the language nowadays, if you can make it work at runtime, it almost certainly will work at compile time, just by changing a few words.
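A small sketch of that workflow, assuming a hypothetical code generator called `makeGetter`: the same ordinary function is printed at run time with `writeln`, echoed during compilation with `pragma(msg)`, and finally executed through CTFE by a `mixin`.

```d
import std.stdio : writeln;

// An ordinary function: debug it at run time first, with writeln.
string makeGetter(string field)
{
    return "int get_" ~ field ~ "() const { return " ~ field ~ "; }";
}

// Once the output looks right, it can also be printed during compilation...
pragma(msg, makeGetter("x"));

struct Point
{
    int x = 5;
    // ...and the very same function feeds a mixin, evaluated via CTFE.
    mixin(makeGetter("x"));
}

void main()
{
    writeln(makeGetter("x")); // int get_x() const { return x; }
    Point p;
    writeln(p.get_x());       // 5
}
```

Nothing in `makeGetter` knows whether it runs at compile time or at run time, which is what makes the "debug it as a normal program first" approach work.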
And that was the same thing I'm doing in this kernel example. Bare metal is not really a special environment; it's still running the same code you ran before. And if you use, like, the GRUB bootloader, you don't even have to change the format of the file. All you have to say is, hey GRUB, load this ELF file. And it makes it really easy. I was using stock dmd to develop this.
So then, once it's running, you can go back and start adding more and more of the functions. And what gets interesting here is how much of the language is actually in object.d, how much of it is magic stuff from the compiler.
One of the first things you'll notice is there's an alias string = immutable(char)[]; declaration. String is not really built into the D language. Arrays of chars are, yes, but string itself is just an alias in object.d. Then you've got the TypeInfo, where the size needs to match because the compiler outputs a memory map of all this stuff. So, for TypeInfo_Class, you look at it and it says: you've got the initializer array, which the compiler just puts out; you've got a pointer to the vtable, which, again, the compiler just writes out as static data.
So, you need to have that matching. And I opened an enhancement request a while ago (and still haven't followed up on it. Everyone opens enhancements, nobody follows up, except I guess Kenji), but what I'd really like to do is, instead of outputting a static blob of data that we need to match in the library, just make those available through __traits.
For example, you can do classInstanceSize as a trait today, but you cannot get the initializer. So if you want to create a class, you need to 1) allocate the memory, 2) copy the initial data to it, and then 3) call a constructor. Those are the three steps that new class does.
And what if, instead of having to go typeid().init, we did some kind of trait? Well, what would be nice then is that TypeInfo would actually be opt-in. You would only have that initializer if you specifically requested it, and then you can do the layout however you want. And then, when you have an empty object.d, OK, it all works.
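The three steps of `new` listed above can be done by hand on a hosted system, which is roughly what a stripped-down runtime has to do. This is a sketch, not the talk's code: it assumes `malloc` as a stand-in allocator, and the `typeid().init` property mentioned above is spelled `initializer` in current druntime. `manualNew` and `Point` are illustrative names.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

class Point
{
    int x = 7; // default value, stored in the class's initializer image
    int y;
    this(int y) { this.y = y; }
    string describe() { return "Point"; } // virtual: needs the vtable pointer
}

/// Roughly the three steps `new` performs for a class, done by hand.
T manualNew(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);

    // 1) allocate raw memory (malloc stands in for whatever allocator you have)
    void* mem = malloc(size);
    assert(mem !is null);

    // 2) copy the compiler-generated image: vtable pointer, default field values
    const(void)[] image = typeid(T).initializer;
    mem[0 .. size] = image[];

    // 3) run the constructor on the now-initialized memory
    T obj = cast(T) mem;
    obj.__ctor(args);
    return obj;
}

void main()
{
    auto p = manualNew!Point(3);
    writeln(p.x, " ", p.y, " ", p.describe()); // 7 3 Point
    free(cast(void*) p); // no GC involved, so release it manually
}
```

Step 2 is the part that currently depends on the TypeInfo layout matching; a trait exposing the initializer would let it work without a TypeInfo at all.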
Now, not every form of TypeInfo can be done that way. As you move on, you use TypeInfo_Struct and you have a simple thing. Then you add an array to it [the program], and that has, I believe, just one member, which is TypeInfo next. So basically, if you have an int array and you do typeid on that, you get TypeInfo_Array, and .next is now TypeInfo int [TypeInfo_i].
So, that kind of thing, I guess you could go ahead and pull out with some is() expressions, but I think maybe it is easier to let the compiler continue making those.
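The `.next` chain is easy to poke at from ordinary D. A tiny sketch, with an illustrative helper named `innermost`, that follows `TypeInfo.next` from a nested array type down to its element type:

```d
import std.stdio : writeln;

/// Follow TypeInfo.next from an array type down to its innermost element type.
/// Non-array TypeInfos such as typeid(int) report null for next.
TypeInfo innermost(TypeInfo ti)
{
    while (ti.next !is null)
        ti = ti.next;
    return ti;
}

void main()
{
    TypeInfo ti = typeid(int[][]);
    writeln(ti);            // the outer array type
    writeln(ti.next);       // one level down
    writeln(innermost(ti)); // the element type at the bottom
}
```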
But then what gets interesting with TypeInfo int: right now, if you look at druntime, you go into rt/typeinfo and there's a whole bunch of manually written files. So it says, well, override int sizeof for TypeInfo int, it returns 4. The compiler, of course, knows all of this, and nowadays we have mixins and loops where we can automate that very quickly and very easily. And in fact, for my minimal.zip, which I would have brought if I were more prepared than a piece of paper, what I did in there was a loop where it goes foreach( TypeTuple!( all these built in types )) and it generates a class which pulls it out of __traits and mixes that in to make your TypeInfos.
And the only interesting thing there is the name. You look at those files and you see TypeInfo_i, TypeInfo_Aya, what's that about? But then you realize that's just the mangled name. i is the mangle of int, Aya is an array of immutable chars, and the compiler knows this, so all you have to do, when you're doing your mixin, is hit type.mangleof and it will form it, and you try this and it works. You no longer need to write all this runtime type information manually.
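A sketch of the mixin trick in modern D. It differs from the talk's minimal.zip in two ways: it uses `static foreach` and `std.meta.AliasSeq` (the newer name for `TypeTuple`), and it generates aliases to one generic templated class instead of pulling data out of `__traits`. `MyTypeInfo` and friends are hypothetical stand-ins so the code can run next to the real druntime; only the naming scheme, built from `.mangleof`, mirrors `TypeInfo_i` and `TypeInfo_Aya`.

```d
import std.meta : AliasSeq;
import std.stdio : writeln;

// A toy stand-in for TypeInfo (the real one lives in object.d).
abstract class MyTypeInfo
{
    abstract size_t tsize() const;
    abstract string mangled() const;
}

// One generic implementation; the compiler already knows sizeof and mangleof.
final class MyTypeInfoImpl(T) : MyTypeInfo
{
    override size_t tsize() const { return T.sizeof; }
    override string mangled() const { return T.mangleof; }
}

// Stamp out names like MyTypeInfo_i and MyTypeInfo_Aya from each type's mangle.
static foreach (T; AliasSeq!(byte, ubyte, short, ushort, int, uint, long, ulong,
                             float, double, char, string))
{
    mixin("alias MyTypeInfo_" ~ T.mangleof ~ " = MyTypeInfoImpl!T;");
}

void main()
{
    MyTypeInfo ti = new MyTypeInfo_i;
    writeln(ti.mangled(), " ", ti.tsize());  // i 4
    writeln((new MyTypeInfo_Aya).mangled()); // Aya
}
```

No per-type file is written by hand: adding a type to the list is enough to get its class under the mangled name.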
And what gets fun here is extending it. If you use the built in typeinfo as a key into an associative array of more type info then you can expand this without even modifying the runtime. So one thing is you want to be able to print any random data as a string. With a template, that's really easy. You can call to!string from std.conv.
But if you have just a block of data, like in the D runtime, most of the functions take two arguments: a void* to the data and a TypeInfo for the type that it is. And then it uses the TypeInfo to get how large this data is and how it can operate upon it. So if you're not working with templates and you want to print any random data, then you want to write a toString interface that takes that void* and then implement it. And at that point you just have a regular interface and class, but you get to extend it any way you want. And that is how to make a bridge between compile-time traits and runtime reflection.
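A minimal sketch of that bridge, with hypothetical names (`Formatter`, `registerFormatter`, `formatAny`): a template that knows `T` at compile time registers a function in an associative array keyed by the built-in `TypeInfo`, and code that only holds a `void*` and a `TypeInfo` looks it up at run time. Nothing in druntime is modified.

```d
import std.conv : to;
import std.stdio : writeln;

// Runtime side: a formatter only sees a pointer to the data.
alias Formatter = string function(const(void)* data);

// Extra "type info", keyed by the compiler's own TypeInfo objects.
Formatter[TypeInfo] formatters;

// Compile-time side: the template knows T, so it can build the bridge.
void registerFormatter(T)()
{
    formatters[typeid(T)] = function string(const(void)* p) {
        return to!string(*cast(const(T)*) p);
    };
}

static this()
{
    registerFormatter!int();
    registerFormatter!bool();
    registerFormatter!string();
}

/// Format a value given only a void* and its TypeInfo, the way druntime APIs pass data.
string formatAny(const(void)* data, TypeInfo ti)
{
    if (auto f = ti in formatters)
        return (*f)(data);
    return "<unformattable " ~ ti.toString() ~ ">";
}

void main()
{
    int x = 42;
    bool b = true;
    writeln(formatAny(&x, typeid(x))); // 42
    writeln(formatAny(&b, typeid(b))); // true
}
```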
A similar thing is you can make a static constructor for each individual type. If you look in object.d, there's this really cool thing called RTInfo, and right now it says enum RTInfo = null. It was added for precise garbage collection years ago. Nobody's really done that much with it, but it is one of the coolest hooks in the library because each and every user-defined type gets that instantiated, and I've used that in here for two different items.
One is you can run a lot of static checks on a type. A few years ago, Manu Evans was saying this virtual function thing is a huge problem, we need some way to figure it out. And I'm sitting there on the chatroom and I say well "challenge accepted".
At that point, we didn't have UDAs yet, so I wrote this /hideous/ piece of code that reflected over an entire module and then said: is that a virtual function? If yes, then look at a hand-written list of acceptable functions. If it's not there, static assert(0), it failed. And that kind of thing sounds really, really ugly, but you don't need a separate lint tool in D. You can do that kind of stuff if you want to use the compile-time reflection. You can't actually do all of it, but you can do a lot of it. And if you hook that RTInfo in object.d, you don't have to remember to mix in your checks on each module; you can just do it in one place and have it apply to your entire project.
Which is, I opened a pull request for that and it's been languishing because, again, I didn't follow up, but applying across your entire project means if you pull in libraries, it won't really work that well. I've got a work around for that too, the libraries can define their own mixin checks, you put them in individually, now it's getting complicated. But if you're in control of your own program, which you are in the kernel, and I understand Sociomantic uses a custom druntime as well and if you're a big organization and you need these very specific things, it's available to you and you can have a little fun.
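The talk's checker predates UDAs and used a hand-written list of allowed functions. A sketch of the same idea today could use a user-defined attribute instead. `virtualOk` and `noSurpriseVirtuals` are made-up names, and here the check is invoked by an explicit `static assert` rather than through the `RTInfo` hook, which would require a custom object.d.

```d
import std.traits : hasUDA;

/// Hypothetical UDA marking a virtual function as intentional.
struct virtualOk {}

/// Compile-time lint: fails the build if T declares a virtual method
/// that isn't explicitly tagged @virtualOk.
bool noSurpriseVirtuals(T)() if (is(T == class))
{
    foreach (member; __traits(derivedMembers, T))
    {
        foreach (overload; __traits(getOverloads, T, member))
        {
            static if (__traits(isVirtualMethod, overload) && !hasUDA!(overload, virtualOk))
                static assert(0, T.stringof ~ "." ~ member ~ " is virtual but not tagged @virtualOk");
        }
    }
    return true;
}

class Widget
{
    @virtualOk void draw() {} // allowed: tagged
    final void helper() {}    // fine: final, not virtual
    static void factory() {}  // fine: static, not virtual
    // void oops() {}         // uncommenting this fails compilation
}

static assert(noSurpriseVirtuals!Widget());

void main() {}
```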
Where was I? *awkward silence* Yeah, so once you got that reflection set up, you can go back and query it all at run time. A static constructor will be, you can define a separate static list [constructor] on module scope for each and every item and the compiler will automatically combine these for you and see that they're all run. And then you can build up global data to get what you need and query it later, just like any other global variable.
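A small sketch of per-item static constructors, with hypothetical names: each `mixin Register!T` adds one more module-level `static this`, the compiler collects them all, and by the time `main` runs the global list is filled in. Static constructors in one module run in lexical order.

```d
import std.stdio : writeln;

string[] registeredTypes; // ordinary global data, built up at startup

/// Each use of this mixin contributes one more static constructor to the module.
mixin template Register(T)
{
    static this() { registeredTypes ~= T.stringof; }
}

struct Foo {}
mixin Register!Foo;

struct Bar {}
mixin Register!Bar;

void main()
{
    writeln(registeredTypes); // ["Foo", "Bar"]
}
```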
And what I really like about that is you can go absolutely wild. Anything you can imagine can be done at runtime. To bring it back to the kernel in D, though, static constructors are actually a D runtime feature. The compiler will output a list of modules and this is a bizarre symbol called _Dmodule_ref, but the linker puts all that together and as you loop over that, you'll see that the compiler provides pointers to its unittests, to its static constructors, to its static destructors, and what you need to do is run those yourself if you're stripping out the runtime.
So, going back to the idea of the program entry point, you gotta first find this data, loop over it, run all that stuff, and then finally call your actual main. And that's not as expensive as you would think. Even getting that working, I was up to about 20 KB for the minimal program.
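On a normal hosted runtime, the same per-module records are reachable through `ModuleInfo`, which can be iterated with `foreach`. This sketch (the helper name is illustrative) only checks whether a module has a thread-local constructor. A stripped-down entry point would walk the same kind of data and call `m.tlsctor()` itself before `main`, but it would have to locate the records on its own rather than through druntime's `foreach` support.

```d
import std.stdio : writeln;

int initialized; // set by this module's static constructor

static this() { initialized = 1; }

/// Look up a module's record, as a runtime entry point would before
/// deciding which constructors to call.
bool hasThreadLocalCtor(string moduleName)
{
    foreach (m; ModuleInfo) // walks every linked-in module record
    {
        if (m is null)
            continue;
        if (m.name == moduleName)
            return m.tlsctor !is null;
    }
    return false;
}

void main()
{
    writeln(__MODULE__, " has a static this: ", hasThreadLocalCtor(__MODULE__));
}
```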
So then you say, well, how about we have a little more fun? How much of this language can we get to work? And at that point, I started to look into classes. Classes need a lot of runtime support. Dynamic casting is done by a function called, you might have guessed it, _d_dynamic_cast. It follows a pretty predictable pattern.
And what that does is you look at the typeinfo [vtable entry 0] and say 'is this legitimately a base of that other class, if yes, go ahead and return the offsetted object'. Allocating a new class, that calls, _d_newclass and you can make that work.
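A class-only sketch of the check a `_d_dynamic_cast`-style function has to perform: walk `TypeInfo_Class.base` upward from the object's dynamic type. This is not druntime's implementation. Casts involving interfaces, which need the pointer offset mentioned above, are left out, and `manualCast` is a made-up name.

```d
import std.stdio : writeln;

/// Is `derived` the same class as `base`, or one of its descendants?
bool isBaseOf(TypeInfo_Class base, TypeInfo_Class derived)
{
    // Within one executable each class has a single TypeInfo_Class object.
    for (auto c = derived; c !is null; c = c.base)
        if (c is base)
            return true;
    return false;
}

/// Class-to-class cast: the object itself if its dynamic type derives from `target`, else null.
Object manualCast(Object o, TypeInfo_Class target)
{
    if (o is null)
        return null;
    return isBaseOf(target, typeid(o)) ? o : null;
}

class Animal {}
class Dog : Animal {}
class Rock {}

void main()
{
    Object pet = new Dog;
    writeln(manualCast(pet, typeid(Animal)) !is null); // true
    writeln(manualCast(pet, typeid(Rock)) !is null);   // false
}
```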
And the next question, of course, is memory management. So now you're allocating memory, when does it get freed? The D runtime itself assumes that you have a garbage collector. And we've heard a lot of hate for the GC; I kinda love it. I remember, it's almost been ten years, but I would say "garbage collectors are for wimps who don't understand destructors." I would go on the C++ forums and say "all you need to do is define a struct with a destructor. What are you worrying about? That works really well." It actually does work really well, but so do garbage collectors in the vast majority of cases.
And then you stop thinking about ownership. There was one time I was using the Qt library (or is that pronounced 'cute'? But whatever), so I'm working with that and my question is, well, who owns the memory that I just gave to it? So I started poring through the documentation and I did get the answer, but it took twenty minutes to figure out that the library in fact took ownership.
With the garbage collector, who cares? The GC owns it all, you just do it, you get it done, and you move on. Now the garbage collector gets hate for performance and I actually agree with that and spend a lot of time figuring out how to avoid it inside tight loops. But, Walter said earlier you're usually wrong about the hot spots. That's true about GC too, you go into it and say well the garbage collector is making my program slow. Are you sure? Go ahead and profile it and just confirm that before you go and rip out your whole architecture.
Now, if you, on the bare metal, I was hoping to port the D garbage collector to it and get that running. It did not come to pass.
So, instead I was, I wrote a very simple push-the-pointer memory allocator where, you know, it goes to allocate a class, you need 80 bytes, just bump the pointer up to the next one. And then the, I did not write a free. You know, malloc is easy to write. free, that's difficult. But once you're into that, you can now suddenly start working with classes in the bare metal environment. And what's cool about that is how many things actually just work.
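A minimal sketch of a push-the-pointer allocator over a fixed arena. `BumpAllocator` is an illustrative name, and the 16-byte default alignment is an assumption. As in the talk, there is no per-allocation free; the only way to get memory back is a `reset` that forgets everything at once.

```d
import std.stdio : writeln;

struct BumpAllocator
{
    ubyte[] arena; // on bare metal this might be a linker-reserved region
    size_t used;

    /// Returns `size` bytes aligned to `alignment` (a power of two), or null when full.
    void[] allocate(size_t size, size_t alignment = 16) @nogc nothrow
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const base = cast(size_t) arena.ptr;
        // Round the next free address up to the requested alignment.
        const start = ((base + used + alignment - 1) & ~(alignment - 1)) - base;
        if (start > arena.length || size > arena.length - start)
            return null;
        used = start + size; // "bump" the pointer past this block
        return arena[start .. start + size];
    }

    /// No free: everything goes at once.
    void reset() @nogc nothrow { used = 0; }
}

void main()
{
    static ubyte[256] memory;
    auto heap = BumpAllocator(memory[]);
    auto a = heap.allocate(80); // e.g. a class instance that needs 80 bytes
    auto b = heap.allocate(10);
    writeln(a.length, " ", b.length); // 80 10
}
```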
Virtual functions, they just work. All your class members, they just work. Thread local storage... doesn't just work. That drove me nuts. When you're working on the bare metal, you don't have the operating system to define these thread buffers for you so if you actually use one, the linker will throw you a warning and it's like undefined behavior whatever. Me? I ignore it, it's a warning, who cares about warnings? And then you run it and your program crashes in all kinds of miserable ways.
Luckily, D also supports the shared and __gshared storage classes, which let you tell it "trust me, I've got this under control", and once you do that, now you've got access to the whole BSS segment where you can put your static data; you can put global variables and get stuff done. But now you've written yourself off from using the majority of D code out there. Even if you manage to implement the rest of the runtime functions, they're going to assume that you have TLS. druntime almost immediately allocates some thread-local data, and then the garbage collector allocates a thread context for you. That is fine under normal circumstances, but when you don't have a GC and you don't have threads, it's very annoying.
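On a hosted system the difference between thread-local and `__gshared` globals is easy to see. This sketch uses `core.thread.Thread` to show that a second thread gets its own copy of a default module-level variable, while a `__gshared` one behaves like a single C global. The function name `demo` is illustrative.

```d
import core.thread : Thread;
import std.stdio : writeln;

int perThread = 1;        // module-level variables are thread-local by default
__gshared int global = 1; // one copy for the whole program, like a C global

/// Returns what the main thread sees after another thread writes to both.
int[2] demo()
{
    perThread = 1;
    global = 1;
    auto t = new Thread({
        perThread = 100; // writes the new thread's own copy
        global = 100;    // writes the single shared copy
    });
    t.start();
    t.join();
    return [perThread, global];
}

void main()
{
    writeln(demo()); // [1, 100]
}
```

Without an operating system setting up per-thread storage, it is the first kind of variable that breaks, which is why the bare-metal code falls back to `__gshared`.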
And moving on from there, you also need to understand where to free the memory. I didn't write free, but you need to pretend that you did. And this is actually relatively easy because you can use scope(exit), scope(failure), and scope(success). They let the resource management be written right at the usage point, and, interestingly, that compiles down into try/catch/finally statements, which I guess is Walter's lazy way of implementing it correctly, so it really turns into what you would have done by hand, but it looks a lot better. But that needs all the exception support, which I'll get to in a minute.
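A sketch of the scope guards next to a hand-written equivalent. The exact lowering the compiler performs may differ; `byHand` is an approximation that records the same log as `withGuards` on both the success path and the failure path. Guards run in reverse order of declaration.

```d
import std.stdio : writeln;

string[] log;

void withGuards(bool fail)
{
    log ~= "acquire";
    scope(exit) log ~= "release";     // always runs
    scope(success) log ~= "commit";   // only if no exception escapes
    scope(failure) log ~= "rollback"; // only if one does
    if (fail)
        throw new Exception("boom");
    log ~= "work";
}

/// Roughly the try/catch/finally nesting the guards above turn into.
void byHand(bool fail)
{
    log ~= "acquire";
    try
    {
        bool failed = false;
        try
        {
            try
            {
                if (fail)
                    throw new Exception("boom");
                log ~= "work";
            }
            catch (Throwable t)
            {
                log ~= "rollback"; // scope(failure)
                throw t;
            }
        }
        catch (Throwable t)
        {
            failed = true;
            throw t;
        }
        finally
        {
            if (!failed)
                log ~= "commit";   // scope(success)
        }
    }
    finally
    {
        log ~= "release";          // scope(exit)
    }
}

void main()
{
    withGuards(false);
    writeln(log); // ["acquire", "work", "commit", "release"]
    log = null;
    try { withGuards(true); } catch (Exception) {}
    writeln(log); // ["acquire", "rollback", "release"]
}
```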
But once you clear that and you free the memory, then you're more or less all set to write code in a C style. The big problem then is when you go into libraries. They wanna keep references to your stuff. They say well, this is a garbage collected language, who cares about ownership? So you can't really use a lot of libraries but you can use more than you think.
If you look at Phobos, the std.digest package is allocation free. base64 is allocation free. std.algorithm... almost never uses it. std.traits, rarely. And you can build a lot of really cool stuff off of just these modules. The problem is then you go import std.algorithm, it wants to get std.range. std.array, std.string, it wants to pull in about 30 modules when you want to use just one Phobos function, which irks me a lot.
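A small check of the allocation-free claim for one package, assuming a recent Phobos: marking the function `@nogc` makes the compiler reject it if anything inside, the `std.digest.crc` calls included, would allocate from the GC. `checksum` is an illustrative name; std.base64 is not exercised here.

```d
import std.digest.crc : CRC32;
import std.stdio : writefln;

/// CRC-32 of `data`, computed entirely on the stack.
uint checksum(scope const(ubyte)[] data) @nogc nothrow
{
    CRC32 crc;                     // plain struct, no heap allocation
    crc.start();                   // reset to the initial state
    crc.put(data);
    ubyte[4] bytes = crc.finish(); // CRC32 hands back little-endian bytes
    return bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (cast(uint) bytes[3] << 24);
}

void main()
{
    writefln("%08X", checksum(cast(const(ubyte)[]) "abc")); // 352441C2
}
```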
When I wrote my little simpledisplay.d, I explicitly avoided Phobos entirely. Because then you compile it and it's done; it's so fast, you think the program's not working. It is only compiling my 6000-line monster module, but it's done in a snap. Then if you import any one Phobos module, now it's also compiling 100,000 lines of Phobos code, it's instantiating all kinds of templates, and it feels slow.
I remember back when I was using D1: hit make, it's done before you realize it. Wow, those are fast compile times. D2 has kinda lost that because there's so much CTFE, which is brutally, brutally slow. One of my work projects uses my web.d, which generates wrapper functions for each and every method in an object to make them available from the web. Really cool stuff. Then you hit make, memory usage goes up to 2 GB, it starts swapping, and the next thing you know, two minutes later, your build is finally finished, and yeah, so much for fast compile times at that point. But typical D does not have to worry about that.
Anyway, when we get into the runtime, another important thing is to disable the invariants and the bounds checking. Those are also runtime functions. When you do an array index, it will call _d_boundscheck, which in turn calls _d_assertm, which then wants your full-blown ModuleInfo available, and the next thing you know, it's all rolled up and you've gotta have 30 or 40 kilobytes of nonsense. And yeah, we have a 1 TB hard drive with 8 GB of memory, but 30 /kilobytes/? Have some sense, people, we can't have that!
So, when I compile for bare metal, I like to use the -release and -noboundscheck switches to the compiler, and it just simplifies how much you need. Otherwise, you have to define invariant.d, you have to define all these functions. It's doable, but it's a hassle.
So, now that we have the program running and we have a fair chunk of the D language working, we start to write code. And one of the most interesting pieces of code to write on bare metal is an interrupt handler. If you've never done that before, you've gotta get everything right or the processor, it will fault, which tries to trigger an interrupt handler to handle the error, that fails. And then next thing you know, it restarts. And you don't get error messages, it just resets, loads your system again, resets, loads your system again, it's not fun.
So a lot of times, when you see hobbyist operating system development, they tell you to write these in plain assembly; have a separate file. I don't wanna do that! I wanna write D. So that's where the inline assembly comes in. And more than just that, you have naked inline assembly. You've got this disgusting pornographic view of your functions where everything's on display. There's no stack frame, there's no ret, you've gotta do it all yourself.
Which is exactly what you need in an interrupt handler! And you know, even if you're trying to write micro-optimized code, setting up that stack frame kinda costs a lot when you call it a bunch of times. But, you're supposed to inline it [the function] at that point, not really do naked functions.
But anyway, once you get into your interrupt handler, you need to have two things: one is a pointer to the function which I'll talk about in a minute and two is the actual handler itself. So what you do is you preserve the registers you use, you need to handle the interrupt, acknowledge it with the interrupt controller, which is a simple call to the out instruction, and then you return, you write iret. Easy enough, run the program.
Triple fault. What?! I sat on that one for about half an hour. Why isn't it returning? I called iret. Let's look at the disassembly... iret double [iretd]. Yup, sometimes you have to disassemble your assembly, because the dmd compiler, believe it or not, outputs a 16-bit interrupt return opcode when you're in 32-bit mode. You have to explicitly write iretd, for double, which is a 32-bit return.
And that's just the kind of joy you have when you decide to dive into this field. But, you know, it's really cool when it works, because then you press a key and something happens and you say that's all my code. You press a key... nothing happens. You forgot to acknowledge the interrupt with the slave controller. Oops. But, you know, one thing at a time.
Now, the next thing is to tell the processor where these interrupts are, and the x86 architecture has some really bizarre memory layouts. It's compatible way back with, like, the 8080 processor, and they kept adding stuff onto it. So instead of being a simple pointer to a function, no, it's a length, it's a, you've got the pointer broken up into like three things: you've got the low 16 bits, you've got a flags, and then they added 32 bits, so now you've got a middle 8 bits and an upper 8 bits, and it's all in this bizarre structure.
That's OK, you open up the documentation, you write your structure conforming to what you read... triple fault. What? Turns out structs have what's called alignment. So for this one, you need to have a ushort and a uint. And what the compiler does by default is align integers on four-byte addresses. So when you put your ushort, then you would be on a multiple of 2, not a multiple of 4, so the integer is then put on the next one, which is down here. But what are these two bytes? Those are called padding.
And D's struct alignment has two subtly different align attributes. If you put align on the fields themselves, then instead of putting it on, say, that multiple of 4, you can say align(1) and it will put it on the next multiple of one, thereby putting your ushort and your uint right up together.
But then you say struct.sizeof. It still says 8. Why? Because the structure itself has an alignment. So, instead of really getting rid of that padding, you just moved it to the bottom. And that's where the other align directive comes in. You put that on the outside of the struct so you go align(1) struct InterruptLocation and now all of a sudden you can finally get down to sizeof equals 6.
So alignment on the fields pushes the padding to the end of the struct; alignment on the struct gets rid of those padding bytes at the end. And if you wanna have your little compact, packed structs that you can put in an array and have no padding at all, you need to use both of those aligns.
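A sketch of the `lidt` operand from the talk (`InterruptLocation`: a 16-bit limit followed by a 32-bit base) in three variants, plus one 32-bit interrupt gate showing how the handler address gets split. `Unpacked`, `FieldsOnly`, and `IdtEntry` are illustrative names. The talk reports that the field-only variant is still 8 bytes with dmd at the time; check what your compiler prints, since the variant with align(1) both outside and inside is the one the 6-byte layout depends on.

```d
import std.stdio : writeln;

struct Unpacked                   // default alignment
{
    ushort limit;
    uint base;                    // padded out to offset 4
}

struct FieldsOnly                 // align(1) on the fields only
{
align(1):
    ushort limit;
    uint base;                    // now packed at offset 2
}

align(1) struct InterruptLocation // align(1) outside *and* inside
{
align(1):
    ushort limit;
    uint base;
}

static assert(Unpacked.sizeof == 8 && Unpacked.base.offsetof == 4);
static assert(FieldsOnly.base.offsetof == 2);
static assert(InterruptLocation.sizeof == 6); // what lidt expects

/// One 32-bit x86 interrupt gate: the handler address is split into two halves.
struct IdtEntry
{
    ushort offsetLow;  // handler address bits 0..15
    ushort selector;   // code segment selector
    ubyte zero;        // reserved, must be 0
    ubyte typeAttr;    // 0x8E = present, ring 0, 32-bit interrupt gate
    ushort offsetHigh; // handler address bits 16..31

    void set(uint handler, ushort codeSelector) @nogc nothrow
    {
        offsetLow = cast(ushort)(handler & 0xFFFF);
        offsetHigh = cast(ushort)(handler >> 16);
        selector = codeSelector;
        zero = 0;
        typeAttr = 0x8E;
    }
}

static assert(IdtEntry.sizeof == 8); // fields line up naturally, no align needed

void main()
{
    writeln("Unpacked: ", Unpacked.sizeof);                   // 8
    writeln("FieldsOnly: ", FieldsOnly.sizeof);               // the talk reports 8 here
    writeln("InterruptLocation: ", InterruptLocation.sizeof); // 6
}
```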
Which is so easy to forget. When I was working on my book, one of the things that I put in was struct alignment. And then I said, well, why did I write align twice? And I erased one of them. And then the reviews came back: your static assert failed, what kind of nonsense are you sending us?
So I look at it and I'm like, oh, both of those are necessary, and it wasn't until that point that I realized what the two of them actually do. It's kinda like the is() expressions too; until recently I would just copy/paste the same ones. I got it right once, let's copy/paste that. And then, those of you who've done Win32 programming, did you even know what RegisterClass was for the first, like, year you did it? I didn't. Well, the examples say you need to register a class. I'll copy/paste it. And then some time later, I realized, oh, that actually does something and it is useful to know what.
Well, anyway, just to finish the is expression: when I finally looked at the dlang.org documentation, it shows seven, seven different forms, and you say, well, what do these have to do with one another? And when I was writing the book, I realized each one is actually an addition to the one before. So when you say is(), you write the type you wanna compare and then either == or :. The way I remember it, colon is similar to inheritance: it lets you implicitly cast; equals is for an exact relationship. And then you write a mock declaration and then a comma and a list of all the wildcards you used. And finally you have your is() expression, and you can break down complex type definitions to anything you want. You can also stick an alias in it, and this is like type #6 in the documentation, but it really is just the same idea: you alias the type that it happens to match.
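A few of those forms side by side, each building on the previous one. `ArrayElement`, `KeyOf`, and `describe` are illustrative helpers, not Phobos names.

```d
import std.stdio : writeln;

// Basic forms: is(Type), is(Type : Spec), is(Type == Spec)
static assert(is(int));                         // a valid type
static assert(is(int : long));                  // colon: implicitly converts, like inheritance
static assert(!is(int == long));                // ==: exact match only
static assert(is(string == immutable(char)[])); // string is just an alias from object.d

/// Pattern form: the mock declaration E[], then the wildcard E after the comma.
template ArrayElement(T)
{
    static if (is(T == E[], E))
        alias ArrayElement = E;
    else
        static assert(0, T.stringof ~ " is not a dynamic array");
}

/// Two wildcards in one mock declaration: V[K], with V and K listed after the comma.
template KeyOf(T)
{
    static if (is(T == V[K], V, K))
        alias KeyOf = K;
    else
        static assert(0, T.stringof ~ " is not an associative array");
}

/// Identifier form: is(T Ptr == U*, U) also names the matched type as Ptr.
string describe(T)()
{
    static if (is(T Ptr == U*, U))
        return Ptr.stringof ~ " points to " ~ U.stringof;
    else
        return T.stringof;
}

static assert(is(ArrayElement!(int[]) == int));
static assert(is(KeyOf!(double[string]) == string));

void main()
{
    writeln(describe!(int*)()); // int* points to int
    writeln(describe!int());    // int
}
```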
Anyway, I'm off topic now. Like I had a topic to begin with. So now that you have your interrupts loaded, the next thing you want to do is start working with your hardware. And most hardware is actually really really simple to use. It's memory mapped: basically, as far as your program is concerned, it is just a magical array.
For example, if you want to write text output to the screen: this is one of those memory addresses I remember well from when I was a DOS programmer, because printf was so brutally slow that you'd always do it yourself, and you don't wanna call the BIOS because that's insane. So you would go to address 0xb8000, and if you write to that, you'll see a letter appear in the upper left of your screen.
And how do you do that without going wild? My answer was simply: let's write a struct and put in our own bounds checks and operator overloads. And operator overloads, by the way, do not require any runtime support. The compiler just magically rewrites them into regular function calls. Very convenient.
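A sketch of the struct approach, assuming the usual VGA text-mode layout of 80x25 cells, each a 16-bit value with the character in the low byte and the colour attribute in the high byte. TextBuffer is a hypothetical name, and silently ignoring out-of-range writes is just one possible policy for the hand-written bounds check; it runs against an ordinary array here so it can be tried outside a kernel.

```d
import std.stdio : writeln;

// A bounds-checked view over a VGA-style text buffer. On bare metal you would
// point `cells` at cast(ushort*) 0xb8000; here any array of width * height
// cells works, so the sketch runs anywhere.
struct TextBuffer {
    enum width = 80;
    enum height = 25;

    ushort* cells;          // one 16-bit cell per character position
    ubyte attribute = 0x07; // high byte of each cell: light grey on black

    // `screen[row, col] = c` is rewritten by the compiler into
    // `screen.opIndexAssign(c, row, col)`: a plain function call.
    void opIndexAssign(char c, size_t row, size_t col) {
        if (row >= height || col >= width)
            return; // our own bounds check: ignore off-screen writes
        cells[row * width + col] = cast(ushort)((attribute << 8) | c);
    }

    // `screen[row, col]` reads back the character (low byte of the cell).
    char opIndex(size_t row, size_t col) const {
        if (row >= height || col >= width)
            return ' ';
        return cast(char)(cells[row * width + col] & 0xFF);
    }
}

void main() {
    auto backing = new ushort[](TextBuffer.width * TextBuffer.height);
    auto screen = TextBuffer(backing.ptr);
    screen[0, 0] = 'H';
    screen[0, 1] = 'i';
    screen[30, 0] = '!'; // row 30 is off-screen: ignored
    writeln(screen[0, 0], screen[0, 1]); // Hi
}
```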
And then once you have that running, you just write to it and you get stuff out. And then the question is: what about memory that's updated by hardware? This value is going to change on you, and you don't want the compiler to be caching it, and this is the big volatile question. I looked into it and some people said use shared, that will tell it... it kinda does, it kinda doesn't. With dmd, you say, well, dmd doesn't care at all, you don't need volatile in it, but that could change. There's only one way that I've found that you can rely upon: inline assembly. Note: a new peek/poke implementation just got in.
And that's a bit crazy, because you wanna read and write to these registers and it's a somewhat common operation, and if you're writing, you know, asm blocks all the time, you say, well, what's the point of D? And that's where metaprogramming comes in. You can write these nice little mixin functions which will generate the asm for you, and then you write it at a higher level; you still get the assembly, so you know exactly what's happening. Michael talked about that yesterday too; that was a really fun talk, I'm glad I caught it on the live stream.
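The talk generates inline asm from mixins. The sketch below keeps the mixin-generated accessors but swaps in druntime's core.volatile intrinsics, volatileLoad and volatileStore, which current druntime provides for this kind of access; it does not show the asm approach. Register, Device, and the register offsets are hypothetical, and a plain array stands in for the hardware.

```d
import core.volatile : volatileLoad, volatileStore;
import std.stdio : writeln;

// Generates a getter/setter pair for a 32-bit register at `offset` bytes past
// `base`. The string mixin supplies only the name; every access goes through
// volatileLoad/volatileStore, so the compiler may not cache or drop it.
mixin template Register(string name, size_t offset) {
    mixin("uint " ~ name ~ "() { return volatileLoad(cast(uint*)(base + offset)); }");
    mixin("void " ~ name ~ "(uint value) { volatileStore(cast(uint*)(base + offset), value); }");
}

// Hypothetical device with two registers; on real hardware `base` would be
// the device's memory-mapped address.
struct Device {
    ubyte* base;
    mixin Register!("status", 0);
    mixin Register!("control", 4);
}

void main() {
    uint[2] fakeRegisters; // stands in for the hardware
    auto dev = Device(cast(ubyte*) fakeRegisters.ptr);
    dev.control = 0x1234;               // calls control(0x1234)
    writeln(fakeRegisters[1] == 0x1234); // true
    fakeRegisters[0] = 7;               // the "hardware" changes the status register
    writeln(dev.status);                // 7: re-read on every call
}
```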
But that's more or less what you do. And inline assembly is just one of those things that gives you infinite flexibility. Another thing: on IRC, someone asked, well, I wanna have the address of a label as a variable. And I don't wanna say you can't do that; you can do that, it is just gonna be ugly code. And the next thing I know, I'm working on inline assembly and you can get it at runtime using the dollar sign, you can poke your own memory and pull the numbers out that way. Not really acceptable.
But you can take the address of a naked function. You can use that workaround. I digress again.
Talking about taking addresses of functions: when you pass a delegate to a function, that allocates memory, so how do you handle that when you don't have the GC to clean up after you? The best answer is probably you don't: you should let that continue to be a linker failure, where it says there's no function _d_allocmemory, so you won't accidentally use it and you can work something else out.
I think the best way to do a closure is to write your own struct or class, copy the data you specifically need, and then send that little object instead. That way, it's very clear what's happening, it's very clear what you want and what you need, and you have control over the ownership, but that's not as beautiful as the built-in feature.
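A sketch of the hand-rolled closure: the captured value is copied into a small struct with an opCall, and the callee takes any callable as a template parameter, so nothing needs to be heap-allocated. GreaterThan and countIf are hypothetical names.

```d
import core.stdc.stdio : printf;

// The hand-rolled closure: copy exactly the data the callback needs into a
// small struct. It is passed by value, so ownership is obvious and no heap
// allocation is involved.
struct GreaterThan {
    int threshold;

    bool opCall(int x) const pure nothrow @nogc {
        return x > threshold;
    }
}

// Takes any callable as a template parameter; with GreaterThan this is just a
// struct copy plus a direct call.
size_t countIf(Pred)(const(int)[] data, Pred pred) {
    size_t n;
    foreach (x; data)
        if (pred(x))
            ++n;
    return n;
}

void main() @nogc nothrow {
    static immutable int[4] readings = [1, 5, 12, 20];
    int threshold = 10;
    // Instead of a delegate literal capturing `threshold`, copy it in explicitly.
    printf("%d\n", cast(int) countIf(readings[], GreaterThan(threshold)));
}
```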
So, being a madman, I went ahead and wrote _d_allocmemory and returned a block. Well, how do you free that? That's when you break down the delegate into its two pieces. There's a function pointer and a data pointer in any delegate. The function pointer is pretty clearly, you know, just plain old data. The data pointer is what gets interesting. It might be a pointer to a class, it might be a pointer to a struct, it might be a pointer to some random stack frame copied over onto the heap. So what I did there: I know the address range of my magical heap, the one _d_allocmemory allocates from, so I check that data pointer. Is it within that section of memory? If so, free it; if not, let somebody else handle it.
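A sketch of the delegate breakdown, assuming a fixed region (closureHeap, hypothetical) stands in for the heap a custom _d_allocmemory hands out. It only reports whether the context pointer is ours; the actual release call depends on your allocator and is left out.

```d
import std.stdio : writeln;

// Hypothetical fixed region standing in for the "magical heap" that a custom
// _d_allocmemory would hand closure frames out of.
__gshared ubyte[4096] closureHeap;

bool inClosureHeap(const(void)* p) {
    const(ubyte)* start = closureHeap.ptr;
    const(ubyte)* q = cast(const(ubyte)*) p;
    return q >= start && q < start + closureHeap.length;
}

// A delegate is two pointers: .funcptr (code; nothing to free) and .ptr
// (context: a class, a struct, or a stack frame copied to the heap).
// Only the context can be ours, so that is the pointer we check.
bool releaseIfOurs(Dg)(Dg dg) if (is(Dg == delegate)) {
    if (!inClosureHeap(dg.ptr))
        return false; // someone else's memory: leave it alone
    // A real implementation would hand dg.ptr back to its allocator here;
    // this sketch only reports that the block is ours.
    return true;
}

struct Counter {
    int n;
    int next() { return ++n; }
}

void main() {
    Counter c;
    auto dg = &c.next;
    writeln(dg.ptr is cast(void*) &c); // true: the context is the struct itself
    writeln(dg.funcptr !is null);      // true: the code pointer
    writeln(releaseIfOurs(dg));        // false: not from closureHeap
}
```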
So again, is it possible? Yes. Is it beautiful code that you got done on time? Maybe, maybe not.
Oh, another one: a feature that's fun to use, and that's switch on a string. This, believe it or not, is just another very simple library call. It goes into _d_switch_string, and it passes you a pointer to the string and an array of the compile-time options. And the compiler is kind enough to sort that array for you, so all you have to do is a quick binary search of it.
Now, fun fact: I was looking at the druntime source code, and I looked at this _d_switch_string and I was like, "why is it doing a linear search? This is stupid, you can rewrite this into a binary search", and I started to do a pull request for it. Then I realized I was looking at the in contract. So, lesson learned there: make sure you're actually running the code you're looking at, and if you wanna call the code stupid, look for the beam in your own eye first; make sure you're not the one being a fool.
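A sketch of the lookup, with the linear sortedness check kept in the in contract and the binary search in the body. switchIndex is a hypothetical name, and plain lexicographic order is an assumption made for simplicity; druntime's helper may sort its table differently.

```d
import std.stdio : writeln;

// Binary search over case labels sorted in ascending order; returns the
// index of the matching case, or -1 for the `default:` branch.
int switchIndex(const(char)[] value, const(string)[] sortedCases)
in {
    // The linear scan lives in the contract (compiled out with -release)...
    foreach (i; 1 .. sortedCases.length)
        assert(sortedCases[i - 1] < sortedCases[i], "case labels must be sorted");
}
do {
    // ...while the body does the O(log n) lookup.
    size_t lo = 0, hi = sortedCases.length;
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        if (value == sortedCases[mid])
            return cast(int) mid;
        if (value < sortedCases[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

void main() {
    static immutable string[] cases = ["apple", "banana", "cherry"];
    writeln(switchIndex("banana", cases)); // 1
    writeln(switchIndex("durian", cases)); // -1
}
```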
Now, let's talk about exceptions. To work with scope(exit), you don't strictly need the full exception support. In druntime there's a function called _d_throw; it's _d_throwc on dmd, just _d_throw on gdc. And this is a really cool function, because it takes an object and then it looks up the exception handling data, which is just a static block the compiler put out, and it uses that along with the base pointer, which brings us to a quick intro to the C calling convention.
What happens is the caller pushes your arguments onto the stack and then you enter your function. The first thing you do is push the caller's bp and save a copy of the stack pointer into the bp [base pointer] register, and then you move the stack pointer aside to make room for your local variables. In naked asm functions, the compiler does not do this, but it does tell you the size of those locals so you can do it yourself.
And you know, I did not know about that when I was writing assembly way back then, so I would always use static global variables and reuse them. It's not elegant; allocating stack space is the beautiful way. It's one of those things where, you know, looking at a C disassembly makes you a better assembly programmer, and if you know how the assembly works, you know how C works too, and you won't make silly mistakes based on non-optimal instructions.
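A model of the frame chain that the prologue builds: each frame's saved bp points at the caller's frame, with the return address just above it. This is not a real unwinder; reading a live stack needs inline asm, frame pointers the compiler actually emitted, and a platform-specific layout. Here the frames are ordinary structs so the walk can be followed safely. Frame and walkFrames are hypothetical names.

```d
import std.stdio : writeln;

// Model of the chain built by the `push bp; mov bp, sp` prologue: at [bp]
// sits the caller's saved bp, and just above it the return address.
struct Frame {
    Frame* savedBasePointer; // caller's frame; null marks the end in this model
    size_t returnAddress;    // where execution resumes in the caller
}

// Follow the saved base pointers upward, collecting return addresses, the
// way an unwinder or stack tracer walks a real stack.
size_t[] walkFrames(const(Frame)* bp) {
    size_t[] addresses;
    for (; bp !is null; bp = bp.savedBasePointer)
        addresses ~= bp.returnAddress;
    return addresses;
}

void main() {
    // Hypothetical chain: outer called middle, which called inner.
    auto outer = Frame(null, 0x1000);
    auto middle = Frame(&outer, 0x2000);
    auto inner = Frame(&middle, 0x3000);
    writeln(walkFrames(&inner)); // [12288, 8192, 4096]
}
```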
Well, anyway, the D throw function will use this information to find the return address on the stack and walk its way up. In the dmd implementation, you've got separate little inline assembly blocks for OS X and for Linux, and Windows uses an entirely different system: Microsoft Structured Exception Handling, which is a beautiful thing, and I don't know why the others don't bother.
That's also why on Windows, if you write to a null pointer, it throws an exception saying Access Violation instead of doing a segfault signal. Windows actually understands all that stuff, it is a beautiful system, I love Windows.
Anyway, once you copy/paste that function into your little object.d, exceptions work. I had kinda assumed there would be operating system support needed, but I ran it on the bare metal and it worked, and that's just a very cool thing. I think exceptions are beautiful.
And what's next is you move on to... I kinda wanna tell a story. The stack trace in druntime used to be built all at once, as soon as the exception was thrown, and there was a Bugzilla entry saying exceptions in D are 1000x slower than in Java. But it wasn't this unwinding code; that's just pulling a pointer from the stack and moving on, that's easy. It wasn't even the stack trace itself, because that's, again, just walking up those pointers. It was converting it to strings, and that was the brutally slow part. I moved that to the toString, which has some workarounds for const issues, and const in D is something to talk about for an hour, and now it takes like seven milliseconds, and that's the way it should be. Or not even milliseconds, it was microseconds. That's the way it should be.
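The fix described above boils down to a design choice: capture cheaply, format lazily. Here is a sketch of that choice, with LazyTrace and hex as hypothetical names and no claim to match druntime's actual trace code.

```d
import std.stdio : write;

// Minimal hexadecimal formatting, dependency-free and usable in CTFE.
string hex(size_t v) {
    enum digits = "0123456789abcdef";
    char[2 * size_t.sizeof] buf;
    size_t i = buf.length;
    do {
        buf[--i] = digits[v & 0xF];
        v >>= 4;
    } while (v != 0);
    return "0x" ~ buf[i .. $].idup;
}

// Capturing is cheap: copy the raw return addresses and nothing else.
// Turning them into text, the slow part, waits until toString is called,
// so a trace that is never printed never pays for it.
struct LazyTrace {
    private size_t[16] addresses;
    private size_t count;

    this(const(size_t)[] raw) {
        count = raw.length < addresses.length ? raw.length : addresses.length;
        addresses[0 .. count] = raw[0 .. count];
    }

    string toString() const {
        string result;
        foreach (a; addresses[0 .. count])
            result ~= hex(a) ~ "\n";
        return result;
    }
}

void main() {
    auto trace = LazyTrace([0x401000, 0x401234]); // hypothetical addresses
    write(trace.toString()); // formatting happens only here
}
```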
So that brings me to a point I have here, and that's: a kernel in D, it sounds useless. Customizing your runtime, you close yourself off from the library ecosystem, that's useless. But the knowledge you get from playing around with this stuff... when you're willing to, "fortune favors the bold", "move fast and break things", do it! What's the worst that's gonna happen? You can always just reinstall your broken compiler. It's not like you're gonna fall off the bridge onto the jagged rocks with sharks and atomic bombs waiting for you.
But once you start doing this, you understand how the system works, and then when you see those bugs, you have an idea where to look. When you see the strange and bizarre linker errors... I put "associative arrays... NOT" because AAs... you do not want to get into that code, it is absolutely hideous. Half of it is written like the regular arrays, where you have these TypeInfo and void pointers; the other half is written as a struct AssociativeArray in object.d that doesn't actually do anything, but if you don't have it, it won't link, and sometimes the compiler forgets to output that template, so you compile this giant program and it goes "undefined identifier, like AyaAZ whatever whatever". That's another thing too: I can kinda mangle and unmangle some stuff in my head. When you look at it long enough it just starts to work, but not under pressure right now.
Let me say, what's this all about? If you've seen it before, you know how to work around the bug. If you look in my cgi.d, way down at the bottom there's void workAroundLinkerErrors, and it does writeln(typeid(...)) on all these various strange and bizarre associative arrays. It just works around that hideous associative array code to make it work, and nobody wants to dig into it and fix it; there were a few attempts a while ago and they hit a brick wall, because that is ugly, ugly code.
I can't get over it.
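The letters in those linker errors are mangled names, and .mangleof lets the compiler show you the encoding, which is a decent way to learn to read them. The assertions below cover a few type manglings; core.demangle.demangle turns a whole symbol back into D syntax, and the symbol passed to it here is a made-up example. The aliases are hypothetical names used only to keep the syntax simple.

```d
import core.demangle : demangle;
import std.stdio : writeln;

alias IChar = immutable(char);
alias StringToInt = int[string];

// Each type component is a short code.
static assert(int.mangleof == "i");
static assert(char.mangleof == "a");
static assert(IChar.mangleof == "ya");          // y = immutable
static assert(string.mangleof == "Aya");        // A = dynamic array of ...
static assert(StringToInt.mangleof == "HAyai"); // H = associative array: key type, then value type

void main() {
    // Made-up symbol: a function `foo` in module `test` taking an int, returning void.
    writeln(demangle("_D4test3fooFiZv"));
}
```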
But another benefit of knowing these function names: let's say you're looking for GC allocations, you've determined that this is a problem and you want to get rid of it. Load it up in your debugger and search for gc_malloc, you'll find it very quickly. You set the breakpoint, you run the program, and there it is. Then you walk up the stack and you see this is called from _d_arrayliteralTX... static data in an array literal leads to a dynamic allocation? Yup. So what you have to do there is literally put static on the array definition, and that will get rid of some of the allocations, but you need to be very careful about that.
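A sketch of the fix, with hypothetical function names. Without static, the literal becomes a fresh GC array on every call; with static immutable it lives in static data. The immutable is the careful part: a mutable static local is shared by every call on that thread, so one call's writes show up in the next.

```d
import std.stdio : writeln;

// Allocates on every call: the literal becomes a fresh GC array, the kind of
// hidden allocation a gc_malloc breakpoint leads you back to.
int sumAllocating() {
    int[] table = [1, 2, 3, 4];
    int total;
    foreach (x; table)
        total += x;
    return total;
}

// `static` puts the table in static data once, so there is nothing to allocate.
// `immutable` keeps it safe: a mutable static local would be shared by every
// call on this thread.
int sumStatic() @nogc nothrow {
    static immutable int[] table = [1, 2, 3, 4];
    int total;
    foreach (x; table)
        total += x;
    return total;
}

// The compiler enforces the difference: the allocating form is rejected under @nogc.
static assert(!__traits(compiles, () @nogc { int[] table = [1, 2, 3, 4]; }));

void main() {
    writeln(sumAllocating(), " ", sumStatic()); // 10 10
}
```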
And if you understand what these functions are from toying around with them for your random hobbies, then you know where to look.
Scott, yesterday, called it the application of the tool; that's where things get fun. I guess another thing I wanna talk about is just something really cool. For those struct padding bytes, you can do all of that using the .offsetof and .sizeof properties in D itself, and then if you throw it all together into a string and do a pragma(msg), you can make the compiler print out cool little diagrams. And I actually did write that program last night, but decided to use my fingers instead. It's just one of those things where the D compiler tells you all of this information and you can just have your fun with it. You can write out diagrams with pragma(msg), you can read them back in with CTFE string processing, and I was just telling Walter: a string mixin is kinda like assembly language, it is really easy to learn. When you see mov EAX, 0, you just say, oh, you moved it into that register. But then how do you build that up into a full program? That's where the trickier things come in, like the stack frames or the algorithms, so the tool itself is really easy and
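A sketch of the layout-diagram idea using only .tupleof, .offsetof, and .sizeof, with pragma(msg) printing the result during compilation. layoutOf, dec, and Example are hypothetical names; the concatenation here runs only at compile time, so it costs nothing at run time.

```d
// Tiny integer-to-decimal helper usable in CTFE.
string dec(size_t v) {
    if (v == 0)
        return "0";
    char[20] buf;
    size_t i = buf.length;
    while (v != 0) {
        buf[--i] = cast(char)('0' + v % 10);
        v /= 10;
    }
    return buf[i .. $].idup;
}

// Builds a text diagram of T's layout, marking padding gaps, from nothing but
// the compiler's own .offsetof and .sizeof information.
string layoutOf(T)() {
    string s = T.stringof ~ ": " ~ dec(T.sizeof) ~ " bytes\n";
    size_t end = 0; // first byte after the previous field
    foreach (i, field; T.init.tupleof) {
        enum offset = T.tupleof[i].offsetof;
        if (offset > end)
            s ~= "  [" ~ dec(offset - end) ~ " padding]\n";
        s ~= "  " ~ dec(offset) ~ ": " ~ typeof(field).stringof ~ " "
            ~ __traits(identifier, T.tupleof[i]) ~ "\n";
        end = offset + typeof(field).sizeof;
    }
    if (T.sizeof > end)
        s ~= "  [" ~ dec(T.sizeof - end) ~ " padding]\n";
    return s;
}

struct Example {
    ubyte tag;
    uint value;
    ushort flags;
}

// Printed by the compiler during compilation, not at run time.
pragma(msg, layoutOf!Example());

void main() {}
```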
So, to work with string mixins, I like to write just a regular old parser: get the strings out of the way, build up an abstract syntax tree, do your manipulations on that, then convert it back into a string to print it. When you try to parse strings and work with them all in one big mess of code, you'll get lost. Break it into the traditional lex, parse, manipulate, toString steps and it is a lot easier to do. It might be more code, but when you look back at it later, you won't see, you know, concatenation operators all over the place.
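A small sketch of the lex, parse, generate pipeline for a made-up field-spec mini-language ("int x; int y;"). All names are hypothetical; a manipulation step would operate on the Field[] between parse and generate and is skipped here. The concatenation is confined to the final generate step.

```d
import std.stdio : writeln;

// The syntax tree for one field declaration.
struct Field {
    string type;
    string name;
}

// Lex: "int x; int y;" -> ["int", "x", ";", "int", "y", ";"]
string[] lex(string src) {
    string[] tokens;
    size_t i = 0;
    while (i < src.length) {
        const c = src[i];
        if (c == ' ' || c == '\t' || c == '\n') {
            ++i;
        } else if (c == ';') {
            tokens ~= ";";
            ++i;
        } else {
            const start = i;
            while (i < src.length && src[i] != ';' && src[i] != ' '
                    && src[i] != '\t' && src[i] != '\n')
                ++i;
            tokens ~= src[start .. i];
        }
    }
    return tokens;
}

// Parse: each "type name ;" group becomes a Field node.
Field[] parse(string[] tokens) {
    Field[] fields;
    size_t i = 0;
    while (i < tokens.length) {
        assert(i + 1 < tokens.length && tokens[i] != ";" && tokens[i + 1] != ";",
               "expected: type name;");
        fields ~= Field(tokens[i], tokens[i + 1]);
        i += 2;
        if (i < tokens.length) {
            assert(tokens[i] == ";", "expected ';'");
            ++i;
        }
    }
    return fields;
}

// Generate ("toString"): a private field plus a read-only accessor per node.
string generate(Field[] fields) {
    string code;
    foreach (f; fields)
        code ~= "private " ~ f.type ~ " _" ~ f.name ~ ";\n"
            ~ f.type ~ " " ~ f.name ~ "() const { return _" ~ f.name ~ "; }\n";
    return code;
}

struct Point {
    mixin(generate(parse(lex("int x; int y;"))));

    this(int x, int y) {
        _x = x;
        _y = y;
    }
}

void main() {
    auto p = Point(3, 4);
    writeln(p.x, ",", p.y); // 3,4
}
```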
And the concatenation operator, that thing is evil. The append operator is very nice: when you're doing memory allocation and you append to something, you just grow the block, it works. When you don't have the GC, or if you do and don't want to use it too much, you'll hate the a ~ b operator. It can generate so many temporaries that just get thrown away, and you can't track where they are if you want to manually free them. Don't use it on bare metal. I left it completely unimplemented, so whenever I accidentally use it, it throws an error and I can go back to a custom type: you form your own buffer, overload the append operator, ~=, use that, and you'll save yourself a lot of trouble.
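A sketch of such a buffer, assuming a fixed capacity is acceptable. FixedBuffer is a hypothetical name. It defines ~= only and deliberately has no opBinary!"~", so an accidental buf ~ x is a compile error, in the same spirit as leaving the runtime's concatenation hook unimplemented.

```d
import core.stdc.stdio : printf;

// A fixed-capacity text buffer that supports only ~= (append in place).
struct FixedBuffer(size_t capacity) {
    private char[capacity] data;
    private size_t used;

    // buf ~= "text": copy into the existing storage, no temporaries.
    void opOpAssign(string op)(const(char)[] s) if (op == "~") {
        assert(used + s.length <= capacity, "FixedBuffer overflow");
        data[used .. used + s.length] = s[];
        used += s.length;
    }

    // buf ~= 'c'
    void opOpAssign(string op)(char c) if (op == "~") {
        assert(used < capacity, "FixedBuffer overflow");
        data[used++] = c;
    }

    // buf[] views what has been written so far.
    const(char)[] opSlice() const @nogc nothrow {
        return data[0 .. used];
    }
}

void main() @nogc nothrow {
    FixedBuffer!64 line;
    line ~= "temperature: ";
    line ~= "21";
    line ~= 'C';
    const text = line[];
    printf("%.*s\n", cast(int) text.length, text.ptr); // temperature: 21C
}
```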
But uh, yeah.
Andrei Alexandrescu: "Are you stoppable?"
Me: I don't know!
Andrei: "haha well, so OK um executable review, so it is 12:00, there's no perfect conference without a slideless talk, this is like absolutely like amazing. Adam seems to be unstoppable. We should take a few questions. I should say, I looked at his notes so there's one page, you would think it is like all this title case, nice printed stuff, no, no other human on the planet can read this stuff."
Me: "Oh, you should see the beginning of this book. Every so often you can make out a LOL but that's it."
Andrei: "You should scan them. OK, Adam, I think you should take a few questions, there's a couple online and look at the hands."
Me: "I thought I was gonna have like an hour left, I talked way longer than I should have. Anyway, let's get started in the room. Yeah, we'll start here."
Question: "So you mentioned you were using dmd right?"
Question: "Is there a reason to use dmd over gdc, which has beta support for bare metal?"
Me: "Not particularly, I just had dmd installed on my computer. And the main reason there is, a couple years ago, with dmd I would download the zip and run it and it would work; gdc wanted me to compile and bootstrap. That's now been fixed: ldc has a beautiful binary download, gdc now has binary downloads for cross compilers, and it's amazing. I like it a lot now. But I'm still a dmd guy just because I'm too lazy to switch."
Question: "Should I buy your book and why?"
Me: "You should because it's genius."
Me: "Like I said, there's a tip about hip firing AK-47s. That's a selling point in any programming book!"
Question: "Just a comment about that, I've been trying to buy the book since yesterday, it's been about 12 hours, and uh it doesn't take the dconf2014 code. The only way you can get it off, get the discount is if you buy two or more copies."
Me: "Really? I would have to ask them about that, that was not my understanding."
Commentator: "Actually, it does work, I had to enter the code multiple times and on like the second or third try, it actually went through. So try something random, then try the code again."
Andrei: "How many copies are gonna turn up? You understand this is being livestreamed so everybody knows the trick."
Question: "You're discussing bare metal, does this apply for a VM, would that be an easier thing? Is this really bare metal or is it bare VM?"
Me: "I did run one on a floppy disk just to prove that it worked, it did, but a VM is infinitely easier. Once it starts triple faulting, you don't want your floppy disk to go *floppy disk sounds* you don't wanna listen to that, so yeah, use a VM."
MC: "We have a question from the livestream, you had a little rant on associative arrays there, but what's the worst feature in D, in your opinion?"
Andrei: "How long do we have?"
MC: "Just one!"
Me: "That's difficult to do because I don't really see features as being bad. Sometimes, I see them as being suboptimal so I try to work around it, but I never let anything get in my way, so if there is a bad feature, I've learned over the years to just avoid it."
MC: "That sounds like wise words to finish this talk so I'm gonna suggest we actually stop, obviously you're going to be around all day, oh we have just one more?"
Question: "I wanted to add associative array, for the next release, we turned it into UFCS type actually, so there's quite a big improvement so this template is gone."
Question: "And we're going to be working on doing a real library type and a real lowering in the compiler side."
Me: "That'll make it a lot easier to use."
MC: "You're getting a huge amount of electronic clapping from the channel and I think everyone here has really enjoyed your talk, so thank you very much indeed."
MC: "And we'll be back at 1:30."