The D Windowing System

Legal note: Note that I'm reserving all my legal rights to the information here. I'm most likely going to public domain or maybe GPL everything, but I haven't decided for sure yet, and want to keep the door open to other options. If this scares you, best to stop reading now.

NOTES TO SELF: there should be some kind of magic localtime function for displaying.



Document status: first draft in progress. Parts just plain aren't done, what is done has been written over several sessions sitting down at the computer, and none of it has been proofread or edited yet, so bear with me if I repeat myself or even contradict myself at times. It'll get better when I come back to it.

There's a public DWS manager up now to play with it. Go here.

Last updated: Oct 30. See above - there's a manager running publicly with binaries for download. So cool. From before: Code! tar.gz here or just surf to the files in your browser: my generic libraries (includes unrelated stuff) and dws code. The code is a mess of old and new ideas. dws-test has the client stuff in it, and dim-cpp has the viewer. (The names are leftovers from the old projects this was spawned from.) You'll have to fix the makefiles to compile it, and I don't think it will actually compile on Windows, except for the viewer, which is just Qt/C++, since my D network code is Linux-specific right now. I'll write more about the specifics in there as I find the time.

Check back for updates about once a week, or shoot me an email at destructionator@gmail.com with 'DWS' in the subject and I'll let you know when I make major updates. Importantly, the code I have written so far will be posted soon, so check back, probably next Sunday night (the 25th) to see that.

Also feel free to email me any questions and comments.

Final note before getting started: everything here is subject to change as I code it up - pretty well everything here has indeed already been revised multiple times; I've never been much of a big design up front set in stone guy. Even the names may change - dws is just what I'm calling it now. I like it though.

Why I got started

This is actually the combination of three old projects, all in one: a better text app API, a detachable GUI system, and a superior networked program comprehensive API. Bring them together and you get this.

Anyway, the main thrust is something like this: I find myself fairly annoyed with GUI programs on my Linux box. It is cool that X programs can run over the network, but it is lacking in several areas: it is slow to start up, laggy to use, and is tied to the box where I started the program.

For terminal apps, you can run them inside GNU Screen, which allows you to detach them from the place where you started them and resume using the program from another location. I thought it would be cool if I could do this with GUI programs, too.

At first, I looked into making some kind of X proxy, which would act just like Screen, but for X programs. However, this would be a lot of work, and I don't think it would give that much benefit - the programs would still be slow.

Instead, I figured I'd make my own system. If I design the protocol around my own needs, using my experience with other systems like X11 and Remote Desktop on Windows, the good and the bad, I think I can do a better job.

But I can also go beyond what Screen for X could offer: I could also make the programs appear native when viewed from different platforms. I could make the level of abstraction high enough (for the common cases) that I could even view my programs from a DOS machine running in text mode! This would allow me to leverage some of my older hardware to use my same programs that I use on the desktop.

Best of all, I can write the API for the best programming language evar, The D Programming Language! (Though the protocol is simple and can be spoken by just about anything - on the lowest level, it looks like a C API.)

After I got started, I noticed there is another bonus to this: it could kill the horrible idea of web apps by doing everything they do, better. Yay! Moreover, it has potential to bring more capabilities to mobile phones: the viewer program is a fairly simple one, and is given enough flexibility in how it implements its details that you may be able to get a pretty nice experience running your desktop apps, displayed on your phone.

Overview

There are several pieces to the D Windowing System (henceforth DWS):

The way it works is: you write a client, using the DWS API to handle your GUI and other operations. The client connects to the manager, a helper service that runs on the local machine and emulates a viewer (sort of), storing your current state. Then, the user connects a viewer program to the manager and is able to use and interact with your user interface.

The viewer may disconnect at any time - the manager serves as a buffer between it and your client, insulating you from those nasty details. Similarly, the clients can come and go, all sharing one viewer. From the user's perspective, the manager service doesn't exist.

Even from the client's perspective, you can program it pretty naturally, without worrying about the details of the network viewers. This data is made available to you for optimization purposes, but you can basically just carry on without worrying about it.

The guts of the implementation

Here, I'll go into detail about what I have now and how it works deeper down. If you use the high level library, you shouldn't have to worry about any of this, but it might be interesting to know anyway. You'll definitely want to read it if you want to hack my code!

The protocol

One of the main goals of the protocol, if not the main goal, is to be as fast as possible, even on slow remote links. A competing but complementary goal is to keep bandwidth usage down. These often go together - a lean protocol is usually a fast one - but not always. In the cases where they don't go together, there should be a way to prioritize one or the other.

With that in mind, I started writing. Using a text protocol, or wrapping inside layers upon layers of other stuff would be a pain to code, would waste bytes, and would slow things down. XML over HTTP is right out!

Instead, I went with a binary protocol. I wanted it to be fairly future proof, but not at the great expense of speed. I also wanted it to be fairly simple to implement in multiple languages.

I ended up settling on this: a session would be a series of messages, stacked one after another. Each message would consist of a length field giving the size of the message in bytes, a function number, and then the function arguments (note: non-void functions have an implicit int argument - the request ID - which is used for sending back the return value). The arguments would be encoded differently based on their low level data type, which is either numeric or array.

Data types

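Numeric data types (int*) would be encoded using a system inspired by MIDI: one or more bytes, each carrying seven bits of the value. You read a byte, and if its most significant bit is set, you read another byte and or its low seven bits on to the value so far, shifted left by seven bits for each byte already read. The least significant group is stored first.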

The D language code to implement it as it is now (copy/pasted and commented from my struct that handles this) is:


int toInt() {
	int sum = 0;

	// value is the buffer of bytes we received; MAX is just the size of that
	// buffer. In theory, it could be infinity, though a 32 bit int only has
	// room for five seven-bit groups.
	for(int w = 0; w < MAX /* used */; w++) {
		int v = value[w] & 0b_0111_1111; // keep the low seven bits, which carry the data

		v <<= (7 * w); // shift it left seven bits for each byte we've read so far
		sum |= v; // or it on to our running count

		if(!(value[w] & 0b_1000_0000)) // if the MSB isn't set, we're done.
			break;
	}

	return sum;
}

The advantage of this is that the most common case - small numbers - takes only one byte on the wire. At the same time, it remains extensible to handle any number you need to send. I didn't have to decide on a valid number range in the function definition. This makes it small, flexible, and good for future growth.

While there is a cost in encoding/decoding the data, this is utterly tiny compared to network latency, and isn't very significant on its own anyway. For the most common case, you're just testing it once against a bit pattern - even in a tight inner loop, I really find it hard to get too worked up over two basic asm instructions.
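For reference, here is a minimal sketch of the sending side to go with the decoder above. The names encodeUint and decodeUint are made up for this example and are not taken from the DWS code; the sketch only assumes the rules described here: seven data bits per byte, least significant group first, MSB set on every byte except the last.

```d
// Hypothetical helpers, not from the DWS source.

// Encode n as one or more bytes, least significant seven-bit group first.
ubyte[] encodeUint(uint n) {
	ubyte[] bytes;
	do {
		ubyte b = cast(ubyte)(n & 0b_0111_1111); // low seven bits
		n >>= 7;
		if (n != 0)
			b |= 0b_1000_0000; // MSB set: another byte follows
		bytes ~= b;
	} while (n != 0);
	return bytes;
}

// Decode a value produced by encodeUint (at most five bytes for a uint).
uint decodeUint(const(ubyte)[] bytes) {
	uint sum = 0;
	foreach (w, b; bytes) {
		sum |= cast(uint)(b & 0b_0111_1111) << cast(uint)(7 * w);
		if (!(b & 0b_1000_0000)) // MSB clear: this was the last byte
			break;
	}
	return sum;
}

unittest {
	assert(encodeUint(5).length == 1);   // small numbers: one byte
	assert(encodeUint(300).length == 2); // 0xAC, 0x02
	assert(decodeUint(encodeUint(300)) == 300);
}
```

Anything below 128 fits in a single byte, and each further seven bits of magnitude costs one more byte.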

* Note: the int, as it is right now, is unsigned. I plan to add a proper int/uint separation before finalizing the protocol for any serious release. The implementation works with signed vs unsigned right now through de-facto reinterpret casts: -1 is sent down the wire as 4294967295, then cast back to signed on the other end. It works, but is obviously a waste of bytes - exactly what I wanted to avoid. When I fix it, I'll make uint work exactly as int does now, and the signed int... well, I haven't decided how yet. I'm leaning toward just having a sign bit in there, so the second to last bit in the sequence is set if negative, reset if positive. This means there is a negative zero, which is silly and wastes a number, but meh, it is simple. Perhaps I'll go for a two's complement. I'll decide later.
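To make the waste concrete, here is a small sketch of the reinterpret-cast round trip described in the note. encodedLength is a hypothetical helper written for this example; it just counts how many seven-bit groups a value needs on the wire.

```d
// Hypothetical helper: bytes needed by the seven-bits-per-byte encoding.
size_t encodedLength(uint n) {
	size_t len = 1;
	while (n >= 0x80) { // more than seven bits left
		n >>= 7;
		len++;
	}
	return len;
}

unittest {
	int original = -1;
	uint onWire = cast(uint) original;   // reinterpreted as unsigned
	assert(onWire == uint.max);          // 4294967295
	assert(encodedLength(onWire) == 5);  // five bytes for a tiny number
	assert(cast(int) onWire == -1);      // cast back on the other end
	assert(encodedLength(1) == 1);       // +1 needs only one byte
}
```

So every small negative number currently costs five bytes, where the same magnitude as a positive number costs one.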

Of course, int isn't the only basic data type. There are also arrays, which come in two low level types: string and byte[]. All arrays work the same way. First, there is a uint which tells the payload length, in bytes (not number of array elements!), then the actual payload follows immediately, with no trailing terminator.

The string payload is assumed to be UTF-8, just like the string in D. Other arrays can be defined differently. Currently, I have only defined one other type of array: the array of bytes, byte[]. The protocol doesn't really care what is in there - to it, all arrays are just plain bytes - the different names are a tip to the user and my protocol code generator (been wondering why I'm giving the protocol syntax? That's it - it is what my code generator reads).
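A minimal sketch of the array layout, assuming the length prefix uses the same variable-length numeric encoding as every other number. The names encodeUint, encodeArray and encodeString are hypothetical, chosen for this example only.

```d
// Hypothetical helpers, not from the DWS source.

// Seven data bits per byte, least significant group first (see Data types).
ubyte[] encodeUint(uint n) {
	ubyte[] bytes;
	do {
		ubyte b = cast(ubyte)(n & 0b_0111_1111);
		n >>= 7;
		if (n != 0)
			b |= 0b_1000_0000;
		bytes ~= b;
	} while (n != 0);
	return bytes;
}

// Any array: payload length in bytes, then the raw payload, no terminator.
ubyte[] encodeArray(const(ubyte)[] payload) {
	ubyte[] result = encodeUint(cast(uint) payload.length);
	result ~= payload;
	return result;
}

// A string is just an array whose payload happens to be UTF-8.
ubyte[] encodeString(string s) {
	return encodeArray(cast(const(ubyte)[]) s);
}

unittest {
	// "h\u00E9llo" is five characters but six UTF-8 bytes,
	// so the length prefix says 6.
	auto wire = encodeString("h\u00E9llo");
	assert(wire.length == 7);
	assert(wire[0] == 6);
}
```

The prefix counts bytes rather than elements, so a reader can skip over an array it doesn't understand without knowing the element type.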

Note, of course, that these are the low level data: function A may interpret its int differently than function B. In the high level API, they would be different typedefs/classes to enforce this.

What about the function number? It is just an int that is the index of the function into a list of calls. This allows easy, efficient implementation by using an array of function pointers. My implementation doesn't actually do this, but it could. More importantly though, it allows operations to be very short on the wire. A function with no arguments can take as few as two bytes: a length, then the function number. You can stack hundreds of commands in a single network packet!
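Here is a rough sketch of framing and table dispatch. It makes assumptions the text doesn't settle: the length field is taken to count the bytes that follow it, and both the length and the function number are kept below 128 so each fits in one byte of the variable-length encoding. The names frame, dispatch, onPing and onSetTitle are invented for the example.

```d
// Hypothetical sketch, not the DWS implementation.
alias Handler = void function(const(ubyte)[] args);

int noArgCalls;       // how often function 0 was called
size_t lastArgCount;  // argument bytes seen by function 1

void onPing(const(ubyte)[] args) { noArgCalls++; }
void onSetTitle(const(ubyte)[] args) { lastArgCount = args.length; }

// The function number is simply the index into this table.
Handler[] makeTable() { return [&onPing, &onSetTitle]; }

// One message: length byte, function number, then the raw argument bytes.
ubyte[] frame(ubyte functionNumber, const(ubyte)[] args) {
	assert(functionNumber < 128 && args.length + 1 < 128);
	ubyte[] msg;
	msg ~= cast(ubyte)(args.length + 1); // bytes following the length field
	msg ~= functionNumber;
	msg ~= args;
	return msg;
}

// Walk a buffer of stacked messages, calling the handler for each one.
void dispatch(const(ubyte)[] buffer, Handler[] table) {
	while (buffer.length > 0) {
		size_t len = buffer[0];
		auto functionNumber = buffer[1];
		table[functionNumber](buffer[2 .. 1 + len]);
		buffer = buffer[1 + len .. $];
	}
}

unittest {
	// A call with no arguments is two bytes; messages stack back to back.
	auto stream = frame(0, null) ~ frame(1, [10, 20, 30]);
	assert(stream.length == 7);
	dispatch(stream, makeTable());
	assert(lastArgCount == 3);
}
```

Because each message carries its own length, a receiver can step over a function number it doesn't recognize instead of losing its place in the stream.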

IDs

Several functions, including all non-void functions, take some kind of resource ID. However, while you might expect to see a signature like int createMainWindow(); , where the ID is given to you when the resource is allocated, you instead see void createMainWindow(int wid); .

Why? How can you pass an ID for a resource that you are creating? The reason is very simple: if the ID was returned to you by the remote process, you'd have to create the resource, then wait on a round trip to the server to get the id, then make another request initializing it. This is obviously going to be laggy - we want the creation and initialization to all take place at the same time. You should be able to use the resource immediately, without waiting on round trip lag.

Hence, all functions which take ID arguments allow you to dictate what they will be. Request IDs, window IDs; everything. As long as you are consistent in using the same unique ID when referring to the same resource, this works beautifully.
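A small sketch of what this buys you on the client side. Only createMainWindow comes from the text; the Client struct, the Command record, newMainWindow and setTitle are hypothetical stand-ins for whatever the high level API ends up providing.

```d
// Hypothetical sketch, not the DWS API.
struct Command {
	string name; // protocol function being called
	uint id;     // the resource ID the client chose
	string arg;
}

struct Client {
	uint nextId = 1;
	Command[] outgoing; // queued up, sent together in one flush

	// Pick the window ID locally and queue the create request.
	uint newMainWindow() {
		uint wid = nextId++;
		outgoing ~= Command("createMainWindow", wid, "");
		return wid;
	}

	// Usable right away: the ID is already known, no reply needed.
	void setTitle(uint wid, string title) {
		outgoing ~= Command("setTitle", wid, title);
	}
}

unittest {
	Client c;
	auto w = c.newMainWindow();
	c.setTitle(w, "Hello");
	// Creation and initialization sit in the same batch, same ID.
	assert(c.outgoing.length == 2);
	assert(c.outgoing[0].id == c.outgoing[1].id);
}
```

A plain counter is enough here because IDs only have to be unique and used consistently, not assigned by the other end.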

Buffering

As you can see from above, limiting round trips on the network and packing as much as possible into one packet are things I'm going for, but remember, they aren't necessarily goals in and of themselves. More importantly, a super-goal, if you will, is to keep the program fast and responsive for the user.

If limiting bandwidth was the only concern, we'd want to buffer everything into one packet, but it isn't. Where do we draw the line and send the data to the server? Also, how can we do this as transparently as possible from the client API side?

All viewer side network buffers must be flushed whenever:

They should also be flushed periodically without being required, except when operating in low bandwidth mode. This is to help fight latency (sending data in the background so the user doesn't have to wait as long when he triggers something) and to protect against viewer crashes or network failures, by ensuring the state the application has is never far behind what the user has entered.

Client side network buffers should be flushed when:

Predictive Paths

This is what I'm calling a feature where you can send a series of commands to the viewer to be executed on the trigger of a certain event. I guess it is basically like shooting a delegate over - indeed, that actually seems like the ideal API! (This isn't implemented in any form yet, hence the details are certain to change.)

The idea is to make things like dialog boxes be more responsive. Instead of:

	> Menu option clicked
	< Create Dialog [...]

(A server round trip, and bandwidth eaten if used multiple times!) You would do:

	< StartDelegate (delegate); CreateDialog [...]; EndDelegate; SetDelegateOnTrigger(delegate, trigger);

And the trigger never needs to go back to your program at all - you needn't concern yourself with the busywork on demand. The user just sees his dialog instantly, just as if it was a local program.
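Since none of this is implemented, the following is only a toy model of the idea from the viewer's point of view. Every name in it (Viewer, command, fire and so on) is invented for illustration, and commands are plain strings rather than protocol messages.

```d
// Hypothetical toy model, not a design for the real protocol.
struct Viewer {
	string[][int] stored;   // delegate ID -> commands recorded for it
	int[string] bindings;   // trigger name -> delegate ID
	string[] executed;      // stand-in for actually running commands
	int recordingInto = -1; // -1 means "not recording"

	void startDelegate(int id) { recordingInto = id; stored[id] = null; }
	void endDelegate() { recordingInto = -1; }
	void setDelegateOnTrigger(int id, string trigger) { bindings[trigger] = id; }

	// Commands arriving while recording are saved instead of run.
	void command(string cmd) {
		if (recordingInto >= 0)
			stored[recordingInto] ~= cmd;
		else
			executed ~= cmd;
	}

	// Returns true if the trigger was handled locally, with no round trip.
	bool fire(string trigger) {
		if (auto id = trigger in bindings) {
			executed ~= stored[*id];
			return true;
		}
		return false;
	}
}

unittest {
	Viewer v;
	v.startDelegate(1);
	v.command("CreateDialog");
	v.endDelegate();
	v.setDelegateOnTrigger(1, "menuClicked");
	assert(v.executed.length == 0);       // nothing runs until triggered
	assert(v.fire("menuClicked"));        // handled on the viewer
	assert(v.executed == ["CreateDialog"]);
}
```

The recorded commands can be replayed every time the trigger fires, which is where the bandwidth saving for repeated use would come from.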

Protocol generator

Todo: discuss the code generator.

Protocol reference

See the code generator's input file here: dws.c.html. The filename gets a .c extension primarily so I get syntax highlighting in it - the syntax is similar to C and D, but actually is neither. This file is a work in progress and is littered with my notes, some of which no longer apply, and many of which only make sense if you know what I was thinking at the time that I wrote them.

I'll do a proper reference later, after more of it is finalized. Hopefully, most of the names will help you make sense of what I'm doing there until then.

Important note to understand the file: the server: label lists requests your app makes of the server. The client functions are requests the server makes of your app. Think of them as a list of abstract virtual functions that the server and client, respectively, must implement to speak the complete protocol. (Indeed, this is what the code generator spits out, along with a bunch of boring boilerplate.)

Implementation note: the function ID number, which I discussed above, is currently linear in this file. The first function listed is ID 0, the next is ID 1, and so on. This is a limitation of the current code generator that I don't intend to keep by the time this is beta release quality, but knowing this fact may help you debug the network stream.

Widgets

Rather than using lower level input and output, you should try to use the built-in widgets. The widgets are defined in terms of their high level interface, allowing the viewer freedom on the implementation. This means that they will (can) appear native on different platforms, and most importantly, will be faster for the user, since all the details are handled on the viewer.

Widgets send higher level events (triggered, value changed, etc.) rather than key up and key down events.

Conceptually, you can think of widgets as being loosely polymorphic, all holding the same properties, but interpreting them differently. This table shows the widgets on the left, the properties on the top, and the meaning in the body: (nop == no operation and * == write only - the user/viewer can never change it)

| Widget Type | value | value2 | text | image* | lowerLimit* | upperLimit* | Description |
|---|---|---|---|---|---|---|---|
| 0: LineEdit | nop | nop | The text in the control | nop | nop | The maximum length of the input, in characters | This is the single line entry and editing widget. It should send the triggered signal when the user presses enter. |
| 1: TextEdit | nop | nop | The text in the control | nop | nop | The maximum length of the input, in characters | This is a multi line, plain text edit box, like in Windows notepad. It is triggered on each press of enter. |
| 2: RichEdit | nop | nop | The text in the control | nop | nop | The maximum length of the input, in characters | This is a one piece widget that edits rich text. It should allow the user to pick formatting options, like bold, etc. The viewer is free to implement however it wants. My idea of it is like Windows WordPad's text entry field and toolbar, packed into one. Like the other edits, it sends triggered on the press of enter. |
| 3: Button | nop | nop | The text on the button | The icon that might be displayed on the button | nop | nop | A plain old push button that the user can click. It should send the triggered signal when it is clicked. |
| Spacer | | | | | | | |
| Checkbox | | | | | | | |
| RadioButton | | | | | | | |
| DropDownList | | | | | | | |
| ListBox | The selected item's id | nop | nop | nop | nop | nop | Displays a single select list box. The value is the item currently selected. It sends triggered when an item is double clicked. |
| Label | Icon location: N S W E BG | nop | The text displayed on the screen | This is the icon displayed in the label | nop | nop | This simply displays some text and/or an image in a location. |
| HTMLViewer | | | | | | | |
| ComboBox | | | | | | | |
| SpinBox | | | | | | | |
| ScrollBar | | | | | | | |
| Slider | | | | | | | |
| ProgressBar | | | | | | | |
| ActionButton | | | | | | | |
| Linear I/O | | | | | | | |

Other properties widgets have include: readonly, disabled.
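To illustrate the "same properties, different meaning" idea, here is a loose sketch. The property names and the numbering of the first four widget types come from the table; the D types chosen for each property and the meaningOfUpperLimit function are assumptions made for the example.

```d
// Hypothetical sketch, not the DWS widget classes.
enum WidgetType { LineEdit = 0, TextEdit = 1, RichEdit = 2, Button = 3 }

// Every widget carries the same bag of properties.
struct Widget {
	WidgetType type;
	int value;
	int value2;
	string text;
	ubyte[] image;   // write only: the user/viewer never changes it
	int lowerLimit;  // write only
	int upperLimit;  // write only
	bool readonly;
	bool disabled;
}

// What a property means depends on the widget type.
string meaningOfUpperLimit(WidgetType t) {
	final switch (t) {
		case WidgetType.LineEdit:
		case WidgetType.TextEdit:
		case WidgetType.RichEdit:
			return "maximum length of the input, in characters";
		case WidgetType.Button:
			return "nop";
	}
}

unittest {
	Widget edit;
	edit.type = WidgetType.LineEdit;
	edit.upperLimit = 40; // at most 40 characters of input
	assert(meaningOfUpperLimit(edit.type) != "nop");
	assert(meaningOfUpperLimit(WidgetType.Button) == "nop");
}
```

Keeping one flat property set means one set of protocol calls can serve every widget type.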

My hope is that these basic widgets, which can be implemented on just about any platform in their common form, including the browser and pure text mode, would cover some 80% of applications, and hopefully even 80% of the other 20% of apps.

TODO: write overview on what widgets and canvases are and how the base classes (in principle) work and how triggers work... lots to do.

Canvases

Graphical Canvas
Text Canvas
3D Canvas

Windows

Main Windows
Consoles
Frameless Windows
Game Windows

The manager

Security

The viewer

API

Client applications

I'll discuss the clients I'm writing for it and show a bunch of working example code.

Capabilities

Sound

Beeps
Alerts
Waveform

Midi

Video

Local file access

It annoys me when I'm running remote X and want to open a file saved on my laptop, but the program is running on my desktop. Remote Desktop on Windows offers a way around this: \\tsclient\whatever can do SMB mounting of your drives/folders. This pwns, making file transfer between the two boxes easy, and making opening files in various apps cake.

Linux with some sshfs magic can actually do this moderately acceptably too, btw.

Ideally, I'd be able to offer full integration like Remote Desktop. This might be a pain to implement though, and I'm willing to settle for a lot less. What I plan to do for early versions at least is offer a basic set of low level APIs for reading and writing files that the user grants you access to. You shouldn't do this often - store files on the same box your program is running for best results - but it is still a nice thing to have available.

Serial port

Serial devices should work through the local file access too. It can recognize magic filenames (COM1, etc.) and talk to the port for you. Naturally, the viewer could just shoot back 'permission denied' if you don't want to give remote apps access to your hardware. The usefulness of this, however, would be in running thin terminals with barcode readers or something like that.

Other forms of input - Joysticks

It would be awesome if I could port my D1 game engine to this DWS and have it actually be playable. This is waaay down the line though.

Printers (and scanners?)

I'd like to be able to connect to an app on my home computer from a work computer, and feel like I'm running the app locally. This includes things like printing Just Working. (Naturally, I'd like printing on the home computer to work too, even when remote, but that's fairly trivial. I want the default to feel local though.)

Scanners would be nice too, for input the same way.

What I'm leaning toward doing is just defining it as 'tell me what printers you have' -- 'here print this postscript or whatever to printer A' and letting the viewer and local OS figure out how to actually run the printer. It would be stupid to have to put drivers on the manager box to run a remote printer, and it would be stupid to have to put a file conversion utility on the viewer to do your custom app. There needs to be just a simple exchange format.

Idle reporting

Non-capabilities

The DWS makes no attempt at providing an X11 like window management framework. Leave that to Windows or X.

Miscellaneous

Standard input and output (and stderr)

I haven't decided yet. I kinda want them to be shoved through to a window that you can display if you are interested in them. I find printf debugging to be pretty useful. Now, on Linux, which is the only platform with a manager so far, the std* just go to the launching terminal, which is actually ok by me. Windows just kinda bitbuckets it.

I'll make a decision on it at some point, but I'm in no rush.

Rich text

I'm pretty tempted to just go with a small subset of HTML. It is well known, not terribly difficult to implement, and can be expanded upon later. Other formats can naturally be converted to and from in the API. Or, maybe go with a lighter binary format, and have the API do conversions to it. But I'm not too concerned about shaving off every extra byte in rich text. I doubt the overhead of html tags will be a large percentage of the payload.

HTML is what works now though, anyway.

Tabbed windows

I'm of the opinion that tabbed interfaces are the job of the window manager, not the application. Similar windows should be automatically (or not, at the user's demand) tabbed together rather than every app implementing the tabs over and over again.

But since existing window managers don't do this well, if at all, the DWS will offer some way to make this happen in a pretty way. What you do is give the windows a class or something and specify where you want the tab bar to appear, if there is one. Then you just create multiple windows, even if from separate processes, and let the viewer do the tabbing.

Why a new API???

Why not make existing programs able to get these benefits? Well, it is just harder to do, and I wanted a prettier D API anyway.

About the author

My name is Adam Ruppe. I'm a contract programmer for my main job, with most of my work currently being writing web applications and marketing websites. The upside is it pays the bills. The downside is the web sucks (I could pontificate for megabytes on that), and work eats up a huge amount of my computer time, leaving less than I'd like to do my fun personal projects, like this one.

Combine that with various other IRL things and it means progress here is going to be very slow. Be patient with me though - it'll be awesome even before it's done.


© 2009, Adam D. Ruppe. All rights reserved.