D Programming Language 2.0

Last update Sat Apr 7 20:54:31 2012

arsd.cgi

Provides a uniform server-side API for CGI, FastCGI, SCGI, and HTTP web applications.

	import arsd.cgi;

	// Instead of writing your own main(), you should write a function
	// that takes a Cgi param, and use mixin GenericMain
	// for maximum compatibility with different web servers.
	void hello(Cgi cgi) {
		cgi.write("Hello, world!");
	}

	mixin GenericMain!hello;

Concepts:

Input:
get, post, request(), files, cookies, pathInfo, requestMethod, and HTTP headers (requestHeaders, userAgent, referrer, accept, authorization, lastEventId)

Output:
cgi.write(), cgi.header(), cgi.setResponseStatus, cgi.setResponseContentType, gzipResponse

Cookies:
setCookie, clearCookie, cookie, cookies

Caching:
cgi.setResponseExpires, cgi.updateResponseExpires, cgi.setCache

Redirections:
cgi.setResponseLocation

Other Information:
remoteAddress, https, port, scriptName, requestUri, getCurrentCompleteUri, onRequestBodyDataReceived

Overriding behavior:
handleIncomingDataChunk, prepareForIncomingDataChunks, cleanUpPostDataState

Installing:
Apache, IIS, CGI, FastCGI, SCGI, embedded HTTPD (not recommended for production use)

Guide for PHP users:
If you are coming from PHP, here's a quick guide to help you get started:

$_GET["var"] == cgi.get["var"]
$_POST["var"] == cgi.post["var"]
$_COOKIE["var"] == cgi.cookies["var"]

In PHP, you can give a form element a name like "something[]", and then $_POST["something"] gives an array. In D, you can use whatever name you want, and access an array of values with the cgi.getArray["name"] and cgi.postArray["name"] members.

echo("hello"); == cgi.write("hello");

$_SERVER["REMOTE_ADDR"] == cgi.remoteAddress
$_SERVER["HTTP_HOST"] == cgi.host

See Also:
You may also want to see dom.d, web.d, and html.d for more code for making web applications. database.d, mysql.d, postgres.d, and sqlite.d can help in accessing databases.

If you are looking to access a web application via HTTP, try curl.d.

template ForwardCgiConstructors()
If you are writing a custom Cgi subclass, mixing this in takes care of the required constructors for you.

class Cgi;
The main interface with the web request

enum RequestMethod;
The HTTP methods a request can use.

this(long maxContentLength = cast(long)5000000, const(immutable(char)[][string]) env = null, const(ubyte)[] delegate() readdata = null, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null);
Initializes it using a CGI or CGI-like interface

void dispose();
Cleans up any temporary files. Do not use the object after calling this.

NOTE:
it is called automatically by GenericMain

const void onRequestBodyDataReceived(size_t receivedSoFar, size_t totalExpected);
you can override this function to somehow react to an upload in progress.

Take note that parts of the Cgi object are not yet initialized! Data from the HTTP headers, including get[], is usable. But none of post[] is usable, and you cannot write here. That's why this method is const: mutating the object won't do much anyway.

My idea here was so you can output a progress bar or something to a cooperative client (see arsd.rtud for a potential helper)

The default is to do nothing. Subclass Cgi and use the CustomCgiMain mixin to do something here.
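
As a rough sketch of that approach, the following subclass logs upload progress to stderr. It relies only on the signatures documented on this page (ForwardCgiConstructors, onRequestBodyDataReceived, CustomCgiMain); the class name UploadWatcher, the handler, and the log format are assumptions made for illustration, and it presumes arsd.cgi is on the import path.

```d
import arsd.cgi;
import std.stdio : stderr;

// Hypothetical subclass; the name is an assumption for this sketch.
class UploadWatcher : Cgi {
	// Supplies the constructors that the main mixin needs.
	mixin ForwardCgiConstructors;

	// Called while the request body is still arriving. The object is only
	// partly initialized here, so this sketch just logs progress to stderr.
	override void onRequestBodyDataReceived(size_t receivedSoFar, size_t totalExpected) const {
		if(totalExpected == 0)
			return; // avoid dividing by zero
		stderr.writefln("upload: %s of %s bytes (%s%%)",
			receivedSoFar, totalExpected, receivedSoFar * 100 / totalExpected);
	}
}

// Runs once the whole request has been read; writing is allowed here.
void handler(Cgi cgi) {
	cgi.write("Upload finished.");
}

mixin CustomCgiMain!(UploadWatcher, handler);
```

The handler takes a plain Cgi parameter so it accepts the subclass through normal implicit conversion.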

this(BufferedInputRange ir, bool* closeConnection);
Initializes the cgi from completely raw HTTP data. The ir must have a Socket source. *closeConnection will be set to true if you should close the connection after handling this request

this(BufferedInputRange inputData, string address, ushort _port, int pathInfoStarts = 0, bool _https = false, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null, bool* closeConnection = null);
Initializes it from raw HTTP request data. GenericMain uses this when you compile with -version=embedded_httpd.

NOTE:
If you are behind a reverse proxy, the values here might not be what you expect.... FIXME somehow.

Parameters:
BufferedInputRange inputData the incoming data, including headers and other raw http data. When the constructor exits, it will leave this range exactly at the start of the next request on the connection (if there is one).
string address the IP address of the remote user
ushort _port the port number of the connection
int pathInfoStarts the offset into the path component of the http header where the SCRIPT_NAME ends and the PATH_INFO begins.
bool _https if this connection is encrypted (note that the input data must not actually be encrypted)
void delegate(const(ubyte)[]) _rawDataOutput delegate to accept response data. It should write to the socket or whatever; Cgi does all the needed processing to speak http.
void delegate() _flush if rawDataOutput buffers, this delegate should flush the buffer down the wire
bool* closeConnection if the request asks to close the connection, *closeConnection == true.

struct UploadedFile;
This represents a file the user uploaded via a POST request.

static UploadedFile fromData(immutable(void)[] data);
If you want to create one of these structs for yourself from some data, use this function.

string name;
The name of the form element.

string filename;
The filename the user set.

string contentType;
The MIME type the user's browser reported. (Not reliable.)

bool contentInMemory;
For small files, cgi.d will buffer the uploaded file in memory, and make it directly accessible to you through the content member. I find this very convenient and somewhat efficient, since it can avoid hitting the disk entirely. (I often want to inspect and modify the file anyway!)

If the file is very large, it is undesirable to eat that much memory just for a file buffer. In those cases, if you pass a large enough value for maxContentLength to the constructor so they are accepted, cgi.d will write the content to a temporary file that you can re-read later.

You can override this behavior by subclassing Cgi and overriding the protected handlePostChunk method. Note that the object is not fully initialized when that method is called: the HTTP headers are available, but cgi.post is not. You may parse the file as it streams in using this method.

Anyway, if the file is small enough to be in memory, contentInMemory will be set to true, and the content is available in the content member.

If not, contentInMemory will be set to false, and the content saved in a file, whose name will be available in the contentFilename member.

Tip:
if you know you are always dealing with small files, and want the convenience of ignoring this member, construct Cgi with a small maxContentLength. Then, if a large file comes in, it simply throws an exception (and HTTP error response) instead of trying to handle it.

The default value of maxContentLength in the constructor (5,000,000) is meant for small files.

immutable(ubyte)[] content;
The actual content of the file, if contentInMemory == true

string contentFilename;
the file where we dumped the content, if contentInMemory == false. Note that if you want to keep it, you MUST move the file, since otherwise it is considered garbage when cgi is disposed.
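
The two storage cases can be handled along these lines. This is a minimal sketch that assumes arsd.cgi is on the import path; the form field name "attachment" and the destination path are hypothetical, and a real program would choose a safe, unique destination instead.

```d
import arsd.cgi;
static import std.file;

void upload(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	// "attachment" is a hypothetical form field name.
	auto file = "attachment" in cgi.files;
	if(file is null) {
		cgi.write("No file uploaded.");
		return;
	}

	// Hypothetical destination chosen for this sketch.
	enum destination = "/tmp/saved-upload.bin";

	if(file.contentInMemory) {
		// Small upload: the bytes are already in the content member.
		std.file.write(destination, file.content);
	} else {
		// Large upload: copy the temporary file before the Cgi object
		// is disposed and the temporary file is treated as garbage.
		std.file.copy(file.contentFilename, destination);
	}

	cgi.write("Saved " ~ file.filename);
}

mixin GenericMain!upload;
```

Copying rather than renaming is a choice made here so the sketch also works when the temporary directory is on a different filesystem.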

void requireBasicAuth(string user, string pass, string message = null);
Very simple method to require a basic auth username and password. If the HTTP request doesn't include the required credentials, it sends an HTTP 401 response and throws an exception.

Note:
basic auth does not provide great security, especially over unencrypted HTTP; the user's credentials are sent in plain text on every request.

If you are using Apache, the HTTP_AUTHORIZATION variable may not be sent to the application. Either use Apache's built in methods for basic authentication, or add something along these lines to your server configuration:

	RewriteEngine On
	RewriteCond %{HTTP:Authorization} ^(.*)
	RewriteRule ^(.*) - [E=HTTP_AUTHORIZATION:%1]

This ensures the necessary data is available to cgi.d.

void setCache(bool allowCaching);
Very simple caching controls. setCache(false) means the response will never be cached; good for rapidly updated or sensitive sites. setCache(true) means it will be cached for as long as possible; best for static content. Use setResponseExpires and updateResponseExpires for more control.

bool gzipResponse;
Set to true and use cgi.write(data, true); to send a gzipped response to browsers that can accept it.

const string getCurrentCompleteUri();
This gets a full url for the current request, including port, protocol, host, path, and query

void setResponseStatus(string status);
Sets the HTTP status of the response. For example, "404 Not Found" or "500 Internal Server Error". The default is "200 OK", which automatically changes to "302 Found" if you call setResponseLocation(). Note setResponseStatus() must be called *before* you write() any data to the output.

void setResponseLocation(string uri, bool important = true);
Sets the Location header, which the browser will redirect the user to automatically. Note setResponseLocation() must be called *before* you write() any data to the output. Pass important = false if the redirect is only a default suggestion rather than something to insist upon.
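
A small sketch of the ordering rule for setResponseStatus() and setResponseLocation(), assuming arsd.cgi is on the import path. The paths "/old" and "/new" and the page text are made up for illustration.

```d
import arsd.cgi;

void handler(Cgi cgi) {
	switch(cgi.pathInfo) {
		case "/old":
			// Must happen before any write(); the status becomes "302 Found".
			cgi.setResponseLocation(cgi.scriptName ~ "/new");
			break;
		case "", "/", "/new":
			cgi.write("Welcome.");
			break;
		default:
			// The status is also set before the first write().
			cgi.setResponseStatus("404 Not Found");
			cgi.write("No such page.");
			break;
	}
}

mixin GenericMain!handler;
```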

void setResponseExpires(long when, bool isPublic = false);
Sets the Expires: HTTP header. The when parameter is a Unix timestamp in milliseconds (unix_timestamp * 1000). Try setResponseExpires(getUTCtime() + SOME AMOUNT) for normal use. See also: updateResponseExpires, setPublicCaching.

Note:
the when parameter is an absolute time, unlike setCookie's expiresIn parameter, which is a number of milliseconds from now.

void updateResponseExpires(long when, bool isPublic);
This is like setResponseExpires, but it can be called multiple times. The earliest expiration time is the one kept. If you have multiple functions, each might call updateResponseExpires for its own return value. The program output as a whole is only as cacheable as the least cacheable part in the chain.

setCache(false) always overrides this; it is, by definition, the strictest anti-cache statement available. If your site outputs sensitive user data, you should probably call setCache(false) when you do, to ensure no other function makes the content cacheable, as that may be a privacy risk.

Conversely, setting an expiration here overrides setCache(true), since any expiration date is earlier than infinity.
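
The "earliest expiry wins" rule can be modelled in a few lines of plain D. This is only an illustration of the rule as described here, not arsd.cgi's own code; the ExpiryPolicy struct and its member names are hypothetical.

```d
import std.algorithm : min;

// Hypothetical model of the rule described above; not arsd.cgi's own code.
struct ExpiryPolicy {
	bool neverCache = false;  // what setCache(false) would record
	long expires = long.max;  // long.max stands in for "cache forever" (setCache(true))

	// Mirrors updateResponseExpires: the earliest time seen so far is kept.
	void update(long when) {
		expires = min(expires, when);
	}

	// Mirrors setCache(false): overrides any expiry, earlier or later.
	void forbidCaching() {
		neverCache = true;
	}

	bool cacheable() const {
		return !neverCache;
	}
}

unittest {
	ExpiryPolicy policy;
	policy.update(5_000); // one part of the page is good for a while
	policy.update(2_000); // another part goes stale sooner
	policy.update(9_000); // a later, looser request does not extend it
	assert(policy.expires == 2_000);

	policy.forbidCaching();
	assert(!policy.cacheable);
}
```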

void setCookie(string name, string data, long expiresIn = 0, string path = null, string domain = null, bool httpOnly = false, bool secure = false);
Sets an HTTP cookie, automatically encoding the data to the correct string. expiresIn is how many milliseconds in the future the cookie will expire.

TIP:
to make a cookie accessible from subdomains, set the domain to .yourdomain.com. Note setCookie() must be called *before* you write() any data to the output.
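
For instance, a one-week cookie shared across subdomains might be set as below. This is a sketch assuming arsd.cgi is on the import path; the cookie name, value, and domain are placeholders.

```d
import arsd.cgi;

void login(Cgi cgi) {
	// expiresIn is a duration in milliseconds, not an absolute timestamp.
	enum long oneWeek = 7L * 24 * 60 * 60 * 1000;

	// Arguments: name, data, expiresIn, path, domain, httpOnly.
	// Called before any write(), as required.
	cgi.setCookie("session", "abc123", oneWeek, "/", ".example.com", true);

	cgi.write("Cookie set.");
}

mixin GenericMain!login;
```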

void clearCookie(string name, string path = null, string domain = null);
Clears a previously set cookie with the given name, path, and domain.

void setResponseContentType(string ct);
Sets the content type of the response, for example "text/html" (the default) for HTML, or "image/png" for a PNG image

void header(string h);
Adds a custom header. It should be the name: value, but without any line terminator. For example: header("X-My-Header: Some value"); Note you should use the specialized functions in this object if possible to avoid duplicates in the output.

void write(const(void)[] t, bool isAll = false, bool maybeAutoClose = true);
Writes the data to the output, flushing headers if they have not yet been sent.

void close();
Flushes the buffers to the network, signifying that you are done. You should always call this explicitly when you are done outputting data.

const nothrow T request(T = string)(in string name, in T def = T.init);
Gets a request variable as a specific type, or the default value if it isn't there or isn't convertible to the requested type.

Checks both GET and POST variables, preferring the POST variable, if available.

A nice trick is using the default value to choose the type:

			/*
				The return value will match the type of the default.
				Here, I gave 10 as a default, so the return value will
				be an int.

				If the user-supplied value cannot be converted to the
				requested type, you will get the default value back.
			*/
			int a = cgi.request("number", 10);

			if(cgi.get["number"] == "11")
				assert(a == 11); // conversion succeeds

			if("number" !in cgi.get)
				assert(a == 10); // no value means you can't convert - give the default

			if(cgi.get["number"] == "twelve")
				assert(a == 10); // conversion from string to int would fail, so we get the default

You can use an enum as an easy whitelist, too:

			enum Operations {
				add, remove, query
			}

			auto op = cgi.request("op", Operations.query);

			if(cgi.get["op"] == "add")
				assert(op == Operations.add);
			if(cgi.get["op"] == "remove")
				assert(op == Operations.remove);
			if(cgi.get["op"] == "query")
				assert(op == Operations.query);

			if(cgi.get["op"] == "random string")
				assert(op == Operations.query); // the value can't be converted to the enum, so we get the default
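
To see why the default doubles as a type selector and a fallback, here is a standalone sketch of the same idea built on std.conv.to. It is not how cgi.request() is implemented; the valueOr helper and the Op enum are hypothetical names used only for this illustration.

```d
import std.conv : to, ConvException;

// Hypothetical helper: converts vars[name] to T, or returns def when the
// name is missing or the text cannot be converted.
T valueOr(T)(in string[string] vars, string name, T def) {
	auto p = name in vars;
	if(p is null)
		return def;
	string text = *p;
	try {
		return to!T(text);
	} catch(ConvException e) {
		return def;
	}
}

enum Op { add, remove, query }

unittest {
	string[string] vars = ["number": "11", "word": "twelve", "op": "remove"];

	assert(valueOr(vars, "number", 10) == 11);       // conversion succeeds
	assert(valueOr(vars, "word", 10) == 10);         // not a number: default
	assert(valueOr(vars, "missing", 10) == 10);      // no such name: default
	assert(valueOr(vars, "op", Op.query) == Op.remove); // enum member by name
	assert(valueOr(vars, "word", Op.query) == Op.query); // not a member: default
}
```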

const bool isClosed();
Is the output already closed?

immutable immutable(char[][string]) requestHeaders;
What follows is data gotten from the HTTP request. It is all fully immutable, partially because it logically is (your code doesn't change what the user requested...) and partially because I hate how bad programs in PHP change those superglobals to do all kinds of hard to follow ugliness. I don't want that to ever happen in D.

For some of these, you'll want to refer to the HTTP or CGI specs for more details.

All the raw headers in the request as name/value pairs. The name is stored as all lower case, but otherwise the same as it is in HTTP; words separated by dashes. For example, "cookie" or "accept-encoding". Many HTTP headers have specialized variables below for more convenience and static name checking; you should generally try to use them.

immutable immutable(char[]) host;
The hostname in the request. If one program serves multiple domains, you can use this to differentiate between them.

immutable immutable(char[]) userAgent;
The browser's user-agent string. Can be used to identify the browser.

immutable immutable(char[]) pathInfo;
This is any stuff sent after your program's name on the url, but before the query string. For example, suppose your program is named "app". If the user goes to site.com/app, pathInfo is empty. But the user can also go to site.com/app/some/sub/path, treating your program like a virtual folder. In this case, pathInfo == "/some/sub/path".

immutable immutable(char[]) scriptName;
The full base path of your program, as seen by the user. If your program is located at site.com/programs/apps, scriptName == "/programs/apps".

immutable immutable(char[]) authorization;
The full authorization string from the header, undigested. Useful for implementing auth schemes such as OAuth 1.0. Note that some web servers do not forward this to the app without taking extra steps. See requireBasicAuth's comment for more info.

immutable immutable(char[]) accept;
The HTTP Accept header is the user agent telling what content types it is willing to accept. This is often */*; they accept everything, so it's not terribly useful. (The similar sounding Accept-Encoding header is handled automatically for chunking and gzipping. Simply set gzipResponse = true and cgi.d handles the details, zipping if the user's browser is willing to accept it.)

immutable immutable(char[]) lastEventId;
The HTML 5 draft includes an EventSource() object that connects to the server, and remains open to take a stream of events. My arsd.rtud module can help with the server side part of that. The Last-Event-Id http header is defined in the draft to help handle loss of connection. When the browser reconnects to you, it sets this header to the last event id it saw, so you can catch it up. This member has the contents of that header.

immutable immutable(RequestMethod) requestMethod;
The HTTP request verb: GET, POST, etc. It is represented as an enum in cgi.d (which, like many enums, you can convert back to string with std.conv.to()). An HTTP GET is supposed to, according to the spec, not have side effects; a user can GET something over and over again and always have the same result. On all requests, the get[] and getArray[] members may be filled in. The post[] and postArray[] members are only filled in on POST methods.

immutable immutable(char[]) queryString;
The unparsed content of the request query string: the stuff after the ? in your URL. See get[] and getArray[] for a parsed view of it. Sometimes the unparsed string is useful, though, if you want a custom format of data up there (probably not a good idea, unless it is really simple, like "?username" perhaps).

immutable immutable(char[]) cookie;
The unparsed content of the Cookie: header in the request. See also the cookies[string] member for a parsed view of the data.

immutable immutable(char[]) referrer;
The Referer header from the request. (It is misspelled in the HTTP spec, and thus the actual request and cgi specs too, but I spelled the word correctly here because that's sane. The spec's misspelling is an implementation detail.) It contains the site url that referred the user to your program; the site that linked to you, or if you're serving images, the site that has you as an image. Also, if you're in an iframe, the referrer is the site that is framing you.

Important note: if the user copy/pastes your url, this is blank, and, just like with all other user data, their browsers can also lie to you. Don't rely on it for real security.

immutable immutable(char[]) requestUri;
The full url of the current request, excluding the protocol and host. requestUri == scriptName ~ pathInfo ~ (queryString.length ? "?" ~ queryString : "");

immutable immutable(char[]) remoteAddress;
The IP address of the user, as we see it. (Might not match the IP of the user's computer due to things like proxies and NAT.)

immutable bool https;
Was the request encrypted via https?

immutable int port;
On what TCP port number did the server receive the request?

immutable immutable(char[][string]) get;
Here come the parsed request variables: the things that come close to PHP's $_GET, $_POST, etc. superglobals in content.

The data from the query string in the url, showing only the last value for each name. If you want to handle multiple values with the same name, use getArray. This only works right if the query string is x-www-form-urlencoded, the default you see on the web, with name=value pairs separated by the & character.

immutable immutable(char[][string]) post;
The data from the request's body, on POST requests. It parses application/x-www-form-urlencoded data (used by most web requests, including typical forms), and multipart/form-data requests (used by file uploads on web forms) into the same container, so you can always access them the same way. It makes no attempt to parse other content types. If you want to accept an XML Post body (for a web api perhaps), you'll need to handle the raw data yourself.

immutable immutable(char[][string]) cookies;
Separates out the cookie header into individual name/value pairs (which is how you set them!)

immutable immutable(UploadedFile[string]) files;
Represents user uploaded files.

When making a file upload form, be sure to follow the standard: set method="POST" and enctype="multipart/form-data" in your html <form> tag attributes. The key into this array is the name attribute on your input tag, just like with other post variables. See the comments on the UploadedFile struct for more information about the data inside, including important notes on max size and content location.

immutable immutable(char[][][string]) getArray;
Like get, but with an array of values per name. Use these if you expect multiple items submitted with the same name. By the way, assert(get[name] is getArray[name][$-1]); should pass. The same holds for post and cookies. The order of each array is the order in which the data arrived.

immutable immutable(char[][][string]) postArray;
ditto for post

immutable immutable(char[][][string]) cookiesArray;
ditto for cookies

string makeDataUrl(string mimeType, in void[] data);
Makes a data: uri that can be used as a link in most newer browsers (IE8+).

struct Uri;
Represents a url that can be broken down or built up through properties

string scheme;
e.g. "http" in "http://example.com/"

string userinfo;
the username (and possibly a password) in the uri

string host;
the domain name

int port;
port number, if given. Will be zero if a port was not explicitly given

string path;
e.g. "/folder/file.html" in "http://example.com/folder/file.html"

string query;
the stuff after the ? in a uri

string fragment;
the stuff after the # in a uri.

this(string uri);
Breaks down a uri string to its components

const string toString();
Converts the broken down parts back into a complete string

const Uri basedOn(in Uri baseUrl);
Returns a new absolute Uri given a base. It treats this one as relative where possible, but absolute if not. (If protocol, domain, or other info is not set, the new one inherits it from the base.)

Browsers use a function like this to figure out links in html.

string[][string] decodeVariables(string data, string separator = "&");
Breaks down a url encoded string into name/value pairs, keeping every value given for each name.

string[string] decodeVariablesSingle(string data);
Breaks down a url encoded string, but only returns the last value given for each name.
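
The difference between the two return shapes is easier to see with a standalone sketch. The code below uses only Phobos and is not the library's implementation; decodeAll and decodeLast are hypothetical names standing in for decodeVariables and decodeVariablesSingle, and the handling of "+" as a space is an assumption based on the usual form encoding.

```d
import std.array : replace, split;
import std.string : indexOf;
import std.uri : decodeComponent;

// Hypothetical stand-in for decodeVariables: every value is kept, in order.
string[][string] decodeAll(string data, string separator = "&") {
	string[][string] result;
	foreach(pair; data.split(separator)) {
		if(pair.length == 0)
			continue; // skip empty segments such as a trailing separator
		auto eq = pair.indexOf('=');
		string name  = eq == -1 ? pair : pair[0 .. eq];
		string value = eq == -1 ? ""   : pair[eq + 1 .. $];
		// Form encoding writes spaces as '+'; percent escapes are decoded after.
		name  = decodeComponent(name.replace("+", " "));
		value = decodeComponent(value.replace("+", " "));
		result[name] ~= value;
	}
	return result;
}

// Hypothetical stand-in for decodeVariablesSingle: the last value wins.
string[string] decodeLast(string data) {
	string[string] result;
	foreach(name, values; decodeAll(data))
		result[name] = values[$ - 1];
	return result;
}

unittest {
	auto all = decodeAll("a=1&b=x+y%21&a=3");
	assert(all["a"] == ["1", "3"]);
	assert(all["b"] == ["x y!"]);
	assert(decodeLast("a=1&b=x+y%21&a=3")["a"] == "3");
}
```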

string encodeVariables(in string[string] data);
Url encodes the name/value pairs into a single string.

string encodeVariables(in string[][string] data);
Url encodes the name/value pairs into a single string, with one entry per value when a name has several.

template GenericMain(alias fun,T...)
Use this instead of writing your own main

template CustomCgiMain(CustomCgi,alias fun,T...) if (is(CustomCgi : Cgi))
If you want to use a subclass of Cgi with generic main, use this mixin.

class ListeningConnectionManager;
To use this thing:

	auto manager = new ListeningConnectionManager(80);
	foreach(connection; manager) {
		// work with connection
		// note: each connection may get its own thread, so this is a kind of concurrent foreach.

		// this can have implications if you access local variables in the function, as they are
		// implicitly shared!

		// FIXME: break does not work
	}

I suggest you use BufferedInputRange(connection) to handle the input. As a packet comes in, you will get control. You can just continue; though to fetch more.

FIXME:
should I offer an event based async thing like netman did too? Yeah, probably.