arsd.cgi

Provides a uniform server-side API for CGI, FastCGI, SCGI, and HTTP web applications.

import arsd.cgi;

// Instead of writing your own main(), you should write a function
// that takes a Cgi param, and use mixin GenericMain
// for maximum compatibility with different web servers.
void hello(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	if("name" in cgi.get)
		cgi.write("Hello, " ~ cgi.get["name"]);
	else
		cgi.write("Hello, world!");
}

mixin GenericMain!hello;

Compile and run:
For CGI,

dmd yourfile.d cgi.d

then put the executable in your cgi-bin directory. For FastCGI:

dmd yourfile.d cgi.d -version=fastcgi

and run it. spawn-fcgi helps on nginx. You can put the file in the directory for Apache. On IIS, run it with a port on the command line. For SCGI:

dmd yourfile.d cgi.d -version=scgi

and run the executable, providing a port number on the command line. For an embedded HTTP server, compile with

dmd yourfile.d cgi.d -version=embedded_httpd

and run the generated program. It listens on port 8085 by default. You can change this on the command line with the --port option when running your program.

You can also simulate a request by passing parameters on the command line, like:

./yourprogram GET / name=adr

And it will print the result to stdout.

CGI Setup tips:
On Apache, you may do

SetHandler cgi-script

in your

.htaccess

file.

Integration tips:
cgi.d works well with dom.d for generating html. You may also use web.d for other utilities and automatic api wrapping.

dom.d usage:

	import arsd.cgi;
	import arsd.dom;

	void hello_dom(Cgi cgi) {
		auto document = new Document();

		static import std.file;
		// parse the file in strict mode, requiring it to be well-formed UTF-8 XHTML
		// (You'll appreciate this if you've ever had to deal with a missing </div>
		// or something in a php or erb template before that would randomly mess up
		// the output in your browser. Just check it and throw an exception early!)
		//
		// You could also hard-code a template or load one at compile time with an
		// import expression, but you might appreciate making it a regular file
		// because that means it can be more easily edited by the frontend team and
		// they can see their changes without needing to recompile the program.
		//
		// Note on CTFE: if you do choose to load a static file at compile time,
		// you *can* parse it in CTFE using enum, which will cause it to throw at
		// compile time, which is kinda cool too. Be careful in modifying that document,
		// though, as it will be a static instance. You might want to clone it on demand,
		// or perhaps modify it lazily as you print it out. (Try element.tree, it returns
		// a range of elements which you could send through std.algorithm functions. But
		// since my selector implementation doesn't work on that level yet, you'll find that
		// harder to use. Of course, you could make a static list of matching elements and
		// then use a simple e is e2 predicate... :) )
		document.parseUtf8(std.file.read("your_template.html"), true, true);

		// fill in data using DOM functions, so placing it is in the hands of HTML
		// and it will be properly encoded as text too.
		//
		// Plain html templates can't run server side logic, but I think that's a
		// good thing - it keeps them simple. You may choose to extend the html,
		// but I think it is best to try to stick to standard elements and fill them
		// in with requested data with IDs or class names. A further benefit of
		// this is the designer can also highlight data based on sources in the CSS.
		//
		// However, all of dom.d is available, so you can format your data however
		// you like. You can do partial templates with innerHTML too, or perhaps better,
		// injecting cloned nodes from a partial document.
		//
		// There's a lot of possibilities.
		document["#name"].innerText = cgi.request("name", "default name");

		// send the document to the browser. The second argument to `cgi.write`
		// indicates that this is all the data at once, enabling a few small
		// optimizations.
		cgi.write(document.toString(), true);
	}

Concepts:

Input:
get, post, request(), files, cookies, pathInfo, requestMethod, and HTTP headers (headers, userAgent, referrer, accept, authorization, lastEventId)

Output:
cgi.write(), cgi.header(), cgi.setResponseStatus, cgi.setResponseContentType, gzipResponse

Cookies:
setCookie, clearCookie, cookie, cookies

Caching:
cgi.setResponseExpires, cgi.updateResponseExpires, cgi.setCache

Redirections:
cgi.setResponseLocation

Other Information:
remoteAddress, https, port, scriptName, requestUri, getCurrentCompleteUri, onRequestBodyDataReceived

Overriding behavior:
handleIncomingDataChunk, prepareForIncomingDataChunks, cleanUpPostDataState

Installing:
Apache, IIS, CGI, FastCGI, SCGI, embedded HTTPD (not recommended for production use)

Guide for PHP users:
If you are coming from PHP, here's a quick guide to help you get started:

$_GET["var"] == cgi.get["var"]
$_POST["var"] == cgi.post["var"]
$_COOKIE["var"] == cgi.cookies["var"]

In PHP, you can give a form element a name like "something[]", and then $_POST["something"] gives an array. In D, you can use whatever name you want, and access an array of values with the cgi.getArray["name"] and cgi.postArray["name"] members.

echo("hello"); == cgi.write("hello");

$_SERVER["REMOTE_ADDR"] == cgi.remoteAddress
$_SERVER["HTTP_HOST"] == cgi.host

See Also:
You may also want to see dom.d, web.d, and html.d for more code for making web applications. database.d, mysql.d, postgres.d, and sqlite.d can help in accessing databases.

If you are looking to access a web application via HTTP, try curl.d.

template ForwardCgiConstructors()

If you are doing a custom cgi class, mixing this in can take care of the required constructors for you.

class Cgi;

The main interface with the web request

enum RequestMethod: int;

the methods a request can be

this(string[] args);

Initializes it with command line arguments (for easy testing)

this(long maxContentLength = defaultMaxContentLength, in string[string] env = null, const(ubyte)[] delegate() readdata = null, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null);

Initializes it using a CGI or CGI-like interface

void dispose();

Cleans up any temporary files. Do not use the object after calling this.

NOTE:
it is called automatically by GenericMain

struct UploadedFile;

This represents a file the user uploaded via a POST request.

static UploadedFile fromData(immutable(void)[] data, string name = null);: If you want to create one of these structs for yourself from some data, use this function.
string name;: The name of the form element.
string filename;: The filename the user set.
string contentType;: The MIME type the user's browser reported. (Not reliable.)
bool contentInMemory;: For small files, cgi.d will buffer the uploaded file in memory, and make it directly accessible to you through the content member. I find this very convenient and somewhat efficient, since it can avoid hitting the disk entirely. (I often want to inspect and modify the file anyway!)

If the file is very large, it is undesirable to eat that much memory just for a file buffer. In those cases, if you pass a large enough value for maxContentLength to the constructor so they are accepted, cgi.d will write the content to a temporary file that you can re-read later.

You can override this behavior by subclassing Cgi and overriding the protected handlePostChunk method. Note that the object is not initialized when you write that method - the http headers are available, but the cgi.post method is not. You may parse the file as it streams in using this method.

Anyway, if the file is small enough to be in memory, contentInMemory will be set to true, and the content is available in the content member.

If not, contentInMemory will be set to false, and the content saved in a file, whose name will be available in the contentFilename member.

Tip:
if you know you are always dealing with small files, and want the convenience of ignoring this member, construct Cgi with a small maxContentLength. Then, if a large file comes in, it simply throws an exception (and HTTP error response) instead of trying to handle it.

The default value of maxContentLength in the constructor is for small files.
immutable(ubyte)[] content;: The actual content of the file, if contentInMemory == true
string contentFilename;: the file where we dumped the content, if contentInMemory == false. Note that if you want to keep it, you MUST move the file, since otherwise it is considered garbage when cgi is disposed.
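
As a rough sketch of how the two storage cases might be handled in one place, the handler below walks cgi.filesArray and reports where each upload ended up. The form field name "attachment" is a hypothetical example, and the sketch assumes the program is built with GenericMain as shown at the top of this page.

```d
import arsd.cgi;
import std.conv : to;
static import std.file;

void upload(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	// "attachment" is a hypothetical form field name, not part of cgi.d.
	if(auto uploads = "attachment" in cgi.filesArray) {
		foreach(file; *uploads) {
			if(file.contentInMemory) {
				// Small upload: the bytes are directly available in content.
				cgi.write(file.filename ~ ": " ~ to!string(file.content.length)
					~ " bytes held in memory\n");
			} else {
				// Large upload: the bytes are in a temporary file. Move or copy
				// it if you want to keep it after the Cgi object is disposed.
				cgi.write(file.filename ~ ": "
					~ to!string(std.file.getSize(file.contentFilename))
					~ " bytes stored in " ~ file.contentFilename ~ "\n");
			}
		}
	} else {
		cgi.write("No file was uploaded.\n");
	}
}

mixin GenericMain!upload;
```

Checking contentInMemory first keeps the handler correct regardless of which maxContentLength the program was built with.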

const void onRequestBodyDataReceived(size_t receivedSoFar, size_t totalExpected);

You can override this function to react to an upload in progress.

Take note that parts of the Cgi object are not yet initialized! Stuff from HTTP headers, including get[], is usable. But none of post[] is usable, and you cannot write here. That's why this method is const - mutating the object won't do much anyway.

My idea here was so you can output a progress bar or something to a cooperative client (see arsd.rtud for a potential helper).

The default is to do nothing. Subclass Cgi and use the CustomCgiMain mixin to do something here.
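
A minimal sketch of such a subclass might look like the following. The class name ProgressCgi and the choice to log to stderr are assumptions made for illustration; ForwardCgiConstructors and CustomCgiMain are used as described elsewhere on this page, and the override is assumed to be accessible to subclasses with the signature listed above.

```d
import arsd.cgi;
import std.stdio : stderr;

// ProgressCgi is a hypothetical name for this example.
class ProgressCgi : Cgi {
	// Supplies the constructors that CustomCgiMain needs.
	mixin ForwardCgiConstructors;

	// Called while the request body streams in. The object is only
	// partially initialized here, so this just logs and writes nothing
	// to the response.
	override void onRequestBodyDataReceived(size_t receivedSoFar, size_t totalExpected) const {
		stderr.writefln("upload progress: %s of %s bytes", receivedSoFar, totalExpected);
	}
}

void handler(Cgi cgi) {
	cgi.setResponseContentType("text/plain");
	cgi.write("Upload finished.", true);
}

mixin CustomCgiMain!(ProgressCgi, handler);
```

Where the stderr output ends up depends on the web server; under plain CGI it commonly lands in the server's error log.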

this(BufferedInputRange ir, bool* closeConnection);

Initializes the cgi from completely raw HTTP data. The ir must have a Socket source.

*closeConnection will be set to true if you should close the connection after handling this request

this(BufferedInputRange inputData, string address, ushort _port, int pathInfoStarts = 0, bool _https = false, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null, bool* closeConnection = null);

Initializes it from raw HTTP request data. GenericMain uses this when you compile with -version=embedded_httpd.

NOTE:
If you are behind a reverse proxy, the values here might not be what you expect.... FIXME somehow.

Params:

BufferedInputRange inputData	the incoming data, including headers and other raw http data. When the constructor exits, it will leave this range exactly at the start of the next request on the connection (if there is one).
string address	the IP address of the remote user
ushort _port	the port number of the connection
int pathInfoStarts	the offset into the path component of the http header where the SCRIPT_NAME ends and the PATH_INFO begins.
bool _https	if this connection is encrypted (note that the input data must not actually be encrypted)
void delegate(const(ubyte)[]) _rawDataOutput	delegate to accept response data. It should write to the socket or whatever; Cgi does all the needed processing to speak http.
void delegate() _flush	if rawDataOutput buffers, this delegate should flush the buffer down the wire
bool* closeConnection	if the request asks to close the connection, *closeConnection == true.

void requireBasicAuth(string user, string pass, string message = null);

Very simple method to require a basic auth username and password.

If the http request doesn't include the required credentials, it sends an HTTP 401 response and throws an exception.

Note:
basic auth does not provide great security, especially over unencrypted HTTP; the user's credentials are sent in plain text on every request.

If you are using Apache, the HTTP_AUTHORIZATION variable may not be sent to the application. Either use Apache's built in methods for basic authentication, or add something along these lines to your server configuration:

RewriteEngine On
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule ^(.*) - [E=HTTP_AUTHORIZATION:%1]

This ensures the necessary data is available to cgi.d.
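
For illustration, a handler guarded by requireBasicAuth could be sketched like this. The user name, password, and message are placeholder values; a real program would not hard-code credentials.

```d
import arsd.cgi;

void members(Cgi cgi) {
	// Placeholder credentials for the example only.
	// If the request lacks them, this call throws and the
	// lines below never run.
	cgi.requireBasicAuth("admin", "change-me", "Members area");

	cgi.setResponseContentType("text/plain");
	cgi.write("Welcome to the members area.", true);
}

mixin GenericMain!members;
```

Because the check throws on failure, it works as a guard at the top of any handler that should be protected.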

void setCache(bool allowCaching);

Very simple caching controls - setCache(false) means it will never be cached. Good for rapidly updated or sensitive sites.

setCache(true) means it will always be cached for as long as possible. Best for static content.

Use setResponseExpires and updateResponseExpires for more control

bool gzipResponse;

Set to true and use cgi.write(data, true); to send a gzipped response to browsers that can accept it.
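
A short sketch of that pattern, assuming a handler run through GenericMain and a response small enough to build in one string:

```d
import arsd.cgi;

void page(Cgi cgi) {
	// Ask cgi.d to compress the response when the client accepts it.
	cgi.gzipResponse = true;

	string html = "<html><body><p>Hello, compressed world!</p></body></html>";

	// Passing true as the second argument says this is the whole
	// response, which is the usage the gzipResponse docs describe.
	cgi.write(html, true);
}

mixin GenericMain!page;
```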

immutable bool isCalledWithCommandLineArguments;

Set to true if and only if this was initialized with command line arguments

const string getCurrentCompleteUri();

This gets a full url for the current request, including port, protocol, host, path, and query

const string logicalScriptName();

You can override this if your site base url isn't the same as the script name

void setResponseStatus(string status);

Sets the HTTP status of the response. For example, "404 File Not Found" or "500 Internal Server Error".

It assumes "200 OK", and automatically changes to "302 Found" if you call setResponseLocation().

Note setResponseStatus() must be called *before* you write() any data to the output.
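
As an illustration of the ordering rule, the sketch below routes on pathInfo and sets a 404 status before writing anything. The paths "/about" and the page texts are made-up examples.

```d
import arsd.cgi;

void route(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	switch(cgi.pathInfo) {
		case "", "/":
			cgi.write("Home page", true);
			break;
		case "/about": // hypothetical sub-path
			cgi.write("About page", true);
			break;
		default:
			// The status must be set before the first write().
			cgi.setResponseStatus("404 File Not Found");
			cgi.write("Nothing found at " ~ cgi.pathInfo, true);
			break;
	}
}

mixin GenericMain!route;
```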

bool canOutputHeaders();

Returns true if it is still possible to output headers

void setResponseLocation(string uri, bool important = true, string status = null);

Sets the location header, which the browser will redirect the user to automatically.

Note setResponseLocation() must be called *before* you write() any data to the output.

The optional important argument is used if it's a default suggestion rather than something to insist upon.
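
One common use is redirecting after a form submission so that a page reload does not resubmit the form. The sketch below assumes a hypothetical "/thanks.html" target and relies on GenericMain to finish the response after the handler returns.

```d
import arsd.cgi;

void contact(Cgi cgi) {
	if(cgi.requestMethod == Cgi.RequestMethod.POST) {
		// ... process cgi.post here ...

		// Redirect before any write(); "/thanks.html" is a made-up URL.
		cgi.setResponseLocation("/thanks.html");
		return;
	}

	// Any other method: show the form.
	cgi.write(`<form method="POST"><input name="name"><input type="submit"></form>`, true);
}

mixin GenericMain!contact;
```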

void setResponseExpires(long when, bool isPublic = false);

Sets the Expires: http header. See also: updateResponseExpires, setPublicCaching

The parameter is in unix_timestamp * 1000. Try setResponseExpires(getUTCtime() + SOME AMOUNT) for normal use.

Note:
the when parameter is different than setCookie's expire parameter.

void updateResponseExpires(long when, bool isPublic);

This is like setResponseExpires, but it can be called multiple times. The setting most in the past is the one kept.

If you have multiple functions, they all might call updateResponseExpires about their own return value. The program output as a whole is as cacheable as the least cacheable part in the chain.

setCache(false) always overrides this - it is, by definition, the strictest anti-cache statement available. If your site outputs sensitive user data, you should probably call setCache(false) when you do, to ensure no other functions will cache the content, as it may be a privacy risk.

Conversely, setting here overrides setCache(true), since any expiration date is in the past of infinity.
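
The "least cacheable part wins" rule can be sketched as below. The helper names sidebar and news are hypothetical, and the current time is computed with std.datetime in milliseconds to match the unix_timestamp * 1000 unit described for setResponseExpires. Each helper returns its markup instead of writing it, so every expiry is registered before the first write().

```d
import arsd.cgi;
import std.datetime : Clock, UTC;

enum long oneHour = 60L * 60 * 1000; // in milliseconds

// Current time as unix timestamp * 1000.
long nowMs() {
	return Clock.currTime(UTC()).toUnixTime() * 1000L;
}

// Hypothetical page fragments, each stating how long it stays fresh.
string sidebar(Cgi cgi) {
	cgi.updateResponseExpires(nowMs() + 24 * oneHour, true);
	return "<div>sidebar</div>";
}

string news(Cgi cgi) {
	cgi.updateResponseExpires(nowMs() + oneHour, true);
	return "<div>news</div>";
}

void page(Cgi cgi) {
	// Both fragments register an expiry; per the description above,
	// the earlier one (one hour) is the one that is kept.
	string html = sidebar(cgi) ~ news(cgi);
	cgi.write(html, true);
}

mixin GenericMain!page;
```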

void setCookie(string name, string data, long expiresIn = 0, string path = null, string domain = null, bool httpOnly = false, bool secure = false);

Sets an HTTP cookie, automatically encoding the data to the correct string.

expiresIn is how many milliseconds in the future the cookie will expire.

TIP:
to make a cookie accessible from subdomains, set the domain to .yourdomain.com.

Note setCookie() must be called *before* you write() any data to the output.

void clearCookie(string name, string path = null, string domain = null);

Clears a previously set cookie with the given name, path, and domain.
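
A small visit counter shows setCookie, clearCookie, and the cookies member working together. The cookie name "visits" and the "reset" query parameter are invented for this sketch.

```d
import arsd.cgi;
import std.conv : to, ConvException;

void visits(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	// "reset" is a hypothetical query parameter for this example.
	if("reset" in cgi.get) {
		// Use the same path the cookie was set with.
		cgi.clearCookie("visits", "/");
		cgi.write("Counter cleared.", true);
		return;
	}

	int count = 0;
	if(auto previous = "visits" in cgi.cookies) {
		// Cookies are user data, so tolerate garbage values.
		try count = to!int(*previous);
		catch(ConvException e) { count = 0; }
	}
	count++;

	// expiresIn is in milliseconds: keep the cookie for one week.
	enum long oneWeek = 7L * 24 * 60 * 60 * 1000;

	// Cookies must be set before the first write().
	cgi.setCookie("visits", to!string(count), oneWeek, "/", null, true);
	cgi.write("Visit number " ~ to!string(count), true);
}

mixin GenericMain!visits;
```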

void setResponseContentType(string ct);

Sets the content type of the response, for example "text/html" (the default) for HTML, or "image/png" for a PNG image

void header(string h);

Adds a custom header. It should be the name: value, but without any line terminator.

For example: header("X-My-Header: Some value");

Note you should use the specialized functions in this object if possible to avoid duplicates in the output.

void write(const(void)[] t, bool isAll = false, bool maybeAutoClose = true);

Writes the data to the output, flushing headers if they have not yet been sent.

void close();

Flushes the buffers to the network, signifying that you are done.

You should always call this explicitly when you are done outputting data.

const nothrow T request(T = string)(in string name, in T def = T.init);

Gets a request variable as a specific type, or the default value if it isn't there or isn't convertible to the requested type.

Checks both GET and POST variables, preferring the POST variable, if available.

A nice trick is using the default value to choose the type:

	/*
		The return value will match the type of the default.
		Here, I gave 10 as a default, so the return value will
		be an int.

		If the user-supplied value cannot be converted to the
		requested type, you will get the default value back.
	*/
	int a = cgi.request("number", 10);

	if(cgi.get["number"] == "11")
		assert(a == 11); // conversion succeeds

	if("number" !in cgi.get)
		assert(a == 10); // no value means you can't convert - give the default

	if(cgi.get["number"] == "twelve")
		assert(a == 10); // conversion from string to int would fail, so we get the default

You can use an enum as an easy whitelist, too:

	enum Operations {
		add, remove, query
	}

	auto op = cgi.request("op", Operations.query);

	if(cgi.get["op"] == "add")
		assert(op == Operations.add);
	if(cgi.get["op"] == "remove")
		assert(op == Operations.remove);
	if(cgi.get["op"] == "query")
		assert(op == Operations.query);

	if(cgi.get["op"] == "random string")
		assert(op == Operations.query); // the value can't be converted to the enum, so we get the default

const bool isClosed();

Is the output already closed?

immutable immutable(string[string]) requestHeaders;

What follows is data gotten from the HTTP request. It is all fully immutable, partially because it logically is (your code doesn't change what the user requested...) and partially because I hate how bad programs in PHP change those superglobals to do all kinds of hard to follow ugliness. I don't want that to ever happen in D.

For some of these, you'll want to refer to the http or cgi specs for more details.

All the raw headers in the request as name/value pairs. The name is stored as all lower case, but otherwise the same as it is in HTTP; words separated by dashes. For example, "cookie" or "accept-encoding". Many HTTP headers have specialized variables below for more convenience and static name checking; you should generally try to use them.

immutable immutable(string) host;

The hostname in the request. If one program serves multiple domains, you can use this to differentiate between them.

immutable immutable(string) origin;

The origin header in the request, if present. Some HTML5 cross-domain apis set this and you should check it on those cross domain requests and websockets.

immutable immutable(string) userAgent;

The browser's user-agent string. Can be used to identify the browser.

immutable immutable(string) pathInfo;

This is any stuff sent after your program's name on the url, but before the query string. For example, suppose your program is named "app". If the user goes to site.com/app, pathInfo is empty. But the user can also go to site.com/app/some/sub/path, treating your program like a virtual folder. In this case, pathInfo == "/some/sub/path".

immutable immutable(string) scriptName;

The full base path of your program, as seen by the user. If your program is located at site.com/programs/apps, scriptName == "/programs/apps".

immutable immutable(string) scriptFileName;

The physical filename of your script

immutable immutable(string) authorization;

The full authorization string from the header, undigested. Useful for implementing auth schemes such as OAuth 1.0. Note that some web servers do not forward this to the app without taking extra steps. See requireBasicAuth's comment for more info.

immutable immutable(string) accept;

The HTTP accept header is the user agent telling what content types it is willing to accept. This is often */*; they accept everything, so it's not terribly useful. (The similar sounding Accept-Encoding header is handled automatically for chunking and gzipping. Simply set gzipResponse = true and cgi.d handles the details, zipping if the user's browser is willing to accept it.)

immutable immutable(string) lastEventId;

The HTML 5 draft includes an EventSource() object that connects to the server, and remains open to take a stream of events. My arsd.rtud module can help with the server side part of that. The Last-Event-Id http header is defined in the draft to help handle loss of connection. When the browser reconnects to you, it sets this header to the last event id it saw, so you can catch it up. This member has the contents of that header.

immutable immutable(RequestMethod) requestMethod;

The HTTP request verb: GET, POST, etc. It is represented as an enum in cgi.d (which, like many enums, you can convert back to string with std.conv.to()). An HTTP GET is supposed to, according to the spec, not have side effects; a user can GET something over and over again and always have the same result. On all requests, the get[] and getArray[] members may be filled in. The post[] and postArray[] members are only filled in on POST methods.

immutable immutable(string) queryString;

The unparsed content of the request query string - the stuff after the ? in your URL. See get[] and getArray[] for a parsed view of it. Sometimes, the unparsed string is useful though if you want a custom format of data up there (probably not a good idea, unless it is really simple, like "?username" perhaps.)

immutable immutable(string) cookie;

The unparsed content of the Cookie: header in the request. See also the cookies[string] member for a parsed view of the data.

immutable immutable(string) referrer;

The Referer header from the request. (It is misspelled in the HTTP spec, and thus the actual request and cgi specs too, but I spelled the word correctly here because that's sane. The spec's misspelling is an implementation detail.) It contains the site url that referred the user to your program; the site that linked to you, or if you're serving images, the site that has you as an image. Also, if you're in an iframe, the referrer is the site that is framing you.

Important note: if the user copy/pastes your url, this is blank, and, just like with all other user data, their browsers can also lie to you. Don't rely on it for real security.

immutable immutable(string) requestUri;

The full url of the current request, excluding the protocol and host. requestUri == scriptName ~ pathInfo ~ (queryString.length ? "?" ~ queryString : "");

immutable immutable(string) remoteAddress;

The IP address of the user, as we see it. (Might not match the IP of the user's computer due to things like proxies and NAT.)

immutable bool https;

Was the request encrypted via https?

immutable int port;

On what TCP port number did the server receive the request?

immutable immutable(string[string]) get;

Here come the parsed request variables - the things that come close to PHP's GET, POST, etc. superglobals in content.

The data from your query string in the url, only showing the last string of each name. If you want to handle multiple values with the same name, use getArray. This only works right if the query string is x-www-form-urlencoded; the default you see on the web with name=value pairs separated by the & character.

immutable immutable(string[string]) post;

The data from the request's body, on POST requests. It parses application/x-www-form-urlencoded data (used by most web requests, including typical forms), and multipart/form-data requests (used by file uploads on web forms) into the same container, so you can always access them the same way. It makes no attempt to parse other content types. If you want to accept an XML Post body (for a web api perhaps), you'll need to handle the raw data yourself.

immutable immutable(string[string]) cookies;

Separates out the cookie header into individual name/value pairs (which is how you set them!)

immutable immutable(UploadedFile[][string]) filesArray;

Represents user uploaded files.

When making a file upload form, be sure to follow the standard: set method="POST" and enctype="multipart/form-data" in your html

immutable immutable(string[][string]) getArray;

Like get, but an array of values per name. Use these if you expect multiple items submitted with the same name. btw, assert(get[name] is getArray[name][$-1]); should pass. Same for post and cookies.

The order of the arrays is the order the data arrives.

immutable immutable(string[][string]) postArray;

ditto for post

immutable immutable(string[][string]) cookiesArray;

ditto for cookies
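
To make the relationship between the single-value and array members concrete, here is a sketch that lists every value submitted under one name. The parameter name "tag" is hypothetical; a request such as ?tag=red&tag=blue would exercise it.

```d
import arsd.cgi;

void tags(Cgi cgi) {
	cgi.setResponseContentType("text/plain");

	// "tag" is a made-up parameter name for this example.
	if(auto values = "tag" in cgi.getArray) {
		// Values are listed in the order they arrived.
		foreach(value; *values)
			cgi.write("tag: " ~ value ~ "\n");

		// get[] only exposes the last value submitted under the name.
		cgi.write("last tag: " ~ cgi.get["tag"] ~ "\n");
	} else {
		cgi.write("No tags given.\n");
	}
}

mixin GenericMain!tags;
```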

Cgi dummyCgi(Cgi.RequestMethod method = Cgi.RequestMethod.GET, string url = null, in ubyte[] data = null, void delegate(const(ubyte)[]) outputSink = null);

use this for testing or other isolated things
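
A possible way to use dummyCgi in a test is to pass an output sink and inspect what the handler produced. This sketch makes no assumption about how the url argument is interpreted or about the exact bytes emitted (headers may or may not be included); it only captures whatever reaches the sink.

```d
import arsd.cgi;
import std.stdio : writeln;

void hello(Cgi cgi) {
	cgi.setResponseContentType("text/plain");
	cgi.write("Hello, world!", true);
}

void main() {
	char[] captured;

	// The sink delegate receives every chunk the Cgi object outputs.
	auto cgi = dummyCgi(Cgi.RequestMethod.GET, "/", null,
		(const(ubyte)[] chunk) { captured ~= cast(const(char)[]) chunk; });

	hello(cgi);

	// Finish the response if the handler has not already done so,
	// then clean up any temporary files.
	if(!cgi.isClosed())
		cgi.close();
	cgi.dispose();

	writeln(captured);
}
```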

string makeDataUrl(string mimeType, in void[] data);

Makes a data: uri that can be used as links in most newer browsers (IE8+).

struct Uri;

Represents a url that can be broken down or built up through properties

string scheme;: e.g. "http" in "http://example.com/"
string userinfo;: the username (and possibly a password) in the uri
string host;: the domain name
int port;: port number, if given. Will be zero if a port was not explicitly given
string path;: e.g. "/folder/file.html" in "http://example.com/folder/file.html"
string query;: the stuff after the ? in a uri
string fragment;: the stuff after the # in a uri.
this(string uri);: Breaks down a uri string to its components
const string toString();: Converts the broken down parts back into a complete string
const Uri basedOn(in Uri baseUrl);: Returns a new absolute Uri given a base. It treats this one as relative where possible, but absolute if not. (If protocol, domain, or other info is not set, the new one inherits it from the base.)

Browsers use a function like this to figure out links in html.
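
A brief sketch of taking a Uri apart and resolving a relative link against it. The URLs are examples only; the assertions restate the member descriptions above, and the resolved link is printed rather than asserted since its exact form is not specified here.

```d
import arsd.cgi;
import std.stdio : writeln;

void main() {
	auto page = Uri("http://example.com/folder/file.html?x=1#top");

	// These follow the member descriptions above.
	assert(page.scheme == "http");
	assert(page.path == "/folder/file.html");
	assert(page.port == 0); // no port was given explicitly

	writeln("host: ", page.host);
	writeln("query: ", page.query);
	writeln("fragment: ", page.fragment);

	// Resolve a relative link the way a browser would for an <a href>.
	auto link = Uri("other.html").basedOn(page);
	writeln("resolved: ", link.toString());
}
```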

string[][string] decodeVariables(string data, string separator = "&");

breaks down a url encoded string

string[string] decodeVariablesSingle(string data);

breaks down a url encoded string, but only returns the last value of any array

string encodeVariables(in string[string] data);

url encodes the whole string

string encodeVariables(in string[][string] data);

url encodes a whole string
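
The decoding and encoding helpers can be sketched together as a round trip. The sample query string is invented; the assertions follow from the descriptions above (all values per name, in arrival order, versus only the last value).

```d
import arsd.cgi;

void main() {
	string query = "a=1&a=2&b=hello%20world";

	// Every value per name, in the order they appear.
	auto all = decodeVariables(query);
	assert(all["a"] == ["1", "2"]);
	assert(all["b"] == ["hello world"]);

	// Only the last value per name.
	auto last = decodeVariablesSingle(query);
	assert(last["a"] == "2");
	assert(last["b"] == "hello world");

	// Encoding and decoding again should give back the same pairs.
	string[string] data = ["name": "a b", "op": "add"];
	string encoded = encodeVariables(data);
	assert(decodeVariablesSingle(encoded) == data);
}
```

The exact escaped form produced by encodeVariables is deliberately not asserted; only the round trip is.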

string rawurlencode(in char[] data);

Encodes all but the explicitly unreserved characters per rfc 3986

Alphanumeric and -_.~ are the only ones left unencoded

name is borrowed from php

template GenericMain(alias fun, long maxContentLength = defaultMaxContentLength)

Use this instead of writing your own main

template CustomCgiMain(CustomCgi, alias fun, long maxContentLength = defaultMaxContentLength) if (is(CustomCgi : Cgi))

If you want to use a subclass of Cgi with generic main, use this mixin.

class ListeningConnectionManager;

To use this thing:

	void handler(Socket s) { /* do something... */ }
	auto manager = new ListeningConnectionManager("127.0.0.1", 80, &handler);
	manager.listen();

I suggest you use BufferedInputRange(connection) to handle the input. As a packet comes in, you will get control. You can simply continue, though, to fetch more.

FIXME:
should I offer an event based async thing like netman did too? Yeah, probably.