arsd.cgi
Provides a uniform server-side API for CGI, FastCGI, SCGI, and HTTP web applications.

import arsd.cgi;

// Instead of writing your own main(), you should write a function
// that takes a Cgi param, and use mixin GenericMain
// for maximum compatibility with different web servers.
void hello(Cgi cgi) {
    cgi.write("Hello, world!");
}

mixin GenericMain!hello;

Concepts:

Input:
get, post, request(), files, cookies, pathInfo, requestMethod, and HTTP headers (headers, userAgent, referrer, accept, authorization, lastEventId)

Output:
cgi.write(), cgi.header(), cgi.setResponseStatus, cgi.setResponseContentType, gzipResponse

Cookies:
setCookie, clearCookie, cookie, cookies

Caching:
cgi.setResponseExpires, cgi.updateResponseExpires, cgi.setCache

Redirections:
cgi.setResponseLocation

Other Information:
remoteAddress, https, port, scriptName, requestUri, getCurrentCompleteUri, onRequestBodyDataReceived

Overriding behavior:
handleIncomingDataChunk, prepareForIncomingDataChunks, cleanUpPostDataState

Installing:
Apache, IIS, CGI, FastCGI, SCGI, embedded HTTPD (not recommended for production use)

Guide for PHP users:
If you are coming from PHP, here's a quick guide to help you get started:

$_GET["var"] == cgi.get["var"]
$_POST["var"] == cgi.post["var"]
$_COOKIE["var"] == cgi.cookies["var"]

In PHP, you can give a form element a name like "something[]", and then $_POST["something"] gives an array. In D, you can use whatever name you want, and access an array of values with the cgi.getArray["name"] and cgi.postArray["name"] members.

echo("hello"); == cgi.write("hello");
$_SERVER["REMOTE_ADDR"] == cgi.remoteAddress
$_SERVER["HTTP_HOST"] == cgi.host

See Also:
You may also want to see dom.d, web.d, and html.d for more code for making web applications. database.d, mysql.d, postgres.d, and sqlite.d can help in accessing databases. If you are looking to access a web application via HTTP, try curl.d.
Class / struct Name | Description |
---|---|
Cgi | The main interface with the web request |
ListeningConnectionManager | To use this thing: |
Uri | Represents a url that can be broken down or built up through properties |
Function Name | Description |
---|---|
CustomCgiMain | If you want to use a subclass of Cgi with generic main, use this mixin. |
ForwardCgiConstructors | If you are doing a custom cgi class, mixing this in can take care of the required constructors for you |
GenericMain | Use this instead of writing your own main |
decodeVariables | breaks down a url encoded string |
decodeVariablesSingle | breaks down a url encoded string, but only returns the last value of any array |
encodeVariables | url encodes the whole string |
makeDataUrl | Makes a data: uri that can be used as links in most newer browsers (IE8+). |
- If you are doing a custom cgi class, mixing this in can take care of the required constructors for you
- The main interface with the web request
Member Quick Reference
Member Name | Description |
---|---|
RequestMethod | the methods a request can be |
UploadedFile | This represents a file the user uploaded via a POST request. |
accept | The HTTP accept header is the user agent telling what content types it is willing to accept. This is often */*; they accept everything, so it's not terribly useful. (The similar sounding Accept-Encoding header is handled automatically for chunking and gzipping. Simply set gzipResponse = true and cgi.d handles the details, zipping if the user's browser is willing to accept it.) |
authorization | The full authorization string from the header, undigested. Useful for implementing auth schemes such as OAuth 1.0. Note that some web servers do not forward this to the app without taking extra steps. See requireBasicAuth's comment for more info. |
clearCookie | Clears a previously set cookie with the given name, path, and domain. |
close | Flushes the buffers to the network, signifying that you are done. You should always call this explicitly when you are done outputting data. |
content | The actual content of the file, if contentInMemory == true |
contentFilename | the file where we dumped the content, if contentInMemory == false. Note that if you want to keep it, you MUST move the file, since otherwise it is considered garbage when cgi is disposed. |
contentInMemory | For small files, cgi.d will buffer the uploaded file in memory, and make it directly accessible to you through the content member. I find this very convenient and somewhat efficient, since it can avoid hitting the disk entirely. (I often want to inspect and modify the file anyway!) |
contentType | The MIME type the user's browser reported. (Not reliable.) |
cookie | The unparsed content of the Cookie: header in the request. See also the cookies[string] member for a parsed view of the data. |
cookies | Separates out the cookie header into individual name/value pairs (which is how you set them!) |
cookiesArray | ditto for cookies |
dispose | Cleans up any temporary files. Do not use the object after calling this. |
filename | The filename the user set. |
files | Represents user uploaded files. |
fromData | If you want to create one of these structs for yourself from some data, use this function. |
get | Here come the parsed request variables - the things that come close to PHP's GET, POST, etc. superglobals in content. The data from your query string in the url, only showing the last string of each name. If you want to handle multiple values with the same name, use getArray. This only works right if the query string is x-www-form-urlencoded; the default you see on the web with name=value pairs separated by the & character. |
getArray | Use these if you expect multiple items submitted with the same name. btw, assert(get[name] is getArray[name][$-1]); should pass. Same for post and cookies. The order of the arrays is the order the data arrives. Like get, but an array of values per name. |
getCurrentCompleteUri | This gets a full url for the current request, including port, protocol, host, path, and query |
gzipResponse | Set to true and use cgi.write(data, true); to send a gzipped response to browsers who can accept it |
header | Adds a custom header. It should be the name: value, but without any line terminator. For example: header("X-My-Header: Some value"); Note you should use the specialized functions in this object if possible to avoid duplicates in the output. |
host | The hostname in the request. If one program serves multiple domains, you can use this to differentiate between them. |
https | Was the request encrypted via https? |
isClosed | Is the output already closed? |
lastEventId | The HTML 5 draft includes an EventSource() object that connects to the server, and remains open to take a stream of events. My arsd.rtud module can help with the server side part of that. The Last-Event-Id http header is defined in the draft to help handle loss of connection. When the browser reconnects to you, it sets this header to the last event id it saw, so you can catch it up. This member has the contents of that header. |
name | The name of the form element. |
onRequestBodyDataReceived | you can override this function to somehow react to an upload in progress. |
pathInfo | This is any stuff sent after your program's name on the url, but before the query string. For example, suppose your program is named "app". If the user goes to site.com/app, pathInfo is empty. But, he can also go to site.com/app/some/sub/path; treating your program like a virtual folder. In this case, pathInfo == "/some/sub/path". |
port | On what TCP port number did the server receive the request? |
post | The data from the request's body, on POST requests. It parses application/x-www-form-urlencoded data (used by most web requests, including typical forms), and multipart/form-data requests (used by file uploads on web forms) into the same container, so you can always access them the same way. It makes no attempt to parse other content types. If you want to accept an XML Post body (for a web api perhaps), you'll need to handle the raw data yourself. |
postArray | ditto for post |
queryString | The unparsed content of the request query string - the stuff after the ? in your URL. See get[] and getArray[] for a parsed view of it. Sometimes, the unparsed string is useful though if you want a custom format of data up there (probably not a good idea, unless it is really simple, like "?username" perhaps.) |
referrer | The Referer header from the request. (It is misspelled in the HTTP spec, and thus the actual request and cgi specs too, but I spelled the word correctly here because that's sane. The spec's misspelling is an implementation detail.) It contains the site url that referred the user to your program; the site that linked to you, or if you're serving images, the site that has you as an image. Also, if you're in an iframe, the referrer is the site that is framing you. |
remoteAddress | The IP address of the user, as we see it. (Might not match the IP of the user's computer due to things like proxies and NAT.) |
request | Gets a request variable as a specific type, or the default value if it isn't there or isn't convertible to the request type. |
requestHeaders | What follows is data gotten from the HTTP request. It is all fully immutable, partially because it logically is (your code doesn't change what the user requested...) and partially because I hate how bad programs in PHP change those superglobals to do all kinds of hard to follow ugliness. I don't want that to ever happen in D. |
requestMethod | The HTTP request verb: GET, POST, etc. It is represented as an enum in cgi.d (which, like many enums, you can convert back to string with std.conv.to()). An HTTP GET is supposed to, according to the spec, not have side effects; a user can GET something over and over again and always have the same result. On all requests, the get[] and getArray[] members may be filled in. The post[] and postArray[] members are only filled in on POST methods. |
requestUri | The full url of the current request, excluding the protocol and host. requestUri == scriptName ~ pathInfo ~ (queryString.length ? "?" ~ queryString : ""); |
requireBasicAuth | Very simple method to require a basic auth username and password. If the http request doesn't include the required credentials, it throws a HTTP 401 error, and an exception. |
scriptName | The full base path of your program, as seen by the user. If your program is located at site.com/programs/apps, scriptName == "/programs/apps". |
setCache | Very simple caching controls - setCache(false) means it will never be cached. Good for rapidly updated or sensitive sites. setCache(true) means it will always be cached for as long as possible. Best for static content. Use setResponseExpires and updateResponseExpires for more control |
setCookie | Sets an HTTP cookie, automatically encoding the data to the correct string. expiresIn is how many milliseconds in the future the cookie will expire. |
setResponseContentType | Sets the content type of the response, for example "text/html" (the default) for HTML, or "image/png" for a PNG image |
setResponseExpires | Sets the Expires: http header. See also: updateResponseExpires, setPublicCaching. The parameter is in unix_timestamp * 1000. Try setResponseExpires(getUTCtime() + SOME AMOUNT) for normal use. |
setResponseLocation | Sets the location header, which the browser will redirect the user to automatically. Note setResponseLocation() must be called *before* you write() any data to the output. The optional important argument is used if it's a default suggestion rather than something to insist upon. |
setResponseStatus | Sets the HTTP status of the response. For example, "404 File Not Found" or "500 Internal Server Error". It assumes "200 OK", and automatically changes to "302 Found" if you call setResponseLocation(). Note setResponseStatus() must be called *before* you write() any data to the output. |
updateResponseExpires | This is like setResponseExpires, but it can be called multiple times. The setting most in the past is the one kept. If you have multiple functions, they all might call updateResponseExpires about their own return value. The program output as a whole is as cacheable as the least cacheable part in the chain. setCache(false) always overrides this - it is, by definition, the strictest anti-cache statement available. If your site outputs sensitive user data, you should probably call setCache(false) when you do, to ensure no other functions will cache the content, as it may be a privacy risk. Conversely, setting here overrides setCache(true), since any expiration date is in the past of infinity. |
userAgent | The browser's user-agent string. Can be used to identify the browser. |
write | Writes the data to the output, flushing headers if they have not yet been sent. |
- the methods a request can be
- this(long maxContentLength = cast(long)5000000, const(immutable(char)[][string]) env = null, const(ubyte)[] delegate() readdata = null, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null);
- Initializes it using a CGI or CGI-like interface
- Cleans up any temporary files. Do not use the object
after calling this.
NOTE:
it is called automatically by GenericMain
- you can override this function to somehow react to an upload in progress.
Take note that parts of the CGI object are not yet initialized! Stuff from HTTP headers, including get[], is usable. But none of post[] is usable, and you cannot write here. That's why this method is const - mutating the object won't do much anyway.
My idea here was so you can output a progress bar or something to a cooperative client (see arsd.rtud for a potential helper).
The default is to do nothing. Subclass cgi and use the CustomCgiMain mixin to do something here.
- this(BufferedInputRange ir, bool* closeConnection);
- Initializes the cgi from completely raw HTTP data. The ir must have a Socket source. *closeConnection will be set to true if you should close the connection after handling this request
- this(BufferedInputRange inputData, string address, ushort _port, int pathInfoStarts = 0, bool _https = false, void delegate(const(ubyte)[]) _rawDataOutput = null, void delegate() _flush = null, bool* closeConnection = null);
- Initializes it from raw HTTP request data. GenericMain uses this when you compile with -version=embedded_httpd.
NOTE:
If you are behind a reverse proxy, the values here might not be what you expect.... FIXME somehow.
Parameters:
Parameter | Description |
---|---|
BufferedInputRange inputData | the incoming data, including headers and other raw http data. When the constructor exits, it will leave this range exactly at the start of the next request on the connection (if there is one). |
string address | the IP address of the remote user |
ushort _port | the port number of the connection |
int pathInfoStarts | the offset into the path component of the http header where the SCRIPT_NAME ends and the PATH_INFO begins. |
bool _https | if this connection is encrypted (note that the input data must not actually be encrypted) |
void delegate(const(ubyte)[]) _rawDataOutput | delegate to accept response data. It should write to the socket or whatever; Cgi does all the needed processing to speak http. |
void delegate() _flush | if rawDataOutput buffers, this delegate should flush the buffer down the wire |
bool* closeConnection | if the request asks to close the connection, *closeConnection == true. |
- This represents a file the user uploaded via a POST request.
- If you want to create one of these structs for yourself from some data, use this function.
- The name of the form element.
- The filename the user set.
- The MIME type the user's browser reported. (Not reliable.)
- For small files, cgi.d will buffer the uploaded file in memory, and make it
directly accessible to you through the content member. I find this very convenient
and somewhat efficient, since it can avoid hitting the disk entirely. (I
often want to inspect and modify the file anyway!)
If the file is very large, it is undesirable to eat that much memory just
for a file buffer. In those cases, if you pass a large enough value for maxContentLength
to the constructor so they are accepted, cgi.d will write the content to a temporary
file that you can re-read later.
You can override this behavior by subclassing Cgi and overriding the protected
handlePostChunk method. Note that the object is not initialized when you
write that method - the http headers are available, but the cgi.post method
is not. You may parse the file as it streams in using this method.
Anyway, if the file is small enough to be in memory, contentInMemory will be
set to true, and the content is available in the content member.
If not, contentInMemory will be set to false, and the content saved in a file,
whose name will be available in the contentFilename member.
Tip:
if you know you are always dealing with small files, and want the convenience of ignoring this member, construct Cgi with a small maxContentLength. Then, if a large file comes in, it simply throws an exception (and HTTP error response) instead of trying to handle it. The default value of maxContentLength in the constructor is for small files.
- The actual content of the file, if contentInMemory == true
- the file where we dumped the content, if contentInMemory == false. Note that if you want to keep it, you MUST move the file, since otherwise it is considered garbage when cgi is disposed.
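The two storage cases described above (content held in memory versus content dumped to a temporary file) can be handled by one helper. The following is a minimal sketch under stated assumptions: `Upload` is a hypothetical stand-in that only mirrors the documented member names, and `persist` is an illustrative helper, not part of cgi.d.

```d
import std.file : read, remove, rename, tempDir, write;
import std.path : buildPath;

// Hypothetical stand-in that mirrors the documented UploadedFile members;
// it is not the real struct from arsd.cgi.
struct Upload {
    string name;
    string filename;
    string contentType;
    bool contentInMemory;
    immutable(ubyte)[] content;
    string contentFilename;
}

// Stores the upload at `destination`, whichever way it was buffered.
void persist(const Upload file, string destination) {
    if (file.contentInMemory) {
        // Small upload: the bytes are already in memory.
        write(destination, file.content);
    } else {
        // Large upload: move the temporary file so it is kept after dispose.
        // rename() can fail across filesystems; copy and delete in that case.
        rename(file.contentFilename, destination);
    }
}

void main() {
    immutable(ubyte)[] bytes = [1, 2, 3];
    auto destination = buildPath(tempDir(), "example_upload.bin");
    persist(Upload("avatar", "a.bin", "application/octet-stream", true, bytes, null), destination);
    assert(cast(const(ubyte)[]) read(destination) == bytes);
    remove(destination);
}
```

Moving the file, rather than copying it, follows the note above that the temporary file is treated as garbage once the Cgi object is disposed.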
- Very simple method to require a basic auth username and password.
If the http request doesn't include the required credentials, it throws a
HTTP 401 error, and an exception.
Note:
basic auth does not provide great security, especially over unencrypted HTTP; the user's credentials are sent in plain text on every request.
If you are using Apache, the HTTP_AUTHORIZATION variable may not be sent to the application. Either use Apache's built in methods for basic authentication, or add something along these lines to your server configuration, to ensure the necessary data is available to cgi.d:
RewriteEngine On
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule ^(.*) - [E=HTTP_AUTHORIZATION:%1]
- Very simple caching controls - setCache(false) means it will never be cached. Good for rapidly updated or sensitive sites. setCache(true) means it will always be cached for as long as possible. Best for static content. Use setResponseExpires and updateResponseExpires for more control
- Set to true and use cgi.write(data, true); to send a gzipped response to browsers who can accept it
- This gets a full url for the current request, including port, protocol, host, path, and query
- Sets the HTTP status of the response. For example, "404 File Not Found" or "500 Internal Server Error". It assumes "200 OK", and automatically changes to "302 Found" if you call setResponseLocation(). Note setResponseStatus() must be called *before* you write() any data to the output.
- Sets the location header, which the browser will redirect the user to automatically. Note setResponseLocation() must be called *before* you write() any data to the output. The optional important argument is used if it's a default suggestion rather than something to insist upon.
- Sets the Expires: http header. See also: updateResponseExpires, setPublicCaching
The parameter is in unix_timestamp * 1000. Try setResponseExpires(getUTCtime() + SOME AMOUNT) for normal use.
Note:
the when parameter is different than setCookie's expiresIn parameter.
- This is like setResponseExpires, but it can be called multiple times. The setting most in the past is the one kept. If you have multiple functions, they all might call updateResponseExpires about their own return value. The program output as a whole is as cacheable as the least cacheable part in the chain. setCache(false) always overrides this - it is, by definition, the strictest anti-cache statement available. If your site outputs sensitive user data, you should probably call setCache(false) when you do, to ensure no other functions will cache the content, as it may be a privacy risk. Conversely, setting here overrides setCache(true), since any expiration date is in the past of infinity.
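The precedence rules for updateResponseExpires and setCache can be pictured with a small model. This is only a toy sketch of the documented rule, not cgi.d's implementation; `ExpiryPolicy` and its members are hypothetical names, and `long.max` stands in for "cached as long as possible".

```d
import std.algorithm : min;

// Toy model of the documented precedence rule, not cgi.d's actual code.
struct ExpiryPolicy {
    bool neverCache;          // set by the equivalent of setCache(false)
    long expires = long.max;  // long.max stands in for "as long as possible"

    void setCache(bool allow) {
        // setCache(true) asks for "infinity", which never beats an earlier date.
        if (!allow)
            neverCache = true;
    }

    void updateExpires(long when) {
        // The setting most in the past is the one kept.
        expires = min(expires, when);
    }

    // 0 means "do not cache at all" in this model.
    long effective() const {
        return neverCache ? 0 : expires;
    }
}

void main() {
    ExpiryPolicy policy;
    policy.setCache(true);
    policy.updateExpires(5_000);
    policy.updateExpires(2_000);
    policy.updateExpires(9_000);
    assert(policy.effective() == 2_000); // the earliest expiry wins
    policy.setCache(false);
    assert(policy.effective() == 0);     // the strictest statement overrides everything
}
```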
- Sets an HTTP cookie, automatically encoding the data to the correct string.
expiresIn is how many milliseconds in the future the cookie will expire.
TIP:
to make a cookie accessible from subdomains, set the domain to .yourdomain.com.
Note setCookie() must be called *before* you write() any data to the output.
- Clears a previously set cookie with the given name, path, and domain.
- Sets the content type of the response, for example "text/html" (the default) for HTML, or "image/png" for a PNG image
- Adds a custom header. It should be the name: value, but without any line terminator. For example: header("X-My-Header: Some value"); Note you should use the specialized functions in this object if possible to avoid duplicates in the output.
- Writes the data to the output, flushing headers if they have not yet been sent.
- Flushes the buffers to the network, signifying that you are done. You should always call this explicitly when you are done outputting data.
- Gets a request variable as a specific type, or the default value if it isn't there
or isn't convertible to the request type.
Checks both GET and POST variables, preferring the POST variable, if available.
A nice trick is using the default value to choose the type:
/*
    The return value will match the type of the default.
    Here, I gave 10 as a default, so the return value will be an int.

    If the user-supplied value cannot be converted to the requested type,
    you will get the default value back.
*/
int a = cgi.request("number", 10);

if(cgi.get["number"] == "11")
    assert(a == 11); // conversion succeeds

if("number" !in cgi.get)
    assert(a == 10); // no value means you can't convert - give the default

if(cgi.get["number"] == "twelve")
    assert(a == 10); // conversion from string to int would fail, so we get the default

You can use an enum as an easy whitelist, too:

enum Operations {
    add, remove, query
}

auto op = cgi.request("op", Operations.query);

if(cgi.get["op"] == "add")
    assert(op == Operations.add);
if(cgi.get["op"] == "remove")
    assert(op == Operations.remove);
if(cgi.get["op"] == "query")
    assert(op == Operations.query);

if(cgi.get["op"] == "random string")
    assert(op == Operations.query); // the value can't be converted to the enum, so we get the default
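For readers curious how a lookup with a typed default can behave this way, here is a minimal sketch built only on std.conv. It is an assumption about the general technique, not the code inside cgi.d; `requestLike` and its two table parameters are hypothetical names standing in for the POST and GET variables.

```d
import std.conv : ConvException, to;

enum Operations { add, remove, query }

// Sketch only: prefers the POST value, falls back to GET, and converts
// the string to the type of the default. Any failure yields the default.
T requestLike(T)(const string[string] get, const string[string] post, string name, T def) {
    const(string)* found = name in post;
    if (found is null)
        found = name in get;
    if (found is null)
        return def; // no value at all
    try {
        return to!T(*found);
    } catch (ConvException) {
        return def; // value present but not convertible
    }
}

void main() {
    string[string] none;
    assert(requestLike(["number": "11"], none, "number", 10) == 11);
    assert(requestLike(["number": "twelve"], none, "number", 10) == 10);
    assert(requestLike(none, none, "number", 10) == 10);
    assert(requestLike(["op": "random string"], none, "op", Operations.query) == Operations.query);
}
```

Because std.conv.to only accepts the declared member names of an enum, the enum default doubles as a whitelist: anything else falls back to the default.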
- Is the output already closed?
- What follows is data gotten from the HTTP request. It is all fully immutable, partially because it logically is (your code doesn't change what the user requested...) and partially because I hate how bad programs in PHP change those superglobals to do all kinds of hard to follow ugliness. I don't want that to ever happen in D. For some of these, you'll want to refer to the http or cgi specs for more details.
- All the raw headers in the request as name/value pairs. The name is stored as all lower case, but otherwise the same as it is in HTTP; words separated by dashes. For example, "cookie" or "accept-encoding". Many HTTP headers have specialized variables below for more convenience and static name checking; you should generally try to use them.
- The hostname in the request. If one program serves multiple domains, you can use this to differentiate between them.
- The browser's user-agent string. Can be used to identify the browser.
- This is any stuff sent after your program's name on the url, but before the query string. For example, suppose your program is named "app". If the user goes to site.com/app, pathInfo is empty. But, he can also go to site.com/app/some/sub/path; treating your program like a virtual folder. In this case, pathInfo == "/some/sub/path".
- The full base path of your program, as seen by the user. If your program is located at site.com/programs/apps, scriptName == "/programs/apps".
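The relationship between scriptName and pathInfo amounts to cutting the URL path at one offset, which is what the pathInfoStarts constructor parameter describes. A minimal sketch, assuming the offset is already known; `PathParts` and `splitPath` are hypothetical names used only for illustration.

```d
// Hypothetical helper types; cgi.d exposes scriptName and pathInfo as members instead.
struct PathParts {
    string scriptName;
    string pathInfo;
}

// Splits a URL path at the offset where the script name ends.
PathParts splitPath(string path, size_t pathInfoStarts) {
    if (pathInfoStarts > path.length)
        pathInfoStarts = path.length; // clamp so slicing cannot go out of range
    return PathParts(path[0 .. pathInfoStarts], path[pathInfoStarts .. $]);
}

void main() {
    auto parts = splitPath("/app/some/sub/path", "/app".length);
    assert(parts.scriptName == "/app");
    assert(parts.pathInfo == "/some/sub/path");
    assert(splitPath("/app", 4).pathInfo == ""); // nothing after the program name
}
```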
- The full authorization string from the header, undigested. Useful for implementing auth schemes such as OAuth 1.0. Note that some web servers do not forward this to the app without taking extra steps. See requireBasicAuth's comment for more info.
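If you do want to digest the raw authorization string yourself, a Basic credential is just base64 of `user:password` after the scheme name. The sketch below uses only Phobos; `Credentials` and `parseBasic` are hypothetical names, and it accepts only the exact spelling "Basic " even though HTTP treats the scheme name case-insensitively.

```d
import std.algorithm : startsWith;
import std.base64 : Base64;
import std.string : indexOf;

// Hypothetical result type for this sketch.
struct Credentials {
    string user;
    string password;
    bool valid;
}

// Decodes an "Authorization: Basic ..." header value.
Credentials parseBasic(string authorization) {
    enum prefix = "Basic ";
    if (!authorization.startsWith(prefix))
        return Credentials.init;
    string decoded;
    try {
        decoded = cast(string) Base64.decode(authorization[prefix.length .. $]);
    } catch (Exception) {
        return Credentials.init; // not valid base64
    }
    auto colon = decoded.indexOf(':');
    if (colon == -1)
        return Credentials.init;
    // Only the first colon separates user from password.
    return Credentials(decoded[0 .. colon], decoded[colon + 1 .. $], true);
}

void main() {
    auto c = parseBasic("Basic dXNlcjpwYXNz"); // base64 of "user:pass"
    assert(c.valid && c.user == "user" && c.password == "pass");
    assert(!parseBasic("Bearer abc").valid);
}
```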
- The HTTP accept header is the user agent telling what content types it is willing to accept. This is often */*; they accept everything, so it's not terribly useful. (The similar sounding Accept-Encoding header is handled automatically for chunking and gzipping. Simply set gzipResponse = true and cgi.d handles the details, zipping if the user's browser is willing to accept it.)
- The HTML 5 draft includes an EventSource() object that connects to the server, and remains open to take a stream of events. My arsd.rtud module can help with the server side part of that. The Last-Event-Id http header is defined in the draft to help handle loss of connection. When the browser reconnects to you, it sets this header to the last event id it saw, so you can catch it up. This member has the contents of that header.
- The HTTP request verb: GET, POST, etc. It is represented as an enum in cgi.d (which, like many enums, you can convert back to string with std.conv.to()). A HTTP GET is supposed to, according to the spec, not have side effects; a user can GET something over and over again and always have the same result. On all requests, the get[] and getArray[] members may be filled in. The post[] and postArray[] members are only filled in on POST methods.
- The unparsed content of the request query string - the stuff after the ? in your URL. See get[] and getArray[] for a parse view of it. Sometimes, the unparsed string is useful though if you want a custom format of data up there (probably not a good idea, unless it is really simple, like "?username" perhaps.)
- The unparsed content of the Cookie: header in the request. See also the cookies[string] member for a parsed view of the data.
- The Referer header from the request. (It is misspelled in the HTTP spec, and thus the actual request and cgi specs too, but I spelled the word correctly here because that's sane. The spec's misspelling is an implementation detail.) It contains the site url that referred the user to your program; the site that linked to you, or if you're serving images, the site that has you as an image. Also, if you're in an iframe, the referrer is the site that is framing you. Important note: if the user copy/pastes your url, this is blank, and, just like with all other user data, their browsers can also lie to you. Don't rely on it for real security.
- The full url of the current request, excluding the protocol and host.
requestUri == scriptName ~ pathInfo ~ (queryString.length ? "?" ~ queryString : "");
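As a worked instance of that identity, the sketch below assembles the value from the three documented parts. `buildRequestUri` is a hypothetical helper written for illustration; cgi.d provides requestUri directly.

```d
// Rebuilds the request URI from its documented components.
string buildRequestUri(string scriptName, string pathInfo, string queryString) {
    // The "?" is only added when there is a query string to follow it.
    return scriptName ~ pathInfo ~ (queryString.length ? "?" ~ queryString : "");
}

void main() {
    assert(buildRequestUri("/programs/apps", "/some/sub/path", "a=1&b=2")
        == "/programs/apps/some/sub/path?a=1&b=2");
    assert(buildRequestUri("/programs/apps", "", "") == "/programs/apps");
}
```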
- The IP address of the user, as we see it. (Might not match the IP of the user's computer due to things like proxies and NAT.)
- Was the request encrypted via https?
- On what TCP port number did the server receive the request?
- Here come the parsed request variables - the things that come close to PHP's GET, POST, etc. superglobals in content. The data from your query string in the url, only showing the last string of each name. If you want to handle multiple values with the same name, use getArray. This only works right if the query string is x-www-form-urlencoded; the default you see on the web with name=value pairs separated by the & character.
- The data from the request's body, on POST requests. It parses application/x-www-form-urlencoded data (used by most web requests, including typical forms), and multipart/form-data requests (used by file uploads on web forms) into the same container, so you can always access them the same way. It makes no attempt to parse other content types. If you want to accept an XML Post body (for a web api perhaps), you'll need to handle the raw data yourself.
- Separates out the cookie header into individual name/value pairs (which is how you set them!)
- Represents user uploaded files. When making a file upload form, be sure to follow the standard: set method="POST" and enctype="multipart/form-data" in your html
- Use these if you expect multiple items submitted with the same name. btw, assert(get[name] is getArray[name][$-1]); should pass. Same for post and cookies. The order of the arrays is the order the data arrives.
- like get, but an array of values per name
- ditto for post
- ditto for cookies
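The link between the single-value and array views can be sketched with a small url-decoder. This is an illustration of the idea using Phobos only, not the library's decodeVariables; `decodeToArrays` and `lastValues` are hypothetical names, and malformed percent escapes will make std.uri.decodeComponent throw.

```d
import std.algorithm : splitter;
import std.array : replace;
import std.string : indexOf;
import std.uri : decodeComponent;

// Collects every value for each name, in arrival order.
string[][string] decodeToArrays(string query) {
    string[][string] result;
    foreach (pair; query.splitter('&')) {
        if (pair.length == 0)
            continue;
        auto eq = pair.indexOf('=');
        string rawName = eq == -1 ? pair : pair[0 .. eq];
        string rawValue = eq == -1 ? "" : pair[eq + 1 .. $];
        // '+' means a space in x-www-form-urlencoded data.
        string name = decodeComponent(rawName.replace("+", " "));
        string value = decodeComponent(rawValue.replace("+", " "));
        result[name] ~= value;
    }
    return result;
}

// Keeps only the last value per name, like the single-value view.
string[string] lastValues(string[][string] arrays) {
    string[string] result;
    foreach (name, values; arrays)
        result[name] = values[$ - 1];
    return result;
}

void main() {
    auto arrays = decodeToArrays("tag=a&tag=b%20c&name=x+y");
    assert(arrays["tag"] == ["a", "b c"]);
    assert(arrays["name"] == ["x y"]);
    assert(lastValues(arrays)["tag"] == arrays["tag"][$ - 1]);
}
```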
- Makes a data: uri that can be used as links in most newer browsers (IE8+).
- Represents a url that can be broken down or built up through properties
Member Quick Reference
Member Name | Description |
---|---|
basedOn | Returns a new absolute Uri given a base. It treats this one as relative where possible, but absolute if not. (If protocol, domain, or other info is not set, the new one inherits it from the base.) |
fragment | the stuff after the # in a uri. |
host | the domain name |
path | e.g. "/folder/file.html" in "http://example.com/folder/file.html" |
port | port number, if given. Will be zero if a port was not explicitly given |
query | the stuff after the ? in a uri |
scheme | e.g. "http" in "http://example.com/" |
toString | Converts the broken down parts back into a complete string |
userinfo | the username (and possibly a password) in the uri |
- e.g. "http" in "http://example.com/"
- the username (and possibly a password) in the uri
- the domain name
- port number, if given. Will be zero if a port was not explicitly given
- e.g. "/folder/file.html" in "http://example.com/folder/file.html"
- the stuff after the ? in a uri
- the stuff after the # in a uri.
- this(string uri);
- Breaks down a uri string to its components
- Converts the broken down parts back into a complete string
- Returns a new absolute Uri given a base. It treats this one as relative where possible, but absolute if not. (If protocol, domain, or other info is not set, the new one inherits it from the base.) Browsers use a function like this to figure out links in html.
- breaks down a url encoded string
- breaks down a url encoded string, but only returns the last value of any array
- url encodes the whole string
- url encodes a whole string
- Use this instead of writing your own main
- If you want to use a subclass of Cgi with generic main, use this mixin.
- To use this thing:
auto manager = new ListeningConnectionManager(80);
foreach(connection; manager) {
// work with connection
// note: each connection may get its own thread, so this is a kind of concurrent foreach.
// this can have implications if you access local variables in the function, as they are
// implicitly shared!
// FIXME: break does not work
}
I suggest you use BufferedInputRange(connection) to handle the input. As a packet
comes in, you will get control. You can just continue; though to fetch more.
FIXME:
should I offer an event based async thing like netman did too? Yeah, probably.