<?xml version="1.0" encoding="UTF-8"?>

<?oxygen RNGSchema="http://www.w3.org/XML/1998/06/schema/xmlspec.rng" type="xml"?>

<spec role="editors-copy" xmlns:fg="http://www.fgeorges.org/xmlspec">
   <header>
      <title>HTTP Client Module</title>
      <subtitle>EXPath Candidate Module 13 April 2009</subtitle>
      <w3c-designation>w3c-designation</w3c-designation>
      <w3c-doctype>Candidate</w3c-doctype>
      <pubdate>
         <day>13</day>
         <month>April</month>
         <year>2009</year>
      </pubdate>
      <publoc>
         <loc href="http://www.expath.org/modules/http-client-20090413.html"/>
      </publoc>
      <altlocs>
         <loc href="http://www.expath.org/modules/http-client-20090413.xml">XML</loc>
      </altlocs>
      <latestloc>
         <loc href="http://www.expath.org/modules/http-client.html"/>
      </latestloc>
      <prevlocs>
         <loc href="http://www.expath.org/modules/http-client-20090302.html"/>
         <loc href="http://www.expath.org/modules/http-client-20090112.html"/>
      </prevlocs>
      <authlist>
         <author>
            <name>Florent Georges</name>
            <affiliation>fgeorges.org</affiliation>
         </author>
      </authlist>
      <copyright>
         <p/>
      </copyright>
      <abstract>
         <p>
            This proposal provides an HTTP client interface for XPath
            2.0.  It defines one extension function to perform HTTP
            requests, and has been designed to be compatible with XQuery 1.0
            and XSLT 2.0, as well as any other XPath 2.0 usage.
         </p>
      </abstract>
      <status>
         <p>Must be ignored, but is required by the schema...</p>
      </status>
      <langusage>
         <language>langusage</language>
      </langusage>
      <revisiondesc>
         <p>revisiondesc</p>
      </revisiondesc>
   </header>
   <body>
      <div1>
         <head>Introduction</head>
         <p>TODO: Add @href to http:response, taking redirects into account.</p>
         <div2>
            <head>Namespace conventions</head>
            <p>
               The module defined by this document defines one function in the namespace
               <code>http://www.expath.org/mod/http-client</code>.  In this document, the
               <code>http</code> prefix, when used, is bound to this namespace URI.
            </p>
         </div2>
         <div2>
            <head>Error management</head>
            <p>
               Error conditions are identified by a code (a <code>QName</code>).  When such
               an error condition is reached in the evaluation of an expression, a dynamic
               error is thrown, with the corresponding error code (as if the standard XPath
               function <code>error</code> had been called).  TODO: The error codes have not
               been defined yet.
            </p>
         </div2>
      </div1>
      <div1>
         <head>The <code>http:send-request</code> function</head>
         <p>
            This module defines an XPath extension function that sends an
            HTTP request and returns the corresponding response.  It supports
            HTTP multi-part messages.  Here is the signature of this function:
            <eg>
<fg:function>http:send-request</fg:function>($uri as <fg:type>xs:string?</fg:type>,
                  $request as <fg:type>element(http:request)?</fg:type>,
                  $content as <fg:type>item()?</fg:type>,
                  $serial as <fg:type>item()?</fg:type>) as <fg:type>item()+</fg:type></eg>
         </p>
         <ulist>
            <item>
               <p>
                  <code>$uri</code> is the HTTP or HTTPS URI to send the request to.  It is an
                  <code>xs:anyURI</code>, but is declared as a string so that literal strings can
                  be passed directly (without an explicit cast to <code>xs:anyURI</code>).
               </p>
            </item>
            <item>
               <p>
                  <code>$content</code> is the request body content, for HTTP methods that can
                  contain a body in the request (POST and PUT).  It is an error if this
                  parameter is not the empty sequence for the other methods (DELETE, GET,
                  HEAD and OPTIONS).
               </p>
            </item>
            <item>
               <p>
                  <code>$request</code> contains the various parameters of the request, for instance
                  the HTTP method to use or the HTTP headers.  Among other things, it
                  can also carry the values of the other parameters: the URI, the content and
                  the serialization option.  If they are not passed as parameters to the
                  function, their values in <code>$request</code>, if any, are used instead.  See the
                  following section for the detailed definition of the <code>http:request</code>
                  element.
               </p>
            </item>
            <item>
               <p>
                  <code>$serial</code> defines the serialization option used to serialize the content
                  into the HTTP request.  It can be either a serialization method (a string,
                  either 'xml', 'html', 'xhtml' or 'text'), the name of an output definition
                  (a string, which is the name of a named <code>xsl:output</code> declaration), or an
                  <code>xsl:output</code> element itself.  The content is then serialized according to
                  the chosen method or <code>xsl:output</code>, as defined in <bibref ref="xserial"/>.
               </p>
            </item>
         </ulist>
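         <p>
            As an illustration, a call to the four-parameter signature could look like the
            following XQuery sketch.  The URI, the <code>greeting</code> element and the way the
            function is made available to the query (here a plain namespace declaration; some
            processors may require a module import instead) are assumptions made for the
            example, not requirements of this module.
         </p>
         <eg>
declare namespace http = "http://www.expath.org/mod/http-client";

(: hypothetical endpoint and payload, for illustration only :)
http:send-request(
   'http://www.example.org/greetings',
   &lt;http:request method="post">
      &lt;http:body content-type="application/xml"/>
   &lt;/http:request>,
   &lt;greeting>Hello, world!&lt;/greeting>,
   'xml')</eg>
         <p>
            In this sketch the <code>http:body</code> element is left empty, so that the value
            of <code>$content</code> supplies the request body.
         </p>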
         <p>
            Besides the four-parameter signature above, there are three other signatures that
            are convenient shortcuts (each corresponds to the full version in which the
            omitted parameters have been set to the empty sequence).  They are:
            <eg>
<fg:function>http:send-request</fg:function>($request as <fg:type>element(http:request)</fg:type>) as <fg:type>item()+</fg:type>
<fg:function>http:send-request</fg:function>($uri as <fg:type>xs:string?</fg:type>,
                  $request as <fg:type>element(http:request)?</fg:type>) as <fg:type>item()+</fg:type>
<fg:function>http:send-request</fg:function>($uri as <fg:type>xs:string?</fg:type>,
                  $request as <fg:type>element(http:request)?</fg:type>,
                  $content as <fg:type>item()?</fg:type>) as <fg:type>item()+</fg:type></eg>
         </p>
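         <p>
            For instance, a simple GET request can probably be written with the one-parameter
            signature alone, the URI being carried by the <code>href</code> attribute.  The
            following is a minimal sketch; the URI is a placeholder.
         </p>
         <eg>
declare namespace http = "http://www.expath.org/mod/http-client";

(: same as the four-parameter form called with (), the request, () and () :)
http:send-request(
   &lt;http:request method="get" href="http://www.example.org/"/>)</eg>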
      </div1>
      <div1>
         <head>Sending a request</head>
         <p>
            The function defined in this module sends a request to an
            HTTP server and receives the corresponding response.  Here is how the request
            is represented by the parameters to this function, and how they are used
            to generate the actual HTTP request to send.
         </p>
         <div2>
            <head>The request elements</head>
            <p>
               The <code>http:request</code> element represents all the information needed
               to send the HTTP request, so it is always possible to build such an element
               that carries everything a particular request requires.  Some of those values
               can be passed as an additional parameter instead.  For instance, some signatures
               define the parameter <code>$uri</code>.  If the value of this parameter
               is not the empty sequence, it is used instead of the value
               of the attribute <code>href</code> on the <code>http:request</code>
               element.
            </p>
            <eg>
&lt;http:request method = NCName
              href? = anyURI
              status-only? = boolean
              username? = string
              password? = string
              auth-method? = string
              send-authorization? = boolean
              override-content-type? = string
              follow-redirect? = boolean>
   (http:header*,
     (http:multipart|
      http:body)?)
&lt;/http:request></eg>
            <ulist>
               <item>
                  <p>
                     <code>method</code> is the HTTP verb to use, such as GET or POST.  It is
                     case-insensitive.
                  </p>
               </item>
               <item>
                  <p>
                     <code>href</code> is the URI the request has to be sent to.  It can be overridden
                     by the parameter <code>$uri</code>.
                  </p>
               </item>
               <item>
                  <p>
                     <code>status-only</code> controls what the response looks like; if it is
                     true, only the status code and the headers are returned, and the content is not
                     (no <code>http:body</code> nor <code>http:multipart</code>, nor the additional
                     interpreted value in the returned sequence, see hereafter).
                  </p>
               </item>
               <item>
                  <p>
                     <code>username</code>, <code>password</code>, <code>auth-method</code>
                     and <code>send-authorization</code> are used for authentication (see section
                     below.)
                  </p>
               </item>
               <item>
                  <p>
                     <code>override-content-type</code> is a MIME type.  It can be used only with
                     <code>http:request</code>, and will override the Content-Type header returned
                     by the server.
                  </p>
               </item>
               <item>
                  <p>
                     <code>follow-redirect</code> controls whether an HTTP redirect is automatically
                     followed or not.  If it is false, the HTTP redirect is returned as the response.
                     If it is true (the default), the function tries to follow the redirect, by sending
                     the same request to the new address (including body, headers, and authentication
                     credentials).  At most one redirect is followed (there is no attempt to follow
                     a redirect received in response to following a first redirect).
                  </p>
               </item>
               <item>
                  <p>
                     <code>http:header</code> represents an HTTP header, either in the
                     <code>http:request</code> or in the <code>http:response</code> elements, as
                     defined below.
                  </p>
               </item>
               <item>
                  <p>
                     <code>http:multipart</code> represents a multi-part body, either in a request
                     or a response, as defined below.
                  </p>
               </item>
               <item>
                  <p>
                     <code>http:body</code> represents a single body (or a single part of a
                     multi-part body), either in a request or a response, as defined below.  It can
                     be overridden by the parameter
                     <code>$content</code> (the way <code>$content</code> is used to build the
                     body can be controlled by the parameter <code>$serial</code>, see section
                     below for details).
                  </p>
               </item>
            </ulist>
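            <p>
               As a sketch of how these attributes combine, the following request element asks
               only for the status and headers of a resource, without following redirects.  The
               URI and the header value are placeholders chosen for the example.
            </p>
            <eg>
&lt;http:request xmlns:http="http://www.expath.org/mod/http-client"
              method="get"
              href="http://www.example.org/feed"
              status-only="true"
              follow-redirect="false">
   &lt;http:header name="Accept" value="application/atom+xml"/>
&lt;/http:request></eg>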
            <eg>
&lt;http:header name = string
             value = string/></eg>
            <p>
               The <code>http:header</code> element represents an HTTP header, either in a request
               or in a response.
            </p>
            <eg>
&lt;http:body content-type = string
           encoding? = string
           id? = string
           description? = string
           href? = string>
   any*
&lt;/http:body></eg>
            <p>
               The <code>http:body</code> element represents the body of either an HTTP request
               or an HTTP response (in multi-part requests and responses, it represents
               the body of a single part).
            </p>
            <p>
               The <code>content-type</code> and <code>encoding</code> attributes are used to
               control the way the content of this element is used to create the HTTP request
               (how it is serialized to the request content.)  See section below for details.
               The <code>id</code> attribute specifies the value of the HTTP header
               <code>Content-ID</code> and <code>description</code> the value of the HTTP header
               <code>Content-Description</code>.  The <code>href</code> attribute can be used
               in a request to set the body content as the content of the linked resource instead
               of using the children of the <code>http:body</code> element (children of this
               element and the <code>href</code> attribute are mutually exclusive.)
            </p>
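            <p>
               The two ways of providing the body content could be written as in the following
               sketch, given here as an XQuery sequence of two alternative elements.  The
               attribute values and the file URI are hypothetical.
            </p>
            <eg>
declare namespace http = "http://www.expath.org/mod/http-client";

(
   (: the body content is given inline, as children of the element :)
   &lt;http:body content-type="text/plain"
              id="note"
              description="A short note">Hello&lt;/http:body>,
   (: the body content is read from the linked resource, so there are no children :)
   &lt;http:body content-type="application/xml"
              href="file:///tmp/order.xml"/>
)</eg>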
            <eg>
&lt;http:multipart content-type = string
                boundary = string>
   (http:header*,
    http:body)+
&lt;/http:multipart></eg>
            <p>
               The <code>http:multipart</code> element represents an HTTP multi-part request
               or response.  The <code>content-type</code> attribute is the media type of the
               whole request or response, and has to be a multipart media type (that is, its
               main type must be <code>multipart</code>.)  The <code>boundary</code> attribute
               is the boundary marker used to separate the several parts in the message (the
               value of the attribute is prefixed with "<code>--</code>" to form the actual
               boundary marker in the request; conversely, this prefix is removed from
               the boundary marker in the response to set the value of the attribute).
            </p>
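            <p>
               For illustration, a two-part body might be described as follows.  The boundary
               string, the header and the part contents are assumptions made for the example;
               with this value of <code>boundary</code>, the marker actually written in the
               request would be <code>--part-boundary</code>.
            </p>
            <eg>
&lt;http:multipart xmlns:http="http://www.expath.org/mod/http-client"
                content-type="multipart/mixed"
                boundary="part-boundary">
   &lt;http:header name="Content-Disposition" value="inline"/>
   &lt;http:body content-type="text/plain">First part&lt;/http:body>
   &lt;http:body content-type="application/xml">
      &lt;data>Second part&lt;/data>
   &lt;/http:body>
&lt;/http:multipart></eg>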
         </div2>
         <div2>
            <head>Serializing the request content</head>
            <p>
               If the request can have content (one body or several body parts), it can be
               specified by the <code>http:multipart</code> element, the <code>http:body</code>
               element, and/or the parameter <code>$content</code>.  If <code>$content</code>
               is not the empty sequence, it replaces the value of the <code>http:body</code>
               element (in a multi-part request with several bodies, exactly one <code>http:body</code>
               must be empty).  For each body, the content of the HTTP body is generated as follows.
            </p>
            <p>
               The parameter <code>$serial</code> is used to control the way the content is
               serialized.  This parameter can be an <code>xsl:output</code> element, as defined
               in <bibref ref="xslt20"/>, and the serialization is defined in <bibref ref="xserial"/>.
               <code>$serial</code> can also be a string, either '<code>xml</code>', '<code>html</code>',
               '<code>xhtml</code>' or '<code>text</code>' (other values are implementation-defined,
               as explained in the above-mentioned recommendations).  (Note: <code>$serial</code> should
               also be allowed to be a function item, once EXPath has defined the corresponding module.)  If
               <code>$serial</code> is the empty sequence, the default value for this parameter depends
               on the <code>content-type</code> of the body: it is '<code>xml</code>' if it is an XML
               media type, '<code>html</code>' if it is an HTML media type, '<code>xhtml</code>' if
               it is <code>application/xhtml+xml</code>, or '<code>text</code>' in any other case.
            </p>
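            <p>
               As a sketch, the serialization can be controlled by passing an
               <code>xsl:output</code> element as the last parameter.  The URI, the document and
               the chosen serialization parameters are placeholders for the example.
            </p>
            <eg>
declare namespace http = "http://www.expath.org/mod/http-client";
declare namespace xsl  = "http://www.w3.org/1999/XSL/Transform";

http:send-request(
   'http://www.example.org/documents',
   &lt;http:request method="put">
      &lt;http:body content-type="application/xml"/>
   &lt;/http:request>,
   &lt;doc>&lt;title>Example&lt;/title>&lt;/doc>,
   (: serialize the content as indented XML, with an XML declaration :)
   &lt;xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>)</eg>
            <p>
               If the last parameter were the empty sequence instead, the rules above would
               select '<code>xml</code>', because <code>application/xml</code> is an XML media type.
            </p>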
         </div2>
         <div2>
            <head>Authentication</head>
            <p>
               HTTP authentication when sending a request is controlled by the attributes 
               <code>username</code>, <code>password</code>, <code>auth-method</code> and
               <code>send-authorization</code> on the element <code>http:request</code>.
               If <code>username</code> has a value, <code>password</code> and
               <code>auth-method</code> must have a value too.  Conversely, if any one of the three
               other attributes has been set, <code>username</code> must be set too.
            </p>
            <p>
               The attribute <code>auth-method</code> can be either "<code>Basic</code>" or
               "<code>Digest</code>", but other values can also be used, in an
               implementation-defined way.  The handling of those attributes must conform
               to <bibref ref="rfc2617"/>.  If <code>send-authorization</code>
               is true (the default value is false) and the authentication method supports
               generating the header <code>Authorization</code> without a challenge, the
               request contains this header.  Otherwise, the default behaviour is to send a
               non-authenticated request first and, only if the response is an authentication
               challenge, to send the credentials in a second message.
            </p>
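            <p>
               For example, a request that sends Basic credentials up front, without waiting
               for a challenge, could be described as in the following sketch.  The URI and the
               credentials are of course hypothetical.
            </p>
            <eg>
&lt;http:request xmlns:http="http://www.expath.org/mod/http-client"
              method="get"
              href="http://www.example.org/private"
              username="alice"
              password="secret"
              auth-method="Basic"
              send-authorization="true"/></eg>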
         </div2>
      </div1>
      <div1>
         <head>Dealing with the response</head>
         <p>
            After having sent the request to the HTTP server, the function waits for
            the response.  It analyses it and returns a sequence representing this
            response.  This sequence has an <code>http:response</code> element as
            first item, which is followed by an additional item for each body or
            body part in the response.
         </p>
         <div2>
            <head>The result element</head>
            <eg>
&lt;http:response status = integer
                  message = string>
   (http:header*,
     (http:multipart |
      http:body)?)
&lt;/http:response></eg>
            <p>
               This is the first item returned by the function defined in this module.
               The <code>status</code> attribute is the HTTP status code returned by the
               server, and <code>message</code> is the message coming with the status on the
               status line.  The <code>http:header</code> elements are as defined for the
               request, but represent instead the response headers.  The <code>http:body</code>
               and <code>http:multipart</code> elements are also as in the request, except that
               <code>http:body</code> elements must be empty.
            </p>
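            <p>
               The following XQuery sketch shows one possible way to read the status line and
               the headers from the first item of the result.  The <code>summary</code> and
               <code>header</code> element names, as well as the URI, are introduced only for
               the example.
            </p>
            <eg>
declare namespace http = "http://www.expath.org/mod/http-client";

let $result   := http:send-request(
                    &lt;http:request method="get" href="http://www.example.org/"/>)
(: the first item is always the http:response element :)
let $response := $result[1]
return
   &lt;summary status="{ $response/@status }" message="{ $response/@message }">{
      for $h in $response/http:header
      return
         &lt;header name="{ $h/@name }">{ string($h/@value) }&lt;/header>
   }&lt;/summary></eg>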
         </div2>
         <div2>
            <head>Representing the result content</head>
            <p>
               Instead of being inserted within the <code>http:response</code> element, the
               content of each body is returned as a single item in the return sequence.
               These items appear (after the <code>http:response</code> element) in the same
               order as the corresponding <code>http:body</code> elements.  For each body, the
               way this item is built from the HTTP response is as follows.
            </p>
            <p>
               If the <code>status-only</code> attribute has the value <code>true</code>
               (the default is <code>false</code>), the returned sequence contains only the
               <code>http:response</code> element (with the headers, but also the empty
               <code>http:body</code> or <code>http:multipart</code> elements, as if
               <code>status-only</code> were false); the following items, which represent
               the content of the bodies, are not generated from the HTTP response.
            </p>
            <p>
               For each body that has to be interpreted, the following rules apply in order to
               build the corresponding item.  If the body media type is a text media type, the
               item is a string, containing the body content.  If the media type is an XML
               media type, the content is parsed and the item is the resulting document node.
               If the media type is an HTML type, the content is <emph>tidied up</emph> and
               parsed (this process is implementation-dependent) and the item is the resulting
               document node.  If this is a binary media type, the content is returned as a
               <code>base64Binary</code> item.  From the previous rules, a result item can then be
               either a document node (from XML or HTML), a string or a <code>base64Binary</code>.
            </p>
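            <p>
               Since each body item is a document node, a string or a <code>base64Binary</code>,
               a caller can dispatch on the item type.  The following is a minimal sketch, in
               which the URI and the returned labels are assumptions made for the example.
            </p>
            <eg>
declare namespace http = "http://www.expath.org/mod/http-client";

let $result := http:send-request(
                  &lt;http:request method="get" href="http://www.example.org/"/>)
(: skip the http:response element, keep one item per body :)
for $item in subsequence($result, 2)
return
   typeswitch ( $item )
      case document-node() return 'XML or HTML document'
      case xs:string       return 'text'
      case xs:base64Binary return 'binary'
      default              return 'unexpected item'</eg>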
            <p>
               If the attribute <code>override-content-type</code> is set on the request, its
               value is used instead of the content-type returned by the HTTP server (TODO: how
               does it fit with multipart responses?)
            </p>
            <!--p>
               TODO: we should be able to select the part to use (for instance, for
               a multipart text and xhtml, always select xhtml...)
            </p-->
         </div2>
      </div1>
      <div1>
         <head>Content types handling</head>
         <p>
            In both requests and responses, MIME type strings are used to choose the way the
            entity content has to be respectively serialized or parsed.  Four kinds of media
            type are defined here; they are used in the sections above on sending the request
            and dealing with the response.  The intent is to convey the spirit of the entity content
            handling with regard to its content type, but an implementation is encouraged to deviate
            from those rules if it is obvious that a particular type should be treated in a
            specific way (normally, that would be the case only to treat a binary type as
            another type).
         </p>
         <ulist>
            <item>
               <p>
                  An XML media type has a MIME type of <code>text/xml</code>,
                  <code>application/xml</code>, <code>text/xml-external-parsed-entity</code>,
                  or <code>application/xml-external-parsed-entity</code>, as defined in
                  <bibref ref="rfc3023"/> (except that <code>application/xml-dtd</code> is
                  considered a text media type).  MIME types ending with <code>+xml</code> are
                  also XML media types.
               </p>
            </item>
            <item>
               <p>
                  An HTML media type has a MIME type of <code>text/html</code>.
               </p>
            </item>
            <item>
               <p>
                  Text media types are the remaining types beginning with <code>text/</code>.
               </p>
            </item>
            <item>
               <p>
                  Binary types are all the other types.  An implementation can treat some of
                  those binary types as either an XML, HTML or text media type if it is more
                  appropriate (this is implementation-defined.)
               </p>
            </item>
         </ulist>
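         <p>
            The classification above can be sketched as an XQuery function.  This is only an
            illustration: the function name <code>local:media-kind</code> is hypothetical, and
            dropping the media type parameters (such as <code>charset</code>) and lower-casing
            the value before the comparison are assumptions that this document does not specify.
         </p>
         <eg>
declare function local:media-kind($type as xs:string) as xs:string
{
   (: assumption: ignore parameters such as "; charset=utf-8" and the letter case :)
   let $t := lower-case(normalize-space(tokenize($type, ';')[1]))
   return
      (: application/xml-dtd is explicitly considered a text media type :)
      if ( $t eq 'application/xml-dtd' ) then
         'text'
      else if ( $t = ('text/xml',
                      'application/xml',
                      'text/xml-external-parsed-entity',
                      'application/xml-external-parsed-entity')
                or ends-with($t, '+xml') ) then
         'xml'
      else if ( $t eq 'text/html' ) then
         'html'
      else if ( starts-with($t, 'text/') ) then
         'text'
      else
         'binary'
};

local:media-kind('application/atom+xml; charset=utf-8')</eg>
         <p>
            An implementation remains free to treat some of the types classified here as
            binary in another way, as allowed by the last rule above.
         </p>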
      </div1>
   </body>
   <back>
      <div1>
         <head>References</head>
         <p>
            The structure of most of the elements and most of the attributes used in this
            candidate are inspired by the corresponding step in <bibref ref="xproc"/>.
         </p>
         <blist>
            <bibl id="tidy" key="HTML Tidy">
               <loc href="http://tidy.sf.net/">HTML Tidy Library Project</loc>. SourceForge project.
            </bibl>
            <bibl id="rfc1521" key="RFC 1521">
               <loc href="http://www.ietf.org/rfc/rfc1521.txt">RFC 1521: MIME (Multipurpose Internet
                  Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of
                  Internet Message Bodies</loc>.  N. Borenstein, N. Freed, editors. Internet
               Engineering Task Force. September, 1993.
            </bibl>
            <bibl id="rfc2616" key="RFC 2616">
               <loc href="http://www.ietf.org/rfc/rfc2616.txt">RFC 2616: Hypertext Transfer Protocol
               -- HTTP/1.1</loc>.  R. Fielding, J. Gettys, J. Mogul, et al., editors. Internet
               Engineering Task Force. June, 1999.
            </bibl>
            <bibl id="rfc2617" key="RFC 2617">
               <loc href="http://www.ietf.org/rfc/rfc2617.txt">RFC 2617: HTTP Authentication: Basic
               and Digest Access Authentication</loc>.  J. Franks, P. Hallam-Baker, J. Hostetler,
               S. Lawrence, P. Leach, A. Luotonen, L. Stewart. June, 1999.
            </bibl>
            <bibl id="rfc3023" key="RFC 3023">
               <loc href="http://www.ietf.org/rfc/rfc3023.txt">RFC 3023: XML Media Types</loc>.
               M. Murata, S. St. Laurent, and D. Kohn, editors. Internet Engineering Task Force.
               January, 2001.
            </bibl>
            <bibl id="xserial" key="Serialization">
               <loc href="http://www.w3.org/TR/xslt-xquery-serialization/">XSLT 2.0 and XQuery 1.0
               Serialization</loc>.  Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry
               Zongaro, editors. W3C Recommendation. 23 January 2007.
            </bibl>
            <bibl id="tagsoup" key="TagSoup">
               <loc href="http://ccil.org/~cowan/XML/tagsoup/">TagSoup - Just Keep On Truckin'</loc>.
               John Cowan.
            </bibl>
            <bibl id="xproc" key="XProc">
               <loc href="http://www.w3.org/TR/xproc/">XProc: An XML Pipeline Language</loc>.
               N. Walsh, A. Milowski, and H. S. Thompson, editors. W3C Candidate Recommendation.
               26 November 2008.
            </bibl>
            <bibl id="xslt20" key="XSLT 2.0">
               <loc href="http://www.w3.org/TR/xslt20/">XSL Transformations (XSLT) Version 2.0</loc>.
               Michael Kay, editor. W3C Recommendation. 23 January 2007.
            </bibl>
         </blist>
      </div1>
   </back>
</spec>
