<?xml version="1.0" encoding="UTF-8"?>

<?oxygen RNGSchema="http://www.w3.org/XML/1998/06/schema/xmlspec.rng" type="xml"?>

<spec role="editors-copy" xmlns:fg="http://www.fgeorges.org/xmlspec">
   <header>
      <title>Packaging System</title>
      <w3c-designation>w3c-designation</w3c-designation>
      <w3c-doctype>EXPath Candidate Module</w3c-doctype>
      <pubdate>
         <day>18</day>
         <month>November</month>
         <year>2009</year>
      </pubdate>
      <publoc>
         <loc href="http://www.expath.org/modules/pkg-20091118.html"/>
      </publoc>
      <altlocs>
         <loc href="http://www.expath.org/modules/pkg-20091118.xml">XML</loc>
      </altlocs>
      <latestloc>
         <loc href="http://www.expath.org/modules/pkg.html"/>
      </latestloc>
      <prevlocs>
         <loc href="http://www.expath.org/modules/pkg-20090901.html"/>
      </prevlocs>
      <authlist>
         <author>
            <name>Florent Georges</name>
            <affiliation>H2O Consulting</affiliation>
         </author>
      </authlist>
      <copyright>
         <p/>
      </copyright>
      <abstract>
         <p>This proposal defines a packaging system for various core XML technologies: XSLT,
            XQuery, and XProc. The goal is to define it in a way enough generic so to adapt it to
            other technologies in the future (such as XML Schema, XForms, etc.) using the same
            framework. Besides enabling the delivery of libraries written in standard XSLT, XQuery
            and XProc, it provides support for extensions specific to some processors, as well as
            enabling new processors to be supported by using the same framework.</p>
      </abstract>
      <status>
         <p>Must be ignored, but is required by the schema...</p>
      </status>
      <langusage>
         <language>langusage</language>
      </langusage>
      <revisiondesc>
         <p>revisiondesc</p>
      </revisiondesc>
   </header>
   <body>
      <div1>
         <head>Introduction</head>
         <p>XSLT, XQuery and XProc are amazing programming languages. But they lack a large choice
            of libraries, and when such libraries do exist, this is a challenge to install. There is
            no automatic install process, the rules are different for each processor, library
            authors do not follow the same rules regarding the info they provide, the cataloging,
            the way they reference third-party libraries, etc.</p>
         <p>All those problems (well, most of them) can be addressed by a packaging system that
            would be broadly adopted by processor vendors and library authors. The cornerstone of
            such a system is the packaging format: a description of the information to be provided
            by the library authors and how to provide and structure them.</p>
      </div1>
      <div1>
         <head>Concepts</head>
         <p>A library is a set of files fulfilling a common purpose. An XSLT library can for
            instance provide a set of template rules and functions to help formating a particular
            XML document type. A package is a way to bundle those files into a single ZIP file,
            following a defined structure and providing more information within the <kw>package
               descriptor</kw>. The package descriptor is a plain XML file, named
               <code>expath-pkg.xml</code> at the root of the ZIP file, and containing information
            about the library (like its name and its version number) and about the files it provides
            and how to reference them (for instance stylesheets and query modules.)</p>
         <p>The ZIP file structure (aka the package structure) must have exactly two entries at the
            top level: the package descriptor and one directory entry. This directory contains all
            the library files, and all file references in the package descriptor are relative to
            this directory. This directory is called the <kw>library directory</kw>.</p>
         <p>All the elements in the package descriptor are defined in the namespace
               <code>http://expath.org/mod/expath-pkg</code>. All the elements defined in this
            specification or used in samples and in text are in this namespace, even if no prefix is
            used. The root element is <code>package</code>, and contains exactly one child element
               <code>module</code>:</p>
         <eg>
&lt;package>
   module
&lt;/package>

&lt;module name = NCName
        version = string>
   title,
   (xslt
   |xquery
   |xproc
   |xsd
   |rng
   |rnc
   |...)+
&lt;/module></eg>
         <p><code>name</code> is the library name. The top-level directory in the package structure
            must have the exact same name. The module has also a version number, and a
            human-readable title. It then provides information about one or several files. Those
            files are called the <kw>components</kw>. In addition to those standard file
            descriptors, it can also contain elements specific to some processors (for instance an
            element for Saxon, eXist, etc.) Details are provided below.</p>
         <p>The components are the files <emph>exported</emph> by the module. But the whole library
            directory must be preserved. Indeed, it can contain other, private files, aimed to be
            used only from within library files, not from the outside.</p>
         <p>The components are accessed from the outside of the package by using a URI. This URI is
            the <kw>public URI</kw>, and absolute URI, which cannot be of scheme <code>file:</code>.
            Its exact usage depends on the kind of component (for instance, with XSLT it is aimed at
            be used in xsl:import, and in XQuery this is the target namespace of an XQuery library
            module.) Each kind of component defines its own <kw>URI space</kw>. So to uniquely
            identify a component in the repository, one needs the public URI and the URI space to
            use.</p>
      </div1>
      <div1>
         <head>Standard components</head>
         <p>Here is the description of the standard component kinds supported by this specification,
            and how they contribute to the package descriptor document type.</p>
         <div2>
            <head>XSLT</head>
            <p>An XSLT file is associated a <emph>public import URI</emph>.This is the URI to use in
               an XSLT import instruction (aka <code>xsl:import</code>) to import the XSLT file
               provided in the package. This file is configured with the element
               <code>xslt</code>.</p>
            <eg>
&lt;xslt>
   import-uri,
   file
&lt;/xslt></eg>
            <p>The element <code>file</code> contains the path to the file within the package
               structure, relative to the library directory. Both elements <code>import-uri</code>
               and <code>file</code> are of type <code>anyURI</code>.</p>
         </div2>
         <div2>
            <head>XQuery</head>
            <p>An XQuery library module is referenced by its namespace URI. Thus the
                  <code>xquery</code> element associates a namespace URI to an XQuery file. An
               importing module just need to use an import statement of the form <code>import module
                  namespace xx = "&lt;namespace-uri>";</code>.</p>
            <eg>
&lt;xquery>
   namespace,
   file
&lt;/xquery></eg>
            <p>Note that there is no way to set any location hint (as the <code>at</code> clause in
               the import statement.) To use this packaging system, an XQuery library module must be
               referenced by its target namespace.</p>
         </div2>
         <div2>
            <head>XProc</head>
            <p>An XProc pipeline, like an XSLT stylesheet, is associated a <emph>public import
                  URI</emph>, aimed to be used in an XProc <code>p:import</code> statement.</p>
            <eg>
&lt;xproc>
   import-uri,
   file
&lt;/xproc></eg>
         </div2>
         <div2>
            <head>XML Schema</head>
            <p>An XML schema can be imported using its target namespace. Like for XQuery, there is
               no way to use any <code>schemaLocation</code> instead. There is neither the ability
               to set several files as several sources for the schema. If the schema is spread over
               multiple files, there must be one top-level file that includes the other files.</p>
            <eg>
&lt;xsd>
   namespace,
   file
&lt;/xsd></eg>
            <p>(TODO: Should we support schemas with empty target namespace? I am sure this is a
               good idea in a packaging system...) (TODO: This does not support
                  <code>xs:redefine</code>, as it requires a hint, not a TNS)</p>
         </div2>
         <div2>
            <head>RelaxNG</head>
            <p>A RelaxNG schema, like an XSLT stylesheet, is associated a public import URI, aimed
               to be used in an <emph>import</emph> statement (either the <emph>include</emph>
               element for an RNG schema or an <emph>import</emph> directive for an RNC schema.)</p>
            <eg>
&lt;rng>
   import-uri,
   file
&lt;/rng>

&lt;rnc>
   import-uri,
   file
&lt;/rnc></eg>
         </div2>
         <div2>
            <head>Schematron</head>
            <p>A Schematron schema is associated a public URI.</p>
            <eg>
&lt;schematron>
   import-uri,
   file
&lt;/schematron></eg>
         </div2>
         <div2>
            <head>NVDL</head>
            <p>An NVDL script is associated a public URI.</p>
            <eg>
&lt;nvdl>
   import-uri,
   file
&lt;/nvdl></eg>
         </div2>
         <div2>
            <head>Not supported file kinds</head>
            <p>Documentation (like result of XSLStyle or xqDoc) is not taken into account in the
               packaging format, though that could be used by IDEs for instance to provide
               documentation for functions in an editor with a live completion feature. Some support
               for documentation can of course be added as a product-specific feature to the package
               descriptor.</p>
         </div2>
      </div1>
      <div1>
         <head>Processor behaviour</head>
         <p>A <kw>processor</kw> is any program that use packaged components. For instance, an XSLT
            processor uses XSLT stylesheets (as well as XML schemas for an XSLT 2.0 SA processor),
            an XML database can use XQuery modules, XSLT stylesheets or any other kind of component,
            etc. But processors include also IDEs, editors, and any program that could want to use
            packaged components and that supports this specification.</p>
         <p>The installed package list is implementation-defined. Each implementation (for a
            specific processor) can define its own way to install and remove packages, as long as it
            properly documents it. A processor should use, when appropriate, the <kw>on-disk
               repository layout</kw> as defined below.</p>
         <p>When a reference to a file of a specific kind is done via an absolute URI, a processor
            must look up for this URI in the corresponding URI space in the repository. How the
            repository is set to the processor is implementation-defined (a processor can also use a
            list of repositories, and enable or disable some libraries in any implementation-defined
            way.)</p>
         <p>The URI space to use is defined by the nature of the reference. An XSLT
               <code>href</code> attribute on <code>xsl:import</code> will use the <code>xslt</code>
            URI space, while it will use the <code>xsd</code> space for
               <code>xsl:import-schema</code>.</p>
         <div2>
            <head>XProc pipelines</head>
            <p>An XProc processor in particular has to pay great attention to the space it use
               regarding the step that is beng evaluated. Any <code>xsl:import</code> instruction
               encountered on the <code>stylesheet</code> port of the step <code>p:xslt</code> has
               to be looked for in the <code>xslt</code> space (regardless if the stylesheet
               document is inlined in the pipeline, computed, loaded from the file system or
               retrieved from the Internet, or if the containing stylesheet has been imported
               itself.)</p>
            <p>The XProc elements <code>p:document</code> and <code>p:data</code>, as well as the
               step <code>p:load</code> are handled specially. They can be used to access any kind
               of resource, including but not limited to components in a repository. The user has to
               tell explicitely the processor what kind of component is looked for by using the
                  <code>pkg:kind</code> extension attribute. For instance, a stylesheet can be
               loaded from a repository as input to the step <code>xslt</code> as following:</p>
            <eg>
&lt;p:xslt>
   &lt;p:input port="stylesheet">
      &lt;p:document href="..." pkg:kind="xslt"/>
   &lt;/p:input>
   ...
&lt;/p:xslt></eg>
         </div2>
      </div1>
      <div1>
         <head>On-disk repository layout</head>
         <p>This section defines a standard structure for on-disk repositories. An implementation
            can choose to not support this kind of repository and to define its own one (or even to
            not define it publicly, just to provide the ability to install and remove packages, in a
            clearly documented way.) However, there are several advantages to support this
            structure, the most obvious one is to be able to benefit from existing tools to manage
            such repositories as well as existing libraries to access those repositories.</p>
         <p>The resolving machinery is based on OASIS XML Catalogs <bibref ref="catalogs"/>. The
            repository is a simple directory, each subdirectory of which is an installed package
            (aka a <kw>package dir</kw>.) The only exception to this is the subdirectory
               <code>.expath-pkg/</code> which is dedicated to store working information about the
            installed packages, among which the catalogs (aka the <kw>admin dir</kw>.)</p>
         <eg>
[repository-dir]/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...</eg>
         <p>The <emph>package dirs</emph> are really simple: they are simply an unzipped version of
            the XAR file. The name of the directory is simply the same as the name of the module in
            the package. The <emph>admin dir</emph> contains a catalog for each URI space (the
            catalog for one specific URI space can not be there if there is no one file in that URI
            space in the whole repository.) The name of such a catalog is
               <code>[space]-catalog.xml</code> where [space] is either <code>xslt</code>,
               <code>xquery</code>, <code>rnc</code>, etc. Those catalogs are called <kw>repository
               catalogs</kw>. It also contains a subdirectory for each installed package, with the
            same naming convention. In turn, those directories contain catalog files, containing the
            mappings defined in the corresponding package descriptors (pointing to the actual files
            installed in the <emph>package dirs</emph>.) Those are called the <kw>package
               catalogs</kw>. They follow the same naming convention than the <emph>repository
               catalogs</emph> (divided by URI spaces.) The repository catalogs just include the
            several package catalogs for the same URI space.</p>
         <p>[ ... TODO ... ]</p>
      </div1>
      <div1>
         <head>Examples</head>
         <p>[... TODO ...] (package structures, package descriptors, directory layout, processor
            behaviour...)</p>
      </div1>
      <!--div1>
         <head>Processor-specific files</head>
         <p>Bla-bla...</p>
         <div2>
            <head>Saxon</head>
            <p>Bla-bla...</p>
            <p>... classes for <loc
                  href="http://www.saxonica.com/documentation/extensibility/functions.html"
                  >extension functions</loc> (both Java and .Net?), classes for <loc
                  href="http://www.saxonica.com/documentation/extensibility/instructions.html"
                  >extension instructions</loc>, classes for <loc
                  href="http://www.saxonica.com/documentation/extensibility/localizing.html"
                  >localizing numbers and dates</loc>, URI resolvers, etc.</p>
         </div2>
         <div2>
            <head>Xalan</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>eXist</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>MarkLogic</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>Zorba</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>Calabash</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>Calumet</head>
            <p>Bla-bla...</p>
         </div2>
         <div2>
            <head>xmlsh</head>
            <p>Bla-bla...</p>
         </div2>
      </div1-->
      <div1>
         <head>Extensibility</head>
         <p>[ ... TODO ... ]</p>
      </div1>
      <div1>
         <head>Editorial notes</head>
         <ulist>
            <item>
               <p>Should the package system define a set of XPath functions? Instead of just
                  defining the package format and letting everything else as implementation-defined,
                  should it in addition define a module of functions to install, delete, and more
                  generally manage packages from within a processor?</p>
               <p>Drawback: potential problems if the processor requires to be stopped?</p>
               <p>Advantages: enables writing tools on top of the system (one single graphical
                  package manager for one system, simply using the XPath functions, as well as easy
                  integration within IDEs; or even other systems could be more easily be built on
                  top of it, like a packaging system for XRX applications for instance.)</p>
            </item>
            <item>
               <p>Should we add a generic "xml" URI space, for any XML document?</p>
            </item>
            <item>
               <p>Add a restriction (actual add a definition of such URIs at the beginning) on
                  public URIs: require them to be absolute, and maybe only either HTTP or URN (or at
                  least prohibit the FILE scheme.)</p>
            </item>
            <item>
               <p>Interesting use case with XProc and NVDL: How to configure the XSLT processor in
                  Calabash with the NVDL URI space for an NVDL implementation for XProc written
                  using XSLT as a plain library step?</p>
            </item>
         </ulist>
      </div1>
   </body>
   <back>
      <div1>
         <head>Package descriptor schema</head>
         <p>[ ... TODO ... ]</p>
         <!--eg><xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../xsd/expath-pkg.xsd" parse="text"/></eg>
         <p>(+ RNC + RNG) (+ Schematron?, if useful)</p-->
      </div1>
      <div1>
         <head>References</head>
         <blist>
            <bibl id="catalogs" key="Catalogs"><loc
                  href="http://www.oasis-open.org/committees/documents.php?wg_abbrev=entity">XML
                  Catalogs</loc>, OASIS Standard V1.1, 7 October 2005</bibl>
         </blist>
      </div1>
   </back>
</spec>
