Originally published at http://www.xml.com/pub/a/2000/11/29/schemas/part1.html
Minor revision made by Junghoo "John" Cho for the CS188 class at UCLA
The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. W3C XML Schema is a W3C Recommendation.
This article is an introduction to using W3C XML Schemas, and also includes a comprehensive reference to the Schema datatypes and structures.
Let's start by having a look at this simple document which describes a book:
<?xml version="1.0"?>
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
To write a schema for this document, we could simply follow its structure
and define each element as we find it. To start, we open an xs:schema element:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  .../...
</xs:schema>
The schema element opens our schema. It can also hold the
definition of the target namespace and several default options, some of
which we will see in the following sections.
To match the start tag of the book element, we define an
element named book. This element has attributes and non-text
children, so we define it as a complexType (the other kind of
datatype, simpleType, is reserved for datatypes that hold only values,
with no element or attribute sub-nodes). The list of children of the book
element is described by a sequence element:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
    .../...
  </xs:complexType>
</xs:element>
The sequence is a "compositor" that defines an ordered sequence
of sub-elements. There exist other compositors, such as choice and
all, but we will only focus on sequence in this tutorial.
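Because sequence is ordered, the children of book must appear in the order in which they are declared. The two documents below are a sketch that assumes the complete schema built in this section; a validating parser should accept the first and reject the second, in which author comes before title.

```xml
<!-- Document 1: accepted, title then author as declared in the sequence -->
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
</book>

<!-- Document 2: rejected, author appears before title -->
<book isbn="0836217462">
  <author>Charles M. Schulz</author>
  <title>Being a Dog Is a Full-Time Job</title>
</book>
```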
Now we can define the title and author elements as simple types: they have
no attributes or non-text children and can be described directly within an
empty xs:element element. The type (xs:string) is prefixed by
the namespace prefix associated with XML Schema, indicating a predefined XML
Schema datatype:
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
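A simple type describes text only, so an element declared with type="xs:string" may carry neither attributes nor child elements. The fragments below illustrate this as a sketch; the role attribute and the first and last elements are hypothetical names invented for the illustration.

```xml
<!-- Accepted: text content only -->
<author>Charles M. Schulz</author>

<!-- Rejected: an element of simple type cannot carry attributes
     (role is a hypothetical attribute) -->
<author role="cartoonist">Charles M. Schulz</author>

<!-- Rejected: nor can it contain child elements
     (first and last are hypothetical elements) -->
<author><first>Charles</first> <last>Schulz</last></author>
```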
Now, we must deal with the character element, a complex type.
Note how its cardinality is defined:
<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
  </xs:complexType>
</xs:element>
Unlike DTDs, whose occurrence indicators (?, * and +) can only mark an element
as optional or repeatable, W3C XML Schema lets us define the
cardinality of an element (i.e. the number of its possible occurrences) with
some precision. We can specify both minOccurs (the minimum number
of occurrences) and maxOccurs (the maximum number of occurrences).
Here maxOccurs is set to unbounded, which means that
there can be as many occurrences of the character element as the author wishes.
Both attributes have a default value of one.
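The declarations below sketch how minOccurs and maxOccurs cover the DTD occurrence indicators and go beyond them. They are alternative declarations of a hypothetical nickname element, each shown with its closest DTD equivalent in a comment; they are not meant to be used together in one sequence.

```xml
<!-- Exactly once (both attributes default to 1). DTD: nickname -->
<xs:element name="nickname" type="xs:string"/>

<!-- Optional, at most once. DTD: nickname? -->
<xs:element name="nickname" type="xs:string" minOccurs="0"/>

<!-- Zero or more. DTD: nickname* -->
<xs:element name="nickname" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>

<!-- One or more. DTD: nickname+ -->
<xs:element name="nickname" type="xs:string" maxOccurs="unbounded"/>

<!-- Between two and five. A DTD can only spell this out by repetition:
     (nickname, nickname, nickname?, nickname?, nickname?) -->
<xs:element name="nickname" type="xs:string" minOccurs="2" maxOccurs="5"/>
```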
We then specify the list of all its children in the same way:
<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>
And we finish its description by closing the sequence, complexType
and element elements.
We can now declare the attributes of the book element. Within a complex type, attribute declarations must always come last, after the compositor that lists the child elements. There appears to be no special reason for this, but the W3C XML Schema Working Group considered it simpler to impose a relative order on the definitions of elements and attributes within a complex type, and more natural to define the attributes after the elements.
<xs:attribute name="isbn" type="xs:string"/>
Finally, we close all the remaining elements.
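The ordering rule can be seen by comparing the two complex types below, shown out of context as a sketch: a schema processor should accept the first and report an error for the second, where the attribute is declared before the sequence.

```xml
<!-- Accepted: compositor first, then attribute declarations -->
<xs:complexType>
  <xs:sequence>
    <xs:element name="title" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="isbn" type="xs:string"/>
</xs:complexType>

<!-- Rejected: the attribute is declared before the sequence -->
<xs:complexType>
  <xs:attribute name="isbn" type="xs:string"/>
  <xs:sequence>
    <xs:element name="title" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```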
That's it! This first design, sometimes known as "Russian Doll Design", tightly follows the structure of our example document.
One of the key features of such a design is that it defines each element and attribute within its context, which allows multiple occurrences of the same element name to carry different definitions.
Complete listing of this first example:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0"
                          maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
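As a quick check of what this first schema does and does not constrain, consider the two documents below (a sketch, not output from any particular validator). The first is accepted because the isbn attribute was declared without a use attribute, and attributes are optional by default; the second is rejected because the content of since does not follow the YYYY-MM-DD lexical form of xs:date.

```xml
<!-- Document 1: accepted, isbn is optional and character may be absent -->
<book>
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
</book>

<!-- Document 2: rejected, "October 4, 1950" is not a valid xs:date -->
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <since>October 4, 1950</since>
    <qualification>extroverted beagle</qualification>
  </character>
</book>
```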
The next section explores how to subdivide schema designs to make them more readable and maintainable.
While the previous design method is very simple, it can lead to deeply nested definitions, making the schema hard to read and difficult to maintain when documents are complex. It also has the drawback of being very different from a DTD structure, an obstacle for human or machine agents wishing to transform DTDs into XML Schemas, or even just to use the same design guidelines for both technologies.
The second design is based on a flat catalog of all the elements available in the instance document and, for each of them, lists of child elements and attributes. This is achieved by using references (the ref attribute) to element and attribute definitions that must be declared globally, i.e. as direct children of xs:schema, leading to a flat design:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- definition of simple type elements -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="friend-of" type="xs:string"/>
  <xs:element name="since" type="xs:date"/>
  <xs:element name="qualification" type="xs:string"/>
  <!-- definition of attributes -->
  <xs:attribute name="isbn" type="xs:string"/>
  <!-- definition of complex type elements -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="since"/>
        <xs:element ref="qualification"/>
        <!-- the simple type elements are referenced using
             the "ref" attribute -->
        <!-- the definition of the cardinality is done
             when the elements are referenced -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="author"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="isbn"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
Using a reference to an element or an attribute is somewhat comparable to cloning an object. The element or attribute is defined first, and it can be duplicated at another place in the document structure by the reference mechanism, in the same way an object can be cloned. The two elements (or attributes) are then two instances of the same class.
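One side effect of the flat catalog is worth knowing: every globally declared element can serve as the root of an instance document. The document below, which holds nothing but a title, should therefore be accepted by the flat catalog schema, whereas the Russian doll schema, in which only book is global, rejects it. This is a sketch of the expected behavior, not a captured validator result.

```xml
<?xml version="1.0"?>
<!-- Accepted by the flat catalog schema: title is a global element.
     Rejected by the Russian doll schema: only book is global there. -->
<title>Being a Dog Is a Full-Time Job</title>
```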
The next section shows how we can define such classes, called "types", that enable us to re-use element definitions.
We have seen that we can define elements and attributes as we need them (Russian doll design), or create them first and reference them (flat catalog). W3C XML Schema gives us a third mechanism, which is to define datatypes (either simple types, used for PCDATA elements or attributes, or complex types, used only for elements) and to use these types to define our attributes and elements.
This is achieved by giving a name to the simpleType and
complexType elements, and locating them outside of the definition
of elements or attributes. We will also take the opportunity to show how we can
derive a datatype from another one by defining a restriction over the values of
this datatype.
For instance, to define a datatype named nameType, which is a
string with a maximum of 32 characters, we will write:
<xs:simpleType name="nameType">
  <xs:restriction base="xs:string">
    <xs:maxLength value="32"/>
  </xs:restriction>
</xs:simpleType>
The simpleType element holds the name of the new datatype. The
restriction element expresses the fact that the datatype is
derived from the string datatype of the W3C XML Schema namespace
(attribute base) by applying a restriction, i.e. by limiting the
number of possible values. The maxLength element, called a
facet, states that this restriction limits the length to a maximum of 32
characters.
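Assuming an element such as name is declared with type="nameType" (as in the full listing later in this section), the values below sketch the facet at work. Since xs:string preserves whitespace, any leading or trailing spaces and line breaks inside the element also count toward the 32 characters.

```xml
<!-- Accepted: 16 characters -->
<name>Peppermint Patty</name>

<!-- Rejected: more than 32 characters -->
<name>A name that is definitely longer than thirty-two characters</name>
```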
Another powerful facet is the pattern element, which defines a
regular expression that must be matched. For instance, if we do not care about
the "-" signs, we can define an ISBN datatype as 10 digits
thus:
<xs:simpleType name="isbnType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{10}"/>
  </xs:restriction>
</xs:simpleType>
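XML Schema patterns are implicitly anchored: the whole value must match the expression, so no ^ or $ anchors are needed. As a sketch, here is a hypothetical variant named isbn10Type that also accepts the "X" check digit an ISBN-10 can end with; the comments list values each type should accept or reject.

```xml
<!-- Hypothetical variant of isbnType: nine digits, then a digit or X.
     0836217462    accepted by isbnType and isbn10Type
     083621746X    rejected by isbnType, accepted by isbn10Type
     0-8362-1746-2 rejected by both (hyphens are not allowed) -->
<xs:simpleType name="isbn10Type">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{9}[0-9X]"/>
  </xs:restriction>
</xs:simpleType>
```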
Facets and the two other ways to derive a datatype (list and union) are covered in the next sections.
Complex types are defined as we have seen before, but given a name.
Defining and using named datatypes is comparable to defining a class and using it to create an object. A datatype is an abstract notion that can be used to define an attribute or an element. The datatype then plays the same role for an attribute or an element that a class plays for an object.
Full listing:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- definition of simple types -->
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="sinceType">
    <xs:restriction base="xs:date"/>
  </xs:simpleType>
  <xs:simpleType name="descType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- definition of complex types -->
  <xs:complexType name="characterType">
    <xs:sequence>
      <xs:element name="name" type="nameType"/>
      <xs:element name="friend-of" type="nameType" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="since" type="sinceType"/>
      <xs:element name="qualification" type="descType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="nameType"/>
      <xs:element name="author" type="nameType"/>
      <xs:element name="character" type="characterType" minOccurs="0"
                  maxOccurs="unbounded"/>
      <!-- the definition of the "character" element
           uses the "characterType" complex type -->
    </xs:sequence>
    <xs:attribute name="isbn" type="isbnType" use="required"/>
  </xs:complexType>
  <!-- Reference to "bookType" to define the
       "book" element -->
  <xs:element name="book" type="bookType"/>
</xs:schema>
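The class analogy pays off once a type is reused. The listing above already reuses nameType for four different elements; the hypothetical variant of bookType below adds a sidekick element (an invented name, not part of the original example) that shares the whole content model of character simply by pointing at characterType. It is a sketch meant to replace bookType in the listing above.

```xml
<xs:complexType name="bookType">
  <xs:sequence>
    <xs:element name="title" type="nameType"/>
    <xs:element name="author" type="nameType"/>
    <xs:element name="character" type="characterType" minOccurs="0"
                maxOccurs="unbounded"/>
    <!-- hypothetical element: same children as character,
         defined once in characterType -->
    <xs:element name="sidekick" type="characterType" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute name="isbn" type="isbnType" use="required"/>
</xs:complexType>
```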
Each W3C XML Schema document is bound to a specific namespace through the targetNamespace attribute, or to the absence of namespace through the lack of such an attribute.
Until now we have omitted the targetNamespace attribute, which means that we were working without namespaces. To get into namespaces, let's first imagine that our example belongs to a single namespace:
<book isbn="0836217462" xmlns="http://example.org/ns/books/"> .../... </book>
The least intrusive way to adapt our schema is to add some more attributes to our xs:schema
element.
<xs:schema targetNamespace="http://example.org/ns/books/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:bk="http://example.org/ns/books/">
.../...
</xs:schema>
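Here, the most important part is the targetNamespace attribute, which lets you define what namespace
is described in this schema. In this example, all the global elements, attributes, datatypes, etc. (those declared as direct children of xs:schema) become
part of the http://example.org/ns/books/ namespace. Locally declared elements, such as title in the Russian doll design, are by default in no namespace; setting elementFormDefault="qualified" on xs:schema places them in the target namespace as well.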
The namespace declaration xmlns:xs="http://www.w3.org/2001/XMLSchema"
says that in this schema definition, we will use the prefix xs to identify the elements and datatypes defined in the
W3C XML Schema standard, as we have done in all the examples so far. Note that we
could have chosen any prefix instead of xs. We could even make http://www.w3.org/2001/XMLSchema
our default namespace, in which case we would not prefix the W3C XML Schema elements or its
datatypes.
Since we are working with the http://example.org/ns/books/
namespace, in the above example we also declare it (with a bk prefix). This means that we
will now prefix references to "objects" (datatypes, elements, attributes,
...) belonging to this namespace with bk:. Again, we could have
chosen any prefix to identify this namespace, or even have made it our default
namespace.
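Putting these pieces together, here is a sketch of the named-types schema adapted to the namespace. For brevity it replaces sinceType and descType with xs:date and xs:string, and it adds elementFormDefault="qualified" (an assumption about how instances are written) so that local children such as title are expected in the namespace, matching the default namespace declared on the book element above. References to our own types carry the bk: prefix, while the name attributes of declarations stay unprefixed. The isbn attribute remains unqualified, the default for locally declared attributes, so instances write it without a prefix.

```xml
<?xml version="1.0"?>
<xs:schema targetNamespace="http://example.org/ns/books/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:bk="http://example.org/ns/books/"
           elementFormDefault="qualified">
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="characterType">
    <xs:sequence>
      <!-- our own types are referenced through the bk: prefix -->
      <xs:element name="name" type="bk:nameType"/>
      <xs:element name="friend-of" type="bk:nameType" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element name="since" type="xs:date"/>
      <xs:element name="qualification" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="bk:nameType"/>
      <xs:element name="author" type="bk:nameType"/>
      <xs:element name="character" type="bk:characterType" minOccurs="0"
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="isbn" type="bk:isbnType" use="required"/>
  </xs:complexType>
  <xs:element name="book" type="bk:bookType"/>
</xs:schema>
```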
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.