An Introduction To XML
by Kirthan Battapadi
February 6, 2000
The newest addition to the pantheon of Web technologies is yet
another markup language. XML, otherwise known as Extensible Markup Language, resembles
other markup languages like SGML and HTML. XML allows you to use your own tags to define
parts of a document. XML is becoming the vehicle for structured data on the Web,
complementing HTML, which is used to present the data.
Although XML differs from HTML in some fundamental ways, learning
to author XML documents does not require a great deal of effort. Because HTML and XML
share a common heritage (SGML), their syntax is very similar. XML is a simplified version
of SGML, a markup language used in technical documentation.
The design goals of XML, taken from the specification, are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally
zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
XML Design vs. HTML Design
XML is not going to replace HTML. The two are not the same thing. XML's purpose is to
describe the content of a document. Unlike HTML, XML does not describe how that content
should be displayed. Using XML, the Web page developer can semantically mark up the
contents of a document, describing the content in terms of its relevance as data.
XML and HTML will work together in the future to display a page
in the Web browser.
For example, the following HTML fragment
<p>Sleeping with the enemy</p>
describes the contents within the tags as a paragraph. Here we
are concerned with displaying the words within a Web page. But suppose we want to access
those words as data. Using XML, we can mark up the words in a way that better reflects
their significance as data.
<film>Sleeping with the enemy</film>
When marking up documents in XML, you can choose the tag name
that best describes the contents of the element.
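Once the words are marked up as a film title, a program can retrieve them by element name. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (the article itself does not prescribe a programming language):

```python
import xml.etree.ElementTree as ET

# The XML fragment from the text above.
fragment = "<film>Sleeping with the enemy</film>"

element = ET.fromstring(fragment)

# The tag name tells a program what kind of data the text is.
print(element.tag)   # film
print(element.text)  # Sleeping with the enemy
```

With the HTML version, a program would only learn that the words form a paragraph; here the tag name itself identifies them as a film.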
How Can XML Be Used?
XML can keep data separated from your HTML. HTML pages are used to display data. Data is
often stored inside HTML pages. With XML this data can now be stored in a separate XML
file. XML data can also be stored inside HTML pages as "Data Islands".
Computer systems and databases contain data in incompatible
formats. One of the most time-consuming challenges for developers is exchanging data
between such systems over the Internet. Converting the data to XML can greatly reduce this
complexity and create data that can be read by different types of applications. XML can
also be used to store data in files or in databases. Applications can be written to store
and retrieve information from the store, and generic applications can be used to display
the data.
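The exchange idea can be sketched in a few lines: one system writes its record out as XML, and another reads it back without knowing anything about the first system's internal format. This is an illustrative sketch in Python only; the record and the helper names to_xml and from_xml are hypothetical and not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, as one system might hold it internally.
record = {"title": "Norton Utilities", "vendor": "Symantec", "copies": "1"}

def to_xml(data):
    """Serialize a flat dict as a <package> element with one child per key."""
    root = ET.Element("package")
    for name, value in data.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the dict from the XML text produced by to_xml."""
    return {child.tag: child.text for child in ET.fromstring(text)}

xml_text = to_xml(record)      # what would travel between the systems
restored = from_xml(xml_text)  # what the receiving system recovers
print(xml_text)
print(restored == record)
```

The sketch handles only flat records with text values; nested data would need a recursive version of both helpers.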
XML Constructs
To create XML documents, you need to be familiar with several basic constructs: elements,
attributes and entities.
Elements: Elements are labels used to describe your
content. They're described in the DTD by element declarations and invoked in the
document element as tags. Element declarations by default define tag pairs. Tag pairs
contain text as well as other elements and their content. An element declaration may also
define an empty element, one that isn't designed to contain any text or other
elements.
Attributes: Attributes provide extra information about the
element. XML elements can have attributes in name/value pairs just like in HTML.
<platform type="mac" />
<os type="Mac 7.x" />
One of the rules of XML is that all attribute values, regardless
of type, must be enclosed in quotation marks.
There are some problems using attributes:
- Attributes cannot contain multiple values.
- Attributes are not expandable.
- Attributes are more difficult to manipulate by program code.
- Attribute values are not easy to test against a DTD.
XML Syntax
An example XML document is as follows:
<?xml version="1.0"?>
<package>
<title>Norton Utilities</title>
<version>3.5</version>
<vendor>Symantec</vendor>
<platform />
<os />
<description>A hard disk utility program</description>
<copies>1</copies>
</package>
The first line in the document is the XML declaration. It should always
be included, and it defines the XML version of the document.
<?xml version="1.0"?>
The next line defines the first element of the document (the root
element):
<package>
The next lines define the child elements of the root (title,
version, vendor, platform, os, description, copies):
<title>Norton Utilities</title>
<version>3.5</version>
<vendor>Symantec</vendor>
<platform />
<os />
<description>A hard disk utility program</description>
<copies>1</copies>
The last line defines the end of the root element.
</package>
The empty platform and os elements have a slash before the
greater-than sign that closes the tag. This is XML's syntax for specifying empty
elements.
<platform />
<os />
Since these two elements are empty there has to be another way to
provide information about the platform and the operating system. The other way is to use
attributes.
<platform type="mac" />
<os type="Mac 7.x" />
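To show how a program sees the elements and attributes described above, here is a minimal sketch that parses the example document with the two attributes filled in. It assumes Python and its standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The example document, with the platform and os attributes added.
document = """<?xml version="1.0"?>
<package>
  <title>Norton Utilities</title>
  <version>3.5</version>
  <vendor>Symantec</vendor>
  <platform type="mac" />
  <os type="Mac 7.x" />
  <description>A hard disk utility program</description>
  <copies>1</copies>
</package>"""

root = ET.fromstring(document)

# Child elements of the root, in document order.
child_names = [child.tag for child in root]

# Element content is read with .text, attribute values with .get().
title = root.find("title").text
platform_type = root.find("platform").get("type")
os_type = root.find("os").get("type")
copies = int(root.find("copies").text)

print(root.tag, child_names)
print(title, platform_type, os_type, copies)
```

Note that the empty elements carry no text at all; their information lives entirely in the type attribute.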
XML's Syntactic Highlights
- HTML allows fairly loose structuring in which an end tag, such as </p>, is optional.
XML does not allow such omissions. Remember, an XML document is made up of elements, not
tags: every start tag must be followed by a corresponding end tag.
<paragraph>XML is the future of the Web</paragraph>
- All XML elements must be closed. Tags without content, and therefore without end tags,
must be closed in the following manner:
<image url="picture.gif"/>
Along the same lines, empty elements (<film></film>) may be
marked in the following manner:
<film/>
- You cannot overlap elements. For example, the following code
<actress>Cindy<lname>Crawford</actress></lname>
is improper syntax. The following is correct:
<actress>Cindy<lname>Crawford</lname></actress>
- All attribute values must be in quotes:
<photograph url="cindy.gif"
width="300px"/>
- The contents of an XML element are treated as data, so white space is not ignored. Therefore
<film>Sleeping with the enemy</film>
is not equivalent to
<film>Sleeping   with   the   enemy</film>
- There will be times when you will want certain text to be treated purely as character
data rather than as markup. For instance, if the contents of an XML element consist of
some sample code, rather than replacing each reserved character with its entity reference
you can simply mark the text as a CDATA section:
<![CDATA[Rope]]>
- XML is case sensitive. The following element
<sportsman>Sachin Tendulkar</sportsman>
is not equivalent to
<SPORTSMAN>Sachin Tendulkar</SPORTSMAN>
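The rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. The following sketch assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "missing end tag": "<paragraph>XML is the future of the Web",
    "overlapping elements": "<actress>Cindy<lname>Crawford</actress></lname>",
    "properly nested": "<actress>Cindy<lname>Crawford</lname></actress>",
    "unquoted attribute": "<photograph url=cindy.gif/>",
    "mismatched case": "<sportsman>Sachin Tendulkar</SPORTSMAN>",
    "empty element": "<film/>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")

# White space inside an element is preserved as part of the data.
spaced = ET.fromstring("<film>Sleeping   with   the   enemy</film>").text

# A CDATA section lets reserved characters appear without escaping.
code = ET.fromstring("<code><![CDATA[if (a < b && b < c)]]></code>").text
```

Only the properly nested and empty-element samples are accepted; the other four raise a parse error.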
Document Type Definition
Two other constructs are important when discussing XML syntax. The first is a DTD, or
Document Type Definition. The first thing a parser does when it finds an XML document is
look for the existence of a DTD. DTDs are not mandatory in XML but if they exist, they
define the XML tags and structure for that document. DTDs are commonly used to define
markup tags for a specific industry. By reading a DTD, an XML parser knows how to
interpret the markup.
DTDs are also used to validate the "correctness" of an
XML data stream. They contain information such as what elements are allowed in the file,
what type of data is allowed in each element, and whether a certain structure can repeat.
The DTD for an XML document can be contained within the document itself or referenced
externally.
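As an illustration of a DTD contained within the document itself, the sketch below embeds a small internal DTD that declares three elements and one entity. The DTD content and the entity name company are hypothetical, written for this example. It assumes Python's standard xml.etree.ElementTree module, which reads the internal DTD and expands the entity but does not validate the document against the element declarations; validation requires a validating parser.

```python
import xml.etree.ElementTree as ET

# A document carrying its own DTD (the internal subset between [ and ]).
document = """<?xml version="1.0"?>
<!DOCTYPE package [
  <!ELEMENT package (title, vendor)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT vendor (#PCDATA)>
  <!ENTITY company "Symantec">
]>
<package>
  <title>Norton Utilities</title>
  <vendor>&company;</vendor>
</package>"""

root = ET.fromstring(document)

# The entity reference &company; is replaced by its declared value.
vendor = root.find("vendor").text
print(vendor)  # Symantec
```

The element declarations say that a package must contain exactly one title followed by one vendor, each holding character data.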
Extensible Stylesheet Language
A second construct to understand is XSL (Extensible Stylesheet Language). The
stylesheet describes how an XML data stream is to be rendered or converted. The power of
stylesheets is that the formatting information is completely independent of the XML data.
This allows you to use one or more stylesheets to present data in whatever formats are
necessary.
Future of XML
It has been amazing to see how quickly the XML standard has been developed, and how
quickly a large number of people have adopted the standard. XML will be as important to
the future of the Web as HTML has been to the foundation of the Web. XML will be the
future for all data manipulation and data transmission.
As XML becomes more popular we are likely to see it creeping
into a number of applications. The accounts in your spreadsheet could be used
to display figures on your corporate intranet. Your customer database could be written in
XML so that your email program could use it when you need to send out a bulk mailing at
the same time as your accounts department is checking on payments of individual orders.
And all the while the whole recordset remains human-readable.
As far as XML is concerned, the evolution of the Web is something
that is needed by industry, and as such will be driven by the marketplace rather than by
the marketing efforts of the major players. You can take it as a given that XML is here to
stay.
About the author
Kirthan Battapadi is a web developer with DBS Internet Services Pvt. Ltd. in Bombay,
India.