Not all data is ideally suited to storage in relational tables, using relations
to express logical associations between parts of the data. XML,
eXtensible Markup Language, is a very useful alternate way to represent certain
types of data, particularly data that has a hierarchical structure.
Hierarchical data is quite common in real-life business data. For example, an
invoice contains various elements such as date, invoice number, and line
items. You can envisage the invoice as the top of a hierarchy and the other
elements as subsidiary elements in the hierarchy. The line items section may,
in turn, contain several individual line items. You can easily represent this
type of conceptual data structure as XML.
XML is also useful for storing or transmitting across a network information
that can vary in structure from one document to another. XML was developed
as a specification produced by the World Wide Web Consortium (W3C)
and therefore is widely accepted internationally.
Sometimes the term semi-structured is used to refer to XML because it is not
structured in the same way as relational data. In reality, XML data is always
structured. It simply isn't structured in the same way as relational data.
SQL Server 2005 provides several tools to allow you to store XML and to
retrieve selected parts of that XML for use in your applications. You can break
XML data into a structure that can be stored in SQL Server and you can take
the results of relational data queries and combine that data into an XML structure.
SQL Server 2005 allows you, for the first time, to store your XML data in
SQL Server and query it as if it was XML. Strictly speaking, SQL Server 2005
stores XML in a proprietary binary format.
The binary format allows you to search your XML data quicker, because the
indexes you can create on the binary format allow you to navigate the logical
structure of the XML quickly and efficiently.
Introducing XML
XML is a markup language that is highly flexible. Unlike HTML, which has various
specified element names, you can define your own element names in
XML documents. This makes it very flexible but that flexibility can cause
problems too. If you don't know how a business partner is going to structure
a purchase order written in XML, you will likely have problems in creating a
way to process the purchase order automatically. A common approach to
ensuring that a class of XML documents are structured in a predictable way
is to validate the structure of each document of that class, using a schema. A
schema simply defines the allowed structure of that class of XML documents.
You can use two schema languages to specify a schema for XML: Document
Type Definition (DTD) and XSD (also called W3C XML Schema Definition language).
SQL Server 2005 supports the XSD schema language.
An XML document has to conform to several rules, including
� An XML document must have a single document element and all other
elements must be contained inside it. An XML fragment needn't meet this
criterion.
� Each start tag of an element must have a matching end tag, unless the
element is empty when you can use an empty element structure.
� Attributes are added to a start tag (or empty element).
You can express a simple purchase order like this in XML
(PurchaseOrder.xml):
<?xml version="1.0"?>
<PurchaseOrder>
<Date></Date>
<From>Some Fictional Company</From>
<Contact>Fred Smith</Contact>
<ContactPhone>123-456-7890</ContactPhone>
<LineItems>
<LineItem Quantity="3">Some article</LineItem>
<LineItem Quantity="12">Some other article</LineItem>
<LineItem Quantity="300">Yet another kind of
article</LineItem>
</LineItems>
</PurchaseOrder>
XML and SQL Server 2005
SQL Server 2005 builds on the XML support by adding several new pieces of
functionality:
� A new datatype for XML, the xml datatype. I describe that
datatype in the next section.
� Support for querying XML by using the XML Query Language, XQuery.
� An XML Data Manipulation Language (a proprietary extension to
XQuery), which you can use to insert, update, or delete components of
XML data held as the xml datatype.
The xml datatype
SQL Server 2005 supports an xml datatype that wasn't supported in SQL Server
2000. SQL Server 2005 supports two flavors of XML: XML documents and XML
fragments. I discuss these two entities and how to use them in the next section.
You can use the xml datatype in the following situations:
� Columns in a SQL Server 2005 table
� Variables
� Stored procedures
� Function parameters
You create a column of the xml datatype in a table to store purchase orders
written in XML as follows:
CREATE TABLE xmlSample (ID int primary key, PurchaseOrder
xml)
You create a variable of the xml datatype as follows:
DECLARE @myXMLVariable xml
You create a stored procedure with a parameter of the xml datatype as follows:
CREATE PROCEDURE myProcedure (myXMLDoc xml)
AS -- the rest of the stored procedure goes here
Using Untyped and Typed XML
SQL Server 2005 allows you to store XML in two broad ways. Untyped XML is
XML that is stored without having a schema specified for it. Typed XML is XML
that has an associated schema document that defines which structure(s) are
allowed in the XML that is stored.
Untyped XML is easier to use, because you don't need to create a schema
document. It is also more flexible because you can change the structure of
XML documents. A downside of untyped XML is that the data stored in that
column can contain errors that you may avoid with typed XML instead.
If you use typed XML, then each XML document is checked to ensure that it
has the correct structure — for example, before it is stored in a column in a
SQL Server table.
Stored XML, whether it is untyped or typed, cannot exceed 2GB in size per
instance. That means that the data in a column that has the xml datatype
cannot exceed 2GB. For many purposes that size limit won't be a problem.
Using these in SQL server will be demonstrated in the next article.
No comments :
Post a Comment