How to use Emacs for XML editing

From FedoraProject

Revision as of 21:10, 9 February 2011



With the right customizations, editing DocBook XML, Mallard, XSLT, or other markup is a breeze with GNU Emacs. This page is not meant to be a tutorial for Emacs; there's one built into the program and plenty of others on the Web. But this page will show you how to customize Emacs into a stunning editor for markup.

Some helpful background
Here are some concepts that will help you understand the rest of this page.
What's XML?
XML is a markup language. It defines a way of annotating text to describe how that text should be interpreted. For instance, <title>Catcher in the Rye</title> makes it easy for either a human or a computer to understand that Catcher in the Rye is a title.
What's a schema?
A schema is a definition of how a particular class of XML document must be structured. For instance, a schema may declare that a document is valid only if it contains exactly three elements: foo, bar, and baz. There are many schemas out there; some examples are DocBook, XHTML, and Mallard. Each schema defines a specific class of XML document whose members must follow its rules to be valid. Violating the rules makes the document invalid and can cause tools and processors to fail, sometimes in unexpected ways.
Do schemas themselves follow a format?
Yes, and there are several different ways to write a schema. The venerable Document Type Definition (DTD) is one. Others are Relax NG, Relax NG Compact, and the XML Schema Definition (XSD). Because each of these formats has its own precisely defined grammar, a utility like trang can convert a schema between several of them.
What do I need to follow these steps?
A Fedora system, preferably at least Fedora 19.
A connection to the Internet.
The ability to open a terminal.
The ability to edit some files in your home directory -- these directions assume you'll be running Emacs, so just use that!
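To make the foo/bar/baz example concrete, here is a toy validity check written in Python. It is only an illustration: real tools such as nXML use a proper schema language like RelaxNG rather than hand-written code, and the rule enforced here (exactly the child elements foo, bar and baz, in that order) is a hypothetical one.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule from the example above: the root element must contain
# exactly three child elements named foo, bar and baz, in that order.
REQUIRED_CHILDREN = ["foo", "bar", "baz"]


def is_valid(xml_text: str) -> bool:
    """Return True if xml_text satisfies the toy foo/bar/baz rule."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not even well-formed XML, so it cannot be valid either.
        return False
    # Compare the names of the root's direct children with the rule.
    return [child.tag for child in root] == REQUIRED_CHILDREN


print(is_valid("<doc><foo/><bar/><baz/></doc>"))  # True
print(is_valid("<doc><foo/><baz/></doc>"))        # False: bar is missing
print(is_valid("<doc><foo/><bar></doc>"))         # False: not well-formed
```

Note the difference between the last two cases: the second document is well-formed but invalid, while the third is not even well-formed XML.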

Install your tools

What to do

Run the following commands to install a set of DocBook schemas and documentation tools along with Emacs:

su -c 'yum shell'
> groupinstall 'Authoring and Publishing'
> install emacs
> install trang
> run

(After the transaction completes, type 'exit' and hit Enter, or hit Ctrl+D, to exit the yum shell.)

What this means

The schemas you're installing will allow some tools to work with your documents. The tools can compare your documents to the right schema, for instance, to make sure your document is valid and will work properly, or process your document and turn it into different formats.

Using the yum shell instead of several separate yum commands just makes this step a little more efficient, doing both the installation of the "Authoring and Publishing" group and the other packages in one transaction. Notice that we're also picking up the trang tool mentioned earlier.
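If you want to confirm that the commands used later on this page are available after the yum transaction, a small Python sketch like the one below can check your PATH. The tool list is an assumption based on this page: emacs and trang come from the yum shell above, and curl is assumed to be installed already.

```python
import shutil

# Commands used later on this page (assumed list; adjust as needed).
REQUIRED_TOOLS = ("emacs", "trang", "curl")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All tools found.")
```

Running `command -v emacs trang curl` in a terminal gives you the same information; the script simply reports the missing tools by name.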


Edit your ~/.emacs

What to do

Add the following lines to your ~/.emacs file:

;; Open these file types in nXML mode (.page is the Mallard extension)
(add-to-list 'auto-mode-alist '("\\.xml\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.xsl\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.xhtml\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.page\\'" . nxml-mode))

;; When anything asks for xml-mode, use nxml-mode instead
(defalias 'xml-mode 'nxml-mode)

;; Once nXML's schema locator loads, also read our own locating rules
(eval-after-load 'rng-loc
  '(add-to-list 'rng-schema-locating-files "~/.schema/schemas.xml"))

What this means

Emacs Lisp takes some time to learn, so don't worry if the above doesn't make sense to you. It is the programming language Emacs understands, and Emacs reads and evaluates the contents of your ~/.emacs file when it starts up. These settings choose which mode Emacs uses for your markup files and tell it where to find the schema configuration we'll add in the next steps.

These Lisp commands tell Emacs:

  • When you open up a file with one of the listed extensions, whether it exists or not, Emacs should put itself in nxml-mode. This uses the nXML extension (which is included with your Fedora system's Emacs editor) to provide automatic validation and lots of other helpful functions for editing.
  • If some other tool is calling xml-mode, use nxml-mode to satisfy that request.
  • Add ~/.schema/schemas.xml to the list of places that nXML will look when it tries to locate RelaxNG schemas[1].
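The first bullet can be pictured with a short Python sketch. Emacs compares a file name against the patterns in auto-mode-alist, scanning from the front of the list, and uses the mode of the first pattern that matches. The sketch models only that lookup, with Python's `\Z` standing in as the end-of-name anchor; Emacs's other fallbacks (such as checking the file's contents) are deliberately left out.

```python
import re

# Simplified model of the four auto-mode-alist entries added to ~/.emacs.
AUTO_MODE_ALIST = [
    (r"\.xml\Z", "nxml-mode"),
    (r"\.xsl\Z", "nxml-mode"),
    (r"\.xhtml\Z", "nxml-mode"),
    (r"\.page\Z", "nxml-mode"),
]


def mode_for(filename: str):
    """Return the mode of the first pattern matching filename, or None."""
    for pattern, mode in AUTO_MODE_ALIST:
        if re.search(pattern, filename):
            return mode
    # Emacs would try other rules here; this sketch just gives up.
    return None


print(mode_for("test.xml"))    # nxml-mode
print(mode_for("index.page"))  # nxml-mode
print(mode_for("notes.txt"))   # None
```

Because the patterns are anchored at the end of the name, a file such as xml.txt does not switch to nXML mode.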


Set up schemas

What to do

Make a ~/.schema directory:

mkdir ~/.schema
cd ~/.schema

Now retrieve a RelaxNG Compact schema for Mallard from the internet:

curl -O http://projectmallard.org/1.0/mallard-1.0.rnc

Make a folder for DocBook 4.5 and generate a RelaxNG Compact schema for it:

mkdir ~/.schema/docbook-xml-4.5
cd ~/.schema/docbook-xml-4.5
trang /usr/share/sgml/docbook/xml-dtd-4.5/docbookx.dtd docbook.rnc

Make a file ~/.schema/schemas.xml and use the following content:

<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://projectmallard.org/1.0/" uri="mallard-1.0.rnc"/>
  <documentElement prefix="" localName="article" typeId="DocBook"/>
  <documentElement prefix="" localName="book" typeId="DocBook"/>
  <typeId id="DocBook" uri="docbook-xml-4.5/docbook.rnc"/>
</locatingRules>
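A typo in schemas.xml, or a schema file that never got created, tends to fail quietly: nXML just won't find a schema. The Python sketch below checks a locating-rules file for the most common mistakes. It assumes that every uri value is a plain path relative to the directory containing schemas.xml, which is how the rules above are written; the function name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

LR = "{http://thaiopensource.com/ns/locating-rules/1.0}"


def check_locating_rules(path):
    """Return a list of problems found in a locating-rules file (empty if none)."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    except OSError as err:
        return [f"cannot read: {err}"]
    if root.tag != LR + "locatingRules":
        return [f"unexpected root element {root.tag}"]

    base = os.path.dirname(os.path.abspath(path))
    defined_ids = {r.get("id") for r in root if r.tag == LR + "typeId"}
    problems = []
    for rule in root:
        name = rule.tag.replace(LR, "")
        uri = rule.get("uri")
        # Assumption: uri values are plain paths relative to the rules file.
        if uri and not os.path.exists(os.path.join(base, uri)):
            problems.append(f"{name}: missing {uri}")
        # Every typeId a rule refers to should be defined somewhere.
        type_id = rule.get("typeId")
        if type_id and type_id not in defined_ids:
            problems.append(f"{name}: unknown typeId {type_id}")
    return problems
```

With the files from this section in place, `check_locating_rules(os.path.expanduser("~/.schema/schemas.xml"))` should return an empty list.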

What this means

Whoa! What did all that do? This might seem a little voodoo-like if you're not familiar with schemas and/or RelaxNG. Basically, nXML mode can read in RelaxNG Compact schemas and use them to help you edit your documents. This allows Emacs to tell you what elements and attributes are valid as you go. It can even prompt you for what to do next so you don't have to manually consult a reference.

What we just did was retrieve one schema (Mallard's) from the internet. It was already in RelaxNG Compact format, which typically uses the .rnc file extension. (The XML syntax of RelaxNG typically uses .rng, DTDs usually use .dtd, and XSD files use .xsd.)

The DocBook XML 4.5 schema, in DTD form, was already installed on your system by the yum commands we ran earlier. You used the trang utility to convert that DTD into the RelaxNG Compact (.rnc) format that nXML mode requires.

We also wrote a locating-rules file, ~/.schema/schemas.xml, which nXML mode reads because of the change we made earlier in ~/.emacs. This file sets up nXML so that:

  • Mallard files, as long as they declare the Mallard XML namespace (http://projectmallard.org/1.0/), automatically validate against the Mallard schema.
  • XML files with a root element of <article> or <book> are set to validate against the DocBook 4.5 schema[2].
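The way those two bullets play out can be sketched in Python. The function below tries the rules in document order and returns the schema of the first one that matches, which is how we understand nXML to apply a locating-rules file. It is a simplified model, not nXML's actual code: it handles only the namespace, documentElement and typeId rules used on this page, and it treats a missing prefix or localName attribute as matching anything.

```python
import xml.etree.ElementTree as ET

LR = "{http://thaiopensource.com/ns/locating-rules/1.0}"

# The same rules as ~/.schema/schemas.xml above.
RULES = """\
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://projectmallard.org/1.0/" uri="mallard-1.0.rnc"/>
  <documentElement prefix="" localName="article" typeId="DocBook"/>
  <documentElement prefix="" localName="book" typeId="DocBook"/>
  <typeId id="DocBook" uri="docbook-xml-4.5/docbook.rnc"/>
</locatingRules>
"""


def locate_schema(rules_xml, ns, prefix, local_name):
    """Return the schema uri for a document element, or None if no rule matches."""
    rules = list(ET.fromstring(rules_xml))
    type_ids = {r.get("id"): r.get("uri") for r in rules if r.tag == LR + "typeId"}
    for rule in rules:
        if rule.tag == LR + "namespace":
            matched = rule.get("ns") == ns
        elif rule.tag == LR + "documentElement":
            matched = (rule.get("prefix") in (None, prefix)
                       and rule.get("localName") in (None, local_name))
        else:
            continue  # typeId entries only define names; they match nothing
        if matched:
            # A rule either names a schema directly or refers to a typeId.
            return rule.get("uri") or type_ids.get(rule.get("typeId"))
    return None


print(locate_schema(RULES, "http://projectmallard.org/1.0/", "", "page"))
print(locate_schema(RULES, "", "", "book"))
print(locate_schema(RULES, "", "", "html"))
```

The three calls print mallard-1.0.rnc, docbook-xml-4.5/docbook.rnc and None: a Mallard page is matched by its namespace, a DocBook book by its root element name, and an unrelated document by nothing at all.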


Try out the results

First, start a fresh Emacs session so that it reads your updated ~/.emacs. Open a new file test.xml and make the root element a book by adding the following content:

<book>
</book>

Now from the Emacs menu, select XML > Set Schema > Automatically. You'll see a message in the echo area at the bottom of the window that says "Using schema ~/.schema/docbook-xml-4.5/docbook.rnc".

Add a blank line between the opening and closing book tags, and you can start enjoying nXML mode. Type a < character and hit Ctrl+Enter for a list of valid elements. You can type a few letters and hit Tab to complete the name, then hit Enter to insert the element. This also works with attributes: type a space after the element name inside the start tag, and hit Ctrl+Enter for attribute completion. If the schema declares a fixed set of values for an attribute, you can complete those too by hitting Ctrl+Enter after the opening double quote. You'll see the message "No completions available" if you've misspelled the beginning of a name, or if the schema doesn't define a finite set of choices that nXML can display.

In the configuration above, Mallard files are discovered by their XML namespace. The top-level element for a Mallard document is page, so you can create a document like this:

<page id="index" type="guide" xmlns="http://projectmallard.org/1.0/">
</page>

Select XML > Set Schema > Automatically from the Emacs menu and the schema will be set up for you.
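Automatic schema selection depends only on the document element of each test file: the Mallard page is recognized by its namespace, and the DocBook book by its local name. The Python sketch below (the helper name is hypothetical) shows those two values for the test documents. ElementTree does not keep the namespace prefix, so the sketch reports only the namespace and the local name.

```python
import xml.etree.ElementTree as ET


def root_key(xml_text):
    """Return (namespace, local name) of the document element."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree writes namespaced tags as "{namespace}localname".
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


mallard = '<page id="index" type="guide" xmlns="http://projectmallard.org/1.0/">\n</page>'
docbook = "<book>\n</book>"
print(root_key(mallard))  # ('http://projectmallard.org/1.0/', 'page')
print(root_key(docbook))  # ('', 'book')
```

If the xmlns attribute is left off the Mallard page, its key becomes ('', 'page'), which no rule in schemas.xml matches. That is why the namespace declaration matters.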


Adding other schemas

You can add other XML document types. For example, say you want to manually edit the XML file that defines a virtual guest domain using libvirt. To allow nXML to deal with its schema, you need to locate a RelaxNG schema or DTD file. Fortunately libvirt provides these, so you just need to turn that schema into a RelaxNG Compact schema (.rnc) and add it to ~/.schema/schemas.xml. The libvirt version used in this example is 0.8.3; adjust the directory names to match your own version:

cd ~/.schema
mkdir libvirt-0.8.3
cd libvirt-0.8.3
trang /usr/share/libvirt/schemas/domain.rng domain.rnc

Then edit the ~/.schema/schemas.xml file to add the following additional rule inside the locatingRules element:

<documentElement prefix="" localName="domain" uri="libvirt-0.8.3/domain.rnc"/>

Now you can perform the same nXML magic with a libvirt domain!
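Editing schemas.xml by hand is the simplest approach. If you add rules often, a Python sketch like the one below can append a documentElement rule for you. The function name is hypothetical, and note that ElementTree drops XML comments and does not re-indent the file when it rewrites it, so keep a backup if your schemas.xml contains comments you care about.

```python
import xml.etree.ElementTree as ET

LR_NS = "http://thaiopensource.com/ns/locating-rules/1.0"


def add_document_element(path, local_name, uri):
    """Append <documentElement prefix="" localName=... uri=.../> to a rules file."""
    # Write the locating-rules namespace back out as the default namespace.
    ET.register_namespace("", LR_NS)
    tree = ET.parse(path)
    root = tree.getroot()
    rule = ET.SubElement(root, f"{{{LR_NS}}}documentElement")
    rule.set("prefix", "")
    rule.set("localName", local_name)
    rule.set("uri", uri)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For the libvirt example, the call would be `add_document_element(os.path.expanduser("~/.schema/schemas.xml"), "domain", "libvirt-0.8.3/domain.rnc")`. The new rule is appended after the existing ones; since it matches only documents whose root element is domain, its position does not affect the Mallard or DocBook rules.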



  1. You may not know what RelaxNG schemas are. That's OK, don't panic. For now, just note that they help Emacs figure out what tags are allowed where in your document. In other words, this helps Emacs and nXML help you!
  2. Of course, you can modify this to use a different DocBook version if needed.