How to use Emacs for XML editing

From FedoraProject

Latest revision as of 03:04, 20 December 2012

With the right customizations, editing DocBook XML, Mallard, XSLT, or other markup is a breeze with GNU Emacs. This page is not meant to be a tutorial for Emacs; there's one built into the program, and plenty of others on the Web. Instead, this page shows you how to customize Emacs to be a stunning editor for markup.

Some helpful background
Here are some concepts that will help you understand the rest of this page.
What's XML?
XML is a markup language. It defines a way of annotating text to describe how that text should be interpreted. For instance, <title>Catcher in the Rye</title> makes it easy for either a human or a computer to understand that Catcher in the Rye is a title.
What's a schema?
A schema is a definition for constructing a particular class of XML document. For instance, a schema for FooBar documents may require one each of exactly three elements foo, bar, and baz. If your document doesn't follow those rules, it may still be XML, but it's not a valid FooBar document.
There are many schemas out there -- some examples are DocBook, XHTML, and Mallard. Each schema defines a specific class of XML document that, to be valid, needs to follow its rules. Violating the rules makes the document invalid and can cause tools and processors to fail, sometimes in unexpected ways.
Do schemas themselves follow a format?
Yes, and there are several different formats for writing a schema. The venerable Document Type Definition (DTD) is one. Others are Relax NG, Relax NG Compact, and the XML Schema Definition (XSD). Because each of these schema formats has its own very specific grammar rules, it's possible to convert between them using a utility like trang.
What do I need to follow these steps?
A Fedora system, preferably at least Fedora 19. (Most of these instructions will work on any Linux distribution, but you may need to adjust the ones for installing software.)
A connection to the Internet.
The ability to open a terminal.
The ability to edit some files in your home directory -- these directions assume you'll be running Emacs, so just use that!

Install your tools

What to do

Run the following commands to install a set of DocBook schemas and documentation tools along with Emacs:

su -c 'yum shell'
> groupinstall 'Authoring and Publishing'
> install emacs
> install trang
> run

(After the transaction completes, type 'exit' and hit Enter, or hit Ctrl+D, to exit the yum shell.)

What this means

The schemas you're installing will allow some tools to work with your documents. The tools can compare your documents to the right schema, for instance, to make sure your document is valid and will work properly, or process your document and turn it into different formats.

Using the yum shell instead of several separate yum commands just makes this step a little more efficient, doing both the installation of the "Authoring and Publishing" group and the other packages in one transaction. Notice that we're also picking up the trang tool mentioned earlier.
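
If you'd like to confirm that the tools really landed on your system, a small check like the one below may help. It is only a sketch: check_tools is a made-up helper name, and it simply asks the shell whether each command can be found on your PATH.

```shell
# Report whether each named command can be found on PATH.
# check_tools is a hypothetical helper, not part of any package.
# Returns non-zero if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (run it yourself after the installation):
#   check_tools emacs trang
```

If either tool is reported as missing, re-run the yum shell step above before moving on.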


Configure Emacs

What to do

Add the following lines to your ~/.emacs file:

;; Open files with these extensions in nxml-mode
(add-to-list 'auto-mode-alist '("\\.xml\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.xsl\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.xhtml\\'" . nxml-mode))
(add-to-list 'auto-mode-alist '("\\.page\\'" . nxml-mode))

;; Let any request for xml-mode be handled by nxml-mode
(defalias 'xml-mode 'nxml-mode)

(eval-after-load 'rng-loc
  '(add-to-list 'rng-schema-locating-files "~/.schema/schemas.xml"))

If you are using a newer Fedora with Emacs 24 or higher, you will also need this line:

(global-set-key [C-return] 'completion-at-point)

What this means

Don't worry if the above doesn't make sense to you. It is written in Emacs Lisp, the programming language that Emacs understands, and Emacs reads and processes the contents of your ~/.emacs file when it starts up. These configuration changes tell Emacs which editing mode to use for markup files and where to find the schema configuration we'll be adding in the next steps.

These Lisp commands tell Emacs:

  • When you open up a file with one of the listed extensions, whether it exists or not, Emacs should put itself in nxml-mode. This uses the nXML extension (which is included with your Fedora system's Emacs editor) to provide automatic validation and lots of other helpful functions for editing.
  • If some other tool is calling xml-mode, use nxml-mode to satisfy that request.
  • Add ~/.schema/schemas.xml to the list of places that nXML will look when it tries to locate RelaxNG schemas. This file doesn't exist yet, but you'll create it in just a moment.
  • The extra line for Emacs 24 or above sets up the necessary option for Ctrl+Enter to work its magic as described later.
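
If you prefer to add configuration from a terminal, or want to re-run the step without ending up with duplicate lines, something along these lines might work. This is a sketch under assumptions: append_once is a made-up helper, and the marker comment is just a convention chosen here so the function can tell whether the block was already added.

```shell
# Append text from standard input to FILE, unless MARKER already appears in it.
# append_once and the marker convention are hypothetical, for illustration only.
append_once() {
    file=$1
    marker=$2
    if [ -f "$file" ] && grep -qF -- "$marker" "$file"; then
        return 0    # already configured, leave the file alone
    fi
    cat >> "$file"
}

# Example call (the quoted 'EOF' keeps the shell from touching the Lisp text):
#   append_once ~/.emacs ';; nxml: Ctrl+Enter completion' <<'EOF'
#   ;; nxml: Ctrl+Enter completion
#   (global-set-key [C-return] 'completion-at-point)
#   EOF
```

Running the same call twice leaves only one copy of the block in the file.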

Set up schemas

What to do

Make a ~/.schema directory:

mkdir ~/.schema
cd ~/.schema

Now retrieve a RelaxNG Compact schema for Mallard from the internet:

curl -O http://projectmallard.org/1.0/mallard-1.0.rnc

Make a folder for DocBook 4.5 and generate a RelaxNG Compact schema for it:

mkdir ~/.schema/docbook-xml-4.5
cd ~/.schema/docbook-xml-4.5
trang /usr/share/sgml/docbook/xml-dtd-4.5/docbookx.dtd docbook.rnc

Make a file ~/.schema/schemas.xml and use the following content:

<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://projectmallard.org/1.0/" uri="mallard-1.0.rnc"/>
  <documentElement prefix="" localName="article" typeId="DocBook"/>
  <documentElement prefix="" localName="book" typeId="DocBook"/>
  <typeId id="DocBook" uri="docbook-xml-4.5/docbook.rnc"/>
</locatingRules>

What this means

Whoa! What did all that do? This might seem a little voodoo-like if you're not familiar with schemas and/or RelaxNG. Basically, nXML mode can read in RelaxNG Compact schemas and use them to help you edit your documents. This allows Emacs to tell you what elements and attributes are valid as you go. It can even prompt you for what to do next so you don't have to manually consult a reference.

First, we retrieved one schema (Mallard's) from the internet. It was already in RelaxNG Compact format, which typically is a file with the .rnc extension. (RelaxNG's XML syntax typically uses .rng, while DTDs usually use .dtd and XSD uses .xsd.)

The DocBook XML 4.5 schema is already installed on your system, thanks to the yum commands we ran earlier. You used the trang utility to convert the DTD format into the RelaxNG Compact (.rnc) format that nXML mode requires.
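
trang chooses its input and output formats from the file extensions you give it, so the conversion step can be wrapped in a tiny helper. The sketch below is not part of trang: to_rnc and the TRANG override variable are names invented here, and the output file is simply the input name with its extension swapped for .rnc, written to the current directory.

```shell
# Convert a schema file to RelaxNG Compact in the current directory.
# to_rnc and the TRANG variable are hypothetical; TRANG defaults to trang.
to_rnc() {
    input=$1
    case "$input" in
        *.rnc)
            echo "already RelaxNG Compact: $input" >&2
            return 1
            ;;
    esac
    base=$(basename "$input")
    output="${base%.*}.rnc"     # e.g. domain.rng -> domain.rnc
    "${TRANG:-trang}" "$input" "$output" && echo "$output"
}

# Example call:
#   to_rnc /usr/share/sgml/docbook/xml-dtd-4.5/docbookx.dtd
```

Note that this example call would produce docbookx.rnc rather than the docbook.rnc name used on this page, so adjust either the file name or your schemas.xml entry accordingly.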

We also wrote a special file that nXML mode will parse -- thanks to our earlier Emacs configuration changes in ~/.emacs. This file sets up nXML so that:

  • Mallard files, as long as we set their XML namespace, are set to automatically validate against the Mallard schema.
  • XML files with a root element of <article> or <book> are set to validate against the DocBook 4.5 schema[1].
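
A typo in one of the uri attributes is easy to miss, so it can be worth checking that every schema named in schemas.xml actually exists. The following is a rough sketch, not an nXML feature: check_schema_uris is a made-up helper, it assumes at most one uri attribute per line, and it only understands relative paths like the ones used above.

```shell
# Check that every uri="..." in DIR/schemas.xml names an existing file,
# resolved relative to DIR. DIR defaults to ~/.schema.
# check_schema_uris is a hypothetical helper for illustration.
check_schema_uris() {
    dir=${1:-"$HOME/.schema"}
    if [ ! -f "$dir/schemas.xml" ]; then
        echo "no schemas.xml in $dir" >&2
        return 1
    fi
    status=0
    while IFS= read -r uri; do
        [ -n "$uri" ] || continue
        if [ -f "$dir/$uri" ]; then
            echo "ok:      $uri"
        else
            echo "missing: $uri"
            status=1
        fi
    done <<EOF
$(sed -n 's/.* uri="\([^"]*\)".*/\1/p' "$dir/schemas.xml")
EOF
    return "$status"
}

# Example call:
#   check_schema_uris ~/.schema
```

Any line reported as missing points at a schema file that nXML would not be able to load.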


Try out the results

First, you'll need a fresh copy of Emacs running. Start Emacs and open a new file test.xml. Make the root element a book by adding the following content:

<book>
</book>

Now from the Emacs menu, select XML > Set Schema > Automatically. The message bar should display the following message: Using schema ~/.schema/docbook-xml-4.5/docbook.rnc

Add a blank line between the opening and closing book tags, and you can start enjoying nXML mode. Type a < character and hit Ctrl+Enter for a list of valid tags. You can type a few letters and hit Tab to use auto-completion. Hit Enter to insert the given tag. This also works with attributes: simply add a space after the tag name, and hit Ctrl+Enter for attribute auto-completion. If attribute values are declared in the schema, you can also auto-complete those by hitting Ctrl+Enter after the opening double quote. You'll see the message No completions available if you've misspelled the beginning, or if there is not a finite set of choices that nXML can display.

In the configuration above, Mallard files are discovered by their XML namespace. The top-level element for a Mallard document is page, so you can create a document like this:

<page id="index" type="guide" xmlns="http://projectmallard.org/1.0/">
</page>

Select XML > Set Schema > Automatically from the Emacs menu and the schema will be set up for you.

Note that if you open a pre-existing file that meets the rules you've described in ~/.schema/schemas.xml, the schema will be set when you open it. Asking Emacs to set the schema automatically is only necessary when you create brand-new file content.


Adding other schemas

You can add other XML document types. For example, say you want to manually edit the XML file that defines a virtual guest domain using libvirt. To allow nXML to deal with its schema, you need to locate a RelaxNG schema or DTD file. Fortunately libvirt provides these, so you just need to turn that schema into a RelaxNG Compact schema (.rnc) and add it to ~/.schema/schemas.xml. Note that the version of libvirt in this example is 0.8.3:

cd ~/.schema
mkdir libvirt-0.8.3
cd libvirt-0.8.3
trang /usr/share/libvirt/schemas/domain.rng domain.rnc

Then edit the ~/.schema/schemas.xml file to add the following additional rule inside the locatingRules element:

<documentElement prefix="" localName="domain" uri="libvirt-0.8.3/domain.rnc"/>

Now you can perform the same nXML magic with a libvirt domain!
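
If you find yourself adding schemas often, the edit to schemas.xml can be scripted. Here is one possible sketch; add_document_rule is a made-up helper, and it assumes the closing locatingRules tag sits on a line of its own, as in the file created earlier.

```shell
# Insert a documentElement rule just before the closing </locatingRules> tag.
# add_document_rule is a hypothetical helper; arguments are the schemas.xml
# path, the root element name, and the schema path relative to that file.
add_document_rule() {
    file=$1
    name=$2
    uri=$3
    # Do nothing if a rule for this root element already exists.
    if grep -q "localName=\"$name\"" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    awk -v name="$name" -v uri="$uri" '
        /<\/locatingRules>/ {
            printf "  <documentElement prefix=\"\" localName=\"%s\" uri=\"%s\"/>\n", name, uri
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example call:
#   add_document_rule ~/.schema/schemas.xml domain libvirt-0.8.3/domain.rnc
```

Because the function checks for an existing rule first, calling it again with the same element name leaves the file unchanged.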



  1. Of course, you can modify this to use a different DocBook version if needed.