Sphinx

This page is helpful; I used their config and modified it to fit our needs.
The Sphinx indexer simply runs from cron, so that part is easy.
As for the front end, we are going to look at packaging the MW extension linked above.
- The extension depends on sphinxapi.php, which is in the libsphinxclient package, at /usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php.
- The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
Sphinx does not crawl; it only indexes databases, which kind of defeats the purpose for us.
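
For reference, the cron-driven reindex could be wrapped in a small script along these lines. This is only a sketch: the config path /etc/sphinx/sphinx.conf and the function names are assumptions, not what we actually deployed. The --all, --rotate, --quiet and --config options are standard flags of the Sphinx indexer binary.

```python
import subprocess


def build_indexer_command(config_path="/etc/sphinx/sphinx.conf", indexer="indexer"):
    """Build the argument list for a full Sphinx reindex.

    config_path is an assumed location; adjust it to the local install.
    --all     rebuilds every index defined in the config
    --rotate  lets a running searchd pick up the new index files
    --quiet   keeps cron mail down to actual errors
    """
    return [indexer, "--config", config_path, "--all", "--rotate", "--quiet"]


def run_indexer(config_path="/etc/sphinx/sphinx.conf"):
    """Run the indexer once and return its exit code (0 means success)."""
    result = subprocess.run(
        build_indexer_command(config_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Anything printed here ends up in the cron mail.
        print(result.stderr.strip())
    return result.returncode
```

A cron entry would then call a script that ends with something like `raise SystemExit(run_indexer())`, so a failed reindex shows up as a non-zero exit status.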

Xapian

Doesn't have a crawler built in.
Most of the work is done via Omega; Xapian is just the backend.
Hacky way to crawl sites: crawl with htdig, then convert the result into a format that Omega understands and can index.
htdig is unsupported and OLD.
htdig seems to segfault on https sites in my testing.
Omega's default UI is ugly, but that can be changed.
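
The conversion step in the crawl-then-index hack could look roughly like the sketch below. It assumes the crawled pages have already been pulled out of the crawler's output as plain dicts (that part is not shown), and it writes them in the dump format that Omega's scriptindex reads: one name=value line per field, a blank line between records, and an "=" inserted after every newline inside a value. The field names url, title and text are hypothetical and would have to match whatever the index script given to scriptindex expects.

```python
def to_scriptindex_record(fields):
    """Format one page as a scriptindex record (name=value lines)."""
    lines = []
    for name, value in fields.items():
        # scriptindex wants newlines inside a value escaped by
        # inserting '=' right after each newline.
        escaped = str(value).replace("\r\n", "\n").replace("\n", "\n=")
        lines.append(f"{name}={escaped}")
    return "\n".join(lines)


def to_scriptindex_dump(pages):
    """Join records with a blank line, which is the record separator."""
    return "\n\n".join(to_scriptindex_record(page) for page in pages) + "\n"


# Hypothetical crawl output: one dict per fetched page.
example_pages = [
    {"url": "https://example.org/a", "title": "A", "text": "line one\nline two"},
    {"url": "https://example.org/b", "title": "B", "text": "hello"},
]
example_dump = to_scriptindex_dump(example_pages)
```

The resulting text can be written to a file and handed to scriptindex together with an index script that says how each field is indexed.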

Link
Looks nice. Has a somewhat nice UI, and is customizable.
Built-in crawler, with a 1000-line default config file (comments included).
CGI barfs when there are results: bug 19129 and bug 19141 upstream.
- Being able to view results might be important, in a search engine. :)

Apache Lucene (with Apache Nutch to crawl).
- Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)

Datapark Search
- Fork of mnoGoSearch?
- Written in C.