
Sphinx

This page is helpful; I used its config and modified it to fit our needs.
The Sphinx indexer runs from cron, so that part is simple.
For the front end, we are going to look at packaging the MediaWiki (MW) extension linked above.
- The extension depends on sphinxapi.php, which ships in the libsphinxclient package at */usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php*.
- The extension does not seem to work with MW 1.16, but we want to upgrade MediaWiki eventually anyway.
*Sphinx does not crawl; it only indexes databases, which largely defeats the purpose for us.*
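The cron side of the Sphinx setup can be sketched roughly as below. This is a minimal sketch, not our actual job: it assumes the stock Sphinx `indexer` binary is on PATH, and the config path `/etc/sphinx/sphinx.conf` is a placeholder, not the location of our modified config. `--all` and `--rotate` are standard `indexer` options; the helper names `build_indexer_command` and `run_indexer` are hypothetical.

```python
import shutil
import subprocess

# Placeholder path (assumption), not necessarily where our sphinx.conf lives.
DEFAULT_CONFIG = "/etc/sphinx/sphinx.conf"


def build_indexer_command(config=DEFAULT_CONFIG, indexer="indexer"):
    """Build the argv for a full Sphinx reindex.

    --all     reindex every index defined in the config file
    --rotate  build new index files alongside the live ones and signal
              searchd to swap them in, so search stays up while indexing
    """
    return [indexer, "--config", config, "--all", "--rotate"]


def run_indexer(config=DEFAULT_CONFIG):
    """Run the reindex and return indexer's exit status (127 if not installed)."""
    cmd = build_indexer_command(config)
    if shutil.which(cmd[0]) is None:
        return 127  # mimic the shell's "command not found" status
    return subprocess.run(cmd).returncode
```

A small script that calls `run_indexer()` could then be scheduled from crontab. Using `--rotate` matters here because without it the live indexes would be unavailable while a cron-triggered rebuild runs.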

Xapian

Xapian does not have a built-in crawler.
Most of the work is done by Omega; Xapian is just the backend library behind it.
A hacky way to crawl sites: crawl with htdig, then convert the results into a format Omega understands and can index.
htdig is unsupported and *OLD*.
In my testing, htdig seems to segfault on HTTPS sites.
Omega's default UI is *ugly*, but it can be changed.
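The "convert into a format Omega understands" step might look roughly like this, targeting the plain-text dump format read by Omega's `scriptindex` tool. That format has one `name=value` field per line, continuation lines of a multi-line value start with `=`, and records are separated by blank lines. This is a sketch under assumptions: it takes already-crawled pages (e.g. from htdig) as `(url, title, body)` tuples, and the field names `url`, `caption` and `sample` are illustrative choices that would have to match whatever index script is handed to `scriptindex`.

```python
def field_lines(name, value):
    """Encode one field in scriptindex dump format.

    The first line is "name=value"; each further line of a multi-line value
    becomes a continuation line starting with "=", so blank lines inside the
    value cannot be mistaken for a record separator.
    """
    lines = value.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    out = [f"{name}={lines[0]}"]
    out.extend("=" + line for line in lines[1:])
    return out


def to_scriptindex_dump(pages):
    """Convert (url, title, body) tuples into one scriptindex dump string.

    Field names are hypothetical; records are separated by one blank line.
    """
    records = []
    for url, title, body in pages:
        rec = []
        rec += field_lines("url", url)
        rec += field_lines("caption", title)
        rec += field_lines("sample", body)
        records.append("\n".join(rec))
    return "\n\n".join(records) + "\n" if records else ""
```

The resulting text would be written to a file and passed to `scriptindex` together with a database path and an index script that declares how each of these fields is indexed.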

Mnogosearch

Looks nice. It has a reasonably nice UI, and the UI is customizable.
It has a built-in crawler, with a 1000-line default config file (including comments).
The CGI front end breaks whenever a search returns results: upstream bugs 19129 and 19141.

Others to try

Apache Lucene (with Apache Nutch to crawl).
- Relies heavily on Java, so it is probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
Datapark Search
- Fork of Mnogosearch?
- Written in C.

User:Codeblock/Search