User:Codeblock/Search

= Sphinx =


 * This page is helpful; I used its config as a starting point and modified it to our needs.
 * The Sphinx indexer simply runs from cron, so that part is simple.
 * For the front end, we are going to look at packaging the MediaWiki extension linked above.
 * The extension depends on sphinxapi.php, which is in the libsphinxclient package, at /usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php.
 * The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
 * Sphinx does not crawl; it only indexes databases, which largely defeats the purpose for us.

= Xapian =


 * Doesn't have a crawler built in.
 * Most of the work is done via Omega; Xapian just backs it.
 * Hacky way to crawl sites: crawl with htdig, then convert the output into a format Omega understands and can index.
 * htdig is unsupported and OLD.
 * htdig seems to segfault on https sites in my testing.
 * Omega's default UI is ugly but that is changeable.
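
The "convert into a format Omega understands" step could look something like the sketch below. It assumes the crawled pages are already available as plain dicts, and it writes them in the text dump format that Omega's scriptindex reads: one `field=value` line per field, continuation lines starting with `=`, and a blank line between records. The field names (`url`, `title`, `text`) and the function names are made up for illustration; they would have to match whatever index script is actually used, and getting the pages out of htdig is not covered here.

```python
def to_scriptindex_record(fields):
    """Render one page (a dict of field name -> value) as a single record.

    Field names here are hypothetical; they must match the index script
    passed to scriptindex.
    """
    lines = []
    for name, value in fields.items():
        # Multi-line values: the first line goes after "name=", every
        # following line is a continuation line starting with "=".
        first, *rest = str(value).splitlines() or [""]
        lines.append(f"{name}={first}")
        lines.extend(f"={cont}" for cont in rest)
    return "\n".join(lines)


def to_scriptindex_dump(pages):
    """Join records with a blank line between them."""
    return "\n\n".join(to_scriptindex_record(p) for p in pages) + "\n"


if __name__ == "__main__":
    # Example pages standing in for crawler output (made-up data).
    pages = [
        {"url": "https://example.org/a", "title": "A", "text": "line one\nline two"},
        {"url": "https://example.org/b", "title": "B", "text": "hello"},
    ]
    print(to_scriptindex_dump(pages), end="")
```

The output would be saved to a file and fed to scriptindex together with an index script that says what to do with each field.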

= Mnogosearch =


 * Looks nice. Has a somewhat nice UI, and is customizable.
 * Built-in crawler, with a 1000-line default config file (comments included).
 * CGI barfs when there are results: bug 19129 and bug 19141 upstream.
 * Being able to view results might be important, in a search engine. :)

= Others to try =


 * Apache Lucene (with Apache Nutch to crawl).
   * Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
 * Datapark Search
   * Fork of Mnogosearch?
   * Written in C.
 * ASPseek
   * Written in C++.
   * Last copyright year on their site is 2003. Is it unmaintained?