From Fedora Project Wiki

Sphinx

  • This page is helpful; I used their config and modified it for our needs.
  • The Sphinx indexer simply runs on a cron, so that part is simple.
  • As for the front end, we are going to look at packaging the MW extension linked above.
    • The extension depends on sphinxapi.php, which is in the libsphinxclient package, at */usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php*.
    • The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
  • *Sphinx does not crawl, it only indexes databases, which kind of defeats the purpose for us.*

Xapian

  • Doesn't have a crawler built in.
  • Most of the work is done via Omega; Xapian is just the backend.
  • Hacky way to crawl sites: crawl with htdig, then convert the output into a format Omega understands and can index.
  • htdig is unsupported and *OLD*.
  • htdig seems to segfault on https sites in my testing.
  • Omega's default UI is *ugly* but that is changeable.
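
The "convert into a format Omega understands" step could look roughly like the sketch below. It assumes the target is the plain-text record format read by Omega's scriptindex tool: one name=value field per line, a leading "=" on each continuation line of a multi-line value, and a blank line between records. The field names (url, title, text) and the sample pages are made up for illustration and would have to match whatever index script is actually used. How the crawled pages get out of htdig is not covered here.

```python
def to_scriptindex_record(fields):
    """Render one crawled page as a scriptindex-style record."""
    lines = []
    for name, value in fields.items():
        # Normalize line endings, then mark every continuation line
        # of a multi-line value with a leading '='.
        escaped = value.replace("\r\n", "\n").replace("\n", "\n=")
        lines.append(f"{name}={escaped}")
    return "\n".join(lines)


def to_scriptindex_dump(pages):
    """Join records, separating them with a single blank line."""
    return "\n\n".join(to_scriptindex_record(p) for p in pages) + "\n"


# Hypothetical crawler output: one dict per fetched page.
pages = [
    {"url": "https://example.org/a", "title": "Page A",
     "text": "first line\nsecond line"},
    {"url": "https://example.org/b", "title": "Page B",
     "text": "only line"},
]

dump = to_scriptindex_dump(pages)
print(dump)
```

The resulting text would be written to a file and handed to scriptindex together with an index script that says what to do with each field.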

Mnogosearch

  • Link
  • Looks nice. Has a somewhat nice UI, and is customizable.
  • Built-in crawler, with a default 1000-line config file (including comments).
  • CGI barfs when there are results: bug 19129 and bug 19141 upstream.

Others to try

  • Apache Lucene (with Apache Nutch to crawl).
    • Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
  • Datapark Search
    • Fork of Mnogosearch?
    • Written in C.
  • ASPseek
    • C++
    • Last copyright year on their site is 2003. Is it unmaintained?