Adding the Boolean Operator OR to Nutch

Nutch is an open source web crawler and search engine. Built upon the Apache Lucene text search engine, Nutch allows you to add a mini “Google”-like indexing and query environment right to your Web site, but with added features like semantic tagging specific to your application.

However, unlike Google, the latest version of Nutch (0.9) has no Boolean OR operator. There is an active bug entry with a patch to add OR support, but the 06/2007 patch didn’t work correctly. Specifically, searching for “apple OR orange” returns pages that MUST contain apple and MAY contain orange, when what we really want is pages that contain either apple or orange.
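To make the difference concrete, here is a minimal, self-contained Java sketch of the two behaviors. It does not use any Nutch or Lucene code: the class OrSemanticsSketch, the Occur enum, and the helper names brokenOr and intendedOr are all made up for illustration, loosely modeled on the idea of required versus optional clauses in a boolean query. It assumes the usual rule that a query with no required clauses needs at least one optional clause to match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrSemanticsSketch {
    // Hypothetical stand-in for required vs. optional query clauses.
    enum Occur { MUST, SHOULD }

    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    // A page matches if every MUST term is present. When the query has no
    // MUST clauses at all, at least one SHOULD term has to be present.
    static boolean matches(Set<String> pageTerms, List<Clause> clauses) {
        boolean sawMust = false;
        boolean anyShould = false;
        for (Clause c : clauses) {
            boolean present = pageTerms.contains(c.term);
            if (c.occur == Occur.MUST) {
                sawMust = true;
                if (!present) {
                    return false;
                }
            } else if (present) {
                anyShould = true;
            }
        }
        return sawMust || anyShould;
    }

    // "apple OR orange" as described for the 06/2007 patch:
    // apple is required, orange is merely optional.
    static List<Clause> brokenOr() {
        return Arrays.asList(
                new Clause("apple", Occur.MUST),
                new Clause("orange", Occur.SHOULD));
    }

    // "apple OR orange" as we actually want it: either term is enough.
    static List<Clause> intendedOr() {
        return Arrays.asList(
                new Clause("apple", Occur.SHOULD),
                new Clause("orange", Occur.SHOULD));
    }

    public static void main(String[] args) {
        Set<String> orangePage = new HashSet<>(Arrays.asList("orange", "juice"));
        System.out.println(matches(orangePage, brokenOr()));   // false
        System.out.println(matches(orangePage, intendedOr())); // true
    }
}
```

A page that mentions only orange is rejected by the first query and accepted by the second, which is exactly the gap described above.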

For a current project, we really wanted to use Nutch but also needed OR, so I had the pleasure of hacking a bit on the Nutch source. This patch allows OR to work correctly (but, like Google, it does not behave well in the presence of the “-” NOT operator). It works for Nutch 0.9. Apply it directly to the “src” directory and rebuild with ant.