Analysing a Lucene index

Adventures in querying a Lucene index outside of Luke.

Recently I had the chance to evaluate a Lucene index and run queries against it without relying on tools like Luke. For this we used a Rails project with JRuby as our language, since JRuby lets us import Java packages directly.

The analyser which Luke uses by default is the Standard Analyser. This proved very slow for queries over fields like email addresses or dates. I switched to the Keyword Analyser, which treats the whole field value as a single token and is meant for fields like these. The change brought a marked improvement, but not enough. So when Steven Bearzatto mentioned the Whitespace Analyser, I decided to try it, and there were immediate performance gains. How large the gains are varies, though, depending on the field being queried.
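
To give a feel for why the choice of analyser matters for fields such as email addresses or dates, here is a rough sketch in plain Ruby. It does not call Lucene at all. The three methods (keyword_tokens, whitespace_tokens, word_tokens) are made-up names that only imitate the general idea behind each strategy, and word_tokens in particular is a crude simplification, not the Standard Analyser's actual grammar. The sample value is also invented.

```ruby
# Crude pure-Ruby stand-ins for three tokenising strategies.
# These are NOT Lucene's implementations; the method names are hypothetical.

# Keyword-style: the whole field value becomes one token.
def keyword_tokens(text)
  [text]
end

# Whitespace-style: split on runs of whitespace and nothing else.
def whitespace_tokens(text)
  text.split
end

# Word-style (assumed simplification): lowercase the text, then split on
# anything that is not an ASCII letter or digit.
def word_tokens(text)
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

# Invented sample value: an email address followed by a date.
value = "Jane.Doe@example.com 2011-03-14"

puts keyword_tokens(value).inspect
puts whitespace_tokens(value).inspect
puts word_tokens(value).inspect
```

In this sketch the keyword-style method yields one token, the whitespace-style method yields two (the address and the date, each intact), and the word-style method breaks the same value into seven small pieces. Fewer, larger tokens mean fewer terms to match for an exact lookup on such a field, which may be part of why the analysers behaved so differently here.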