Not sure how to get this post started exactly, but let me first say that: “I am not a Doctor… Err… Search Engine specialist”…
Now, I think, it is an interesting topic that a few people might find helpful and perhaps (and hopefully) some could even provide a little further insight in the comments as my dear readers often do…
So, does your app need a search feature? More often than not, it does.
(Do I need to mention that there are about 92308.6147 solutions that exist out there?)
I am going to go over some of the findings, headaches and successes we’ve had while implementing a super-cool-and-robust search feature.
However, before proceeding, let’s take a few things into consideration:
Given all the potential options, as always, it is extremely important to use the right tool for the right job. (So anything below may be a guiding light for your project or a complete dead-end, but at least, and hopefully, you’ll get to that realization sooner rather than later).
Let’s plow on…
I should share some of the high-level specifics of the app where the super-duper search engine was required.
In general, it is a rather active (yet simple) forum application (CakePHP, of course) with over a million comments and a new post or comment coming in every few seconds or so (on a decent LAMP stack powered by AWS).
Therefore we had to evaluate some options to make sure (as mentioned) we use the right tool for the right job.
Well… now you can probably guess that Sphinx was our final candidate and ultimately the chosen solution. What impressed my colleagues and me the most was how fast the indexing is: compared to Lucene, the initial indexing of about one million records took only minutes.
As well, and quite importantly, updating the index for any new posts/comments is also very fast.
By default the results are ranked by relevance as well as by the date of the post (well, at least in our app the date was important).
For example, most relevant and recent posts would be at the top of the resultset of a search, while older, yet still relevant results would appear lower.
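To make that ordering concrete, here is a minimal Python sketch of one plausible reading of it: sort by relevance first, then by recency among equally relevant hits. This is not Sphinx code and not what the behavior does internally; the `Hit` class, the `rank` function and the sample weights are all made up for illustration. (Sphinx's extended sort mode accepts a clause along the lines of `@relevance DESC, created DESC`, but whether our setup used exactly that is an assumption on my part.)

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    relevance: int  # hypothetical match weight, higher means more relevant
    created: int    # UNIX timestamp, like UNIX_TIMESTAMP(created) in the query


def rank(hits):
    """Order hits by relevance first, then newest first among equal relevance."""
    return sorted(hits, key=lambda h: (h.relevance, h.created), reverse=True)


# Illustrative data only: two equally relevant hits and one weaker hit.
hits = [
    Hit(doc_id=1, relevance=2, created=1_262_304_000),
    Hit(doc_id=2, relevance=5, created=1_230_768_000),
    Hit(doc_id=3, relevance=5, created=1_277_942_400),
]

ordered_ids = [h.doc_id for h in rank(hits)]
print(ordered_ids)  # [3, 2, 1]
```

The newer of the two equally relevant hits comes first, the older one right after it, and the less relevant hit lands at the bottom regardless of its date.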
(Word highlighting and other “neat” features are also available, but, to be fair, other search tools offer them as well).
Now then, what about the actual implementation?
Once the decision had been made to try out Sphinx, the rest was actually rather simple…
First of all a HUGE “thank you”, to the creator of this excellent behavior.
It has performed flawlessly in both CakePHP 1.2 and 1.3… (certainly some adjustments might be required for anyone’s specific needs, but the foundation, which has been laid down, is outstanding).
Once the behavior is properly attached to the required models, the Sphinx configuration couldn’t be easier:
(I will skip over the defaults and just point out what was required to get this thing off the ground)
This is where the overall configuration of the Sphinx search engine is stored, as well as our initialization query to get things up and running:
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO forum_counter SELECT 1, MAX(id) FROM forum_comments
sql_query = SELECT id, category_id, topic_id, user_id, body, UNIX_TIMESTAMP(created) AS created FROM forum_comments WHERE active = 1
sql_attr_uint = topic_id
sql_attr_timestamp = created
sql_query_info = SELECT * FROM forum_comments WHERE id = $id
Yep, besides any server-side defaults this is all that was custom-tailored and needed to get things going.
I hope you see how simple the queries are and can utilize the setup in your app.
(Notice that we are using cake’s excellent counter cache here).
source delta : main
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, category_id, topic_id, user_id, body, UNIX_TIMESTAMP(created) AS created FROM forum_comments WHERE active = 1 AND id > ( SELECT max_doc_id FROM forum_counter WHERE counter_id = 1)
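This little snippet controls the “delta”, i.e. the difference between the original index and, well, any new additions to the forum.
The one thing to note is max_doc_id, which refers to the highest comment id that was stored in forum_counter when the main Sphinx index was built, so the delta only picks up comments added after that point.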
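If the interplay between the two sources is hard to picture, here is a small, hedged simulation in Python with an in-memory SQLite database. It is not Sphinx and it does not touch the indexer; it only replays the counter query and the delta query against toy tables to show which ids each source would pick up. The table and column names follow the config above, while the helper functions (`build_main`, `build_delta`) and the sample rows are invented for the example.

```python
import sqlite3

# Toy stand-ins for the real tables; only the columns the queries need.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE forum_comments (id INTEGER PRIMARY KEY, body TEXT, active INTEGER);
    CREATE TABLE forum_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER);
    INSERT INTO forum_comments (id, body, active) VALUES
        (1, 'first post', 1), (2, 'hidden post', 0), (3, 'third post', 1);
""")


def build_main(conn):
    """Mimic the main source: remember MAX(id), then fetch every active row."""
    conn.execute("REPLACE INTO forum_counter SELECT 1, MAX(id) FROM forum_comments")
    rows = conn.execute("SELECT id FROM forum_comments WHERE active = 1 ORDER BY id")
    return [r[0] for r in rows]


def build_delta(conn):
    """Mimic the delta source: fetch only active rows above the remembered id."""
    rows = conn.execute(
        "SELECT id FROM forum_comments WHERE active = 1 AND id > "
        "(SELECT max_doc_id FROM forum_counter WHERE counter_id = 1) ORDER BY id"
    )
    return [r[0] for r in rows]


main_ids = build_main(db)  # full pass over the existing comments

# Comments that arrive after the main pass.
db.execute("INSERT INTO forum_comments (id, body, active) VALUES (4, 'new reply', 1)")
db.execute("INSERT INTO forum_comments (id, body, active) VALUES (5, 'spam', 0)")

delta_ids = build_delta(db)  # only what is new since the main pass
print(main_ids, delta_ids)  # [1, 3] [4]
```

The main pass sees the active comments 1 and 3 and records 3 as max_doc_id; the delta pass then returns only comment 4, because 5 is inactive and everything up to 3 is already covered by the main index.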
Again, besides the defaults (and attaching the above-mentioned behavior), this is all that needed to be done to get a really great search engine working in our app.
I know that this has become a rather long post already, so I’d like to cut it short right about now…
Well then, if you’ve made it this far, next round of beers is on me ;)