Configure Sphinx Search server with a main + delta indexing scheme, including updates & deletes

Sphinx Search is an open-source full-text search server written in C++. It is fast and scalable, and for full-text queries it usually outperforms the search features built into database servers. It works on all major operating systems, but in this example I will show you how to install and configure it on Linux, which is the most common choice.

The data source will be a MySQL database.


Installing is simple. You can download the sources, and use the standard procedure (configure and make). If you are using CentOS, you can download the latest RPM and install it like this:

rpm -ihv <the-URL-of-RPM-from-sphinx-website>

The official CentOS yum repository usually ships an old version, so it is worth downloading the latest release to get the newer features.

If rpm complains about missing libraries, such as odbc, use yum to locate and install the packages that provide them (for example, yum whatprovides 'libodbc*').


If you used rpm to install, the configuration file is located at /etc/sphinx/sphinx.conf

Sample config:
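A minimal sketch of what a main + delta configuration with a kill-list could look like, written as a shell snippet that saves it to a scratch file. Treat it as an illustration under assumptions, not as the exact file from my server: the CONF variable, the database name and credentials, the ads columns (id, title, description, date_modified), the sphinx_counter columns (counter_id, max_id, last_modified) and all file paths are placeholders.

```shell
# Hypothetical sketch: every name, credential and path below is a placeholder.
CONF="${CONF:-./sphinx.conf}"   # review it, then copy to /etc/sphinx/sphinx.conf

cat > "$CONF" <<'EOF'
source ads_base
{
    type     = mysql
    sql_host = localhost
    sql_user = sphinx_user
    sql_pass = secret
    sql_db   = mydb
    sql_port = 3306
}

# Main: remember how far we index, empty the deleted-ids table, index up to that point.
source ads_main : ads_base
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sphinx_counter \
        SELECT 1, COALESCE(MAX(id), 0), NOW() FROM ads
    sql_query_pre = DELETE FROM sphinx_ads_deleted
    sql_query = SELECT id, title, description, \
            UNIX_TIMESTAMP(date_modified) AS date_modified \
        FROM ads \
        WHERE id <= (SELECT max_id FROM sphinx_counter WHERE counter_id = 1)
    sql_attr_timestamp = date_modified
}

# Delta: rows added or modified since the main index was built.
source ads_delta : ads_base
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, description, \
            UNIX_TIMESTAMP(date_modified) AS date_modified \
        FROM ads \
        WHERE id > (SELECT max_id FROM sphinx_counter WHERE counter_id = 1) \
           OR date_modified >= (SELECT last_modified FROM sphinx_counter WHERE counter_id = 1)
    sql_attr_timestamp = date_modified
    # Kill-list: ids whose copy in ads_main is stale (updated) or gone (deleted).
    sql_query_killlist = SELECT id FROM ads \
        WHERE date_modified >= (SELECT last_modified FROM sphinx_counter WHERE counter_id = 1) \
        UNION SELECT id FROM sphinx_ads_deleted
}

index ads_main
{
    source = ads_main
    path   = /var/lib/sphinx/ads_main
}

index ads_delta : ads_main
{
    source = ads_delta
    path   = /var/lib/sphinx/ads_delta
}

indexer
{
    mem_limit = 64M
}

searchd
{
    listen    = 9312
    log       = /var/log/sphinx/searchd.log
    query_log = /var/log/sphinx/query.log
    pid_file  = /var/run/sphinx/searchd.pid
}
EOF
```

With a layout like this, searches should query both indexes, ads_main first and ads_delta second, so that the delta kill-list can hide outdated or deleted rows that are still present in main.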


As you can see I used a table named ads.

You need to create two tables for sphinx:

  1. sphinx_ads_deleted – Will contain the ids of items deleted from ads. The deleted items are inserted by the DELETE trigger on ads
  2. sphinx_counter – Will contain the highest id and the latest modification date covered by the last reindex of the main index

You also need to define a DELETE trigger on the ads table; it inserts the id of every deleted row into sphinx_ads_deleted.

I will include the structure for my ads table also.
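The two helper tables and the trigger could be sketched as follows. This is an assumption-laden example rather than my real schema: the ads table is reduced to a hypothetical minimum (id, title, description, date_modified), the column names of the helper tables, the trigger name ads_after_delete and the SCHEMA variable are all made up for illustration. The snippet only writes the SQL to a file; loading it into MySQL is left as a commented command.

```shell
SCHEMA="${SCHEMA:-./sphinx_tables.sql}"   # hypothetical output file

cat > "$SCHEMA" <<'EOF'
-- Hypothetical, minimal stand-in for the real ads table.
CREATE TABLE IF NOT EXISTS ads (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title         VARCHAR(255) NOT NULL,
    description   TEXT NOT NULL,
    date_modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                  ON UPDATE CURRENT_TIMESTAMP
);

-- Ids of rows deleted from ads since the last main reindex.
CREATE TABLE IF NOT EXISTS sphinx_ads_deleted (
    id INT UNSIGNED NOT NULL PRIMARY KEY
);

-- One row per main index: how far the last reindex got.
CREATE TABLE IF NOT EXISTS sphinx_counter (
    counter_id    INT UNSIGNED NOT NULL PRIMARY KEY,
    max_id        INT UNSIGNED NOT NULL,
    last_modified DATETIME NOT NULL
);

-- Record every deleted id so the delta kill-list can hide it from main.
DROP TRIGGER IF EXISTS ads_after_delete;
CREATE TRIGGER ads_after_delete AFTER DELETE ON ads
FOR EACH ROW
    INSERT IGNORE INTO sphinx_ads_deleted (id) VALUES (OLD.id);
EOF

# To load it (database name is a placeholder):
# mysql mydb < "$SCHEMA"
```

The trigger body is a single statement, so no DELIMITER change is needed when the file is loaded with the mysql client.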

Useful commands

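Start, stop or restart the service:
service searchd start
service searchd stop
service searchd restart

Rebuild an index:
indexer --rotate ads_main

The --rotate option rebuilds the index named ads_main even while searchd is using it: indexer writes a new copy of the index and signals searchd to switch over to it.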

Cron jobs (update schedule)

Usually the main index is rebuilt once a day, and the delta updates more frequently.

Make sure the crond service is running with:
service crond status
It should say the service is running.

Create a file for each job in /etc/cron.d

sphinx_main – runs at 2:12 AM each night
sphinx_delta – runs every minute
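The two job files could look roughly like this. It is a sketch under assumptions: the indexer path /usr/bin/indexer, the delta index name ads_delta and the CRON_DIR variable are not taken from my setup, and the snippet writes to a local directory so it can be tried without root.

```shell
CRON_DIR="${CRON_DIR:-./cron.d}"   # on the server this would be /etc/cron.d
mkdir -p "$CRON_DIR"

# Full rebuild of the main index at 2:12 AM every night.
# Fields: minute hour day-of-month month day-of-week user command
cat > "$CRON_DIR/sphinx_main" <<'EOF'
12 2 * * * root /usr/bin/indexer --rotate ads_main >/dev/null 2>&1
EOF

# Rebuild of the delta index every minute.
cat > "$CRON_DIR/sphinx_delta" <<'EOF'
* * * * * root /usr/bin/indexer --rotate ads_delta >/dev/null 2>&1
EOF
```

Files in /etc/cron.d need the extra user field (root here), unlike a personal crontab.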

Faster updates?

1. Use merge

Instead of reindexing main, you could merge delta into main. This still consumes a lot of memory, but it would be faster.

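The basic command syntax is as follows:

indexer --merge DSTINDEX SRCINDEX [--rotate]

So, assuming the delta index is named ads_delta, you will have something like this:

indexer --merge ads_main ads_delta --rotate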

The problem is that a single shell command is not enough here: after the merge you also need to update the sphinx_counter table, so the whole operation has to be done from a script.
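Such a script might be sketched like this. It is only an outline under assumptions: the function name merge_delta_into_main, the DB and ROTATE_WAIT variables, the database name mydb, credentials coming from ~/.my.cnf, the delta index name ads_delta and the sphinx_counter columns (counter_id, max_id, last_modified) are all hypothetical. The fixed wait for searchd to rotate the delta is a crude simplification.

```shell
#!/bin/sh
# Sketch: merge ads_delta into ads_main, then advance sphinx_counter.
DB="${DB:-mydb}"                  # placeholder database name
ROTATE_WAIT="${ROTATE_WAIT:-5}"   # seconds to let searchd pick up the new delta

merge_delta_into_main() {
    # 1. Take the snapshot BEFORE rebuilding delta, so rows that arrive
    #    while we work are simply indexed again by the next delta run.
    max_id=$(mysql -N -B "$DB" -e "SELECT COALESCE(MAX(id), 0) FROM ads") || return 1
    now=$(mysql -N -B "$DB" -e "SELECT NOW()") || return 1

    # 2. Rebuild delta and give searchd a moment to rotate it in.
    indexer --rotate ads_delta || return 1
    sleep "$ROTATE_WAIT"

    # 3. Merge delta into main.
    indexer --merge ads_main ads_delta --rotate || return 1

    # 4. Only after a successful merge, record what main now covers.
    mysql "$DB" -e "UPDATE sphinx_counter SET max_id = $max_id, last_modified = '$now' WHERE counter_id = 1"
}

# Usage: call the function, for example from a cron job.
# merge_delta_into_main || echo "merge failed" >&2
```

Updating the counter last means a failed merge leaves sphinx_counter untouched, so the next delta run still covers the same rows.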

I prefer to rebuild the index each night to make sure I am using a version that is synchronized with the database.
A full reindex of a 100K-record table takes only a few seconds.

2. Use Real Time indexes for live updates

Real Time (RT) indexes were introduced in version 1.10-beta. Updates to an RT index can appear in the search results within 1-2 milliseconds, i.e. 0.001-0.002 seconds. However, RT indexes are less efficient for bulk indexing of huge amounts of data.
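For orientation, an RT index definition and a live update could be sketched as below. Everything specific here is an assumption: the index name ads_rt, the field and attribute names, the path, the RT_CONF variable and the helper rt_replace_ad are hypothetical. The helper also assumes that searchd has a SphinxQL listener enabled (listen = 9306:mysql41) and that the values passed in are already escaped.

```shell
RT_CONF="${RT_CONF:-./sphinx_rt.conf}"   # fragment to merge into sphinx.conf by hand

cat > "$RT_CONF" <<'EOF'
# Requires a SphinxQL listener in the searchd section:
#     listen = 9306:mysql41
index ads_rt
{
    type              = rt
    path              = /var/lib/sphinx/ads_rt
    rt_field          = title
    rt_field          = description
    rt_attr_timestamp = date_modified
    rt_mem_limit      = 128M
}
EOF

# Hypothetical helper: insert or replace one ad in the RT index over SphinxQL.
# $1 = id, $2 = title, $3 = description (values must already be SQL-escaped)
rt_replace_ad() {
    mysql -h 127.0.0.1 -P 9306 -e \
        "REPLACE INTO ads_rt (id, title, description, date_modified) VALUES ($1, '$2', '$3', $(date +%s))"
}

# Usage:
# rt_replace_ad 7 'Mountain bike' 'Red bike, almost new'
```

The application would call something like this on every insert or update instead of waiting for the next delta run.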
