Sphinx Search is an OpenSource FULL TEXT search server developed in C++, and it is a very fast and scalable solution, superior to what database servers offer. It works on all major operating systems, but in this example, I will show you how to install and configure it in Linux, which is the most common choice.
The datasource will be a MySQL database.
Installing
Installing is simple. You can download the sources, and use the standard procedure (configure and make). If you are using CentOS, you can download the latest RPM and install it like this:
rpm -ihv <the-URL-of-RPM-from-sphinx-website>
CentOS usually has an old version with the official yum repo, so downloading the latest version would be needed, because new cool features are always added.
if it complains about missing libraries, like odbc, use yum to locate them.
Configure
If you used rpm to install, the configuration file is located at /etc/sphinx/sphinx.conf
Sample config:
source ads_main
{
type = mysql
sql_host = localhost
sql_user = ads
sql_pass = secretpass
sql_db = ads
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO sphinx_counter \
SELECT "ads", MAX(id), NOW() FROM ads
sql_query = \
SELECT \
ads.id, ads.name, ads.description, ads.category_id, \
counties.name, cities.name, \
(SELECT GROUP_CONCAT(parent.name SEPARATOR ' / ') \
FROM categories AS node, categories AS parent \
WHERE node.lft BETWEEN parent.lft AND parent.rght \
AND node.id = ads.category_id \
ORDER BY node.lft \
) AS category_path, \
ads.county_id, ads.price, ads.views, \
UNIX_TIMESTAMP(ads.created) AS created \
FROM ads \
LEFT JOIN counties ON (ads.county_id = counties.id) \
LEFT JOIN cities ON (ads.city_id = cities.id)
sql_attr_uint = category_id
sql_attr_uint = county_id
sql_attr_float = price
sql_attr_uint = views
sql_attr_timestamp = created
sql_query_info = SELECT * FROM ads WHERE id=$id
}
source ads_delta : ads_main
{
sql_query_pre = SET NAMES utf8
sql_query_pre = SELECT @last_updated := last_updated \
FROM sphinx_counter WHERE counter_id = 'ads'
sql_query_pre = SELECT @last_delete := MAX(deleted_on) \
FROM sphinx_ads_deleted;
sql_query = SELECT \
ads.id, ads.name, ads.description, ads.category_id, \
counties.name, cities.name, \
(SELECT GROUP_CONCAT(parent.name SEPARATOR ' / ') \
FROM categories AS node, categories AS parent \
WHERE node.lft BETWEEN parent.lft AND parent.rght \
AND node.id = ads.category_id \
ORDER BY node.lft \
) AS category_path, \
ads.county_id, ads.price, ads.views, \
UNIX_TIMESTAMP(ads.created) AS created \
FROM ads \
LEFT JOIN counties ON (ads.county_id = counties.id) \
LEFT JOIN cities ON (ads.city_id = cities.id) \
WHERE ads.id > (SELECT max_doc_id FROM sphinx_counter \
WHERE counter_id='ads') \
OR modified > (SELECT last_updated FROM \
sphinx_counter WHERE counter_id='ads')
sql_query_post_index = DELETE FROM sphinx_ads_deleted \
WHERE deleted_on <= @last_delete
sql_query_killlist = \
SELECT \
ads.id \
FROM \
ads \
WHERE \
ads.modified >= @last_updated AND \
ads.id <= ( SELECT max_doc_id FROM sphinx_counter ) \
UNION ALL \
SELECT \
sp.id \
FROM \
sphinx_ads_deleted sp \
WHERE \
sp.deleted_on <= @last_delete
}
index ads_main
{
source = ads_main
path = /var/lib/sphinx/ads_main
docinfo = extern
mlock = 0
morphology = none
min_word_len = 3
charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a, \
U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e, \
U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i, \
U+00CF->i, U+00D1->n, U+00D2->o, U+00D3->o, U+00D4->o, U+00D5->o, \
U+00D6->o, U+00D8->o, U+00D9->u, U+00DA->u, U+00DB->u, U+00DC->u, \
U+00DD->y, U+00E0->a, U+00E1->a, U+00E2->a, U+00E3->a, U+00E4->a, \
U+00E5->a, U+00E7->c, U+00E8->e, U+00E9->e, U+00EA->e, U+00EB->e, \
U+00EC->i, U+00ED->i, U+00EE->i, U+00EF->i, U+00F1->n, U+00F2->o, \
U+00F3->o, U+00F4->o, U+00F5->o, U+00F6->o, U+00F8->o, U+00F9->u, \
U+00FA->u, U+00FB->u, U+00FC->u, U+00FD->y, U+00FF->y, U+0100->a, \
U+0101->a, U+0102->a, U+0103->a, U+0104->a, U+0105->a, U+0106->c, \
U+0107->c, U+0108->c, U+0109->c, U+010A->c, U+010B->c, U+010C->c, \
U+010D->c, U+010E->d, U+010F->d, U+0112->e, U+0113->e, U+0114->e, \
U+0115->e, U+0116->e, U+0117->e, U+0118->e, U+0119->e, U+011A->e, \
U+011B->e, U+011C->g, U+011D->g, U+011E->g, U+011F->g, U+0120->g, \
U+0121->g, U+0122->g, U+0123->g, U+0124->h, U+0125->h, U+0128->i, \
U+0129->i, U+0131->i, U+012A->i, U+012B->i, U+012C->i, U+012D->i, \
U+012E->i, U+012F->i, U+0130->i, U+0134->j, U+0135->j, U+0136->k, \
U+0137->k, U+0139->l, U+013A->l, U+013B->l, U+013C->l, U+013D->l, \
U+013E->l, U+0141->l, U+0142->l, U+0143->n, U+0144->n, U+0145->n, \
U+0146->n, U+0147->n, U+0148->n, U+014C->o, U+014D->o, U+014E->o, \
U+014F->o, U+0150->o, U+0151->o, U+0154->r, U+0155->r, U+0156->r, \
U+0157->r, U+0158->r, U+0159->r, U+015A->s, U+015B->s, U+015C->s, \
U+015D->s, U+015E->s, U+015F->s, U+0160->s, U+0161->s, U+0162->t, \
U+0163->t, U+0164->t, U+0165->t, U+0168->u, U+0169->u, U+016A->u, \
U+016B->u, U+016C->u, U+016D->u, U+016E->u, U+016F->u, U+0170->u, \
U+0171->u, U+0172->u, U+0173->u, U+0174->w, U+0175->w, U+0176->y, \
U+0177->y, U+0178->y, U+0179->z, U+017A->z, U+017B->z, U+017C->z, \
U+017D->z, U+017E->z, U+01A0->o, U+01A1->o, U+01AF->u, U+01B0->u, \
U+01CD->a, U+01CE->a, U+01CF->i, U+01D0->i, U+01D1->o, U+01D2->o, \
U+01D3->u, U+01D4->u, U+01D5->u, U+01D6->u, U+01D7->u, U+01D8->u, \
U+01D9->u, U+01DA->u, U+01DB->u, U+01DC->u, U+01DE->a, U+01DF->a, \
U+01E0->a, U+01E1->a, U+01E6->g, U+01E7->g, U+01E8->k, U+01E9->k, \
U+01EA->o, U+01EB->o, U+01EC->o, U+01ED->o, U+01F0->j, U+01F4->g, \
U+01F5->g, U+01F8->n, U+01F9->n, U+01FA->a, U+01FB->a, U+0200->a, \
U+0201->a, U+0202->a, U+0203->a, U+0204->e, U+0205->e, U+0206->e, \
U+0207->e, U+0208->i, U+0209->i, U+020A->i, U+020B->i, U+020C->o, \
U+020D->o, U+020E->o, U+020F->o, U+0210->r, U+0211->r, U+0212->r, \
U+0213->r, U+0214->u, U+0215->u, U+0216->u, U+0217->u, U+0218->s, \
U+0219->s, U+021A->t, U+021B->t, U+021E->h, U+021F->h, U+0226->a, \
U+0227->a, U+0228->e, U+0229->e, U+022A->o, U+022B->o, U+022C->o, \
U+022D->o, U+022E->o, U+022F->o, U+0230->o, U+0231->o, U+0232->y, \
U+0233->y, U+1E00->a, U+1E01->a, U+1E02->b, U+1E03->b, U+1E04->b, \
U+1E05->b, U+1E06->b, U+1E07->b, U+1E08->c, U+1E09->c, U+1E0A->d, \
U+1E0B->d, U+1E0C->d, U+1E0D->d, U+1E0E->d, U+1E0F->d, U+1E10->d, \
U+1E11->d, U+1E12->d, U+1E13->d, U+1E14->e, U+1E15->e, U+1E16->e, \
U+1E17->e, U+1E18->e, U+1E19->e, U+1E1A->e, U+1E1B->e, U+1E1C->e, \
U+1E1D->e, U+1E1E->f, U+1E1F->f, U+1E20->g, U+1E21->g, U+1E22->h, \
U+1E23->h, U+1E24->h, U+1E25->h, U+1E26->h, U+1E27->h, U+1E28->h, \
U+1E29->h, U+1E2A->h, U+1E2B->h, U+1E2C->i, U+1E2D->i, U+1E2E->i, \
U+1E2F->i, U+1E30->k, U+1E31->k, U+1E32->k, U+1E33->k, U+1E34->k, \
U+1E35->k, U+1E36->l, U+1E37->l, U+1E38->l, U+1E39->l, U+1E3A->l, \
U+1E3B->l, U+1E3C->l, U+1E3D->l, U+1E3E->m, U+1E3F->m, U+1E40->m, \
U+1E41->m, U+1E42->m, U+1E43->m, U+1E44->n, U+1E45->n, U+1E46->n, \
U+1E47->n, U+1E48->n, U+1E49->n, U+1E4A->n, U+1E4B->n, U+1E4C->o, \
U+1E4D->o, U+1E4E->o, U+1E4F->o, U+1E50->o, U+1E51->o, U+1E52->o, \
U+1E53->o, U+1E54->p, U+1E55->p, U+1E56->p, U+1E57->p, U+1E58->r, \
U+1E59->r, U+1E5A->r, U+1E5B->r, U+1E5C->r, U+1E5D->r, U+1E5E->r, \
U+1E5F->r, U+1E60->s, U+1E61->s, U+1E62->s, U+1E63->s, U+1E64->s, \
U+1E65->s, U+1E66->s, U+1E67->s, U+1E68->s, U+1E69->s, U+1E6A->t, \
U+1E6B->t, U+1E6C->t, U+1E6D->t, U+1E6E->t, U+1E6F->t, U+1E70->t, \
U+1E71->t, U+1E72->u, U+1E73->u, U+1E74->u, U+1E75->u, U+1E76->u, \
U+1E77->u, U+1E78->u, U+1E79->u, U+1E7A->u, U+1E7B->u, U+1E7C->v, \
U+1E7D->v, U+1E7E->v, U+1E7F->v, U+1E80->w, U+1E81->w, U+1E82->w, \
U+1E83->w, U+1E84->w, U+1E85->w, U+1E86->w, U+1E87->w, U+1E88->w, \
U+1E89->w, U+1E8A->x, U+1E8B->x, U+1E8C->x, U+1E8D->x, U+1E8E->y, \
U+1E8F->y, U+1E96->h, U+1E97->t, U+1E98->w, U+1E99->y, U+1EA0->a, \
U+1EA1->a, U+1EA2->a, U+1EA3->a, U+1EA4->a, U+1EA5->a, U+1EA6->a, \
U+1EA7->a, U+1EA8->a, U+1EA9->a, U+1EAA->a, U+1EAB->a, U+1EAC->a, \
U+1EAD->a, U+1EAE->a, U+1EAF->a, U+1EB0->a, U+1EB1->a, U+1EB2->a, \
U+1EB3->a, U+1EB4->a, U+1EB5->a, U+1EB6->a, U+1EB7->a, U+1EB8->e, \
U+1EB9->e, U+1EBA->e, U+1EBB->e, U+1EBC->e, U+1EBD->e, U+1EBE->e, \
U+1EBF->e, U+1EC0->e, U+1EC1->e, U+1EC2->e, U+1EC3->e, U+1EC4->e, \
U+1EC5->e, U+1EC6->e, U+1EC7->e, U+1EC8->i, U+1EC9->i, U+1ECA->i, \
U+1ECB->i, U+1ECC->o, U+1ECD->o, U+1ECE->o, U+1ECF->o, U+1ED0->o, \
U+1ED1->o, U+1ED2->o, U+1ED3->o, U+1ED4->o, U+1ED5->o, U+1ED6->o, \
U+1ED7->o, U+1ED8->o, U+1ED9->o, U+1EDA->o, U+1EDB->o, U+1EDC->o, \
U+1EDD->o, U+1EDE->o, U+1EDF->o, U+1EE0->o, U+1EE1->o, U+1EE2->o, \
U+1EE3->o, U+1EE4->u, U+1EE5->u, U+1EE6->u, U+1EE7->u, U+1EE8->u, \
U+1EE9->u, U+1EEA->u, U+1EEB->u, U+1EEC->u, U+1EED->u, U+1EEE->u, \
U+1EEF->u, U+1EF0->u, U+1EF1->u, U+1EF2->y, U+1EF3->y, U+1EF4->y, \
U+1EF5->y, U+1EF6->y, U+1EF7->y, U+1EF8->y, U+1EF9->y,
}
index ads_delta : ads_main
{
source = ads_delta
path = /var/lib/sphinx/ads_delta
}
indexer
{
mem_limit = 256M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = /var/log/sphinx/searchd.log
query_log = /var/log/sphinx/query.log
read_timeout = 5
max_children = 30
pid_file = /var/run/sphinx/searchd.pid
max_matches = 5000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads # for RT to work
binlog_path = /var/lib/sphinx/
}
Database
As you can see I used a table named ads.
You need to create two tables for sphinx:
- sphinx_ads_deleted – Will contain deleted items from ads. The deleted items are inserted for the DELETE trigger in ads
- sphinx_counter – Will contain the updated last id and modification date since the last reindex
You need to define a DELETE trigger found bellow in the ads table.
I will include the structure for my ads table also.
--
-- Table structure for table `ads`
--
CREATE TABLE IF NOT EXISTS `ads` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`category_id` int(10) unsigned NOT NULL,
`user_id` int(10) NOT NULL,
`currency` enum('CHF','EUR','RON','USD') NOT NULL,
`price` decimal(10,2) DEFAULT NULL,
`name` varchar(350) NOT NULL,
`exchange` tinyint(1) DEFAULT NULL,
`slug` varchar(200) NOT NULL,
`description` text,
`promo` tinyint(1) DEFAULT NULL,
`promo_start` date DEFAULT NULL,
`promo_end` date DEFAULT NULL,
`modified` datetime DEFAULT NULL,
`created` datetime NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '1',
`expires` date DEFAULT NULL,
`views` int(10) unsigned NOT NULL DEFAULT '0',
`owner_name` varchar(150) NOT NULL,
`phone` char(10) DEFAULT NULL,
`county_id` int(10) unsigned NOT NULL,
`city_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slug` (`slug`),
KEY `popularity` (`views`),
KEY `fk_category` (`category_id`),
KEY `user_id` (`user_id`),
KEY `status_user` (`active`,`user_id`),
KEY `category_currency_modified` (`category_id`,`currency`,`modified`),
KEY `exchange` (`exchange`),
KEY `county_id` (`county_id`,`city_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Triggers in `ads`
--
DROP TRIGGER IF EXISTS `ads_trigger_delete`;
DELIMITER //
CREATE TRIGGER `ads_trigger_delete` AFTER DELETE ON `ads`
FOR EACH ROW BEGIN
# add deleted items to sphinx saerch
IF (@DISABLE_TRIGGERS IS NULL) THEN
INSERT INTO sphinx_ads_deleted (id, deleted_on) VALUES (OLD.id, NOW());
END IF;
END
//
DELIMITER ;
-- --------------------------------------------------------
--
-- Table structure for table `sphinx_ads_deleted`
-- Will contain deleted items from ads
-- The deleted items are inserted for the DELETE trigger in ads
--
CREATE TABLE IF NOT EXISTS `sphinx_ads_deleted` (
`id` int(10) unsigned NOT NULL,
`deleted_on` datetime NOT NULL,
KEY `deleted_on` (`deleted_on`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `sphinx_counter`
-- Will contain the updated last id and modification date since the last reindex
--
CREATE TABLE IF NOT EXISTS `sphinx_counter` (
`counter_id` varchar(20) NOT NULL,
`max_doc_id` bigint(20) unsigned NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`counter_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Useful commands
- start/stop/restart service:
- service searchd restart
- reindex
- indexer –rotate ads_main
rotate will update index named ads_main even if it is in use
Cron jobs (update schedule)
Usually the main index is rebuilt once a day, and the delta updates more frequently.
Make sure the crond service is running with:
service crond status
It should say the service is running.
Create a file for each job in /etc/cron.d
- sphinx_main – runs at 2:12 AM each night
-
SHELL=/bin/sh
12 2 * * * sphinx indexer --rotate ads_main
- sphinx_delta – runs at every minute
-
SHELL=/bin/sh
* * * * * sphinx indexer --rotate ads_delta
Faster updates ?
1. Use merge
Instead of reindexing main, you could merge delta into main. This still consumes a lot of memory, but it would be faster.
The basic command syntax is as follows:
indexer --merge DSTINDEX SRCINDEX [--rotate]
So you will have something like this:
indexer --merge ads_main ads_delta --rotate
The problem is that you can’t use the shell for this because you will need to update the sphinx_counter table also, and that is why you will need to do this from a script.
Pseudocode:
run merge command
update sphinx_counter table (sql update)
I prefer to rebuild the index each night to make sure I am using a synchronized version of the database.
A full re index for a 100K records table takes only a few seconds.
2. Use Real TIme indexes for live updates
Real Time indexes were introduced with version version 1.10-beta. Updates on a RT index can appear in the search results in 1-2 milliseconds, ie. 0.001-0.002 seconds. However, RT index are less efficient for bulk indexing huge amounts of data.