Sphinx/Manticore plugin for ActiveRecord/Rails
MIT License
No breaking or major changes.
- `sphinx`-prefixed search methods, in case the standard `search` is overridden from something unrelated (#1265). Existing search methods will also be respected.
- `none` / `search_none` scopes that can be chained to searches and will return no results.
- `ThinkingSphinx::Processor#sync` to synchronise updates/deletions based on a real-time index's scope, by @akostadinov in #1258.
No breaking or major changes.
No breaking or major changes.
Published by pat about 3 years ago
No breaking or major changes.
- Use `after_commit` (including deletions) instead of `after_save`/`after_destroy`, to ensure data is fully persisted to the database before updating Sphinx. More details in #1204.
- Remove `app/indices` from `eager_load_paths` in Rails 4.2 and 5, to match the behaviour in 6.
Both of these fixes are evolutions of (and improvements to) changes introduced in v5.2.0/5.2.1.
Published by pat over 3 years ago
No breaking or major changes.
- `rails ts:rebuild`. Disabled by default; it can be enabled by setting `real_time_tidy` to true per environment in `config/thinking_sphinx.yml` (and will need `ts:rebuild` to restructure indices upon initial deploy). More details in #1192.
- Remove `app/indices` (in both the Rails app and engines) from Rails' eager load paths. Without this, indices were being loaded more than once. (See #1191 and #1195.)
Published by pat almost 4 years ago
No breaking or major changes.
- `enable_star` is no longer available as a configuration option. It has been enabled by default in Sphinx since v2.2.2, and is no longer allowed in Sphinx v3.3.1.
- `:path` values to navigate associations for Thinking Sphinx callbacks on SQL-backed indices. Discussed in #1182.
Published by pat about 4 years ago
Thinking Sphinx v5.0 has one significant change - explicit callbacks - plus drops support for old versions of Rails/Ruby/Sphinx, and adds a few other smaller improvements.
Previous versions of Thinking Sphinx automatically added callbacks to all ActiveRecord models, for the purpose of persisting changes back to Sphinx (whether inserts, updates, or deletions). The overhead for non-indexed models wasn't severe, but it was still far from ideal.
So now you need to add callbacks yourself, and only to the models you're indexing.
With SQL-backed models (defined using `:with => :active_record`), you'll very likely want to add one of the following two lines inside your model:
```ruby
class Article < ApplicationRecord
  # If you're not using delta indices:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql])

  # If you *are* using delta indices:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :deltas])
end
```
If you're using real-time indices, you very likely already have callbacks defined in your models, but you can replace them with the new calls:
```ruby
class Article < ApplicationRecord
  # Instead of this...
  after_save ThinkingSphinx::RealTime.callback_for(:article)

  # use this...
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:real_time])
end
```
For associated models that still fire real-time callbacks, you can use the `:path` option with the same call:
```ruby
class Comment < ApplicationRecord
  belongs_to :article

  ThinkingSphinx::Callbacks.append self,
    :behaviours => [:real_time],
    :path       => [:article]
end
```
And if you're using a custom block with your old real-time callback, you can pass that same block to the new approach as well:
```ruby
class Article < ApplicationRecord
  ThinkingSphinx::Callbacks.append(
    self, :behaviours => [:real_time]
  ) do |instance|
    # Return an array of instances to index. You could add
    # custom logic here if you don't want indexing to happen
    # in some cases.
    [instance]
  end
end
```
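The block only has to return an array of instances, so its logic is plain Ruby and can be tried out on its own. As a hypothetical illustration, the block below skips unpublished records by returning an empty array. `ArticleStub` is a Struct standing in for the model, and its `published` attribute is an assumption.

```ruby
# Hypothetical stand-in for an ActiveRecord model with a `published` flag.
ArticleStub = Struct.new(:title, :published)

# The kind of block you could pass to ThinkingSphinx::Callbacks.append:
# it returns the instances to index, or an empty array to index nothing.
index_published_only = proc do |instance|
  instance.published ? [instance] : []
end

draft = ArticleStub.new("Draft", false)
live  = ArticleStub.new("Live", true)

p index_published_only.call(draft).map(&:title) # => []
p index_published_only.call(live).map(&:title)  # => ["Live"]
```

In a model, the same conditional would go inside the `do |instance| ... end` block passed to `ThinkingSphinx::Callbacks.append`.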
For now, the older callback style for real-time indices will continue to work, but it's still recommended that you update your code to the new style.
On the off chance you are using SQL-backed indices and you have `attribute_updates` enabled in `config/thinking_sphinx.yml`, you'll want to specify that in your `:behaviours` option:
```ruby
ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :updates])
```
Sphinx 2.1 is no longer supported. If you're running any 2.2.x release, ideally upgrade to 2.2.11.
Sphinx 3.x releases are supported, but there are known issues with indexing SQL-backed indices on a PostgreSQL database (real-time indices are fine though).
As part of this change, Sphinx's docinfo setting is no longer configured, so the `skip_docinfo` setting in `config/thinking_sphinx.yml` can be removed.
When it comes to Manticore as a drop-in replacement for Sphinx, we're testing against the latest 2.x and 3.x releases, which are currently 2.8.2 and 3.4.2 respectively.
Versions of Ruby less than 2.3 are no longer supported, sorry. We're currently testing against 2.4 through to 2.7.
It's been a long time coming, but Rails 3.2 (and 4.0 and 4.1) are no longer supported. The current supported versions are 4.2 through to 6.0 (and 6.1 will likely work as well, once it's released).
- `send`, replaced with `public_send`, as that's available in all supported Ruby versions.
- `:instances` option to be set alongside `:classes`. This is useful for limiting the indices returned if you're splitting index data for given classes/models into shards. (Introduced in PR #1171 after discussions with @lunaru in #1166.)
- `reference_name` as per custom `index_set_class` definitions. Previously, the class method was called on `ThinkingSphinx::IndexSet` even if a custom subclass was configured. (As per discussions with @kalsan in #1172.)
None.
Published by pat about 5 years ago
No breaking or major changes.
- Remove `app/indices` from Zeitwerk's autoload paths in Rails 6.0 onwards (if using Zeitwerk as the autoloader).
Published by pat about 5 years ago
No breaking or major changes.
- `ThinkingSphinx::RealTime.processor` and `ThinkingSphinx::RealTime.populator`.
The processor should respond to `call`, accepting an array of index objects and a block to invoke after each index is processed. Here is a simple example for parallel processing of indices:
```ruby
# Add the 'parallel' gem to your Gemfile.
ThinkingSphinx::RealTime.processor = Proc.new do |indices, &block|
  Parallel.map(indices) do |index|
    puts "Populating index #{index.name}"
    ThinkingSphinx::RealTime.populator.populate index
    puts "Populated index #{index.name}"

    block.call
  end
end
```
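To see the processor contract in isolation, here is a sketch that swaps the `parallel` gem for plain Ruby threads. It uses hypothetical stand-ins (`FakeIndex`, `StubPopulator`) instead of real Thinking Sphinx objects, so it runs without Rails or Sphinx. In a real app you would call `ThinkingSphinx::RealTime.populator.populate index` as above. Each thread would then also need its own database connection, for example by wrapping the work in `ActiveRecord::Base.connection_pool.with_connection`. On MRI, threads mainly help while the work is waiting on I/O.

```ruby
# Hypothetical stand-ins so the sketch runs without Rails or Sphinx.
FakeIndex = Struct.new(:name)

class StubPopulator
  attr_reader :populated

  def initialize
    @populated = Queue.new # thread-safe, since several threads write to it
  end

  def populate(index)
    @populated << index.name
  end
end

populator = StubPopulator.new

# Follows the processor contract: called with an array of indices plus a
# block, which is invoked once each index has been processed.
thread_processor = proc do |indices, &block|
  threads = indices.map do |index|
    Thread.new do
      populator.populate(index) # stands in for ThinkingSphinx::RealTime.populator
      block.call
    end
  end
  threads.each(&:join) # wait for every index (re-raises thread errors)
end

indices  = [FakeIndex.new("article_core"), FakeIndex.new("user_core")]
finished = Queue.new
thread_processor.call(indices) { finished << :done }
puts "#{finished.size} indices processed"
```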
And the populator should respond to `populate`, accepting a single argument: the index object. Here is a simple example for parallel processing:
```ruby
# Add the 'parallel' gem to your Gemfile.
class ParallelPopulator
  def self.populate(index)
    new(index).call
  end

  def initialize(index)
    @index = index
  end

  def call
    Parallel.each(index.scope.find_in_batches) do |instances|
      transcriber.copy(*instances)
      true # Don't emit any large object because results are accumulated
    end

    # Forked workers inherited this process's database connection,
    # so re-establish it once they're done.
    ActiveRecord::Base.connection.reconnect!
  end

  private

  attr_reader :index

  def transcriber
    @transcriber ||= ThinkingSphinx::RealTime::Transcriber.new index
  end
end

ThinkingSphinx::RealTime.populator = ParallelPopulator
```
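Stripped of the parallelism, the populator contract is small. The sketch below is a sequential populator that copies records in fixed-size batches. `DemoIndex` and `RecordingTranscriber` are hypothetical stand-ins for a real index and `ThinkingSphinx::RealTime::Transcriber`, and `each_slice` plays the role of ActiveRecord's `find_in_batches`, so it runs in plain Ruby. It returns the transcriber only so the result can be inspected.

```ruby
# Hypothetical stand-ins: an index whose scope is a plain array, and a
# transcriber that records each batch it's asked to copy.
DemoIndex = Struct.new(:name, :scope)

class RecordingTranscriber
  attr_reader :batches

  def initialize
    @batches = []
  end

  def copy(*instances)
    @batches << instances
  end
end

# Follows the populator contract: respond to `populate` with the index.
class BatchedPopulator
  BATCH_SIZE = 2 # tiny, so the batching is easy to see

  def self.populate(index)
    new(index).call
  end

  def initialize(index)
    @index = index
  end

  # Copies the scope sequentially in batches of BATCH_SIZE.
  def call
    @index.scope.each_slice(BATCH_SIZE) { |batch| transcriber.copy(*batch) }
    transcriber
  end

  private

  def transcriber
    @transcriber ||= RecordingTranscriber.new
  end
end

result = BatchedPopulator.populate(DemoIndex.new("article_core", [1, 2, 3, 4, 5]))
p result.batches # => [[1, 2], [3, 4], [5]]
```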
Rather than building your own procs/classes from scratch, you may wish to subclass the default classes to tweak behaviour. At the very least, both default classes are useful reference points for your own replacements.
These changes were influenced by discussions in #1134 with @njakobsen about parallel processing of real-time indices.
Published by pat about 5 years ago
No breaking or behaviour changes.
Published by pat over 5 years ago
No breaking or major changes.
- `skip_running_check` to true/false in `config/thinking_sphinx.yml` for appropriate environments. This is useful when Sphinx commands are interacting with a remote Sphinx daemon. As per discussions in #1131.
- `skip_directory_creation` to true/false in `config/thinking_sphinx.yml` for appropriate environments. As per discussions in #1131.
Published by pat over 5 years ago
No breaking or major changes.
- `utf8` to something else via the `mysql_encoding` setting in `config/thinking_sphinx.yml`. In the next significant release, the default will change to `utf8mb4` (which is supported in MySQL 5.5.3 and newer).
- `respond_to?` takes Sphinx scopes into account (Jonathan del Strother in #1124).
- `:excerpts` as a known option for search requests.
- `sql_query` if they're only used by query-sourced properties (Hans de Graaff in #1127).
Published by pat almost 6 years ago
No breaking or major changes. Ruby 2.2 is no longer officially supported, but this release will almost certainly still work on it.
- The `:sql` search option can now accept per-model settings with model names as keys, e.g. `ThinkingSphinx.search "foo", :sql => {'Article' => {:include => :user}}` (Sergey Malykh in #1120).
Published by pat over 6 years ago
Thinking Sphinx v4.0 has been in development for a little while, and includes some significant changes (as befitting a major release):
Merging indices is now supported via the new `ts:merge` rake task. If you're using delta indices, this is an alternative to regularly running `ts:index` to get new/changed records into the core indices. Merging should be reliably faster, and it avoids hitting your database to reprocess all the records.
Running `ts:index` every now and then to catch any records changed without callbacks is probably still wise (perhaps once a day, compared to more frequent `ts:merge` calls).
If you've got Sphinx and your Rails app all on a single machine, you may want the daemon (`searchd`) to accept connections via a UNIX socket instead of a TCP socket. Just set the `socket` value in each appropriate environment within `config/thinking_sphinx.yml`. Don't set `mysql41` unless you want the daemon to also be available via that TCP port.
```yaml
production:
  socket: /var/tmp/production.sphinx
```
This feature is limited to MRI, as JRuby doesn't seem to have a way to connect to UNIX sockets for MySQL-protocol connections.
The new release of ActiveRecord/Rails is happily supported by this release.
The recent fork of Sphinx known as Manticore is supported, and can be used as a drop-in replacement for Sphinx. In particular, the v2.6.3 release is included in the test matrix.
Sphinx 2.0 is no longer supported: make sure you're running at least 2.1.2, though 2.2.11 is recommended if possible.
Versions of Ruby less than 2.2 are no longer supported, sorry.
If you're filtering via human-entered values (say, via request parameters), then in the past you were allowed to send those string values straight through to Sphinx.
However, Sphinx now supports string filtering, so it's not possible to make assumptions about filter types. Thinking Sphinx v3.4.0 introduced automatic typing of these values, but this was an extra overhead which was far from ideal and was always flagged as temporary.
So, please cast your filter values where appropriate:
```ruby
Model.search :with => {:foo_id => params[:foo_id]}
# should become:
Model.search :with => {:foo_id => params[:foo_id].to_i}
```
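Note that `to_i` never fails. `"abc".to_i` is `0` and `"12abc".to_i` is `12`, so a malformed parameter quietly becomes a real filter value. If you'd rather reject such input, a stricter cast is easy to write. The `integer_filters` helper below is hypothetical (not part of Thinking Sphinx). It's a sketch that assumes the filter keys are known up front and that blank values should simply be left out.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): cast integer filter
# params strictly, skipping blank values instead of turning them into 0.
def integer_filters(params, *keys)
  keys.each_with_object({}) do |key, filters|
    value = params[key].to_s.strip
    next if value.empty?

    # Integer() raises ArgumentError on input like "12abc", unlike
    # String#to_i, which would quietly return 12 (or 0 for "abc").
    filters[key] = Integer(value, 10)
  end
end

params = { foo_id: "42", bar_id: "" }
p integer_filters(params, :foo_id, :bar_id)
```

Usage would then look like `Model.search :with => integer_filters(params, :foo_id)`, with the `ArgumentError` rescued wherever you want to report bad input.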
- To omit the `docinfo` setting from the generated Sphinx configuration (to avoid warnings in Sphinx 2.2+), add `skip_docinfo: true` to each appropriate environment in `config/thinking_sphinx.yml`.
- `config/thinking_sphinx.yml`, but you must also add `absolute_paths: true` to each environment for them to be converted to absolute paths for the generated configuration.
- `flying-sphinx` to override when appropriate).
- `frozen_string_literal: true` pragma comments for safe frozen string literals.
- `ThinkingSphinx::Configuration.instance.guarding_strategy = ThinkingSphinx::Guard::None`).
Published by pat about 7 years ago
No breaking or major changes, just three small fixes and a couple of minor changes.
Published by pat about 7 years ago
No breaking or major changes, just two small fixes.
Published by pat about 7 years ago
There are a few significant changes in this release. Nothing here will break your code, but there are some deprecations (and thus breaking changes in later releases), so reading through is highly recommended.
Given Riddle now quotes string values in filters (because Sphinx now supports filtering on string attributes), we need to be a little more careful about attribute filter values coming in through params. In the past, Riddle would presume any string value was not actually a string, and that's no longer a safe presumption.
As of this release, Thinking Sphinx will do its best to cast your filter values to their appropriate types, but it's not going to be perfect, and this will be removed in a future release. So, best to do the casting yourself:
```ruby
Model.search :with => {:foo_id => params[:foo_id]}
# should become:
Model.search :with => {:foo_id => params[:foo_id].to_i}
```
This is likely going to crop up any time you're using params data in filters, because they'll always be strings.
If you're confident that you're casting all filter values to their appropriate types, you can remove the search middleware that's attempting to auto-cast (and thus, get a bit of a speed boost) by putting the following in an initialiser:
```ruby
ThinkingSphinx::Middlewares::DEFAULT.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)
ThinkingSphinx::Middlewares::RAW_ONLY.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)
ThinkingSphinx::Middlewares::IDS_ONLY.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)
```
Thinking Sphinx will now output a warning to your logs when unexpected options are used in search queries.
If you're adding your own middleware, or have something else that may allow for custom options, make sure you add those options to `ThinkingSphinx::Search.valid_options`.
If you don’t want this behaviour to occur, you can remove the middleware from your stack by putting the following in an initialiser:
```ruby
ThinkingSphinx::Middlewares::DEFAULT.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)
ThinkingSphinx::Middlewares::RAW_ONLY.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)
ThinkingSphinx::Middlewares::IDS_ONLY.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)
```
Rake tasks are now unified, so the original tasks will operate on real-time indices as well. This means that `ts:generate` and `ts:regenerate` can be changed to `ts:index` and `ts:rebuild` respectively. All standard tasks will perform their appropriate behaviours on all indices.
If you wish to perform operations on specific types of indices, there are now tasks available for that, including:
- `ts:sql:index` (the old behaviour of `ts:index`)
- `ts:sql:rebuild` (the old behaviour of `ts:rebuild`)
- `ts:rt:index` (the old behaviour of `ts:generate`)
- `ts:rt:rebuild` (the old behaviour of `ts:regenerate`)

- `utf8mb4`).
- `INDEX_FILTER` environment variable.