Sphinx/Manticore plugin for ActiveRecord/Rails
MIT License
Published by pat almost 8 years ago
There are no breaking changes in this release - upgrading should be a painless process (but do let me know if that's not the case).
A big thank you to all contributors of this release - in particular, Julio Monteiro and Asaf Barton.
Running the ts:generate task loads model instances in batches of 1000. You can customise this globally by setting the batch_size option in your config/thinking_sphinx.yml file per environment.
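As a rough sketch of what a per-environment batch_size could look like, and what it changes, the snippet below parses an inline stand-in for config/thinking_sphinx.yml and slices a list of ids into batches. The value 500, the ids, and the use of each_slice are illustrative assumptions; this is not Thinking Sphinx's own loading code.

```ruby
require 'yaml'

# Inline stand-in for config/thinking_sphinx.yml; 500 is an arbitrary example value.
config = YAML.safe_load(<<~YAML)
  development:
    batch_size: 500
  production:
    batch_size: 500
YAML

# Fall back to the documented default of 1000 when the option is absent.
batch_size = config.fetch('development', {}).fetch('batch_size', 1000)

# Hypothetical ids standing in for model instances; each_slice only
# illustrates the batching, it is not the gem's implementation.
record_ids = (1..2_500).to_a
batches = record_ids.each_slice(batch_size).to_a

puts "#{batches.size} batches of up to #{batch_size} records"
```

Smaller batches generally trade memory use for more round trips to the database, so the right value depends on how heavy your model instances are.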
Also, if you prefer to have data persisted to your real-time indices after the database transaction is committed, the callback helper works with after_commit just like it does with after_save - though you should only use one! Keep in mind that if you're using after_commit, you can't wrap tests that involve Sphinx in transactions.
class Article < ActiveRecord::Base
  # ...
  after_commit ThinkingSphinx::RealTime.callback_for(:article)
  # ...
end
Published by pat over 8 years ago
There are no breaking changes in this release - upgrading should be a painless process (but do let me know if that's not the case).
A big thank you to all contributors of this release, which has been a while coming (it's been almost a year since 3.1.4). Andrey Novikov, Nathaneal Gray, Mattia Gheda, Roman Usherenko, Jonathan del Strother, Chance Downs, Andrew Roth, @arrtchiu, Brandon Dewitt: your commits and feedback are greatly appreciated!
Much like the existing suspended deltas feature, you can now suspend/resume all Thinking Sphinx callbacks using ThinkingSphinx::Callbacks.suspend! and ThinkingSphinx::Callbacks.resume!. This will disable all attribute update callbacks, delta callbacks, real-time update callbacks, and object deletion callbacks. This is particularly useful for unit tests.
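One way to use the two calls safely is to pair them in a helper, so callbacks are always resumed even when the wrapped code raises. The helper name without_sphinx_callbacks is hypothetical, and the small stand-in module (including its suspended? method) exists only so that this sketch runs without the gem loaded; in an application, Thinking Sphinx itself provides suspend! and resume!.

```ruby
# Stand-in so the sketch runs on its own. In a real app the gem defines
# ThinkingSphinx::Callbacks.suspend! and .resume! and this branch is skipped.
unless defined?(ThinkingSphinx)
  module ThinkingSphinx
    module Callbacks
      @suspended = false

      def self.suspend!
        @suspended = true
      end

      def self.resume!
        @suspended = false
      end

      def self.suspended?
        @suspended
      end
    end
  end
end

# Hypothetical helper: suspend callbacks for the duration of the block,
# and resume them afterwards even if the block raises.
def without_sphinx_callbacks
  ThinkingSphinx::Callbacks.suspend!
  yield
ensure
  ThinkingSphinx::Callbacks.resume!
end

without_sphinx_callbacks do
  # Create or update records here without touching Sphinx.
end
```

In a test suite, a helper like this could wrap the setup of unit tests that never query Sphinx.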
Since Thinking Sphinx was first built, the indexing approach has been to process all of the indices in a single indexer call. It is now possible to opt for a different approach: to call indexer for each index, one by one:
# This can go in an initialiser:
ThinkingSphinx::Configuration.instance.indexing_strategy = \
  ThinkingSphinx::IndexingStrategies::OneAtATime
# or, the default is:
ThinkingSphinx::Configuration.instance.indexing_strategy = \
  ThinkingSphinx::IndexingStrategies::AllAtOnce
You can give ThinkingSphinx::Configuration.instance.indexing_strategy anything you like that responds to call and expects an array of index options, and yields index names. You can see the implementations of the two approaches here.
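A custom strategy only needs to honour that contract. The sketch below is a hypothetical SortedOneAtATime strategy that yields each index name in alphabetical order; it assumes the array it receives contains index names as strings, and it does nothing when given an empty array.

```ruby
# Hypothetical strategy: process the given indices one by one, sorted by name.
module SortedOneAtATime
  # indices: array of index names (assumed here to be strings).
  # Yields each name in turn, so the caller can run indexer for it.
  def self.call(indices = [])
    indices.sort.each { |name| yield name }
  end
end

# Stand-alone demonstration with made-up index names.
seen = []
SortedOneAtATime.call(%w[user_core article_core]) { |name| seen << name }
puts seen.inspect

# In an initialiser, it would be assigned like the built-in strategies:
#   ThinkingSphinx::Configuration.instance.indexing_strategy = SortedOneAtATime
```

A module with a call class method is used here because it mirrors how the built-in strategies are referenced by constant; a lambda would satisfy the same contract.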
Andrey Novikov has given you the ability to use the environment variable NODETACH when running rake ts:start, and that keeps Sphinx around as a foreground process.
Nathaneal Gray has added a :primary_key option when defining indices, in case you want Sphinx to use a different primary key to the one your model uses.
Mattia Gheda has added rand_seed as an allowed SELECT clause option.
@arrtchiu has added the ability to define Sphinx's MySQL SSL options on a per-index basis (via the set_property method within an index definition).
JSON attributes are now supported for real-time indices. Also, there's a new exception type ThinkingSphinx::OutOfBoundsError for when search queries are requesting results outside of their pagination bounds.
Published by pat over 9 years ago
This is the first release since I've added a Contributor Code of Conduct to the project. There haven't been any problems in the past, but I like being upfront about this. By participating in this project, you agree to abide by its terms.
If you're upgrading from v3.1.3 and you're not yet using Sphinx 2.2.x, then you'll probably want to add the charset_type setting to config/thinking_sphinx.yml for each of your environments - Thinking Sphinx used to specify a default of 'utf-8', but Sphinx now insists on UTF-8 and ignores the setting (and will print a warning).
Also, if you're using polymorphic associations within your index definitions and you're using Rails 3.2, you're going to have to upgrade Rails to use this version of Thinking Sphinx. It's just too painful to manage all the different ActiveRecord behaviours. Sorry.
And of course if you're using something older than v3.1.3, reading the earlier release notes is highly recommended.
If you're using MySQL and SQL-backed indices, and you want to use the GROUP BY shortcut to speed things up, you can now specify minimal_group_by? in config/thinking_sphinx.yml (per environment) instead of needing to call set_property in each index definition.
For those unfamiliar with this setting: MySQL is often configured by default to not care if you leave off columns from the GROUP BY clause even when you have aggregations. If you enable minimal_group_by?, the generated SQL will group by only your primary key, along with any columns you specify yourself using the group_by method in index definitions.
The other new feature of this release is courtesy of Daniel Vandersluis: proper JSON attribute support, which is automatically detected when tied to JSON database columns. Fancy.
Published by pat over 9 years ago
There's no modification required if you're upgrading from v3.1.2, though running rake ts:regenerate is recommended if you're using real-time indices. Of course, if you're using something older than v3.1.2, reading the earlier release notes is highly recommended.
This is the first release to properly support Rails 4.2.
Two new features, both related to using Thinking Sphinx with multiple data sources (in particular, different PostgreSQL schemas via the Apartment gem):
If you want to change which indices are returned in different situations, you can set a custom class:
ThinkingSphinx::Configuration.instance.index_set_class = TenantIndexSet
Because Sphinx requires all document ids to be unique - even across different indices - they're generated via a unique offset combined with model primary keys. Normally, Thinking Sphinx will use the same offset calculation if you have more than one index for a given model - as they're likely the same record.
However, if you're using the Apartment gem, then this is probably not the case - you have identical tables in different schemas, with different sets of overlapping primary keys. So, there's a need for indices for each Apartment tenant on one model to be considered as separate. The :offset_as option when defining an index will sort this out.
Here is a gist covering both of these new features.
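To see why tenants need separate offsets, here is a small worked example. The formula used (primary key multiplied by the number of indices, plus the index's offset) is an assumption made for illustration, not a statement of Thinking Sphinx's exact calculation, and the tenant names and numbers are made up.

```ruby
# Assumed scheme, for illustration only:
#   document_id = primary_key * index_count + offset
def document_id(primary_key, offset, index_count)
  primary_key * index_count + offset
end

index_count  = 2
primary_keys = [1, 2, 3]

# Two tenants sharing one offset: identical primary keys produce identical ids.
tenant_a_shared = primary_keys.map { |pk| document_id(pk, 0, index_count) }
tenant_b_shared = primary_keys.map { |pk| document_id(pk, 0, index_count) }
collisions_when_shared = tenant_a_shared & tenant_b_shared

# Giving each tenant its own offset keeps the ids apart.
tenant_a = primary_keys.map { |pk| document_id(pk, 0, index_count) }
tenant_b = primary_keys.map { |pk| document_id(pk, 1, index_count) }
collisions_when_separate = tenant_a & tenant_b

puts "shared offset collisions:   #{collisions_when_shared.inspect}"
puts "separate offset collisions: #{collisions_when_separate.inspect}"
```

Under this assumed scheme, ids stay unique as long as every offset is distinct and smaller than the number of indices.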
Rails::Railtie is defined, instead of just Rails (Andrew Cone).
Published by pat almost 10 years ago
There's no modification required if you're upgrading from v3.1.1. Of course, if you're using something older than that, reading the earlier release notes is highly recommended.
Nothing massive, but a few helpful new things, in order of when they were committed:
To ensure document ids are reliably 64-bit integers (aka bigints), set big_document_ids to true either via set_property in a specific index or in config/thinking_sphinx.yml for each appropriate environment.
Real-time callbacks can take a block. This is useful when your callback refers to multiple objects via an association and you want to ensure certain data is available by preloading:
ThinkingSphinx::RealTime.callback_for(:post) { |user| user.posts.includes(:category) }
ts:status lets you know if Sphinx is running or not.
Courtesy of @uhlenbrock, you can now disable binlog files if you're not using real-time indices - just set binlog_path to a blank string for each environment in config/thinking_sphinx.yml.
If you want to change where specific indices are located, instead of all of them, you can supply a :path option to ThinkingSphinx::Index.define. This will be the directory where the index files will be stored, and an absolute path is expected.
Published by pat over 10 years ago
There's no modification required if you're upgrading from v3.1.0. Of course, if you're using something older than that, reading the earlier release notes is highly recommended.
This release has the beginnings of support for Sphinx v2.2, including the common options section. This is disabled by default (as it won't work with earlier versions of Sphinx), but if you're keen to give it a spin, add the following to each environment in config/thinking_sphinx.yml:
common_sphinx_configuration: true
At some point, this will become the default behaviour (likely Thinking Sphinx v3.2.0), but we're a while away from that.
If you want to disable the automatically generated distributed indices, set distributed_indices: false in each environment in config/thinking_sphinx.yml.
ThinkingSphinx::Test is now in a position for proper use with real-time indices. Here's how I use it with RSpec (with the relevant examples tagged with :search => true):
RSpec.configure do |config|
  # RSpec 3 and later pass the running example in as a block argument.
  config.before(:each) do |example|
    if example.metadata[:search]
      ThinkingSphinx::Test.init
      ThinkingSphinx::Test.start :index => false
    end

    # Only fire real-time callbacks for examples tagged with :search => true.
    ThinkingSphinx::Configuration.instance.settings['real_time_callbacks'] = !!example.metadata[:search]
  end

  config.after(:each) do |example|
    if example.metadata[:search]
      ThinkingSphinx::Test.stop
      ThinkingSphinx::Test.clear
    end
  end
end
The setting for disabling real-time callbacks can be used anywhere, of course - but keep in mind this could lead to your model data being out of sync with Sphinx.
Previously, using the :source option with HABTM associations wasn't supported at all - now, it's only partially supported, for the foreign keys of single HABTM associations (you can't drill further through associations):
has genres.id, :as => :genre_ids, :source => :query
The association/column reference above is slightly misleading - it will actually use the genre_id column in the HABTM join table (thus, avoiding unnecessary joins). You still cannot use the :source option with columns in other tables accessed through HABTM associations.
Published by pat almost 11 years ago
Thinking Sphinx v3.1.0 is the first v3 release to support JRuby. You'll need the jdbc-mysql gem as well, and then it'll be smooth sailing. However, Rails 3.1 and MRI 1.9.2 are no longer supported - please upgrade to 3.2 and 1.9.3 (or 2.0.0/2.1.0) respectively.
Thinking Sphinx now expects Sphinx v2.1.2 or newer by default. If you're using v2.1.2, or something newer than that, then you should not make any of the changes listed in this section.
However, if you're using Sphinx 2.1.1 or earlier, you'll want to add these lines to an initializer:
ThinkingSphinx::Middlewares::DEFAULT.insert_after(
  ThinkingSphinx::Middlewares::Inquirer, ThinkingSphinx::Middlewares::UTF8
)
ThinkingSphinx::Middlewares::RAW_ONLY.insert_after(
  ThinkingSphinx::Middlewares::Inquirer, ThinkingSphinx::Middlewares::UTF8
)
And add the following setting to config/thinking_sphinx.yml:
development:
  utf8: false
# repeat for each environment as necessary
If you're using Sphinx 2.0.x, you'll also need to put the following in an initializer:
ThinkingSphinx::SphinxQL.variables!
If you're sending through custom SELECT statements via the :select option in search calls, please note that you'll need to supply * or specific column names to have them returned (the * is no longer supplied by default if you're setting something custom). So:
Article.search 'pancakes', :select => 'weight() as w'
# becomes
Article.search 'pancakes', :select => '*, weight() as w'
If you don't want to return all the columns/attributes, but you do want ActiveRecord objects instantiated in your search results, you'll need to include the sphinx_internal_id and sphinx_internal_class columns. It's also worth noting that any attribute you refer to in other parts of the query (for example, the ORDER clause) must exist in your SELECT clause.
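If you build custom SELECT clauses in several places, a small helper can make sure the two required columns are never forgotten. The helper name sphinx_select is hypothetical and not part of Thinking Sphinx; the sketch only builds the string that would be passed as :select.

```ruby
# Hypothetical helper: builds a :select string that always includes the two
# columns needed for ActiveRecord objects to be instantiated from results.
def sphinx_select(*extras)
  required = %w[sphinx_internal_id sphinx_internal_class]
  (required + extras).uniq.join(', ')
end

select = sphinx_select('weight() as w')
puts select

# In an app, the result could then be passed to a search call. Because w is
# in the SELECT clause, it can also be referred to when ordering, e.g.:
#   Article.search 'pancakes', :select => select, :order => 'w DESC'
```

Listing columns explicitly like this keeps result sets small while still letting search results be turned into model instances.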
Capistrano v3 is now supported, and there are now cap tasks for real-time indices (thinking_sphinx:generate and thinking_sphinx:regenerate). There's no longer any automatic symlinking of directories - it's recommended that pid, index and configuration files are all located in the shared directory permanently, using something like the following in your config/thinking_sphinx.yml file:
production:
  pid_file: /path/to/app/shared/tmp/searchd.pid
  indices_location: /path/to/app/shared/db/sphinx
  configuration_file: /path/to/app/shared/production.sphinx.conf
Also: previously, thinking_sphinx:index and thinking_sphinx:start would automatically run after deploy:cold. This is no longer the case, partially because the behaviour is different with real-time indices, and partially because it's better for you to have control over those decisions instead.
set_database method within an index definition block. You can either pass in a database settings hash (like what would exist in database.yml), or an environment name which corresponds to a known database configuration.
ThinkingSphinx::Deltas.suspend_and_update instead of ThinkingSphinx::Deltas.suspend).
ThinkingSphinx::Connection.persistent = false).
:group option within :sql options in a search call is passed through to the underlying ActiveRecord relation (Siarhei Hanchuk).
:having and :group_best options respectively).
* for SphinxQL SELECT statements.
:star => true) now treats escaped characters as word separators.
Published by pat almost 11 years ago
From this point onwards, Thinking Sphinx requires Sphinx v2.0.5 or newer.
If you're using Sphinx 2.1.1 or newer, you should add the following to an initialiser:
ThinkingSphinx::SphinxQL.functions!
Sphinx 2.1.x releases no longer support special variables with the @ prefix - instead, there are equivalent functions. The code above switches Thinking Sphinx to use the functions instead.
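If you have hand-written SELECT clauses that still use the old variables, the rewrite looks roughly like the sketch below. The mapping is an assumption covering only three common variables, and the helper name modernise_select is hypothetical; this is not what ThinkingSphinx::SphinxQL.functions! does internally.

```ruby
# Assumed mapping from Sphinx 2.0-style variables to their 2.1 function
# equivalents. Only a few common ones are listed here.
VARIABLE_TO_FUNCTION = {
  '@weight'  => 'weight()',
  '@count'   => 'count(*)',
  '@groupby' => 'groupby()'
}.freeze

# Hypothetical helper: replaces each known @variable with its function form.
def modernise_select(clause)
  clause.gsub(/@(?:weight|count|groupby)\b/, VARIABLE_TO_FUNCTION)
end

puts modernise_select('*, @weight as w')
```

Any variable not in the mapping is left untouched, so it is worth checking the Sphinx documentation for the version you are running before relying on a rewrite like this.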
If you're using Sphinx 2.1.2 or newer, you'll also want to add the following to your initializer (as 2.1.2 now returns strings as UTF-8 properly, so conversion isn't required):
ThinkingSphinx::Middlewares::DEFAULT.delete ThinkingSphinx::Middlewares::UTF8
And in your config/thinking_sphinx.yml file:
development:
  utf8: true
# repeat for each environment as necessary
All of these changes will become the default behaviour in Thinking Sphinx v3.1.0.
set_property :minimal_group_by? => true.
search_for_ids can now be chained onto scoped search calls.
skip_time_zone setting is now available per environment via config/thinking_sphinx.yml to avoid the sql_query_pre time zone command.
_sort prefix for the matching attribute).
:select options for facet searches (Timo Virkkala).
ThinkingSphinx::ConnectionError, instead of the standard Mysql2::Error.