thinking-sphinx

Sphinx/Manticore plugin for ActiveRecord/Rails

MIT License

Downloads: 3.1M
Stars: 1.6K
Committers: 61

thinking-sphinx -

Published by pat 3 months ago

Upgrading

No breaking or major changes.

New Features

  • Support for Manticore 6.0 (#1242)
  • sphinx-prefixed search methods, in case the standard search method is overridden by something unrelated (#1265). Existing search methods will also be respected.
  • none / search_none scopes that can be chained to searches and will return no results.
  • Added ThinkingSphinx::Processor#sync to synchronise updates/deletions based on a real-time index's scope, by @akostadinov in #1258.

Changes to behaviour

  • Improved Rails 7.1 support, by @jdelstrother in #1252.

Fixes

  • Handle both SQL and RT indices correctly for inheritance column checks, by @akostadinov in #1249.
  • Ensure tests and CI work with recent Manticore versions, by @jdelstrother in #1263.
  • Use rm -rf to delete test and temporary directories (instead of rm -r).
thinking-sphinx -

Published by pat 3 months ago

Upgrading

No breaking or major changes.

Changes to behaviour

  • Fixed total count of results in pagination information for Manticore 5.0+, by disabling the cutoff limit. (#1239).
thinking-sphinx - v5.5.0 Latest Release

Published by pat almost 2 years ago

Upgrading

No breaking or major changes.

New Features

  • ThinkingSphinx::Processor, a public interface to perform index-related operations on model instances or model name/id combinations. In collaboration with @akostadinov (#1215).

Changes to behaviour

  • Confirmed support by testing against Ruby 3.1 and 3.2 by @jdelStrother (#1237).

Fixes

  • Fix YAML loading, by @aepyornis (#1217).
  • Further fixes for File.exist? instead of the deprecated File.exists?, by @funsim (#1221) and @graaf (#1233).
  • Treat unknown column errors as QueryErrors, so retrying the query occurs automatically.
  • Fix MariaDB error handling.
thinking-sphinx - v5.4.0

Published by pat over 2 years ago

Upgrading

No breaking or major changes.

New Features

  • Rails 7 support, including contributions from @anthonyshull in #1205.

Changes to behaviour

  • Confirmed support by testing against Manticore 4.0 and Sphinx 3.4.

Fixes

  • Include instance_exec in ThinkingSphinx::Search::CORE_METHODS by @jdelStrother in #1210.
  • Use File.exist? instead of the deprecated File.exists? (#1211).
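
Applications that carry the same deprecated call can make the same swap. The following is a minimal sketch using only Ruby's standard library; the config_present? helper and the temporary file are purely illustrative. File.exists? was removed entirely in Ruby 3.2, so the singular form is the one to rely on.

```ruby
require "tempfile"

# File.exist? is the supported spelling; File.exists? was removed in Ruby 3.2.
def config_present?(path)
  File.exist?(path)
end

file = Tempfile.new("thinking_sphinx")
path = file.path # keep the path, as Tempfile#path returns nil after unlink

present_before = config_present?(path)
file.close
file.unlink
present_after = config_present?(path)
```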
thinking-sphinx - v5.3.0

Published by pat about 3 years ago

Upgrading

No breaking or major changes.

Changes to behaviour

  • StaleIdsExceptions now include a URL in their error message with recommendations on how to resolve the problem.
  • Fire real-time callbacks on after_commit (including deletions) instead of after_save/after_destroy to ensure data is fully persisted to the database before updating Sphinx. More details in #1204.

Fixes

  • Ensure Thinking Sphinx's ActiveRecord components are loaded by either Rails' after_initialize hook or ActiveSupport's on_load notification, because the order of these two events is not consistent.
  • Remove app/indices from eager_load_paths in Rails 4.2 and 5, to match the behaviour in 6.

Both of these fixes are evolutions/improvements to changes introduced in v5.2.0/5.2.1.

thinking-sphinx - v5.2.1

Published by pat about 3 years ago

Upgrading

No breaking or major changes.

Fixes

  • Ensure ActiveRecord components are loaded for rake tasks, but only after the Rails application has initialised. More details in #1199. A fix for a bug introduced in v5.2.0.
thinking-sphinx - v5.2.0

Published by pat over 3 years ago

Upgrading

No breaking or major changes.

New features

  • Confirmed support for Ruby 3.0.
  • Orphaned records in real-time indices can now be cleaned up without running rails ts:rebuild. Disabled by default, can be enabled by setting real_time_tidy to true per environment in config/thinking_sphinx.yml (and will need ts:rebuild to restructure indices upon initial deploy). More details in #1192.

Bug fixes

  • Avoid loading ActiveRecord during Rails initialisation so app configuration can still have an impact (@jdelStrother in #1194).
  • Remove app/indices (in both the Rails app and engines) from Rails' eager load paths, which was otherwise leading to indices being loaded more than once. (See #1191 and #1195).
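
As a rough illustration of the per-environment real_time_tidy setting mentioned above, the sketch below parses a hypothetical config/thinking_sphinx.yml fragment with Ruby's YAML library. The environment names and the tidy_enabled? helper are assumptions for illustration and are not part of Thinking Sphinx.

```ruby
require "yaml"

# Hypothetical contents of config/thinking_sphinx.yml.
CONFIG_YAML = <<~YAML
  production:
    real_time_tidy: true
  development:
    real_time_tidy: false
YAML

# Illustrative helper: reads the flag for one environment, defaulting to
# false because the feature is disabled unless explicitly enabled.
def tidy_enabled?(yaml, environment)
  settings = YAML.safe_load(yaml) || {}
  settings.fetch(environment, {}).fetch("real_time_tidy", false) == true
end

production_tidy = tidy_enabled?(CONFIG_YAML, "production")
test_tidy = tidy_enabled?(CONFIG_YAML, "test")
```

Environments that omit the key (test, in this sketch) keep the default of not tidying.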
thinking-sphinx - v5.1.0

Published by pat almost 4 years ago

Upgrading

No breaking or major changes.

New features

  • Support for Sphinx v3.3 and Manticore v3.5.
  • Support for Rails 6.1 (via joiner v0.6.0).

Changes to behaviour

  • enable_star is no longer available as a configuration option, as it's been enabled by default in Sphinx since v2.2.2, and is no longer allowed in Sphinx v3.3.1.
  • All timestamp attributes are now considered plain integer values from Sphinx's perspective. Sphinx was already expecting integers, but since Sphinx v3.3.1 it doesn't recognise timestamps as a data type. There is no functional difference with this change - Thinking Sphinx was always converting times to their UNIX epoch integer values.
  • Allow configuration of the maximum statement length (@kalsan in #1179).
  • Respect :path values to navigate associations for Thinking Sphinx callbacks on SQL-backed indices. Discussed in #1182.
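
To make the timestamp change above concrete: the values Sphinx sees are UNIX epoch integers. Thinking Sphinx performs this conversion itself, so the plain-Ruby sketch below only shows what the resulting values look like. The created_at attribute name is hypothetical.

```ruby
# Times are represented as UNIX epoch integers (seconds since 1970-01-01 UTC).
start_of_2020 = Time.utc(2020, 1, 1)
epoch = start_of_2020.to_i

# An integer range covering January 2020, of the kind that could be compared
# against a hypothetical created_at attribute.
january = (Time.utc(2020, 1, 1).to_i..Time.utc(2020, 2, 1).to_i)

mid_january = Time.utc(2020, 1, 15).to_i
in_range = january.cover?(mid_january)
```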

Bug fixes

  • Don't attempt to update delta flags on frozen model instances.
thinking-sphinx - v5.0.0

Published by pat about 4 years ago

Major Features and Breaking Changes

Thinking Sphinx v5.0 has one significant change - explicit callbacks - plus drops support for old versions of Rails/Ruby/Sphinx, and adds a few other smaller improvements.

Explicit Callbacks

Previous versions of Thinking Sphinx automatically added callbacks to all ActiveRecord models, for the purpose of persisting changes back to Sphinx (whether that be inserts, updates, or deletions). And while the overhead for non-indexed models wasn't large, it was still far from ideal.

So now, you need to add callbacks yourself, to just the models you're indexing.

With SQL-backed models (defined using :with => :active_record), you'll very likely want to add one of the two following lines inside your model:

class Article < ApplicationRecord
  # If you're not using delta indices:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql])

  # If you *are* using delta indices:
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :deltas])
end

If you're using real-time indices, you very likely already have callbacks defined in your models, but you can replace them with the new calls:

class Article < ApplicationRecord
  # Instead of this...
  after_save ThinkingSphinx::RealTime.callback_for(:article)
  # use this...
  ThinkingSphinx::Callbacks.append(self, :behaviours => [:real_time])
end

For associated models which still fire real-time callbacks, you can use the :path option with the same call:

class Comment < ApplicationRecord
  belongs_to :article

  ThinkingSphinx::Callbacks.append self,
    :behaviours => [:real_time],
    :path       => [:article]
end

And if you're using a custom block with your old real-time callback, you can pass that same block to the new approach as well:

class Article < ApplicationRecord
  ThinkingSphinx::Callbacks.append(
    self, :behaviours => [:real_time]
  ) do |instance|
    # Return an array of instances to index. You could add
    # custom logic here if you don't want indexing to happen
    # in some cases.
    [instance]
  end
end

At this point in time, the older callback style for real-time indices will continue to work, but it's still recommended to update your code to the new style instead.

On the off chance you are using SQL-backed indices and you have attribute_updates enabled in config/thinking_sphinx.yml, you'll want to specify that in your :behaviours option:

ThinkingSphinx::Callbacks.append(self, :behaviours => [:sql, :updates])

Sphinx 2.2.11 or newer is required

Sphinx 2.1 is no longer supported - and ideally, it's best to upgrade any 2.2.x release to 2.2.11.

Sphinx 3.x releases are supported, but there are known issues with indexing SQL-backed indices on a PostgreSQL database (real-time indices are fine though).

As part of this change, Sphinx's docinfo setting is no longer configured, so the skip_docinfo setting in config/thinking_sphinx.yml can be removed.

When it comes to Manticore as a drop-in replacement for Sphinx, we're testing against the latest 2.x and 3.x releases, which are currently 2.8.2 and 3.4.2 respectively.

Ruby 2.4 or newer is required

Versions of Ruby older than 2.4 are no longer supported, sorry. We're currently testing against 2.4 through to 2.7.

Rails 4.2 or newer is required

It's been a long time coming, but Rails 3.2 (and 4.0 and 4.1) are no longer supported. The current supported versions are 4.2 through to 6.0 (and 6.1 will likely work as well, once it's released).

Other changes to behaviour

  • Remove internal uses of send, replaced with public_send as that's available in all supported Ruby versions.
  • Custom index_set_class implementations can now expect the :instances option to be set alongside :classes, which is useful in cases to limit the indices returned if you're splitting index data for given classes/models into shards. (Introduced in PR #1171 after discussions with @lunaru in #1166.)
  • Deletion statements are simplified by avoiding the need to calculate document keys/offsets (@njakobsen via #1134).
  • Real-time data is deleted before replacing it, to avoid duplicate data when offsets change (@njakobsen via #1134).
  • Use reference_name as per custom index_set_class definitions. Previously, the class method was called on ThinkingSphinx::IndexSet even if a custom subclass was configured. (As per discussions with @kalsan in #1172.)
  • Fields and attributes can be overridden - whichever's defined last with a given name is the definition that's used. This is an edge case, but useful if you want to override any of the default fields/indices. (Requested by @kalsan in #1172.)
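
For readers unfamiliar with the send/public_send distinction in the first item above, here is a small plain-Ruby sketch. The Indexer class and its methods are hypothetical and exist only to show the difference in visibility handling.

```ruby
# Hypothetical class with one public and one private method.
class Indexer
  def reindex
    "reindexed"
  end

  private

  def drop_everything
    "dropped"
  end
end

indexer = Indexer.new

# public_send only dispatches to public methods.
public_result = indexer.public_send(:reindex)

# send bypasses visibility and will happily call private methods.
private_result = indexer.send(:drop_everything)

# public_send refuses to call the private method.
blocked = begin
  indexer.public_send(:drop_everything)
  false
rescue NoMethodError
  true
end
```

Preferring public_send for dynamic dispatch means an unexpected method name cannot reach private internals.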

Bug fixes

None.

thinking-sphinx - v4.4.1

Published by pat about 5 years ago

Upgrading

No breaking or major changes.

Changes to behaviour

  • Automatically remove app/indices from Zeitwerk's autoload paths in Rails 6.0 onwards (if using Zeitwerk as the autoloader).
thinking-sphinx - v4.4.0

Published by pat about 5 years ago

Upgrading

No breaking or major changes.

New features

  • Confirmed Rails 6.0 support.
  • Added ability to have custom real-time index processors (which handle all indices) and populators (which handle a particular index). These are available to get/set via ThinkingSphinx::RealTime.processor and ThinkingSphinx::RealTime.populator.

The processor should respond to call, accepting two arguments: an array of index objects, and a block to invoke after each index is processed. Here is a simple example for parallel processing of indices:

# Add the 'parallel' gem to your Gemfile.
ThinkingSphinx::RealTime.processor = Proc.new do |indices, &block|
  Parallel.map(indices) do |index|
    puts "Populating index #{index.name}"
    ThinkingSphinx::RealTime.populator.populate index
    puts "Populated index #{index.name}"

    block.call
  end
end

And the populator should respond to populate, accepting a single argument which is the index object. Here is a simple example for parallel processing.

# Add the 'parallel' gem to your Gemfile.
class ParallelPopulator
  def self.populate(index)
    new(index).call
  end

  def initialize(index)
    @index = index
  end

  def call
    Parallel.each(index.scope.find_in_batches) do |instances|
      transcriber.copy *instances
      true # Don't emit any large object because results are accumulated
    end
    ActiveRecord::Base.connection.reconnect!
  end

  private

  attr_reader :index

  def transcriber
    @transcriber ||= ThinkingSphinx::RealTime::Transcriber.new index
  end
end

ThinkingSphinx::RealTime.populator = ParallelPopulator

Instead of building your own procs/classes from scratch, you may instead wish to subclass the default classes to tweak behaviour - or at the very least, both classes are useful as reference points for your own replacements.

These changes were influenced by discussions in #1134 with @njakobsen about parallel processing of real-time indices.

Changes to behaviour

  • Improve failure message when tables don't exist for models associated with Sphinx indices (Kiril Mitov in #1139).

Bug fixes

  • Injected has-many/habtm collection search calls as default extensions to associations in Rails 5+, as it's a more reliable approach in Rails 6.0.0.
thinking-sphinx - v4.3.2

Published by pat about 5 years ago

Upgrading

No breaking or behaviour changes.

Bug fixes

  • Reverted loading change behaviour from v4.3.1 for Rails v5 (Eduardo J. in #1138).
thinking-sphinx - v4.3.1

Published by pat about 5 years ago

Upgrading

No breaking or behaviour changes.

Bug fixes

  • Fixed loading of index files to work with Rails 6 and Zeitwerk (#1137).
thinking-sphinx - v4.3.0

Published by pat over 5 years ago

Upgrading

No breaking or major changes.

New features

  • Allow overriding of Sphinx's running state by setting skip_running_check to true/false in config/thinking_sphinx.yml for appropriate environments. This is useful when Sphinx commands are interacting with a remote Sphinx daemon. As per discussions in #1131.
  • Allow skipping of directory creation by setting skip_directory_creation to true/false in config/thinking_sphinx.yml for appropriate environments. As per discussions in #1131.

Bug fixes

  • Use ActiveSupport's lock monitor where possible (Rails 5.1.5 onwards) to avoid database deadlocks. Essential investigation by Jonathan del Strother (#1132).
  • Allow facet searching on distributed indices (#1135).
thinking-sphinx - v4.2.0

Published by pat over 5 years ago

Upgrading

No breaking or major changes.

New features

  • Allow changing the default encoding for MySQL database connections from utf8 to something else via the mysql_encoding setting in config/thinking_sphinx.yml. In the next significant release, the default will change to utf8mb4 (which is supported in MySQL 5.5.3 and newer).
  • Added Rails 6.0 and Manticore 2.8 to the test matrix.

Changes to behaviour

  • Use Arel's SQL literals for generated order clauses, to avoid warnings from Rails 6.

Bug fixes

  • Fix usage of alternative primary keys in update and deletion callbacks and attribute access.
  • Ensure respond_to? takes Sphinx scopes into account (Jonathan del Strother in #1124).
  • Add :excerpts as a known option for search requests.
  • Fix depolymorphed association join construction with Rails 6.0.0.beta2.
  • Reset ThinkingSphinx::Configuration's cached values when Rails reloads, to avoid holding onto stale references to ActiveRecord models (#1125).
  • Don't join against associations in sql_query if they're only used by query-sourced properties (Hans de Graaff in #1127).
thinking-sphinx - v4.1.0

Published by pat almost 6 years ago

Upgrading

No breaking or major changes, though Ruby 2.2 is now no longer officially supported - but this release will almost certainly still work on it.

New features

  • The :sql search option can now accept per-model settings with model names as keys. e.g. ThinkingSphinx.search "foo", :sql => {'Article' => {:include => :user}} (Sergey Malykh in #1120).

Changes to behaviour

  • Drop MRI 2.2 from the test matrix, and thus no longer officially supported (though the code will likely continue to work with 2.2 for a while).
  • Added MRI 2.6, Sphinx 3.1 and Manticore 2.7 to the test matrix.

Bug fixes

  • Real-time indices now work with non-default integer primary keys (alongside UUIDs or other non-integer primary keys).
thinking-sphinx - v4.0.0

Published by pat over 6 years ago

Major Features and Changes

Thinking Sphinx v4.0 has been in development for a little while, and includes some significant changes (as befitting a major release):

Merging Indices

Merging indices is now supported via the new ts:merge rake task. This is useful when you're using delta indices as an alternative to running ts:index regularly to have new/changed records populated into the core indices. Merging should be reliably faster (and it avoids hitting your database to reprocess all the records).

Running ts:index every now and then to catch any records changed/modified without callbacks is probably wise (perhaps once a day compared to more frequent ts:merge calls).

Run the daemon on a UNIX socket

If you've got Sphinx and your Rails app all on a single machine, you may want to have the daemon (searchd) hosting connections via a UNIX socket instead of a TCP socket. Just set the socket value in each appropriate environment within config/thinking_sphinx.yml (and do not set mysql41 unless you want the daemon to also be available via that TCP port).

production:
  socket: /var/tmp/production.sphinx

This feature is limited to MRI, as JRuby doesn't seem to have a way to connect to UNIX sockets for MySQL-protocol connections.

ActiveRecord 5.2 Support

The new release of ActiveRecord/Rails is happily supported by this release.

Manticore Support

The recent fork of Sphinx known as Manticore is supported, and can be used as a drop-in replacement for Sphinx. In particular, the v2.6.3 release is included in the test matrix.

Breaking Changes

Sphinx 2.1.2 or newer is required

Sphinx 2.0 is no longer supported - make sure you're running at least 2.1.2, but 2.2.11 is recommended if possible.

Ruby 2.2 or newer is required

Versions of Ruby less than 2.2 are no longer supported, sorry.

Removed auto-typing of search filter values

If you're filtering via human-entered values (say, via request parameters), then in the past you were allowed to send those string values straight through to Sphinx.

However, Sphinx now supports string filtering, so it's not possible to make assumptions about filter types. Thinking Sphinx v3.4.0 introduced automatic typing of these values, but this was an extra overhead which was far from ideal and was always flagged as temporary.

So, please cast your filter values where appropriate:

Model.search :with => {:foo_id => params[:foo_id]}
# should become:
Model.search :with => {:foo_id => params[:foo_id].to_i}
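
One caveat with a bare to_i is that non-numeric strings silently become 0, which may match real records. The sketch below shows a stricter cast in plain Ruby. The integer_filter helper and the parameter names are illustrative assumptions and are not part of Thinking Sphinx.

```ruby
# Illustrative helper: casts a request parameter to an Integer, returning
# nil when the value isn't a valid base-10 number.
def integer_filter(value)
  Integer(value.to_s, 10)
rescue ArgumentError
  nil
end

# Simulated request parameters, which always arrive as strings.
params = { "foo_id" => "42", "bar_id" => "oops" }

# Only include filters whose values cast cleanly.
filters = {}
foo_id = integer_filter(params["foo_id"])
bar_id = integer_filter(params["bar_id"])
filters[:foo_id] = foo_id unless foo_id.nil?
filters[:bar_id] = bar_id unless bar_id.nil?
```

The resulting filters hash could then be passed as the :with option, leaving out anything that wasn't a valid integer.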

Minor features

  • If you want to remove the docinfo setting from the generated Sphinx configuration (to avoid warnings in Sphinx 2.2+), add skip_docinfo: true to each appropriate environment in config/thinking_sphinx.yml.
  • Sphinx 3.0 is now supported.
  • You can now use relative paths in config/thinking_sphinx.yml, but you must also add absolute_paths: true to each environment for them to be converted to absolute paths for the generated configuration.
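
To illustrate what converting a relative path to an absolute one involves, here is a plain-Ruby sketch using File.expand_path. The application root and the directory names are assumptions for illustration; the notes above do not state which base directory Thinking Sphinx resolves against.

```ruby
# Assumed application root; in a Rails app this would typically be Rails.root.
app_root = "/srv/my_app"

# A relative path as it might appear in config/thinking_sphinx.yml.
relative = "db/sphinx/production"
absolute = File.expand_path(relative, app_root)

# Paths that are already absolute are left as they are.
unchanged = File.expand_path("/var/sphinx/production", app_root)
```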

Changes to behaviour

  • The INDEX_FILTER environment variable is applied when running ts:index on SQL-backed indices.
  • Useful error messages are now displayed if processing real-time indices is attempted when the daemon isn't running.
  • Rake task code has been refactored into separate command classes under the hood (which allows for flying-sphinx to override when appropriate).
  • Added frozen_string_literal: true pragma comments for safe frozen string literals.
  • Exceptions are logged when processing real-time indices without halting the processing.
  • Update polymorphic properties to support Rails 5.2.
  • Allow configuration of the index guard approach (e.g. ThinkingSphinx::Configuration.instance.guarding_strategy = ThinkingSphinx::Guard::None).
  • Output a warning if guard files exist when calling ts:index.
  • Delete index guard files as part of ts:rebuild and ts:clear.

Bug fixes

  • Don't attempt to interpret indices for models that don't have a database table.
  • Handle situations where no exit code is provided for Sphinx binary calls.
thinking-sphinx - v3.4.2

Published by pat about 7 years ago

Upgrading

No breaking or major changes, just three small fixes and a couple of minor changes.

Changes to behaviour

  • Allow use of deletion callbacks for rollback events.
  • Remove extra deletion code in the Populator - it's also being done by the real-time rake interface.

Bug fixes

  • Real-time callback syntax for namespaced models accepts a string (as was already documented).
  • Fix up logged warnings (and avoid overwriting the existing warn method).
  • Add missing search options to known values to avoid incorrect warnings.
thinking-sphinx - v3.4.1

Published by pat about 7 years ago

Upgrading

No breaking or major changes, just two small fixes.

Changes to behaviour

  • Errors with "Lost connection to MySQL server" are now treated as connection errors (Manuel Schnitzer).

Bug fixes

  • Index normalisation will now work even when index model tables don't exist.
thinking-sphinx - v3.4.0

Published by pat about 7 years ago

Upgrading

There are a few significant changes in this release. There's nothing that's going to break your code, but there are some deprecations (and thus, there will be breaking changes in later releases), so reading through is highly recommended.

Basic type checking for attribute filters.

Given Riddle now quotes string values in filters (because Sphinx now supports filtering on string attributes), we need to be a little more careful about attribute filter values coming in through params. In the past, Riddle would presume any string value was not actually a string, and that's no longer a safe presumption.

As of this release, Thinking Sphinx will do its best to cast your filter values to their appropriate types, but it's not going to be perfect, and this will be removed in a future release. So, best to do the casting yourself:

Model.search :with => {:foo_id => params[:foo_id]}
# should become:
Model.search :with => {:foo_id => params[:foo_id].to_i}

This is likely going to crop up any time you're using params data in filters, because they'll always be strings.

If you're confident that you're casting all filter values to their appropriate types, you can remove the search middleware that's attempting to auto-cast (and thus, get a bit of a speed boost) by putting the following in an initialiser:

ThinkingSphinx::Middlewares::DEFAULT.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)
ThinkingSphinx::Middlewares::RAW_ONLY.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)
ThinkingSphinx::Middlewares::IDS_ONLY.delete(
  ThinkingSphinx::Middlewares::AttributeTyper
)

Warnings for unknown options in search calls.

Thinking Sphinx will now output a warning to your logs when unexpected options are used in search queries.

If you’re adding your own middleware in or have something else that may allow for custom options, make sure you add them to ThinkingSphinx::Search.valid_options.

If you don’t want this behaviour to occur, you can remove the middleware from your stack by putting the following in an initialiser:

ThinkingSphinx::Middlewares::DEFAULT.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)
ThinkingSphinx::Middlewares::RAW_ONLY.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)
ThinkingSphinx::Middlewares::IDS_ONLY.delete(
  ThinkingSphinx::Middlewares::ValidOptions
)

Unified Rake Tasks

Rake tasks are now unified, so the original tasks will operate on real-time indices as well. What this means is that ts:generate and ts:regenerate can be changed to ts:index and ts:rebuild. All standard tasks will perform their appropriate behaviours on all indices.

If you wish to perform operations on specific types of indices, then there are now tasks available for that, including:

  • ts:sql:index (the old behaviour of ts:index)
  • ts:sql:rebuild (the old behaviour of ts:rebuild)
  • ts:rt:index (the old behaviour of ts:generate)
  • ts:rt:rebuild (the old behaviour of ts:regenerate)
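
If deploy scripts or scheduled jobs still reference the old real-time task names, a small lookup along these lines captures the renames described above. This is purely illustrative Ruby, not part of Thinking Sphinx; the constant and method names are made up for the example.

```ruby
# Old real-time task names and the unified tasks that replace them.
UNIFIED_TASKS = {
  "ts:generate"   => "ts:index",
  "ts:regenerate" => "ts:rebuild"
}.freeze

# Returns the task name to use from v3.4.0 onwards; any other task name
# is passed through unchanged.
def unified_task(name)
  UNIFIED_TASKS.fetch(name, name)
end

renamed = unified_task("ts:regenerate")
untouched = unified_task("ts:start")
```

Note that the unified tasks act on all indices, so the type-specific tasks listed above are the closer match when only real-time indices should be processed.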

Minor Features

  • Automatically use UTF8 in Sphinx for encodings that are extensions of UTF8 (such as utf8mb4).
  • Allow generation of a single real-time index (Tim Brown) with the INDEX_FILTER environment variable.

Changes to behaviour

  • Handle non-computable queries as parse errors.
  • Don't search multi-table inheritance ancestors.
  • Set a default Sphinx connection timeout of 5 seconds.
  • Use saved_changes if it's available (in Rails 5.1+).
  • Add support for Ruby's frozen string literals feature.
  • Display SphinxQL deletion statements in the log.
  • Allow for unsaved records when calculating document ids (and return nil).
  • Delta callback logic now prioritises checking for high level settings rather than model changes.

Bug Fixes

  • Ensure ts:index now respects rake silent/quiet flags.
  • Use the base class of STI models for polymorphic join generation (via Andrés Cirugeda).
  • Fix multi-field conditions.
  • Fix handling of attached starts of Sphinx (via Henne Vogelsang).
  • Get bigint primary keys working in Rails 5.1.
  • Always close the SphinxQL connection if Innertube's asking (via @cmaion).
  • Fix long SphinxQL query handling in JRuby.
  • Fix Sphinx connections in JRuby.
  • Index normalisation now occurs consistently, and removes unnecessary sphinx_internal_class_name fields from real-time indices.