Transform online stores into APIs!
LGPL-3.0 License
A Ruby gem that simplifies the declaration of APIs on online stores through scraping.
Once upon a time, I wanted to create online groceries with a great user experience! That's how I started mes-courses.fr. Unfortunately, most online groceries don't have APIs, so I resorted to scraping. Scraping comes with its (long) list of problems as well!
Refactoring after refactoring, I extracted this library, which defines scrapers for any online store in a straightforward way (check auchandirect-scrAPI for my real-world usage). A scraper definition consists of:
As a result of using storexplore for mes-courses, the scraping code was split between the storexplore gem and my specific scraper definition:
Add this line to your application's Gemfile:
gem 'storexplore'
And then execute:
$ bundle
Or install it yourself as:
$ gem install storexplore
To enumerate all items of a store in constant memory, Storexplore requires Ruby 2.0 for its lazy enumerators.
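To illustrate why lazy enumerators matter here, the following is a minimal sketch in plain Ruby. It is not Storexplore's actual implementation: `Node`, `VISITED` and `each_item` are hypothetical names, and the in-memory tree stands in for pages that a real scraper would fetch on demand.

```ruby
# Hypothetical stand-in for a store category: a name, child categories and items.
Node = Struct.new(:name, :categories, :items)

# Records which nodes were actually visited, to make the laziness observable.
VISITED = []

# Lazily walks the tree depth-first, yielding one item at a time.
def each_item(node)
  Enumerator.new do |yielder|
    VISITED << node.name
    node.items.each { |item| yielder << item }
    node.categories.each do |child|
      each_item(child).each { |item| yielder << item }
    end
  end.lazy
end

leaf_a = Node.new('Fabric Sofas', [], ['Sofa 1', 'Sofa 2'])
leaf_b = Node.new('Armchairs', [], ['Armchair 1'])
store  = Node.new('Store', [Node.new('Living room', [leaf_a, leaf_b], [])], [])

# Only the first two items are pulled; the 'Armchairs' node is never visited.
first_two = each_item(store).first(2)
puts first_two.inspect
puts VISITED.inspect
```

Because iteration stops as soon as enough items have been produced, the walk never needs to hold the whole store in memory.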
Online stores are typically organized as hierarchies. For example, Ikea (US) is organized as follows:
Ikea
|-> Living room
| |-> Sofas & armchairs
| | |-> Fabric Sofas
| | | |-> Norsborg Sofa
| | | |-> Norsborg Loveseat
| | | |-> ...
| | | |-> Pöang Footstool cushion
| | |-> Leather Sofas
| | |-> ...
| | |-> Armchairs
| |-> TV & media furniture
| |-> ...
| |-> Living room textiles & rugs
|-> Bedroom
|-> ...
|-> Dining
Storexplore builds hierarchical APIs on the following pattern :
Store
|-> Category 1
| |-> Sub Category 1
| | |-> Item 1
| | |-> ...
| | |-> Item n
| |-> Sub Category 2
| |-> ...
| |-> Sub Category n
|-> Category 2
|-> ...
|-> Category n
The store is like a root category. Any level of depth is allowed. Any category, at any depth, can have both child categories and items. Items cannot have children of any kind. Both categories and items can have attributes.
All searching of children and attributes is done through mechanize/nokogiri selectors (css or xpath).
Here is a sample store API declaration, again for Ikea:
Storexplore::Api.define 'ikea.com/us' do
  categories '.departmentLinkBlock a' do
    attributes do
      { :name => page.get_one("#breadCrumbNew .activeLink a").content.strip }
    end
    categories '.departmentLinks a' do
      attributes do
        { :name => page.get_one("#breadCrumbNew .activeLink a").content.strip }
      end
      categories 'a.categoryName' do
        attributes do
          { :name => page.get_one("#breadCrumbNew .activeLink a").content.strip }
        end
        items '.productDetails > a' do
          attributes do
            {
              :name => page.get_one('#name').content.strip,
              :type => page.get_one('#type').content.strip,
              :price => page.get_one('#price1').content.strip.sub('$','').to_f,
              :salesArgs => page.get_one('#salesArg').content.strip,
              :image => page.get_one('#productImg').attributes['src'].content,
              :ikea_id => page.uri.to_s.match("^.*\/([0-9]+)\/?$").captures.first
            }
          end
        end
      end
    end
  end
end
This defines a hierarchical API on the IKEA store that will be used to browse any store whose URI contains ikea.com/us.
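The matching rule described above can be pictured with a small sketch. This is not Storexplore's internal code: `REGISTRY`, `define_api` and `find_api` are hypothetical names, and plain substring matching is an assumption drawn from the sentence above.

```ruby
# Hypothetical registry mapping a URI fragment to an API definition.
REGISTRY = {}

# Registers a definition under the URI fragment it should match.
def define_api(uri_fragment, definition)
  REGISTRY[uri_fragment] = definition
end

# Returns the first definition whose fragment appears in the given URI, or nil.
def find_api(uri)
  match = REGISTRY.find { |fragment, _definition| uri.include?(fragment) }
  match && match.last
end

define_api('ikea.com/us', :ikea_us_definition)

puts find_api('http://www.ikea.com/us/en').inspect
puts find_api('http://www.example.com/').inspect
```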
Now here is how this API can be accessed to pretty print all its content:
Storexplore::Api.browse('http://www.ikea.com/us/en').categories.each do |category|
  puts "category: #{category.title.strip}"
  puts "attributes: #{category.attributes}"
  category.categories.each do |sub_category|
    puts " category: #{sub_category.title.strip}"
    puts " attributes: #{sub_category.attributes}"
    sub_category.categories.each do |sub_sub_category|
      puts " category: #{sub_sub_category.title.strip}"
      puts " attributes: #{sub_sub_category.attributes}"
      sub_sub_category.items.each do |item|
        puts " item: #{item.title.strip}"
        puts " attributes: #{item.attributes}"
      end
    end
  end
end
(This sample can be found in samples/ikea.rb)
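Since any depth is allowed, the nested loops can also be written recursively. Here is a sketch that relies only on the `title`, `attributes`, `categories` and `items` readers used in the sample above. `describe_store` and `FakeNode` are hypothetical names, and `FakeNode` merely stands in for the objects returned by a real browse.

```ruby
# Hypothetical stand-in exposing the same readers as the objects in the sample above.
FakeNode = Struct.new(:title, :attributes, :categories, :items)

# Returns the lines describing a category and everything below it.
def describe_store(category, depth = 0)
  indent = '  ' * depth
  lines = ["#{indent}category: #{category.title.strip}",
           "#{indent}attributes: #{category.attributes}"]
  category.items.each do |item|
    lines << "#{indent}  item: #{item.title.strip}"
    lines << "#{indent}  attributes: #{item.attributes}"
  end
  category.categories.each do |sub_category|
    lines.concat(describe_store(sub_category, depth + 1))
  end
  lines
end

sofa  = FakeNode.new(' Norsborg Sofa ', { name: 'Norsborg' }, [], [])
sofas = FakeNode.new('Fabric Sofas', {}, [], [sofa])
root  = FakeNode.new('Living room', {}, [sofas], [])

puts describe_store(root)
```

With a real store, the same function could presumably be called on each top-level category returned by `Storexplore::Api.browse(...).categories`.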
NOTE : please keep in mind that these testing utilities have been extracted from my first real use case (auchandirect-scrAPI) and might still rely on assumptions coming from there. Any help cleaning this up is welcome.
Testing code that relies on a scraped online store can be quite a challenge. Storexplore can help you with that:
Dummy stores can be generated to the file system using the Storexplore::Testing::DummyStore and Storexplore::Testing::DummyStoreGenerator classes.
To use it, add the following to your spec_helper.rb, for example:
require 'storexplore/testing'
Storexplore::Testing.config do |config|
  config.dummy_store_generation_dir = File.join(Rails.root, '../tmp')
end
It is then possible to generate a store with the following :
DummyStore.wipe_out_store(store_name)
@store_generator = DummyStore.open(store_name)
@store_generator.generate(3).categories.and(3).categories.and(item_count).items
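To get a feel for how large such a generated store is, here is a rough arithmetic sketch. It assumes that each `and(...)` step nests one level inside the previous one, which is an interpretation of the chain above and not something confirmed here. `level_totals` is a hypothetical helper, and `item_count = 4` is an arbitrary example value.

```ruby
# Given the number of children created under each parent at every level,
# returns the total number of nodes at each level (assumed nesting semantics).
def level_totals(counts_per_parent)
  total = 1
  counts_per_parent.map { |count| total *= count }
end

item_count = 4 # arbitrary example value
totals = level_totals([3, 3, item_count])

puts "categories: #{totals[0]}, sub categories: #{totals[1]}, items: #{totals[2]}"
```

Under that assumption, the number of generated pages grows multiplicatively with each level, so small counts are enough for most specs.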
You can add custom elements with explicit values :
@store_generator.
  category(cat_name = "extra long category name").
  category(sub_cat_name = "extra long sub category name").
  item(item_name = "super extra long item name").generate().
  attributes(price: 12.3)
Storexplore provides an API definition for dummy stores in 'storexplore/testing/dummy_store_api'. It can be required independently if needed.
Storexplore also ships with an rspec shared examples macro. It checks that a scraper is basically well behaved, for example that it finds many categories and that items have names and prices.
require 'storexplore/testing'
describe "MyStoreApi" do
  include Storexplore::Testing::ApiSpecMacros
  it_should_behave_like_any_store_items_api
  ...
end
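As an illustration of the kind of check mentioned above (items having names and prices), here is a plain Ruby sketch. It does not reproduce the shipped shared examples: `item_problems` is a hypothetical helper, and the `:name` and `:price` keys are assumed to be the attribute names your own API definition produces.

```ruby
# Returns a list of human-readable problems found in scraped item attributes.
# Each element of `items` is assumed to be an attributes hash.
def item_problems(items)
  items.each_with_index.flat_map do |attributes, index|
    problems = []
    problems << "item #{index}: missing name" if attributes[:name].to_s.strip.empty?
    problems << "item #{index}: missing price" unless attributes[:price].is_a?(Numeric)
    problems
  end
end

sample_items = [
  { name: 'Norsborg Sofa', price: 499.0 },
  { name: ' ', price: nil }
]

puts item_problems(sample_items)
```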
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature