📖 Open-source platform that aggregates book reviews, ratings and brochures. Written in React + TypeScript + NestJS + Redis + ElasticSearch
GPL-3.0 License
Real-world open-source book review aggregator, something like Metacritic / Digg for books. It lets you compare book prices across different shops.
🇵🇱 Poland
To be added soon:
🇵🇱 Poland
🌍 World
cp .env.example .env # edit .env config
yarn install
yarn run migration:run
yarn run seed:run
gulp entity:reindex:all
[yarn run console]:
await app.select(ScrapperModule).get('BookParentCategoryService').findAndAssignMissingParentCategories();
await app.select(ScrapperModule).get('BookCategoryRankingService').refreshCategoryRanking();
await app.select(ScrapperModule).get('BookStatsService').refreshAllBooksStats();
[/console]
yarn run develop
gulp scrapper:refresh
Proxy local 9201 to remote ES
ssh -g -L 9201:localhost:9200 -f -N [email protected]
A NestJS context is present on window; it is called app. All entities are exported to the context.
yarn console
⚠️ Use services to remove records! (TypeORM async callbacks are buggy)
Remove book:
app.select(ScrapperModule).get('BookService').delete([13])
Reindex all records of a particular type (for example, after an index structure change):
app.select(ScrapperModule).get('EsBookIndex').reindexAllEntities();
Sitemap:
gulp sitemap:refresh
Fetchers:
# Reindex all records
gulp entity:reindex:all
# Fetches single review by id
gulp scrapper:refresh:single --kind BOOK_REVIEW --remoteId 123 --website wykop.pl
# Fetches single book by url
gulp scrapper:refresh:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl
# Fetches all reviews from scrapper
gulp scrapper:refresh:all --kind BOOK_REVIEW --website wykop.pl
# Refreshes only first remote reviews page using all scrappers
gulp scrapper:refresh:latest --kind BOOK_REVIEW
gulp scrapper:refresh:latest --kind BOOK_REVIEW --website wykop.pl
# Fetches all reviews pages from websites using all scrappers
gulp scrapper:refresh:all --kind BOOK_REVIEW
# Fetches missing favicons
gulp entity:website:fetch-missing-logos
# Refreshes promotion value in categories
gulp entity:category:refresh-ranking
# After adding a new scrapper, fetch availability for books
gulp scrapper:loader:fetch-availability --scrapperGroupId=26
Analyzers:
# Iterates over all records and reparses them. Dangerous!
# It removes records that are no longer classified as reviews after analysis
gulp scrapper:reanalyze:all --kind BOOK_REVIEW
# Parses again single record
gulp scrapper:reanalyze:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl
Stats (console):
app.select(BookModule).get('BookStatsService').refreshBooksStats(R.pluck('id', books))
Spiders:
gulp scrapper:spider:run
Scrappers:
Refresh all books from all websites:
node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --initialPage 1 --website wykop.pl
node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --website hrosskar.blogspot.com
To prevent Redis from being cleared during warmup, keep this lock file present (used for long tasks):
dist/locks/redis_warmup_flushdb.lock
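The lock-file convention above can be sketched roughly as follows. This is a minimal illustration, not the project's actual warmup code: the helper names shouldFlushRedisOnWarmup and holdWarmupLock are hypothetical, and the demo uses a temporary directory instead of the real dist/locks/ folder.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: warmup may flush Redis only when no lock file exists.
function shouldFlushRedisOnWarmup(lockPath: string): boolean {
  return !fs.existsSync(lockPath);
}

// Hypothetical helper: creates the lock file and returns a function that removes it.
function holdWarmupLock(lockPath: string): () => void {
  fs.mkdirSync(path.dirname(lockPath), { recursive: true });
  fs.writeFileSync(lockPath, String(process.pid));
  return () => fs.rmSync(lockPath, { force: true });
}

// Demo against a temporary directory so the sketch does not touch a real dist/ folder.
const demoLock = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), "warmup-")),
  "locks",
  "redis_warmup_flushdb.lock",
);
const release = holdWarmupLock(demoLock);
const flushWhileLocked = shouldFlushRedisOnWarmup(demoLock); // false
release();
const flushAfterRelease = shouldFlushRedisOnWarmup(demoLock); // true
```

A long-running task would hold the lock for its whole duration and release it in a finally block, so a crash does not leave the flush disabled.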
Running scrapper tasks such as refreshLatest or refreshSingle triggers fetching new records into the scrapper_metadata table. All of these functions are stored in ServiceModule -> ScrapperService. After successfully fetching a page of scraped content, ScrapperService creates a new background job, stored in Redis, that runs the database and book matchers. Each job is later executed, and MetadataDbLoaderService tries to match the book in the database and saves it.
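The flow described above (fetch a page, store metadata, enqueue a job, match books later) might look roughly like this. Every class here is an illustrative stand-in: the in-memory queue replaces the Redis-backed job queue, the arrays and maps replace database tables, and the title-based matching rule is an assumption made only for the sketch.

```typescript
type ScrapedRecord = { remoteId: string; website: string; title: string };
type Book = { id: number; title: string };
type Job = { records: ScrapedRecord[] };

// Stand-in for the Redis-backed background job queue.
class InMemoryJobQueue {
  private jobs: Job[] = [];

  push(job: Job): void {
    this.jobs.push(job);
  }

  // Runs every queued job in insertion order.
  drain(handler: (job: Job) => void): void {
    let job: Job | undefined;
    while ((job = this.jobs.shift()) !== undefined) handler(job);
  }
}

// Stand-in for ScrapperService: stores fetched records, then enqueues a job.
class SketchScrapperService {
  readonly metadata: ScrapedRecord[] = []; // plays the role of scrapper_metadata
  private readonly queue: InMemoryJobQueue;

  constructor(queue: InMemoryJobQueue) {
    this.queue = queue;
  }

  refreshLatest(page: ScrapedRecord[]): void {
    this.metadata.push(...page);
    this.queue.push({ records: page });
  }
}

// Stand-in for MetadataDbLoaderService: matches by normalized title (assumed rule).
class SketchMetadataDbLoader {
  readonly books = new Map<string, Book>();
  private nextId = 1;

  load(job: Job): void {
    for (const record of job.records) {
      const key = record.title.trim().toLowerCase();
      if (!this.books.has(key)) {
        this.books.set(key, { id: this.nextId++, title: record.title.trim() });
      }
    }
  }
}

const queue = new InMemoryJobQueue();
const scrapper = new SketchScrapperService(queue);
const loader = new SketchMetadataDbLoader();

scrapper.refreshLatest([
  { remoteId: "1", website: "wykop.pl", title: "Szepty spoza nicości" },
  { remoteId: "2", website: "www.publio.pl", title: " szepty spoza nicości " },
]);
queue.drain((job) => loader.load(job));
```

Here both scraped records end up in the metadata store, but they resolve to a single book because the second title matches the first after normalization.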
Adding a new scrapper:
cd ./src/server/modules/importer/sites/
mkdir example-scrapper/
touch example-scrapper/ExampleScrapperGroup.ts
Then register the new group in the scrappersGroups variable inside ScrapperService.
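As a rough sketch of what a new group and its registration could look like: the SketchScrapperGroup interface, the fetchPage method and the findGroupByWebsite lookup below are hypothetical and do not reflect the project's real base classes, so treat this only as an outline and follow an existing group in the sites/ directory for the actual shape.

```typescript
type ScrapedReview = { remoteId: string; content: string };

// Hypothetical contract; the project's real scrapper group type will differ.
interface SketchScrapperGroup {
  readonly website: string;
  fetchPage(page: number): Promise<ScrapedReview[]>;
}

// Would live in example-scrapper/ExampleScrapperGroup.ts
class ExampleScrapperGroup implements SketchScrapperGroup {
  readonly website = "example.com";

  async fetchPage(page: number): Promise<ScrapedReview[]> {
    // A real implementation would download and parse the remote page here.
    return [{ remoteId: `example-${page}`, content: "placeholder review" }];
  }
}

// Stand-in for the scrappersGroups variable inside ScrapperService.
const scrappersGroups: SketchScrapperGroup[] = [new ExampleScrapperGroup()];

// Assumed lookup, mirroring how a --website argument could select a group.
function findGroupByWebsite(website: string): SketchScrapperGroup | undefined {
  return scrappersGroups.find((group) => group.website === website);
}
```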
Real World Nest.JS + TypeORM app.