Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
APACHE-2.0 License
Published by mnmkng over 4 years ago
session.checkStatus() has been renamed to session.retireOnBlockedStatusCodes().
The Session API is no longer considered experimental.

Published by mnmkng over 4 years ago
APIFY_LOCAL_EMULATION_DIR env var is no longer supported (deprecated on 2018-09-11). Use APIFY_LOCAL_STORAGE_DIR instead.
SessionPool API updates and fixes. The API is no longer considered experimental.
… require time to Apify.main() invocation.
Native RegExp is now used instead of xregexp for Unicode property escapes.

Published by mnmkng almost 5 years ago
Fixed SessionPool not automatically working in CheerioCrawler.
… PuppeteerPool.

Published by petrpatek almost 5 years ago
CheerioCrawler ignores SSL errors by default (options.ignoreSslErrors: true).
Added SessionPool implementation to CheerioCrawler.
Added SessionPool implementation to PuppeteerPool and PuppeteerCrawler.
Fixed Request constructor not making a copy of objects such as userData and headers.
Fixed the desc option not being applied in local dataset.getData().

Published by mnmkng almost 5 years ago
Apify.callTask() body and contentType options are now deprecated. Use input instead. It must be of content type application/json.
Added SessionPool implementation to BasicCrawler.
… Apify.call() and Apify.callTask().
… Puppeteer.
Added country option to Apify.getApifyProxyUrl().
Added Apify.utils.puppeteer.saveSnapshot() helper to quickly save the HTML and a screenshot of a page.
… got supported options to requestOptions in CheerioCrawler.
… cookieJar again.
… pipe errors.
… CheerioCrawler.
… CheerioCrawler.
… Apify.utils.requestAsBrowser().
… RequestQueueLocal.
… RequestList persistence of downloaded sources in the key-value store.
… Apify.utils.puppeteer.blockRequests() always including default patterns.
… Apify.utils.puppeteer.infiniteScroll() on some websites.
… (YOUTUBE_REGEX, YOUTUBE_REGEX_GLOBAL) to utils.social.
… json in handlePageFunction of CheerioCrawler.

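The json entry above concerns CheerioCrawler responses that are not HTML. The sketch below is a minimal, hypothetical illustration of how a parsed json value could be derived from the raw body and contentType a handlePageFunction receives; it is not the SDK's implementation. The helper name parseJsonBody and the assumption that contentType is an object shaped like { type, encoding } are illustrative only.

```javascript
// Hypothetical helper, NOT part of the Apify SDK: shows how a parsed `json`
// value could be derived from the raw `body` and `contentType` that a
// handlePageFunction-style callback receives.
// Assumption: contentType looks like { type: 'application/json', encoding: 'utf-8' }.
function parseJsonBody(body, contentType) {
    // Only parse when the response declares a JSON media type.
    if (!contentType || contentType.type !== 'application/json') {
        return null;
    }
    // The body may arrive as a Buffer or as an already-decoded string.
    const text = Buffer.isBuffer(body)
        ? body.toString(contentType.encoding || 'utf8')
        : String(body);
    return JSON.parse(text);
}

// Hypothetical usage inside a handlePageFunction-style callback.
async function handlePageFunction({ body, contentType }) {
    const json = parseJsonBody(body, contentType);
    if (json !== null) {
        console.log('Parsed JSON keys:', Object.keys(json));
    }
}
```

Returning null for non-JSON responses keeps HTML pages on the usual parsing path; a production crawler would also need to decide how to handle malformed JSON, which this sketch simply lets JSON.parse throw on.
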
Published by drobnikj almost 5 years ago
Added useIncognitoPages option to PuppeteerPool to enable opening new pages in incognito …
… body and contentType in handlePageFunction for this purpose.
The html option in handlePageFunction was replaced with the body option.

Published by mnmkng about 5 years ago
Updated @apify/http-request to version 1.1.2.
Updated CheerioCrawler to use requestAsBrowser() to better disguise itself as a real browser.

Published by mnmkng about 5 years ago

Published by mnmkng about 5 years ago
dataset.delete(), keyValueStore.delete() and requestQueue.delete() methods have been deprecated in favor of *.drop() methods, because the drop name more clearly communicates that those methods drop (delete) the storage itself, not individual elements in the storage.
Added Apify.utils.requestAsBrowser() helper function that enables you to make HTTP(S) requests disguised as a browser (Firefox). This may help in overcoming certain anti-scraping and anti-bot protections.
Added options.gotoTimeoutSecs to PuppeteerCrawler to make setting navigation timeouts easier.
PuppeteerPool options that were deprecated from the PuppeteerCrawler constructor were finally removed. Please use maxOpenPagesPerInstance, retireInstanceAfterRequestCount, instanceKillerIntervalSecs, killInstanceAfterSecs and proxyUrls via the puppeteerPoolOptions object.
… apify package version.
Apify.utils.puppeteer.enqueueLinksByClickingElements() will now print a warning when the nodes it …

Published by mnmkng about 5 years ago
Apify.launchPuppeteer() now accepts proxyUrl with the https, socks4 and socks5 schemes, as long as it doesn't contain a username or password.
Added desiredConcurrency option to AutoscaledPool constructor, removed …

Published by mnmkng over 5 years ago

Published by mnmkng over 5 years ago
Dataset.getData() throws an error if the user provides an unsupported option.
options.userData of Apify.utils.enqueueLinks() is deprecated. Use options.transformRequestFunction instead.
… Apify.call().
Added Apify.utils.puppeteer.enqueueLinksByClickingElements() function which enables you …
Added Apify.utils.puppeteer.infiniteScroll() function which helps you with scrolling to the bottom …
RequestQueue.handledCount() function has been resurrected from deprecation, … RequestList.
Added useExtendedUniqueKey option to the Request constructor to include method and payload in the Request's computed uniqueKey.
Updated apify-client to 0.5.22.
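
The useExtendedUniqueKey entry above means that deduplication can take more than the URL into account. The sketch below illustrates the idea under stated assumptions: the function name computeExtendedUniqueKey, the METHOD(hash):url key format, the 8-character SHA-256 prefix, and the minimal URL normalization are all hypothetical, and the SDK's actual uniqueKey format may differ.

```javascript
const crypto = require('crypto');

// Hypothetical sketch, NOT the Apify SDK's implementation: builds a
// deduplication key that covers the HTTP method and the payload as well as
// the URL, similar in spirit to the useExtendedUniqueKey option.
function computeExtendedUniqueKey({ url, method = 'GET', payload = '' }) {
    // Minimal normalization: new URL() lowercases the scheme and host,
    // and we drop the #fragment so it does not affect the key.
    const parsed = new URL(url);
    parsed.hash = '';
    const normalizedUrl = parsed.toString();

    // Hash the payload so large request bodies still yield a short key.
    const payloadHash = crypto
        .createHash('sha256')
        .update(payload)
        .digest('base64')
        .slice(0, 8);

    return `${method.toUpperCase()}(${payloadHash}):${normalizedUrl}`;
}

// Usage: the same URL requested with different methods/payloads gets different keys.
const getKey = computeExtendedUniqueKey({ url: 'https://example.com/api' });
const postKey = computeExtendedUniqueKey({
    url: 'https://example.com/api',
    method: 'POST',
    payload: JSON.stringify({ query: 'crawlee' }),
});
console.log(getKey !== postKey); // true
```

With a URL-only key, the GET and the POST above would collapse into a single entry in a request queue; including the method and a payload hash keeps them apart while still treating exact repeats as duplicates.
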