Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
APACHE-2.0 License
- `forefront` request fetching in RQv2 (#2689) (03951bd), closes #2669
- `forefront` option in `prolong-` and `deleteRequestLock` (#2690) (cba8da3), closes #2681 #2689 #2669
- `.isFinished()` before `RequestList` reads (#2695) (6fa170f)
- `UInt8Array` in `KVS.setValue()` (#2682) (8ef0e60)
- `errorHandler` for session errors (#2683) (7d72bcb), closes #2678
- `username` and `password` (#2696) (0f0fcc5)
- `ignoreHTTPSErrors` to `acceptInsecureCerts` to support v23 (#2684) (f3927e6)
- `forefront` option in `RequestQueue` (#2681) (b0527f9), closes #2669
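Several entries in this release concern the `forefront` option of `RequestQueue`. As a rough illustration of what the option controls, the toy queue below (the hypothetical `ToyRequestQueue`, not Crawlee's implementation, which also deals with locking and persistence) puts a request added with `forefront: true` at the head of the queue, so it is fetched before requests that were added earlier.

```typescript
// Toy, in-memory model of `forefront` ordering. This is NOT Crawlee's
// RequestQueue; it only illustrates the ordering semantics.
interface QueuedRequest {
  url: string;
}

interface AddOptions {
  // When true, the request goes to the head of the queue instead of the tail.
  forefront?: boolean;
}

class ToyRequestQueue {
  private readonly items: QueuedRequest[] = [];

  addRequest(request: QueuedRequest, options: AddOptions = {}): void {
    if (options.forefront) {
      this.items.unshift(request); // jump ahead of everything already queued
    } else {
      this.items.push(request); // normal FIFO order
    }
  }

  fetchNextRequest(): QueuedRequest | null {
    return this.items.shift() ?? null; // null once the queue is drained
  }
}

const queue = new ToyRequestQueue();
queue.addRequest({ url: "https://example.com/a" });
queue.addRequest({ url: "https://example.com/b" });
queue.addRequest({ url: "https://example.com/urgent" }, { forefront: true });

const order: string[] = [];
for (let r = queue.fetchNextRequest(); r !== null; r = queue.fetchNextRequest()) {
  order.push(r.url);
}
console.log(order.join(" -> "));
// https://example.com/urgent -> https://example.com/a -> https://example.com/b
```

With a single forefront request the order is unambiguous; how Crawlee orders several forefront requests relative to each other is outside the scope of this sketch.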
Published by apify-service-account about 2 months ago
- `inProgress` cache, rely solely on locked states (#2601) (57fcb08)
- `globs` & `regexps` for `SitemapRequestList` (#2631) (b5fd3a9)

This release pins the dependency on cheerio to the last RC version. We might postpone official support for v1 to the next major release, or at least wait for them to fix their stuff. Nice demonstration of how not to maintain popular open source projects 😞
Published by apify-service-account 3 months ago
Published by apify-service-account 4 months ago
- `@crawlee/browser` package (#2532) (3357c7f)
- `useState` in adaptive crawler (#2530) (7e195c1)
- `context.request.loadedUrl` and `id` as required inside the request handler (#2531) (2b54660)
Published by apify-service-account 4 months ago
- `waitForAllRequestsToBeAdded` option to `enqueueLinks` helper (925546b), closes #2318
- `useState` implementation into crawling context (eec4a71)
- `crawler.log` publicly accessible (#2526) (3e9e665)
- `launchOptions` on type level (0519d40), closes #1849
- `crawler.log` when creating child logger for `Statistics` (0a0d75d), closes #2412
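The `useState` entries refer to Crawlee's helper that gives request handlers a shared, mutable state object. The sketch below is a simplified, in-memory model under stated assumptions: `useStateSketch` and the `sharedState` key are hypothetical, and unlike Crawlee's helper it does not persist anything to a key-value store. It only shows the core idea that every call with the same key returns the same object.

```typescript
// Simplified, hypothetical model of a `useState`-style helper. Crawlee's real
// helper also persists the state; that part is deliberately omitted.
type State = Record<string, unknown>;

const stateCache = new Map<string, State>();

function useStateSketch<T extends State>(key: string, defaultValue: T): T {
  let state = stateCache.get(key);
  if (state === undefined) {
    // Copy the default so later mutations don't alter the caller's object.
    state = { ...defaultValue };
    stateCache.set(key, state);
  }
  return state as T;
}

// Two simulated request-handler runs share and mutate the same object.
function handleRequest(url: string): void {
  const state = useStateSketch("sharedState", { processed: 0, lastUrl: "" });
  state.processed += 1;
  state.lastUrl = url;
}

handleRequest("https://example.com/1");
handleRequest("https://example.com/2");
console.log(useStateSketch("sharedState", { processed: 0, lastUrl: "" }));
// { processed: 2, lastUrl: 'https://example.com/2' }
```

The default value is only used on the first call for a given key; later calls ignore it and return the existing object.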
Published by apify-service-account 4 months ago
- `requestHandler` is provided in `AdaptiveCrawler` (#2518) (31083aa)
Published by apify-service-account 5 months ago
Published by apify-service-account 5 months ago
- `URL_NO_COMMAS_REGEX` regexp to allow single-character hostnames (#2492) (ec802e8), closes #2487
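The `URL_NO_COMMAS_REGEX` fix concerns single-character hostnames. The regexes below are hypothetical and are not Crawlee's actual pattern; they only show a common way such a bug arises. A hostname-label pattern that requires a separate first and last character cannot match a one-character label like the `x` in `x.com`, while making the tail optional fixes that without allowing leading or trailing hyphens.

```typescript
// Hypothetical label patterns; NOT Crawlee's URL_NO_COMMAS_REGEX.
// Requires at least two characters, so "x" in "x.com" is rejected.
const strictLabel = /^[a-z0-9][a-z0-9-]*[a-z0-9]$/i;

// Optional tail: still forbids leading/trailing hyphens,
// but accepts one-character labels.
const relaxedLabel = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

// A hostname is valid if every dot-separated label matches.
function isValidHostname(host: string, label: RegExp): boolean {
  return host.split(".").every((part) => label.test(part));
}

for (const host of ["x.com", "example.com", "-bad.com"]) {
  console.log(host, isValidHostname(host, strictLabel), isValidHostname(host, relaxedLabel));
}
```

The patterns have no `g` flag on purpose, so repeated `test()` calls do not carry `lastIndex` state between labels.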
Published by apify-service-account 5 months ago
- `EnqueueStrategy.All` erroring with links using unsupported protocols (#2389) (8db3908)
- `SystemInfo` events every second (#2454) (1fa9a66)
- `content-type` check breaks on `content-type` parameters (#2442) (db7d372)
- `FileDownload` "crawler" (#2435) (d73756b)
- `RequestQueue` v2 the default queue, see more on Apify blog (#2390) (41ae8ab), closes #2388
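The `content-type` fix addresses headers that carry parameters, such as `text/html; charset=utf-8`, which fail a naive string-equality check. A minimal sketch, assuming the check only needs the media type: the hypothetical helpers `mediaType` and `isHtml` (not Crawlee's code) drop everything after the first `;` and compare case-insensitively, since media types are case-insensitive. The set of types treated as HTML is also an assumption.

```typescript
// Hypothetical helper: extract the bare media type from a Content-Type value.
// "text/html; charset=utf-8" -> "text/html"
function mediaType(contentType: string): string {
  return (contentType.split(";")[0] ?? "").trim().toLowerCase();
}

// Assumed set of media types that count as HTML for this sketch.
const HTML_TYPES = ["text/html", "application/xhtml+xml"];

function isHtml(contentType: string | undefined): boolean {
  if (!contentType) return false; // missing header: not HTML
  return HTML_TYPES.includes(mediaType(contentType));
}

console.log(isHtml("text/html")); // true
console.log(isHtml("text/html; charset=UTF-8")); // true (a naive === check gives false)
console.log(isHtml("application/json; charset=utf-8")); // false
```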
Published by apify-service-account 6 months ago
Published by apify-service-account 6 months ago
- `setValue` (#2411) (9089bf1)
- `networkidle` to `waitUntil` in `gotoExtended` (#2399) (5d0030d), closes #2398
- `application/xml` (#2408) (cbcf47a)
Published by apify-service-account 7 months ago
- `RequestQueueV2` (#2376) (ffba095)
Published by apify-service-account 8 months ago
- `createRequests` works correctly with `exclude` (and nothing else) (#2321) (048db09)
- `csv-stringify` and `fs-extra` (#2326) (718959d), closes /github.com/redabacha/crawlee/blob/2f05ed22b203f688095300400bb0e6d03a03283c/.eslintrc.json#L50
- `page.waitForTimeout()` with `sleep()` (52d7219), closes #2335
- `puppeteer@v22` (#2337) (3cc360a)
- `KeyValueStore.recordExists()` (#2339) (8507a65)
- `userAgent` parameter to `RobotsFile.isAllowed()` + `RobotsFile.from()` helper (#2338) (343c159)
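The `createRequests` entry states that filtering now works when only `exclude` is given. The intended semantics can be sketched as a plain URL filter, assuming regular-expression patterns; `filterUrls` and `UrlFilter` are hypothetical names, not Crawlee's `createRequests()` signature. Exclusions always win, and when no include patterns are given, every URL that is not excluded passes.

```typescript
// Hypothetical include/exclude filter; NOT Crawlee's createRequests() API.
interface UrlFilter {
  include?: RegExp[];
  exclude?: RegExp[];
}

function filterUrls(urls: string[], { include, exclude }: UrlFilter): string[] {
  return urls.filter((url) => {
    // Exclusions always take precedence.
    if (exclude?.some((re) => re.test(url))) return false;
    // With no include patterns, every non-excluded URL passes.
    if (!include || include.length === 0) return true;
    // Otherwise the URL must match at least one include pattern.
    return include.some((re) => re.test(url));
  });
}

const urls = [
  "https://example.com/blog/post-1",
  "https://example.com/login",
  "https://example.com/shop/item-7",
];

// Only `exclude` given: everything except /login is kept.
console.log(filterUrls(urls, { exclude: [/\/login$/] }));
```

The bug class this guards against is treating "no include patterns" as "match nothing", which would silently drop every URL when only `exclude` is set.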