A command-line utility for taking automated screenshots of websites
APACHE-2.0 License
New --auth-username x --auth-password y options for each shot-scraper command, allowing a username and password to be set for HTTP Basic authentication. #140
The shot-scraper URL --interactive mode now respects the -w and -h arguments setting the size of the browser viewport. Thanks, mhalle. #128
New --scale-factor option for setting scale factors other than 2 (for retina). Thanks, Niel Thiart. #136
New --browser-arg option for passing extra browser arguments (such as --browser-arg "--font-render-hinting=none") through to the underlying browser. Thanks, Niel Thiart. #137
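As a rough sketch, the authentication and scale-factor options above might be combined like this. The URL, credentials and output filename are placeholders chosen for illustration, and the guard simply skips the command when shot-scraper is not installed.

```shell
# Placeholder values, not taken from the release notes.
URL="https://example.com/"
OUTPUT="protected.png"

# Only attempt the shot if shot-scraper is available on this machine.
if command -v shot-scraper >/dev/null 2>&1; then
  # HTTP Basic credentials plus a scale factor other than the retina default of 2.
  shot-scraper "$URL" \
    --auth-username x --auth-password y \
    --scale-factor 3 \
    -o "$OUTPUT"
fi
```

In practice the password would more likely come from an environment variable or secret store than from a literal on the command line.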
Published by simonw 12 months ago
New --bypass-csp option for bypassing any Content Security Policy on the page that prevents executing further JavaScript. Thanks, Brenton Cleeland. #116
shot-scraper --interactive $URL, which allows you to interact with the page in a browser window and then hit <enter> to take the screenshot, no longer reloads the page before taking the shot (which ignored your activity). #125
Published by simonw over 1 year ago
New --omit-background option for the shot command to optionally create transparent PNGs. Thanks, Ben Welsh. #108
Fixed a bug that caused shot-scraper to fail to take screenshots on Windows. Thanks, Omer Rosenbaum. #104
New --silent option for the shot, multi, pdf and html commands, to disable the default console output. #107
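A minimal sketch of the --omit-background and --silent options used together with the shot command. The URL and filename are placeholders, and the command is skipped if shot-scraper is not installed.

```shell
# Placeholder URL and output filename, for illustration only.
URL="https://example.com/"
OUTPUT="transparent.png"

if command -v shot-scraper >/dev/null 2>&1; then
  # Transparent PNG, with the default console output suppressed.
  shot-scraper shot "$URL" --omit-background --silent -o "$OUTPUT"
fi
```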
Full documentation: https://shot-scraper.datasette.io/
Published by simonw over 1 year ago
Deprecated the shot-scraper multi --fail-on-error option in favor of the new --fail option. --fail-on-error will continue to work until shot-scraper 2.0 (should that ever be released), but is no longer displayed in the --help menu or documentation. #103
Published by simonw over 1 year ago
New --log-console option for logging the output of calls to console.log() to standard error. #101
--skip
and --fail
options to specify what should happen if an HTTP 4xx or 5xx error is encountered while trying to load the page. --skip
will ignore the error and either exit cleanly or move on to the next screenshot (in the case of multi
). --fail
will cause the tool to return a non-zero exit code, useful for running in CI environments. #102
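One way the --fail option might be used in a CI script is sketched below. The URL is a placeholder, the RESULT variable is only there for illustration, and a non-zero exit can have causes other than an HTTP error.

```shell
# Placeholder URL; RESULT is a hypothetical variable used to record the outcome.
URL="https://example.com/missing-page"
RESULT="not run"

if command -v shot-scraper >/dev/null 2>&1; then
  # With --fail, an HTTP 4xx or 5xx response makes the command exit non-zero.
  if shot-scraper "$URL" --fail -o page.png; then
    RESULT="ok"
  else
    RESULT="failed"
    echo "shot-scraper exited non-zero for $URL" >&2
  fi
fi

echo "result: $RESULT"
```

Swapping --fail for --skip would instead ignore the HTTP error and exit cleanly.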
Published by simonw almost 2 years ago
Published by simonw about 2 years ago
New shot-scraper html URL command for outputting the final HTML of a page, after JavaScript has been executed. #96
shot-scraper javascript has a new -r/--raw option for outputting the result of the JavaScript expression as a raw string rather than JSON encoded (see the shot-scraper javascript documentation). #95
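The two commands above could be tried along these lines. This is a sketch that assumes shot-scraper html writes to standard output by default; the output filename is a placeholder.

```shell
URL="https://datasette.io/"
HTML_FILE="rendered.html"   # placeholder filename

if command -v shot-scraper >/dev/null 2>&1; then
  # Save the page's HTML as it stands after JavaScript has run
  # (assumes the html command prints to standard output).
  shot-scraper html "$URL" > "$HTML_FILE"

  # Print the page title as a raw string rather than a JSON-encoded one.
  shot-scraper javascript "$URL" "document.title" -r
fi
```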
Published by simonw about 2 years ago
New shot-scraper multi -o option for specifying a subset of one or more of the output files defined in the YAML to generate. This is useful for testing a larger shots.yml file without re-taking every screenshot every time the command is run. #94
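To illustrate the -o option described above, the sketch below writes a small two-shot YAML file and then asks for just one of its outputs. The URLs and filenames are examples, not part of the release notes.

```shell
# Write a small shots.yml with two entries (example URLs and filenames).
cat > shots.yml <<'EOF'
- url: https://datasette.io/
  output: datasette.png
- url: https://github.com/simonw/shot-scraper
  output: shot-scraper.png
EOF

if command -v shot-scraper >/dev/null 2>&1; then
  # Re-take only the shot whose output file is datasette.png.
  shot-scraper multi shots.yml -o datasette.png
fi
```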
Published by simonw about 2 years ago
New options: --format, --width, --height, --scale and --print-background. Thanks, Eddie Chapman. #87
Removed the -h shortcut for help; use --help instead. -h was clashing with the shorter version of --height. Thanks, Matthew Bafford. #84
Published by simonw about 2 years ago
Published by simonw about 2 years ago
Published by simonw over 2 years ago
Published by simonw over 2 years ago
shot-scraper $URL --wait-for EXPRESSION can be used to take the screenshot only once the provided JavaScript expression returns true. See Waiting until a specific condition. #72
The wait_for: key in the YAML format used by shot-scraper multi provides equivalent functionality for scripted multiple screenshots.
Published by simonw over 2 years ago
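For the --wait-for option and wait_for: key described above, a sketch might look like the following. The URL, the h1 selector and the filenames are assumptions made for illustration.

```shell
# The expression returns true once an <h1> element exists on the page.
WAIT_EXPR='document.querySelector("h1") !== null'

# Equivalent YAML for shot-scraper multi, using the wait_for: key.
cat > wait-shots.yml <<'EOF'
- url: https://example.com/
  output: example.png
  wait_for: 'document.querySelector("h1") !== null'
EOF

if command -v shot-scraper >/dev/null 2>&1; then
  # Single shot from the command line.
  shot-scraper https://example.com/ --wait-for "$WAIT_EXPR" -o example.png
  # The same shot driven from the YAML file.
  shot-scraper multi wait-shots.yml
fi
```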
New --selector-all option to take a screenshot that encompasses every element matching the specified CSS selector. This complements --selector, which takes a screenshot of just the first element matching that selector. See Specifying elements using JavaScript filters. #64
New selector_all: and selectors_all: keys in the shot-scraper multi YAML format.
New --js-selector and --js-selector-all options for specifying elements to screenshot using a JavaScript expression, for cases that cannot be handled using CSS selectors. #43
The following example takes a screenshot of the first paragraph on the page that mentions shot-scraper:
shot-scraper https://github.com/simonw/shot-scraper \
  --js-selector 'el.tagName == "P" && el.innerText.includes("shot-scraper")'
New js_selector:, js_selectors:, js_selector_all: and js_selectors_all: equivalent keys in the shot-scraper multi YAML format.
New --user-agent option for setting a custom user agent header. #63
New --browser webkit option for running WebKit. Thanks, Ryan Murphy. #56
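The YAML keys named above might be used as in this sketch, which writes a file with one selector_all: shot and one js_selector: shot. The output filenames are placeholders, and the JavaScript expression reuses the --js-selector example from these notes.

```shell
# Two shots of the same page: every <p> element, then the first
# paragraph that mentions shot-scraper.
cat > selector-shots.yml <<'EOF'
- url: https://github.com/simonw/shot-scraper
  output: all-paragraphs.png
  selector_all: p
- url: https://github.com/simonw/shot-scraper
  output: first-mention.png
  js_selector: 'el.tagName == "P" && el.innerText.includes("shot-scraper")'
EOF

if command -v shot-scraper >/dev/null 2>&1; then
  shot-scraper multi selector-shots.yml
fi
```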
Published by simonw over 2 years ago
New shot-scraper accessibility --timeout option. Thanks, Ben Welsh. #59
New shot-scraper auth --browser option for authentication using a browser other than Chromium. #61
Using --quality now results in a JPEG file with the correct .jpg extension. Thanks, Ian Wootten. #58
New --reduced-motion flag for emulating the "prefers-reduced-motion" media feature. Thanks, Ryan Murphy. #49
Published by simonw over 2 years ago
New -b/--browser option for the shot-scraper install, shot, multi and javascript commands. This can be used to install and run the alternative browsers firefox, chrome or chrome-beta. Thanks, Ben Welsh. #53
New --timeout option for shot-scraper shot and shot-scraper multi. Thanks, Ben Welsh. #47
shot-scraper multi now continues to create other shots despite a timeout error, unless --fail-on-error is passed. Thanks, Ryan Cheley. #50
Using the async () => { ... } pattern for shot-scraper javascript, as discussed in Extracting web page content using Readability.js and shot-scraper.
Using shot-scraper javascript to scrape a web page. See Scraping web pages from the command-line with shot-scraper.
Published by simonw over 2 years ago
New shot-scraper javascript command for executing JavaScript against a web page and returning the result to the console as JSON: #38
% shot-scraper javascript datasette.io document.title
"Datasette: An open source multi-tool for exploring and publishing data"
This can be used for web scraping and data extraction. Any JavaScript errors will cause the command to return an exit code of 1, so this can also be used to run tests against a website from within a continuous integration environment such as GitHub Actions.
The shot-scraper pdf and shot-scraper accessibility commands can both now be used with local files in addition to URLs. #37
The output: key is no longer required in YAML shot configuration: if omitted, an automatic filename will be used instead. #40
An empty YAML file no longer produces an error. #41
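Because JavaScript errors make shot-scraper javascript exit with code 1, a CI check could be sketched as follows. The check itself (that the page has a title) is a made-up example, and passing a function in the async () => { ... } style is assumed to behave as these notes describe.

```shell
URL="datasette.io"

# Hypothetical check: throw (and so fail the command) if the page has no title.
CHECK='async () => {
  if (!document.title) { throw new Error("page has no title"); }
  return document.title;
}'

if command -v shot-scraper >/dev/null 2>&1; then
  if shot-scraper javascript "$URL" "$CHECK"; then
    echo "check passed"
  else
    echo "check failed" >&2
  fi
fi
```

In a real CI job the failing branch would typically end with a non-zero exit so the workflow run is marked as failed.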