Published by EdJoPaTo over 2 years ago
ignore_error site option to only warn on pages that fail regularly
ValueHint
Published by EdJoPaTo over 2 years ago
html_prettify attribute improvements
Before this release, changes like this occurred regularly:
-<a class="external link">
+<a class="link external">
-<a style="color: white; display: none">
+<a style="display:none;color:white">
This release sorts the class attribute and formats the style attribute. This reduces the number of diffs when the host only changes something like the order.
It also fits the concept of 'pretty' HTML which this editor aims for.
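The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the editor's actual code: the function names and the sort_declarations flag are hypothetical, and the release notes do not say whether style declarations are reordered or only reformatted.

```python
def sort_classes(value: str) -> str:
    """Return the class names of a class attribute in sorted order."""
    return " ".join(sorted(value.split()))


def format_style(value: str, sort_declarations: bool = False) -> str:
    """Normalize whitespace in a style attribute.

    Naive sketch: it splits on ';' and ':' and therefore does not handle
    values that contain those characters (for example inside url(...)).
    sort_declarations is a hypothetical option; reordering can change
    which declaration wins when a property is repeated.
    """
    declarations = []
    for part in value.split(";"):
        if ":" not in part:
            continue  # skip empty fragments such as a trailing ';'
        prop, _, val = part.partition(":")
        declarations.append(f"{prop.strip()}: {val.strip()}")
    if sort_declarations:
        declarations.sort()
    return "; ".join(declarations)


print(sort_classes("link external"))
print(format_style("display:none;color:white"))
```

With sorting enabled, both variants from the diff above normalize to the same string, so a pure reordering by the host no longer shows up as a change.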
eeb020f969e8f871a2f2df0202f3fe57aa522c88 4a408269cbdf6f330eb49c91f6568a97e0eaed88
Some websites are server generated based on the queries used. Different queries for the same domain/path are now possible.
05c4dc85642132d0da19cfaf56edbde2c31a309b
housekeeping, dependency updates, …
Published by EdJoPaTo almost 3 years ago
html_markdownify, html_prettify and html_textify received bugfixes and improvements to parsing.
HTML parts aren't escaped anymore 2989f56db5660c5c210751d23b6d22ab28e01fc3 and prettify ensures indentation of text contents 952bde35eb308fcadf458ef727c421e6907667fe.
html_markdownify now uses the html2md crate which implements more features and has fewer strange edge cases 1362a4ee8dc95a60a69ed36b95d36f3afe611488.
The datetime attribute of elements is now read, when present, to determine the pubDate of the RSS item.
The goal of the datetime attribute is to provide a machine-readable format. As parsing date and time from various human formats is hard, this is probably the simplest way of adding a useful pubDate when possible without over-complicating things.
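As a rough illustration of the idea (not the tool's implementation), the following sketch takes the first datetime attribute found in an HTML fragment and formats it as an RFC 2822 date, the format RSS uses for pubDate. The class and function names are hypothetical, and treating values without a time zone as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser


class DatetimeFinder(HTMLParser):
    """Remembers the value of the first datetime attribute it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if self.value is not None:
            return
        for name, value in attrs:
            if name == "datetime" and value:
                self.value = value
                break


def pub_date_from_html(html: str):
    """Return an RFC 2822 date string, or None if nothing usable is found."""
    finder = DatetimeFinder()
    finder.feed(html)
    if finder.value is None:
        return None
    try:
        parsed = datetime.fromisoformat(finder.value)
    except ValueError:
        return None  # not a machine-readable value, so no pubDate
    if parsed.tzinfo is None:
        # Assumption of this sketch: values without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return format_datetime(parsed)


print(pub_date_from_html('<time datetime="2023-01-15T10:30:00+00:00">last Sunday</time>'))
```

Human-readable text inside the element is ignored entirely; only the attribute is parsed, which is what keeps the approach simple.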
5b6d9716a94ba0036c95df594d51bb2dbcac05a0
Published by EdJoPaTo almost 3 years ago
Previously you configured the wanted extension via the config file. It is now derived automatically from the Content-Type HTTP header and the editors used.
   - url: "https://edjopato.de"
-    extension: md
     editors:
       - html_markdownify
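One way such a derivation could work is sketched below. The release notes do not spell out the exact rules, so the two mapping tables, the precedence of the last matching editor, and the txt fallback are all assumptions made for illustration; only the editor names come from these notes.

```python
# Hypothetical mapping from editors to the kind of output they produce.
EDITOR_EXTENSIONS = {
    "html_markdownify": "md",
    "html_textify": "txt",
    "html_prettify": "html",
    "json_prettify": "json",
}

# Hypothetical mapping from media types to file extensions.
CONTENT_TYPE_EXTENSIONS = {
    "text/html": "html",
    "application/json": "json",
    "text/plain": "txt",
}


def guess_extension(content_type: str, editors: list) -> str:
    """Guess a file extension from the editors, then from the Content-Type."""
    for editor in reversed(editors):
        # An editor is either a plain name or a single-key mapping with options.
        name = editor if isinstance(editor, str) else next(iter(editor))
        if name in EDITOR_EXTENSIONS:
            return EDITOR_EXTENSIONS[name]
    media_type = content_type.split(";")[0].strip().lower()
    return CONTENT_TYPE_EXTENSIONS.get(media_type, "txt")  # assumed fallback


print(guess_extension("text/html; charset=utf-8", ["html_markdownify"]))
```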
a6ba06fdf15b17cb3279d1aa51385ac892e51ac6 01afa7ff57bf6e59ed5d195eeb22ac3c48562537
It's now possible to send notifications on changes via pling.
Notification targets (E-Mail, Slack, Telegram, …) are entirely configured via environment variables as they mainly contain secrets. Check the pling documentation about which environment variables can be set.
The sent notification can be changed via the new config key notification_template.
When using GitHub Actions you can check out their Environment variable documentation and the example repo config which configures Telegram notifications into this Telegram channel.
1b1977a89c444a25c5b4d3df37fc80c251b6e157 14d3837390f475630b0af1cd340bd9c4463bee27 24b6cd605aa6591fa202cb3bb38d8df8fba7042d 8ad53c1e585b02576f38a7aaff12299a14744011
website-stalker check
The check command now shows more details such as configured notifications. To prevent leaking secrets, it only shows the number of configured notification targets, not their details.
It's also possible to print or rewrite the current config as YAML.
This is helpful when migrating older configs or checking whether certain environment variables are read correctly.
3caf39eecbb6b8e620b919fe2f181df1594cbc42 d9bfff9596f23e3507b7672958eeb25a22cd5719 3b7c82daaf414ce519f86cced7980f86abd69d32
Published by EdJoPaTo about 3 years ago
You can now specify a URL array for an entry in the config. This way multiple URLs share the same options.
This is especially useful for stalking multiple nearly identical webpages.
To provide an example:
sites:
  - url: "https://edjopato.de/"
    extension: html
  - url: "https://edjopato.de/post/"
    extension: html
This can now also be specified like this:
sites:
  - url:
      - "https://edjopato.de/"
      - "https://edjopato.de/post/"
    extension: html
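To show that both forms are equivalent, here is a small sketch that expands entries with a URL array into one entry per URL. It works on already parsed config data (plain dicts and lists); expand_sites is a hypothetical helper, not part of website-stalker.

```python
def expand_sites(sites: list) -> list:
    """Turn entries with a list of URLs into one entry per URL."""
    expanded = []
    for site in sites:
        urls = site["url"]
        if isinstance(urls, str):
            urls = [urls]  # the single-URL form stays valid
        options = {key: value for key, value in site.items() if key != "url"}
        for url in urls:
            expanded.append({"url": url, **options})
    return expanded


compact = [
    {
        "url": ["https://edjopato.de/", "https://edjopato.de/post/"],
        "extension": "html",
    }
]
for entry in expand_sites(compact):
    print(entry)
```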
092123355c9dfe621a17f357de8e4208555a52b9
Published by EdJoPaTo about 3 years ago
Split css_select into css_select and css_remove
This results in simpler configs for removing via css selector:
 editors:
-  - css_select:
-      selector: img
-      remove: true
+  - css_remove: img
This is a breaking change and also simplifies the internal logic.
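For anyone migrating many entries, the rewrite shown in the diff could be automated along these lines. This is a hedged sketch operating on parsed config data; migrate_editor is a hypothetical helper, and rewriting a css_select mapping without remove into the short string form is an assumption about the new syntax.

```python
def migrate_editor(editor):
    """Rewrite an old-style css_select entry; leave other editors untouched."""
    if not (isinstance(editor, dict) and "css_select" in editor):
        return editor
    options = editor["css_select"]
    if not isinstance(options, dict):
        return editor  # already the short form, e.g. {"css_select": "article"}
    if options.get("remove"):
        return {"css_remove": options["selector"]}
    # Assumption: the new css_select takes just the selector string.
    return {"css_select": options["selector"]}


old_editors = [
    {"css_select": "article"},
    {"css_select": {"selector": "img", "remove": True}},
    "html_prettify",
]
print([migrate_editor(editor) for editor in old_editors])
```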
fafc19d269954ea87af968ffd4421c5d08397b52
img in html_markdownify
Images are now added to the markdown output.
Images require absolute paths when the markdown is rendered as HTML, so html_url_canonicalize is helpful here.
If you do not want the images (as it was before this release), add the editor css_remove: img to your config.
ec96f24a59cd5212832c32a659cf211350e5759e
Published by EdJoPaTo about 3 years ago
Two new editors: json_prettify and html_url_canonicalize. 73814fb03d26163c9212ccdc5ab3154d6790c4f6 e51baf089de20c7af8c50d11ea4d149969b39aab
The log output now shows which kind of address was used. e034a70ba76c713dcaefd7c9268e84d20b2d374c
Published by EdJoPaTo over 3 years ago
A new editor html_markdownify can create markdown from HTML input. See more details about this new editor in the README. e1798ee3243cb7370e99009a14f77b6db2f060a9
The output now contains at most one empty line between non-empty lines db894e9f8e0584a2718bea2f9925a260de3e8545 32fa6d19923b7f8a0f9e2f811eda086e60a94374
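The effect on the output can be pictured with this small sketch. It is an illustration only: collapse_blank_lines is a hypothetical name, and dropping leading and trailing blank lines as well as trailing whitespace is an assumption beyond what the note states.

```python
def collapse_blank_lines(text: str) -> str:
    """Keep at most one empty line between non-empty lines."""
    result = []
    for line in text.splitlines():
        if line.strip():
            result.append(line.rstrip())
        elif result and result[-1] != "":
            # First blank line after content: keep exactly one.
            result.append("")
    while result and result[-1] == "":
        result.pop()  # assumption: no blank lines at the end
    return "\n".join(result)


print(collapse_blank_lines("Title\n\n\n\nFirst paragraph\nSecond line\n\n"))
```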
Editors should now be clearer about what they do when they are applied. This is a breaking change: you have to adapt your configs to work with this release. 82cefbc45111103d557f38288cd0ad1a64891360
Published by EdJoPaTo over 3 years ago
Each site in the config file is now more generic. Before, each entry was an html or utf8 entry. Now each entry is basically the same.
Each entry has a URL and a file extension which is then used to save the resulting file.
Each site can also have editors. An editor manipulates the content before saving the result.
css_selector and regex_replacer are now editors. The default behavior of html to prettify the content is now an editor too: html_prettify.
Additionally, this update includes a new editor html_text which only returns text entries from the HTML.
To give an example:
sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_selector: article
      - css_selector:
          selector: .meta
          remove: true
      - html_prettify
If you want to see a config migration see this commit.
css_selector remove elements
The css_selector editor can now remove matching HTML elements from the result. This is already included in the example above.
html_text Editor
This editor only returns text entries from the HTML.
To give an example: This will save every h1 heading to the resulting file.
- url: "https://edjopato.de/post/"
  extension: txt
  editors:
    - css_selector: h1
    - html_text
Published by EdJoPaTo over 3 years ago
You can use the website-stalker systemd units now as a user too. Check out the systemd/README.md for more details.
580bb5fb69ec4dd54d8efa1c5b96f8c3ceddfba2 953a0388190bdfdd67ba52d71282811b7bdf77d6
Published by EdJoPaTo over 3 years ago
Adds a systemd service and timer to be used locally 3a210f253c0c88c6a2fedd8bc63114af51ebfc66
Migrate some functions from running git as a command-line tool to libgit2.
This should make handling and detecting easier on the code side of things.
Not everything is migrated (yet?). Some outputs like the git diff currently work just fine via the command-line tool.
0074611ba91f08f56f3b2fa0f3370d0e613957c6 bf08eb48e304e8246b338e6f515ec439197e4fbd 20f07d52ef9219f85592a41627b2db65dc3391cb 9398cd649a9d6c42c1d923351acd62dd7b9d6c63
This also allows for running from within a subfolder of a git repo 9398cd649a9d6c42c1d923351acd62dd7b9d6c63
Show a summary of changes in the commit message body
63aae5828eceef656571470bf64d6d3312434340 cf165443592b9dac3c924e830a144178d4b59a5c 754c7e32271d49c9fe1d49e713586db4d160ba7f 799d740eb13905e646fd86d5469467e0eca81ab9