website-stalker

Track changes on websites via git

LGPL-2.1 License

website-stalker - v0.19.0

Published by EdJoPaTo over 2 years ago

Added

  • ignore_error site option to only warn on pages that fail regularly
  • Generate deb/rpm packages
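A minimal sketch of the new site option. It assumes ignore_error takes a boolean, and the URL is only an example:

```yaml
sites:
  - url: "https://edjopato.de/post/"
    # assumed boolean: only warn when this page fails instead of erroring
    ignore_error: true
```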

Changed

  • systemd files are now meant for packages (no …/local/… anymore)

Fixed

  • CLI: correct autocompletion with ValueHint
website-stalker - v0.18.1

Published by EdJoPaTo over 2 years ago

  • fix(css_remove): prevent removing wrong content e459bc6c663a128bc22ab284c49cce43b64b9c27
website-stalker - v0.18.0

Published by EdJoPaTo over 2 years ago

html_prettify attribute improvements

Before this release, changes like these occurred regularly:

-<a class="external link">
+<a class="link external">

-<a style="color: white; display: none">
+<a style="display:none;color:white">

This release sorts class names and normalizes the formatting of style attributes. This reduces the number of diffs when the host only changes something like the order.
It also fits the concept of 'pretty' HTML that this editor aims for.

eeb020f969e8f871a2f2df0202f3fe57aa522c88 4a408269cbdf6f330eb49c91f6568a97e0eaed88

support URL queries

Some websites are generated by the server based on the query used. Entries with the same domain and path but different queries are now possible.

05c4dc85642132d0da19cfaf56edbde2c31a309b
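A sketch of two entries that differ only in their query. The domain, path, and q parameter are hypothetical:

```yaml
sites:
  # hypothetical URLs: same domain and path, different queries
  - url: "https://example.com/search?q=rust"
  - url: "https://example.com/search?q=git"
```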

Minor changes

housekeeping, dependency updates, …

website-stalker - v0.17.0

Published by EdJoPaTo almost 3 years ago

HTML parsing improvements

html_markdownify, html_prettify and html_textify received bugfixes and improvements to parsing.
HTML parts are no longer escaped 2989f56db5660c5c210751d23b6d22ab28e01fc3 and html_prettify now ensures that text contents are indented 952bde35eb308fcadf458ef727c421e6907667fe.
html_markdownify now uses the html2md crate, which implements more features and has fewer strange edge cases 1362a4ee8dc95a60a69ed36b95d36f3afe611488.

RSS pubDate

The rss editor now attempts to read the datetime attribute of elements to determine the pubDate of the RSS item.
The purpose of the datetime attribute is to provide a machine-readable format. As parsing dates and times from the many human-readable formats is hard, this is probably the simplest way to add a useful pubDate when possible without over-complicating things.

5b6d9716a94ba0036c95df594d51bb2dbcac05a0

Minor Changes

  • feat: improve 5sec between domain logic 722bf5f4cb40ead9ebbcb14786e6e8ea194ab00d
  • perf: dont recreate regular expressions 836ad7845ba71315fa343f7c76f7bcb7cfa36dd9
  • docs(readme): fix typos and backticks in README (#47) 84d8b4511673dbd73fdf814b650f2abe1165ee45
website-stalker - v0.16.0

Published by EdJoPaTo almost 3 years ago

Automatically assume file extensions

Previously you configured the desired extension in the config file. It is now inferred automatically from the Content-Type HTTP header and the editors in use.

 - url: "https://edjopato.de"
-  extension: md
   editors:
     - html_markdownify

a6ba06fdf15b17cb3279d1aa51385ac892e51ac6 01afa7ff57bf6e59ed5d195eeb22ac3c48562537

Notifications

It's now possible to send notifications on changes via pling.
Notification targets (E-Mail, Slack, Telegram, …) are configured entirely via environment variables, as they mainly contain secrets. Check the pling documentation for the environment variables that can be set.
The content of the sent notification can be customized via the new config key notification_template.
When using GitHub Actions, check out their documentation on environment variables and the example repo config, which sends Telegram notifications to this Telegram channel.

1b1977a89c444a25c5b4d3df37fc80c251b6e157 14d3837390f475630b0af1cd340bd9c4463bee27 24b6cd605aa6591fa202cb3bb38d8df8fba7042d 8ad53c1e585b02576f38a7aaff12299a14744011

Improvements to website-stalker check

The check command now shows more details, such as the configured notifications. To prevent leaking secrets, it does not show their details, only the number of configured notification targets.

It's also possible to print or rewrite the current config as YAML.
This is helpful when migrating older configs or checking if certain environment variables are correctly read.

3caf39eecbb6b8e620b919fe2f181df1594cbc42 d9bfff9596f23e3507b7672958eeb25a22cd5719 3b7c82daaf414ce519f86cced7980f86abd69d32

Minor Changes

  • feat(config): allow loading via environment variable 94e4bd427c27b01e3b54cc0e9b10173135048b56
  • fix: dont prefix sites in message with M or A 6959978a8a499aadf4d0172fd89a3121d55c457e
website-stalker - v0.15.0

Published by EdJoPaTo about 3 years ago

Multiple URLs with same options

You can now specify an array of URLs for a single entry in the config. This way multiple URLs share the same options.
This is especially useful for stalking multiple nearly identical webpages.

To provide an example:

sites:
  - url: "https://edjopato.de/"
    extension: html
  - url: "https://edjopato.de/post/"
    extension: html

Can now also be specified like this:

sites:
  - url:
      - "https://edjopato.de/"
      - "https://edjopato.de/post/"
    extension: html

092123355c9dfe621a17f357de8e4208555a52b9

Minor Changes

  • fix(rss): error when no items are selected 408f0b05db52b24035c4f14e1c33fef1b50391f0
  • fix: use actual url for editors f0136b18edeaea2672e8ddb1a76569afa16f7cc7
website-stalker - v0.14.0

Published by EdJoPaTo about 3 years ago

  • feat: add accept_invalid_certs site option c4a25e2b2feddc84a85704c4f1314eecc20f126b
  • feat(git): git message head according to changes 2a233d1b74836b81354608a5a8a1f5059ece9c75
  • fix(git): dont show same change twice in git commit message 71f3da5b33e2f4efba5576c3aed45249125a0e6c
  • build(http): enable socks5 proxy support 3ba2ddbce5c2a82bd85f798ea99f6e3b59df3560
  • build(http): enable deflate body decompression 01b0237e8f7fdee69b472034b56598c9ca11b444
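A sketch of the new accept_invalid_certs site option. It assumes the option takes a boolean, and the host is hypothetical. As of this release each site still needs an extension:

```yaml
sites:
  # hypothetical host serving a self-signed certificate
  - url: "https://intranet.example.com/"
    extension: html
    # assumed boolean: accept invalid TLS certificates for this site only
    accept_invalid_certs: true
```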
website-stalker - v0.13.0

Published by EdJoPaTo about 3 years ago

Split css_select into css_select and css_remove

This results in simpler configs for removing elements via CSS selector:

 editors:
-   - css_select:
-       selector: img
-       remove: true
+   - css_remove: img

This is a breaking change and also simplifies the internal logic.

fafc19d269954ea87af968ffd4421c5d08397b52

img in html_markdownify

Images are now added to the markdown output.

Images require absolute URLs when the markdown is rendered as HTML, so html_url_canonicalize is helpful here.

If you do not want the images (as before this release), add the editor css_remove: img to your config.
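A sketch of a markdown config that keeps images usable. It assumes html_url_canonicalize rewrites image sources to absolute URLs, and the css_select: article step is only an example:

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: md
    editors:
      - css_select: article
      # uncomment to drop images like before this release
      # - css_remove: img
      # assumed to turn relative image URLs into absolute ones
      - html_url_canonicalize
      - html_markdownify
```

Editors are applied in order, so the HTML editors are placed before html_markdownify, which presumably needs them to still see HTML.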

ec96f24a59cd5212832c32a659cf211350e5759e

Minor Changes

  • fix(git): work in repo without commits yet 7436dff52e8fc80ae3ef6f632e39e121a781dd19
website-stalker - v0.12.1

Published by EdJoPaTo about 3 years ago

  • fix(rss): find link when the item itself is the link a464868862af2aed854b5672b1fd67a58701eb7b
  • build(container): use Github Action cached base image eddd8c494f411095657fb1940ce8fff9f3869367
  • ci(container): reduce image size c0d23678cf11f3fe060aa34ac8724d5b6078dc7c
website-stalker - v0.12.0

Published by EdJoPaTo about 3 years ago

Editors

Two new editors: json_prettify and html_url_canonicalize. 73814fb03d26163c9212ccdc5ab3154d6790c4f6 e51baf089de20c7af8c50d11ea4d149969b39aab
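A sketch of tracking a JSON endpoint with the new json_prettify editor. The URL is hypothetical:

```yaml
sites:
  # hypothetical JSON endpoint
  - url: "https://example.com/api/status.json"
    extension: json
    editors:
      - json_prettify
```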

IPv6 vs legacy IPv4

The log output now shows which kind of address was used. e034a70ba76c713dcaefd7c9268e84d20b2d374c

website-stalker - v0.10.0

Published by EdJoPaTo over 3 years ago

html_markdownify

A new editor html_markdownify can create markdown from html input. See more details about this new editor in the README. e1798ee3243cb7370e99009a14f77b6db2f060a9

html_textify

Now creates at most one empty line between non-empty lines db894e9f8e0584a2718bea2f9925a260de3e8545 32fa6d19923b7f8a0f9e2f811eda086e60a94374

Rename editors to be more like functions

Editor names should now make it clearer what each editor does when applied. This is a breaking change: you have to adapt your configs to work with this release. 82cefbc45111103d557f38288cd0ad1a64891360

  • html_text → html_textify
  • css_selector → css_select
  • regex_replacer → regex_replace
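Migrating the html_text example from the v0.8.0 notes to the new names could look like this (a sketch; only the editor names change):

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: txt
    editors:
      - css_select: h1 # was css_selector
      - html_textify # was html_text
```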
website-stalker - v0.9.0

Published by EdJoPaTo over 3 years ago

RSS Feeds

A new editor rss is now able to generate RSS 2.0 feeds from input. See more details about this new editor in the README.

website-stalker - v0.8.0

Published by EdJoPaTo over 3 years ago

More generic config file format

Each site in the config file is now more generic. Previously each entry was either an html or a utf8 entry; now every entry is basically the same.
Each entry has a URL and a file extension, which is used when saving the resulting file.
Each site can also have editors. An editor manipulates the content before the result is saved.
css_selector and regex_replacer are now editors. The previous default behavior of html entries, prettifying the content, is now an editor too: html_prettify.
Additionally this update includes a new editor html_text which only returns text entries from the HTML.

To give an example:

sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_selector: article
      - css_selector:
          selector: .meta
          remove: true
      - html_prettify

For an example of a config migration, see this commit.

css_selector remove elements

The css_selector can now remove matching HTML elements from the result. This is already included in the example above.

html_text Editor

This editor only returns text entries from the HTML.
For example, this config saves every h1 heading to the resulting file:

  - url: "https://edjopato.de/post/"
    extension: txt
    editors:
      - css_selector: h1
      - html_text

systemd improvements

  • explicit workingdir b36695a4cfbf32155ec30436aded9a14867ef7dd
  • only start timer for user 2f9dab4c56374cbb6a7a0f227f98a7fb7a203c6d
website-stalker - v0.7.1

Published by EdJoPaTo over 3 years ago

systemd user service/timer

You can now also use the website-stalker systemd units as a user. Check out the systemd/README.md for more details.
580bb5fb69ec4dd54d8efa1c5b96f8c3ceddfba2 953a0388190bdfdd67ba52d71282811b7bdf77d6

website-stalker - v0.7.0

Published by EdJoPaTo over 3 years ago

systemd files

Adds a systemd service and timer to be used locally 3a210f253c0c88c6a2fedd8bc63114af51ebfc66

libgit2

Some functions were migrated from running git as a command-line tool to libgit2.
This should make handling and detecting things easier on the code side.
Not everything is migrated (yet?). Some outputs, like the git diff, currently work just fine via the command-line tool.
0074611ba91f08f56f3b2fa0f3370d0e613957c6 bf08eb48e304e8246b338e6f515ec439197e4fbd 20f07d52ef9219f85592a41627b2db65dc3391cb 9398cd649a9d6c42c1d923351acd62dd7b9d6c63

This also allows for running from within a subfolder of a git repo 9398cd649a9d6c42c1d923351acd62dd7b9d6c63

Minor changes

  • fix(settings): require config file existance 662c18cb39a4311bd98f35af6e32b914043ec086
  • fix: improve readability of run error 2d827ea5f5c4e43e762dd3a64d0cfdfd14b7298e
website-stalker -

Published by EdJoPaTo over 3 years ago

Commit message summary

Show a summary of changes in the commit message body

63aae5828eceef656571470bf64d6d3312434340 cf165443592b9dac3c924e830a144178d4b59a5c 754c7e32271d49c9fe1d49e713586db4d160ba7f 799d740eb13905e646fd86d5469467e0eca81ab9

Console output

  • refactor(logger): rename HINT to INFO 791c21b30418a0e5abb005ec589c31e4e22dd3a1
  • feat: add ChangeKind init for first run 6b474b61047559d44b92a282ce86afdfa98cadd1
  • refactor: improve stdout message when filtered 078d5b6e6bf547d7fe13ad6c9be41f05e4ef7df0
website-stalker -

Published by EdJoPaTo over 3 years ago

  • 7c709d48a0587a75738a9bb7dd69a64dd2674fd7 feat(completions): generate completions
  • ade3b705ed2b0c102e73e596db34d84a673b9b85 perf: improve release build performance
website-stalker -

Published by EdJoPaTo over 3 years ago

  • 2f258ea83bf6aaf9afb0b8e7efc53998f267203e feat(filename): remove domain www prefix
  • d1fa40453f9ac1abc6b460a1179fd25bc7ffc49e feat(html): error when css_selector selects nothing
  • 9ac837c32284cc164bdb76905037879f69f613b4 refactor: simplify git add/commit logic
website-stalker -

Published by EdJoPaTo over 3 years ago

  • b221c477d937ecb0a5d681bafecb1a8302a7ccf9 feat(http): support gzip and brotli decompression
  • 52aa5f3c3a63aff79b55fe019e6cae917c980433 feat: hint on multiple same domain requests
  • 073e60eb749b5eec9dc8294249f32bc85f109fe6 feat: print warn/error to stderr instead of stdout
  • 5d801bc2d3a7f3b91859fa39e8c0de796b0ff2f3 feat(git): warn when no git repo / no git commit
  • ec51415d883315f42fe386ed9e29f36cfd294398 fix(logger): use also all uppercase for hint
  • improve docs
website-stalker -

Published by EdJoPaTo over 3 years ago

  • feat: run async and wait 5s between same domain 70f4a4ea9efe5dbb3485243dab27c8e1bda6e501 70c0ba2a7843c54ceb766d960ada7e21c7411b7d
  • feat: show time it took b13072962e937c3e7743337312137d494ca4980c
  • feat: improve validation of FROM header 0bf3ba6f7b8297f152979932dcc079678ff31f51
  • fix(http): set user_agent once 58ad848b31e8f28cf262a8c65df01088f90d9969
  • perf(git): do git add once with all files e6f8334e0943657b573df75c1db8e712325ff5e2
  • fix(git): remove version from git message 9af4cde7bae4b16698a4f40a8fa33b4eb26739ae
  • fix(git): never use a pager d47fe6390c474fd2d09ab8c42702e8e644c43bc1