website-stalker

Track changes on websites via git

LGPL-2.1 License

website-stalker - v0.21.0 Latest Release

Published by github-actions[bot] about 1 year ago

Changed

  • Files are sorted into folders of their domains (#187)
website-stalker - v0.20.0

Published by github-actions[bot] over 1 year ago

Added

  • new editor: html_sanitize
  • headers site option to supply additional headers with requests
  • filename option to override the file base name that is otherwise derived automatically from the URL
  • Support URLs with IP addresses

Changed

  • Use git executable instead of libgit2
  • Improve example-config
  • Write the full words configuration and git repository instead of their short versions on stdout
  • Include port in filename when specified

Removed

  • check --rewrite-yaml and check --print-yaml
website-stalker - v0.19.0

Published by EdJoPaTo over 2 years ago

Added

  • ignore_error site option to only warn on pages that fail regularly
  • Generate deb/rpm packages

Changed

  • systemd files are now meant for packages (no …/local/… anymore)

Fixed

  • CLI: correct autocompletion with ValueHint
website-stalker - v0.18.1

Published by EdJoPaTo over 2 years ago

  • fix(css_remove): prevent removing wrong content e459bc6c663a128bc22ab284c49cce43b64b9c27
website-stalker - v0.18.0

Published by EdJoPaTo over 2 years ago

html_prettify attribute improvements

Before this release changes like this occurred regularly:

-<a class="external link">
+<a class="link external">

-<a style="color: white; display: none">
+<a style="display:none;color:white">

This release sorts classes and formats style attributes. This reduces the number of diffs when the host only changes something like the order.
It also fits the concept of 'pretty' HTML that this editor aims for.

eeb020f969e8f871a2f2df0202f3fe57aa522c88 4a408269cbdf6f330eb49c91f6568a97e0eaed88
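The normalization can be sketched in Rust, assuming the attribute values have already been extracted as plain strings. normalize_class and normalize_style are hypothetical names, not the editor's actual code. Sorting the style declarations is also an assumption: it removes ordering diffs like the one above, but it can change the meaning when the same property is declared twice.

```rust
/// Sort whitespace-separated class names so that a reordering
/// like "external link" vs "link external" yields the same string.
fn normalize_class(value: &str) -> String {
    let mut classes: Vec<&str> = value.split_whitespace().collect();
    classes.sort_unstable();
    classes.join(" ")
}

/// Reformat an inline style into "property: value" declarations joined by "; ".
/// ASSUMPTION: declarations are also sorted by property name in this sketch.
fn normalize_style(value: &str) -> String {
    let mut declarations: Vec<String> = value
        .split(';')
        .filter_map(|declaration| {
            // Skip empty segments, e.g. from a trailing ';'.
            let (property, val) = declaration.split_once(':')?;
            let property = property.trim();
            if property.is_empty() {
                None
            } else {
                Some(format!("{}: {}", property, val.trim()))
            }
        })
        .collect();
    declarations.sort();
    declarations.join("; ")
}
```

With this, both variants of the style attribute in the example above normalize to "color: white; display: none".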

support URL queries

Some websites are generated server-side based on the query. Different queries for the same domain and path are now possible.

05c4dc85642132d0da19cfaf56edbde2c31a309b

Minor changes

housekeeping, dependency updates, …

website-stalker - v0.17.0

Published by EdJoPaTo almost 3 years ago

HTML parsing improvements

html_markdownify, html_prettify and html_textify received bugfixes and improvements to parsing.
HTML parts aren't escaped anymore 2989f56db5660c5c210751d23b6d22ab28e01fc3 and prettify ensures indentation of text contents 952bde35eb308fcadf458ef727c421e6907667fe.
html_markdownify now uses the html2md crate, which implements more features and has fewer strange edge cases 1362a4ee8dc95a60a69ed36b95d36f3afe611488.

RSS pubDate

It now attempts to read the datetime attribute of elements to determine the pubDate of an RSS item.
The purpose of the datetime attribute is to provide a machine-readable format. As parsing dates from the various human-readable formats is hard, this is probably the simplest way to add a useful pubDate when possible without over-complicating things.

5b6d9716a94ba0036c95df594d51bb2dbcac05a0
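RSS 2.0 expects pubDate values in RFC 822 form, while a datetime attribute is typically ISO 8601 style. A minimal std-only Rust sketch of that conversion, assuming only plain dates (2021-01-02) and UTC timestamps ending in Z need handling; pub_date_from_datetime is a hypothetical helper, and the real editor may parse differently or use a date library.

```rust
/// Hypothetical helper: convert a `datetime` attribute value such as
/// "2021-01-02" or "2021-01-02T15:04:05Z" into an RFC 822 style pubDate.
/// Anything else returns None, so no pubDate is added in that case.
fn pub_date_from_datetime(datetime: &str) -> Option<String> {
    const WEEKDAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    // Month offsets for Sakamoto's day-of-week algorithm.
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];

    // A plain date means midnight UTC; a timestamp must be UTC ("Z" suffix).
    let datetime = datetime.trim();
    let (date, time) = match datetime.split_once('T') {
        Some((date, time)) => (date, time.strip_suffix('Z')?),
        None => (datetime, "00:00:00"),
    };

    // The time must look like HH:MM:SS (no fractional seconds in this sketch).
    let time_fields: Vec<&str> = time.split(':').collect();
    let two_digits = |field: &&str| field.len() == 2 && field.bytes().all(|b| b.is_ascii_digit());
    if time_fields.len() != 3 || !time_fields.iter().all(two_digits) {
        return None;
    }

    // The date must look like YYYY-MM-DD.
    let mut parts = date.split('-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: usize = parts.next()?.parse().ok()?;
    let day: i32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }

    // Sakamoto's algorithm: 0 = Sunday.
    let y = if month < 3 { year - 1 } else { year };
    let weekday = (y + y / 4 - y / 100 + y / 400 + OFFSETS[month - 1] + day).rem_euclid(7);

    Some(format!(
        "{}, {:02} {} {} {} +0000",
        WEEKDAYS[weekday as usize], day, MONTHS[month - 1], year, time
    ))
}
```

For example, "2021-01-02T15:04:05Z" becomes "Sat, 02 Jan 2021 15:04:05 +0000", while free-form text such as "yesterday" yields None.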

Minor Changes

  • feat: improve 5sec between domain logic 722bf5f4cb40ead9ebbcb14786e6e8ea194ab00d
  • perf: dont recreate regular expressions 836ad7845ba71315fa343f7c76f7bcb7cfa36dd9
  • docs(readme): fix typos and backticks in README (#47) 84d8b4511673dbd73fdf814b650f2abe1165ee45
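The "5sec between domain" logic spaces out requests to the same domain. A minimal Rust sketch of one way to do it, assuming a fixed gap per domain tracked in memory; DomainThrottle and its methods are hypothetical and not the tool's actual implementation.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-domain throttle: remembers when each domain was last
/// requested and reports how long to wait before requesting it again.
struct DomainThrottle {
    gap: Duration,
    last_request: HashMap<String, Instant>,
}

impl DomainThrottle {
    fn new(gap: Duration) -> Self {
        Self { gap, last_request: HashMap::new() }
    }

    /// Remaining wait at `now` before `domain` may be requested again.
    fn wait_time(&self, domain: &str, now: Instant) -> Duration {
        match self.last_request.get(domain) {
            // Elapsed time is subtracted from the gap, never going below zero.
            Some(last) => self.gap.saturating_sub(now.saturating_duration_since(*last)),
            None => Duration::ZERO,
        }
    }

    /// Record that `domain` was requested at `now`.
    fn record(&mut self, domain: &str, now: Instant) {
        self.last_request.insert(domain.to_owned(), now);
    }
}
```

A caller would sleep for wait_time before each request and call record afterwards; requests to different domains are not delayed by each other.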
website-stalker - v0.16.0

Published by EdJoPaTo almost 3 years ago

Automatically assume file extensions

Previously you configured the desired extension in the config file. It is now assumed automatically based on the Content-Type HTTP header and the editors used.

 - url: "https://edjopato.de"
-  extension: md
   editors:
     - html_markdownify

a6ba06fdf15b17cb3279d1aa51385ac892e51ac6 01afa7ff57bf6e59ed5d195eeb22ac3c48562537
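How such an assumption could work is sketched below in Rust: a format-changing editor decides the extension, otherwise the Content-Type media type does. assumed_extension and both mapping tables are hypothetical and only illustrate the idea; the actual rules in website-stalker may differ (for example, whether the rss editor maps to xml).

```rust
/// Hypothetical: pick a file extension from the editors used (the last
/// editor that changes the format wins) and otherwise from the Content-Type.
fn assumed_extension(content_type: &str, editors: &[&str]) -> Option<&'static str> {
    // ASSUMPTION: these editors turn the content into a different format.
    for editor in editors.iter().rev() {
        match *editor {
            "html_markdownify" => return Some("md"),
            "html_textify" => return Some("txt"),
            "rss" => return Some("xml"),
            _ => {}
        }
    }
    // Fall back to the media type, ignoring parameters like "; charset=utf-8".
    let media_type = content_type.split(';').next()?.trim().to_ascii_lowercase();
    match media_type.as_str() {
        "text/html" => Some("html"),
        "application/json" => Some("json"),
        "text/plain" => Some("txt"),
        "application/rss+xml" | "application/xml" | "text/xml" => Some("xml"),
        _ => None,
    }
}
```

With the config from the diff above, an HTML page passed through html_markdownify would be saved as md without an explicit extension key.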

Notifications

It's now possible to send notifications on changes via pling.
Notification targets (E-Mail, Slack, Telegram, …) are configured entirely via environment variables, as they mainly contain secrets. Check the pling documentation for which environment variables can be set.
The sent notification can be customized via the new config key notification_template.
When using GitHub Actions, check out their environment variable documentation and the example repo config, which sends Telegram notifications to this Telegram channel.

1b1977a89c444a25c5b4d3df37fc80c251b6e157 14d3837390f475630b0af1cd340bd9c4463bee27 24b6cd605aa6591fa202cb3bb38d8df8fba7042d 8ad53c1e585b02576f38a7aaff12299a14744011

Improvements to website-stalker check

check now shows more details, such as the configured notifications. To prevent leaking secrets, it shows only the number of configured notification targets, not their details.

It's also possible to print or rewrite the current config as YAML.
This is helpful when migrating older configs or checking whether certain environment variables are read correctly.

3caf39eecbb6b8e620b919fe2f181df1594cbc42 d9bfff9596f23e3507b7672958eeb25a22cd5719 3b7c82daaf414ce519f86cced7980f86abd69d32

Minor Changes

  • feat(config): allow loading via environment variable 94e4bd427c27b01e3b54cc0e9b10173135048b56
  • fix: dont prefix sites in message with M or A 6959978a8a499aadf4d0172fd89a3121d55c457e
website-stalker - v0.15.0

Published by EdJoPaTo about 3 years ago

Multiple URLs with same options

You can now specify a URL array for a config entry. This way multiple URLs share the same specified options.
This is especially useful for stalking multiple nearly identical webpages.

To provide an example:

sites:
  - url: "https://edjopato.de/"
    extension: html
  - url: "https://edjopato.de/post/"
    extension: html

Can now also be specified like this:

sites:
  - url:
      - "https://edjopato.de/"
      - "https://edjopato.de/post/"
    extension: html

092123355c9dfe621a17f357de8e4208555a52b9
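A key that accepts either one string or a list can be modelled as an enum and expanded into one site per URL, each with a copy of the entry's options. The Rust sketch below assumes this shape; OneOrMany, SiteEntry, Site and expand are hypothetical names, and a serde-based loader would commonly derive such an enum with #[serde(untagged)] instead of building it by hand.

```rust
/// A `url` value that is either a single string or a list of strings.
enum OneOrMany {
    One(String),
    Many(Vec<String>),
}

/// Hypothetical config entry; only `extension` is shown as a shared option.
struct SiteEntry {
    url: OneOrMany,
    extension: String,
}

/// One site after expansion: a single URL with its own copy of the options.
#[derive(Debug, PartialEq)]
struct Site {
    url: String,
    extension: String,
}

/// Expand every entry so that each of its URLs gets the entry's options.
fn expand(entries: Vec<SiteEntry>) -> Vec<Site> {
    entries
        .into_iter()
        .flat_map(|entry| {
            let urls = match entry.url {
                OneOrMany::One(url) => vec![url],
                OneOrMany::Many(urls) => urls,
            };
            let extension = entry.extension;
            urls.into_iter()
                .map(move |url| Site { url, extension: extension.clone() })
        })
        .collect()
}
```

Both config variants above expand to the same two sites, which is why the shorter form can replace the repeated entries.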

Minor Changes

  • fix(rss): error when no items are selected 408f0b05db52b24035c4f14e1c33fef1b50391f0
  • fix: use actual url for editors f0136b18edeaea2672e8ddb1a76569afa16f7cc7
website-stalker - v0.14.0

Published by EdJoPaTo about 3 years ago

  • feat: add accept_invalid_certs site option c4a25e2b2feddc84a85704c4f1314eecc20f126b
  • feat(git): git message head according to changes 2a233d1b74836b81354608a5a8a1f5059ece9c75
  • fix(git): dont show same change twice in git commit message 71f3da5b33e2f4efba5576c3aed45249125a0e6c
  • build(http): enable socks5 proxy support 3ba2ddbce5c2a82bd85f798ea99f6e3b59df3560
  • build(http): enable deflate body decompression 01b0237e8f7fdee69b472034b56598c9ca11b444
website-stalker - v0.13.0

Published by EdJoPaTo about 3 years ago

Split css_select into css_select and css_remove

This results in simpler configs for removing via css selector:

 editors:
-   - css_select:
-       selector: img
-       remove: true
+   - css_remove: img

This is a breaking change and also simplifies the internal logic.

fafc19d269954ea87af968ffd4421c5d08397b52

img in html_markdownify

Images are now added to the markdown output.

Images require absolute URLs when the markdown is rendered as HTML, so html_url_canonicalize is helpful here.

If you do not want the images (like it was before this release) add the editor css_remove: img to your config.

ec96f24a59cd5212832c32a659cf211350e5759e

Minor Changes

  • fix(git): work in repo without commits yet 7436dff52e8fc80ae3ef6f632e39e121a781dd19
website-stalker - v0.12.1

Published by EdJoPaTo about 3 years ago

  • fix(rss): find link when the item itself is the link a464868862af2aed854b5672b1fd67a58701eb7b
  • build(container): use Github Action cached base image eddd8c494f411095657fb1940ce8fff9f3869367
  • ci(container): reduce image size c0d23678cf11f3fe060aa34ac8724d5b6078dc7c
website-stalker - v0.12.0

Published by EdJoPaTo about 3 years ago

Editors

Two new editors: json_prettify and html_url_canonicalize. 73814fb03d26163c9212ccdc5ab3154d6790c4f6 e51baf089de20c7af8c50d11ea4d149969b39aab

IPv6 vs legacy IPv4

The log output now shows which kind of address was used. e034a70ba76c713dcaefd7c9268e84d20b2d374c

website-stalker - v0.11.0

Published by github-actions[bot] about 3 years ago

Simplify Git Logic

The git part was heavily updated. When running with --commit, the command now aborts when not in a git repo or when the repo is unclean.
If the repo is unclean (without --commit), git add is no longer used, which simplifies trying out the ideal config before committing it.

With these changes all the git logic is now handled via libgit2. The git binary is no longer a required dependency. ❇️

  • feat(run)!: prevent --commit in a not clean repo 73800f0f2a4dfcb4e87f74a4958cd94929f1e3e5
  • feat!: prevent --commit when not in a git repo 664837c0d91a8bed766a45c20df40fdd572091cf
  • fix(run): only git add when --commit 25fa0d80ddb58d68f2bf6aeb273de971b6122e0e
  • fix(git): dont integrate git diff and git status da23989310de2bff0d0db7df43fc4438845af8f7
  • feat(run): dont cleanup or reset b75d9f8e989f29f02d31219178ab6cd6fa53845f
  • refactor(run): simplify git finishup logic 8efda45681acb8551d9732285d447d69ea0fdd2d
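The release performs these checks via libgit2. The Rust sketch below only models the decision, taking as input whether the directory is a git repository and the output of git status --porcelain (empty for a clean working tree). check_commit_allowed and its error messages are hypothetical.

```rust
/// Hypothetical precondition check for `run --commit`.
/// `in_repo`: whether the working directory is inside a git repository.
/// `porcelain`: output of `git status --porcelain`, empty when clean.
fn check_commit_allowed(in_repo: bool, porcelain: &str) -> Result<(), String> {
    if !in_repo {
        return Err("--commit requires running inside a git repository".to_owned());
    }
    if !porcelain.trim().is_empty() {
        // Each non-empty porcelain line describes one changed or untracked path.
        let changed = porcelain.lines().filter(|line| !line.trim().is_empty()).count();
        return Err(format!("--commit requires a clean repository ({} changed paths)", changed));
    }
    Ok(())
}
```

Without --commit this check would simply be skipped, which matches the note that an unclean repo no longer gets a git add.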

Warn on redirected URLs

Some URLs redirect before the content is returned. This results in additional traffic and roundtrips. As this happens every time website-stalker runs, it adds up over time. To reduce traffic, the redirect target should be specified directly.
There is now a warning which shows which URL leads where and suggests using the target instead.

  • feat: warn on redirected URLs to reduce traffic 4c9136c8241750e89783bacaae0dc94bf7bcb76e

Init command

You can now init a directory with a git repo (git init) and a config (website-stalker example-config > website-stalker.yaml) in one neat command:
website-stalker init

  • feat(init): provide init folder/repo/config command 9842d9af37399eba78fd33adce211ab05f8ca4c4

Case insensitive site filter

The site filter is now case insensitive. Where you previously had to use website-stalker run EdJoPaTo to run on https://EdJoPaTo.de, you can now use website-stalker run edjopato

  • feat(cli)!: site filter is now case insensitive 85af5f6f5dbca0a665a0034beb3e4ab41835b394
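Case-insensitive matching comes down to comparing lowercased strings. A minimal Rust sketch, assuming the filter is a plain substring match against the URL; matches_site_filter is hypothetical, and the real filter may be more expressive (for example, a regular expression).

```rust
/// Hypothetical: does `filter` select the site at `url`?
/// Both sides are lowercased, so "EdJoPaTo" and "edjopato" behave the same.
fn matches_site_filter(url: &str, filter: &str) -> bool {
    url.to_lowercase().contains(&filter.to_lowercase())
}
```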

Config format is now fixed

Previously you could use other config formats such as website-stalker.toml. To simplify the config logic, the config now has to be a YAML file.

  • refactor(config)!: simplify 4d5e390b1d1d3f5d7a0ce1720cfe8a579f9ff725

Minor Changes

  • fix(check): dont panic, just exit code != 0 3689922e8b2703823a5cd8d614ab0351a19c6a8e
  • fix: dont print empty lines 9b1eb2eb741ebbaafdb07822629177d921beb66f
website-stalker - v0.10.0

Published by EdJoPaTo over 3 years ago

html_markdownify

A new editor html_markdownify can create markdown from html input. See more details about this new editor in the README. e1798ee3243cb7370e99009a14f77b6db2f060a9

html_textify

Now creates at most one empty line between filled lines db894e9f8e0584a2718bea2f9925a260de3e8545 32fa6d19923b7f8a0f9e2f811eda086e60a94374
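Limiting output to at most one empty line between filled lines is a small line-collapsing pass. A Rust sketch, assuming it runs on the already extracted text and that leading and trailing blank lines are dropped as well; collapse_blank_lines is a hypothetical name, not html_textify's code.

```rust
/// Collapse runs of blank (or whitespace-only) lines into a single empty line,
/// and drop blank lines at the start and end of the text.
fn collapse_blank_lines(text: &str) -> String {
    let mut output: Vec<&str> = Vec::new();
    // Starting as "blank" drops leading empty lines.
    let mut previous_blank = true;
    for line in text.lines() {
        let line = line.trim_end();
        let is_blank = line.is_empty();
        if is_blank && previous_blank {
            continue;
        }
        output.push(line);
        previous_blank = is_blank;
    }
    // Drop a trailing empty line, if any.
    while output.last() == Some(&"") {
        output.pop();
    }
    output.join("\n")
}
```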

Rename editors to be more like functions

Editor names should now make clearer what they do when applied. This is a breaking change and you have to adapt your configs to work with this release. 82cefbc45111103d557f38288cd0ad1a64891360

  • html_text → html_textify
  • css_selector → css_select
  • regex_replacer → regex_replace
website-stalker - v0.9.0

Published by EdJoPaTo over 3 years ago

RSS Feeds

A new editor rss is now able to generate RSS 2.0 feeds from input. See more details about this new editor in the README.

website-stalker - v0.8.0

Published by EdJoPaTo over 3 years ago

More generic config file format

Each site in the config file is now more generic. Previously each entry was either an html or a utf8 entry. Now every entry has the same basic shape.
Each entry has a URL and a file extension, which is then used to save the resulting file.
Each site can also have editors. An editor manipulates the content before the result is saved.
css_selector and regex_replacer are now editors. The previous default behavior of prettifying HTML content is now an editor too: html_prettify.
Additionally this update includes a new editor html_text, which returns only the text entries from the HTML.

To give an example:

sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_selector: article
      - css_selector:
          selector: .meta
          remove: true
      - html_prettify

If you want to see a config migration see this commit.
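Conceptually, the editors form a pipeline: each one receives the output of the previous one. A minimal Rust sketch, assuming editors run in the order they are listed in the config (as the example suggests); the Editor type alias and apply_editors are hypothetical, and the toy closures in the usage stand in for real editors like css_selector.

```rust
/// Hypothetical model of an editor: takes the current content, returns new content.
type Editor = Box<dyn Fn(String) -> String>;

/// Apply the editors in config order; each receives the previous editor's output.
fn apply_editors(content: String, editors: &[Editor]) -> String {
    editors.iter().fold(content, |current, editor| editor(current))
}
```

For example, pushing a closure that replaces "draft" with "post" followed by one that trims whitespace turns "  draft 1  " into "post 1"; swapping their order would give the same result here, but for editors like css_selector followed by html_prettify the order matters.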

css_selector remove elements

The css_selector can now remove matching HTML elements from the result. This is already included in the example above.

html_text Editor

This editor only returns text entries from the HTML.
To give an example: This will save every h1 heading to the resulting file.

  - url: "https://edjopato.de/post/"
    extension: txt
    editors:
      - css_selector: h1
      - html_text

systemd improvements

  • explicit workingdir b36695a4cfbf32155ec30436aded9a14867ef7dd
  • only start timer for user 2f9dab4c56374cbb6a7a0f227f98a7fb7a203c6d
website-stalker - v0.7.1

Published by EdJoPaTo over 3 years ago

systemd user service/timer

You can use the website-stalker systemd units now as a user too. Check out the systemd/README.md for more details.
580bb5fb69ec4dd54d8efa1c5b96f8c3ceddfba2 953a0388190bdfdd67ba52d71282811b7bdf77d6

website-stalker - v0.7.0

Published by EdJoPaTo over 3 years ago

systemd files

Adds a systemd service and timer to be used locally 3a210f253c0c88c6a2fedd8bc63114af51ebfc66

libgit2

Migrate some functions from running git as a commandline tool towards libgit2.
This should make handling and detecting easier on the code side of things.
Not everything is migrated (yet?). Some outputs like the git diff are just fine currently via the commandline command.
0074611ba91f08f56f3b2fa0f3370d0e613957c6 bf08eb48e304e8246b338e6f515ec439197e4fbd 20f07d52ef9219f85592a41627b2db65dc3391cb 9398cd649a9d6c42c1d923351acd62dd7b9d6c63

This also allows for running from within a subfolder of a git repo 9398cd649a9d6c42c1d923351acd62dd7b9d6c63

Minor changes

  • fix(settings): require config file existance 662c18cb39a4311bd98f35af6e32b914043ec086
  • fix: improve readability of run error 2d827ea5f5c4e43e762dd3a64d0cfdfd14b7298e
website-stalker -

Published by EdJoPaTo over 3 years ago

Commit message summary

Show a summary of changes in the commit message body

63aae5828eceef656571470bf64d6d3312434340 cf165443592b9dac3c924e830a144178d4b59a5c 754c7e32271d49c9fe1d49e713586db4d160ba7f 799d740eb13905e646fd86d5469467e0eca81ab9

Console output

  • refactor(logger): rename HINT to INFO 791c21b30418a0e5abb005ec589c31e4e22dd3a1
  • feat: add ChangeKind init for first run 6b474b61047559d44b92a282ce86afdfa98cadd1
  • refactor: improve stdout message when filtered 078d5b6e6bf547d7fe13ad6c9be41f05e4ef7df0
website-stalker -

Published by EdJoPaTo over 3 years ago

  • 7c709d48a0587a75738a9bb7dd69a64dd2674fd7 feat(completions): generate completions
  • ade3b705ed2b0c102e73e596db34d84a673b9b85 perf: improve release build performance