Logprep

Log data preprocessing in Python

LGPL-2.1 License


Logprep - v13.0.0

Published by ekneg54 4 months ago

Breaking

  • This release limits the maximum Python version to 3.12.3 because of issue #612.
  • Remove normalizer processor, as its functionality was replaced by the grokker, timestamper and field_manager processors
  • Remove elasticsearch_output connector to reduce maintenance effort

Features

  • add a helm chart to install logprep in kubernetes based environments

Improvements

  • add documentation about behavior of the timestamper on ISO8601 and UNIX time parsing
  • add unit tests for helm chart templates
  • add helm to github actions runner
  • add helm chart release to release pipeline

Bugfix

  • fixes a bug where a config value could be overwritten by a default from a later configuration in a multi-source config scenario
  • fixes a bug in the field_manager where extending a non-list target led to a processing failure
  • fixes a bug in pseudonymizer where a missing regex_mapping from an existing config_file causes logprep to crash continuously
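
The multi-source bug above is easier to picture with a small merge sketch. This is not Logprep's implementation: `merge_configs` and the `DEFAULTS` table are hypothetical names, and the keys are purely illustrative. It shows one way to stop a default from a later source clobbering an explicitly set earlier value, namely merging only explicitly set keys and applying defaults once at the end.

```python
# Hypothetical defaults table; the keys are illustrative, not Logprep's schema.
DEFAULTS = {"process_count": 1, "timeout": 5.0}


def merge_configs(*sources: dict) -> dict:
    """Merge explicitly set values in order, then fill in defaults once."""
    merged: dict = {}
    for source in sources:
        # Later sources override earlier ones, but only for keys they set.
        merged.update(source)
    # Defaults are applied last, so they never overwrite an explicit value.
    return {**DEFAULTS, **merged}


first = {"process_count": 4}
second = {"timeout": 10.0}  # does not mention process_count
config = merge_configs(first, second)
```

Here `config` keeps `process_count` from the first source even though the second source never mentions it.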

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v12.0.0...v13.0.0

Logprep - v12.0.0

Published by dtrai2 4 months ago

Breaking

  • pseudonymizer: rename rule config field pseudonyms to mapping
  • clusterer: rename rule config field target to source_fields
  • generic_resolver: rename rule config field append_to_list to extend_target_list
  • hyperscan_resolver: rename rule config field append_to_list to extend_target_list
  • calculator now adds the error tag _calculator_missing_field_warning to the events tag field instead of _calculator_failure in case of missing field in events
  • domain_label_extractor now writes _domain_label_extractor_missing_field_warning tag to event tags in case of missing fields
  • geoip_enricher now writes _geoip_enricher_missing_field_warning tag to event tags in case of missing fields
  • grokker now writes _grokker_missing_field_warning tag to event tags instead of _grokker_failure in case of missing fields
  • requester now writes _requester_missing_field_warning tag to event tags instead of _requester_failure in case of missing fields
  • timestamp_differ now writes _timestamp_differ_missing_field_warning tag to event tags instead of _timestamp_differ_failure in case of missing fields
  • timestamper now writes _timestamper_missing_field_warning tag to event tags instead of _timestamper_failure in case of missing fields
  • rename --thread_count parameter to --thread-count in http generator
  • removed --report parameter and feature from http generator
  • when using extend_target_list in the field manager, the ordering of the given source fields is now preserved
  • logprep now exits with a negative exit code if pipeline restart fails 5 times
    • this was implemented because further restart behavior should be configured on level of a system init service or container orchestrating service like k8s
    • the restart_count parameter is configurable. If you want the old behavior back, you can set this parameter to a negative number
  • logprep now exits with an exit code of 2 on configuration errors
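
The restart behavior described above can be sketched as a bounded supervision loop. This is only an illustration under assumptions: `supervise` and `run_pipeline` are hypothetical names, and the failure exit code `1` is a placeholder because the notes do not state the exact value Logprep uses.

```python
from typing import Callable


def supervise(run_pipeline: Callable[[], bool], restart_count: int = 5) -> int:
    """Return an exit code after running the pipeline with bounded restarts.

    run_pipeline returns True on a clean stop and False on a failure.
    A negative restart_count means "keep restarting forever".
    """
    failures = 0
    while True:
        if run_pipeline():
            return 0
        failures += 1
        if 0 <= restart_count <= failures:
            # Give up and leave further restarts to the init system or
            # container orchestrator. The value 1 is a placeholder.
            return 1
```

With the default of 5 the loop gives up after the fifth failure; a negative `restart_count` restores the "never give up" behavior.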

Features

  • add UCL into the quickstart setup
  • add logprep http output connector
  • add pseudonymization tools to logprep; see logprep pseudo --help
  • add restart_count parameter to configuration
  • add option mode to pseudonymizer processor and to pseudonymization tools to choose the AES mode for encryption and decryption
  • add retry mechanism to opensearch parallel bulk, if opensearch returns 429 rejected_execution_exception
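
A retry on HTTP 429 might look roughly like the following. The notes do not say how Logprep spaces its retries, so the exponential backoff here is an assumption, and `TooManyRequests`, `bulk_with_retry` and `send_bulk` are hypothetical stand-ins rather than opensearch-py or Logprep names.

```python
import time
from typing import Callable


class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 / rejected_execution_exception response."""


def bulk_with_retry(
    send_bulk: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_bulk, backing off exponentially while the server answers 429."""
    for attempt in range(max_retries + 1):
        try:
            return send_bulk()
        except TooManyRequests:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.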

Improvements

  • remove logger from Components and Factory signatures
  • align processor architecture to use methods like write_to_target, add_field_to and get_dotted_field_value when reading and writing from and to events
    • required substantial refactoring of the hyperscan_resolver, generic_resolver and template_replacer
  • change pseudonymizer, pre_detector, selective_extractor processors and pipeline to handle extra_data the same way
  • refactor clusterer, pre_detector and pseudonymizer processors and change rule_tree so that the processors do not require a process override
    • required substantial refactoring of the clusterer
  • handle missing fields in processors via _handle_missing_fields from the field_manager
  • add LogprepMPQueueListener to outsource logging to a separate process
  • add a single QueueHandler to the root logger to ensure all logs are handled by LogprepMPQueueListener
  • refactor http_generator to use a logprep http output connector
  • ensure all cached_properties are populated during setup time
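
The handler/listener split behind the logging changes above can be illustrated with the standard library alone. Note the assumptions: the stdlib `QueueListener` runs in a thread, whereas the notes say LogprepMPQueueListener uses a separate process, and the sketch attaches the `QueueHandler` to a named example logger instead of the root logger to stay side-effect free. `ListHandler` is a hypothetical in-memory sink.

```python
import logging
import logging.handlers
import queue

log_queue: queue.Queue = queue.Queue()


class ListHandler(logging.Handler):
    """Collect messages in memory so the example writes nothing to disk."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


sink = ListHandler()
# The listener drains the queue and forwards records to the real handler.
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

logger = logging.getLogger("example.pipeline")
logger.setLevel(logging.INFO)
logger.propagate = False
# Producers only enqueue records; they never touch the sink directly.
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info("event processed")
listener.stop()  # processes everything queued so far before returning
```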

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.3.0...v12.0.0

Logprep - v11.3.0

Published by ekneg54 5 months ago

Features

  • add gzip handling to http_input connector
  • adds advanced logging configuration
    • add configurable log format
    • add configurable datetime format in logs
    • makes hostname available in custom log formats
    • add fine grained log level configuration for every logger instance
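
For the gzip handling mentioned in the feature list above, a minimal sketch of what an HTTP input has to do is shown here. It assumes the compressed body is signalled through the `Content-Encoding` header and that the payload is JSON; `decode_body` is a hypothetical helper, not part of the http_input connector.

```python
import gzip
import json


def decode_body(body: bytes, headers: dict[str, str]) -> dict:
    """Decompress the body if it is gzip-encoded, then parse it as JSON."""
    encoding = headers.get("Content-Encoding", "").lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    return json.loads(body)


payload = gzip.compress(json.dumps({"message": "hello"}).encode())
event = decode_body(payload, {"Content-Encoding": "gzip"})
```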

Improvements

  • rename logprep.event_generator module to logprep.generator
  • shorten logger instance names

Bugfix

  • fixes OpenSearch/Elasticsearch stack traces being exposed in the log on errors by making the log level configurable for the opensearch and elasticsearch loggers
  • fixes the logprep quickstart profile

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.2.1...v11.3.0

Logprep - v11.2.1

Published by ekneg54 6 months ago

Bugfix

  • fixes a bug that caused the exporter HTTP server to always be spawned on localhost

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.2.0...v11.2.1

Logprep - v11.2.0

Published by djkhl 6 months ago

Features

  • expose metrics via uvicorn webserver
    • makes all uvicorn configuration options possible
    • add security best practices to server configuration
  • add following metrics to http_input connector
    • number_of_http_requests
    • message_backlog_size

Bugfix

  • fixes a bug in grokker rules where common field prefixes weren't possible
  • fixes bug where missing key in credentials file leads to AttributeError

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.1.0...v11.2.0

Logprep - v11.1.0

Published by ekneg54 6 months ago

Features

  • new documentation part with security best practices which compiles to user_manual/security/best_practices.html
    • also comes with excel export functionality of given best practices
  • add basic auth to http_input
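
Checking HTTP Basic credentials generally follows the pattern below. This is a generic sketch of the standard scheme, not the http_input connector's code; `check_basic_auth` is a hypothetical helper.

```python
import base64
import hmac


def check_basic_auth(header: str, username: str, password: str) -> bool:
    """Return True if an Authorization header carries the expected credentials."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    expected = f"{username}:{password}"
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(decoded.encode(), expected.encode())
```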

Bugfix

  • fixes a bug in http connector leading to only first process working
  • fixes the broken graceful shutdown behaviour

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.0.1...v11.1.0

Logprep - v11.0.1

Published by djkhl 6 months ago

Bugfix

  • fixes a bug where the pipeline index increases on every restart of a failed pipeline
  • fixes a closed log queue issue by running logging in a separate process

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v11.0.0...v11.0.1

Logprep - v11.0.0

Published by ekneg54 7 months ago

Breaking

  • authentication for getters is now configured via a newly introduced credentials file

Features

  • introducing an additional file to define the credentials for every configuration source
  • retrieve oauth token automatically from different oauth endpoints
  • retrieve configuration with mTLS authentication
  • reimplementation of HTTP Input Connector with following Features:
    • Wildcard based HTTP Request routing
    • Regex based HTTP Request routing
    • Improvements in thread-based runtime
    • Configuration and possibility to add metadata

Improvements

  • remove versioneer dependency in favor of setuptools-scm

Bugfix

  • fix version string of release versions
  • fix version string of container builds for feature branches
  • fix merge of config versions for multiple configs

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.4...v11.0.0

Logprep - v10.0.4

Published by ekneg54 7 months ago

Improvements

  • refactor logprep build process and requirements management

Bugfix

  • fix generic_adder not creating new field from type list

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.3...v10.0.4

Logprep - v10.0.4

Published by djkhl 7 months ago

Improvements

  • refactor logprep build process and requirements management

Bugfix

  • fix generic_adder not creating new field from type list

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.3...v10.0.4

Logprep - v10.0.3

Published by ekneg54 8 months ago

Bugfix

  • fix loading of configuration inside the AutoRuleCorpusTester for logprep test integration
  • fix auto rule tester (test unit), which was broken after adding support for multiple configuration files and resolving paths in configuration files

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.2...v10.0.3

Logprep - v10.0.2

Published by djkhl 8 months ago

Bugfix

  • fix versioneer import
  • fix logprep not complaining about a missing PROMETHEUS_MULTIPROC_DIR

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.1...v10.0.2

Logprep - v10.0.1

Published by djkhl 8 months ago

Bugfix

  • fix entrypoint in setup.py that corrupted the install

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v10.0.0...v10.0.1

Logprep - v10.0.0

Published by ekneg54 8 months ago

Breaking

  • reimplement the logprep CLI, see logprep --help for more information.
  • remove feature to reload configuration by sending signal SIGUSR1
  • remove feature to validate rules because it is already included in logprep test config

Features

  • add a number_of_successful_writes metric to the s3 connector, which counts how many events were successfully written to s3
  • make the s3 connector work with the new _write_backlog method introduced by the confluent_kafka commit bugfix in v9.0.0
  • add option to Opensearch Output Connector to use parallel bulk implementation (default is True)
  • add feature to logprep to load config from multiple sources (files or uris)
  • add feature to logprep to print the resulting configuration in json or yaml with logprep print json|yaml <Path to config>
  • add an event generator that can send records to Kafka using data from a file or from Kafka
  • add an event generator that can send records to a HTTP endpoint using data from local dataset

Improvements

  • add a do-nothing option to the dummy output to ensure it does not fill memory
  • make the s3 connector raise FatalOutputError instead of warnings
  • make the s3 connector blocking by removing threading
  • revert the change from v9.0.0 to always check the existence of a field for negated key-value based lucene filter expressions
  • make store_custom in s3, opensearch and elasticsearch connector not call batch_finished_callback to prevent data loss that could be caused by partially processed events
  • remove the schema_and_rule_checker module
  • rewrite Logprep Configuration object; see documentation for more details
  • rewrite Runner
  • delete MultiProcessingPipeline class to simplify multiprocessing
  • add FDA to the quickstart setup
  • bump versions for fastapi and aiohttp to address CVEs

Bugfix

  • make the s3 connector actually use the max_retries parameter
  • fix a bug that led to a FatalOutputError when handling a CriticalInputError in the pipeline

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v9.0.3...v10.0.0

Logprep - v9.0.3

Published by ekneg54 10 months ago

Features

  • make thread_count, queue_size and chunk_size configurable for parallel_bulk in opensearch output connector

Bugfix

  • fix parallel_bulk implementation not delivering messages to opensearch

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v9.0.2...v9.0.3

Logprep - v9.0.2

Published by dtrai2 11 months ago

Bugfix

  • remove duplicate pseudonyms in extra outputs of pseudonymizer

What's Changed

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v9.0.1...v9.0.2

Logprep - v9.0.1

Published by ekneg54 11 months ago

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v9.0.0...v9.0.1

Logprep - v9.0.0

Published by ekneg54 11 months ago

Breaking

  • remove possibility to inject auth credentials via url string, because of the risk of leaking credentials in logs
    • if you want to use basic auth, then you have to set the environment variables
      • LOGPREP_CONFIG_AUTH_USERNAME=<your_username>
      • LOGPREP_CONFIG_AUTH_PASSWORD=<your_password>
    • if you want to use oauth, then you have to set the environment variables
      • LOGPREP_CONFIG_AUTH_TOKEN=<your_token>
      • LOGPREP_CONFIG_AUTH_METHOD=oauth
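
To make the environment-variable scheme above concrete, here is a sketch of how such variables could be turned into a request header. Only the variable names come from the notes; `auth_header` is a hypothetical helper, and the header shapes are the standard HTTP Basic and Bearer schemes rather than confirmed Logprep behavior.

```python
import base64
from typing import Mapping


def auth_header(env: Mapping[str, str]) -> dict[str, str]:
    """Build an Authorization header from the LOGPREP_CONFIG_AUTH_* variables."""
    if env.get("LOGPREP_CONFIG_AUTH_METHOD") == "oauth":
        return {"Authorization": f"Bearer {env['LOGPREP_CONFIG_AUTH_TOKEN']}"}
    username = env.get("LOGPREP_CONFIG_AUTH_USERNAME")
    password = env.get("LOGPREP_CONFIG_AUTH_PASSWORD")
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    return {}  # no credentials configured
```

In practice the function would be called with `os.environ`; taking a mapping keeps it easy to test.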

Improvements

  • improve error message on empty rule filter
  • reimplemented pseudonymizer processor
    • rewrote tests to reach 100% coverage
    • cleaned up code
    • reimplemented caching using Python's lru_cache
    • add cache metrics
    • removed max_caching_days config option
    • add max_cached_pseudonymized_urls config option which defaults to 1000
    • add lru caching for pseudonymization of URLs
  • improve loading times for the rule tree by optimizing the rule segmentation and sorting
  • add support for python 3.12 and remove support for python 3.9
  • always check the existence of a field for negated key-value based lucene filter expressions
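
The caching change in the pseudonymizer can be pictured with `functools.lru_cache`. In this sketch the SHA-256 digest is only a placeholder for the real pseudonymization scheme, and `pseudonymize_url` is a hypothetical function; the `maxsize` of 1000 mirrors the stated default of max_cached_pseudonymized_urls.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=1000)  # mirrors the stated default of max_cached_pseudonymized_urls
def pseudonymize_url(url: str) -> str:
    """Placeholder pseudonymization: a SHA-256 digest stands in for the real scheme."""
    return hashlib.sha256(url.encode()).hexdigest()


pseudonymize_url("https://example.com/a")
pseudonymize_url("https://example.com/a")  # second call is served from the cache
info = pseudonymize_url.cache_info()  # hit/miss counters usable as cache metrics
```

The counters exposed by `cache_info()` are the kind of data a cache metric could report.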

Bugfix

  • fix the rule tree parsing some rules incorrectly, potentially resulting in more matches
  • fix confluent_kafka commit issue after Kafka rebalancing; this also fixes negative offsets

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v8.0.0...v9.0.0

Logprep - v8.0.0

Published by ekneg54 11 months ago

Breaking

  • reimplemented metrics so the former metrics configuration won't work anymore
  • metric content changed and existing Grafana dashboards will break
  • new rule id could possibly break configurations if the same rule is used in both rule trees
    • can be fixed by adding a unique id to each rule or deleting the possibly redundant rule

Features

  • add possibility to convert hex to int in calculator processor with the newly added function from_hex
  • add metrics on rule level
  • add grafana example dashboards under quickstart/exampledata/config/grafana/dashboards
  • add new configuration field id for all rules to identify rules in metrics and logs
    • if no id is given, the id will be generated in a stable way
    • add verification of rule id uniqueness on processor level over both rule trees to ensure metrics are counted correctly on rule level
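
The notes do not say how the stable id is derived, so the following is only one plausible scheme: hash a canonical serialization of the rule, then check for duplicates across all rules. Both `stable_rule_id` and `find_duplicate_ids` are hypothetical helpers.

```python
import hashlib
import json


def stable_rule_id(rule: dict) -> str:
    """Derive a deterministic id from the rule content (assumed scheme)."""
    # Sorting keys makes the result independent of dict insertion order.
    canonical = json.dumps(rule, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def find_duplicate_ids(rules: list[dict]) -> set[str]:
    """Return ids that occur more than once across all given rules."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for rule in rules:
        rule_id = rule.get("id") or stable_rule_id(rule)
        if rule_id in seen:
            duplicates.add(rule_id)
        seen.add(rule_id)
    return duplicates
```

Under this scheme, two identical rules without an explicit id collide, which matches the breaking note about the same rule appearing in both rule trees.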

Improvements

  • reimplemented prometheus metrics exporter to provide gauges, histograms and counter metrics
  • removed shared counter, because it is redundant to the metrics
  • get exception stack trace by setting environment variable DEBUG

Details

Full Changelog: https://github.com/fkie-cad/Logprep/compare/v7.0.0...v8.0.0