semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

LGPL-2.1 License

Stars
9.7K
Committers
170

semgrep - Release v0.100.0

Published by github-actions[bot] over 2 years ago

0.100.0 - 2022-06-22

Added

  • taint-mode: New experimental pattern-propagators feature that lets you specify
    arbitrary patterns for the propagation of taint by side-effect. In particular,
    this lets you specify how taint propagates through side-effectful function calls.
    For example, you can specify that when tainted data is added to an array then the
    array itself becomes tainted. (#4509)
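
As a rough illustration of the array example above, the sketch below builds a taint rule as a Python dictionary and serializes it to JSON (which is also valid YAML). The pattern/from/to keys follow the propagator syntax as later documented by Semgrep; since the feature was experimental in this release, the exact schema may have differed. The rule id and the get_user_input and sink function names are hypothetical.

```python
import json

# Hypothetical rule: the id, source function, and sink function are made up.
rule = {
    "id": "tainted-array-example",
    "mode": "taint",
    "languages": ["javascript"],
    "severity": "WARNING",
    "message": "Tainted data reaches sink() through an array",
    "pattern-sources": [{"pattern": "get_user_input(...)"}],
    "pattern-sinks": [{"pattern": "sink(...)"}],
    # When tainted data $X is pushed onto $ARR, $ARR itself becomes tainted.
    "pattern-propagators": [
        {"pattern": "$ARR.push($X)", "from": "$X", "to": "$ARR"}
    ],
}

# Wrap the rule in the top-level "rules" list expected in a rule file.
config_text = json.dumps({"rules": [rule]}, indent=2)
print(config_text)
```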

Changed

  • --config auto no longer sends the name of the repository being scanned to the Semgrep Registry.
    As of June 21st, this data is not recorded by the Semgrep Registry backend, even if an old Semgrep version sends it.
    Also as of June 21st, none of the previously collected repository names are retained by the Semgrep team;
    any historical data has been wiped.
  • GitLab SAST output is now v14.1.2 compliant
  • Removed the following deprecated semgrep scan options:
    --json-stats, --json-time, --debugging-json, --save-test-output-tar, --synthesize-patterns,
    --generate-config/-g, --dangerously-allow-arbitrary-code-execution-from-rules,
    and --apply (which was an easter egg for job applications, not the same as --autofix)
  • PHP: switch to GA maturity! Thanks a lot to Sjoerd Langkemper for most of the
    heavy work

Fixed

  • Inline join mode rules can now run taint-mode rules
  • Python: correctly handle with context expressions where the value is not
    bound (#5513)
  • Solidity: update to a more recent tree-sitter-solidity to fix certain parsing
    errors (#4957)
semgrep - Release v0.98.0

Published by github-actions[bot] over 2 years ago

0.98.0 - 2022-06-15

Added

  • New language R with experimental support (#2360)
    Thanks to Zythosec for some contributions.
  • Autodetection of CI env now supports Azure Pipelines, Bitbucket, Buildkite, Circle CI, Jenkins,
    and Travis CI in addition to GitHub and GitLab
  • You can now disable version checks with an environment variable by setting
    SEMGREP_ENABLE_VERSION_CHECK=0
  • Dataflow: spread operators in record expressions (e.g. {...foo}) are now translated into the Dataflow IL
  • An experimental LSP daemon mode for semgrep. Try it with semgrep lsp --config auto!
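
A minimal sketch of setting the version-check variable from a Python wrapper script. The helper names are hypothetical, and run_scan assumes a semgrep executable is available on PATH; only the variable name and value come from the note above.

```python
import os
import subprocess


def version_check_disabled_env(base_env=None):
    """Return a copy of the environment with Semgrep's version check disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["SEMGREP_ENABLE_VERSION_CHECK"] = "0"
    return env


def run_scan(target="."):
    """Run a scan with the version check disabled (assumes semgrep is on PATH)."""
    return subprocess.run(
        ["semgrep", "--config", "auto", target],
        env=version_check_disabled_env(),
        check=False,  # let the caller inspect the return code
    )
```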

Changed

  • Rules are now downloaded from the Semgrep Registry in JSON format instead of YAML.
    This speeds up rule parsing in the Semgrep CLI,
    making a semgrep --config auto run on the semgrep Python package in 14s instead of 16s.

Fixed

  • Fixed a bug where --disable-version-check would still send a request
    when a scan resulted in zero findings.
  • Fixed a regression in 0.97 where the Docker image's working directory changed from /src without notice.
    This also could cause permission issues when running the image.
  • Go: single pattern field can now match toplevel fields in a composite
    literal (#5452)
  • PHP: metavariable-pattern: works again when used with language: php (#5443)
  • PHP: booleans are propagated by constant propagation (#5509)
  • PHP: named arguments work in patterns (#5508)
  • Fixed a non-deterministic crash when matching a large number of regexes (#5277)
  • Fixed issue when running in GitHub Actions that caused semgrep to report on
    files not changed in the PR (#5453)
  • JS/TS: $X() no longer matches new Foo(), for consistency with other languages (#5510)
  • JS/TS: Typed metavariables now match constructor calls (e.g. ($X: C) matches new C()). (#5540)
semgrep - Release v0.97.0

Published by github-actions[bot] over 2 years ago

0.97.0 - 2022-06-08

Added

  • Dataflow: XML elements (e.g. JSX elements) now have a basic translation to the
    Dataflow IL, meaning that dataflow analysis (constant propagation, taint tracking)
    can now operate inside these elements (#5115)
  • Java: you can now use a metavariable in a package directive (#5420),
    for example, package $X, which is useful to bind the package
    name and use it in the error message.

Fixed

  • The output of semgrep ci should be clear it is exiting with error code 0
    when there are findings but none of them are blockers
  • Java: support for Sealed classes and Text Blocks via tree-sitter-java
    (#3787, #4644)
  • The JUnit XML output should serialize the failure messages as a single
    string instead of a python list of strings.
  • Typescript: update to latest tree-sitter-typescript, with support
    for 'abstract' modifier in more places
  • Scala: stop parsing parenthesized expressions as unary tuples
  • yarn.lock files with no dependencies, and with dependencies that lack URLs, now parse
  • Scala: fixed bug where typed patterns inside classes caused an exception during name resolution
  • metavariable-regex: patterns are now unanchored as specified by the
    documentation (#4807)
  • When a logged in CI scan encounters a Git failure,
    we now print a helpful error message instead of a traceback.
semgrep - Release v0.96.0

Published by github-actions[bot] over 2 years ago

0.96.0 - 2022-06-03

Added

  • Generic mode: new option generic_ellipsis_max_span for controlling
    how many lines an ellipsis can match (#5211)
  • Generic mode: new option generic_comment_style for ignoring
    comments that follow the specified syntax (C style, C++ style, or
    Shell style) (#3428)
  • Metrics now include a list of features used during an execution.
    Examples of such features are: languages scanned, CLI options passed, keys used in rules, or certain code paths reached, such as using an :include instruction in a .semgrepignore file.
    These strings will NOT include user data or specific settings. As an example, with semgrep scan --output=secret.txt we might send "option/output" but will NOT send "option/output=secret.txt".
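
To make the "option/output" example concrete, here is a small sketch of how a CLI argument could be reduced to a feature name with its value dropped. This is not Semgrep's actual implementation; feature_name is a hypothetical helper.

```python
def feature_name(cli_argument: str) -> str:
    """Map a CLI argument such as '--output=secret.txt' to 'option/output'.

    Only the option name is kept; anything after '=' is discarded so that
    user-supplied values never end up in the feature string.
    """
    name = cli_argument.lstrip("-").split("=", 1)[0]
    return f"option/{name}"


print(feature_name("--output=secret.txt"))
```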

Changed

  • The output summarizing a scan's results has been simplified.
semgrep - Release v0.95.0

Published by github-actions[bot] over 2 years ago

0.95.0 - 2022-06-02

Added

  • Sarif output format now includes fixes section
  • Rust: added support for method chaining patterns.
  • M1 Mac support added to PyPI
  • Accept SEMGREP_BASELINE_REF as alias for SEMGREP_BASELINE_COMMIT
  • r2c-internal-project-depends-on:
    • pretty printing for SCA results
    • support for poetry and gradle lockfiles
  • taint-mode: Taint tracking will now analyze lambdas in their surrounding context.
    Previously, if a variable became tainted outside a lambda, and this variable was
    used inside the lambda causing the taint to reach a sink, this was not being
    detected because any nested lambdas were "opaque" to the analysis. (Taint tracking
    looked at lambdas but as isolated functions.) Now lambdas are simply analyzed as if
    they were statement blocks. However, taint tracking still does not follow the flow
    of taint through the lambda's arguments!
  • Metrics now include an anonymous Event ID. This is an ID generated at send-time
    and will be used to de-duplicate events that potentially get duplicated during transmission.
  • Metrics now include an anonymous User ID. This ID is stored in the ~/.semgrep/settings.yml file. If the ID disappears, the next run will generate a new one randomly. See the Anonymous User ID in PRIVACY.md for more details.

Changed

  • The ci CLI command will now include ignored matches in output formats
    that dictate they should always be included
  • Previously, you could use $X in a message to interpolate the variable captured
    by a metavariable named $X, but there was no way to access the underlying value.
    However, sometimes that value is more important than the captured variable.
    Now you can use the syntax value($X) to interpolate the underlying
    propagated value if it exists (if not, it will just use the variable name).
    Example:
    Take a target file that looks like
    x = 42
    log(x)
    
    Now take a rule to find that log command:
    - id: example_log
      message: "Logged $SECRET: value($SECRET)"
      pattern: log($SECRET)
      languages: [python]
    
    Before, this would have given you the message Logged x: value(x). Now, it
    will give the message Logged x: 42.
  • A parameter pattern without a default value can now match a parameter
    with a default value (#5021)
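
The value($X) interpolation described above can be pictured with a toy renderer. This is only a sketch of the behavior, not Semgrep's code: render_message, bindings (metavariable to captured text), and propagated_values (metavariable to propagated constant) are hypothetical names.

```python
import re

METAVAR = r"\$[A-Z_][A-Z0-9_]*"
# Match either value($X) or a bare $X in a single pass.
TOKEN = re.compile(rf"value\(({METAVAR})\)|({METAVAR})")


def render_message(template, bindings, propagated_values):
    """Expand $X to the captured text and value($X) to the propagated value."""

    def substitute(match):
        value_name, plain_name = match.group(1), match.group(2)
        if value_name is not None:
            # Fall back to the captured variable name if no value is known.
            captured = bindings.get(value_name, value_name)
            return propagated_values.get(value_name, captured)
        return bindings.get(plain_name, plain_name)

    return TOKEN.sub(substitute, template)


print(render_message(
    "Logged $SECRET: value($SECRET)",
    {"$SECRET": "x"},
    {"$SECRET": "42"},
))
```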

Fixed

  • Numerous improvements to PHP parsing by switching to tree-sitter-php
    to parse PHP target code. Huge shoutout to Sjoerd Langkemper for most
    of the heavy lifting work
    (#3941, #2648, #2650, #3590, #3588, #3587, #3576, #3848, #3978, #4589)
  • TS: support number and boolean typed metavariables (#5350)
  • When a rule from the registry fails to parse, suggest user upgrade to
    latest version of semgrep
  • Scala: correctly handle return for taint analysis (#4975)
  • PHP: correctly handle namespace use declarations when they don't rename
    the imported name (#3964)
  • Constant propagation is now faster and more memory efficient when analyzing
    large functions with lots of variables.
semgrep - Release v0.94.0

Published by github-actions[bot] over 2 years ago

0.94.0 - 2022-05-25

Added

  • metavariable-regex now supports an optional constant-propagation key.
    When this is set to true, information learned from constant propagation
    will be used when matching the metavariable against the regex. By default
    it is set to false
  • Dockerfile: constant propagation now works on variables declared with ENV
  • shouldafound - False Negative reporting via the CLI
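
As a sketch of the new constant-propagation key, the rule below is expressed as a Python dictionary and dumped as JSON. The rule id, the requests.get pattern, and the regex are illustrative assumptions; only the metavariable-regex keys come from the note above.

```python
import json

# Hypothetical rule: flag requests whose URL argument resolves to plain HTTP.
rule = {
    "id": "plain-http-url-example",
    "languages": ["python"],
    "severity": "WARNING",
    "message": "Request to a plain-HTTP URL",
    "patterns": [
        {"pattern": "requests.get($URL, ...)"},
        {
            "metavariable-regex": {
                "metavariable": "$URL",
                "regex": ".*http://.*",
                # Also test values learned by constant propagation, e.g. a
                # variable assigned a string literal earlier (default: false).
                "constant-propagation": True,
            }
        },
    ],
}

rule_json = json.dumps({"rules": [rule]}, indent=2)
print(rule_json)
```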

Changed

  • taint-mode: Let's say that e.g. taint(x) makes x tainted by side-effect.
    Previously, we had to rely on a trick that declared that any occurrence of
    x inside taint(x); ... was a taint source. If x was overwritten with
    safe data, this was not recognized by the taint engine. Also, if taint(x)
    occurred inside e.g. an if block, any occurrence of x outside that block
    was not considered tainted. Now, if you specify that the code variable itself
    is a taint source (using focus-metavariable), the taint engine will handle
    this as expected, and it will not suffer from the aforementioned limitations.
    We believe that this change should not break existing taint rules, but please
    report any regressions that you may find.
  • taint-mode: Let's say that e.g. sanitize(x) sanitizes x by side-effect.
    Previously, we had to rely on a trick that declared that any occurrence of
    x inside sanitize(x); ... was sanitized. If x was later overwritten with
    tainted data, the taint engine would still regard x as safe. Now, if you
    specify that the code variable itself is sanitized (using focus-metavariable),
    the taint engine will handle this as expected and it will not suffer from this
    limitation. We believe that this change should not break existing taint rules,
    but please report any regressions that you may find.
  • The dot access ellipsis now matches field accesses in addition to method
    calls.
  • Made error message for resource exhaustion (exit code -11/-9) more actionable
  • Made error message for rules with patterns missing positive terms
    more actionable (#5234)
  • In this version, we have made several performance improvements
    to the code that surrounds our source parsing and matching core.
    This includes file targeting, rule fetching, and similar parts of the codebase.
    Running semgrep scan --config auto on the semgrep repo itself
    went from 50-54 seconds to 28-30 seconds.
    • As part of these changes, we removed :include .gitignore and .git/
      from the default .semgrepignore patterns.
      This should not cause any difference in which files are targeted
      as other parts of Semgrep ignore these files already.
    • A full breakdown of our performance updates,
      including some upcoming ones,
      can be found here https://github.com/returntocorp/semgrep/issues/5257#issuecomment-1133395694
  • If a metrics event request times out, we no longer retry the request.
    This avoids Semgrep waiting 10-20 seconds before exiting if these requests are slow.
  • The metrics collection timeout has been raised from 2 seconds to 3 seconds.

Fixed

  • TS: support for template literal types after upgrading to a more recent
    tree-sitter-typescript (Oct 2021)
  • TS: support for override keyword (#4220, #4798)
  • TS: better ASI (#4459) and accept code like (null)(foo) (#4468)
  • TS: correctly parse private properties (#5162)
  • Go: Support for ellipsis in multiple return values
    (e.g., func foo() (..., error, ...) {}) (#4896)
  • semgrep-core: you can again use rules stored in JSON instead of YAML (#5268)
  • Python: adds support for parentheses around with context expressions
    (e.g., with (open(x) as a, open(y) as b): pass) (#5092)
semgrep - Release v0.93.0

Published by github-actions[bot] over 2 years ago

0.93.0 - 2022-05-17

Changed

  • Files where only some part of the code had to be skipped due to a parse failure
    will now be listed as "partially scanned" in the end-of-scan skip
    report.
  • Licensing: The ocaml-tree-sitter-core component is now distributed
    under the terms of the LGPL 2.1, rather than previously GPL 3.
  • A new field was added to metrics collection: isAuthenticated.
    This is a boolean flag which is true if you ran semgrep login.

Fixed

  • semgrep ci used to incorrectly report the base branch as a CI job's branch
    when running on a pull_request_target event in GitHub Actions.
    By fixing this, Semgrep App can now track issue status history with on: pull_request_target jobs.
  • Metrics events were missing timestamps even though PRIVACY.md had already documented a timestamp field.
semgrep - Release v0.92.1

Published by github-actions[bot] over 2 years ago

Added

  • Dataflow: The dataflow engine now handles if-then-else expressions as in OCaml,
    Ruby, etc. Previously it only handled if-then-else statements. (#4965)

Fixed

  • Kotlin: support for ellipsis in class parameters, e.g., class Foo(...) {} (#5180)
  • fixed_lines is once again included in JSON output when running with --autofix --dryrun
semgrep - Release v0.92.0

Published by github-actions[bot] over 2 years ago

Added

  • The JSON output of semgrep scan is now fully specified using
    ATD (https://atd.readthedocs.io/) and jsonschema (https://json-schema.org/).
    See the semgrep-interfaces submodule under interfaces/
    (e.g., interfaces/semgrep-interfaces/Semgrep_output_v0.atd for the ATD spec)
  • The JSON output of semgrep scan now contains a "version": field with the
    version of Semgrep used to generate the match results.
  • taint-mode: Previously, to declare a function parameter as a taint source,
    we had to rely on a trick that declared that any occurrence of the parameter
    was a taint source. If the parameter was overwritten with safe data, this was
    not recognized by the taint engine. Now, focus-metavariable can be used to
    precisely specify that a function parameter is a source of taint, and the taint
    engine will handle this as expected.
  • taint-mode: Add basic support for object destructuring in languages such as
    Javascript. For example, given let {x} = E, Semgrep will now infer that x
    is tainted if E is tainted.
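
A short sketch of reading the new "version" field from JSON output. The sample report is hypothetical and heavily trimmed (real output contains many more fields), and report_version is an illustrative helper.

```python
import json

# Hypothetical, trimmed-down report for illustration only.
sample_output = '{"version": "0.92.0", "results": [], "errors": []}'


def report_version(raw_json: str) -> str:
    """Return the Semgrep version recorded in a JSON report."""
    report = json.loads(raw_json)
    # Reports from releases before this one lack the field.
    return report.get("version", "unknown")


print(report_version(sample_output))
```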

Fixed

  • OCaml: Parentheses in autofixed code will no longer leave a dangling closing paren.
    Thanks to Elliott Cable for his contribution (#5087)
  • When running the Semgrep Docker image, we now mark all directories as safe for use by Git,
    which prevents a crash when the current user does not own the source code directory.
  • C++: Ellipses are now allowed in for loop headers (#5164)
  • Java: typed metavariables now leverage the type of foreach variables (#5181)
  • r2c-internal-project-depends-on:
    • Lockfiles that fail to parse will not crash semgrep
    • cargo.lock and Pipfile.lock dependencies that don't specify hashes now parse
    • go.sum files with a trailing newline now parse
semgrep - Release v0.91.0

Published by github-actions[bot] over 2 years ago

Added

  • --core-opts flag to send options to semgrep-core. For internal use: no guarantees made for semgrep-core options (#5111)

Changed

  • semgrep ci prints out all findings instead of hiding nonblocking findings (#5116)
semgrep - Release v0.90.0

Published by github-actions[bot] over 2 years ago

Added

  • Join mode now supports inline rules via the rules: key underneath the join: key.
  • Added vendor.name field in gitlab sast output (#5077)

Fixed

  • Keep only latest run logs in last.log file (#5070)
semgrep - Release v0.89.0

Published by github-actions[bot] over 2 years ago

Added

  • Bash/Dockerfile: Add support for named ellipses such as in
    echo $...ARGS (#4887)
  • Constant propagation for static constants in php (#5022)

Changed

  • When running a baseline scan on a shallow-cloned git repository,
    Semgrep still needs enough git history available
    to reach the branch-off point between the baseline and current branch.
    Previously, Semgrep would try to gradually fetch more and more commits
    up to a thousand commits of history,
    before giving up and just fetching all commits from the remote git server.
    Now, Semgrep will keep trying smaller batches until up to a million commits.
    This change should reduce runtimes on large baseline scans on very large repositories.
  • Semgrep-core now logs the rule and file affected by a memory warning.
  • Improved error messages from semgrep-core (#5013)
  • Small changes to text output (#5008)
  • Various exit codes changed so that exit code 1 is only for blocking findings (#5039)
  • Subcommand is sent as part of user agent (#5051)
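
For CI wrapper scripts, the exit code change above could be handled along these lines. This is a sketch: describe_exit_code is a hypothetical helper, and treating every code other than 0 and 1 as a generic error is an assumption, since the individual codes are not listed here.

```python
def describe_exit_code(code: int) -> str:
    """Classify a Semgrep exit code after the change described above."""
    if code == 0:
        return "no blocking findings"
    if code == 1:
        # Exit code 1 is now reserved for blocking findings.
        return "blocking findings"
    # Assumption: any other code signals some kind of failure.
    return "error"


for code in (0, 1, 2):
    print(code, describe_exit_code(code))
```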

Fixed

  • Lockfiles scanning now respects .semgrepignore
  • Workaround for git safe.directory change in github action (#5044)
  • When a baseline scan diff showed that a path changed from a symlink to a proper file,
    Semgrep used to incorrectly skip that path. This is now fixed.
  • Dockerfile support: handle image aliases correctly (#4881)
  • TS: Fixed matching of parameters with type annotations. E.g., it is now possible
    to match ({ params }: Request) => { } with ({$VAR} : $REQ) => {...}. (#5004)
semgrep - Release v0.88.0

Published by github-actions[bot] over 2 years ago

Added

  • Scala support is now officially GA
    • Ellipsis method chaining is now supported
    • Type metavariables are now supported
  • Ruby: Add basic support for lambdas in patterns. You can now write patterns
    of the form -> (P) {Q} where P and Q are sub-patterns. (#4950)
  • Experimental semgrep install-deep-semgrep command for DeepSemgrep beta (#4993)

Changed

  • Moved description of parse/internal errors to the "skipped" section of output
  • Since 0.77.0 semgrep-core logs a warning when a worker process is consuming above
    400 MiB of memory. Now, it will also log an extra warning every time memory usage
    doubles. Again, this is meant to help diagnose OOM-related crashes.

Fixed

  • Dockerfile: lang.json file not found error while building the docker image
  • Dockerfile: EXPOSE 12345 will now parse 12345 as an int instead of a string,
    allowing metavariable-comparison with integers (#4875)
  • Scala: unicode character literals now parse
  • Scala: multiple annotated type parameters now parse (def f[@an A, @an B](x : A, y : B) = ...)
  • Ruby: Allow 'unless' used as keyword argument or hash key (#4948)
  • Ruby: Fix regexp matching in the presence of escape characters (#4999)
  • r2c-internal-project-depends-on:
    • Generic mode rules work again
    • Semgrep will not fail on targets that contain no relevant lockfiles
  • Go: parse multiline string literals
  • Handle utf-8 decoding errors without crashing (#5023)
semgrep - Release v0.87.0

Published by github-actions[bot] over 2 years ago

0.87.0 - 2022-04-07

Added

  • New focus-metavariable operator that lets you focus (or "zoom in") the match
    on the code region delimited by a metavariable. This operator is useful for
    narrowing down the code matched by a rule, to focus on what really matters. (#4453)
  • semgrep ci uses "GITHUB_SERVER_URL" to generate urls if it is available
  • You can now set NO_COLOR=1 to force-disable colored output

Changed

  • taint-mode: We no longer force the unification of metavariables between
    sources and sinks by default. It is not clear that this is the most natural
    behavior; and we realized that, in fact, it was confusing even for experienced
    Semgrep users. Instead, each set of metavariables is now considered independent.
    The metavariables available to the rule message are all metavariables bound by
    pattern-sinks, plus the subset of metavariables bound by pattern-sources
    that do not collide with the ones bound by pattern-sinks. We do not expect
    this change to break many taint rules because source-sink metavariable
    unification had a bug (see #4464) that prevented metavariables bound by a
    pattern-inside to be unified, thus limiting the usefulness of the feature.
    Nonetheless, it is still possible to force metavariable unification by setting
    taint_unify_mvars: true in the rule's options.
  • r2c-internal-project-depends-on: this is now a rule key, and not part of the pattern language.
    The depends-on-either key can be used analogously to pattern-either
  • r2c-internal-project-depends-on: each rule with this key will now distinguish between
    reachable and unreachable findings. A reachable finding is one with both a dependency match
    and a pattern match: a vulnerable dependency was found and the vulnerable part of the dependency
    (according to the patterns in the rule) is used somewhere in code. An unreachable finding
    is one with only a dependency match. Reachable findings are reported as coming from the
    code that was pattern matched. Unreachable findings are reported as coming from the lockfile
    that was dependency matched. Both kinds of findings specify their kind, along with all matched
    dependencies, in the extra field of semgrep's JSON output, using the dependency_match_only
    and dependency_matches fields, respectively.
  • r2c-internal-project-depends-on: a finding will only be considered reachable if the file
    containing the pattern match actually depends on the dependencies in the lockfile containing the
    dependency match. A file depends on a lockfile if it is the nearest lockfile going up the
    directory tree.
  • The returntocorp/semgrep Docker image no longer sets semgrep as the entrypoint.
    This means that semgrep is no longer prepended automatically to any command you run in the image.
    This makes it possible to use the image in CI executors that run provisioning commands within the image.
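
The "nearest lockfile going up the directory tree" rule from the r2c-internal-project-depends-on notes above can be sketched as follows. This is an illustration rather than Semgrep's implementation; nearest_lockfile and the LOCKFILE_NAMES set are assumptions.

```python
from pathlib import Path
from typing import Optional

# Illustrative set of lockfile names; the real list may differ.
LOCKFILE_NAMES = {
    "package-lock.json", "yarn.lock", "Pipfile.lock",
    "poetry.lock", "Cargo.lock", "go.sum",
}


def nearest_lockfile(source_file: Path, root: Path) -> Optional[Path]:
    """Walk up from source_file's directory to root; return the first lockfile."""
    directory = source_file.resolve().parent
    root = root.resolve()
    while True:
        for name in sorted(LOCKFILE_NAMES):
            candidate = directory / name
            if candidate.is_file():
                return candidate
        # Stop at the scan root or at the filesystem root.
        if directory == root or directory == directory.parent:
            return None
        directory = directory.parent
```

Under this sketch, a file in app/sub/ would depend on app/package-lock.json rather than on a yarn.lock at the repository root.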

Fixed

  • - is now parsed as a valid identifier in Scala
  • new $OBJECT(...) will now work properly as a taint sink (#4858)
  • JS/TS: ...{$X}... will no longer match str
  • taint-mode: Metavariables bound by a pattern-inside are now available to the
    rule message. (#4464)
  • parsing: fail fast in semgrep-core if rules fail to validate (broken since 0.86.5)
  • Setting either SEMGREP_URL or SEMGREP_APP_URL
    now updates the URL used both for Semgrep App communication,
    and for fetching Semgrep Registry rules.
  • The pre-commit hook exposed from semgrep's repository no longer fails
    when trying to install with recent setuptools versions.
semgrep - Release v0.86.5

Published by github-actions[bot] over 2 years ago

Changed

  • pin urllib3 to ~=1.26
semgrep - Release v0.86.4

Published by github-actions[bot] over 2 years ago

0.86.0 - 2022-03-24

Added

  • Semgrep can now output findings in GitLab's SAST report and secret scanning
    report formats with --gitlab-sast and --gitlab-secrets.
  • JSON output now includes a fingerprint of each finding.
    This fingerprint remains consistent when matching code is just moved around
    or reindented.
  • Go: use latest tree-sitter-go with support for Go 1.18 generics (#4823)
  • Terraform: basic support for constant propagation of locals (#1147)
    and variables (#4816)
  • HTML: you can now use metavariable ellipsis inside (#4841)
    (e.g., <script>$...JS</script>)
  • A semgrep ci subcommand that auto-detects settings from your CI environment
    and can upload findings to Semgrep App when logged in.

Changed

  • SARIF output will include matching code snippet (#4812)
  • semgrep-core should now be more tolerant of rules using future extensions by
    skipping those rules instead of just crashing (#4835)
  • Removed tests from published python wheel
  • Findings are now considered identical between baseline and current scans
    based on the same logic as Semgrep CI uses, which means:
    • Two findings are now identical after whitespace changes such as re-indentation
    • Two findings are now identical after a nosemgrep comment is added
    • Findings are now different if the same code triggered them on different lines
  • Docker image now runs as root to allow the docker image to be used in CI/CD pipelines
  • Support XDG Base directory specification (#4818)

Fixed

  • Entropy analysis: strings made of repeated characters such as
    'xxxxxxxxxxxxxx' are no longer reported as having high entropy (#4833)
  • Symlinks found in directories are skipped from being scanned again.
    This is a fix for a regression introduced in 0.85.0.
  • HTML: multiline raw text tokens now contain the newline characters (#4855)
  • Go: fix unicode parsing bugs (#4725) by switching to latest tree-sitter-go
  • Constant propagation: A conditional expression where both alternatives are
    constant will also be considered constant (#4301)
  • Constant propagation now recognizes operators ++ and -- as side-effectful
    (#4667)
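
To see why a string of repeated characters should not count as high entropy, consider the textbook Shannon entropy over characters. This is only an illustration; Semgrep's entropy analyzer may use a different measure.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    probabilities = [count / total for count in Counter(text).values()]
    return sum(p * math.log2(1 / p) for p in probabilities)


# One repeated character carries no information at all.
print(shannon_entropy("xxxxxxxxxxxxxx"))
# Four distinct, equally frequent characters give 2 bits per character.
print(shannon_entropy("abcd"))
```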

0.86.1…0.86.4 - 2022-03-25

Fixed

  • Network timeouts during rule download are now less likely.
  • Some finding fingerprints were not matching what semgrep-agent would return.
  • The fingerprint of findings ignored with # nosemgrep is supposed to be the same
    as if the ignore comment wasn't there.
    This has previously only worked for single-line findings, including in semgrep-agent.
    Now the fingerprint is consistent as expected for multiline findings as well.
  • --timeout-threshold default set to 3 instead of 0
semgrep - Release v0.85.0

Published by github-actions[bot] over 2 years ago

Added

  • C#: use latest tree-sitter-c-sharp with support for most C# 10.0 features
  • HTML: support for metavariables on tags (e.g., <$TAG>...</$TAG>) (#4078)
  • Scala: The data-flow engine can now handle expression blocks. This used to
    cause some false negatives during taint analysis, which will now be reported.
  • Dockerfile: allow e.g. CMD ... to match both CMD ls and CMD ["ls"] (#4770).
  • When scanning multiple languages,
    Semgrep will now print a table of how many rules and files are used for each language.

Fixed

  • Fixed Deep expression matching and metavariables interaction. Semgrep will
    no longer stop at the first match and will enumerate all possible matchings
    if a metavariable is used in a deep expression pattern
    (e.g., <... $X ...>). This can introduce some performance regressions.

  • JSX: ellipsis in JSX body (e.g., <div>...</div>) now matches any
    children (#4678 and #4717)

  • ℹ️ During a --baseline-commit scan,
    Semgrep temporarily deletes files that were created since the baseline commit,
    and restores them at the end of the scan.

    Previously, when scanning a subdirectory of a git repo with --baseline-commit,
    Semgrep would delete all newly created files under the repo root,
    but restore only the ones in the subdirectory.
    Now, Semgrep only ever deletes files in the scanned subdirectory.

  • Previous releases allowed incompatible versions (21.1.0 & 21.2.0)
    of the attrs dependency to be installed.
    semgrep now correctly requires attrs 21.3.0 at the minimum.

  • package-lock.json parsing defaults to packages instead of dependencies as the source of dependencies

  • package-lock.json parsing will ignore dependencies with non-standard versions, and will successfully parse
    dependencies with no integrity field

Changed

  • File targeting logic has been mostly rewritten. (#4776)
    These inconsistencies were fixed in the process:

    • ℹ️ "Explicitly targeted file" refers to a file
      that's directly passed on the command line.

      Previously, explicitly targeted files would be unaffected by most global filtering:
      global include/exclude patterns and the file size limit.
      Now .semgrepignore patterns don't affect them either,
      so they are unaffected by all global filtering.

    • ℹ️ With --skip-unknown-extensions,
      Semgrep scans only the explicitly targeted files that are applicable to the language you're scanning.

      Previously, --skip-unknown-extensions would skip based only on file extension,
      even though extensionless shell scripts expose their language via the shebang of the first line.
      As a result, explicitly targeted shell files were always skipped when --skip-unknown-extensions was set.
      Now, this flag decides if a file is the correct language with the same logic as other parts of Semgrep:
      taking into account both extensions and shebangs.

  • Semgrep scans with --baseline-commit are now much faster.
    These optimizations were added:

    • ℹ️ When --baseline-commit is set,
      Semgrep first runs the current scan,
      then switches to the baseline commit,
      and runs the baseline scan.

      The current scan now excludes files
      that are unchanged between the baseline and the current commit
      according to git status output.

    • The baseline scan now excludes rules and files that had no matches in the current scan.

    • When git ls-files is unavailable or --disable-git-ignore is set,
      Semgrep walks the file system to find all target files.
      Semgrep now walks the file system 30% faster compared to previous versions.

  • The output format has been updated to visually separate lines
    with headings and indentation.
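
The --skip-unknown-extensions change above (deciding a file's language from both its extension and its shebang) might look roughly like this. The lookup tables are a small illustrative subset and detect_language is a hypothetical helper, not Semgrep's targeting code.

```python
from pathlib import Path
from typing import Optional

# Illustrative subsets only.
EXTENSIONS = {".sh": "bash", ".bash": "bash", ".py": "python"}
INTERPRETERS = {"sh": "bash", "bash": "bash", "python": "python", "python3": "python"}


def detect_language(path: Path) -> Optional[str]:
    """Guess a file's language from its extension, then from its shebang."""
    language = EXTENSIONS.get(path.suffix)
    if language is not None:
        return language
    try:
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline().strip()
    except OSError:
        return None
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Handle "#!/usr/bin/env bash" as well as "#!/bin/bash".
    program = Path(parts[0]).name
    if program == "env" and len(parts) > 1:
        program = parts[1]
    return INTERPRETERS.get(program)
```

With a check like this, an extensionless script starting with #!/usr/bin/env bash is recognized as a shell file instead of being skipped.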

semgrep - Release v0.84.0

Published by github-actions[bot] over 2 years ago

Added

  • new --show-supported-languages CLI flag to display the list of languages
    supported by semgrep. Thanks to John Wu for his contribution! (#4754)
  • --validate will check that metavariable-x doesn't use an invalid
    metavariable
  • Add r2c-internal-project-depends-on support for Java, Go, Ruby, and Rust
  • PHP: .tpl files are now considered PHP files (#4763)
  • Scala: Support for custom string interpolators (#4655)
  • Scala: Support parsing Scala scripts that contain plain definitions outside
    an Object or Class
  • JSX: JSX singleton elements (a.k.a XML elements), e.g., <foo /> used to
    match also more complex JSX elements, e.g., <foo >some child</foo>.
    This can now be disabled via rule options:
    with xml_singleton_loose_matching: false (#4730)
  • JSX: new matching option xml_attrs_implicit_ellipsis that allows
    disabling the implicit ... that was added to JSX attributes patterns.
  • new focus-metavariable: experimental operator (#4735) (the syntax may change
    in the near future)

Fixed

  • Report parse errors even when invoked with --strict
  • Show correct findings count when using --config auto (#4674)
  • Kotlin: store trailing lambdas in the AST (#4741)
  • Autofix: Semgrep no longer errors during --dry-runs where one fix changes the line numbers in a file that also has a second autofix.
  • Performance regression when running with --debug (#4761)
  • Allow metrics flag and metrics env var at the same time if both are set to the same value (#4703)
  • Scan yarn.lock dependencies that do not specify a hash
  • Run project-depends-on rules with only pattern-inside at their leaves
  • Dockerfile patterns no longer need a trailing newline (#4773)
semgrep - Release v0.83.0

Published by github-actions[bot] over 2 years ago

Added

  • semgrep saves logs of last run to ~/.semgrep/last.log
  • A new recursive operator, -->, for join mode rules for recursively
    chaining together Semgrep rules based on metavariable contents.
  • Semgrep now lists the scanned paths in its JSON output under the
    paths.scanned key.
  • When using --verbose, the skipped paths are also listed under the
    paths.skipped key.
  • C#: added support for typed metavariables (#4657)
  • Undocumented, experimental metavariable-analysis feature
    supporting two kinds of analyses: prediction of regular expression
    denial-of-service vulnerabilities (ReDoS, redos analyzer, #4700)
    and high-entropy string detection (entropy analyzer, #4672).
  • A new subcommand semgrep publish allows users to upload private,
    unlisted, or public rules to the Semgrep Registry

Fixed

  • Configure the PCRE engine with lower match-attempts and recursion limits in order
    to prevent regex matching from potentially "hanging" Semgrep
  • Terraform: Parse heredocs respecting newlines and whitespaces, so that it is
    possible to correctly match these strings with metavariable-regex or
    metavariable-pattern. Previously, Semgrep had problems analyzing e.g. embedded
    YAML content. (#4582)
  • Treat Go raw string literals like ordinary string literals (#3938)
  • Eliminate zombie uname processes (#4466)
  • Fix for: semgrep always highlights one extra character

Changed

  • Improved constant propagation for global constants
  • PHP: Constant propagation now has built-in knowledge of escapeshellarg and
    htmlspecialchars_decode, if these functions are given constant arguments,
    then Semgrep assumes that their output is also constant
  • The environment variable used by Semgrep login changed from SEMGREP_LOGIN_TOKEN to SEMGREP_APP_TOKEN
semgrep - Release v0.82.0

Published by github-actions[bot] over 2 years ago

0.82.0 - 2022-02-08

Added

  • Experimental baseline scanning. Run with --baseline-commit GIT_COMMIT to only
    show findings that currently exist but did not exist in GIT_COMMIT
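
Conceptually, a baseline scan reports the findings of the current scan minus those already present at the baseline commit. The sketch below shows that set difference over hypothetical (rule id, path, matched code) tuples; how Semgrep actually decides that two findings are identical is not shown here.

```python
def new_findings(current, baseline):
    """Return findings present in the current scan but not in the baseline."""
    baseline_keys = set(baseline)
    # Keep the order of the current scan's findings.
    return [finding for finding in current if finding not in baseline_keys]


# Hypothetical findings: (rule id, path, matched code).
baseline_scan = [("no-eval", "app.py", "eval(x)")]
current_scan = [
    ("no-eval", "app.py", "eval(x)"),
    ("no-exec", "tools.py", "exec(cmd)"),
]
print(new_findings(current_scan, baseline_scan))
```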

Changed

  • Scans now report a breakdown of how many target paths were skipped for what reason.
    • --verbose mode will list all skipped paths along with the reason they were skipped
  • Performance: send all rules directly to semgrep-core instead of invoking semgrep-core
    for each rule, reducing the overhead significantly. Other changes resulting from this:
    Sarif output now includes all rules run. Error messages use full path of rules.
    Progress bar reports by file instead of by rule
  • Required minimum version of python to run semgrep now 3.7 instead of EOL 3.6
  • Bloom filter optimization now considers import module file names, thus
    speeding up matching of patterns like import { $X } from 'foo'
  • Indentation is now removed from matches to conserve horizontal space

Fixed

  • Typescript: Patterns E as T will be matched correctly. E.g. previously
    a pattern like v as $T would match v but not v as any, now it
    correctly matches v as any but not v. (#4515)
  • Highlighting has been restored for matching code fragments within a finding