semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

LGPL-2.1 License

semgrep - Release v1.58.0

Published by github-actions[bot] 9 months ago

1.58.0 - 2024-01-23

Added

  • Added a severity icon (e.g. "❯❯❱") and corresponding color to our CLI text output
    for findings of known severity. (grow-97)

  • Naming has better support for if statements. In particular, for
    languages with block scope, shadowed variables inside if-else blocks
    that are tainted won't "leak" outside of those blocks.

    This helps with features related to naming, such as tainting.

    For example, in Go, Semgrep previously reported the x in sink(x)
    as tainted, even though the tainted x is the one declared inside
    the scope of the if block.

    func f() {
      x := "safe";
      if (c) {
        x := "tainted";
      }
      // x should not be tainted
      sink(x);
    }
    

    This is now fixed. (pa-3185)

  • OSemgrep can now scan remote git repositories. Pass --experimental --pro --remote http[s]://<website>/.../<repo>.git to use this feature (pa-remote)
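
The block-scope naming change above can be pictured with a small scope stack. This is only an illustrative sketch, not Semgrep's implementation: the Scopes class and the analyze function are hypothetical names, and taint is reduced to a boolean per variable.

```python
class Scopes:
    """A stack of block scopes; each scope maps a variable name to a taint flag."""

    def __init__(self):
        self.stack = [{}]

    def push(self):
        self.stack.append({})

    def pop(self):
        self.stack.pop()

    def declare(self, name, tainted):
        # A declaration (like Go's :=) always binds in the innermost scope.
        self.stack[-1][name] = tainted

    def lookup(self, name):
        # Resolve a use to the nearest enclosing declaration.
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise KeyError(name)


def analyze():
    scopes = Scopes()
    scopes.declare("x", tainted=False)  # x := "safe"
    scopes.push()                       # if (c) {
    scopes.declare("x", tainted=True)   #   x := "tainted"
    inner = scopes.lookup("x")
    scopes.pop()                        # }
    outer = scopes.lookup("x")          # sink(x)
    return inner, outer
```

Because the inner declaration lives in its own scope, popping that scope restores the outer, untainted x before sink(x) is resolved.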

Changed

  • Rules stored under a "hidden" directory (e.g., dir/.hidden/myrule.yml)
    are now processed when using --config .
    We used to skip dot files under dir, with ad hoc exceptions: we kept
    rules/.semgrep.yml but not path/.github/foo.yml, and kept
    src/.semgrep/bad_pattern.yml but not ./.pre-commit-config.yaml, ...
    This was mainly because we used to fetch rules from ~/.semgrep/
    implicitly when --config was not given. That feature was removed,
    so now we can keep it simple. (hidden_rules)
  • Removed support for writing rules using jsonnet. This feature
    will be restored once we finish the port to OCaml of the semgrep CLI. (jsonnet)
  • The primitive object construct expression will no longer match the new
    expression pattern. For example, the pattern new $TYPE will now only match
    new int, not int(). (pa-3336)
  • The placement new expression will no longer match the new expression without
    placement. For instance, the pattern new ($STORAGE) $TYPE will now only match
    new (storage) int and not new int. (pa-3338)

Fixed

  • Java: You can now use metavariable ellipses properly in
    function arguments, as statements, and as expressions.

    For instance, you may write the pattern

    public $F($...ARGS) { ... }

    (gh-9260)
    
  • Nosemgrep: Fixed a bug where Semgrep would err upon reading a nosemgrep
    comment with multiple rule IDs. (gh-9463)

  • Fixed bugs in gitignore/semgrepignore globbing implementation affecting --experimental. (gh-9544)

  • Fixed rule IDs, descriptions, findings, and autofix text not wrapping as expected.
    Use a newline instead of a horizontal separator between findings that share a file
    but come from different rules, per the design spec. (grow-97)

  • Keep track of the origin of return; statements in the dataflow IL so that
    recently added (Pro-only) at-exit: true sinks work properly on them. (pa-3337)

  • C++: Improve translation of delete expressions to the dataflow IL so that
    recently added (Pro-only) at-exit: true sinks work on them. Previously
    delete expressions at "exit" positions were not being properly recognized
    as such. (pa-3339)

  • cli: fix python runtime error with 0 width wrapped printing (pa-3366)

  • Fixed a bug where Gemfile.lock files with multiple GEM sections
    would not be parsed correctly. (sc-1230)

semgrep - Release v1.57.0

Published by github-actions[bot] 9 months ago

1.57.0 - 2024-01-18

Added

  • Added a severity icon (e.g. "❯❯❱") and corresponding color to our CLI text output
    for findings of known severity. (grow-97)

  • Naming has better support for if statements. In particular, for
    languages with block scope, shadowed variables inside if-else blocks
    that are tainted won't "leak" outside of those blocks.

    This helps with features related to naming, such as tainting.

    For example, in Go, Semgrep previously reported the x in sink(x)
    as tainted, even though the tainted x is the one declared inside
    the scope of the if block.

    func f() {
      x := "safe";
      if (c) {
        x := "tainted";
      }
      // x should not be tainted
      sink(x);
    }
    

    This is now fixed. (pa-3185)

  • OSemgrep can now scan remote git repositories. Pass --experimental --pro --remote http[s]://<website>/.../<repo>.git to use this feature (pa-remote)

Changed

  • Rules stored under a "hidden" directory (e.g., dir/.hidden/myrule.yml)
    are now processed when using --config .
    We used to skip dot files under dir, with ad hoc exceptions: we kept
    rules/.semgrep.yml but not path/.github/foo.yml, and kept
    src/.semgrep/bad_pattern.yml but not ./.pre-commit-config.yaml, ...
    This was mainly because we used to fetch rules from ~/.semgrep/
    implicitly when --config was not given. That feature was removed,
    so now we can keep it simple. (hidden_rules)
  • The primitive object construct expression will no longer match the new
    expression pattern. For example, the pattern new $TYPE will now only match
    new int, not int(). (pa-3336)
  • The placement new expression will no longer match the new expression without
    placement. For instance, the pattern new ($STORAGE) $TYPE will now only match
    new (storage) int and not new int. (pa-3338)

Fixed

  • Java: You can now use metavariable ellipses properly in
    function arguments, as statements, and as expressions.

    For instance, you may write the pattern

    public $F($...ARGS) { ... }

    (gh-9260)
    
  • Fixed bugs in gitignore/semgrepignore globbing implementation affecting --experimental. (gh-9544)

  • Fixed rule IDs, descriptions, findings, and autofix text not wrapping as expected.
    Use a newline instead of a horizontal separator between findings that share a file
    but come from different rules, per the design spec. (grow-97)

  • Keep track of the origin of return; statements in the dataflow IL so that
    recently added (Pro-only) at-exit: true sinks work properly on them. (pa-3337)

  • C++: Improve translation of delete expressions to the dataflow IL so that
    recently added (Pro-only) at-exit: true sinks work on them. Previously
    delete expressions at "exit" positions were not being properly recognized
    as such. (pa-3339)

  • Fixed a bug where Gemfile.lock files with multiple GEM sections
    would not be parsed correctly. (sc-1230)

semgrep - Release v1.56.0

Published by github-actions[bot] 9 months ago

1.56.0 - 2024-01-10

Added

  • Added a new field that breaks down the number of findings per product
    in the metrics that are sent out by the CLI. This will help Semgrep
    understand users better. (pa-3312)

semgrep - Release v1.55.2

Published by github-actions[bot] 10 months ago

1.55.2 - 2024-01-05

Fixed

  • taint-mode: Semgrep was missing some sources occurring inside type expressions,
    for example:

    char *p = new char[source(x)];
    sink(x);
    

    Now, if x is tainted by side-effect, Semgrep will check x inside the type
    expression char[...] and record it as tainting, and generate a finding for
    sink(x). (pa-3313)

  • taint-mode: C/C++: Sanitization by side-effect was not working correctly for
    ptr->fld l-values. In particular, if ptr is tainted, and then ptr->fld is
    sanitized, Semgrep will now correctly consider ptr->fld as clean. (pa-3328)

semgrep - Release v1.55.1

Published by github-actions[bot] 10 months ago

1.55.1 - 2024-01-04

Fixed

  • Honor temporary folder specified via the TMPDIR environment variable (or
    equivalent on Windows) in some instances where it used to be hardcoded as
    /tmp. (gh-9534)
  • Fix pipfile manifest parser error (sc-1084)

semgrep - Release v1.54.1

Published by github-actions[bot] 10 months ago

1.54.1 - 2023-12-20

No significant changes.

semgrep - Release v1.54.0

Published by github-actions[bot] 10 months ago

1.54.0 - 2023-12-19

Added

  • Pro only: taint-mode: In a function/method call, it is now possible to arbitrarily
    propagate taint between arguments and the callee. For example in C, one can
    propagate taint from the second argument of strcat to the first, that is,
    strcat($TO, $FROM). Another example, in C++ one can propagate taint from the
    left operand of >> to the right one, that is, $FROM >> $TO. (pa-3131)
  • Semgrep IDE integrations will now cache workspace targets, so a full traversal of a workspace is no longer needed on every scan (pdx-148)

Changed

  • OCaml: switch to using the tree-sitter based parser instead of
    the menhir parser, which has a more complete AST, especially
    for objects and classes. (ocaml)

Fixed

  • solidity: support ellipsis in the init part of for loop headers. (gh-9431)

  • taint-mode: Fixed recently added by-side-effect: only option for taint sources,
    so that it does not incorrectly taint expressions that are not l-values, e.g.
    given this taint source:

    pattern-sources:
      - by-side-effect: only
        patterns:
          - pattern: delete $VAR;
          - focus-metavariable: $VAR
    

    The get(*from) expression should not become tainted since it's not an l-value:

    delete get(*from);

    (pa-2980)
    
  • In C++, the string literal now has a type of char *. It won't match with the
    string type. For instance,

    - metavariable-type:
        metavariable: $EXPR
        type: string
    

    will only match

    string f;
    // MATCH
    int x = f.length();
    

    but not

    const char *s;
    // OK
    s = "foo";

    (pa-3236)
    
  • taint-mode: Semgrep will now treat lambdas' parameters as fresh, so a taint rule
    that finds double deletes should not be triggered on the code below:

    for (ListNode *node : list) {
    	list.erase(node, [](ListNode *p) {
    		delete p;
    	});
    }

    (pa-3298)
    
  • Fixed bug where empty tables in pyproject.toml files would fail to parse (sc-1196)

semgrep - Release v1.53.0

Published by github-actions[bot] 10 months ago

1.53.0 - 2023-12-12

Added

  • Users can now ignore findings locally in Semgrep IDE Extensions, per workspace, and this will persist between restarts (pdx-154)
  • Added a new subcommand 'semgrep test', which is an alias for 'semgrep scan
    --test'. This means that if you were running semgrep on a test
    directory, you will now have to use 'semgrep scan test'; otherwise it
    will be interpreted as the new 'semgrep test' subcommand. (subcommand_test)

Changed

  • Handling qualified identifiers in constant propagation

    We've added support for qualified identifiers in constant propagation. Notably,
    this enables the following matches (with the pro engine):

    rules:
      - id: cpp-const-field
        languages:
          - cpp
        message: testing
        severity: INFO
        pattern: std::cout<<1
    
    #include<iostream>
    #include "a.h"
    
    namespace B {
    class Bar {
        public:
            static const int one = 1;
    };
    }
    
    int main() {
        // ruleid: cpp-const-field
        std::cout<<1<<std::endl;
    
        // ruleid: cpp-const-field
        std::cout<<A::Foo::one<<std::endl;
    
        // ruleid: cpp-const-field
        std::cout<<B::Bar::one<<std::endl;
    }

    (gh-9354)

Fixed

  • Updated the parser used for Rust. The largest change relates to how macros are
    parsed. (rust)

semgrep - Release v1.52.0

Published by github-actions[bot] 11 months ago

1.52.0 - 2023-12-05

Added

  • Java: Semgrep will now recognize String.format(...) expressions as constant
    strings when all their arguments are constant, but it will still not know
    what exact string it is. For example, code String.format("Abc %s", "123")
    will match pattern "..." but it will not match pattern "Abc 123". (pa-3284)
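
The distinction between "known to be a constant string" and "known to be this exact string" can be sketched as follows. This is a hypothetical model, not Semgrep's internals: ConstString, eval_format, matches_any_string, and matches_literal are names invented for illustration, and a value of None stands for "constant, but contents unknown".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConstString:
    # None means: some constant string whose exact contents are not tracked.
    value: Optional[str]


def eval_format(args: Sequence[Optional[ConstString]]) -> Optional[ConstString]:
    """Model String.format(...): constant if every argument is constant."""
    if all(arg is not None for arg in args):
        return ConstString(value=None)  # constant, but the result is not computed
    return None  # not a constant at all


def matches_any_string(expr: Optional[ConstString]) -> bool:
    """Model the pattern "...", which matches any constant string."""
    return expr is not None


def matches_literal(expr: Optional[ConstString], literal: str) -> bool:
    """Model a concrete string pattern such as "Abc 123"."""
    return expr is not None and expr.value == literal


formatted = eval_format([ConstString("Abc %s"), ConstString("123")])
```

Under these assumptions, matches_any_string(formatted) holds while matches_literal(formatted, "Abc 123") does not, mirroring the behavior described in the release note.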

Changed

  • Inter-file diff scan will be gradually introduced to a small percentage of
    users through a slow rollout process. Users who enable the pro engine and
    engage in differential PR scans on Github or Gitlab may experience the impact
    of this update. (ea-268)
  • secrets: now performs more aggressive deduplication for instances where an
    invalid and valid match are reported at the same range. Instead of reporting
    both, we now report only the valid match when they are otherwise visually
    identical. (scrt-271)

Fixed

  • In expression-based languages, definitions are also expressions.

    This change allows dataflow to properly handle definition expressions.

    For example, the pattern 0 == 0 will match x == 0 in

    def f(c) do
      x = (y = 0)
      x == 0
    end
    

    because now dataflow is able to handle the expression y = 0. (pa-3262)

  • In version 1.14.0 (pa-2477) we made sink-matching more precise when the sink
    specification was like:

    pattern-sinks:
      - patterns:
         - pattern: sink($X, ...)
         - focus-metavariable: $X
    

    Where the sink specification most likely has the intent to specify the first
    argument of sink as a sink, and sink(ok1 if tainted else ok2) should NOT
    produce a finding, because tainted is not really what is being passed to
    the sink function.

    But we only intercepted the most simple pattern above, and more complex sink
    specifications that had the same intent were not properly recognized.

    Now we have generalized that pattern to cover more complex cases like:

    patterns:
     - pattern-either:
       - patterns:
         - pattern-inside: |
             def foo(...):
               ...
         - pattern: sink1($X)
       - patterns:
         - pattern: sink2($X)
         - pattern-not: bar(...)
     - focus-metavariable: $X

    (pa-3284)
    
  • Updated the parser used for Rust (rust)

semgrep - Release v1.51.0

Published by github-actions[bot] 11 months ago

1.51.0 - 2023-11-29

Added

  • taint-mode: Added experimental rule option taint_match_on: source that makes
    Semgrep report taint findings on the taint source rather than on the sink. (pa-3272)

Changed

  • Elixir got moved to Pro. (elixir_pro)
  • The 'fix_regex' field has been removed from the semgrep JSON output. Instead,
    the 'fix' field contains the result of applying the fix_regex. (fix_regex)
  • taint-mode: Tweaked experimental option taint_only_propagate_through_assignments
    so that when it is enabled, tainted.field and tainted(args) will no longer
    propagate taint. (pa-2193)

Fixed

  • Fixed Kotlin parse error.

    Previously, code like this would throw a parse error

    fun f1(context : Context) {
        Foo(context).elem = var1
    }
    

    due to not recognizing Foo(context).elem = ... as valid.
    Now calls are recognized as valid in the left hand of
    assignments. (ea-104)

  • Python: async statements are now translated into the Dataflow IL so Semgrep
    will be able to report findings e.g. inside async with ... statements. (gh-9182)

  • In gitlab output, use correct url attached to rule instead of generating it.
    This fixes url for supply chain findings. (gitlab)

  • The language server will no longer crash on startup for IntelliJ. (language-server)

  • The language server no longer crashes when installed through pip on Mac platforms. (language-server-macos)

  • taint-mode: When we encountered an assignment lval := expr where expr returned
    no taints, we automatically cleaned lval. This was correct in the early days of
    taint-mode, before we introduced taint by side-effect, but it is wrong now. The LHS
    lval may be tainted by side-effect, in which case we cannot clean it just because
    expr returns no taint. Now that we introduced by-side-effect: only it is also
    possible for expr to taint lval by side-effect and return no immediate taint.

    This kind of source should now work as expected:

    - by-side-effect: true
      patterns:
        - pattern: |
            $X = source()
        - focus-metavariable: $X

    (pa-3164)
    
  • taint-mode: Fixed a bug in the recently added by-side-effect: only option:
    when matching l-values of the form l.x and l[i], the l occurrence would
    unexpectedly become tainted too. This led to FPs in some typestate rules
    like those checking for double-lock or double-free.

    Now a source such as:

    - by-side-effect: only
      patterns:
      - pattern: lock($L)
      - focus-metavariable: $L
    

    will not produce FPs on code such as:

    lock(obj.l)
    unlock(obj.l)
    lock(obj.l)

    (pa-3282)
    
  • taint-mode: Removed a hack that prevented lval = new ... assignments from
    cleaning the lval even though the RHS was not tainted. This caused FPs in double-free rules.
    For example, given this source:

    pattern-sources:
      - by-side-effect: only
        patterns:
          - pattern: delete $VAR;
          - focus-metavariable: $VAR
    

    And the code below:

    while (nondet) {
      int *v = new int;
      delete v; // FP
    }
    

    The delete v statement was reported as a double-free, because Semgrep did not
    consider that v = new int would clean the taint in v. (pa-3283)

semgrep - Release v1.50.0

Published by github-actions[bot] 11 months ago

1.50.0 - 2023-11-17

No significant changes.

semgrep - Release v1.49.0

Published by github-actions[bot] 11 months ago

1.49.0 - 2023-11-15

Added

  • Added support in Ruby, Julia, and Rust for matching implicit return statements inside functions.

    For example:

    return 0
    

    can now match 0 in

    function f()
      0
    end
    

    This matching is enabled by default and can be disabled with the rule option implicit_return. (gh-8408)

  • Pro engine supports constant propagation of numbers defined via macro in C++ (gh-9221)

  • taint-mode: The by-side-effect option for taint sources (only) now accepts a
    third value only (besides true and false). Setting by-side-effect: only
    will define a taint source that only propagates by side effect. This option
    should allow (ab)using taint-mode for writing some typestate rules.

    For example, this taint rule:

    pattern-sources:
      - by-side-effect: only
        patterns:
        - pattern: lock($L)
        - focus-metavariable: $L
    pattern-sanitizers:
      - by-side-effect: true
        patterns:
        - pattern: unlock($L)
        - focus-metavariable: $L
    pattern-sinks:
      - pattern: lock($L)
    

    will match the second lock(x) in this code:

    lock(x) # no finding
    lock(x) # finding
    

    The first lock(x) will not result in any finding, because the occurrence of x in
    itself will not be tainted. Only after the function call we will record that x is
    tainted (as a side-effect of lock). The second lock(x) will result in a finding
    because the x has been tainted by the previous lock(x). (pa-2980)
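
The double-lock rule above can be read as a small typestate machine. The following sketch simulates that reading on a flat list of calls. It is an illustration under simplifying assumptions, not how Semgrep's taint engine works: find_double_locks is a hypothetical helper, and each call is a (function name, argument name) pair.

```python
def find_double_locks(calls):
    """Return the indexes of lock(...) calls whose argument is already locked."""
    locked = set()   # names currently "tainted" by a previous lock
    findings = []
    for index, (func, arg) in enumerate(calls):
        if func == "lock":
            # Sink check happens first: the argument itself is not tainted
            # by this call yet, only by earlier ones.
            if arg in locked:
                findings.append(index)
            # Source "by-side-effect: only": taint is recorded after the call.
            locked.add(arg)
        elif func == "unlock":
            # Sanitizer by side effect: the name becomes clean again.
            locked.discard(arg)
    return findings
```

For example, find_double_locks([("lock", "x"), ("lock", "x")]) flags only the second call, and inserting ("unlock", "x") between the two removes the finding.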

Changed

  • In the metrics sent we now record the languages for which we invoked the interfile engine.
    This will enable us to measure the performance impact and error rates of new interfile
    languages. (For scans which don't send metrics, there is no change.) See PRIVACY.md
    for more information. (ea-251)

  • Removed support for named snippets (org_name:rule_id) from semgrep scan which were removed from semgrep.dev a few months ago. (gh-9203)

  • Added support for --config <code|secrets> to semgrep scan. When using
    code or secrets, the environment variable SEMGREP_REPO_NAME must be set.

    For example,

    $ SEMGREP_REPO_NAME=test_repo semgrep --config secrets
    

    Internally, semgrep scan --config <product> now uses the same endpoint as the
    semgrep ci to fetch the scan configuration. (gh-9205)

  • Improved handling of unused lambdas to reduce false positives

    Previously, we used to insert the CFGs of unused lambdas at the declaration
    site. However, this approach triggered some false positives. For example,
    consider the following code:

    void incorrect(int *p) {
      auto f1 = [&p]() {
        source(p);
      };
      auto f2 = [&p]() {
        sink(p);
      };
    }
    

    In this code, there's no actual control flow between the source and sink, and
    the lambdas are never even called. But when we inserted their CFGs at the
    declaration site, it incorrectly indicated a taint finding. To prevent these
    types of false positives while still scanning the body of unused lambdas, we
    now insert their CFGs in parallel at the end of their parent function, right
    after all other statements and just before the end node. (pa-3089)

  • Bumped timeout (per-rule and per-file) from 2s to 5s. Recently we lowered it
    from 30s down to 2s, but based on what we have observed so far, we believe 5s
    is a better timeout for the time being. (timeout)

Fixed

  • Fixed a bug where enabling the secret beta causes the default scan mode to be
    set to OSS, even when the Pro flag is turned on in the web UI. (ea-248)

  • Semgrep does not stop a scan anymore for parsing errors due to
    unconventional exceptions (e.g., Failure "not a program") in some
    parsers. Instead, such errors are reported as "Other syntax error". (lang-13)

  • Fix regression for the unused lambda change in react-nextjs-router-push test

    A lambda expression defined in a return expression is also treated as used at
    the location of the return expression. (pa-3089)

  • Updated the Rust parser with miscellaneous improvements. In particular, Semgrep can now parse yield expressions in Rust. (rust)

  • taint-mode: If an expression is tainted by multiple labels A and B, with B
    requiring A, the expression will now get both labels A and B. (taint-labels)

semgrep - Release v1.48.0

Published by github-actions[bot] 12 months ago

1.48.0 - 2023-11-06

Added

  • Matching: Matches with the same range but with bindings in different
    locations are no longer deduplicated.

    For instance, the pattern $FUNC(..., $A, ...) previously produced only
    one match on the target file:

    foo(true, true)
    

    because you would have two matches to the range of the call, and both
    bindings of $A would be to true.

    Now, the deduplication logic sees that the bindings of $A are in
    different places, and thus should not be considered the same, and
    produce two matches. (pa-3230)
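
The deduplication change above comes down to what goes into the key used to compare two matches. The sketch below contrasts the two keys on foo(true, true). It is a simplified model with hypothetical names (Binding, Match, old_key, new_key, deduplicate), and positions are plain character offsets, which is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Binding:
    name: str    # metavariable name, e.g. "$A"
    value: str   # matched text
    start: int   # character offset of the bound text


@dataclass(frozen=True)
class Match:
    start: int
    end: int
    bindings: Tuple[Binding, ...]


def old_key(match):
    # Previous behavior: only the range and the bound values are compared.
    return (match.start, match.end,
            tuple((b.name, b.value) for b in match.bindings))


def new_key(match):
    # New behavior: the location of each binding is part of the key too.
    return (match.start, match.end,
            tuple((b.name, b.value, b.start) for b in match.bindings))


def deduplicate(matches, key):
    seen, kept = set(), []
    for match in matches:
        k = key(match)
        if k not in seen:
            seen.add(k)
            kept.append(match)
    return kept


# foo(true, true): the call spans offsets 0..15, and $A can bind to either argument.
MATCHES = [
    Match(0, 15, (Binding("$A", "true", 4),)),
    Match(0, 15, (Binding("$A", "true", 10),)),
]
```

With old_key the two matches collapse into one; with new_key both survive.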

Fixed

  • Fixed out of bounds list access error in Cargo.lock parser (sc-1072)
  • Secrets: metadata overrides specified in validators were incorrectly applied on
    top of one another (on a per-rule basis), so that only the last was applied.
    Each update is now correctly applied independently to each finding based on the
    rule's validators. (scrt-231)

semgrep - Release v1.47.0

Published by github-actions[bot] 12 months ago

1.47.0 - 2023-11-01

Added

  • taint-mode: Added a Boolean exact option to sources and sanitizers to make
    matching stricter (default is false).

    If you specify a source such as foo(...), and Semgrep encounters foo(x),
    by default foo(x), foo, and x, will all be considered tainted. If you add
    exact: true to the source specification, then only foo(x) will be regarded
    as tainted, that is the "exact" match for the specification. The same applies
    to "exact" sanitizers. (gh-5897)

  • Added sg alias for semgrep binary which is functionally equivalent to

    alias sg="/opt/homebrew/bin/semgrep"
    

    with one fewer step. (gh-9117)

  • secrets: Added independent targeting from other semgrep products.

    This change allows Secrets to scan all tracked files. In particular, those ignored
    by semgrepignore will now get scanned. There will be additional changes
    in the future to allow configuring which files are scanned for secrets. (gh-9125)

  • Adds an optional --no-secrets-validation flag to skip secrets validation. (no-secrets-validation)

  • Secrets rules (i.e., with metadata product: secrets) now mask the ending
    component of the matched content by replacing it with *s. (pa-2333)

  • Commutativity Support for Comparison Operators EQ and NOT_EQ

    We've introduced the commutative_compop rule option, enabling commutativity
    for comparison operators EQ and NOT_EQ. With this option, a == b will also
    match b == a, and a != b will also match b != a. (pa-3140)

  • Validation errors are now separated from unvalidated findings in the terminal output. (validation-error)
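
The commutative_compop option described above can be illustrated with Python's own ast module. This is a sketch of the idea only, not Semgrep's matcher: compare_matches is a hypothetical helper that handles a single binary comparison and compares operands structurally.

```python
import ast


def compare_matches(pattern, target, commutative_compop=False):
    """Check whether a comparison pattern matches a comparison target.

    With commutative_compop enabled, == and != also match with their
    operands swapped; other comparison operators never do.
    """
    p = ast.parse(pattern, mode="eval").body
    t = ast.parse(target, mode="eval").body
    if not (isinstance(p, ast.Compare) and isinstance(t, ast.Compare)):
        return False
    if len(p.ops) != 1 or len(t.ops) != 1:
        return False
    if type(p.ops[0]) is not type(t.ops[0]):
        return False

    p_sides = (ast.dump(p.left), ast.dump(p.comparators[0]))
    t_sides = (ast.dump(t.left), ast.dump(t.comparators[0]))
    if p_sides == t_sides:
        return True

    swappable = isinstance(p.ops[0], (ast.Eq, ast.NotEq))
    return commutative_compop and swappable and p_sides == t_sides[::-1]
```

Restricting the swap to Eq and NotEq matters: a < b and b < a are different conditions, so treating every comparison as commutative would produce wrong matches.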

Changed

  • For taint rules using labels (experimental) Semgrep now preferably picks a
    source without requires for the taint trace

    Semgrep now prioritizes taint sources without requires condition when
    choosing a representative taint trace from multiple source traces. This helps
    users to more clearly identify the initial taint source when multiple traces
    are involved. (pa-3122)

  • Unreachable supply chain findings now report only on the line the dependency was found in
    (no longer incorrectly including the next line). This change could affect the syntactic_id
    generated by said findings. (sc-727)

  • When running semgrep ci --supply-chain, defaults to using OSS engine even if
    PRO engine would otherwise be used (turned on in semgrep.dev, or with --pro flag) (supply-chain-oss)

Fixed

  • Semgrep no longer supports Python 3.7. (gh-8698)
  • Semgrep will now refuse to run incompatible versions of the Pro Engine, rather than crashing with a confusing error message. (gh-8873)
  • Fixed an issue that prevented the use of semgrep install-semgrep-pro --custom-binary ... when logged out. (gh-9051)
  • The --severity=XXX scan flag is working again. (gh-9062)
  • The --sarif flag no longer crashes when Semgrep itself encounters errors
    while processing targets. (gh-9091)
  • Fixed how the end positions assigned to metavariable bindings are computed, in
    order to handle trailing newlines. This affected Semgrep's JSON output. If a
    metavariable $X was bound to a piece of text containing a trailing newline,
    such as "a\n", where the starting position was e.g. at line 1, Semgrep reported
    that the end position was at line 2, when in fact the text is entirely within
    line 1. If the text happened to be at the end of a file, Semgrep could report
    an end position that was outside the bounds of the file. (lang-18)
  • Semgrep Language Server now only scans open files on startup
  • Semgrep Language Server no longer scans with pro engine rules (ls)
  • Rust: unsafe blocks are now translated into the Dataflow IL so e.g. it becomes
    possible for taint analysis to track taint from/to an unsafe block. (pa-3218)
  • Correctly handle parsing toolchain directive in go.mod files (parsegomode)
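
The end-position fix for metavariable bindings with trailing newlines (lang-18) can be made concrete with a small calculation. The two functions below are hypothetical and only model the line component of a position. They assume the end line is the line that contains the last character of the bound text, with a newline character belonging to the line it terminates.

```python
def naive_end_line(start_line, text):
    # Counting every newline moves the end past a trailing "\n".
    return start_line + text.count("\n")


def end_line(start_line, text):
    # Ignore the final character when counting: if it is a newline,
    # it still sits on the line it terminates.
    return start_line + text[:-1].count("\n")
```

For the text "a\n" starting at line 1, naive_end_line reports line 2 while end_line reports line 1, which is the behavior the fix describes. Text with an interior newline, such as "a\nb", ends on line 2 either way.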

semgrep - Release v1.46.0

Published by github-actions[bot] 12 months ago

1.46.0 - 2023-10-24

Added

  • semgrep install-semgrep-pro now takes an optional --custom-binary flag to install the specified semgrep-core-proprietary binary rather than downloading it. (custom-pro-binary)

Fixed

  • pyproject.toml parser now handles optional newlines right after section headers. (gh-10879)

  • Updated the parsers for poetry.lock, pipfile.lock, and requirements.txt to ignore case in package names.
    This matches their respective specifications. Test cases were added to account for this change. (gh-8984)

  • Reduced the limits for the prefilter optimization so that rules that cause
    computing the prefilter to blow up will abort more quickly. This improves
    performance by 2-3 seconds for each of the slowest rules. May cause a
    slowdown if a rule that previously could be filtered out no longer will be,
    but based on testing this is unlikely. (gh-9040)

  • Fixed an issue where conditional expressions weren't handled properly in expression-based languages.

    Rust example:

    fn expr_stmt_if(c) {
      y = 0;
      x = if c { y = 1 };
    
      // Before: this matches when it shouldn't because y is not always 1.
      // After: this does not match, which is the correct behavior.
      y == 1;
    }

    (pa-3205)
    
  • Fixed type error in creation of DependencyParserError object in the pnpm-lock.yaml parser (sc-1115)
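
The case-insensitive handling of package names in the lockfile parsers (gh-8984) amounts to normalizing names before comparing them. A minimal sketch, assuming plain lowercasing is enough for the comparison; same_package and find_dependency are hypothetical helpers, not functions from Semgrep's parsers.

```python
def same_package(name_a, name_b):
    """Compare two package names while ignoring case."""
    return name_a.lower() == name_b.lower()


def find_dependency(lockfile_names, wanted):
    """Return the lockfile entries that refer to the wanted package."""
    return [name for name in lockfile_names if same_package(name, wanted)]
```

With this comparison, find_dependency(["Django", "requests"], "django") returns ["Django"], so an advisory written against the lowercase name still applies to a lockfile that capitalizes it.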

semgrep - Release v1.45.0

Published by github-actions[bot] about 1 year ago

1.45.0 - 2023-10-18

Changed

  • Previously, to ignore a finding from a rule foo.bar.my-rule, nosemgrep ignored a finding only if its fully qualified name was used: nosemgrep: foo.bar.my-rule. Now, nosemgrep can also accept just the rule ID: nosemgrep: my-rule. (#8979)

  • [Breaking Change] Improved Matching of C++ Constructors (pa-3114)

    • In this update, the Semgrep team has enhanced Semgrep's ability to match C++ constructors more accurately.
    • C++ introduces a syntactic ambiguity between function and variable definitions, particularly with constructors. The C++ compiler determines how to interpret an expression based on contextual information, such as whether the immediate parent scope is a function or a class, and whether the identifiers within the parentheses represent variables or types.
    • Due to this complexity, static analyzers face challenges in precisely parsing these expressions without additional information.
    • This commit introduces several workarounds to provide a better solution for handling this ambiguity:
      - By default, when parsing a target file, Semgrep will consider an expression like foo bar(x, y, z); defined within the body of a function as a variable definition with a constructor. This is because variable initialization is a more common use case within the body of a function.
      - Users can specify rule options that annotate, in patterns where the expression can be interpreted in both ways, which interpretation should take precedence. For instance, foo bar(x, y, z); will be parsed as a function definition when the as_fundef option is used and as a variable definition with a constructor when the as_vardef_with_ctor option is applied. It's worth noting that an expression like foo bar(1, y, z); will be parsed as a variable definition without any additional annotation since 1 cannot be a type.
    • Here's an example rule and its corresponding target file to illustrate these changes:
    rules:
      - id: cpp-match-func-def
        message: Semgrep found a match
        options:
          cpp_parsing_pref: as_fundef
        languages:
          - cpp
        severity: WARNING
        pattern-either:
          - pattern: foo $X($Y);
          - pattern: foo $X($Y, $Z);
    
      - id: cpp-match-ctor
        message: Semgrep found a match
        options:
          cpp_parsing_pref: as_vardef_with_ctor
        languages:
          - cpp
        severity: WARNING
        patterns:
          - pattern: foo $X(...);
          - pattern-not: foo $X(3, ...);
    
      - id: cpp-match-ctor-3
        message: Semgrep found a match
        languages:
          - cpp
        severity: WARNING
        pattern: foo $X(3, ...);
    
    class Test {
    
      // ruleid: cpp-match-func-def
      foo bar(x);
      // ruleid: cpp-match-func-def
      foo bar(x, y);
    
      void test() {
        // ruleid: cpp-match-ctor
        foo bar(1);
        // ruleid: cpp-match-ctor
        foo bar(1, 2);
    
        // ruleid: cpp-match-ctor
        foo bar(x);
        // ruleid: cpp-match-ctor
        foo bar(x, y);
    
        // ruleid: cpp-match-ctor
        foo bar(x, 2);
        // ruleid: cpp-match-ctor
        foo bar(1, y);
    
        // ruleid: cpp-match-ctor-3
        foo bar(3);
        // ruleid: cpp-match-ctor-3
        foo bar(3, 4);
        // ruleid: cpp-match-ctor-3
        foo bar(3, y);
      }
    };
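
The nosemgrep change at the top of this list (#8979) can be sketched as a comparison on the last dot-separated component of the rule's fully qualified name. The helper nosemgrep_applies is hypothetical, and the assumption that only the final component counts as "just the rule ID" is an interpretation of the note, not a statement about Semgrep's code.

```python
def nosemgrep_applies(comment_id, rule_id):
    """Decide whether a nosemgrep comment ID suppresses a finding.

    comment_id is the ID written after "nosemgrep:"; rule_id is the fully
    qualified ID of the rule that produced the finding.
    """
    if comment_id == rule_id:
        return True  # previous behavior: fully qualified name
    # New behavior (assumed): the bare rule ID, i.e. the last component.
    return rule_id.rsplit(".", 1)[-1] == comment_id
```

Under this reading, both nosemgrep: foo.bar.my-rule and nosemgrep: my-rule suppress a finding from foo.bar.my-rule, while a partial ID such as rule does not.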
    

Fixed

  • Semgrep Docker image: Reduction of the docker image size by using --no-cache when apk upgrading. Thanks to Peter Dave Hello for the contribution.

  • Fixed a bug with pre-filtering introduced in 1.42.0 that caused significant slowdowns, particularly for Kotlin repos. Kotlin repos running default pro rules may see a 30 minute speedup. (ea-208)

  • Taint analysis: track ptr->field l-values in C++

    • In C++, we now track tainted field access via pointer dereference. For instance, consider the following code snippet:
    void test_intra_001() {
      TestObject *obj = new TestObject();
    
      obj->a = taint_source();
      obj->b = SAFE_STR;
    
      // ok: cpp-tainted-field-ptr
      sink(obj->b, __LINE__);
      // ruleid: cpp-tainted-field-ptr
      sink(obj->a, __LINE__);
    }
    

    This can be matched by the rule (gh-1058):

    rules:
      - id: cpp-tainted-field-ptr
        languages:
          - cpp
        message: testing flows through C++ ptrs
        severity: INFO
        mode: taint
        pattern-sources:
          - pattern: taint_source()
        pattern-sinks:
          - patterns:
              - pattern: sink($X, ...)
              - focus-metavariable:
                  - $X
    
  • Do not crash anymore with an Invalid_arg exception when the terminal has very few columns (e.g., in some precommit context). (#8792)

  • Add --supply-chain flag to semgrep ci --help documentation (#8975)

  • Avoid catastrophic Invalid_argument: index out of bounds errors when reporting the location of findings (#9011)

  • IntelliJ and VSCode extensions: The Semgrep Language Server (LSP) no longer freezes while scanning long files.

  • Pre-filtering is now less aggressive and tries not to skip files that could be matched by a rule due to constant-propagation. Previously, a rule searching for the string "foobar" would skip a file that did not contain exactly "foobar", but that contained e.g. "foo" + "bar". (#8767)

  • semgrep ci no longer crashes when run from git repositories coming from Azure projects with whitespace in the name. (#8971)

  • The --test flag now processes test target files even if they do not match the paths: directive of a rule. This is especially useful for rules using the include: directive, which is now disabled in a test context. (#8192)
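
As a rough illustration of the pre-filtering fix above (#8767), the Python sketch below contrasts a raw-text check with one that folds constant string concatenations first. It is not Semgrep's implementation: naive_prefilter, fold_strings, and folding_prefilter are hypothetical names, and Python's ast module stands in for Semgrep's own analysis.

```python
import ast


def naive_prefilter(source: str, needle: str) -> bool:
    """Keep the file only if the literal needle appears in the raw text."""
    return needle in source


def fold_strings(node: ast.AST):
    """Return the value of a string constant or of a `+` chain of them, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = fold_strings(node.left), fold_strings(node.right)
        if left is not None and right is not None:
            return left + right
    return None


def folding_prefilter(source: str, needle: str) -> bool:
    """Keep the file if the needle appears in any folded string expression."""
    for node in ast.walk(ast.parse(source)):
        value = fold_strings(node)
        if value is not None and needle in value:
            return True
    return False


# Hypothetical target file: "foobar" never appears verbatim in the text.
target = 'name = "foo" + "bar"\nprint(name)\n'

print(naive_prefilter(target, "foobar"))    # False: the file would be skipped
print(folding_prefilter(target, "foobar"))  # True: the file is kept for matching
```

The raw-text check is cheaper, which is presumably why a prefilter would start there; the folded check trades some speed for not dropping files a rule could still match.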

semgrep - Release v1.44.0

Published by github-actions[bot] about 1 year ago

1.44.0 - 2023-10-11

Added

  • A new --matching-explanations CLI flag has been added, to get matching
    explanations. This was internally used by the Semgrep Playground to
    help debug rules, but is now available also directly from the CLI. (explanations)

  • Using C++ tree-sitter as a failsafe pattern parser for C (gh-8905)

  • Allowing multiple type fields in metavariable-type rule syntax

    Users have the flexibility to utilize multiple type fields to match the type of
    metavariables. For instance:

    metavariable-type:
      metavariable: $X
      types:
        - typeA
        - typeB

    This approach is also supported in rule 2.0. (gh-8913)

  • Support for parsing pubspec (Dart/Flutter) lockfiles (gh-8925)

  • Added support for matching template type arguments using metavariables in C++.
    Users can now successfully match code snippets like:

    #include <memory>
    using namespace std;
    
    void foo() {
        int *i = 0;
    
        // ruleid: match-with-template
        shared_ptr<int> p;
    }
    

    with the pattern:

    shared_ptr<$TY> $LOCAL_VAR;

    (pa-3102)

Fixed

  • Avoid fatal "missing plugin" exceptions when scanning with Apex rules
    that use no Apex pattern, such as a rule containing a pattern-regex:
    and nothing else. (gh-8945)

  • Semgrep can now parse optional assignments in Swift (e.g. a.b? = 1). (lang-1)

  • Sequential tainting is now supported in Elixir.

    def f() do
      x = "tainted"
      y = x
    
      # This now matches.
      sink(y)
    end

    (pa-3130)

  • Target files that disappeared before the scan or that have special byte
    characters in their filename do not cause the whole scan to crash anymore.
    The file is skipped instead. (pa-3144)

  • go.mod parsing now correctly allows arbitrary newlines and whitespace between dependencies (sc-1076)

  • fix: Improve typed metavariable matching against expressions consisting of names only. (type-inference)

semgrep - Release v1.43.0

Published by github-actions[bot] about 1 year ago

1.43.0 - 2023-10-03

Added

  • Dart: Full Semgrep support for Dart has been added, whereas previously
    most Semgrep constructs (and Semgrep itself) would not work correctly. (pa-2968)

Changed

  • We have reduced the default timeout (per-rule and per-file) to 2s (down from 30s).
    Typically, running a rule on a file should take a fraction of a second. When a rule
    takes more than a couple of seconds, it is often because the rule is not optimally
    written, or because the file is unusually large (a minified file or machine-
    generated code), so waiting 30s for it does not tend to bring any value. Plus, by
    cutting it earlier, we may prevent a potential OOM crash when running the rule is
    very memory intensive. (pa-3155)

Fixed

  • The language server will no longer surface committed findings when a user types but does not save. (pdx-ls-git)

semgrep - Release v1.42.0

Published by github-actions[bot] about 1 year ago

1.42.0 - 2023-09-29

Added

  • Rule-writing: Capture group metavariables used in regexes in a
    metavariable-regex can now introduce their bindings into the
    scope of the pattern, similarly to metavariable-pattern.

    For instance, in the pattern:

    patterns:
      - pattern: |
          foo($BAR)
      - metavariable-regex:
          metavariable: $BAR
          regex: "(?<X>.*)end"
      - focus-metavariable: $X

    the rule matches the argument of foo against the regex, which
    binds everything before an "end" to the metavariable $X.
    This metavariable can then be focused at a later time, or
    processed somewhere above this pattern. (pa-3011)
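
The capture-group binding described in this entry can be tried out with Python's re module. This is only a sketch of the idea, not of Semgrep's matcher: bind_capture is a hypothetical helper, and Python spells named groups as (?P<X>...), which may differ from the syntax accepted in a rule's regex.

```python
import re

# Named group X captures everything before the last "end" (greedy match).
CAPTURE = re.compile(r"(?P<X>.*)end")


def bind_capture(bar_text: str):
    """Return the binding a capture group would introduce, or None if no match."""
    match = CAPTURE.search(bar_text)
    if match is None:
        return None
    return {"$X": match.group("X")}


print(bind_capture("frontend"))  # {'$X': 'front'}
print(bind_capture("start"))     # None
```

Here the text bound to $BAR plays the role of bar_text, and the returned dictionary stands for the new $X binding that a later focus-metavariable could use.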

  • Try-catch-else-finally is now supported in taint analysis.

    This change also includes some updates to our analysis. Previously we assumed that
    any statement inside the try clause may throw an exception, but now only
    function calls are assumed to possibly throw exceptions.

    As before, throw statements are always assumed to throw an exception.

    This kind of statement is supported in languages including Python, Ruby, and Julia.

    Python example:

    def f(tainted_input):
      try:
        a = 0
        b = 0
        c = tainted_input
        d = tainted_input
      except RuntimeError:
        a = tainted_input
        c = sanitize(c)
      else:
        b = tainted_input
      finally:
        d = sanitize(d)
    
      # a is not tainted because exception wasn't assumed to be thrown
      sink(a)
      # b is tainted through the else clause
      sink(b)
      # c is tainted at the beginning, but it was not sanitized
      # because an exception was not thrown
      sink(c)
      # d is tainted at the beginning, but it was sanitized
      # because the finally clause is always executed
      sink(d)

    (pa-3054)

  • Semgrep can now derive facts about constants from equality tests.

    For example, the pattern foobar(&nullptr) will not match here:

    int* ptr = nullptr;
    
    do_something(ptr);
    
    if (ptr == nullptr) {
        return;
    }
    
    foobar(&ptr); // OK
    

    But it will match here:

    if (ptr != nullptr) {
        return;
    }
    
    foobar(&ptr); // finding

    (pa-3091)

  • Metavariable-type rule support for C, C++

    Users can now use metavariable-type rules in both C and C++. For instance, the
    following code snippet:

    #include <fstream>
    
    using namespace std;
    
    void test_001() {
        ifstream in;
        // ruleid: match-simple-metavar-type
        in.get(str, 2);
    
        mystream my;
        // ok: type mismatch
        my.get(str, 2);
    }
    

    can be matched by the following rule:

    rules:
      - id: match-simple-metavar-type
        patterns:
          - pattern: $X.get($SRC, ...)
          - metavariable-type:
              metavariable: $X
              type: ifstream
        message: Semgrep found a match
        languages:
          - cpp
        severity: WARNING

    (pa-3106)

  • C/C++: If conditions such as if (int x = f()) are now correctly translated
    into the Dataflow IL, so Semgrep can report a finding in the example below:

    if (const char *tainted_or_null = source("PATH"))
    {
        // ruleid:
        sink(tainted_or_null);
    }

    (pa-3107)

Changed

  • The _comment field in the JSON output of semgrep scan has been removed. (_comment)
  • Use config=auto by default for the scan command when other options are not specified (grow-50)
  • Use subprocess.run to get contributions instead of StreamingSemgrepCore so crashes don't affect the actual scan. (os-967)

Fixed

  • The CLI autocompletion code has been removed. It was not currently working
    and nobody reported it, which probably means nobody was using it. (autocomplete)

  • The --core-opts flag has been removed. (core_opts)

  • fix: metavariable-type now correctly matches non-primitive types in php (gh-8781)

  • Fixed the regression in --registry-caching and added a better error message
    telling the user that --experimental is also required. (gh-8828)

  • Support labeled let bindings within Swift case statements

    Correctly parsing labeled let bindings within Swift case statements.
    For instance, the code snippet:

    switch self {
      case .bar(_, _, x: let y):
        return y
    }
    

    now successfully matches the pattern:

    switch self {case .$X(..., $Y: $Z): ...}

    (pa-3120)

  • Add parsing support for various rare Swift constructs (swift-parsing)

semgrep - Release v1.41.0

Published by github-actions[bot] about 1 year ago

Changed

  • Rule validation no longer fails if a rule contains additional unknown fields, so older versions of Semgrep do not reject rules that use newer functionality. When writing a custom rule, use the min-version field to mark rules that older versions should not run, that is, rules that need functionality first available in that version of Semgrep. (#8712)
  • Limit collection of the contributions from git log to the last 30 days of commits.
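
The min-version guidance above can be pictured as a simple version gate. This is a minimal sketch under assumptions, not Semgrep's actual logic: parse_version and should_run are hypothetical helpers, rules are plain dictionaries, and versions are assumed to be dotted integers.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as "1.41.0" into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def should_run(rule: dict, engine_version: str) -> bool:
    """Run a rule unless it declares a min-version newer than the engine."""
    min_version = rule.get("min-version")
    if min_version is None:
        # Unknown extra fields alone are not a reason to reject the rule.
        return True
    return parse_version(engine_version) >= parse_version(min_version)


# Hypothetical rules: one with an unknown field, one gated by min-version.
lenient = {"id": "uses-unknown-field", "some-future-field": True}
gated = {"id": "needs-newer-engine", "min-version": "1.50.0"}

print(should_run(lenient, "1.41.0"))  # True
print(should_run(gated, "1.41.0"))    # False
print(should_run(gated, "1.58.0"))    # True
```

Comparing integer tuples rather than raw strings avoids ordering "1.9.0" after "1.41.0".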

Fixed

  • semgrep ci now shows a more specific error message if a scan cannot complete due to a user disabling all rules on semgrep.dev (#8716)
  • Docker: For the nonroot Docker build stage, moved semgrep-core to /home/semgrep/bin and updated $PATH env variable with the new location. This avoids permissions issues when running and installing Pro Engine while using the nonroot Docker image. (#8685)
  • Ruby: Fixed a bug where patterns such as <id> ... do ... end would not
    match properly. (#8714)
  • Swift: Implemented key path expression parsing in Swift. The following example should
    now be correctly matched by the \$X.isActive pattern:
    employee.filter(\.isActive)
    
    Note that when the implicit type is used, the metavariable X binds to the
    backslash character instead of the type name. (#8694)
  • C++: Translate for (T var : E) loops into the Dataflow IL as for-each loops,
    so that Semgrep reports no finding in the following code:
      for (int *p : set) {
        sink(p); // no finding
        source(p);
      }
    
    Since each p is (in principle) a different object, even if source(p) taints
    the current p, that should not affect the next one. (#8749)
  • Ruby: Fixed patterns which involve command calls with blocks and Semgrep ellipses, when there are newlines around. For instance, the pattern
    $METHOD ... do
      ...
    end
    
    now parses properly. (#8758)
  • Fixed a bug in which Semgrep miscategorized direct dependencies that were erroneously identified as transitive in Node.js v9, lockfile version 3 and above.