Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
LGPL-2.1 License
Published by github-actions[bot] over 1 year ago
Added lone decorators as a valid Python semgrep pattern, so for example $NAME($X) will
generate two separate findings here:
    @hello("world")
    @hi("semgrep!")
    def shift():
        return "left!"
(gh-4722)
Add tags to the python wheel for 3.10 and 3.11 (gh-8040)
JS/TS: Patterns for class properties can now have the static and async modifiers.
For instance:
    @Foo(...)
    async bar(...) {
      ...
    }
or
    @Foo(...)
    static bar(...) {
      ...
    }
(pa-2675)
Semgrep Language Server now supports multi-folder workspaces (pa-2772)
New pre-commit hook semgrep-ci to use CI rules in pre-commit. It pulls rules from the rule board and blocks on those in the block column. (pa-2795)
Added support for date comparison and functionality to get the current date.
Currently this requires date strings to be in the format "yyyy-mm-dd"; the next step is to support other formats. (pa-7992)
--debug will be much less verbose by default, it will only show
taint analysis: Improve handling of dataflow for tainted value propagation in class field definitions.
This change resolves an issue where dataflow was not correctly accounted for
when tainted values flowed through field definitions in class/object
definitions. For instance, in Kotlin or Scala, singleton objects are commonly
used to encapsulate executable logic, where each field definition behaves like
a statement during object initialization. In order to handle this scenario, we
have introduced an additional step to analyze a sequence of field definitions
as a sequence of statements for taint analysis. This enhancement allows us to
accurately track tainted values during object initialization. (gh-7742)
Allow any characters in file paths used to create dotted rule IDs. File path
characters that aren't allowed in rule IDs are simply removed. For example, a
rule whose ID is my-rule found in the file hello/@world/rules.yaml becomes
hello.world.my-rule. (gh-8057)
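The path-to-ID mapping in this entry can be sketched in Python. This is only an illustration and not Semgrep's code: the function name dotted_rule_id is hypothetical, and the set of characters treated as allowed (ASCII letters, digits, underscore, hyphen) is an assumption chosen so that the example from the entry works.

```python
import re
from pathlib import PurePosixPath

# Assumption: only these characters survive in a path component.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_-]")


def dotted_rule_id(config_path: str, rule_id: str) -> str:
    """Prefix rule_id with the cleaned directory components of config_path."""
    components = PurePosixPath(config_path).parent.parts
    cleaned = [_DISALLOWED.sub("", part) for part in components]
    # Drop components that became empty after cleaning (e.g. a leading "/").
    prefix = [part for part in cleaned if part]
    return ".".join(prefix + [rule_id])


result = dotted_rule_id("hello/@world/rules.yaml", "my-rule")
```

With the inputs from the entry, result is hello.world.my-rule: the "@" is removed rather than causing an error.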
Diff aware scans now work when git state isn't clean (pa-2795)
Published by github-actions[bot] over 1 year ago
o.setX(taint); o.getX() but now it can relate setters and getters to properties (e.g. o.setX(taint); o.x). (getters)
taint_assume_safe_booleans and taint_assume_safe_numbers to avoid propagating taint coming from expressions
($X: $T) with a metavariable-regex operating on $T). (pa-2822)
Published by github-actions[bot] over 1 year ago
... in multiline
Published by github-actions[bot] over 1 year ago
Published by github-actions[bot] over 1 year ago
New experimental aliengrep engine that can be used as an alternative to the
default spacegrep engine with options.generic_engine: aliengrep. (aliengrep)
Pro: Taint labels now mostly work interprocedurally, except for labeled propagators.
Note that taint labels are experimental! (pa-2507)
Pro: Taint-mode now supports inter-procedural field-sensitivity for JS/TS.
For example, given this class:
    class Obj {
      constructor(x, y) {
        this.x = x;
        this.y = y;
      }
    }
Semgrep knows that an object constructed by new Obj("tainted", "safe") has its
x attribute tainted, whereas its y attribute is safe. (pa-2570)
(List<$T> $X) would fail to match a value of List<String>. (typed-mvar)
Published by github-actions[bot] over 1 year ago
When a scan completes during logged-in semgrep ci scans, check the returned exit code to
see whether the scan should block. This supports upcoming features that require
information from semgrep.dev. (complete)
Extract mode: users can now choose to include or exclude rules to run on, similar to paths:.
For example, to only run on the rules example-1 and example-2, you would write
    rules:
      - id: test-rule
        mode: extract
        rules:
          include:
            - example-1
            - example-2
To run on everything except example-1 and example-2, you would write
    rules:
      - id: test-rule
        mode: extract
        rules:
          exclude:
            - example-1
            - example-2
(gh-7858)
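The include/exclude behavior described in this entry can be modeled with a small Python filter. This is a sketch under assumptions, not Semgrep's implementation: select_rules is a hypothetical helper, and it assumes that an absent include list means "all rules" and that exclude is applied after include.

```python
from typing import Iterable, List, Optional


def select_rules(
    all_rule_ids: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the rule IDs that would run on an extracted target."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else set()
    selected = []
    for rule_id in all_rule_ids:
        if include_set is not None and rule_id not in include_set:
            continue  # an include list is present and does not name this rule
        if rule_id in exclude_set:
            continue  # explicitly excluded
        selected.append(rule_id)
    return selected


available = ["example-1", "example-2", "example-3"]
only_two = select_rules(available, include=["example-1", "example-2"])
all_but_two = select_rules(available, exclude=["example-1", "example-2"])
```

Here only_two holds example-1 and example-2, while all_but_two holds only example-3, mirroring the two configurations in the entry.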
Kotlin: Added literal metavariables, from patterns like "$FOO".
You can still match strings that only contain a single interpolated
ident by using the brace notation, e.g. "${FOO}". (pa-2755)
Increase the timeout of the semgrep ci network calls that upload findings,
and make that timeout configurable with the env var SEMGREP_UPLOAD_FINDINGS_TIMEOUT. (timeout)
Relaxed restrictions on symbolic propagation so that symbolic values survive
branching statements. Now (with symbolic-propagation enabled) foo(bar()) will
match the following code:
    def test():
        x = bar()
        if cond:
            exit()
        foo(x)
Previously any symbolically propagated value was lost after any kind of branching
statement. (pa-2739)
catch
, to-l dockerfile, files named dockerfile as well as Dockerfile will be scanned. (gh-7824)
Published by github-actions[bot] over 1 year ago
semgrep ci scans, report lockfile parse errors to display in webUI (lockfileparse)
C sets its field x to a
o = new C(), Semgrep will know that o.getX() is tainted. (pa-2570)
Failure "Call AST_utils.with_xxx_equal to avoid this error." (gh-7694)
Published by github-actions[bot] over 1 year ago
semgrep --pro still requires a single target, but this target no longer
metavariable-pattern operator in one rule may cause a finding to be missed, and
get<name> methods in a
Published by github-actions[bot] over 1 year ago
Java: Private static variables that are defined just once in a static block,
even if they are not declared final, will be considered as final by
constant-propagation. (pa-2228)
Scala: Can now parse indented matches, like:
    e match
      case foo => "foo"
      case bar => "bar"
(pa-2687)
Scala: Can now parse arguments with using, as well as splatted arguments.
E.g. foo(using bar) and foo(1, 2, bar*) (pa-2688)
Scala: Added parsing of enum constructs. (pa-2691)
Scala: Can now parse given definitions (pa-2692)
Scala: Can now parse exports (pa-2693)
Scala: Can now parse top-level definitions (as added in Scala 3) (pa-2694)
Scala: Can now parse indented for expressions, such as
    for
      _ <- 5
    yield
      ...
(pa-2695)
The title of Supply Chain findings will now consist of the package name and CVE,
instead of just the rule's UUID. (sc-580)
diff the outputs of two runs. (pa-2700)
CLI: Setting Semgrep-specific environment variables for metadata (like
SEMGREP_REPO_NAME, SEMGREP_REPO_URL, SEMGREP_PR_ID, and friends) now
properly works on GitHub and GitLab CI scans.
If not set, functionality is same as before. (pa-2644)
CLI: Fixed a bug where repositories with a dot in the name would cause
semgrep ci scans to crash (pa-2655)
Published by github-actions[bot] over 1 year ago
Metavariable comparison: Added support for **, the exponentiation operator. (gh-7474)
Pro: Java: Semgrep is now able to track the propagation of taint from the
arguments of a method, to the object being called. So e.g. given a method
    public void foo(int x) {
      this.x = x;
    }
and a call o.foo(tainted), Semgrep will be able to track that the field
x of o has been tainted. (pa-2570)
Kotlin: Class fields will now receive the correct types, and be
found by typed metavariables correctly.
This applies to examples such as:
    class Foo {
      var x : int
    }
for the variable x. (pa-2684)
Supply Chain support for package-lock.json version 3 (sc-586)
metavariable-pattern: When used with the nested language key, if there was an
error parsing the metavariable's content, that error could abort the analysis
of the current file. If there were other rules that were going to produce findings
on that file, those findings were not being reported. (gh-7271)
Matching: Fixed a bug where explicit casts of expressions would produce two matches to
other explicit casts.
So for instance, a pattern (int $X) in Java would match twice to (int) 5. (gh-7403)
taint-mode: Given x = tainted, then x.a = safe, then x.a.b = tainted, Semgrep
did not report sink(x.a.b). Because x.a was clean, that made Semgrep disregard
the tainting of any field of x.a such as x.a.b. This now works as expected. (pa-2486)
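One way to reason about the fixed behavior is a toy model in which the taint status of an access path such as x.a.b is decided by the longest prefix that has been assigned. The Python sketch below is only that toy model. It is not how Semgrep represents taint internally, and the names AccessPath and is_tainted are hypothetical.

```python
from typing import Dict, Tuple

AccessPath = Tuple[str, ...]  # ("x", "a", "b") stands for x.a.b


def is_tainted(state: Dict[AccessPath, bool], path: AccessPath) -> bool:
    """Look up taint using the longest assigned prefix of the path."""
    for end in range(len(path), 0, -1):
        prefix = path[:end]
        if prefix in state:
            return state[prefix]
    return False  # nothing is known about this path or any of its prefixes


state: Dict[AccessPath, bool] = {}
state[("x",)] = True            # x = tainted
state[("x", "a")] = False       # x.a = safe
state[("x", "a", "b")] = True   # x.a.b = tainted

reported = is_tainted(state, ("x", "a", "b"))   # sink(x.a.b)
sibling = is_tainted(state, ("x", "a", "c"))    # sink(x.a.c)
```

In this model reported is True even though x.a is clean, while sibling stays False because it only inherits the clean status of x.a.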
When using metavariable-pattern to match embedded PHP code, Semgrep was
unconditionally adding the <?php opening to the embedded code. When
<?php was already present, this caused parsing errors. (pa-2696)
Lockfile-only supply chain findings correctly include line numbers in their match data, improving the appearance of CLI output (sc-658)
Increase timeout for semgrep install-semgrep-pro to avoid failures when the download is slow. (timeout)
Fixed the range reported by findings for YAML files that include an anchor, so that the match does not include the original location of the snippet bound to the anchor. (yaml-alias)
Published by github-actions[bot] over 1 year ago
taint_assume_safe_comparisons, disabled by default, that
tainted != "something"
Published by github-actions[bot] over 1 year ago
using, and soft modifiers like inline and open. (pa-2672)
maven_dep_tree.txt files
maven_dep_tree.txt files concatenated with cat. (maven-dep-forest)
foo="true") (gh-7344)
Published by github-actions[bot] over 1 year ago
On full SCA scans with the dependency search feature on, send dependency data for dependency search (depsearch)
metavariable-comparison: Added support for bitwise operators ~, &, | and ^. (gh-7284)
Taint: pattern-propagators now have optional fields requires and label,
which are used identically to their counterparts in pattern-sources and
pattern-sinks, for the experimental taint labels feature.
For instance, we can define:
    pattern-propagators:
      - pattern: |
          $TO.foo($FROM)
        from: $FROM
        to: $TO
        requires: A
        replace-labels: [A, C]
        label: B
to denote a propagator which only propagates from $FROM to $TO if $FROM has
taint label A. In addition, it converts any taints from $TO with labels
A or C to have label B.
If label is not specified, the to is tainted with the same label of taint
that $FROM has. If requires is not specified, it does not require $FROM to
have a particular label of taint.
Additionally, replace-labels only restricts the label being propagated if
the output label is specified. (pa-1633)
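The interaction of requires, label and replace-labels can be easier to follow as a small function over sets of labels. The Python sketch below is one reading of the entry, not Semgrep's implementation. propagate_labels is a hypothetical name, requires is simplified to a single label name as in the example, and a missing replace-labels is assumed to mean that every propagated label is rewritten to the output label.

```python
from typing import Optional, Set


def propagate_labels(
    from_labels: Set[str],
    requires: Optional[str] = None,
    label: Optional[str] = None,
    replace_labels: Optional[Set[str]] = None,
) -> Set[str]:
    """Return the taint labels that reach $TO, given the labels on $FROM."""
    if not from_labels:
        return set()  # $FROM is not tainted, so nothing propagates
    if requires is not None and requires not in from_labels:
        return set()  # the required label is missing on $FROM
    if label is None:
        return set(from_labels)  # no output label: labels pass through as-is
    if replace_labels is None:
        return {label}  # assumption: every propagated label becomes `label`
    # Only the listed labels are rewritten; the others keep their name.
    return {label if item in replace_labels else item for item in from_labels}


# The propagator from the entry: requires A, replace-labels [A, C], label B.
relabeled = propagate_labels({"A", "C", "D"}, "A", "B", {"A", "C"})
blocked = propagate_labels({"C"}, "A", "B", {"A", "C"})
```

Under these assumptions relabeled contains B and D, and blocked is empty because $FROM lacks label A.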
taint-mode: Java: Support for basic field sensitivity via getters and setters.
Given obj.setX(tainted), Semgrep will identify that a subsequent obj.getX()
carries the same taint as tainted. It will also differentiate between
obj.getX() and obj.getY(). Note that Semgrep does not examine the definitions
for the getter or setter methods, and it does not know whether e.g. some other
method obj.clearX() clears the taint that obj.setX(tainted) adds. (pa-2585)
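The name-based nature of this feature can be illustrated with a toy tracker that pairs setX with getX purely by method name, without looking at method bodies. The Python sketch below is an illustration only. TaintState and property_of are hypothetical names, and the model covers only the behavior stated in the entry.

```python
from typing import Optional, Set, Tuple


def property_of(method: str, prefix: str) -> Optional[str]:
    """Return "X" for "setX"/"getX" when the method name has the given prefix."""
    if method.startswith(prefix) and len(method) > len(prefix):
        return method[len(prefix):]
    return None


class TaintState:
    def __init__(self) -> None:
        # (object name, property name) pairs currently considered tainted
        self.tainted: Set[Tuple[str, str]] = set()

    def call(self, obj: str, method: str, arg_tainted: bool = False) -> bool:
        """Record a call obj.method(arg); return whether its result is tainted."""
        prop = property_of(method, "set")
        if prop is not None:
            if arg_tainted:
                self.tainted.add((obj, prop))
            return False
        prop = property_of(method, "get")
        if prop is not None:
            return (obj, prop) in self.tainted
        return False  # any other method (e.g. clearX) is not understood


state = TaintState()
state.call("obj", "setX", arg_tainted=True)
x_tainted = state.call("obj", "getX")
y_tainted = state.call("obj", "getY")
state.call("obj", "clearX")
x_after_clear = state.call("obj", "getX")
```

Here x_tainted is True and y_tainted is False, and x_after_clear is still True, which matches the caveat in the entry that a method like clearX is not known to remove the taint.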
Pro Engine: Semgrep CLI will now download a version of Semgrep Pro Engine
compatible with the current version of Semgrep CLI, as opposed to the most
recently released version.
This behavior is only supported for Semgrep 1.12.1 and later. Previous
versions will still download the most recently released version, as before. (pa-2595)
Pro: semgrep ci will run intrafile interprocedural taint analysis by default
in differential scans (aka PR scans). (Note that interfile analysis is not run
in differential scans for performance reasons.) (pa-2565)
Remove custom entrypoint for returntocorp/semgrep Docker images; you must now
explicitly call semgrep.
This no longer works: docker run -v $(pwd):/src returntocorp/semgrep scan ...
Do this instead: docker run -v $(pwd):/src returntocorp/semgrep semgrep scan ...
(pa-2642)
Changed Maven version comparison to more closely reflect usage, so versions with more than 3 increments will not be treated as plain strings (sc-656)
The AST dump produced by semgrep-core is now usable from Python
with the provided ATD interface and the Python code derived from it with
atdpy. (gh-7296)
Terraform: Nested blocks can now be used as sources and sinks for taint.
For instance, the block x in
    resource $A $B {
      x {
        ...
      }
    }
(pa-2475)
CLI: The scan progress bar now shows progress with higher granularity, and has fewer big jumps when using the Pro Engine.
The abstract unit of 'tasks' has been removed, and now only a percentage number will be displayed. (pa-2526)
Fix an error with rule targeting for extract mode. Previously, if a ruleset had
two rules, the first being the extract rule, the second being the rule to run,
no rules would run on the extracted targets. Additionally, with multiple rules
the wrong rule might be run on the extracted target, causing errors. Now, in
extract mode all the rules for the destination language will be run. (pa-2591)
Metrics: logged in semgrep ci scans now send metrics, as our Privacy.md indicates
(previously they incorrectly did not, which made it harder for us to track failure events) (pa-2592)
Rust: Basic let-statement bindings (such as let x = tainted) now properly
carry taint. (pa-2605)
Improved error reporting for rule parsing by correctly reporting parse errors
instead of engine errors in certain cases. (pa-2610)
Taint: Fixed an issue where an error could be thrown if semgrep-core's output
contained a dataflow trace without a sink. (pa-2625)
Julia: Properly allow string literal metavariables like "$A" to be patterns. (pa-2630)
Published by github-actions[bot] over 1 year ago
taint-mode: Historically, the matching of taint sinks has been somewhat imprecise.
For example, sink(ok if tainted else ok) was flagged. Recently, we made sink-matching
more precise for sinks like sink(...) declaring that any argument of
a given function is a sink. Now we make it more precise when specific arguments of
a function are sinks, like:
    pattern-sinks:
      - patterns:
          - pattern: sink($X, ...)
          - focus-metavariable: $X
So sink(ok1 if tainted else ok2), sink(not_a_propagator(tainted)), and
sink(some_array[tainted]) will not be reported as findings. (pa-2477)
The --gitlab-sast and --gitlab-secrets output formats have been upgraded.
The output is now valid with the GitLab v15 schema,
while staying valid with the GitLab v14 schema as well.
Code findings now include the confidence of the rule.
Supply Chain findings now include the exposure type. (sc-635)
if (cond && x = 42) S1; S2 to be interpreted as x = 42; if (cond && x) S1; S2, thus incorrectly flagging x as a constant
class $X : Foo will also match class Stuff : Bar, Foo). (gh-7248)
sink(sanitizer(source) if source else ok) will not be
pattern-not within metavariable-pattern in some cases. (pa-2510)
--oss-only previously required --oss-only true to be passed. This PR fixes
--oss-only will invoke the oss engine. Note that --oss-only true
Published by github-actions[bot] over 1 year ago
BITBUCKET_TOKEN from environment to authenticate with the Bitbucket API. (app-3691)
by-side-effect, just like sources and
by-side-effect for propagators is true
taint_assume_safe_functions: true, this allows to specify functions that must
    pattern-propagators:
      - by-side-effect: false
        patterns:
          - pattern-inside: $F(..., $X, ...)
          - focus-metavariable: $F
          - pattern-either:
              - pattern: unsafe_function
        from: $X
        to: $F
Without by-side-effect: true, unsafe_function itself would be tainted by side-
final class attributes.
Published by aryx over 1 year ago
Published by github-actions[bot] over 1 year ago
--verbose when the contents of a metavariable fails to parse. (pa-2537)
(optional) would fail to parse (sc-622)
"resolved": false as a result of a bug in NPM will now parse (sc-npm-bug)