Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
LGPL-2.1 License
Published by github-actions[bot] about 1 year ago
.vscode and .vimrc, are now displayed in the skip report when using --verbose and --develop.
--code command-line option. This works the same as --supply-chain. (#8679)
taint-mode: Semgrep now tracks taint via globals or class attributes that are effectively final (as in Java), for example:
class Test {
    private String x = source();
    void test() {
        sink(x); // finding here!
    }
}
Semgrep recognizes that x must be tainted because it is a private class attribute that is initialized to source(), and it is not redefined anywhere else. This also works if x is initialized in the constructor (if there is only one constructor) or in a static block. (#8652)
-dump_contributions flag to semgrep-core and include contributions when posting findings to Scan API.
semgrep show command to display information about Semgrep, for example semgrep show supported-languages. The goal is to clean up semgrep scan, which is currently misused to display Semgrep information rather than to scan, for example semgrep scan --show-supported-languages. See semgrep show --help for more information.
dump_contributions core command in pysemgrep.
semgrep ci (#8665)
...
foo()
can now properly match a target of
foo()
"foo" + 123
now matches the string pattern "foo123".
Published by github-actions[bot] about 1 year ago
A.B.C.x, and match the usage in the program:
from A.B import *
foo(C.x)
min-version or max-version constraints has been improved. (#8634)
metavariable-type cannot be evaluated, then it defaults to "false", that is, it filters out the range. The following rule:
patterns:
  - pattern: private int $X;
  - metavariable-type:
      metavariable: $Y
      type: int
now produces no matches because $Y is not bound to anything. (#8566)
using and import now match separately. Previously, if you wrote using $X, you would also match imports. (#8567)
Published by github-actions[bot] about 1 year ago
No significant changes.
Published by github-actions[bot] about 1 year ago
Published by github-actions[bot] about 1 year ago
python -m semgrep. This change originated in https://github.com/returntocorp/semgrep/pull/8504. (gh-8605)
Published by github-actions[bot] about 1 year ago
semgrep ci (cli-timestamp)
min-version and max-version fields for each rule,
Running just semgrep now displays the help message. Semgrep no longer looks for a .semgrep.yml config file or a .semgrep/ directory in the current directory. This used to cause issues when running from your home directory, which can contain the .semgrep/settings.yml file (which is not a Semgrep rule). (gh-4457)
Fixed CLI output to display matches from different rules with the same message. (gh-8557)
The Semgrep PyPI package can now be pip-installed on aarch64 musl libc platforms (e.g. Alpine). (gh-8565)
Updated the --max-memory help description to make it clearer and more concise. The old wording, "Defaults to 0 for all CLI scans.", implied a different default for non-CLI scans, whereas in practice the default is 0 for all scans except when using the Pro Engine, where the default is 5000. (max_memory_help)
Julia: Fixed a bug where let end blocks were not parsed correctly, so their contents were not strictly matched as being inside the block. For instance, let ... end would not count as being inside the let, and would match everything. (pa-3029)
Fixed a bug where dependencies in pnpm-lock.yaml files at version 6.0 or above were not parsed. (sc-1033)
Published by github-actions[bot] about 1 year ago
semgrep scan is now more resilient to failures when fetching config from semgrep.dev. If it can't fetch a config from semgrep.dev, it will use backup infrastructure to fetch the most recent successful config for that customer's environment. (gh-8459)
foo that contain 42 somewhere
semgrep ci displays enabled products when scans are created and/or when the scan
semgrep --experimental --lang python --dump-ast foo.py (dumpast)
Parsing: Some parsing errors involving tree-sitter inserting fake "missing"
nodes were previously unreported. They are now reported as errors, although the
parse tree is preserved, including the phony node inserted by tree-sitter.
This should not change Semgrep findings; it only results in more
reports of partial parsing. See the original issue at
https://github.com/returntocorp/ocaml-tree-sitter-core/issues/8 for technical
details. (gh-8190)
fix(extract): correctly map metavariable locations into source file (gh-8416)
fix(julia): correctly parse BitOr and BitAnd (gh-8449)
Implement missing pcre-ocaml stub (pcre_get_stringnumber_stub_bc) in JavaScript (gh-8520)
Julia: Fixed a bug where parenthesized expressions would sometimes not match in constructs like metavariable-comparison. (pa-2991)
Fixed a regression introduced three years ago in 0.9.0, when we optimized
the evaluation of ... (ellipsis) to be faster. We made ... match deeply
(inside an if, for example) only if nothing matched non-deeply, which
caused this pattern:
foo()
...
bar($A)
to produce only one match rather than two on this code:
foo()
if cond:
    bar(x)
bar(y)
Semgrep matched from foo() to bar(y), and because of that it did not
try to match inside the if, so there was no match from foo() to bar(x).
However, if we commented out bar(y), then Semgrep did match bar(x).
Semgrep now produces the two expected matches. (pa-2992)
Julia: Type information from declarations can now be used in metavariable-type. For instance, the program:
x :: Int64 = 2
will now allow uses of x to match to the type Int64. (pa-3001)
Julia: Metavariables should now be able to appear anywhere that identifiers can.
For instance, they previously could not appear as the argument to a do block. Now, we can write patterns like:
map($Y) do $X
    ...
end
(pa-3007)
Java: Fixed a naming bug, affecting Java and other OO languages, that allowed a method parameter to shadow a class attribute, e.g. in:
class Test {
    private int x;
    public void test2(int x) {
        foo(this.x);
    }
}
Semgrep was considering that this.x referred to the parameter x of test2 rather than to the class attribute x. (pa-3010)
Fixed a bug where packages in build.gradle files had their names incorrectly parsed without their group ID. (sc-1012)
Published by github-actions[bot] about 1 year ago
Added general machinery to support languages with case-insensitive identifiers, and generalized PHP to use these case-insensitive identifiers.
For example, in PHP the pattern MyClass() will now match calls with different capitalization, such as myclass() and Myclass(). (gh-8356)
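As a rough illustration of the idea (not Semgrep's implementation, which lives in its OCaml core), comparing identifiers in a case-insensitive language can fold case on both sides before testing equality. The helper names below are hypothetical, and the ASCII-only folding is an assumption modelled on how PHP treats class and function names.

```python
import string

# ASCII-only case folding table (assumption: mirrors PHP, which ignores
# case for ASCII letters in class and function names).
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_identifier(name: str) -> str:
    """Lowercase ASCII letters and leave every other character untouched."""
    return name.translate(_ASCII_FOLD)


def identifiers_match(pattern_name: str, target_name: str, case_insensitive: bool) -> bool:
    """Compare a pattern identifier with a target identifier (hypothetical helper)."""
    if case_insensitive:
        return fold_identifier(pattern_name) == fold_identifier(target_name)
    return pattern_name == target_name


if __name__ == "__main__":
    for call in ["MyClass", "myclass", "Myclass", "OtherClass"]:
        print(call, identifiers_match("MyClass", call, case_insensitive=True))
```

The flag is kept per comparison rather than global, since a language can mix case-sensitive names (PHP variables) with case-insensitive ones.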
Published by github-actions[bot] about 1 year ago
fix(promql): make aggregation labels not depend on order
"sum by (..., b, a, c, ...) (X)" should match "sum by (a,b,c) (X)" (gh-8399)
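The order-insensitive behavior for the sum by example above can be pictured as a set comparison. The sketch below is hypothetical (labels_match is not a Semgrep function) and assumes that a ... entry in a pattern's label list stands for any number of additional labels in the target.

```python
def labels_match(pattern: list[str], target: list[str]) -> bool:
    """Order-insensitive comparison of PromQL aggregation label lists.

    Hypothetical helper: a "..." entry in the pattern allows any number
    of extra labels in the target.
    """
    allow_extra = "..." in pattern
    wanted = {label for label in pattern if label != "..."}
    actual = set(target)
    if allow_extra:
        return wanted <= actual  # every listed label must be present
    return wanted == actual  # same labels, in any order


if __name__ == "__main__":
    print(labels_match(["...", "b", "a", "c", "..."], ["a", "b", "c"]))  # True
    print(labels_match(["b", "a"], ["a", "b"]))  # True
    print(labels_match(["a"], ["a", "b"]))  # False
```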
Published by github-actions[bot] about 1 year ago
feat(eval): add "parse_promql_duration" function to convert a promql duration into milliseconds. This makes it possible to write comparisons like this:
- metavariable-comparison:
    metavariable: $RANGE
    comparison: parse_promql_duration(str($RANGE)) > parse_promql_duration("1d")
(gh-8381)
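parse_promql_duration itself runs inside Semgrep's rule evaluator; the Python function below is only a sketch of the conversion it describes, assuming the standard PromQL units (ms, s, m, h, d, w, y, with a year fixed at 365 days). The name promql_duration_ms is hypothetical, and the sketch does not enforce Prometheus' rule that units appear from largest to smallest.

```python
import re

# Size of each PromQL duration unit in milliseconds.
# A PromQL year is always 365 days.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,
}

# One "<integer><unit>" term; "ms" is listed before "m" so that "500ms"
# is not read as 500 minutes followed by a stray "s".
_TERM = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def promql_duration_ms(text: str) -> int:
    """Convert a duration such as "1d" or "1h30m" to milliseconds."""
    if not text:
        raise ValueError("empty duration")
    total, pos = 0, 0
    while pos < len(text):
        term = _TERM.match(text, pos)
        if term is None:
            raise ValueError(f"invalid PromQL duration: {text!r}")
        total += int(term.group(1)) * _UNIT_MS[term.group(2)]
        pos = term.end()
    return total


if __name__ == "__main__":
    print(promql_duration_ms("1d"))  # 86400000
    print(promql_duration_ms("1h30m"))  # 5400000
    print(promql_duration_ms("2d") > promql_duration_ms("1d"))  # True
```

Converting both sides to a single integer unit is what makes a plain > comparison meaningful, as in the comparison line of the rule above.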
Published by github-actions[bot] about 1 year ago
Added support for naming propagation when the left-hand side (lhs) of a variable definition is an identifier pattern.
In certain languages like Rust, the variable definition is parsed as a pattern assignment, for example:
let x: SomeType = SomeFunction();
This commit ensures that the annotated type is propagated to the identifier pattern on the lhs of the assignment, ensuring proper naming behavior. (gh-8365)
feat(metavar type): Metavariable type support for Julia
Metavariable type is supported for Julia. (gh-8367)
New --legacy flag to force the use of the old Python implementation of
Semgrep (also known as 'pysemgrep'). Note that by default most semgrep
commands still use the Python implementation (except 'semgrep
interactive'), so in practice you don't need to add this flag, but as
we port more commands to OCaml, the new --legacy flag might be useful
if you find some regressions. (legacy)
Matching: Added the ability to use metavariables in parameters to match more sophisticated kinds of parameters.
In particular, metavariables should now be able to match self parameters, such as in Rust.
So fn $F($X, ...) { ... } should match fn $F(self) { }. (pa-2937)
taint-mode: Added experimental control: true option to pattern-sources, e.g.:
pattern-sources:
  - control: true
    pattern: source(...)
Such sources taint the "control flow" (or the program counter) so that it is possible to implement reachability queries that do not require the flow of any data. Thus, Semgrep reports a finding in the code below, because after source() the flow of control will reach sink(), even if no data is flowing between both:
def test():
    source()
    foo()
    bar()
    #ruleid: test
    sink()
(pa-2958)
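To make the contrast with data-flow taint concrete, here is a toy model of control sources, assuming a straight-line function represented as a list of call names with no branches or loops. control_flow_findings is a hypothetical helper, not part of Semgrep: once a control source is seen, the "program counter" is tainted, so any later sink() is reported even though no variable carries taint.

```python
def control_flow_findings(statements: list[str]) -> list[int]:
    """Return the indexes of sink() calls reached after a control source()."""
    pc_tainted = False  # models "the program counter is tainted"
    findings: list[int] = []
    for index, stmt in enumerate(statements):
        if stmt == "source()":
            pc_tainted = True  # no value needs to flow from here to the sink
        elif stmt == "sink()" and pc_tainted:
            findings.append(index)
    return findings


if __name__ == "__main__":
    body = ["source()", "foo()", "bar()", "sink()"]  # mirrors test() above
    print(control_flow_findings(body))  # [3]
```

A real analysis would propagate this flag through the control-flow graph (joining it at branch merges), which the list-based model deliberately skips.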
taint-mode: Taint sanitizers will be included in matching explanations. (pa-2975)
.yarn/ directory are now ignored by the default .semgrepignore patterns. (dotyarn)
foo!(&x) and foo!(*x)) now properly transmit taint (pa-2951)
Published by github-actions[bot] about 1 year ago
No significant changes.
Published by github-actions[bot] about 1 year ago
#[get(...)]) (gh-8234)
.h files will now run when C or C++ are selected as the language. (pa-123)
.cjs and .mjs files will now run when javascript is selected as the language. (pa-124)
fn f((x, (y, z)): t) {
    let x = 2;
}
tainting the sole argument to this function now results in all of the identifiers x, y, and z being tainted. (pa-2919)
interfile: true, so this can be set under options: as it
interfile metadata. Metadata is not meant to have any effect on how a rule is run. (pro-94)
api_scans_findings to ci_scan_results, removed the gitlab_token field, and added ignores and renamed_paths fields to ci_scan_results. (app-4252)
Dockerfile language support: String matching is now done by contents, treating the strings foo, 'foo', or "foo" as equal. (gh-8229)
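Matching Dockerfile strings by contents amounts to stripping the quotes before comparing. The sketch below is hypothetical (string_contents and dockerfile_strings_equal are illustrative names) and assumes that only one matching pair of single or double quotes is removed; escape sequences and variable expansion are ignored.

```python
def string_contents(token: str) -> str:
    """Strip one pair of matching single or double quotes, if present."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]
    return token


def dockerfile_strings_equal(a: str, b: str) -> bool:
    """Compare two Dockerfile string tokens by their contents."""
    return string_contents(a) == string_contents(b)


if __name__ == "__main__":
    print(dockerfile_strings_equal("foo", "'foo'"))  # True
    print(dockerfile_strings_equal("'foo'", '"foo"'))  # True
    print(dockerfile_strings_equal("foo", "'bar'"))  # False
```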
Fixed an error where we were not filtering the logging of a new third-party library. (gh-8310)
Julia: Fixed a bug where try-catch patterns would not match properly.
Now, you can use an empty try-catch pattern, such as:
try
    ...
catch
    ...
end
to match only Julia code that does not specify an identifier for the catch.
Otherwise, if you want to match any kind of try-catch, you can specify an ellipsis for the catch identifier instead:
try
    ...
catch ...
    ...
end
and this will match any try-catch, including those that do not specify an identifier for the catch. This pattern is strictly more general than the previous one. (pa-2918)
Rust: Fixed an issue where implicit returns did not allow taint to flow,
and various other small translation issues that would affect taint. (pa-2936)
Fixed a bug in the gradle.lockfile parser where we would error on empty= with nothing after it. (sc-987)
Published by github-actions[bot] over 1 year ago
#[get(...)]) (gh-8234)
.h files will now run when C or C++ are selected as the language. (pa-123)
.cjs and .mjs files will now run when javascript is selected as the language. (pa-124)
fn f((x, (y, z)): t) {
    let x = 2;
}
tainting the sole argument to this function now results in all of the identifiers x, y, and z being tainted. (pa-2919)
interfile: true, so this can be set under options: as it
interfile metadata. Metadata is not meant to have any effect on how a rule is run. (pro-94)
api_scans_findings to ci_scan_results, removed the gitlab_token field, and added ignores and renamed_paths fields to ci_scan_results. (app-4252)
Dockerfile language support: String matching is now done by contents, treating the strings foo, 'foo', or "foo" as equal. (gh-8229)
Fixed an error where we were not filtering the logging of a new third-party library. (gh-8310)
Julia: Fixed a bug where try-catch patterns would not match properly.
Now, you can use an empty try-catch pattern, such as:
try
    ...
catch
    ...
end
to match only Julia code that does not specify an identifier for the catch.
Otherwise, if you want to match any kind of try-catch, you can specify an ellipsis for the catch identifier instead:
try
    ...
catch ...
    ...
end
and this will match any try-catch, including those that do not specify an identifier for the catch. This pattern is strictly more general than the previous one. (pa-2918)
Fixed a bug in the gradle.lockfile parser where we would error on empty= with nothing after it. (sc-987)
Published by github-actions[bot] over 1 year ago
feat(docker): Create a semgrep user for our docker container so that people can run it as a non-root user (gh-8116)
feat(typed metavar): Typed metavariable support for Rust
Users can create a TypedMetavar using Rust's type annotation syntax (:).
For example, the following rule works for matching variables of type HttpResponseBuilder:
rules:
  - id: no-direct-response-write
    patterns:
      - pattern: '($BUILDER : HttpResponseBuilder).body(...)'
      - pattern-not: '($BUILDER : HttpResponseBuilder).body("...".to_string())'
    message: find dangerous codes
    severity: WARNING
    languages: [rust]
(gh-8200)
async () => {}, etc.). (gh-7353)
Published by github-actions[bot] over 1 year ago
No significant changes.
Published by github-actions[bot] over 1 year ago
No significant changes.
Published by github-actions[bot] over 1 year ago
Make CLI hit the new endpoint for the reliable fixed status on the Semgrep app. (cod-16)
feat(rule syntax): Metavariable Type Extension for Semgrep Rule Syntax 2.0
This PR brings the changes made to Semgrep rule syntax 1.0 to version 2.0 as well.
rules:
rules:
Rust: Added the ability to taint macro calls through its arguments, in macro calls
with multiple arguments. (pa-2902)
Add severity and suggested upgrade versions to Supply Chain findings (sc-772)
Added support for pnpm lockfile versions >= 6.0 (sc-824)
(sc-866)
() => {}). (gh-7353)
Published by github-actions[bot] over 1 year ago
feat(rule syntax): Support metavariable-type field for Kotlin, Go, Scala
metavariable-type field is now supported for Kotlin, Go and Scala. (gh-8147)
feat(rule syntax): Support metavariable-type field for csharp, typescript, php, rust
metavariable-type field is now supported for csharp, typescript, php, rust. (gh-8164)
Pattern syntax: You may now introduce metavariables from parts of regular expressions using pattern-regex, by using regular expressions with named capturing groups (see https://www.regular-expressions.info/named.html).
Now, such capture group metavariables must be explicitly named.
So for instance, the pattern:
pattern-regex: "foo-(?P<X>.*)"
binds what is matched by the capture group to the metavariable $X, which can be used as normal.
pattern-regex patterns with unnamed capture groups, such as
pattern-regex: "(.*)"
will still introduce metavariables of the form $1, $2, etc., but this should be considered deprecated behavior, and that functionality will be removed in a future release. Named capturing groups should be used instead. (pa-2765)
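Python's re module accepts the same (?P<name>...) syntax, so it can illustrate how a named group turns into a metavariable binding. This is a sketch under that assumption only: regex_metavariables is a hypothetical helper, and Semgrep itself evaluates pattern-regex with its own PCRE-based engine.

```python
import re


def regex_metavariables(regex: str, text: str) -> dict[str, str]:
    """Bind each named capture group of the first match to a "$NAME" key."""
    match = re.search(regex, text)
    if match is None:
        return {}
    return {
        f"${name}": value
        for name, value in match.groupdict().items()
        if value is not None  # skip optional groups that did not participate
    }


if __name__ == "__main__":
    print(regex_metavariables(r"foo-(?P<X>.*)", "name = foo-bar"))  # {'$X': 'bar'}
    # Unnamed groups are ignored by this sketch; in the deprecated behavior
    # described above they would become $1, $2, and so on.
    print(regex_metavariables(r"(.*)", "anything"))  # {}
```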
Rule syntax: Error reporting during rule parsing has improved. For instance, parsing will now complain if you omit a hyphen in a list of patterns, or if you give a string to patterns or pattern-either. (pa-2877)
JS/TS: Now, patterns of records with ellipses, like:
{ $X: ... }
properly match to records of anonymous functions, like:
{
    func: () => { return 1; }
}
(pa-2878)
Published by github-actions[bot] over 1 year ago
feat(rule syntax): Metavariable Type Extension for Semgrep Rule Syntax
We've added a dedicated field for annotating the type information of
metavariables. Rather than relying solely on language-specific cast syntax,
this improves overall usability by eliminating the need to write redundant
type-cast expressions for a single metavariable.
Moreover, the new syntax brings other benefits, including improved support for
target languages that lack built-in casting syntax. It also promotes a unified
approach to expressing type, pattern, and regex constraints for metavariables,
resulting in improved consistency across rule definitions.
Current syntax:
rules:
  - id: no-string-eqeq
    severity: WARNING
    message: find errors
    languages:
      - java
    patterns:
      - pattern-not: null == (String $Y)
      - pattern: $X == (String $Y)
Added syntax:
rules:
  - id: no-string-eqeq
    severity: WARNING
    message: find errors
    languages:
      - java
    patterns:
      - pattern-not: null == $Y
      - pattern: $X == $Y
      - metavariable-type:
          metavariable: $Y
          type: String
(gh-8119)
feat(rule syntax): Support metavariable-type field for Python
metavariable-type field is now supported for Python too. (gh-8126)
New --experimental flag to switch to a new implementation of Semgrep entirely
written in OCaml with faster startup time, incremental display of matches,
AST and registry caching, a new interactive mode and more. Not all
features of the legacy Python Semgrep have been ported though. (osemgrep)
Matching: Writing a pattern which is a sequence of statements, such as
foo();
...
bar();
now allows matching to sequences of statements within objects, classes,
and related language constructs, in all languages. (pa-2754)
taint_assume_safe_{booleans,numbers} options.
@ named_expr_test NEWLINE, so for example with the pattern lambda $X:$X($X):
#match 1
@omega := lambda ha:ha(ha)
def func():
    return None
#match 2
@omega[lambda a:a(a)].a.b.c.f("wahoo")
def fun():
    return None
(gh-4946)
func f() {
    i_1 := &tau.rho{}
    i_2 := new(tau.rho)
    i_1.shift() //miss one
    i_2.left() //miss two
    return 101
}
but now we don't miss those two findings! (gh-6733)
$TYPE $NAME[101]; will now produce two matches in the following snippet:
int main() {
    int bad_len = 101;
    /* match 1 */
    int arr1[101];
    /* match 2 */
    int arr2[bad_len];
    return 0;
}
(gh-8037)
pragma solidity >= $VER; (gh-8104)
#[Attr1]
#[Attr2]
In code such as
#[Attr1]
#[Attr2]
function test ()
{
    echo "Test";
}
Previously, to match against multiple attributes, you had to write
#[Attr1, Attr2]
(pa-7398)