Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
LGPL-2.1 License
Dockerfile support: Avoid a silent parsing error that was possibly accompanied
with a segfault when parsing Dockerfiles that lack a trailing newline
character. (gh-10084)
Fixed bug that was preventing the use of metavariable-pattern with
the aliengrep engine of the generic mode. (gh-10222)
Added support for function declarations on object literals in the dataflow analysis.
For example, taint rules previously would not have matched the
following JavaScript code, but now they do:
```
let tainted = source()
let o = {
  someFuncDecl(x) {
    sink(tainted)
  }
}
```
(saf-1001)
Osemgrep only:
Previously, rules using metavariable-type did not show up in the SARIF output. This change fixes that.
In addition, dataflow traces were always shown in SARIF even when --dataflow-traces was not passed. This change fixes that as well. (saf-1020)
Fixed bug in rule parsing preventing patternless SCA rules from being validated. (saf-1030)
Published by github-actions[bot] 6 months ago
Pro: const-prop: Previously inter-procedural const-prop could only infer whether
a function returned an arbitrary string constant. Now it will be able to infer
whether a function returns a concrete constant value, e.g.:
```
def bar():
    return "bar"

def test():
    x = bar()
    foo(x)  # now also matches pattern `foo("bar")`, previously only `foo("...")`
```
(flow-61)
Python: const-prop: Semgrep will now recognize expressions of the form "..." * N as
arbitrary constant string literals (thus matching the pattern "..."). (flow-75)
The --beta-testing-secrets-enabled option, deprecated for several months, is now
removed. Use --secrets as its replacement. (gh-9987)
When using semgrep --test --json, the config_missing_fixtests field in the JSON
output now reports not just rule files containing a fix: without a corresponding
.fixed test file, but also rule files using a fix-regex: without a corresponding
.fixed test file. The fix: or fix-regex: can be in any rule in the file (not just
the first rule). (fixtest)
Fixed matching of Go struct field tag metadata.
For example, given the program:
type Rectangle struct {
    Top    int `json:"top"`
    Left   int `json:"left"`
    Width  int `json:"width"`
    Height int `json:"height"`
}
The pattern
type Rectangle struct {
    ...
    $NAME $TYPE $TAGS
    ...
}
will now match each field, and the $TAGS metavariable will be
bound when used in subsequent patterns. (saf-949)
Matching: Patterns of statements ending in ellipsis metavariables, such as
x = 1
$...STMTS
will now properly extend the match range to accommodate whatever is captured by
the ellipsis metavariable ($...STMTS). (saf-961)
The SARIF output format should have the tag "security" when the "cwe"
section is present in the rule. Moreover, duplicate tags should be
de-duped.
Osemgrep wasn't doing this before, but with this fix, now it does. (saf-991)
Fixed a bug in the mix.lock parser where parsing could fail with a Python None error. Added a handler for arbitrary exceptions during lockfile parsing. (sc-1466)
Moved --historical-secrets to the "Pro Engine" option group, instead of
"Output formats", where it was previously (in error). (scrt-570)
Published by github-actions[bot] 6 months ago
Added guidance for resolving API token issues in CI environments. (gh-10133)
The osemgrep show command supports two new options: dump-ast and dump-pattern.
See osemgrep show --help for more information. (osemgrep_show)
Added additional output flags which allow you to write output to multiple files in multiple formats.
For example, the command semgrep ci --text --json-output=result.json --sarif-output=result.sarif.json
displays text output on stdout, writes the output that would be generated by passing the --json
flag to result.json, and writes the output that would be generated by passing the --sarif
flag to result.sarif.json. (saf-341)
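As a rough illustration of the multi-output flags, the Python sketch below only assembles the command line from the example above. The helper name build_ci_command is hypothetical and nothing here invokes Semgrep; it simply shows how the three output destinations map onto the flags named in this entry.

```python
def build_ci_command(json_path, sarif_path):
    """Assemble the example `semgrep ci` invocation as an argv list.

    Hypothetical helper for illustration only: it builds the list and
    does not run Semgrep.
    """
    return [
        "semgrep",
        "ci",
        "--text",                        # text output goes to stdout
        f"--json-output={json_path}",    # JSON output is written to this file
        f"--sarif-output={sarif_path}",  # SARIF output is written to this file
    ]


if __name__ == "__main__":
    print(" ".join(build_ci_command("result.json", "result.sarif.json")))
```

Building an argv list rather than a single shell string keeps each flag and path as its own argument, which avoids quoting problems if a path contains spaces.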
Added an experimental feature for users to use osemgrep to format
SARIF output.
When both the flags --sarif and --use-osemgrep-sarif are specified,
semgrep will use the OCaml implementation to format SARIF.
This flag is experimental and may be removed at any time. Users must not
rely on it being available. (saf-978)
[\w-.], such a pattern would now need to be written [\w.-] or [\w\-.]
since PCRE2 rejects the first as having an invalid range. (scrt-467)
Semgrep LS now waits longer for users to log in. (gh-10109)
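For the character-class note above (scrt-467), here is a minimal sketch of the two accepted spellings. It uses Python's re module rather than PCRE2, on the assumption that both engines treat a trailing or escaped hyphen as a literal; the sample string and the helper name same_matches are made up for illustration.

```python
import re

# Both spellings keep the hyphen literal: placed last in the class, or escaped.
HYPHEN_LAST = re.compile(r"[\w.-]+")
HYPHEN_ESCAPED = re.compile(r"[\w\-.]+")


def same_matches(text):
    """Return True when both character classes find identical matches."""
    return HYPHEN_LAST.findall(text) == HYPHEN_ESCAPED.findall(text)


if __name__ == "__main__":
    sample = "api-key.v2 = secret_token-01"
    print(HYPHEN_LAST.findall(sample))
    print(same_matches(sample))
```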
When semgrep ci finishes scanning and uploads findings, it tells the
app to mark the scan as completed.
For scans with many findings, this may take a while, and marking the scan as
completed may time out. When a scan is not marked as completed, the app
may show that the repo is still processing, which can confuse the user.
This change increases the timeout (previously 20 minutes) to 30
minutes. (saf-980)
Fix semgrep ci --oss-only when secrets product is enabled. (scrt-223)
Published by github-actions[bot] 6 months ago
--trace-endpoint <url>.
LOG_TAGS. You can get all debug logs with LOG_TAGS=everything. We do not
SEMGREP_ (or PYTEST_SEMGREP_) to avoid namespace
SEMGREP_LOG_TAGS PYTEST_SEMGREP_LOG_TAGS. (gh-10087)
everything to all. All debug-level messages shown by default are
default tag. (gh-10089)
a a b, the pattern a ... b will match a b as before but
a ... will now match the longer a a b rather than a b. (gh-10039)
Published by github-actions[bot] 6 months ago
LOG_LEVEL (as well as PYTEST_LOG_LEVEL) is
SEMGREP_LOG_LEVEL is consulted. PYTEST_SEMGREP_LOG_LEVEL is also
LOG_LEVEL destined to another application. (gh-10044)
Published by github-actions[bot] 7 months ago
semgrep ci without being logged in to clarify that --config is used with semgrep scan. (gh-9485)
Published by github-actions[bot] 7 months ago
Pro only: taint-mode: Added experimental at-exit: true option for sinks, which
makes a sink spec apply only on the "exit" instructions/statements of a function.
That is, the instructions after which the control-flow exits the function. This is
useful for writing rules to find "leaks", such as checking that file descriptors
are being closed within the same function where they were opened.
For example, given this taint rule:
pattern-sources:
  - by-side-effect: true
    patterns:
      - pattern: $FILE = open(...)
      - focus-metavariable: $FILE
pattern-sanitizers:
  - by-side-effect: true
    patterns:
      - pattern: $FILE.close(...)
      - focus-metavariable: $FILE
pattern-sinks:
  - at-exit: true
    pattern: |
      def $FUN(...):
        ...
Semgrep will report a finding in the code below since at print(content), after
which the control flow reaches the exit of the function, the file has not yet
been closed:
```
def test():
    file = open("test.txt")
    content = file.read()
    print(content) # FINDING
```
(pa-3266)
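For contrast, here is a minimal sketch of a variant that closes the file before the function exits. Assuming the $FILE.close(...) sanitizer in the rule above applies, this version would presumably not be reported. The function name read_and_close and the temporary-file setup are illustrative only and do not come from the release notes.

```python
import os
import tempfile


def read_and_close(path):
    file = open(path)
    content = file.read()
    file.close()      # the file is closed before control flow reaches the exit
    print(content)
    return content


if __name__ == "__main__":
    # Create a throwaway file so the example can run anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello")
    try:
        read_and_close(tmp.name)
    finally:
        os.remove(tmp.name)
```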
Published by github-actions[bot] 7 months ago
--historical-secrets flag for running Semgrep Secrets regex rules on git
--experimental. (scrt-531)
Files with the .phtml extension are now treated as PHP files. (gh-10009)
[IMPORTANT] Logged-in users running semgrep ci will now run the pro engine by default! All semgrep ci scans will run with our proprietary languages (Apex and Elixir), as well as cross-function taint within a single file, and other single file pro optimizations we have developed. This is equivalent to semgrep ci --pro-intrafile. Users will likely see improved results if they are running semgrep ci and did not already have additional configuration to enable pro analysis.
The current default engine does not include cross-file analysis. To scan with cross-file analysis, turn on the app toggle or pass in the flag --pro. We recommend this unless you have very large repos (talk to our support to get help enabling cross-file analysis on monorepos!)
To revert to our OSS analysis, pass the flag --oss-only (or use --pro-languages to continue to receive our proprietary languages).
Reminder: because we release first to our canary image, this change will only immediately affect you if you are using semgrep/semgrep:canary. If you are using semgrep/semgrep:latest, it will affect you when we bump canary to latest. (saf-845)
Fixed a parsing error in Kotlin when there's a newline between the class name and the primary constructor.
Previously, the following code could not be parsed
class C
constructor(arg:Int){}
because of the newline between the class name and the constructor.
Now it's fixed. (saf-899)
Published by github-actions[bot] 7 months ago
Published by github-actions[bot] 7 months ago
Published by github-actions[bot] 7 months ago
yield keyword in Python. The Pro
osemgrep --remote will no longer clone into a tmp folder, but instead into the CWD. (cdx-remote)
[IMPORTANT] Inter-file differential scanning is now enabled for all Pro users.
While it may take longer than intra-file differential scanning, which is the
current default for Pro users, it offers deeper analysis of dataflow paths.
It is also significantly faster than non-differential inter-file scanning,
with scan times reduced to approximately 1/10 of the non-differential
inter-file scan. Users who enable the Pro engine and run differential PR scans
on GitHub or GitLab may notice the impact of this update. If needed, users can
revert to the previous intra-file differential scan behavior with the
--no-interfile-diff-scan command-line option. (saf-268)
Published by github-actions[bot] 7 months ago
Published by github-actions[bot] 8 months ago
ci: Updated logic for informational message printed when no rules are sent to
Published by github-actions[bot] 8 months ago
{ body: { param } }
{ body: { param } } = tainted
Semgrep
param as tainted. (flow-68)
metavariable-regex can now match on metavariables of interpolated
semgrep ci scans now reflect a custom SEMGREP_APP_URL, if one is set. (saf-353)
Published by github-actions[bot] 8 months ago
Pro: Adds support for Python constructors to taint analysis.
If interfile naming resolves that a Python constructor is called, taint
will now track these objects with fewer heuristics. Without interfile
analysis, these changes have no effect on the behavior of tainting.
The overall result is that in the following program the OSS analysis
would match both calls to sink, while the interfile analysis would only
match the second call to sink.
```
class A:
    untainted = "not"
    tainted = "not"
    def __init__(self, x):
        self.tainted = x

a = A("tainted")
# OK:
sink(a.untainted)
# MATCH:
sink(a.tainted)
```
(ea-272)
Pro: taint-mode: Added basic support for "index sensitivity", that is,
Semgrep will track taint on individual indexes of a data structure when
these are constant values (integers or strings), and the code uses the
built-in syntax for array indexing in the corresponding language
(typically E[i]). For example, in the Python code below Semgrep Pro
will not report a finding on sink(x) or sink(x[1]) because it will
know that only x[42] is tainted:
x[1] = safe
x[42] = source()
sink(x)      # no more finding
sink(x[1])   # no more finding
sink(x[42])  # finding
sink(x[i])   # finding
There is still a finding for sink(x[i]) when i is not constant. (flow-7)
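The outcomes listed above can be mimicked with a toy model. This is only a sketch of the idea, not Semgrep's implementation; the class IndexTaint and its method names are hypothetical.

```python
class IndexTaint:
    """Toy model of per-index taint for a single container variable.

    Not Semgrep's implementation; it only mirrors the outcomes listed above.
    """

    def __init__(self):
        self.tainted_indexes = set()

    def store(self, index, tainted):
        # Record taint for a constant index (an integer or a string).
        if tainted:
            self.tainted_indexes.add(index)
        else:
            self.tainted_indexes.discard(index)

    def whole_is_tainted(self):
        # Reading the container itself, as in sink(x): no finding.
        return False

    def constant_index_is_tainted(self, index):
        # Reading x[c] for a constant c: a finding only if c was tainted.
        return index in self.tainted_indexes

    def unknown_index_is_tainted(self):
        # Reading x[i] for a non-constant i: any tainted index may be hit.
        return bool(self.tainted_indexes)


if __name__ == "__main__":
    x = IndexTaint()
    x.store(1, tainted=False)    # x[1] = safe
    x.store(42, tainted=True)    # x[42] = source()
    print(x.whole_is_tainted(), x.constant_index_is_tainted(1),
          x.constant_index_is_tainted(42), x.unknown_index_is_tainted())
```

The non-constant case is deliberately conservative: when the index cannot be resolved, the model reports taint as soon as any index is tainted, which matches the sink(x[i]) finding described in the entry.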
taint-mode: Added exact: false sinks so that one can specify that anything
inside a code region is a sink, e.g. if (...) { ... }. This used to be the
semantics of sink specifications until Semgrep 1.1.0, when we made sink matching
more precise by default. Now we allow reverting to the old semantics.
In addition, when exact: true (the default), we simplified the heuristic used
to support traditional sink(...)-like specs together with the option
taint_assume_safe_functions: true. Now, if the spec formula is not a patterns
with a focus-metavariable, we will look for taint in the arguments of a
function call. (flow-1)
The project name for repos scanned locally will now be local_scan/<repo_name> instead
of simply <repo_name>. This will clarify the origin of those findings. Also, the
"View Results" URL displayed for findings now includes the repository and branch names. (saf-856)
requires of the sink, and if it has the shape A and ..., then
A as the preferred label and report its trace. (flow-65)
Published by github-actions[bot] 8 months ago
Added performance metrics using OpenTelemetry for better visualization.
Users wishing to understand the performance of their Semgrep scans or
to help optimize Semgrep can configure the backend collector created in
libs/tracing/unix/Tracing.ml.
This is experimental and both the implementation and flags are likely to
change. (ea-320)
Created a new environment variable SEMGREP_REPO_DISPLAY_NAME for use in semgrep CI.
Currently, this does nothing. The goal is to provide a way to override the display
name of a repo in the Semgrep App. (gh-8953)
The OCaml/C executable (semgrep-core or osemgrep) is now passed through
the strip utility, which reduces its size by 10-25% depending on the
platform. Contribution by Filipe Pina (@fopina). (gh-9471)
--pro) will now
Published by github-actions[bot] 8 months ago
Rule syntax: Metavariables by the name of $_ are now anonymous, meaning that
they do not unify within a single pattern or across patterns, and essentially
just unconditionally specify some expression.
For instance, the pattern foo($_, $_) may match the code foo(1, 2).
This will change the behavior of existing rules that use the metavariable
$_, if they rely on unification still happening. This can be fixed by simply
giving the metavariable a real name like $A. (ea-837)
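The difference between a named metavariable and the anonymous $_ can be sketched with a toy argument matcher. This is an illustration of unification only, not how Semgrep matches code; the function match_args is hypothetical.

```python
def match_args(pattern_args, code_args):
    """Toy matcher: return metavariable bindings, or None when there is no match.

    Named metavariables such as $A must bind to equal values everywhere
    they appear (unification); $_ matches anything and is never bound.
    """
    if len(pattern_args) != len(code_args):
        return None
    bindings = {}
    for metavar, value in zip(pattern_args, code_args):
        if metavar == "$_":
            continue  # anonymous: no binding, so no unification
        if metavar in bindings and bindings[metavar] != value:
            return None  # same name, different values: unification fails
        bindings[metavar] = value
    return bindings


if __name__ == "__main__":
    print(match_args(["$_", "$_"], [1, 2]))  # foo($_, $_) against foo(1, 2)
    print(match_args(["$A", "$A"], [1, 2]))  # foo($A, $A) against foo(1, 2)
```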
Added infrastructure for semgrep supply chain in semgrep-core. Not fully functional yet. (ssc-port)
Dataflow: Simplified the IL translation for Python with statements to let
symbolic propagation assume that with foo() as x: ... entails x = foo(),
so that e.g. Session().execute("...") matches:
with Session() as s:
    s.execute("SELECT * from T")
(CODE-6633)
Published by github-actions[bot] 8 months ago
Rule syntax: Metavariables by the name of $_ are now anonymous, meaning that
they do not unify within a single pattern or across patterns, and essentially
just unconditionally specify some expression.
For instance, the pattern foo($_, $_) may match the code foo(1, 2).
This will change the behavior of existing rules that use the metavariable
$_, if they rely on unification still happening. This can be fixed by simply
giving the metavariable a real name like $A. (ea-837)
Added infrastructure for semgrep supply chain in semgrep-core. Not fully functional yet. (ssc-port)
Published by github-actions[bot] 9 months ago
taint-mode: Pro: Semgrep can now track taint via static class fields and global
variables, such as in the following example:
```
static char* x;

void foo() {
  x = "tainted";
}

void bar() {
  sink(x);
}

void main() {
  foo();
  bar();
}
```
(pa-3378)
Published by github-actions[bot] 9 months ago
($X : ty). (pa-3370)
Add Elixir to Pro languages list in help information. (gh-9609)
Removed sg alias to avoid naming conflicts
with the shadow-utils sg command for Linux systems. (gh-9642)
Prevent unnecessary computation when running scans without verbose logging enabled (gh-9661)
Deprecated the option taint_match_on, introduced in 1.51.0; it is being renamed
to taint_focus_on. Note that taint_match_on was experimental, and
taint_focus_on is experimental too. Option taint_match_on will continue
to work but it will be completely removed at some point after 1.63.0. (pa-3272)
Added information on product-related flags to help output, especially for Semgrep Secrets. (pa-3383)
taint-mode: Improve inference of best matches for exact-sources, exact-sanitizers,
and sinks. Now we also avoid FPs in cases such as:
dangerouslySetInnerHTML = {
  // ok:
  {__html: props ? DOMPurify.sanitize(props.text) : ''} // no more FPs!
}
where props is tainted and the sink specification is:
patterns:
  - pattern: |
      dangerouslySetInnerHTML={{__html: $X}}
  - focus-metavariable: $X
Previously Semgrep wrongly considered the individual subexpressions of the
conditional as sinks, including the props in props ? ..., thus producing a
false positive. Now it will only consider the conditional expression as a whole
as the sink. (rules-6457)
Removed an internal legacy syntax for secrets rules (mode: semgrep_internal_postprocessor). (scrt-320)
Autofix: Fixes that span multiple lines will now try to align
inserted fixed lines with each other. (gh-3070)
Matching: Try blocks with catch clauses can now match try blocks that have
extraneous catch clauses, as long as the pattern's catch clauses match a
subset of them. For instance, the pattern
try:
  ...
catch A:
  ...
can now match
```
try:
  ...
catch A:
  ...
catch B:
  ...
```
(gh-3362)
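The subset condition described above can be sketched in a few lines. This is a toy check over exception names only, whereas Semgrep matches on syntax trees; the function catch_clauses_match is hypothetical.

```python
def catch_clauses_match(pattern_exceptions, code_exceptions):
    """Toy check: every catch clause in the pattern also appears in the code.

    Extra catch clauses in the code are allowed, mirroring the subset rule.
    """
    return set(pattern_exceptions) <= set(code_exceptions)


if __name__ == "__main__":
    print(catch_clauses_match(["A"], ["A", "B"]))       # pattern is a subset
    print(catch_clauses_match(["A", "C"], ["A", "B"]))  # C is missing in the code
```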
Previously, some people got the error:
Encountered error when running rules: Other syntax error at line NO FILE INFO YET:-1:
Invalid_argument: String.sub / Bytes.sub
Semgrep should now report this error properly with a file name and line number and
handle it gracefully. (gh-9628)
Fixed Dockerfile parsing bug where multiline comments were parsed incorrectly. (gh-9628-2)
The language server will now properly respect findings that have been ignored via the app (lsp-fingerprints)
taint-mode: Pro: Semgrep will now propagate taint via instance variables when
calling methods within the same class, making this example work:
```
class Test {
  private String str;
  public setStr() {
    this.str = "tainted";
  }
  public useStr() {
    //ruleid: test
    sink(this.str);
  }
  public test() {
    setStr();
    useStr();
  }
}
```
(pa-3372)
taint-mode: Pro: Taint traces will now reflect when taint is propagated via
class fields, such as in this example:
class Test {
  private String str;
  public setStr() {
    this.str = "tainted";
  }
  public useStr() {
    //ruleid: test
    sink(this.str);
  }
  public test() {
    setStr();
    useStr();
  }
}
Previously Semgrep would report that taint originated at this.str = "tainted",
but it would not tell you how the control flow got there. Now the taint trace
will indicate that we get there by calling setStr() inside test(). (pa-3373)
Addressed an issue related to matching top-level identifiers with meta-variable
qualified patterns in C++, such as matching ::foo with ::$A::$B. This problem
was specific to Pro Engine-enabled scans. (pa-3375)