Scan your data stores for unencrypted personal data (PII)
MIT License
Uses data sampling and field names to find PII, and works with compressed files
💥 Zero runtime dependencies and minimal database load
Download the latest version:
You can also install it with Homebrew or Docker.
pdscan elasticsearch+http://user:pass@host:9200
For HTTPS, use elasticsearch+https:// instead.
You can also specify indices.
pdscan elasticsearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "elasticsearch+http://user:pass@host:9200/index*"
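Passwords that contain characters such as @ or : must be percent-encoded, or the URI will be split in the wrong place. This applies to every URI in this README, not just Elasticsearch. Here is a minimal Go sketch using only net/url; buildSearchURI is a hypothetical helper (not part of pdscan), and it assumes pdscan decodes percent-escapes in the credentials the way standard URI parsers do.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildSearchURI is a hypothetical helper, not part of pdscan. It assembles
// an elasticsearch+http(s) or opensearch+http(s) URI; url.UserPassword makes
// String() percent-encode reserved characters such as @ and : in credentials.
func buildSearchURI(scheme, user, pass, host string, indices []string) string {
	u := url.URL{
		Scheme: scheme,
		User:   url.UserPassword(user, pass),
		Host:   host,
	}
	if len(indices) > 0 {
		// Multiple indices are comma-separated, as in /index1,index2.
		u.Path = "/" + strings.Join(indices, ",")
	}
	return u.String()
}

func main() {
	// Prints elasticsearch+http://user:p%40ss%3Aword@host:9200/index1,index2
	fmt.Println(buildSearchURI("elasticsearch+http", "user", "p@ss:word", "host:9200", []string{"index1", "index2"}))
}
```

url.URL escapes a * in the path as %2A, so this sketch is only meant for explicit index lists, not wildcards.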
pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:/// (three slashes).
pdscan file:///absolute/path/to/file.txt
For paths relative to your home directory on Mac and Linux, use:
pdscan file://$HOME/file.txt
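The rules above come down to one idea: the URI is file:// followed by the path, so an absolute path's leading / supplies the third slash. A minimal Go sketch for Mac and Linux; fileURI is a hypothetical helper, not part of pdscan, and expanding a leading ~/ is an assumption that mirrors the file://$HOME/... form.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// fileURI is a hypothetical helper, not part of pdscan. Relative paths keep
// two slashes (file://path/...); absolute paths start with "/", which
// produces the third slash (file:///...).
func fileURI(p string) (string, error) {
	if strings.HasPrefix(p, "~/") {
		home, err := os.UserHomeDir()
		if err != nil {
			return "", err
		}
		// Same result as writing file://$HOME/... in the shell.
		p = filepath.Join(home, p[2:])
	}
	return "file://" + filepath.Clean(p), nil
}

func main() {
	for _, p := range []string{"path/to/file.txt", "/absolute/path/to/file.txt", "~/file.txt"} {
		uri, err := fileURI(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(uri)
	}
}
```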
pdscan mariadb://user:pass@host:3306/dbname
pdscan mongodb://user:pass@host:27017/dbname
pdscan mysql://user:pass@host:3306/dbname
pdscan opensearch+http://user:pass@host:9200
For HTTPS, use opensearch+https:// instead.
You can also specify indices.
pdscan opensearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "opensearch+http://user:pass@host:9200/index*"
pdscan postgres://user:pass@host:5432/dbname
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
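Appending by hand goes wrong when the URI already has a query string, since the next parameter needs & rather than ?. A minimal Go sketch using net/url that sets sslmode either way; withSSLMode is a hypothetical helper, not part of pdscan.

```go
package main

import (
	"fmt"
	"net/url"
)

// withSSLMode is a hypothetical helper, not part of pdscan. It sets the
// sslmode query parameter on a postgres:// URI, using ? or & as needed and
// replacing any sslmode value that is already present.
func withSSLMode(uri, mode string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("sslmode", mode)
	// Encode sorts parameters by key and percent-encodes values.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	uri, err := withSSLMode("postgres://user:pass@host:5432/dbname", "verify-full")
	if err != nil {
		panic(err)
	}
	fmt.Println(uri) // postgres://user:pass@host:5432/dbname?sslmode=verify-full
}
```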
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
pdscan redis://user:pass@host:6379/db
pdscan s3://bucket/path/to/file.txt
Requires the s3:GetObject permission
You can also specify a prefix by ending the URI with a slash (/).
pdscan s3://bucket/path/to/directory/
Requires the s3:ListBucket and s3:GetObject permissions
pdscan sqlite://path/to/dbname.sqlite3
SQLite is not available with prebuilt binaries
pdscan "sqlserver://user:pass@host:1433?database=dbname"
Show the data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change the sample size
pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
pdscan --processes 4
Scan for only certain types of data
pdscan --only email,phone,location
Scan for all except certain types of data
pdscan --except ip,mac
Specify the minimum number of rows/documents/lines for a match (experimental)
pdscan --min-count 10
Specify a custom pattern (experimental)
pdscan --pattern "\d{16}"
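To see what the custom pattern above matches, here is a small Go sketch using the standard regexp package (RE2 syntax), where \d is an ASCII digit. It assumes pdscan interprets --pattern with the same syntax. That is plausible since pdscan is written in Go, but it is not confirmed here, and the sketch shows only what the expression itself matches, not how pdscan applies it to sampled data.

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from the --pattern example: 16 consecutive digits anywhere
// in the value (the expression is not anchored).
var sixteenDigits = regexp.MustCompile(`\d{16}`)

func main() {
	samples := []string{
		"1234567812345678",          // 16 digits in a row: match
		"card 1234567812345678 exp", // embedded in other text: match
		"1234 5678 1234 5678",       // spaces break the run: no match
		"123456781234567",           // only 15 digits: no match
	}
	for _, s := range samples {
		fmt.Printf("%-28q %v\n", s, sixteenDigits.MatchString(s))
	}
}
```

Because the pattern is unanchored, it also matches inside longer digit runs; \b\d{16}\b would be stricter. In the shell command, the quotes keep the shell from removing the backslash.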
Output newline delimited JSON (experimental)
pdscan --format ndjson
With Homebrew, you can use:
brew install ankane/brew/pdscan
Get the Docker image with:
docker pull ankane/pdscan
And run it with:
docker run -ti ankane/pdscan <connection-uri>
For data stores on the host machine, use host.docker.internal as the hostname:
docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
On Linux, this requires Docker 20.10+ and the --add-host=host.docker.internal:host-gateway option.
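If your URIs already point at localhost, rewriting the hostname is mechanical. A minimal Go sketch; forDocker is a hypothetical helper, not part of pdscan, and treating localhost, 127.0.0.1 and ::1 as the names to rewrite is an assumption about your setup. On Linux, the --add-host option is still needed.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// forDocker is a hypothetical helper, not part of pdscan. It rewrites a
// loopback hostname to host.docker.internal, keeping the port, path and
// query, so a URI written for the host machine works inside a container.
func forDocker(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	switch u.Hostname() {
	case "localhost", "127.0.0.1", "::1":
		if port := u.Port(); port != "" {
			u.Host = net.JoinHostPort("host.docker.internal", port)
		} else {
			u.Host = "host.docker.internal"
		}
	}
	return u.String(), nil
}

func main() {
	uri, err := forDocker("postgres://user@localhost:5432/dbname?sslmode=disable")
	if err != nil {
		panic(err)
	}
	fmt.Println(uri) // postgres://user@host.docker.internal:5432/dbname?sslmode=disable
}
```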
For files on the host machine, use:
docker run -ti -v /path/to/files:/data ankane/pdscan file:///data
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test