# DynamoDbExportCsv

A utility to export DynamoDB tables to CSV files.
MIT License
A simple library / CLI tool for exporting a DynamoDB table to a CSV file. The CSV file can be written to the local file system or streamed to S3.
## Installation

To install the CLI tool globally:

```
$ [sudo] npm install dynamodbexportcsv -g
```

To install as a library dependency:

```
$ npm install dynamodbexportcsv --save
```
## Usage

### Command line

```
$ ./bin/DynamoDBExportCSV --awsregion "us-west-2" --awsid "<id>" --awssecret "<secret>" --table "<mytable>" --columns "<columna,columnb,columnc>" --gzip
```
### As a library

```javascript
var CsvExport = require('DynamoDbExportCsv');
var exporter = new CsvExport('<accessKey>', '<secretKey>', '<awsRegion>');

exporter.exportTable('<tableName>', ['columna', 'columnb'], 4, true, 250, null, null, function(err) {
    console.info('Done');
});
```
This will create a subdirectory in the current working directory with the same name as the table. It uses a parallel scan to write 4 files simultaneously, starting a new file every 250MB. The CSV files are compressed with gzip.
Parallel scans are useful for maximizing use of the read throughput provisioned on the DynamoDB table.
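Under the hood, a DynamoDB parallel scan simply partitions the table's key space via the `Segment` and `TotalSegments` parameters of the Scan API. As a rough sketch (the helper name is hypothetical; real code would issue these requests through the AWS SDK):

```javascript
// Build one DynamoDB Scan request per segment. Each request can then be
// issued concurrently, and DynamoDB returns a disjoint slice of the
// table for each Segment value.
function buildSegmentScans(tableName, columns, totalSegments) {
    var scans = [];
    for (var segment = 0; segment < totalSegments; segment++) {
        scans.push({
            TableName: tableName,
            // DynamoDB cannot list a table's columns up front, so the
            // caller supplies them explicitly.
            ProjectionExpression: columns.join(', '),
            Segment: segment,          // which slice of the key space to read
            TotalSegments: totalSegments
        });
    }
    return scans;
}
```

Each of these parameter objects could be passed to the AWS SDK's `scan` call, following pages within a segment via `LastEvaluatedKey` / `ExclusiveStartKey` as usual.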
## API

### Constructor

Sets up the AWS credentials to use.

Arguments

* `awsAccessKeyId` - AWS access key
* `awsSecretAccessKey` - AWS secret
* `awsRegion` - AWS region

### exportTable(table, columns, totalSegments, compressed, filesize, s3Bucket, s3Path, callback)

Exports the specified columns of the DynamoDB table to one or more files. This method spawns a child process for each parallel scan, which lets it maximize performance by utilizing multiple cores.
Arguments
* `table` - Name of the DynamoDB table
* `columns` - Array of column names. DynamoDB has no way to query the table and determine columns without scanning
* `totalSegments` - Number of parallel scans to run. The DynamoDB table key space is split into this many segments
* `compressed` - When set to true, files are output in compressed gzip format
* `filesize` - Maximum size of each file in megabytes. Once a file hits this size, it is closed and a new file is started
* `s3Bucket` - Optional. If specified, the files are streamed to S3 instead of the local file system
* `s3Path` - Optional. Key prefix for files in S3. Used as a prefix with sequential numbers
* `callback(err)` - A callback which is executed when finished and includes any errors that occurred

### Exporting a single segment

Exports one slice of a DynamoDB table; used when running parallel scans. If you use `exportTable` there is no reason to call this directly. It is useful if you want to break a scan into chunks, perhaps across machines, and manually orchestrate what `exportTable` already does for you.
Arguments
* `table` - Name of the DynamoDB table
* `columns` - Array of column names. DynamoDB has no way to query the table and determine columns without scanning
* `totalSegments` - Number of parallel scans to run. The DynamoDB table key space is split into this many segments
* `compressed` - When set to true, files are output in compressed gzip format
* `filesize` - Maximum size of each file in megabytes. Once a file hits this size, it is closed and a new file is started
* `s3Bucket` - Optional. If specified, the files are streamed to S3 instead of the local file system
* `s3Path` - Optional. Key prefix for files in S3. Used as a prefix with sequential numbers
* `callback(err)` - A callback which is executed when finished and includes any errors that occurred

## Performance

With the last update I ran a few performance comparisons while working on performance improvements. These were not rigorously isolated, repeatable benchmarks. All tests were run against a DynamoDB table scaled to 5,000 read IOPS. The table contained 187,363,510 rows and was 98GB in size. All tests wrote the resulting CSV files to S3.
| Instance Size | Scans | Execution Time (min) | CPU | IOPS |
|---|---|---|---|---|
| c4.4xlarge | 10 | 120 | 45% | 1450 |
| c4.4xlarge | 20 | 86 | 90% | 2500 |
| c4.8xlarge | 20 | 45 | 42% | 4500 |
| c4.8xlarge | 30 | 36 | 67% | 5500 |
The author is Chris Kinsman from PushSpring.