internetarchive

A Python and Command-Line Interface to Archive.org

AGPL-3.0 License

Downloads
161.6K
Stars
1.6K
Committers
55


internetarchive - Version 1.9.0

Published by jjjake almost 5 years ago

Features and Improvements

  • Implemented the new archive.org Tasks API (https://archive.org/services/docs/api/tasks.html).
  • Added support for darking and undarking items via the Tasks API.
  • Added support for submitting arbitrary tasks
    (only darking/undarking currently supported, see Tasks API documentation).

Bugfixes

  • ia download now displays "download failed" instead of "success" when a download fails.
  • Fixed bug where Item.get_file would not work on unicode names in Python 2.
internetarchive - Version 1.8.5

Published by jjjake over 5 years ago

Features and Improvements

  • Improved timeout logging and exceptions.
  • Added support for arbitrary targets to metadata write.
  • IA-S3 keys now supported for auth in download.
  • Authorization (i.e. ia configure) now uses the archive.org xauthn endpoint.

Bugfixes

  • Fixed encoding error in --get-task-log.
  • Fixed bug in upload where connections were not being closed.
internetarchive - Version 1.8.4

Published by jjjake over 5 years ago

  • It's now possible to retrieve task logs, given a task id, without first retrieving the item's task history.
  • Added examples to ia tasks help.
internetarchive - Version 1.8.3

Published by jjjake over 5 years ago

Features and Improvements

  • Increased search timeout from 24 to 300 seconds.

Bugfixes

  • Fixed bug in setup.py where backports.csv wasn't being installed when installing from PyPI.
internetarchive - Version 1.8.2

Published by jjjake over 5 years ago

Features and Improvements

  • Documentation updates.
  • Added support for write-many to modify_metadata.

Bugfixes

  • Fixed bug in ia tasks --task-id where no task was being returned.
  • Fixed bug in internetarchive.get_tasks() where it was not possible to query by task_id.
  • Fixed TypeError bug in upload when uploading with checksum=True.
internetarchive - Version 1.8.1

Published by jjjake over 6 years ago

Bugfixes

  • Fixed bug in ia tasks --get-task-log that was returning an unable to parse JSON error.
internetarchive - Version 1.8.0

Published by jjjake over 6 years ago

Features and Improvements

  • Only use backports.csv for Python 2, in support of the FreeBSD port.
  • Added a nicer error message to ia search for authentication errors.
  • Added support for using netrc files in ia configure.
  • Added --remove option to ia metadata for removing values from single or multi-field metadata elements.
  • Added support for appending a metadata value to an existing metadata element (as a new entry, not simply appending to a string).
  • Added --no-change-timestamp flag to ia download.
    When this option is used, downloaded files keep the timestamp of "now" rather than that of the source material.

Bugfixes

  • Fixed bug in upload where StringIO objects were not uploadable.
  • Fixed encoding issues that were causing some ia tasks commands to fail.
  • Fixed bug where keep-old-version wasn't working in ia move.
  • Fixed bug in internetarchive.api.modify_metadata where debug and other args were not honoured.
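
The difference between appending a value as a new entry and removing a single value (the --remove and append notes above) can be sketched with plain dictionaries. This is a minimal illustration, not the library's implementation: the helper names append_value and remove_value are hypothetical, and collapsing a one-element list back to a single string is an assumption made for the example.

```python
def append_value(metadata, key, value):
    """Return a copy of metadata with value added to key as a new entry."""
    updated = dict(metadata)
    current = updated.get(key)
    if current is None:
        updated[key] = value                # first value: store it as-is
    elif isinstance(current, list):
        updated[key] = current + [value]    # already multi-valued: extend
    else:
        updated[key] = [current, value]     # single value becomes a list
    return updated


def remove_value(metadata, key, value):
    """Return a copy of metadata with one value removed from key."""
    updated = dict(metadata)
    current = updated.get(key)
    if isinstance(current, list):
        remaining = [v for v in current if v != value]
        if not remaining:
            updated.pop(key)                # nothing left: drop the element
        elif len(remaining) == 1:
            updated[key] = remaining[0]     # assumption: collapse to a string
        else:
            updated[key] = remaining
    elif current == value:
        updated.pop(key)                    # single-valued element removed
    return updated


item_metadata = {"title": "Example", "subject": "maps"}
with_two = append_value(item_metadata, "subject", "atlases")
print(with_two["subject"])                                   # ['maps', 'atlases']
print(remove_value(with_two, "subject", "maps")["subject"])  # atlases
```

Note that appending here produces a second entry rather than the string "mapsatlases", which is the distinction the release note draws.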
internetarchive - Version 1.7.7

Published by jjjake over 6 years ago

Features and Improvements

  • Added support for downloading on-the-fly archive_marc.xml files.

Bugfixes

  • Improved syntax checking in ia move and ia copy.
  • Added Connection:close header to all requests to force close connections after each request.
    This is a workaround for dealing with a bug on archive.org servers where the server hangs up before sending the complete response.
internetarchive - Version 1.7.6

Published by jjjake almost 7 years ago

Features and Improvements

  • Added ability to set the remote-name for a directory in ia upload (previously you could only do this for single files).

Bugfixes

  • Fixed bug in ia delete where all requests were failing due to a typo in a function arg.
internetarchive - Version 1.7.5

Published by jjjake almost 7 years ago

Features and Improvements

  • Turned on x-archive-keep-old-version S3 header by default for all ia upload, ia delete, ia copy, and ia move commands.
    This means that any ia command that clobbers or deletes a file will save a version of that file in <identifier>/history/files/$key.~N~.
    This is only on by default in the CLI, and not in the Python lib.
    It can be turned off by adding -H x-archive-keep-old-version:0 to any ia upload, ia delete, ia copy, or ia move command.
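
As a rough sketch of how a CLI default header and a user-supplied -H override might interact, and of the backup path pattern quoted above: the names CLI_DEFAULT_HEADERS, build_headers, and backup_key are hypothetical, and the default header value "1" is an assumption. None of this is taken from the library's source.

```python
# Assumption: the CLI enables the header by sending the value "1".
CLI_DEFAULT_HEADERS = {"x-archive-keep-old-version": "1"}


def build_headers(header_options):
    """Merge 'name:value' strings (as passed to -H) over the CLI defaults."""
    headers = dict(CLI_DEFAULT_HEADERS)
    for option in header_options:
        name, _, value = option.partition(":")
        headers[name.strip()] = value.strip()   # user-supplied value wins
    return headers


def backup_key(identifier, key, version):
    """Build a path following the <identifier>/history/files/$key.~N~ pattern."""
    return f"{identifier}/history/files/{key}.~{version}~"


print(build_headers([]))
print(build_headers(["x-archive-keep-old-version:0"]))
print(backup_key("my-item", "scan.pdf", 1))
```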
internetarchive - Version 1.7.4

Published by jjjake almost 7 years ago

Features and Improvements

  • Increased timeout in search from 12 seconds to 24.
  • Added ability to set max_retries in internetarchive.search_items().
  • Made internetarchive.ArchiveSession.mount_http_adapter() a public method for supporting complex custom retry logic.
  • Added --timeout option to ia search for setting a custom timeout.

Bugfixes

  • The scraping API has reverted to using items key rather than docs key.
    v1.7.3 will still work, but this change keeps ia consistent with the API.
internetarchive - Version 1.7.2

Published by jjjake about 7 years ago

Features and Improvements

  • Added support for adding custom headers to ia search.

Bugfixes

  • internetarchive.utils.get_s3_xml_text() is used to parse errors returned by S3 in XML.
    Sometimes there is no XML in the response.
    Most of the time this is due to 5xx errors.
    Either way, we want to always return the HTTPError, even if the XML parsing fails.
  • Fixed a regression where : was being stripped from filenames in upload.
  • Do not create a directory in download() when return_responses is True.
  • Fixed bug in upload where file-like objects were failing with a TypeError exception.
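
The XML-parsing fallback described in the first bugfix above can be sketched as follows. This is an illustration under assumptions, not the body of internetarchive.utils.get_s3_xml_text(): the helper name error_text is hypothetical, and the error document is assumed to carry its text in a top-level Message element.

```python
import xml.etree.ElementTree as ET


def error_text(body, fallback):
    """Return the message from an XML error body, or fallback if unavailable.

    fallback would typically be the text of the HTTP error itself, so the
    caller always has something to report even when the body is not XML.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return fallback                  # empty or non-XML body (e.g. on a 5xx)
    message = root.findtext("Message")   # assumed element name
    return message if message else fallback


xml_body = "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message></Error>"
print(error_text(xml_body, "503 Service Unavailable"))
print(error_text("", "503 Service Unavailable"))
```

The point of the design is that a parsing failure never masks the original HTTP error.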
internetarchive - Version 1.7.1

Published by jjjake about 7 years ago

Bugfixes

  • Fixed bug in ia upload where all commands would fail if multiple collections were specified (e.g. -m collection:foo -m collection:bar).
internetarchive - Version 1.7.0

Published by jjjake about 7 years ago

Features and Improvements

  • Loosened up jsonpatch requirements, as the metadata API now supports more recent versions of the JSON Patch standard.
  • Added support for building "snap" packages (https://snapcraft.io/).

Bugfixes

  • Fixed bug in upload where users were unable to add their own timeout via request_kwargs.
  • Fixed bug where files with non-ascii filenames failed to upload on some platforms.
  • Fixed bug in upload where metadata keys with an index (e.g. subject[0]) would make the request fail if the key was the only indexed key provided.
  • Added a default timeout to ArchiveSession.s3_is_overloaded().
    If it times out now, it returns True (as in, yes, S3 is overloaded).
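
The "treat a timeout as overloaded" behaviour in the last bugfix can be sketched like this. It is a simplified stand-in rather than the real ArchiveSession.s3_is_overloaded(): the function is_overloaded and its check_limit callable are hypothetical, and the over_limit field in the status dictionary is an assumption.

```python
def is_overloaded(check_limit):
    """Return True if S3 reports it is over its limit or cannot be reached.

    check_limit is any zero-argument callable that performs the status
    request and returns a dict, raising on timeouts or connection errors.
    """
    try:
        status = check_limit()
    except OSError:
        # TimeoutError and ConnectionError are both subclasses of OSError.
        # If the service cannot answer in time, assume it is overloaded.
        return True
    return status.get("over_limit") != 0   # assumed field name


def timed_out():
    raise TimeoutError("status check timed out")


print(is_overloaded(timed_out))                  # True
print(is_overloaded(lambda: {"over_limit": 0}))  # False
```

Failing closed in this way means callers back off whenever the status check itself is unreliable.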
internetarchive - Version 1.6.0

Published by jjjake over 7 years ago

Features and Improvements

  • Added 60 second timeout to all upload requests.
  • Added support for uploading empty files.
  • Refactored Item.get_files() to be faster, especially for items with many files.
  • Updated search to use IA-S3 keys for auth instead of cookies.

Bugfixes

  • Fixed bug in upload where derives weren't being queued in some cases where checksum=True was set.
  • Fixed bug where ia tasks and other Catalog functions were always using HTTP even when it should have been HTTPS.
  • ia metadata was exiting with a non-zero status for "no changes to xml" errors.
    This now exits with 0, as nearly every time this happens it should not be considered an "error".
  • Added unicode support to ia upload --spreadsheet and ia metadata --spreadsheet using the backports.csv module.
  • Fixed bug in ia upload --spreadsheet where some metadata was accidentally being copied from previous rows
    (e.g. when multiple subjects were used).
  • Submitter wasn't being added to ia tasks --json output; it now is.
  • row_type in ia tasks --json was returning an integer rather than the name (e.g. 'red').
internetarchive - Version 1.4.0

Published by jjjake over 7 years ago

Features and Improvements

  • Added ia copy and ia move for copying and moving files in archive.org items.
  • Added support for outputting JSON in ia tasks.
  • Added support to ia download to write to stdout instead of file.

Bugfixes

  • Fixed bug in upload where AttributeError was raised when trying to upload file-like objects without a name attribute.
  • Removed identifier validation from ia delete.
    If an identifier already exists, we don't need to validate it.
    This only makes things annoying if an identifier exists but fails internetarchive id validation.
  • Fixed bug where the error message wasn't returned in ia upload if the response body was not XML.
    Ideally IA-S3 would always return XML, but that's not the case as of now.
    Try to dump the HTML in the S3 response if unable to parse XML.
  • Fixed bug where ArchiveSession headers weren't being sent in prepared requests.
  • Fixed bug in ia upload --size-hint where value was an integer, but requests requires it to be a string.
  • Added support for downloading files to stdout in ia download and File.download.
internetarchive - Version 1.0.8

Published by jjjake about 8 years ago

Features and Improvements

  • Increased maximum identifier length from 80 to 100 characters in ia upload.

Bugfixes

  • As of version 2.11.0 of the requests library, all header values must be strings (i.e. not integers).
    internetarchive now converts all header values to strings.
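
A minimal sketch of the header conversion described above, assuming a plain dictionary of headers. The helper stringify_headers is hypothetical and is not the library's own code; the header names are used only as sample data.

```python
def stringify_headers(headers):
    """Return a copy of headers in which every non-string value is a string."""
    return {
        str(name): value if isinstance(value, (str, bytes)) else str(value)
        for name, value in headers.items()
    }


raw = {"x-archive-size-hint": 1024, "x-archive-queue-derive": 0, "x-custom": "abc"}
print(stringify_headers(raw))
# {'x-archive-size-hint': '1024', 'x-archive-queue-derive': '0', 'x-custom': 'abc'}
```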
internetarchive - Version 1.0.7

Published by jjjake about 8 years ago

Features and Improvements

  • Added internetarchive.api.get_user_info().
internetarchive - Version 1.0.5

Published by jjjake over 8 years ago

Features and Improvements

  • All metadata writes are now submitted at -5 priority by default. This is friendlier to the archive.org catalog, and should only be changed for one-off metadata writes.
  • Expanded scope of valid identifiers in utils.validate_ia_identifier (i.e. ia upload). Periods are now allowed. Periods, underscores, and dashes are not allowed as the first character.
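
The identifier rules above can be approximated with a short check. This is a sketch, not the pattern used by utils.validate_ia_identifier: the function is_valid_identifier is hypothetical, the allowed character set (ASCII letters and digits plus period, underscore, and dash) is an assumption, and the default length limit of 100 comes from the 1.0.8 release note rather than from this release.

```python
import re

# First character must be a letter or digit; later characters may also be
# a period, underscore, or dash. The exact character set is an assumption.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_identifier(identifier, max_length=100):
    """Return True if identifier satisfies the sketched rules."""
    if len(identifier) > max_length:
        return False
    return _IDENTIFIER_RE.fullmatch(identifier) is not None


print(is_valid_identifier("my.item_2024-01"))  # True
print(is_valid_identifier(".hidden"))          # False
```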
internetarchive - Version 1.0.4

Published by jjjake over 8 years ago

Features and Improvements

  • Search now uses the v1 scraping API endpoint.
  • Moved internetarchive.item.Item.upload.iter_directory() to internetarchive.utils.
  • Added support for downloading "on-the-fly" files (e.g. EPUB, MOBI, and DAISY) via ia download <id> --on-the-fly or item.download(on_the_fly=True).

Bugfixes

  • s3_is_overloaded() now returns True if the call is unsuccessful.
  • Fixed bug in upload where a derive task wasn't being queued when a directory is uploaded.
Package Rankings
Top 1.8% on Pypi.org
Top 6.67% on Proxy.golang.org
Top 30.18% on Formulae.brew.sh