internetarchive

A Python and Command-Line Interface to Archive.org

AGPL-3.0 License

Downloads: 161.6K
Stars: 1.6K
Committers: 55


internetarchive - Version 4.1.0 Latest Release

Published by jjjake 5 months ago

What's Changed

Full Changelog: https://github.com/jjjake/internetarchive/compare/v4.0.1...v4.1.0

internetarchive - Version 4.0.1

Published by jjjake 6 months ago

Features and Improvements

  • Partially downloaded files will now automatically resume where they left off when retried.
  • Use Last-Modified header to set all mtimes (this includes files.xml now).
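
The two behaviors above can be pictured with a short sketch. This is not the library's implementation: the helper names resume_headers and apply_last_modified are hypothetical, and the sketch assumes the server honors HTTP Range requests and sends a standard Last-Modified header.

```python
import os
from email.utils import parsedate_to_datetime


def resume_headers(path):
    """Return a Range header that continues after the bytes already on disk.

    Hypothetical helper, not part of internetarchive.
    """
    size = os.path.getsize(path) if os.path.exists(path) else 0
    # An empty dict means "no partial file, request the whole thing".
    return {"Range": f"bytes={size}-"} if size > 0 else {}


def apply_last_modified(path, last_modified):
    """Set the file's atime and mtime from an HTTP Last-Modified value."""
    ts = parsedate_to_datetime(last_modified).timestamp()
    os.utime(path, (ts, ts))
    return ts
```

A caller would merge resume_headers(path) into the request headers, append the response body to the partial file, and then call apply_last_modified with the header value from the response.
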
internetarchive - Version 3.7.0

Published by jjjake 7 months ago

Features and Improvements

  • Added support for JSON Patch test operations, via the expect parameter.
  • Added support for moving values via --append-list. Rather than ignoring a request whose value is already present, --append-list now moves that value to the end of the list.
  • Switched to importlib-metadata to drop deprecated pkg_resources.
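
As a rough illustration of the first two items, the sketch below shows the "move to the end" list semantics and a JSON Patch document that uses a test operation (RFC 6902) as a precondition. The function names are hypothetical, and how the expect parameter is translated into test operations inside the library is an assumption here.

```python
import json


def append_move(values, value):
    """Return a copy of values with value moved (or added) to the end."""
    return [v for v in values if v != value] + [value]


def patch_with_expectation(path, expected, new_value):
    """Build a JSON Patch that only applies if path currently holds expected."""
    return [
        # A failing "test" operation aborts the whole patch.
        {"op": "test", "path": path, "value": expected},
        {"op": "replace", "path": path, "value": new_value},
    ]


if __name__ == "__main__":
    print(append_move(["maps", "atlases", "geography"], "maps"))
    print(json.dumps(patch_with_expectation("/title", "Old title", "New title")))
```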

Bugfixes

  • Fixed automatic size hint on uploads.
  • Fixed bug where auth wasn't being sent for searches with user_aggs params.
internetarchive - Version 3.4.0

Published by jjjake over 1 year ago

Features and Improvements

  • Added parameters for filtering files based on their source value in files.xml.
  • Added support for downloading multiple files to stdout.
  • Added timeout parameter to download.
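
Filtering on the source value can be sketched as follows. This is an illustration only: filter_by_source and its include/exclude parameters are hypothetical names, and the file records are assumed to be dicts carrying the source value found in files.xml (for example original, derivative, or metadata).

```python
def filter_by_source(files, include=None, exclude=None):
    """Keep files whose 'source' value passes the include/exclude sets."""
    include = set(include) if include else None
    exclude = set(exclude or ())
    kept = []
    for f in files:
        source = f.get("source")
        if include is not None and source not in include:
            continue
        if source in exclude:
            continue
        kept.append(f)
    return kept


if __name__ == "__main__":
    files = [
        {"name": "scan.pdf", "source": "original"},
        {"name": "scan_djvu.txt", "source": "derivative"},
        {"name": "item_meta.xml", "source": "metadata"},
    ]
    print([f["name"] for f in filter_by_source(files, include=["original"])])
```
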
internetarchive - Version 3.3.0

Published by jjjake over 1 year ago

Features and Improvements

  • Added support for inserting metadata into an existing multi-value metadata
    field. Unlike ia metadata <id> --modify collection[0]:foo, which overwrites
    the value at that position, ia metadata <id> --insert collection[0]:foo
    inserts foo as the first collection and keeps the existing values.
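
The difference between the two options comes down to overwriting a list position versus inserting at it. A minimal sketch on plain Python lists, with hypothetical function names that are not part of the library:

```python
def modify_at(values, index, value):
    """Overwrite the element at index (the value that was there is lost)."""
    out = list(values)
    out[index] = value
    return out


def insert_at(values, index, value):
    """Insert value at index, shifting the existing elements to the right."""
    out = list(values)
    out.insert(index, value)
    return out


if __name__ == "__main__":
    collections = ["opensource", "community"]
    print(modify_at(collections, 0, "foo"))
    print(insert_at(collections, 0, "foo"))
```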

Bugfixes

  • Fixed bug in search where queries submitted to the files index always
    returned a timeout when they matched more than 10,000 results.
internetarchive - Version 3.2.0

Published by jjjake almost 2 years ago

Features and Improvements

  • Added support for admins to delete reviews via itemname.
internetarchive - Version 3.1.0

Published by jjjake almost 2 years ago

Bugfixes

  • Fixed bug in ia search --fts where --itemlist was printing empty lines.
  • Fixed bug in ia search --fts where -p scope:all was not working.
  • Fixed directory creation race conditions in download.
  • Fixed bug in ia download --stdout where nothing would be printed to stdout
    if the specified file existed on disk.
  • Fixed bug that made it impossible to upload to user items.
  • Fixed memoryview error when running Item.upload with StringIO input
    and verbose=True.
  • Fixed bug in upload where a period was not being expanded properly to the
    contents of the current directory.

Features and Improvements

  • Added support for admins to delete other users' reviews.
  • Added support for excluding files in ia download via the --exclude parameter.
  • Various refactoring and code simplifications.
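
Exclusion of this kind can be sketched with glob matching from the standard library. The sketch assumes the exclude value is a glob pattern applied to file names; select_files and its parameters are hypothetical and do not mirror the library's internals.

```python
from fnmatch import fnmatchcase


def select_files(names, glob_pattern="*", exclude_pattern=None):
    """Return names matching glob_pattern but not exclude_pattern."""
    selected = []
    for name in names:
        if not fnmatchcase(name, glob_pattern):
            continue
        if exclude_pattern and fnmatchcase(name, exclude_pattern):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["book.pdf", "book_meta.xml", "book_files.xml", "cover.jpg"]
    print(select_files(names, exclude_pattern="*.xml"))
```
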
internetarchive - Version 3.0.2

Published by jjjake over 2 years ago

Bugfixes

  • Fixed bug where installation would fail in some cases if requests, tqdm,
    or jsonpatch were not already installed.
internetarchive - Version 3.0.1

Published by jjjake over 2 years ago

Features and Improvements

  • Cut down on the number of HTTP requests made by search.
  • Added Python type hints, and other Python 3 improvements.
internetarchive - Version 3.0.0

Published by jjjake over 2 years ago

Breaking changes

  • Removed Python 2.7, 3.5, and 3.6 support.
  • ia download no longer has a --verbose option, and --silent has been renamed to --quiet.
  • internetarchive.download, Item.download and File.download no longer have a silent
    keyword argument. They are silent by default now unless verbose is set to True.

Features and Improvements

  • page parameter is no longer required if rows parameter is specified in search requests.
  • advancedsearch.php endpoint now supports IAS3 authorization.
  • ia upload now has a --keep-directories option to use the full local file paths as the
    remote name.
  • Added progress bars to ia download.

Bugfixes

  • Fixed treatment of list-like file metadata in ia list under Python 3.
  • Fixed ia upload --debug only displaying the first request.
  • Fixed uploading from stdin crashing with UnicodeDecodeError or TypeError exception.
  • Fixed ia upload silently ignoring exceptions.
  • Fixed uploading from a spreadsheet with a BOM (UTF-8 byte-order mark) raising a KeyError.
  • Fixed uploading from a spreadsheet not reusing the identifier column.
  • Fixed uploading from a spreadsheet not correctly dropping the item column from metadata.
  • Fixed uploading from a spreadsheet with --checksum crashing on skipped files.
  • Fixed minor bug in S3 overload check on upload error retries.
  • Fixed various messages being printed to stdout instead of stderr.
  • Fixed format selection for on-the-fly files.
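
The BOM-related KeyError mentioned above is easy to reproduce with the standard csv module, independent of this library. In the sketch below (read_rows is a hypothetical helper), decoding as utf-8 leaves the byte-order mark glued to the first column name, while utf-8-sig strips it.

```python
import csv
import io

# A tiny spreadsheet that starts with a UTF-8 byte-order mark.
RAW = b"\xef\xbb\xbfidentifier,file\nmy-item,a.txt\n"


def read_rows(data, encoding):
    """Decode CSV bytes with the given encoding and return rows as dicts."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


if __name__ == "__main__":
    print(list(read_rows(RAW, "utf-8")[0].keys()))
    print(list(read_rows(RAW, "utf-8-sig")[0].keys()))
```
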
internetarchive - Version 2.3.0

Published by jjjake over 2 years ago

Features and Improvements

  • Added support for IA_CONFIG_FILE environment variable to specify the configuration file path.
  • Added --no-derive option to ia copy and ia move.
  • Added --no-backup option to ia copy, ia move, ia upload, and ia delete.

Bugfixes

  • Fixed bug where queries to the Scrape API (e.g. most search requests made by internetarchive)
    would fail to return all docs without any error reporting, if the Scrape API times out.
    All queries to the Scrape API are now tested to assert the number of docs returned matches the
    hit count returned by the Scrape API.
    If these numbers don't match, an exception is thrown in the Python API and the CLI exits with
    a non-zero exit code and error message.
  • Use .archive.org as the default cookie domain. This fixes a bug where an AttributeError exception
    would be raised if a cookie wasn't set in a config file.
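
The completeness check described for the Scrape API amounts to comparing the number of documents collected with the reported hit count. A minimal sketch, in which IncompleteResultsError and collect_docs are hypothetical names rather than the library's own:

```python
class IncompleteResultsError(Exception):
    """Raised when fewer (or more) docs arrive than the reported hit count."""


def collect_docs(pages, hit_count):
    """Flatten paged results and verify the total against hit_count."""
    docs = [doc for page in pages for doc in page]
    if len(docs) != hit_count:
        raise IncompleteResultsError(
            f"expected {hit_count} docs, received {len(docs)}"
        )
    return docs
```

A command-line wrapper would catch this exception, print the message to stderr, and exit with a non-zero status.
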
internetarchive - Version 2.2.0

Published by jjjake almost 3 years ago

Features and Improvements

  • Added ia reviews <id> --delete.
  • Added ability to fetch a user's reviews from an item via ia reviews <id>.

Bugfixes

  • Fixed bug in ArchiveSession object where domains weren't getting set properly for cookies.
    This caused archive.org cookies to be sent to other domains.
  • Fixed bug in URL param parser for CLI.
  • Fixed Python 2 bug in ia upload --spreadsheet.
internetarchive - Version 2.1.0

Published by jjjake about 3 years ago

Features and Improvements

  • Better error messages in ia upload --spreadsheet.
  • Added support for REMOTE_NAME in ia upload --spreadsheet via a REMOTE_NAME column.
  • Implemented XDG Base Directory specification.

Bugfixes

  • Fixed bug in FTS where searches would crash with a TypeError exception.
  • Improved Python 2 compatibility.
internetarchive - Version 2.0.1

Published by jjjake over 3 years ago

Bugfixes

  • Exit with 0 in ia tasks --cmd ... if a task is already queued or running.
internetarchive - Version 2.0.0

Published by jjjake over 3 years ago

Features and Improvements

  • Added automatic paging (scrolling) to ia search --fts.
  • Default support for lucene queries in ia search --fts.
  • Added support for getting rate-limit information from the Tasks API (i.e. ia tasks --get-rate-limit --cmd derive.php).
  • Added ability to set a remote-filename in a spreadsheet when uploading via ia upload --spreadsheet ....

Bugfixes

  • Fixed bug in ia metadata --remove ... where multiple collections would be removed
    if the specified collection was a substring of any of the existing collections.
  • Fixed bug in ia metadata --remove ... where removing multiple collections was sometimes
    not supported.
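
The substring problem described in the first bugfix is a common pitfall. The sketch below contrasts a substring test with an exact comparison on plain lists; both function names are hypothetical, and the "buggy" variant only imitates the reported symptom rather than reproducing the original code.

```python
def remove_by_substring(collections, target):
    """Imitates the symptom: drops every collection that contains target."""
    return [c for c in collections if target not in c]


def remove_exact(collections, targets):
    """Drop only collections that exactly equal one of the targets."""
    targets = set(targets)
    return [c for c in collections if c not in targets]


if __name__ == "__main__":
    collections = ["maps", "maps_usgs", "texts"]
    print(remove_by_substring(collections, "maps"))
    print(remove_exact(collections, ["maps"]))
```
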
internetarchive - Version 1.9.9

Published by jjjake over 3 years ago

Features and Improvements

  • Added beta support for FTS API.
  • Validate identifiers in spreadsheet before uploading file with ia upload --spreadsheet.
  • Added ia configure --print-cookies.
    This is helpful for using your archive.org cookies in other programs like curl.
    e.g. curl -b $(ia configure --print-cookies) <url> ...
internetarchive - Version 1.9.6

Published by jjjake almost 4 years ago

Features and Improvements

  • Added ability to submit tasks with a reduced priority.
  • Added ability to add headers to modify_metadata requests.

Bugfixes

  • Bumped version requirements for six.
    This addresses the "No module named collections_abc" error.
internetarchive - Version 1.9.4

Published by jjjake over 4 years ago

Features and Improvements

  • Added support for adding file-level metadata at time of upload.
  • Added --no-backup to ia upload to turn off backups.

Bugfixes

  • Fixed bug in internetarchive.get_tasks where no tasks were returned unless catalog or history params were provided.
  • Fixed bug in upload where headers were being reused in certain cases.
    This led to issues such as queue-derive being turned off in some cases.
  • Fixed crash in ia tasks when a task log contains an invalid UTF-8 character.
  • Fixed bug in upload where requests were not being closed.
internetarchive - Version 1.9.3

Published by jjjake over 4 years ago

Features and Improvements

  • Added support for removing items from simplelists as if they were collections.
  • Added Item.derive() method for deriving items.
  • Added Item.fixer() method for submitting fixer tasks.
  • Added --task-args to ia tasks for submitting task args to the Tasks API.

Bugfixes

  • Minor bug fix in ia tasks to fix support for tasks that do not require a --comment option.
internetarchive - Version 1.9.2

Published by jjjake over 4 years ago

Features and Improvements

  • Switched to tqdm for progress bar (clint is no longer maintained).
  • Added Item.identifier_available() method for calling check_identifier.php.
  • Added support for opening details page in default browser after upload.
  • Added support for using item or identifier as column header in spreadsheet mode.
  • Added ArchiveSession.get_my_catalog() method for retrieving running/queued tasks.
  • Removed backports.csv requirement for newer Python releases.
  • Authorization header is now used for metadata reads, to support privileged access to /metadata.
  • ia download no longer downloads history dir by default.
  • Added ignore_history_dir to Item.download(). The default is False.

Bugfixes

  • Fixed bug in ia copy and ia move where filenames weren't being encoded/quoted correctly.
  • Fixed bug in Item.get_all_item_tasks() where all calls would fail unless a dict was provided to params.
  • Read from ~/.config/ia.ini with fallback to ~/.ia regardless of the existence of ~/.config.
  • Fixed S3 overload message always mentioning the total maximum number of retries, not the remaining ones.
  • Fixed bug where a KeyError exception would be raised on most calls to dark items.
  • Fixed bug where md5 was being calculated for every upload.
Package Rankings
Top 1.8% on Pypi.org
Top 6.67% on Proxy.golang.org
Top 30.18% on Formulae.brew.sh