
Create WebKit/Safari .webarchive files on any platform

MIT License



This is a quick hack demonstrating how to create WebKit/Safari .webarchive files, inspired by pocket-archive-stream.


TARGET_URL= python3

Why .webarchive?

.webarchive is the native web page archive format on the Mac, and is essentially a serialized snapshot of Safari/WebKit state. On a Mac, these files are Spotlight-indexable and can be opened by just about anything that takes a "webpage" as input.

Despite the rising prominence of WARC as the standard web archiving format (which to this day requires plug-ins to be viewable in a browser), I quite like .webarchive. I built this both to demonstrate how to use it and to have a minimally viable archive creator I can deploy as a service.

Anatomy of a .webarchive file

The file format is a nested binary .plist, with roughly the following structure:

    {
        "WebMainResource": {
            "WebResourceURL": String(),
            "WebResourceMIMEType": String(),
            "WebResourceResponse": NSKeyedArchiver(NSObject),
            "WebResourceData": Bytes(),
            "WebResourceTextEncodingName": String(optional=True)
        },
        "WebSubresources": [
            {item, item, item...}
        ]
    }
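
To get a feel for this layout, an existing archive can be loaded and summarized. The following is a minimal sketch, not part of this project: it uses Python's standard `plistlib` module, the function name `summarize_webarchive` is made up for illustration, and it assumes each subresource item carries the same `WebResourceURL` key as the main resource.

```python
import plistlib


def summarize_webarchive(data: bytes) -> dict:
    """Return a small summary of a .webarchive given its raw bytes.

    Hypothetical helper for illustration only.
    """
    # plistlib.loads auto-detects binary vs. XML property lists.
    archive = plistlib.loads(data)
    main = archive["WebMainResource"]
    # Assumption: subresource items use the same keys as the main resource.
    subresources = archive.get("WebSubresources", [])
    return {
        "url": main["WebResourceURL"],
        "mime_type": main["WebResourceMIMEType"],
        # The text encoding name is optional, so use .get() here.
        "encoding": main.get("WebResourceTextEncodingName"),
        "size": len(main["WebResourceData"]),
        "subresource_urls": [item["WebResourceURL"] for item in subresources],
    }
```

Reading a file in binary mode (`open(path, "rb").read()`) and passing the bytes to this function should be enough to see which URLs an archive contains.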


So creating a .webarchive turns out to be fairly straightforward if you simply build a dict with the right structure and then serialize it using biplist (which works on any platform).

The only hitch would be WebResourceResponse (which encodes the HTTP response headers in a rather more complex way), but fortunately that appears not to be necessary at all.
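
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than this project's actual code: it swaps in the standard library's `plistlib` (which can also write binary plists) in place of biplist so that it runs without extra dependencies, the function name `build_webarchive` and its parameters are hypothetical, and subresource items are assumed to use the same keys as the main resource. `WebResourceResponse` is left out entirely.

```python
import plistlib


def build_webarchive(url, html, subresources=()):
    """Return the bytes of a binary-plist .webarchive for a single page.

    Hypothetical helper for illustration only.
    subresources: iterable of (url, mime_type, data_bytes) tuples.
    """
    archive = {
        "WebMainResource": {
            "WebResourceURL": url,
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": "UTF-8",
            "WebResourceData": html.encode("utf-8"),
            # WebResourceResponse is deliberately omitted.
        },
        # Assumption: each subresource mirrors the main resource's keys.
        "WebSubresources": [
            {
                "WebResourceURL": sub_url,
                "WebResourceMIMEType": mime_type,
                "WebResourceData": data,
            }
            for sub_url, mime_type, data in subresources
        ],
    }
    # Serialize the nested dict as a binary property list.
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

Writing the returned bytes to a file with a `.webarchive` extension (opened in `"wb"` mode) would complete the process. With biplist, the final serialization step would be replaced by that library's own writer.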

Next Steps