This weekend I had the pleasure (pain) of working with an API that offers effectively unlimited storage, except it has little documentation and an even smaller support community. But it’s basically free S3, when it works…

The Archive.org API is similar to S3: objects (files) live in a bucket (organizer) system. The base URL is s3.us.archive.org

Putting files there requires an odd curl syntax, with a plain key:secret authorization header (LOW [KEY]:[SECRET]) instead of signed requests. This may cause issues with some existing S3 wrappers/SDKs.

  • This syntax will both create a bucket and upload a file at the same time.
  • curl --location --header 'x-amz-auto-make-bucket:1' \
           --header 'x-archive-meta01-collection:opensource' \
           --header 'x-archive-meta-mediatype:texts' \
           --header "authorization: LOW [KEY]:[SECRET]" \
           --upload-file [/path/to/filename] http://s3.us.archive.org/[bucketname]/[filename]
    

    After this call, delay() or sleep() for 10 seconds or more, or poll in a loop (a mini-DDoS, really) until the bucket resource exists, before adding more files:

  • This will let you add more files to the above bucket:
  • curl --location --header "authorization: LOW [KEY]:[SECRET]" \
           --silent --show-error \
           --upload-file [/path/to/filename] http://s3.us.archive.org/[bucketname]/[filename]
    

    Make sure that a space separates each parameter from its value.
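If you would rather script this than shell out to curl, the two commands above can be approximated with Python's standard library. This is only a sketch under a few assumptions: the function names are mine, the metadata values are copied from the curl example, and urllib does not follow redirects on a PUT the way curl --location does, so behavior may differ if the server redirects.

```python
import urllib.parse
import urllib.request

S3_BASE = "http://s3.us.archive.org"  # base URL from the post


def build_upload_request(bucket, filename, data, key, secret, create_bucket=False):
    """Build (but do not send) a PUT request mirroring the curl commands.

    Hypothetical helper: `create_bucket=True` adds the same headers as the
    first curl command; otherwise only the authorization header is sent.
    """
    url = f"{S3_BASE}/{urllib.parse.quote(bucket)}/{urllib.parse.quote(filename)}"
    headers = {"authorization": f"LOW {key}:{secret}"}
    if create_bucket:
        headers.update({
            "x-amz-auto-make-bucket": "1",
            "x-archive-meta01-collection": "opensource",
            "x-archive-meta-mediatype": "texts",
        })
    # curl's --upload-file issues a PUT, so the method is set explicitly.
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def upload(request, timeout=60):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it makes it easy to print the URL and headers when something goes wrong, which with this API is often.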

  • Once the object is stored…
    • The endpoint to load its page is:

      http://archive.org/details/[bucketname]/[objectfilename]

    • The endpoint to download the content is:

      http://archive.org/download/[bucketname]/[objectfilename]
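For scripting, the two URL patterns above can be wrapped in small helpers. This is a minimal sketch; the function names are hypothetical, and percent-encoding the bucket and file names is my assumption about what is safest for names with spaces.

```python
from urllib.parse import quote


def details_url(bucket, filename):
    """Page URL for a stored object, following the pattern in the post."""
    return f"http://archive.org/details/{quote(bucket)}/{quote(filename)}"


def download_url(bucket, filename):
    """Direct download URL for a stored object, following the pattern in the post."""
    return f"http://archive.org/download/{quote(bucket)}/{quote(filename)}"
```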

  • The server is very unreliable. If you followed the steps above, put sleep() or delay() calls between requests, and things still don’t work, archive.org has most likely hit a blip, usually in retrieval rather than storage.
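The waiting and retrying described above can be sketched as a polling loop with growing delays, which is a little kinder to the server than hammering it. Everything here is an assumption of mine rather than anything archive.org documents: the function names, the 10-second starting delay (taken from the advice above), the doubling factor, and the idea of using a HEAD request on the download URL as the "does it exist yet" check.

```python
import time
import urllib.request


def resource_exists(url, timeout=30):
    """Return True if a HEAD request to `url` gets a 2xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib's HTTPError/URLError and socket timeouts all derive from OSError.
        return False


def wait_until(check, attempts=6, first_delay=10.0, factor=2.0, sleep=time.sleep):
    """Call `check()` until it returns True, sleeping longer after each failure.

    Returns True as soon as the check passes, or False once `attempts`
    checks have failed. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    delay = first_delay
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final try
            sleep(delay)
            delay *= factor
    return False
```

A call might look like wait_until(lambda: resource_exists("http://archive.org/download/[bucketname]/[filename]")), run after the first upload and before adding more files. The same wrapper works around a flaky download: pass a check that attempts the fetch and returns whether it succeeded.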