Overview

The WARC Software Development Kit (WSDK) is a set of simple, compact (~300K), and highly optimized
Erlang modules for manipulating the WARC file format (ISO 28500:2009).
WSDK can read, validate, transform, and write both plain (i.e., uncompressed) and GZIP-compressed WARC files.
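
To give a feel for what reading a WARC record involves, here is a minimal sketch that splits one uncompressed record into its version line, named fields, and content block. It is not WSDK's API: the module name warc_sketch and the function parse_record/1 are hypothetical, and the code uses only the Erlang standard library. It assumes exact-case field names and no folded header lines, and it crashes with a badmatch on malformed input rather than validating it.

```erlang
-module(warc_sketch).
-export([parse_record/1]).

%% Hypothetical helper, not part of WSDK.
%% Parses the first record of an uncompressed WARC binary and returns
%% {Version, Fields, Block, Rest}. Malformed input raises badmatch.
parse_record(Bin) ->
    %% The record header ends at the first empty line (CRLF CRLF).
    [Head, Body] = binary:split(Bin, <<"\r\n\r\n">>),
    [Version | Lines] = binary:split(Head, <<"\r\n">>, [global]),
    %% The first line is the version, e.g. <<"WARC/1.0">>.
    <<"WARC/", _/binary>> = Version,
    Fields = [parse_field(Line) || Line <- Lines],
    %% Content-Length gives the size of the content block in bytes.
    {_, LenBin} = lists:keyfind(<<"Content-Length">>, 1, Fields),
    Len = binary_to_integer(LenBin),
    %% The content block is followed by two CRLFs that close the record.
    <<Block:Len/binary, "\r\n\r\n", Rest/binary>> = Body,
    {Version, Fields, Block, Rest}.

%% Splits "Name: value" at the first colon and trims the value.
parse_field(Line) ->
    [Name, Value] = binary:split(Line, <<":">>),
    {Name, string:trim(Value)}.
```

Calling warc_sketch:parse_record/1 again on the returned Rest walks through the following records. GZIP-compressed files would need to be decompressed first, which this sketch leaves out.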
Immediate benefits for your programs are:
  • Fault tolerance
  • Ability to combine heterogeneous tasks
  • Extremely fast response times
  • Access to an efficient networking stack
  • Debugging and fixing while running (no downtime)
  • High scalability: a rock-solid base for cloud developments
WSDK is a free software API brought to you by Aleph Archives.
It comes with a documented API, a large number of unit tests (UnitTestReport), examples, and support.
The core library was extracted and adapted from CAMA, Aleph Archives' commercial web archiving platform.
WSDK is standalone (32/64-bit), multicore-aware, portable (Linux, Windows, OS X, etc.), and has no external dependencies.
Other modules and software will be released soon at WebArchivingBucket. Stay tuned.

Alternatives

Some open-source alternatives exist:

Project        Language  Company               Activity      Multicore-Aware  Unit Tests  Code Maturity  Last Commit
WARC-Tools     C         Hanzo + IA            dead project  no               -           -              2009-06-23
WARC-Tools-Py  Python    Hanzo                 medium        no               -           -              2012-03-02
JWAT           Java      Royal Danish Library  high          no               extensive   young          2012-08-26
