Overview

The WARC Search Engine (WSE for short) is a scalable Erlang server that lets you index all your WARC files in a distributed manner.
WSE uses Elasticsearch as its default backend to provide linear scalability.
Whether you have ten, a thousand, or a million WARC files, WSE lets you index them all in parallel across multiple nodes.
For maximum performance, indexing is parallel not only across WARC files but also within a single WARC file,
where multiple WARC records are indexed at a time.
Moreover, WSE supports both plain and compressed WARC files, thanks to WSDK.
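To illustrate record-level parallelism over plain and compressed files, here is a minimal Python sketch. It is not WSE's or WSDK's actual implementation: the function names (open_warc, iter_records, index_record, index_warc) are hypothetical, compression is assumed to be gzip and is detected from the two-byte gzip magic number, and the "indexer" is a stand-in that only reports each record's ID and payload size.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_warc(path):
    """Open a WARC file for binary reading, plain or gzip-compressed."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")  # also reads concatenated gzip members
    return open(path, "rb")


def iter_records(stream):
    """Yield (headers, payload) for each WARC record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers.get("Content-Length", 0))
        yield headers, stream.read(length)


def index_record(record):
    """Hypothetical stand-in for an indexer: returns (record ID, payload size)."""
    headers, payload = record
    return headers.get("WARC-Record-ID"), len(payload)


def index_warc(path, workers=4):
    """Index the records of one WARC file concurrently, preserving record order."""
    with open_warc(path) as stream, ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(index_record, iter_records(stream)))
```

Records are read sequentially by the calling thread and handed to a pool of workers, so the file is parsed once while the per-record work runs concurrently. Distributing whole files across nodes would sit one level above this sketch.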
Immediate benefits for your programs are:
  • Linear scalability
  • Search capabilities based on Apache Lucene
  • Built-in backpressure support on connections that are indexing too fast
  • A simple, intuitive API (two function calls)
  • Live debugging and fixing while the server is running (no downtime)
  • All features available through a RESTful JSON API
  • No Single Point of Failure (SPOF)
  • Apache Tika for detecting and extracting metadata and structured text content from HTML, PDF, Word, PPT, etc.
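The backpressure item in the list above can be pictured with a bounded queue: a producer that submits work faster than it can be indexed is made to wait until there is room. The Python sketch below shows only this general technique. How WSE implements backpressure internally is not described here, and run_pipeline and its capacity parameter are hypothetical names chosen for illustration.

```python
import queue
import threading


def run_pipeline(items, capacity=2):
    """Feed items through a bounded queue; return (processed items, peak queue depth)."""
    q = queue.Queue(maxsize=capacity)  # the bound is what creates backpressure
    processed = []

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                return
            processed.append(item)  # stand-in for the real indexing work

    worker = threading.Thread(target=consumer)
    worker.start()

    peak = 0
    for item in items:
        q.put(item)  # blocks while the queue is full, slowing the producer
        peak = max(peak, q.qsize())
    q.put(None)
    worker.join()
    return processed, peak
```

Whatever the producer's speed, the number of pending items never exceeds the chosen capacity, so memory use stays bounded and the fast side is throttled to the pace of the slow side.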
WSE is a free software API brought to you by Aleph Archives.
It comes with a simple API, examples, and support.
WSE is standalone (32/64-bit), multicore-aware, portable (Linux, Windows, OS X, etc.), and requires no external dependencies.
