This project is read-only.

Who Are We
Neuzilla is the studio behind Toxy. For details, you can check

What's Toxy
Toxy is a .NET data/text extraction framework, similar to Apache Tika in the Java world. It supports many popular formats, such as docx, xlsx, xls, pdf, csv, txt, epub and html.
Toxy Supported File Formats

Why Toxy
In the past, we had to use IFilter, a Windows-only interface, to extract text from other documents. Toxy, by contrast, is platform independent: it aims to support not only Windows but also Linux (with Mono installed). Toxy is also designed to be easy to use. You don't need to worry much about the extension of the file you are extracting, because the framework identifies the file format and extracts the data/text into a set of unified structures.
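Identifying a file's format without trusting its extension is usually done by content sniffing. A parser looks at the file's leading bytes and, for ZIP containers, at the entry names. The following is a minimal Python illustration under stated assumptions. It is not Toxy's API or its actual detection logic, and `detect_format` and the returned format labels are hypothetical. The signatures it checks (`%PDF-`, `PK\x03\x04`, and the OLE2 header `D0 CF 11 E0 A1 B1 1A E1`) are the standard ones for those formats.

```python
import io
import zipfile

# Standard signature of the OLE2 compound file format (legacy xls/doc/ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_format(data: bytes) -> str:
    """Guess a format label from file content, ignoring the file name.

    Hypothetical helper for illustration only; not part of Toxy.
    """
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        # xls, doc and ppt share this container; telling them apart needs a
        # compound-file reader, which this sketch does not include.
        return "ole2"
    if data.startswith(b"PK\x03\x04"):
        # docx, xlsx and epub are all ZIP archives, so inspect their entries.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
                if "word/document.xml" in names:
                    return "docx"
                if "xl/workbook.xml" in names:
                    return "xlsx"
                if "mimetype" in names and zf.read("mimetype").strip() == b"application/epub+zip":
                    return "epub"
        except zipfile.BadZipFile:
            return "unknown"
        return "zip"
    # Anything else: treat valid UTF-8 (with or without a BOM) as text,
    # then sniff for an HTML prologue.
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return "unknown"
    head = text.lstrip().lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    return "txt"
```

Because only the bytes are inspected, a .docx file renamed to report.dat would still be reported as docx. An extraction framework can then choose the matching parser from the detected label.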
Toxy Road Map

Toxy on SNS
QQ group: 297128022
Latest Source Code
Codeplex: (synced with GitHub periodically)

Unified Data Structures

For documents, the data structure is called ToxyDocument.
For spreadsheets, the data structure is called ToxySpreadsheet.
For emails, the data structure is called ToxyEmail.
For business cards, the data structure is called ToxyBusinessCard.
For DOM-based structures, the data structure is called ToxyDom.
For metadata, the data structure is called ToxyMetadata (since Toxy 1.3).
For slideshows, the data structure is called ToxySlideshow (since Toxy 1.5).
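A unified structure helps the caller because parsers for different source formats return a shared type, so downstream code never branches on the format. The following minimal Python sketch shows this with a plain-text parser and a CSV parser. The classes `Document`, `Table` and `Spreadsheet`, their fields, and the functions `parse_txt` and `parse_csv` are hypothetical stand-ins loosely named after ToxyDocument and ToxySpreadsheet. Toxy's real .NET classes have their own members, which this sketch does not reproduce.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document structure such as ToxyDocument."""
    paragraphs: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Join paragraphs back into plain text, one per line.
        return "\n".join(self.paragraphs)


@dataclass
class Table:
    """One named sheet: a list of rows, each row a list of cell strings."""
    name: str
    rows: list[list[str]] = field(default_factory=list)


@dataclass
class Spreadsheet:
    """Hypothetical stand-in for ToxySpreadsheet: a workbook of tables."""
    tables: list[Table] = field(default_factory=list)


def parse_txt(text: str) -> Document:
    # Each non-blank line becomes one paragraph.
    return Document([line for line in text.splitlines() if line.strip()])


def parse_csv(text: str, name: str = "Sheet1") -> Spreadsheet:
    # A CSV file maps to a spreadsheet containing a single table.
    rows = [row for row in csv.reader(io.StringIO(text))]
    return Spreadsheet([Table(name, rows)])
```

Under this design, an xlsx or xls parser would also return `Spreadsheet`. Code that iterates over `tables` and `rows` would then work unchanged for any spreadsheet source.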


Last edited Mar 22, 2015 at 8:52 PM by tonyqus, version 30