Toxy 1.4 - Having trouble with PDF, RTF, PPTX


Hi, I am currently evaluating your Toxy component via the Extraction Viewer application. I installed it via NuGet and got Toxy version 1.4; it is not clear why it wasn't v1.5. In any case, I am having trouble opening the following file formats:

PDF - I tried opening the PDF file found here, and the application hangs: http://www.digitalpreservation.gov/formats/digformatspecs/Word97-2007BinaryFileFormat(doc)Specification.pdf

RTF - I downloaded a sample RTF document to test the text extraction and got no text at all. The RTF document I tried is here: http://www.snake.net/software/RTF/Old/RTF-Spec-1.2.rtf

PPTX - In toxy-master.zip on GitHub there is a PPTX presentation, "Toxy Framework.pptx", which I thought I would try as an example of a PPTX file. Here I get an exception: ".pptx is not supported for CreateText"

Maybe I am doing something wrong? Any comments would be appreciated! Thanks


tonyqus wrote Aug 3, 2015 at 9:41 PM

We will try to fix some issues for PDF extraction.

The PPTX extraction failure looks like a bug. It appears to be an internal issue thrown by NPOI.

tonyqus wrote Aug 26, 2015 at 10:43 PM

I tested the RTF case. It's actually a bug. We will fix it in 1.6.