python - PDFMiner tool pdf2txt garbling data order

I want to extract data from PDF files. I'm using the PDFMiner tool pdf2txt to convert a PDF to plain text. The text file it produces has the data in a messed-up order (wherever a table is encountered, and after it as well). I tried converting the PDF to HTML but, alas, got the same results. I'm new to Python and couldn't understand the extensive workings of the PDFMiner library. Is there a way to preserve the order of the data?

Try running the script with these additional args: -M 30 -W .95 -L .03
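For anyone calling the library from Python rather than from the command line, the same three values can be passed as layout parameters. This is only a minimal sketch, and it assumes the maintained pdfminer.six fork, which provides pdfminer.high_level.extract_text (the original PDFMiner release does not). The names LAYOUT_SETTINGS and pdf_to_text are made up for illustration. In pdf2txt, -M is the character margin, -W the word margin, and -L the line margin.

```python
import sys

# Layout-analysis values taken from the suggested flags:
#   -M 30  -> char_margin
#   -W .95 -> word_margin
#   -L .03 -> line_margin
# LAYOUT_SETTINGS is a hypothetical name used only in this sketch.
LAYOUT_SETTINGS = {
    "char_margin": 30.0,
    "word_margin": 0.95,
    "line_margin": 0.03,
}


def pdf_to_text(pdf_path, settings=LAYOUT_SETTINGS):
    """Return the text of pdf_path, extracted with the given layout settings."""
    # Assumption: pdfminer.six is installed. The imports live inside the
    # function so this file still loads on a machine without it.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    return extract_text(pdf_path, laparams=LAParams(**settings))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py some_file.pdf
    print(pdf_to_text(sys.argv[1]))
```

A large character margin makes the layout analyser merge characters that sit far apart on the same line into one text line, which is probably why it helps with table rows. The right values are likely to differ from one PDF to another, so treat these as a starting point.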

I have had the same problem you described, and this improved the output a lot. However, I got better results with pdftotext.exe, which is part of Xpdf. Download it here:

http://www.foolabs.com/xpdf/download.html

mike
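If you go the pdftotext route, it can be driven from a Python script with the standard subprocess module. The sketch below is an illustration, not part of the answer above. It assumes the pdftotext executable is on your PATH, and it uses pdftotext's -layout option, which the answer does not mention, to ask the tool to keep the original physical layout. The function names are hypothetical.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True, exe="pdftotext"):
    """Build the argument list for a pdftotext call without running it."""
    cmd = [exe]
    if keep_layout:
        # -layout asks pdftotext to maintain the original physical layout,
        # which tends to keep table columns on the same line.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def run_pdftotext(pdf_path, txt_path, keep_layout=True):
    """Convert pdf_path to txt_path; raises CalledProcessError on failure."""
    # Assumption: the pdftotext executable can be found on the PATH.
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )
```

For example, run_pdftotext("report.pdf", "report.txt") would write the extracted text next to the PDF. Both file names here are placeholders.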
