python - PDFMiner tool pdf2txt garbling data order


I want to extract data from PDF files. I'm using the PDFMiner tool pdf2txt to convert a PDF to plain text. The text file it produces has the data in a messed-up order (wherever a table is encountered, and after it as well). I tried converting the PDF to HTML but, alas, got the same results. I'm new to Python and couldn't understand the extensive workings of the PDFMiner library. Is there a way to preserve the order of the data?

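Try running the script with these additional args: -M 30 -W .95 -L .03 (these set the char margin, word margin, and line margin; the layout options are uppercase, while lowercase -m limits the number of pages).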
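If you would rather drive the conversion from Python than type the command each time, a small wrapper along these lines may help. It is only a sketch: it assumes pdf2txt.py is on your PATH and that the layout options are the uppercase -M (char margin), -W (word margin), and -L (line margin), with -o naming the output file. The function names `build_pdf2txt_command` and `convert_with_pdf2txt` are made up for this example, and the margin values are simply the ones suggested above, so expect to tune them per document.

```python
import subprocess


def build_pdf2txt_command(pdf_path, out_path,
                          char_margin=30, word_margin=0.95, line_margin=0.03):
    """Return the pdf2txt.py argument list with custom layout margins.

    Hypothetical helper: assumes pdf2txt.py is reachable on PATH.
    """
    return [
        "pdf2txt.py",
        "-M", str(char_margin),   # char margin: how far apart chars may be and still join a line
        "-W", str(word_margin),   # word margin: gap at which a space is inserted between words
        "-L", str(line_margin),   # line margin: how close lines must be to join a text box
        "-o", out_path,           # write the extracted text to this file
        pdf_path,
    ]


def convert_with_pdf2txt(pdf_path, out_path, **margins):
    """Run pdf2txt.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdf2txt_command(pdf_path, out_path, **margins), check=True)


# Example call (not executed here; needs a real PDF and PDFMiner installed):
# convert_with_pdf2txt("report.pdf", "report.txt", char_margin=30)
```

Keeping the command construction separate from the call makes it easy to print the exact command and try other margin values when a table still comes out scrambled.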

I had the same problem you describe, and these arguments improved the output a lot. However, I got better results with pdftotext.exe, which is part of Xpdf. You can download it here:

http://www.foolabs.com/xpdf/download.html
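For comparison, pdftotext can be called from Python in the same way. This is a sketch under assumptions: it expects the pdftotext executable to be on your PATH, and it adds the -layout option, which asks pdftotext to keep the physical layout of the page. That option is my suggestion for tables, not something tested against your files. The function names `build_pdftotext_command` and `convert_with_pdftotext` are hypothetical.

```python
import subprocess


def build_pdftotext_command(pdf_path, out_path, keep_layout=True):
    """Return the pdftotext argument list.

    Hypothetical helper: assumes the Xpdf pdftotext binary is on PATH.
    """
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout tries to preserve the original physical layout,
        # which often keeps table columns in reading order.
        cmd.append("-layout")
    cmd += [pdf_path, out_path]
    return cmd


def convert_with_pdftotext(pdf_path, out_path, keep_layout=True):
    """Run pdftotext and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftotext_command(pdf_path, out_path, keep_layout), check=True)


# Example call (not executed here; needs a real PDF and pdftotext installed):
# convert_with_pdftotext("report.pdf", "report.txt")
```

Running both converters on the same file and comparing the text output is a quick way to see which one handles your tables better.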

Mike

