python - PDFMiner tool pdf2txt garbling data order
I want to extract data from PDF files. I'm using the PDFMiner tool pdf2txt to convert the PDF to plain text. The text file it produces has the data in a messed-up order (wherever a table is encountered, and after it as well). I tried converting the PDF to HTML but, alas, got the same results. I'm new to Python and couldn't understand the extensive workings of the PDFMiner library. Is there a way to preserve the order of the data?
Try running the script with these additional args: -M 30 -W .95 -L .03
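For anyone calling the library from Python rather than the command line, here is a minimal sketch of the same settings. It assumes the pdfminer.six fork is installed, and that the three flags are pdf2txt's character, word, and line margin options, which map to the `char_margin`, `word_margin`, and `line_margin` fields of `LAParams`. The function name `extract_with_margins` is made up for illustration.

```python
# Assumption: pdfminer.six is installed (pip install pdfminer.six).
# The margin values mirror the command-line arguments suggested above.
LAYOUT_MARGINS = {
    "char_margin": 30,    # characters closer than this are joined into one line
    "word_margin": 0.95,  # gap above which a space is inserted between characters
    "line_margin": 0.03,  # lines closer than this are grouped into one text box
}


def extract_with_margins(pdf_path, margins=LAYOUT_MARGINS):
    """Return the text of pdf_path using the layout margins above (hypothetical helper)."""
    # Imported lazily so the module can be loaded without pdfminer present.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    return extract_text(pdf_path, laparams=LAParams(**margins))
```

Calling `extract_with_margins("report.pdf")` would return the extracted text as a string; "report.pdf" is only a placeholder file name. The values are a starting point and may need tuning per document.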
I have had the same problem you describe, and this improved the output a lot. However, I got better results with pdftotext.exe, which is part of Xpdf. Download it here:
http://www.foolabs.com/xpdf/download.html
mike
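If you want to drive pdftotext from a Python script, something like the following sketch may help. It assumes the `pdftotext` executable is on your PATH, and it adds the `-layout` option, which asks pdftotext to keep the original physical layout; that option is not mentioned in the answer above, so treat it as a suggestion to try. The function names are made up for illustration.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout keeps text positioned as on the page, which often helps with tables.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

For example, `convert("input.pdf", "output.txt")` would write the extracted text to output.txt; both file names are placeholders.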