Design & implementation of a PDF to Excel conversion tool (P2X)
by Penny, LaToyia DeVonne, M.S., OKLAHOMA STATE UNIVERSITY, 2008, 90 pages; 1467427

Abstract:

This study focuses on the implementation of a conversion tool, P2X, developed to automatically convert large batches of PDF tabular data (PDF tables) to spreadsheet format (MS Excel). We begin by introducing the PDF specification standards on table structure, followed by a scenario example of the problem and a description of the P2X architecture. Specific details of the algorithms and applications used during the PDF to plain text format (PTF) conversion process follow. We then give a brief overview of the reformatting process and a formalization of the table tags that we identified using regular expressions. Lastly, the User Interface section describes the GUI, its images, and its functionality.
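As an illustration of how regular expressions can recover table structure from plain text, the following minimal Java sketch splits one line of extracted text into cells. It is not the thesis's table-tag formalization, which the abstract does not spell out. The class name RowSplitter, the method splitRow, and the delimiter rule (a tab, or a run of two or more spaces, separates columns) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not taken from P2X. */
final class RowSplitter {
    // Assumed column delimiter: one or more tabs, or two or more spaces.
    // A single space is treated as part of the cell text.
    private static final Pattern COLUMN_GAP = Pattern.compile("\\t+| {2,}");

    /** Splits one plain-text table row into its cell strings. */
    static List<String> splitRow(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(COLUMN_GAP.split(trimmed));
    }

    public static void main(String[] args) {
        // The single space inside "Unit Price" does not start a new column.
        System.out.println(splitRow("Item   Qty    Unit Price"));
    }
}
```

A rule like this works only when the text extractor preserves column gaps as wider whitespace; real PDF output may need per-document tuning.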

Findings and conclusions. We have implemented a working conversion tool to show that converting PDF tabular data to MS Excel spreadsheets can be made simple through a graphical user interface with user interaction. The system was written in the high-level programming languages Java and Visual Basic 6.0, and these implementations are presented. A user's manual has been incorporated to validate the use of the system and reduce user error. More visuals of the P2X tool are included to further assist the user with the problems presented throughout this research. Although P2X proved to be a successful conversion approach, it was discovered at the end of the final testing phase that the text data stored in the output Excel spreadsheet file still needs minimal manual editing by the user to remove unwanted non-breaking spaces and to suit the individual user's storage preferences. These preferences are expected to vary on a case-by-case basis.
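The non-breaking-space cleanup mentioned above is left to the user in P2X. One way such a step could be automated is sketched below in Java. This is an assumption about a possible post-processing pass, not something the thesis implements; the class name CellCleaner and the method clean are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of P2X as described. */
final class CellCleaner {
    // U+00A0 is the non-breaking space. Java's \s class does not match it
    // by default, so it is listed explicitly next to ordinary whitespace.
    private static final Pattern SPACE_RUN = Pattern.compile("[\\s\\u00A0]+");

    /** Collapses runs of whitespace and non-breaking spaces to one space, then trims. */
    static String clean(String cell) {
        if (cell == null) {
            return "";
        }
        return SPACE_RUN.matcher(cell).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Prints: [Total Sales]
        System.out.println("[" + clean("\u00A0 Total\u00A0\u00A0Sales \u00A0") + "]");
    }
}
```

Applying a function like this to each cell before it is written to the spreadsheet would handle the whitespace issue, while storage preferences would still be adjusted by hand.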

 
Adviser: K. M. George
School: OKLAHOMA STATE UNIVERSITY
Source: MAI/ 47-06, p. , Sep 2009
Source Type: Thesis
Subjects: Computer science
Publication Number: 1467427

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:1467427