Nick Reddan – PDF Indexer Excel
Background
I wanted to create an index from a PDF file in a database format so it could be uploaded to a the web and made more accessible to more people. When you copy a phrase in a PDF reader it may contain additional cariage returns which when pasted to Excel create multiple cells. I developed a way of removing these extranious cells then developed macros to add the indexing classes and to rearrange "John Smith" to "SMITH, John". This latter rearrangement is necessary for genelogical purposes.
How it works
Set up
The defaults sheet data
First go to the "Defaults" sheet. Ther are two things you may want to change. You can enter up to 10 levels for the first level terms in your index. Typically you will have at least "Person" and "Place". You might include say "Famous Person". The macro uses them to name the buttons in the userforms that guide you through building the index.
The second parameter is the "size_bunch" which is the number of SQL insert statements in each column of the "SQL out" sheet. This sheet is used to load your data to an MySQL or other remote database.
The index sheet
The index sheet data
The index sheet contains you index data. It can be edited by hand say, to correct mistakes or remove duplicates.
It has buttons that can be used to run various macros. The first button "Index term in the clipboard and add row" copies the data in the clip board to the next available row and puts all the data into one cell. It then starts a series of userforms which allow you to decide the way these data will be indexed.
How you select the first level indexing term
Once you select the first level term another selection needs to be made. That is whether the data is a given name family name that needs to be reversed in the indexing process. If you say it is a family name indexing the family name will appear in colum "C" and the Given name in column "D".
Index Family name first
Select Volume part and Page
Finally you select the Volume, Part and Page numbers. The defaults for this selection are the values in these columns in the row immediately above the new one.
Creating the SQL
The SQL Base sheet
On this sheet you can add the name of the SQL table and its fields. There are then used to create the SQL for putting you data on the web. The create table statement is in the cell D12 and the header for the insert statenments is in cell E12.
To run the macro go to the Visual basic window or click on the macros button in the View ribbon and run the export_sql macro.
The excel macro workbook
Also see the "How to use this workbook" sheet in the workbook.
Creating a text index for including in a document
To make a text index run the macro make_text_index. The index will not have duplicates and will have the pages numbers separated by commas. The text index can be found on the "all index text out" sheet and be used as an aid to see whether your indexing has been consistent. There will be a semi-colon and space after indexed term, and a comma between each page number. You can use search and replace to have other characters as separators.