This is a simple PDF editor that does just the basic manipulation of scanned documents: rotating and deleting pages, and splitting and combining documents.
Background
The background to this is that I’ve been working for some time on removing as much paper filing from our house as possible. We’ve found that scanning with a decent scanner (we use the Fujitsu ScanSnap 1500M*) is now almost error free and the conversion of documents to searchable PDFs with OCR is also pretty good. But despite the attempts by Fujitsu to provide suitable filing software we’ve found the process of finding the right date and categorising scanned documents to be very time consuming.
I’ve looked hard for scan filing software and found some possible alternatives but in each case eventually decided that they didn’t do what we wanted, were too darned expensive or just weren’t open enough to allow modification.
So I have embarked on a process of development to see how difficult it would be to address some of the most glaring time-wasters. The hit-list is as follows:
- Automatically finding the right date for filing
- Categorising the document
- Basic editing of the scanned PDF (e.g. deleting unwanted pages, combining PDFs, etc.)
I’ve decided to attack these challenges in reverse order, and this post is about a simple PDF editor which I’m hoping will reduce the time we spend manipulating documents before we file them.
* This has been superseded by a similar-looking model which also gets rave reviews.
Operation
This is a WPF program written to run on Windows. I’ve used a library called MahApps which does an amazing job of making a Windows desktop app look more modern.
As you can see in the image the program is very simple and has the following features:
- There are buttons over each page for rotation and deletion.
- There’s a scissors icon which can be used to split the file at any point and create multiple output files from one input.
- If you add a second (or subsequent) file, it is appended to the end of the document, and when you save, the documents are combined into one.
- Pages can be dragged and dropped from place to place in the document.
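To make the features above a little more concrete, here is a minimal sketch of how the page list behind such a UI could be modelled. It is not the program's actual code: `PageEntry`, `PageList` and all their members are hypothetical names invented for illustration, and it assumes that rotation is stored in 90 degree steps and that the scissors mark a split after a given page.

```csharp
using System.Collections.Generic;

// Hypothetical model of one page shown in the editor.
public sealed class PageEntry
{
    public string SourceFile { get; }
    public int SourcePageNumber { get; }       // 1-based page number in the source file
    public int Rotation { get; private set; }  // extra clockwise rotation: 0, 90, 180 or 270
    public bool SplitAfter { get; set; }       // true if the scissors were used after this page

    public PageEntry(string sourceFile, int sourcePageNumber)
    {
        SourceFile = sourceFile;
        SourcePageNumber = sourcePageNumber;
    }

    public void RotateClockwise()
    {
        Rotation = (Rotation + 90) % 360;
    }
}

// Hypothetical ordered list of pages, covering the operations listed above.
public sealed class PageList
{
    private readonly List<PageEntry> _pages = new List<PageEntry>();

    public IReadOnlyList<PageEntry> Pages
    {
        get { return _pages; }
    }

    // Adding a second (or later) file appends its pages to the end.
    public void AppendFile(string path, int pageCount)
    {
        for (int n = 1; n <= pageCount; n++)
            _pages.Add(new PageEntry(path, n));
    }

    public void Delete(int index)
    {
        _pages.RemoveAt(index);
    }

    // Drag and drop: remove the page at 'from', then insert it at index 'to'
    // (where 'to' is an index into the list after the removal).
    public void Move(int from, int to)
    {
        PageEntry page = _pages[from];
        _pages.RemoveAt(from);
        _pages.Insert(to, page);
    }

    // Each split marker closes one output document; any pages left over form the last one.
    public List<List<PageEntry>> OutputDocuments()
    {
        var result = new List<List<PageEntry>>();
        var current = new List<PageEntry>();
        foreach (PageEntry page in _pages)
        {
            current.Add(page);
            if (page.SplitAfter)
            {
                result.Add(current);
                current = new List<PageEntry>();
            }
        }
        if (current.Count > 0)
            result.Add(current);
        return result;
    }
}
```

Keeping the edits in a plain list like this means nothing touches the PDF files until the user saves, at which point each list returned by `OutputDocuments()` would become one output file.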
Software
There’s also nothing clever about the code. What I have done is to use two libraries:
- iTextSharp
- Ghostscript.NET
The combination of these allows me to avoid using the horrible Acrobat viewer control or any of the plethora of third-party alternatives. All the controls I’ve tried (and that’s a lot) have really poor APIs and seem to concentrate on giving the user stacks of complex functions accessed through myriad toolbars and menus.
All I need to do is to display PDF pages (Ghostscript.NET is used for rendering) and manipulate the actual PDF files (that’s where iTextSharp comes in). It’s a pretty simple requirement and this approach seems to work pretty well.
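As a rough illustration of the file manipulation side, the sketch below uses iTextSharp 5's `PdfCopy` to write a chosen sequence of pages, possibly drawn from several input files, into one output file, applying any extra rotation on the way. It is only a sketch of one way to do this and not necessarily how the program does it: `PageRef`, `PdfPageWriter` and `Write` are hypothetical names, and it assumes each source page appears at most once in the list. Rendering with Ghostscript.NET is not shown.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical description of one page to write to the output.
public sealed class PageRef
{
    public string File;        // path of the source PDF
    public int PageNumber;     // 1-based page number in that file
    public int ExtraRotation;  // additional clockwise rotation: 0, 90, 180 or 270
}

public static class PdfPageWriter
{
    // Copies the given pages, in order, into a single new PDF.
    // Assumes 'pages' is not empty (iTextSharp refuses to close a document with no pages)
    // and that no source page is listed twice.
    public static void Write(IList<PageRef> pages, string outputPath)
    {
        var readers = new Dictionary<string, PdfReader>();
        try
        {
            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                var document = new Document();
                var copy = new PdfCopy(document, output);
                document.Open();

                foreach (PageRef page in pages)
                {
                    // Open each source file once and reuse its reader.
                    PdfReader reader;
                    if (!readers.TryGetValue(page.File, out reader))
                    {
                        reader = new PdfReader(page.File);
                        readers[page.File] = reader;
                    }

                    if (page.ExtraRotation != 0)
                    {
                        // Add the requested rotation to whatever the page already has.
                        int rotation = (reader.GetPageRotation(page.PageNumber) + page.ExtraRotation) % 360;
                        reader.GetPageN(page.PageNumber).Put(PdfName.ROTATE, new PdfNumber(rotation));
                    }

                    copy.AddPage(copy.GetImportedPage(reader, page.PageNumber));
                }

                document.Close();
            }
        }
        finally
        {
            foreach (PdfReader opened in readers.Values)
                opened.Close();
        }
    }
}
```

Deleting a page then amounts to leaving it out of the list, reordering is just the order of the list, and splitting is a matter of calling `Write` once per output file with a different subset of pages.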
I’ve also made use of an auto-scroll extension for drag-and-drop which I think originated in this blog.