2007-10-26 PDF Madness

So, you have two copies of a document. One of them contains an outline or bookmarks that you want to use for the other document. Except that maybe you also want to fix it up a bit. Basically it all boils down to this:

1. Extract bookmarks from an existing PDF document into a text file.

2. Edit bookmarks.

3. Apply bookmarks from the text file to an existing PDF document.

Painful experimentation has led me to this: Use mbtPdfAsm.

I also submitted this – in French! – to the mbtPdfAsm author → Editer des signets.

First, get the outline:

mbtPdfAsm -mSource.pdf -gO > Outline.txt

This will take the outline from Source.pdf and write it to Outline.txt.

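Edit the outline. The format is hard to read and hard to write. Each line appears to hold five fields: the entry's key, the key of its parent (0 for top-level entries), its position among its siblings, the page number, and the title. Here’s an example: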

1 0 1 1 My Book
2 1 1 1 Front Cover
3 1 2 2 Credits
4 1 3 3 Contents
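
Since this format is easy to get wrong by hand, a small consistency check can save a failed run. The following is only a sketch: the function name check_outline is made up for this example, and the meaning of the columns (key, parent, position, page, title) is an assumption read off the sample above, not something taken from the mbtPdfAsm documentation.

```shell
# check_outline: read an mbtPdfAsm-style outline on stdin and report
# rows that look inconsistent. Exit status is non-zero if any were found.
# Assumed columns: key parent position page title
check_outline() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    NF < 5  { printf "line %d: fewer than 5 fields\n", NR; bad = 1; next }
    $2 != 0 && !($2 in seen) {            # parent must be 0 or an earlier key
      printf "line %d: unknown parent %s\n", NR, $2; bad = 1
    }
    { seen[$1] = 1 }
    END { exit bad + 0 }
  '
}

# Demo on the sample outline shown above.
printf '%s\n' '1 0 1 1 My Book' '2 1 1 1 Front Cover' \
              '3 1 2 2 Credits' '4 1 3 3 Contents' |
  check_outline && echo "outline looks consistent"
```

In practice you would run it as check_outline < Outline.txt before handing the file back to mbtPdfAsm.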

Edit at will, harr harr, then apply the new outline:

mbtPdfAsm -mSource.pdf -dResult.pdf -oOutline.txt

This takes Source.pdf, applies the outline from Outline.txt, and writes the result to Result.pdf.

Clearly we need a program to help us write a decent outline.

outline-to-text.pl

This is a Perl program to convert the outline produced by mbtPdfAsm into editable text.

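#!/usr/bin/perl
use strict;
use warnings;

# Input lines look like: key parent number page title
while (<>) {
  my ($key, $parent, $no, $page, $title) = m/^(\d+) (\d+) (\d+) (\d+) (.*)/;
  next unless defined $key;      # skip lines that are not outline entries
  print '    ' if $parent > 0;   # indent everything below the top level
  print "$page $title\n";
}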

Example use:

perl ~/bin/outline-to-text.pl < Outline.txt > Editable.txt

The format of Editable.txt is as follows:

1. If the line starts with a page number, it’s a chapter

2. If the line is indented and starts with a page number, it’s a section

3. After the page number is a space

4. Everything after that is the title of the bookmark

Example:

1 My Book
    1 Front Cover
    2 Credits
    3 Contents
4 My First Chapter
    4 First Section
    7 Second Section
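
One thing to watch out for: text-to-outline.pl silently skips any line in which it finds no page number, so a typo can quietly lose a bookmark. A quick check before converting might look like the sketch below. The function name check_editable is hypothetical, and the rule that a section needs a chapter above it is my reading of the format, not something mbtPdfAsm documents.

```shell
# check_editable: read the editable text on stdin and flag lines that
# do not fit the format described above. Non-zero exit if any were found.
check_editable() {
  awk '
    /^[ \t]*$/ { next }                        # blank lines are harmless
    !/^[ \t]*[0-9]+ / {                        # needs a page number, then a space
      printf "line %d: no page number: %s\n", NR, $0; bad = 1; next
    }
    /^[ \t]/ && !chapter {                     # section with no chapter above it
      printf "line %d: section before first chapter\n", NR; bad = 1
    }
    /^[0-9]/ { chapter = 1 }
    END { exit bad + 0 }
  '
}

# Demo: the third line is missing its page number.
printf '%s\n' '1 My Book' '    1 Front Cover' 'Credits' |
  check_editable || echo "fix Editable.txt before converting"
```

Run it as check_editable < Editable.txt; no output and a zero exit status mean every line has the expected shape.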

We also need something to convert it back...

text-to-outline.pl

This is a Perl program to convert the editable text described above into the kind of outline used by mbtPdfAsm.

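#!/usr/bin/perl
use strict;
use warnings;

my $id = 1;                              # key of the next entry
my ($chapter, $chapterno, $section) = (0, 0, 1);
while (<>) {
  my ($indent, $page, $title) = m/^([ \t]*)(\d+) (.*)/;
  next unless defined $page;             # skip lines without "<page> <title>"
  if (not $indent) {                     # an unindented line starts a chapter
    $chapter = $id;                      # its sections use this key as parent
    $chapterno++;
    $section = 1;
  }
  print join(' ', $id++,
             $indent ? $chapter : 0,             # parent key (0 = top level)
             $indent ? $section++ : $chapterno,  # position among siblings
             $page, $title), "\n";
}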

Example use:

perl ~/bin/text-to-outline.pl < Editable.txt > Outline.txt

And it works! 😄
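
The whole round trip can be wrapped in one shell function. This is a rough sketch under several assumptions: mbtPdfAsm is on the PATH, both Perl scripts live in ~/bin (the BIN variable is a made-up override), $EDITOR names your editor, and mbtPdfAsm accepts a different PDF after -m when applying the outline, which is what the two-copies scenario at the top needs. The name rebookmark is hypothetical. Note that it overwrites Outline.txt and Editable.txt in the current directory.

```shell
# rebookmark FROM.pdf TO.pdf RESULT.pdf
# Copy the outline of FROM.pdf onto TO.pdf, pausing for manual edits.
rebookmark() {
  from=$1 to=$2 result=$3
  bin=${BIN:-$HOME/bin}   # assumed location of the two Perl scripts

  mbtPdfAsm "-m$from" -gO > Outline.txt &&                     # 1. extract
  perl "$bin/outline-to-text.pl" < Outline.txt > Editable.txt &&
  ${EDITOR:-vi} Editable.txt &&                                # 2. edit
  perl "$bin/text-to-outline.pl" < Editable.txt > Outline.txt &&
  mbtPdfAsm "-m$to" "-d$result" -oOutline.txt                  # 3. apply
}

# Usage (not run here):
#   rebookmark Old.pdf New.pdf Result.pdf
```

The steps are chained with && so that a failure in any of them stops the function before Result.pdf is written.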

​#PDF ​#Software