opus.stedden

Git Versioned PDF Visualizer


I've been writing a very long document over the past couple of years. Because I love tracking my own behavior and open science, I decided to version control the whole editing process of my thesis on GitHub. Since every update is tracked with git, I can monitor all the changes that have gone into my thesis over time. This can also be useful for collaborating on papers (though my prototypical Luddite advisor would rather do things by emailing Mac Pages documents).

Git PDF movie maker

As a cool way to visualize all the changes, I wrote a few scripts that check out each version and make a movie out of all the .pdfs in the repo over time. Here's the current state (note that the thesis isn't even close to finished yet; I'll update when I get more done on it).

If you think you'd like to use this, you can bounce over to the git-pdf-viz GitHub repo and download it now. Read on below for more info on how it works, how to modify it, and to see the results of running git-pdf-viz on my thesis.

A Real Bash!

Writing bash scripts is a party, I know. The entire project consists of five bash scripts that are meant to be run sequentially (maybe with some editing).

  1. save_all_pdfs
  2. pdfs_to_ims
  3. ims_to_montage
  4. montage_to_frame
  5. frame_to_movie
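
Running the stages in order could be wrapped in a tiny driver. This is only a sketch: the run_pipeline name, the scripts-folder argument, and the ".sh" suffix on every stage are assumptions, and it does not cover whether the real scripts expect arguments or a particular working directory.

```shell
#!/usr/bin/env bash
# Sketch: run the five stages in order and stop at the first failure.
# Assumes each stage is a file named <stage>.sh inside script_dir.
run_pipeline() (
    script_dir=${1:-.}
    for stage in save_all_pdfs pdfs_to_ims ims_to_montage montage_to_frame frame_to_movie
    do
        echo "running ${stage}.sh"
        bash "${script_dir}/${stage}.sh" || {
            echo "stage ${stage} failed; stopping" >&2
            exit 1   # leaves the subshell, so run_pipeline returns 1
        }
    done
)
```

Stopping at the first failure matters here because every later stage reads the files the earlier one wrote.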

Save All PDFs

The first script really takes care of the core of this program. There are lots of extra details, but the key piece of save_all_pdfs.sh is "git rev-list master", which lists the id of every commit reachable from master, newest first. This pseudo-bash shows the general idea:

for commit in $(git rev-list master)
do
    git checkout "$commit"
    # ...copy all pdfs to somewhere safe...
done

"git checkout" updates all the files that are sitting in the project directory locally. So with the list of revisions from "git rev-list master" I can reset the state of my file system to reflect the state of the project at each commit. After that, I just need to copy all the files I need (in this case my pdfs) somewhere safe for use later. (Caveat: I'm currently only copying pdfs if they have a corresponding LaTeX file because I wanted to ignore any figures that weren't included. You can modify the script accordingly if you need all your pdfs copied.)

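To make the checkout-and-copy idea and the LaTeX caveat concrete, here is a minimal sketch. It is not the contents of save_all_pdfs.sh: the function names, the ../saved_pdfs location, and the one-folder-per-commit-hash layout are all illustrative assumptions, and it expects a clean working tree on a branch called master.

```shell
# Copy every PDF under src_dir that has a same-named .tex file beside it
# into dest_dir. PDFs are flattened into one folder, so two files with the
# same name in different subfolders would overwrite each other.
copy_tex_pdfs() (
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    find "$src_dir" -name '*.pdf' -type f | while IFS= read -r pdf
    do
        if [ -f "${pdf%.pdf}.tex" ]; then
            cp "$pdf" "$dest_dir/"
        fi
    done
)

# Walk the history oldest-first and snapshot the PDFs of each commit.
# out_dir should live outside the repository so checkouts cannot touch it.
snapshot_all_commits() (
    out_dir=${1:-../saved_pdfs}
    for commit in $(git rev-list --reverse master)
    do
        git checkout --quiet "$commit" || exit 1
        copy_tex_pdfs . "$out_dir/$commit"
    done
    git checkout --quiet master   # return to the branch tip when done
)
```

Called from the top of a repository, snapshot_all_commits would leave one folder per commit under ../saved_pdfs. Dropping the [ -f ... ] test in copy_tex_pdfs would copy every pdf instead.
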
Playing with pdfs and images

The next three scripts rely on Ghostscript and the ImageMagick suite to jockey our pdfs into a composite image that will eventually be turned into movie frames. You can view the scripts themselves for more details about implementation, but I'll explain the idea behind each command for reference.

  • gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf *.pdf
    uses Ghostscript to merge all the saved pdfs into a single file
  • convert merged.pdf im/i_%03d.png
    converts the merged pdf to a sequence of individual pngs, one per page
  • montage -mode concatenate -tile 10x i_* ../final.png
    stitches all the individual png "pages" into a "montage" panel with 10 columns
  • convert "${now}/final.png" -resize 1000x -gravity northwest -background white -extent 1000x700 -colorspace RGB "PNG32:bydate/${now}.png"
    rescales each montage and ensures that all images are the same size and colorspace
  • mogrify -gravity southeast -pointsize 26 -annotate +10+10 %t *.png
    adds a little timestamp in the southeast corner (%t is the file name without its extension, which here is the date)
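
The first four commands can be chained for a single snapshot roughly as follows. The flags are copied from the list above, but everything else is an assumption made for illustration: the make_frame and run names, the DRY_RUN switch, and a layout where each snapshot lives in a folder named by ${now} with an im/ subfolder for intermediate files.

```shell
# Print the command instead of running it when DRY_RUN is set, which is
# handy for checking paths before Ghostscript and ImageMagick do real work.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Turn one snapshot folder of pdfs into one fixed-size frame in bydate/.
make_frame() (
    now=$1   # snapshot folder name, e.g. a commit date
    mkdir -p "$now/im" bydate
    # merged.pdf goes into im/ so a rerun does not merge it back in via *.pdf
    run gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite "-sOutputFile=$now/im/merged.pdf" "$now"/*.pdf
    run convert "$now/im/merged.pdf" "$now/im/i_%03d.png"
    run montage -mode concatenate -tile 10x "$now"/im/i_* "$now/final.png"
    run convert "$now/final.png" -resize 1000x -gravity northwest \
        -background white -extent 1000x700 -colorspace RGB "PNG32:bydate/${now}.png"
)
```

The mogrify timestamp step is left out on purpose, since it runs once over every png in bydate/ after all the frames exist.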

At the end of this process there will be a folder named bydate/ with an image for every frame of a movie that's about to get made.

Lights, Camera, Action

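As the final step we use ffmpeg to convert all of the images into an mp4 movie. This takes two lines because ffmpeg's image-sequence input expects numerically sequential file names, and my files are named by date. The first line writes the frames, in sorted order, to a list file; the second feeds that list to ffmpeg's concat demuxer.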

ls *.png | sort -V | xargs -I {} echo "file '{}'" > list.txt
ffmpeg -r 2 -f concat -i list.txt -r 30 -c:v libx264 -pix_fmt yuv420p ../git_pdf_viz.mp4
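
If your frame names might contain spaces or quote characters, the list-building line can be written without ls and xargs, since xargs gives quotes and backslashes in its input special treatment. A minimal sketch, where the make_concat_list name and its optional argument are assumptions; sort -V is kept from the original and is a GNU extension.

```shell
# Write an ffmpeg concat list for every png in the current folder,
# in version-sorted order, one "file '<name>'" line per frame.
make_concat_list() (
    list_file=${1:-list.txt}
    for png in *.png
    do
        # With no pngs present the glob stays literal, so skip it.
        [ -e "$png" ] && printf '%s\n' "$png"
    done | sort -V | sed "s/.*/file '&'/" > "$list_file"
)

# Then, exactly as above:
# ffmpeg -r 2 -f concat -i list.txt -r 30 -c:v libx264 -pix_fmt yuv420p ../git_pdf_viz.mp4
```

As far as I understand the ffmpeg options, the -r 2 before -i reads the list at two images per second, while the -r 30 after it sets the frame rate of the output file, so each image is held for half a second. A name that itself contains a single quote would still need escaping inside the list file.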

This is a good example of how complicated Ghostscript, ImageMagick, and ffmpeg can be to use. Across all of these steps, about 20 extra parameters have to be passed to these programs to make them work properly. I can't explain all of them in detail, so I recommend each program's documentation if you find you need to modify these extensively. These are amazingly powerful programs that make life so much easier for any kind of batch audiovisual project, so I highly encourage getting acquainted with them at some point in your life.

Have fun, and I hope that this encourages you to use open source version control whenever you write/edit any large document project.