13

I need to merge a few dozen PDFs, and I want each input PDF to start on an odd page in the output PDF.

Example: A.pdf has 3 pages, B.pdf has 4 pages. I don't want my output to have 7 pages. What I want is an 8-page pdf in which pages 1-3 are from A.pdf, page 4 is empty, and pages 5-8 are from B.pdf. How can I do this?

I know about pdftk, but I didn't find such an option in the man page.

5 Answers

7

The pypdf library makes this sort of thing easy if you're willing to write a bit of Python. Save the code below in a script called pdf-cat-even (or whatever you like), make it executable (chmod +x pdf-cat-even), and run it with its output redirected to a file: ./pdf-cat-even a.pdf b.pdf >concatenated.pdf (current versions of pypdf don't support writing to a pipe).

#!/usr/bin/env python3
import sys
from pypdf import PdfWriter, PdfReader

output = PdfWriter()
output_page_number = 0
# Pad each file to a multiple of `alignment` pages,
# so that with 2 the next file always starts on an odd page
alignment = 2
for filename in sys.argv[1:]:
    # This code is executed for every file in turn
    reader = PdfReader(filename)
    for p in reader.pages:
        # This code is executed for every input page in turn
        output.add_page(p)
        output_page_number += 1
    while output_page_number % alignment != 0:
        # Blank page takes the size of the last page added
        output.add_blank_page()
        output_page_number += 1
output.write(sys.stdout.buffer)
4
  • Thanks, this worked for me! As I prefer to read the names of the PDFs from a file, I've modified your code slightly and posted it as a separate answer. Commented Mar 1, 2013 at 12:28
  • @JanekWarchol If your file names don't contain shell special characters such as whitespace: ./pdf-cat-even $(cat list-of-file-names.txt) >concatenated.pdf Commented Mar 1, 2013 at 12:53
  • Unfortunately they do contain whitespace. But thanks nevertheless; I didn't realize it could be done this way. Commented Mar 2, 2013 at 15:23
  • @JanekWarchol Then you can use <list-of-file-names.txt tr '\n' '\0' | xargs -0 ./pdf-cat-even >concatenated.pdf Commented Mar 4, 2013 at 1:33
3

The first step is to produce a pdf file with an empty page. You can do this easily with a lot of programs (LibreOffice/OpenOffice, inkscape, (La)TeX, scribus, etc.)

Then just include this empty page where needed:

pdftk A.pdf empty_page.pdf B.pdf cat output result.pdf

If you want to do this automatically with a script, you can use e.g. pdftk file.pdf dump_data | grep NumberOfPages | egrep -o '[0-9]*' to extract the page count.
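The only arithmetic such a script needs is whether the running page count is odd before each file is appended. Here is a minimal sketch in Python, assuming the page counts have already been collected (for example with the dump_data command above). The function name plan_merge and the empty_page.pdf filename are illustrative and not part of pdftk:

```python
def plan_merge(page_counts, blank="empty_page.pdf"):
    """Return the ordered list of input files for the merge.

    page_counts: list of (filename, number_of_pages) tuples, in merge order.
    blank: name of a one-page empty PDF (hypothetical filename).
    The blank page is inserted before any file that would otherwise
    start on an even page.
    """
    plan = []
    total = 0  # pages placed in the output so far
    for filename, pages in page_counts:
        if total % 2 == 1:
            # An odd number of pages so far means the next file
            # would start on an even page, so pad with one blank page.
            plan.append(blank)
            total += 1
        plan.append(filename)
        total += pages
    return plan


if __name__ == "__main__":
    # The example from the question: A.pdf has 3 pages, B.pdf has 4.
    print(plan_merge([("A.pdf", 3), ("B.pdf", 4)]))
    # prints ['A.pdf', 'empty_page.pdf', 'B.pdf']
```

The returned list could then be handed to pdftk as separate arguments, for instance through subprocess.run with an argument list, which should sidestep shell word splitting for filenames that contain spaces.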

4
  • This feels like a bit of a hack. Though if it works, it works I suppose. Commented Feb 28, 2013 at 16:18
  • This approach almost worked for me: I wrote a script that produced a list of PDFs with emptyPage.pdf added where necessary, but I couldn't get pdftk to parse this list correctly if the filenames contained spaces. I tried changing the IFS value and using quotation marks, but to no avail. Maybe it's pdftk's fault. Anyway, the answer using pypdf worked for me. Commented Mar 1, 2013 at 12:18
  • @JanekWarchol Which version of pdftk did you use? At least pdftk 1.44 and newer seems to support whitespaces in filenames. Commented Mar 9, 2013 at 1:11
  • @jofel pdftk --version returns pdftk 1.44. I remember that my more-bash-savvy friends spent at least 15 minutes trying different things to get this to work and gave up. Commented Mar 9, 2013 at 8:44
1

Gilles' answer worked for me, but since I have to merge many files, it's more convenient to read their names from a text file. I've slightly modified Gilles' code to do just that; maybe it will help someone else:

#!/usr/bin/env python
# requires PyPdf library, version 1.13 or above -
# its homepage is http://pybrary.net/pyPdf/
# running: ./this-script-name file-with-pdf-list > output.pdf
import sys
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
output_page_number = 0
# every new file should start on (n*alignment + 1)th page
# (with value 2 this means starting always on an odd page)
alignment = 2
# one PDF file name per line in the list file
listoffiles = open(sys.argv[1]).read().splitlines()
for filename in listoffiles:
    # This code is executed for every file in turn
    # (PDFs are binary files, so open them in "rb" mode)
    input = PdfFileReader(open(filename, "rb"))
    for p in [input.getPage(i) for i in range(0, input.getNumPages())]:
        # This code is executed for every input page in turn
        output.addPage(p)
        output_page_number += 1
    while output_page_number % alignment != 0:
        output.addBlankPage()
        output_page_number += 1
output.write(sys.stdout)
1

You could also use LaTeX to do this (though I'm aware it's probably not what you want). Something like the following should work:

\documentclass{book}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages=-]{A}
\cleardoublepage % Make sure we clear to an odd page
\includepdf[pages=-]{B}
% This inserts all pages. Or you can specify specific pages, a range, or `{}` for a blank page
\end{document}

Note that \cleardoublepage only inserts a blank page with classes that are made for two-sided printing (e.g. book).

More options and info on pdfpages can be found on CTAN.
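With many input files, the LaTeX source itself can be generated rather than written by hand. The following is a rough sketch in Python, not a tested workflow: the helper name make_latex is hypothetical, and it assumes the file names contain no spaces or characters that are special to LaTeX.

```python
def make_latex(pdf_files):
    """Build LaTeX source that includes every PDF in full,
    each one starting on an odd page (book class, pdfpages package)."""
    lines = [
        r"\documentclass{book}",
        r"\usepackage{pdfpages}",
        r"\begin{document}",
    ]
    for name in pdf_files:
        # Clear to an odd page before each included file.
        lines.append(r"\cleardoublepage")
        # pages=- includes all pages of the file.
        lines.append(r"\includepdf[pages=-]{" + name + "}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # File names as in the example above (extension omitted).
    print(make_latex(["A", "B"]))
```

The printed source can be saved to a .tex file and compiled with pdflatex in the same way as the hand-written version.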

6
  • 2
    To include all pages automatically, you can use \includepdf[pages=-]{...}. Commented Feb 28, 2013 at 16:41
  • @jofel Thanks, fixed the question. I think it defaults to all pages too, I just put it in there to show that it was possible to select certain pages. Commented Feb 28, 2013 at 17:51
  • @jofel Also, \cleardoublepage only inserts a blank page if you're using a class made for two sided printing. I was using article which doesn't work; I fixed it and updated the question to reflect that. Commented Feb 28, 2013 at 17:56
  • \includepdf includes only the first page by default (not all pages). \documentclass[twoside]{article} works also. Commented Mar 1, 2013 at 0:57
  • From what I see, I'd have to explicitly write out all the files that have to be included, so that's not good enough for me. But thanks anyway. Commented Mar 1, 2013 at 12:19
0

Here's the code with PyPDF2 and Python 3:

#!/usr/bin/env python3
# requires PyPdf2 library, version 1.26 or above -
# its homepage is https://pythonhosted.org/PyPDF2/index.html
# running: ./this-script-name output.pdf a.pdf b.pdf ...
import sys
from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
output_page_number = 0
# every new file should start on (n*alignment + 1)th page
# (with value 2 this means starting always on an odd page)
alignment = 2
# sys.argv[1] is the output file; the remaining arguments are the input PDFs
for filename in sys.argv[2:]:
    # This code is executed for every file in turn
    input = PdfFileReader(open(filename, "rb"))
    output.appendPagesFromReader(input)
    output_page_number += input.getNumPages()
    while output_page_number % alignment != 0:
        output.addBlankPage()
        output_page_number += 1
output.write(open(sys.argv[1], "wb"))
