Note: All of these commands are Mac-compatible (use brew to install pdftk-java and imagemagick).

I had a bunch of scanned documents named “name 01.jpg”, “name 02.jpg”, “name 03.jpg”, and so on, that I wanted to push together into nice PDF files. The documents had to remain separate from each other, so all-to-one concatenation was a no-go. Also, there was a freaking boatload of them (20+ different documents) to get done, and they differed in length.

Conversion

Pick one of the following, depending on the PDF quality you want from your JPEGs:

/bin/ls *.jpg | xargs -I% convert -verbose -density 150 -trim -quality 100 -flatten -sharpen 0x1.0 % %.pdf

/bin/ls *.jpg | xargs -I% convert -quality 100 % %.pdf

/bin/ls *.jpg | xargs -I% convert % %.pdf

Move all *.jpg.pdf to *.pdf:

for f in *.jpg.pdf; do mv "$f" "${f/.jpg/}"; done
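
The rename step can also be skipped by writing the final name straight away. A minimal sketch, assuming the bare convert call from above is good enough for your scans; jpg_to_pdf_name is a made-up helper name, not something ImageMagick ships:

```shell
# Hypothetical helper: turn "name 01.jpg" into "name 01.pdf".
jpg_to_pdf_name() {
  printf '%s\n' "${1%.jpg}.pdf"
}

# Convert every JPEG in the current directory, one PDF per image.
for f in *.jpg; do
  [ -e "$f" ] || continue   # no JPEGs here: the glob stays literal, skip it
  convert "$f" "$(jpg_to_pdf_name "$f")"
done
```

Add the -density, -quality and other flags from the commands above to the convert call if you need them.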

Concatenation

My files now follow this naming scheme:

name 01.pdf
name 02.pdf
name 03.pdf

so to concatenate them into a single file:

filename="name"; pdftk "$filename 01.pdf" "$filename 02.pdf" cat output "$filename.pdf"

Automating the command generation:

find . -name "*02.pdf" | sed -E 's/ 02\.pdf$//' | sed -E 's|^\./||' | xargs -I% echo filename=\"%\"\; pdftk \"\$filename 01.pdf\" \"\$filename 02.pdf\" cat output \"\$filename.pdf\"

Cleanup

find . -name "*02.pdf" | sed -E "s/ 02.pdf/ 01.pdf/g" | xargs -I% rm %
find . -name "*02.pdf" | xargs -I% rm %
find . -name "*02.jpg" | sed -E "s/ 02.jpg/ 01.jpg/g" | xargs -I% rm %
find . -name "*02.jpg" | xargs -I% rm %
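
A more cautious take on the cleanup is to only touch pages whose merged PDF already exists. The sketch below just lists deletion candidates and removes nothing by itself. It assumes pages are named "<name> NN.pdf" / "<name> NN.jpg" with a two-digit NN and the merged file is "<name>.pdf"; cleanup_candidates is a hypothetical name:

```shell
# Print every page file (PDF and JPEG) that has a merged "<name>.pdf" next to it.
cleanup_candidates() {
  for merged in *.pdf; do
    [ -e "$merged" ] || continue
    name="${merged%.pdf}"
    # Page files of this document; unmatched globs stay literal and are skipped.
    for page in "$name"\ [0-9][0-9].pdf "$name"\ [0-9][0-9].jpg; do
      if [ -e "$page" ]; then
        printf '%s\n' "$page"
      fi
    done
  done
}

cleanup_candidates
```

Once the list looks right and the merged PDFs have been opened and checked, the same output can feed the actual removal: cleanup_candidates | while IFS= read -r f; do rm -- "$f"; done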

Real life procedure

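In order not to mess up and generate PDFs shorter than they really are, we need to concatenate and remove the longer documents first. Otherwise the 02 generator above would happily merge only pages 01 and 02 of a five-page document.

So, a few checks were made: I listed all documents that have a page X as follows: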

find . -name "*01.pdf"

X here being 1.

I incremented X until I found no more PDFs. For me, for example, the last valid X was 5, and only one document was that long.
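
Instead of bumping X by hand, the page count per document can be listed in one go. A rough sketch, assuming every page file sits in the current directory and is named "<name> NN.pdf" with a two-digit NN; page_counts is a made-up name:

```shell
# Print "<count> <name>" for every document, longest first.
page_counts() {
  for f in *\ [0-9][0-9].pdf; do
    [ -e "$f" ] || continue
    # Strip the trailing " NN.pdf" so all pages of a document share one key.
    printf '%s\n' "${f% [0-9][0-9].pdf}"
  done | sort | uniq -c | sort -rn
}

page_counts
```

The first line of the output is the longest document, which is the one to concatenate and clean up first.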

So,

filename="this-damn-long-file"; pdftk "$filename 01.pdf" "$filename 02.pdf" "$filename 03.pdf" "$filename 04.pdf" "$filename 05.pdf" cat output "$filename.pdf" && find . -name "$filename*.pdf"

The output should look like this:

this-damn-long-file 05.pdf
this-damn-long-file 04.pdf
this-damn-long-file 03.pdf
this-damn-long-file 02.pdf
this-damn-long-file 01.pdf
this-damn-long-file.pdf

So then, after opening and verifying this-damn-long-file.pdf:

find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 04.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 03.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 02.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 01.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 01.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 02.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 03.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 04.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | xargs -I% rm %

Alas, cry and repeat all of those steps for each of find . -name "*04.pdf", 03, and 02. For the last one, the command generator above made a lot more sense, as there was a bunch of them.

Todo

Automate the scripts for i>2, because concatenation and cleanup get far longer and nastier. Bash for-loops should be sufficient.
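
One possible shape for that loop, as a sketch rather than a tested-in-anger script. It assumes the "<name> NN.pdf" naming from above with two-digit page numbers, and that every document has a page 01. MERGE_CMD is a hypothetical override that swaps pdftk for something harmless like echo, so the commands can be reviewed first:

```shell
# Merge every "<name> NN.pdf" group into "<name>.pdf", whatever its length.
merge_all() {
  for first in *\ 01.pdf; do
    [ -e "$first" ] || continue
    name="${first% 01.pdf}"
    # The glob expands in sorted order, so pages 01, 02, ... stay in sequence.
    set -- "$name"\ [0-9][0-9].pdf
    ${MERGE_CMD:-pdftk} "$@" cat output "$name.pdf"
  done
}

# Dry run: print the pdftk arguments instead of running pdftk.
MERGE_CMD=echo merge_all
```

Calling merge_all without MERGE_CMD runs pdftk for real. Since the whole page list comes from a glob, the longest-first ordering from the manual procedure no longer matters.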

Or learn moar regex.
