Note: All of these commands are Mac-compatible (use brew to install pdftk-java and imagemagick).
I had a bunch of scanned documents named “name 01.jpg”, “name 02.jpg”, “name 03.jpg” and so on, that I wanted to push together into nice PDF files. The documents had to remain separate from each other, so all-to-one concatenation was a no-go. Also, there was a freaking boatload of them (20+ different documents) to get done, and they differed in length.
Conversion
Which command to use depends on the quality of PDF you want from your JPEGs:
/bin/ls *.jpg | xargs -I% convert -verbose -density 150 -trim -quality 100 -flatten -sharpen 0x1.0 % %.pdf
or
/bin/ls *.jpg | xargs -I% convert -quality 100 % %.pdf
or
/bin/ls *.jpg | xargs -I% convert % %.pdf
Move all *.jpg.pdf to *.pdf:
for f in *.jpg.pdf; do mv "$f" "${f/.jpg/}"; done
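The convert-then-rename steps above can probably be collapsed into one bash loop that writes "name 01.pdf" directly. This is only a sketch: convert_all and the CONVERT override are names made up for this example (CONVERT=echo gives a dry run), and -quality 100 is simply borrowed from the second variant above.

```shell
# Sketch: convert each JPEG in the current directory straight to "<base>.pdf".
# convert_all and CONVERT are hypothetical names, not part of ImageMagick.
convert_all() {
  for f in *.jpg; do
    # If nothing matches, bash leaves the pattern as the literal "*.jpg"; skip it.
    [ -e "$f" ] || continue
    # "${f%.jpg}" strips the extension, so "name 01.jpg" becomes "name 01.pdf".
    "${CONVERT:-convert}" -quality 100 "$f" "${f%.jpg}.pdf"
  done
}
```

Running CONVERT=echo convert_all first prints the argument lists without touching any files; plain convert_all does the real conversion.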
Concatenation
My files now follow this naming scheme:
- name 01.pdf
- name 02.pdf
- name 03.pdf
so to concatenate them into a single file:
filename="name"; pdftk "$filename 01.pdf" "$filename 02.pdf" cat output "$filename.pdf"
Automating the command generation (this only prints the commands; review them, then run them):
find . -name "*02.pdf" | sed -E "s/ 02\.pdf$//" | sed -E "s/^\.\///" | xargs -I% echo filename=\"%\"\; pdftk \"\$filename 01.pdf\" \"\$filename 02.pdf\" cat output \"\$filename.pdf\"
Cleanup
find . -name "*02.pdf" | sed -E "s/ 02.pdf/ 01.pdf/g" | xargs -I% rm %
find . -name "*02.pdf" | xargs -I% rm %
find . -name "*02.jpg" | sed -E "s/ 02.jpg/ 01.jpg/g" | xargs -I% rm %
find . -name "*02.jpg" | xargs -I% rm %
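A more guarded variant of the cleanup could delete a document's pages only when its merged PDF already exists and is non-empty. The sketch below assumes the "name NN.pdf" / "name NN.jpg" naming scheme with two-digit page numbers; cleanup_pages is a made-up name, and the check does not replace opening the merged PDF and verifying it by eye.

```shell
# Sketch: remove "<name> NN.pdf" and "<name> NN.jpg" only if "<name>.pdf" exists.
# cleanup_pages is a hypothetical helper name.
cleanup_pages() {
  for merged in *.pdf; do
    # Skip the unmatched literal pattern and any empty (probably failed) output.
    [ -s "$merged" ] || continue
    name="${merged%.pdf}"
    for page in "$name "[0-9][0-9].pdf "$name "[0-9][0-9].jpg; do
      # Unmatched patterns stay literal and do not exist as files; skip them.
      [ -e "$page" ] || continue
      rm -- "$page"
    done
  done
}
```

Documents that have not been merged yet keep all their pages, so the function can be re-run after each batch.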
Real life procedure
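To avoid generating PDFs that are shorter than the real documents, we need to concatenate and remove the longest documents first: the two-page generator above matches every document that has a page 02, including the ones with more pages.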
So I ran a few checks. I listed all documents with at least X pages as follows:
find . -name "*01.pdf"
X here being 1.
I incremented X until I found no more PDFs. In my case the last valid X was 5, and only one document was that long.
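Instead of incrementing X by hand, the page count per document could be read off in one pass. A minimal sketch, assuming every page is named "name NN.pdf" with a two-digit NN; page_counts is a made-up helper name:

```shell
# Sketch: print "<page count><TAB><document name>", longest documents first.
# The loop strips the trailing " NN.pdf" from each page file, awk counts how
# often each base name occurs, and sort puts the highest counts on top.
page_counts() {
  for f in *" "[0-9][0-9].pdf; do
    [ -e "$f" ] || continue
    printf '%s\n' "${f% [0-9][0-9].pdf}"
  done |
    awk '{ count[$0]++ } END { for (name in count) print count[name] "\t" name }' |
    sort -rn
}
```

The first line of the output then names the longest document, as long as no page files are missing in between.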
So,
filename="this-damn-long-file"; pdftk "$filename 01.pdf" "$filename 02.pdf" "$filename 03.pdf" "$filename 04.pdf" "$filename 05.pdf" cat output "$filename.pdf" && find . -name "$filename*.pdf"
The output should look like this:
this-damn-long-file 05.pdf
this-damn-long-file 04.pdf
this-damn-long-file 03.pdf
this-damn-long-file 02.pdf
this-damn-long-file 01.pdf
this-damn-long-file.pdf
Then, after opening and verifying this-damn-long-file.pdf, clean up:
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 04.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 03.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 02.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | sed -E "s/ 05.pdf/ 01.pdf/g" | xargs -I% rm %
find . -name "*05.pdf" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 01.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 02.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 03.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | sed -E "s/ 05.jpg/ 04.jpg/g" | xargs -I% rm %
find . -name "*05.jpg" | xargs -I% rm %
Alas, cry and repeat all of those steps for each of find . -name "*04.pdf", 03, and 02. For the latter, the command generator above made a lot more sense, as there was a bunch of them.
Todo
Automate the scripts for i>2, because concatenation and cleanup get far longer and nastier. Bash for-loops should be sufficient.
Or learn moar regex.
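The Todo could perhaps be covered by a loop like the one below, which lets the shell's glob collect however many pages each document has, so the longest-first ordering should no longer matter. It is a sketch, not a tested replacement: merge_all and the PDFTK override are names invented here (PDFTK=echo gives a dry run), and it assumes two-digit, zero-padded page numbers starting at 01.

```shell
# Sketch: merge "<name> 01.pdf", "<name> 02.pdf", ... into "<name>.pdf".
# merge_all and PDFTK are hypothetical names, not part of pdftk.
merge_all() {
  for first in *" 01.pdf"; do
    # If no document has a page 01, the pattern stays literal; skip it.
    [ -e "$first" ] || continue
    name="${first% 01.pdf}"
    # The glob expands in sorted order, so zero-padded pages stay in sequence.
    "${PDFTK:-pdftk}" "$name "[0-9][0-9].pdf cat output "$name.pdf"
  done
}
```

With PDFTK=echo merge_all the generated argument lists can be checked before anything is written, much like the echo-based generator earlier.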