PDFs with the same MD5 hash have previously been constructed by Gebhardt et al. [12] by exploiting so-called Indexed Color Tables and Color Transformation functions. However, this method is not effective for many common PDF viewers that lack support for these functionalities. Our PDFs rely on distinct parsings of JPEG images, similar to Gebhardt et al.’s TIFF technique [12] and Albertini et al.’s JPEG technique [1]. Yet we improved upon these basic techniques using very low-level “wizard” JPEG features such that these work in all common PDF viewers, and even allow very large JPEGs that can be used to craft multi-page PDFs.
Some details of our work will be made public later only when sufficient time to implement additional security measures has passed. This includes our improved JPEG technique and the source-code for our attack and cryptanalytic tools.
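For anyone wondering what "distinct parsings of JPEG images" means, here is a toy sketch of the basic comment-length idea from Albertini et al.'s JPEG technique. It is not the authors' improved "wizard" version, which, per the above, is not public yet. A JPEG comment segment (marker `FF FE`) carries a 2-byte length, and decoders skip that many bytes. If two files differ only in that length, one parser lands on image A while the other skips over it and lands on image B. The `IMG` marker, the padding, and the `first_image` parser are hypothetical simplifications, not a real decoder. Because everything outside the collision blocks is identical in the two files, in the actual attack the differing length bytes would have to sit inside the colliding blocks.

```python
import struct

SOI = b"\xff\xd8"  # JPEG start-of-image marker
COM = b"\xff\xfe"  # JPEG comment marker; decoders skip its body
IMG = b"\xff\xe0"  # hypothetical stand-in marker meaning "the image" in this toy


def segment(marker: bytes, body: bytes) -> bytes:
    # JPEG segment: marker, then a big-endian length that counts itself (2 bytes) plus the body
    return marker + struct.pack(">H", len(body) + 2) + body


def build_pair(img_a: bytes, img_b: bytes, pad: bytes = b"\x00" * 4):
    # Build two files that differ only in the comment length field
    seg_a = segment(IMG, img_a)
    seg_b = segment(IMG, img_b)
    short_len = struct.pack(">H", 2 + len(pad))               # comment ends right before seg_a
    long_len = struct.pack(">H", 2 + len(pad) + len(seg_a))  # comment also swallows seg_a
    tail = pad + seg_a + seg_b                               # identical bytes in both files
    return SOI + COM + short_len + tail, SOI + COM + long_len + tail


def first_image(data: bytes) -> bytes:
    # Toy parser: skip comment segments, return the body of the first non-comment segment
    if data[:2] != SOI:
        raise ValueError("not a JPEG")
    pos = 2
    while True:
        marker = data[pos:pos + 2]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + length]
        if marker != COM:
            return body
        pos += 2 + length  # jump past marker + length-counted bytes


a, b = build_pair(b"picture A", b"picture B")
print(first_image(a))  # b'picture A'
print(first_image(b))  # b'picture B'
```

Here `a` and `b` have the same length and differ in a single byte (the low byte of the comment length), yet the parser "sees" a different picture in each.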
One can insert arbitrary data into JPGs. Given that, the researchers embedded a JPG in a PDF and manipulated that arbitrary data until the two files collided.
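One step that summary skips is why a collision inside the JPEG data makes the whole PDFs collide. SHA-1 is an iterated (Merkle–Damgård) hash: it processes the input block by block, carrying an internal state forward. If two equal-length prefixes end in the same internal state, appending the same suffix keeps them colliding, so everything after the collision blocks can be identical. The sketch below shows this with a deliberately weak toy hash (a 16-bit state from truncated SHA-256, 8-byte blocks), an assumption chosen so a brute-force birthday search finds a collision instantly. The real SHA-1 collision needed a dedicated cryptanalytic attack, not brute force.

```python
import hashlib

BLOCK = 8  # toy block size in bytes (SHA-1 really uses 64-byte blocks)


def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function with a 16-bit state: truncated SHA-256 of state||block.
    # Deliberately weak so that a brute-force collision is found quickly.
    return hashlib.sha256(state + block).digest()[:2]


def toy_hash(msg: bytes, iv: bytes = b"\x00\x00") -> bytes:
    # Merkle-Damgard iteration: zero-pad to whole blocks, append the message length, chain states
    padded = msg + b"\x00" * (-len(msg) % BLOCK) + len(msg).to_bytes(BLOCK, "big")
    state = iv
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision(state: bytes):
    # Birthday search: try counter blocks until two different blocks give the same next state
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        nxt = compress(state, block)
        if nxt in seen:
            return seen[nxt], block
        seen[nxt] = block
        counter += 1


prefix = b"%PDF-1.3"                   # one shared 8-byte block, a stand-in for the common header
state = compress(b"\x00\x00", prefix)  # internal state after the shared prefix
p1, p2 = find_block_collision(state)   # two different blocks that reach the same state
suffix = b"identical remainder of both files"
print(p1 != p2)                                                         # True
print(toy_hash(prefix + p1 + suffix) == toy_hash(prefix + p2 + suffix))  # True
```

The suffix never "sees" which colliding block came before it, only the shared internal state, which is why the attackers only had to get the collision right once.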
> A picture is worth a thousand words, so here it is.
> http://shattered.io/static/pdf_format.png
This picture is meaningless to me. Can someone explain what's going on?