Post by Anthony DeMarco, Junior Copywriter
Google’s algorithm assesses the meaning of webpages by reading certain data points in the HTML structure, such as metadata, on-page text, and hyperlink anchor text. However, we know from a September 2011 Google Webmaster Central Blog post that Google also indexes PDF files. The author discloses several important facts in the blog post:
Given that Google has been indexing PDF files since 2001 and can extract large amounts of data from them, it is worth considering the pros and cons of utilizing PDFs on a website:
For SEO purposes, the advantages of converting PDFs to HTML outweigh the disadvantages. Content in HTML format is also more malleable: optimizing or updating a page is easier than recreating an entire PDF.
The major deciding factor is the scale of the project. For a small number of PDF files, the SEO advantages of converting are considerable relative to the labor involved. However, if a website relies on a large number of PDFs (e.g., a scholarly journal that uses its website to distribute thousands of academic articles in PDF format), the decision is more nuanced. The labor cost of conversion would need to be weighed against the possible increase in traffic and revenue, and such an analysis is likely to reveal that the cost outweighs the benefits.
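The cost-versus-revenue comparison described above can be roughed out in a few lines of Python. This is only a minimal sketch: the function name, its parameters, and every figure in the example (hours per PDF, hourly rate, added monthly revenue per PDF) are hypothetical placeholders rather than measured values, and the simple linear model assumes every PDF costs the same to convert and earns the same uplift.

```python
from dataclasses import dataclass


@dataclass
class ConversionEstimate:
    """Rough totals for a PDF-to-HTML conversion project."""
    labor_cost: float
    added_revenue: float

    @property
    def net_benefit(self) -> float:
        # Positive means the projected revenue covers the labor.
        return self.added_revenue - self.labor_cost


def estimate_conversion(num_pdfs, hours_per_pdf, hourly_rate,
                        added_monthly_revenue_per_pdf, months):
    """Compare labor cost with projected revenue over a time horizon.

    All inputs are assumptions supplied by whoever runs the analysis.
    """
    labor_cost = num_pdfs * hours_per_pdf * hourly_rate
    added_revenue = num_pdfs * added_monthly_revenue_per_pdf * months
    return ConversionEstimate(labor_cost, added_revenue)


# Hypothetical small site: 20 PDFs, each expected to earn a meaningful uplift.
small = estimate_conversion(20, 2, 50, 10, 12)
print(small.labor_cost, small.added_revenue, small.net_benefit)

# Hypothetical journal archive: 5,000 PDFs with a much smaller uplift per file.
large = estimate_conversion(5000, 2, 50, 1, 12)
print(large.labor_cost, large.added_revenue, large.net_benefit)
```

With these made-up inputs the small project comes out ahead and the large archive does not, but the result depends entirely on the numbers fed in. In practice, the per-PDF revenue estimate is the hardest input to pin down and deserves the most scrutiny.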
Optimizing a PDF
If the cost involved prevents conversion, there are still a number of steps that can be taken to optimize PDFs:
To learn more about PDF-to-HTML conversion, contact Performics today.