Documents vs. HTML

Files have significant shortcomings compared to HTML.

Summary

A lot of government content still exists as files, such as .pdf, .doc, and .xlsx. However, files have significant shortcomings compared with HTML.

Below, you’ll learn why we recommend converting as many of your files to HTML as you can. There are exceptions, such as when you expect users to print a piece of content instead of reading it online. Generally, though, HTML is easier to update and manage, and it’s more accessible. It’s also easier to track how HTML content is performing.

What’s HTML?

HTML, or Hypertext Markup Language, is a language for creating web content. When you fill out fields in the CMS, what you write gets automatically converted to HTML so that a browser (such as Google Chrome or Internet Explorer) can read it.

HTML is better for reading online

HTML adapts to different devices and browsers.

All Mass.gov pages are responsive. This means it doesn’t matter whether constituents read your content on a desktop, tablet, or phone: your content will fit their screen neatly.

For reference, since Jan. 1, 2018, nearly 40% of Mass.gov traffic has come from mobile visitors. When those constituents have to open .pdfs, they spend much more time and effort zooming, scrolling, and panning on their devices to read them. They also often have to download large files that may need a separate app to open.

Files tend to be larger and slow down access to information

In addition to being clunkier to navigate, .pdfs are often larger and therefore take longer to load. They often consume more of constituents’ mobile data and might require other applications to open. All of this makes it harder and slower for constituents to get to your content.

Unfortunately, these inconveniences tend to fall disproportionately on more vulnerable constituents: people who have older or slower devices, pay-as-you-go data plans, or no access to the required apps or software, and people who can reach the internet only through their mobile phones.

It’s also true that pages that take even a second or two longer to load can deter users. When it comes to loading times, every little improvement helps, and converting files to HTML can make a big difference.

HTML is (usually) more accessible

The majority of .pdf and .docx files are not accessible to constituents who use assistive technology, such as screen readers, to access content. To be accessible, a file must have been created properly and, in the case of .pdfs, usually checked or fixed in Adobe Acrobat Professional. This is an onerous process, and you shouldn’t assume it has been done for your .pdfs. On top of this, many browsers open .pdfs in their own built-in viewer instead of in Adobe Reader, which can ignore the tags that make the file accessible.

Accessibility is also a legal obligation. In the U.S., this means complying with Section 504 and Section 508 of the Rehabilitation Act and with the Americans with Disabilities Act (ADA).

You should also consider that not everybody has the right software to open the files you’re providing. For example, constituents who don’t have Microsoft Office or Adobe Reader may not be able to open .doc, .docx, or .pdf files.

It’s easier to tell if HTML content is successful

If you want to make sure that your content is performing well, you should definitely convert files to HTML.

Files are difficult to track using web analytics. Usually, you can use Google Analytics to determine how many times a file has been downloaded, but that’s about it.

With HTML content, you can track how often users have visited and whether any of those users are repeat visitors. By comparison, if someone downloads your file and opens it several times, you won’t know, because web analytics don’t measure offline behavior.

You can measure HTML content in lots of other ways, too: What do people do when they land on your content, and where do they go next? What links do they click, and how often? You can even track how far down the page users scroll and how long they spend reading it.

Lastly, HTML content usually performs better in search engines, which value responsiveness, fast load times, and accessibility. Compared with most files, your Mass.gov pages have all of these strengths by default.

HTML is easier to manage

Content you add to the Mass.gov CMS is easier to update and remove than files are. That’s because you can use the CMS to save changes directly to your HTML pages, but you can’t use Word, Excel, or Acrobat to edit files that are already on the internet. Those programs can change the copies on your computer, but published files live on a server that your software doesn’t connect to.

As a result, if you need to change a few words on an HTML page, all you need to do is log in to the CMS, edit the page, and save. The CMS will update the page, and anyone who visits it will see your change. In addition, the CMS stores every version of a page you save. If you ever need to return to an earlier version, you can open the “Revisions” tab while editing any page and revert.

However, if you need to correct a typo in a file or change a date, you’ll need to upload a brand-new version of the file. The old file will then need to be deleted or unpublished from the server where it lives. There are more moving pieces, which means more things can break or take longer. In addition, search engines like Google don’t automatically know that a new copy of your file exists. They’ll often keep listing the old file’s web address for days or even weeks, even after it’s been deleted.

When are files the right choice?

Sometimes it’s a good idea to make sure users have access to a file. If your users will want to print your content and read it offline, .pdfs are often the right choice. You also don’t have to choose one or the other: most Mass.gov pages offer a download section, so you can add a file version of an HTML page.

It might also be a good idea to include a file when you need a permanent record, when constituents might need to access content offline, or when you want readers to be able to interact with the text (for example, by highlighting it or adding margin notes).
