Documents vs. HTML

What to consider in publishing documents vs. HTML content

While a lot of government content still exists in file formats (.pdf, .doc, .xlsx, and others), files have significant shortcomings compared with HTML content.

We recommend that you convert as many of your files to HTML as you can since HTML is easier to update and manage. It’s also generally more accessible and searchable.

There are exceptions where documents are preferable, such as when you expect people to print a piece of content instead of reading it online. Some documents can also be difficult to convert, such as those that include numerous graphics.

What's HTML?

HTML, or Hypertext Markup Language, is used for creating web content. When you fill out fields in the content management system, what you write gets automatically converted to HTML so that a browser (such as Google Chrome) can read it.

HTML is better for reading online

All Mass.gov pages are responsive. This means that it doesn’t matter whether constituents read your content on their desktops, tablets, or phones: the content will fit their screen neatly.

When constituents use mobile devices to open .pdfs, they’ll spend much more time and effort zooming, scrolling, and panning to read them. They’ll also possibly need to open the files in a separate app.

In addition to being clunkier to navigate, .pdfs are often larger and therefore take longer to load.

HTML is (usually) more accessible

The majority of .pdfs and .docx files are not accessible to constituents who use assistive devices like screen readers. To be accessible, they must have been properly created or revised with Adobe Acrobat Professional. This is an onerous process, and you shouldn’t assume that this has been done for your .pdfs. On top of this, many browsers will open .pdfs in the browser instead of in Adobe Reader, leaving behind the tags that allow for accessibility in the process.

In the U.S., providing accessible content means complying with Section 504 of the Rehabilitation Act, Section 508 of the Rehabilitation Act, and the Americans with Disabilities Act (ADA).

You should also consider that not everybody has the right software to open the files you’re providing. Constituents who don’t have Microsoft Office or Adobe Reader might not be able to open .doc, .docx, or .pdf files, for example.

It’s easier to tell if HTML content performs well

If you want to make sure that your content is performing well, you should convert files to HTML.

Files are difficult to track using web analytics. Usually, you can use Google Analytics to determine how many times a file has been downloaded, but that’s about it.

With HTML content, you can track how often users have visited and if any of those users are repeat visitors. By comparison, you wouldn’t know if someone who downloads your file opens it several times, since web analytics don’t measure offline behavior.

You can measure HTML in lots of other ways, too: What do people do after they land on your content, and where do they go after? What links do they click, and how often? You can even track how far down the page a user scrolled, and how long they spent reading the page.
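To make these measurements concrete, here is a minimal sketch of how page-view records could be summarized. It is an illustration only, not how Google Analytics or Mass.gov reports are built: the PageView record, its field names, and the summarize function are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageView:
    """One visit to an HTML page (hypothetical record, for illustration)."""
    visitor_id: str         # assumed anonymous identifier for a visitor
    scroll_percent: int     # deepest point reached on the page, 0-100
    seconds_on_page: float  # time spent reading the page


def summarize(views: list[PageView]) -> dict[str, float]:
    """Summarize the page-view records collected for a single page."""
    if not views:
        return {
            "visits": 0,
            "repeat_visitors": 0,
            "avg_scroll_percent": 0.0,
            "avg_seconds_on_page": 0.0,
        }
    # Count how many times each visitor viewed the page.
    visits_per_visitor = Counter(v.visitor_id for v in views)
    return {
        "visits": len(views),
        # A repeat visitor is anyone who appears more than once.
        "repeat_visitors": sum(1 for n in visits_per_visitor.values() if n > 1),
        "avg_scroll_percent": sum(v.scroll_percent for v in views) / len(views),
        "avg_seconds_on_page": sum(v.seconds_on_page for v in views) / len(views),
    }


views = [
    PageView("a", 100, 90.0),
    PageView("b", 40, 20.0),
    PageView("a", 70, 40.0),
]
print(summarize(views))
```

A file download, by contrast, would typically produce only the first of these numbers: a count of how many times the file was requested.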

Lastly, HTML content is usually easier for search engines to find and rank, since they value responsiveness, fast load times, and accessibility. All of these are strengths your Mass.gov pages have by default when compared with most files.

HTML is easier to manage

Content you add to the Mass.gov CMS is easier to update and remove than files are. This is because you can use the CMS to save a change to an HTML file, but you can’t use Word, Excel, or Acrobat to modify files on the internet. That is, you can use them to change files that live on your computer, but files on the internet live on a server that your software doesn’t talk to.

As a consequence, if you need to change a few words on an HTML page, all you need to do is log in to the CMS, edit the page, and save. The CMS will update your HTML file, and anyone who sees it (i.e., visits your web page) will see your update. In addition, the CMS stores every version of a file you save. If you ever need to return to an earlier version, you can visit the “Revisions” tab while editing any page and revert.
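The idea of stored revisions can be pictured with a small model. This is a sketch under stated assumptions, not a description of how the Mass.gov CMS is implemented: the Page class and its save and revert methods are hypothetical names chosen for illustration.

```python
class Page:
    """A hypothetical page that keeps every saved version of its body."""

    def __init__(self, body: str) -> None:
        # The first saved version starts the revision history.
        self.revisions: list[str] = [body]

    @property
    def body(self) -> str:
        """The version visitors currently see: the most recent revision."""
        return self.revisions[-1]

    def save(self, new_body: str) -> None:
        """Saving adds a new revision; earlier ones are kept."""
        self.revisions.append(new_body)

    def revert(self, index: int) -> None:
        """Restore an earlier revision by saving it again as the newest one."""
        self.revisions.append(self.revisions[index])


page = Page("Office hours: 9 a.m. to 5 p.m.")
page.save("Office hours: 9 a.m. to 4 p.m.")
page.revert(0)
print(page.body)            # Office hours: 9 a.m. to 5 p.m.
print(len(page.revisions))  # 3
```

In this model, reverting never discards anything: the mistaken edit stays in the history, and the restored text simply becomes the newest version.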

However, if you need to correct a typo on a file or change a date, you’ll need to upload a brand new version of the file. The old file will then need to be deleted or unpublished from the server on which it lives. There are more moving pieces, which means more things can break or take longer. In addition, search engines like Google don’t automatically know that a new copy of your file exists. They’ll often keep indexing the old file’s web address, even if it’s been deleted, for days or even weeks.

When are files the right choice?

There are times when it’s a good idea to make sure Mass.gov visitors have access to a file. First, if they will want to print and read your content offline, .pdfs are often the right choice. Alternatively, you don’t have to choose: Most Mass.gov pages offer a download section, and you could add a file version of an HTML page.

It might also be a good idea to include a file in situations requiring permanent records, if constituents might need to access content offline, or if you want them to be able to interact with the text (highlight, add margin notes, etc.).
