Why can’t I find archival material on Google?

Recently the Archives received a question from a student who wanted to know: why aren’t archival records searchable with Google? Is there any way to make Google show archival results?

While on the surface this seems like a simple question, the answer is quite nuanced and depends on the practices of individual repositories. The main reason that Google doesn’t surface archival information is that the majority of archival material is not digitized (i.e., converted from a physical format to a digital one, for example by scanning paper documents). You may have heard teachers and librarians say that “80% of all information can’t be found online”; something I usually tell students is that archival material makes up a large chunk of that 80%, and it takes more effort to find. Scholarly resources behind paywalls, along with files that are still in use or not yet deposited anywhere, make up another sizable portion. Even when archival material is born-digital or digitized, it tends to be accessible and searchable mainly within the catalog or database where it lives, rather than through major search engines like Google.

Many factors explain why most archival material is not digitized, but the main one is scale. The University Archives alone contains over 30,000 cubic feet of processed material, and the time and money required to scan all of it, plus create accurate and helpful metadata for the digital objects, would be astronomical. To address the issue of access, our repository takes an approach called “digitization on demand,” meaning that we scan material for researchers as they ask for it. Separately, we identify particularly significant, fragile, high-use, or otherwise suitable candidates for larger digitization projects, often funded by grants or donors.

Once material is scanned, it is often put online for future researchers to use, but not always. Some material can be made available by request only: student records contain FERPA-protected information, copyright ownership can be unclear, or the personal privacy of individuals may be at stake (such as student protest photos in which faces are easily recognizable). Digitized or born-digital archival material with privacy or copyright concerns is typically not accessible online, though the repository will usually note this in the collection description. Here, we call these materials “nearline,” and here’s an example of what this looks like in our catalog.

That said, some archival databases are more Google-able than others. The Illinois Digital Library is relatively easy to reach through major search engines. For example, if I want to know what material from the James B. Reston Papers at the Archives has been digitized, I can Google “James B. Reston Papers,” and the link to that material in the Digital Library is the first result on the page. The second result is the catalog record for the physical material held in the Archives. Keep in mind that context is one of the core principles of archival research: when an archival result shows up in a Google search, it is removed from the broader context of the institution that holds it, and it can be difficult to tell at a glance that the result is for archival material.

Archon, the archival cataloging software that we use at UIUC, is built to allow Google to crawl and index its records so that they can show up in Google results pages. For example, I can search for “Edmund J. James,” and one of the Archives’ digitized images of James appears in the search results. Other catalogs may be closed or inaccessible to standard web crawlers. When you search Google for a book title, you don’t expect to get results for that book from your local or university library’s catalog, because Google is not searching that database. The same applies to scholarly articles held in a database behind a paywall, and it can apply to some archival material, especially when that material is described only within a larger library catalog.

A good example of archival material that does show up in Google is the Illinois Digital Newspaper Project (IDNP). The newspapers in this database were scanned from microfilm or print and then made text-searchable using optical character recognition (OCR). You can easily search within these newspapers using the IDNP database, and the results are Google-able with the right terms: for example, I Googled the name of a newspaper that I knew was in this database (the Daily Illini) along with a randomly selected date, and an article from that issue was the first result.

For the most part, however, finding and accessing archival material requires a different skillset from finding articles, books, etc. in library collections, or even from finding general information online. This is due to a variety of factors, one of the largest being that most archival material is not digitized. Another major factor is the way archival information is arranged. In archives, we strive to preserve the original order and provenance of the material in our holdings, meaning that the way materials are arranged and labelled reflects how their creator originally arranged and used them, and this arrangement may not make immediate sense to a researcher using them today.

A good example of this can be found in the Papers of UIUC’s first Regent (President), John M. Gregory. As you can see in the catalog, the archival description makes an effort to outline all the formats, topics, and significant features of the material in this series, making it discoverable through a top-level catalog search as well as through Google. However, this description is still quite general, and it takes some human parsing to determine whether what you’re looking for is within those papers. If I want to read letters between Gregory and his wife, I might search for “letters” and miss the fact that in this series, “correspondence” is the proper term, and thereby miss this result entirely.

However, archives typically try to provide more detailed access to their materials than this top-level description alone. A key part of archival records is the finding aid that accompanies them. In most cases, this is a list of all of the boxes in that series, along with a list of the folder titles within each box (or, if the box contains something like artifacts or multimedia material, a list of those contents). Depending on the software used to manage the material, these finding aids may be searchable through Google, or they may be stored in a format that Google and other search engines can’t index. At the UIUC Archives, many of our finding aids are saved as PDFs. These PDFs are indexed by Google, but they can be very difficult to pick out among other search results if you are not expecting a PDF result, or if you lack the context to recognize that you are looking at a finding aid. Our catalog, however, can run a deep search through those PDFs, so you can find material in these finding aids more easily while on our website (context is key!).

It’s important to note here that the folder or item titles in the finding aid typically reflect what the creator called them and how the creator organized them – so if I want only correspondence between John Gregory and his wife, Louisa, I’m going to have to either look through all of the physical folders of material or narrow my search by date, since that’s how Gregory arranged his correspondence (rather than by the name of his correspondents). In many cases, you’ll have to search quite broadly and with an open mind to find what you’re looking for in archives.

Archival searching also tends to require a bit of background knowledge about the person or organization whose records you are seeking. If I wanted to know more about Gregory’s interactions with early UIUC faculty, I could check his papers for correspondence with those individuals, but I might also want to see whether any of those faculty members’ papers are held in the Archives, as they might offer more insight and a different perspective. If I want to know more about President Draper’s involvement in student discipline, I might check his papers, but also the University’s Board of Trustees reports and Council of Administration Minutes, knowing (as I do from doing reference here) that those bodies handled most student disciplinary action in the early days of the University.

So, what do you do if you don’t know where to start looking for the archival information you need? First of all, most archives employ reference archivists or staff whose job is to answer questions and connect researchers with the material that will best answer their questions. These people are there to help you, so don’t hesitate to reach out to them! We encounter researchers every single day at different stages in their projects. Some people contact us with the exact series numbers and boxes they want to view or the folder titles they want scanned, while others have more general questions, like “what information do you have on Fred Francis?” and “what was the first year that UIUC required SAT/ACT scores for undergraduate admission?” My point is that no question is a bad one, though the archivist might ask you to narrow down your request if it’s too broad.

If you don’t know where to begin finding the archival material to answer your research question, a place that I like to start is ArchiveGrid. This site is something like the WorldCat of archival repositories. If you know the name of an individual, organization, event, etc. that you need archival material on, you can search this database for material around the world; this will at least help you locate which repositories to start reaching out to.

In short, there are several reasons why you might have trouble finding archival information on standard search engines like Google. Archives contain vast amounts of historical records, largely stored in physical format, which require a different skillset to find and access than most other types of material. For this reason, even when they are technically “Google-able,” you’re most often better served by searching for them in their source databases than on the web.

Have questions, comments, or a topic you want to research using archival material but don’t know where to start? Reach out to our reference team at illiarch@illinois.edu. We are here to help you get access to the primary sources you need!
