We Are All Archivists Now

ar·chi·vist

/ˈärkəvəst, ˈärˌkīvəst/

noun: archivist; plural noun: archivists
a person who maintains and is in charge of archives.

In the Beginning…

Before there was the written word, at the beginnings of human civilization, the professions of entertainer and historian were blurred and often filled by the same person within a settlement or town. This person held great power, for they were the nascent civilization's store of knowledge. But with great power comes great responsibility: the wielder of the town's knowledge was expected to remember it accurately. Some, of course, had a knack for remembering. Some used mnemonic devices. Some were just really good at making up plausible details that fit the listener's world view when they did not know the facts. Most relied on a combination of all three. So when the written word emerged in Mesopotamia, it was nothing short of revolutionary.

The written word transformed humanity. No longer was history conflated with entertainment by necessity. We could now be objective, and we could record vastly more information than ever before and recall it perfectly later. An oral history might include a rough idea of the crop yields during a ruler's reign, and maybe the overall yield of the fields in a territory in recent years, but with writing we could now record the yield of each field in every year along with the taxes due, the taxes collected, the debts owed and how the debts were discharged.

All this information came at a cost. Massive troves of clay tablets were produced, but those tablets had to be protected from the elements and organized so that a particular tablet could be located at a later date. The scribes who wrote these tablets, we believe, became their first keepers and thus the first archivists.

The medium for the storage of information would change with time and location. Some records, such as those carved in stone, proved more durable than others written on papyrus. Some records, such as those written on papyrus, were more economical to produce and store than others carved in stone. Some governments would create centralized stores of knowledge, such as the Library of Alexandria or the Library of Congress, while others, especially after the invention of the printing press, created distributed systems.

Intuitively we know that these collections of information were archives and that the people who assembled, maintained, curated, indexed and managed them were archivists. Yet it is likely that the vast majority of archives were not thought of as archives, nor did the vast majority of archivists think of themselves as archivists. Archives were largely incidental to the administration of religion, government and business. The archivists likely thought of themselves as accountants, clerks, librarians, engineers, architects, photographers, film producers… anything but archivists.

The Internet AKA the Democratization of Archives

There is this place called CERN on the border between France and Switzerland where physicists congregate. Whatever else you have heard or think you know about CERN, you should know that the people at CERN are in the business of making data. Gobs and gobs of data. Physicists are a peculiar bunch. Do you think the New York Times has a lot of data? The entire history of the New York Times, from 1851 to the present day, is approximately as much data as the physicists at CERN record every day. So it is perhaps little wonder that among the physicists at CERN we find the guy who changed everything.

Tim Berners-Lee didn't invent the internet. The internet was invented by Leonard Kleinrock, Paul Baran, Donald Davies, Lawrence Roberts, Robert Kahn and Vint Cerf, among others. Paul Mockapetris, Jon Postel, Douglas Terry, Mark Painter, David Riggle, Songnian Zhou, Kevin Dunlap, Mike Karels, Phil Almquist, Paul Vixie and a whole host of others made the internet usable by humans. What Tim Berners-Lee did was solve the problem of how to find a paper at CERN, and in doing so he made the internet useful and turned us all into archivists.

Tim Berners-Lee had a problem: he wanted to find papers related to his research. Physicists, like other academics, write papers. As with other academics, the better papers eventually get published in peer-reviewed journals. Unlike those in many academic disciplines, physicists sometimes work together in large groups, groups that fill small cities such as those around CERN. At a university a professor may read preprint articles written in their department. An average-sized department, though, is generally about a dozen faculty and another dozen graduate students, so the body of work being produced is well within the ability of the average person to sift through. Large departments at larger universities may have a hundred faculty, about as many postdocs and maybe twice that many graduate students, which gets beyond the ability of most of us to keep track of. Such departments have historically employed a librarian or, more commonly, fractured into sub-disciplines that rarely talk to each other. At CERN there may be ten thousand people writing and circulating papers at any given time. This makes keeping works organized difficult even for a librarian. Tim Berners-Lee's solution was the Uniform Resource Locator, URL for short. The URL lets anyone find a document shared on the internet. It lets document authors using the HyperText Markup Language (HTML) create links between works. This changed the world.

Traditional academic journals like Physical Review Letters publish articles. And while some of these articles are readable enough to be photocopied and pinned to cork boards, many are horrible morasses of data, derivations and disputes over the interpretation and validity of data… which is to say, the kind of stuff you need to support your thesis and to make sure someone else has not shown a result critical to your thesis to be false. Thus the journals printed indexes. Often these indexes were printed yearly, covered only the articles published in that year and at most included a short abstract. Tracking ideas through this system was slow and tedious, and it often led to out-of-date work being cited. The URL and HTML changed all of this. Following a thread of thought now meant just clicking on links embedded in the text.
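To make "links embedded in the text" a little more concrete, here is a minimal sketch in Python, using only the standard library. It splits a URL into its parts and collects the links found in a scrap of HTML. The page address, the page text and the names LinkCollector and extract_links are all made up for illustration; this is not how any particular browser or index actually works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> links, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the hyperlinks we care about here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return every link target in html_text as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical page and addresses, invented for this example.
PAGE_URL = "http://example.org/papers/kaons.html"
PAGE_HTML = """
<p>This result extends <a href="earlier-result.html">an earlier measurement</a>
and disputes <a href="http://example.net/theory/claim.html">a competing claim</a>.</p>
"""

parts = urlsplit(PAGE_URL)   # scheme, host and path of the document
links = extract_links(PAGE_HTML, PAGE_URL)

for link in links:
    print(link)
```

Each printed address is itself a URL that can be fetched and scanned in the same way, which is all that "following a thread of thought" amounts to mechanically.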

Archives

ahr-kahyv

noun: archive; plural noun: archives
Usually, archives. documents or records relating to the activities, business dealings, etc., of a person, family, corporation, association, community, or nation.
archives, a place where public records or other historical documents are kept.
any extensive record or collection of data:
The encyclopedia is an archive of world history. The experience was sealed in the archive of her memory.
Digital Technology
  • a long-term storage device, as a disk or magnetic tape, or a computer directory or folder that contains copies of files for backup or future reference.
  • a collection of digital data stored in this way.
  • a computer file containing one or more compressed files.
  • a collection of information permanently stored on the Internet:
    The magazine has its entire archive online, from 1923 to the present.

The important point in the definition is that an archive contains information, and that the information has a context which links individual pieces of information together and excludes unrelated information. Therefore we can see that libraries, photo albums, papers stuck into a family bible, recipe boxes and seed collections are all archives. Archives are important because they embody our culture in a persistent manner, though the quality of some archives may be better than others.

Ad Hoc Archives become Museums and Libraries

We think of archives throughout most of human history as being ad hoc. In Western Europe, the first formal archives start to appear towards the end of the 16th century, though it is not until the beginning of the 20th century that techniques for cataloging and conservation advance beyond handwritten lists of works and common-sense storage, e.g. keeping works out of damp environments. At a glance this dovetails neatly with the narrative of the spread of the printing press and paper bringing about a revolution in the availability of information, thus incidentally leading to the need for storage of information. Yet archeologists have found troves of clay tablets from the Bronze Age enumerating commerce in the eastern Mediterranean and the taxes levied. Why did it take millennia for humans to learn to value this information, to take the time to preserve and catalog it?

While humans have been recording information for more than four millennia, for much of this time the longevity of that information was of limited value. The earliest records were those deemed important enough to write down, and we wrote them down because we wanted a record of them for longer than the unaided human could remember them. Information like judgements, tax collections, and the trade records from which taxes could be assessed met the criterion of being important enough that we could not trust it to memory. However, judgements did not need to be remembered beyond the lifetime of those involved, tax assessments were only needed until the necessary payment was made, and records of tax payments were only needed until the next levy. Furthermore, as kingdoms and empires rose and fell, and as religions grew and were replaced, the records of the old order were destroyed intentionally as often as not. Wiping the slate clean, an allusion to erasing records kept as dust on a stone surface, was sometimes employed as a tactic to bring people to an upstart's side. Thus, limited longevity of records was seen not as a problem but as a feature.

After the fifth century, the Roman hegemony fragmented. The center of European culture shifted from the mild climate of Italy to the harsher climates of Northern Europe and the British Isles, and it can be argued that it was the climate as much as the church that forced Europeans to take time to care for precious records. Alternatively, it can be argued that the rise of archival science is a result of the explosion of long-distance communication associated with the Enlightenment.

What is a Digital Archive?

What is an archive really?

In a sense the Internet both is and is not an archive. It is limited in scope to publicly accessible digital information, and that information is linked together, but the Internet is more a platform for archives, e.g. the Wayback Machine and Wikipedia. As envisioned by Tim Berners-Lee and subscribers to the idea of the Semantic Web, we are all archivists. We, people, have an interest in something, say flying a kite, and to connect with other people interested in flying, kites and flying kites we would create and curate HTML documents collecting pictures, video and text related to flying kites. Then we would share this information on the internet. Internet archives pass the culture test: vast swaths of our culture since 1993 CE exist only in Internet archives. They also have many of the same quality issues as traditional archives.

Guidance

Problems

Where is the data stored?

How to publish?

How to index?

How to handle references and attribution when anyone can publish, then later edit?

What about data that shouldn't be shared publicly?