How to Stop The Internet's Vanishing Act

Future News 143

Feb 01, 2023

We can thank the inefficient rubbish men of ancient Egypt and unique weather conditions for the little knowledge we do have about early human civilizations.

Archaeologists and classicists have been excavating and deciphering fragments of papyrus and animal-skin parchment from a site south-southwest of Cairo since the late 19th Century.

The 500,000 fragments of literary, cultural, legal, scientific, administrative, religious and political documents at Oxyrhynchus go as far back as the last dynasty of ancient Egypt, the Ptolemaic Kingdom (established in 305 BC), and run through to the Muslim conquest of Egypt in the 7th Century AD.

The texts are written in Greek, ancient Egyptian, Coptic, Latin, Arabic, Hebrew, Aramaic, Syriac, and Pahlavi, making the already mammoth task of putting together and translating the scraps of information even more difficult.

How hard exactly? The University of Oxford’s annual Oxyrhynchus Papyri has catalogued more than 5,000 items so far. The project started in 1898 and continues to this very day.

All the while, a more vivid picture of our forefathers’ true history, from day-to-day work to full politico-religious structures, is yet to be fully revealed.

Even when it comes to more recent figures, such as 16th Century-born William Shakespeare, we don’t know as much as we think we do (see Bill Bryson’s Shakespeare: The World as Stage).

That is probably because figures of the past, beyond academics and amateur historians, didn’t think much of preserving their history as they looked into the future.

The modern media age has allowed us to counter this slide into the historical abyss and document our lives in a multitude of accessible ways, including on film, in books and in audio format (Studs Terkel’s oral histories are a stand-out).

Libraries and other institutions have traditionally catalogued and preserved these cultural, scientific and political items. This should all be much easier in our digital age. But there’s a big, fundamental problem.

The Problem of Link Decay

The old internet is dissolving before our very eyes. That is because hyperlinks, the connective tissue of the World Wide Web, are deteriorating and becoming disconnected.

The issue was first identified in the late 1990s by Wallace Koehler in the Journal of the American Society for Information Science. Koehler explored the “short memory” and “mind changing” phenomena of the Web, later discovering that external hyperlinks typically have a half-life of two years.

That means if you had a website with 100 external hyperlinks and you checked back in two years’ time, only 50 would be pointing to the correct webpage. Some webpages would have died, while other URLs would still resolve but show different content, a problem called ‘content drift’.
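To make the half-life arithmetic concrete, here is a minimal sketch. It assumes links decay at a steady exponential rate, which is what a half-life implies but is a simplification of real-world behaviour. The function name and its parameters are made up for illustration.

```python
def links_remaining(initial_links: int, years: float,
                    half_life_years: float = 2.0) -> float:
    """Expected number of links still pointing to the right page.

    Assumes simple exponential decay: every `half_life_years`,
    half of the surviving links stop working.
    """
    return initial_links * 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    # 100 links today, checked again after 2, 4 and 10 years.
    for years in (2, 4, 10):
        print(years, "years:", links_remaining(100, years))
```

Under that assumption, the 100 links become 50 after two years and 25 after four. Roughly three would be left after a decade.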

Considering that there are estimated to be more than 1.7bn websites, the scale and scope of link decay is vast and leads you to one conclusion: most of the internet is dead, and the rest of it is dying.

Sir Tim Berners-Lee, inventor of the Web, is well aware of the issue and valiantly urged early webmasters to be more careful when changing URLs.

“When someone follows a link and it breaks, they generally lose confidence in the owner of the server. They also are frustrated - emotionally and practically from accomplishing their goal,” he wrote.
“Enough people complain all the time about dangling links that I hope the damage is obvious. I hope it also obvious that the reputation damage is to the maintainer of the server whose document vanished.”

Impact On The Present

The real-life consequence is that important information is going missing right, left and centre. Think about the courts and think about journalism, the first draft of history.

A 2021 Harvard University study of The New York Times’ website found that 25% of all links were completely inaccessible, with link rot becoming more common over time.

“6% of links from 2018 had rotted, as compared to 43% of links from 2008 and 72% of links from 1998. 53% of all articles that contained deep links had at least one rotted link.
“On top of that, some reachable links were not pointing to the information journalists had intended.
“An additional 13% of “healthy” links from a human-reviewed sample of 4,500 had drifted significantly since publication, with content drift becoming more common over time – 4% of reachable links published in articles from 2019 had drifted, as compared to 25% of reachable links from articles published in 2009.”

How To Help

The researchers offered up some helpful suggestions to counter the decay, including working to improve the visibility of digital archives and considering what to link to in the first place.

“Before linking, for instance, journalists should decide whether they want a dynamic link to a volatile web—risking rot or content drift, but enabling further exploration of a topic—or a frozen piece of archival material, fixed to represent exactly what the author would have seen at the time of writing,” they said.
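One way to picture the "frozen" option is a link that points at an archived copy rather than the live page. The sketch below builds such a link using the Wayback Machine's public address pattern (web.archive.org/web/ followed by a timestamp and the original URL). The function name is hypothetical, and the sketch assumes a capture exists near the chosen date; the Wayback Machine serves the capture closest to the timestamp requested.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def frozen_link(url: str, seen_on: date) -> str:
    """Build a Wayback Machine link for `url` as it looked around `seen_on`.

    This only constructs the address. It does not check that a capture
    actually exists for that page.
    """
    timestamp = seen_on.strftime("%Y%m%d")  # e.g. 20230201
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


if __name__ == "__main__":
    # The dynamic link and its frozen counterpart, side by side.
    live = "https://example.com/story"
    print(live)
    print(frozen_link(live, date(2023, 2, 1)))
```

A writer could publish both: the live link for further exploration and the frozen one as a record of what they saw at the time.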

In addition, the academics urged newsrooms to consider adopting tools to help seamlessly preserve their work, including partnerships between information professionals, libraries and news media outlets.

A great example would be the work of the Internet Archive. Most journalists will be familiar with the initiative’s Wayback Machine, which allows users to look up old and sometimes defunct websites.

The digital library is much more than that, holding more than 36m books and texts, over 11m movies and videos and 15m audio files. So far the Wayback Machine has been able to preserve 780bn web pages.

The US-headquartered non-profit has been running since 1996 after being established by Brewster Kahle, who also co-founded web traffic analysis business Alexa Internet. Preserving a webpage via the Internet Archive is remarkably simple and only takes a minute or two (you will find some of Future News’s top posts in the archive).
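For readers who would rather script the check than click through it, here is a rough sketch. It asks the Wayback Machine's public availability endpoint whether a page already has a snapshot and, if not, returns the "Save Page Now" address to open in a browser. The helper names are made up. The response shape (an "archived_snapshots" object holding a "closest" entry) is what the endpoint returned at the time of writing and should be treated as an assumption that may change.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url: str) -> str:
    """Address that asks the Wayback Machine whether `url` has a snapshot."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the snapshot address out of an availability response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_or_request_snapshot(url: str, timeout: float = 30.0) -> str:
    """Return an existing snapshot address, or the save address to visit."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        payload = json.load(response)
    snapshot = closest_snapshot(payload)
    if snapshot:
        return snapshot
    # No capture found: visiting this address asks the archive to make one.
    return SAVE_ENDPOINT + url


if __name__ == "__main__":
    print(find_or_request_snapshot("https://example.com/"))
```

The network call only happens when the script is run directly. The two smaller helpers can be tried out without touching the archive's servers.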

The internet archaeologists are also always looking for volunteers. A standout project is the effort to try and archive newsletters, which present their own problems, including having to reply to an initial email for confirmation. Another team is hoping to archive YouTube’s metadata.

But why should anyone care about all of this in the first place? The case of the Oxyrhynchus site shows us just how fragile our history, culture and shared knowledge are.

Looking back at the grand scope of humanity’s existence, we are part of the 300-year-old industrial age, in which the internet age (over 30 years and running) is an extremely important part when it comes to productivity, progress and technological advancement.

Through carelessness, indifference and ignorance, we are allowing our history to disintegrate. If nothing else, old knowledge, as Google is discovering, can help save lives.

📺 Media and tech questions to think about

Is the BBC caught between a rock and a hard place on impartiality?
Will BuzzFeed quizzes improve under the guise of AI?
Should we now call Twitter a fintech company?
Can Snap become one of the major social media players?
Spotify now has more than 200m paid subscribers, where does it go next?

📖 Essays

📧 Contact

For high praise, tips or gripes, please contact the editor at iansilvera@gmail.com or via @ianjsilvera. Follow on LinkedIn here.

FN 142 can be found here
FN 141 can be found here
FN 140 can be found here
FN 139 can be found here
FN 138 can be found here

Tech, Power & Media
