I looked at 17,702 links I’ve saved since 2009 to see how bad “link rot” really is

Posted by Matt Birchler
— 4 min read
I’ve been saving links to Pinboard since 2009, which has added up to 17,702 links saved. I’ve used these saved links differently over the years, but ultimately, these are web pages I found interesting at the point I saved them. I saved them for later, and now it’s much later.

My question: how many of these pages are still there? After all, a few weeks ago I told young people to download the things they love so that they don’t disappear, so let’s see how many pages are still accessible.

Methodology

I imported my Pinboard collection into AnyBox, created smart lists broken down by year, and ran AnyBox’s broken link checker on each year’s saved links. After that it was some simple math to see what percentage of links were still alive.

This method is limited by the amount of time I can devote to it. Some of these URLs will "load" but just lead to a "the page you are looking for isn't here" page, so consider these numbers an overcount of living links. Overcounted by how much? Sadly I'm not sure (again, time).
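For anyone who wants to run a similar check without AnyBox, here is a minimal sketch of the idea in Python. It is not how AnyBox's checker works (I don't know its internals); the `classify` and `check_url` names, the user agent string, and the list of "soft 404" phrases are all assumptions made up for illustration. It tries to flag the "loads but isn't really there" pages described above, which a plain status-code check would count as alive.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical phrases that often show up on "soft 404" pages; tune for your own links.
SOFT_404_HINTS = ("page not found", "isn't here", "no longer available")


def classify(status, body):
    """Label a response as 'alive', 'soft-404', or 'dead'."""
    if status is None or status >= 400:
        return "dead"
    text = body.lower()
    if any(hint in text for hint in SOFT_404_HINTS):
        return "soft-404"
    return "alive"


def check_url(url, timeout=10):
    """Fetch a URL and classify it; any network failure counts as dead."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-rot-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Only read the start of the page; that is enough for the phrase check.
            body = response.read(200_000).decode("utf-8", errors="replace")
            return classify(response.status, body)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return classify(err.code, "")
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failures, timeouts, refused connections, malformed URLs, and so on.
        return "dead"
```

A phrase match is a crude heuristic: it will miss plenty of soft 404s and occasionally flag a real article that happens to quote one of the phrases, so the "soft-404" bucket is best treated as a list to spot-check by hand.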

Obviously this data set is not going to be a perfect match for everyone, so consider this a single data point and not a sweeping statement that will be true everywhere, although I would suspect the general trend is pretty similar.

Results

The headline result here is that 25.9% of the links I’ve saved in the past 15 years are dead. Here’s the breakdown by year:

And here are the raw numbers:

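| Year | Links Saved | Broken Links | Percent Alive |
|------|-------------|--------------|---------------|
| 2009 | 14 | 3 | 78.6% |
| 2010 | 220 | 96 | 56.4% |
| 2011 | 38 | 15 | 60.5% |
| 2012 | 2,133 | 788 | 63.1% |
| 2013 | 3,176 | 1,095 | 65.5% |
| 2014 | 4,329 | 868 | 79.9% |
| 2015 | 4,000 | 718 | 82.1% |
| 2016 | 1,757 | 290 | 83.5% |
| 2017 | 430 | 40 | 90.7% |
| 2018 | 0 | 0 | N/A |
| 2019 | 134 | 7 | 94.8% |
| 2020 | 517 | 30 | 94.2% |
| 2021 | 904 | 89 | 90.2% |
| 2022 | 2 | 0 | 100.0% |
| 2023 | 48 | 0 | 100.0% |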
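
The "simple math" behind the last column can be sketched like this. The rows are copied from the table above; the `percent_alive` helper name is my own invention. Summing the rows directly gives 4,039 broken out of 17,702 links (about 22.8% dead), so the exact headline percentage depends on how the totals were tallied.

```python
# (year, links saved, broken links), copied from the table above
ROWS = [
    (2009, 14, 3), (2010, 220, 96), (2011, 38, 15), (2012, 2133, 788),
    (2013, 3176, 1095), (2014, 4329, 868), (2015, 4000, 718),
    (2016, 1757, 290), (2017, 430, 40), (2018, 0, 0), (2019, 134, 7),
    (2020, 517, 30), (2021, 904, 89), (2022, 2, 0), (2023, 48, 0),
]


def percent_alive(saved, broken):
    """Return the share of links still alive, or None when nothing was saved."""
    if saved == 0:
        return None
    return round(100 * (saved - broken) / saved, 1)


total_saved = sum(saved for _, saved, _ in ROWS)
total_broken = sum(broken for _, _, broken in ROWS)

for year, saved, broken in ROWS:
    alive = percent_alive(saved, broken)
    print(year, "N/A" if alive is None else f"{alive}%")

print(f"{total_saved} saved, {total_broken} broken")
```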

A couple things of note:

  1. The sample sizes from a couple years are quite small (2009, 2011, 2022, 2023), so they may not be as representative as other years. Still, those years are all basically in line with the larger trends, so maybe they hold up fine.
  2. I basically took 2018 and 2022 off from using Pinboard, apparently.
  3. Despite saving things from different places over the years, the trend is quite consistent anyway.
  4. Things never seem to fall off a cliff, but about 3-4% of links die off every year, which really adds up.
      1. As a quick note, I sometimes research old gadgets, and trying to find articles about these things from the time can be more difficult than you’d expect. Given this rate of link rot, I would expect 77% of all things written about the original iPod release to be gone by now, for example.
      2. One more note: if I had to guess, I would bet that we have gotten better about things staying online for longer in the past decade, so there likely is a cliff at some point in the early 2000s when we weren’t as focused on preservation.
  5. I was really into saving links in 2014-15 (11.4 links per day!).
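
The 77% iPod estimate in point 4 lines up with a straight-line extrapolation: 22 years (2001 to 2023) at 3.5% per year, the midpoint of the 3-4% range. Here's a small sketch of that arithmetic. The function names are made up, and the second function is a different assumption I'm adding for comparison, not something measured here: it supposes each year removes a fraction of the links that are still alive rather than a fixed slice of the original set.

```python
def linear_loss(annual_rate, years):
    """Share of links gone if a fixed slice of the original set dies each year."""
    return min(1.0, annual_rate * years)


def compounding_loss(annual_rate, years):
    """Share of links gone if each year removes a fraction of what is still alive."""
    return 1 - (1 - annual_rate) ** years


years = 2023 - 2001  # original iPod release to the year of this post
rate = 0.035         # midpoint of the 3-4% range

print(f"linear: {linear_loss(rate, years):.0%}")
print(f"compounding: {compounding_loss(rate, years):.0%}")
```

The straight-line version reproduces the 77% figure, while the compounding version lands closer to 54%. The data above isn't detailed enough to say which model fits better, so the real number could sit anywhere in that range (or outside it, if there is an early-2000s cliff).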

Takeaway

My takeaway here is the same as it was in my previous article: save the things you really love.

My next project is to find a good way to archive the remaining sites on this list in a reasonable format that I can browse in the future. AnyBox lets you save pages as a .webarchive or .pdf file, which is great, but it saves each one in a separate folder inside its app package. I’d love to be able to save these all into my own folder structure and open them in something like Obsidian so I could easily search them later and browse them when I’m feeling nostalgic. I could certainly do this with some manual work, but with this many links, I really want to automate basically all of it if I can.
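
As a starting point for that automation, here is a rough sketch of one possible folder layout: one Markdown note per bookmark, grouped by the year it was saved, which a tool like Obsidian could then index. Everything here is an assumption rather than a tested workflow. In particular, I'm assuming each bookmark is a dictionary with `href`, `description`, and `time` keys and a timestamp like `2014-06-01T12:00:00Z` (which is how I understand Pinboard's JSON export to look), and the `slugify`, `note_path`, and `write_note` helpers are hypothetical names.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(title, max_length=60):
    """Turn a page title into a filesystem-friendly name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-") or "untitled"


def note_path(root, bookmark):
    """Place each bookmark under <root>/<year saved>/<slug>.md."""
    # Assumed timestamp format; adjust if your export differs.
    saved = datetime.strptime(bookmark["time"], "%Y-%m-%dT%H:%M:%SZ")
    return Path(root) / str(saved.year) / f"{slugify(bookmark['description'])}.md"


def write_note(root, bookmark):
    """Write a small Markdown note with the title, source URL, and save date."""
    path = note_path(root, bookmark)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"# {bookmark['description']}\n\n"
        f"- Source: {bookmark['href']}\n"
        f"- Saved: {bookmark['time']}\n",
        encoding="utf-8",
    )
    return path
```

This only creates the index notes, not the archived page content itself, and two bookmarks with the same title in the same year would overwrite each other, so a real version would need a collision check and a step that drops the saved .webarchive or .pdf next to each note.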

Similarly, Pinboard offers a full archive export, which might be even better here, since Pinboard should be creating archives of these saved links when I save them. Why does that matter? Well, my attempts to archive pages with AnyBox will save the 2023 versions of these websites, while Pinboard could have the period-accurate styling, which will be a more authentic representation of what I was saving.

I requested my Pinboard backup a few days ago and haven’t gotten the email with a download link yet. On my account page in the app I see this message:

15150 of your bookmarks have been archived, representing 86% of your collection. This consumes 52.36 G of disk space.

That’s a significant amount of data, so I’m definitely questioning whether this request is just going to time out and never run. Might be time to reach out to support if this doesn’t work in the next day or two, so fingers still solidly crossed here.