Flutterby™! : A resilient web


A resilient web

2014-06-19 22:54:43.825282+00 by Dan Lyke 7 comments

So back in 1994, when Meuon and I first started Chattanooga On-line, a reasonable dial-up speed was a 28.8k modem, which the "Convert Everything download speed calculator" says is 12.36 MB/hr. My current home ADSL2+ line is syncing at about 8Mbps, which by that same standard runs about 3.76 GB/hr, or about 300 times faster.

EIDE hard drives happened in 1994, which broke the 540 megabyte limit on hard drive size. If you were to buy a new leading-edge hard drive today it'd be about 4 terabytes. So that's, what, 8000 times larger? And a hell of a lot cheaper.

When I left Pixar, the graphics R&D group was exploring the fundamental changes in rendering that might occur as the available memory far outstripped the needs of resolution. All of a sudden the cost of the frame buffer was no longer a limiting factor: Store a link to the geometry that made up each pixel rather than compute small sections of the screen at once? Sure, why not? Toy Story had a resolution of 1536x962, about 1.5 megapixels per frame. Modern 4k video is 12.6 megapixels, a ratio of a little over 8, and in the meantime memory sizes have grown by thousands.

Seems like there's a similar differential in growth between bandwidth and storage. And the N² problem of the network, as exposed through the current fights over Net Neutrality, suggests that the link to the end user is likely to continue to stagnate, while their local storage continues to explode.

The web is degrading. Bit rot means a random walk back through the Flutterby archives reveals lots of links to spam farms, domains that were abandoned and bought up by people desperate for traffic, and many of those then abandoned yet again as Google changed their algorithms. The vast interconnected database that Vannevar Bush or Ted Nelson or even Tim Berners-Lee once envisioned has, essentially, become centralized: We don't find things by following links as much as we do by searching.

Heck, last summer Charlene and I went to the "Weave Your Heart in San Francisco" square dance convention, but if I reference that by URL now, you get a page selling cash advances and credit card processors, not a record of a cultural event with depth and meaning.

In my utopian world, we'd use the increasing gap between storage and bandwidth to move documents around, and cache them. If we had the legal and cultural framework, there's no reason a blog like Flutterby couldn't cache a document, referenced by some GUID, and if the original source for that document disappeared or moved, we'd still have context.
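
As a rough illustration of what that caching might look like, here is a minimal Python sketch. The function names, the cache layout, and the choice of a SHA-256 content hash as the "GUID" are my own assumptions about one way this could work, not anything Flutterby actually does: fetch a linked document once, store it under its hash, and fall back to the local copy if the original URL stops resolving.

    import hashlib, os, urllib.request

    CACHE_DIR = "cache"

    def cache_document(url):
        """Fetch a document and store it keyed by the SHA-256 of its content."""
        data = urllib.request.urlopen(url).read()
        guid = hashlib.sha256(data).hexdigest()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(os.path.join(CACHE_DIR, guid), "wb") as f:
            f.write(data)
        return guid  # link to this GUID alongside the original URL

    def resolve(url, guid):
        """Prefer the live document; fall back to the cached copy if it is gone."""
        try:
            return urllib.request.urlopen(url).read()
        except OSError:
            with open(os.path.join(CACHE_DIR, guid), "rb") as f:
                return f.read()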

Right now, nobody much cares, but there's something there... something I want to see happen...

[ related topics: Pixar Interactive Drama Politics Weblogs Animation Spam broadband Invention and Design Bay Area Sociology Law Monty Python Graphics California Culture Chattanooga Video Databases Archival ]

comments in ascending chronological order:

#Comment Re: made: 2014-06-20 01:55:22.330725+00 by: Jack William Bell

From the Wikipedia Page for Project Xanadu:

While at Autodesk, the group, led by Gregory, completed a version of the software, written in the C programming language, though the software did not work the way they wanted. However, this version of Xanadu was successfully demonstrated at The Hackers Conference and generated considerable interest. Then a newer group of programmers, hired from Xerox PARC, used the problems with this software as justification to rewrite the software in Smalltalk. This effectively split the group into two factions, and the decision to rewrite put a deadline imposed by Autodesk out of the team's reach. In August 1992, Autodesk divested the Xanadu group, which became the Xanadu Operating Company, which struggled due to internal conflicts and lack of investment.

The text above, which I had to cut and paste into a blockquote, would have been supported by Xanadu as a 'transclusion' and stored locally even if the original disappeared. Moreover, if the original changed, the fragment could be updated locally, but with history kept to compare against the originally stored text. Aaaaannd if the original was copyrighted and shared under license terms, everyone who read the transclusion would pay a fractional cent for the rights to do so.

But none of that happened. Why? The clue is in the text above: Xanadu was complex, it was closed source, and you had various stakeholders with different ideas of how it should work, while at the same time someone with money was bankrolling it and expecting an eventual payoff.

Contrast and compare to Tim Berners-Lee and the World Wide Web. It had some of the same fragmentation problems, but the basic protocols and much of the original code were Open Source. Plus it was dead simple. So simple, in fact, that it took more than a decade before a genius figured out why HTTP was so powerful.

There is a lesson there, I think. But it doesn't solve your problem. Except...

...Except maybe it does. I think there might be some things we could do with RSS to enable the power of transclusions. I've been thinking about it for a long time, as a matter of fact.

#Comment Re: made: 2014-06-20 10:15:30.359301+00 by: meuon

"there's no reason a blog like Flutterby couldn't cache a document, referenced by some GUID, and if the original source for that document disappeared or moved, we'd still have context."

I hate the GUID word, but I like the idea. The GUID would need a checksum/signature component, so that the document you found on the web as "ABC123..." could be verified as the original, and not one with the content replaced, before you actually viewed it.
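
A tiny sketch of that check, assuming (my assumption) that the GUID simply carries a SHA-256 digest of the content; any collision-resistant hash would serve:

    import hashlib

    def matches_guid(guid, content):
        """True only if the content still hashes to the digest carried in the GUID."""
        return guid.lower() == hashlib.sha256(content).hexdigest()

That way a copy fetched from some third-party cache can be checked against the GUID before it is ever displayed.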

#Comment Re: made: 2014-06-20 17:13:53.672752+00 by: Jack William Bell

Why do you need a GUID? Each page already has a guaranteed unique identifier: the URL. Just use that as the ID when you store locally.

Moreover, if you set up an RSS or Atom feed from the originating site you can get notifications that a page has changed, along with the changed content, with the URL referenced in the RSS feed.

Provide a way to keep history and diffs in your local storage, use RSS/Atom feeds for sync and to load the content free of the other crap on the page you don't care about, come up with some way to identify the 'transclusions' (excerpts) you want to display that also maintains fair use policy, provide a link that either takes you to the originating page or to the stored page if the originating page is not available (when someone needs more context) . . . Bob's your uncle.
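
As a sketch of that feed-driven sync, here is one way it might look in Python. Everything here, from storing one history file per URL to diffing revisions with difflib, is my own guess at a minimal shape for the idea, not an existing tool:

    import difflib, json, os, urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    STORE = "store"  # one JSON history file per page, keyed by a safe form of its URL

    def sync(feed_url):
        """Pull an Atom feed and append any changed entry content to local history."""
        tree = ET.fromstring(urllib.request.urlopen(feed_url).read())
        for entry in tree.iter(ATOM + "entry"):
            url = entry.find(ATOM + "link").get("href")
            text = (entry.findtext(ATOM + "content") or "").strip()
            path = os.path.join(STORE, url.replace("/", "_"))
            history = json.load(open(path)) if os.path.exists(path) else []
            if not history or history[-1] != text:
                if history:
                    # keep a readable diff against the previous revision
                    print("\n".join(difflib.unified_diff(
                        history[-1].splitlines(), text.splitlines(), lineterm="")))
                history.append(text)
                os.makedirs(STORE, exist_ok=True)
                json.dump(history, open(path, "w"))

The local store then holds every revision it has ever seen, and a reader can be pointed at the live page or, failing that, at the stored copy.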

#Comment Re: made: 2014-06-20 17:35:26.688151+00 by: Dan Lyke

So elaborating on the GUID idea...

What the "GUID" really needs to be something like both a hash of the original document, signed by the creator's public key, and a hash of the revision of the document, also signed.

This way I can say "I linked to X rev Y", and if that resource goes away then I can ask other caching hosts for X, I can acknowledge when the document changes, and there's some notion of proving authorship.
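
One way to read that, sketched in Python with the PyNaCl signing library; the two-part GUID layout and the field names are my assumptions about the shape being described, not a settled format:

    import hashlib
    from nacl.signing import SigningKey   # pip install pynacl

    author_key = SigningKey.generate()    # the creator's keypair

    def make_guid(original, revision):
        """GUID = signed hash of the original document plus a signed hash of this revision."""
        doc_hash = hashlib.sha256(original).digest()
        rev_hash = hashlib.sha256(revision).digest()
        return {
            "doc": author_key.sign(doc_hash).hex(),        # stable across revisions: "X"
            "rev": author_key.sign(rev_hash).hex(),        # identifies "rev Y"
            "key": author_key.verify_key.encode().hex(),   # lets anyone check authorship
        }

Anyone holding a cached copy can recompute the hashes and verify the signatures against the published key, which gives you both "this is X rev Y" and "the same author signed it."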

#Comment Re: made: 2014-06-20 21:33:36.112626+00 by: Mars Saxman

Once upon a time I built a distributed data sharing system I called "echomesh", which I never released for legal reasons having to do with the copyright mafia, but I had all these ideas for how you could distribute web services across a suitably constituted relay network. And the basic idea was much like you describe: instead of hosting web pages on a server, you publish a document, with an ID and a digital signature, and people copy it every which way. To update it, you publish a new document, with the same ID, containing a reference to the original document, and a new signature from the same key. Kind of like the bitcoin block chain, except vastly simpler because single-writer/multiple-reader.

To get someone's blog, then, you ask your peers for the latest copy of $ID, they return whatever they've got, and you use the one with the longest block chain whose signature matches the last one you saw.
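
A toy version of that selection rule, with the details of signature checking abstracted behind a verify callback; all of the names here are mine, not echomesh's:

    def pick_latest(peer_responses, last_known_key, verify):
        """Among peer copies of $ID, take the longest chain whose head still verifies."""
        best = None
        for chain in peer_responses:                      # each chain is a list of revisions
            if not chain:
                continue
            if not verify(chain[-1], last_known_key):     # signature must match the known key
                continue
            if best is None or len(chain) > len(best):
                best = chain
        return best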

Anyway, same basic idea you're talking about. No central point of failure, no need for a database of identities, you just validate that the same entity keeps signing each revision of the page. Of course you can step back in time to watch the edits if you like, but so what? It was all public anyway.

BitTorrent has solved the core problem but I still think there are some valuable things my system could have done. I wonder sometimes if it would work to use BitTorrent as a transport layer for my economics-based distributed-query system... but, whatever, the tech has moved on and I doubt I have much I could contribute to the state of the art at this point.

#Comment Re: made: 2014-06-20 23:25:34.221187+00 by: TheSHAD0W

Something BitTorrent-like could be used in some part, but BitTorrent is really designed for what is essentially central-server authentication; you download a .torrent file (or a magnet link) from the server, and that authenticates the torrent content. Distributing the .torrent file via your system is certainly possible.

Even with this system you aren't immune from bit-rot and potential exploitation. For instance, an attacker could mess with your system by deliberately re-distributing old data files more widely than your most recent releases. Adding expiration notices could help, but there goes a lot of your redundancy.

#Comment Re: made: 2014-06-21 21:35:16.430379+00 by: dexev

Thoughts:

* URL isn't a GUID for content: there's no prohibition on recycling URLs for different content (see 'http://www.flutterby.com/' today and next week)

* public-key encryption is still stuck in the chicken/egg swamp. Making a scheme dependent on PKE sticks it in the same swamp.

* The power law applies here: the perfect is the enemy of the good. What would work 80% of the time, and not get in the way of eventually fixing the remaining 20%?

* My own pet idea: Caching proxies -- if you keep things even after they 'expire', and you can let the client know that you have something but it might be out of date, or via a third party....
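
A minimal sketch of that "keep it even after it expires" lookup, with made-up names and a plain dictionary standing in for the proxy's real cache:

    import time

    def lookup(cache, url, max_age=3600):
        """Return (content, is_stale); expired entries are kept rather than evicted."""
        entry = cache.get(url)
        if entry is None:
            return None, False
        content, stored_at = entry
        return content, (time.time() - stored_at) > max_age

    # A proxy could serve the content either way, flagging staleness to the client
    # with something like an HTTP "Warning: 110 - Response is Stale" header.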