The Power of Content-addressing

PreviousTutorial: The Myriad ways to Access and Distribute IPFS Content NextRetrieving content from a peer

Last updated 5 years ago

Was this helpful?

The Power of Content-addressing

Goals

This lesson introduces the concept of content addressing and explores the powerful implications of using this approach.

After doing this Lesson you will be able to

Define content addressing and compare it with location-addressing
Explain the implications of being able to access IPFS content through so many different paths

Explanation

The Problem: Identifying Content by its Location

When you use an http:// or https:// link to point to a webpage, image, spreadsheet, dataset, tweet, etc, you're identifying content by its location. The link is an identifier that points to a particular location on the web, which corresponds to a particular server, or set of servers, somewhere on the web. Whoever controls that location controls the content. That's how HTTP works. It's location-addressed. Even if a thousand people have downloaded copies of a file, meaning that the content exists in a thousand locations, HTTP points to a single location. This location-addressed approach forces us all to pretend that the data are in only one location. Whoever controls that location decides what content to return when people use that link. They also decide whether to return any content at all.

To get a sense of how impractical it is to address content by its location, imagine if I used location-addressing to recommend the book .

If I identify the book by its content, saying "Check out the book called Why Information Grows by César Hidalgo. The ISBN is 0465048994.", you will be able to get any copy of the book from any source and know that you're reading the information I recommended. You might even say "Oh. I already read it." or "My roommate has it in the other room. I'll borrow it from him.", saving yourself the cost or effort of getting another copy.

By contrast, if I used location-addressing to identify the book, I would have to point to a location, saying something like "Go to the news stand at Market & 15th in Philadelphia and ask for the thing 16 inches from the south end of the third shelf on the east wall" Those instructions are confusing and awkward, but that is how http links work. They identify content by its location and they rely on the 'host' at that location to provide the content to visitors. There are lots of things that could go wrong with this approach. It also puts a lot of power and responsibility on the shoulders of whoever controls the location you're pointing to - in this case the news stand.

Let's consider the responsibilities of whoever controls the location we've pointed to. If the people running the news stand want my directions (aka. my "link") to remain valid, allowing people to access the book, they have to:

Always be open, 24/7, in case someone wants to read the book.
Provide the book to everyone who seeks the book, whether it's one person or hundreds of thousands of people.
Protect the integrity of the book by preventing anyone from tampering with it.
Never remove the book from its shelf - if they get rid of it, or even move it, my link is broken and nobody will be able to use my instructions to find the book.

Along with those responsibilities come a great amount of power. The proprietors of the news stand control the location that my directions point to, so they can choose to:

Dictate who is allowed to see the book.
Move the book without telling anyone.
Destroy the book.
Charge people money to access the book or force them to watch ads when they walk in the door.
Collect data about everyone who accesses my book, using that information however they want.
Replace the book with something else -- They might not even put a book there, since my instructions are just describing a location, a malicious actor could replace the book with something dangerous, turning the location into a trap!

Location-addressing has worked on the web for 25 years, but it's starting to get painful and It's about to get much worse. As long as we continue to rely on it, the web will continue to be unstable, insecure, and prone to manipulation or exploitation.

The Solution: Identify Information by its Fingerprint, not its Location

When we identify content in this way, using the content's cryptographic hash instead of its location to identify it, this is called content-addressing. The cryptographic hash for a piece of content never changes, which means content addressing guarantees that the links will always return the same content, regardless of where I retrieve the content from, regardless of who added the content to the network, and regardless of when the content was added. That's the essential power of using a content-addressed protocol like IPFS instead of using a location-addressed protocol like HTTP.

The Implications of Content Addressing

It lets us store data together.

This decentralized, content-addressed approach radically increases the durability of data. It ensures that data will not become endangered as long as anyone is still relying on it because anyone can hold a valid copy of the data they care about. If you hold a copy of a dataset on any of your devices, or if you pay someone to host it on an IPFS node for you, you become part of the network of stewards who protect that dataset from being lost. You won't have to worry about whether someone is going to turn off the servers where your data are hosted because you are one of the hosts. You and your peers hold the data among yourselves and are able to share the data directly with each other without relying on centralized points of failure.

It increases the integrity of data.

Decentralization also increases the integrity of data because links are content-addressed. This means we can validate data by checking the data's fingerprints against the links. That kind of validation is impossible with location-addressed links. This is especially powerful on the large scale, where millions of websites and datasets reference each other billions of times. With location-addressed links, all of those connections are brittle. With content-addressed links, the connections become resilient and reliable.

Links can come back to life.

As soon as any node has the content, everyone's links start working. Even if someone destroys all the copies on the network, it only takes one node adding the content in order to restore availability. A cryptographic hash permanently points to the content it was derived from, so IPFS links permanently point to their content. Even if the content becomes unavailable for a period, the links will work as soon as anyone starts providing the content again.

Harder to attack, easier to recover.

Even if the original publisher is taken down, the content can be served by anyone who has it. As long as at least one node on the network has a copy of the content, everyone will be able to get it. This means the responsibility for serving content can change over time without changing the way people link to the content and without any doubt that the content you're reading is exactly the content that was originally published.

The content you download is cryptographically verified to ensure that it hasn’t been tampered with.

IPFS can work in partitioned networks - you don’t need a stable connection to the rest of the web in order to access content through IPFS. As long as your node can connect to at least one node with the content you want, it works!

If one IPFS gateway gets blocked, you can use another one. IPFS gateways are all capable of serving the same content, so you’re not stuck relying on one point of failure.

Lightening the load: With IPFS, people viewing the content are also helping distribute the content (unless they opt out) and anyone can choose to pin a copy of some content on their node in order to help with access and preservation.

You can read anonymously. As with HTTP, IPFS can work over Tor and other anonymity systems

IPFS does not rely on DNS. If someone blocks your access to DNS or spoofs DNS in your network, it will not prevent IPFS nodes from resolving content over the peer-to-peer network. Even if you're using the DNSlink feature of IPFS, you just need to find a gateway that does have access to DNS. As long as the gateway you're relying on has access to DNS it will be able to resolve your DNSlink addresses.

IPFS does not rely on the Certificate Authority System, so bad or corrupt Certificate Authorities do not impact it.

IPFS nodes work hard to find each other on the network and to reconnect with each other after connections get cut.

(experimental) You can even form private IPFS networks to share information only with computers you've chosen to connect with.

Next Steps

PreviousTutorial: The Myriad ways to Access and Distribute IPFS Content NextRetrieving content from a peer

Last updated 5 years ago

Was this helpful?

The Power of Content-addressing

The Power of Content-addressing

Goals

Explanation

The Problem: Identifying Content by its Location

The Solution: Identify Information by its Fingerprint, not its Location

The Implications of Content Addressing

It lets us store data together.

It increases the integrity of data.

Links can come back to life.

Harder to attack, easier to recover.

Further Reading

Next Steps

Goals

Explanation

The Problem: Identifying Content by its Location

The Solution: Identify Information by its Fingerprint, not its Location

The Implications of Content Addressing

It lets us store data together.

It increases the integrity of data.

Links can come back to life.

Harder to attack, easier to recover.

Further Reading

Next Steps