How File Sharing Works

The Magic of Distributed Networks

vic_elor
Whether used for legal purposes or not, the power of distributed networking is amazing. I don't think I'm going too far out on a limb in saying its nature is truly magnificent, demonstrating for all to see the power of cooperation.

I apologize right now to those of you who aren't very good with math or don't enjoy it. Stick around and fight through it; I promise it will be worth your time. I'll also do my best to make it as clear as possible for the non-math-inclined while still covering the necessary details.

For those of you who have ever seen the CBS program Numb3rs (pronounced "numbers"), this will be where I try to pull off a Charlie Eppes impression:

Imagine you have a large file that you want to share. It can be whatever you want as long as it's fairly sizable. For this example we can pretend that I want to share a lengthy home video of me riding a polar bear through the Amazon. We'll assume that I have it on my computer and that, while it's definitely high quality, it isn't in raw digital-editing-copy form. Just so we have a size, I'll say it's 300 MB. Now let's say I want to share this with a friend of mine. Unfortunately, since there's just one person who has the file (me) and one person who wants it (my friend), there is really no way to speed up the process beyond simply sharing the file directly.

But let's be honest, a video of me riding a polar bear through the Amazon would totally be more popular than just one friend wanting to see it. Instead of one friend, let's say 1000 friends want to see it. Now if I were to try to transfer this file directly to all 1000 of my friends, I'd need to transmit about 300 GB of information (1000 copies at 300 MB each). For those of you not in the know, that's a fairly sizable amount and would take me quite a long time to transmit. Depending on the speed and configuration of my home network, transmission would likely take days at a minimum.
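For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The file size and number of friends come from the example; the 10 Mbit/s upload speed is purely my assumption for illustration, so treat the number of days as a ballpark rather than a measurement.

```python
# Back-of-the-envelope cost of sending the whole video to every friend directly.
FILE_SIZE_MB = 300        # size of the video, from the example
FRIENDS = 1000            # number of people who want it
UPLOAD_MBIT_PER_S = 10    # ASSUMPTION: a hypothetical home upload speed


def direct_transfer_total_mb(file_size_mb, friends):
    """Total data the source must upload when it serves everyone itself."""
    return file_size_mb * friends


def transfer_days(total_mb, upload_mbit_per_s):
    """Days needed to push total_mb through the given upload speed."""
    total_megabits = total_mb * 8              # 8 bits per byte
    seconds = total_megabits / upload_mbit_per_s
    return seconds / 86_400                    # seconds in a day


total_mb = direct_transfer_total_mb(FILE_SIZE_MB, FRIENDS)
print(total_mb / 1000, "GB")                                  # 300.0 GB
print(round(transfer_days(total_mb, UPLOAD_MBIT_PER_S), 2), "days")  # 2.78 days
```

A slower connection stretches this out proportionally, which is why "days at a minimum" is a fair description.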

Do you remember when you were a small child and a teacher would tell you that "it's always good to share" and "if you work together it will get done faster"? Those two ideas are the secret behind distributed file sharing.

In our new mode of transfer we'll no longer think of my video as one contiguous entity measuring 300 MB but rather as 1000 individual pieces of equal size (0.3 MB each) that, when put together, make up my final video. To make this example easier to follow, we are no longer going to measure time in seconds or minutes but in rounds, with a round being the exact amount of time it takes to transmit one piece of my video. We are also going to assume that each person who wants a copy can download at whatever speed is needed, but that each person (including myself) can only upload or share at a rate of one piece per round.

So at the beginning of the game I have the entire file (all 1000 pieces) and 1000 people have none.

During round one I transfer piece number one to person number one; person number one receives a piece from me and no one else does anything. At the end of round one I have all 1000 pieces, person number one has piece number one and nothing else, and the other 999 friends still have nothing.

In round two I transfer piece number two to person number two while, at the same time, person number one transmits piece number one to person number two. At the end of round two person number one has piece number one, person number two has pieces one and two, and the other 998 people still have nothing.

In round three I transmit piece number three to person number three, person number one transmits piece number one to person number three, and person number two transmits piece number two to person number one. At the end of round three person number one has pieces one and two, person number two has pieces one and two, and person number three has pieces one and three.

In round four I transmit piece number four to person number four, person number one transmits piece number one to person number four, person number two transmits piece number two to person number three, and person number three transmits piece number three to person number one. At the end of round four person number one has the first three pieces, person number two has the first two pieces, person number three has the first three pieces, and person number four has pieces one and four.

This cycle continues until I have transmitted piece number 1000 to person number 1000, at which point I can either stop sharing or speed things up by retransmitting more pieces. For the sake of this example I'll assume that I stop sharing. This process, ignoring any overhead, has taken me no more time or bandwidth than if I were to simply transmit the video to one person. At the end of round 1001 person number one will have received a copy of piece number 1000 from person 1000, and now person number one has a complete video. There is now essentially no difference between myself as the original source and person number one, so for the sake of this example, rather than continuing to share and speed up the process, we will assume that person number one stops transmitting. At the end of round 1002 person 1000 will have transmitted piece 1000 to person number two, which gives person number two a full copy as well, and once again person number two stops transmitting. This process continues until round 1999, at which point only two people are left (person 999 and person 1000); they finish exchanging their final pieces, now have a full set, and go off-line.
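If you'd rather watch the rounds play out than take my word for it, here is a minimal simulation sketch in Python. It assumes the schedule described above: as many pieces as people, I send piece k to person k in round k, and each person then hands their own piece to everyone else in numerical order, one upload per round. The function name `simulate` and its bookkeeping are my own inventions for illustration, not part of any real file sharing program.

```python
def simulate(n):
    """Return {person: round in which that person holds all n pieces}.

    Assumes n pieces and n people, one upload per participant per round.
    """
    have = [set() for _ in range(n + 1)]   # have[p] = pieces held by person p
    sent = [0] * (n + 1)                   # uploads person k has made so far
    finished = {}
    round_no = 0
    while len(finished) < n:
        round_no += 1
        deliveries = []                    # (receiver, piece) pairs this round
        # The source uploads piece number round_no to person number round_no.
        if round_no <= n:
            deliveries.append((round_no, round_no))
        # Everyone who already holds their own piece passes it to the next
        # person on their list: 1, 2, 3, ... skipping themselves.
        for k in range(1, n + 1):
            if k in have[k] and sent[k] < n - 1:
                target = sent[k] + 1
                if target >= k:
                    target += 1            # skip over yourself
                deliveries.append((target, k))
                sent[k] += 1
        # Apply all transfers at the end of the round.
        for person, piece in deliveries:
            have[person].add(piece)
            if len(have[person]) == n and person not in finished:
                finished[person] = round_no
    return finished


finished = simulate(1000)
print(finished[1], finished[2], max(finished.values()))  # 1001 1002 1999
```

Try `simulate(4)` to follow along with the first four rounds by hand; the last person finishes in round 7, which is 2 times 4 minus 1.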

As I alluded to in the previous paragraph, this process can be altered by a number of factors, such as the number of pieces a person is capable of transmitting during a round, or people with complete videos remaining online and sharing pieces other than their original piece, which could speed up the process dramatically, potentially exponentially. I'm going to ignore both of those factors, as well as many others, simply because they only complicate the example.

Now if you've been keeping track you'll notice that the total bandwidth used by all people really hasn't changed at all. By that I mean that in total we still transferred 300 GB of data, which is the same amount we would've transferred with the first method of simply sharing directly. What's strikingly different between the two methods is the amount of time it took to get a full copy of the video to all 1000 people. If I had shared the file directly and individually with each person it would've taken me 1 million rounds (1000 pieces of data transmitted one at a time to 1000 people), while we've already shown that it takes only about 2000 rounds (1999, to be exact) using the distributed file sharing method.

The fact that we cut this down to 1/500 of the original time is not the only advantage that we've attained. Under current pricing models bandwidth is essentially free, with cost really only becoming a factor when massive amounts of bandwidth are used. If I were a corporation and I needed to use 300 GB worth of bandwidth, there is a pretty good chance that I would've exceeded the maximum bandwidth that my web hosting company would provide and I would have to pay for more. In the distributed method, on the other hand, no single person including myself has uploaded more than one copy's worth of the video, so neither I nor any of my 1000 friends would have used enough bandwidth that anyone would even notice.
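To see where the bandwidth actually goes, here is a small tally sketched in Python. It assumes the simplified scheme from this example (as many pieces as people, with each friend passing along only their own piece); the function name is hypothetical.

```python
FILE_SIZE_MB = 300   # size of the video, from the example
PIECES = 1000
PEOPLE = 1000


def upload_tally(pieces, people):
    """Pieces uploaded by the source, by each friend, and in total.

    Assumes pieces == people, as in the example.
    """
    source_uploads = pieces            # one distinct piece to each friend
    per_friend_uploads = people - 1    # own piece sent to every other friend
    total = source_uploads + people * per_friend_uploads
    return source_uploads, per_friend_uploads, total


source, per_friend, total = upload_tally(PIECES, PEOPLE)
total_gb = total * FILE_SIZE_MB // PIECES // 1000   # pieces -> MB -> GB

print(source)       # 1000 pieces: exactly one copy of the video
print(per_friend)   # 999 pieces: just under one copy
print(total)        # 1000000 pieces across everyone
print(total_gb)     # 300 GB, the same total as direct sharing
```

The total is the same 300 GB either way; the difference is that it is spread thinly across 1001 uploaders instead of landing on one.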

There is another amazing advantage to this distributed file sharing method: it only gets better the larger the file is and/or the more people want it. Let's say the item I wanted to share was the equivalent of 1 million pieces and I had 1 million friends who wanted the video (and let's be honest... I think 1 million people would want to see a video of a man riding a polar bear through the Amazon). All things being equal, the amount of time needed in the distributed method is roughly the number of pieces plus the number of people, or in this case 2 million rounds. The traditional method of direct download, on the other hand, takes the number of people times the number of pieces, or in this case 1 trillion rounds. That's right, with those numbers we've cut the time down to 1/500,000.
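The two rules of thumb are easy to put side by side in a few lines of Python. This is only a sketch of the simplified model in this article (one upload per person per round, no overhead), and the function names are made up for illustration.

```python
def distributed_rounds(pieces, people):
    """Rule of thumb for the distributed method: pieces plus people."""
    return pieces + people


def direct_rounds(pieces, people):
    """Direct method: the source sends every piece to every person."""
    return pieces * people


def speedup(pieces, people):
    """How many times faster the distributed method is."""
    return direct_rounds(pieces, people) / distributed_rounds(pieces, people)


print(speedup(1000, 1000))            # 500.0
print(speedup(1_000_000, 1_000_000))  # 500000.0
```

Notice that the advantage itself grows with the size of the crowd: a thousand times more pieces and people gives a thousand times bigger speedup.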

So it turns out our kindergarten teacher was right; sharing and working together really are like magic... just the kind of magic that has numbers rather than doves and fake flowers.

Published by vic_elor

After many years as a student and a corporate drone, I'm now free. Of course, that might be code for unemployed but the first way sounds better.

If I were to transfer a file sized at 1 million pieces to 1 million people it would take a distributed model 1/500,000 the time of a direct connection model.
