In the first part of this four-part blog post, we described a hypothetical robot capable of watching videos at 10 GB/s. To fully understand what this 10 GB/s means, we added another three parts to the post. The second part defined bits and bytes, the basic units of data, and some of the multiples of these units. The third part introduced the differing definitions of unit prefixes like mega- and giga-. This fourth, and final, installment looks at the measurements used in transfer rates and how they apply differently to networks and storage media.
At the time of this writing in early 2017, Adelaide, Australia, was pursuing plans to roll out a 10 Gb/s broadband network. The targeted speed is impressive, and although it sounds similar to the 10 GB/s throughput rate of the real-world Seagate SSD in our hypothetical movie-addict robot, they are definitely not the same. Both examples describe data transfer rates, so they are similar to one another, but they are also very different. This final part of the four-part blog will show not only the similarities and differences between the two but also why different units are used to measure them in the first place.
Everything but the Kitchen Sink
Before we delve into the nuts and bolts of data transfer as it relates to storage and networks, let's look at how water flows in your kitchen. Your water faucet might be rated at, say, 2 g/m. This means it was designed to have water going through its nozzle at a theoretical maximum rate of about 2 gallons for every minute that elapses. You might not be getting that rate, however, as water flow at any given moment can be compromised by factors such as mineral buildup in your faucet head, low overall water pressure in your condo, or clogged pipes. A plumber diagnosing your problem might say your water flows out of the faucet at more like 1 g/m.
So how does this relate to data?
Flow rates are always measured as some quantity over some period of time. With data, these flow rates are expressed as either bandwidth or throughput, both of which measure the amount of information that can be moved over a given time period. Although these two terms are often used interchangeably, they are actually quite different.
Bandwidth describes the theoretical maximum rate of data transfer that a link is designed to handle, like the water-flow rating of your faucet. But just as external factors such as flushing toilets or leaks in your pipes might keep your faucet from delivering water at its maximum rate, in the world of data networks and computers, factors such as protocol overhead, network latency, and reliability, among other things, often conspire to keep your data connection from ever living up to its potential. Another term, throughput, is used to describe the actual amount of data moving through the link, like the water-flow measurement your plumber got in his test.
Both bandwidth and throughput are typically measured as some form of either bits per second (b/s) or bytes per second (B/s)—Mb/s, MB/s, Gb/s, GB/s, etc. Note the difference in case: small b is for bits; big B is for bytes. You often encounter data transfer speeds expressed in both units of measure.
The metrics used in both the Adelaide network and Seagate SSD examples describe data transfer rates, but one uses gigabits as its unit of measurement and the other uses gigabytes. As you may recall from Part 2, a byte these days is most commonly defined as 8 bits. So, usually, to convert from bits to bytes, you divide the number of bits by 8; to convert from bytes to bits, you multiply the number of bytes by 8. Remember, that's a very general rule, and we will later look at a notable exception when it comes to data transfer rates.
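That conversion rule is simple enough to sketch in a few lines of Python. These helper functions are purely illustrative (their names are our own), and they assume the common 8-bit byte described in Part 2:

```python
def bits_to_bytes(bits):
    """Convert a bit count to bytes, assuming the common 8-bit byte."""
    return bits / 8

def bytes_to_bits(nbytes):
    """Convert a byte count to bits, assuming the common 8-bit byte."""
    return nbytes * 8

# Adelaide's 10 Gb/s network, converted naively (no protocol overhead):
print(bits_to_bytes(10))   # 1.25, i.e. 1.25 GB/s
# The Seagate SSD's 10 GB/s throughput, expressed in bits:
print(bytes_to_bits(10))   # 80, i.e. 80 Gb/s
```

Keep the naive caveat in mind: as we'll see shortly, network encoding overhead means real links deliver fewer usable bytes than this straight division suggests.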
Now that you know how to convert these units of measure, let’s look at why they are measured differently in the first place.
A general rule of thumb: With memory and storage, data transfer rates are typically expressed in bytes per second; with networks and interfaces, data transfer rates are typically expressed in bits per second.
A common misconception is that companies in network- or internet-related fields use bits per second to fool customers into thinking that network speeds are much faster than they really are. The rationale is that because 8 Gb/s sounds much better than 1 GB/s, customers will think they are getting a faster connection than they really are. It isn't a marketing ploy, however. If it were, the same people who believe this would have to ask why the marketing groups at memory and storage companies don't convert everything to bits as well. The reasons for the differences are elementary, as we will see below.
Bytes for Computers
When talking about data transfers between a computer and its memory or storage, transfers are mostly expressed in terms of bytes per second. We saw in Part 2 that the byte is the most fundamental addressable unit of data for computer memory and storage devices, so computers are primarily interested in bytes. And as we saw in Part 3, when computers access data in memory, they do so at the byte level. So, if you instruct your computer to identify or manipulate an individual bit, it must first access the byte it belongs to, then it must perform an additional operation to isolate that bit. Moreover, because files are also measured by how many bytes they contain, expressing transfer speeds in bytes per second is useful for figuring out how long you might expect it to take to copy a file from one part of your computer system to another. This is most commonly expressed as throughput.
The standard works fine for file transfers within a single system because a computer generally speaks the same language as all of its peripheral devices. But a different set of concerns presents itself when looking at how data is transferred between altogether different computer systems or through a network.
Bits for Networks
With network connections, or any situation where two or more computers are linked over wires or through the air to share data, transfer rates are traditionally described in terms of bits per second. This is because, unlike within computer systems themselves, data moving through a network is transferred serially, one bit at a time, not in parallel chunks of one or more bytes at a time.
As we also saw in Part 2, historically the size of a byte was hardware-platform dependent. Some computers needed 4 bits to make their byte, others needed 6 bits, and still others needed 8 bits. So when data was transferred from one computer to another, it wasn’t practical to measure the transfers in terms of how many bytes were being moved, because the sending computer’s byte could reasonably be a different size than the receiving computer’s byte!
Even after 8 bits became the de facto standard definition of the byte, it still made little sense to express data transfers in terms of bytes. Network engineers measure transfer rates in bits per second because doing so most accurately reflects the actual capability of a network link to transfer all necessary data, not just the data in the file(s). In most network communications, not all of the bits sent over the link are actually part of the files being transferred. The network often employs encoding schemes, tacking on additional bits for every byte being sent. These extra bits carry information the network needs to ensure the data is sent to the correct recipient and to confirm that it arrived intact and without errors, among other things.
A common encoding scheme is called 8b/10b, which gets its name because every 8 bits of data are encoded as 10 bits on the link; in other words, 10 bits are sent for every 8-bit byte. With this encoding scheme, just transferring data reliably over a network incurs an overhead of 20 percent. So although Adelaide's aspirational 10 Gb/s network, if measured naively in bytes, should deliver approximately 1.25 GB/s (10 Gb divided by 8), once the 20 percent overhead is taken into account, it is more closely equivalent to 1 GB/s. Other schemes are becoming more prominent, most notably the 128b/130b encoding adopted by the PCIe 3.0 standard. You guessed it: this means that 2 bits of overhead are tacked on for every 128 bits of data. The newer scheme is more efficient than 8b/10b, with an overhead penalty of only about 1.5 percent, compared with the previous 20 percent.
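The overhead arithmetic above can be checked with a short sketch. The function below is our own illustration (not part of any spec); it takes a line rate in Gb/s and an encoding scheme's data-bits/total-bits ratio, and returns the usable rate in GB/s, assuming 8-bit bytes:

```python
def effective_data_rate(line_rate_gbps, data_bits, total_bits):
    """Usable data rate in GB/s after encoding overhead.

    line_rate_gbps: raw link rate in gigabits per second
    data_bits/total_bits: the encoding ratio (e.g. 8/10 for 8b/10b)
    """
    return line_rate_gbps * (data_bits / total_bits) / 8

# 8b/10b: 10 line bits carry 8 data bits, a 20% overhead.
print(effective_data_rate(10, 8, 10))               # 1.0 GB/s on a 10 Gb/s link

# 128b/130b (PCIe 3.0): 130 line bits carry 128 data bits, ~1.5% overhead.
print(round(effective_data_rate(10, 128, 130), 3))  # ~1.231 GB/s on the same link
```

The comparison makes the efficiency gain concrete: the same 10 Gb/s link yields about 23 percent more usable bytes per second under 128b/130b than under 8b/10b.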
End of the Line
Understanding technologies and the terminology that describes them can be very confusing if you don't pay attention. This is especially the case when it comes to looking at data in transit. Unfortunately, although a gigabyte and a gigabit are very different units for measuring similar but different things, in the real world, people often shorten 10 GB/s and 10 Gb/s the same way: as "10 gigs."
Hopefully now, however, you’ll have a very clear idea that the robot proclaiming that it has watched its Netflix movies at 10 gigs of throughput is saying something completely different than the Aussies who’ll soon be bragging that they’re on a 10 gig network, mate.