System Buses Explained
I don't know about you, but despite thinking I had it all under my grasp, I always get confused when thinking about the array of buses in a modern computer. The bandwidth of the CPU, memory and AGP, plus new technologies such as Hypertransport, always leaves me in a spin, especially when I'm talking to someone who manages to make me question my own knowledge.
I thought I would write it down for reference, and hopefully provide an understanding to those who want to know all about this topic.
Throughout this article we will try to gain an understanding of all the components that make a viable computer system and hopefully see past the marketing which preys on our lack of understanding.
The Aim
Computers are not marketed these days from a strictly technical point of view. All retailers or manufacturers will attempt to give their product an edge over very similar products in their class. Graphics cards and motherboards are an excellent example of this right now. Different names, same technology.
Marketing even goes so far as to deviate from the correct technical nomenclature of computers. Kilo, Mega and Giga are not the same when it comes to making numbers "easy" for Joe Public.
Technically correct:
1 bit is a single unit of information defined in the form of a 1 or a 0.
There are 8 bits in a byte
There are 1024 bytes in a kilobyte
There are 1024 kilobytes in a Megabyte
There are 1024 Megabytes in a Gigabyte
And incidentally, although not used in this article...
There are 1024 Gigabytes in a Terabyte
1024*1024*1024 is awkward and provides results that are not nice for marketing.
Instead they move to multiples of 1000: 1000 bytes in a kilobyte, 1000 kilobytes in a Megabyte and so forth. This provides nice round numbers.
Take this example (we will cover the calculations later):
Technically:
PC2100 DDR Memory / DDR266 Memory
64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s
(17,024,000,000/8) / (1024*1024) = 2029.4MB/s
Marketing:
PC2100 DDR Memory / DDR266 Memory
64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s
(17,024,000,000/8) / (1000*1000) = 2128MB/s
Convenient, don't you think? Not only does it provide a magical 100MB/s of extra bandwidth, it's also a nice number (no decimal places etc.).
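For anyone who wants to check the two figures themselves, here is a minimal Python sketch of the comparison. The function names are made up for illustration; only the 64 bit width and the 266,000,000 Hz rate come from the example above.

```python
# Sketch: one raw transfer rate, expressed in 1024-based and 1000-based megabytes.
# Function names are illustrative only.

def bits_per_second(width_bits, rate_hz):
    """Peak rate in bits/s: data width times transfers per second."""
    return width_bits * rate_hz

def to_mb_binary(bits_s):
    """Megabytes/s counting 1024 * 1024 bytes per megabyte."""
    return bits_s / 8 / (1024 * 1024)

def to_mb_decimal(bits_s):
    """Megabytes/s counting 1000 * 1000 bytes per megabyte."""
    return bits_s / 8 / (1000 * 1000)

raw = bits_per_second(64, 266_000_000)  # DDR266 figures from the example
print(f"technical: {to_mb_binary(raw):.1f}MB/s")   # 2029.4MB/s
print(f"marketing: {to_mb_decimal(raw):.1f}MB/s")  # 2128.0MB/s
print(f"difference: {to_mb_decimal(raw) - to_mb_binary(raw):.1f}MB/s")
```

The gap between the two conventions grows with the size of the number, which is why it matters more for Gigabytes than for kilobytes.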
Latency
The problem with high multipliers in modern CPUs is the latencies involved. The CPU clock speed (we will use 1.73GHz as an example) is far in advance of the comparatively paltry speeds of the memory bus, AGP bus and so on, so the CPU finds itself having to wait around for the rest of the system to catch up.
We shall use an example to illustrate:
A CPU with a 133MHz bus speed running at 1.73GHz has a clock multiplier of 13 (13 * 133.33 = 1733).
# The CPU sends a request to the system memory for information
# The CPU then waits one cycle (commonly known as the command rate (1T))
# The memory undergoes what is known as a RAS/CAS latency
# The memory has a delay in finding the data known as a CAS latency
Thus whilst the CPU has waited 1 CPU cycle then 4 bus cycles, it has had to wait for 1 + (4 * multiplier) CPU cycles to get the data it was after. For every memory bus cycle the CPU has undergone 13 cycles, so in this example the wait is 1 + (4 * 13) = 53 CPU cycles. Not much when you consider this 1.73GHz CPU has 1.73 billion cycles per second, but how many times does the CPU access main memory? Quite a bit, so it all adds up.
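The wait can be put into numbers with a short Python sketch. It assumes the figures used in the example (a multiplier of 13, one command cycle and 4 bus cycles of memory latency) and a nominal 133.33MHz bus clock; the names are illustrative only.

```python
# Sketch: CPU cycles lost while waiting on one memory access.
# All constants are the example's figures, not measured values.

BUS_MHZ = 133.33        # nominal memory bus clock (assumed exact value of "133MHz")
MULTIPLIER = 13         # CPU clock multiplier from the example
BUS_CYCLES_WAITED = 4   # bus cycles of memory latency in the example

def cpu_cycles_waited(multiplier, bus_cycles, command_cycles=1):
    """1 + (bus cycles * multiplier), as described in the text."""
    return command_cycles + bus_cycles * multiplier

cycles = cpu_cycles_waited(MULTIPLIER, BUS_CYCLES_WAITED)
cpu_hz = BUS_MHZ * 1e6 * MULTIPLIER   # roughly 1.73GHz
wait_ns = cycles / cpu_hz * 1e9       # the same wait in nanoseconds

print(cycles)                  # 53
print(f"{wait_ns:.1f} ns per access")
```

Raising the multiplier without touching the bus makes the cycle count worse, not better: the memory takes the same time, but the CPU could have done more in it.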
Memory
We will consider 3 different types of memory in this article.
# SDR-SDRAM (Single Data Rate - Synchronous Dynamic Random Access Memory) - SDR-SDRAM was the dominant memory of the late 90s. Later versions were available at speeds of 66/100/133 MHz as standard. This type of memory is/was used by both Intel and AMD for their recent offerings, and was even used in the i845/845G chipset with the Pentium 4 processor. Later we will show what a mistake or distinct waste of CPU that was.
# DDR-SDRAM (Double Data Rate - Synchronous Dynamic Random Access Memory) - DDR-SDRAM has taken over where SDR memory left off. Particularly with AMD systems (Thunderbird / XP / Thoroughbred), DDR memory has come to the fore as the mainstream memory for the foreseeable future, with DDR-II on the horizon.
# RDRAM (RAMBUS Dynamic Random Access Memory) - Although only really made popular in the mainstream computer market via the Intel Pentium 4 processor, RDRAM technology dates back earlier than DDR memory.
Bandwidth Calculations
To avoid confusion later, remember the units defined under The Aim: 8 bits in a byte, 1024 bytes in a kilobyte, 1024 kilobytes in a Megabyte and 1024 Megabytes in a Gigabyte. The calculations here use multiples of 1024, not 1000.
SDR-SDRAM
To calculate memory bandwidth we need to know 2 things: its data width and its operating frequency. The latter is easier to find out as it is usually part of the marketing/retail title.
We usually see SDR memory at 100 or 133MHz. Taking 133MHz as the example, this means that the memory can perform an operation 133 million times every second.
Finding the data width, well, that's just something you have to look up. SDR memory has a data width of 64 bits or 8 bytes (8 bits in a byte).
PC100 SDR Memory
The calculation is as follows: data width * operating frequency = bandwidth (in bits/s)
To convert to more realistic and manageable figures, divide the result by 8 to give bytes/s, then divide by 1024 to get kilobytes/s, then by 1024 again to get Megabytes/s.
Thus: 64 (bits) * 100,000,000 (Hz) = 6,400,000,000 bits/s
(6,400,000,000/8) / (1024*1024) = 762.9MB/s memory bandwidth.
PC133 SDR Memory
Using the same formula as we did for PC100 SDR memory we can easily calculate theoretical memory bandwidth for PC133 SDR memory.
64 (bits) * 133,000,000 (Hz) = 8,512,000,000 bits/s
(8,512,000,000/8) / (1024*1024) = 1014.7MB/s or roughly 1GB/s memory bandwidth.
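The formula and the conversion steps can be written out as a small Python sketch. The function name is hypothetical; the widths and frequencies are the ones used above.

```python
# Sketch: data width * operating frequency, then bits -> bytes -> KB -> MB.

def peak_bandwidth_mb(width_bits, freq_hz):
    """Theoretical peak bandwidth in MB/s (1024-based)."""
    bits_s = width_bits * freq_hz   # bandwidth in bits/s
    bytes_s = bits_s / 8            # 8 bits in a byte
    kilobytes_s = bytes_s / 1024    # 1024 bytes in a kilobyte
    return kilobytes_s / 1024       # 1024 kilobytes in a Megabyte

SDR_WIDTH_BITS = 64

for name, freq_hz in [("PC100", 100_000_000), ("PC133", 133_000_000)]:
    print(f"{name}: {peak_bandwidth_mb(SDR_WIDTH_BITS, freq_hz):.1f}MB/s")
# PC100: 762.9MB/s
# PC133: 1014.7MB/s
```

These are theoretical peaks; they say nothing about the latencies discussed earlier.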
DDR-SDRAM
DDR memory is slightly more complex to understand for 2 reasons. Firstly, DDR memory has the ability to transfer data on the rising and falling edge of a clock cycle, meaning in theory DDR memory doubles the memory bandwidth of a system able to use it.
Secondly, as a marketing push to compete with a rival technology at the time DDR was introduced, RAMBUS, DDR was sold by a measure of its approximate peak theoretical bandwidth. Similar to AMD and the PR rating of the XP processors we have today, people buy numbers, and DDR was seen to be faster if it was sold as PC1600 and PC2100 instead of PC200 and PC266.
PC1600 DDR Memory / DDR200 Memory
DDR memory has the same data width as SDR memory: 64 bits.
We use the same calculation to measure bandwidth, with the doubled effective frequency.
64 (bits) * 200,000,000 (Hz) = 12,800,000,000 bits/s
(12,800,000,000/8) / (1024*1024) = 1525.9MB/s.
Notice the bandwidth is twice that of PC100 SDR memory.
PC2100 DDR Memory / DDR266 Memory
64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s
(17,024,000,000/8) / (1024*1024) = 2029.4MB/s or roughly 2GB/s memory bandwidth.
With the advent of improved memory yields, modules able to run at higher clock speeds are being released to the market. PC2700 has finally come into its own with the introduction of the AMD XP 2700+/2800+ and the Intel i845PE chipset.
Here are some bandwidths for the latest memory available:
PC2700 DDR Memory / DDR333 Memory
64 (bits) * 333,000,000 (Hz) = 21,312,000,000 bits/s
(21,312,000,000/8) / (1024*1024) = 2540.6MB/s.
PC3200 DDR Memory / DDR400 Memory
64 (bits) * 400,000,000 (Hz) = 25,600,000,000 bits/s
(25,600,000,000/8) / (1024*1024) = 3051.8MB/s.
PC3500 DDR Memory / DDR434 Memory
64 (bits) * 434,000,000 (Hz) = 27,776,000,000 bits/s
(27,776,000,000/8) / (1024*1024) = 3311.2MB/s.
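To see where the PC ratings come from, the sketch below runs every DDR grade above through both conventions. It is only an illustration with made-up names; the 1000-based column is the figure the retail name approximates.

```python
# Sketch: DDR grades in 1024-based MB/s next to the 1000-based figure
# that the "PCxxxx" retail name is rounded from.

DDR_WIDTH_BITS = 64

def ddr_figures(effective_mhz):
    """Return (1024-based MB/s, 1000-based MB/s) for an effective DDR rate."""
    bytes_s = DDR_WIDTH_BITS * effective_mhz * 1_000_000 / 8
    return bytes_s / (1024 * 1024), bytes_s / (1000 * 1000)

GRADES = [("PC1600", 200), ("PC2100", 266), ("PC2700", 333),
          ("PC3200", 400), ("PC3500", 434)]

for name, mhz in GRADES:
    binary_mb, decimal_mb = ddr_figures(mhz)
    print(f"{name} / DDR{mhz}: {binary_mb:.1f}MB/s technical, "
          f"{decimal_mb:.0f}MB/s marketing")
```

The marketing column comes out at 1600, 2128, 2664, 3200 and 3472, so the retail names are the 1000-based figures rounded to a tidy number.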
RDRAM
RDRAM memory is slightly more complex in that the bus operates at an effective 64 bit bus width a la DDR but is separated into 2 16/32 bit channels. What does this mean? Well, currently 2 sticks of RDRAM have to be used in a system. DDR has the advantage (usually from a cost point of view) of being able to be used in single DIMMs.
The calculation is in essence the same however; we just need to account for the extra channel and the higher memory speed.
PC800
16 (bits) * 800,000,000 (Hz) = 12,800,000,000 bits/s
(12,800,000,000/8) / (1024*1024) = 1525.9MB/s. Multiplied by 2 because of the dual channel configuration - 3051.8MB/s
PC1066
16 (bits) * 1,066,000,000 (Hz) = 17,056,000,000 bits/s
(17,056,000,000/8) / (1024*1024) = 2033.2MB/s. Multiplied by 2 because of the dual channel configuration - 4066.5MB/s
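The same calculation with the channel count made explicit looks like this in Python. It is a sketch using the 16 bit channel width and 2 channels from the figures above; the function name and defaults are illustrative.

```python
# Sketch: RDRAM peak bandwidth as per-channel bandwidth times channel count.

def rdram_bandwidth_mb(effective_hz, channels=2, channel_bits=16):
    """Peak bandwidth in MB/s (1024-based) across all channels."""
    per_channel = channel_bits * effective_hz / 8 / (1024 * 1024)
    return per_channel * channels

for name, hz in [("PC800", 800_000_000), ("PC1066", 1_066_000_000)]:
    single = rdram_bandwidth_mb(hz, channels=1)
    dual = rdram_bandwidth_mb(hz)
    print(f"{name}: {single:.1f}MB/s per channel, {dual:.1f}MB/s dual channel")
```

Doubling before rounding gives 4066.5MB/s for PC1066; doubling the already rounded per-channel figure of 2033.2MB/s gives 4066.4MB/s.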
nForce
nForce is special as it heralded the future of memory interfaces, for DDR at least. Dual DDR technology gives 2 64bit channels instead of 1, making an effective 128bit memory bus. This allows twice the bandwidth through the bus.
Although DualDDR technology never really made a huge impact on nForce memory bandwidth (so the benchmarks tell us at least), it has great potential for a recent DDR convert.
The Intel Pentium 4 processor, a long standing advocate of RAMBUS/RDRAM, has pledged to move away from the serial memory technology and embrace DDR. Unfortunately, as the memory bandwidth calculations earlier showed, DDR in its current form has neither the bandwidth nor the potential to scale up to RDRAM bandwidths.
Dual DDR will make a big difference to Pentium 4 chipsets. P4s with QDR architecture can achieve bandwidths of around 4GB/s, perfectly matched with PC1066 RDRAM. The fastest DDR memory currently available on the other hand, PC3500, has a bandwidth of around 3.2GB/s. The P4 is crippled with current DDR chipsets.
Doubling the memory bandwidth, then, is something Intel is looking forward to.
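A rough Python sketch shows what a second 64bit channel does to the DDR figures. The 4096MB/s target is simply "around 4GB/s" taken as 4 * 1024MB, which is an assumption for comparison rather than a measured P4 figure, and the names are illustrative.

```python
# Sketch: single vs dual channel DDR against a target of around 4GB/s.

P4_TARGET_MB = 4 * 1024   # assumption: "around 4GB/s" taken as 4096MB/s

def ddr_bandwidth_mb(effective_mhz, channels=1, channel_bits=64):
    """Peak DDR bandwidth in MB/s (1024-based) for a number of 64bit channels."""
    bits_s = channels * channel_bits * effective_mhz * 1_000_000
    return bits_s / 8 / (1024 * 1024)

for mhz in (266, 333, 400):
    single = ddr_bandwidth_mb(mhz)
    dual = ddr_bandwidth_mb(mhz, channels=2)
    print(f"DDR{mhz}: single {single:.1f}MB/s ({single / P4_TARGET_MB:.0%} of target), "
          f"dual {dual:.1f}MB/s ({dual / P4_TARGET_MB:.0%} of target)")
```

On these theoretical numbers a single channel of DDR266 supplies about half of what the P4 can consume, while two channels come within about 1% of it.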
PCI Bus
The PCI bus is one of the older buses in a modern system. It is the bus which connects all the expansion cards in a system to the main chipset, along with IDE and USB.
The PCI bus is a 32-bit wide bus running at 33MHz. Using our familiar calculation we can now easily calculate its maximum bandwidth.
32 (bits) * 33,000,000 (Hz) = 1,056,000,000 bits/s
(1,056,000,000/8) / (1024*1024) = 125.9MB/s. Rounded up to 133MB/s (the quoted figure uses the exact 33.33MHz clock and multiples of 1000).
It is comparatively easy to imagine that, with modern ATA133 hard drives, PCI network adapters, sound cards and the like, the PCI bus can easily become saturated. There are 3 ways around this problem. 2 have already been implemented.
# Expand the bandwidth of the bus - Server motherboards, especially with the preponderance of SCSI hard drives requiring more bandwidth than the PCI bus can transfer, have moved to a 66MHz bus using 64bit slots. This quadruples the bandwidth afforded.
64 (bits) * 66,000,000 (Hz) = 4,224,000,000 bits/s
(4,224,000,000/8) / (1024*1024) = 503.5MB/s. Rounded up to 533MB/s
# Move to a dedicated bus - The obvious example here is graphics cards. With the ever increasing speeds of graphics cards necessary to deal with ever more complex games, the PCI bus of old simply cannot deal with the sheer amount of information necessary to get to the northbridge and vice versa. Thus the AGP bus was born. A direct link from the AGP card to the chipset running at 66MHz with a 32bit bus gives a maximum bandwidth of:
32 (bits) * 66,000,000 (Hz) = 2,112,000,000 bits/s
(2,112,000,000/8) / (1024*1024) = 251.77MB/s; rounded up to 266MB/s
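The three buses above can be compared side by side with a short Python sketch. The nominal clocks are the ones used in the calculations; the exact clocks of 33.33MHz and 66.66MHz are an assumption added here to show where the commonly quoted 133/533/266MB/s figures come from.

```python
# Sketch: PCI and AGP 1x bandwidth, nominal clock with 1024-based MB
# versus exact clock with 1000-based MB.

def mb_binary(width_bits, clock_hz):
    return width_bits * clock_hz / 8 / (1024 * 1024)

def mb_decimal(width_bits, clock_hz):
    return width_bits * clock_hz / 8 / (1000 * 1000)

BUSES = [
    # (name, width in bits, nominal clock in Hz, assumed exact clock in Hz)
    ("PCI 32bit/33MHz", 32, 33_000_000, 100_000_000 / 3),
    ("PCI 64bit/66MHz", 64, 66_000_000, 200_000_000 / 3),
    ("AGP 1x 32bit/66MHz", 32, 66_000_000, 200_000_000 / 3),
]

for name, width, nominal_hz, exact_hz in BUSES:
    print(f"{name}: {mb_binary(width, nominal_hz):.1f}MB/s calculated, "
          f"{mb_decimal(width, exact_hz):.1f}MB/s as usually quoted")
```

Widening the bus and raising the clock multiply together, which is why 64bit at 66MHz gives 4 times the 32bit/33MHz figure.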
IDE
IDE hard drives transmit data to the CPU and vice versa via the PCI bus. Of course this means that any transfer is limited by the speed of the PCI bus, 133MB/s or thereabouts, meaning ATA133 is as high as IDE can get (even though in reality it never gets close anyway).
Recent innovations have tried to bypass the PCI bus for IDE transfers. VIA's VLink technology is a dedicated bus running at 266MB/s between the Southbridge and Northbridge.
Serial ATA
The successor to IDE. Why is this in the PCI section? Well, currently, despite all the hype, Serial ATA connectors all use the PCI bus to transfer information. SATA150, with a theoretical maximum transfer of 150MB/s, is limited to the paltry 133MB/s of the PCI bus. Future chipsets will relieve Serial ATA of the PCI bus burden and allow direct access to the chipset, probably on a dedicated bus. This is necessary for the next generation of SATA devices able to run at 300/600MB/s.
AGP Bus
As partially explained in the PCI Bus section, the AGP bus was born to accommodate the ever expanding bandwidth needs of graphics cards. The 133MB/s capacity of the PCI bus simply wasn't able to handle cards faster than the Voodoo 3, one of the last PCI graphics cards.
The AGP bus was a 32bit bus like the PCI bus, but it operated at 66MHz, giving it a maximum bandwidth of 266MB/s. This was and is known as AGP 1x.
Similar to the QDR implementation of the Intel Pentium 4 processor, the AGP bus was redesigned to allow data to be transferred 2, then 4 times every clock cycle. This is known as AGP2x/4x. More recently AGP8x has been introduced.
Each iteration of AGP has doubled the bandwidth of the previous standard:
# AGP1x = 266MB/s
# AGP2x = 533MB/s
# AGP4x = 1066MB/s
# AGP8x = 2132MB/s
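The doubling can be sketched in Python as a fixed clock with more transfers per cycle. The exact 66.66MHz clock and the 1000-based Megabytes are assumptions chosen because they reproduce the quoted figures; the function name is illustrative.

```python
# Sketch: AGP modes as transfers per clock on a fixed 32bit, 66.66MHz bus.

AGP_WIDTH_BITS = 32
AGP_CLOCK_HZ = 200_000_000 / 3   # assumption: exact value of the nominal 66MHz

def agp_mb(transfers_per_clock):
    """Quoted-style bandwidth in MB/s (1000-based) for a given AGP mode."""
    bits_s = AGP_WIDTH_BITS * AGP_CLOCK_HZ * transfers_per_clock
    return bits_s / 8 / (1000 * 1000)

for mode in (1, 2, 4, 8):
    print(f"AGP{mode}x = {agp_mb(mode):.1f}MB/s")
```

This prints 266.7, 533.3, 1066.7 and 2133.3MB/s; the list above shows the same values cut down to whole numbers, with the 8x entry obtained by doubling 1066.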
Hypertransport
In all walks of life, things move on. Standards defined 10 years ago and beyond can never hope to achieve the scalability to meet today's needs.
As the 8bit ISA bus was superseded by the PCI bus, so the outdated PCI bus needs to be phased out and a new interconnect protocol defined. The leading contender for the throne at the moment is Hypertransport.
An AMD led consortium hopes to make Hypertransport the defining interconnect protocol of the foreseeable future.
What Is Hypertransport?
Hypertransport is a point-to-point interconnect primarily designed for speed, scalability and the unification of the various system buses we have today. The same link can be used to retrieve data from a network card and a bank of DDR memory.
[Figure: the typical computer bus layout as we know it today]
Hypertransport would eliminate most of the bottlenecks found in today's systems. The PCI bus, as explained earlier, is easily saturated with the high bandwidth peripherals in use.
In terms of speed, Hypertransport is capable (at the moment) of delivering throughputs of up to 51.2Gbps.
Using a 500MHz clock rate and a 2 bit wide link as an example:
2 (bits) * 500,000,000 (Hz) = 1,000,000,000 bits/s
(1,000,000,000/8) / (1024*1024) = 119.2MB/s - with the power of DDR signalling this is doubled to 238.4MB/s.
or to use Gbits (in essence because it sounds like more):
1,000,000,000 / (1024*1024*1024) = 0.93Gbps (rounded up to 1Gbps). With DDR signalling this is bumped up to 2Gbps.
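The link calculation generalises to other widths and clocks, as in the Python sketch below. The 2 bit, 500MHz case is the example above; the 32 bit, 800MHz configuration is an assumption included because it reproduces the 51.2Gbps peak figure (counted in multiples of 1000), and the function name is hypothetical.

```python
# Sketch: Hypertransport link throughput as width * clock * 2 for DDR signalling.

def ht_link_bits_per_s(width_bits, clock_hz, ddr=True):
    """Raw bits/s on one link; DDR signalling moves data on both clock edges."""
    return width_bits * clock_hz * (2 if ddr else 1)

# The worked example: 2 bits at 500MHz, before and after DDR signalling.
plain = ht_link_bits_per_s(2, 500_000_000, ddr=False)
doubled = ht_link_bits_per_s(2, 500_000_000)
print(f"{plain / 8 / (1024 * 1024):.1f}MB/s, doubled to "
      f"{doubled / 8 / (1024 * 1024):.1f}MB/s")   # 119.2MB/s, doubled to 238.4MB/s

# Assumed peak configuration: 32 bits wide at 800MHz with DDR signalling.
peak = ht_link_bits_per_s(32, 800_000_000)
print(f"{peak / 1e9:.1f}Gbps")   # 51.2Gbps
```

Because the width is a parameter of the link rather than of the protocol, the same interconnect can serve a slow peripheral and a memory bank.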
We see Hypertransport in today's technology through one company's willingness to break from the norm. NVIDIA's nForce (and nForce2 of course) use Hypertransport as the primary interconnect, offering throughputs of 800MB/s (nForce1) and 1600MB/s (nForce2). Not top speed Hypertransport, but more than enough for today's components.
VIA have validated Hypertransport for use in their future K8 AMD Hammer chipsets, so the future is certainly looking up for the fledgling protocol.
Roundup
Before we talk about what will come, let us briefly cover what is going on at the moment.
It should hopefully have become apparent that there are many pitfalls when deciding on a new computer system, for home users and businesses alike. As always, technical details are buried under a big pile of marketing. Minor advancements in technology that in reality do nothing are heralded as the "next big thing". A quick look under the surface, however, shows this not to be the case.
It pains me to see users asking whether they should upgrade their VIA KT266a based motherboard to a VIA KT333 chipset because "it must be faster"; bigger numbers mean faster, right? Wrong. A balanced system means you can squeeze the most out of your setup, be it for gaming, CAD or other intensive operations. Nobody wants to spend money needlessly, so read this article again, get a feel for the numbers involved and come to your own conclusions.
The Future
We briefly covered the aspects regarding future IO buses. Hypertransport and PCI-Express are on the horizon; some are already here. We need the peripherals and components to make use of this extra bandwidth. At the moment it seems wherever you look, there is a bottleneck.
Hopefully in the future manufacturers will settle on fewer buses; it's less confusing for the consumer and it also means that computers will become less complex. Take for example USB2.0 and Firewire (not covered in this article), two competing protocols that in essence do the same thing: hot-pluggable, scalable, high-bandwidth connections. Why not settle on one and stick to it?
Anyway, end of the ranting. We hope you enjoyed this article. It will be constantly updated as new technologies emerge in this ever-changing industry.
At the end of the day, this is a reference for us all.