TRILL, short for Transparent Interconnection of Lots of Links, is a proposed Link Layer (L2) networking standard from the IETF (Internet Engineering Task Force). TRILL matters because large data centers are beginning to combine storage traffic and IP traffic on converged Ethernet links using new technologies such as FCoE. In these converged networks the standard Spanning Tree Protocol (STP) will no longer be suitable, nor will it scale as required in very large data centers. With the growing adoption of FCoE, enterprise storage will begin to share the same Ethernet network with other protocols. Over time, at least from a storage perspective, TRILL’s role is to replace STP, which is common in Layer 2 networks.
The weakness of STP is that it was designed in the days of small bridged LANs, before switches were common. While there are several variations of STP, by design it ensures that there is only one active path between any two points; the goal was a loop-free infrastructure. Yet almost every network contains redundant paths, and STP blocks all of them. As environments have scaled, more switches and more paths have been added to the infrastructure, but STP still blocks all but one path. When the active path fails, the network has to re-converge on a new path. In a large network, re-convergence can take several seconds, and with the default timers of the original 802.1D STP it can take up to about 50 seconds. While this may seem acceptable for standard IP communications, it is not acceptable for storage or converged networks, especially those supporting virtual environments.
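To make the blocking behavior concrete, here is a minimal Python sketch of a simplified spanning tree calculation. It assumes every link has the same cost, that the bridge with the lowest ID becomes the root, and that ties are broken by the lowest neighbor ID. Real STP also exchanges BPDUs, uses port IDs and configurable path costs, and runs timers, none of which are modeled here. The switch names S1 to S4 are hypothetical.

```python
from collections import deque


def spanning_tree(links):
    """Simplified STP model (equal link costs, no BPDUs or timers).

    The lowest bridge ID becomes the root. Every other bridge keeps one
    link toward the root: the neighbour closest to the root, with ties
    broken by the lowest bridge ID. All remaining links are blocked.
    """
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    root = min(adj)

    # Hop count from the root to every reachable bridge (breadth-first search).
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    # Each non-root bridge forwards only on its single link toward the root.
    active = set()
    for node, d in dist.items():
        if node != root:
            parent = min(n for n in adj[node] if dist.get(n) == d - 1)
            active.add(frozenset((node, parent)))

    blocked = [link for link in links if frozenset(link) not in active]
    return root, active, blocked


# Hypothetical four-switch topology with redundant links.
LINKS = [("S1", "S2"), ("S1", "S3"), ("S2", "S3"), ("S2", "S4"), ("S3", "S4")]
root, active, blocked = spanning_tree(LINKS)
print("root bridge:", root)
print("forwarding:", sorted(tuple(sorted(link)) for link in active))
print("blocked:", blocked)
```

In this five-link example the tree keeps three links forwarding and blocks S2–S3 and S3–S4, even though both are physically available.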
The other weakness of STP is that it uses network bandwidth inefficiently. First, every blocked path represents bandwidth that sits idle, and as available per-segment bandwidth increases, more and more of it goes unused. Second, the active path may not be the most efficient or shortest path between two devices. Data is often left taking the ‘scenic route’ through the network instead of the most direct available path. This weakness affects storage, but it is especially detrimental to live migration of virtual machines. Moving a virtual machine (VM) or application to another server may require traversing several additional links and switches, and the suboptimal path only makes performance worse. VM migration traffic also has to compete with all the other traffic on the primary path; in fact, many large virtualization environments set up a dedicated network just for VM migration. It would be simpler if the network could migrate a virtual machine over one of the blocked paths that STP sets aside and leaves essentially unused.
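Both costs can be illustrated on a small hypothetical four-switch topology. The Python sketch below hard-codes the links that a spanning tree rooted at S1 would leave forwarding (an assumption for illustration, not output from a real switch), assumes every link runs at a hypothetical 10 Gb/s, and uses breadth-first search to compare the path traffic must take through the tree with the shortest path over all physical links.

```python
from collections import deque


def hop_path(links, src, dst):
    """Fewest-hop path from src to dst using only the given links (BFS)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(adj.get(node, [])):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None  # dst is unreachable over these links
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: all physical links vs. the links STP leaves forwarding.
ALL_LINKS = [("S1", "S2"), ("S1", "S3"), ("S2", "S3"), ("S2", "S4"), ("S3", "S4")]
TREE_LINKS = [("S1", "S2"), ("S1", "S3"), ("S2", "S4")]  # assumed STP result, root S1
LINK_GBPS = 10  # assumed uniform link speed

idle_gbps = (len(ALL_LINKS) - len(TREE_LINKS)) * LINK_GBPS
tree_path = hop_path(TREE_LINKS, "S3", "S4")
best_path = hop_path(ALL_LINKS, "S3", "S4")

print("idle capacity:", idle_gbps, "Gb/s")
print("path under STP:", tree_path)
print("shortest path:", best_path)
```

Here traffic between S3 and S4 crosses three links through the root (S3, S1, S2, S4) instead of using the single direct link that the tree has blocked, while 20 Gb/s of installed capacity carries nothing.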
One of the goals of TRILL is to know the shortest available path and take advantage of it. Doing this requires an understanding of the entire topology and how it is being used at that moment. Spanning Tree was designed in an era when bridge and switch hardware could not store the configuration of the entire network. As a result, each path, active or not, had to be designed to handle peak load. TRILL switches (RBridges) exchange topology information using the IS-IS link-state routing protocol, so each one ‘knows’ the entire infrastructure and how to use it efficiently. The result is a more efficient network in which no single segment has to be built to carry the peak load on its own. In essence, TRILL splits network load across multiple paths and uses bandwidth more efficiently. By adding multipath capability to L2 networks, TRILL frees network bandwidth and makes L2 networks more resilient and better suited to virtualized environments.
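The multipath idea can be sketched in Python as follows. This is not the TRILL specification: it assumes equal link metrics (so shortest paths reduce to hop counts rather than full IS-IS cost calculations), a hypothetical two-leaf, two-spine topology, and a simple CRC-32 hash of a made-up flow label to pin each flow to one of the equal-cost next hops. Real RBridges choose their own hash inputs and algorithms.

```python
import zlib
from collections import deque


def distances_to(adj, dst):
    """Hop count from every switch to dst (BFS over an undirected graph)."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def ecmp_next_hops(links, src, dst):
    """Neighbours of src that lie on some shortest path to dst."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = distances_to(adj, dst)
    return sorted(n for n in adj[src] if dist.get(n) == dist[src] - 1)


def pick_next_hop(next_hops, flow_label):
    """Hash a flow onto one next hop so all of its frames follow the same path."""
    return next_hops[zlib.crc32(flow_label.encode()) % len(next_hops)]


# Hypothetical leaf-spine fabric: each leaf connects to both spines.
LINKS = [("L1", "SP1"), ("L1", "SP2"), ("L2", "SP1"), ("L2", "SP2")]
hops = ecmp_next_hops(LINKS, "L1", "L2")

# Spread 1,000 hypothetical flows across the equal-cost next hops.
load = {hop: 0 for hop in hops}
for i in range(1000):
    load[pick_next_hop(hops, f"vm{i}->storage")] += 1

print("equal-cost next hops:", hops)
print("flows per next hop:", load)
```

Where STP would block one of the two spine links, both carry traffic in this sketch. Hashing per flow rather than per frame is a common design choice because it keeps the frames of a single flow in order on one path.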
Without TRILL, most networks have been designed to work around the limitations of STP by building a multi-tier network that places Layer 2 infrastructure at the edge, or access layer, and Layer 3 at the aggregation layer, with core routing protocols running in the upper layers of the infrastructure. This has been the primary way to design networks for over a decade. The design isolates the Layer 2 networks from one another, so that when a failure forces traffic to re-converge, the recalculation is confined to one smaller domain and completes in a manageable amount of time.
The downside is that this type of network is more expensive to build. First, Layer 3 (routing) ports cost more than Layer 2 switch ports, so the more widely you apply this workaround, the more expensive your infrastructure becomes. Second, introducing Layer 3 into the environment adds complexity that requires ongoing monitoring and management, and complexity is something today’s stretched-too-thin IT staffs need to avoid. Finally, this design makes the on-demand nature of the dynamic data center difficult to achieve. Moving bandwidth from one Layer 2 network to another when Layer 3 is in the way requires significant planning and is certainly not dynamic. As a result, Layer 3 is usually implemented on a relatively small scale, and for the most part the data center just ‘lives with’ the downside of STP’s inefficiency.
TRILL and FCoE
Initially, Fibre Channel over Ethernet (FCoE) deployments are going to be relatively basic top-of-rack implementations, in which a Converged Network Adapter (CNA) is installed in each server and a single cable connects it to a top-of-rack FCoE switch. This switch then separates the Fibre Channel storage traffic from the IP traffic, typically sending the storage traffic to the SAN infrastructure and the IP traffic to the network infrastructure. For now this is an acceptable way to manage the environment, and the fact that TRILL is not yet available should not stop anyone from implementing FCoE.
As the converged network continues to evolve and scale, the limitations of Spanning Tree will become more apparent, and the time required to re-converge links will become a greater challenge, especially for storage and virtual server infrastructures. FCoE requires lossless Ethernet transport, which is provided by a set of Ethernet enhancements called Converged Enhanced Ethernet (CEE), also known as Data Center Bridging (DCB). In addition to enabling L2 multipathing, TRILL enables multi-hop FCoE, opening the door to more sophisticated deployments of the technology in data centers.
The result of convergence using FCoE and DCB will be infrastructures with more efficient configurations and larger networks connecting at least double the number of endpoints. In addition, cloud computing and cloud storage are driving data center growth, so the sheer number of endpoints the network must connect will grow faster than ever. Finally, virtualization makes the environment even more dynamic, especially when it can take advantage of TRILL. The net effect will be significantly larger data centers with more active endpoints and more resilient, efficient infrastructures. When the data center reaches that point, TRILL will play a prominent role in enabling the dynamic data center.