My last commentary addressed why the physical and virtual networking technologies within datacenters will have distinct paths of evolution going forward. Both spaces will see substantial new products and capabilities, and there will be innovations around intelligently connecting the two. Since the two spaces will inevitably coexist, it is important to look at the areas where strong binding is strategic. However, in order to understand the opportunities for strategic leverage, it is also important to understand the spaces a little more deeply. In my last blog I covered the principles involved in architecting large-scale datacenter networks. In this entry, I intend to cover the different components of large-scale cloud infrastructure, including enterprise datacenter build principles. Given that this area is truly vast, my follow-up blogs will examine the network virtualization space more closely and, hopefully, naturally expose the synergies between the physical and virtual networking constructs.
Fig. 1 shows a large-scale cloud provider infrastructure. Typically these networks are global, richly connected to the Internet, and use technologies that are optimized for each subset (shown as areas A – G). While architecting these infrastructures, it is critical to understand how the different areas scale and interact with each other to create a dynamic but cost-controlled service.
Each area has distinct characteristics as discussed below.
A. Intra-DC Network:
This is where most of the servers reside. The traffic profile has changed over the last few years: in addition to the usual north-south bandwidth, applications now demand a lot of lateral east-west bandwidth. Multi-tier Clos is a popular network topology for such areas, and mature IP control planes have proven to be excellent ways to interconnect the network elements.
IP is ubiquitous, but VLANs depend on L2 technologies, and they carry an inherent expectation that servers are in close proximity and that server-to-server latencies are low. Traditional L2 protocols are weak and do not scale beyond very limited physical proximities, so L2 dependencies are a handicap for cloud applications: they confine the servers running an application to a smaller physical segment of the datacenter (a co-location area) and pose significant scaling challenges. Hence, barring a few legacy applications, most modern applications have eliminated their L2 dependencies and operate in a pure L3 world beyond the rack level. Of course, these applications then have to instrument their own ways of understanding server affinity/proximity and have to minimize inter-DC traffic loads.
In summary, the following are key characteristics of intra-DC networks:
• Abundant network capacity within datacenters, low over-subscription and dense but topologically symmetrical networks
• Services like firewalls, load balancers, and IPS/IDS that traditionally had L2 attach points are being upgraded, often as VMs or virtual appliances, and are being re-architected to operate on native L3 networks.
• Dependence on L2-centric topologies and protocols is being minimized
• IP networks that heavily leverage ECMP and operate on robust L3 control plane protocols are gaining traction (a minimal ECMP sketch follows below).
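To make that last point a bit more concrete, here is a minimal sketch of how a leaf switch might spread traffic across equal-cost uplinks in a Clos fabric by hashing each flow's 5-tuple. The uplink names and the choice of hash are illustrative assumptions, not any particular vendor's implementation.

```python
import hashlib

# Hypothetical set of equal-cost uplinks from a leaf switch toward the spines.
UPLINKS = ["spine-1", "spine-2", "spine-3", "spine-4"]

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, uplinks=UPLINKS):
    """Pick an uplink by hashing the flow 5-tuple.

    Hashing keeps all packets of a flow on one path (no reordering)
    while spreading distinct flows across the fabric.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return uplinks[digest % len(uplinks)]

if __name__ == "__main__":
    # Two different flows between the same pair of servers may take different paths.
    print(ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 49152, 80))
    print(ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 49153, 80))
```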
B. Local Cluster of DCs:
This setup is used only by really large-scale cloud providers. Since applications breathe easier when network bandwidth is abundant, having richly connected datacenters in close proximity has been a common approach. In these cases, the datacenters are either connected by a large number of single-mode fiber cables, or mux-ponder technology is used to carry multiple wavelengths (lambdas) over a smaller number of fibers. Available technologies can easily push eighty (80) wavelengths, each carrying 10 Gbps, over a single fiber pair.
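To put those numbers in perspective, here is a back-of-the-envelope sketch of how many fiber pairs a given inter-DC bandwidth target would require at 80 wavelengths of 10 Gbps per pair. Apart from the 80 × 10 Gbps figure quoted above, the target bandwidth is purely an illustrative assumption.

```python
import math

WAVES_PER_PAIR = 80   # wavelengths per fiber pair (from the text above)
GBPS_PER_WAVE = 10    # bandwidth per wavelength

def fiber_pairs_needed(target_tbps):
    """Fiber pairs required to carry a target inter-DC bandwidth (in Tbps)."""
    pair_capacity_gbps = WAVES_PER_PAIR * GBPS_PER_WAVE   # 800 Gbps per pair
    return math.ceil(target_tbps * 1000 / pair_capacity_gbps)

if __name__ == "__main__":
    # Hypothetical example: 10 Tbps between two datacenters in the cluster.
    print(fiber_pairs_needed(10))   # -> 13 fiber pairs
```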
The main challenges in this segment are the network topology and the termination cost on the routers. If typical WAN routers are used to terminate the lambdas, then given the number of ports needed, the overall cost can be prohibitive. The unit cost of the colored optics needed to feed the mux-ponders also comes into play. Topology-wise, one must be thoughtful about where to put the different components of a distributed network, since clustered datacenters typically go live in phases. If the first datacenter carries all the aggregate “spine” routers, all subsequent datacenters will need to connect back to the first one with an appropriate fiber length. The distance between the datacenters is important since the optical technology cost can go up significantly beyond a certain distance.
C. Metro Cluster of Clusters:
In this segment, either capacity (e.g. 10 Gbps links) or metro dark fiber has to be leased. For those with heavy traffic needs between the clusters, dark fiber has been the choice for cost reasons. Mux-ponders light up dark fibers in many cases, with colored optics used at the router termination points. In other cases, transponders have been used. Each of these optical technologies has its own cost points and benefits, often subject to unique architectural and business considerations.
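One rough way to frame that trade-off is a simple break-even comparison between leasing individual 10 Gbps waves and leasing a dark-fiber pair and lighting it with mux-ponders. Every price in the sketch below is a made-up placeholder; the only point it illustrates is that dark fiber tends to win once traffic crosses some break-even number of waves.

```python
# All prices below are hypothetical placeholders for illustration only.
LEASED_WAVE_MONTHLY = 5000        # cost per leased 10 Gbps metro wave
DARK_FIBER_PAIR_MONTHLY = 20000   # cost of leasing one metro dark-fiber pair
MUXPONDER_MONTHLY_PER_WAVE = 800  # amortized mux-ponder + colored optics per wave

def cheaper_option(waves_needed):
    """Return the cheaper option for a given number of 10 Gbps waves."""
    leased = waves_needed * LEASED_WAVE_MONTHLY
    dark = DARK_FIBER_PAIR_MONTHLY + waves_needed * MUXPONDER_MONTHLY_PER_WAVE
    return ("leased waves", leased) if leased <= dark else ("dark fiber", dark)

if __name__ == "__main__":
    for n in (2, 5, 10, 40):
        print(n, "waves ->", cheaper_option(n))
```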
D. Internet Core:
This network exists for large-scale cloud providers who connect to the Internet globally at multiple points under different types of fiscal arrangements. Delivering traffic to the right hand-off points is extremely important, both to reduce egress cost and to maintain the quality of the user experience.
This layer typically carries the whole Internet routing table along with multiple alternate BGP paths. Traffic engineering mechanisms are common in this layer: some providers have used RSVP-TE with MPLS, others have used out-of-band reservation mechanisms. Typically this layer is run much like the core of a Tier 1 Internet Service Provider. Some providers have long-haul dark fibers, while others use a mix of dark fiber and leased lines. For smaller providers, a default route or a subset of Internet prefixes can be carried in this core.
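As a grossly simplified illustration of the traffic-engineering decision in this layer, the sketch below picks an egress point for a destination prefix from several candidate BGP paths, preferring the highest local preference and breaking ties on a per-hand-off cost figure. The attributes, router names, and cost values are invented for illustration and do not reflect any specific implementation.

```python
from dataclasses import dataclass

@dataclass
class CandidatePath:
    prefix: str          # destination prefix, e.g. "203.0.113.0/24"
    egress_router: str   # where the traffic would leave the network
    local_pref: int      # higher is preferred, set by ingress/egress policy
    egress_cost: float   # hypothetical $/Mbps at that hand-off point

def choose_egress(paths):
    """Prefer the highest local-pref, then the cheapest hand-off point."""
    return max(paths, key=lambda p: (p.local_pref, -p.egress_cost))

if __name__ == "__main__":
    candidates = [
        CandidatePath("203.0.113.0/24", "peer-nyc", local_pref=200, egress_cost=0.0),
        CandidatePath("203.0.113.0/24", "transit-chi", local_pref=100, egress_cost=1.2),
        CandidatePath("203.0.113.0/24", "transit-sea", local_pref=200, egress_cost=0.8),
    ]
    print(choose_egress(candidates).egress_router)   # -> "peer-nyc"
```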
In many cases this layer has both an L3 routing core and an optical backbone. Some providers are working on converging the two layers into a packet-optical core by forming a super-core that does not carry Internet prefixes. Networks have also added QoS and L3VPN capabilities in this segment to implement WAN virtualization and honor service level agreements. As this capability has matured, it has also become table stakes for smaller cloud operators.
E. Inter-DC Bypass Network:
Providers using this network have high-capacity requirements between datacenters that are not located in the same metro. Since the number of prefixes carried in this network is limited to internal prefixes, packet-optical integration has worked out naturally in many of these cases. Dark fiber has been the choice for most providers. The ultimate goal is to connect all datacenters with abundant bandwidth so that applications can scale out with fewer constraints.
This is one area where OpenFlow-based mechanisms have seen some traction. Traffic engineering based on bandwidth reservation/calendaring and smart traffic placement has been an active area of work. Since the termination cost has to be low, providers have tried to avoid more complex mechanisms (e.g. MPLS RSVP-TE), despite their pervasive deployment by large providers.
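To give a flavor of bandwidth calendaring, the sketch below places bulk inter-DC transfer requests (replication jobs, backups, and the like) into hourly slots on a single bypass link, largest request first, and defers anything that does not fit. The link capacity, slot granularity, and request mix are all assumptions.

```python
# Minimal bandwidth-calendaring sketch: place bulk inter-DC transfers into
# hourly slots on a single link, largest request first.

LINK_CAPACITY_GBPS = 400   # hypothetical spare capacity on the bypass link
SLOTS = 24                 # plan one day in hourly slots

def schedule(requests):
    """requests: list of (name, gbps, hours_needed). Returns (slot plan, deferred)."""
    free = [LINK_CAPACITY_GBPS] * SLOTS
    plan = {slot: [] for slot in range(SLOTS)}
    deferred = []
    for name, gbps, hours in sorted(requests, key=lambda r: -r[1]):
        placed = [s for s in range(SLOTS) if free[s] >= gbps][:hours]
        if len(placed) < hours:
            deferred.append(name)          # not enough room today
            continue
        for s in placed:
            free[s] -= gbps
            plan[s].append(name)
    return plan, deferred

if __name__ == "__main__":
    plan, deferred = schedule([("index-copy", 300, 4), ("db-backup", 200, 6),
                               ("log-sync", 80, 24)])
    print(plan[0], deferred)
```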
F. Satellite Networks at Edge:
These networks are used much like the edge nodes of CDN providers. The service delivery points are taken closer to the end user to improve the user experience. Caching is used heavily along with many other techniques. Connectivity to the Internet is rich at these locations. Some providers terminate the user connections here and then direct the user traffic within the cloud infrastructure to provide the best experience. In many cases these locations can be just a few servers embedded in Telco premises.
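One tiny illustration of taking the service delivery point closer to the user is steering each user to the lowest-latency edge location. The sketch below does this from a table of measured round-trip times; the site names and latency figures are invented for illustration.

```python
# Hypothetical measured round-trip times (ms) from a user to candidate edge sites.
MEASURED_RTT_MS = {
    "edge-ams": 12.0,
    "edge-lon": 18.5,
    "edge-fra": 25.0,
}

def pick_edge(rtt_by_site):
    """Send the user to the edge site with the lowest measured RTT."""
    return min(rtt_by_site, key=rtt_by_site.get)

if __name__ == "__main__":
    print(pick_edge(MEASURED_RTT_MS))   # -> "edge-ams"
```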
G. Access to Public and Private Networks:
Large cloud infrastructures connect to the Internet in almost all areas except the inter-DC bypass network. The number of connections to the Internet depends on the scale of the provider as well as the amount of traffic originated/consumed. Many providers have settlement-free peering relationships, ensuring zero-cost traffic exchange with other large networks. Policy-based network ingress/egress is an area that has seen a good amount of work. Policy-based network optimization ensures that selected traffic can be flexibly delivered or attracted via the appropriate Internet connections.
Large providers can also have private access arrangements with enterprise customers. These access points are typically used by the enterprises to access dedicated services hosted by the providers. In some of these services, the enterprise customers want the hosting providers to use the same address space that the enterprises use within their own networks. This creates a situation where multiple overlapping addresses have to be used by the cloud service provider between the access points and the datacenters. Typically, L3VPN technology has been used to provide scalable VPN address segmentation in the backbone network. This is one of the areas that I will cover in more detail in future posts.
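To make the overlapping-address point concrete, the sketch below captures the basic L3VPN idea of qualifying each customer prefix with a per-VPN route distinguisher, so that two enterprises can both use the same private prefix without colliding in the provider's backbone. The RD values, prefixes, and router names are illustrative.

```python
# Two tenants using the same private prefix, kept distinct in the backbone by
# qualifying each route with a per-VPN route distinguisher (RD), as in L3VPN.

vpn_routes = {}   # (rd, prefix) -> next hop (provider-edge router)

def install_route(rd, prefix, next_hop):
    """Store a customer route qualified by its VPN's route distinguisher."""
    vpn_routes[(rd, prefix)] = next_hop

def lookup(rd, prefix):
    return vpn_routes[(rd, prefix)]

if __name__ == "__main__":
    install_route("65000:1", "10.1.0.0/16", "pe-east")   # Enterprise A
    install_route("65000:2", "10.1.0.0/16", "pe-west")   # Enterprise B, same prefix
    print(lookup("65000:1", "10.1.0.0/16"))   # -> "pe-east"
    print(lookup("65000:2", "10.1.0.0/16"))   # -> "pe-west"
```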
Virtual Network Implications:
The lessons learned from automation of distributed control systems such as the Internet are invaluable as cloud operators scale overlay virtual networks.
For cloud providers, supporting seamless virtual networks across these diverse segments is critical. Tenants can extend their existing enterprise networks across such a cloud infrastructure, expanding their available capacity and gaining a significantly higher level of global business agility.
In future blogs I intend to explore the relevance of network virtualization with respect to the physical networks. More importantly, I will explore ways to evolve existing architectural paradigms to enable interoperable, scalable and resilient software platforms that protect and align with the current and future physical infrastructure investments.
Parantap Lahiri – VP Solution Engineering, Contrail Systems