Under the Covers of a Distributed Virtual Computing Platform – Built For Scale and Agility – via @dlink7, #Nutanix
I must say that Dwayne did a great job with this blog post series!! It goes into explaining the Nutanix Distributed File System (NDFS), which in my opinion is the most amazing enterprise product out there if you need a truly scalable and agile compute and storage platform! I advise you to read this series!!
Under the Covers of a Distributed Virtual Computing Platform – Part 1: Built For Scale and Agility
Lots of talk in the industry about who had software defined storage first and who was using what components. I don't want to go down that rat hole since it's all marketing and it won't help you at the end of the day to enable your business. I want to really get into the nitty gritty of the Nutanix Distributed File System (NDFS). NDFS has been in production for over a year and a half with good success; take a read of the article in the Wall Street Journal.
Below are core services and components that make NDFS tick. There are actually over 13 services; for example, our replication is distributed across all the nodes to provide speed and low impact on the system. The replication service is called Cerebro, which we will get to later in this series.
This isn't some home grown science experiment; the engineers that wrote the code come from Google, Facebook, and Yahoo, where these components were invented. It's important to realize that all components are replaceable, or future proofed if you will. The services/libraries provide the APIs, so as new innovations happen in the community, Nutanix is positioned to take advantage.
All the services mentioned above run on multiple nodes in the cluster in a master-less fashion to provide availability. The nodes talk over 10 GbE and are able to scale in a linear fashion. There is no performance degradation as you add nodes. Other vendors have to use InfiniBand because they don't share the metadata across all of the nodes. Those vendors end up putting a full copy of the metadata on each node, which eventually causes them to hit a performance cliff, and the scaling stops. Each Nutanix node acts as a storage controller, allowing you to do things like have a datastore of 10,000 VMs without any performance impact… continue reading part 1 here.
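To make that scaling argument concrete, here is a rough back-of-envelope sketch in Python (not Nutanix code, and the entry counts and sizes are made-up numbers) comparing the per-node metadata footprint when each node keeps a full copy versus when the metadata is sharded across the cluster:

```python
# Back-of-envelope sketch only: illustrative numbers, not Nutanix internals.
def per_node_metadata_gb(total_entries, bytes_per_entry, nodes, full_copy):
    """Approximate metadata each node must hold, in GB."""
    total_bytes = total_entries * bytes_per_entry
    return (total_bytes if full_copy else total_bytes / nodes) / 1e9

ENTRIES = 100_000_000   # hypothetical number of metadata entries in the cluster
ENTRY_SIZE = 200        # hypothetical bytes per entry

for nodes in (4, 16, 64):
    full = per_node_metadata_gb(ENTRIES, ENTRY_SIZE, nodes, full_copy=True)
    shard = per_node_metadata_gb(ENTRIES, ENTRY_SIZE, nodes, full_copy=False)
    print(f"{nodes:>2} nodes: full copy {full:.1f} GB/node, sharded {shard:.2f} GB/node")
```

With a full copy the per-node burden never shrinks as the cluster grows; sharded, it drops roughly linearly, which is the property that keeps the scaling linear.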
Under the Covers of a Distributed Virtual Computing Platform – Part 2: ZZ Top
In case you missed Part 1 – Part 1: Built For Scale and Agility
No, it's not Billy Gibbons, Dusty Hill, or drummer Frank Beard. It's Zeus and Zookeeper providing the strong blues that allow the Nutanix Distributed File System to maintain its configuration across the entire cluster.
Zeus is the Nutanix library that all other components use to access the cluster configuration. As mentioned before, Zeus handles the interaction with the other components in the file system but allows the underlying component, Zookeeper, to be replaced if need be. This is very important, as the open source community is like having 200,000+ engineers in your back pocket. There is an interesting article about Netflix using Zookeeper as well. Sure, you still need bright minds, but we have those. I think our hardware to software engineering split was 1 to 9. At the end of the day we are a software company that delivers its medicine to enterprises in a hardware form factor. Zeus keeps track of the IP addresses of ESXi hosts and virtualized storage controllers, health information via IPMI (iLO/DRAC), capacities, data replication rules, and all of the cluster configuration. Zeus helps to provide the glue between storage & compute to form a single active identity. Even without IPMI plugged in, the Nutanix Command Center UI can get all the health stats it needs.
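To give a feel for the kind of configuration access Zeus abstracts away, here is a minimal sketch using the open-source kazoo Zookeeper client for Python. The znode paths and values are hypothetical, and this is generic Zookeeper usage rather than the Zeus API itself:

```python
# Minimal Zookeeper usage sketch with the kazoo client (pip install kazoo).
# Paths and values are hypothetical; Zeus wraps this kind of access behind its own API.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181")
zk.start()

# Store a piece of cluster configuration (e.g. one node's addresses) under a znode.
node_config = {"esxi_ip": "10.0.0.11", "cvm_ip": "10.0.0.21", "ipmi_ip": "10.0.0.31"}
zk.ensure_path("/cluster/nodes/node-1")
zk.set("/cluster/nodes/node-1", json.dumps(node_config).encode())

# Any component can read the same configuration back.
data, stat = zk.get("/cluster/nodes/node-1")
print(json.loads(data), "version:", stat.version)

zk.stop()
```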
Zookeeper runs on only three nodes in the cluster, no matter how big or small the cluster gets. Since it's tracking configuration data that doesn't change that often, there is no impact on performance. Using multiple nodes prevents stale data from being returned to other components, while having an odd number provides a method for breaking ties if two nodes have different information. One Zookeeper node is elected as the leader. The leader...continue reading part 2 here.
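For illustration, the same kazoo client ships a leader-election recipe built on Zookeeper's primitives. This is a generic sketch with hypothetical paths, not the actual election logic Nutanix runs:

```python
# Leader-election sketch using kazoo's Election recipe (hypothetical paths).
# With three Zookeeper nodes, a majority of two keeps the ensemble working
# if one node fails, which is why an odd count is used to break ties.
import socket
from kazoo.client import KazooClient

zk = KazooClient(hosts="10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181")
zk.start()

def lead():
    # Runs only while this participant holds leadership.
    print(f"{socket.gethostname()} is now the elected leader")

election = zk.Election("/cluster/leader-election", identifier=socket.gethostname())
election.run(lead)   # blocks until this node wins the election, then calls lead()
```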
Under the Covers of a Distributed Virtual Computing Platform – Part 3: Metadata
Part 1 was the overview of the magic of the Nutanix Distributed File System (NDFS).
Part 2 was an overview of Zookeeper in regards to maintaining configuration across a distributed cluster built for virtual workloads.
Part 3 is the reason why Nutanix can scale to infinity, a distributed metadata layer made up of Medusa and Apache Cassandra.
Before starting at Nutanix I wrote a brief article on Medusa, Nutanix: Medusa and No Master. Medusa is a Nutanix abstraction layer that sits in front of a NoSQL database that holds the metadata of all data in the cluster. The database is distributed across all nodes in the cluster, using a modified form of Apache Cassandra. As virtual machines move around the nodes (servers) in the cluster, they know where all their data is sitting. The ability to quickly know where all the data is sitting is why hard drive failures, node failures, and even whole blocks* can fail and the cluster can carry on.
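The general idea behind a ring-partitioned metadata store like Cassandra's can be sketched in a few lines of Python. This is a simplified, generic consistent-hashing lookup with a small replication factor, not Medusa's actual implementation:

```python
# Simplified consistent-hashing sketch (generic, not Medusa/Cassandra code):
# each metadata key maps to a position on a ring, and the next N nodes
# clockwise from that position hold the replicas.
import bisect
import hashlib

def token(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.ring = sorted((token(n), n) for n in nodes)

    def owners(self, key: str):
        """Return the nodes that hold metadata replicas for this key."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect(tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.replicas)]

ring = Ring(["node-1", "node-2", "node-3", "node-4"])
print(ring.owners("vdisk-42/extent-7"))   # prints the two replica owners for this key
```

Because every node can compute the same lookup, a virtual machine that just moved to another host can immediately find the nodes holding its metadata without asking a master.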
When a file reaches 512K in size, the cluster creates a vDisk to hold the data. Files smaller than 512K will be stored inside of Cassandra. Cassandra runs on all nodes of the cluster. These nodes communicate with each other once a second, using the Gossip protocol, ensuring that the state of the database is current on all nodes.
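That placement rule boils down to a single size check. A tiny hypothetical sketch (the 512K threshold comes from the post; the function name is mine):

```python
# Hypothetical sketch of the 512K placement rule described above.
VDISK_THRESHOLD = 512 * 1024  # bytes

def place_file(name: str, size_bytes: int) -> str:
    """Decide where a file's data lives, per the rule described in the post."""
    if size_bytes < VDISK_THRESHOLD:
        return f"{name}: stored directly in the metadata store (Cassandra)"
    return f"{name}: vDisk created to hold the data"

print(place_file("small.cfg", 4 * 1024))
print(place_file("disk.vmdk", 40 * 1024**3))
```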
A vDisk is a subset of available storage within a container. The cluster automatically creates and manages vDisks within an NFS container. A general rule is that you will see a vDisk for every vmdk, since most times they are over 512K. While the vDisk is abstracted away from the virtualization admin, it's important to understand. vDisks are how Nutanix is able to present vast amounts of storage to virtual machines while only having a subset of the total amount on any one node.
Under the Covers of a Distributed Virtual Computing Platform – Part 4: Stargate, The Point Man
Part 1 was the overview of the magic of the Nutanix Distributed File System (NDFS).
Part 2 was an overview of Zookeeper in regards to maintaining configuration across a distributed cluster built for virtual workloads.
Part 3 is the reason why Nutanix can scale to infinity, a distributed metadata layer made up of Medusa and Apache Cassandra.
Part 4: Stargate is the main point of contact for the Nutanix cluster. All read and write requests are sent to the Stargate for processing. Stargate checksums data before writing it and verifies it upon reading. Data integrity is number 1.
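The post doesn't say which checksum algorithm Stargate uses, but the checksum-on-write, verify-on-read pattern itself looks roughly like this generic sketch (CRC32 and the in-memory store are arbitrary stand-ins):

```python
# Generic checksum-on-write / verify-on-read sketch (not Stargate's actual code;
# the checksum algorithm and storage layout here are arbitrary choices).
import zlib

store = {}  # extent_id -> (checksum, data)

def write_extent(extent_id: str, data: bytes) -> None:
    store[extent_id] = (zlib.crc32(data), data)

def read_extent(extent_id: str) -> bytes:
    checksum, data = store[extent_id]
    if zlib.crc32(data) != checksum:
        raise IOError(f"checksum mismatch on {extent_id}; read a replica instead")
    return data

write_extent("extent-1", b"guest VM data")
print(read_extent("extent-1"))
```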
Stargate depends on Medusa to gather metadata and Zeus to gather cluster configuration data.
Stargate has 6 components that make up the service:
Front-end Adapter
Receives read/write requests from the ESXi host. It keeps track of incoming writes and helps to localize all traffic in the cluster for performance. The front-end adapter lets your 3+N storage controllers work together to alleviate hot spots and even run mixed workloads.
Admission Control
Determines which requests to forward to vDisk controllers, based on the type of request and the number of outstanding requests. Admission control provides the balancing act between doing guest IO versus maintenance tasks, replication, snapshots, and continual data scrubbing.
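As a purely illustrative sketch of that balancing act (the names and limits are hypothetical, not Stargate internals), guest IO is always admitted while background work is only forwarded when there is headroom:

```python
# Hypothetical admission-control sketch: guest IO gets priority; background
# work (replication, snapshots, scrubbing) is deferred when the count of
# outstanding requests leaves no headroom. The limit is a made-up value.
from collections import deque

class AdmissionControl:
    def __init__(self, background_headroom=16):
        self.background_headroom = background_headroom
        self.outstanding = 0
        self.deferred = deque()

    def submit(self, request, is_guest_io):
        if not is_guest_io and self.outstanding >= self.background_headroom:
            self.deferred.append(request)
            return f"deferring background task {request} until load drops"
        self.outstanding += 1
        return f"forwarding {request} to a vDisk controller"

    def complete(self):
        self.outstanding -= 1
        if self.deferred and self.outstanding < self.background_headroom:
            print("now forwarding deferred task", self.deferred.popleft())

ac = AdmissionControl()
print(ac.submit("guest-write-1", is_guest_io=True))
print(ac.submit("scrub-7", is_guest_io=False))
```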
vDisk Controller
Responds to incoming requests based on whether they are random or sequential. Random IO is sent to the oplog. Sequential requests are sent directly to the extent store, unless they are short, small requests. These requests are treated as random IO and sent to the oplog as well. The vDisk controller plays the first step in helping to serialize all the writes so you don't have to worry about disk alignment. The vDisk controller also plays a critical role in Nutanix Hot Optimized Tiering, moving data up and down tiers through adaptive algorithms and bringing data closer to the compute. The vDisk controller is the poster child for virtualized Hadoop.
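The routing described above can be condensed into one small function; the cutoff for "short, small" sequential requests is a made-up number, and "oplog" and "extent store" are just labels in this sketch:

```python
# Sketch of the vDisk controller routing rule described above. The size
# cutoff for "short, small" sequential requests is a made-up value.
SHORT_SEQUENTIAL_CUTOFF = 64 * 1024  # bytes, hypothetical

def route_write(is_sequential: bool, size_bytes: int) -> str:
    if not is_sequential:
        return "oplog"                      # random IO goes to the oplog
    if size_bytes <= SHORT_SEQUENTIAL_CUTOFF:
        return "oplog"                      # short sequential treated like random IO
    return "extent store"                   # large sequential goes straight through

print(route_write(is_sequential=False, size_bytes=4096))        # oplog
print(route_write(is_sequential=True, size_bytes=1024 * 1024))  # extent store
```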
//Richard