CIO - Facebook today "re-open-sourced" Thrift, releasing its own internal branch of the RPC framework, which is designed to provide a new set of core features and crank up performance.
Facebook Software Engineer Dave Watson explains that the company always wants to choose the best tools and implementations for its backend services, regardless of programming language. By choosing programming languages on a case-by-case basis, it can optimize for performance, ease and speed of development, the ability to leverage existing libraries, and so on.
"To support this practice, in 2006 we created Thrift, a cross-language framework for handling RPC [remote procedure calls], including serialization/deserialization, protocol transport and server creation," Watson says. "Since then, usage of Thrift at Facebook has continued to grow. Today, it powers more than 100 services used in production, most of which are written in C++, Java, PHP or Python."
After a year of internal use, Facebook released Thrift to the open source community, where development continues as Apache Thrift. But, as Watson notes, while Apache Thrift gained wide use outside Facebook, organizations using it ran into performance concerns and difficulty separating serialization logic from transport logic.
Inside Facebook, engineers were running into similar issues as they gained experience operating Thrift infrastructure. Watson says the team realized that Thrift was missing a core set of features, and that a lot more could be done to improve performance.
"For example, one issue we ran into was that internal service owners were constantly reinventing the same features again and again-such as transport compression, authentication and counters - to track the health of their servers. Engineers were also spending a lot of time trying to eke more performance from their services."
"When Thrift was originally conceived, most services were relatively straightforward in design," Watson adds. "A Web server would make a Thrift request to some backend service, and the service would respond. But as Facebook grew, so did the complexity of the services. Making a Thrift request was no longer so simple. Not only did we have tiers of services (services calling other services), but we also started seeing unique future demands for each service, such as the various compression or trace/debug needs.
Over time, Watson says, it became obvious that Thrift was in need of an upgrade for some of our specific use cases. In particular, we sought to improve performance for asynchronous workloads, and we wanted a better way to support per-request features."
The end result is fbthrift, which Facebook released today on GitHub. Watson says the largest changes are in the new C++ code generator (available as the new target language cpp2), as well as header transport and protocol changes for several languages, including C++, Python and Java. He adds that a number of services that have moved to the new cpp2-generated code have achieved up to a 50 percent decrease in latency and large decreases in memory footprint.
Watson notes that fbthrift doesn't reflect all Apache Thrift changes, but the team has tracked upstream changes closely, and he adds that Facebook hopes to work with the Apache Thrift maintainers to incorporate the fbthrift work upstream.
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud.