Kadir Ozdemir is currently a big data storage architect at Salesforce. He has 18 years of data storage system development experience in delivering highly available distributed systems in roles including chief architect for large engineering teams at EMC and Symantec. He has patents on various data storage and network systems and technologies, and published papers on data storage, formal protocol validation, neural networks, and expert systems. He holds a PhD in computer science from the University of Ottawa, Canada.
Strongly Consistent Global Indexes for Phoenix
Without transactional tables, the global indexes can get easily out of sync with their data tables in Apache Phoenix. Transactional tables require a separate transaction manager, have some restrictions and performance penalties, are still in beta. This technical talk lays out a design to have strongly consistent global indexes without the need for an external transaction manager. In addition to having strongly consistent indexing, the proposed design aims to have minimal impact on read performance. In Phoenix, global indexing is implemented using a separate table for each secondary index of a table. Updating a table with one or more global index requires updating multiple table regions likely distributed over multiple region servers. Translating a single table update operation into a multi-table write operation poses consistency issues as Phoenix does not provide a reliable multi-table update capability without using transactional tables. Although updating multiple tables (in our case, a data table and its indexes) atomically requires implementing a form of two-phase commit protocol, we have observed that achieving strongly consistent global indexing does not require implementing a general-purpose transaction capability. Another important observation is that HBase is a log-structured data store, that is, updates are never done in place. In these systems, writes are much faster when compared to in-place update systems because random writes are handled as fast as sequential writes. This allows us to add an extra write phase during updates without severely impacting the write performance, which simplifies the overall design. In this talk, we will present the details of the proposed design, the proof of its correctness, and our performance analysis and test results.
Timings: 11:30 - 12:00