SQL PASS Summit 2017 Day 2 - Keynote Notes and Ideas

on November 2, 2017

This morning, Dr Rimma Nehme tells us the story of the birth of Azure Cosmos DB, a global, scale-out database system.

Right from the beginning of the talk, I can tell that I’m going to recommend you watch the recording (and here it is!). Summarizing this while paying attention is going to be tough.

So today I’m going to write down the ideas I have and the highlights I notice while listening.

Notes and ideas

Designing this project was a huge challenge, to say the least.

  • Partitioning is critical for distribution of data (YEP! Also true elsewhere)
  • ACID properties are per partition
  • A customer’s data can be spread across machines, clusters, and regions (but the customer doesn’t have to deal with any of this; they just pick regions)
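Here is a minimal Python sketch of what “partitioning” and “ACID per partition” imply, assuming a fixed number of partitions and a SHA-256 hash of a partition key. The function names and the hashing scheme are hypothetical, not Cosmos DB’s actual placement logic; the point is only that a multi-item transaction fits inside one ACID scope when every item routes to the same partition.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def fits_in_one_partition(items: list[dict], key_field: str, num_partitions: int) -> bool:
    """True if every item routes to the same partition, i.e. one ACID scope could cover them."""
    targets = {partition_for(str(item[key_field]), num_partitions) for item in items}
    return len(targets) == 1


if __name__ == "__main__":
    orders = [{"customerId": "c-42", "total": 10}, {"customerId": "c-42", "total": 25}]
    print(partition_for("c-42", 8))
    print(fits_in_one_partition(orders, "customerId", 8))  # same key, so always True
```

A stable hash means the same key always lands on the same partition, which is what lets one partition own all of that key’s transactions.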

The customer experience is “like driving a car. You turn the key and go. You don’t have to worry about how the engine is working.”

The connection between the application and the database is typically logical, so you can add and remove regions without impacting the application. You can also prioritize regions, which decides where to fail over if a region goes down. (They really thought about the customer here, and it shows in the storytelling.)

Customers can simulate a regional failure themselves. YES! “Making it fail” is something we should be doing, whether in the cloud or not.
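Here is a toy Python model of a logical endpoint with a region priority list, plus a way to “make it fail” on purpose. Everything in it (the class, its methods, the region names) is hypothetical and is not the Cosmos DB SDK; it only shows why the application doesn’t change when regions are added, removed, or failed over: the app just asks the endpoint where to go.

```python
class LogicalEndpoint:
    """Hypothetical client-side view of a multi-region account; the app only ever calls route()."""

    def __init__(self, regions_by_priority: list[str]):
        self.regions = list(regions_by_priority)  # highest priority first
        self.down: set[str] = set()

    def add_region(self, region: str) -> None:
        self.regions.append(region)  # lowest priority until reordered

    def remove_region(self, region: str) -> None:
        self.regions.remove(region)
        self.down.discard(region)

    def simulate_failure(self, region: str) -> None:
        self.down.add(region)  # "make it fail" on purpose

    def recover(self, region: str) -> None:
        self.down.discard(region)

    def route(self) -> str:
        """Return the highest-priority region that is currently up."""
        for region in self.regions:
            if region not in self.down:
                return region
        raise RuntimeError("no region available")


if __name__ == "__main__":
    endpoint = LogicalEndpoint(["westus", "eastus"])
    print(endpoint.route())  # westus
    endpoint.simulate_failure("westus")
    print(endpoint.route())  # eastus
```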

What if a partition goes down? Requests go to a copy of that partition in another region, so data is always available thanks to replication.

Future: multi-master topology. (WHOOOOO…..)

Cosmos DB periodically takes full backups and stores them in blob storage. This is for “Oops, delete” scenarios. Is this the most likely cause of data loss for many companies? Dr Nehme didn’t say that, but I think maybe :)

The system is designed to scale storage and throughput as the data size and the throughput demand grow. Behind the scenes: SSD storage, and everything is automatically indexed. (How does it decide what to index, I wonder?)

  • This happens on ingest
  • No schema versioning needed
  • I NEED TO KNOW MORE

Elastically scales throughput per region depending on activity.

The “currency” (aka the “bitcoin of the Cosmos DB world”) is the RU, or Request Unit. It is rate based and adjustable: you provision throughput as RUs per second. RUs are normalized across database operations by a sophisticated costing system: they used machine learning on telemetry collected from hardware in the data centers to calculate the costs of various queries. (So if you’re frustrated that you don’t know exactly what a DTU is, well... this is a measurement like miles per gallon, but measuring it exactly yourself isn’t going to be easy.)
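Because RUs are a rate, a provisioned budget of RUs per second behaves like a rate limiter. This sketch assumes a simple fixed one-second window and made-up per-operation costs; the real charges come from Cosmos DB’s own costing model, and the class name is hypothetical. A request that would exceed the budget is refused (throttled) instead of being run.

```python
class RuBudget:
    """Hypothetical fixed-window limiter: at most ru_per_second RUs are spent per one-second window."""

    def __init__(self, ru_per_second: float):
        self.ru_per_second = ru_per_second
        self.window_start = None  # start time of the current window, in seconds
        self.used = 0.0

    def try_charge(self, cost: float, now: float) -> bool:
        """Charge `cost` RUs at time `now`; return False (throttled) if it would exceed the budget."""
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now  # open a new one-second window
            self.used = 0.0
        if self.used + cost > self.ru_per_second:
            return False
        self.used += cost
        return True


if __name__ == "__main__":
    budget = RuBudget(ru_per_second=10)
    print(budget.try_charge(4, now=0.0))  # True
    print(budget.try_charge(5, now=0.5))  # True, 9 of 10 RUs used
    print(budget.try_charge(2, now=0.9))  # False, would exceed 10
    print(budget.try_charge(2, now=1.2))  # True, a new window has started
```

The design choice worth noticing is that cost, not request count, is what gets limited: one expensive query can use up the same budget as many cheap reads.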

Write-optimized, latch-free database engine, designed to avoid concurrency contention.

Failures are inevitable. What happens if communication is interrupted? CAP theorem. Do you want strong consistency/high latency, or eventual consistency/low latency?

But there are many consistency models.

Cosmos offers:

  • Strong
  • Bounded staleness
  • Session
  • Consistent prefix
  • Eventual

Dr Nehme recommends the paper “Replicated Data Consistency Explained Through Baseball.”

There’s a fun opportunity here to dig in and find creative ways to make these models easy to understand! (Speakers rejoice.)
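In the spirit of the baseball paper, here is a small Python sketch that lists which snapshots a reader may observe under each model, given a log of writes in the order they happened. It is a simplified model under stated assumptions: a single ordered write log, bounded staleness measured as “at most k writes behind,” and session consistency modeled as consistent prefix plus read-your-writes. The function names are hypothetical.

```python
from itertools import product


def state_after(log, n):
    """Key/value snapshot after applying the first n writes of the log."""
    state = {}
    for key, value in log[:n]:
        state[key] = value
    return tuple(sorted(state.items()))  # hashable, ordered by key


def allowed_reads(log, model, k=1, own_writes=0):
    """Set of snapshots a reader may observe under a simplified consistency model.

    own_writes: length of the log prefix ending with this session's last write.
    """
    n = len(log)
    if model == "strong":
        prefixes = [n]  # always the latest state
    elif model == "bounded_staleness":
        prefixes = range(max(0, n - k), n + 1)  # at most k writes behind
    elif model == "session":
        prefixes = range(own_writes, n + 1)  # never misses your own writes
    elif model == "consistent_prefix":
        prefixes = range(0, n + 1)  # some prefix, never out of order
    elif model == "eventual":
        # Each key independently shows any value ever written to it, or nothing yet.
        keys = sorted({key for key, _ in log})
        options = [[None] + [v for kk, v in log if kk == key] for key in keys]
        snapshots = set()
        for combo in product(*options):
            snapshots.add(tuple((key, v) for key, v in zip(keys, combo) if v is not None))
        return snapshots
    else:
        raise ValueError(f"unknown model: {model}")
    return {state_after(log, i) for i in prefixes}


if __name__ == "__main__":
    log = [("visitors", 1), ("home", 1), ("home", 2)]  # a made-up game, in write order
    for model in ["strong", "bounded_staleness", "session", "consistent_prefix", "eventual"]:
        print(model, len(allowed_reads(log, model, k=1, own_writes=2)))
```

For this made-up game log, the number of possible observations grows from 1 under strong consistency to 6 under eventual consistency, and only eventual allows seeing the home team’s second run without the visitors’ first.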

Consistency for a price. The customer decides what makes the most sense.

They offer SLAs in a new way, covering latency, throughput, and consistency as well as availability. They had to change the entire Microsoft SLA page… OK, that was probably an incredibly hard thing to get done in a large company.

Modern applications aren’t static, so it needs to be easy to change things. That’s why the object model is “schema free/schema agnostic.”

  • This is a “come as you are” approach: store the data as is
  • Example: the document data model. How do you query it? Documents are human-readable as JSON.
  • At global scale, there’s no CREATE INDEX, DROP INDEX, or ALTER INDEX
  • Data is indexed as a tree
  • The index is basically a union of all the document trees
  • These are consolidated into one tree that keeps only the unique (idiosyncratic) values
  • Inverted index
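Here is a minimal Python sketch of the “union of document trees” idea as an inverted index, assuming each root-to-leaf path plus its leaf value is one index term. This is a simplification for illustration, not the encoding described in the paper, and the function names are hypothetical. Each distinct term is stored once, however many documents share it, and no CREATE INDEX is needed: new fields simply produce new terms.

```python
from collections import defaultdict


def leaf_paths(node, prefix=()):
    """Yield each root-to-leaf path of a JSON-like value, with the leaf value as the last element."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from leaf_paths(child, prefix + (position,))
    else:
        yield prefix + (node,)


def build_index(docs):
    """Inverted index: each distinct path-plus-value term maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in leaf_paths(doc):
            index[term].add(doc_id)
    return index


def lookup(index, *term):
    """Ids of documents whose tree contains exactly this path and leaf value."""
    return index.get(tuple(term), set())


if __name__ == "__main__":
    docs = {
        "a": {"name": "Ana", "address": {"city": "Seattle"}},
        "b": {"name": "Bo", "address": {"city": "Seattle"}, "tags": ["sql", "nosql"]},
    }
    index = build_index(docs)
    print(lookup(index, "address", "city", "Seattle"))  # both documents
    print(lookup(index, "tags", 1, "nosql"))  # only document b
```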

Is the volume of paths very large? Relatively, no. (WHOA)

Paper: Schema-Agnostic Indexing with Azure DocumentDB (now Cosmos DB).

Bw-tree: optimized for SSDs, lock-free. Copy-on-write delta updates. Avoid random writes by structuring storage as a log. (More details in the paper.) Delta updates are treated as inserts, so they won’t block and won’t conflict. (Yeah, I totally don’t get this at this point.)
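To make the delta updates a bit more concrete, here is a heavily simplified, single-threaded Python sketch of the general Bw-tree idea: a mapping table points each logical page at a chain of delta records, an update prepends a delta and installs it with compare-and-swap instead of rewriting the page, and consolidation builds a fresh page (copy-on-write). The compare-and-swap is simulated with an identity check, there is no real concurrency, SSD log, or page splitting, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePage:
    records: dict  # consolidated key -> value map; never mutated after install


@dataclass(frozen=True)
class Delta:
    key: str
    value: object  # None marks a delete
    next: object  # older Delta, or the BasePage underneath


class MappingTable:
    """Maps logical page ids to the current head of each page's delta chain."""

    def __init__(self):
        self._slots = {}

    def install(self, pid, page):
        self._slots[pid] = page

    def head(self, pid):
        return self._slots[pid]

    def cas(self, pid, expected, new):
        """Simulated compare-and-swap: swap in `new` only if the slot still holds `expected`."""
        if self._slots.get(pid) is expected:
            self._slots[pid] = new
            return True
        return False


def upsert(table, pid, key, value):
    """Record an update as a new delta on top of the chain; the page itself is never modified."""
    while True:
        head = table.head(pid)
        if table.cas(pid, head, Delta(key, value, head)):
            return  # if the CAS failed, loop and retry against the new head


def read(table, pid, key):
    """Walk deltas newest to oldest, then fall back to the base page."""
    node = table.head(pid)
    while isinstance(node, Delta):
        if node.key == key:
            return node.value
        node = node.next
    return node.records.get(key)


def consolidate(table, pid):
    """Copy-on-write: fold the chain into a fresh base page and swap it in."""
    head = table.head(pid)
    deltas, node = [], head
    while isinstance(node, Delta):
        deltas.append(node)
        node = node.next
    records = dict(node.records)  # the old base page is left untouched
    for delta in reversed(deltas):  # apply oldest first
        if delta.value is None:
            records.pop(delta.key, None)
        else:
            records[delta.key] = delta.value
    return table.cas(pid, head, BasePage(records))
```

Because a writer only swaps one pointer, a writer that loses a race just retries; it never waits on a latch held by someone else.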

What about SQL queries? (That’s one door into the house.) It’s a declarative language, though a bit different. It has the basic components of a SQL processing system: compiler, code gen, query runtime, index planner, and many other parts.

There are many other doors into the house as well.

Ideas about presenting

I really like the way this talk included complex technical slides, but turned them into a real storytelling event by using chunky marker drawings to highlight information and draw attention to concepts. This worked both for the “feel” of the talk and for explaining the concepts.

A lot of the slides had black backgrounds and white text, which worked very well in a large, dark keynote room.

Strong meme game.

Near the end, the talk featured a Tolstoy reference and joke. “All happy databases are alike, every unhappy database is unique.” That filled my inner humanities nerd with pure, sweet love.