Data Structures and Algorithms for Big Databases
Michael A. Bender (Stony Brook & Tokutek)
Bradley C. Kuszmaul (MIT & Tokutek)
Big data problem
[Figure: data ingestion → data indexing → query processor → queries + answers]
Important and universal problem.
Hot topic.
For on-disk data, one sees funny tradeoffs in the speeds of data ingestion, query speed, and freshness of data.
Funny tradeoff in ingestion, querying, freshness
• “I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it.”
‣ MySQL bug #9544
• “Select queries were slow until I added an index onto the timestamp field... Adding the index really helped our reporting, BUT now the inserts are taking forever.”
‣ Comment on
• “They indexed their tables, and indexed them well,
And lo, did the queries run quick!
But that wasn’t the last of their troubles, to tell–
Their insertions, like molasses, ran thick.”
‣ Not from Alice in Wonderland by Lewis Carroll
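The tradeoff in these quotes can be tried out on a small scale. The following is a minimal sketch, not taken from the slides: it assumes Python's built-in sqlite3 module, a hypothetical `events` table with a `ts` timestamp column, and a hypothetical index name `events_ts`. It loads the same rows with and without a secondary index and prints the load time alongside the query plan for a range query. The data set here is tiny and in memory, so the timings will not resemble the on-disk behavior the quotes describe; the point is only to show where the extra insert work comes from and why the query benefits.

```python
import random
import sqlite3
import time


def load(with_index: bool, n_rows: int = 20000, seed: int = 0):
    """Insert n_rows rows with random timestamps; return (connection, seconds)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)"
    )
    if with_index:
        # Hypothetical secondary index; every insert must now update it too,
        # and the random ts values land at scattered positions in the index.
        conn.execute("CREATE INDEX events_ts ON events (ts)")
    rows = [(rng.randrange(10**9), "x" * 100) for _ in range(n_rows)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    return conn, elapsed


def plan(conn) -> str:
    """Return SQLite's query-plan text for a range query on ts."""
    cur = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT count(*) FROM events WHERE ts BETWEEN ? AND ?",
        (0, 10**6),
    )
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in cur.fetchall())


if __name__ == "__main__":
    for with_index in (False, True):
        conn, secs = load(with_index)
        print(f"index={with_index}: load {secs:.3f}s, plan: {plan(conn)}")
```

Without the index the plan should report a full table scan; with it, the plan should name `events_ts`. Scaling `n_rows` up and pointing `sqlite3.connect` at a file on disk is one way to make the insert-side cost more visible.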