bigdata™
Flexible
Reliable
Affordable
Web-scale computing.
SYSTAP™, LLC © 2007-2008 All Rights Reserved
Presented to OSCON 2008
• Background
– How bigdata relates to other efforts.
• Architecture
– Some examples
• RDF DB
– Some examples
• Web Processing
– Using map/reduce and RDF together
Scale-out Systems
• Google has published several inspiring papers
that have captured a huge mindshare.
• Competition has emerged among “cloud as
service” providers:
– EC2, S3, GAE, BlueCloud, etc.
• An increasing number of open source efforts
provide computing frameworks:
– Hadoop, CouchDB, Hypertable, Zookeeper, mg4j,
Cassandra, etc.
Scale-out Systems
• Distributed file systems
– GFS, S3, HDFS
• Map / reduce
– Lowers the bar for distributed computing
– Good for data locality in inputs
• E.g., documents in, hash-partitioned full-text index out.
• Sparse row stores
– High read / write concurrency using atomic row
operations
– Basic data model is
• { primary key, column name, timestamp } : { value }
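The map/reduce bullet above ("documents in, hash-partitioned full-text index out") can be sketched in a few lines of Python. This is a single-process illustration only, not how bigdata or Hadoop implements it: the function names (`map_phase`, `reduce_phase`, `build_index`, `lookup`), the whitespace tokenizer, and the use of CRC32 as the partitioning hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct lowercase token."""
    for term in set(text.lower().split()):
        yield term, doc_id


def partition_of(term, num_partitions):
    """Pick the partition that owns a term.

    CRC32 is used because it is stable across runs, unlike Python's
    salted built-in hash() for strings. (Assumed choice for this sketch.)
    """
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def reduce_phase(pairs):
    """Reduce: group document ids by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(docs, num_partitions=4):
    """Run map, shuffle by hash partition, then reduce each partition."""
    shuffled = [[] for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        for term, d in map_phase(doc_id, text):
            shuffled[partition_of(term, num_partitions)].append((term, d))
    # One reducer per partition; each yields an independent index shard.
    return [reduce_phase(pairs) for pairs in shuffled]


def lookup(index, term):
    """Route a query term to the single shard that can contain it."""
    return index[partition_of(term, len(index))].get(term, [])
```

Because a term always hashes to the same shard, a lookup touches one partition only; the mappers, in turn, can run wherever the input documents live, which is the input data locality the slide refers to.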
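The sparse row store data model from the Scale-out Systems slide, { primary key, column name, timestamp } : { value }, can be pictured with a small in-memory sketch. The class name `SparseRowStore`, its methods, and the single global lock are hypothetical and chosen for brevity; a real store would use finer-grained, per-row concurrency control and persistent storage.

```python
import threading


class SparseRowStore:
    """Toy store mapping (primary key, column name, timestamp) -> value."""

    def __init__(self):
        # primary key -> column name -> {timestamp: value}
        self._rows = {}
        # One global lock keeps the sketch short; it makes each row
        # write atomic but is far coarser than a real system would use.
        self._lock = threading.Lock()

    def write_row(self, key, columns, timestamp):
        """Atomically apply all column writes for a single row."""
        with self._lock:
            row = self._rows.setdefault(key, {})
            for column, value in columns.items():
                row.setdefault(column, {})[timestamp] = value

    def read(self, key, column, as_of=None):
        """Return the newest value at or before `as_of` (latest if None)."""
        with self._lock:
            versions = self._rows.get(key, {}).get(column, {})
            eligible = [ts for ts in versions if as_of is None or ts <= as_of]
            return versions[max(eligible)] if eligible else None


store = SparseRowStore()
store.write_row("u1", {"name": "Ann", "city": "Portland"}, timestamp=1)
store.write_row("u1", {"city": "Seattle"}, timestamp=2)
```

Rows are sparse in the sense that a row stores only the columns actually written to it, and each column keeps its timestamped versions, so a read can ask for the latest value or for the value as of an earlier timestamp.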
Semantic Web
• Fluid schema
• Graph structured data
• Restricted inference (RDFS+)
• High level query (SPARQL)
• Declarative schema alignment
• owl:equivalentClass; owl:equivalentProperty; owl:sameAs
• Mashups of unstructured, semi-structured, and
structured data
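As an illustration of declarative schema alignment, the sketch below merges two descriptions of the same resource that are linked by an `owl:sameAs` statement. It is a minimal example under stated assumptions, not bigdata's inference engine: triples are plain Python tuples, the identifiers (`ex:alice`, `db:person42`) are invented, the canonical name is simply the lexicographically smallest one, and only `owl:sameAs` is handled (not `owl:equivalentClass` or `owl:equivalentProperty`).

```python
OWL_SAME_AS = "owl:sameAs"


def canonical_finder(triples):
    """Build a union-find over owl:sameAs links and return its find()."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Arbitrary but deterministic: the smaller name wins.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def align(triples):
    """Rewrite subjects and objects to canonical names, dropping sameAs."""
    find = canonical_finder(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("db:person42", "ex:worksFor", "ex:acme"),
    ("ex:alice", OWL_SAME_AS, "db:person42"),
]
merged = align(triples)
```

After alignment, both facts hang off one node, so a query about the person sees the name from one source and the employer from the other. No schema had to be changed in either source; the single `owl:sameAs` statement carries the alignment.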
Challenges
• Graph structured data
– Poor data locality
• Inference and high-level query (SPARQL)
– JOINs, multiple access paths, increased
concurrency control requirements
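To make "JOINs, multiple access paths" concrete, here is a small sketch of a triple store that keeps the same statements in three sort orders and answers a two-pattern join with nested index lookups. The three-permutation layout (SPO, POS, OSP) is a common triple-store design and is assumed here for illustration; the slides above do not state which indices bigdata maintains. The names `TripleStore`, `match`, and `join_two_hop` are hypothetical.

```python
import bisect


class TripleStore:
    # For each index, the order in which (s, p, o) positions form the key.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        unique = set(triples)
        self.indices = {
            name: sorted(tuple(t[i] for i in order) for t in unique)
            for name, order in self.ORDERS.items()
        }

    @staticmethod
    def _bound_prefix(pattern, order):
        """Count how many leading key positions the pattern binds."""
        count = 0
        for i in order:
            if pattern[i] is None:
                break
            count += 1
        return count

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        # Choose the access path with the longest bound key prefix.
        name = max(self.ORDERS,
                   key=lambda n: self._bound_prefix(pattern, self.ORDERS[n]))
        order = self.ORDERS[name]
        k = self._bound_prefix(pattern, order)
        prefix = tuple(pattern[i] for i in order[:k])
        index = self.indices[name]
        # Range scan: all keys sharing the prefix are contiguous.
        for key in index[bisect.bisect_left(index, prefix):]:
            if key[:k] != prefix:
                break
            triple = [None, None, None]
            for pos, i in enumerate(order):
                triple[i] = key[pos]
            yield tuple(triple)


def join_two_hop(store, p1, p2):
    """Evaluate { ?x p1 ?y . ?y p2 ?z } as a nested index join."""
    for x, _, y in store.match(p=p1):
        for _, _, z in store.match(s=y, p=p2):
            yield x, y, z


store = TripleStore([
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("b", "worksFor", "acme"),
    ("c", "worksFor", "acme"),
])
```

With these three orders, every combination of bound positions is a key prefix of some index, so each pattern becomes one contiguous range scan. The join, however, issues one lookup per binding of `?y`, and those lookups can land anywhere in the key space, which is where the poor data locality of graph-structured data shows up.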
bigdata™ architecture