Chapter 6. ER-Diagram?

ER Diagram -(Reduction)-> Relation Schemas

Entity Set

  • Strong entity set
    • (entity set).pk => (schema).pk
    • composite attribute: flatten (separtate attribute fore each component)
    • multivalued attribute: new schema $R$ w/ (ES).pk + (Multivalued attribute)
      • e.g. INST_PHONE = (**ID**, **phone_number**)
student(**ID**, name, tot_cred)
  • Weak entity set : (entity set).pk/discriminator => (schema).pk
section(*course_id*, *sec_id*, *semester*, *year*)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     COURSE     β”‚                      β”‚    SECTION     β”‚
β”‚                β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚                β”‚
β”‚ **course_id**  │◀───  sec_course  β”œβ”€β”€β”€β”‚    *sec_id*    β”‚
β”‚     title      β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚   *semester*   β”‚
β”‚    credits     β”‚                      β”‚     *year*     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Representing Relationship Set

  • pK: union of pks participating in relationship set
  • many-many: create new schema
ADVISOR = (s_id, i_id)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   INSTRUCTOR   β”‚                      β”‚    STUDENT     β”‚
β”‚                β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚                β”‚
β”‚     **ID**     │────   advisor    β”œβ”€β”€β”€β”‚     **ID**     β”‚
β”‚      name      β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚      name      β”‚
β”‚     salary     β”‚                      β”‚    tot_cred    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • many-one: add (pk of "one" side) attribute to "many" side -> foreign key constraint
  • one-one: either side can be chosed
  • partial participation: may contain null values

Redundancy of Schemas

TBD

How many tables are required? : Many-Many relationship, multivalued attribute require new table

Extended E-R Feature

Specialization: Top-down / Generalization: Bottom-up

                               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                               β”‚     PERSON     β”‚
                               β”‚                β”‚
                               β”‚     **ID**     β”‚
                        β”Œβ”€β”€β”€β”€β”€β–Άβ”‚      name      │◀─────┐
                        β”‚      β”‚     street     β”‚      β”‚
                        β”‚      β”‚      city      β”‚      β”‚
                        β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚    EMPLOYEE    β”‚             β”‚    STUDENT     β”‚
         β”Œβ”€β”€β”€β”€β–Άβ”‚                │◀────┐       β”‚                β”‚
         β”‚     β”‚     salary     β”‚     β”‚       β”‚  tot_credits   β”‚
         β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   INSTRUCTOR   β”‚           β”‚   SECRETARY    β”‚
β”‚                β”‚           β”‚                β”‚
β”‚      rank      β”‚           β”‚ hours_per_week β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Overlapping: EMPLOYEE / STUDENT
  • Disjoint: INSTRUCTOR / SECRETARY

Attribute Inheritance: lower-level ES inherits higher-level: LOWER IS A HIGHER

  • Specialization to Schema
    • method 1: requires access to two tables
      • schema for HIGH entity
      • schema for LOW entity: HIGH.pk + local attribute
    person: ID, name, street, city
    stduent: ID, tot_cred
    employee: ID, salary
    
    • method 2: redundant infos
      • each ES contains local and inherited attributes
    person: ID, name, street, city
    stduent: ID, name, street, city, tot_cred
    employee: ID, name, street, city, salary
    

Completeness Constraint (Total/Partial generalization)

  • Total generalization: Each HIGH must belong to some LOW
  • Partial is default

Aggregation

  • How to express relationships among relationships?
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚                     β”‚   project   │──┐                  β”‚
β”‚                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚                  β”‚
β”‚                            β”‚         β”‚                  β”‚
β”‚                            β–Ό         β”‚                  β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚ β”‚ instructor  │────▢│ proj_guide  │◀─┼───│   student   β”‚β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚        β”‚                             β”‚          β”‚       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚          β”‚
         └───────────▢│  eval_for   β”‚β—€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β–²
                             β”‚
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ evaluation  β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

> Relationship proj_guide is redundatant! => aggregate (proj_guide)
  • pk of aggregated relationship + pk or associated ES + any descriptive attributes

Big Data

Motivation

very large volumes of data being colleteced

  • Volume: larger amount of data stored
  • Velocity: higher rate of insertion
  • Variety: many types of data,

Big data query: high scalability

Big Data storage system

Distributed file system
> large collection of machiines, but gives single view to the clients
  • e.g. Hadoop File System (HDFS)
    • NameNode

      • filename -> list of block identifiers
    • DataNode

      • block identifier -> physical location
    • Data coherency

      • write once read many
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚        NN         β”‚
β”‚    fn -> block    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β”‚
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚        DN         β”‚
β”‚   block -> addr   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

'##### Key-value storage

  • Partitioned records across multiple machines
  • queries routed by system to appropriate machine, replica consistency
  • replicated data among machines (better availability)a
  • What they store?
    • bytes (with associated keys)
    • wide-table (attribute name + associated key)
    • json (Document stores)
  • supporting operations
    • put
    • get
    • delete
  • optional operations
    • range query
    • query on non-key attributes
  • kvs's are note full-fledge DBs! (NoSQl)
parallel and distributed DBs
  • run on multiple machines (cluster): store and process query on large machines

Replication and consistency

  • Availability: system can run even if parts have failed
  • Consistency: system operations sees the latest version

MapReduce

Abstracting the issues of distributed and parallel environment from programming (functional programing?)

  • Input: "One a penny, two a penny, host cross buns"
  • Processing(split): ("One a penny", "two a penny," "host cross buns")
  • map
  • reduce
map(k1, v1) -> list(k2, v2)
e.g. (void, textline: string) -> (first: string, count: int)

reduce(k2, list(v2)) -> list(k3, v3)
e.g. (first: string, counts: []int) -> (first: string, total: int)
// Map function
public static 
class Map extends Mapper<LongWritable, Text, Text, IntWritable>

// Map function

Algebraic Operations