class: title, smokescreen, shelf, bottom, no-footer background-image: url(images/iotshadow.png) # 181U Spring 2020 ## Persistent Data <style> h1 { border-bottom: 8px solid rgb(32,67,143); border-radius: 2px; width: 90%; } .smokescreen h1 { border-bottom: none; } .small.remark-slide-content.compact {font-size:1.4rem} .smaller.remark-slide-content.compact {font-size:1.1rem} .small-code.remark-slide-content.compact code {font-size:1.0rem} .very-small-code.remark-slide-content.compact code {font-size:0.9rem} .line-numbers{ /* Set "line-numbers-counter" to 0 */ counter-reset: line-numbers-counter; } .line-numbers .remark-code-line::before { /* Increment "line-numbers-counter" by 1 */ counter-increment: line-numbers-counter; content: counter(line-numbers-counter); text-align: right; width: 20px; border-right: 1px solid #aaa; display: inline-block; margin-right: 10px; padding: 0 5px; } </style> --- layout: true .footer[ - 181U - See acknowledgements ] --- class: compact # Agenda * Types of Data * Data Characteristics * Relational Data & SQL * NoSQL * RealTime databases * Digital Shadows (Caching) * Data Visualization for IoT --- class: compact # Types of data * Configuration * Status : e.g. device health * Sensors - periodic (scheduled) - episodic (e.g. on change) --- class: compact # Data Rates * Low - configuration - status - alarms - episodic for slow-moving measurements (e.g. temperature) * Medium - environment measurements * High - audio - images - video --- class: compact, small # Relational Data ![](images/relation.png# w-40pct fr) * The relational model for organizing data defined by Edgar Codd in 1969 * The basis for most large database systems (Oracle, MySql, Postgress, sqlite) * The model provides a structure and language consistent with first-order predicate logic - First-order logic uses quantified variables over non-logical objects and allows the use oo sentences that contain variables - example: “there exists x such that x is Socrates and x is a man” * All data is represented in terms of tuples, grouped into relations * The purpose of the relational model is to provide a declarative method for specifying data and queries - users directly state what information the database contains and what information they want from it. (https://en.wikipedia.org/wiki/Relational_model) --- class: compact,small # Relations in a Database ![](images/relation-example.png# w-50pct fr) * the STORE table contains three columns: - Store_key - City - Region * the PRODUCT table contains three columns: - Product key - Description - Brand * the SALES_FACT table contains five columns: - Store_key - Product_key - Sales - Cost - Profit --- class: compact,small # Relations in a Database ![](images/relation-example.png# w-50pct fr) * Relational databases are organized as sets for relations (aka tables). * The data in these tables are accessed through queries * This example database has three tables – stores, products, and sales. * Example queries - How many copies of toy story were sold in the EAST region ? - What product had the highest profit in New York ? * Notice that each table has an “key” attribute and within a table, each row (tuple) has a unique value for that key (note, the sales fact table should have a separate key) * Notice that the tuples in other tables may reference that key. https://docs.oracle.com/cd/B14099_19/bi.1012/b13915/i_rel_chapter.htm --- class: compact, small # Example Design ![](images/customertable.png# w-50pct fr) * Customer: (**Customer ID**, Tax ID, Name, Address, City, State, Zip, Phone, Email, Sex) * Order: (**Order No**, *Customer ID*, Invoice No, Date Placed, Date Promised, Terms, Status) * Order Line: (*Order No*, **Order Line No**, *Product Code*, Qty) * Invoice: (**Invoice No**, *Customer ID*, *Order No*, Date, Status) * Invoice Line: (*Invoice No*, **Invoice Line No**, *Product Code*, Qty Shipped) * Product: (**Product Code**, Product Description) * The "primary keys" are in **bold** and "foreign keys" in *italic* * An example query might ask for all invoices to a particular customer containing a particular product * Separating data in this way minimizes redundant information, enables maintenance of the database (for example, changing the address of a customer is a localized change), and makes execution of queries more efficient https://en.wikipedia.org/wiki/Relational_model --- class: compact,smaller,col-2,very-small-code # Flat vs. Relational Tables ``` City,Edition,Sport,Discipline,Athlete,NOC,Gender,Event,Event_gender,Medal Athens,1896,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100m freestyle,M,Gold Athens,1896,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100m freestyle,M,Silver Athens,1896,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100m freestyle for sailors,M,Bronze Athens,1896,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100m freestyle for sailors,M,Gold ``` Athletes ``` id,name,gender,nationality 1,"ÖRSTED, Hans-Henrik",M,DEN 2,"ÖRTEGREN, Ruben",M,SWE 3,"ÖRVIG, Erik",M,NOR ``` Countries ``` id,country AFG,Afghanistan ALB,Albania ALG,Algeria ``` Events ``` id,name,sport,discipline 1,- 47.63kg (flyweight),Boxing,Boxing 2,- 47.6kg (light-flyweight),Wrestling,Wrestling Free. 3,- 48 kg,Judo,Judo ``` Editions ``` id,year,city 1,1896,Athens 2,1900,Paris 3,1904,St Louis ``` --- class: compact # SQL (Structured Query Language) * One of the top five programming languages * Based upon E.F. Codd's relational model * Provides a way to manage data that is hardware independent * Two language representations * Relational algebra , is a family of algebras with a well-founded semantics used for modelling the data stored in relational databases, and defining queries on it. * Relational calculus provides declarative language for specifying queries (e.g. SQL) * Codd proved that they are equivalent --- class: compact # SQL (Structured Query Language) The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores that supply Some Sample Book: 1. Join book stores and titles over the BookstoreID. 2. Restrict the result of that join to tuples for the book Some Sample Book. 3. Project the result of that restriction over StoreName and StorePhone. The relational calculus would formulate a descriptive, declarative way: 1. Get StoreName and StorePhone for book stores such that there exists a title BK with the same BookstoreID value and with a BookTitle value of Some Sample Book. --- class:compact # SQL Three parts to a Query (only first two are required) 1. SELECT 2. FROM 3. [WHERE] * SELECT specifies what you want as output * FROM specifies the input * WHERE specifies conditions that must be true on output (filter) --- class: compact,small,very-small-code,hljs-tomorrow-night-eighties,line-numbers # SQL Join on Orders, Customers | OrderID | CustomerID | OrderDate | | -------- | ---------- | --------- | | 10308 | 2 | 1996-09-18 | | 10309 | 37 | 1996-09-19 | | 10310 | 77 | 1996-09-20 | | CustomerID | CustomerName | ContactName | Country | | ---------- | ------------ | ------------- | -------- | | 1 | Alfreds Futterkiste | Maria Anders | Germany | | 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Mexico | | 3 | Antonio Moreno Taquería | Antonio Moreno | Mexico | ```sql SELECT Orders.OrderID, Customers. CustomerName, Orders.OrderDate FROM Orders INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID; ``` | OrderID | CustomerName | OrderDate | |---------|--------------|------------| |10308 | Ana Trujillo Emparedados y helados | 9/18/1996 | (https://www.w3schools.com/sql/sql_ref_join.asp) --- class: compact # SQL Database Properties (ACID) * Atomicity -- transactions are atomic -- all or none * Consistency -- database remains in a consistent state after transation * Isolation -- transactions are executed *as if* they are serial * Durability -- database preserves latest updates even if the system fails or restarts These properties are expensive to satisfy -- especially in a distributed system with high update rate. --- class: compact # NoSQL * Sacrifices consistency for availability, partition tolerance, speed * Uses key/value data organization rather than relational tables * Scale to clusters of machines * Examples - Google's Bigtable - Amazon's DynamoDB --- class: compact # Time Series Data * Scenarios 1. Smart electric meter records electricity consumption data per our and generates billing data in real time 2. A windmill generates real-time data of rotational speed and energy production 3. A cpu monitor produces temperature, workload, and utilization data * A time series data set has these characteristics 1. New data are stored and recorded as entries 2. Data are stored in chronological order 3. All data are timestamped https://dzone.com/articles/time-series-database-vs-common-database-technologi --- class: compact,small # Time Series Databases * A time series database (TSDB) is a database optimized for time-stamped or time series data. * Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. * Features of timeseries databases 1. Create, Read, update, manage time-value pairs 2. Calculate, interpolate, and filter time-value pairs. * Example: With a time series database, it is common to request a summary of data over a large time period. - This requires going over a range of data points to perform some computation like a percentile increase this month of a metric over the same period in the last six months, summarized by month. https://www.influxdata.com/time-series-database/ https://en.wikipedia.org/wiki/Time_series_database --- class: compact # Example Time Series Data ![](images/2020-02-17-14-49-15.png# w-40pct) ![](images/space.png# w-2-12th) ![](images/2020-02-17-14-50-49.png# w-40pct) [Time Series White Paper from InfluxDB](https://www.influxdata.com/time-series-technical-paper-2/?utm_source=google&utm_medium=cpc&utm_campaign&utm_term=influxdb&utm_content&gclid=Cj0KCQiAkKnyBRDwARIsALtxe7hqxAzBlPoAT6JXj8-h01BbCQX5OTHiZmCHjEWvIQfPWQkw_1p6jaQaAi1eEALw_wcB#form) --- class: compact # Some Leading Time Series databases * [InfluxDB](https://www.influxdata.com/?utm_source=google&utm_medium=cpc&utm_campaign&utm_term=influxdb&utm_content&gclid=Cj0KCQiAkKnyBRDwARIsALtxe7hsbB8-YHD1iEidBcQqWkLva6f_86hD5w_WeFAsdYaK8qrMAdAJGnkaAjDGEALw_wcB) * [Graphite](https://graphiteapp.org/) * [Prometheus](https://prometheus.io/) --- class: compact,hljs-tomorrow-night-eighties # InfluxDB Data Model * The InfluxData stack organizes time series in a structured format. * At the top level is a measurement name, - followed by a set of key/value pairs called tags that describe the metadata, - followed by key/value pairs of the actual values called fields. - Finally, there is a timestamp for the set of values. InfluxDB has a line protocol for sending time series data ```plaintext <measurement name>,<tag set> <field set> <timestamp> ``` For example CPU metrics ```plaintext cpu,host=serverA,region=uswest idle=23,user=42,system=12 1549063516 ``` --- class: compact # Visualization is a key application for Time Series data ![](images/space.png# w-20pct) ![](images/infludb1.png# w-60pct) --- class: compact # InfluxDB's web app is Chronograf ![](images/space.png# w-20pct) ![](images/chronograf.png# w-60pct) --- class: compact # Grafana is another open source visualization tool <iframe src="https://grafana.com/video/visualize.mp4" height=400px width=600px></iframe> https://play.grafana.org/d/000000012/grafana-play-home?from=1581975730832&to=1581976882123 --- class: compact # Grafana * Broad support for various databases - graphite - influxdb - prometheus - AWS cloudwatch - mysql - Amazon Timestream - ... * Graph plugins - Graph - Heatmap - Piechart - Radar graphs - ... --- class: compact # Device Shadows (AWS) * Device shadows -- For low frequency events such as configuration or status - provide most recent information if the network is partitioned - allows system to contiue if there is an intermittent failur * At the implementation level, a shadow service uses a JSON document to hold data and metadata associated with each device * Uses MQTT topics to facilitate communication between applications and devices --- class: compact,small,very-small-code,hljs-tomorrow-night-eighties,col-2 # Shadow Example ```javascript { "state": { "desired": { "color": "RED", "state": "STOP" }, "reported": { "color": "GREEN", "engine": "ON" }, "delta": { "color": "RED", "state": "STOP" } }, "metadata": { "desired": { "color": { "timestamp": 12345 }, "state": { "timestamp": 12345 }, "reported": { "color": { "timestamp": 12345 }, "engine": { "timestamp": 12345 } }, "delta": { "color": { "timestamp": 12345 }, "state": { "timestamp": 12345 } } }, "version": 17, "timestamp": 123456789 } } ``` --- class: compact # Azure Device Twin ![](images/device-twin.png# w-50pct fr) * A device twin contains desired properties, reported properties, and tags. - A desired property is set by a back-end application and read by a device. - A reported property is set by a device and read by a back-end application. - A tag is set by a back-end application and is never sent to a device