CS312 - Spring 2012 - Class 8
administrative
- CS talks tomorrow starting at 12:35
- go/codegolf
databases
- what is a database?
- data
- a collection of data
- stored on disk
- in a structured manner
- interface
- a well-defined way for inserting, deleting and querying the data
- why are they useful?
- structured data shows up in many places
- highly optimized to answer queries on the structured data
- deals with data that is too big to fit into memory
- centralized location for data
- often allow for multiple access and will maintain data consistency
- what aren't they good at?
- free formed querying
- still on disk, so can be slow for some applications
- many different kinds of databases
- MySQL
- Oracle
- DB2
- SQLite
SQL
- structured query language
- programming language for interacting/querying databases
- supported by most common databases
- there is a standard
- however, some slight variation from database to database
SQLite
- open source
- you can grab your own copy:
http://www.sqlite.org/
- operates via a single file on disk
- comes installed with many linux/unix/mac distributions
- upsides
- low barrier to entry
- easy setup
- downsides
- doesn't have all the performance optimization of mysql and others
- may not fare as well with multiple processes
- no server model
- for now, this is what we'll use, though most of the commands we'll examine are portable across databases
tables
- a database consists of one or more tables
- a table contains a collection of information with shared attributes
- a table contains columns
- columns indicate data types
- for example, we might have a table for students with columns
- id
- first name
- last name
- address
- city
- gpa
- ...
- the columns dictate the form of the data that will be stored in this table
- columns have names
- columns have types (i.e. the type of data that is stored)
- may also have other associated with them
- what the default value is
- whether or not a value is required
- tables contain data
- an entry in a table is called a record
- each record consists a value for each column in the table
- look at friends.db in
SQL code
- we can start sqlite by calling it with a database name on the command-line
$ sqlite3 friends.db
- just like the command-line prompt and the irb shell, we can issue commands interactively from there
sqlite>
- we can enter queries/commands to the tab
- inside this database are multiple tables one of which is called friends
- the friends table consists of the following columns
- an id
- last_name
- first_name
- email
- and a profession_id (more on this soon)
examining the table entries
- all examples were run on friends.db from
SQL code
- there are many "special" commands for sqlite that are non-standard for SQL
- in sqlite they all start with a period, e.g. ".exit" exits
-
http://www.sqlite.org/sqlite.html
- we can find out what tables are associated with a database using the .tables command
sqlite> .tables
friends professions
- this example has two different tables
- if we want to know how a given table is structured, we can use the .schema command
- called by itself, we get the schema (i.e. table definition) for all tables
sqlite> .schema
CREATE TABLE friends (
friend_id INTEGER PRIMARY KEY AUTOINCREMENT,
last_name TEXT NOT NULL,
first_name TEXT NOT NULL,
email TEXT NOT NULL DEFAULT '',
prof_id INTEGER /* foreign key */
);
CREATE TABLE professions (
prof_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL default ''
);
- or you can call it and pass it a particular table name
sqlite> .schema friends
CREATE TABLE friends (
friend_id INTEGER PRIMARY KEY AUTOINCREMENT,
last_name TEXT NOT NULL,
first_name TEXT NOT NULL,
email TEXT NOT NULL DEFAULT '',
prof_id INTEGER /* foreign key */
);
- the schema command gives a lot of information about a table
- it tells use the columns by name:
friend_id
last_name
first_name
email
prof_id
- it also tells us information about what type of data is stored in that column
- sqlite has a limited number of "data types" compared to other databases
-
http://sqlite.org/datatype3.html
- integer
- text
- numeric
- real
- none
- in fact, everything is basically stored as text in an sqlite database
- other database schemas allow you much more control over the types
- this can be useful for type checking
- and can result in improved performance
- but it takes more work and finesse
- it also tells us some other information
- NOT NULL indicates that we must specify a value (instead of allowing it to be NULL)
- DEFAULT allows us to specify a default value
- PRIMARY KEY
- requires uniqueness (though you can also use UNIQUE)
- unique keys allow for much faster indexing
- cannot be NULL
- indicates to other people looking at your schema that this is the main thing to index off of
- AUTOINCREMENT
- if a NULL value is specified, the next integer in a sequence will be used
select command
- all examples below were run on friends.db from
SQL code
- the select command allows us to select records from the database
- in its most basic form:
SELECT [column_name(s)] FROM [table]
sqlite> select first_name from friends;
Joe
Phoebe
Tony
Jennifer
- NOTE: sql is case insensitive
- good form usually capitalized keywords (e.g. SELECT), however, it is not required
- I often won't do it when typing interactively, but will if I put it in code or in a document
- we can select multiple columns by separating them by a ,
sqlite> select first_name, last_name from friends;
Joe|Smith
Phoebe|Lee
Tony|Speaker
Jennifer|Johnson
sqlite> select last_name, first_name from friends;
Smith|Joe
Lee|Phoebe
Speaker|Tony
Johnson|Jennifer
- entries are separated by a | (though this can be changed)
- and are ordered based on how they were ordered in a query
- we can select all of the columns using *
sqlite> select * from friends;
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
4|Johnson|Jennifer|jenniferj@medny.net|2
- if you want to toggle the column headers on or off for your select queries use .header on and .header off
sqlite> .header on
sqlite> select * from friends;
friend_id|last_name|first_name|email|prof_id
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
4|Johnson|Jennifer|jenniferj@medny.net|2
sqlite> .header off
sqlite> select * from friends;
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
4|Johnson|Jennifer|jenniferj@medny.net|2
- special commands in sqlite are inticated by a prefixed .
additional modifiers to SELECT queries
- WHERE clause
- allows us to set criterion for those that we select
- follows the basic select statement
sqlite> select * from friends where first_name='Tony';
3|Speaker|Tony|ts@xyz.edu|5
sqlite> select first_name, last_name from friends where first_name='Tony';
Tony|Speaker
sqlite> select * from friends where first_name <> 'Tony';
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
4|Johnson|Jennifer|jenniferj@medny.net|2
- we can use a variety of operators
= (equal)
<> (not equal)
> (greater than)
< (less than)
>=
<=
BETWEEN (between a range, range is separated with "and", e.g. "between 2 and 3")
LIKE (pattern search)
IN (values in a set of values, specified by a set, e.g. "(2, 3)")
- we can use AND and OR to combine multiple filtering criterion
sqlite> select * from friends where first_name='Tony' or email like '%woohoo.com';
1|Smith|Joe|jsmith@woohoo.com|1
3|Speaker|Tony|ts@xyz.edu|5
- % is a wildcard like * that matches anything
- '_' matches just one character
sqlite> select * from friends where first_name='Tony' and friend_id < 3;
sqlite>
- LIMIT
- we can limit the number of results we want to see using the LIMIT command
sqlite> select * from friends limit 2;
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
sqlite> select * from friends where friend_id <= 3 limit 1;
1|Smith|Joe|jsmith@woohoo.com|1
- this can be useful for larger databases when you want to make sure your query is right
- sometimes you also just want a few results
- ORDER BY
- we can order the results by any column we want
sqlite> select * from friends order by first_name;
4|Johnson|Jennifer|jenniferj@medny.net|2
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
sqlite> select * from friends order by first_name desc;
3|Speaker|Tony|ts@xyz.edu|5
2|Lee|Phoebe|superchef@aol.com|4
1|Smith|Joe|jsmith@woohoo.com|1
4|Johnson|Jennifer|jenniferj@medny.net|2
sqlite> select * from friends where email like '%.com' order by last_name;
2|Lee|Phoebe|superchef@aol.com|4
1|Smith|Joe|jsmith@woohoo.com|1
- they are ordered in ascending order by default
- the order is determined by the column type
- strings/characters are alphabetical
- numbers are numerical
- etc
- if you want it to be in descending order, add DESC after the column name
- you can put multiple columns separated by a ',' in which case ties are broken by the next column in the list
interacting with multiple tables
- so far, all the queries that we've looked at have only dealt with a single table
- often we split up the data across multiple tables
- in our friends.db, we also have a professions table
sqlite> .header on
sqlite> select * from professions;
prof_id|name
1|Software Developer
2|Medical Doctor
3|Financial Analyst
4|Chef
5|Professor
- the two tables are linked to eachother via the prof_id index
- the friends table contains an prof_id index corresponding to the prof_id index in the professions table
- why separate this data across multiple tables? why not just put this all in one table?
- more space efficient
- just have to store an id rather than all the data in the friends table
- easier to manage
- in this case, we only have a the name of a profession
- but we may have more data associated with a profession (e.g. salary, degree required, etc.)
- we don't want to append all this information in the friends table
- this information is associated with the profession
- easier/faster to interact with
- we can query each of these tables independently if we want
- shared information
- there may be other tables besides friends that also index into the professions table
JOIN
- the JOIN command allows us to merge two tables on a shared index
SELECT column_name(s)
FROM table_name1
JOIN table_name2
ON table_name1.column_name=table_name2.column_name
- if the column names are ambiguous (i.e. to tables share the same column name), then you can use table.column to specify a particular table entry
sqlite> select * from friends join professions on friends.prof_id=professions.prof_id;
friend_id|last_name|first_name|email|prof_id|prof_id|name
1|Smith|Joe|jsmith@woohoo.com|1|1|Software Developer
2|Lee|Phoebe|superchef@aol.com|4|4|Chef
3|Speaker|Tony|ts@xyz.edu|5|5|Professor
4|Johnson|Jennifer|jenniferj@medny.net|2|2|Medical Doctor
sqlite> select first_name,last_name,name from friends join professions on friends.prof_id=professions.prof_id;
first_name|last_name|name
Joe|Smith|Software Developer
Phoebe|Lee|Chef
Tony|Speaker|Professor
Jennifer|Johnson|Medical Doctor
sqlite> select friends.first_name, friends.last_name, professions.name from friends join professions on friends.prof_id=professions.prof_id;
first_name|last_name|name
Joe|Smith|Software Developer
Phoebe|Lee|Chef
Tony|Speaker|Professor
Jennifer|Johnson|Medical Doctor
- we can also combine this with any of the modifiers for select we say above
sqlite> select first_name,last_name,name from friends join professions on friends.prof_id=professions.prof_id where name='Chef';
first_name|last_name|name
Phoebe|Lee|Chef
- AS: it can be confusing when we start to join multiple tables. the AS modifier allows you to change the name of how the header is displayed
sqlite> select first_name as "First", last_name as "Last", name as "Profession" from friends join professions on friends.prof_id=professions.prof_id where name='Chef';
First|Last|Profession
Phoebe|Lee|Chef
inserting into a table
- so far, all we've done is query existing data in the table
- we can both insert, update and delete data into the tables
- INSERT: two basic forms:
- specify all of the values
INSERT INTO table_name VALUES (value1, value2, value3,...)
- specify particular column values (assuming a default is allowed for that column)
INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)
sqlite> insert into friends (last_name, first_name, email, prof_id) values("Z", "Bobby", "bobbyz@phoenix.edu", 5);
sqlite> select * from friends;
friend_id|last_name|first_name|email|prof_id
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
4|Johnson|Jennifer|jenniferj@medny.net|2
5|Z|Bobby|bobbyz@phoenix.edu|5
- NULL
- NULL represents no value in sql (like a null pointer)
- our friends table has 5 values
- the first is an friends_id
- we could specify this ourselves, but better to let sqlite just take care of it for us
- we can say we don't have a value by specifying null
sqlite> insert into friends values(NULL, "Z", "Bob", "bob@aol.com", 2);
sqlite> select * from friends;
friend_id|last_name|first_name|email|prof_id
1|Smith|Joe|jsmith@woohoo.com|1
2|Lee|Phoebe|superchef@aol.com|4
3|Speaker|Tony|ts@xyz.edu|5
4|Johnson|Jennifer|jenniferj@medny.net|2
5|Z|Bobby|bobbyz@phoenix.edu|5
6|Z|Bob|bob@aol.com|2
- if we didn't include the NULL we'd get an error, since we need to specify 5 things
DISTINCT
- sometimes we only want those entries that are unique
- the DISTINCT keyword allows us to specify that
sqlite> select last_name from friends;
last_name
Smith
Lee
Speaker
Johnson
Z
Z
sqlite> select distinct last_name from friends;
last_name
Johnson
Lee
Smith
Speaker
Z