# Lab 04b: Persistence

Spring 2019

The questions below are due on Thursday February 28, 2019; 10:00:00 PM.

The Musical Cats makes no sense until it is seen as a metaphor for databases

Goals: Today you'll (maybe) eventually be working in a team of teams (two teams, so four to five people most likely). We'll get some experience with database operations so as to add persistent storage to our server-side scripting capabilities, allowing us to create stateful web experiences. We'll bring in both of the pieces we developed over the last week to build a system that reports where you are on campus, and which of your teammates are nearby.

## 1) Persistent Storage

Our capabilities with scripts that we can run on the server are currently limited to code that operates based solely on user-provided inputs, and not on history. By this we mean every time you visit your final code from Lab 04A and provide the same GPS coordinates, you'll get the exact same response. We have no means of remembering previous queries, or saving posted data. In effect, our system is stateless. We can fix this, however, by providing persistent storage in the form of a database. The database will store information beyond the scope of each request_handler function call, effectively providing us with global variables.

There are many different types of databases that exist, and while we do not mean to trivialize them in saying this, they all do pretty much the same thing, which is to provide a means to store and access information efficiently1. In 6.08 we'll use SQLite2, which is a database management system. In order to interact with SQLite, we'll use SQL, a language for managing databases. We will interact with SQLite and generate SQL commands in Python 3 using the sqlite3 library, which provides a convenient interface.

With SQLite, databases are files with the .db file extension. Within the database are tables, and within those tables are rows of data that we append to, extract from, and do whatever we want with. These files are automatically saved to the actual hard drive so if you have a Python script that messes with them in some way, and you run that Python script, after the script is finished, the changes made will live on into the future in that database, far beyond the lifespan of that Python file's runtime.
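That persistence can be sketched in a few lines (the file name demo.db and the notes table here are made up purely for illustration): one script-like block writes and closes, and a completely separate connection afterwards still sees the data on disk.

```python
import os
import sqlite3
import tempfile

# a scratch location for this demo -- the name demo.db is made up
path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
conn.execute('''CREATE TABLE notes (msg text);''')
conn.execute('''INSERT into notes VALUES ('hello from an earlier run');''')
conn.commit()
conn.close()  # pretend "script number one" ends here

conn = sqlite3.connect(path)  # a later, totally separate connection
msg = conn.execute('''SELECT msg FROM notes;''').fetchone()[0]
print(msg)  # hello from an earlier run
conn.close()
```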

## 2) SQLite

SQLite has a pretty easy-to-use Python interface once you get the basics of it. What's nice is that it is very modular so you can create databases and move/copy them as needed. Let's make a database. You'll want to start a Python process on your local computer. In general our order of operations will be

1. import sqlite3
2. specify/create a database
3. establish a connection to the database
4. establish a cursor to the database (how we send it SQL commands)
5. do stuff with the database (via the cursor)
6. properly close connection to the database

In line with this approach, the snippet of code below will create a database called example.db and create a table within it. Create a file called db_experiment.py, place the code below in it, and run it:

import sqlite3
example_db = "example.db" # just come up with name of database

def create_database():
    conn = sqlite3.connect(example_db)  # connect to that database (will create if it doesn't already exist)
    c = conn.cursor()  # make cursor into database (allows us to execute commands)
    c.execute('''CREATE TABLE test_table (user text,favorite_number int);''') # run a CREATE TABLE command
    conn.commit() # commit commands
    conn.close() # close connection to database

create_database()  # call the function!


Now run ls in your terminal (in the same directory where you ran your Python script). You should see a file in the list called example.db. What we've done is create a database with one table in it. We made this table using a SQL command. Specifically, the command used was the following:

CREATE TABLE test_table (user text,favorite_number int);


which says: "create a table called 'test_table', and have it contain two columns: one which contains a text entry called 'user' and one which contains an integer entry called 'favorite_number'" (there are other data types of entries as well). After executing and committing, our database contains a single (currently empty) table, test_table, with columns user and favorite_number.
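If you're curious what actually got stored, you can ask the database for its own schema: the built-in sqlite_master table holds the CREATE statement for every table. (This sketch uses CREATE TABLE IF NOT EXISTS so it's safe to run whether or not you've made example.db yet.)

```python
import sqlite3

conn = sqlite3.connect("example.db")
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS test_table (user text,favorite_number int);''')
# sqlite_master is a built-in table holding the schema of every table
schema = c.execute("SELECT sql FROM sqlite_master WHERE name='test_table';").fetchone()[0]
print(schema)
conn.close()
```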

If you happen to run the same program again you'll get:

Traceback (most recent call last):
  File "db_experiment.py", line 11, in <module>
    create_database()  #call the function!
  File "db_experiment.py", line 7, in create_database
    c.execute('''CREATE TABLE test_table (user text,favorite_number int);''') # run a CREATE TABLE command
sqlite3.OperationalError: table test_table already exists


And this is because when the function create_database runs the second time, a table called test_table already exists in that database, and you cannot have two identically named tables in the same database. You can avoid this error with the following command:

CREATE TABLE IF NOT EXISTS test_table (user text,favorite_number int);


If you're wondering where the syntax/grammar of the SQL command we used came from, it is basic SQL, which has numerous tutorials online like this one. SQL applies to a lot of the SQL family of databases out there. Let's go and do more stuff with SQL in our database.

### 2.1) Inserting

OK this database is pretty useless until we have some actual information in it. Let's add some stuff to the database then! Make another Python function called insert_into_database and add it to your db_experiment.py file. Copy-paste the lines of code below into that file, save it, and run it. Totally feel free to change the name and number too.

def insert_into_database():
    conn = sqlite3.connect(example_db)
    c = conn.cursor()
    c.execute('''INSERT into test_table VALUES ('joe',5);''')
    conn.commit()
    conn.close()

insert_into_database() # call this function!


When it runs, you'll probably see nothing happen...and that's good. Looking at the code, you should see a structure very similar to the first function, except the SQL command we run is an INSERT operation. In particular:

INSERT into test_table VALUES ('joe',5);


which says: "Insert into test_table the values (in order of the table's columns) 'joe', then 5". This adds a single row ('joe', 5) to test_table.

### 2.2) Selecting

It would now be great to see what is in the database, and/or have the ability to extract information from it. For this we will need the SELECT operation. So, let's create another function called lookup_database with the structure shown below. Copy-paste the code below into the file we've been working in, comment out previous function calls, save it, and run it.

def lookup_database():
    conn = sqlite3.connect(example_db)
    c = conn.cursor()
    things = c.execute('''SELECT * FROM test_table;''').fetchall()
    for row in things:
        print(row)
    conn.commit()
    conn.close()

lookup_database() # call it!


When you run it, you should see the following (assuming you've only run the file once, and taking into account any name changes you might have made):

('joe', 5)


And this should make sense, since what we've done is run the SQL command:

SELECT * FROM test_table;


which says: "Select everything from test_table." (The * symbol is the wildcard, meaning 'everything'.) By default in SQLite, executing a SELECT query returns an iterator, and you can use either the fetchone() or fetchall() methods to get the first returned value from the query or all returned values in a Python list, respectively, allowing us to manipulate them as needed. Alternatively, if you just need to iterate over all the elements, we can use the iterator directly in a for loop, just like we do with range()!
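To see the difference concretely, here's a small sketch (using a throwaway in-memory database, with made-up rows, so it doesn't touch your example.db):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for example.db
c = conn.cursor()
c.execute('''CREATE TABLE test_table (user text,favorite_number int);''')
c.execute('''INSERT into test_table VALUES ('joe',5);''')
c.execute('''INSERT into test_table VALUES ('kim',7);''')

first = c.execute('''SELECT * FROM test_table;''').fetchone()  # just the first row
print(first)  # ('joe', 5)

# ...or iterate over the cursor directly, no intermediate list needed:
for row in c.execute('''SELECT * FROM test_table;'''):
    print(row)
conn.close()
```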

If you just want to see the favorite number of the users in your database, instead of using * you could specify the columns you want to see such as:

SELECT favorite_number FROM test_table;


### 2.3) More Inserting

When we inserted data into our database above, we had a pre-made (hard-coded) Python string that we used for our SQL command. In particular we did c.execute('''INSERT into test_table VALUES ('joe',5);''') which runs the SQL command INSERT into test_table VALUES ('joe',5);. What if we'd instead had some values we generated programmatically that we wanted to insert into the database? For example, if we had a Python variable user and another variable favorite_of_user, how can we incorporate them into the SQL string we need to generate?

Well, as we know, in Python we can create a string from other datatypes by casting and some concatenation action. For example we could do this (the old-school way):

sql_command = '''INSERT into test_table VALUES ('%s',%d);''' %(user,favorite_of_user)
c.execute(sql_command)


or we could do it the newer Python way like this:

sql_command = '''INSERT into test_table VALUES ('{}',{});'''.format(user,favorite_of_user)
c.execute(sql_command)


While not apparent or a threat when running these databases on a local machine, this approach to building SQL commands can be a huge security flaw when our database lives on a server and the Python variables we're using to construct our SQL command come from user-specified GET or POST values. If a visitor to your site finds a way to make the user variable itself be a SQL command, they can cause bad things to happen, such as damage to or destruction of your database. This is bad, is known as a SQL Injection Attack, and is the technical angle behind the comic below. Now you can say, "I get it."

In order to protect ourselves against this, inputs coming from "the outside" (anything you do not have complete control over, such as user-specified inputs) should be sanitized. This can be done using the sqlite3 functionality shown below, where insertion points of outside values are demarcated by question marks, and a following tuple provides the values to be inserted (in order). The sqlite3 library binds these values as data rather than pasting them into the command text, thus preventing injection:

c.execute('''INSERT into test_table VALUES (?,?);''',(user,favorite_of_user))


You can use this parameterized syntax in any of the SQL queries we need to generate that take in values (insertion of rows, selection of rows, etc...)

The second argument to execute is a tuple, so if you have only one value to insert, you still need to make sure your tuple is of the form (value,) (yes, that is a trailing comma), otherwise you'll get an error. Please watch out for this! I've wasted hours on this before, and it is the least satisfying of bugs to figure out. You do not feel better after figuring this one out.
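Here's a minimal sketch of that trailing-comma gotcha (throwaway in-memory database, made-up names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for illustration
c = conn.cursor()
c.execute('''CREATE TABLE test_table (user text,favorite_number int);''')
c.execute('''INSERT into test_table VALUES (?,?);''', ("joe", 5))

# one placeholder still needs a one-element tuple -- note the trailing comma:
rows = c.execute('''SELECT * FROM test_table WHERE user = ?;''', ("joe",)).fetchall()
print(rows)  # [('joe', 5)]

# passing the bare string "joe" instead of ("joe",) raises
#   sqlite3.ProgrammingError: Incorrect number of bindings supplied...
conn.close()
```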

### 2.4) Moving On

OK, I'm getting restless. Let's insert some more things into our database. Make a fourth Python function called lotsa_inserts, copy-paste the code below into it, and run it. What this script does is create 100,000 random usernames and favorite numbers and insert them one after the other into the database:

#maybe put imports at top like one should do:
import random
import string

def lotsa_inserts():
    conn = sqlite3.connect(example_db)
    c = conn.cursor()
    for x in range(100000):
        number = random.randint(0,1000000) # generate random number from 0 to 1000000
        user = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(10)) # make random user name
        c.execute('''INSERT into test_table VALUES (?,?);''',(user,number))
    conn.commit()
    conn.close()

lotsa_inserts() # call this function!


If you now go back and rerun lookup_database, you'll get waaay more things that print because there are now waaay more entries in this database.

What if we wanted to return the user/favorite number pair with the largest favorite number in the database? One solution would be to build on the script we had: get the entire database into a Python list, and then write some Python code to return the max. This can be OK for small databases, but as your database grows, having Python sort 500,000 entries or so can start to get pretty slow, and this is where more refined SELECTs and sortings can be done on the database side3.

The first thing we can do is modify our SQL query with a sort command, ORDER BY. In order to return the entire database in ascending order based on the value of favorite_number you'd run the following command:

SELECT * FROM test_table ORDER BY favorite_number ASC;


If you modify your lookup_database to use that command instead of the one we originally had in place, when you run it you'll now see the entries in the database fly by in ascending order. If we want descending order, we can use this SQL command (try it to watch the values fly by in descending order):

SELECT * FROM test_table ORDER BY favorite_number DESC;


So far we've been running a query, generating a Python list from that query, and then printing everything. If we wanted to print just the top entry, we could, instead of a for loop, just do print(things[0]), but even then this can be slow because we're generating a really large Python list for no real reason other than that it is in Python and that makes people feel safe. If we instead use the fetchone() method on an ordered response, we can get just a single value, and this can have significant speedup benefits. Consider the code below (which you could implement in a separate file if you'd like, so long as you keep the same directory so example.db is still accessible directly). If you run it on your currently existing database you should see a speedup of approximately three-fold between the two approaches. Please make sure you understand and appreciate why this is.

import sqlite3
import time
example_db = "example.db" #make sure this is run in the same directory as your other file.
conn = sqlite3.connect(example_db)
c = conn.cursor()

start = time.time()
things = c.execute('''SELECT * FROM test_table ORDER BY favorite_number DESC;''').fetchall()
print(things[0])
print(time.time()-start)

start = time.time()
thing = c.execute('''SELECT * FROM test_table ORDER BY favorite_number DESC;''').fetchone()
print(thing)
print(time.time()-start)

conn.commit()
conn.close()


You can also refine what is returned (and gain efficiency benefits) within the SQL query itself. For example, the following SQL command returns the same top entry as the two approaches above, but limits what is done using SQL rather than the sqlite3 interface (and gives a speedup of about 20% or so on my machine with our current database, as compared to our fast fetchone() approach above).

SELECT * FROM test_table ORDER BY favorite_number DESC LIMIT 1;


More details on the syntax of ORDER BY can be found here.
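Putting LIMIT to work looks like this (a sketch with a throwaway in-memory database and made-up rows, so the "top" entry is predictable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for example.db
c = conn.cursor()
c.execute('''CREATE TABLE test_table (user text,favorite_number int);''')
c.executemany('''INSERT into test_table VALUES (?,?);''',
              [("ana", 3), ("bo", 9), ("cy", 6)])
# LIMIT 1 makes the database itself stop after the top row
top = c.execute('''SELECT * FROM test_table
                   ORDER BY favorite_number DESC LIMIT 1;''').fetchall()
print(top)  # [('bo', 9)]
conn.close()
```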

### 2.5) WHERE

We can filter our queries using the WHERE keyword. For example, if we wanted a list of users that had favorite numbers only between 1132 and 1185 (to identify weird people, since only weird people have favorite numbers in that range, amirite?) we could use the following SQL command (try this out with the lookup_database function, and you should see a much smaller portion of your database print out):

SELECT * FROM test_table WHERE favorite_number BETWEEN 1132 AND 1185;


This command says, select all full rows in test_table where favorite_number is between 1132 and 1185.

We can combine our specifications as needed as well. The operation above will return the entries that meet that condition, but what if we want them in order? We can do this by appending the ORDER BY operator from earlier, like below:

SELECT * FROM test_table WHERE favorite_number BETWEEN 1132 AND 1185 ORDER BY favorite_number DESC;


More details on syntax of WHERE can be found here.
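A quick sketch of that combined query (throwaway in-memory database; the four rows are made up so you can check the result by eye -- note BETWEEN is inclusive on both ends):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for example.db
c = conn.cursor()
c.execute('''CREATE TABLE test_table (user text,favorite_number int);''')
c.executemany('''INSERT into test_table VALUES (?,?);''',
              [("ana", 1100), ("bo", 1150), ("cy", 1180), ("di", 1200)])
# WHERE filters the rows, then ORDER BY sorts what survives the filter
weird = c.execute('''SELECT * FROM test_table
                     WHERE favorite_number BETWEEN 1132 AND 1185
                     ORDER BY favorite_number DESC;''').fetchall()
print(weird)  # [('cy', 1180), ('bo', 1150)]
conn.close()
```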

### 2.6) Timestamps

Another really useful type we can use with our databases is the timestamp. SQLite sort of has undocumented support for timestamps, but when we use SQLite in conjunction with Python and its datetime library, we're able to remember and pull rows from tables based on times (for example, when a row was inserted or modified). We'll do that using a new type of entry, the timestamp. Consider the following code, which creates a new database time_example.db with a single table in it called dated_table that has three columns: a text field called "user", an int field called "favorite_number", and a timestamp field called "timing".

import sqlite3
import datetime
example_db = "time_example.db" # just come up with name of database
conn = sqlite3.connect(example_db)  # connect to that database (will create if it doesn't already exist)
c = conn.cursor()  # make cursor into database (allows us to execute commands)
c.execute('''CREATE TABLE IF NOT EXISTS dated_table (user text,favorite_number int, timing timestamp);''') # run a CREATE TABLE command
conn.commit() # commit commands
conn.close() # close connection to database


Check Yourself:

What is the SQL query in the above code snippet?

We'll use this new database like the old one, but instead here, we'll time-stamp entries as they are made using the datetime library, which is documented here.

We'd now like to populate that database with a few (5) fictitious users and their favorite numbers. We'll space the entries two seconds apart as we insert them.

import sqlite3
import datetime
import string
import random
import time
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
for x in range(5):
    number = random.randint(0,1000000) # generate random number from 0 to 1000000
    user = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(10)) # make random user name
    c.execute('''INSERT into dated_table VALUES (?,?,?);''',(user,number,datetime.datetime.now())) # with timestamp
    print("inserting {}".format(user)) # notification message
    time.sleep(2) # pause for 2 seconds before next insert
conn.commit() # commit commands
conn.close() # close connection to database


Check Yourself:

What is the SQL query in the above code snippet?

Just like the database before, we could then go and access and pull things out as needed:

import sqlite3
import datetime
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
things = c.execute('''SELECT * FROM dated_table;''').fetchall()
for x in things:
    print(x)
conn.commit() # commit commands
conn.close() # close connection to database


But where this gets cool is that we can start to sort off of timestamps. Below, we create a datetime object that expresses "fifteen minutes ago" and then use it in combination with our SELECT query to get items from the last fifteen minutes.

import sqlite3
import datetime
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
fifteen_minutes_ago = datetime.datetime.now()- datetime.timedelta(minutes = 15) # create time for fifteen minutes ago!
things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing DESC;''',(fifteen_minutes_ago,)).fetchall()
for x in things:
    print(x)
conn.commit() # commit commands
conn.close() # close connection to database


Check Yourself:

What is the SQL query in the above code snippet?

Further documentation on the Python datetime library can be found here.
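One side note you may find handy: by default, sqlite3 hands those timestamps back as plain strings. If you'd rather get datetime objects back out, you can ask for type detection when connecting, as in this sketch (in-memory database, made-up row):

```python
import sqlite3
import datetime

# PARSE_DECLTYPES tells sqlite3 to convert columns declared "timestamp"
# back into datetime.datetime objects instead of plain strings
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
c = conn.cursor()
c.execute('''CREATE TABLE dated_table (user text,favorite_number int, timing timestamp);''')
c.execute('''INSERT into dated_table VALUES (?,?,?);''', ("joe", 5, datetime.datetime.now()))
row = c.execute('''SELECT * FROM dated_table;''').fetchone()
print(type(row[2]))  # <class 'datetime.datetime'>
conn.close()
```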

Checkoff 1:
Discuss SQLite and SQL with a staff member.

### 2.7) On the Server

A benefit of SQLite is that its databases are separate files4, which makes it easy to back things up, transfer data, etc. One way we can utilize this is to transfer a locally created database onto the server and use it in a request_handler function.

So, for example, let's say you'd like the time-stamped database created above to live on the server. You can transfer time_example.db onto the server at a location such as lab04b/ using sftp or scp (with reference to your home directory on the server) and then use the __HOME__ keyword to reference its absolute location. For example, the script below, when placed into your home directory along with an appropriately placed database, will allow us to access that database from an HTTP GET (or POST). In the particular example below, when we visit the script in the web browser, it returns all entries from the last fifteen minutes, in order based on timing.


import sqlite3
import datetime
visits_db = '__HOME__/lab04b/time_example.db'

def request_handler(request):
    conn = sqlite3.connect(visits_db)  # connect to that database (will create if it doesn't already exist)
    c = conn.cursor()  # make cursor into database (allows us to execute commands)
    fifteen_minutes_ago = datetime.datetime.now() - datetime.timedelta(minutes = 15) # create time for fifteen minutes ago!
    things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing DESC;''',(fifteen_minutes_ago,)).fetchall()
    outs = ""
    for x in things:
        outs += str(x)+"\n"
    conn.commit() # commit commands
    conn.close() # close connection to database
    return outs


You may get a permission denied error when trying to access your database. Assuming you have placed it in the correct location, this is most likely a file-permissions issue. To fix it while in sftp, run chmod 666 database_name, where "database_name" is the name of your database file.

You can also create databases on the server automatically, and I'd argue this is the better way for us here. Consider the code below. In it, the request_handler creates the table if it doesn't exist, then adds a new entry to the database and returns what is currently in it. If you place this code in your home directory on the server and then visit it in the browser, every time you visit you'll receive an ever-growing list of entries! This is awesome...we now have a means of remembering stuff. On the web.

import sqlite3
import datetime
visits_db = '__HOME__/lab04b/time_example2.db'

def request_handler(request):
    conn = sqlite3.connect(visits_db)  # connect to that database (will create if it doesn't already exist)
    c = conn.cursor()  # make cursor into database (allows us to execute commands)
    c.execute('''CREATE TABLE IF NOT EXISTS dated_table (user text,favorite_number int, timing timestamp);''') # run a CREATE TABLE command
    fifteen_minutes_ago = datetime.datetime.now() - datetime.timedelta(minutes = 15) # create time for fifteen minutes ago!
    c.execute('''INSERT into dated_table VALUES (?,?,?);''', ('joe',5,datetime.datetime.now()))
    things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing ASC;''',(fifteen_minutes_ago,)).fetchall()
    outs = "Things:\n"
    for x in things:
        outs += str(x)+"\n"
    conn.commit() # commit commands
    conn.close() # close connection to database
    return outs


It is important to realize that this code is no longer stateless. Our input to it, a GET request with no arguments, is the same every time we visit or refresh the page, but the response it provides is different each time (since it is using the database to store past information). This is how we could make a state machine on the web.

## 3) Deliverable

OK OK, so what we want for today is a two-part system. First, we want you to update your server-side script from Lab 04A so that it accepts incoming POSTs (in addition to GETs) of the current GPS coordinates, in application/x-www-form-urlencoded format, from your team's labkits, and based on that POST creates a database entry consisting of:

• the user name (who just POSTed) (provided as the argument 'user' in your POST)
• the location the user POSTed (provided as the arguments 'lat' and 'lon' in your POST)
• the location on MIT's campus
• the timestamp of when the position POSTing happened.

Your script should generate a POST response consisting of a list of users that have reported their location in the same campus spot within the last 45 seconds, separated by new-line characters. This system should be designed to work with a number of clients at once.

You will mostly be working on the server for this assignment. Work together with your partner to implement the above server-side system, and modify your individual labkits' code to POST to your shared server-side code. (Pick one person's URL, and all post to that URL.)

You'll notice that we have not given you code to go onto your ESP32 just yet. That's because debugging things like POSTs and GETs can be miserable on the ESP32, as I'm sure many of you have encountered. There are tools that the gods5 made to make testing and debugging GET and POST requests far easier. As mentioned in Ex03, we recommend Postman, which you should all have installed at this point. It is a professional stand-alone application that allows saving requests and more detailed creation and analysis of them (you need to sign up, but it is free...in terms of money, anyway; I think you pay for it in other ways). We like Postman a lot.

This is probably the first time you've dealt with receiving a POST from the perspective of the server in our infrastructure. It is similar to dealing with a GET. Start with a simple script that reports back the request dictionary so that you can get an idea of what a properly formatted POST produces in that dictionary, and go from there. For example, the following server script is a great way to start, just to see what you're dealing with:

def request_handler(request):
    return str(request)


Then, using that, build up the appropriate response. Make sure that your GET functionality from Lab 04A remains intact in this new module you're making!!!

Checkoff 2:
Show your working team code to a staff member. It must exhibit all new POST functionality while also demonstrating the original GET functionality of Lab 04A.

To test your working system with your labkits, we have included a nice compact library that parses the GPS into DDM form and a starter script to run in this lab.

• Copy the TinyGPSPlus library into your Documents/Arduino/libraries folder like you've done for other libraries and make sure to restart Arduino completely!! You can use this library going forward with your GPS!
• In order for the lab04b_starter.ino file to run on your kit, you will need to update the code so that it is targeted at your group's URL and uses your username. The code will POST to the URL of your script (upon a button push) the following information: username, latitude, and longitude, using POST values 'user', 'lat', and 'lon', respectively, and will display whatever response comes back. You must make sure to specify the proper URL; otherwise it will be POSTing into a dead end and you won't get a checkoff.

Checkoff 3:
Show your complete working system, as a team. We want to see people appearing and disappearing over time.

If you get done early, consider working on Exercise 04 since that exists and is a thing. Also consider doing a design exercise.

Footnotes

1There are numerous pros and cons to different databases. Some people will go to war for MongoDB or something else, and we're not passing judgement here.

2SQLite is pronounced either "S-Q-L-lite" or "sequelite", whichever you prefer; we don't care.

3The reason performing these sorts on the database side can be faster is that they'll be performed in the native C that the database is written in, rather than using the bulkier higher-level structures of Python, which, while they can make our handling of data "easier", add significant bulk to calculations and operations.

4This can also be a problem as well...

This page was last updated on Thursday February 28, 2019 at 12:38:37 PM (revision 3571290).

CAT-SOOP is free/libre software, available under the terms
of the GNU Affero General Public License, version 3.