Wikipedia Interfacer

Spring 2019

The questions below are due on Sunday March 10, 2019; 11:59:00 PM.
 

Can someone explain to me why Janelle Monáe did not win a Grammy this year?

1) The Big Picture

In Exercise 05, we're going to create a handheld encyclopedia with a convenient User Interface (UI). Here's an awesome video of it in action. Look at me looking up random words!!!

The system will consist of three pieces: a user interface, written by you in C++ on your mobile system; a server-based Python script that simplifies interfacing your mobile device with Wikipedia's API; and, of course, the work of putting everything together.

Our Overall Wikipedia System

This project will require working pieces from three primary exercises:

  • Button Class: This is a button timing/event-handling class that will be useful in the User Interface. It was an exercise last week and needs to be working for the front-end interface.
  • Wikipedia Interfacer: This piece of Python will live on the server and act as an interface between your hand-held device and the actual Wikipedia API. We're doing it this way rather than having the microcontroller directly access the Wikipedia API in order to take advantage of Python's ability to parse and clean the responses from Wikipedia. This is what you'll be building on this page.
  • Wikipedia UI: This state-machine-based piece of code on the microcontroller will be responsible for building a topic to look up using the IMU and buttons as user-input and then displaying the query response (see video).

In this first stage we're going to develop our proxy server for our Wikipedia system. It will "live" between the microcontroller code you write in the next exercise and Wikipedia's API. If we really wanted, we could have the ESP32 work with the Wikipedia API directly, but you will see that what the Wikipedia API returns is a bit complicated, and it will be more convenient for us to "clean" it up on the server rather than locally on your mobile device. It also gives us an opportunity to get more practice on the server.

Our Wikipedia Proxy Server

2) Fetching Information from Wikipedia

We're going to build a Python script that will interface between our mobile device and the actual Wikipedia API.

This code will be very similar (in spirit) to the server-side scripts we wrote in the exercise from two weeks ago and to the material on Python requests from last week. In particular, this code should do the following: it should process an incoming GET request (which will come from your ESP in deployment, but could come from Postman during development), and if a topic query argument is specified, look up that term or phrase on Wikipedia using their API (the API call is provided in the code skeleton). The response that comes back from Wikipedia then needs to be parsed and cleaned up so that only a minimal body of text about the query's Wikipedia entry is returned. A second query argument, len, must also be specified; it gives the maximum number of characters to return if an article is found. However, the response may only be comprised of complete sentences.

  • If no topic query argument is specified in the GET request, the function should return -1.
  • If no information about the requested topic can be found, you must also return -1. So if you searched for "cp7754" you should return -1.
  • If a legitimate response for the word queried is found, you should use the 'extract' entry as the basis of your response.
  • Remove all HTML (<b> tags, <i> tags, leading spaces/tabs from the start of the extract, etc.) from the 'extract' entry (consider using Beautiful Soup) and return the cleaned text.
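Putting the bullets above together, here is one possible sketch of the handler. The layout of the request dictionary (a 'values' key holding the parsed query arguments) is an assumption carried over from our earlier server-side exercises, and treating a period as the end of a sentence is a simplification; adapt both as needed:

```python
import requests
from bs4 import BeautifulSoup

def clean_extract(html_text):
    # Strip <b>, <i>, <p>, etc., plus leading/trailing whitespace.
    return BeautifulSoup(html_text, "html.parser").get_text().strip()

def truncate_to_sentences(text, max_len):
    # Keep only whole sentences: cut at the last period that fits
    # within max_len characters; return -1 if not even one sentence fits.
    end = text.rfind('.', 0, max_len)
    return text[:end + 1] if end != -1 else -1

def request_handler(request):
    values = request.get('values', {})
    # Both query arguments are required; otherwise return -1.
    if 'topic' not in values or 'len' not in values:
        return -1
    to_send = ("https://en.wikipedia.org/w/api.php?titles={}&action=query"
               "&prop=extracts&redirects=1&format=json&exintro="
               ).format(values['topic'])
    data = requests.get(to_send).json()
    # The extract sits under an unpredictable page-id key; a missing
    # article shows up with no usable 'extract' field.
    page = next(iter(data['query']['pages'].values()))
    if not page.get('extract'):
        return -1
    return truncate_to_sentences(clean_extract(page['extract']),
                                 int(values['len']))
```

Note that the two helper functions are pure text manipulation, so you can test them locally without touching the network.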

Take a look at the starting code for this exercise below. Two libraries are imported:

import requests
from bs4 import BeautifulSoup

'''BeautifulSoup Documentation (may prove to be helpful in stripping html <b>, <p>, <u> tags, etc...:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
'''

def request_handler(request):
    #your code here!
    topic = "cat"
    #use the string below for properly formatted wikipedia api access (https://www.mediawiki.org/wiki/API:Main_page)
    to_send = "https://en.wikipedia.org/w/api.php?titles={}&action=query&prop=extracts&redirects=1&format=json&exintro=".format(topic)
    r = requests.get(to_send)
    data = r.json()
    #starter line for debugging:
    return data

We've seen requests and json before, and you shouldn't need to do much beyond what is provided in the skeleton with those two libraries. BeautifulSoup can be helpful in cleaning up/removing HTML and other styling elements from a body of text. Check out the documentation. It is a great library for things like this.
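For a feel of how little BeautifulSoup code the cleanup takes, here is a minimal sketch (the sample HTML string is made up for illustration):

```python
from bs4 import BeautifulSoup

# Made-up snippet in the style of a Wikipedia 'extract' field:
raw = "<p>The <b>cat</b> (<i>Felis catus</i>) is a domestic species.</p>"

# get_text() drops every tag; strip() removes leading/trailing whitespace.
clean = BeautifulSoup(raw, "html.parser").get_text().strip()
print(clean)  # The cat (Felis catus) is a domestic species.
```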

The starter code we are giving you is basic in functionality: every request, no matter what, queries Wikipedia for the meaning of the word "cat".


For example,

topic = "cat"
to_send = "https://en.wikipedia.org/w/api.php?titles={}&action=query&prop=extracts&redirects=1&format=json&exintro=".format(topic)

results in to_send being:

to_send = "https://en.wikipedia.org/w/api.php?titles=cat&action=query&prop=extracts&redirects=1&format=json&exintro="

For those who are interested, the Wikipedia API details are given here: https://en.wikipedia.org/w/api.php, though you don't need to investigate them too deeply for this problem.
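To see why the cleanup step is needed, it helps to know the shape of the JSON that comes back. Below is a trimmed-down, hypothetical stand-in for a real response (the actual response has more fields, the page-id key varies per article, and a topic with no article comes back without a usable 'extract'):

```python
# Hypothetical, trimmed-down stand-in for the API's JSON response:
data = {
    "query": {
        "pages": {
            "6678": {   # page-id key; you won't know it ahead of time
                "pageid": 6678,
                "title": "Cat",
                "extract": "<p>The <b>cat</b> is a domestic species.</p>"
            }
        }
    }
}

# Since the page-id key is unpredictable, grab the first (only) page:
page = next(iter(data["query"]["pages"].values()))
print(page["extract"])  # still full of HTML -- hence the cleanup step
```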

Hints:

  • To figure out how to get the parameters from an incoming request, look at any of our previous server-side Python code.
  • Play around with the Wikipedia API locally on your machine. There are two main steps, broadly speaking: first, getting the Wikipedia 'extract', and second, stripping out the HTML tags.
  • Use Postman for experimentation. You can add parameters to the request by appending ?topic=whatever to the URL. E.g., if you want to look up the meaning of the word ant, add ?topic=ant to your script's URL.
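As a concrete starting point for the first hint, here is a tiny sketch of pulling the query arguments out of an incoming request. The dictionary layout ('values' holding the parsed GET arguments) is an assumption based on the earlier server-side exercises; adjust it if your server passes requests differently:

```python
def get_query_args(request):
    # 'values' is assumed to hold the parsed GET query arguments.
    values = request.get('values', {})
    return values.get('topic'), values.get('len')

# Simulated incoming request, like one a Postman GET would generate:
sample = {'method': 'GET',
          'values': {'topic': 'ant', 'len': '150'},
          'args': ['topic', 'len']}

topic, length = get_query_args(sample)
print(topic, length)  # ant 150
```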

Your filled-in request_handler function for this problem should ultimately be submitted below for full credit (and you'll use that file, living on the server, with the front-end part of the next section). After verifying a properly working file, upload it to the directory ex05/wiki_interfacer.py within your home directory on the server 608dev.net. Specifically:

http://608dev.net/sandbox/sc/<your-kerberos>/ex05/wiki_interfacer.py


When this code is working, place it at <your-kerberos>/ex05/wiki_interfacer.py and run the checker below. Don't worry about the text box. Put whatever there. If you'd like, you can enter your favorite flavor of ice cream.

Remember to include any imported libraries you use in your file that you upload.




This page was last updated on Sunday March 10, 2019 at 06:16:21 PM (revision 9a3e10c).