Thursday, December 20, 2007

I should be asleep right now, but I'm a bit wired, so I decided to set up a script to automatically check if grades have been posted.

In principle, this isn't too hard - set up a cronjob to call a script. In the script, download the grade page with wget, and parse the file so the output is a bit simpler. We can then output it in whatever way - write it to a file, zephyr it, send an email (ooh - email to SMS! Instant grade notification!). The catch is that the grade site doesn't do authentication by password (in which case we could use pycurl) or by HTTP Auth (which wget supports natively). Instead, it does authentication by private certificate.

Open up Firefox, in which we'll assume your certificates are already installed (you could do this with another browser, and the steps would only be slightly different). You're going to export a PKCS12 certificate; do this by going to your preferences, then advanced. Click 'view certificate', choose your MIT cert, and back it up. Then, convert it to the proper format.
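As a rough sketch of what that conversion can look like with openssl: the backup file name (backup.p12), the function name split_p12, and the P12_PASS / KEY_PASS environment variables are all names I made up for illustration, and the passphrases are read from the environment rather than typed at a prompt. This only covers the client certificate and key, not the CA certificate.

```shell
# split_p12 FILE: write mycert.pem (client cert) and mykey.pem (private key).
# Assumptions (hypothetical names): P12_PASS holds the backup password,
# KEY_PASS holds the passphrase to put on mykey.pem. Both must be exported.
split_p12() {
    p12=$1

    # Client certificate only, no private key.
    openssl pkcs12 -in "$p12" -clcerts -nokeys -out mycert.pem \
        -passin env:P12_PASS || return 1

    # Private key only, re-encrypted with KEY_PASS.
    openssl pkcs12 -in "$p12" -nocerts -out mykey.pem \
        -passin env:P12_PASS -passout env:KEY_PASS || return 1

    # Keep the key readable by its owner only.
    chmod 600 mykey.pem
}
```

Usage would be something like: export P12_PASS and KEY_PASS, then run split_p12 backup.p12 in the directory where you want the .pem files.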

Once you've followed the linked instructions, you should have mitca.pem, mycert.pem, and mykey.pem. The wget command to download the grade page is wget --private-key=mykey.pem --certificate=mycert.pem --ca-certificate mitca.pem https://student.mit.edu/cgi-bin/shrwsgrd.sh. This is not quite what we want, though, for two reasons. One, it will save the results as shrwsgrd.sh; two, it prompts for the password we assigned to mykey.pem before. The first problem is easily solvable; we want to pipe the output to grep, so it needs to go to stdout, which wget will do if we add -O - to the end. The second problem is a bit trickier. Go back and recreate mykey.pem, but this time, add the -nodes flag. NOTE! mykey.pem is a private key, so this is a Very Bad Idea. I'm sure there's a better way to do it - ssh-agent, perhaps? - but if you do decide to do it this way, keep the key somewhere secure, on your own machine, and delete it as soon as grades come in. I'm not responsible for any consequences of your key falling into the wrong hands (and if you do know the Right Way to do this, please leave a comment).
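Here's a minimal sketch of the fetch step wrapped in a function, so the page goes to stdout. The names fetch_grades, protect_key, GRADES_URL and WGET are my own placeholders (WGET is only there so a stub can stand in for the real wget); the file names and URL are the ones above. Tightening the permissions on the key is a small extra precaution, not a cure for having an unencrypted key lying around.

```shell
# Hypothetical helper names; the wget options are the ones from the post.
GRADES_URL='https://student.mit.edu/cgi-bin/shrwsgrd.sh'

# protect_key FILE: make the key readable and writable by its owner only.
protect_key() {
    chmod 600 "$1"
}

# fetch_grades: download the grade page quietly (-q) to stdout (-O -).
# Expects mykey.pem, mycert.pem and mitca.pem in the current directory.
fetch_grades() {
    "${WGET:-wget}" -q \
        --private-key=mykey.pem \
        --certificate=mycert.pem \
        --ca-certificate=mitca.pem \
        -O - "$GRADES_URL"
}
```

With that in place, protect_key mykey.pem followed by fetch_grades | grep ... behaves like the one-liner, just with the long option list tucked away.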

So now, we have our data. We want to parse it somehow to get just the grades for this semester. Happily, we have sed and grep at our disposal (I think it could all be done with sed, but that's a bit more hardcore), so we get just the grades (for all semesters) by doing:

grep -e '[0-9A-Z][.]' -e '[ABCDFIPT+-]</TD' | sed 's:<[/A-Za-z]*>::' | sed 's:<[A-Za-z\ =]*>::' | sed 's:\s*::' | grep -v -e '8\.012' -e 'GEN\.APCR' -e '[\s<]' | sed '/.[0-9A-Z]$/ {
N
s:\(\.[0-9A-Z]*\)\n*\([A-Z]\):\1 \2:
}'


By way of explanation: the first grep matches any line with either a course number, or one of the letters available in a grade, followed by a close-cell tag. The next bit removes the close tag; following that is the bit that removes the open tag (No, I'm not entirely clear on why I couldn't combine them. Simpler to have two calls to sed than to debug). Then we remove whitespace; I took out 8.012 and GEN.APCR because they didn't fit the scheme so well, but that's specific to my transcript (didn't fit b/c GEN.APCR doesn't have a normal grade; I didn't want "S" in there as a grade just for that one line. And 8.012 was messy because I dropped it; the "DR" code caused some problems). In that same grep, we removed lines that had whitespace and tags. The last call to sed looks for a course number; if the following line is a grade (has a letter in it), then it consolidates the two lines.
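To see that last sed step on its own, here's a small sketch that feeds it already-cleaned lines. The function name join_grades and the sample course numbers and grades are made up for illustration; the sed script is the one from the pipeline above, and I'm assuming GNU sed.

```shell
# join_grades: hypothetical wrapper around the last sed call in the pipeline.
# Reads cleaned lines on stdin; when a course-number line is followed by a
# grade line, the two are merged into "NUMBER GRADE".
join_grades() {
    sed '/.[0-9A-Z]$/ {
N
s:\(\.[0-9A-Z]*\)\n*\([A-Z]\):\1 \2:
}'
}

# Made-up sample input: two courses, each followed by its grade.
# Prints "6.001 A" and "18.02 B+", one per line.
printf '6.001\nA\n18.02\nB+\n' | join_grades
```

The N pulls the next line into the pattern space, and the substitution swaps the newline between the number and the grade for a space.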

So now our output is in a format such that each course has a line; a line starts with the course number, and ends with either a grade (separated from the number by a space) or nothing. We only want to notify the user if there's a change from the last time we checked. So let's store it in a file called new-grades: cat >new-grades. We'll then do diff old-grades new-grades; this is our result, which we'll carry on to the next section. We also want to move new-grades to old-grades when everything else is done.

If the grades haven't changed, then diff old-grades new-grades will return an empty string. Let's say, though, that we have something to report. In that case, we'll grep for lines containing >, and filter out the > and whitespace using sed: grep \> | sed 's:> ::'. And this is what we want to report to the user.
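A quick sketch of that step with two throwaway files, so you can see what comes out. The helper name changed_grades and the sample contents are invented for the example; the diff, grep and sed calls are the ones described above.

```shell
# changed_grades OLD NEW: print lines that are new or different in NEW.
# (Hypothetical helper name; the commands are the ones from the post.)
changed_grades() {
    diff "$1" "$2" | grep '>' | sed 's:> ::'
}

# Made-up example: 6.001 gets a grade, 18.02 is still pending.
dir=$(mktemp -d)
printf '6.001\n18.02\n' > "$dir/old-grades"
printf '6.001 A\n18.02\n' > "$dir/new-grades"
changed_grades "$dir/old-grades" "$dir/new-grades"   # prints: 6.001 A
```

If the two files are identical, diff prints nothing, so the function prints nothing either.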

I want to get an SMS, so I'll have this code send an email to [number]@vtext.com: mail -e [number]@vtext.com. Note that the -e flag is important; it tells mail not to send anything if the message is empty. So if diff old-grades new-grades is empty ... nothing is sent.
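One caveat: what -e means varies between mail implementations, so it's worth checking your man page. If yours doesn't skip empty messages, a possible workaround (not what I'm doing above) is to test for an empty message yourself before calling mail. In this sketch, notify_if_nonempty, MAIL_CMD and the subject line are all my own inventions; MAIL_CMD only exists so a stub can replace mail when trying it out.

```shell
# notify_if_nonempty RECIPIENT: read a message on stdin and mail it,
# but only if it is not empty. Hypothetical helper, not part of mail itself.
notify_if_nonempty() {
    recipient=$1
    message=$(cat)

    # Nothing to report: do not send anything.
    [ -n "$message" ] || return 0

    # MAIL_CMD is an assumed override hook; it defaults to plain "mail".
    printf '%s\n' "$message" | ${MAIL_CMD:-mail} -s "Grades posted" "$recipient"
}
```

It would slot in at the end of the pipe in place of mail -e, e.g. ... | sed 's:> ::' | notify_if_nonempty [number]@vtext.com.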

Almost done! Deep breath, then we can put all this together with pipes. First, the wget is piped to the parser, which cats into new-grades. Then, we diff new-grades and old-grades, parse the result, and send our email. Then, we move new-grades to old-grades.

wget --private-key=mykey.pem --certificate=mycert.pem --ca-certificate mitca.pem https://student.mit.edu/cgi-bin/shrwsgrd.sh -O - | grep -e '[0-9A-Z][.]' -e '[ABCDFIPT+-]</TD' | sed 's:<[/A-Za-z]*>::' | sed 's:<[A-Za-z\ =]*>::' | sed 's:\s*::' | grep -v -e '8\.012' -e 'GEN\.APCR' -e '[\s<]' | sed '/.[0-9A-Z]$/ {
N
s:\(\.[0-9A-Z]*\)\n*\([A-Z]\):\1 \2:
}' | cat > new-grades && diff old-grades new-grades | grep \> | sed 's:> ::' | mail -e [number]@vtext.com && mv new-grades old-grades


Plop it all into a cronjob (left as an exercise for the reader), touch old-grades, and you're all set.
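If you'd rather not do the exercise, here's one possible sketch. It assumes the whole pipeline above has been saved as a script called check-grades.sh (a name I made up) in the same directory as the .pem files and old-grades, and that your cron understands */N steps. The helper cron_entry just prints the crontab line so you can paste it in with crontab -e.

```shell
# cron_entry MINUTES DIR: print a crontab line that runs the checker
# every MINUTES minutes from DIR. check-grades.sh is a hypothetical script
# containing the pipeline above; DIR is wherever the .pem files live.
cron_entry() {
    printf '*/%s * * * * cd %s && ./check-grades.sh\n' "$1" "$2"
}

# Example: check every 15 minutes from ~/grades (an assumed location).
cron_entry 15 "$HOME/grades"
```

The cd matters because the pipeline refers to the key, certificates and grade files by relative path.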

1 Comments:

Blogger Diane J Standiford said...

GAWD! You gotta be kidding? AAARGH

6:52 AM  
