CS470/570 Artificial Intelligence

Program #1: Fred Flintstone problem-solving
(a.k.a. Warming up our Python mojo)

Overview:

There's nothing intrinsically different or magical about AI programming; it's all just software...only it just happens to be specialized towards certain analytic orientations, algorithms, and problem-solving goals. Well heck, you're all rock-star programmers, right? So let's get that programming mojo warmed up with a little basic problem-solving challenge!

I'm calling this "Fred Flintstone problem-solving" because it's just that: we know nothing at this early point in the course, so we can offer only completely off-the-cuff, uninformed, "naive" solving of a problem. As we move forward, we'll soon develop a broader understanding of the intellectual "terrain" surrounding problems like these: what we are really doing, what the alternative solution approaches in this terrain are, and how to think about what will work best. But for now, we're just going to get out our caveman club and flail away... and then grunt happily when we get a solution!

The Problem:

In this first small programming exercise, we will consider how we can solve Boggle. If you haven't played in a while, here's the gist of the game: There are 16 cubes with letters on the faces. These cubes are randomly arranged in a 4x4 matrix by shaking the Boggle game. The goal of the game is to make words out of these letters by traversing adjacent (horizontal, vertical or diagonal) tiles. This "chain" of letters may snake all over the board, but you can only use each tile once, i.e., no fair using the same tile twice in a word. In a fixed amount of time, players must make as many words as possible. Words are then scored as follows: 1 point for each 3-4 letter word, 2 points for a 5-letter word, 3 points for a 6-letter word, 5 points for a 7-letter word, 11 points for an 8 (or more)-letter word.
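The scoring rule above boils down to a lookup on word length. Here is a minimal sketch; the function name score_word is just an illustration, and giving 0 points to words shorter than 3 letters is an assumption (the rules above only list scores from 3 letters up).

```python
def score_word(word):
    """Return the Boggle score for a word, following the scoring table above."""
    length = len(word)
    if length < 3:
        return 0      # assumption: words under 3 letters earn nothing
    if length <= 4:
        return 1      # 3-4 letter words
    if length == 5:
        return 2
    if length == 6:
        return 3
    if length == 7:
        return 5
    return 11         # 8 or more letters


print(score_word("cat"), score_word("stone"), score_word("flintstone"))
```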

The Assignment: Overview

In this problem, you are asked to solve Boggle boards exhaustively in whatever way you can (go Fred Flintstone!): given a particular boggle board as input, your algorithm should enumerate all possible words that can be found in that Boggle board.

The dictionary we will use for our game of Boggle is the Tournament Scrabble Wordlist (twl06), which includes 178,691 words. I've cleaned up and provided a file of dictionary words for you here.

Your program should be given a dictionary and an NxN Boggle board (either on the command line or as program parameters; it's up to you). It should then run, discover all possible words existing in the given board, and print out a summary of its findings.

Details:

You are permitted to use only basic "standard" Python data structures, i.e., lists, dictionaries, and sets. Nothing fancy like Trie, Queue, etc. which are complexity overkill that will just slow you down, confuse you, and impede your understanding of what you're really doing. In short, the only package you may import for this program is the time package to do the timing. Everything else is standard core Python.
Any dictionary used will have the same format as the twl06 dictionary, i.e., one word per line in a text file. This means that, given the appropriate dictionary file, your program can Boggle in French just as easily as in English!
A Boggle board file will consist of N lines of N letters separated by spaces. You should ignore extra space at the end of a line or extra newlines at the end of a file. (Hint: check out Python's strip() string method)
Your solver should be able to take in whatever NxN board size you pass it, simply deducing the board size from the given board input file. A solid algorithm will work just as well on 2x2 boards as on 10x10 boards!
Efficiency matters in time-based scenarios (like games). Keep efficiency close in mind as you design your code.
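To make the file formats above concrete, here is one possible sketch of reading a board and a dictionary using only core Python. loadBoard is the name the assignment uses; parseBoardLines and loadDictionary are hypothetical helper names, and upper-casing everything so board letters and dictionary words compare equal is an assumption about the files, not a requirement stated above.

```python
def parseBoardLines(lines):
    """Turn lines such as 'A B C D' into a list of rows (hypothetical helper)."""
    board = []
    for line in lines:
        stripped = line.strip()      # drops trailing spaces and the newline
        if stripped:                 # skip blank lines at the end of the file
            board.append(stripped.upper().split())
    size = len(board)                # board size is deduced from the input
    for row in board:
        if len(row) != size:
            raise ValueError("board is not square")
    return board


def loadBoard(filename):
    """Read a board file: N lines of N space-separated letters."""
    with open(filename) as boardFile:
        return parseBoardLines(boardFile)


def loadDictionary(filename):
    """Read one word per line into a set for fast membership tests."""
    with open(filename) as dictFile:
        return {line.strip().upper() for line in dictFile if line.strip()}


# Parsing works the same on in-memory lines, which is handy for quick checks.
print(parseBoardLines(["c a \n", "t s\n", "\n"]))
```

A set is used for the dictionary because checking "is this string a word?" happens once per path explored, and set membership is much cheaper than scanning a list.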

This is not a particularly hard problem...provided you think it through! (Hint: elegance, recursion). Just as a reference point: my solution has one main function of about 15 lines, plus three smaller helpers to load/print out boards and stuff. Without comments, the whole thing fits on a page.

How to think about this problem:

The main point of this exercise is just to get everybody warmed back up on Python, but of course we also want to get our problem-solving knives honed. Although we don't know much about it in any formal sense (but we will soon!), this problem involves exploring a large space of possible states to find solutions...which is what a lot of AI is based on. In this case, a "state" is a place you've ended up after taking some path across the board from some starting position. The letters along the path you've followed form a string, which might or might not be a "solution", depending on whether that string is in the dictionary. So it's really pretty simple: you have to write a program that (a) starts at each of the possible starting points on the Boggle board; and (b) explores all possible paths from that point, recording/scoring any correct words it finds along the way. Your key need is a function that, given a current position, generates all possible adjacent positions (gotta stay on the board!). Then you have to consider that not all adjacent positions are legal: you have to subtract away any tiles that are already on the path you've explored (can't use a tile twice!). Then you put it together: start at some position, see if the letter it contains is a legal word (if so, score it), then generate all possible next positions to jump to from there, jump to each one (adding its letter to the path)...and then repeat until none of your paths can go any further. If your brain is thinking "massive recursion!" then you're on the right track...
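The recipe above can be sketched as a short recursive function. This is only one naive way to do it, not a reference solution: the names explore and solve, the (row, col) position format, and the one-element list used as a counter are all illustrative assumptions.

```python
def explore(board, position, path, word, dictionary, found, counter):
    """Visit one tile, check the string formed so far, then recurse on neighbors."""
    row, col = position
    path = path + [position]          # new list, so sibling branches are unaffected
    word = word + board[row][col]
    counter[0] += 1                   # one more candidate string checked
    if word in dictionary:
        found.add(word)
    size = len(board)
    for dRow in (-1, 0, 1):
        for dCol in (-1, 0, 1):
            nxt = (row + dRow, col + dCol)
            onBoard = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            # The current tile is already in path, so the (0, 0) offset is skipped too.
            if onBoard and nxt not in path:
                explore(board, nxt, path, word, dictionary, found, counter)


def solve(board, dictionary):
    """Start an exploration from every tile; return (words found, strings checked)."""
    found, counter = set(), [0]
    for row in range(len(board)):
        for col in range(len(board)):
            explore(board, (row, col), [], "", dictionary, found, counter)
    return found, counter[0]


board = [["C", "A"], ["T", "S"]]
words, checked = solve(board, {"CAT", "CATS", "ACT", "ACTS", "DOG"})
print(sorted(words), checked)   # ['ACT', 'ACTS', 'CAT', 'CATS'] 64
```

On a 2x2 board every tile touches every other tile, so the 64 strings checked are simply all ordered chains of 1 to 4 distinct tiles.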

Programming Deliverables:

To make sure that everyone is moving in the right direction from the start, and to keep you on track, we'll break this project up into a couple of deliverables:

Part 1: For this part, you'll need to get a few of the basic pieces in place, before moving on to actually put them together into a solution. What I'll expect to see is:

A "loadBoard" function that takes the filename of a board file, and loads that in to work on --- returns a new board data structure (NxN matrix). Obviously you'll call this right at the start.
A "printBoard" function. Takes in a reference to a loaded board data structure (an NxN matrix) and prints it out. Simple.
A "possibleMoves" function. Takes in a current position (just an x-y pair) and a boggle board and generates all possible next positions (x-y pairs in a list, set, or whatever you decide).
A "legalMoves" function. Takes in a list of possible moves (i.e., generated by possibleMoves) as well as a path (list of x-y pairs) of places you've already been, and essentially subtracts the latter from the former: the only legal moves are possible moves minus any places that you've already been.
An "examineState" function. Takes in a Boggle board, a current position, and a path up to that position. It adds the current position's tile to the path, computes the word now formed by that path, and returns a tuple of (<current word generated>, <yes/no depending on whether that word is in dictionary>).
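To illustrate how the three move-related functions fit together, here is a rough sketch. It makes several assumptions that the list above leaves open: positions are (x, y) tuples with x as the column and y as the row, the board is a list of rows, the dictionary is passed to examineState as an extra argument, and the yes/no answer is returned as a Python bool.

```python
def possibleMoves(position, board):
    """All on-board neighbors (including diagonals) of an (x, y) position."""
    x, y = position
    size = len(board)
    moves = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue                      # staying put is not a move
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                moves.append((nx, ny))
    return moves


def legalMoves(moves, path):
    """Possible moves minus any positions already on the path."""
    return [move for move in moves if move not in path]


def examineState(board, position, path, dictionary):
    """Return (word formed by path + position, whether it is in the dictionary)."""
    fullPath = path + [position]
    word = "".join(board[y][x] for (x, y) in fullPath)   # assumption: board[row][col]
    return (word, word in dictionary)


board = [["C", "A"], ["T", "S"]]
moves = possibleMoves((0, 0), board)
print(moves)                                   # [(0, 1), (1, 0), (1, 1)]
print(legalMoves(moves, [(1, 0)]))             # [(0, 1), (1, 1)]
print(examineState(board, (0, 1), [(0, 0), (1, 0)], {"CAT"}))   # ('CAT', True)
```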

Here is a shot of what these functions should look like in action. To help you get started, here is my useful little "GenBoard" Python program that randomly creates a new Boggle board file of specified size. Makes it easy to create new testing boards!

See below for exact deliverables for your submission packet for this part.

Part 2: This is the final deliverable for this project. What I'll expect to see here is your fully functional program. Note that you might want to consider refactoring your code from Part 1 a bit. In particular, you'll probably want to rebuild the functionality from examineState into your broader function for exploring state after state throughout the whole board. And of course, you'll need a main function that loads everything up, reports the starting board and resulting scores, and otherwise controls the action. What I'll expect to see here are:

When run, your program will begin by showing the board being explored, so we all know what we're working on.
To give an idea of efficiency, your program should report the time taken to run your code, and the total number of words checked. (Use time.time() from the time module for this.) The time is merely of passing interest, since this will vary by machine (processor speed, memory, etc.). What is really telling is the number of words explored.
To ease scoring, your output should group found words by length: 1-letter, 2-letter, ..., X-letter words. To further ease correctness checking, your program should then also print the total number of words found and provide an alpha-sorted list of all words.
OUTPUT: I'm providing a couple of sample boards to show you the desired output and give you some values to verify your program's correctness.
1. Here is a standard 4x4 Boggle board and here is my output file for it.
2. Just for fun, here is a nice fat 10x10 board, along with the outputs for it. As you can see, it's not even computationally manageable without applying a little cleverness to cut it down to size! The combinatorics are killer here!
To ease grading, your program's output should be essentially identical in both format and content to my sample outputs shown above. That means it reports the same info in the same order, nice and clean and readable. I don't care about detailed labeling, etc., but it should be easy to read, showing the required content in the required order.
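As a rough illustration of the timing and grouping requirements above, here is a sketch of a report builder. The wording and layout of its lines are invented for this example and are NOT the format of the sample output files, which remain the reference; reportResults and the stand-in values are hypothetical.

```python
import time


def reportResults(foundWords, wordsChecked, elapsedSeconds):
    """Build a text summary: stats, words grouped by length, then the full sorted list."""
    byLength = {}
    for word in foundWords:
        byLength.setdefault(len(word), []).append(word)
    lines = ["Searched %d candidate words in %.4f seconds" % (wordsChecked, elapsedSeconds)]
    for length in sorted(byLength):
        lines.append("%d-letter words: %s" % (length, ", ".join(sorted(byLength[length]))))
    lines.append("Found %d words in total:" % len(foundWords))
    lines.append(str(sorted(foundWords)))
    return "\n".join(lines)


startTime = time.time()
# The solver would run here; these stand-in values just exercise the report.
found = {"CAT", "ACT", "CATS"}
checked = 64
elapsed = time.time() - startTime
report = reportResults(found, checked, elapsed)
print(report)
```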

See below for the detailed deliverables to put into your submission packet for this part.

Analysis questions: thinking about what your program is showing you about the problem

As explained on our first day, this is NOT a "learning programming" class. This is the upper division, we can assume you all know very well how to program, so let's focus on the real meat. As real computer scientists, creating the code is incidental; what really interests us is using a program to explore the nature of a particular problem. Thus, the following write-up asks you to reflect on your experience in programming the solution, and use it to explore this problem more deeply. In keeping with our emphasis on thinking over programming, the write-up will be worth a substantial amount (up to 50%) of the total points, so be sure to leave time for it in your time planning! Your write-up should be professionally neat, with clearly labeled answers to each of the following:

A clear description of your algorithm, i.e., the approach/strategy that your code takes in solving the problem. Being able to clearly outline an algorithmic approach is a valuable communication skill in our business, and demonstrates the extent to which you truly understand what you're doing. Do NOT walk through your functions/code in low-level detail! You need to describe how your program solves the problem abstractly, as in the key features of the approach and the steps the program goes through in executing it.
Answers to the following "thinking" questions. Professionally present your answers to each as clear narrative interspersed with graphs, figures, equations and whatever else you need.
1. Observe your own results: Run your solver on several 4x4, 3x3, and 2x2 boards. How many different words did your solver explore on each? How much time was taken on each? Now analyze: Come up with a curve showing your results. Then use this to predict the time/moves it would take to explore a 5x5 board.
2. Analyze the problem generally: How many possible combinations of letters (i.e. actual words or not) can be constructed from an NxN board? Walk through your reasoning carefully, showing how your value comes together. Let's keep it simple just to get a decent upper bound without needing a PhD in combinatorics: Ignore detailed paths possible on the board and just assume that every letter on the board could be chained with every other letter on the board...how many words could be made that way? How does your analysis match up with your empirical findings in the last question?
3. Use your solver to solve at least 10-20 different boards, then ponder the solution stats you got. Based clearly on your observations, consider the following: Suppose there is a Boggle competition where human players are given a sequence of boards to solve, and the time they have to do so decreases with each board. Now examine the outcomes from the boards that you've run your solver on. What strategy for finding words would a "smart" (or as we'll call it in this course, "rational") player employ to maximize points in a time-limited setting? Don't just speculate; support your answer clearly with your empirical results!
4. In the sample output provided above, there are two runs on the same board shown, with/without "cleverness" turned on...with drastic differences in time/resources used to find identical results. What the heck could that devious Dr. D be doing here to achieve this? Magic? Hint: put in some print statements to watch your program work...and then reflect on the implications of where effort is wasted. The difference between the two outputs in my source code is exactly one line of code...plus another easily-created resource.
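For the simplified upper bound in question 2, one way to read "every letter could be chained with every other letter" is to count ordered chains of distinct tiles of every length from 1 up to N*N. The sketch below does that arithmetic under that reading; countLetterChains is a hypothetical name, and you still need to present and justify the reasoning yourself.

```python
def countLetterChains(boardSize):
    """Count ordered chains of 1..N*N distinct tiles, ignoring adjacency."""
    tiles = boardSize * boardSize
    total = 0
    chains = 1
    for length in range(1, tiles + 1):
        # Extending chains of length-1 tiles by one of the remaining unused tiles:
        # tiles * (tiles - 1) * ... * (tiles - length + 1)
        chains *= tiles - length + 1
        total += chains
    return total


for n in (2, 3, 4):
    print(n, countLetterChains(n))
```

For a 2x2 board this gives 4 + 12 + 24 + 24 = 64, and for a 3x3 board 986,409. Real boards of 3x3 and up allow fewer chains than this, because adjacency rules out many of them.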

To turn in:

A professional packet with the following items in exactly this order for each part:

Part 1 packet:

Cover sheet: Name, course, assignment title, date
Printouts of your program running the dynamically assigned boards. THIS LINK to the testing boards will be activated shortly before the due time. Test boards will be no larger than 4x4.
Your fully-commented code (may be duplex printed). All code must be clearly readable, nothing cut off, no hard-to-read screenshots!

Part 2 packet:

Cover sheet: Name, course, assignment title, date
Your analysis questions: questions presented in the order given, each question clearly labeled with the question text, then your answer to it. Nice and clean and readable.
Printouts of your program running the dynamically assigned boards. THIS LINK to the testing boards will be activated shortly before the due time.
Your fully-commented code (may be duplex printed).

Important: Make sure all printouts in the packet are clear and 100% readable. If I can't read it, I can't score it!