Archive for April, 2006

Caledon Golf Country Club

Friday, April 28th, 2006

I’ve been golfing for the better part of 10 years now. Although I’ve only played courses within Ontario, I have experienced some of the best this province has to offer. At the top of my list are Lionhead’s Masters and Legends courses, which I’ve played a handful of times.

Yesterday, a friend and I played Caledon Country Club in Inglewood, Ontario. The course is about 25 minutes north of Mississauga, set on open country land. It is a par 71 stretching around 6000 yards. You have the option of walking or using a cart, but with so many hills and the holes so far apart from each other, consider spending the extra money on a cart.

The course brings water into play often (10 of the 18 holes), most notably on number 9 (par 4, 298 yards), which coincidentally is called ‘Go For Broke’. Enough said. Club selection is vital on a course like this, since narrow fairways, bushes, trees and water come into play very often. A word of advice: stay out of the bunkers. Most have huge lips and are unforgiving.

The course is pretty pricey at around $70 CAD and was rated the 14th best course in the GTA. Overall, given that it’s still April and we had a ton of rain last weekend, the course was in above-average condition.

Caledon Club House

Caledon Hole 1

More pictures here

iPod your car for $15

Monday, April 24th, 2006

Most new cars and after-market decks give you the option of attaching an auxiliary device (Line In). In most cases people connect a CD changer, or something else along those lines.

I wanted to connect my Nano to my car, but definitely did not want to pay for an external device like the Alpine iPod interface. Since my deck has a line in, I went to The Source and purchased a shielded Y-adapter, which is nothing more than a two phono plug (RCA) to stereo plug converter ($15 CAD). Now I connect the iPod through its headphone jack.

You can even be creative in where you put the iPod. Here are a few pics:

[Images: two photos of the iPod in the car]

Note: I needed to buy an extra adapter specific to JVC decks ($20 USD via eBay). With the above quick fix, you still need to change songs using your iPod. Since it’s just a headphone jack connecting to the iPod, you can connect any stereo output device (other MP3 players, CD players, etc.).

McMaster Kipling 2006

Sunday, April 16th, 2006

Rudyard Kipling once said:

Gold is for the mistress, silver for the maid
Copper for the craftsman, cunning at his trade
‘Good!’ said the Baron sitting in his hall
But Iron, cold Iron, is master of them all!

Every year at McMaster (and other schools), the ‘Ritual of the Calling of an Engineer’ takes place (also known as Kipling). Before the Kipling ceremony, which was held at Liuna Station, the ceremonial pranks are performed. One of my thesis team members was up bright and early on March 31st and went around campus armed with his digital camera. He has a great post on his blog giving more detail about Kipling and showing some of the pranks from this past year.

Yahoo’s bet on the future of the internet

Sunday, April 16th, 2006

Yahoo! is making a strong bet that the future of the internet is as a social application. They realize that people care less about the search results themselves than about how those results are delivered. While Google, Microsoft, etc. compete on search, Yahoo is steadily creating and buying a network of applications (Buzz, Answers, Flickr, etc.) that use the internet not only to connect people, but also to pool the knowledge of individuals into a synergy that could potentially be greater than anything horizontal search could provide.

One thing remains, though: for Yahoo’s plan to work, they need to commoditize search. The catalyst is Nutch, an open source search engine started by Doug Cutting (creator of Lucene). The project was sponsored by Overture (Yahoo) for a while, and as of January 1st, 2006, Yahoo! has hired Doug full-time:

On the first of this year, after four years as an independent contractor, I accepted a full-time job with Yahoo!. This isn’t as big of a change as it sounds. For much of the past four years my work on Nutch had been in-part funded by Yahoo! (and Overture before they were acquired by Yahoo!). I’m still primarily working from home, and, so far, entirely working on open-source stuff: Lucene, Hadoop and Nutch. The biggest change is that I don’t have to draft contracts, submit invoices, etc. I can now instead better focus on the technology and the open-source process.

This is brilliant on Yahoo’s part: make search so widely available (and so easy to start up) that eventually thousands of vertical search engines will exist. The trend has already started; tons of vertical search engines have popped up (with venture capital funding), krugle.com being one example.

k-means and EM clustering algorithms

Sunday, April 16th, 2006

The final data mining assignment (which was due in the middle of exams) was to implement the k-means and EM algorithms in C. These are two pretty simple clustering algorithms, and k-means (or variations of it) is used in industry all the time. Both implementations work in two dimensions and take an input file of data points separated by a space.

K-means works as follows:

  1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
  2. Assign each object to the group that has the closest centroid.
  3. When all objects have been assigned, recalculate the positions of the K centroids.
  4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
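
As a rough illustration, the four steps above could be sketched in Python as follows. This is not the assignment's C code: the function name `kmeans`, its parameters, and the choice to seed the centroids with randomly picked input points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points (a list of (x, y) tuples); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: use k distinct input points as the initial centroids
    # (an assumption; any placement inside the data's space would do).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # An empty group keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on a list of `(x, y)` tuples returns the final centroids along with one group index per point. The result can depend on the initial centroids, which is why the seed is exposed as a parameter.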

The EM algorithm is a bit more complicated, but follows the same general idea:

[Image: the EM algorithm]
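
To make the parallel with k-means concrete, here is a minimal Python sketch of EM. The post does not say which model the assignment used, so this assumes a mixture of spherical 2-D Gaussians (one variance per cluster); the function name `em_gmm` and its parameters are made up for the example. Where k-means assigns each point to exactly one centroid, the E-step here gives each point a fractional weight in every cluster, and the M-step recomputes the cluster parameters from those weights.

```python
import math


def em_gmm(points, means, iters=50):
    """Fit a mixture of spherical 2-D Gaussians, starting from initial `means`."""
    k, n = len(means), len(points)
    means = [tuple(m) for m in means]
    weights = [1.0 / k] * k   # mixing proportions
    variances = [1.0] * k     # one variance per cluster (assumed spherical)
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for x, y in points:
            dens = []
            for j in range(k):
                d2 = (x - means[j][0]) ** 2 + (y - means[j][1]) ** 2
                dens.append(weights[j] * math.exp(-d2 / (2 * variances[j]))
                            / (2 * math.pi * variances[j]))
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty cluster untouched
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            var = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                      for r, p in zip(resp, points)) / (2 * nj)
            weights[j] = nj / n
            means[j] = (mx, my)
            variances[j] = max(var, 1e-6)  # keep the variance strictly positive
    return weights, means, variances, resp
```

The returned `resp` holds the soft assignments from the last E-step; taking the largest entry in each row gives a hard clustering comparable to the k-means output.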

Since this was in the middle of exams, I wrote both solutions (code) in one night (7pm to 3am), so there are no guarantees whatsoever. Take a look; if you’d change anything, let me know. The code is licensed under GPL v2.

Apriori Online – Mining for Associations

Wednesday, April 5th, 2006

Why are associations interesting? For the same reason data mining in general is interesting: you can learn things you would never have thought of. Take Shoppers Drug Mart as an example. Don’t quote me on this, but I bet when you buy a bunch of goods (and swipe your Shoppers Optimum Card), some database is registering your personal information (age, sex, location, etc.) and everything you purchased (Tylenol, Kleenex, gum, etc.).

With high confidence (no Apriori pun intended), I can say that Shoppers Drug Mart periodically mines that data for interesting associations, like perhaps:

- Males between the ages of 25 and 45, between the hours of 7:00pm and 12:00am, purchase toilet paper and Mach 3 razors.

What would Shoppers do? In the evenings they would set up an aisle display of expensive toilet paper. At the front of that aisle (as the customer is walking to the cashier) they would set up another display of Mach 3 razors, in 10 packs. They’ll do this because they’ve found associations which tell them, with high confidence, that males will purchase these two items together.

For a data mining project this past semester, one of the deliverables we created was a web-based version of the Apriori algorithm. The algorithm was initially developed by Agrawal et al. (Agrawal 93, Agrawal 94) and is used to find association rules from a dataset. This online version of the Apriori algorithm is based on an implementation by Christian Borgelt.
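
For readers new to association rules, the two numbers attached to a rule (support and confidence) can be sketched in a few lines of Python. These are the usual textbook definitions, a given implementation may define rule support slightly differently, and the baskets below are hypothetical data invented purely for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule says nothing
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical baskets, for illustration only.
baskets = [
    {"toilet paper", "razors"},
    {"toilet paper", "razors", "gum"},
    {"toilet paper"},
    {"gum"},
]

rule_support = support(baskets, {"toilet paper", "razors"})
rule_confidence = confidence(baskets, {"toilet paper"}, {"razors"})
```

Here the pair appears in 2 of the 4 baskets, and 2 of the 3 baskets containing toilet paper also contain razors. Apriori's job is to find every rule whose support and confidence clear the chosen minimums without enumerating all possible item combinations.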

The URL is http://apriori.sematopia.com. Aside from simply executing Christian’s Apriori implementation, the web-based application runs a series of pre-processing steps on the data. To begin, all missing values in a given numeric column are filled in with the average of that attribute. Then all numeric attributes are categorized, placing each column value in a bin. This is governed by the Category Granularity (how many categories to make in a given column) and the Categorization Threshold (if the number of unique values in the column is less than this number, categorization is not performed). Many other options can be set to adjust the results Apriori returns; these include the max/min support, the min confidence, the min sets per rule, etc.
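
The two pre-processing steps can be sketched roughly as below. The post does not say how bin boundaries are chosen, so equal-width bins are an assumption here, as are the function names and the use of `None` to mark a missing value; the site's actual code may well differ.

```python
def fill_missing(column):
    """Replace None entries with the mean of the known values.

    Assumes the column has at least one known value.
    """
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def categorize(column, granularity, threshold):
    """Map each value to a (bin_number, low, high) triple.

    Uses equal-width bins (an assumption). The column is returned unchanged
    when it has fewer than `threshold` unique values.
    """
    if len(set(column)) < threshold:
        return list(column)
    lo, hi = min(column), max(column)
    width = (hi - lo) / granularity or 1.0  # avoid zero width for constant columns
    binned = []
    for v in column:
        # Clamp so the maximum value falls into the last bin.
        b = min(int((v - lo) / width), granularity - 1)
        binned.append((b + 1, lo + b * width, lo + (b + 1) * width))
    return binned
```

With `granularity` playing the role of Category Granularity and `threshold` the Categorization Threshold, a column with only a handful of distinct values is left alone, while a wide-ranging numeric column is reduced to a small number of bins.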

The uploaded file must be in CSV format, with the first row as headings and each subsequent row an observation. The max file size that can be uploaded is 200,000 bytes. After the algorithm finishes (which should take less than 3 seconds), the first 100 association rules will be shown. You can download the full output from the website.

The associations produced will look like this:

verbal_sat=c3_g7(442,469) <- math_sat=c2_g7(500,530) act=c4_g6(21,23) (36.1, 93.0)

c3 means column 3
g7(442,469) means group (bin) 7, where this specific bin covers values between 442 and 469
(36.1, 93.0) at the end of each rule gives the support and confidence
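
If you download the full output and want to work with it programmatically, a rule line in this format could be pulled apart with a small parser like the one below. The regular expressions are inferred from the single example line above, so treat them as an assumption; they only handle categorized (binned) attributes, and `parse_rule` is a hypothetical helper name.

```python
import re

# name=c<column>_g<bin>(<low>,<high>)
ITEM = re.compile(r"(\w+)=c(\d+)_g(\d+)\(([^,]+),([^)]+)\)")
# Trailing "(support, confidence)" pair at the end of the line.
STATS = re.compile(r"\(([\d.]+),\s*([\d.]+)\)\s*$")


def parse_rule(line):
    """Split one rule line into head items, body items, support and confidence."""
    stats = STATS.search(line)
    if stats is None or "<-" not in line:
        raise ValueError("not a rule line: %r" % line)
    head_text, body_text = line[:stats.start()].split("<-", 1)

    def items(text):
        return [
            {"attribute": name, "column": int(col), "bin": int(grp),
             "low": float(lo), "high": float(hi)}
            for name, col, grp, lo, hi in ITEM.findall(text)
        ]

    return {
        "head": items(head_text),
        "body": items(body_text),
        "support": float(stats.group(1)),
        "confidence": float(stats.group(2)),
    }


example = ("verbal_sat=c3_g7(442,469) <- math_sat=c2_g7(500,530) "
           "act=c4_g6(21,23) (36.1, 93.0)")
rule = parse_rule(example)
```

For the example line, `rule["head"]` holds the single `verbal_sat` item, `rule["body"]` holds the `math_sat` and `act` items, and the last two fields carry the numbers from the trailing parentheses.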

Each IP address will be allowed to run 5 datasets per day.
Good luck, happy mining.
http://apriori.sematopia.com

Thesis Design Project: Robot Java Code

Monday, April 3rd, 2006

One of my main jobs on this project was to write the code that controls the robot.
The code is written in Java, as our robot’s OS was leJOS.

Features of the code include:

• A single control loop (as opposed to multiple threads and listeners)
• A great ‘stay straight’ algorithm which keeps the robot centered in the network, through 2 light sensors
• Transmission to and from the host component for tasks and logging information
• A task-driven control structure
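
To give a feel for the first two features, here is a small Python sketch of a single control loop with a two-sensor steering correction. It is not the robot's Java/leJOS code: it assumes the two light sensors straddle a dark line on a lighter floor and uses a simple proportional correction, and every name in it (`steering`, `control_loop`, `base_power`, `gain`, the `"stop"` task) is hypothetical.

```python
def steering(left_light, right_light, base_power=5, gain=0.1):
    """Return (left_power, right_power) nudging the robot back toward the line.

    Assumes lower readings are darker. If the left sensor reads darker, the
    line is under the left side, so the left motor slows and the robot turns left.
    """
    correction = gain * (left_light - right_light)
    return base_power + correction, base_power - correction


def control_loop(read_sensors, set_motors, tasks):
    """Single loop: handle one task per pass, with no threads or listeners."""
    for task in tasks:
        if task == "stop":
            set_motors(0, 0)
            break
        left, right = read_sensors()
        set_motors(*steering(left, right))
```

Passing the sensor reader and motor setter in as plain functions keeps the loop testable off the robot; on real hardware they would wrap whatever sensor and motor calls the platform provides.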

You can download the code here. To learn more about our final product go here.
