Archive for the 'Projects' Category


Hacking together a map on demand – Google Maps Pivot

Sunday, February 4th, 2007

The other day I wanted to plot multiple points on a map and have them pivot around a center. On top of that, I wanted each point’s distance from a specified center. I looked around quickly, couldn’t find anything that did this, so I hacked one together using the Google Maps API. An hour later, I had a simple app I’m calling Pivot (because that’s what it does). The only thing that took a while was realizing the Maps API has an approximate distance calculator built in; it works well for short distances.
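For anyone curious what a distance-from-center calculation can look like outside the Maps API, here is a minimal Python sketch. It uses the standard haversine great-circle formula, which is an assumption on my part and not necessarily what the Maps API does internally. The function names and the mean Earth radius constant are hypothetical choices for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_from_center(center, points):
    """Return (name, distance_km) pairs sorted by distance from the center.

    center is a (lat, lon) tuple; points maps a name to a (lat, lon) tuple.
    """
    lat0, lon0 = center
    pairs = [(name, haversine_km(lat0, lon0, lat, lon)) for name, (lat, lon) in points.items()]
    return sorted(pairs, key=lambda pair: pair[1])
```

Sorting by distance makes it easy to label the plotted points from nearest to farthest.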

The Google Group for Maps is great, with lots of information, and the API docs are detailed.

Below is a screen shot. If you need it, give it a shot.

Google Maps Pivot

Thesis Design Project: Robot Videos on Google Video

Sunday, July 2nd, 2006

This is the last thing I’ll do for this project. It’s been fun and we made a great product, but this is it. I put the 3 robot videos on Google Video; take a look if you like. The first video is rescuing 3 victims (eggs), the second video is blockage clearing, and the third video is security camera footage of the Gates’ rescue. You can read more about the project here.





Apriori Online – Mining for Associations

Wednesday, April 5th, 2006

Why are associations interesting? For the same reason data mining in general is interesting: you can learn things you would never have thought of. Take Shoppers Drug Mart as an example. Don’t quote me on this, but I bet when you buy a bunch of goods (and swipe your Shoppers Optimum Card) some database is registering your personal information (age, sex, location, etc.) and everything you purchased (Tylenol, Kleenex, gum, etc.).

With high confidence (no Apriori pun intended) I can say that periodically Shoppers Drug Mart mines that data for interesting associations, like perhaps:

- Males between the ages of 25 and 45, between the hours of 7:00pm to 12:00am purchase toilet paper and Mach 3 razors.

What would Shoppers do? In the evenings they would set up an aisle display of expensive toilet paper. At the front of that aisle (as the customer is walking to the cashier) they would set up another display of Mach 3 razors… 10 packs. They’ll do this because they’ve found associations which tell them, with high confidence, that males will purchase these two items together.
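To make “high confidence” concrete, here is a minimal Python sketch of the two standard measures behind an association rule: support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The function names are hypothetical, and any baskets passed in are made-up examples, not real Shoppers data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction also containing `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is never triggered
    return support(transactions, set(antecedent) | set(consequent)) / base
```

A rule such as “toilet paper implies razors” would only be worth a display if both numbers clear the thresholds chosen for the mining run.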

For a data mining project this past semester, one of the deliverables we created was a web-based version of the Apriori algorithm. The algorithm was initially developed by Agrawal et al. (Agrawal 93, Agrawal 94) and is used to find association rules in a dataset. This online version of the Apriori algorithm is based on an implementation by Christian Borgelt.

The URL is http://apriori.sematopia.com. Aside from simply executing Christian’s Apriori implementation, the web-based application runs a series of pre-processing steps on the data. To begin, all missing values in a given numeric column are filled in with the average of that attribute. Then all numeric attributes are categorized, placing each column value in a bin. This is controlled by the Category Granularity (how many categories to make in a given column) and the Categorization Threshold (if the number of unique values in the column is less than this number, categorization is not performed). Besides Category Granularity and Categorization Threshold, many other options can be set to adjust the results Apriori returns; these include the max/min support, the min confidence, the min sets per rule, etc.
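The two pre-processing steps can be sketched in Python as below. This is only an illustration under stated assumptions: missing values are represented as None, and the bins are equal-width, which may not match how the site actually draws its bin boundaries. The function names are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries in a numeric column with the mean of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, leave the column untouched
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def categorize(values, granularity, threshold):
    """Map each value to a bin index in the range 0..granularity-1.

    Columns with fewer than `threshold` unique values are returned unchanged.
    Equal-width bins are an assumption made for this sketch.
    """
    if len(set(values)) < threshold:
        return list(values)
    low, high = min(values), max(values)
    width = (high - low) / granularity
    if width == 0:
        return [0] * len(values)  # every value is identical
    # The maximum value would land one past the last bin, so clamp it.
    return [min(int((v - low) / width), granularity - 1) for v in values]
```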

The file uploaded must be in CSV format with the first row as headings, and each subsequent row being observations. The max file size that can be uploaded is 200,000 bytes. After the algorithm finishes (which should take less than 3 seconds) the first 100 association rules will be shown. You can download the full output from the website.

The associations produced will look like this:

verbal_sat=c3_g7(442,469) <- math_sat=c2_g7(500,530) act=c4_g6(21,23) (36.1, 93.0)

c3 means column 3
g7(442,469) means group (bin) 7, where this specific bin covers values between 442 and 469
(36.1, 93.0) at the end of each rule gives the support and confidence
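If you download the full output and want to post-process it, a rule line in the format shown above can be pulled apart with a couple of regular expressions. This is a Python sketch based only on the single example line in this post, so other output variants may need a looser pattern. The function names and dictionary keys are hypothetical.

```python
import re

# One item, e.g. "math_sat=c2_g7(500,530)"
ITEM = re.compile(r"(\w+)=c(\d+)_g(\d+)\(([-\d.]+),([-\d.]+)\)")
# Trailing "(support, confidence)" pair at the end of the line
TAIL = re.compile(r"\(([\d.]+),\s*([\d.]+)\)\s*$")


def parse_item(text):
    """Parse one 'name=cN_gM(low,high)' item into a dictionary."""
    match = ITEM.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised item: %r" % text)
    name, column, group, low, high = match.groups()
    return {"attribute": name, "column": int(column), "bin": int(group),
            "low": float(low), "high": float(high)}


def parse_rule(line):
    """Split a rule line into its consequent, antecedent items, support and confidence."""
    tail = TAIL.search(line)
    if tail is None:
        raise ValueError("no (support, confidence) pair found")
    head, _, body = line[:tail.start()].partition("<-")
    return {
        "consequent": parse_item(head),
        "antecedent": [parse_item(token) for token in body.split()],
        "support": float(tail.group(1)),
        "confidence": float(tail.group(2)),
    }
```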

Each IP address will be allowed to run 5 datasets per day.
Good luck, happy mining.
http://apriori.sematopia.com

Thesis Design Project: Robot Java Code

Monday, April 3rd, 2006

One of my main jobs when working on this project was to write the code that controls the Robot.
The code is written in Java, as our Robot’s OS was leJOS.

Features of the code include:

  • A single control loop (as opposed to multiple threads and listeners)
  • A great ‘stay straight’ algorithm which keeps the robot centered in the network, through 2 light sensors
  • Transmission to and from the host component for tasks and logging information
  • A task driven control structure
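To give a feel for how a single control loop, a task-driven structure and a two-sensor “stay straight” correction can fit together, here is a small simulation in Python. It is not the robot’s Java code. The proportional correction, the gain value, the assumption that a higher reading means the robot has drifted toward that side, and all names are hypothetical.

```python
from collections import deque


def steering_correction(left_light, right_light, gain=0.5):
    """Proportional correction from two light readings.

    Assumes a higher reading on one side means the robot drifted toward that side,
    so a positive result means "steer right" and a negative one "steer left".
    """
    return gain * (left_light - right_light)


def run(tasks, read_sensors, max_ticks=1000):
    """Single control loop: poll the sensors once per tick and feed the current task.

    Each task is a callable taking the steering correction and returning True when done.
    Returns a log of (task name, correction) entries, one per tick.
    """
    queue = deque(tasks)
    log = []
    ticks = 0
    while queue and ticks < max_ticks:
        task = queue[0]
        left, right = read_sensors()
        correction = steering_correction(left, right)
        finished = task(correction)
        log.append((task.__name__, correction))
        if finished:
            queue.popleft()  # move on to the next task
        ticks += 1
    return log
```

Polling everything from one loop avoids having to coordinate the threads that sensor listeners would otherwise spawn.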

You can download the code here. To learn more about our final product go here.

Thesis Design Project: Robot Pictures

Tuesday, March 28th, 2006

A colleague and I took a few final photos of our robot today.
You can read more about our final product here.

Update: Proof of Concept 3 pictures can be seen here.
Update: Proof of Concept 2 pictures can be seen here.
Update: Read about the final product here.
Update: Videos of the robot can be seen here.

Robot Pic1, Robot Pic2, Robot Pic3, Robot Pic4, Robot Pic5

Thesis Design Project: Done

Wednesday, March 22nd, 2006

This past Sunday was the final competition in front of the IBM judges for my team’s thesis design project. The project consisted of making an autonomous robot (out of Lego) that would go into a network and rescue hollow eggs.

There were 2 parts to the project. The first is the graphical user interface, which uses a map of the network to determine execution paths, transmit to and receive from the robot, etc. The second part is designing the actual robot. Both components had a lot of effort put into them, and collectively we are very proud of the final product.

Our company name was MBI (MicroBot Initiative) and the product name was nexSAR (Next Generation Search and Rescue). Highlights of the finished product?

  • Capable of rescuing 3 eggs in under 25 seconds (using the network provided).
  • Custom designed ‘stay straight’ algorithm, which polls light sensors to remain centered in the hallway.
  • Upon exiting the network, the robot uploads its log, and the GUI creates a VRML (3D) log re-enactment. This is complete with victim voices, which get louder as the robot nears.
  • Ubuntu Live CD. Just put the CD in the CD drive, reboot, and all the software you need is ready to go. We have a live CD for both PowerPC and for Intel.
  • USB Rescue Key. Capable of booting to DSL (Damn Small Linux) straight from a USB drive. This also has all the software pre-loaded and ready to go.
  • Complete GUI with map editor and path generation.

Here are the slides that we used in our final presentation.
The slides use 3 videos of the Robot in action: Video1, Video2, Video3 (sorry, 2 of the videos are large, ~50 MB).

Update: Final photos of the Robot (taken March 28/06) can be seen here.
Update: The Robot Java code can be downloaded here (April 4/06).
Update: The videos above have been uploaded to Google Video, you can see them here (July 2/06).

Here is a screen shot of the GUI:

Here is a screen shot of the VRML log re-enactment:

How to: Apache logging using name-based virtual hosts

Saturday, February 25th, 2006

I spent a bit of time today getting logging to work correctly on my server. The server uses name-based virtual hosts, so getting logging set up was going to take an extra step. The first thing was to add another LogFormat line to the apache2.conf file, defining a format nicknamed comonvhost.

Code:
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" comonvhost
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

Adding this LogFormat line lets Apache distinguish between the various virtual hosts when logging, because %v writes the virtual host’s server name at the start of each entry. To learn more about what the options mean (e.g. %u), you can look here. Each virtual host was given its own log file, so in each VirtualHost block a CustomLog line was added, as in the following:

Code:
ServerName www.sematopia.com
DocumentRoot /var/www/sematopia.com
ServerAlias sematopia.com *.sematopia.com sematopia.net *.sematopia.net
CustomLog /var/log/apache2/sematopia/access.log comonvhost

Then for all other VirtualHost directives, a similar line was added. Once that’s done, delete or move the old logs in /var/log/apache2. Then run:

Code:
apache2 -k restart

to restart Apache. Remember, any time you delete your log files or change your apache2.conf file, restart Apache.
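Once the virtual host name is in each entry, the logs are easy to process with a script. Here is a Python sketch that parses lines written in the comonvhost format above and counts hits per virtual host. The regular expression is a simplification: it assumes no escaped quotes inside the request, referer or user-agent fields. The function names are hypothetical.

```python
import re
from collections import Counter

# Mirrors: %v %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LINE = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dictionary, or None if it does not match."""
    match = LINE.match(line)
    return match.groupdict() if match else None


def hits_per_vhost(lines):
    """Count how many well-formed log lines belong to each virtual host."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["vhost"]] += 1
    return counts
```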

XML Mail – Batch sending and easy logging

Thursday, February 9th, 2006

I ended up purchasing a relay service from Nettica.com, so this batch emailing idea is not going to be of much use to me. In any case, I think it’s a pretty good way to control emails. If your site sends many emails per day, then a batch email setup might be a good option.

Using a batch emailing method, first off, allows for an inherent logging ability built into the process. Every time the application wants to send an email, it would write the email (XML) to send.xml. Periodically, from an internal or external server, the send.xml file is processed, and then automatically backed up. This can be extended to eventually gzip the XML files, etc.

Secondly, if your site sends many emails, you may not want to consume resources (in real time) sending them on demand. Instead, sending all queued emails every 5 minutes, say, could improve server performance.

Typical usage would look like this:

Code:
$mailObj = new xmlMail();
$mailObj->getHandle();
$mailObj->writeEmail($to,$cc,$bcc,$from,$from,$subject,$body);
$mailObj->closeHandle();

See the PHP Class here for more details.

Design Project: POC 3

Tuesday, February 7th, 2006

We’ve come a long way from the last design of our robot. We have a bunch of ideas on the go, and I have some algorithms in the works for keeping the robot centered in the hallway network. We have our third proof of concept on Friday. For this, we have to upload a map, make a turn, rescue an egg and exit successfully. Here are some pics of our current design (taken with my cell phone, so not great resolution).

poc3_3.jpeg

poc3_2.jpeg

poc3_1.jpeg

How to: Easily parse XML with PHP

Sunday, January 29th, 2006

There are so many packages and scripts online to help a person parse XML. Unless you’re parsing some insanely complex files, there is no need. It’s as easy as this:

Code:
$batch = new DOMDocument();
$batch->load($this->webPath);
$emails = $batch->getElementsByTagName("email");
foreach( $emails as $email ) {
  $to = $email->getElementsByTagName("to")->item(0)->nodeValue;
  $cc = $email->getElementsByTagName("cc")->item(0)->nodeValue;
  $bcc = $email->getElementsByTagName("bcc")->item(0)->nodeValue;
  $from = $email->getElementsByTagName("from")->item(0)->nodeValue;
  $subject = $email->getElementsByTagName("subject")->item(0)->nodeValue;
  $body = $email->getElementsByTagName("body")->item(0)->nodeValue;
}
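The same idea carries over to other languages with a DOM-style or tree-style parser in the standard library. As a comparison, here is a Python sketch using xml.etree.ElementTree. The surrounding root element and the function name are assumptions made for the example. Missing or empty fields come back as empty strings.

```python
import xml.etree.ElementTree as ET

FIELDS = ("to", "cc", "bcc", "from", "subject", "body")


def parse_emails(xml_text):
    """Return one dictionary per <email> element found in the XML text."""
    root = ET.fromstring(xml_text)
    emails = []
    for node in root.iter("email"):
        # findtext returns the default when the child element is absent
        emails.append({field: node.findtext(field, default="") for field in FIELDS})
    return emails
```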

How to: Embed HTML within XML

Sunday, January 29th, 2006

I spent some time today trying to figure out how to embed HTML within XML. The issue is obvious: when the XML is parsed, the parser can easily mistake the HTML tags for child XML tags.

I thought about using urlencode(), or perhaps replacing every ‘<’ and ‘>’ with &lt; and &gt;. Then it hit me: the XML standard must have thought about this. So I checked the XML 1.0 standard, and it does have a solution:

CDATA sections begin with the string "<![CDATA[" and end with the string "]]>".
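Here is a small Python sketch of the round trip: wrap an HTML body in a CDATA section, then read it back with a standard parser. The element names and function names are hypothetical. The one catch handled here is that the payload itself must not contain the end marker, so any occurrence is split across two CDATA sections.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def wrap_cdata(html):
    """Wrap text in a CDATA section, splitting any embedded end marker across two sections."""
    safe = html.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + safe + CDATA_END


def build_email_xml(body_html):
    """Build a tiny XML document whose <body> carries raw HTML."""
    return "<email><body>" + wrap_cdata(body_html) + "</body></email>"


def read_body(xml_text):
    """Parse the document and return the <body> text with the HTML intact."""
    return ET.fromstring(xml_text).findtext("body")
```

The parser hands back the HTML as plain character data, so the tags never become child elements.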

Using XML to store/send emails

Thursday, January 26th, 2006

This has been a rough week: 4 assignments, an MIS, etc. I made that phpPostFix script. It’s very simple, but I won’t be able to use it. My main server with GoDaddy is terrible. It’s running an old version of PHP (4.2). This version does not have file_get_contents as a function, so to make up for it I set up the PEAR Compat replacements a while ago. To make a long story short, for some reason the PEAR version of file_get_contents used under PHP 4.2 cannot accept URLs with more than 1 parameter.

I know what you’re thinking: “Just upgrade your version of PHP”. You obviously have never used a GoDaddy product. This is not an easy task, especially since their servers are married to Plesk and all the crap that comes with it. On top of all that, the main server is fully deployed, and I can’t risk upgrading a shady setup. Anyway, here is the script that I wrote. It does the job as explained in the previous post.

Using the script is as easy as the following:

Code:
$to = "[email protected]";
$subject = "Test email";
$query_string = 'to='.urlencode($to).'&subject='.urlencode($subject);
// Pass the query string unchanged: htmlentities() would turn "&" into "&amp;" and break the parameters.
file_get_contents("http://www.someserver.com/phpPostFix.php?".$query_string);

I still need to be able to send email from the development server, so I spent some time thinking about it earlier this week. I’m going to write the emails that need to be sent to an XML file (send.xml). The schema will be as follows:

Code:
<email>
<to></to>
<cc></cc>
<bcc></bcc>
<subject></subject>
<body></body>
</email>

Every 5 minutes the main server will read the XML file, parse it, and send the emails based on the results. Once the emails are read, it will clear the original XML file (or append its contents to a separate backup.xml file). At this point any new emails will be written to a fresh send.xml. Sending in batches makes intuitive sense, especially since the development server could be sending hundreds of emails a day. Let’s hope this goes without a problem.
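The queue-then-drain cycle described above can be sketched in Python as follows. Several things here are assumptions for illustration: the file holds a plain sequence of email elements with no root (so a temporary wrapper element is added before parsing), only three of the fields are written, the send step is a callback supplied by the caller, and the function names are hypothetical. There is also no locking, so an email written between the read and the clear would be lost in this simplified version.

```python
import os
import xml.etree.ElementTree as ET


def queue_email(path, to, subject, body):
    """Append one <email> element to the send file."""
    email = ET.Element("email")
    for tag, value in (("to", to), ("subject", subject), ("body", body)):
        ET.SubElement(email, tag).text = value
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(ET.tostring(email, encoding="unicode") + "\n")


def drain(path, backup_path, send):
    """Send every queued email, append the raw batch to the backup, then clear the queue.

    `send` is called as send(to, subject, body). Returns the number of emails processed.
    """
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as handle:
        raw = handle.read()
    # The file has no single root element, so wrap it before parsing.
    root = ET.fromstring("<batch>" + raw + "</batch>")
    count = 0
    for node in root.iter("email"):
        send(node.findtext("to", ""), node.findtext("subject", ""), node.findtext("body", ""))
        count += 1
    with open(backup_path, "a", encoding="utf-8") as backup:
        backup.write(raw)
    open(path, "w", encoding="utf-8").close()  # start a fresh, empty send file
    return count
```

Running drain from a scheduled job every 5 minutes would give the batch behaviour, with backup.xml doubling as the send log.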

My solution to port 25 being blocked

Thursday, January 19th, 2006

So I’ve spent the last 2 days straight trying to get a full-scale mail server running on my development Ubuntu server. I had heard it wasn’t going to be easy, but it needed to happen. The plan was to go all out (Postfix + Courier IMAP + Amavisd-new + SpamAssassin + ClamAV + SASL + TLS + SquirrelMail + Postgrey). At around 7:00pm today, when I was completely stumped as to why it wasn’t working, it hit me: I bet my ISP has port 25 blocked. A quick search on Google proved me right.

Depressed and annoyed, I turned off my laptop, went to the library and then to the gym. While reading Data Mining slides it hit me: I have my main dedicated server at GoDaddy (not to say they haven’t done me wrong in the past). I was considering using some relay host, but I thought of something better: I’ll write a PHP script that resides on the main server and sends mail the standard way (sendmail). The PHP script (let’s call it phpPostFix) will be accessed as a URL, and its parameters will simply be:
- To:
- From:
- Subject:
- Body:

On the development server any time I have:

Code:
<?php

file_get_contents("http://www.xxxxxx.com/phpPostFix.php?to=[email protected]&from=.....");

?>

Then the email will be sent. Simple as that. I’ll post the script when it’s done.
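One detail worth getting right with this approach is encoding the parameters, since the subject and body will contain spaces, ampersands and other characters that are not safe in a URL. Here is a Python sketch of building such a request URL. The base URL is a placeholder and the function name is hypothetical. It only builds the URL and does not make the request.

```python
from urllib.parse import urlencode


def build_mail_url(base_url, to, sender, subject, body):
    """Return base_url with the four mail parameters appended as an encoded query string."""
    query = urlencode({"to": to, "from": sender, "subject": subject, "body": body})
    return base_url + "?" + query
```

The parameters are joined with a plain "&" and each value is percent-encoded on its own, so an ampersand inside the body cannot be mistaken for a parameter separator.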

Thesis design project: POC 2 Robot Pics

Wednesday, November 30th, 2005

Our proof of concept (POC) was today and thankfully everything went OK (see previous post). I had one of my group members take some pictures of the robot before we disassembled it. Though the design below was good for the POC, it’s not what we need for a final design. With that in mind, we took it apart, and headed back to the drawing board.

Here are some pics of the extractor robot version 0.0002

Update May 11 2006: To read about the final product go here

Lego Mindstorms

RCX Brick

LeJOS on the RCX

Lego Mindstorms

Thesis design project: Proof of Concept 2

Tuesday, November 29th, 2005

Our thesis design project this year is to make an autonomous robot (out of Lego Mindstorms) that will go into a maze and rescue black coloured eggs. This is a year long project and the class is split into teams of 5-6 people (my group has 5).

The robot will be given a text file of the map before it enters the maze. Once in the maze the goal is simple: rescue all the victims as fast as possible. The standard Lego Mindstorms software does not apply here. To handle the complex logic and additional constraints we decided to use LeJOS (an open source Java OS for the Lego RCX brick).

Tomorrow is the second “proof of concept” and I just spent a good 6 hours today getting our robot to work. All the robot has to do tomorrow is go into the maze (in a straight line), detect the egg and rescue it. Seems simple enough… no. First of all, the egg is hollow and thus extremely light. Unless the egg is tied down (by tape or Plasticine), the touch sensor will push the egg without even detecting it. I hear using a light sensor would work better; I’ll try that in the future (or a combination!). The next issue is handling all the threads of execution that get spawned from all the listeners that are set.

Ah well, it works now, and is good for tomorrow. This project is going to take lots of work. Eventually the robot will have to make turns, clear blockages, work with a map (and without), save multiple eggs, etc. Good times.

Three things for sure in this world

Thursday, October 27th, 2005

Three things are for sure: death, taxes, and people want to take your money for nothing. My sites are hosted on a dedicated server with GoDaddy. This is a general warning: GoDaddy (and I’m sure most hosting companies) suck. All they want to do is take your money. No matter what they tell you, service is terrible.

My websites started crashing this past week. Not a normal crash: only the databases failed to connect. So I emailed GoDaddy support, and they replied:

…it appears the amount of traffic you are receiving … exceeding the configuration settings of your web server. There is a “Max Clients” setting within Apache, and the traffic to your website is exceeding this. … We can raise this setting …. fee of $100 to your account. If you wish …. respond to this message with the last 4 digits of a credit card…

First of all, $100 to change something so simple? If anyone has problems with this, send me an email and I’ll do it for a quarter… literally… 25 cents. Secondly, MaxClients default is 10?? What’s wrong with these people? They set the values ridiculously low, so that when you have a problem they can charge you serious money to change them. (All the settings below were set low.)

If anyone has these problems, go to your /etc/httpd/conf/httpd.conf file and change the following settings to the numbers below:

Code:
Timeout 300
KeepAlive On
MaxKeepAliveRequests 150
KeepAliveTimeout 5
MinSpareServers 20
MaxSpareServers 40
StartServers 15
MaxClients 100
MaxRequestsPerChild 75

This will give your servers some decent performance. Play with these values and see what works best. Google the definition of each.

PR: Top student wins stock market challenge

Tuesday, October 18th, 2005

The Disnat DeGroote Stock Challenge recently ended, and McMaster Daily News did an article about it. The article is below — for the original click here: McMaster Daily News

————————————————————————————————

Top student wins stock market challenge

October 17, 2005

Ninety participants from the DeGroote community took part in the 2005 Disnat DeGroote School of Business Stock Market Challenge that saw more than 8,000 stock trades placed over the last few months.

On Tuesday, Oct. 11 in the Gould Trading Floor, a representative from Disnat/Desjardins awarded prizes to the winners of the challenge.

The grand prize winner was third-year commerce student Niroshan Arumugam, who received $500 for achieving the largest portfolio value and cash gain using imaginary cash. Niroshan took an initial ‘pot’ of $50,000 and by the end of the contest grew this to more than $200,000. That rate of return greatly impressed the Disnat officials who sponsored the contest. Arumugam was the winner of the month for July, August and September, receiving $150 each month. The June winner was Kate Morrissey.

During the challenge, which began in May, students tested their investment prowess against that of their professors, DeGroote staff and alumni. Special guest investor, Felix the Cat, challenged the students by making random stock investments.

The prizes were sponsored by Disnat, a division of Desjardins Securities, the brokerage arm of the Desjardins Group. George Papayiannis, a McMaster software engineering and management student, developed the simulation. The web platform is now available free online and is being used in stock challenges across North America.

————————————————————————————————

PR: Can DeGroote investors beat Felix the Cat?

Tuesday, October 11th, 2005

This PR was from May 30 2005, but since a few more are coming, I thought I’d post this one first. To read the original, see McMaster Daily News

————————————————————————————————

Can DeGroote investors beat Felix the Cat?

Investment challenge pits profs, students and others against feline competitor

May 30, 2005

Students can test their investment prowess against that of their professors in the Disnat DeGroote School of Business Stock Market Challenge. DeGroote staff and alumni are also participating. They will be challenged by special guest investor, Felix the Cat, who will make random stock investments.

The challenge begins May 31 and runs until September 30. Although participants compete with imaginary cash, monthly prizes are awarded for the largest portfolio value and cash gain. A final $500 prize is awarded to the participant making the most over the four months.

The prizes for this challenge are sponsored by Disnat, a division of Desjardins Securities, the brokerage arm of the Desjardins Group. Desjardins investment professionals are offering a complimentary investment seminar to all stock market challenge participants in the fall 2005. The simulation was developed by George Papayiannis who is a McMaster software engineering and management student.

The simulation game will be hosted by Stock Boulevard and will be open to all DeGroote students, professors, staff and alumni (DeGroote family members are also welcome). The simulation will last for four months.

Desjardins investment professionals will be offering a complimentary investment seminar to all participants of the stock market challenge in the fall 2005.

To join the stock market simulation visit http://ddsc.stockboulevard.com to register.

————————————————————————————————

Various countries that accessed WebBasedCron.com in August

Wednesday, September 21st, 2005

For a site that produces a small monetary return, it’s a great feeling to see all the different countries that come and use the service. The list below shows the different countries that came to WebBasedCron.com in August.

1 US Commercial
2 Network (cron related)
3 United Kingdom
4 Unresolved/Unknown (most likely cron related)
5 Sweden
6 Australia
7 Denmark
8 Russian Federation
9 Canada
10 United States
11 Netherlands
12 Poland
13 France
14 Italy
15 US Educational
16 Belgium
17 Switzerland
18 Singapore
19 Old style Arpanet (arpa)
20 Saudi Arabia
21 Finland
22 Malaysia
23 Czech Republic
24 Romania
25 Germany
26 South Africa
27 Japan
28 Argentina
29 New Zealand (Aotearoa)
30 Austria

McMaster Greek Society

Thursday, September 8th, 2005

This year I’m the President of the Greek Society at McMaster University. Using some basic marketing, we’re looking to re-define our image and attract as many new members as possible. Our name is in the process of changing from McMaster Hellenic Society to ΣΕΦ. Translated, ΣΕΦ means “Group of Greek Friends”. Many more cool things are on the way.

I just finished getting up the new website. We’re going for a “blogging” style, so that members can come and see all the news at any point in time. The new site really fits our new image, take a look.

The Phenomenon of the Cron

Saturday, August 27th, 2005

As WebBasedCron has become bigger, and more people have been setting cron jobs, I’ve begun to notice a strange pattern. If someone wanted to set a cron job to run, say every hour, they would generally choose the start of the hour as the trigger time. This is ok in theory, but when everyone does this, it causes a huge load on the server at specific points in time.

I monitored the number of cron jobs run every minute, for a one week period. The plot below shows the number of cron jobs run every minute of a given hour. This is averaged over a one week period, but it shows what I’m talking about. You can see that on the hour, half hour, and every 15 minutes, people generally set their cron jobs to run.

The thing is that if a cron job is meant to run every 15 minutes, it could be set to run on the 3rd minute, the 18th minute, the 33rd minute, etc. rather than 0, 15, 30, etc.

So next time you want to set a cron job to run once an hour, just choose a random minute rather than 0.
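Picking the offset can be automated. Here is a Python sketch that chooses a random starting minute for a job that repeats at a fixed interval within the hour, and formats the result as the minute field of a cron expression. The function names are hypothetical, and the interval is assumed to divide 60 evenly.

```python
import random


def staggered_minutes(interval_minutes, rng=random):
    """Return the minutes of the hour to run on, shifted by a random offset.

    interval_minutes must divide 60 evenly (e.g. 5, 15, 30, 60).
    """
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval must be a positive divisor of 60")
    offset = rng.randrange(interval_minutes)  # somewhere inside the first interval
    return list(range(offset, 60, interval_minutes))


def cron_minute_field(minutes):
    """Format a list of minutes as a cron minute field, e.g. '3,18,33,48'."""
    return ",".join(str(minute) for minute in minutes)
```

If every user’s jobs were offset this way, the load would spread roughly evenly across the hour and the spikes at 0, 15, 30 and 45 would flatten out.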

WebBasedCron plot

My First Patent

Friday, August 26th, 2005

This summer I’m working for IBM in the Toronto Software Lab. I was fortunate enough to work with a couple of really amazing teams, and was put in a position to solve a few serious problems.

I wrote an algorithm for the WebFacing team which solves a series of problems they were having. You can learn more about WebFacing at this IBM internal site. Through the recommendation of my manager, I created a patent application and submitted the algorithm for patent approval.

I’m pretty excited about having a patent under my name. Hopefully it gets approved soon.