Like many programmers right now, I am interested in machine learning. And, as usual, I like to reinvent the wheel when I'm building things for myself. It's more fun for me, and I get to learn a bit more about what's actually happening.
So anyhoo, I am a bit hazy on the details, but I think that, generally, a classifier learns from something that is set in stone: a set of samples that a person labels in advance, based on what the program should learn.
I wanted to build something where the program starts with no predefined classes at all and builds them over time based on similarities in the data. Obviously, it still needs a set of rules for finding the data. These rules would be akin to nodes; however, I am not using nodes to find my data as such.
My initial experiments have been successful: I can find letters and images, and the results flatten out nicely over random noise.
That was the easy part. The hard part, I think, will be the classification. I am still not entirely sure how I will go about it, but I have a few ideas.
Basic Image Search and Match Process
Currently, the tool makes three passes over each image to extract information about it:
1. A basic average. It breaks the image down into sections and compares them with previous images.
2. A search for right angles.
3. A search for slopes.
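Pass 1 can be sketched in Python, the language the project is moving to. This is a minimal sketch under my own assumptions, not the tool's actual code: images are 2D lists of booleans (one per pixel), the 2 x 2 grid and the mean-absolute-difference score are arbitrary choices, and the names `section_averages`, `difference`, `closest_previous` and `parse` are hypothetical.

```python
# Hypothetical sketch of pass 1: average each section of a binary image,
# then find the most similar previously seen image.

def section_averages(image, rows, cols):
    """Fraction of "on" pixels in each of the rows x cols sections."""
    height, width = len(image), len(image[0])
    averages = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cells = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            averages.append(sum(cells) / len(cells) if cells else 0.0)
    return averages


def difference(a, b):
    """Mean absolute difference between two lists of section averages."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def closest_previous(image, previous, rows=2, cols=2):
    """Return (name, score) of the stored image with the lowest difference."""
    target = section_averages(image, rows, cols)
    scores = {name: difference(target, section_averages(img, rows, cols))
              for name, img in previous.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def parse(rows):
    """Turn strings like "#..." into rows of booleans ("#" = on)."""
    return [[ch == "#" for ch in row] for row in rows]


previous = {
    "vertical_bar": parse(["#...", "#...", "#...", "#..."]),
    "horizontal_bar": parse(["####", "....", "....", "...."]),
}
sample = parse(["##..", "#...", "#...", "#..."])
print(closest_previous(sample, previous))  # ('vertical_bar', 0.0625)
```

The sample differs from the vertical bar only in its top-left section, so it scores far closer to it than to the horizontal bar.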
I am planning to add a fourth filter, for curves, as well. This should help with letters like O, and with numbers like 8 that might otherwise match B.
Programming Languages
Currently, this is built on PHP and JavaScript.
The PHP side is a bit slow and will stay that way, so I plan to convert it to Python soon.
I will likely keep the JavaScript to display the results.
Database Structure
I am currently using SQLite3. PHP's database abstraction layer appears to be very slow on my home server; I haven't had this issue with Python in the past. Yet another reason to switch.
CREATE TABLE class (id integer primary key autoincrement, color varchar(7), active boolean);
CREATE TABLE data (iid int, cid int, pixel boolean, x int, y int, certainty tinyint);
CREATE TABLE icinfo (iid int, cid int, x int, y int);
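Since the plan is to move to Python, here is a minimal sketch of the same schema driven through Python's built-in `sqlite3` module. The helpers `open_db`, `add_class` and `add_pixels` are hypothetical, and reading `iid` as an image id, `cid` as a class id and `color` as a `#rrggbb` string is my assumption, not something the schema states.

```python
import sqlite3

# Same tables as above. Reading iid as an image id, cid as a class id and
# color as "#rrggbb" is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS class (id integer primary key autoincrement, color varchar(7), active boolean);
CREATE TABLE IF NOT EXISTS data (iid int, cid int, pixel boolean, x int, y int, certainty tinyint);
CREATE TABLE IF NOT EXISTS icinfo (iid int, cid int, x int, y int);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_class(conn, color):
    """Hypothetical helper: insert a new active class and return its id."""
    cur = conn.execute("INSERT INTO class (color, active) VALUES (?, 1)", (color,))
    conn.commit()
    return cur.lastrowid


def add_pixels(conn, iid, cid, pixels, certainty=0):
    """Hypothetical helper: store (x, y, on) tuples for one image in one batch."""
    conn.executemany(
        "INSERT INTO data (iid, cid, pixel, x, y, certainty) VALUES (?, ?, ?, ?, ?, ?)",
        [(iid, cid, int(on), x, y, certainty) for x, y, on in pixels],
    )
    conn.commit()


conn = open_db()
blue = add_class(conn, "#0000ff")
add_pixels(conn, iid=1, cid=blue, pixels=[(0, 0, True), (1, 0, False)])
on_count = conn.execute(
    "SELECT COUNT(*) FROM data WHERE iid = ? AND pixel = 1", (1,)
).fetchone()[0]
print(blue, on_count)  # 1 1
```

Storing one row per pixel produces a lot of inserts, so batching them with `executemany` and committing once per image, rather than once per row, is usually much faster in SQLite.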
Classification
As stated at the start of this article, I plan to let the program create classifications dynamically by matching similar data.
(That's the plan.)
I am thinking that I will use the k-nearest neighbors (k-NN) algorithm. The X coordinate will likely be based on the weight, and I was thinking of using the CID as the Y coordinate.
(I am still VERY unsure about this logic.)
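A basic k-NN classifier fits in a few lines. This sketch assumes each image has already been reduced to an (x, y) point and keeps the class as the label attached to each point rather than as one of its coordinates. That is how k-NN is normally set up: the label is what the neighbours vote on, so it can't also be a position. The function `knn_classify`, the choice k = 3 and the sample points are all made up for illustration.

```python
import math
from collections import Counter


def knn_classify(point, samples, k=3):
    """Label `point` by majority vote among its k nearest samples.

    samples is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    nearest = sorted(samples, key=lambda s: math.dist(point, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (x, y) feature points for three classes.
samples = [
    ((0.20, 0.10), "red"), ((0.25, 0.15), "red"),
    ((0.80, 0.90), "blue"), ((0.85, 0.80), "blue"),
    ((0.50, 0.10), "green"),
]
print(knn_classify((0.30, 0.20), samples))  # red
print(knn_classify((0.82, 0.85), samples))  # blue
```

With an odd k and only two or three classes, outright vote ties are less likely, though they can still happen.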
The weight is built in several steps, which can be broken down into two main ones:
- Average pixels per section
- Average angles per section
I can store these two values as the X and Y coordinates for k-NN.
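Turning an image into that (X, Y) point might look like the sketch below. "Average pixels per section" is taken as the on-pixel density of each section. For "average angles per section" I substituted a hypothetical stand-in: each pair of neighbouring on-pixels counts as a short stroke at 0, 45, 90 or 135 degrees, which loosely echoes the right-angle and slope passes. The names `weight_point`, `STEPS` and `parse` are mine, not the tool's.

```python
# Hypothetical stand-in for the "average angles" step: each pair of
# neighbouring on-pixels is read as a short stroke at one of four angles.
# Image rows go downward, so (1, 1) is "\" (135) and (-1, 1) is "/" (45).
STEPS = {(1, 0): 0, (-1, 1): 45, (0, 1): 90, (1, 1): 135}


def weight_point(image, rows=2, cols=2):
    """Return (average pixels per section, average angle per section)."""
    height, width = len(image), len(image[0])
    densities, angles = [], []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            on = [(x, y) for y in range(y0, y1) for x in range(x0, x1) if image[y][x]]
            densities.append(len(on) / area if area else 0.0)
            # Angle of every stroke that starts in this section.
            found = [angle for x, y in on
                     for (dx, dy), angle in STEPS.items()
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and image[y + dy][x + dx]]
            if found:  # empty sections have no angle to average
                angles.append(sum(found) / len(found))
    avg_pixels = sum(densities) / len(densities)
    avg_angle = sum(angles) / len(angles) if angles else 0.0
    return avg_pixels, avg_angle


def parse(rows):
    """Turn strings like "#..." into rows of booleans ("#" = on)."""
    return [[ch == "#" for ch in row] for row in rows]


vertical = parse(["#...", "#...", "#...", "#..."])
horizontal = parse(["####", "....", "....", "...."])
print(weight_point(vertical))    # (0.25, 90.0)
print(weight_point(horizontal))  # (0.25, 0.0)
```

One caveat with a plain average: a corner (strokes at 0 and 90) and a single 45 degree slope both average to 45, so the Y value alone can't tell them apart.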
When the classifier runs, it compares the test image against all of the stored images.
Therefore, each run can change the values, and possibly even the classification, of an image.
Example:
Here I show three existing classes. The large black dot represents an as-yet-unclassified image. The cumulative distances to the existing images determine its classification. Let's say that in this case the dot is closest to blue, so it is classified as blue. Now let's say that more green images are added in the future, and that they are increasingly similar to the originally unclassified image. If enough of them are close to it, it can then be reclassified as green.
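One way to read "cumulative distances" is inverse-distance weighting: every stored image adds 1/distance to its class's score, so close images count most and each additional nearby image raises its class's total. The sketch below assumes that interpretation (it is not necessarily the rule the tool will use) and replays the example. The function `weighted_classify`, the coordinates and the third class, "red", are made up.

```python
import math
from collections import defaultdict


def weighted_classify(point, samples):
    """Pick the label whose samples have the largest summed 1/distance.

    samples is a list of ((x, y), label) pairs covering every stored image.
    """
    scores = defaultdict(float)
    for coords, label in samples:
        d = math.dist(point, coords)
        if d == 0:
            return label  # identical point: take its label outright
        scores[label] += 1.0 / d
    return max(scores, key=scores.get)


# Made-up positions; the unknown image starts out closest to blue.
samples = [((0.6, 0.5), "blue"), ((0.0, 0.0), "red"), ((0.5, 1.0), "green")]
unknown = (0.5, 0.5)
history = [weighted_classify(unknown, samples)]
for y in (0.8, 0.7, 0.6):
    samples.append(((0.5, y), "green"))  # new green images, each more similar
    history.append(weighted_classify(unknown, samples))  # re-run against all images
print(history)  # ['blue', 'blue', 'green', 'green']
```

Summing over every stored image means a class with many distant images can still accumulate a large score; limiting the sum to the k nearest images (distance-weighted k-NN) avoids that.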