For the last year or so I've been keeping an eye out for interesting articles on the Semantic Web, NLP (Natural Language Processing) and other goodies. Most of the work around the Semantic Web seems to focus on defining ways to annotate content with semantic information. Other work centers on using NLP to extract semantic information from content that isn't marked up.
Microformats (www.microformats.org) fall into the first camp. Rather than being academic, however, they offer very practical ways to add semantic information to your content (e.g., "this represents a person", "this represents an event"). As simple as the formats are, nobody is going to add this markup to content without the help of tools.
My questions are these: how will new Microformats (or whatever our tools end up using) be derived? Will there be a voting procedure? Some sort of standards group? I can imagine lots and lots of different formats. And how will those be integrated into new tools?
One final thought: will advances in NLP make semantic markup obsolete?
Saturday, August 20, 2005
Monday, March 28, 2005
Mining CVS Data
I've been reading a book on data mining and was wondering how I might actually apply what I've read. Fixing some bugs today got me thinking about mining CVS data. If your CVS comments can be linked to defects (i.e., the comment contains a defect number), you might be able to make some useful observations.
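The linking step could be as simple as scanning each commit comment for defect numbers. The sketch below assumes a hypothetical convention where comments reference defects as `#123` or `BUG-123`; the class name `DefectLinker` and the pattern are illustrative only, so adjust the regular expression to whatever your team actually writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that links a CVS commit comment to defects by pulling
 * defect numbers out of the comment text. The "#123" / "BUG-123" convention
 * is an assumption, not something CVS itself defines.
 */
public class DefectLinker {
    // Matches "#123" or "BUG-123" (case-insensitive) and captures the digits.
    private static final Pattern DEFECT_REF =
            Pattern.compile("(?:#|\\bBUG-)(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns every defect number mentioned in a commit comment, in order. */
    public static List<String> extractDefectIds(String comment) {
        List<String> ids = new ArrayList<>();
        Matcher m = DEFECT_REF.matcher(comment);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(extractDefectIds("Fixed NPE in parser, see #4711 and bug-42")); // [4711, 42]
        System.out.println(extractDefectIds("Reformatted whitespace"));                   // []
    }
}
```

Commits whose comments yield an empty list simply can't be tied to a defect and would be left out of any defect-based analysis.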
Code Complexity
If a "chunk" of code has lots of defects assigned to it, that might indicate a high level of complexity. A "chunk" could be lots of things: a function, a class, a module/package, anything. My intuition is that the more bugs there are, the more difficult the code is to understand, maintain and test.
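Once commits are tied to defects, ranking chunks is mostly counting. This sketch assumes the chunk is a single file and that the defect numbers have already been extracted from each commit comment; `DefectCounts` and `FileChange` are hypothetical names. It counts distinct defects, so a bug fixed over several commits to the same file is counted once.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DefectCounts {
    /** One file touched by one commit, plus the defect ids that commit's comment mentioned. */
    public static final class FileChange {
        public final String file;
        public final List<String> defectIds;

        public FileChange(String file, List<String> defectIds) {
            this.file = file;
            this.defectIds = defectIds;
        }
    }

    /** Counts distinct defects per file, ordered from most to fewest defects. */
    public static Map<String, Integer> defectsPerFile(List<FileChange> changes) {
        Map<String, Set<String>> byFile = new HashMap<>();
        for (FileChange c : changes) {
            byFile.computeIfAbsent(c.file, k -> new HashSet<>()).addAll(c.defectIds);
        }
        // Sort by defect count, highest first, so likely hot spots come out on top.
        return byFile.entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue().size(),
                        (x, y) -> x,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<FileChange> log = List.of(
                new FileChange("src/Parser.java", List.of("101", "102")),
                new FileChange("src/Parser.java", List.of("102")),
                new FileChange("src/Lexer.java", List.of("103")),
                new FileChange("src/Util.java", List.of()));
        // {src/Parser.java=2, src/Lexer.java=1, src/Util.java=0}
        System.out.println(defectsPerFile(log));
    }
}
```

Rolling the same counts up to classes or packages would only change how the `file` key is derived.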
Probabilistic Reasoning
Given a commit into CVS, what is the probability it will contain a bug? What factors play into that probability?
- Number of lines changed.
- Number of previous bugs in this chunk.
- Did the current author make the previous change, or is she working on this code for the first time?
- Number of dependencies introduced (e.g., a new import statement in a Java class).
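One way to turn the factors above into a probability is a naive Bayes classifier trained on past commits. That choice is mine, not something the post settles on, and the sketch makes several assumptions: each factor is reduced to a yes/no feature with arbitrary thresholds (more than 100 lines, any prior bugs, any new imports), and a historical commit counts as "buggy" if a later defect fix was linked back to it. `CommitBugModel` and its methods are hypothetical names.

```java
/**
 * Hypothetical naive Bayes sketch estimating P(buggy | commit features).
 * Thresholds and the "buggy" labeling rule are illustrative assumptions.
 */
public class CommitBugModel {
    /** Binary features derived from the listed factors (thresholds are arbitrary). */
    public static boolean[] features(int linesChanged, int priorBugs,
                                     boolean authorNewToCode, int newImports) {
        return new boolean[] {
            linesChanged > 100, // large change
            priorBugs > 0,      // chunk already has a defect history
            authorNewToCode,    // author hasn't touched this code before
            newImports > 0      // commit introduces new dependencies
        };
    }

    private final int numFeatures;
    private final int[] classCount = new int[2]; // index 0 = clean, 1 = buggy
    private final int[][] featureTrue;           // [class][feature] counts of "true"

    public CommitBugModel(int numFeatures) {
        this.numFeatures = numFeatures;
        this.featureTrue = new int[2][numFeatures];
    }

    /** Records one historical commit and whether it was later linked to a defect. */
    public void train(boolean[] x, boolean buggy) {
        int c = buggy ? 1 : 0;
        classCount[c]++;
        for (int i = 0; i < numFeatures; i++) {
            if (x[i]) featureTrue[c][i]++;
        }
    }

    /** Naive Bayes estimate of P(buggy | x) with add-one (Laplace) smoothing. */
    public double probabilityBuggy(boolean[] x) {
        double total = classCount[0] + classCount[1];
        double[] logScore = new double[2];
        for (int c = 0; c < 2; c++) {
            logScore[c] = Math.log((classCount[c] + 1.0) / (total + 2.0));
            for (int i = 0; i < numFeatures; i++) {
                double pTrue = (featureTrue[c][i] + 1.0) / (classCount[c] + 2.0);
                logScore[c] += Math.log(x[i] ? pTrue : 1.0 - pTrue);
            }
        }
        // Convert the two log scores back into a normalized probability.
        double max = Math.max(logScore[0], logScore[1]);
        double clean = Math.exp(logScore[0] - max);
        double buggy = Math.exp(logScore[1] - max);
        return buggy / (clean + buggy);
    }

    public static void main(String[] args) {
        CommitBugModel model = new CommitBugModel(4);
        // Made-up history, purely for illustration.
        model.train(features(250, 3, true, 2), true);
        model.train(features(180, 1, true, 0), true);
        model.train(features(12, 0, false, 0), false);
        model.train(features(30, 0, false, 1), false);
        model.train(features(5, 2, false, 0), false);

        System.out.printf("risky commit: %.2f%n", model.probabilityBuggy(features(300, 4, true, 1)));
        System.out.printf("small commit: %.2f%n", model.probabilityBuggy(features(8, 0, false, 0)));
    }
}
```

Naive Bayes treats the factors as independent given the outcome, which they probably aren't (big changes and new dependencies tend to go together), but it is cheap to train and makes each factor's contribution easy to inspect.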
Thursday, March 03, 2005
javolution.org
I recently ran across a nice project that has some support for memory pools in Java: javolution.org. Since it works with the current JVM, you need to do a bit of the work yourself. Instances of a pool context are created and bound to the current thread. Then any objects created using a special factory are actually allocated in the pool. When the thread leaves the pool context, the objects are recycled.
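To make the enter/allocate/exit cycle concrete, here is a stripped-down, hypothetical sketch of the same idea. It is not Javolution's API: `PoolSketch`, `Factory`, `enter()` and `exit()` are invented for illustration. A context is bound to the current thread with a `ThreadLocal`; objects obtained from a factory while the context is active are reset and returned to that factory's free list when the thread exits the context.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical thread-bound pool context; NOT Javolution's real classes. */
public class PoolSketch {
    // Recycle actions for objects handed out on this thread since enter();
    // null when no context is active. Nested contexts aren't supported here.
    private static final ThreadLocal<List<Runnable>> CONTEXT = new ThreadLocal<>();

    /** Factory that reuses pooled instances instead of always calling "new". */
    public static final class Factory<T> {
        // Each thread keeps its own free list, so no locking is needed.
        private final ThreadLocal<Deque<T>> free = ThreadLocal.withInitial(ArrayDeque::new);
        private final Supplier<T> creator;
        private final Consumer<T> reset;
        private final AtomicInteger created = new AtomicInteger();

        public Factory(Supplier<T> creator, Consumer<T> reset) {
            this.creator = creator;
            this.reset = reset;
        }

        public T get() {
            Deque<T> pool = free.get();
            T obj = pool.isEmpty() ? newInstance() : pool.pop();
            List<Runnable> ctx = CONTEXT.get();
            if (ctx != null) {
                // Inside a pool context: reset and return this object on exit().
                ctx.add(() -> {
                    reset.accept(obj);
                    pool.push(obj);
                });
            }
            return obj;
        }

        /** Number of instances actually allocated with the creator. */
        public int createdCount() {
            return created.get();
        }

        private T newInstance() {
            created.incrementAndGet();
            return creator.get();
        }
    }

    /** Binds a fresh pool context to the current thread. */
    public static void enter() {
        CONTEXT.set(new ArrayList<>());
    }

    /** Leaves the context, recycling every pooled object handed out inside it. */
    public static void exit() {
        List<Runnable> ctx = CONTEXT.get();
        if (ctx != null) {
            for (Runnable recycle : ctx) recycle.run();
        }
        CONTEXT.remove();
    }

    public static void main(String[] args) {
        Factory<StringBuilder> builders = new Factory<>(StringBuilder::new, sb -> sb.setLength(0));
        for (int i = 0; i < 3; i++) {
            enter();
            try {
                StringBuilder sb = builders.get();
                sb.append("iteration ").append(i);
                System.out.println(sb);
            } finally {
                exit(); // sb is cleared and returned to the pool here
            }
        }
        // Only one StringBuilder was ever allocated; later iterations reused it.
        System.out.println("allocated: " + builders.createdCount());
    }
}
```

As with any pool, the catch is that code must not keep a reference to a pooled object after `exit()`, since the same instance will be handed out again.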