Monday, March 28, 2005

Mining CVS Data

I've been reading a book on data mining and wondering how I might actually apply what I've read. Fixing some bugs today got me thinking about mining CVS data. If your CVS commit comments can be linked to defects (i.e. the comment contains a defect number), you might be able to make some useful observations.
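
As a rough sketch of the linking step, here is how the defect numbers might be pulled out of a commit comment. The class name DefectLinker and the comment convention ("bug 1234", "defect #1234", "fixes 1234") are assumptions of mine; a real team's convention would need its own pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts defect numbers from a CVS commit comment.
final class DefectLinker {
    // Assumed convention: a keyword (bug, defect, fix, fixes, fixed),
    // an optional '#', then up to nine digits. Case-insensitive.
    private static final Pattern DEFECT_REF = Pattern.compile(
            "\\b(?:bug|defect|fix(?:es|ed)?)\\s*#?\\s*(\\d{1,9})\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every defect number mentioned in the comment, in order of appearance.
    static List<Integer> defectIds(String comment) {
        List<Integer> ids = new ArrayList<>();
        Matcher m = DEFECT_REF.matcher(comment);
        while (m.find()) {
            ids.add(Integer.parseInt(m.group(1)));
        }
        return ids;
    }
}
```

Calling DefectLinker.defectIds on each comment gives the commit-to-defect links that the ideas below depend on. Comments that mention no defect simply yield an empty list.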

Code Complexity
If a "chunk" of code has lots of defects assigned to it, that might indicate a high level of complexity. A "chunk" could be lots of things: a function, a class, a module/package, anything. My intuition is that the more bugs, the more difficult the code is to understand, maintain and test.
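
To make this concrete, here is a minimal sketch that counts distinct defects per chunk, taking a file as the chunk. The Commit class and its fields are hypothetical; I'm assuming the file list and defect numbers have already been extracted from the CVS log.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class DefectDensity {
    // Hypothetical summary of one commit: the files it touched and
    // the defect numbers named in its comment.
    static final class Commit {
        final List<String> files;
        final List<Integer> defectIds;

        Commit(List<String> files, List<Integer> defectIds) {
            this.files = files;
            this.defectIds = defectIds;
        }
    }

    // Counts the distinct defects associated with each file.
    // A defect fixed across several commits to the same file counts once.
    static Map<String, Integer> defectsPerFile(List<Commit> history) {
        Map<String, Set<Integer>> seen = new HashMap<>();
        for (Commit c : history) {
            for (String file : c.files) {
                seen.computeIfAbsent(file, k -> new HashSet<>()).addAll(c.defectIds);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : seen.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

Sorting the result by count would give a first-cut list of suspect chunks. The same idea works for classes or packages if you map each file to its chunk before counting.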

Probabilistic Reasoning
Given a "commit" into CVS, what is the probability that it will contain a bug? What factors play into the estimate?
  • Number of lines changed.
  • Number of previous bugs in this chunk.
  • Did the current author make the previous change, or is she working on this code for the first time?
  • Number of dependencies introduced (e.g. a new import statement in a Java class).
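
One simple way to turn factors like these into a probability is to bucket past commits and use the observed bug rate in each bucket. The sketch below is only an illustration under my own assumptions: the class name, the bucket thresholds, and the choice of just two factors are all made up, and it presumes you can already label which past commits introduced a bug. It uses add-one (Laplace) smoothing so that sparse buckets don't produce 0 or 1.

```java
import java.util.HashMap;
import java.util.Map;

final class BugRateEstimator {
    // bucket name -> {commits that introduced a bug, total commits}
    private final Map<String, int[]> counts = new HashMap<>();

    // Hypothetical bucketing of two factors: lines changed and
    // whether the author is new to the chunk. Thresholds are arbitrary.
    static String bucket(int linesChanged, boolean authorNewToChunk) {
        String size = linesChanged < 10 ? "small"
                    : linesChanged < 100 ? "medium" : "large";
        return size + (authorNewToChunk ? "/new-author" : "/returning-author");
    }

    // Records one historical commit and whether it introduced a bug.
    void observe(String bucket, boolean introducedBug) {
        int[] c = counts.computeIfAbsent(bucket, k -> new int[2]);
        if (introducedBug) {
            c[0]++;
        }
        c[1]++;
    }

    // Laplace-smoothed estimate of P(bug | bucket).
    // A bucket with no history returns 0.5.
    double probabilityOfBug(String bucket) {
        int[] c = counts.getOrDefault(bucket, new int[2]);
        return (c[0] + 1.0) / (c[1] + 2.0);
    }
}
```

For example, if three of four past "large/new-author" commits introduced a bug, the estimate is (3 + 1) / (4 + 2), about 0.67. A proper classifier from the data mining book could replace the bucketing once there is enough labeled history.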

Thursday, March 03, 2005

I recently ran across a nice project that has some support for memory pools in Java. Since it works with the current JVM, you need to do a bit of work yourself. Instances of a pool context are created and bound to the current thread. Then, any objects created using a special factory are actually allocated in the pool. When the thread leaves the pool context, the objects are recycled.
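
To illustrate the general pattern (not the project's actual API), here is a minimal sketch of a thread-bound pool context and a factory that allocates from it. The names PoolContext, PooledFactory, enter, exit and object are my own inventions for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical pool context; one instance is bound to a thread at a time.
final class PoolContext {
    private static final ThreadLocal<PoolContext> CURRENT = new ThreadLocal<>();

    // Per-factory pools: objects ready for reuse, and objects handed out
    // since the context was entered.
    private final Map<PooledFactory<?>, Deque<Object>> free = new HashMap<>();
    private final Map<PooledFactory<?>, Deque<Object>> inUse = new HashMap<>();

    static PoolContext current() {
        return CURRENT.get();
    }

    // Binds this context to the calling thread.
    void enter() {
        CURRENT.set(this);
    }

    // Recycles everything handed out since enter() and unbinds the context.
    void exit() {
        for (Map.Entry<PooledFactory<?>, Deque<Object>> e : inUse.entrySet()) {
            free.computeIfAbsent(e.getKey(), k -> new ArrayDeque<>()).addAll(e.getValue());
            e.getValue().clear();
        }
        CURRENT.remove();
    }

    // Reuses a recycled object if one is available, otherwise creates one.
    Object take(PooledFactory<?> factory) {
        Deque<Object> pool = free.get(factory);
        Object obj = (pool == null || pool.isEmpty()) ? factory.create() : pool.pop();
        inUse.computeIfAbsent(factory, k -> new ArrayDeque<>()).push(obj);
        return obj;
    }
}

// Hypothetical factory: allocates from the current thread's pool context,
// or falls back to a plain allocation when no context is bound.
final class PooledFactory<T> {
    private final Supplier<T> constructor;

    PooledFactory(Supplier<T> constructor) {
        this.constructor = constructor;
    }

    T create() {
        return constructor.get();
    }

    @SuppressWarnings("unchecked")
    T object() {
        PoolContext ctx = PoolContext.current();
        return ctx == null ? create() : (T) ctx.take(this);
    }
}
```

In this sketch a recycled object comes back with whatever state it had, so the caller has to reset it before reuse, and no reference to a pooled object may be kept after exit(). That is the "bit of work" you take on in exchange for fewer allocations.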