stty -sane: Mining CVS Data

I've been reading a book on data mining and wondering how I might actually apply what I've read. Fixing some bugs today got me thinking about mining CVS data. If your CVS commit comments can be linked to defects (i.e., the comment contains a defect number), you might be able to make some useful observations.
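As a rough sketch of that linking step, here is one way to pull defect numbers out of commit comments in Python. The comment convention ("bug 1234", "defect #1234", "fixes #1234") and the names DEFECT_PATTERN and defect_ids are assumptions for illustration; a real repository would need a pattern matched to whatever its defect tracker and team habits actually produce.

```python
import re

# Assumed convention: a defect is mentioned as "bug 1234", "defect #1234",
# "fix 1234", "fixes #1234" or "fixed #1234". Adjust to your own tracker.
DEFECT_PATTERN = re.compile(r"(?:bug|defect|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def defect_ids(comment):
    """Return the set of defect numbers mentioned in one commit comment."""
    return {int(number) for number in DEFECT_PATTERN.findall(comment)}
```

A pattern this loose will produce false positives (a comment like "fixed 3 typos" would be read as defect 3), so the matches are probably worth checking against the real list of defect numbers.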

Code Complexity
If a "chunk" of code has lots of defects assigned to it, that might indicate a high level of complexity. A "chunk" could be any unit of code: a function, a class, a module or package. My intuition is that the more bugs a chunk has, the more difficult the code is to understand, maintain, and test.
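A minimal sketch of that count in Python, assuming the commit history has already been reduced to (chunk, defect number) pairs, with None standing in for commits that are not tied to a defect. The function name and input shape are my own invention, and the chunk here is simply a file name.

```python
from collections import defaultdict


def defects_per_chunk(commits):
    """Count distinct defects per chunk, most defect-prone chunk first.

    commits: iterable of (chunk, defect_id) pairs; defect_id is None when
    the commit is not linked to a defect (assumed input shape).
    """
    seen = defaultdict(set)
    for chunk, defect_id in commits:
        if defect_id is not None:
            # A set ensures several commits for one defect count once.
            seen[chunk].add(defect_id)
    counts = ((chunk, len(ids)) for chunk, ids in seen.items())
    # Sort by descending count, then by name so ties come out in a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))
```

Counting distinct defects rather than commits keeps a bug that took three attempts to fix from looking like three bugs. A raw count also favors large chunks, so dividing by lines of code might give a fairer picture.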

Probabilistic Reasoning
Given a "commit" into CVS, what is the probability that it contains a bug? What factors play into that estimate?

Number of lines changed.
Number of previous bugs in this chunk.
Did the current author make the previous change, or is she working on this code for the first time?
Number of dependencies introduced (e.g., a new import statement in a Java class).
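One way these factors could be combined is a naive Bayes estimate, sketched below in Python. Everything specific here is an assumption on my part: each factor is reduced to a yes/no feature (so "number of lines changed" becomes a hypothetical large_change flag), the feature names are made up, and each past commit is assumed to have been labelled buggy or not, for instance because a later defect-linked commit touched the same lines.

```python
from collections import Counter

# Hypothetical yes/no versions of the factors listed above.
FEATURES = ("large_change", "chunk_had_bugs", "first_time_author", "new_dependency")


def train(history):
    """Tally counts from past commits.

    history: list of (features, was_buggy) pairs, where features is a dict
    holding a truthy or falsy value for every name in FEATURES.
    """
    class_counts = Counter()
    feature_counts = Counter()  # keyed by (was_buggy, feature name, feature value)
    for features, was_buggy in history:
        class_counts[was_buggy] += 1
        for name in FEATURES:
            feature_counts[(was_buggy, name, bool(features[name]))] += 1
    return class_counts, feature_counts


def bug_probability(model, features):
    """Naive Bayes estimate of P(buggy | features) with add-one smoothing."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Smoothed prior for this class.
        score = (class_counts[label] + 1) / (total + 2)
        for name in FEATURES:
            hits = feature_counts[(label, name, bool(features[name]))]
            # Smoothed likelihood of this feature value given the class.
            score *= (hits + 1) / (class_counts[label] + 2)
        scores[label] = score
    # Normalize so the two class scores sum to one.
    return scores[True] / (scores[True] + scores[False])
```

Naive Bayes treats the factors as independent given the outcome, which is almost certainly not true of real commits (first-time authors may well make larger changes), so the number is better read as a rough ranking signal than as a calibrated probability.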

Monday, March 28, 2005
