Wednesday, October 18, 2006

Filtering Blog posts

I use Bloglines to read weblogs. I recently added a site that posts deals from around the web every day, because I was planning to buy a new monitor. I still like to see what's on offer, but a lot of the deals don't interest me. I'd really like to be able to filter the results.

Bloglines doesn't have any filtering capability, but I could create a proxy server to do the filtering for me. For example, instead of giving Bloglines http://feed.com/atom.xml, I would give it http://myproxy.com/blogfilter/feed.com/atom.xml. My 'blogfilter' application would make the request to feed.com, get the data, filter stuff out, and return the results.
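To make the URL scheme concrete, here is a rough sketch in Python of how the proxy might turn the path it receives into the real feed address. The function name upstream_url and the PREFIX constant are made up for illustration, and it assumes every upstream feed is served over plain http.

```python
# Hypothetical helper: map a proxy path back to the feed it stands for.
PREFIX = "/blogfilter/"


def upstream_url(proxy_path):
    """Turn '/blogfilter/feed.com/atom.xml' into 'http://feed.com/atom.xml'."""
    if not proxy_path.startswith(PREFIX):
        raise ValueError("not a blogfilter path: %r" % proxy_path)
    rest = proxy_path[len(PREFIX):]
    if not rest:
        raise ValueError("no upstream feed given")
    # Assumption: the original feed is reachable over plain http.
    return "http://" + rest
```

The proxy would then fetch that address, run the filter over the response, and hand the result back to Bloglines.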

How should I filter? The simplest approach would be by keywords: remove any entry with one of these keywords in its title or body. You could have per-blog filters, too. A next step might be to use Bayesian filtering, but then you'd need an easy way to "train" the system. The proxy could add a small snippet of JavaScript at the end of each post. The JavaScript would be loaded from my site and would create a little panel with options to rate the post. The proxy would then use these ratings to train the Bayesian filter.
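The keyword version could look something like the sketch below. It assumes the feed is Atom 1.0 and that an entry's "body" lives in its summary or content element; filter_feed and entry_text are names invented here, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the Atom 1.0 namespace.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # keep the output free of ns0: prefixes


def entry_text(entry):
    """Collect the lowercased title and body text of one Atom entry."""
    parts = []
    for tag in ("title", "summary", "content"):
        node = entry.find("{%s}%s" % (ATOM_NS, tag))
        if node is not None:
            parts.append("".join(node.itertext()))
    return " ".join(parts).lower()


def filter_feed(atom_xml, keywords):
    """Return the feed with every entry that mentions a keyword removed."""
    root = ET.fromstring(atom_xml)
    lowered = [k.lower() for k in keywords]
    # findall returns a list, so removing entries while looping is safe.
    for entry in root.findall("{%s}entry" % ATOM_NS):
        text = entry_text(entry)
        if any(k in text for k in lowered):
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Per-blog filters would just be a lookup from the feed's host name to its own keyword list before calling filter_feed. Matching is a plain case-insensitive substring test, so "ink" would also catch "link"; word-boundary matching would be a sensible refinement.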

Update:
Some services that do this already:
Articles on filtering feeds: