Filtering Feeds
As mashups and feed-centric apps abound, the need to filter feeds is becoming more prevalent. The idea behind a feed filter is that you may only want to see items from an RSS-based publication on a certain topic, keyword, or tag. This is an essential aspect of the “newsmastering” domain.
ReFilter Reconsidered
ReFilter is an online feed filter service that lets you filter an RSS feed based on keywords. It is a pretty handy offering, but in my recent coverage of Kitchen Sink I noted that using ReFilter as a “man-in-the-middle” might be troublesome under load. It just doesn’t lend a sense of permanency.
So, I spoke with Kitchen Sink developer Marjolein Hoekstra today and she said the developer of ReFilter has asked her not to use it, because of the potentially high volume that could go through his servers. He does provide his source code if other people want to host it, however.
MySyndicaat URL-based Filters
Kitchen Sink is already using MySyndicaat to combine all of the marketing feeds. I learned from Marjolein that MySyndicaat allows for filtering right on the feed URL, by keyword and date ranges, in fact. I knew you could do this sort of thing when setting up a feedbot, but I didn’t know you could do “post filters” on the feedbot feed itself. Accordingly, Marjolein has dropped the use of ReFilter and is now using the URL feed filtering directly from MySyndicaat.
Here’s how it works:
Once you have a feedbot set up in MySyndicaat, the filtering syntax on any given feedbot RSS is simple. For example, this feed URL is for Top US news:
http://mysyndicaat.com/myfeed/feed/MySyndicaat_Top%20News%20-%20US
Add “?query=Bush” and you get only the US news RSS items that mention the keyword “Bush”. Append “&daterange=2007-02-04%20to%202007-02-05” to see only the items about Bush from February 4th and 5th.
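If you are generating these URLs from a script, the pattern above can be sketched in a few lines of Python. The parameter names “query” and “daterange” and the base URL come from the example above; the function name and its arguments are my own, purely for illustration, and I am assuming the date range is always written as “start to end”.

```python
from urllib.parse import urlencode, quote

# Base feedbot URL taken from the Top US news example above.
BASE_FEED_URL = "http://mysyndicaat.com/myfeed/feed/MySyndicaat_Top%20News%20-%20US"


def filtered_feed_url(base_url, keyword=None, start_date=None, end_date=None):
    """Append keyword and date-range filters to a feedbot URL (hypothetical helper)."""
    params = {}
    if keyword:
        params["query"] = keyword
    if start_date and end_date:
        # Assumed format: "YYYY-MM-DD to YYYY-MM-DD", as in the example above.
        params["daterange"] = f"{start_date} to {end_date}"
    if not params:
        return base_url
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


print(filtered_feed_url(BASE_FEED_URL, "Bush", "2007-02-04", "2007-02-05"))
```

Leaving out the dates gives the keyword-only feed, and leaving out everything returns the feedbot URL untouched.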
Cache in Hand
MySyndicaat uses a 15-minute cache to speed up feed retrieval, but I wasn’t sure whether adding query parameters would bypass that cache. I checked in with Giovanni Guardalben at KipCast, and he assured me that query results are also cached. Accordingly, whoever runs a given query first will get a slower result than anyone who repeats it while the result is still cached.
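To picture why the first request is the slow one, here is a rough sketch of a time-based cache keyed on the full URL, query string included. This is not MySyndicaat's code, which I have not seen; the class name, the fetch callback, and the clock argument are all hypothetical, and only the 15-minute lifetime comes from the description above.

```python
import time

CACHE_TTL_SECONDS = 15 * 60  # the 15-minute window mentioned above


class FeedCache:
    """Illustrative cache: one entry per full URL, each valid for a fixed time."""

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._fetch = fetch    # callable that actually retrieves the feed
        self._ttl = ttl
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]    # fresh hit: no fetch needed
        # Miss or expired: the caller who lands here pays for the slow fetch.
        body = self._fetch(url)
        self._entries[url] = (now + self._ttl, body)
        return body
```

Because the key is the whole URL, the plain feed and each filtered variant get their own entries, so a new keyword is slow once and quick for everyone else until the entry expires.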
I’ve already taken advantage of this on my JetStream project to provide tag-specific feeds… it works perfectly. (Example) This is really useful functionality, and I fully expect to see it used in a lot more places in the future.

