better full-text RSS

Remember when I asked for help testing a script I'd written that turns partial-text RSS feeds into full-text ones? And it sort of sucked, but not entirely?

Well, I thought about it some more and came up with a way to significantly improve the algorithm. So I took another pass at it (improving the caching mechanism while I was at it) and I'm pretty happy with the results. But I'd like to test it some more before moving it to its final home. If you've got a partial-text feed that's been bugging you, please give it a try. And if it fails to work on any particular feeds, please let me know about them in the comments.


UPDATE: Looks like the dates on the feeds produced by this tool are screwed up. I'll fix that soon.

Comments

Yglesias's feed (http://www.matthewyglesias.com/feed) gives this error, though it does work if I put the feedburner url in directly.

XML Parsing Error: xml declaration not at start of external entity Location: http://www.metamonkey.net/fulltextrss2/?url=http%3A%2F%2Fwww.matthewyglesias.com%2Ffeed Line Number 1, Column 571:
 

Give it another shot, Matt. I was screwing around with the code just a second ago, and you may have hit the script in between saves.

 

I just tried this with Ken Silverstein's Washington Babylon blog on Harper's, and it seems to collect everything that is put up on the front page of Harper's. It also returns "(full text retrieval failed)" for the two most recent posts (and some others).

This is a great idea - thanks for putting it up.

 

Hmm. Yeah, I can see that it behaves weirdly with Harper's. Unfortunately there may not be much to be done about that. The script relies on the target site being constructed with certain HTML tags, and on the HTML being (relatively) correct. If those conditions are met, it should only fail when posts consist entirely of non-text HTML (YouTube clips and that sort of thing).
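
To make that dependence on tags a bit more concrete, here is a rough sketch in Python. It is not the script's actual code. It assumes, purely for illustration, that post text lives inside `<p>` elements and that the feed's own summary is kept as a fallback; the names `ParagraphText` and `full_text` are made up for this example.

```python
from html.parser import HTMLParser

FAILURE_NOTE = "(full text retrieval failed)"


class ParagraphText(HTMLParser):
    """Collects the text found inside properly closed <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # how many <p> tags are currently open
        self.paragraphs = []    # finished paragraphs, in page order
        self._buf = []          # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.paragraphs.append(text)
                self._buf = []

    def handle_data(self, data):
        # Only keep text that sits inside a <p>, including nested inline tags.
        if self.depth > 0:
            self._buf.append(data)


def full_text(page_html, summary):
    """Return the page's paragraph text, or the summary plus a failure note."""
    parser = ParagraphText()
    parser.feed(page_html)
    parser.close()
    if parser.paragraphs:
        return "\n\n".join(parser.paragraphs)
    # No usable text: wrong tags, badly broken markup, or an embed-only post.
    return f"{summary} {FAILURE_NOTE}"
```

With this sketch, a page made up only of an embedded clip, or one whose `<p>` tags are never closed, yields no paragraphs, so the reader gets the original summary with the failure note appended.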

 

If it's any consolation, it works perfectly on Dan Drezner's blog, the other one I was looking for full feeds for.

 

It works about half the time for the SciAm feed, but a lot of them look like this in my Newsfire:

The sudden appearance of a large self-copying molecule such as RNA was exceedingly improbable. Energy-driven networks of small molecules afford better odds as the initiators of life. (full text retrieval failed)
 
