Using Git and git-annex as a podcast client

A year or so ago I started using Joey Hess’s git-annex to manage backing up and synchronizing files across several computers. It’s worked quite well for me. It has an “Assistant” feature that lets you create something akin to a Dropbox replacement. Its special remotes allow backing up files to Rsync.net, Amazon S3, and Box.net, among others. Files uploaded to cloud services are transparently encrypted. It’s pretty neat stuff.

One feature of git-annex that I recently learned about is that it can be used as a podcatcher. This is pretty neat. You can download a podcast once and synchronize it to your various devices. I may listen to a podcast on any one of several devices, and I’d like to be able to synchronize the podcasts across all of them. Most of what follows comes from the documentation on the git-annex website.

To start, I created a new repository on my home server (a Cubox i4Pro running a fairly stock build of Debian). I’ll put together a post on this pretty sweet device later on.

mkdir Podcasts; cd Podcasts
git init
git annex init

I then cloned the repository on my laptop.

git clone ssh://host/directory/of/repository
git annex init "Laptop"
git annex sync

The appropriate git-annex command for importing a feed is git annex importfeed URL. By default, this creates a subdirectory and filename according to the following template: '${feedtitle}/${itemtitle}${extension}'. I find that the default location isn’t very useful since many podcasts don’t include the date in the item title. If they do, the date usually isn’t in a place that makes for convenient filesystem sorting by date.

I ended up using the following command to download the feeds:

git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' URL

There’s a neat program installed on most Linux systems called xargs. It reads items from standard input (here, the lines of a file) and passes them as arguments to another program. I created a file called podcasts.txt and added it to the git repository. podcasts.txt contains a list of podcast RSS feed URLs, one per line.
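To see what xargs actually hands to the other program, here is a minimal sketch that substitutes echo for the real download. The two feed URLs are made-up placeholders, and the temporary file stands in for podcasts.txt.

```shell
#!/bin/sh
# Build a throwaway feed list (placeholder URLs, not real feeds).
feedlist=$(mktemp)
printf '%s\n' 'https://example.com/a.rss' 'https://example.com/b.rss' > "$feedlist"

# xargs reads the lines from stdin and appends them as arguments.
# "echo" in front means the command is only printed, never run.
result=$(xargs echo git annex importfeed < "$feedlist")
echo "$result"

rm -f "$feedlist"
```

For a short list like this, xargs puts every URL on a single command line, which works because importfeed accepts more than one URL per invocation.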

I then added a file to the repo called getpodcasts.sh which includes the following:

#!/bin/sh
xargs git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcasts.txt

I added this script to my crontab. Every day, my server will fetch the RSS feeds and download the latest episodes. My other devices will download the files from my home server – saving the bandwidth of the podcast producers.
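For reference, a crontab entry along these lines would do the job. This is a sketch under assumptions: the repository location ($HOME/Podcasts) and the 04:00 run time are placeholders, not my actual settings. The cd matters because the script reads podcasts.txt relative to the current directory.

```shell
# m  h  dom mon dow  command
# Run the import once a day at 04:00 (hypothetical time and path).
0 4 * * * cd "$HOME/Podcasts" && ./getpodcasts.sh
```

The script needs to be executable (chmod +x getpodcasts.sh) for this form to work.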

Initial Import

For the initial import, I don’t necessarily want to have git-annex download the entire history of each podcast. Some of the RSS feeds go all the way back to the dawn of podcasting. The --fast flag for git-annex will import the feeds in their entirety to the git-annex repo, but won’t download any of the associated files.

xargs git annex importfeed --fast --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' < podcasts.txt
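After a --fast import the episodes exist in the repository only as placeholders, so anything I actually want to hear has to be fetched explicitly with git annex get. A minimal sketch follows; the function names and the example path are hypothetical.

```shell
#!/bin/sh
# Download the content of a single episode that was imported with --fast.
fetch_episode() {
    git annex get "$1"
}

# Download every episode under one feed's directory.
fetch_feed() {
    git annex get "$1/"
}

# Example (hypothetical path, run from inside the repository):
#   fetch_episode "Some Podcast/2014-01-01-Episode_1.mp3"
```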

Future Stuff

I would really like to be able to synchronize the podcast state across the various devices. Near as I can tell, there aren’t any git-annex-aware podcast clients. One potential way to do it would be to use another neat feature of git-annex which allows you to store metadata about each file in the repository. Joey Hess put together a nice demo for the feature. He uses it to present data in different views.

I could imagine making a post-import hook to add an “unread” boolean flag to the metadata when the podcast is downloaded. Once the file has been listened to, the flag should be changed to “false” and the file removed from the players. Retaining a copy on a full backup disk somewhere makes sense.
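As a rough sketch of that idea, the flag could be handled with git annex metadata. I haven’t built this; the field name unread and the function names are my own assumptions, and the hook that would call them is left out.

```shell
#!/bin/sh
# Flag a freshly imported episode as not yet listened to.
mark_unread() {
    git annex metadata --set unread=true "$1"
}

# Flag an episode as listened to, then free the local copy.
# git annex drop refuses if it cannot verify enough copies elsewhere,
# so the backup copy is protected.
mark_listened() {
    git annex metadata --set unread=false "$1" && git annex drop "$1"
}

# Print the current value of the flag for one episode.
show_unread() {
    git annex metadata --get unread "$1"
}
```

Using --set replaces any previous value of the field, so an episode never ends up flagged both true and false.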

The trickier part is going to be saving the last listened position. Maybe it would be good to update a “last time” field every 1-5 minutes in order to save an approximate last-listened position. I don’t really know how expensive the operation that writes the metadata out is.
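A player-side helper for this might look something like the following sketch. The field name lasttime, the choice of seconds as the unit, and the function names are all assumptions; nothing here measures how costly the writes are.

```shell
#!/bin/sh
# Record the playback position (in seconds) for an episode.
# $1 = file, $2 = position in seconds
save_position() {
    git annex metadata --set "lasttime=$2" "$1"
}

# Print the stored position so a player could resume from it.
load_position() {
    git annex metadata --get lasttime "$1"
}

# Example (hypothetical path):
#   save_position "Some Podcast/2014-01-01-Episode_1.mp3" 754
```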