Preventing rsync from doubling (or even tripling) your S3 fees.

Using rsync to upload files to Amazon S3 over s3fs?  You might be paying double (or even triple) the S3 fees.

I was observing the file upload progress on the transcoder server this morning, curious how it was moving along, and I noticed something: the currently uploading file had an odd name.

My file, CAT5TV-265-Writing-Without-Distractions-With-Free-Software-HD.m4v, was being uploaded as .CAT5TV-265-Writing-Without-Distractions-With-Free-Software-HD.m4v.f100q3.

I use rsync to upload the files to the S3 folder over S3FS on Debian, because it offers good bandwidth control.  I can restrict how much of our upstream bandwidth is dedicated to the upload and prevent it from slowing down our other services.
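For reference, the upload command looks something like this (a sketch only; the source path, mount point and bandwidth cap here are hypothetical):

  # Upload to the s3fs mount, capped at roughly 1000 KB/s of upstream
  rsync -av --progress --bwlimit=1000 /var/transcoded/ /mnt/s3bucket/episodes/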

Noticing the filename this morning, and understanding the way rsync works, I knew that temporary file would be renamed to its real name the instant the upload completed.

In a normal disk-to-disk operation, or when rsync’ing over something such as SSH, that’s fine, because a ‘mv this that’ uses virtually no resources, and certainly doesn’t cost anything: it’s a simple rename operation.  So why did my antennae go up this morning?  Because I also know how S3FS works.

A rename operation over S3FS means the file is first downloaded to a file in /tmp, renamed, and then re-uploaded.  So what rsync is effectively doing is (rough cost tallied after the list):

  1. Uploading the file to S3 with a random filename, with bandwidth restrictions.
  2. Downloading the file to /tmp with no bandwidth restrictions.
  3. Renaming the /tmp file.
  4. Re-uploading the file to S3 with no bandwidth restrictions.
  5. Deleting the temp files.
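Back-of-envelope, for a single 1 GB episode (ignoring request fees, and ignoring that S3 prices inbound and outbound transfer differently):

  Default rsync over S3FS:  1 GB up + 1 GB down + 1 GB up  =  ~3 GB moved
  Direct write:             1 GB up                        =  ~1 GB moved

That’s where the doubling (or even tripling) comes from.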

Fortunately, this is 2013 and not 2002.  The developers of rsync realized at some point that direct uploading may be desired in some cases.  I don’t think they had S3FS in mind, but it certainly fits the bill.

The option is --inplace.

Here is what the manpage says about --inplace:

This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

It’s that simple!  Adding --inplace to your rsync command will cut your Amazon S3 transfer fees by as much as 2/3 for future rsync transactions!
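The fixed command, then, looks something like this (same hypothetical paths and cap as above):

  # Write directly to the final name on the s3fs mount: no temp file, no rename
  rsync -av --progress --inplace --bwlimit=1000 /var/transcoded/ /mnt/s3bucket/episodes/

One caveat from the same manpage entry: with --inplace the destination file is in an inconsistent state until the transfer completes, so don’t serve or read the file mid-transfer.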

I’m glad I caught this before the transcoders transferred all 314 episodes of Category5 Technology TV to S3.  I just saved us a boatload of cash.

Happy coding!

– Robbie

The new transcoder is proving itself.

Well, we’ve been on the new transcoders for one week now, and I’m excited to see the impact.

Last night was the first night where I was able to initiate an automated transcode of an episode shortly after we signed off the air.

There are still some things I need to work out.  For example, I could not initiate the conversion until I had imported the photos, because the transcoder uses the episode’s image for the ID3 cover art on MP3 transcodes.  So after I finished choosing and uploading the images for last night’s show, I fired up the transcoder.

Here’s the log output:

Episode 314 Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create Thumbnail Files Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create LD File Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create HD File Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create WEBM File Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create MP3 File Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create SD File Begin:  Tue Sep 24 20:49:01 EDT 2013
  Create Thumbnail Files Complete:  Tue Sep 24 20:49:57 EDT 2013 (0d 0h 0m 56s)
  Create MP3 File Complete:  Tue Sep 24 20:56:55 EDT 2013 (0d 0h 7m 54s)
  Create LD File Complete:  Tue Sep 24 21:37:23 EDT 2013 (0d 0h 48m 22s)
  Create WEBM File Complete:  Tue Sep 24 22:39:23 EDT 2013 (0d 1h 50m 22s)
  Create SD File Complete:  Tue Sep 24 22:40:07 EDT 2013 (0d 1h 51m 6s)
  Create HD File Complete:  Tue Sep 24 23:20:05 EDT 2013 (0d 2h 31m 4s)
  Move Master File Begin:  Tue Sep 24 23:20:05 EDT 2013
  Create Master File Complete and Finish Job:  Tue Sep 24 23:20:05 EDT 2013 (0d 2h 31m 4s)

It was less than 8 minutes after I initiated the transcoder that the MP3 RSS feeds received the new episode.  Just a little more than 48 minutes after initiating the transcoder, the Low Definition (LD) file completed.  The show went up on the web site almost immediately after that (the files first get sync’d to our CDN and then added to the database, automatically).

All files (MP3, LD, SD, HD and WEBM) were complete in just 2 hours 31 minutes 4 seconds, including all distribution, even cross-uploading to Blip.TV (also automated now).

From 17 hours to only 2.5 hours.  This thing is incredible.

And that means, on average, we’ll be able to transcode nearly 10 episodes per day, almost double the turnaround of our first week.  The job which was estimated to take 72 days on our main server alone has been cut to only a day or two longer than one month.  Just one month from now, all back episodes, six years’ worth of Category5 TV, will be transcoded.
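For the curious, the back-of-envelope math (assuming the transcoders hold last night’s pace around the clock):

  24 h/day ÷ ~2.5 h/episode ≈ 9.6 episodes/day
  314 episodes ÷ 9.6 episodes/day ≈ 33 days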

Why am I so excited about this “transcoder” thing?

There’s something I’ve been really excited about the past little while, and some may not understand why.

It’s the new Category5 Transcoders.

Transcoding is the direct analog-to-analog or digital-to-digital conversion of one encoding to another, such as for movie data files or audio files. This is usually done in cases where a target device (or workflow) does not support the format or has limited storage capacity that mandates a reduced file size, or to convert incompatible or obsolete data to a better-supported or modern format. [Wikipedia]

Here is what I wanted to achieve in building a custom transcoding platform for Category5:

  1. Become HTML5 video compliant.
  2. Provide screaming fast file delivery via RSS or direct download.
  3. Provide instant video loading in browser embeds, with instant playback when seeking to specific points in the timeline.
  4. Provide Flash fallback for users with terrible, terrible systems.
  5. Ensure our show is accessible across all devices, all platforms, and in all nations.
  6. Make back-episodes available, even ones which are no longer available through any other means.
  7. Reduce the file size of each version of each episode in order to keep costs down for us as well as improve performance for our viewers.
  8. Ensure our video may be distributed by web site embeds, popup windows, RSS feed aggregators, iTunes, Miro Internet TV, Roku, and more.
  9. Ensure our videos are compatible with current monetization platforms such as Google AdSense for Video.

In the past, we’ve been limited to third-party services from Blip and YouTube.  Both of these services are huge parts of what we do, but relying on them exclusively has had some issues:

  1. Both Blip and YouTube services are blocked in Mainland China, meaning our viewers there have trouble tuning in.
  2. Both services, in their default state, require manual labour in order to place episodes online in a clean way (e.g., including appropriate title, description and playlist integration).
  3. Blip does not monetize well.
  4. YouTube monetizes well on their site, but they restrict advertising on embeds (so if people watch the show through our site rather than directly on YouTube, we don’t get paid).

The process of transcoding the files and making them available to our viewers has been an onerous task since the get-go.  We grew so quickly during Season 1 that we didn’t really have the infrastructure to provide the massive amount of video that was to go out each month.  In one month in 2012, for example, we served nearly 125 Terabytes of video.

It takes me many hours each week just to make the files available to our viewers, and the new transcoder has been developed to cut that task down to only a few minutes, while simultaneously pumping out the video much, much faster.

I’ll try to explain how this happens in a mockup:

[Mockup: Old vs. New Transcoding Process]

The new transcoder not only does things faster: it does things simultaneously.

While transcoding the files for the RSS feeds, it has already placed a web-embedded copy of the show on our web site, in as little as 45 minutes.  Not only that, but once it’s all said and done, the transcoder server then automatically uploads the file to Blip.
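Under the hood, the pattern is simply launching every output format as its own background job against the same master file and collecting them as they finish.  A minimal sketch (the wrapper script names and filename are hypothetical; the post doesn’t name the encoder we use):

  #!/bin/bash
  # Kick off all per-format transcodes of the master file in parallel,
  # mirroring the log above where every job begins at the same second.
  SRC="episode-314-master.mov"
  transcode-hd    "$SRC" &   # hypothetical per-format wrapper scripts
  transcode-sd    "$SRC" &
  transcode-ld    "$SRC" &
  transcode-webm  "$SRC" &
  transcode-mp3   "$SRC" &
  make-thumbnails "$SRC" &
  wait  # each job finishes on its own; the slowest (HD) gates the total time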

The new transcoder consists of two servers at two different locations sharing the task itself, and then the files are distributed through two of our CDNs (one powered by Amazon, the other our own affordable solution based on the old “alt” feed model).

We have been working with the team at Flowplayer, who are soon to introduce a public transcoding and hosting / distribution service for content providers.  With this new relationship, we will be able to serve up ads in a friendly way to help offset distribution costs.  This also means we now have our own embed player, no longer relying on YouTube or Blip’s embedded player.

This means viewers in Mainland China can now watch Category5 directly through our main web site.  No more workarounds!

As long as we can offset the added expense of self-hosting video, this could lead to some great things.  I’ll be keeping an eye on it over the next while, and encourage you to submit your feedback.  I love the idea of Category5 finally being accessible to everyone, everywhere, and very quickly following each show.  I also love that my Tuesday nights will no longer be so arduously long.

Transcoding is a very difficult thing to explain, and the way we’re doing it is harder still, but to me, it’s exciting.  Just know that it means “everything is better than ever”, with fast video load time through our site, RSS feeds that are more than 10x faster than before, global access (even in Mainland China), and room to grow.

I’m currently running the system through countless tests, but the transcoders are live.  They will be working their way (automatically) through the back-episodes, so you’ll start to see the YouTube player disappearing from the site, replaced with our own player.  Eventually, all 312+ episodes will be available.

Thanks for growing with us!

– Robbie