Tuning CBC.ca with Cache Optimization

My colleague Blake Crosby has just posted a presentation about the performance and scalability tuning we do for CBC.ca.

We’d originally prepared this for the 2010 Akamai Global Customer Conference, but it wasn’t accepted there. We are now making the information publicly available in the hopes that it will help other high-volume news sites optimize their content delivery.

Getting IPv6 working for World IPv6 Day

World IPv6 Day is tomorrow, June 8 — which also happens to be my birthday. I took it as a personal challenge to see if I could get IPv6 working at home, and to report back on how difficult it was. The answer: Extraordinarily difficult and beyond the reach of the average consumer.

Setting aside the fact that no major ISP (aside from Comcast, perhaps) provides native IPv6 service to the home, one is forced to use a tunnel broker, like SiXXS or Hurricane Electric. These organizations will let you establish an IPv6 tunnel between two IPv4 endpoints (i.e. their server and your router), and they will also assign and route you an IPv6 /64 subnet. The rest, getting all the moving parts up and running, is up to you.

A content producer’s take on Usage-Based Billing

The Canadian Radio-television and Telecommunications Commission (CRTC) recently issued a decision on usage-based billing, and I’d like to comment from the perspective of a large-scale Internet video supplier. (Insert the usual disclaimer about my opinions not representing my employer’s.)

As many readers know, I work for the Canadian Broadcasting Corporation in digital media operations. It’s extremely important for our customers, Canadian taxpayers, to have cheap, unmetered bandwidth so that they can watch our programming online. “Online” means not only the content that users can stream directly from our website using the CBC player, but also the content we send to our channel partners: iTunes, Netflix, YouTube, and so on.

The adoption of usage-based billing across the board will drastically affect our ability to reach consumers over the Internet. It doesn’t take very long to go through 25 gigabytes of streaming data in a month. For Bell and other incumbent carriers to characterize anyone who uses over 25 GB/mo as a “bandwidth hog” grossly misstates the available capacity on the Internet today. Otherwise, why should it be possible for a commercial entity to purchase unlimited bandwidth ADSL service from Bell using essentially the same technology as home DSL, but without being metered?

I could continue, but let me instead quote some folks who have said it better than I could: Netflix. Here’s an interesting excerpt from Netflix’s investor relations website; specifically, their Q4 Letter To Shareholders (PDF). Obviously, I’m not speaking for CBC when I say this, but I think the comments here fairly represent the challenges that we, as an “Internet video supplier”, face under a usage-based billing regime.

Delivering Internet video in scale creates costs for both Netflix and for ISPs. We think the cost sharing between Internet video suppliers and ISPs should be that we have to haul the bits to the various regional front-doors that the ISPs operate, and that they then carry the bits the last mile to the consumer who has requested them, with each side paying its own costs. This open, regional, no-charges, interchange model is something for which we are advocating. Today, some ISPs charge us, or our CDN partners, to let in the bits their customers have requested from us, and we think this is inappropriate. As long as we pay for getting the bits to the regional interchanges of the ISP’s choosing, we don’t think they should be able to use their exclusive control of their residential customers to force us to pay them to let in the data their customers’ desire. Their customers already pay them to deliver the bits on their network, and requiring us to pay even though we deliver the bits to their network is an inappropriate reflection of their last mile exclusive control of their residential customers. Conversely, this open, regional, no-charges model should disallow content providers like Netflix and ESPN3 from shutting off certain ISPs unless those ISPs pay the content provider. Hopefully, we can get broad voluntary agreement on this open, regional, no-charges, interchange model. Some ISPs already operate by this open, regional, no-charges, interchange model, but without any commitment to maintain it going forward.

and

An independent negative issue for Netflix and other Internet video providers would be a move by wired ISPs to shift consumers to pay-per-gigabyte models instead of the current unlimited-up-to-a-large-cap approach. We hope this doesn’t happen, and will do what we can to promote the unlimited-up-to-a-large-cap model. Wired ISPs have large fixed costs of building and maintaining their last mile network of residential cable and fiber. The ISPs’ costs, however, to deliver a marginal gigabyte, which is about an hour of viewing, from one of our regional interchange points over their last mile wired network to the consumer is less than a penny, and falling, so there is no reason that pay-per-gigabyte is economically necessary. Moreover, at $1 per gigabyte over wired networks, it would be grossly overpriced.

I’ll close by giving you a sense of how outrageous a $1/GB charge is.

CBC pays pennies per gigabyte to our CDN to deliver content to the ISP’s front door. Some portion of that is the CDN’s profit, and yet they are still able to meet the marginal cost obligations of expanding their network. In fact, by using a CDN, we are paying a premium to the actual cost of the delivery of the bits, for the benefit of leveraging the CDN’s robust infrastructure, ability to scale, and many points of presence.

From a network engineering perspective, there really is no difference between a CDN and an ISP; in fact, the CDN transfers far more data per year across a far more complex worldwide data network. If our CDN can do it for such a low cost, why can’t Bell? I can only arrive at the same conclusion as Netflix: that Bell and other incumbent “last mile” providers are using their monopolistic ownership of those connections to justify outlandish charges to the customer.
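To put rough numbers on the argument, here is a minimal sketch of the arithmetic using only figures quoted above: the 25 GB threshold, Netflix's estimate that a gigabyte is about an hour of viewing, and Netflix's claim that an ISP's marginal cost per gigabyte is under a penny. The 30-day month and the function names are my own assumptions, and one cent is used as an upper bound on the marginal cost, so the markup it prints is a floor rather than an exact figure.

```python
# Figures taken from the post and from the Netflix letter quoted in it.
CAP_GB = 25                 # monthly usage Bell associates with a "bandwidth hog"
GB_PER_VIEWING_HOUR = 1.0   # Netflix: a gigabyte is "about an hour of viewing"
ISP_MARGINAL_COST = 0.01    # dollars per GB; "less than a penny", so an upper bound
PER_GB_PRICE = 1.00         # the $1/GB charge under discussion

DAYS_PER_MONTH = 30         # assumption, only used for a rough per-day figure


def viewing_hours(cap_gb: float) -> float:
    """Hours of streaming video that fit under a monthly cap."""
    return cap_gb / GB_PER_VIEWING_HOUR


def minimum_markup(price: float, cost_upper_bound: float) -> float:
    """Smallest possible ratio of retail price to marginal delivery cost."""
    return price / cost_upper_bound


hours = viewing_hours(CAP_GB)
minutes_per_day = hours * 60 / DAYS_PER_MONTH
print(f"{CAP_GB} GB is about {hours:.0f} hours of video a month, "
      f"or roughly {minutes_per_day:.0f} minutes a day")
print(f"$1/GB is at least {minimum_markup(PER_GB_PRICE, ISP_MARGINAL_COST):.0f} times "
      "the marginal cost of delivery")
```

Because the true marginal cost is below the one-cent bound and, per Netflix, falling, the real multiple can only be higher than what this prints.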

What Twitter Could Learn from the Telephone System

I used to read a magazine called 2600, which was billed as a “Hacker’s Quarterly”. The title refers to the audio frequency, in hertz, used as a control tone in early analog telephone systems. Enterprising hackers discovered that a free promotional whistle in boxes of Cap’n Crunch cereal could be used to generate this tone. A whole class of phone hacking, or “phreaking”, was born. (Trivia: Apple co-founders Steve Jobs and Steve Wozniak were phreakers; Woz’s “blue box”, a multi-frequency tone generator, is preserved at the Computer History Museum in Mountain View, CA.)

This is an example of a system using “in-band signalling”. Both the control and data signals (audio) on early phone systems were transmitted on the same channel, thereby making the system open to compromise. Today’s modern phone networks have a completely isolated signalling system known as SS7.

Twitter is also a system with in-band signalling. I’ve always been bothered by the fact that Twitter commands (DM, FOLLOW, LEAVE, etc.) are transmitted by the user as part of the data signal (your tweet). This leads to all kinds of mistakes. For example, users have publicly tweeted when they thought they had sent a DM, because they forgot to prefix the message with “DM ”. Other users accidentally expose their intent to FOLLOW or LEAVE users by misspelling commands (e.g. “FOLOW”).

The “@” reply prefix is also problematic. Tweets beginning with “@[username]” are shown only to users who follow both the sender and the receiver, not to all of the sender’s followers. If the sender wants a wider distribution, hacks like putting another character in front of the “@” (for example, “.@[username]”) are used.

In-band signalling on Twitter clearly originated from the fact that it was intended to be used via SMS. Traditional mobile devices have no way to send signalling data out-of-band. What you see in 140 characters is what you get. As Twitter migrates to the desktop (or at least to rich mobile devices like the iPhone), we begin to see Twitter addressing this long-standing flaw. For example, the retweet identifier (RT) is no longer considered as part of the actual tweet, as long as one is using the Twitter API. Other metadata like geolocation and user agent are already transmitted as signalling data through the API.

Eventually, I believe that even the remaining in-band commands will transition out of the data stream. It’s only a matter of time before a celebrity’s mistweet makes the news and forces Twitter to clearly separate control from data. (Could you imagine something like President Obama accidentally tweeting “DM tonyhayward You are a first-class douchebag”?) Fortunately, they already have an API on which to build the control system. Shall we call it Twittering System Seven?
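The difference between the two designs can be sketched in a few lines. This is a toy model, not Twitter's actual parser or API: the `Action` type and both function names are hypothetical, and the command set is just the three commands mentioned above. The in-band version has to guess the command from the first word of the message, so a typo silently turns a private command into a public tweet; the out-of-band version receives the command in its own field and can reject a bad one outright.

```python
from dataclasses import dataclass
from typing import Optional

COMMANDS = {"DM", "FOLLOW", "LEAVE"}  # the commands named in the post


@dataclass
class Action:
    kind: str               # "DM", "FOLLOW", "LEAVE", or "TWEET"
    target: Optional[str]   # username the command applies to, if any
    text: str               # remaining message text
    public: bool            # True if the text ends up on the public timeline


def interpret_in_band(raw: str) -> Action:
    """In-band: the command is inferred from the first word of the message."""
    parts = raw.split(maxsplit=2)
    if len(parts) >= 2 and parts[0].upper() in COMMANDS:
        body = parts[2] if len(parts) > 2 else ""
        return Action(parts[0].upper(), parts[1], body, public=False)
    # Anything unrecognised, including a misspelled command, is published.
    return Action("TWEET", None, raw, public=True)


def interpret_out_of_band(kind: str, target: Optional[str], text: str) -> Action:
    """Out-of-band: the client states the command in a separate field."""
    if kind != "TWEET" and kind not in COMMANDS:
        raise ValueError(f"unknown command: {kind}")
    return Action(kind, target, text, public=(kind == "TWEET"))


print(interpret_in_band("DM alice lunch at noon?"))   # handled as a private DM
print(interpret_in_band("alice lunch at noon?"))      # forgot "DM": goes public
print(interpret_in_band("FOLOW bob"))                 # typo: goes public
```

With the out-of-band version, `interpret_out_of_band("FOLOW", "bob", "")` raises an error instead of publishing anything, which is the separation of control from data argued for here.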

Amazon S3 backups: a proof-of-concept

Recently, I decided to experiment with Amazon Web Services’ Simple Storage Service (S3) for online backups. This was prompted by my DLT7000 tape drive dying; when I discovered that the repair would cost nearly $400, I decided to do a cost-benefit analysis using the S3 platform as a proof-of-concept before sending the drive off to the shop. Today’s post will review the results of that analysis.

all I want for Christmas are some custom Apache modules

Operating an Apache httpd-based origin in conjunction with a CDN presents some interesting challenges and opportunities. For example, one can actually eliminate a lot of sophisticated cache control directives by trusting that the CDN will Do The Right Thing ™ when communicating with client browsers. Furthermore, implementation of a few judicious Apache modules and mod_expires directives can go a long way towards reducing origin bandwidth and load on the webservers.

However, dynamically-generated web pages (including those generated via SSI) can result in unnecessary cache evictions due to the inability to determine last modification time. In this article I’ll explore exactly why SSIs are so irritating from a CDN-interaction perspective and why all I want for Christmas is a CDN-aware mod_include and/or mod_expires, as per the title of this post.
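To make the last-modification problem concrete, here is a minimal sketch of one way a modification time could be derived for an SSI page: take the newest modification time among the page and every fragment it includes, and format that as an HTTP date. This is my own illustration, not how mod_include behaves and not necessarily what the full article proposes; the function names are hypothetical, and the sketch assumes the list of included files is already known.

```python
import os
from email.utils import formatdate
from typing import Iterable


def effective_mtime(page: str, includes: Iterable[str]) -> float:
    """Newest modification time among the page and the fragments it includes."""
    return max(os.path.getmtime(path) for path in [page, *includes])


def last_modified_header(page: str, includes: Iterable[str]) -> str:
    """Format the newest modification time as an HTTP-date (always in GMT)."""
    return formatdate(effective_mtime(page, includes), usegmt=True)
```

A call such as `last_modified_header("index.shtml", ["header.html", "footer.html"])` (hypothetical file names) would give a value suitable for a Last-Modified header, which a CDN could then use to revalidate the page rather than evict it.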

performance improvements of changing Apache MPM from prefork to worker

We at CBC.ca have made major improvements in our web platform over the last two years. When I first returned to CBC in September 2006, we were still running Apache 1.3.29 on SuSE Linux Enterprise Server. Since then, we’ve upgraded first to Apache 2.0.59 (still on SuSE) and, with the migration to Red Hat Enterprise Linux in July of this year, to Apache 2.2.8. (You can see the evolution of our web platform over at Netcraft.)

Two days after the Canadian Federal Election, we implemented the next major upgrade of that platform: converting from the prefork MPM to the worker MPM. Since we monitor the performance of all our Apache servers using Cacti, I can share some detailed information about the performance improvement that has resulted from this change.

Varnish HTTP accelerator nears 2.0 release

I’ve long been an advocate of origin HTTP caching and acceleration for large websites, something I alluded to in the post Performance Tuning and Optimization of High-Traffic Websites, which I wrote almost eleven months ago. In the early, heady days of the World Wide Web, many vendors like CacheFlow (later BlueCoat) and Nortel made HTTP caching appliances, but there are almost no such vendors in the marketplace now. I still believe there is a sound technical reason for an origin website architecture with HTTP accelerators deployed in front of it, and I’m happy to see that one recent entrant into this space, the Varnish HTTP Accelerator, is nearing a stable 2.0 release. In this post, I’ll elaborate on why I think HTTP caching solutions went the way of the dodo, why I think they should come back, and use the feature set and stated goals of the Varnish project as evidence.

resolving the conflict between new media broadcasters and corporate IT

I returned late last week from attending Akamai Technologies’ first Global Customer Conference in Boston. The conference was intended to bring together Akamai’s major customers to share knowledge and information about current and future Akamai products, but I think I derived more insight from my conversations with other media & entertainment customers than from the program material. I’ll explain why.

QuickTime caching of Windows Media payloads

A while ago I wrote a post about how the Windows Media video experience is sub-optimal on non-Windows computers (in particular, Macintoshes) and why I think this will trigger a run towards Flash on-demand and eventually Flash live. Here’s a concrete example: over the last six months to a year, and perhaps longer, we’ve been dealing with a steady stream of user complaints that the nightly newscast of CBC’s The National (insert promotional tagline about “Canada’s most trusted news source, hosted by newly-announced Order of Canada member Peter Mansbridge”) is frequently “out of date”. While I haven’t totally nailed down why this might be the case, I do note that most complainants seem to be using QuickTime to play back the stream, with Flip4Mac (ugh) as the translation layer. I believe that with so many moving parts, something is inevitably going to go wrong.