ooma Outages - Corrective action plan
Posted: Wed Apr 15, 2009 5:25 pm
There have been a number of discussions internally about the two outages this week. Needless to say, we are very disappointed in our handling of the situation. Although there are some external factors we don't control, there are many that we do, and we clearly have room to improve. We've listened to all the feedback the community has been kind enough to share, and I want to take a moment to go over the steps we are taking to improve how we communicate about major outages, as well as how we plan to avoid or mitigate them in the future.
Communication
- We have closed a gap in our alerting that kept the product marketing team from being notified immediately about major outages during off-hours. That gap is why we were unable to respond in the forums in a timely manner this morning.
- We are adding a mechanism to display outage notifications on our homepage and Lounge login page. The notification will appear at the top of the homepage and link to our status post in the forums. This should be live tomorrow.
- We have created an official ooma_status Twitter feed. You can now find us at http://twitter.com/ooma_status/. Following the feed also gives you RSS and SMS notification options.
- We will send outage notification e-mails to customers who are subscribed to our marketing communication list. If you are not subscribed, you can sign up here: http://www.ooma.com/contact/. Be sure to check the "I currently own an ooma system" box. This is currently the only opt-in list we can use to send tens of thousands of e-mails without getting blacklisted. In the future, we will consider offering separate opt-ins for outage and marketing e-mails.
Mitigation
- We are adding a second upstream Internet provider. We have the hardware on order and hope to have it live next week.
- We are doubling the capacity of our provisioning systems so that we can recover from outages faster. That should be live by the end of the week.
- We have found two issues that prevented the ooma Hub from recovering automatically during the last two outages. The fix is being tested now and, once deployed, will remove the need to power cycle the Hub manually.
- We are adjusting our back-off and retry mechanisms to prevent the Hubs from placing undue load on our system during outage recovery. That should be complete by the end of the week.
- Longer term, we are planning to add a second data center in the Midwest/East Coast.
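The post does not describe how the Hub firmware's back-off and retry logic works. A common technique for this problem is capped exponential backoff with random ("full") jitter: each failed attempt roughly doubles the maximum wait, and the actual wait is drawn at random below that maximum, so thousands of devices do not reconnect in lockstep the moment a server comes back. The sketch below illustrates that general technique under this assumption; it is not ooma's code, and `connect`, `base`, `cap`, and `max_attempts` (with their default values) are hypothetical. It also shows the general shape of a device that keeps retrying on its own rather than waiting for a manual power cycle.

```python
import random
import time
from typing import Callable, Optional


def full_jitter_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Return a wait time drawn uniformly from [0, min(cap, base * 2**attempt)].

    `attempt` is 0-based. The exponent is clamped so very large attempt
    counts cannot overflow a float.
    """
    ceiling = min(cap, base * 2 ** min(attempt, 30))
    return rng.uniform(0.0, ceiling)


def reconnect_with_backoff(
    connect: Callable[[], bool],
    base: float = 1.0,
    cap: float = 300.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Call `connect` until it returns True or `max_attempts` calls have failed.

    Returns True on success, False if every attempt failed. Between failed
    attempts it waits a randomized, exponentially growing delay, so many
    devices retrying at once are spread out over time instead of all
    hitting the servers in the same instant.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        if connect():
            return True
        # No point waiting after the final failed attempt.
        if attempt + 1 < max_attempts:
            sleep(full_jitter_delay(attempt, base, cap, rng))
    return False


if __name__ == "__main__":
    # Simulated server (hypothetical) that rejects the first three attempts.
    outcomes = iter([False, False, False, True])
    waits = []
    ok = reconnect_with_backoff(lambda: next(outcomes), sleep=waits.append)
    print(ok, [round(w, 2) for w in waits])
```

Passing `sleep=waits.append` records the delays instead of actually sleeping, which makes the behavior easy to inspect. Full jitter trades a slightly less predictable wait for much better spreading of retries, and the cap keeps any single device from waiting unreasonably long once the service is healthy again.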