ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 5:25 pm
by Dennis P
There have been a number of discussions internally about the two outages this week. Needless to say, we are very disappointed in our handling of the situation. Although there are some external factors we don't control, there are many that we do, and we clearly have room to improve. We've listened to all the feedback that the community has been kind enough to share and I want to take a moment to go over some of the steps that we are taking to improve communication about major outages and how we plan to mitigate/avoid them in the future.

Communication
  • We have closed a loophole that prevented the product marketing team from being immediately notified about major outages during off-hours. This prevented us from responding in the forums in a timely manner this morning.
  • We are adding a mechanism for us to display outage notifications on our homepage and Lounge login page. The notification will be displayed at the top of the homepage and link to our status in the forums. This should be live tomorrow.
  • We have created an official ooma_status twitter feed. You can now find us at http://twitter.com/ooma_status/. This gives you RSS and SMS notification options as well.
  • We will send outage notification e-mails to customers who are subscribed to our marketing communication list. If you are not subscribed, you can do so here: http://www.ooma.com/contact/. Be sure to click the "I currently own an ooma system" checkbox. This is the only opt-in list we can use right now to blast tens of thousands of e-mails without getting blacklisted. In the future, we will consider offering separate opt-ins for outage and marketing e-mails.
Mitigation/Prevention
  • We are adding a second upstream Internet provider. We have hardware on order and hope to have it live next week.
  • We are doubling the capacity of our provisioning systems so we can accelerate the rate of recovery. That should be live by the end of the week.
  • We have found two issues that prevented the ooma Hub from automatically recovering during the last two outages. The fix is being tested as we speak and will remove the need for a manual power cycle.
  • We are adjusting our back-off and retry mechanisms to prevent the Hubs from placing undue load on our system during outage recovery. That should be complete by the end of the week.
  • Longer term, we are planning to add a second data center in the Midwest/East Coast.
No doubt there will always be room for improvement, and we will continue to look to the community for feedback.
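The post does not say how the Hub's back-off and retry logic will actually work. Purely as an illustration of the general idea, here is a minimal Python sketch of one common approach, capped exponential backoff with random ("full") jitter. The function `reconnect`, the callback `try_register`, and the default values are hypothetical and are not ooma's code.

```python
import random
import time
from typing import Callable, Optional


def reconnect(
    try_register: Callable[[], bool],  # hypothetical: returns True once registration succeeds
    max_attempts: int = 10,            # hypothetical default
    base: float = 1.0,                 # first back-off ceiling, in seconds (assumed)
    cap: float = 300.0,                # longest ceiling, in seconds (assumed)
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry try_register() with capped exponential backoff and full jitter.

    Returns the number of attempts it took to succeed, or None if every
    attempt failed.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        if try_register():
            return attempt + 1
        if attempt < max_attempts - 1:
            # The ceiling doubles after each failure (1 s, 2 s, 4 s, ...) up to `cap`.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling] so Hubs that all
            # lost service at the same moment do not retry in lockstep.
            sleep(rng.uniform(0, ceiling))
    return None
```

The jitter is the part that matters during recovery: without it, every Hub that dropped at the same moment would retry at the same moments too, hitting the servers in synchronized waves.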

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 5:36 pm
by niknak
Thank you for your response. It sounds like OOMA is taking very positive steps to provide uninterrupted service.

PS: If you decide to construct an East Coast data center in the metro NY/NJ area, I'd apply for a job!

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 5:44 pm
by trim81
Dennis:

I suggest Ooma open the Twitter feed to public commentary, so that all of us can post updates as needed (i.e., as soon as an outage strikes).

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 5:59 pm
by number9
This sounds like an excellent plan Dennis!! Very good news!!! Thanks for the update.

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 6:05 pm
by dlong
Awesome. As they say, turn your scars into stars. The current outages were blessings in disguise and real-world tests in preparation for a smoother future. Glad to see this company learning from this and now getting all its ducks in a row, so-to-speakish!

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 6:11 pm
by tjnamtiw
Outstanding, professional approach to the problems. In my 50 years in business, I've rarely seen such a positive response.

Congrats from an ooma 'believer'

Tom

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 6:31 pm
by tommies
This is good news. Thanks, Dennis.

Regarding this morning's outage, I noticed a difference from the one on Monday.

On 4/13, my hub status said ooma core: connecting, please wait . . .
This morning, my hub status said ooma core: registering, please wait ...

It seems to me that the servers were overloaded by too many hubs trying to establish their tunnels at once.

I don't know what the engineers would do, but I have an idea regarding this morning's situation. The hubs could be divided into, say, 10 groups by the last digit of our phone numbers, and each group could then be given an exclusive time window in which to try. Something like the way the Fed sent out stimulus checks last year.
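The grouping idea above can be sketched in a few lines of Python. This is only an illustration of the suggestion, not anything ooma has said it will do; `recovery_window`, `may_register`, and the 10-minute window length are hypothetical choices.

```python
from typing import Tuple


def recovery_window(phone_number: str, window_seconds: int = 600) -> Tuple[int, int]:
    """Return (start, end), in seconds after service is restored, during which
    the Hub for this phone number may register. Hypothetical helper."""
    digits = [c for c in phone_number if c in "0123456789"]
    if not digits:
        raise ValueError("phone number contains no digits")
    group = int(digits[-1])  # last digit picks one of 10 groups, 0-9
    start = group * window_seconds
    return start, start + window_seconds


def may_register(phone_number: str, seconds_since_restore: float,
                 window_seconds: int = 600) -> bool:
    """True if this Hub's exclusive window is open right now."""
    start, end = recovery_window(phone_number, window_seconds)
    return start <= seconds_since_restore < end
```

For example, a number ending in 3 would get the window from 1800 to 2400 seconds after restoration. The trade-off is that, with 10-minute windows, customers whose numbers end in 9 would wait up to 100 minutes. The scheme also assumes last digits are spread roughly evenly across customers.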

It's just my thought, and I'd be glad if it helps.

Thank you again.

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 7:10 pm
by parmenides
Nice response. Thank you.

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 7:27 pm
by dank
All this actually forced me to register on the forums after months of lurking just to say...thank you.

The cable company never listened. That's why I'm here.

Re: ooma Outages - Corrective action plan

Posted: Wed Apr 15, 2009 8:16 pm
by daet
I'm pleased that Ooma has taken such a proactive stand on the outage problem. I really like the approaches you have outlined, and thank you for being upfront about the issues involved.

DG