A small agency provides online presence and social media management services to high-end customers: hedge funds, law firms, oil tycoons. There are many players in the field, from low-cost freelancers to established brand names; the agency is able to survive in this crowded market because it offers top-notch security and 5-star customer service.
Most of the computing infrastructure is hosted locally, in the basement of the agency's headquarters. It's an aging data center, with a capricious diesel generator, multiple unreliable high-speed internet connections, and overloaded server racks. The infrastructure is a major liability, with repeated outages causing the loss of customer accounts.
A thorough analysis indicates that to be truly resilient, the data center would require significant upgrades; a secondary site for disaster recovery would have to be built, and systems administrators would have to be hired. None of those are part of the agency's core competencies, so a decision is made to migrate the entire infrastructure to the cloud.
The agency's business doesn't run on a commercial software platform, because none of the products that have been explored proved flexible enough to meet the needs of the demanding customers. The agency has instead opted for a highly customized approach, building new capabilities as needed. The result is a complex ecosystem with a constellation of bespoke systems partially integrated with shared components.
As the migration to cloud computing begins, the IT staff wants to rearchitect the entire infrastructure and make it cloud-friendly. They want to use microservices, an API gateway, event queues, containers. They want to go serverless and use native services.
A year goes by. Then two. The new cloud architecture looks good on paper, but the agency cannot do a stop-the-world migration, so the IT staff spends a lot of time in firefighting mode: repairing the old infrastructure, adding features to old systems, patching old databases. There's no end in sight and not much time is spent on the cloud project.
Things take a turn for the worse on a Tuesday. As they perform repairs on broken pipes, a city crew accidentally damages power lines down the street. The agency's building goes dark, including the data center in the basement where the diesel generator refuses to start. Employees scramble to install new generators and UPS units, but for hours the agency is dead in the water. As a result, two of the biggest accounts are lost.
Management can no longer afford delays in the cloud project. They know they need external help to move things along, so they reach out to us.
As we get introduced to the IT staff, we realize that while the agency's ecosystem is quite complex, they have a solid handle on things. The cloud architecture they designed is excellent, and every single member of the staff embodies the agency's core values of security and customer service. The target is therefore not the issue; they just need a better roadmap.
The least elegant way to leverage cloud computing is called a lift and shift: simply clone the local infrastructure and host it in the cloud. It's crude and typically not cost-effective. In their quest for a better infrastructure, the IT staff initially turned away from a lift and shift, and opted instead for something much, much worse: the "might as well" approach.
This type of approach works well when time and budget are unlimited, not when the local infrastructure is falling apart and eating up 40% of the IT staff's time.
We decide to implement a strategy loosely based on the four steps of the VisibleOps approach.
The agency's infrastructure is now rock-solid, and the IT staff can take things to the next level without our assistance. We helped the client reach this happy state much sooner than they initially expected, and morale across the organization improved in the process. The agency can now focus on delivering value to its customers.
As for the aging data center: with the exception of the diesel generator, everything has been left in place. It was initially kept as an emergency exit plan in case cloud computing proved unreliable, but has since been "upgraded" to a gaming infrastructure for the employees to enjoy at lunchtime.