Dec 2 ’13
The Cloud Is Ready for Your Data Warehouse, Are You?
The business world continues to evaluate and implement the cloud for some of its IT requirements. The concept of the cloud as a viable IT storage solution, as well as a way to cut costs, is gaining momentum. That momentum prompts a question: is the cloud the right place for a data warehouse?
This is an interesting question for many, and a problematic question for some. For most large IT organizations, the most often-cited response is, “We’re concerned about security, especially customer-sensitive or business-critical information.” Here we address that concern from a somewhat historical viewpoint, and also discuss the trends and upsides we believe will shape the answer in the coming years.
It’s quite clear that many of the issues and concerns about the cloud are abating. A Gartner research report that presented its IT predictions for 2012 and beyond stated, “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.”
Like many things, sometimes what we think is new (the cloud) isn’t really so new after all. In the ’70s, many companies couldn’t afford their own computers (mainframes). So they connected, often through dial-up, to these large systems in other data centers. It wasn’t called “the cloud”; it was called “time-sharing.” Important and critical information for companies was “out there” on someone else’s computers. From those early days on, securing shared and managed resources has been a priority for the providers of those resources.
Data storage in the cloud probably isn’t perfect for every application. There can be compelling reasons, sometimes contractual ones, why on-premise storage is an absolute requirement. There can also be instances where the latency or speed required for certain transactions demands a non-cloud solution. However, there are few data warehouse situations where blinding submillisecond speeds are required. When it comes to the business of data warehousing and analytics, there are many environments where the cloud could be used as a very cost-effective solution.
Security is an interesting discussion. While organizations may feel more secure having those hard disks and servers down the hall, the argument could also be made that putting that data in the hands of a company whose very existence depends on its ability to provide stable and secure environments is about as risk-free as you can get. Large cloud providers, with their powerful security teams, are probably more skilled at security and compliance than the handful of security professionals at your company.
Food for thought: how many of your company’s offices are set up like Ft. Knox with absolute physical security? Within those semi-secured buildings, how many Ethernet jacks are hidden under a desk, behind a door ... being used by anyone with physical access ... employee, contractor and data thief alike? Does network security know about every renegade router, every personal Wi-Fi access point? Can they physically stop the wireless signals from extending out into the parking lot or across the street, providing convenient access to anyone with the right skills? That ranges from highly unlikely to impossible. Now, how many people have access to your cloud provider’s facilities? Those facilities are secured much like military bases. And while you need to be extremely prudent about network intrusions in any scenario, cloud or on-premise, the fact is that 99 percent of on-premise systems today are far more exposed and vulnerable than the top cloud provider solutions will ever be.
Clouds on the Horizon
It’s unlikely that organizations with huge enterprise data warehouse investments will want to move their entire data warehouse and analytics platform to a cloud environment. We aren’t hearing or seeing a lot of buzz in that regard. However, even for large organizations, if there’s data that’s siloed or has variable demand, relocating it to the cloud might reduce costs.
In addition, as more and more medium-sized organizations hear and read about the benefits of Business Intelligence (BI), analytics systems and data discovery that large companies are using, there’s an interest bubble that’s floating downstream. Some of these companies don’t have a legacy data warehouse or the on-premise infrastructure to build and manage one. These organizations are perfect candidates for a cloud-based Data Warehouse-as-a-Service (DWaaS) approach. We’re seeing more and more interest in that market.
One primary issue that has held things back in the past is pricing. With a cloud-based platform that is scalable and elastic, with lower tool prices and improvements in BI and Extract, Transform, Load (ETL), we may be approaching a point where the benefits of large-scale data warehousing can reach down into smaller companies.
Amazon Web Services Redshift
One of the latest cloud offerings purpose-built for data warehousing comes from Amazon Web Services (AWS). Within the last year, AWS announced and released Amazon Redshift, a likely game-changing platform for the world of data warehousing. Amazon Redshift is a petabyte-scale data warehouse service.
In its simplest pricing form, Amazon Redshift has no upfront cost. Beginning users can provision a new instance with a few clicks and, in a matter of minutes, be up and running with 2 TB of space for 85 cents per hour. Later, as storage requirements and utilization grow, users can take advantage of reserved instance pricing, a longer-term price model that comes in at less than $1,000 per TB per year.
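To put those two price points side by side, the arithmetic can be sketched as follows. This is a simplified model, assuming a single 2 TB node running around the clock and treating the article’s “less than $1,000 per TB per year” reserved figure as an upper bound; actual reserved pricing combines an upfront payment with an hourly rate, and the function names here are hypothetical.

```python
# Simplified 2013-era Redshift pricing arithmetic.
# Assumption: one 2 TB node, billed per hour; reserved pricing treated as a flat
# per-TB-per-year ceiling rather than the real upfront-plus-hourly structure.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

ON_DEMAND_RATE_PER_HOUR = 0.85          # USD per hour for one node (from the article)
NODE_STORAGE_TB = 2.0                   # TB per node (from the article)
RESERVED_CEILING_PER_TB_YEAR = 1000.0   # "less than $1,000 per TB per year"


def on_demand_cost_per_tb_year(rate_per_hour: float, storage_tb: float) -> float:
    """Annual cost of running one node 24x7 on demand, normalized per TB."""
    return rate_per_hour * HOURS_PER_YEAR / storage_tb


def break_even_hours(reserved_per_tb_year: float, rate_per_hour: float,
                     storage_tb: float) -> float:
    """Hours per year of on-demand use that cost as much as the reserved ceiling."""
    reserved_per_node_year = reserved_per_tb_year * storage_tb
    return reserved_per_node_year / rate_per_hour


if __name__ == "__main__":
    per_tb = on_demand_cost_per_tb_year(ON_DEMAND_RATE_PER_HOUR, NODE_STORAGE_TB)
    hours = break_even_hours(RESERVED_CEILING_PER_TB_YEAR,
                             ON_DEMAND_RATE_PER_HOUR, NODE_STORAGE_TB)
    print(f"On-demand, 24x7: ${per_tb:,.0f} per TB per year")
    print(f"Reserved is cheaper above ~{hours:,.0f} hours/year "
          f"({hours / HOURS_PER_YEAR:.0%} of the year)")
```

Under these assumptions, an always-on node costs about $3,723 per TB per year on demand, and reserved pricing wins for any node that runs more than roughly 2,353 hours a year (about 27 percent of the time). That is why on-demand suits trials and proofs of concept, while reserved pricing suits steady production workloads.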
While the low cost and overall performance characteristics are compelling, Amazon Redshift’s key differentiator lies more in its scalability and elasticity. With traditional data warehouse appliances, you can take advantage only of the computing capacity you have purchased. Amazon Redshift changes the paradigm: you use as much computing power as you need, when and only when you need it. Customers are only now evaluating and understanding the full ramifications of that approach to pricing and elasticity. It will be interesting to watch how it disrupts traditional models of both pricing and deployment.
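The difference between owning capacity and renting it by the hour can be illustrated with a toy cost model. This is an idealized sketch, assuming the article’s $0.85 per node-hour rate, a hypothetical daily demand profile, and instantaneous resizing; in practice, resizing a cluster takes time, so real savings depend on how predictable the peaks are. The function names and the demand numbers are illustrative only.

```python
from typing import Sequence

# Idealized comparison: capacity sized for peak vs. capacity that tracks demand.
# Assumptions: $0.85 per node-hour (the article's on-demand figure) and
# instantaneous, free resizing, which real clusters do not offer.
NODE_RATE_PER_HOUR = 0.85


def fixed_capacity_cost(hourly_demand: Sequence[int],
                        rate: float = NODE_RATE_PER_HOUR) -> float:
    """Cost when enough nodes for the peak hour are paid for in every hour."""
    peak_nodes = max(hourly_demand)
    return peak_nodes * len(hourly_demand) * rate


def elastic_cost(hourly_demand: Sequence[int],
                 rate: float = NODE_RATE_PER_HOUR) -> float:
    """Cost when each hour pays only for the nodes that hour actually needs."""
    return sum(hourly_demand) * rate


# Hypothetical day: 2 nodes for 20 hours, 8 nodes during a 4-hour reporting peak.
demand = [2] * 20 + [8] * 4

if __name__ == "__main__":
    fixed = fixed_capacity_cost(demand)
    elastic = elastic_cost(demand)
    print(f"Sized for peak: ${fixed:.2f} per day")
    print(f"Elastic:        ${elastic:.2f} per day ({elastic / fixed:.1%} of fixed)")
```

With this made-up profile, paying only for the node-hours used costs 37.5 percent of provisioning for the peak all day. The gap widens as peaks become shorter or sharper, which is the economic argument behind elasticity.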
Some of the most commonly touted benefits of a cloud-based infrastructure, and of Amazon Redshift in particular, include:
• Cost savings. The potential to save substantial costs over the years is a powerful argument, and the absence of upfront costs can sometimes be too good to ignore. Couple that with the flexibility of on-demand and metered pricing, and savings can quickly balloon.
• Capital expense considerations. Companies can use large databases without going through a time-consuming procurement process to obtain hardware and software, which would be a capital expense. Some organizations are becoming proponents of the pay-as-you-go Infrastructure-as-a-Service (IaaS) concept.
• Rapid deployment. Quick provisioning and reduced pricing let you focus on developing, testing and delivering even small or “proof of concept” BI and analytics projects whose strong ROI can, in turn, fuel new, focused projects.
• Enterprise-ready. Amazon Redshift appears to be capable of the speed and performance enterprise users need. Peak workload management is another consideration: the ability to scale up lets you deal economically with unplanned and unforeseen demand. Amazon Redshift can scale to enormous capacity, into the petabyte range.
There are other considerations that aren’t discussed as often. One interesting benefit is data monetization, which simply means turning the business data you already spend effort maintaining into a profit center.
Cloud-based data, when properly structured and protected, is accessible via the internet. That isn’t earthshaking by itself, but the simple concept offers tremendous opportunities for collaboration and monetization. How valuable would it be if your business partners had access to some of your information? How valuable would it be if you had access to data from your partners or suppliers? What if you had easy and inexpensive access to additional third-party data for demographics, climate information and so on? When companies partner to share information, new patterns of revenue, sales campaign effectiveness, market reach and so on can be identified, and all partners benefit.
The concept of cloud-based data warehousing solutions is intriguing. It comes with some concerns, but every organization differs in how it approaches common cloud-based issues and the addition of new projects. The concept is also garnering interest among smaller organizations, where cost is a bigger concern and existing infrastructure won’t let them do what they want and need to do in BI and analytics. To take advantage of a cloud platform for data warehousing, however, some organizations need to let go of pre-existing biases. People have built their careers around creating in-house infrastructure, and their concerns and pushback, however unwarranted, will be part of the organizational culture.
Just remember the Gartner prediction cited earlier: “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.” Maybe it’s time to jump in the pool, or at least dip your toe in.