Leading analysts and organisations have begun recognising data warehouse automation as being key to running a truly data-driven business.
AI News caught up with Rob Mellor, GM & VP, EMEA at WhereScape, to discuss this industry shift.
AI News: Only earlier this year did Gartner really begin recognising data warehouse automation after publishing a paper on the subject. Is this indicative of a shift in how companies view automation?
Rob Mellor: At WhereScape, we feel the recent increase in activity from Gartner around data warehouse automation reflects an industry shift. Organisations are beginning to realise that automation is necessary if they are to be truly data-driven.
By using a tool to automate repetitive and mundane tasks such as hand-coding, developers can be more productive and focus on adding features specific to their unique business requirements. This means their business can react faster to BI trends.
This is obvious to companies that have been data-driven for some time and are enjoying the results. However, we are now seeing Data Automation tools cross the chasm into the mainstream, so Gartner has moved to inform those who are considering tools like WhereScape for the first time, and to help those already familiar with automation tools choose the best one for their needs.
AN: What is the difference between modern data warehouse automation and ETL (extract, transform, load) tools?
RM: ETL tools are typically server-based data integration solutions for moving and manipulating data from its sources to a target data warehouse. When ETL tools first emerged four decades ago, the servers that databases ran on did not have the computing power of today, so ETL solutions were developed to alleviate the data processing workload. They typically provided additional database and application connectivity, along with data manipulation functions that were limited in the database engines of the time.
Instead of using the older ETL method, some vendors today take an ELT approach. With ELT, data transformation happens in the target data warehouse rather than requiring a middle-tier ETL server. This approach takes advantage of today’s database engines that support massively parallel processing (MPP), and of the availability of MPP within cloud-based data platforms such as Snowflake, Amazon Redshift and Microsoft Azure SQL Data Warehouse.
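To make the distinction concrete, here is a minimal sketch of the two patterns. It is not taken from the interview and does not reflect any vendor's product: it uses Python's built-in sqlite3 module as a stand-in for a target warehouse, and the table names, sample rows and helper functions are all hypothetical. In the ETL path the clean-up happens in application code before loading; in the ELT path the raw rows are loaded first and the same clean-up runs as SQL inside the database.

```python
import sqlite3

# Hypothetical raw source rows: untrimmed amounts, inconsistent country codes.
RAW_ORDERS = [("a-1", " 19.99", "gb"), ("a-2", "5.00 ", "de"), ("a-3", "12.50", "GB")]


def etl(conn, rows):
    """ETL: transform in the application tier, then load the cleaned rows."""
    cleaned = [(oid, float(amount.strip()), country.upper())
               for oid, amount, country in rows]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows untouched, then transform with SQL in the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


def totals_by_country(conn, table):
    """Summarise a table so the two approaches can be compared."""
    query = (f"SELECT country, ROUND(SUM(amount), 2) FROM {table} "
             "GROUP BY country ORDER BY country")
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn, RAW_ORDERS)
elt(conn, RAW_ORDERS)
print(totals_by_country(conn, "orders_etl"))
print(totals_by_country(conn, "orders_elt"))
```

Both paths should produce the same summary. The difference lies in where the work runs: on an MPP platform, the SQL in the ELT path is the part the warehouse engine can parallelise.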
While ELT certainly represented a step forward in thinking compared to ETL, both types of data movement solutions still only cover a small portion of the data warehousing lifecycle. This means that organisations must rely on many disparate tools to support everything else involved in designing, developing, deploying, documenting and operating their data warehouses and other data infrastructure.
In comparison to the limited scope of ETL and ELT tools, data infrastructure automation encompasses the entire data warehousing lifecycle. From planning, data discovery and design through development, deployment, operations, change management and even documentation, automation unifies it all.
AN: What are the main factors driving the adoption of data warehouse automation?
RM: Given the broad reach of Data Automation tools across the data warehousing lifecycle, we hear an array of reasons from companies looking to adopt them. Here are some of the most common reasons.
The small to medium-sized businesses we speak to typically look for automation tools that allow them to standardise their current data warehouse and scale the business effectively. They often start with custom data warehouse solutions that only one individual fully understands, which makes it hard to democratise the use of data among colleagues, especially non-technical staff.
WhereScape offers a templated, best-practice approach for the design and implementation of effective data warehouse solutions, enabling more robust architectures to be built faster. All actions taken are fully documented with full data lineage, which saves many hours of repetitive work. Automation then handles day-to-day operations and change management, so these do not take up a large portion of developers’ time.
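As a rough illustration of the general idea of template-driven generation, and not a depiction of how WhereScape itself works, the sketch below derives both a table definition and its lineage documentation from one metadata description. Every name in it (Column, render_ddl, render_lineage, dim_customer and the source paths) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    source: str  # hypothetical description of where the value comes from


def render_ddl(table, columns):
    """Fill a fixed CREATE TABLE template from the column metadata."""
    body = ",\n".join(f"    {c.name} {c.sql_type}" for c in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


def render_lineage(table, columns):
    """Produce one lineage line per column from the same metadata."""
    return [f"{table}.{c.name} <- {c.source}" for c in columns]


columns = [
    Column("customer_id", "INTEGER", "crm.customers.id"),
    Column("full_name", "TEXT", "crm.customers.first_name, crm.customers.last_name"),
]

print(render_ddl("dim_customer", columns))
for line in render_lineage("dim_customer", columns):
    print(line)
```

Because the code and the documentation come from the same metadata, a change to a column definition updates both, which is one way repetitive documentation work can be avoided.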
Larger companies want all of the above, but they often look to WhereScape when embarking on a data warehouse modernisation project involving a switch in architecture or database. They want an automation tool to handle the complexity and ensure the new architecture works the first time.
Two big examples we have seen recently are a switch to Data Vault modelling, or a cloud migration project. These complex, large-scale projects can be prone to human error. WhereScape has specific tools and enablement packs for these projects, so while it may be the first time the company has implemented a project like this, the automation tool is fine-tuned in accordance with many previous similar projects. The benefit of this experience ensures the implementation works as it should the first time and so can save many months of work.
An overarching reason to adopt Data Automation tools is a desire to increase developer productivity, handing accurate business insight to those who need it, faster. This increases trust in IT and means the business can be more ambitious in its data-driven projects.
Automation tools also enable agile principles by increasing communication between IT and the business. For example, using a drag-and-drop GUI to design data infrastructure means that visual prototypes can be produced in minutes, ensuring all requirements have been understood before the build takes place.
Typically, we find data teams look for an automation tool to solve a specific problem, then expand its usage to other areas once they see a leap in productivity and understand what this can mean for the future of their organisation.
(Photo by Tim Mossholder on Unsplash)
WhereScape sponsored this year’s AI & Big Data Expo and shared their invaluable insights during the event. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.