When building modern data infrastructure, organizations must weigh several factors.
Data ingestion and storage. Factors related to data ingestion and storage include:
Data integration. Organizations have many options for integrating and replicating data across databases and applications, including extract, load, transform (ELT) and extract, transform, load (ETL); change data capture (CDC); data virtualization; integration platform as a service (iPaaS); and application programming interfaces (APIs). Most organizations use several of these technologies in parallel, depending on the integration need. For example, CDC may be used to build real-time, harmonized repositories of high-volume, structured data across cloud and hybrid environments (e.g., to support real-time reporting), whereas APIs are more typically used for low-volume, high-frequency application-to-application links (e.g., to support daily operations).
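Change data capture is the least self-explanatory of these options, so here is a minimal sketch under simplifying assumptions. It detects changes by diffing two in-memory snapshots of a table, whereas production CDC tools typically read the source database's transaction log. The `ChangeEvent` type and the `capture_changes` and `apply_changes` helpers are hypothetical and do not mirror any particular product's event format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Hypothetical change-event shape; real CDC tools define their own formats.
@dataclass(frozen=True)
class ChangeEvent:
    op: str                             # "insert", "update", or "delete"
    key: int                            # primary key of the affected row
    row: dict[str, Any] | None = None   # new row image (None for deletes)


def capture_changes(before: dict[int, dict], after: dict[int, dict]) -> list[ChangeEvent]:
    """Derive change events by diffing two table snapshots keyed by primary key.

    Log-based CDC avoids full comparisons like this; the diff only
    illustrates the event stream that a CDC pipeline produces.
    """
    events = []
    for key, row in after.items():
        if key not in before:
            events.append(ChangeEvent("insert", key, row))
        elif before[key] != row:
            events.append(ChangeEvent("update", key, row))
    for key in before:
        if key not in after:
            events.append(ChangeEvent("delete", key))
    return events


def apply_changes(replica: dict[int, dict], events: list[ChangeEvent]) -> None:
    """Apply change events, in order, to a target replica."""
    for event in events:
        if event.op == "delete":
            replica.pop(event.key, None)
        else:
            replica[event.key] = dict(event.row)


source_v1 = {1: {"name": "Ada", "tier": "gold"}, 2: {"name": "Lin", "tier": "silver"}}
source_v2 = {1: {"name": "Ada", "tier": "platinum"}, 3: {"name": "Sam", "tier": "bronze"}}

replica = {k: dict(v) for k, v in source_v1.items()}  # initial full load
events = capture_changes(source_v1, source_v2)         # incremental changes only
apply_changes(replica, events)

print([(e.op, e.key) for e in events])
print(replica == source_v2)
```

Only the changed rows travel to the replica, not the full table. This is why CDC suits keeping high-volume repositories current.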
Data storage. Two common forms of cloud data storage are data lakes and data warehouses. Data lakes store raw data in its native format, whether structured, semi-structured, or unstructured. Data warehouses are more organized: data is cleaned and stored in a common, structured format, which makes it easier to analyze. Companies such as Snowflake aim to make data more accessible to analysts, for example by automating much of the work of cleaning raw data and moving it into a warehouse.
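To make the lake/warehouse distinction concrete, the following sketch keeps a few hypothetical raw records in mixed formats (the "lake"), normalizes them into one fixed schema, and loads them into an in-memory SQLite table standing in for a warehouse. SQLite and the `normalize` helper are illustrative stand-ins, not a description of how any particular cloud warehouse works.

```python
from __future__ import annotations

import json
import sqlite3

# "Data lake": hypothetical raw records kept as-is, in mixed formats.
raw_lake = [
    '{"order_id": 1, "amount": "19.99", "region": "EU"}',  # JSON, amount stored as text
    "2,5.50,us",                                            # CSV line, lowercase region
    '{"order_id": 3, "amount": 42, "region": "Us"}',        # JSON, inconsistent casing
]


def normalize(record: str) -> tuple[int, float, str]:
    """Parse one raw record into the common (order_id, amount, region) schema."""
    if record.lstrip().startswith("{"):
        data = json.loads(record)
        order_id, amount, region = data["order_id"], data["amount"], data["region"]
    else:
        order_id, amount, region = record.split(",")
    return int(order_id), float(amount), str(region).strip().upper()


# "Data warehouse": one table with a fixed schema; rows are normalized before writing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, region TEXT NOT NULL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", (normalize(r) for r in raw_lake))

# With every row in one format, analysis becomes a single query.
totals = conn.execute(
    "SELECT region, ROUND(SUM(amount), 2) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Querying the raw lake directly would require handling every format and casing quirk at read time. Normalizing once at load time moves that cost up front, which is the trade-off the warehouse model makes.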
Data exploration. Once the data is in a usable format, it can be analyzed. Many open-source libraries for analyzing data (e.g., pandas) and building machine learning (ML) models (e.g., TensorFlow) offer robust functionality and are backed by large communities, although they require programming skills. A growing number of paid enterprise offerings, including low-code and no-code tools, make data more accessible to business analysts. This data democratization empowers citizen data scientists, defined as people who “create or generate models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.”
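As a small illustration of exploratory analysis, the sketch below groups hypothetical order records by region and computes summary statistics using only Python's standard library. In pandas, a comparable summary would typically be a single call such as `df.groupby("region")["revenue"].agg(["count", "sum", "mean", "median"])`, assuming a DataFrame `df` with those columns.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample of cleaned order records: (region, revenue).
orders = [
    ("EU", 120.0), ("EU", 80.0), ("US", 200.0),
    ("US", 150.0), ("US", 250.0), ("APAC", 90.0),
]


def summarize_by(records):
    """Group (key, value) pairs and compute count, total, mean, and median per key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {
        key: {"count": len(vals), "total": sum(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in sorted(groups.items())
    }


summary = summarize_by(orders)
for region, stats in summary.items():
    print(region, stats)
```

The hand-written grouping shows what a library call does under the hood. It also shows why libraries such as pandas, or low-code tools built on similar operations, save analysts this boilerplate.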
Data visualization and output. Once the data has been formatted and analyzed, business intelligence (BI) tools can provide dashboards and reports that communicate findings to the broader company and make the insights actionable. Examples include Looker, Tableau, and Power BI. The data can also feed internal apps or third-party software (such as a digital experience platform used for customer personalization) to deliver value to customers.
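Dashboards are usually fed by a compact, pre-aggregated view of the analysis results rather than by raw records. The sketch below builds such a payload from hypothetical revenue and target figures and adds a simple status flag so that each figure points toward an action. The field names, thresholds, and JSON shape are assumptions for illustration, not the input format of Looker, Tableau, Power BI, or any other product.

```python
import json
from datetime import date

# Hypothetical analysis results and targets per region.
revenue_by_region = {"APAC": 90.0, "EU": 200.0, "US": 600.0}
target_by_region = {"APAC": 150.0, "EU": 180.0, "US": 550.0}


def build_dashboard_payload(actuals, targets, as_of):
    """Turn analysis results into KPI tiles a dashboard or internal app could render."""
    tiles = []
    for region in sorted(actuals):
        actual, target = actuals[region], targets[region]
        attainment = actual / target
        tiles.append({
            "region": region,
            "revenue": actual,
            "target": target,
            "attainment_pct": round(attainment * 100, 1),
            # A simple flag so the reader knows where to act.
            "status": "on_track" if attainment >= 1 else "needs_attention",
        })
    return {"as_of": as_of.isoformat(), "tiles": tiles}


payload = build_dashboard_payload(revenue_by_region, target_by_region, date(2024, 1, 31))
print(json.dumps(payload, indent=2))
```

Serializing to JSON keeps the payload tool-agnostic: the same structure could back a BI dashboard tile or be pushed to an internal app.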