
5_R

  • Oct 14, 2020

Updated: Oct 19, 2020


The most common configuration of data repositories


Operational or transactional systems (OLTP, online transaction processing) are a class of information systems that facilitate and manage transaction-oriented applications [3]. Here, operational database management systems update data in real time [13]. Such databases do more than read archived data: they also allow that data to be added, changed or deleted in real time. OLTP is typically contrasted with OLAP (online analytical processing), which is characterized by more complex queries, issued in smaller volume, for the purpose of reporting rather than processing transactions. Whereas OLTP systems handle all kinds of queries (read, insert, update and delete), OLAP is optimized for reads and might not even support other kinds of queries.
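To make the contrast concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an operational database. The `accounts` table, the `transfer` helper and the sample values are assumptions made up for illustration; a production OLTP system would normally run on a dedicated database server.

```python
import sqlite3

# In-memory database standing in for an operational (OLTP) store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)

# OLTP-style work: small writes that take effect immediately.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100.0))
    conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("bob", 50.0))


def transfer(conn, src, dst, amount):
    """Move `amount` from one account to another as a single transaction."""
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE owner = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE owner = ?", (amount, dst)
        )


transfer(conn, "alice", "bob", 25.0)

# OLAP-style work: a read-only aggregate over the whole table.
total, average = conn.execute(
    "SELECT SUM(balance), AVG(balance) FROM accounts"
).fetchone()
```

The two `UPDATE` statements either both apply or both roll back, which is the transactional behaviour the paragraph describes; the final `SELECT` only reads and summarizes.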


Streaming data is data that is continuously generated by different sources. It is usually processed incrementally with stream processing [10] algorithms, without having access to all of the data at once [1]. The term is also used for technology that delivers content or data to devices over the internet so that users can access it immediately, instead of waiting for it to be downloaded in full.
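Incremental processing can be sketched with a running mean that is updated one value at a time and never stores the full stream. The `sensor_readings` generator and its values are hypothetical stand-ins for a real source.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, keeping only two numbers in memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: shift the old mean towards the new value.
        mean += (value - mean) / count
        yield mean


def sensor_readings() -> Iterator[float]:
    """Hypothetical source that emits readings one by one."""
    for reading in (10.0, 20.0, 30.0):
        yield reading


means = list(running_mean(sensor_readings()))
```

Because the function consumes any iterable lazily, the same code would work on an unbounded source; only the final `list(...)` call assumes the stream ends.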


A data stream is a sequence of data packets used to transmit or receive information that is in the process of being transmitted [15]. It can also be seen as a set of information extracted from a data provider. For example, a stream may contain raw data collected from users' browsing behavior on websites.


A data lake is a system or repository of data stored in its natural/raw format, usually as object blobs or files [2]. It is often a single store that contains raw copies of data from different sources as well as transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. In other words, it stores large amounts of unstructured and semi-structured data, which has become useful with the rise of big data: analysts can dive into the data lake and pull out the data they need. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data swamp, by contrast, is a deteriorated and unmanaged data lake that is inaccessible to its intended users and provides almost no value.


A data warehouse (DW) is a system used for reporting and data analysis, and is considered a core component of business intelligence [4]. Data warehouses are central repositories of integrated data collected from one or more different sources. They store current and historical data in a single place. The data in a DW is used for analysis and analytical reports. It usually passes through data cleansing and other operations that ensure data quality before it is used in the DW for reporting. Extract, transform, load (ETL) [12] and extract, load, transform (ELT) [11] are the two main approaches used to build a data warehouse system.
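The ETL order of operations can be illustrated with a small sketch. The `raw_orders` records, the cleansing rules and the `fact_orders` table are all assumptions chosen for the example; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
raw_orders = [
    {"customer": " Alice ", "amount": "19.90", "region": "EU"},
    {"customer": "bob", "amount": "", "region": "US"},  # missing amount
    {"customer": "Carol", "amount": "5.00", "region": "eu"},
]


def extract():
    """Extract: read the raw records from the source."""
    return list(raw_orders)


def transform(rows):
    """Transform: cleanse the data before it reaches the warehouse."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # drop records that fail the quality check
        clean.append(
            (row["customer"].strip().title(), float(row["amount"]), row["region"].upper())
        )
    return clean


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (customer TEXT, amount REAL, region TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT customer, amount, region FROM fact_orders ORDER BY customer"
).fetchall()
```

In an ELT variant the raw rows would be loaded first and the cleansing would run inside the warehouse afterwards; the steps are the same, only their order changes.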


A data mart is a structure or access pattern specific to data warehouses that is used to retrieve data oriented towards a particular need [5]. It is a subset of the DW dedicated to a specific business function, region or team (accounting, marketing, sales, etc.). Whereas a DW holds enterprise-wide information, the information in a data mart pertains to a single department. DWs and data marts are built because the data in the operational databases is not organized in a readily accessible way: the queries might be too complicated, the data difficult to access, or the workload too resource-intensive. DWs and data marts are typically read-only for their users.
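One simple way to picture a data mart is as a department-specific view over a warehouse table. This is only a sketch under that assumption (real data marts are often separate physical stores); the `dw_expenses` table and `marketing_mart` view are hypothetical names.

```python
import sqlite3

# In-memory database standing in for the enterprise-wide warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_expenses (department TEXT, item TEXT, cost REAL)")
with dw:
    dw.executemany(
        "INSERT INTO dw_expenses VALUES (?, ?, ?)",
        [
            ("marketing", "ad campaign", 1200.0),
            ("sales", "travel", 300.0),
            ("marketing", "trade fair", 800.0),
        ],
    )

# The mart exposes only the slice that one department needs.
dw.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT item, cost FROM dw_expenses WHERE department = 'marketing'"
)
mart_rows = dw.execute("SELECT item, cost FROM marketing_mart ORDER BY item").fetchall()

# SQLite views cannot be written to, which mirrors the read-only access pattern.
try:
    dw.execute("INSERT INTO marketing_mart VALUES ('flyers', 50.0)")
    view_is_writable = True
except sqlite3.OperationalError:
    view_is_writable = False
```

The marketing team queries `marketing_mart` without needing to know how the rest of the warehouse is laid out.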


Analytical and statistical systems, or online analytical processing (OLAP), are a class of software technology and systems that allow users to perform sophisticated analysis on typically large amounts of data to gain insight into the information it contains [6, 14]. For example, such analysis may include financial modeling, budget forecasting and production planning. OLAP is typically contrasted with OLTP, which is characterized by much simpler queries, issued in larger volume, to process transactions. Whereas OLAP systems are mostly optimized for reads, OLTP handles all kinds of queries (read, insert, update and delete).
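A typical analytical query summarizes many rows along one or more dimensions. The sketch below rolls up a hypothetical `sales` table by region and by quarter; the table, its columns and the figures are invented for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue INTEGER)")
with db:
    db.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("EU", "Q1", 100),
            ("EU", "Q2", 150),
            ("US", "Q1", 200),
            ("US", "Q2", 50),
        ],
    )

# Roll up along the region dimension.
by_region = db.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Roll up along the time dimension.
by_quarter = db.execute(
    "SELECT quarter, SUM(revenue) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall()
```

Both queries only read data and touch every row, which is the workload shape that OLAP systems are tuned for.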

Typical applications of OLAP might include:

  • Data Analysis. A process of inspecting, cleansing, transforming and modeling data with the purpose of discovering useful information, drawing conclusions and supporting decision-making [7].

  • Data Mining. A process of discovering patterns in large data sets using methods from machine learning, statistics and database systems [8].

  • Data Reporting. A process of gathering, submitting and organizing data into informational summaries to provide support for analyses and assess the current information [9].



 


References

