A data pipeline connects a data producer, a data processor, and a data consumer: the producer generates data, the consumer uses it, and the pipeline carries the data between them. Data pipelines and data management are vital for organizing data.
The data pipeline process includes:
- Data cleaning
- Data filtering
- Data normalization
- Data governance policy implementation
- Data enrichment
- Data processing in BATCH and REAL-TIME modes
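The cleaning, filtering, normalization, and enrichment steps can be sketched as chained Python generators. The record fields (`id`, `country`, `amount`), the non-negative-amount rule, and the `COUNTRY_NAMES` lookup table are hypothetical. Normalization runs before filtering here so the filter compares numbers rather than strings, and governance checks are omitted to keep the sketch short.

```python
from typing import Iterable, Iterator

# Hypothetical lookup table used for enrichment (illustrative only).
COUNTRY_NAMES = {"US": "United States", "IN": "India"}

def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Cleaning: drop records missing required fields and trim string values."""
    for r in records:
        if r.get("id") is None or r.get("amount") is None:
            continue
        yield {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}

def normalize(records: Iterable[dict]) -> Iterator[dict]:
    """Normalization: upper-case country codes and convert amounts to numbers."""
    for r in records:
        yield {**r, "country": r.get("country", "").upper(),
               "amount": round(float(r["amount"]), 2)}

def filter_valid(records: Iterable[dict]) -> Iterator[dict]:
    """Filtering: keep only records that pass an assumed business rule."""
    return (r for r in records if r["amount"] >= 0)

def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Enrichment: add a readable country name from the lookup table."""
    for r in records:
        yield {**r, "country_name": COUNTRY_NAMES.get(r["country"], "Unknown")}

def run(records: Iterable[dict]) -> list:
    # Stages are chained lazily; each record flows through all of them in turn.
    return list(enrich(filter_valid(normalize(clean(records)))))

raw = [
    {"id": 1, "country": " us ", "amount": "12.5"},
    {"id": 2, "country": "in", "amount": -3},    # dropped by the filter
    {"id": None, "country": "US", "amount": 1},  # dropped by cleaning
]
print(run(raw))
```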
Data producer: the source where data is produced.
Data consumer: the destination where data is consumed.
DATA PRODUCER -> DATA PIPELINE (DATA PROCESSING) -> DATA CONSUMER.
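The producer, pipeline, consumer flow can be sketched with Python generators, where each stage pulls items from the one before it. The stage functions and the multiply-by-ten processing step are hypothetical placeholders; a production pipeline would typically place a message queue or broker between the components.

```python
from typing import Callable, Iterator, List

def producer() -> Iterator[int]:
    """Data producer: emits raw readings (hypothetical sensor values)."""
    yield from [3, 7, 1, 9]

def pipeline(source: Iterator[int], step: Callable[[int], int]) -> Iterator[int]:
    """Data pipeline: applies a processing step to each item in transit."""
    for item in source:
        yield step(item)

def consumer(stream: Iterator[int]) -> List[int]:
    """Data consumer: collects processed items, e.g. for a report or a model."""
    return list(stream)

result = consumer(pipeline(producer(), lambda x: x * 10))
print(result)
```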
Use cases: ML and AI | R&D | Scalable distributed systems | SWE | IT
Data is processed in the data pipeline: ETL (extract, transform, load) pipelines, batch processing, and real-time processing are all carried out via data pipelines.
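The extract, transform, load pattern can be sketched with the standard library alone: `csv` for extraction, plain Python for transformation, and an in-memory `sqlite3` database as the load target. The sample CSV, the `orders` table, and the rule of skipping rows that fail type conversion are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: one row has an invalid amount.
RAW_CSV = "order_id,amount\n1,10.50\n2,abc\n3,4.25\n"

def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: convert types and drop rows that fail conversion."""
    out = []
    for row in rows:
        try:
            out.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows such as amount="abc"
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(total)
```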
There are three types of data pipelines:
- BATCH DATA PIPELINE
- REAL-TIME DATA PIPELINE
- LAMBDA DATA PIPELINE (combines batch and real-time processing)
In a LAMBDA pipeline, data from the data source feeds two layers: the BATCH LAYER sorts, sequences, and processes the complete data set in batches, while a speed (stream) layer processes the same data as it arrives; a serving layer then merges both results for consumers. Processing speed and latency need to be well optimized to build a robust DATA PIPELINE infrastructure.
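A minimal sketch of the common Lambda split into a batch layer, a speed layer, and a serving layer, assuming page-view counting as the workload. Real systems run these layers on separate infrastructure; here each one is a plain Python object so the merge logic is easy to see.

```python
from collections import Counter
from typing import Iterable

def batch_view(historical: Iterable[str]) -> Counter:
    """Batch layer: recompute page-view counts over the full historical data set."""
    return Counter(historical)

class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.counts[page] += 1

def serve(batch: Counter, speed: SpeedLayer, page: str) -> int:
    """Serving layer: merge the batch view and the real-time view for one query."""
    return batch[page] + speed.counts[page]

batch = batch_view(["home", "about", "home"])
speed = SpeedLayer()
for event in ["home", "pricing"]:  # events arriving after the batch run
    speed.on_event(event)

print(serve(batch, speed, "home"))
```

Once the next batch run absorbs the recent events, the speed layer's counts for that period can be discarded, which keeps the real-time view small.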
DATA SOURCE -> BATCH PROCESSING
Processing data from the DATA SOURCE in batches is referred to as BATCH PROCESSING; it is based on BATCH INGESTION.
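Batch processing can be sketched as grouping source records into fixed-size chunks and processing each chunk as a unit. The batch size of 4 and the summing step are arbitrary choices for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batches(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Split an input sequence into lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def process_batch(batch: List[int]) -> int:
    """Placeholder batch job: aggregate every record in the batch."""
    return sum(batch)

results = [process_batch(b) for b in batches(range(1, 11), 4)]
print(results)
```

On Python 3.12 and later, `itertools.batched` provides similar chunking, yielding tuples instead of lists.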
DATA SOURCE -> REAL-TIME PROCESSING
Processing data from the DATA SOURCE in REAL-TIME is referred to as REAL-TIME PROCESSING; it is based on STREAM INGESTION. A pipeline can combine both batch and real-time processing across its data sources.
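Real-time processing handles each event as soon as it arrives instead of waiting for a full batch. The sketch below assumes a rolling average over a small window as the workload and emits an updated result per event; the input list is a stand-in for a live stream such as a message queue consumer.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

def rolling_average(stream: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Emit (value, average of the last `window` values) as each event arrives."""
    recent: Deque[float] = deque(maxlen=window)  # old values fall off automatically
    for value in stream:
        recent.append(value)
        yield value, sum(recent) / len(recent)

events = [10.0, 20.0, 30.0, 40.0]  # hypothetical readings arriving one by one
for value, avg in rolling_average(events, window=2):
    print(f"value={value} rolling_avg={avg}")
```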
Batch data is ingested via BATCH INGESTION.
REAL-TIME DATA STREAMS are ingested via STREAM INGESTION.
Both the BATCH DATA STREAM and the REAL-TIME DATA STREAM are integrated via a CENTRALIZED DATA PIPELINE. From the centralized pipeline, the data is sent to a DBMS, NOSQL databases such as MONGODB, BLOB STORAGE, and other data stores.
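One way to picture the centralized pipeline is as a router that accepts records from both batch and stream inputs and forwards each one to a registered sink. The `kind` field, the `CentralPipeline` class, and the in-memory list and dict stand-ins for a DBMS, a document store such as MongoDB, and blob storage are all hypothetical; no real database client is used.

```python
import json
from typing import Callable, Dict, List

class CentralPipeline:
    """Merges batch and stream inputs and routes each record to a sink by its kind."""

    def __init__(self) -> None:
        self.sinks: Dict[str, Callable[[dict], None]] = {}

    def register(self, kind: str, sink: Callable[[dict], None]) -> None:
        self.sinks[kind] = sink

    def ingest(self, record: dict) -> None:
        # Unknown kinds raise KeyError so that unrouted data is never silently lost.
        self.sinks[record["kind"]](record)

relational_rows: List[tuple] = []  # stand-in for a DBMS table
documents: List[dict] = []         # stand-in for a NoSQL document store
blobs: Dict[str, bytes] = {}       # stand-in for blob storage

def store_blob(record: dict) -> None:
    blobs[record["name"]] = json.dumps(record).encode("utf-8")

pipeline = CentralPipeline()
pipeline.register("order", lambda r: relational_rows.append((r["id"], r["amount"])))
pipeline.register("event", documents.append)
pipeline.register("file", store_blob)

batch_input = [{"kind": "order", "id": 1, "amount": 9.99}]
stream_input = [{"kind": "event", "type": "click"}, {"kind": "file", "name": "log-1"}]
for record in batch_input + stream_input:
    pipeline.ingest(record)
```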
NOTE: Following master data management (MDM) practices and industry-accepted standards improves data quality, which in turn makes data processing more accurate. MDM needs to be integrated with the data infrastructure.
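A core MDM task is consolidating duplicate records from different source systems into a single "golden record". The sketch assumes a simple matching rule (same normalized email) and a simple survivorship rule (the most recently updated non-empty value wins); the sample records and field names are hypothetical, and real MDM tools apply much richer matching and governance rules.

```python
from typing import Dict, List

records = [  # hypothetical customer records from two source systems
    {"source": "crm", "email": "Ana@Example.com ", "name": "Ana", "phone": "", "updated": 1},
    {"source": "billing", "email": "ana@example.com", "name": "Ana Lima", "phone": "555-0100", "updated": 2},
    {"source": "crm", "email": "bo@example.com", "name": "Bo", "phone": "555-0199", "updated": 1},
]

def match_key(record: dict) -> str:
    """Assumed matching rule: records with the same normalized email are one entity."""
    return record["email"].strip().lower()

def golden_records(rows: List[dict]) -> Dict[str, dict]:
    """Assumed survivorship rule: the most recently updated non-empty value wins."""
    masters: Dict[str, dict] = {}
    for row in sorted(rows, key=lambda r: r["updated"]):  # oldest first, newest overwrites
        key = match_key(row)
        master = masters.setdefault(key, {"email": key})
        for field in ("name", "phone"):
            if row[field]:
                master[field] = row[field]
    return masters

for key, master in golden_records(records).items():
    print(key, master)
```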
Data can be consumed at multiple access levels, both public and private.
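Multi-level access can be sketched by tagging each dataset with an access level and letting a consumer see only the datasets at or below their clearance. The two levels, the dataset names, and the `DATASETS` catalog are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List

class Access(IntEnum):
    PUBLIC = 0
    PRIVATE = 1

# Hypothetical catalog mapping each dataset to the level required to read it.
DATASETS: Dict[str, Access] = {
    "daily_sales_summary": Access.PUBLIC,
    "customer_pii": Access.PRIVATE,
}

def visible_datasets(clearance: Access) -> List[str]:
    """Return the datasets a consumer with the given clearance may read."""
    return sorted(name for name, level in DATASETS.items() if level <= clearance)

print(visible_datasets(Access.PUBLIC))
```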
A few data processing and data pipeline use cases:
- BUSINESS INTELLIGENCE REPORT
- NASA Cosmic competition
- R AND D REPORTS AND SOLUTIONS
- Gaming technology
- Financial technology
- SDLC
- Internet
- Public cloud implementation
- PRIVATE Cloud implementation
- Scalable distributed systems
- Streaming services
- Stock Markets
- Real-time data streams
- ML MODELS
- AI
The article above was produced by integrating the outputs of 1 HUMAN AGENT and 3 AI AGENTS, an amalgamation of HGI and AI, to serve technology education globally.