Glossary
A
ACID: Acronym for Atomicity, Consistency, Isolation and Durability. The fou…
Airflow alternatives: Most popular alternatives to Apache Airflow
Analog Signal Processing: Analog Signal Processing is a subfield of signal processing that deal…
Analytics engine: Analytics Engine is a combined Apache Spark and Apache Hadoop service…
Analytics engineer vs Data engineer: Data engineers build and keep up-to-date data integration and pipelin…
Analytics transform: The real-time equivalent of a data warehouse. These are systems that …
Asynchronous data streams: Stream of data with measurement incoming at different moments in time…
B
Backfill data: Process of inserting historical data into a system or data stream to …
BASE: BASE is a data consistency model, and stands for Basic Availability S…
Batch Processing: Batch processing arranges incoming data in groups: the data is stored…
Bellman Ford Algorithm: It computes the shortest paths from a single source vertex to all the…
BI: Acronym for Business Intelligence. BI-solutions enable companies to d…
C
CAP theorem: Also known as Brewer's theorem. The theorem states that any distribut…
Change Data Capture (CDC): Set of software design patterns that allows determining and tracking …
Cloud Native Architecture: Data Architecture built in the cloud to take advantage of its computi…
Complex Event Processing: Techniques allowing the treatment of information related to events e.…
Computation graph: It models the different transformation steps performed on the data. A…
Concatenation: In databases, concatenation is the process of combining several entit…
Connector: Element that extracts data from a location to load it to ano…
Consistency: Consistency is a fundamental property that ensures the correctness an…
Contextual Anomaly Detection: Contextual anomaly detection is the analysis of data (usually in time…
Continual learning: Also known as incremental learning, continual learning is a Machine L…
Cost-Effective: The cost-effectiveness of a solution is usually assessed by comparing…
D
Data analytics engineer salary: According to Glassdoor, the American salary aggregator and benchmarki…
Data Application: A data application (also referred to as a data app) is a software program …
Data Architecture: Describes how the data is collected, stored, arranged and integrated.
Data mesh: Technique to build decentralized data architecture by leveraging spec…
Data Processing: Action of manipulating the data in order to get actionable data. It i…
Data processing framework: Ready-made solution to run operations on data (sorting, joining, runn…
Data Product: Tool that processes and consolidates raw data coming from different …
Data Streaming: Process of continuous collection and transmission of data at a high s…
Data tables: Any display of information in tabular form. A classical relational ta…
Data transformation: Aims to transform raw data into structured data to perform analytics …
Dbt alternatives
Developer Tool: Product designed for developers to optimize their workflow. It allows…
Digital Signal Processing: Digital Signal Processing is a subfield of signal processing that dea…
Distributed Computing: System allowing various computers to work together on a network. This…
Distributed Streaming System: System that can process multiple data streams simultaneously. This al…
Downsampling: Downsampling is a common technique in the field of signal processing.…
E
ELT / ETL: ELT & ETL are data integration methods moving data from source to dat…
Error Handling: Capacity to spot an error and understand where it comes from.
Event data: Event data refers to data that describes an action or occurrence that…
Event-driven architecture: Event-driven architecture is a software architecture and model for ap…
Event trigger: An event trigger is a condition or action that initiates or activates…
Eventual consistency: Theoretical guarantee that any update made on a distributed database,…
F
Fault tolerance: Fault tolerance is the ability of a system or application to continue…
Fuzzy Join / Fuzzy Matching: Used to perform a join on datasets when the keys do not match exactly…
G
Geospatial data: Information describing objects, events or other features with an impl…
Graph: Finite set of vertices also called nodes or points meant to implement…
I
Idempotency: Property implying that no matter how many times one executes a progra…
Incremental computing: Incremental computing is a computational approach in which only the n…
Index: A data index is a reorganization of the data to enable a very effici…
Input and Output Streams: In streaming mode, input connectors wait for incoming updates. Whenev…
IoT data analytics tools
Iteration: Process of repeating a certain number of steps continuously until one…
K
Kafka Real Time Analytics: Distributed streaming system, which can perform real-time event strea…
Kappa architecture: Kappa architecture is a data architecture pattern that leverages a st…
L
Lambda architecture: Lambda architecture is a data architecture pattern that combines batc…
Linear Regression: Statistical method used for predictive analysis. A linear regression …
Low Latency: Refers to delay between cause and effects. In data processing, low-la…
Locality-sensitive hashing Python - LSH Python: LSH speeds up kNN computation by clustering data into buckets, comput…
M
Machine Learning Kafka: Kafka is an open-source system developed by Apache Software. Kafka is…
Machine Learning Pipeline: A data pipeline is a series of interconnected data processing and ana…
Message Broker: A message broker helps exchange messages across systems and applicati…
Model retraining: Refers to the process of training again from scratch an existing mach…
Modern data stack: Suite of tools used for data integration, allowing businesses to anal…
Multi topic event stream: Processing of a stream of events on different topics. Use cases might…
O
Online machine learning: Online machine learning is a type of machine learning that continuous…
Ontology: Description of a data-structure, properties and relationships between…
P
Pointer: Variable that stores an address in the computer memory.
Programming framework: Ready-made template customized to speed up development of programming…
Python Packages for data engineering
R
Reactive data processing: At Pathway, all machine learning outcomes are updated as the models a…
Reactive Programming: Programming paradigm that handles realtime updates by propagating the…
Real-time analytics: Real-time analytics enable users to interpret data as it arrives. It …
Real-time analytics database: Organized collection of data stored, and constantly updated with data…
Real Time Anomaly detection: Training models with high volume of data allows the identification of…
Real-time consistency: Real-time consistency refers to the ability of a system to provide im…
Real-time data analytics tools
Real-time data processing: Data streaming processes data as it's generated, enabling real-time i…
Real time feature store: Feature Stores ingest raw incoming data, apply user-defined transform…
Real Time fraud detection: Common use-case for many sectors like banking, or insurance for which…
Real Time Graph Data: A graph is a structure consisting of a finite set of vertices (also c…
Real-time intelligence: Capability to access business intelligence based on live data.
Real time IoT data: Data gathered through devices that log measurements live.
Real time Machine Learning: Process of running machine learning programs on live data. It allows …
Real Time recommender system: Customized information filtering system which provides live suggestio…
Real time / Streaming unsupervised learning: Method that allows clustering data from a dataset live, as new data i…
RealTime: In data treatment, realtime means processing the info at the very mom…
Recursion: Process of defining a problem in terms of itself. It has the benefit …
Reducer: Function used to combine different items together. For example, we ca…
Resampling: Resampling is the process of changing the sampling rate or the number…
Rust: Programming language for data engineering, allowing faster results wi…
S
Signal Processing: Signal processing is the practice of analyzing, manipulating, and tra…
Sliding Window: A strategy for processing (stream) data by specific limited frames, u…
Snapshot: A snapshot is a point-in-time view of the state of the system, includ…
Spark alternatives
Spatiotemporal data: Two-dimensional type of data, covering location information combined …
Status data: Data that provides information about a process at a moment in time.
Stream consistency: Property ensuring that data is processed in the order it was received…
Stream processing: Considers each piece of data independently: the data is processed whe…
T
Throughput: Amount of data a system can process in a given amount of time.
Throughput vs. Latency: Latency refers to the time a program takes in providing results. A th…
Time series anomaly detection: One-dimensional anomaly detection that relies on data incoming in tim…
Time series data: Series of data points recorded over consistent intervals of time. S…
Transactions (e.g. Transactional Change): Sequence of instructions to satisfy a query. A transaction is a sequen…
Tumbling window: A strategy for processing (stream) data by specific limited frames, u…
U
Universe: The universe of a table is the collection of the ids of this table. I…
Upsampling: Upsampling is a common technique in the field of signal processing. I…
V