
Technology

In the emerging Big Data, AI, and machine learning ecosystem, we see ourselves as a leading player, with our engineers contributing to the development of new trends and technologies.


Apache Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast, iterative access to datasets. At Enigma we use it as the base for many of our systems.
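
The core idea Spark builds on, data split into partitions that are transformed independently and then merged, can be sketched in plain Python. This is a conceptual toy, not the Spark API, and no cluster is needed to run it:

```python
from functools import reduce
from collections import Counter

# Toy stand-in for an RDD: data split into partitions that could
# live on different machines.
partitions = [
    ["spark is fast", "spark is fun"],
    ["hadoop stores data", "spark processes data"],
]

def count_words(lines):
    """'Map' step: count words within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Each partition is processed independently (in Spark, in parallel
# across the cluster), then the partial results are merged ('reduce').
partials = [count_words(p) for p in partitions]
totals = reduce(lambda a, b: a + b, partials)

print(totals["spark"])  # 3
```

In Spark proper, the same shape of computation runs in parallel across executors, and the engine keeps intermediate results in memory between iterations.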

Hadoop

A software library and framework that allows distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale from single servers to thousands of machines, each offering local computation and storage.
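
The "simple programming model" Hadoop is best known for is MapReduce: map emits key/value pairs, the framework shuffles them by key, and reduce aggregates each key's values. A minimal single-machine sketch of that model (plain Python, not the Hadoop API):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (key, value) pair per word.
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: aggregate all values seen for one key.
    return (word, sum(counts))

lines = ["big data", "big clusters", "data clusters"]

pairs = [kv for line in lines for kv in mapper(line)]
# Shuffle: group pairs by key (Hadoop does this across the network).
pairs.sort(key=itemgetter(0))
result = dict(
    reducer(word, (v for _, v in group))
    for word, group in groupby(pairs, key=itemgetter(0))
)

print(result)  # {'big': 2, 'clusters': 2, 'data': 2}
```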

Apache Pig

A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At Enigma we often use it for parallel ETL (Extract, Transform, Load), especially when dealing with textual data.


Apache Hive

Apache Hive is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and JDBC driver are provided to connect users to Hive. Since some of our clients still use Hadoop and keep their data on it, Hive gives us a nice, almost-SQL way of dealing with the distributed information.

NiFi

NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. We commonly use it as a "trucking system" for distributed files, making sure all machines and systems receive the right data on time.
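
The "directed graph of processors" idea can be illustrated with a toy dataflow in plain Python. This is a conceptual sketch only; real NiFi flows are assembled from processors in its UI, not written in code:

```python
# Toy model of a NiFi-style dataflow: records enter the graph,
# each processor transforms or routes "flow files" to the next stage.

def extract(record):
    # Ingest processor: wrap the raw record as a flow file.
    return {"payload": record.strip()}

def route(flowfile):
    # Content-based routing, as a RouteOnContent processor would do.
    return "errors" if not flowfile["payload"] else "valid"

queues = {"valid": [], "errors": []}

for record in ["  sensor-1:20.5", "", "sensor-2:19.8"]:
    flowfile = extract(record)
    queues[route(flowfile)].append(flowfile)

print(len(queues["valid"]), len(queues["errors"]))  # 2 1
```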

Docker

An open-source project that automates the deployment of applications inside software containers. Docker provides an additional layer of abstraction and automation of operating-system-level virtualization on Windows and Linux. At Enigma, when possible, we divide each of our systems into small blocks: Docker containers. We can then deploy them as needed, optionally replicating the containers that should perform some part of the data manipulation more quickly, allowing for parallelism.
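
A minimal Dockerfile for one such block might look like the following. The image tag and file names here are purely illustrative, not taken from a real Enigma system:

```dockerfile
# Hypothetical container for one small processing block
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY transform.py .
# Each container runs one well-defined stage of the pipeline,
# so it can be replicated independently for parallelism.
CMD ["python", "transform.py"]
```

Because each container does exactly one job, scaling a slow stage is just a matter of starting more copies of that one image.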


Kafka

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
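
Kafka's core abstraction is a topic made of partitioned, append-only logs, with consumers tracking their own read offsets. A conceptual sketch in plain Python (the real client libraries talk to a broker; this is only the data model):

```python
class Topic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key land in the same partition,
        # which is how Kafka preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def consume(self, partition, offset):
        # Consumers replay from any offset; the log is never mutated.
        return self.partitions[partition][offset:]

topic = Topic(num_partitions=3)
p = topic.produce("user-42", "login")
topic.produce("user-42", "click")
print(topic.consume(p, 0))  # ['login', 'click'] - same key, same partition
```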

Scala

Scala is a general-purpose programming language providing support for functional programming and a strong static type system. Scala is particularly useful in Big Data projects when it comes to scalable server software that makes use of concurrent and asynchronous processing, parallel utilization of multiple cores, and distributed processing in the cloud.

Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications. We use it often when optimizing applications or websites: using Flume means we are practically unlimited in what we can log, even on huge systems.


Mesos

A distributed systems kernel. Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments. Enigma's team often uses it as a controller for our Docker containers, which allows us to use our machines more efficiently (it launches and removes different parts of the system when necessary, allowing the same machines to change roles).
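
The scheduling model behind this is Mesos's two-level scheme: agents advertise resource offers, and a framework accepts enough of them to run its tasks. A toy sketch of that idea (conceptual only; real frameworks use the Mesos API):

```python
# Resource offers, as agents would advertise them to a framework.
offers = [
    {"agent": "node-1", "cpus": 4, "mem_gb": 16},
    {"agent": "node-2", "cpus": 2, "mem_gb": 8},
    {"agent": "node-3", "cpus": 8, "mem_gb": 32},
]

def schedule(task_cpus, task_mem_gb, offers):
    """Accept the first offer that satisfies the task's needs."""
    for offer in offers:
        if offer["cpus"] >= task_cpus and offer["mem_gb"] >= task_mem_gb:
            return offer["agent"]
    return None  # decline all offers and wait for new ones

print(schedule(6, 24, offers))   # node-3
print(schedule(16, 64, offers))  # None - no agent is big enough
```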

SciKit

A free machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Enigma's team often prototypes models using SciKit, and when the situation allows (e.g. all the data fits on one machine), we use it in the final systems as well.
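
The typical workflow is just fit-then-predict. A tiny illustration with one of the algorithms named above, a random forest, on made-up toy data (not from a real project):

```python
from sklearn.ensemble import RandomForestClassifier

# Two obvious clusters: small values -> class 0, large values -> class 1
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = [0, 0, 0, 1, 1, 1]

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

print(model.predict([[0.1, 0.1], [5.0, 5.0]]))  # [0 1]
```

Every estimator in the library shares this `fit`/`predict` interface, which is what makes swapping algorithms during prototyping so cheap.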

MLlib

Apache Spark's scalable machine learning library. When the data doesn't fit on just one machine and we need to train a model on all of it, or we need to train the model in a very short time on many parallel machines, MLlib is the number-one solution. It lacks some of SciKit's algorithms, but gives the ability to divide the data into smaller packets and run algorithms on them simultaneously.
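
The "smaller packets run simultaneously" pattern can be sketched in plain Python: each partition reduces its data to a tiny summary, and only the summaries are merged. (Conceptual only; MLlib runs this over Spark partitions. Shown here for a distributed mean, the simplest possible statistic.)

```python
partitions = [
    [1.0, 2.0, 3.0],           # lives on machine A
    [4.0, 5.0],                # lives on machine B
    [6.0, 7.0, 8.0, 9.0],      # lives on machine C
]

# Each machine reduces its own partition to a small summary...
partials = [(sum(p), len(p)) for p in partitions]
# ...and only the tiny summaries are shipped to the driver and merged.
total, count = map(sum, zip(*partials))
mean = total / count

print(mean)  # 5.0
```

Real MLlib algorithms follow the same shape: per-partition gradients or sufficient statistics are computed in parallel and aggregated into one model.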


Cassandra

A free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. We use it to keep huge amounts of information spread across many machines while still having an SQL-like way of dealing with the data.
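
How rows end up spread across machines can be sketched like this: a hash of the partition key picks a position on a ring of nodes, and the next few nodes on the ring store copies. This is a conceptual toy, not Cassandra's actual partitioner:

```python
import hashlib

nodes = ["node-0", "node-1", "node-2", "node-3"]
replication_factor = 2

def replicas(partition_key):
    # Hash the key to pick a position on the ring of nodes...
    digest = hashlib.md5(partition_key.encode()).digest()
    start = int.from_bytes(digest, "big") % len(nodes)
    # ...then the next replication_factor nodes hold copies.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

owners = replicas("user:42")
print(owners)  # two distinct nodes hold copies of this row
```

Because any replica can serve a request, losing one machine never makes a row unavailable, which is where the "no single point of failure" property comes from.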

XGBoost

An open-source software library which provides a gradient boosting framework for C++, Java, Python, R, and Julia. It works on Linux, Windows, and macOS. Besides running on a single machine, it also supports the distributed processing frameworks Apache Hadoop, Apache Spark, and Apache Flink. It is probably the algorithm we use most commonly: especially when dealing with business data, it usually gives very accurate results while maintaining very quick response times and reasonable training times.
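
The gradient boosting idea behind XGBoost is an ensemble of weak learners, each one trained on the residual errors of the ensemble so far. A toy sketch with squared loss and one-dimensional "stumps" (the real library adds full trees, regularization, and parallelism):

```python
def fit_stump(x, residuals):
    """Best single-split 'stump': predict the mean residual on each side."""
    best = None
    for t in x:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lm if xi <= t else rm)) ** 2
                  for xi, r in zip(x, residuals))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
learning_rate = 0.5

pred = [0.0] * len(x)
for _ in range(20):  # boosting rounds
    # Each new stump is fit to what the ensemble still gets wrong.
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_stump(x, residuals)
    pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]

print([round(p, 2) for p in pred])  # converges toward [1.0, 1.0, 3.0, 3.0]
```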

NumPy

The fundamental package for scientific computing with Python. It provides a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform, and random number capabilities. This is Enigma's number-one library for matrix operations.
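
A short example of the two features named above, the N-dimensional array and broadcasting, which applies an operation across mismatched shapes without explicit loops:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row_means = matrix.mean(axis=1)       # [1.0, 4.0]

# Broadcasting: the (2, 1) column of means stretches across each
# row of the (2, 3) matrix, so no Python-level loop is needed.
centered = matrix - row_means[:, np.newaxis]

print(centered.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```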


R

An open-source programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. At Enigma, R is used for developing statistical software and for data analysis. We often use it when a more "scientific" approach is called for in solving some aspects of our models.

How can we help you?

To find out more about Enigma Pattern, or to discuss how we may be of service to you,
please get in touch.
