
Data Exchange (ex. Crossover)

#knowledgebase #trood #data
Author: Yury Nosov
30 October 2019

The Trood platform exchanges data using the Crossover module, which receives events in real time from a RabbitMQ queue server and stores them in a Druid database. The module can be configured to store events in Druid in a report-friendly form.

Events are also stored in an HDFS-based repository, compressed with the lz4 algorithm. The procedure that adds archived events back into Druid runs regularly; this is necessary because some events can be skipped when they are stored in Druid in real time.

General information on how Crossover operates is kept in configuration files. Several configuration versions can be saved: to do that, the versioning section is structured as an array, where each object in the array carries a version parameter identifying that configuration.
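The article does not show the configuration format itself; purely as an illustration, a versioned configuration might be structured along these lines (the field names here are assumptions, not the documented schema):

    # Hypothetical sketch of a versioned Crossover configuration.
    # The section names ("versioning", "version", "rabbit", "cubes") follow
    # the sections mentioned in this article; the exact schema is assumed.
    config = {
        "versioning": [
            {"version": 1, "rabbit": {...}, "cubes": [...]},
            {"version": 2, "rabbit": {...}, "cubes": [...]},
        ]
    }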

Retrieving data from the queue server

Retrieving data from the queue server is set up in the rabbit section of the configuration file. Recall that RabbitMQ holds a queue: a buffer that stores events. Message consumers can be assigned to the queue to process event messages. Here the Crossover service acts as such a consumer: its task is to receive event information and transform it into a form suitable for saving in Druid. Accordingly, in the rabbit section of the Crossover configuration file you specify the IP address of the server, the connection login and password, the connection timeout, and whether an SSL connection is required. You also define the RabbitMQ queue from which the event information is retrieved.
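As a hedged sketch of the settings listed above (the parameter names are assumptions; only the kinds of settings come from the article):

    # Hypothetical sketch of the "rabbit" section described above.
    rabbit = {
        "host": "10.0.0.5",    # IP address of the RabbitMQ server
        "login": "crossover",  # connection login
        "password": "secret",  # connection password
        "timeout": 30,         # connection timeout (unit assumed: seconds)
        "ssl": False,          # whether an SSL connection is required
        "queue": "events",     # queue the event information is retrieved from
    }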

Integration with the Druid database

Integration with the Druid database is based on a module from the Tranquility library. This module streams events into the Druid indexing service.

The parameters of the Druid integration are described in the cubes section. Each object in this section describes the parameters of one OLAP cube stored in Druid. Crossover's task is to transform the RabbitMQ data and transmit it to Druid in such a way that Druid holds the information needed for report building.

Data construction in Druid

To specify how data are stored in Druid, the configuration file needs to contain the following information:

  1. Specify the data source. The data sources can be events from RabbitMQ; each event is described by a set of parameters (a dataset in Druid terminology). That is, you specify which events should be transmitted into Druid and under which names.
  2. Set up the OLAP cube construction rules. After you describe the data source, you set the rules for processing the data transmitted into Druid.
  3. Specify additional parameters of the data transmission into Druid (see the sketch after this list), such as:
    1. segment granularity. Druid stores its index in so-called segments, and this setting controls the time span covered by each segment;
    2. event timeout. If this interval equals ten minutes (the default), then events more than 10 minutes in the past or more than 10 minutes in the future are not transmitted into Druid;
    3. "warming" period. Since creating a new segment takes time, continuous data transmission requires that the new segment be created in advance;
    4. number of partitions. When data are stored in Druid, they are subdivided into partitions; increasing the number of partitions increases the rate at which data are stored;
    5. number of replicas. One event can be stored in several replicas simultaneously; increasing the number of replicas improves the performance of search queries against the database.
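As a rough sketch of how these settings might combine in one object of the cubes section: the parameter names below echo Tranquility and Druid vocabulary (segmentGranularity, windowPeriod, warmingPeriod, partitions, replicants), but the actual Crossover schema is an assumption here.

    # Hypothetical sketch of one entry in the "cubes" section.
    cube = {
        "dataSource": "user_events",        # name the data gets in Druid
        "dimensions": ["user_id", "type"],  # event parameters to transmit
        "segmentGranularity": "HOUR",       # time span covered by one segment
        "windowPeriod": "PT10M",            # event timeout: +/- 10 minutes
        "warmingPeriod": "PT5M",            # create the next segment in advance
        "partitions": 2,                    # more partitions -> faster storing
        "replicants": 2,                    # more replicas -> faster queries
    }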

Saved data integrity

Errors can occur when data are stored in Druid in real time. The module retransmits the data for such events, but this mechanism still does not guarantee that all the data are stored. For example, if the event time does not fit within the limits defined by the windowPeriod parameter, the event is discarded.

To reduce the risk of data loss, all events are also stored in an archive based on HDFS, the distributed file system created in the Hadoop project. When the data are stored, they are compressed with the lz4 algorithm. If necessary, you can then set up a scheduled procedure that loads the data again (re-ingest in Druid terminology), adding the missing events to the database.
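The article does not describe how the scheduled re-ingest is set up; as one possible illustration, a batch indexing task that reads the HDFS archive could be submitted to the Druid overlord roughly as follows (the overlord endpoint is standard Druid; the task spec, host names, paths, and data source name are assumptions):

    import requests

    # Hypothetical re-ingest sketch: submit a Hadoop batch-indexing task
    # so that events from the HDFS archive are added back into Druid.
    # The overlord endpoint is standard Druid; everything else is assumed.
    task = {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {"dataSource": "user_events"},  # assumed cube name
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {
                    "type": "static",
                    "paths": "hdfs://namenode/crossover/archive/2019-10-30",
                },
            },
        },
    }
    resp = requests.post("http://overlord:8090/druid/indexer/v1/task", json=task)
    print(resp.json())  # the overlord returns the id of the created task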

Bear in mind that reports built on data stored in real time can contain discrepancies. After the repeated download procedure, such discrepancies are eliminated.

Setting up parameters of OLAP cubes via REST services

Viewing the list of already defined OLAP cubes, adding new ones, and changing them is possible not only by editing the configuration file but also by calling special REST services. Supported operations: viewing, adding, and deleting OLAP cube configurations.
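The service paths are not given in the article; purely as an illustration of the supported operations, the calls might look like this (the base URL and routes are hypothetical):

    import requests

    BASE = "http://crossover.example.com/api/v1.0"  # hypothetical service URL

    # View the configurations of all defined OLAP cubes.
    cubes = requests.get(f"{BASE}/cubes").json()

    # Add a new OLAP cube configuration (see the "cube" sketch above).
    cube = {"dataSource": "user_events", "segmentGranularity": "HOUR"}
    requests.post(f"{BASE}/cubes", json=cube)

    # Delete the configuration of an OLAP cube by name.
    requests.delete(f"{BASE}/cubes/user_events")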
