4 answers
724 views

I need complete guidance on restructuring data using a streaming data pipeline technique.

#BigData #datascience #machinelearning #spark #hadoop #HDFS #tech

Vijayakumar’s Answer

Kafka and Spark would be good solutions for this requirement.

Udaya’s Answer

The right option for restructuring data depends on the complexity of the restructuring (for example, converting speech to text), the size of the dataset, and your latency needs. The options include Hadoop, Spark, and Kafka; if you want to avoid much setup, you can use cloud tools such as Dataproc or Snowflake and their associated AI function APIs. If you state the problem clearly, I can suggest end-to-end, high-level steps and tools.

Jane’s Answer

It depends on the scale of the data and your SLA requirements. Let's say you want to restructure a small dataset in memory. If you are a Java or Scala developer, you can use Java 8 lambdas (with the Streams API) or Scala collections to manipulate the data with functions like map, flatMap, and reduce.
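For the small, in-memory case, here is a minimal Java 8 streams sketch of map, flatMap, and reduce. The input format ("user,eventA;eventB"), the class name `InMemoryRestructure`, and the `UserEvent` row type are hypothetical, chosen only to show nested records being flattened into one row per event and then regrouped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class InMemoryRestructure {

    // Hypothetical target shape: one (user, event) pair per row.
    static final class UserEvent {
        final String user;
        final String event;

        UserEvent(String user, String event) {
            this.user = user;
            this.event = event;
        }

        @Override
        public String toString() {
            return user + ":" + event;
        }
    }

    // Assumed input format: "user,eventA;eventB" (every row must contain a comma).
    static List<UserEvent> flatten(List<String> rows) {
        return rows.stream()
                .map(row -> row.split(",", 2))  // map: separate the user from its events
                .flatMap(parts -> Arrays.stream(parts[1].split(";"))
                        .map(event -> new UserEvent(parts[0].trim(), event.trim())))  // flatMap: one row per event
                .collect(Collectors.toList());
    }

    // reduce: count all flattened rows by summing a 1 per row.
    static int totalEvents(List<UserEvent> events) {
        return events.stream().map(e -> 1).reduce(0, Integer::sum);
    }

    // Regroup the flattened rows into a per-user count, sorted by user name.
    static Map<String, Long> countsPerUser(List<UserEvent> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> e.user, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("alice,login;click;logout", "bob,login");
        List<UserEvent> flat = flatten(rows);
        System.out.println(flat);
        System.out.println(totalEvents(flat));
        System.out.println(countsPerUser(flat));
    }
}
```

The same three operations exist on Scala collections, and Spark exposes map, flatMap, and reduce on distributed datasets, so logic prototyped this way carries over reasonably well when the data outgrows a single machine.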

But if your dataset is in the terabytes and you need low-latency processing, you should consider building large-scale, distributed streaming pipelines. The following tech stack may help you start your evaluation:

Kafka (for pub/sub)

Kinesis (for pub/sub)

Storm (for compute)

Spark Streaming (for compute)
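Kafka, Kinesis, Storm, and Spark Streaming all need running infrastructure, so the sketch below only mimics the shape of such a pipeline with the Java standard library. A producer thread publishes raw messages to a `LinkedBlockingQueue` that stands in for a pub/sub topic, and a compute stage consumes them one record at a time, restructuring each into "user:event" rows and updating running per-user counts. This is not the API of any of those tools; the message format, the `END` sentinel, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StreamingRestructureSketch {

    // Hypothetical sentinel that ends this demo stream; a real topic is unbounded.
    static final String END = "__END__";

    // Compute stage: take raw "user,eventA;eventB" messages one at a time,
    // emit one restructured "user:event" row per event, and keep running counts.
    static List<String> consume(BlockingQueue<String> topic, Map<String, Integer> runningCounts)
            throws InterruptedException {
        List<String> sink = new ArrayList<>();  // stands in for a downstream topic or table
        while (true) {
            String message = topic.take();      // blocks until the next message arrives
            if (END.equals(message)) {
                return sink;
            }
            String[] parts = message.split(",", 2);
            String user = parts[0].trim();
            for (String event : parts[1].split(";")) {
                sink.add(user + ":" + event.trim());
                runningCounts.merge(user, 1, Integer::sum);  // state updated per record
            }
        }
    }

    // Pub/sub stage: a producer thread publishes messages to an in-memory queue
    // (a stand-in for a Kafka or Kinesis topic) while the consumer processes them.
    static List<String> run(List<String> messages, Map<String, Integer> runningCounts) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            messages.forEach(topic::add);
            topic.add(END);
        });
        producer.start();
        try {
            List<String> rows = consume(topic, runningCounts);
            producer.join();
            return rows;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        List<String> rows = run(Arrays.asList("alice,login;click", "bob,login", "alice,logout"), counts);
        System.out.println(rows);
        System.out.println(counts);
    }
}
```

In a real deployment the queue would be a Kafka or Kinesis topic and the consume loop would be a Spark Streaming or Storm job, which add partitioning across machines, fault tolerance, and checkpointing that this single-process sketch does not attempt.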


Hope it helps!


Thank you so much, Jane! We so appreciate you joining us remotely and sharing your technical expertise. PS: Happy International Women's Day! (yoonji KIM, Admin)

Anuj’s Answer

There are a lot of factors to consider, but you should look into Kafka and Spark, which are both open source.