
File Watcher with AWS & Sparkflows

Updated: Sep 1, 2019

Overview


There are many use cases where we have to process incoming files on S3. This document describes one way to achieve this with SQS, AWS Lambda, and the REST API of Fire Insights.


Design

The diagram below captures the high-level design:



High-level design

Below is the flow of execution:


  • New files arrive on S3 in the directory location /sparklows-file-watcher/raw-data/iot/2019-08-22.

  1. In the above design, all the raw data comes into the directory /sparklows-file-watcher/raw-data.

  2. There are various types of raw data which can come in.

  3. iot is one type of incoming raw data. Each day we receive a number of iot files in the folder /sparklows-file-watcher/raw-data/iot/yyyy-MM-dd.

  4. Once all the files for that date have been written to the appropriate folder, a _SUCCESS file is written into it (a minimal sketch of writing this marker file appears after this list).

  • Writing the _SUCCESS file triggers an event which is sent to the configured SQS queue.

  • Once the event reaches SQS, it triggers an AWS Lambda function.

  • The AWS Lambda function uses the Fire Insights REST API to execute a workflow that processes the new incoming files in the AWS S3 bucket.

  • If the AWS Lambda function fails, the event is sent to a DLQ (Dead Letter Queue), where it can be handled further based on the requirements.
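As a concrete illustration of step 4, below is a minimal sketch (AWS SDK for Java v2) of how a producer might write the empty _SUCCESS marker once all files for a date have been uploaded. It assumes sparklows-file-watcher is the bucket name and uses the example date folder from above; the original article does not show the producer side, so this is only a sketch.

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;

public class SuccessMarkerWriter {
    public static void main(String[] args) {
        // Bucket and folder from the example design; in practice the date folder
        // would be computed from the current processing date (yyyy-MM-dd).
        String bucket = "sparklows-file-watcher";
        String key = "raw-data/iot/2019-08-22/_SUCCESS";

        try (S3Client s3 = S3Client.create()) {
            // Write an empty object as the _SUCCESS marker; this is the object
            // that the S3 notification (suffix filter "_SUCCESS") will match.
            s3.putObject(b -> b.bucket(bucket).key(key), RequestBody.empty());
        }
    }
}
```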


Create an SQS Queue


Create an SQS Queue for receiving the events from S3 and triggering the AWS Lambda function.


Below we see the SQS queue: sf-workflow-file-watcher-ql-dev.


It has the permissions below to receive messages from the S3 bucket and invoke the AWS Lambda function.



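As a rough illustration of that access policy, the statement allowing the S3 bucket to send messages to the queue could be attached as shown below (AWS SDK for Java v2). The region, account ID, queue URL, and bucket ARN are placeholders, not values from the original setup.

```java
import java.util.Map;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

public class AllowS3ToSqs {
    public static void main(String[] args) {
        // Access policy statement letting the S3 bucket deliver event
        // notifications to the queue. ARNs below are placeholders.
        String policy = """
            {
              "Version": "2012-10-17",
              "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SQS:SendMessage",
                "Resource": "arn:aws:sqs:us-east-1:123456789012:sf-workflow-file-watcher-ql-dev",
                "Condition": {"ArnLike": {"aws:SourceArn": "arn:aws:s3:*:*:sparklows-file-watcher"}}
              }]
            }""";

        try (SqsClient sqs = SqsClient.create()) {
            sqs.setQueueAttributes(b -> b
                    .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/sf-workflow-file-watcher-ql-dev")
                    .attributes(Map.of(QueueAttributeName.POLICY, policy)));
        }
    }
}
```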

Configure the AWS S3 bucket to generate events

Configure the AWS S3 bucket to send events to the AWS SQS queue when new files come in.


Below, it looks for new files with the configured prefix and a suffix of _SUCCESS, and sends these events to the sf-workflow-file-watcher-ql-dev SQS queue.
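The same notification can also be set up programmatically. The sketch below (AWS SDK for Java v2) attaches a queue notification with a prefix and _SUCCESS suffix filter; the queue ARN, bucket name, and prefix value are placeholders derived from the example paths above.

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Event;
import software.amazon.awssdk.services.s3.model.FilterRule;
import software.amazon.awssdk.services.s3.model.FilterRuleName;
import software.amazon.awssdk.services.s3.model.NotificationConfiguration;
import software.amazon.awssdk.services.s3.model.NotificationConfigurationFilter;
import software.amazon.awssdk.services.s3.model.QueueConfiguration;
import software.amazon.awssdk.services.s3.model.S3KeyFilter;

public class BucketNotificationSetup {
    public static void main(String[] args) {
        // Placeholder queue ARN; the prefix assumes the raw-data/iot/ folder from
        // the design and the suffix matches the _SUCCESS marker file.
        String queueArn = "arn:aws:sqs:us-east-1:123456789012:sf-workflow-file-watcher-ql-dev";

        QueueConfiguration queueConfig = QueueConfiguration.builder()
                .queueArn(queueArn)
                .events(Event.S3_OBJECT_CREATED)
                .filter(NotificationConfigurationFilter.builder()
                        .key(S3KeyFilter.builder()
                                .filterRules(
                                        FilterRule.builder().name(FilterRuleName.PREFIX).value("raw-data/iot/").build(),
                                        FilterRule.builder().name(FilterRuleName.SUFFIX).value("_SUCCESS").build())
                                .build())
                        .build())
                .build();

        try (S3Client s3 = S3Client.create()) {
            s3.putBucketNotificationConfiguration(b -> b
                    .bucket("sparklows-file-watcher")
                    .notificationConfiguration(NotificationConfiguration.builder()
                            .queueConfigurations(queueConfig)
                            .build()));
        }
    }
}
```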



Create the AWS Lambda function


Create the AWS Lambda function to take the SQS event and kick off the workflow in Fire Insights. This workflow processes the new files which came in.


First create an IAM role. An example is shown below.


We add three environment variables as shown below. These are used by the Lambda function in this example.


  • SPARKFLOWS_TOKEN or KMS_ARN

  • SPARKFLOWS_URL

  • WORKFLOW_ID

Instead of storing the Sparkflows token directly, users can encrypt the token using KMS, use the KMS ARN as the environment variable, and decrypt the token using KMS inside the Lambda.
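If the KMS route is used, decrypting the token inside the Lambda could look roughly like the sketch below (AWS SDK for Java v2). The environment variable name and the assumption that the ciphertext is stored base64-encoded are illustrative, not part of the original setup.

```java
import java.util.Base64;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kms.KmsClient;

public class TokenDecryptor {
    // Assumes the KMS-encrypted Sparkflows token is stored base64-encoded in an
    // environment variable (the variable name here is illustrative).
    static String decryptToken() {
        byte[] ciphertext = Base64.getDecoder()
                .decode(System.getenv("SPARKFLOWS_TOKEN_ENCRYPTED"));
        try (KmsClient kms = KmsClient.create()) {
            // For symmetric KMS keys the key is resolved from the ciphertext itself.
            return kms.decrypt(b -> b.ciphertextBlob(SdkBytes.fromByteArray(ciphertext)))
                      .plaintext()
                      .asUtf8String();
        }
    }
}
```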




Upload the JAR file for the RequestHandler. It can also be placed in an S3 location and the Lambda configured to use it.
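To make the shape of that RequestHandler concrete, below is a minimal sketch of an SQS-triggered handler that reads the environment variables above and posts an execute-workflow request to Fire Insights. The REST endpoint path, query parameter, and token header are placeholders only; consult the Fire Insights REST API documentation for the actual execute-workflow call.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class FileWatcherHandler implements RequestHandler<SQSEvent, String> {

    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public String handleRequest(SQSEvent event, Context context) {
        String url = System.getenv("SPARKFLOWS_URL");
        String token = System.getenv("SPARKFLOWS_TOKEN"); // or decrypt via KMS, as described above
        String workflowId = System.getenv("WORKFLOW_ID");

        // Each SQS record carries an S3 event notification for a new _SUCCESS file.
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            context.getLogger().log("Received S3 event: " + msg.getBody());

            // Placeholder endpoint: the actual execute-workflow path and parameters
            // come from the Fire Insights REST API documentation.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(url + "/executeWorkflow?workflowId=" + workflowId))
                    .header("token", token)
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            try {
                HttpResponse<String> response =
                        http.send(request, HttpResponse.BodyHandlers.ofString());
                context.getLogger().log("Fire Insights response: " + response.statusCode());
            } catch (Exception e) {
                // Rethrow so the message returns to the queue and eventually reaches the DLQ.
                throw new RuntimeException("Failed to trigger workflow", e);
            }
        }
        return "OK";
    }
}
```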
