Rill Meets StreamSets

Building data pipelines with StreamSets Data Collector

In the previous post we explored the Twitch API using the Elixir programming language. We did this exploration to plan how to build a process that acquires data from the Twitch API. Data acquisition is a common problem in data analysis and business intelligence. In data warehousing there is a process called ETL (Extract, Transform, Load), which describes how data flows from source systems to destinations. One way to acquire data is to write custom code for each source, which brings challenges of maintenance, flexibility, and reliability. [Read More]

Rill Stage 1

Exploring Twitch API

The goal of the Rill project is to collect data about online streams from Twitch (and, possibly, other streaming platforms) for further analysis. First, set up a Twitch client ID according to: http://blog.danielberkompas.com/elixir/2015/03/21/manage-env-vars-in-elixir.html
The process to obtain data about streams for a particular user looks like this:
1) Find the user's username (e.g., from a Twitch URL).
2) Make a request to the Twitch API to convert the username to a stream id.
3) Make a request to the Twitch API to obtain data about the user's stream (is there a live stream, is there a recording being played).
[Read More]

Luhn algorithm in Elixir: implementation, refactoring, and benchmarking

Some time ago, I encountered a programming exercise in which the goal was to implement the Luhn algorithm for credit card number validation. At first I implemented the algorithm in Ruby and then decided to implement it in Elixir. In the end, I like how the Elixir version looks. This article uses Elixir 1.2.2. Initial version: it is assumed that credit card numbers are read from a file and that the check digit is included in each credit card number. [Read More]
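For readers who have not seen the Luhn check before, here is a minimal sketch in Ruby (the language the exercise was first solved in). It is not the implementation from the article. The method name `luhn_valid?` is hypothetical, and it assumes the number arrives as a string with the check digit as its last digit, possibly with spaces between digit groups.

```ruby
# Hypothetical helper, not the article's code: returns true when the
# digit string passes the Luhn checksum. The check digit is assumed to
# be the last digit of the number.
def luhn_valid?(number)
  cleaned = number.to_s.delete(" ")
  # Reject anything that is not at least two plain digits.
  return false unless cleaned.match?(/\A\d{2,}\z/)

  # Walk the digits from the right; double every second one and
  # subtract 9 when the doubled value exceeds 9.
  sum = cleaned.chars.map(&:to_i).reverse.each_with_index.sum do |digit, index|
    if index.odd?
      doubled = digit * 2
      doubled > 9 ? doubled - 9 : doubled
    else
      digit
    end
  end

  (sum % 10).zero?
end

puts luhn_valid?("79927398713")
puts luhn_valid?("79927398710")
```

Reading numbers from a file, as the article assumes, could then be a matter of calling this method on each line.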

A challenge: remove tags from a string.

Recently I implemented a small piece of functionality that makes for a good small challenge. Description: given a string with opening and closing tags (e.g., <mark></mark>), return the string without tags, together with the indices of the opening and closing tags. For example, given the string "we eat <mark>healthy</mark> and <mark>tasty</mark> food.", we expect to receive the string "we eat healthy and tasty food." and an array with pairs of indices: [[7, 14], [19, 24]]. [Read More]
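One possible way to approach the challenge is sketched below in Ruby. This is only an illustration, not the solution from the article. The method name `strip_tags` and its keyword arguments are hypothetical, and the sketch assumes the tags are well formed and not nested.

```ruby
# Hypothetical helper, not the article's solution: removes the tags and
# records, for each tagged span, its start and end position in the
# cleaned string. Assumes well-formed, non-nested tags.
def strip_tags(input, open_tag: "<mark>", close_tag: "</mark>")
  clean = +""   # unfrozen string that accumulates the tag-free text
  pairs = []
  start = nil
  pos = 0

  while pos < input.length
    if input[pos, open_tag.length] == open_tag
      # The tagged span starts at the current end of the cleaned text.
      start = clean.length
      pos += open_tag.length
    elsif input[pos, close_tag.length] == close_tag
      # The span ends where the cleaned text currently ends.
      pairs << [start, clean.length]
      start = nil
      pos += close_tag.length
    else
      clean << input[pos]
      pos += 1
    end
  end

  [clean, pairs]
end

text, pairs = strip_tags("we eat <mark>healthy</mark> and <mark>tasty</mark> food.")
puts text
p pairs
```

For the example string from the description, this yields "we eat healthy and tasty food." and [[7, 14], [19, 24]], where each pair holds the start index and the index just past the end of a marked word in the cleaned string.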

How to set Cache-Control and Expires headers for Paperclip uploads.

In Ruby on Rails applications, the Paperclip gem is often used with the Fog gem, which supports setting Cache-Control and Expires headers for file uploads. Assuming there is the following configuration in the config/application.rb file: config.paperclip_defaults = { storage: :fog, fog_credentials: { provider: "Local", local_root: "#{Rails.root}/public" }, fog_directory: "", fog_host: "localhost" } Cache-Control and Expires headers can be set by adding the related attributes to the Paperclip configuration: [Read More]

How to fix problems when migrating a Rails app from MySQL to PostgreSQL

Recently, at eet.nu we have been working on migrating a Ruby on Rails application from MySQL to PostgreSQL. Depending on the complexity of an app, there might be many caveats during migration. Here are some notes on issues that we solved during the migration. This article assumes that we work with: Ruby on Rails 4; MySQL 5.6; PostgreSQL 9.4. Quoting and Boolean fields: MySQL uses backticks (`) for quoting, while PostgreSQL uses double quotes ("). [Read More]