So far, our Rill journey has comprised API exploration and building ingestion pipelines for the Twitch and Giantbomb APIs. The next step is to analyze the data. In this part we will answer some questions about the downloaded data with the help of Linux command-line tools: zcat, zgrep, sort, uniq, tr, cut, jq, awk, and GNU Parallel.
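As a taste of how these tools compose, here is a minimal sketch of a typical pipeline: decompress gzipped JSON lines, extract one field with jq, then count occurrences with sort and uniq. The file name `streams.json.gz` and the `game` field are assumptions for illustration, not the actual dataset schema.

```shell
# Create a tiny gzipped JSON-lines sample (hypothetical records,
# loosely resembling Twitch stream data).
printf '%s\n' \
  '{"game":"Dota 2","viewers":1200}' \
  '{"game":"Fortnite","viewers":900}' \
  '{"game":"Dota 2","viewers":800}' | gzip > streams.json.gz

# Decompress, pull out the "game" field as raw text,
# sort so duplicates are adjacent, count them, and rank by count.
zcat streams.json.gz | jq -r '.game' | sort | uniq -c | sort -rn
```

The output lists each game with its record count, most frequent first (here, `2 Dota 2` above `1 Fortnite`). Each stage does one thing and streams to the next, which is the pattern the rest of this analysis builds on.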