<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Technical blog of Bony Simon]]></title><description><![CDATA[Technical blog by Bony Simon with content from the domains of Data Engineering , Business Intelligence , Analytics.]]></description><link>https://gatsby.ghost.org/</link><image><url>https://gatsby.ghost.org/favicon.png</url><title>Technical blog of Bony Simon</title><link>https://gatsby.ghost.org/</link></image><generator>Ghost 2.9</generator><lastBuildDate>Sun, 11 Jul 2021 17:17:20 GMT</lastBuildDate><atom:link href="https://gatsby.ghost.org/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Loading Data from AWS S3 to AWS Redshift]]></title><description><![CDATA[AWS S3 is vital component of the AWS [https://aws.amazon.com] (Amazon Web Services) ecosystem. Any types of files can be stored in AWS S3 [https://aws.amazon.com/s3/] and can be retrieved or loaded to most of the products in AWS ecosystem seamlessly. The optimal way to load data into the AWS Redshift [https://aws.amazon.com/redshift/] is via AWS S3. Very large amount of data can be be loaded into AWS Redshift within few minutes using if the data is stored in S3. In post will show how to load th]]></description><link>https://ghost-blog-bony.herokuapp.com/loading-data-from-aws-s3-to-aws-redshift/</link><guid isPermaLink="false">Ghost__Post__5fa9b2f7d6f3d5001ebdc568</guid><category><![CDATA[Business Intelligence]]></category><category><![CDATA[Data Warehousing]]></category><category><![CDATA[AWS Redshift]]></category><category><![CDATA[AWS S3]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Big Data]]></category><category><![CDATA[ETL]]></category><dc:creator><![CDATA[Bony Simon]]></dc:creator><pubDate>Mon, 09 Nov 2020 22:18:03 GMT</pubDate><media:content url="https://res-5.cloudinary.com/hgu2xgrdq/image/upload/q_auto/v1/ghost-blog-images/aws_s3_to_redshift.png" medium="image"/><content:encoded><![CDATA[<img src="https://res-5.cloudinary.com/hgu2xgrdq/image/upload/q_auto/v1/ghost-blog-images/aws_s3_to_redshift.png" alt="Loading Data from AWS S3 to AWS Redshift"/><p>AWS S3 is vital component of the <a href="https://aws.amazon.com">AWS</a> (Amazon Web Services) ecosystem. Any types of files can be stored in <a href="https://aws.amazon.com/s3/">AWS S3</a> and can be retrieved or loaded to most of the products in AWS ecosystem seamlessly. The optimal way to load data into the AWS <a href="https://aws.amazon.com/redshift/">Redshift</a> is via AWS S3. Very large amount of data can be be loaded into AWS Redshift within few minutes using if the data is stored in S3. In post will show how to load the data into AWS Redshift from AWS S3.</p><h3 id="loading-data-to-redshift-using-copy-command">Loading data to Redshift using COPY command</h3><p>The below command will copy the specified csv file from the S3 . The delimited specified is <em><strong>tab </strong></em>('\t') and you have to specify the delimiter based on the file. You can also specify the quote character used in the file. If these additional parameters are not passed, then the default values will be used. 
<p>The command below copies the specified CSV file from S3 into this table. The delimiter specified here is <em><strong>tab</strong></em> ('\t'); set the delimiter to match your file. You can also specify the quote character used in the file. If these additional parameters are not passed, the default values are used.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/filename.csv'
access_key_id 'your_access_key_id'
secret_access_key 'your_secret_access_key'
delimiter '\t' CSV quote '^';
</code></pre><!--kg-card-end: markdown--><h3 id="loading-compressed-file-into-redshift-from-aws-s3">Loading compressed file into Redshift from AWS S3</h3><p>Storing compressed files in S3 reduces storage cost and data-transfer volume. GZIP-compressed CSV files can be loaded into Redshift by simply adding the gzip option.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/filename.csv.gz'
access_key_id 'your_access_key_id'
secret_access_key 'your_secret_access_key'
delimiter '\t' CSV quote '^' gzip;
</code></pre><!--kg-card-end: markdown--><h3 id="loading-multiple-files-into-redshift-from-aws-s3-prefix-folder">Loading multiple files into Redshift from AWS S3 prefix/folder</h3><p>Multiple files can be loaded into Redshift in parallel by specifying an S3 folder/prefix path; COPY loads every object whose key starts with that prefix.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/folder_name/'
access_key_id 'your_access_key_id'
secret_access_key 'your_secret_access_key'
delimiter '\t' CSV quote '^';
</code></pre><!--kg-card-end: markdown--><h3 id="loading-csv-to-redshift-with-additional-options">Loading CSV to Redshift with additional options</h3><p>Various options can be added to the COPY command as required.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/folder_name/'
access_key_id 'your_access_key_id'
secret_access_key 'your_secret_access_key'
delimiter '\t' CSV quote '^'
IGNOREHEADER 1
maxerror as 10
timeformat 'YYYY-MM-DD HH24:MI:SS';
</code></pre><!--kg-card-end: markdown--><p>IGNOREHEADER - skips the specified number of header rows in each file.</p><p>MAXERROR - the maximum number of rejected rows tolerated before the load fails.</p><p>TIMEFORMAT - specifies the time format used in the CSV file.</p><h3 id="loading-parquet-file-to-redshift-from-aws-s3">Loading PARQUET file to Redshift from AWS S3</h3><p>Parquet is an open-source columnar data format widely used in the big data ecosystem. Because Parquet files carry their own schema, the delimiter and quote options do not apply.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/folder_name/'
access_key_id 'your_access_key_id'
secret_access_key 'your_secret_access_key'
FORMAT AS PARQUET;
</code></pre><!--kg-card-end: markdown--><h3 id="using-iam-roles-for-authenticating-aws-s3-to-redshift-data-load">Using IAM roles for authenticating AWS S3 to Redshift data load</h3><p>Instead of embedding access keys in the command, you can attach an IAM role that grants Redshift read access to the bucket and pass its ARN.</p><!--kg-card-begin: markdown--><pre><code>copy schema.dest_table_name
from 's3://bucket_name/folder_name/'
iam_role 'arn:aws:iam::86365107279:role/ReadS3Redshift'
FORMAT AS PARQUET;
</code></pre><!--kg-card-end: markdown-->
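<p>If a COPY fails or rejects rows, Redshift records the details in the <code>stl_load_errors</code> system table. A query along the lines of the following sketch (the table and columns are standard Redshift; only the row limit is arbitrary) surfaces the most recent failures:</p><!--kg-card-begin: markdown--><pre><code>-- Show the most recent load errors, newest first.
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;
</code></pre><!--kg-card-end: markdown-->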
]]></content:encoded></item><item><title><![CDATA[Apache Superset: Open-source Alternative for Enterprise Data Visualisation]]></title><description><![CDATA[What is Apache Superset? Apache Superset is an open-source data analytics and data visualisation platform with self-service business intelligence (BI) capabilities. Apache Superset supports many different types of databases and data sources. It is also battle-tested to scale to thousands of dashboards used by thousands of users. History of Apache Superset Superset was an internal project of Airbnb developed by Max Beauchemin. Initially the project was called "Panoramix", later it was ren]]></description><link>https://ghost-blog-bony.herokuapp.com/apache-superset-intro/</link><guid isPermaLink="false">Ghost__Post__5f9d838727cc5b001eb2a72c</guid><category><![CDATA[Apache Superset]]></category><category><![CDATA[Visualisation]]></category><category><![CDATA[Business Intelligence]]></category><category><![CDATA[Open-Source]]></category><category><![CDATA[Data Infrastructure]]></category><dc:creator><![CDATA[Bony Simon]]></dc:creator><pubDate>Sat, 31 Oct 2020 16:29:13 GMT</pubDate><media:content url="https://res-1.cloudinary.com/hgu2xgrdq/image/upload/q_auto/v1/ghost-blog-images/supersetlogo.png" medium="image"/><content:encoded><![CDATA[<figure class="kg-card kg-image-card"><img src="https://res-1.cloudinary.com/hgu2xgrdq/image/upload/q_auto/v1/ghost-blog-images/supersetlogo.png" class="kg-image" alt="Apache Superset: Open-source Alternative for Enterprise Data Visualisation"/></figure><h3 id="what-is-apache-superset">What is Apache Superset?</h3><p>Apache Superset is an open-source data analytics and data visualisation platform with self-service business intelligence (BI) capabilities. Apache Superset supports many different types of databases and data sources. It is also battle-tested to scale to thousands of dashboards used by thousands of users.</p><h3 id="history-of-apache-superset">History of Apache Superset</h3><p>Superset was an internal project at Airbnb developed by Max Beauchemin. Initially the project was called <em>"Panoramix"</em>, later renamed to <em>"Caravel"</em>, and finally to <em>"Superset"</em>. During development, Superset was open-sourced and incubated by the Apache Software Foundation. Currently, Superset is used in hundreds of organisations.</p><p><em>to be continued ...</em></p>]]></content:encoded></item></channel></rss>