Blogs

Migrating Data from Data Warehouse to Redshift using AWS Schema Conversion Tool (SCT) and Data Extraction Agent

In this blog, I will explain the components and configurations that are required to migrate data from a Data Warehouse to Amazon AWS Redshift. The source I have selected is Oracle DW but the same steps are applicable to all Data Warehouses. Please refer to this AWS Documentation link for other Data Warehouse products that are supported by Schema Conversion Tool (SCT).

Setting up the components

Source — Create an Oracle RDS instance which will be considered as the Source database. We will register this instance as an Oracle DW instance in SCT. Create a schema, tables and load some data into the tables. I have used the scripts downloaded from here to populate the source database.

Target — Create a Redshift Cluster. I have created a cluster with 1 Node of size t2.small.

Schema Conversion Tool — I have installed this on my local computer. This can be installed on a separate EC2 instance, if needed. Once the SCT Project is configured for migration, Data Extraction Agent handles the actual migration.

JDBC Drivers — Download the JDBC drivers for Oracle and Redshift. These need to be copied to both the instances where SCT and Data Extraction Agent are installed.

Trust and key stores — Generate Java trust and key stores with passwords to be configured in both SCT and Data Extraction Agent. I have used the Settings→Global Settings→Security tab in SCT to generate trust and key store. I copied the same files onto the EC2 instance where Data Extraction Agent is installed. This will allow SSL connection from SCT to the Data Extraction Agent to be successful.

Data Extraction Agent — Create an EC2 instance and install Data Extraction Agent. Installation files for each supported operating system will be downloaded as a part of AWS SCT download. I have used a Windows EC2 instance.

- Copy the setup file “aws-schema-conversion-tool-extractor-{version}.msi” to the server and install the Agent.

- Copy Oracle and Redshift JDBC drivers to the server.

- Copy trust and key store files generated in the SCT tool to the server.

- Select/Enter path to the drivers, trust and key stores when prompted.

- Start the Extraction Agent by navigating to the installation directory in command prompt and running the batch file “StartAgent.bat”

Security Groups and Network ACL — Make sure security groups and Network ACLs in the VPC allow communication between the SCT, Data Extraction Agent, Oracle RDS and Redshift instances over the ports used.

Configure SCT Project for Migration

1. Update JDBC Driver path under Settings →Global Settings →Drivers for Oracle and Redshift drivers.

2. Generate trust and key stores under Settings →Global Settings →Security. Copy same trust and key stores onto the EC2 instance where Data Extraction Agent is installed.

3. Update your AWS credentials and S3 Bucket Folder under Settings→Global Settings→AWS Profile. This is needed for the agent to upload extracted data to S3 bucket.

4. Create a new project in SCT using File→New Project

- Select Data warehouse (OLAP)

- Oracle DW for Source

- Leave target as Amazon Redshift (This is the only option available)

5. Click on Connect to Oracle DW and provide connection details to the Oracle RDS instance and click on Test Connection to make sure connection is successful. I used the “Service Name” for Type and provided the RDS endpoint details for the connection.

6. Click on Connect to Redshift and provide connection details. Click on Test Connection to make sure connection is successful. After configuring both connections, Main View showing source and target server objects is displayed as highlighted in the picture below.

7. Migrate the schema from Oracle DW to Redshift.

· Right click on the schema to be migrated and select “convert schema”.

· Select the schema to be converted in the target and select “Apply to database”.

· This will create the schema with all the objects in the Redshift database without any data. We will register the agent and migrate the data in the steps below.

· Click on View→ Assessment Report View to identify any objects that need to be manually converted to Redshift.

8. Register Agent

· In the AWS SCT tool, click on View→Data Migration View.

· Click on Register Button to register the SCT Extractor Agent.

· Provide the description, Host name and port number (default 8192). Hostname can be Data Extraction Agent instance’s public IP/hostname, private IP/hostname where you have VPN connection to your VPC. Select the checkbox “Use SSL” and select the Trust and Key Store in the “SSL Tab”.

· Click on Test Connection and then Register.

9. Create Local Migration Task

· Right click on Tables in Source Schema →Create Local Task.

· Select the “Migration Mode” based on the requirement. I selected Extract, Upload & Copy because it will extract the data from the source and upload it to S3 bucket and then copy the files to the Redshift i.e. the target DW.

· Click on Advanced tab and do the following(if required):

· If you want the migrated files to be in the local machine the select the checkbox.

· In any of the columns if NULL values are present then select checkbox NULL Value as a string then give a space.

· Uncheck two options below NULL value as a string.

· In S3 Settings tab specify the S3 bucket name and test the task and click on Create.

10. Start Migration

· Click on start, the migration task progress can be viewed as shown below.

To verify the data in the target, connect to Redshift using SQL Client.

I will publish more blogs on AWS DMS related services. Thanks for reading my blog. Do share feedback for further improvements.