top of page

Introduction to SSIS (Technical Note)

  • Writer: DeJia Wang
    DeJia Wang
  • Jan 7, 2022
  • 3 min read

Updated: Jan 27, 2022

Prior to taking up the role of ETL Developer at KPMG Singapore, I had no experience in dealing with data at all, so in this article, I will present some basic learning points for the creation of ETL package using SSIS.


As an ETL developer leveraged on the Microsoft stacks, we not only need to understand the key points of technology implementation of C#, MSSQL and SSIS, but also understand the business logic of the legacy and new application. Furthermore, it is also often necessary for us to know how to analyze basic data.



(Figure 1: Main Components: C#, SQL , SSIS)

What is SSIS?

SSIS (SQL Server Integration Service) is an data migration solution from Microsoft. It is a Visual Studio Extension and it function similar to the ASP.NET program. It allow us to designs a lot of workflow similar to controls, but they are called tasks. The main function of these tasks is to modify and transfer data from source database to target database (i.e. query, aggregation, etc.)


(Figure 2: SSIS)


1. SSIS is not only used to simply transfer and transform data, but can also be used to integrate, transfer, extend, filter and correct the data built into the Business Intelligence platform.


2.SSIS not only has many built-in data cleaning functions, but also can expand functions through third-party controls, allowing users to develop custom controls through Visual Studio.NET.


3. SSIS can directly import data into the Cube of SQL Server Analysis Services.


4. SSIS can be seamlessly integrated with SQL Server Reporting Services (Reporting Services), that is, SSIS packages can be used as data sources for reporting services.



(Figure 3: Data Transformation)


How to use SSIS:

So SSIS is a tool to perform data migration, but how to use it? The following is a detailed description based on an example:

(Figure 4: Two table structure as example)


Let there be two tables:


(1) employee_internet (network employee table):

employee_internet:id (primary key + auto-increment)

name (Employee name)

birthday (Date of birth)

idCard (ID number)

idType (ID type like Work Pass, S Pass etc.)

empNo (Employee number)


(2) employee (employee table):

employee_internet:id (primary key + auto-increment)

name (Employee name)

birthday (Date of birth)

idCard (ID number)

idType (ID type like Work Pass, S Pass etc.)

empNo (Employee number)


ETL Objective: Merging the employee_internet table information into the employee table according to the idCard (ID number).


1. Create an SSIS project file

  1. In the Start menu, find the SQL Server BusinessIntelligence Development Studio program under the SQL database file, and click Run the program.

  2. On the opened page, click "File → New → Project" in the upper left corner, select the "IntegrationService" project in the "sample\source\repo" project directory, fill in the name and location, and click OK.

(Figure 5: Integration Service Project)


2. Open an existing SSIS project file

Find the folder where you need to open the SSIS project and open the file. Find the file with the file extension .sln and run the file


3. Component introduction


(Figure 6: SSIS Component Introduction)


SSIS Toolbox: Like the name implies, this is your toolbox of available operations in SQL Server Integration Services (SSIS). This list is also context-aware and shows the tools for either the control flow or data flow based on where you’re currently working.


Control Flow: It is the SQL Server workflow engine that contains control flow elements. An SSIS package consists of at least one control flow task, and optionally one or more data flows to support repeating control flow tasks.


Data Flow: It is the most frequently used task in SSIS, which contains the data transformation logic (ETL processes). It moves the data from source to destination and adds transforms in them to merge, update or split data.


4. Creating SSIS package

All SSIS development is done in packages. Steps to create a package: In the solution explorer on the right, right-click the SSIS package folder and select New SSIS Package, so that you can start ETL

(Figure 7: Creating SSIS Package)


5. Next, build control flow and data flow according to the above topic

(Figure 7: Control Flow)


6. The OLE DB source is the data of the source data table in the title is employee_internet


(Figure 8: OLE DB)


7. Ensure that the search is to find the table employee that needs to be connected according to the source data, and connect according to the idCard, and the derived column is based on finding no matching output to derive a different column from the ID type of idType


8. Double-click the 'Execute Package' to run the package.


(Figure 9: Execute Package)


Through the above operations, this small example is completed, It's very simple!


Conclusion

This article is a brief introduction to SSIS, and it is explained according to a small example. If you want to better realize the process of ETL, you must have a good grasp of SQL statements so as to construct the target data source more conveniently and quickly. Even though it provides a way to write scripts to achieve the required functions, it is difficult to maintain in the later stage, so use SQL scripts with caution.



Tags: #SQL

Comments


©2018 by DeJia

bottom of page