Sunday, 9 October 2016

SSIS – One Variable To Rule Them All

Variable Expressions

Back in the DTS days, in order to change the value of a variable we need to write VBScripts.  These scripts were at best irritating to write and were wrought with opportunities for misspellings and unrunnable code.
Enter stage right with SSIS and its variables, which can be evaluated as expressions.  This was a vast improvement over DTS and all of the VBScript that we had in the past was chucked.
But how exactly do we evaluate variables as expressions.  First, what does it mean to do this?  When a variable is evaluated as an expression, the value for the variable is calculated at run-time based on SSIS package properties, system properties, and other variables to determine a value for the variable.
From here though, let’s see how this works with our own variable.  To start create a package and open up the Variables window.  In this window create a new variable named SimpleVariable.

Once the variable is created bring up the Properties window for the variable.  There will be a number of things to choose from in the properties.  The item we are concerned with is EvaluateAsExpression.  Follow the red arrow below if cannot find the property.  Change this value to True.

When you change the EvaluateAsExpression property to True, an ellipse will be enabled (at the red arrow below) that will allow you to open the expression builder for the variable.

Click on the ellipse and the Expression Builder window will appear.  This is pictured below.  There are a couple areas to pay attention to when building expressions.  First, the area in green below contains all of the system and package variables that are available.  If you are unsure on the name of a variable just browse through this list to retrieve one.  Next, in the yellow area, has a list of all of the operations that are possible in expressions.  These can be a bit tricky to get used to at first and I’ll dedicate a later post to some of their nuances.

For our expression at the moment, type in “1 + 2”.  This is written in the Expression box above.  Selecting the Evaluate Expression button will return the result of 3.

This was a simple explanation of how to build an expression for a variable.  Now onto something much more useful.

The One Variable

Yesterday’s post focused on the configuration of the SSIS environment form a folder perspective.  The reason for that post was because I wanted to talk today about how I will often use one variable in a package as a fulcrum for setting many other variables and configuring other portions of the SSIS package.
In the Folder Structure post, I talked about the WorkFiles folder.  Often I include a variable that has this value which I use to set the values for files and folders that the package must interact with.  The rest of this post will demonstrate this practice.
In an SSIS package, create the following variables; WorkFolder, ArchiveFolder, FileName, LogFolder, and ResponseFile.  Set the value for the WorkFolder variable to “c:\temp”.  At this point your Variables window should look like this:

Now change the EvaluateAsExpression property for ArchiveFolder, FileName, LogFolder, and ResponseFile like we did in the previous example.  What we want to do now is create expressions for these variables based on the value in WorkFolder.  Type in the following expressions for each of the variables:
  • ArchiveFolder: @[User::WorkFolder] + “\\Archive\\”
  • FileName: @[User::WorkFolder] + “\\ImportFile.txt”
  • LogFolder: @[User::WorkFolder] + “\\Log\\”
  • ResponseFile: @[User::WorkFolder] + “\\Response\\”
Once these are all set, the values for each of the variables should change to the following

As you can see, the WorkFolder variable now controls the values for each of the other variables.  By just changing the value of WorkFolder all of the locations for all of the files can be changed.

Variable Wrap-Up

Using a variable or two to control the configuration of your SSIS packages can is relatively simple to do.  In the demo above, the one variable reduced the amount of configuration points needed for the package.  Instead of setting all of the values individually, they are set by the one variable.
This can help reduce configuration changes when you are moving your SSIS packages from environment to environment.  It can also reduce the amount of work required when troubleshooting a package since only a few variable will vary between environments.
Now that I’ve laid out some ground work, be prepared for a few packages in the next few days that I’ve used to solve business problems.


SSIS Package

Thank You.

Tuesday, 31 May 2016

SSIS – Folder Structures

The Folder Structure

The SSIS folder structure is usually contained under a single root directory.  For simplicity, I often name this folder SSISYeah, I’m creative.  Off of the root folder, I’ll add folders for Packages and WorkFiles.
The Packages folder often contains many folders underneath it.  These folders are for all of the SSIS projects that have been deployed.  For every project there will be a Project folder.  The Project folder may have other folder beneath them, such as a Configurations folder.
The WorkFiles folder will be similar to the Packages folder in the sense that there will be Project folders below it.  TheProject folders will have other folders such as ArchiveLogsResponse, or Temporary.
If you are good at visualizing, the folder structure will look like this diagram:

The Folder Purpose

Now that this has been described, less put some definition and understanding behind these folders.
  • Packages: Everything under the Packages folder is considered “code”.  Because of this status these folders are under change management control.  People who are not the gatekeepers should only be able to browse these folders, if that.  These folders should be backed up on a regular basis.
  • Project (under Packages): These folders contain all of the SSIS package files (dtsx) and configuration files (dtsconfig) for each deployed project.  A project could be a single SSIS project or a collections of projects based on a client’s business.  Typically, I store SSIS packages outside of the MSDB database in the file system.  This has worked best for change management and with security in the environments.
  • WorkFiles: Everything under the WorkFiles folder is considered temporary.  These folders will store files to be imported, exported, or log files.  Business users are often granted access to these folders depending on their projects and whether they need to pick-up, drop-off, or review files.
  • Project (under WorkFiles): These folders contain project, same project definition from above, related files.  They can be logs, data received for import, or data exported.
  • Archive (Workfiles –> Project): After a file has been imported by an SSIS package, the imported file is dated and placed in this folder.
  • Logs (Workfiles –> Project): Log files generated from the SSIS package and the SQL Agent job steps are placed here.  These need to be in a place where more than those running the jobs can find them.
  • Response (Workfiles –> Project): When the SSIS package has data that cannot be imported and is rejected for some reason – that data will go in this folder.
  • Temporary (Workfiles –> Project): This folder stores the raw files that are generated during an SSIS package execution.  This keeps them out of the way and is an easy place to just delete everything in the unlikely scenario that there are space constraints.

Folder Structure Wrap-Up

There are countless other folders that you could add, but this should cover a lot of the basics.  These have been most of the basics that I try to use.
This has been about how I do things.  The tips and tricks I’ve learned to keep things going and to reduce headaches.  These aren’t necessarily the right answers, but they are my answers.

SSIS - Data Flow Breakpoints

 Data Viewers Definition
Data Viewers are containers that can contain a single SSIS buffer’s work.  As the data passes through the Data Path you are able to view what is in each buffer.  As each buffer passes into the Data Viewer, the processing of the SSIS package on that Data Path is paused.  In essence, you have a breakpoint.  But not really.
There are four types of data viewers.  These are:
  • Grid: Displays selected columns display in a tabular format.
  • Histogram: Displays distribution of numeric values and works only with numeric data.
  • Scatter plot: Displays scatter plot graph works only with numeric data.  You are able to set the x-axis and y-axis.
  • Chart format: Displays a column chart of the count of discrete values in the selected column.
To be perfectly honest, I’ve only ever found a use for the Grid Data Viewer.  There may be situations that I’m unaware of that the others are of great benefit.  Unfortunately, I’ve never come across those.

Data Viewer Setup

Now setting up a Data Viewer on a Data Path is fairly simple.  First, go into the Data Flow and select the Data Path that you want to pause the data processing.  You can right-click to open the properties or double click on the Data Path.  Just like make sure your cursor is on the red “X” below.
Once you have that, the Data Flow Path Editor will open.  From this screen you can tab into the Data Viewers tab to get the options for Data Viewers.
The types of Data Viewers will appear on the left-hand side of the screen.  It’s shown below at the green arrow.  All four of the types from above will be listed.  As I mentioned, the only one I use is the Grid.  The other three have never been of use to me.  I’ve looked at them a few times and just thought they were pretty.
Before moving on, it’s advisable if you are putting these into your SSIS package to name them with something meaningful.  The blue arrow and bubble above indicate the place where the name can be typed
Now this next step is optional but I do recommend that you consider this.  When you create a Data Viewer Grid you have the option to remove some columns from the grid.  This can be extremely handy when the Data Path has more than a handful of columns.  Since you are likely adding the Data Viewer to look for specific values, you may as well only return the values that are of use.
When you are finished, select OK and you’ll return to the Data Flow.  From here you will notice that a little icon has been added to the Data Flow Path to indicate that there is one or more Data Viewers on the path.
Yes, that’s correct.  You can have more than one Data Viewer on the same Data Flow Path.  Depending on what you are troubleshooting it may be worthwhile have two or three Data Viewers that return different columns.

Data Viewer In Use

Once you have a Data Viewer in place on a package, go ahead an fire it up.  The execution of the package will change when there are Data Viewers.
When the package begins, a contained will open for each Data Viewer.  By default, these will float over the window.  If you choose, you can dock these, that is up to you.
On the container there are a few things to take notice of.  First, in the image below the yellow area is all of the data currently contained in the Data Viewer.  Within it you can highlight one row or multiple rows.
If you want to copy out the data, you need to use the Copy Data button, indicated at the red arrow.  The use of Crtl-C to copy doesn’t do a dang thing.  And that sucks.
If the data you are looking for isn’t in the current buffer, you can move the package forward to the next buffer.  You do this with the Play button located at the orange arrow.  Selecting this will clear the Data Viewer and it will populate with the next buffer of data.
Lastly, if you are done with the Data Viewer and want to stop using it you can detach it from the Data Flow Path.  This is done at the green arrow.  By detaching, the Data Viewer retains it’s current data while the package continues with processing the rest of the Data Flow.
As I mentioned, Data Viewers are a lot like Breakpoints.  Due to this, you can use the Breakpoint window to disable, enable, and delete data viewers in your project.  This can be extremely handy when you have a half dozen or more data viewers you just want to get rid of.

Data Viewers Wrap-Up

Now that we’ve gone through this, I hope you can see the power and benefit of using Data Viewers.  When you are troubleshooting an SSIS package trying to figure out where some of the data is becoming out of wack.
Above, I mentioned a couple bits where I compared Breakpoints to Data Viewers.  In a lot of regards, they function similar to one another.  There is one way, though, in which they are very different.  When a breakpoint is encountered, the entire package execution is paused until the break is cleared.  When a Data Viewer is encountered only the data in that Data Flow Path is paused.  While these can be in effect the same thing, in more complicated packages this might cause unexpected results.
Finally, you may have trouble using Data Viewers when you are processing large volumes of data.  If the row that is causing the issue is a few thousand or million rows into the data set you will likely want to do a little bit more than just add a Data Viewer.  You may need filter the data some to make the Data Viewer useful for troubleshooting.


Thank You

Monday, 25 April 2016

SSIS – Using Breakpoints

Using Breakpoints

Much like running projects in other languages through Visual Studio, you can also stop SSIS packages mid-execution when specific events occur in the Control Flow.  This ability allows you to stop the execution of a package to investigate and troubleshoot the state of the SSIS package.
There are ten events in the Control Flow (definitions from Books Online):
  • OnPreExecute: Called when a task is about to execute. This event is raised by a task or a container immediately before it runs.
  • OnPostExecute: Called immediately after the execution logic of the task finishes. This event is raised by a task or container immediately after it runs.
  • OnError: Called by a task or container when an error occurs.
  • OnWarning: Called when the task is in a state that does not justify an error, but does warrant a warning.
  • OnInformation: Called when the task is required to provide information.
  • OnTaskFailed: Called by the task host when it fails.
  • OnProgress: Called to update progress about task execution.
  • OnQueryCancel: Called at any time in task processing when you can cancel execution.
  • OnVariableValueChanged: Called by the Integration Services runtime when the value of a variable changes. The RaiseChangeEvent of the variable must be set to true to raise this event.
  • OnCustomEvent: Called by tasks to raise custom task-defined events.
I’ve mentioned this twice now, and thought it should be called out yet again.  Breakpoints function only at the Control Flow.  If you need to break the action in a Data Flow, this isn’t the solution for that.  This is called foreshadowing.
Breakpoint Demonstration
To implement a breakpoint in the Control Flow, start by right-clicking on a task in the Control Flow.  In the menu, the option to Edit Breakpoints will be about half way down this list.

Select this menu item and you can choose between any of the breakpoints described above.  For this example, we’ll select the OnPostExecute breakpoint.  As mentioned above, this event will fire after the task’s execution has completed.

Let’s run the package now to see what happens.  First you’ll notice that the the task has a red rot added to it, this indicates that there is a breakpoint on the task.  Next when the breakpoint is reached, a little yellow arrow is added to the breakpoint icon.

Once a breakpoint has been used there are a number of things that can be used to get some information on the breakpoint.  To start with there is the Call Stack window.  This window provides information on where the breakpoint is located.

Next there is the Breakpoints window.  Guess what it has?  Yes, all of the breakpoints in your project will be listed here.  There are couple points of interest and points of usefulness with this window.
First, the area in blue is all of the breakpoints in your package.  From this point you can enable and disable them.  A very useful feature in case you have many breakpoints in your package.
Second, the area in red is a control panel for your breakpoints.  There are options to delete single breakpoints, delete all breakpoints in a project, disable all breakpoints in the project, and change the columns returned.  There is additional functionality that can be used as well in advanced scenarios.

There is also an Output window that displays messaged on the progress of the package and breakpoint.  This information will help determine what has been run in the package when the breaks occur.

Finally, the most important piece that breakpoints bring to you is the Locals window.  This window will contain information on the current status of the package execution and the amount of time in which the package has been executing.

Most importantly, it contains all of the package variables.  Often when I need to troubleshoot some odd package behavior it is related to variables not being populated as I thought they should.  This is the golden ticket to finding out what the package is doing during execution.

Breakpoint Wrap-Up
Breakpoints are an essential tool in troubleshooting SSIS packages.  Take time to understand how they work and practice using them a few times so that when the time comes you are ready to pull them out.  They’ve gotten me through a few package executions and will probably save the day for you some time in the future.

This post scratches the surface on using breakpoints.  There is a lot more that can be covered and other scenarios for their use.  But those will have to wait for another day.  Maybe one of the other 29 more posts in this series.


Saturday, 23 April 2016

SSIS - Raw File Format

Raw File Format
Starting with the definition, the raw file format is the data format that is native to SSIS.  When data is being consumed by the data flows of an SSIS package it has a format.  This format is the same that appears in raw files.
Since these are the same file format, SSIS doesn’t have to parse or translate the data between the data streams or the raw file format.  This results in the raw file format being faster than most other data source and destination formats
The raw file format can be incredibly useful.  It can be used for either a data source or a data destination.  Once you’ve gotten your hands wet with them, you’ll find a great many uses.  Some of which may solve issues that you’ve run into with previous development projects.
For instance if you want to build an SSIS package that processes data in multiple steps, temporary versions of the data can be stored in raw files and passed between data flows.  This will allow packages to be built such that they can be restarted at different points in the data processing.
Similar to the previous example, raw files can be used to exchange data between SSIS packages.  You can develop one package to convert a clients file to a standardized structure in the raw file format.  And then use a couple other packages to read that format in the next packages that process the data in a standard format.  All of this can be done without hitting the database.
Another thing you can do is export data from a database to a raw file to use a reference file for multiple SSIS packages.  If you need to cycle through a group of IDs or states multiple time, store the data in a raw file and use it as a data cache.  Sure you can use a Lookup Cache instead – but that doesn’t fit all scenarios and not everyone is using SQL Server 2008.
Let’s move onto a couple demonstrations

Raw File Format Destination

First off, let’s build an SSIS package that uses a Raw File as destination.  We’ll fast forward through adding the Data Flow and the OLE DB Source.  The query for the source doesn’t matter for this demonstration so we’ll skip over that too.
Let’s begin.  Start by dragging a Raw File Destination onto the Data Flow and connecting the OLE DB Source to it.  The package should look like this:

From here, open the Raw File Destination so that we can configure it.  The first thing you will need to configure is the location for the raw file.  This can either be a static value or tied to a variable.  In most cases I use variables to improve configurability between environments

Next you’ll configure how you want the Raw File Destination to behave when it writes.  The write option selection is important since it will have significant impact on how the destination is populated

There are a few options for this, they are (definitions from Books Online):
  • Append: Appends data to an existing file. The metadata of the appended data must match the file format.
  • Create always: Always creates a new file.
  • Create once: Creates a new file. If the file exists, the component fails.
  • Truncate and append: Truncates an existing file and then writes the data to the file. The metadata of the appended data must match the file format.
Missing from this list is a Create First Time or Create Metadata option.  As you begin to use the file format more, you’ll realize that this is an important option to have.  Please, help out by voting for a metadata option for raw files.
At this point, you can move to the next screen and select the columns you need.  There is a select all checkbox for SQL Server 2008 and forward, which is quite handy.  For SQL Server 2005, you need to select all of the columns individually.

One extremely nice thing about the Raw File Destination is that the order of the columns in it don’t matter.  Whether you add a column now or in six months it’s position in the raw data file doesn’t matter.  Or if you forget to add a column that data source expects and it is setup for the source as the first column.
Execute the package and you will have a Raw File created.

Raw File Format Source

Now that we have a file to work with, let’s create a quick package that can import some data from a raw file.  As before, we’ll skip going through adding the Data Flow and then Raw File Source and Row Sampling task.  The Row Sampling task is just there to provide a place to add a data viewer if you desire to see the data that is returned.

Configuration for the the Raw File Source is similar to the Raw File Destination.  Open up the data source and first you determine the location for the raw file.  Same options as above with either be a static value or tied to a variable.

Next you’ll select the columns that you want returned.  They can be selected through the check boxes or by typing their names in the grid below.  The column names can also be changed at this point in the Output Column column.

At this point the Raw File Data Source is configured.  It knows the underlying data so there isn’t any need to call up and discuss this with the person that created the raw file, if it isn’t you.  The steps are simple and not likely to get you bogged down in details – since there really aren’t any.

Raw File Wrap-Up

Now that we’ve got through this far, I hope that you see some of the value is using the raw files.  I’ve got a couple upcoming posts in this series that will go deeper into their actual use, but for now this should get you started.
One bit of advice that I’ll leave you with.  After the data types for you data source or data destination are set it is wise to change the ValidateExternalMetadata property to False.  This prevents some pesky data type and file availability issues that can come up during validation and helps prevent accidentally changing data types.

References Download Package

Thank You

Friday, 15 April 2016

Tips To Get A Salary Increment

Asking for a pay hike can be a tricky affair. Even after a lot of discussions with your boss, you might end up having nothing more than mere assurances.

Your appraisal depends on a lot of factors, some of which you have no control over. A recession hit market, company's financial loss, etc are examples of some such reasons.

However, there still a whole lot of things that are indeed in your control and you can try to maneuver these factors to get the kind of hike that you are looking at. Here's what you can do:

List down your responsibilities:

Make a bulleted list of all the responsibilities handled by you in the last one year. Do not miss out on any task. Then try explaining the problems faced by you while performing them and the way adopted to overcome the difficulties.

Self initiated efforts are always acknowledged with respect. Therefore make sure you dedicate enough time for the list.

Make a note of the extra work:

It is important to have undertaken additional responsibilities in your work-life. This not only shows your dedication towards work but also the capability to accommodate more work within the allotted time. Your willingness and confidence will help your boss to build trust in you.
Speak about the added skills: 
Sometimes you need to learn certain skills do be able to perform your current job better. Point out any skill-building activities you've done or professional training courses you've taken. Any knowledge transfer or training to other team members should also be spoken about during the performance review meetings.

Prepare yourself:

Collect your thoughts and prepare yourself for the meeting. Get your points ready and keep all inferences to work at the back of your mind. Remember to use positive terms and sound confident in the meeting. Present your case clearly and logically.

Talk to the decision-maker, not HR, if you can: 

Often people throw themselves into negotiations with people who don't have any power over pay. It's a waste of time. Many times the company appoints a person from the HR team to have pay discussions, who might then pass pay rise requests up the line, but often key messages are lost or distorted. So try and find out who are the key decision makers and also look at ways that you can speak with them.

Get the data:

It is important to know what others in similar positions are being paid by competitors and the employer in question via salary surveys from recruiters or word of mouth.
This will give you an idea how less or more you are getting in terms of market competition.

Pick The Right Time:

People pick the right time and a quiet, private place for having a salary discussion and book a meeting rather than casually stopping by to ensure they get the decision-maker's attention.
You don't want to be going in when the person you are dealing with has had bad news or there's a crisis in the organization. The right time will also depend on the work cycle in the organization.

Don't just accept the first offer:

Whether you should take a first offer from an employer depends on what was offered and your circumstances.

Read the person's mine you are dealing with for salary negotiations and be open to accepting a good salary offer, particularly in the current economic climate.

But in senior executive positions, sales roles, and other jobs where people are expected to be good at bargaining, taking a first offer can be a big mistake, as employers tend to anticipate bargaining in such roles and build this into their first offer.

Be ready. Don't wait for someone to recognize your achievements. Step forward and request for your promotion.

Thank You