Note: This post assumes you’re working with SQL Server 2008. This can likely be adapted to SQL Server 2005, but the code is written in C#, which 2005 does not support.
It seems that SSIS can’t handle embedded text qualifiers when importing from a flat file. What is an embedded text qualifier? Let’s say you have a CSV file with a few fields. To thwart problems with commas inside your text fields causing confusion with your column delimiters, you implement a text qualifiers - typically double quotes. It makes a row look something like this:
"12036","Company Name, Inc.","555-555-1234","3.14159"
That’s fine, but what happens when the field also contains quotes, such as this:
"52665","Best "Kept" Secret Storage Facility","555-555-9876","2.71828"
Now, depending on how the parser interprets this line, it could see the double-quote before “Kept” and expect a field terminator (a comma in our case). Well, it so happens, that for whatever reason, Integration Services exhibits this type of behavior and will fail on this type of line. It’s a well-documented bug and oft-requested fix, but even as of SQL Server 2008, the issue is still present (Note: This apparently should be a supported configuration according to RFC 4180). Ideally, you can simply import a different format - using a different text qualifier is probably the easiest change (ever tried the thorn?). If this is not feasible, a variety of other solutions have been suggested, the most flexible being, writing your own Script Transformation to custom-parse the rows.
The setup is simple: configure the Flat File Source to import the whole record as a single field. Pipe that output to your script component, and then connect that to the destination database or rest of your process. Here’s what my Data Flow Task looks like:

Data Flow Task Overview
And here are the settings I used for my Flat File Connection:
- General
- Text qualifier: <none>
- Header row delimiter: {CR}{LF}
- Columns
- Row delimiter: {CR}{LF}
- Column delimiter: [blank]
- Advanced
- Name your column. My example user the unimaginative name, “line”.
- DataType: string [DT_STR]
- OutputColumnWidth: 4000
And the steps to get started:
- Go ahead and add a Flat File Source to your Data Flow Task and configure it to use this Flat File Connection
- Now, place a Script Component on your Data Flow and select “Transformation” when prompted
- Drag the green output arrow from Flat File Source to Script Component
- Edit the Script Component and switch over to the “Inputs and Outputs” page
- Rename the output to “Match Rows”, or anything else of your choosing.
- Set ExclusionGroup = 1
- Add the correct output columns, with necessary names and types. This is very important, yet annoyingly mundane. Add an extra column for the line number with type of “four-byte signed integer [DT_I4]“. I called mine MatchLineNum.
- Add another output and call it “Error Rows” or anything else of your choosing.
- Set ExclusionGroup = 1
- Add only two columns, ErrorLine (string [DT_STR] 8000) and ErrorLineNum (four-byte signed integer [DT_I4]).
Now, a quick explanation on what all this was for: Because we are only importing the whole row, we want to configure the Script Component to output the individual columns. We did this part when we completed steps 5-7. Next, we configured an additional output for rows that we are not able to successfully parse. This is why we defined the Error Rows output in steps 8-10. Be aware, that if you’re supremely confident that you can parse every row, you could optionally skip the creation of the Error Rows output, but I would advise against it.
Now, we will create the actual script to parse the rows manually.
- Switch back to the Script page and click the “Edit Script…” button. This will open up a script editor that looks like a stripped-down Visual Studio.
- Take note of the code below, and I will explain it below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
| /* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
//Declare LineNum to keep track of the number of rows
int LineNum;
public override void PreExecute()
{
base.PreExecute();
//Initalize LineNum
LineNum = 0;
}
public override void PostExecute()
{
base.PostExecute();
}
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
//Increment LineNum to keep track of the number of rows
LineNum += 1;
//split the input column. This will be highly dependent on the format and will probably need adjustment
string[] columns = System.Text.RegularExpressions.Regex.Split(Row.line, "(?<=\"),(?=\")");
//If the number of elements is not expected, we assume there was a problem
if (columns.Length != 4)
{
Row.ErrorLine = Row.line;
Row.ErrorLineNum = LineNum;
Row.DirectRowToErrorRows();
}
//Everything looks good, let's move forward
else
{
Row.MatchLineNum = LineNum;
Row.ID = StripQualifier(columns[0], "\"");
Row.Company = StripQualifier(columns[1], "\"");
Row.PhoneNumber = StripQualifier(columns[2], "\"");
Row.FavoriteIrrationalNumber = StripQualifier(columns[3], "\"");
Row.DirectRowToMatchRows();
}
}
public static string StripQualifier(string InputString, string Qualifier)
{
//This is a helper function only to remove surrounding text qualifiers
string OutputString;
if(
InputString.Substring(0, Qualifier.Length) == Qualifier
&& InputString.Substring(InputString.Length - Qualifier.Length, Qualifier.Length) == Qualifier
)
OutputString = InputString.Substring(Qualifier.Length, InputString.Length - (2 * Qualifier.Length));
else
OutputString = InputString;
return OutputString;
}
} |
- Line 14: Declare LineNum for counting our position in the file.
- Line 21: Initialize LineNum to 0. In the above script, we increment before parsing, so the first record will have a LineNum of 1. If you happen to have a header row and would prefer it to be 2 to coincide with the actual lines in your flat file, feel free to change the initalization to -1.
- Line 31: Increment LineNum for keeping track of our position in the file
- Line 35: This is the real nugget to the whole post. It utilizes regular expressions to correctly parse out the row, even with an embedded text qualifier. This is a fairly simplistic implementation. If you need to tweak the parsing routine to your situation do it here!
- Line 37: In my example I am expecting 4 fields, so I check that my string array contains four entries. If not, I assume it’s just a bad row and will inspect later.
- Line 39-40: Set the appropriate output columns.
- Line 41: Force the rows to the “Error Rows” output. Remember that extra output we created? It’s specifically for this situation, so we can redirect the “bad” rows to a different location.
- Line 44: If we have 4 elements in our array, assume the best! You probably want to perform additional sanity checks here, including wrapping the whole thing in a try-catch block, but for our example, I’m keeping it simple.
- Line 47-50: Set the appropriate output columns
- Line 51: Redirect output to “Match Rows”
- Line 56-71: This is simply a helper method to strip the text qualifiers off the text. Discussing it is outside the scope of this post, but I’ve tested it and it works fine.
Be sure and build the script (Ctrl+Shift+B or use the Build menu) before closing out. If not, you will receive a validation error. After a successful build, close and click “OK” on the Script Transformation Editor dialog. You can now connect the outputs from your Script Component to any destination, be it an OLE DB Destination or another flat file. The important thing to note is that you connect the Error Rows output to analyze any potential problems. You have access to both the record number and the original line read in. This should provide you with sufficient information to start analyzing the problem, and hopefully fix the parsing routine in the Script Component to deal with it.
Note, that performing this extra Script Component severely affected the processing rate versus SSIS natively supporting the import. Again, the best solution would be to correct the source, if possible.