<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"	> <channel><title>Comments on: Handling Embedded Text Qualifiers</title> <atom:link href="http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/feed/" rel="self" type="application/rss+xml" /><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/</link> <description>Technology Musings</description> <lastBuildDate>Sat, 04 Feb 2012 00:04:25 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>By: Iain Elder</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-578</link> <dc:creator>Iain Elder</dc:creator> <pubDate>Thu, 25 Aug 2011 13:07:43 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-578</guid> <description>Thank you for documenting this with reference to RFC 4180.I ran into this bug and had trouble explaining it to my colleagues. Part of the trouble comes from having no agreement on what a CSV file actually looks like.</description> <content:encoded><![CDATA[<p>Thank you for documenting this with reference to RFC 4180.</p><p>I ran into this bug and had trouble explaining it to my colleagues. 
Part of the trouble comes from having no agreement on what a CSV file actually looks like.</p> ]]></content:encoded> </item> <item><title>By: Forrest Towne</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-524</link> <dc:creator>Forrest Towne</dc:creator> <pubDate>Wed, 09 Mar 2011 17:13:54 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-524</guid> <description>Taylor, I figured out the column conversion. I needed to remove the n prefix from the table field definitions. Now the code runs; however, all the rows end up going to the error table. I am assuming that the regex parsing is not correct for my data. My data consists of comma-delimited fields. Not all field values have double quotes around them, which I believe you handle. The fields that are delimited with double quotes may have embedded commas or double quotes. It appears the embedded quotes are escaped with another double quote. I assume that I can check for the embedded extra quote after removing the delimiting double quotes. All that being said, it appears that the regex is where I need help. Forrest</description> <content:encoded><![CDATA[<p>Taylor,<br /> I figured out the column conversion. I needed to remove the n prefix from the table field definitions.</p><p>Now the code runs; however, all the rows end up going to the error table.</p><p>I am assuming that the regex parsing is not correct for my data.</p><p>My data consists of comma-delimited fields.</p><p>Not all field values have double quotes around them, which I believe you handle.</p><p>The fields that are delimited with double quotes may have embedded commas or double quotes. 
It appears the embedded quotes are escaped with another double quote.</p><p>I assume that I can check for the embedded extra quote after removing the delimiting double quotes.</p><p>All that being said, it appears that the regex is where I need help.</p><p>Forrest</p> ]]></content:encoded> </item> <item><title>By: Taylor Gerring</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-523</link> <dc:creator>Taylor Gerring</dc:creator> <pubDate>Wed, 09 Mar 2011 16:57:00 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-523</guid> <description>@Forrest The issue is with how picky SSIS is with data types. There&#039;s a discrepancy between what is read from the file (ASCII string) and where it should be stored in the database (Unicode string). If these assumptions are true, you can simply add a Data Conversion component in the data flow, converting the offending column from ASCII to Unicode with an appropriate code page (probably 1252). Just make sure to modify the database insert component to use this new Unicode column.</description> <content:encoded><![CDATA[<p>@Forrest</p><p>The issue is with how picky SSIS is with data types. There&#8217;s a discrepancy between what is read from the file (ASCII string) and where it should be stored in the database (Unicode string). If these assumptions are true, you can simply add a Data Conversion component in the data flow, converting the offending column from ASCII to Unicode with an appropriate code page (probably 1252). 
Just make sure to modify the database insert component to use this new Unicode column.</p> ]]></content:encoded> </item> <item><title>By: Forrest Towne</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-522</link> <dc:creator>Forrest Towne</dc:creator> <pubDate>Wed, 09 Mar 2011 16:09:34 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-522</guid> <description>Taylor, Excellent article, thank you. I am getting a conversion error when I write the data to the database table. The data coming out appears to be DT_STR, and the database table wants DT_WSTR. How should my database columns be defined? They are currently nchar(##); should they be nvarchar(##)? Forrest</description> <content:encoded><![CDATA[<p>Taylor,<br /> Excellent article, thank you.</p><p>I am getting a conversion error when I write the data to the database table. The data coming out appears to be DT_STR, and the database table wants DT_WSTR. How should my database columns be defined? 
They are currently nchar(##); should they be nvarchar(##)?</p><p>Forrest</p> ]]></content:encoded> </item> <item><title>By: SQL Lion</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-399</link> <dc:creator>SQL Lion</dc:creator> <pubDate>Sun, 04 Apr 2010 19:13:59 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-399</guid> <description>For a workaround and a step-by-step description of developing an SSIS package that overcomes a problem when importing text files with the Flat File Connection Manager and Flat File Source, where the &quot;Row Delimiter&quot; property does not work properly for rows having NULL or empty values, follow the link below: &lt;a href=&quot;http://www.sqllion.com/2010/04/ssis-vs-text-file-importing-1/&quot; rel=&quot;nofollow&quot;&gt;http://www.sqllion.com/2010/04/ssis-vs-text-file-importing-1/&lt;/a&gt; Thanks, SQL Lion</description> <content:encoded><![CDATA[<p>For a workaround and a step-by-step description of developing an SSIS package that overcomes a problem when importing text files with the Flat File Connection Manager and Flat File Source, where the &#8220;Row Delimiter&#8221; property does not work properly for rows having NULL or empty values, follow the link below:<br /> <a href="http://www.sqllion.com/2010/04/ssis-vs-text-file-importing-1/" rel="nofollow">http://www.sqllion.com/2010/04/ssis-vs-text-file-importing-1/</a><br /> Thanks,<br /> SQL Lion</p> ]]></content:encoded> </item> <item><title>By: Marco</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-199</link> <dc:creator>Marco</dc:creator> <pubDate>Tue, 02 Jun 2009 17:08:34 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-199</guid> <description>Andy, I found a way to 
solve your problem:&lt;pre lang=&quot;csharp&quot;&gt;public override void Input0_ProcessInputRow(Input0Buffer Row) { Regex rCSV = new Regex(&quot;,(?=(?:[^\&quot;]*\&quot;[^\&quot;]*\&quot;)*(?![^\&quot;]*\&quot;))&quot;); Regex rQout = new Regex(&quot;\&quot;\&quot;&quot;);string[] fields = rCSV.Split(Row.line);if (fields.Length == EXPECTED_FIELDS) { // Case 1: Quoted field with &quot;&quot; embedded text qualifiers Row.FIELD1 = Format(fields[0], rQout); // Case 2: Non-quoted field Row.FIELD2 = Convert.ToDESIRED_TYPE(fields[1]); } }public static string Format(string input, Regex rQout) { return (!String.IsNullOrEmpty(input)) ? rQout.Replace(input.Substring(1, input.Length - 2), &quot;\&quot;&quot;) : &quot;&quot;; }&lt;/pre&gt;-------------------------------------------------------This also handles the case where there are empty fields. For example, &quot;Hi&quot;,123,,&quot;my name is Marco the &quot;&quot;programmer&quot;&quot;&quot;,&quot;other text&quot; produces the output: Hi 123 my name is Marco the &quot;programmer&quot; other text</description> <content:encoded><![CDATA[<p>Andy, I found a way to solve your problem:</p><div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #0600FF; font-weight: bold;">override</span> <span style="color: #6666cc; font-weight: bold;">void</span> Input0_ProcessInputRow<span style="color: #008000;">&#40;</span>Input0Buffer Row<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
        Regex rCSV <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Regex<span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;,(?=(?:[^<span style="color: #008080; font-weight: bold;">\&quot;</span>]*<span style="color: #008080; font-weight: bold;">\&quot;</span>[^<span style="color: #008080; font-weight: bold;">\&quot;</span>]*<span style="color: #008080; font-weight: bold;">\&quot;</span>)*(?![^<span style="color: #008080; font-weight: bold;">\&quot;</span>]*<span style="color: #008080; font-weight: bold;">\&quot;</span>))&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Regex rQout <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Regex<span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span><span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> fields <span style="color: #008000;">=</span> rCSV<span style="color: #008000;">.</span><span style="color: #0000FF;">Split</span><span style="color: #008000;">&#40;</span>Row<span style="color: #008000;">.</span><span style="color: #0000FF;">line</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span>fields<span style="color: #008000;">.</span><span style="color: #0000FF;">Length</span> <span style="color: #008000;">==</span> EXPECTED_FIELDS<span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
                <span style="color: #008080; font-style: italic;">// Case 1: Quoted field with &quot;&quot; embedded text qualifiers</span>
                Row<span style="color: #008000;">.</span><span style="color: #0000FF;">FIELD1</span> <span style="color: #008000;">=</span> Format<span style="color: #008000;">&#40;</span>fields<span style="color: #008000;">&#91;</span><span style="color: #FF0000;">0</span><span style="color: #008000;">&#93;</span>, rQout<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
                <span style="color: #008080; font-style: italic;">// Case 2: Non-quoted field</span>
                Row<span style="color: #008000;">.</span><span style="color: #0000FF;">FIELD2</span> <span style="color: #008000;">=</span> Convert<span style="color: #008000;">.</span><span style="color: #0000FF;">ToDESIRED_TYPE</span><span style="color: #008000;">&#40;</span>fields<span style="color: #008000;">&#91;</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">string</span> Format<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> input, Regex rQout<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">!</span><span style="color: #6666cc; font-weight: bold;">String</span><span style="color: #008000;">.</span><span style="color: #0000FF;">IsNullOrEmpty</span><span style="color: #008000;">&#40;</span>input<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">?</span> rQout<span style="color: #008000;">.</span><span style="color: #0000FF;">Replace</span><span style="color: #008000;">&#40;</span>input<span style="color: #008000;">.</span><span style="color: #0000FF;">Substring</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">1</span>, input<span style="color: #008000;">.</span><span style="color: #0000FF;">Length</span> <span style="color: #008000;">-</span> <span style="color: #FF0000;">2</span><span style="color: #008000;">&#41;</span>, <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">:</span> <span style="color: #666666;">&quot;&quot;</span><span style="color: #008000;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div><p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-</p><p>This has also the case where there are emtpy fields:</p><p>For example</p><p>&#8220;Hi&#8221;,123,,&#8221;my name is Marco the &#8220;&#8221;programmer&#8221;"&#8221;,&#8221;other text&#8221;</p><p>Will produce the output:</p><p>Hi<br /> 123</p><p>my name is Marco the &#8220;programmer&#8221;<br /> other text</p> ]]></content:encoded> </item> <item><title>By: Andy Galbraith</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-180</link> <dc:creator>Andy Galbraith</dc:creator> <pubDate>Mon, 04 May 2009 19:57:05 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-180</guid> <description>The biggest problem in the file I am fighting is that everything does not have a qualifier around it:&quot;Clark, Bob&quot;,&quot;123 Main St&quot;,1234.00,45,&quot;Something Else&quot;,.......so I am having trouble writing the appropriate parsing function because I cannot parse on just comma or on quote-comma-quote!Thanks for listening to my frustration...{-:</description> <content:encoded><![CDATA[<p>The biggest problem in the file I am fighting is that everything does not have a qualifier around it:</p><p>&#8220;Clark, Bob&#8221;,&#8221;123 Main St&#8221;,1234.00,45,&#8221;Something Else&#8221;,&#8230;.</p><p>&#8230;so I am having trouble writing the appropriate parsing function because I cannot parse on just comma or on quote-comma-quote!</p><p>Thanks for listening to my frustration&#8230;{-:</p> ]]></content:encoded> </item> <item><title>By: Taylor Gerring</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-175</link> <dc:creator>Taylor Gerring</dc:creator> <pubDate>Tue, 28 Apr 2009 17:21:31 +0000</pubDate> <guid 
isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-175</guid> <description>@Orlando Yes, thanks for pointing out that a line break inside the data would not work with this solution, because it parses each line as a single record. And you&#039;re right about DTS. What infuriates many people is that this very old tool parses CSV more correctly than SSIS does, despite the long-standing issue. At this point, it&#039;s clear this is not a priority for Microsoft, so unless someone develops a custom Data Flow Source, all we can do is wait and hope for a patch or fix in the next version of SQL Server.</description> <content:encoded><![CDATA[<p>@Orlando</p><p>Yes, thanks for pointing out that a line break inside the data would not work with this solution, because it parses each line as a single record.</p><p>And you&#8217;re right about DTS. What infuriates many people is that this very old tool parses CSV more correctly than SSIS does, despite the long-standing issue. At this point, it&#8217;s clear this is not a priority for Microsoft, so unless someone develops a custom Data Flow Source, all we can do is wait and hope for a patch or fix in the next version of SQL Server.</p> ]]></content:encoded> </item> <item><title>By: Orlando</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-174</link> <dc:creator>Orlando</dc:creator> <pubDate>Tue, 28 Apr 2009 02:42:42 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-174</guid> <description>As described in RFC 4180, section 2, item 6 (http://tools.ietf.org/html/rfc4180#section-2), fields enclosed in text qualifiers may contain commas, double quotes, and even line breaks. 
Your solution will suffice for rows that fit on one line; however, CSV files can have rows such as this: 1,&quot;Hello, this field is a &quot;&quot;real&quot;&quot; pain!&quot;,&quot;4/27/2009&quot; Yes, that&#039;s one row, where: Field 1 = 1 Field 2 (represented on one line, with the line break escaped for readability) = Hello, this field \r\nis a &quot;real&quot; pain! Field 3 = 4/27/2009 The destination table is: create table dbo.LogInfo ( RecordID int, LogInfo varchar(500), LogDateTime datetime ) I have looked into reading the file with each line as a single column and parsing from there, as you suggested; however, the embedded line break prevents me from using that method. Any further pointers on how to import CSV files using SSIS would be much appreciated. Against some long-standing personal bias, I am actually considering recommending DTS, a 10+ year old technology, to solve the issue, since it does a capable job of parsing and importing CSV files and SSIS does not provide an easy path to process what many would consider one of the most common file formats. Thanks for reading.</description> <content:encoded><![CDATA[<p>As described in RFC 4180, section 2, item 6 (<a href="http://tools.ietf.org/html/rfc4180#section-2" rel="nofollow">http://tools.ietf.org/html/rfc4180#section-2</a>), fields enclosed in text qualifiers may contain commas, double quotes, and even line breaks. 
Your solution will suffice for rows that fit on one line; however, CSV files can have rows such as this:</p><p>1,&quot;Hello, this field<br /> is a &quot;&quot;real&quot;&quot; pain!&quot;,&quot;4/27/2009&quot;</p><p>Yes, that&#8217;s one row, where:</p><p>Field 1 = 1<br /> Field 2 (represented on one line, with the line break escaped for readability) = Hello, this field \r\nis a &quot;real&quot; pain!<br /> Field 3 = 4/27/2009</p><p>The destination table is:</p><p>create table dbo.LogInfo<br /> (<br /> RecordID int,<br /> LogInfo varchar(500),<br /> LogDateTime datetime<br /> )</p><p>I have looked into reading the file with each line as a single column and parsing from there, as you suggested; however, the embedded line break prevents me from using that method.</p><p>Any further pointers on how to import CSV files using SSIS would be much appreciated. Against some long-standing personal bias, I am actually considering recommending DTS, a 10+ year old technology, to solve the issue, since it does a capable job of parsing and importing CSV files and SSIS does not provide an easy path to process what many would consider one of the most common file formats.</p><p>Thanks for reading.</p> ]]></content:encoded> </item> <item><title>By: Dennis</title><link>http://www.ideaexcursion.com/2008/11/12/handling-embedded-text-qualifiers/comment-page-1/#comment-88</link> <dc:creator>Dennis</dc:creator> <pubDate>Tue, 03 Feb 2009 22:24:49 +0000</pubDate> <guid isPermaLink="false">http://www.ideaexcursion.com/?p=88#comment-88</guid> <description>Oh, yes. It works now. Thanks.</description> <content:encoded><![CDATA[<p>Oh, yes. It works now. Thanks.</p> ]]></content:encoded> </item> </channel> </rss>
