CSV file with embedded commas

 5 Replies
 0 Subscribed to this topic
 52 Subscribed to this forum
Sort:
Author
Messages
Terry P
Veteran Member
Posts: 234
Veteran Member
    I have a CSV file being delivered by an outside agency. It is a CSV file in that commas separate the fields, but they are also putting the ALPHA fields wrapped in double quotes. A typical record might look like this:

    12345,"Terry","Pickering","this, that and others"

    I'm using standard data iterator activity to parse out each field. I also can remove the "quote" characters. The problem is when I encounter a record where they have embedded the comma in a field surrounded by quotes. The parser picks up the comma in the middle of the field.

    Any suggestions on how to get around this?
    Bob Canham
    Veteran Member
    Posts: 217
    Veteran Member
      I haven't found a solution to that (and would look forward to hearing from anyone who has). What we were able to do was get the outside agency to send it as pipe | delimited file rather than comma delimited. That took care of the problem for us.
      John Henley
      Posts: 3353
        The best solution I've used is regex (regular expression) in JavaScript. It's very powerful for parsing any type of string data. Not easiest thing to learn/grasp but it works well for what you're trying to do (trust me, I've been down this road
        Thanks for using the LawsonGuru.com forums!
        John
        jamesraceson
        Veteran Member
        Posts: 52
        Veteran Member
          Terry,

          If you are familar with regular expressions and Javascript, you could write one to look for the pattern [space comma] [comma space] and [space comma space]. You could put the code inside of an assign node after each pass for the record. Because most CSV files do not have a leading or trailing space for each field, if you found a match to your expression this would most likely be a comma inserted in somewhere in the data (in between the quotes). You would then replace the matches with just one space. I would run a check to make sure before committing any changes, but that should help get rid of the unwanted commas.
          Terry P
          Veteran Member
          Posts: 234
          Veteran Member
            Good suggestions. I like the last one. I could do a replace on the entire record and do a replace ", " with " ". This is then before I parse the individual fields. I'll post the solution if it works.
            Terry P
            Veteran Member
            Posts: 234
            Veteran Member
              That worked!

              I added an Assign between the two data iterator activities to a new variable with this code:

              CSVFile_outputData.replace(/, /g," ")

              Then I replaced the input for the second one to use this new variable.