Parsing CSV with double quotes

 9 Replies
 3 Subscribed to this topic
 52 Subscribed to this forum
Sort:
Author
Messages
AC_Slater
Basic Member
Posts: 6
Basic Member

    Hi All,

    I'm hoping someone else might have had to tackle a similar issue, and seeing as though my regex skills are pretty rusty, I thought I'd reach out.

    I am currently using the Data Iterator to go through a CSV file to break up the file by line.  Normally, I would use a second data iterator to go through and break up the fields by their delimiter, which is a comma, but unfortunately some of the fields have valid commas within their data:

    For instance:

    1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,

     

    In the example above, "Smith,John" and "Status: Acknowledge, Accept" should both not be split by the comma.  My hope is that someone might have some javascript experience that would assist me in getting the fields broken up correctly.  Please note that just because one line might have a relevant comma, that same field on the next line may not contain such a comma.

    For instance:

    5678,Car,,"Ross,Bob",,Status: Reject,,Blue,,

    I currently have the outputData of the line data iterator being assigned to a string using in Assign node.  My initial thought is finding some way to replace all the commas with pipes, so it would look something like this:

    1234|Cake||"Smith,John"||"Status, Acknowledge,Accept"||Red||

    5678|Car||"Ross,Bob"||Status: Reject||Blue||

    But I'm not sure the best way to approach it.  Any recommendations?  Thank you all for taking the time to read.

     

     

    Attachments
    pgallucci12
    Basic Member
    Posts: 5
    Basic Member
      Not sure if this is an option for you but I always ask for PIPE | field delimeters (instead of commas). That takes care of the embedded comma issue.
      David Williams
      Veteran Member
      Posts: 1127
      Veteran Member
        I thought th Data Iterator had an option for quoted text?
        David Williams
        AC_Slater
        Basic Member
        Posts: 6
        Basic Member

          The CSV is actually created via the PO122, so unless we make a custom program and alter the 4GL, I think we are locked into comma-delimited.

           

          AC_Slater
          Basic Member
          Posts: 6
          Basic Member

            I believe you can parse by a string, but since the values aren't static, I'm not sure how that would work.  And if I were to parse by just ", then I would run into issues where the dataOutput would look something like this:

             

            outputData1=1234,Cake,,
            outputData2=Smith,John
            outputData3=,,
            outputData4=Status: Acknowledge,Accept
            outputData5,,Red,,

             

            Then trying to accurately parse through the data would become increasingly difficult, as I wouldn't have logic that specified which fields had quotations.

            steve finger
            Veteran Member
            Posts: 47
            Veteran Member

              It's been a little while since i've invoked "workdef", but if i'm not mistaken  you are allowed to select your delimiter character when defining the workfile.  Again....unless i'm mistaken, you don't even have to recompile the program.... but i'm not betting any money on that one.  

              AC_Slater
              Basic Member
              Posts: 6
              Basic Member

                Interesting thought.  Do you mean jobdef though?  I updated the jobdef and changed the delimiter to a pipe.  Would fields with commas in them remain unchanged?

                 

                I attached the jobdef config.

                Attachments
                steve finger
                Veteran Member
                Posts: 47
                Veteran Member
                  Like i said earlier....it's been a while since i've played in that sandbox....you need to empirically demonstrate that what i THINK happens is really the truth.

                  If you change the delimiter using jobdef, it should hold for that JOB - until the delimiter is changed again for that JOB via jobdef. If you change the delimiter via workdef, that delimiter then becomes the default for that PROGRAM.

                  Once again.....better test that. If as i mentioned earlier that you don't need to recompile for the delimiter to be changed, then that would support the notion that workdef would change the overall default delimiter and jobdef would change the delimiter for that job only.
                  Karen Sheridan
                  Veteran Member
                  Posts: 142
                  Veteran Member
                    Are you Unix or Windows? There is a Unix command that can be used to change the commas to pipes.  Google SED
                    steve finger
                    Veteran Member
                    Posts: 47
                    Veteran Member
                      it doesn't matter whether they are windows or unix. they are working with IPA which has javascript. There are various ways to accomplish the substitution using javascript