Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - What is the best way to convert from CSV to JSON when commas and quotations may be in the fields?

I want to convert CSV to JSON. The CSV comes in as free text like this (newlines included):

name,age,booktitle
John,2,Hello World
Mary,3,""Alas, What Can I do?""
Joseph,5,"Waiting, waiting, waiting"

My problem, as you can tell, is that the file:

  • has commas inside some fields, although those fields are wrapped in at least one double quote;
  • may contain double quotes within the file.

I would like each output field to have no leading or trailing quotes. How can I correctly parse the CSV string into a JSON object that represents this CSV accurately, without those leading and trailing quotes?

I usually use:

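var mycsvstring = 'name,age,booktitle\nJohn,2,Hello World'; // e.g. the raw CSV text
var lines = mycsvstring.split('\n');
var headerfields = lines[0].split(','); // get headers here
var finalconvertedjson = [];

for (var i = 1; i < lines.length; i++) {
  // loop through each line and set a key for each header field that
  // corresponds to the appropriate lines[i]; a plain split(',') breaks
  // on commas inside quoted fields, which is the problem described above
  var values = lines[i].split(',');
  var row = {};
  for (var j = 0; j < headerfields.length; j++) {
    row[headerfields[j]] = values[j];
  }
  finalconvertedjson.push(row);
}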
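
The reason a plain split is not enough shows up on the first quoted line of the sample data. This is just the naive `split(',')` approach run on that one line; `line` and `parts` are illustrative names:

```javascript
// A plain split(',') cannot tell delimiter commas from commas inside quotes.
const line = 'Joseph,5,"Waiting, waiting, waiting"';
const parts = line.split(',');

console.log(parts.length); // → 5 (but there are only 3 header fields)
console.log(parts);        // → [ 'Joseph', '5', '"Waiting', ' waiting', ' waiting"' ]
```

The quoted field is torn into three pieces and its quote characters stay attached to the fragments.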


1 Reply


My first guess is to use a regular expression. You can try this one I've just whipped up (regex101 link):

/\s*(")?(.*?)\1\s*(?:,|$)/gm

It extracts individual fields, so it can pull out the headers as well. The first capture group is an optional quote-grabber that the backreference (\1) refers back to, so the actual data is in the second capture group.
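
A quick way to see the capture groups at work is to run the regex from above (with its backslashes intact) over a couple of lines and keep only group 2. `fieldsOf` is just an illustrative helper name; the inputs are one line of the question's sample data and the `f1,,f3` case mentioned below.

```javascript
// The field regex from above: group 1 optionally captures an opening quote,
// group 2 captures the field text, and \1 requires the same quote to close it.
const fieldRegex = /\s*(")?(.*?)\1\s*(?:,|$)/gm;

// Group 2 of every match, i.e. the text of each field on one line.
const fieldsOf = line => [...line.matchAll(fieldRegex)].map(m => m[2]);

const quotedLine = fieldsOf('Joseph,5,"Waiting, waiting, waiting"');
const blankLine = fieldsOf('f1,,f3');

// The quoted field comes back without its quotes; the last entry is the
// zero-width match at the end of the line, not a real field.
console.log(quotedLine); // → [ 'Joseph', '5', 'Waiting, waiting, waiting', '' ]
// A blank middle field is kept as ''.
console.log(blankLine);  // → [ 'f1', '', 'f3', '' ]
```

One side effect of always dropping that last match: a line that genuinely ends in an empty field, such as `f1,f2,`, yields only `f1` and `f2`, because the trailing empty field and the end-of-line match are the same match.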

Here's an example of it in use. I used a filter to cut off the last match in every case, because allowing blank fields with the * quantifier (things like f1,,f3) puts a zero-width match at the end of each line. That was easier to remove in JavaScript than with some regex trickery. Finally, `extra_${i}` is used as a placeholder key for any extra columns not covered by the headers; you should probably swap that part out to fit your own needs. Note that when you pass your own headers, the first line of the string is not consumed as a header row, so it comes back as data.

/**
 * Takes a raw CSV string and converts it to a JavaScript object.
 * @param {string} string The raw CSV string.
 * @param {string[]} headers An optional array of headers to use. If none are
 * given, they are pulled from the file.
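 * @param {string} quoteChar A character to use as the encapsulating character.
 * @param {string} delimiter A character to use between columns. Both this and
 * quoteChar are inserted into the RegExp unescaped, so avoid regex
 * metacharacters such as | or . here.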
 * @returns {object[]} An array of JavaScript objects containing headers as keys
 * and row entries as values.
 */
const csvToJson = (string, headers, quoteChar = '"', delimiter = ',') => {
  const regex = new RegExp(`\\s*(${quoteChar})?(.*?)\\1\\s*(?:${delimiter}|$)`, 'gs');
  const match = string => [...string.matchAll(regex)].map(match => match[2])
    .filter((_, i, a) => i < a.length - 1); // cut off blank match at end

  const lines = string.split('\n');
  const heads = headers || match(lines.splice(0, 1)[0]);

  return lines.map(line => match(line).reduce((acc, cur, i) => ({
    ...acc,
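    // isNaN instead of `Number(cur) || cur`, so "0" becomes 0 rather than staying a string
    [heads[i] || `extra_${i}`]: (cur.length > 0) ? (isNaN(cur) ? cur : Number(cur)) : null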
  }), {}));
}

const testString = `name,age,quote
John,,Hello World
Mary,23,""Alas, What Can I do?""
Joseph,45,"Waiting, waiting, waiting"
"Donaldson Jones"   , sixteen,    ""Hello, "my" friend!""`;

console.log(csvToJson(testString));
console.log(csvToJson(testString, ['foo', 'bar', 'baz']));
console.log(csvToJson(testString, ['col_0']));
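
If you would rather avoid regex backtracking altogether, the usual alternative is a small character-by-character parser. The sketch below is not part of the answer above: it assumes RFC 4180 conventions (comma delimiter, `"` as the quote, a literal quote inside a quoted field written as `""`, and quoted fields allowed to span lines), and `parseCsv` and `csvToObjects` are hypothetical names. Under those rules the question's `""Alas, What Can I do?""` is not well-formed and this parser would split it at its comma; to keep the inner quotes, that field would be written `"""Alas, What Can I do?"""`, as in the sample below.

```javascript
// Minimal RFC 4180-style CSV parser (a sketch, not production-grade).
// Assumes a comma delimiter, '"' as the quote character, and "" for a
// literal quote inside a quoted field.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // "" inside quotes is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true; // a quote outside quoted mode starts quoted mode
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n') {
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else if (ch !== '\r') {
      field += ch; // ignore \r so \r\n line endings also work
    }
  }
  // Flush whatever is pending after the last character
  // (nothing is pending if the text ended with a newline).
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Turn the parsed rows into objects keyed by the header row.
function csvToObjects(text) {
  const [headers, ...records] = parseCsv(text);
  return records.map(values =>
    Object.fromEntries(headers.map((h, i) => [h, values[i] ?? null]))
  );
}

const sample = 'name,age,booktitle\n' +
  'John,2,Hello World\n' +
  'Joseph,5,"Waiting, waiting, waiting"\n' +
  'Mary,3,"""Alas, What Can I do?"""';

console.log(JSON.stringify(csvToObjects(sample), null, 2));
```

Unlike `csvToJson`, this keeps every value as a string; any numeric conversion would be a separate step. A single pass over the characters also means a field can contain newlines, which a `split('\n')`-first approach cannot handle.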

...