hadoop - AWS Glue issue with double quotes and commas

I have this CSV file:

reference,address
V7T452F4H9,"12410 W 62TH ST, AA D"

The following options are being used in the table definition:

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'quoteChar'='"', 
  'separatorChar'=',') 

but it still won't recognize the double quotes in the data, and the comma inside the quoted field is breaking the rows. When I run the Athena query, the result looks like this:

reference     address
V7T452F4H9    "12410 W 62TH ST

How do I fix this issue?


1 Reply


Looks like you also need to add escapeChar. The AWS Athena docs show this example:

CREATE EXTERNAL TABLE myopencsvtable (
   col1 string,
   col2 string,
   col3 string,
   col4 string
)
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   'separatorChar' = ',',
   'quoteChar' = '"',
   'escapeChar' = '\\'
   )
STORED AS TEXTFILE
LOCATION 's3://location/of/csv/';
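Applied to the file in the question, a complete table definition might look like the sketch below. The table name, the S3 location, and the skip.header.line.count property (added to skip the header row of the sample file) are assumptions, not taken from the question:

CREATE EXTERNAL TABLE addresses (            -- hypothetical table name
   reference string,
   address string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   'separatorChar' = ',',
   'quoteChar' = '"',
   'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/path/to/csv/'     -- placeholder location
TBLPROPERTIES ('skip.header.line.count'='1');

With quoteChar and escapeChar set, the quoted value "12410 W 62TH ST, AA D" should come back as a single address value instead of being split on the embedded comma. One caveat: if the table was created by a Glue crawler, the crawler typically registers CSV data with LazySimpleSerDe, which ignores quoting, so you may need to switch the table's SerDe to OpenCSVSerde (with these properties) in the Glue console or recreate the table with DDL like the above.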
