Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
599 views
in Technique[技术] by (71.8m points)

join two json files based on common key with jq utility or alternative way from command line

I have 2 json files with common key and I need to JOIN them with jq utility or alternative way from command line As follows:

(for example: jq -join -key "id" jsonFile1 jsonFile2)

jsonFile1:

{"id":"10","data":"abc"}
{"id":"20","data":"xyz"}
{"id":"30","data":"qwe"}
{"id":"40","data":"wsx"}
{"id":"50","data":"zxc"}

jsonFile2:

{"id":"60","content":"ert"}
{"id":"40","content":"tgb"}
{"id":"10","content":"yui"}
{"id":"30","content":"ujm"}
{"id":"70","content":"rfv"}

output:

{"id":"10","data":"abc","content":"yui"}
{"id":"30","data":"qwe","content":"ujm"}
{"id":"40","data":"wsx","content":"tgb"}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This response illustrates an efficient approach using jq.

In the example, the value of .id in each object is a string and therefore in the first part of this response, it is assumed that the key is always string-valued (in the P.S., this assumption is relaxed).

It is also assumed that the "rows" can be combined without regard to conflicting values. (We use jq's + to combine objects.)

# hashJoin(a1; a2; field) expects a1 and a2 to be arrays of JSON objects
# and that for each of the objects, the field value is a string.
# A relational join is performed on "field".

def hashJoin(a1; a2; field):
  # hash phase:
  (reduce a1[] as $o ({};  . + { ($o | field): $o } )) as $h1
  | (reduce a2[] as $o ({};  . + { ($o | field): $o } )) as $h2
  # join phase:
  | reduce ($h1|keys[]) as $key
      ([]; if $h2|has($key) then . + [ $h1[$key] + $h2[$key] ] else . end) ;

hashJoin( $file1; $file2; .id)[]

Invocation:

$ jq -nc --slurpfile file1 file1.json --slurpfile file2 file2.json -f join.jq

Output:

{"id":"10","data":"abc","content":"yui"}  
{"id":"30","data":"qwe","content":"ujm"}
{"id":"40","data":"wsx","content":"tgb"}

P.S. Here is a still more efficient implementation of hashJoin/3, which relaxes all assumptions about the specified "key" except that it specify a valid key. Composite keys can be specified as arrays.

def hashJoin(a1; a2; key):
  def akey: key | if type == "string" then . else tojson end;
  def wrap: { (akey) : . } ;
  # hash phase:
  (reduce a1[] as $o ({};  . + ($o | wrap ))) as $h1
  | (reduce a2[] as $o 
      ( {};
        ($o|akey) as $v
        | if $h1[$v] then . + { ($v): $o } else . end )) as $h2
  # join phase:
  | reduce ($h2|keys[]) as $key
      ([];  . + [ $h1[$key] + $h2[$key] ] ) ;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...