Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

python - How to flatten a nested json array?

I need to flatten a JSON with different levels of nested JSON arrays in Python

Part of my JSON looks like:

{
  "data": {
    "workbooks": [
      {
        "projectName": "TestProject",
        "name": "wkb1",
        "site": {
          "name": "site1"
        },
        "description": "",
        "createdAt": "2020-12-13T15:38:58Z",
        "updatedAt": "2020-12-13T15:38:59Z",
        "owner": {
          "name": "user1",
          "username": "John"
        },
        "embeddedDatasources": [
          {
            "name": "DS1",
            "hasExtracts": false,
            "upstreamDatasources": [
              {
                "projectName": "Data Sources",
                "name": "DS1",
                "hasExtracts": false,
                "owner": {
                  "username": "user2"
                }
              }
            ],
            "upstreamTables": [
              {
                "name": "table_1",
                "schema": "schema_1",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              },
              {
                "name": "table_2",
                "schema": "schema_2",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              },
              {
                "name": "table_3",
                "schema": "schema_3",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              }
            ]
          },
          {
            "name": "DS2",
            "hasExtracts": false,
            "upstreamDatasources": [
              {
                "projectName": "Data Sources",
                "name": "DS2",
                "hasExtracts": false,
                "owner": {
                  "username": "user3"
                }
              }
            ],
            "upstreamTables": [
              {
                "name": "table_4",
                "schema": "schema_1",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

The output should like this

sample output

Tried using json_normalize but couldn't make it work. Currently parsing it by reading the nested arrays using loops and reading values using keys. Looking for a better way of normalizing the JSON

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a partial solution:

First save your data in the same directory as the script as a JSON file called data.json.

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('data.json') as json_file:
    json_data = json.load(json_file)

new_data = json_data['data']['workbooks']

result = json_normalize(new_data, ['embeddedDatasources', 'upstreamTables'], ['projectName', 'name', 'createdAt', 'updatedAt', 'owner', 'site'], record_prefix='_')

result 

Output:

_name _schema _database.name _database.connectionType projectName name createdAt updatedAt owner site
0 table_1 schema_1 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {'name': 'user1', 'username': 'John'} {'name': 'site1'}
1 table_2 schema_2 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {'name': 'user1', 'username': 'John'} {'name': 'site1'}
2 table_3 schema_3 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {'name': 'user1', 'username': 'John'} {'name': 'site1'}
3 table_4 schema_1 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {'name': 'user1', 'username': 'John'} {'name': 'site1'}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...