Reading JSON files with Pandas

初学小白Lu · Published 2022-08-22 22:42:20

Contents

pandas.read_json
pandas.json_normalize
DataFrame.to_json
pandas.io.json.build_table_schema

Pandas methods for working with JSON:

Convert a JSON string to a pandas object.

read_json([path_or_buf, orient, typ, dtype, ...])

Normalize semi-structured JSON data into a flat table.

json_normalize(data[, record_path, meta, ...])

Convert the object to a JSON string.

DataFrame.to_json([path_or_buf, orient, ...])

Create a Table schema from data.

build_table_schema(data[, index, ...])

pandas.read_json

pandas.read_json(path_or_buf=None, 
				orient=None, 
				typ='frame', 
				dtype=None, 
				convert_axes=None, 
				convert_dates=True, 
				keep_default_dates=True, 
				numpy=False, 
				precise_float=False, 
				date_unit=None, 
				encoding=None, 
				encoding_errors='strict', 
				lines=False, 
				chunksize=None, 
				compression='infer', 
				nrows=None, 
				storage_options=None)

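Parameters:

path_or_buf: a valid JSON str, path object or file-like object
orient: str, one of ‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’; describes the layout of the JSON input
typ: {‘frame’, ‘series’}, default ‘frame’
dtype: bool or dict, default None
convert_axes: bool, default None
convert_dates: bool or list of str, default True
keep_default_dates: bool, default True
numpy: bool, default False (deprecated in pandas 1.5.0 and removed in 2.0)
precise_float: bool, default False
date_unit: str, default None
encoding: str, default is ‘utf-8’
encoding_errors: str, optional, default “strict”
lines: bool, default False. Read the input as JSON Lines, one JSON object per line
chunksize: int, optional. Requires lines=True; returns a JsonReader that yields chunks of up to this many lines
compression: str or dict, default ‘infer’
nrows: int, optional. Requires lines=True; reads only the first nrows lines
storage_options: dict, optional

Returns: Series or DataFrame
Example:
Contents of the JSON file (ceshi.json):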

[{"ttery":"[123]","issue":"20130801-3391"},{"ttery":"[123]","issue":"20130801-3390"},{"ttery":"[123]","issue":"20130801-3389"}]

# -*- coding: utf-8 -*-

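import pandas as pd

# read_json accepts a file path directly, so pandas opens and closes the file itself;
# orient='records' matches a JSON array of row objects
df = pd.read_json('ceshi.json', orient='records', encoding='utf-8')

# Writing .xlsx needs an Excel engine such as openpyxl to be installed
df.to_excel('pandas处理ceshi-json.xlsx', index=False, columns=["ttery", "issue"])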
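The `lines` and `chunksize` parameters are easiest to see on a small input. A minimal sketch, assuming the same three records as ceshi.json rewritten as JSON Lines (one object per line) and wrapped in `io.StringIO` instead of being stored in a file; pandas 2.1 and later deprecate passing a literal JSON string directly to `read_json`.

```python
from io import StringIO

import pandas as pd

# Assumed input: the ceshi.json records rewritten as JSON Lines (one object per line)
jsonl = (
    '{"ttery": "[123]", "issue": "20130801-3391"}\n'
    '{"ttery": "[123]", "issue": "20130801-3390"}\n'
    '{"ttery": "[123]", "issue": "20130801-3389"}\n'
)

# lines=True: each line is parsed as one record (one row)
df = pd.read_json(StringIO(jsonl), lines=True)
print(df)

# chunksize (requires lines=True) returns a JsonReader that yields DataFrames;
# using it as a context manager closes the reader when the block ends
row_counts = []
with pd.read_json(StringIO(jsonl), lines=True, chunksize=2) as reader:
    for chunk in reader:
        row_counts.append(len(chunk))
print(row_counts)  # [2, 1]
```

Reading in chunks keeps memory use bounded for large line-delimited files; for a file as small as ceshi.json, a single `read_json` call is simpler.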

pandas.json_normalize

pandas.json_normalize(data, 
				record_path=None, 
				meta=None, 
				meta_prefix=None, 
				record_prefix=None, 
				errors='raise', 
				sep='.', 
				max_level=None)

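Parameters:

data: dict or list of dicts
record_path: str or list of str, default None. Path to the list of records in each object; if not given, data itself is treated as the list of records
meta: list of paths (str or list of str), default None. Fields to attach as metadata to every record
meta_prefix: str, default None. Prefix prepended to the meta column names
record_prefix: str, default None. Prefix prepended to the record column names
errors: {‘raise’, ‘ignore’}, default ‘raise’. With ‘ignore’, a meta key missing from some objects becomes NaN instead of raising KeyError
sep: str, default ‘.’. Separator used when joining nested key names
max_level: int, default None. Maximum nesting depth to flatten; None flattens all levels

Returns: frame: DataFrame
Example: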

data = [
    {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
    {"name": {"given": "Mark", "family": "Regner"}},
    {"id": 2, "name": "Faye Raker"},
]
pd.json_normalize(data)

    id name.first name.last name.given name.family        name
0  1.0     Coleen      Volk        NaN         NaN         NaN
1  NaN        NaN       NaN       Mark      Regner         NaN
2  2.0        NaN       NaN        NaN         NaN  Faye Raker
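The `sep` parameter controls how nested key names are joined in the flattened column names. A minimal sketch with hypothetical records in which every `name` is a dict (the names are borrowed from the example above):

```python
import pandas as pd

# Hypothetical records: every "name" is a nested dict
data = [
    {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
    {"id": 2, "name": {"first": "Mark", "last": "Regner"}},
]

# sep="_" joins nested keys with an underscore instead of the default "."
flat = pd.json_normalize(data, sep="_")
print(flat.columns.tolist())  # ['id', 'name_first', 'name_last']
print(flat)
```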

data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]
pd.json_normalize(data, max_level=0)

    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}

data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]
pd.json_normalize(data, max_level=1)

    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60

data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
            {"name": "Palm Beach", "population": 60000},
        ],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {"name": "Summit", "population": 1234},
            {"name": "Cuyahoga", "population": 1337},
        ],
    },
]
result = pd.json_normalize(
    data, "counties", ["state", "shortname", ["info", "governor"]]
)

         name  population    state shortname info.governor
0        Dade       12345   Florida    FL    Rick Scott
1     Broward       40000   Florida    FL    Rick Scott
2  Palm Beach       60000   Florida    FL    Rick Scott
3      Summit        1234   Ohio       OH    John Kasich
4    Cuyahoga        1337   Ohio       OH    John Kasich
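The `record_prefix`, `meta_prefix` and `errors` parameters from the list above can be combined in one call. A sketch on hypothetical data modeled on the states example, in which the Ohio entry deliberately lacks the `info` key:

```python
import pandas as pd

# Hypothetical data: the Ohio entry has no "info" key
data = [
    {
        "state": "Florida",
        "info": {"governor": "Rick Scott"},
        "counties": [{"name": "Dade", "population": 12345}],
    },
    {
        "state": "Ohio",
        "counties": [
            {"name": "Summit", "population": 1234},
            {"name": "Cuyahoga", "population": 1337},
        ],
    },
]

# record_prefix / meta_prefix are prepended to the record and meta column names;
# errors="ignore" fills a missing meta path with NaN instead of raising KeyError
result = pd.json_normalize(
    data,
    record_path="counties",
    meta=["state", ["info", "governor"]],
    record_prefix="county.",
    meta_prefix="meta.",
    errors="ignore",
)
print(result.columns.tolist())
# ['county.name', 'county.population', 'meta.state', 'meta.info.governor']
print(result)

# With the default errors="raise", the missing "info" key raises KeyError
raised = False
try:
    pd.json_normalize(data, "counties", ["state", ["info", "governor"]])
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```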

DataFrame.to_json

DataFrame.to_json(path_or_buf=None, 
				orient=None, 
				date_format=None, 
				double_precision=10, 
				force_ascii=True, 
				date_unit='ms', 
				default_handler=None, 
				lines=False, 
				compression='infer', 
				index=True, 
				indent=None, 
				storage_options=None)

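Parameters:

path_or_buf: str, path object, file-like object, or None, default None. If None, the JSON is returned as a string
orient: str, one of ‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’
date_format: {None, ‘epoch’, ‘iso’}. None means ‘iso’ for orient=‘table’ and ‘epoch’ otherwise
double_precision: int, default 10
force_ascii: bool, default True
date_unit: str, default ‘ms’ (milliseconds)
default_handler: callable, default None
lines: bool, default False. Write JSON Lines; only valid with orient=‘records’
compression: str or dict, default ‘infer’
index: bool, default True
indent: int, optional
storage_options: dict, optional

Returns: None or str (a JSON string if path_or_buf is None, otherwise None)
Example: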

import json
df = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
result = df.to_json(orient="split")
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))

{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}
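The `orient` argument changes the JSON layout considerably. A minimal sketch using the same DataFrame as above, comparing two other orients, JSON Lines output, and reading the `split` output back with `read_json`:

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)

# orient="records": a list of row objects; the index labels are dropped
records_json = df.to_json(orient="records")
print(records_json)  # [{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]

# orient="index": an object keyed by index label
index_json = df.to_json(orient="index")
print(index_json)  # {"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}

# lines=True (only valid with orient="records") writes one JSON object per line
jsonl = df.to_json(orient="records", lines=True)
rows = [json.loads(line) for line in jsonl.splitlines() if line]

# The "split" output can be read back by passing the matching orient to read_json
restored = pd.read_json(StringIO(df.to_json(orient="split")), orient="split")
print(restored)
```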

pandas.io.json.build_table_schema

pandas.io.json.build_table_schema(data, index=True, primary_key=None, version=True)

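Parameters:

data: Series or DataFrame
index: bool, default True. Whether to include data.index in the schema
primary_key: bool or None, default None. None sets ‘primaryKey’ to the index level(s) if the index is unique
version: bool, default True. Whether to include a ‘pandas_version’ field

Returns: schema: dict
Example: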

from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {'A': [1, 2, 3],
     'B': ['a', 'b', 'c'],
     'C': pd.date_range('2016-01-01', freq='D', periods=3),
    }, index=pd.Index(range(3), name='idx'))
build_table_schema(df)

{'fields': [{'name': 'idx', 'type': 'integer'}, {'name': 'A', 'type': 'integer'}, {'name': 'B', 'type': 'string'}, {'name': 'C', 'type': 'datetime'}], 'primaryKey': ['idx'], 'pandas_version': '1.4.0'}
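`build_table_schema` produces the same kind of schema that `to_json(orient="table")` embeds in its output, which is what lets `read_json(orient="table")` restore the index name and column dtypes. A sketch of that round trip, reusing the DataFrame above (exact dtype names may differ between pandas versions):

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", "b", "c"],
        "C": pd.date_range("2016-01-01", freq="D", periods=3),
    },
    index=pd.Index(range(3), name="idx"),
)

# orient="table" writes {"schema": ..., "data": [...]}; the schema part follows
# the Table Schema layout that build_table_schema returns
table_json = df.to_json(orient="table")
schema = json.loads(table_json)["schema"]
print(schema["primaryKey"])  # ['idx']

# read_json with orient="table" uses the embedded schema to restore index and dtypes
restored = pd.read_json(StringIO(table_json), orient="table")
print(restored.index.name)  # idx
print(restored.dtypes)
```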
