Two ways to read a JSON file
The sample below is an annotated transcript of a conversation between two people.

[
    {
        "start_time": {
            "original": "0:00:00.611000"
        },
        "end_time": {
            "original": "0:00:05.760000"
        },
        "words": "+",
        "speaker": "",
        "location": "",
        "session_id": "MDT_F2F_223"
    },
    {
        "start_time": {
            "original": "0:00:05.760000"
        },
        "end_time": {
            "original": "0:00:09.755000"
        },
        "words": "不想上了,你知道吗?我都烦死了",
        "speaker": "SPK506",
        "location": "",
        "session_id": "MDT_F2F_223"
    },
    ...
    ]

json_file ='/data/magic_data/transcription/train/MDT_F2F_223.json'
cnt = 0
with open(json_file, 'r') as f:
    lines = f.readlines()  # list of strings, one per line of the file
    print(lines)
    # Print only the first four lines (cnt runs from 0 to 3)
    for line in lines:
        print(line)
        if cnt == 3:
            break
        else:
            cnt += 1
['[\n', '    {\n', '        "start_time": {\n', '            "original": "0:00:00.611000"\n', '        },\n', '        "end_time": {\n', '            "original": "0:00:05.760000"\n', '        },\n', '        "words": "+",\n', '        "speaker": "",\n', '        "location": "",\n', '        "session_id": "MDT_F2F_223"\n', '    },\n', '    {\n', '        "start_time": {\n', '            "original": "0:00:05.760000"\n', '        },\n', '        "end_time": {\n', '            "original": "0:00:09.755000"\n', '        },\n', '        "words": "不想上了,你知道吗?我都烦死了",\n', '        "speaker": "SPK506",\n', '        "location": "",\n', '        "session_id": "MDT_F2F_223"\n', '    },\n', '    {\n', '        "start_time": {\n', '            "original": "0:00:09.755000"\n', '        },\n', '        "end_time": {\n', '            "original": "0:00:10.376000"\n', '        },\n', '        "words": "[*]",\n', '        "speaker": "",\n', '        "location": "",\n', '        "session_id": "MDT_F2F_223"\n', '    },\n', '    {\n', '        "start_time": {\n', '            "original": "0:00:10.376000"\n', '        },\n', '        "end_time": {\n', '            "original": "0:00:11.995000"\n', '        },\n', '        "words": "诶,那你考研想考哪儿",\n', ...]
[

    {

        "start_time": {

            "original": "0:00:00.611000"

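The above is the printed output.
Each line of the original file is one item in the list returned by readlines(). The extra blank lines appear because every item already ends with '\n' and print adds another.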
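As an alternative to the counter loop, the first few lines can be previewed with itertools.islice, which stops reading after the requested number of lines. This is only a sketch: the `sample` string and the io.StringIO object are stand-ins (assumptions for illustration) for the real file opened with open(json_file, 'r').

```python
import io
from itertools import islice

# Hypothetical stand-in for the transcript file; a real run would use
# open(json_file, 'r') instead of io.StringIO.
sample = (
    '[\n'
    '    {\n'
    '        "start_time": {\n'
    '            "original": "0:00:00.611000"\n'
    '        }\n'
    '    }\n'
    ']\n'
)
f = io.StringIO(sample)

# Take at most the first four lines without loading the rest.
preview = list(islice(f, 4))

for line in preview:
    # Each line keeps its trailing newline, so suppress print's own.
    print(line, end='')
```

Each element of `preview` is still a plain string, not parsed JSON; parsing is covered in the sections that follow.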

How to read a JSON file

Method 1 (the usual choice)
import json
json_file ='/data/magic_data/transcription/train/MDT_F2F_223.json'
with open(json_file, 'r') as f:
    data1 = json.load(f)

Use the load function of the json module. Its argument is the opened JSON file object, and it returns the parsed Python data.
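The following self-contained sketch shows json.load without needing the dataset. It writes two records shaped like the sample above to a temporary file and reads them back. The `records` list, the temporary file, and the explicit encoding='utf-8' are assumptions added for illustration; the original snippet relies on the platform default encoding.

```python
import json
import os
import tempfile

# Hypothetical records mirroring the structure of the transcript file.
records = [
    {"start_time": {"original": "0:00:00.611000"},
     "end_time": {"original": "0:00:05.760000"},
     "words": "+", "speaker": "", "location": "",
     "session_id": "MDT_F2F_223"},
    {"start_time": {"original": "0:00:05.760000"},
     "end_time": {"original": "0:00:09.755000"},
     "words": "不想上了,你知道吗?我都烦死了", "speaker": "SPK506",
     "location": "", "session_id": "MDT_F2F_223"},
]

# Write the sample to a temporary .json file (stand-in for the real path).
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False,
                                 encoding='utf-8') as tmp:
    json.dump(records, tmp, ensure_ascii=False, indent=4)
    tmp_path = tmp.name

# Method 1: pass the open file object straight to json.load.
with open(tmp_path, 'r', encoding='utf-8') as f:
    data1 = json.load(f)

os.remove(tmp_path)  # clean up the temporary file

print(type(data1), len(data1))
print(data1[1]['speaker'])
```

Passing an explicit encoding is worth considering when the file holds non-ASCII text, since the default depends on the platform.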

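Method 2
import json
json_file ='/data/magic_data/transcription/train/MDT_F2F_223.json'
with open(json_file, 'r') as f:
    lines = f.readlines()
    # Join the lines back into one string; whitespace between JSON tokens is ignored by the parser
    json_str = ''.join(lines)
    data2 = json.loads(json_str)

lines is a list of strings, one per line of the JSON file. Joining them gives the full text, which is passed to the loads function of the json module; its argument is a string. There is no need to strip newlines and spaces first with a chain such as .replace('\n', '').replace(' ', '').replace(',}', '}'): the parser skips whitespace between tokens, and removing every space would also alter string values that contain spaces.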
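A small check of the point that newlines and spaces need not be removed before calling json.loads. The `text` value is a hypothetical one-record document whose "words" field deliberately contains a space, so the effect of stripping spaces becomes visible.

```python
import json

# Hypothetical JSON text; the "words" value contains a space on purpose.
text = (
    '[\n'
    '    {\n'
    '        "words": "hello world",\n'
    '        "speaker": "SPK506"\n'
    '    }\n'
    ']\n'
)

# Same shape as the list returned by f.readlines().
lines = text.splitlines(keepends=True)

# Joining alone is enough: the parser ignores whitespace between tokens.
plain = json.loads(''.join(lines))

# Stripping every space also strips spaces inside string values.
stripped = json.loads(''.join(lines).replace('\n', '').replace(' ', ''))

print(plain[0]['words'])
print(stripped[0]['words'])
```

Both calls parse successfully, but only the first preserves the text of the "words" field.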

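Both approaches work. The first is the usual habit; the second also uses more memory, because it holds the list of lines and the joined string at the same time before parsing.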

Returned data

Both methods return the same result: a list of dictionaries, which looks much like the original JSON file. Displaying data1 or data2 gives:

[{'start_time': {'original': '0:00:00.611000'},
  'end_time': {'original': '0:00:05.760000'},
  'words': '+',
  'speaker': '',
  'location': '',
  'session_id': 'MDT_F2F_223'},
 {'start_time': {'original': '0:00:05.760000'},
  'end_time': {'original': '0:00:09.755000'},
  'words': '不想上了,你知道吗?我都烦死了',
  'speaker': 'SPK506',
  'location': '',
  'session_id': 'MDT_F2F_223'},
 {'start_time': {'original': '0:00:09.755000'},
  'end_time': {'original': '0:00:10.376000'},
  'words': '[*]',
  'speaker': '',
  'location': '',
  'session_id': 'MDT_F2F_223'}]
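Because the result is a list of dictionaries, fields are reached by list index and dictionary key. The sketch below skips segments with an empty speaker and computes each remaining segment's duration. The `data` list copies the three records shown above; `parse_offset` is a hypothetical helper, and it assumes every timestamp has the form H:MM:SS.ffffff as in the sample.

```python
from datetime import timedelta

# The three records shown above, as returned by json.load / json.loads.
data = [
    {'start_time': {'original': '0:00:00.611000'},
     'end_time': {'original': '0:00:05.760000'},
     'words': '+', 'speaker': '', 'location': '',
     'session_id': 'MDT_F2F_223'},
    {'start_time': {'original': '0:00:05.760000'},
     'end_time': {'original': '0:00:09.755000'},
     'words': '不想上了,你知道吗?我都烦死了', 'speaker': 'SPK506',
     'location': '', 'session_id': 'MDT_F2F_223'},
    {'start_time': {'original': '0:00:09.755000'},
     'end_time': {'original': '0:00:10.376000'},
     'words': '[*]', 'speaker': '', 'location': '',
     'session_id': 'MDT_F2F_223'},
]


def parse_offset(text):
    """Hypothetical helper: convert 'H:MM:SS.ffffff' into a timedelta."""
    hours, minutes, seconds = text.split(':')
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


spoken = []  # (speaker, words, duration in seconds)
for item in data:
    if not item['speaker']:
        continue  # skip segments with no speaker label
    start = parse_offset(item['start_time']['original'])
    end = parse_offset(item['end_time']['original'])
    spoken.append((item['speaker'], item['words'],
                   (end - start).total_seconds()))

for speaker, words, seconds in spoken:
    print(speaker, words, seconds)
```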