Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - TypeError: Object of type 'bytes' is not JSON serializable

I have just started programming in Python. I want to use Scrapy to create a bot, and when I run the project it fails with TypeError: Object of type 'bytes' is not JSON serializable.

import json
import codecs

class W3SchoolPipeline(object):

  def __init__(self):
      self.file = codecs.open('w3school_data_utf8.json', 'wb', encoding='utf-8')

  def process_item(self, item, spider):
      line = json.dumps(dict(item)) + '\n'
      # print line

      self.file.write(line.decode("unicode_escape"))
      return item

from scrapy.spiders import Spider
from scrapy.selector import Selector
from w3school.items import W3schoolItem

class W3schoolSpider(Spider):

    name = "w3school"
    allowed_domains = ["w3school.com.cn"]

    start_urls = [
        "http://www.w3school.com.cn/xml/xml_syntax.asp"
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@id="navsecond"]/div[@id="course"]/ul[1]/li')

        items = []
        for site in sites:
            item = W3schoolItem()
            title = site.xpath('a/text()').extract()
            link = site.xpath('a/@href').extract()
            desc = site.xpath('a/@title').extract()

            item['title'] = [t.encode('utf-8') for t in title]
            item['link'] = [l.encode('utf-8') for l in link]
            item['desc'] = [d.encode('utf-8') for d in desc]
            items.append(item)
        return items

Traceback:

TypeError: Object of type 'bytes' is not JSON serializable
2017-06-23 01:41:15 [scrapy.core.scraper] ERROR: Error processing {'desc': [b'\xe4\xbd\xbf\xe7\x94\xa8 XSLT \xe6\x98\xbe\xe7\xa4\xba XML'],
 'link': [b'/xml/xml_xsl.asp'],
 'title': [b'XML XSLT']}

Traceback (most recent call last):
  File "c:\users\administrator\appdata\local\programs\python\python36\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\LZZZZB\w3school\w3school\pipelines.py", line 19, in process_item
    line = json.dumps(dict(item)) + '\n'
  File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

1 Reply


You are creating those bytes objects yourself:

item['title'] = [t.encode('utf-8') for t in title]
item['link'] = [l.encode('utf-8') for l in link]
item['desc'] = [d.encode('utf-8') for d in desc]
items.append(item)

Each of those t.encode(), l.encode() and d.encode() calls creates a bytes object, which the json module cannot serialise. Don't encode here: .extract() already returns str values, and json.dumps() handles those directly.
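
To see the difference in isolation, here is a minimal sketch that needs only the standard library. The sample title and the try_dumps helper are made up for illustration; the exact wording of the error message depends on the Python version.

```python
import json

# Stand-in for what .extract() returns in Python 3: a list of str.
title = ["XML XSLT"]

# Encoding each value, as the spider in the question does, yields bytes.
encoded = [t.encode("utf-8") for t in title]


def try_dumps(value):
    """Return the JSON text, or the TypeError message if serialisation fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


bytes_result = try_dumps({"title": encoded})  # fails: bytes are not serialisable
str_result = try_dumps({"title": title})      # works: str values are fine

print(bytes_result)
print(str_result)
```

Dropping the .encode('utf-8') calls in the spider is therefore enough to make the TypeError go away.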

You are also making several other errors, all of which come from encoding more than you need to. Leave encoding to the json module and to the standard file object returned by open().

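Open the file in text mode with the built-in open() rather than codecs.open(), and write the result of json.dumps() as-is, without the line.decode() call. One caveat: if W3schoolItem subclasses scrapy.Item, keep the dict(item) conversion, because json.dumps() only accepts real dict objects and such an item is a mapping rather than a dict subclass:

import json

class W3SchoolPipeline(object):
    def __init__(self):
        # Text mode: the file object encodes str to UTF-8 on write.
        self.file = open('w3school_data_utf8.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + '\n'
        self.file.write(line)
        return item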
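
A slightly fuller variant might look like the sketch below. It is not part of the original answer: the class name JsonLinesPipeline and the path parameter are hypothetical, ensure_ascii=False is an optional addition that keeps non-ASCII text readable in the file instead of writing \uXXXX escapes, and dict(item) is used so the same code works for plain dicts and for mapping-like items. open_spider() and close_spider() are the hooks Scrapy calls on a pipeline when a spider starts and finishes.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes one JSON object per line as UTF-8 text."""

    def __init__(self, path="w3school_data_utf8.json"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Text mode with an explicit encoding: the file object turns str
        # into UTF-8 bytes, so no manual .encode() is needed anywhere.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Close the file so buffered lines are flushed to disk.
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False writes characters such as Chinese text as-is.
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item
```

Closing the file in close_spider() avoids losing the last buffered lines when the crawl ends.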

I'm guessing you followed a tutorial that assumed Python 2, while you are using Python 3. I strongly suggest you find a different tutorial: not only is it written for an outdated version of Python, but if it advocates line.decode('unicode_escape') it is teaching some extremely bad habits that will lead to hard-to-track bugs. For a good, free book on learning Python 3, I can recommend Think Python, 2nd edition.
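
As an illustration of why the unicode_escape trick is fragile, the following sketch reproduces the Python 2 habit on Python 3 by converting the JSON text to bytes first. The sample data and the is_valid_json helper are made up for the demonstration.

```python
import json

# JSON text as json.dumps() produces it: quotes inside strings are escaped.
original = {"desc": 'say "hi"'}
json_text = json.dumps(original)

# Mimic the tutorial's approach: decode the JSON text with unicode_escape.
# This also "unescapes" the \" sequences that JSON needs to stay valid.
mangled = json_text.encode("ascii").decode("unicode_escape")


def is_valid_json(text):
    """Report whether text can be parsed back by json.loads()."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


valid_before = is_valid_json(json_text)
valid_after = is_valid_json(mangled)

# The codec also treats raw non-ASCII bytes as Latin-1, which garbles them.
mojibake = "é".encode("utf-8").decode("unicode_escape")

print(valid_before, valid_after, mojibake)
```

The output file looks fine as long as the scraped text contains no quotes or backslashes, which is exactly what makes this kind of bug hard to track down.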

