Tuesday, 23 January 2007, 23:36
import httplib

if __name__ == "__main__":
    print "开始......"  # "Starting......"
    conn = httplib.HTTPConnection('www.baidu.com')
    conn.request("GET", "/index.html")
    response = conn.getresponse()
    html = response.read()  # raw bytes, still in the server's encoding
    conn.close()
    print html

When I print html, the Chinese text in it shows up garbled. I also tried
    html = response.read().encode('gbk')
which fails at run time with
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 124: ordinal not in range(128)
What is the reason for this?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070123/55e69caf/attachment.htm
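
A note on the error reported above, offered as a minimal sketch rather than a definitive diagnosis. In Python 2, response.read() returns a byte string, and calling .encode('gbk') on a byte string first decodes it implicitly with the default 'ascii' codec, which is where the UnicodeDecodeError comes from. The sketch below is written for Python 3 (an assumption, since the thread uses Python 2), and the sample bytes are a made-up stand-in for the page, using the GBK bytes of 百度.

```python
# Python 3 sketch; `raw` is a hypothetical stand-in for the downloaded page.
raw = b"<title>\xb0\xd9\xb6\xc8</title>"  # GBK/GB2312 bytes for 百度

# Decoding with ASCII fails on the first byte >= 0x80, as in the traceback.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the codec the bytes were written in gives real text.
text = raw.decode("gbk")
```

The failing position is the index of the first non-ASCII byte, so "position 124" in the original error would be the first Chinese character in the page, not a damaged download.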
Tuesday, 23 January 2007, 23:48
    html = response.read().decode('gbk')
or
    html = response.read().decode('utf-8')
Give it a try.
On 1/23/07, 俊杰蔡 <yzcaijunjie at gmail.com> wrote:
>
> if __name__ == "__main__":
>     print "开始......"
>     conn = httplib.HTTPConnection('www.baidu.com')
>     conn.request("GET","/index.html")
>     response = conn.getresponse()
>     html=response.read()
>     conn.close()
>     print html
>
> When I print html, the Chinese text in it shows up garbled. I also tried
> html=response.read().encode('gbk')
> which fails at run time with
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 124:
> ordinal not in range(128)
>
> What is the reason for this?
>
> _______________________________________________
> python-chinese
> Post: send python-chinese at lists.python.cn
> Subscribe: send subscribe to python-chinese-request at lists.python.cn
> Unsubscribe: send unsubscribe to python-chinese-request at lists.python.cn
> Detail Info: http://python.cn/mailman/listinfo/python-chinese
>
Wednesday, 24 January 2007, 14:28
html=response.read().decode('gbk') gives the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-13: ordinal not in range(128)
html=response.read().decode('utf-8') gives the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0: unexpected code byte
On 1/23/07, junyi sun <ccnusjy at gmail.com> wrote:
>
> html=response.read().decode('gbk')
> or
> html=response.read().decode('utf-8')
> Give it a try.
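
The two errors in this message have different causes, which a small sketch may make clearer. It assumes Python 3 and uses the GBK bytes of 百度 as hypothetical sample data. Byte 0xb0 is a UTF-8 continuation byte, so it can never start a UTF-8 sequence, which is why decode('utf-8') fails at once. decode('gbk') itself succeeds; the UnicodeEncodeError most likely appears afterwards, when the decoded text is converted back to bytes with the 'ascii' codec (for example while printing).

```python
# Python 3 sketch; `raw` is hypothetical sample data, not the real page.
raw = b"\xb0\xd9\xb6\xc8"  # GBK bytes for 百度

# 1) The bytes are not UTF-8: 0xb0 cannot begin a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# 2) Decoding as GBK works and yields text ...
text = raw.decode("gbk")

# 3) ... but turning that text back into ASCII bytes fails.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

So the decode('gbk') step is probably already correct here, and the remaining problem is on the output side.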
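Wednesday, 24 January 2007, 15:46
Try this:
import httplib

if __name__ == "__main__":
    print "开始......"
    conn = httplib.HTTPConnection('www.baidu.com')
    conn.request("GET", "/index.html")
    response = conn.getresponse()
    html = response.read()
    # decode the gb2312 bytes into a unicode object
    html = unicode(html, 'gb2312')
    conn.close()
    print html

P.S.:
With the original program I did not run into the garbled-text problem the original poster describes. I checked the content of html and found that Baidu sends the page to me encoded as gb2312. The original poster may need to determine first which encoding is being sent, and then use the corresponding codec.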
On 1/24/07, 俊杰蔡 <yzcaijunjie at gmail.com> wrote:
>
> html=response.read().decode('gbk') gives the following error:
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-13: ordinal not in range(128)
>
> html=response.read().decode('utf-8') gives the following error:
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0:
> unexpected code byte
--
Best Regards,
Archer
Ming Zhe Huang
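
The advice in the P.S. above, to find out which encoding was sent before choosing a codec, could be sketched roughly as follows. This assumes Python 3; detect_charset is a hypothetical helper written for this note, not part of httplib or any library, and it only looks at a Content-Type header value and at a meta tag near the start of the body.

```python
import re

# Hypothetical helper, not a library function.
HEADER_CHARSET = re.compile(r"""charset\s*=\s*["']?([\w-]+)""", re.I)
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.I)


def detect_charset(content_type, body, default=None):
    """Return the declared charset in lower case, or `default` if none is found."""
    # 1) Prefer the charset parameter of the HTTP Content-Type header.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2) Otherwise look for a <meta ... charset=...> tag near the top of the page.
    match = META_CHARSET.search(body[:2048])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


# Made-up sample page that declares gb2312 in its head.
sample = (b'<html><head><meta http-equiv="Content-Type" '
          b'content="text/html; charset=gb2312"></head>'
          b'<body>\xb0\xd9\xb6\xc8</body></html>')
charset = detect_charset("text/html", sample)
page = sample.decode(charset)
```

When a page declares gb2312, decoding with gbk is a reasonable alternative, since GBK is a superset of GB2312.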
Wednesday, 24 January 2007, 16:26
Strange, I still cannot get it to work. I am using Eclipse + PyDev, and the error message I get is:
开始......
Traceback (most recent call last):
  File "/home/cjj/workspace/MyPy/src/program/myprogram.py", line 15, in ?
    print html
UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-32: ordinal not in range(128)
It is still an encoding problem. Could it be related to my system? I am using Ubuntu.
On 1/24/07, Mingzhe Huang <archerzz at gmail.com> wrote:
>
> Try this:
> import httplib
>
> if __name__ == "__main__":
>     print "开始......"
>     conn = httplib.HTTPConnection('www.baidu.com')
>     conn.request("GET","/index.html")
>     response = conn.getresponse()
>     html=response.read()
>     html = unicode(html, 'gb2312')
>     conn.close()
>     print html
>
> P.S.:
> With the original program I did not run into the garbled-text problem the
> original poster describes. I checked the content of html and found that
> Baidu sends the page to me encoded as gb2312. The original poster may need
> to determine first which encoding is being sent, and then use the
> corresponding codec.
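
The traceback above points at `print html` and at the 'ascii' codec, which suggests that the output stream in that Eclipse + PyDev run reported ASCII (or no) encoding. That is an assumption, since the thread does not show sys.stdout.encoding. The Python 3 sketch below simulates such a stream with io.TextIOWrapper instead of touching the real console; the sample bytes are hypothetical.

```python
import io

text = b"\xb0\xd9\xb6\xc8".decode("gbk")  # hypothetical sample: 百度

# Simulate an output stream that only accepts ASCII.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
try:
    print(text, file=ascii_stream)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The same print succeeds when the stream's encoding can represent the text.
buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")
print(text, file=utf8_stream)
utf8_stream.flush()
written = buffer.getvalue()
```

On a real system, printing sys.stdout.encoding in both the terminal and the IDE is a quick way to check whether the two environments differ.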
Wednesday, 24 January 2007, 17:08
Then could you check which encoding is declared in the head of the garbled html page? On Ubuntu it may not be gb2312 or gbk.
On 1/24/07, 俊杰蔡 <yzcaijunjie at gmail.com> wrote:
>
> Strange, I still cannot get it to work. I am using Eclipse + PyDev, and the error message I get is:
> 开始......
> Traceback (most recent call last):
>   File "/home/cjj/workspace/MyPy/src/program/myprogram.py", line 15, in ?
>     print html
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 19-32: ordinal not in range(128)
>
> It is still an encoding problem. Could it be related to my system? I am using Ubuntu.
Wednesday, 24 January 2007, 22:13
The encoding is gb2312. But why can't it be displayed properly?
On 1/24/07, Mingzhe Huang <archerzz at gmail.com> wrote:
>
> Then could you check which encoding is declared in the head of the garbled html page? On Ubuntu it may not be gb2312 or gbk.
Wednesday, 24 January 2007, 22:40
Using html = unicode(html, 'gb2312') does not work either?
Then perhaps the Ubuntu environment is not set up properly, especially in the console.
On 1/24/07, 俊杰蔡 <yzcaijunjie at gmail.com> wrote:
>
> The encoding is gb2312. But why can't it be displayed properly?
Thursday, 25 January 2007, 08:16
Hello Mingzhe Huang,
    html = unicode(html, 'gb2312').encode('utf8')
======== On 2007-01-24 22:41:27 you wrote: ========
Using html = unicode(html, 'gb2312') does not work either?
Then perhaps the Ubuntu environment is not set up properly, especially in the console.
= = = = = = = = = = = = = = = = = = = = = =
Regards,
charles huang
hyy at fjii.com
2007-01-25
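
The one-liner suggested in this message decodes the gb2312 bytes to text and then re-encodes that text as UTF-8 bytes, so that print receives bytes and no implicit ASCII conversion takes place. A rough Python 3 equivalent is sketched below; transcode is a hypothetical helper and the sample bytes are made up. Whether UTF-8 output then displays correctly depends on what the console expects, which the thread does not establish.

```python
def transcode(raw, source, target):
    """Decode bytes from `source`, then re-encode the text as `target`."""
    return raw.decode(source).encode(target)


gb_bytes = b"\xb0\xd9\xb6\xc8"  # hypothetical sample: 百度 in GB2312

# GB2312 -> UTF-8 works because the input really is GB2312.
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")

# Starting from the wrong source encoding fails for the same payload.
try:
    transcode(gb_bytes, "utf-8", "gbk")
    wrong_source_failed = False
except UnicodeDecodeError:
    wrong_source_failed = True
```

The direction matters: the source codec has to match the bytes actually received, and only the target codec is a free choice.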
Thursday, 25 January 2007, 11:36
html=response.read().decode('utf-8').encode('gbk')
2007年01月25日 星期四 11:37
ÎÒ³¢ÊÔÁËÏ£¬ÔÚconsoleÖÐÔËÐеϰ£¬Ò»ÇÐÕý³£ÁË£¨²ÉÓÃgb2312£©£¬ÖÐÎÄÏÔʾûÓÐÎÊÌ⣬µ«ÔÚEclipseÖÐʼÖÕ»á³öÀ´ÄǸöÎÊÌâ¡£²»µÃÆä½â¡£ ÁíÍâÎÒ»¹ÏëÎÊÏ£¬ÎÒÓÐÒ»¸ö³¬Á´½ÓµØÖ· http://localhost/mybook">Êé¼® ÎÒ¸ÃÈçºÎͬʱȡ³öÕâ¸öÁ´½ÓµØÖ·ºÍÊé¼®´æ·ÅÔÚÒ»¸ö×Öµä½á¹¹ÖÐÄØ£¿ Ö±½ÓÓÃSGMLParseÀàµÄstart_a()ÄÜʵÏÖô£¿ ÎÒ¿ÉÒԵõ½µØÖ·£¬µ«"Êé¼®"ÔõôµÃµ½ÄØ£¿ def start_a(self,attr): url = [value for (key,value) in attrs] del url[len(url)-1] if name: self.urls.append(url) ÔÚÕâ¸ö·½·¨Öд¦Àíô£¿ »¹ÊÇÐèÒªÔÚ handle_data()Öд¦ÀíÄØ£¿ On 1/24/07, Mingzhe Huang <archerzz在gmail.com> wrote: > > ʹÓÃhtml = unicode(html, 'gb2312')Ò²²»ÐУ¿ > ÄÇ¿ÉÄÜÊÇubuntuµÄ»·¾³Ã»ÉèÖúðɣ¬ÌرðÊÇÔÚconsoleÏ¡£ > > On 1/24/07, ¿¡½Ü²Ì <yzcaijunjie在gmail.com > wrote: > > > > encodingÊÇgb2312¡£¿ÉÊÇΪɶ²»ÄÜÕý³£ÏÔÊ¾ÄØ£¿ > > > > On 1/24/07, Mingzhe Huang <archerzz在gmail.com > wrote: > > > > > > ÄÇÄã¿ÉÒÔ¿´¿´ÂÒÂëµÄhtmlÒ³ÃæÀïÃæµÄheadÉϵÄencodingÊÇʲô°É£¿¿ÉÄÜubuntuÉϲ»ÊÇgb2312,gbk > > > > > > On 1/24/07, ¿¡½Ü²Ì < yzcaijunjie在gmail.com> wrote: > > > > > > > > Ææ¹ÖÁË£¬ÎÒ»¹ÊÇûÓÐÄÜͨ¹ý£¬ÎÒʹÓõÄÊÇeclipse+Pydev£¬µÃµ½µÄ´íÎóÐÅÏ¢ÊÇ£º > > > > ¿ªÊ¼...... > > > > Traceback (most recent call last): > > > > File "/home/cjj/workspace/MyPy/src/program/myprogram.py", line 15, > > > > in ? > > > > print html > > > > UnicodeEncodeError: 'ascii' codec can't encode characters in > > > > position 19-32: ordinal not in range(128) > > > > > > > > »¹ÊDZàÂëÎÊÌ⣬ÄѵÀºÍÎÒϵͳÓйأ¿ ÎÒʹÓõÄÊÇUbuntu > > > > > > > > > > > > On 1/24/07, Mingzhe Huang < archerzz在gmail.com> wrote: > > > > > > > > > > ÊÔÊÔ¿´Õâ¸ö£º > > > > > import httplib > > > > > > > > > > if __name__ == "__main__": > > > > > print "¿ªÊ¼......" 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://python.cn/pipermail/python-chinese/attachments/20070125/139f55b0/attachment-0001.html
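On the console-versus-Eclipse difference: in Python 2, print on a unicode object encodes it with sys.stdout.encoding and falls back to ASCII when the stream reports no encoding. A likely explanation, which the thread does not confirm, is that the terminal reported an encoding while the Eclipse/PyDev console did not, producing the "'ascii' codec can't encode" error there. One way around that is to decode the page bytes and then encode them explicitly before writing. The sketch below shows the idea in Python 3 syntax; the function name recode and the UTF-8 fallback are assumptions chosen for illustration.

```python
import sys


def recode(raw, source_encoding="gb2312", target_encoding=None):
    """Decode raw page bytes, then re-encode them for the output stream."""
    text = raw.decode(source_encoding, errors="replace")
    # Assumption: fall back to UTF-8 when the stream reports no encoding,
    # as can happen under an IDE console or a pipe.
    target = target_encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # errors="replace" substitutes "?" rather than raising UnicodeEncodeError
    return text.encode(target, errors="replace")


raw = "百度".encode("gb2312")            # stands in for response.read()
as_utf8 = recode(raw, target_encoding="utf-8")
as_ascii = recode(raw, target_encoding="ascii")
```

In the Python 2 script from this thread the equivalent would be something like print html.encode('utf-8') after html = unicode(html, 'gb2312'), with the target encoding chosen to match what the Eclipse console is configured to display.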