Python Forum - Discussion Board

Subject: [python-chinese] strange minidom or xml

Wednesday, March 10, 2004 15:43

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 15:43:03 HKT 2004

I copied this sample xml file from the web:




 Sample Document 
  Brandon 
 Voss 
 The XML Pages  
 This is element text and an entity
follows:&Description;



When I then tried to parse it with the following
Python code:

from xml.dom import minidom
xmldoc = minidom.parse('samplexml.xml')
print xmldoc.toxml()

Python says that the XML is not well-formed.
See below:

Traceback (most recent call last):
  File "C:\Python23\codes\xmltest.py", line 4, in -toplevel-
    xmldoc = minidom.parse('samplexml.xml')
  File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
    return expatbuilder.parse(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: not well-formed (invalid token): line 1, column 5

How come?

What can I do about this?
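A quick way to see what expat is complaining about is to read the `lineno` and `offset` attributes of the `ExpatError` and look at the text at that position. The sketch below is written for Python 3 (unlike the Python 2.3 code above); `describe_parse_error` is a hypothetical helper, not a library function, and the upper-case `<?XML` declaration is only an illustrative input: XML names are case-sensitive, so the XML declaration must be spelled `<?xml`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def describe_parse_error(text):
    """Parse text with minidom; return None on success, else (line, column, context).

    ExpatError.lineno is 1-based and ExpatError.offset is a 0-based column,
    so slicing the line at `offset` shows where the tokenizer gave up.
    """
    try:
        minidom.parseString(text)
    except ExpatError as err:
        lines = text.splitlines()
        # Guard against errors reported past the last line (e.g. at end of input).
        line = lines[err.lineno - 1] if err.lineno <= len(lines) else ""
        return err.lineno, err.offset, line[err.offset:err.offset + 10]
    return None

# Illustrative input: an upper-case "<?XML" declaration is not well-formed.
bad = '<?XML version="1.0"?>\n<doc>hello</doc>'
print(describe_parse_error(bad))
print(describe_parse_error(bad.replace("<?XML", "<?xml")))  # None
```

Because the column is 0-based, a reported "column 5" on line 1 is the sixth character of the file, which helps when the file looks fine at a glance.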


__________________________________
Do you Yahoo!?
Yahoo! Search - Find what you’re looking for faster
http://search.yahoo.com


[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Wednesday, March 10, 2004 15:55

Zoom.Quiet zoomq at infopro.cn
Wed Mar 10 15:55:17 HKT 2004

Hello Anthony,

""
is unnecessary !!!


-- 
Best regards,
 Zoom.Quiet                            

 /=======================================\
]Time is unimportant, only life important![
 \=======================================/




Wednesday, March 10, 2004 15:57

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 15:57:50 HKT 2004

OK, then let me give it another try.


Wednesday, March 10, 2004 15:59

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 15:59:47 HKT 2004

Man, I removed that line, but the problem remains. 
Watch this:

ExpatError: not well-formed (invalid token): line 1, column 5



Wednesday, March 10, 2004 16:02

Jacob Fan jacob at exoweb.net
Wed Mar 10 16:02:04 HKT 2004

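I suggest you look at the source code. Whenever I run into a problem like this, I go to the code first and see how a given result is produced.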

-------
Explicit is better than implicit ... 
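Following this advice does not require hunting through the install directory by hand: the standard `inspect` module can report where a module lives and print the source of the method named in the traceback (`parseFile`). A minimal sketch for Python 3; line numbers in a modern `expatbuilder.py` will differ from the Python 2.3 traceback in this thread.

```python
import inspect
from xml.dom import expatbuilder

# Where the module that appears in the traceback lives on this machine.
print(inspect.getsourcefile(expatbuilder))

# The method from the last traceback frame ("in parseFile"); it hands the
# buffered file contents to the underlying expat parser via parser.Parse.
print(inspect.getsource(expatbuilder.ExpatBuilder.parseFile))
```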


Wednesday, March 10, 2004 16:08

Zoom.Quiet zoomq at infopro.cn
Wed Mar 10 16:08:25 HKT 2004

Hello Anthony,

It may be caused by ghost (invisible) characters!

Try regenerating the XML document with an XML editor such as xmlSpy,
and let the editor confirm first whether it is well-formed!
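Ghost characters can also be hunted down without a special editor by scanning the raw bytes of the file. The sketch below is for Python 3, and `find_ghost_characters` is a hypothetical helper name. It flags control bytes that XML 1.0 forbids and every non-ASCII byte, since the latter are only legal when they match the document's encoding.

```python
def find_ghost_characters(raw):
    """Return a list of (line, column, byte_value) for suspicious bytes in raw.

    Flags C0 control bytes other than tab, LF and CR (XML 1.0 forbids them)
    and every byte >= 0x80 (legal only if it belongs to the document's
    encoding; a UTF-8 BOM at the very start is reported too, although expat
    accepts it). Lines are 1-based; columns are 0-based byte offsets, which
    match expat's columns only on pure-ASCII lines.
    """
    hits = []
    line, col = 1, 0
    for byte in raw:                      # iterating bytes yields ints
        if byte == 0x0A:                  # LF: move to the next line
            line, col = line + 1, 0
            continue
        if (byte < 0x20 and byte not in (0x09, 0x0D)) or byte >= 0x80:
            hits.append((line, col, byte))
        col += 1
    return hits
```

To check a file, read it in binary mode (for example `open("samplexml.xml", "rb").read()`) and pass the bytes in; an empty list means no suspicious bytes were found.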




Wednesday, March 10, 2004 16:11

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 16:11:25 HKT 2004

Are you suggesting that I take a look at
expatbuilder.py?


Wednesday, March 10, 2004 16:18

Jacob Fan jacob at exoweb.net
Wed Mar 10 16:18:03 HKT 2004

Have you looked at the traceback? If this were your own script, how would you
debug it? ;)
First, look at line 207 of expatbuilder.py:
>   File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
>     parser.Parse(buffer, 0)
> ExpatError: not well-formed (invalid token): line 1, column 5
The ExpatError is raised by parser.Parse. We could add a print statement just
above parser.Parse(buffer, 0) to see which parser is actually used, and then
look into that parser to see where it raises an ExpatError with the message
"not well-formed (invalid token)". But before that, maybe we should just do
what Zoom.Quiet said and check whether there is a ghost character in the file.
If you have an editor such as UltraEdit, you can use it to spot strange
characters in the file. Or simply test with a file that is known to be good.


Wednesday, March 10, 2004 16:27

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 16:27:18 HKT 2004

I really don't know what is going wrong. I tested the
same code and sample XML file on a Mandrake system,
and I still get the same error message: not well-formed.

Oh my gosh, I am really fed up with it.



Wednesday, March 10, 2004 16:43

Zoom.Quiet zoomq at infopro.cn
Wed Mar 10 16:43:40 HKT 2004

Hello Anthony,

"Mandrake"??
"C:\Python23\"??

WHICH SYSTEM ARE YOU RUNNING PYTHON ON??

So, first use Python itself to test whether the file is well-formed!
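"""
# Report whether each file named on the command line is well-formed XML,
# using a SAX parser with a do-nothing ContentHandler.
from xml.sax.handler import ContentHandler
from xml.sax import make_parser
from glob import glob
import sys

def parsefile(file):
    parser = make_parser()
    parser.setContentHandler(ContentHandler())
    parser.parse(file)   # raises SAXParseException if not well-formed

for arg in sys.argv[1:]:
    for filename in glob(arg):   # expand wildcards such as *.xml
        try:
            parsefile(filename)
            print "%s is well-formed" % filename
        except Exception, e:
            print "%s is NOT well-formed! %s" % (filename, e)
"""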
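The script above uses Python 2 syntax, matching the Python 2.3 setup in this thread. A rough Python 3 equivalent is sketched below; it also reports where the first error is, using the position methods of `SAXParseException`. `check_file` is a hypothetical helper name.

```python
import sys
from glob import glob
from xml.sax import parse, SAXParseException
from xml.sax.handler import ContentHandler

def check_file(filename):
    """Return None if filename is well-formed XML, else a short error description."""
    try:
        # A do-nothing handler: only well-formedness is checked.
        parse(filename, ContentHandler())
    except SAXParseException as err:
        return "line %d, column %d: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())
    return None

if __name__ == "__main__":
    for pattern in sys.argv[1:]:
        for filename in glob(pattern):   # expand wildcards such as *.xml
            problem = check_file(filename)
            if problem is None:
                print("%s is well-formed" % filename)
            else:
                print("%s is NOT well-formed! %s" % (filename, problem))
```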

And why not try expat directly as the parser?
minidom is poor and slow...
"""
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

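p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", 1)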

"""


Wednesday, March 10, 2004 17:05

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 17:05:45 HKT 2004

I tested it on both Mandrake and Win2K; it worked on
neither of them.

--- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> Hello Anthony,
> 
> "Mandrake"??
> "C:\Python23\"??
> 
> WHAT SYSTEM  U RUNNING PYTHON??
> 
> so so at frist use Py test weel-format self!
> """
> from xml.sax.handler import ContentHandler
> from xml.sax import make_parser
> from glob import glob
> import sys
> 
> def parsefile(file):
>     parser = make_parser(  )
>     parser.setContentHandler(ContentHandler(  ))
>     parser.parse(file)
> 
> for arg in sys.argv[1:]:
>     for filename in glob(arg):
>         try:
>             parsefile(filename)
>             print "%s is well-formed" % filename
>         except Exception, e:
>             print "%s is NOT well-formed! %s" %
> (filename, e)
> """
> 
> and try expat to parsers ??
> minidom is poor and slow...
> """
> import xml.parsers.expat
> 
> # 3 handler functions
> def start_element(name, attrs):
>     print 'Start element:', name, attrs
> def end_element(name):
>     print 'End element:', name
> def char_data(data):
>     print 'Character data:', repr(data)
> 
> p = xml.parsers.expat.ParserCreate()
> 
> p.StartElementHandler = start_element
> p.EndElementHandler = end_element
> p.CharacterDataHandler = char_data
> 
> p.Parse("""
> Text goes
> here
> More text
> """)
> 
> """
> 
> === [ 16:27 ; 04-03-10 ] you wrote:
> 
> AL> I really don't know what happened to the code. I
> AL> tested that code and the sample xml file on the
> AL> Mandrake system, and I still get the same error
> AL> message: not well-formed.
> 
> AL> O, my gosh, I am really fed up with it.
> 
> 
> AL> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> Please look at the traceback? If this is your
> >> script, how do you debug it? ;)
> >> First look at here expatbuilder.py line 207:
> >> > > "C:\Python23\lib\xml\dom\expatbuilder.py",
> line
> >> > > AL> 207, in parseFile
> >> > > AL>     parser.Parse(buffer, 0)
> >> > > AL> ExpatError: not well-formed (invalid
> token):
> >> > > line 1,
> >> > > AL> column 5
> >> The ExpatError is thrown by parser.Parse
> >> We could add a print statement above
> >> parser.Parse(buffer,0) to see which parser does
> it
> >> actually use. Then look into that parser to see
> >> where it throws a ExpatError with the message
> "not
> >> well-formed(invalid token)". But before that,
> maybe
> >> we can just, as Zoom.Quiet said, check if there
> are
> >> a ghost character. If you have something such as
> >> UltraEdit, you may use it to see if there are
> >> strange characters in the file. Or just use a
> known
> >> good file to check.
> >> 
> >> -------
> >> Explicit is better than implicit ... 
> >> 
> >> -----Original Message-----
> >> From: Anthony Liu
> [mailto:antonyliu2002 at yahoo.com] 
> >> Sent: 2004年3月10日 16:11
> >> To: python-chinese at lists.python.cn
> >> Subject: RE: [python-chinese] strage minidom or
> xml
> >> 
> >> 
> >> You are suggesting me to take a look at
> >> expatbuilder.py?
> >> 
> >> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> >
> >>
> AL>
>
我建议你到源码里面看看。我每次遇到这种问题就先去看代码,看看某个结果是怎么出来的。
> >> > 
> >> > -------
> >> > Explicit is better than implicit ...
> >> > 
> >> > -----Original Message-----
> >> > From: Anthony Liu
> [mailto:antonyliu2002 at yahoo.com]
> >> > Sent: 2004年3月10日 16:00
> >> > To: pycn
> >> > Subject: Re: [python-chinese] strage minidom or
> >> xml
> >> > 
> >> > 
> >> > Man, I removed that line, but the problem
> remains.
> >> > Watch this:
> >> > 
> >> > ExpatError: not well-formed (invalid token):
> line
> >> 1,
> >> > column 5
> >> > 
> >> > 
=== message truncated ===




[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]

Thursday, March 11, 2004 00:47

Anthony Liu antonyliu2002 at yahoo.com
Thu Mar 11 00:47:27 HKT 2004

The parse succeeds if I lower-case the "xml" in the
XML declaration and also remove the ampersand (&)
before "Description".
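Both fixes agree with the XML rules: the declaration must be spelled `<?xml` in lower case, and an entity reference such as `&Description;` must be declared (for example in an internal DTD subset) unless it is one of the five predefined entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`). A minimal Python 3 sketch with inline sample documents; the helper `parses` and the entity text "a short description" are made up for illustration:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses(data):
    """Return True if minidom accepts data, False on an ExpatError."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError as err:
        print("ExpatError:", err)
        return False


# Upper-case "XML" is not a valid XML declaration.
upper_decl = b'<?XML version="1.0"?>\n<doc>text</doc>'
# Lower-case declaration, but the entity is never declared.
undeclared = b'<?xml version="1.0"?>\n<doc>&Description;</doc>'
# Same reference, with the entity declared in an internal DTD subset.
declared = (b'<?xml version="1.0"?>\n'
            b'<!DOCTYPE doc [<!ENTITY Description "a short description">]>\n'
            b'<doc>&Description;</doc>')

print(parses(upper_decl))  # False
print(parses(undeclared))  # False
print(parses(declared))    # True
```

Declaring the entity keeps the reference in the document instead of deleting it.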

But if I insert some Chinese characters into the XML
document, the same sample Python code cannot parse it.
The parser fails as soon as it reaches the first
Chinese character.

Python complains:

ExpatError: not well-formed (invalid token): line 3,
column 7

where line 3, column 7 points to the first byte of the
first Chinese character in the XML document.

How can I correctly parse an XML document containing
Chinese using Python?

Please give me a hint.
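One plausible cause, assuming the file was saved in GB2312/GBK (common for Chinese Windows editors) without an encoding declaration: expat then reads it as UTF-8, and a GBK byte pair such as 0xD6 0xD0 (中) is not valid UTF-8, so parsing stops with "invalid token" at the first Chinese character. Expat's built-in encodings are UTF-8, UTF-16, ISO-8859-1 and US-ASCII, so declaring encoding="gb2312" may not be enough with every Python version. A more robust approach is to decode with Python's own gbk codec and hand expat UTF-8. A minimal Python 3 sketch with a made-up sample document:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: a document saved as GBK with no encoding declaration.
text = '<?xml version="1.0"?>\n<doc>\n<title>中文标题</title>\n</doc>\n'
gbk_bytes = text.encode("gbk")

# 1. Parsing the raw GBK bytes fails, because expat assumes UTF-8.
try:
    minidom.parseString(gbk_bytes)
    raw_ok = True
except ExpatError as err:
    print("raw GBK bytes:", err)
    raw_ok = False

# 2. Decode with Python's gbk codec, re-encode as UTF-8, then parse.
#    (A declaration such as encoding="gb2312" would have to be changed to
#    utf-8 first; this sample has none.)
utf8_bytes = gbk_bytes.decode("gbk").encode("utf-8")
dom = minidom.parseString(utf8_bytes)
title = dom.getElementsByTagName("title")[0].firstChild.data
print(title == "中文标题")  # True
```

Saving the file as UTF-8 in the editor avoids the conversion step entirely.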

--- "Zoom.Quiet" <zoomq at infopro.cn> wrote:
> Hello Anthony,
> 
> "Mandrake"??
> "C:\Python23\"??
> 
> WHAT SYSTEM ARE YOU RUNNING PYTHON ON??
> 
> So, first use Python itself to test whether the file is well-formed!
> """
> from xml.sax.handler import ContentHandler
> from xml.sax import make_parser
> from glob import glob
> import sys
> 
> def parsefile(file):
>     parser = make_parser()
>     parser.setContentHandler(ContentHandler())
>     parser.parse(file)
> 
> for arg in sys.argv[1:]:
>     for filename in glob(arg):
>         try:
>             parsefile(filename)
>             print "%s is well-formed" % filename
>         except Exception, e:
>             print "%s is NOT well-formed! %s" % (filename, e)
> """
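The checker above is Python 2 code (`print` statements, `except Exception, e`). A rough Python 3 equivalent might look like the sketch below; `check_well_formed` and `main` are hypothetical helper names, and instead of printing only the exception it reports the line and column from `xml.sax.SAXParseException`.

```python
import io
import sys
import xml.sax
from glob import glob
from xml.sax.handler import ContentHandler


def check_well_formed(source):
    """Parse source (a file name or binary file object) with a do-nothing
    handler. Return None if it is well-formed, else (line, column, message)."""
    try:
        xml.sax.parse(source, ContentHandler())
    except xml.sax.SAXParseException as err:
        return err.getLineNumber(), err.getColumnNumber(), err.getMessage()
    return None


def main(patterns):
    """Check every file matching the given glob patterns."""
    for pattern in patterns:
        for filename in glob(pattern):
            problem = check_well_formed(filename)
            if problem is None:
                print("%s is well-formed" % filename)
            else:
                print("%s is NOT well-formed! line %d, column %d: %s"
                      % ((filename,) + problem))


if __name__ == "__main__":
    if sys.argv[1:]:
        main(sys.argv[1:])
    else:
        # No arguments: demonstrate on two in-memory documents.
        print(check_well_formed(io.BytesIO(b"<doc>ok</doc>")))
        print(check_well_formed(io.BytesIO(b"<doc>\n&Description;</doc>")))
```

Run it as, for example, `python checkxml.py *.xml` (the script name is hypothetical).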
> 
> And why not try the expat parser directly??
> minidom is poor and slow...
> """
> import xml.parsers.expat
> 
> # 3 handler functions
> def start_element(name, attrs):
>     print 'Start element:', name, attrs
> def end_element(name):
>     print 'End element:', name
> def char_data(data):
>     print 'Character data:', repr(data)
> 
> p = xml.parsers.expat.ParserCreate()
> 
> p.StartElementHandler = start_element
> p.EndElementHandler = end_element
> p.CharacterDataHandler = char_data
> 
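> p.Parse("""<?xml version="1.0"?>
> <parent id="top"><child1 name="paul">Text goes here</child1>
> <child2 name="fred">More text</child2>
> </parent>""")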
> 
> """
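A Python 3 variant of the same three-handler expat pattern, collecting events in a list instead of printing them. The helper name `collect_events` and the sample document are illustrative. Setting `buffer_text = True` asks pyexpat to deliver contiguous character data in one callback rather than possibly several.

```python
import xml.parsers.expat


def collect_events(data):
    """Parse data with expat and return the callbacks it fired, in order."""
    events = []

    def start_element(name, attrs):
        events.append(("start", name, attrs))

    def end_element(name):
        events.append(("end", name))

    def char_data(text):
        events.append(("text", text))

    p = xml.parsers.expat.ParserCreate()
    p.buffer_text = True  # merge adjacent character data into one callback
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(data, True)  # True: this is the final chunk of the document
    return events


sample = b'<parent id="top"><child1 name="paul">Text goes here</child1></parent>'
for event in collect_events(sample):
    print(event)
```

Because expat streams events instead of building a tree, this style uses less memory than minidom on large files.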
> 
> === [ 16:27 ; 04-03-10 ] you wrote:
> 
> AL> I really don't know what happened to the code. I
> AL> tested that code and the sample xml file on the
> AL> Mandrake system, and I still get the same error
> AL> message: not well-formed.
> 
> AL> Oh my gosh, I am really fed up with it.
> 
> 
> AL> --- Jacob Fan <jacob at exoweb.net> wrote:
> >> Please look at the traceback. If this were your own
> >> script, how would you debug it? ;)
> >> First look here, at expatbuilder.py line 207:
> >> > > "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
> >> > > AL>     parser.Parse(buffer, 0)
> >> > > AL> ExpatError: not well-formed (invalid token): line 1, column 5
> >> The ExpatError is raised by parser.Parse. We could add
> >> a print statement above parser.Parse(buffer, 0) to see
> >> which parser it actually uses, then look into that
> >> parser to see where it raises an ExpatError with the
> >> message "not well-formed (invalid token)". But before
> >> that, maybe we can just, as Zoom.Quiet said, check
> >> whether there is a ghost character. If you have an
> >> editor such as UltraEdit, you can use it to see whether
> >> there are strange characters in the file. Or just test
> >> with a known good file.
=== message truncated ===




[Imported from the Mailman archive: http://www.zeuux.org/pipermail/zeuux-python]
