[auscope-geosciml] Improper encoding in UML notes fields impacts Fullmoon-generated HTML documentation

Létourneau, François Francois.Letourneau at RNCan-NRCan.gc.ca
Thu Sep 16 10:41:39 EDT 2010

I am posting this on both lists, as it can benefit both communities (GeoSciML modellers and Fullmoon users).
This week, I worked on generating the documentation for GeoSciML 2.1.1 using Fullmoon. I used the UML model, which had already been corrected, and the Fullmoon tests detected no errors. I was then able to generate the schemas, export them and generate the documentation without difficulty.
While doing quality control on the generated HTML documentation files, I noticed that although all classes of the model were generated correctly, some of them were missing the XSD snippets that appear under "GML-conformant XML Implementation Details" in the HTML documentation. All the classes from three distinct UML leaves under GeoSciML were missing this information. I noted, however, that the schemas for these leaves had been created by the execute-enc command. The full set of schemas was validated using XMLSpy (with some character encoding issues).
After investigation, I found that for the creation of the documentation, the schemas are copied into a distinct structure in the eXist database, under /db/auto-doc/xmi11ea/schemas. I opened the logs from Fullmoon and eXist, and the only problem identified in the logs is the following (from xmldb.log):
2010-09-15 14:15:39,084 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document borehole.xsd 
2010-09-15 14:15:39,100 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document collection.xsd 
2010-09-15 14:15:39,115 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document earthMaterial.xsd 
2010-09-15 14:15:39,131 [http-8080-5] ERROR (LocalCollection.java [storeXMLResource]:757) - org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
2010-09-15 14:15:39,131 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document fossil.xsd 
2010-09-15 14:15:39,162 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geologicAge.xsd 
2010-09-15 14:15:39,178 [http-8080-5] ERROR (LocalCollection.java [storeXMLResource]:757) - org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
2010-09-15 14:15:39,178 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geologicFeature.xsd 
2010-09-15 14:15:39,209 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geologicRelation.xsd 
2010-09-15 14:15:39,240 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geologicStructure.xsd 
2010-09-15 14:15:39,272 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geologicUnit.xsd 
2010-09-15 14:15:39,303 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document geosciml.xsd 
2010-09-15 14:15:39,334 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document value.xsd 
2010-09-15 14:15:39,350 [http-8080-5] ERROR (LocalCollection.java [storeXMLResource]:757) - org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
2010-09-15 14:15:39,350 [http-8080-5] DEBUG (LocalCollection.java [storeResource]:647) - storing document vocabulary.xsd 
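To read an excerpt like this more quickly, here is a minimal Python sketch that lists the documents whose "storing document" line is immediately followed by a MalformedByteSequenceException. It rests on an assumption the log itself does not confirm: that each ERROR line refers to the document named on the line just before it. The function name failed_documents is hypothetical and is not part of Fullmoon or eXist.

```python
import re

# Matches the DEBUG lines of the form "... - storing document borehole.xsd"
STORING = re.compile(r"storing document (\S+)")


def failed_documents(log_lines):
    """Return documents whose 'storing document' line is followed by an encoding ERROR.

    Assumption: an ERROR line belongs to the most recent 'storing document' line.
    """
    failed = []
    last_doc = None
    for line in log_lines:
        match = STORING.search(line)
        if match:
            last_doc = match.group(1)
        elif " ERROR " in line and "MalformedByteSequenceException" in line and last_doc:
            failed.append(last_doc)
            last_doc = None  # do not attribute a second ERROR to the same document
    return failed


if __name__ == "__main__":
    import sys

    # Usage: python failed_documents.py xmldb.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name in failed_documents(handle):
            print(name)
```

Under that assumption, the excerpt above would point to earthMaterial.xsd, geologicAge.xsd and value.xsd.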
I reopened the three xsd files in XMLSpy, and it displayed errors regarding the encoding of the files: some characters were not encoded in UTF-8. I located the offending characters in the xsd files. They were all in the notes elements of either classes or attributes, probably text that was copied from a Word document or a website and contained an unsupported character. I applied the correction to the UML model and reloaded the XMI into Fullmoon. The xsd files were now produced without any character encoding errors, but the three xsd files were still missing from the eXist database and the same errors were displayed in the log (Invalid byte 1 of 1-byte UTF-8 sequence). I had to comment out an instruction in one of the XQuery files that Fullmoon uses, so that I could manually place the corrected xsd files into the eXist database and then run the execute-doc command. The final output was correct: all the classes had their xsd snippet.
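For anyone who wants to locate such characters without XMLSpy, the following is a minimal Python sketch that reports where a file stops being valid UTF-8. It is an illustration only, not something Fullmoon provides; the function name find_invalid_utf8 is hypothetical.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (line_number, byte_offset, offending_byte) tuples.

    Each tuple marks a byte at which UTF-8 decoding fails. Line numbers are
    1-based and counted from newline bytes; offsets are 0-based.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start
            line_no = data.count(b"\n", 0, bad) + 1
            problems.append((line_no, bad, data[bad]))
            pos = bad + 1  # resume scanning just after the offending byte
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python find_invalid_utf8.py earthMaterial.xsd
    with open(sys.argv[1], "rb") as handle:
        for line_no, offset, byte in find_invalid_utf8(handle.read()):
            print(f"line {line_no}, offset {offset}: byte 0x{byte:02x}")
```

A byte such as 0x92 (a curly apostrophe in Windows-1252, which is what Word typically produces) is a common culprit, although the original files may have contained something else.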
Conclusion: be aware that the notes you include in the UML model can have an impact when generating the model documentation. Good practice is to make sure any pasted text is properly encoded in UTF-8 before it goes into the notes.
Question: would it be possible to add a converter to Fullmoon so that incorrect character encoding is not an issue when generating the documentation?
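To make the question concrete, here is a rough Python sketch of what such a converter could do. It is not Fullmoon code and says nothing about how Fullmoon is implemented. It assumes that the stray bytes come from Windows-1252 text (plausible for content pasted from Word, but only a guess), and the names cp1252_fallback and repair_to_utf8 are hypothetical.

```python
import codecs


def _cp1252_fallback(err):
    """Error handler: reinterpret bytes that are not valid UTF-8 as Windows-1252."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Bytes undefined in Windows-1252 become U+FFFD rather than raising.
    return bad.decode("cp1252", errors="replace"), err.end


# Register the handler under a custom name so bytes.decode() can use it.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as valid UTF-8.

    Valid UTF-8 sequences pass through unchanged; only the offending bytes
    are reinterpreted (assumption: they are Windows-1252).
    """
    return raw.decode("utf-8", errors="cp1252_fallback").encode("utf-8")
```

Handling only the offending bytes, rather than re-decoding the whole file, keeps characters that were already correct UTF-8 intact.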
François Létourneau
Research professional - geomatics / IT
Institut national de la recherche scientifique
Centre Eau Terre Environnement - Centre Géoscientifique de Québec
490, de la Couronne, bureau 3344
Québec (Québec) G1K 9A9 

