How to extract text from SVG file with XSLT

Post by Pander » Sun Sep 30, 2012 1:45 am

I want to extract all text from an SVG file with XSLT for proofreading and for spelling and grammar checking. The page http://www.w3.org/2002/05/svg2stuff.html has examples, of which the second does not work for me but the third does. I have optimised it to:

Code:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- No templates are defined, so the built-in template rules apply and emit every text node. -->
    <xsl:output method="text" indent="no" encoding="utf-8"/>
</xsl:stylesheet>


which can be run with:

Code:

sudo apt-get install libsaxon-java
java -jar /usr/share/java/saxon.jar -o text.txt drawing.svg svg2txt.xsl


However, I have the following questions:

How can I get a newline in my output after each text element under the same parent node? At the moment they are all concatenated. E.g. <text>Hi</text><text>there.</text> currently results in Hithere.

How can I limit the output to at most two consecutive newlines, i.e. no more than one blank line?

What are the benefits of the second transformation from http://www.w3.org/2002/05/svg2stuff.html?

Post by Pander » Thu Nov 01, 2012 10:08 pm

Anyone?

Post by chriswww » Fri Nov 02, 2012 1:09 pm

Regarding the first part of your question, it's basically up to you. If you have a suitable XSLT pattern that matches every <text> element, just output a line break after each one. XSLT's text output does not insert newlines on its own; that is something you have to add yourself. Hope that makes sense.
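
A minimal sketch of that idea, assuming the drawing uses the standard SVG namespace (http://www.w3.org/2000/svg) and that only <text> elements matter. The svg prefix and the decision to drop <title>, <desc> and metadata text are assumptions of this sketch, not something taken from the linked page. It uses only XSLT 1.0 features, so the version is set to 1.0 here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- Override the built-in rule so stray text nodes (metadata, titles,
       indentation whitespace) are not copied to the output. -->
  <xsl:template match="text()"/>

  <!-- One output line per svg:text element that contains visible characters.
       normalize-space() trims the ends and collapses runs of whitespace. -->
  <xsl:template match="svg:text[normalize-space(.)]">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Since each non-empty <text> element contributes exactly one line and normalize-space() removes line breaks inside it, the output should contain no blank lines at all, which also stays within the one-blank-line limit asked about. It should run with the same Saxon command as in the first post.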

I don't understand the second part of your question. I will just say that an empty stylesheet has some useful BUT limited built-in behaviour. When you need more than that, you need to code it. Those are very old posts by the way, so you need to check against the Inkscape SVG file itself and the specs of your XSLT processor to see what actually happens. Usual debugging procedure applies.
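
For reference, the built-in behaviour of a stylesheet without templates corresponds to the rules below, as I read the built-in template rules in the XSLT 1.0 specification. Writing them out is only for illustration; a stylesheet containing just these templates should produce the same text output as one with no templates at all.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- Elements and the root node: produce nothing themselves,
       just keep walking down into the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text nodes and attributes: copy the string value through.
       Attributes are only reached if something selects them explicitly. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: ignored. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

This is why the empty stylesheet outputs every text node in document order, including whitespace used for indentation, and why adjacent <text> elements run together.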

Note that text in Inkscape can also sit inside <tspan> elements within a <text> element.
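
A sketch of how <tspan> children could be handled, assuming each direct <tspan> child of a <text> element should become its own output line and that a <text> element with <tspan> children keeps all of its characters inside them. Characters placed directly in such a <text> element, next to the <tspan> children, would be dropped by this version, so check that against your own file. The svg prefix is again an assumption of the sketch.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- Drop every text node that is not picked up explicitly below. -->
  <xsl:template match="text()"/>

  <!-- svg:text without tspan children: one output line. -->
  <xsl:template match="svg:text[not(svg:tspan)][normalize-space(.)]">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Each direct tspan child of svg:text: one output line.
       Nested tspans are folded into the line of their outer tspan. -->
  <xsl:template match="svg:text/svg:tspan[normalize-space(.)]">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```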

