What's a simple example of removing a node from an XML file using Saxon 12.9 and an XSLT 3.0 stylesheet?

I've got an XML export from Blogger, and I can't work out how to remove just the COMMENT entries while retaining the POST entries.

Below is the input.xml file, which contains COMMENT entries I want to remove, but also POST entries that I want to retain in output.xml:

input.xml:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-3947073770</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-23573</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Name Name</name>
      <blogger:type>ANONYMOUS</blogger:type>
    </author>
    <content type='html'>A comment.....</content>
    <blogger:created>2024-06-10T10:32:13.389Z</blogger:created>
    <published>2024-06-10T10:32:13.389Z</published>
    <updated>2024-06-10T10:32:13.389Z</updated>
    <blogger:trashed/>
  </entry>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-4539665487</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-8659501057</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author 2</name>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <content type='html'>My comment</content>
    <blogger:created>2009-11-30T20:09:49.055Z</blogger:created>
    <published>2009-11-30T20:09:49.055Z</published>
    <updated>2009-11-30T20:09:49.055Z</updated>
    <blogger:trashed/>
  </entry>
</feed>

Desired output.xml:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>
</feed>

Below is the skeleton of a stylesheet.xsl, borrowed from Martin Honnen's answer to an earlier question of mine, "Use xmlstarlet and XPath to find/replace HTML entities in an XML node":

sample stylesheet.xsl:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- how to remove only COMMENT nodes and leave POST nodes? -->

</xsl:stylesheet>

How do I designate only the COMMENT nodes to be removed in stylesheet.xsl?

There are two ways you can do this: (1) identity transform + an empty template matching the nodes you want to remove; (2) identity transform + a template matching feed that applies templates to (or copies) only child nodes you want to keep. Commented Oct 16 at 19:27
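For illustration, option (2) from the comment above might look roughly like the sketch below. It is not taken from the thread; it assumes the blogger prefix is bound to the namespace used in input.xml and relies on the XPath except operator to leave the COMMENT entries out of the sequence being processed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    xmlns:blogger="http://schemas.google.com/blogger/2018"
    exclude-result-prefixes="#all">

  <!-- Identity transform: copy every node that no other template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Copy feed, but process only its attributes and the children to keep:
       everything except entry elements whose own blogger:type child is COMMENT -->
  <xsl:template match="feed">
    <xsl:copy>
      <xsl:apply-templates select="@*, node() except entry[blogger:type = 'COMMENT']"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the predicate tests only the blogger:type child of entry itself, the blogger:type inside author (ANONYMOUS, BLOGGER) plays no part in the selection.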

1 Answer

You can use the following stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    xmlns:blogger='http://schemas.google.com/blogger/2018'
    exclude-result-prefixes="#all">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="entry[blogger:type = 'COMMENT']"/>

</xsl:stylesheet>

You might want to add <xsl:output indent="yes"/> and <xsl:strip-space elements="*"/> as children of xsl:stylesheet. Otherwise the shallow-copy identity transform also copies the whitespace text nodes that surrounded the removed entries, leaving blank lines in the output that you probably don't want.
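Put together, a stylesheet with both of those declarations added could look like the sketch below. The comments are illustrative, and the exact indentation of the result depends on the serializer.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    xmlns:blogger="http://schemas.google.com/blogger/2018"
    exclude-result-prefixes="#all">

  <!-- Ask the serializer to indent the result -->
  <xsl:output indent="yes"/>

  <!-- Drop whitespace-only text nodes from the input before processing -->
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node that no other template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Empty template: entries typed COMMENT produce no output -->
  <xsl:template match="entry[blogger:type = 'COMMENT']"/>

</xsl:stylesheet>
```

Note that strip-space with elements="*" applies to every element in the input, so it is only appropriate when whitespace-only text is not significant anywhere in the feed.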

Example online fiddle.


1 Comment

Thanks much! That works great :) I'm slowly learning.
