On another forum a user asked how to "Split XML and output to different files" and did not get an answer, just several confusing responses about versions of XSLT. This is how to do it in the free firstobject XML editor:
    split_XML()
    {
      CMarkup input;
      input.Load( "input.xml" );
      while ( input.FindElem("//npc") )
        WriteTextFile( "npc"+input.GetAttrib("id")+".xml", input.GetSubDoc() );
    }
This script will read input.xml and create an npc[ID].xml file for each npc subdocument. This was the question:
Starting with "input.xml":

    <root>
    <npc id="1">
    <p pid="1"/>
    <p pid="2"/>
    <p pid="3"/>
    </npc>
    <npc id="2">
    <p pid="3"/>
    <p pid="4"/>
    <p pid="5"/>
    </npc>
    <npc id="3">
    <p pid="4"/>
    <p pid="5"/>
    <p pid="6"/>
    </npc>
    </root>

Desired result:

"npc1.xml":

    <npc id="1">
    <p pid="1"/>
    <p pid="2"/>
    <p pid="3"/>
    </npc>

"npc2.xml":

    <npc id="2">
    <p pid="3"/>
    <p pid="4"/>
    <p pid="5"/>
    </npc>

"npc3.xml":

    <npc id="3">
    <p pid="4"/>
    <p pid="5"/>
    <p pid="6"/>
    </npc>

Avoid XSLT for this because it makes the task much trickier.
For an 862MB file you should use the Open method instead of Load to read the input file. Then you will use very little memory. Note that unless you have an exceptional amount of free memory, you cannot view files that large in the firstobject XML editor.
See examples for huge files in Split XML file into smaller pieces and How to generate file names with XML splitter script.
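
As a rough sketch of that advice, the npc splitter from the top of this page might be rewritten with Open in read mode as shown here. The function name split_XML_big and the counter are made up for illustration. Open with MDF_READFILE and GetError are used the same way as elsewhere on this page. The sketch assumes that GetAttrib and GetSubDoc behave in read mode as they do after Load, and that Close is available to release the file, so check both against your foxe release before relying on it.

```
split_XML_big()
{
  CMarkup input;
  // Open in read mode instead of Load, so the whole file is not held in memory
  if ( ! input.Open("input.xml", MDF_READFILE) )
    return input.GetError();
  int nCount = 0;
  while ( input.FindElem("//npc") )
  {
    // Write each npc subdocument to its own file, e.g. npc1.xml
    WriteTextFile( "npc"+input.GetAttrib("id")+".xml", input.GetSubDoc() );
    ++nCount;
  }
  input.Close(); // assumed: releases the input file when done
  return nCount;
}
```

Returning the count or the error text, rather than nothing, makes it easier to see from the editor whether the input file was actually found.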
Open not getting anything
Angela Baines 15-Jan-2010
With the editor and foal script using Open it doesn't seem to get anything; when stepping through in debug it shows xmloutput (cmarkup) (0) and xmlinput (cmarkup) (0).
It is not finding the input file. In the path strings use double backslashes in quotes:
xmlInput.Open("C:\\XML files\\testing.xml");
You can also check the result of Open and get an explanation with GetResult or GetError:
    t()
    {
      CMarkup m;
      if ( ! m.Open("c:\\does_not_exist.txt",MDF_READFILE) )
        return m.GetError();
      return m.GetDoc();
    }
The output is:
The system cannot find the file specified.
add static data to each XML piece
Jeff Taylor 10-Mar-2011
Is there any way I can add some static data (like a date or something) to each XML piece?
Say you have a big file with a list of companies and you want to split it into files called Company1.xml, Company2.xml etc. Before writing out each company file you want to add the date to it. First, assign the company subdocument to its own CMarkup object called company and set the attribute in its top element.
    CMarkup company = input.GetSubDoc();
    company.FindElem(); // Company
    company.SetAttrib( "timestamp", sDate );
Here is the script written in such a way that you can run it from the DOS command line specifying the date you want to put into the individual output files.
    split_and_set_date(str sDate)
    {
      int nCompanyCount = 0;
      CMarkup input;
      if ( ! input.Load("C:\\Companies.xml") )
        return input.GetError();
      while ( input.FindElem("//Company") )
      {
        CMarkup company = input.GetSubDoc();
        company.FindElem(); // Company
        company.SetAttrib( "timestamp", sDate );
        ++nCompanyCount;
        if ( ! company.Save("C:\\Company"+nCompanyCount+".xml") )
          return company.GetError();
      }
      return nCompanyCount;
    }
If this script was in C:\split.foal you could run it from the DOS command line as follows (see Using the firstobject XML editor from the command line and make sure you have foxe release 2.4.2).
"C:\Program Files\firstobject\foxe.exe" -run C:\split.foal 20110310T121500
<Companies>
<Company id="56A" zone="A">
...
</Company>
<Company id="62B" zone="B">
...
</Company>
</Companies>
You could do all sorts of things with the company subdocument before writing it to file, even extract a value used to name the output file. Also, see the top of the script where sDate is passed into the function; you could pass in the input path and output base to which N.xml will be appended, and even some criteria (like a zone) to control which companies get output. The following does all this, and uses a company ID attribute to name the output file.
    split_and_set_date(str sDate, str sZone, str sInPath, str sOutPath)
    {
      int nCompanyCount = 0, nSelectionCount = 0;
      CMarkup input;
      if ( ! input.Load(sInPath) )
        return input.GetError();
      while ( input.FindElem("//Company") )
      {
        ++nCompanyCount;
        CMarkup company = input.GetSubDoc();
        company.FindElem();
        if ( company.GetAttrib("zone") == sZone ) // e.g. zone "A"
        {
          ++nSelectionCount;
          company.SetAttrib( "timestamp", sDate );
          str sID = company.GetAttrib( "id" ); // e.g. "56A"
          if ( ! company.Save(sOutPath+sID+".xml") ) // e.g. C:\Company56A.xml
            return company.GetError();
        }
      }
      return "selected " + nSelectionCount + "/" + nCompanyCount;
    }
This could be called like this:
foxe.exe -run C:\split.foal 20110310T121500 A C:\Companies.xml C:\Company
See also:
Split and Merge Translation XML
Using the firstobject XML editor from the command line
Split XML file into smaller pieces
Video of XML splitter script for splitting XML files
Angela Baines 15-Jan-2010
Hi, I'm trying to use the XML splitter script but I'm getting an out-of-memory exception when I try to [load] the XML file, and the script just exits on the [next] line within the free editor. The file is 862 MB, but it does state that the same method can be used for gigabyte-size files.