I have a folder full of files which are named with four digits and a file extension, e.g. 0312.file, and an XML file describing the contents of these files. I am trying to write a shell script that will create folders and rename these files according to the XML, but I don’t know how.
This is a template of what the xml looks like:
<section label="AA. category">
<children label="topic" source="AABB.file" />
</section>
The script should represent the categories as folders containing the related topics, which should be represented as filenames.
The complexity of this shell script depends directly on how well formed your XML data file is. If it’s laid out so that each filename mapping appears on its own line, we can handle it pretty directly. If we need to actually parse the XML, extract value pairs and then process those, it’s going to be more difficult. In fact, at that point I would suggest you move to Perl or another language that has better XML support rather than brute-force it with a Linux shell script.
Still, you sent along some sample data and source files and I have a pretty good sense of what you’re trying to accomplish, so let’s give it a shot!
The basic idea of what we’re going to do is to step through the XML file looking for lines that contain a filename mapping, extract the source and target values, then apply them with a rename operation.
Here’s the data file we’ll be working with:
<children label="Welcome" source="0101.mp4" />
<children label="How To do Stuff" source="0102.mp4" />
<children label="About some other things" source="0103.mp4" />
</section>
<section label="02. Additional information">
<children label="Important information" source="0201.mp4" />
<children label="More important information" source="0202.mp4" />
<children label="additional important information" source="0203.mp4" />
</section>
Looks complicated, but let’s tackle it by ignoring the section labels and instead looking only at the “children” lines. Each of those has a source and a label, the latter being the target filename.
For example, the first line has label="Welcome" and source="0101.mp4". Using this as a base, we’d want to rename 0101.mp4 to Welcome.mp4. At least, that’s what this script will do!
Step one is to extract just the children lines in the XML file, something I’ll do with a slightly unusual shell script structure:
do
…
done
It’s little known that any shell block structure is a redirectable container, so I just slip the “while” loop right into the pipeline after a “grep”, and its input is only the matching lines from the XML data file.
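To make the "loop inside a pipeline" idea concrete, here is a minimal sketch. The function name list_children, the temporary file, and the grep pattern are assumptions for illustration rather than the exact command from the original script; the sample lines are taken from the data file above.

```shell
#!/bin/sh
# list_children XMLFILE (hypothetical helper): print each "children" line.
# grep filters the file, and the while/do/done block as a whole reads
# the pipe, one matching line per pass through the loop.
list_children() {
  grep '<children' "$1" | while read -r inputline
  do
    echo "got: $inputline"
  done
}

# Demo against a temporary stand-in for the real XML data file.
xmlfile="$(mktemp)"
cat > "$xmlfile" <<'EOF'
<section label="02. Additional information">
<children label="Important information" source="0201.mp4" />
<children label="More important information" source="0202.mp4" />
</section>
EOF
list_children "$xmlfile"
```

One thing to keep in mind: in most shells a loop on the receiving end of a pipe runs in a subshell, so variables set inside it are not visible after the "done".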
Since it’s well formed data, we can just use “cut” and mark the double-quote as the delimiter:
source="$(echo "$inputline" | cut -d\" -f4)"
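As a quick check of the field numbers, here is a small sketch that runs “cut” against the first line of the sample data. It assumes the attributes always appear in the order label, then source, as they do in the sample; if the order varies, the field numbers would be wrong.

```shell
#!/bin/sh
# With the double-quote as delimiter, the line splits into:
#   field 1: <children label=
#   field 2: Welcome
#   field 3:  source=
#   field 4: 0101.mp4
inputline='<children label="Welcome" source="0101.mp4" />'
label="$(echo "$inputline" | cut -d\" -f2)"
source="$(echo "$inputline" | cut -d\" -f4)"
echo "label=$label source=$source"
# prints: label=Welcome source=0101.mp4
```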
That’s all the hard work. Now let’s just test to ensure that the source file exists before we issue the actual file renaming instruction. I’ll put it all together here:
do
  label="$(echo "$inputline" | cut -d\" -f2)"
  source="$(echo "$inputline" | cut -d\" -f4)"
  if [ -f "$source" ] ; then
    echo "Renaming $source to $label.mp4"
    mv "$source" "$label.mp4"
  else
    echo "Error: Matched $source but can't find it. Skipped"
  fi
done
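If you want to try the loop without touching your real files, a sketch along these lines might help. The function name rename_children, the file name contents.xml, and the scratch directory are all placeholders I made up for the test; the loop body is the one described above, fed by a grep for the “children” lines.

```shell
#!/bin/sh
# rename_children XMLFILE (hypothetical helper): rename each source file
# listed in XMLFILE to "<label>.mp4". Run it from the directory that
# holds the media files.
rename_children() {
  grep '<children' "$1" | while read -r inputline
  do
    label="$(echo "$inputline" | cut -d\" -f2)"
    source="$(echo "$inputline" | cut -d\" -f4)"
    if [ -f "$source" ] ; then
      echo "Renaming $source to $label.mp4"
      mv "$source" "$label.mp4"
    else
      echo "Error: Matched $source but can't find it. Skipped"
    fi
  done
}

# Scratch directory with an empty stand-in file, so nothing real is renamed.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
cat > contents.xml <<'EOF'
<section label="02. Additional information">
<children label="Important information" source="0201.mp4" />
<children label="More important information" source="0202.mp4" />
</section>
EOF
touch 0201.mp4    # 0202.mp4 is deliberately left missing
rename_children contents.xml
```

With this setup the first entry should be renamed and the second should produce the “Skipped” message, which exercises both branches of the test.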
Now you added an additional complication here because you didn’t just want to look at the “children” lines, but the section labels too, using the latter as subfolder names. That’s trickier, but still solvable: change the “grep” to include both types of lines in the data file, then have a conditional statement testing to see which you have. Use “cut” to extract the category name, then when you’re doing the “mv”, simply tweak it to something like:
if [ ! -d "$category" ] ; then
  echo "Creating directory $category"
  mkdir "$category"
fi
mv "$source" "$category/$label.mp4"
See what I’m doing? Not too much more complicated after all!
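Pulling those pieces together, one possible version of the full script looks like the sketch below. The function name organize, the grep -E pattern, the “case” statement used as the conditional, and the contents.xml file name are my own choices for illustration, and it still assumes one tag per line with label before source.

```shell
#!/bin/sh
# organize XMLFILE (hypothetical helper): move each source file into a
# folder named after its section label, renamed to "<label>.mp4".
organize() {
  category="."   # fallback if a children line appears before any section
  # Match opening section tags and children tags, but not </section>.
  grep -E '<(section|children) ' "$1" | while read -r inputline
  do
    case "$inputline" in
      "<section "*)
        # Remember the current category for the children that follow.
        category="$(echo "$inputline" | cut -d\" -f2)"
        ;;
      "<children "*)
        label="$(echo "$inputline" | cut -d\" -f2)"
        source="$(echo "$inputline" | cut -d\" -f4)"
        if [ ! -f "$source" ] ; then
          echo "Error: Matched $source but can't find it. Skipped"
          continue
        fi
        if [ ! -d "$category" ] ; then
          echo "Creating directory $category"
          mkdir "$category"
        fi
        mv "$source" "$category/$label.mp4"
        ;;
    esac
  done
}

# Scratch directory with empty stand-in files for a safe trial run.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
cat > contents.xml <<'EOF'
<section label="02. Additional information">
<children label="Important information" source="0201.mp4" />
<children label="More important information" source="0202.mp4" />
</section>
EOF
touch 0201.mp4 0202.mp4
organize contents.xml
```

Because “read” strips leading whitespace, indented tags in the XML file still match the two “case” patterns.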