Beginner here. I'm trying to print certain lines from an XML file with awk and/or sed, and I need help.
I have an xml file like this:
Code:
<item id="26141427">
  <properties>
    <name>233D_camB_take02.mov</name>
    <path>/Dailies Released/VT096_DAY41_2011_10_27</path>
    <description>HI CU AARON PREPPING CAMERA</description>
    <status></status>
    <approved />
    <created_by id="20184437">
      <name>Movie</name>
    </created_by>
    <created_timestamp>2011-10-28T21:04:51Z</created_timestamp>
    <modified_by id="17929743">
      <name>Some dude</name>
    </modified_by>
    <modified_timestamp>2011-10-31T14:59:54Z</modified_timestamp>
    <width>1280</width>
    <height>720</height>
    <timebase>24</timebase>
    <mime_type>video/quicktime</mime_type>
  </properties>
  <attributes>
    <attribute key="Camera">B</attribute>
    <attribute key="Description">HI CU AARON PREPPING CAMERA</attribute>
    <attribute key="End">16:40:32:00</attribute>
    <attribute key="Name">233D-2B</attribute>
    <attribute key="Notes"></attribute>
    <attribute key="Scene">233D</attribute>
    <attribute key="Shoot_Date">10/27/2011</attribute>
    <attribute key="Shoot_Day">41</attribute>
    <attribute key="Start">16:37:52:00</attribute>
    <attribute key="Take">2</attribute>
    <attribute key="Tape">VT096</attribute>
  </attributes>
  <tags />
  <notes />
</item>

What I need is to print the lines:
Code:
<item id="26141427">
<name>233D_camB_take02.mov</name>
<attribute key="Name">233D-2B</attribute>

In the end I need this in a document:
Code:
item id="26141427"
233D_camB_take02.mov
233D-2B

Followed by a blank line and then the next item. There are multiple items in the document.
Some things to note: there may be multiple <name> tags, but I only need the one containing the string ".mov". That string will always be present in every item, but only once per item.
However, as can be seen in the example above, there may or may not be other lines like <name>Movie</name> and <name>Some dude</name>. These need to be ignored. So while the other entries I'm looking for can be found by searching for their tags, it is probably better to find that entry by looking for the ".mov" string.
Also, there may or may not be an <attribute key="Name">some value</attribute> entry. If it is there, I need it. All other tags need to be ignored.
Lastly, because each item may or may not have certain entries, this cannot be done with a line-number algorithm; it needs to be done by searching for patterns.
So, in summary:
- <item id="123456"> Will always be present and will only be present once per item. I need the output: item id=123456
- <name>something.mov</name> Will always be present but only once with the ".mov" string. May or may not be present with other strings. I need the output: something.mov. Other instances should be ignored.
- <attribute key="Name">something</attribute> May or may not be present. If it is there, I need the output: something
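The three rules in this summary could be handled in a single awk pass. What follows is only a rough sketch, not a tested solution for the real file: it assumes every tag sits on its own line, it prints the id with its quotes as in the sample output, and `extract_items` is a made-up helper name used purely for illustration.

```shell
# extract_items FILE  (hypothetical helper name, not from the thread)
# Assumes one XML tag per line.
extract_items() {
  awk '
    # Start of an item: keep only the text between "<" and ">".
    /<item id="/ {
      line = $0
      sub(/^[^<]*</, "", line)
      sub(/>.*$/, "", line)
      id = line; mov = ""; name = ""
    }
    # Only the <name> entry whose value ends in .mov is wanted.
    /<name>.*\.mov<\/name>/ {
      line = $0
      sub(/^.*<name>/, "", line)
      sub(/<\/name>.*$/, "", line)
      mov = line
    }
    # Optional Name attribute; stays empty if the line is absent.
    /<attribute key="Name">/ {
      line = $0
      sub(/^.*<attribute key="Name">/, "", line)
      sub(/<\/attribute>.*$/, "", line)
      name = line
    }
    # End of the item: print what was collected, then a blank line.
    /<\/item>/ {
      print id
      print mov
      if (name != "") print name
      print ""
    }
  ' "$1"
}
```

It would be called as `extract_items items.xml > out.txt`, where `items.xml` stands in for the real file name. Collecting the values first and printing them only at `</item>` is what keeps each item's lines together and in a fixed order, whatever order the tags appear in.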
What I have so far is this:
Code:
sed -n '/<item id="/,/>/p' marcherdailiescopy.xml | awk '{sub("<properties>",""); print}' | awk '{sub("<",""); print}' | awk '{sub(">",""); print}'

My first problem is that the sed command returns the item id but also returns the <properties> tag on the next line, followed by the next item, like this:
Code:
<item id="27385774">
<properties>
So I'm using awk to strip out the EXTRA strings and characters there, but I know there is a more efficient way to do this. I also don't know how to get awk or sed to grab the strings I need in ORDER so it places them together. I can get:
item id
item id
item id
...

value.mov
value.mov
value.mov
...

But I need:

item id
value.mov
name (if it is there)

item id
value.mov
name (if it is there)

...
I also don't know whether it would be more efficient to delete everything other than what I need or grab only what I need. Any help would be Kool and the Gang!
Thanks,
Dan

I should also add that each item will follow the format:
Code:
<item id="123456">
a bunch of tags and values
</item>

Try something like this:
Code:
grep -E '\.mov|<item id=|attribute key="Name"' xmlfile
You will get the required lines.
Then use sed to extract whatever data you need.

That worked! Thanks so much.
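For anyone finding this later, the grep-then-sed idea might be sketched roughly as below. This is an illustration under assumptions rather than the exact commands used in the thread: it assumes one tag per line, relies on `grep -E` and `sed -E` (available in GNU and BSD versions), and `filter_and_strip` is a hypothetical function name.

```shell
# filter_and_strip FILE  (hypothetical helper name, not from the thread)
# Assumes one XML tag per line.
filter_and_strip() {
  # 1. grep keeps the item line, the .mov name line, and the Name attribute.
  # 2. sed trims leading whitespace, strips the surrounding tags,
  #    and drops lines left empty (an empty Name attribute).
  # 3. awk puts a blank line before every item except the first.
  grep -E '<item id=|\.mov</name>|<attribute key="Name">' "$1" |
    sed -E \
      -e 's/^[[:space:]]*//' \
      -e 's/^<(item id="[^"]*")>.*$/\1/' \
      -e 's/^<name>(.*)<\/name>.*$/\1/' \
      -e 's/^<attribute key="Name">(.*)<\/attribute>.*$/\1/' \
      -e '/^$/d' |
    awk 'NR > 1 && /^item id=/ { print "" } { print }'
}
```

Running `filter_and_strip marcherdailiescopy.xml` should print each item's lines in the order they appear in the file, so it depends on the name line coming before the Name attribute, as it does in the sample item.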