
A conundrum for a sed wizard - 00_NOP
http://cartesianproduct.wordpress.com/2013/12/09/a-conundrum-for-a-sed-wizard/
======
finnh
Rather than trying to convince sed to match your corrupt line via a regexp,
just use awk to split the file into two pieces: the lines before the corrupt
line (into file A) and the lines after it (into file B).

Then cat the 2 files together, with your fixed line in the middle:

    
    
        cat A > fixed.xml
        echo "my fix" >> fixed.xml
        cat B >> fixed.xml
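
The awk split itself is not shown above, so here is a minimal sketch of one way it could look. The three-line stand-in file and the `bad` variable are made up for illustration; for the real file `bad` would be the corrupt line's number. Running awk under `LC_ALL=C` is my own assumption, added so the corrupt bytes are treated as plain bytes rather than as invalid multibyte characters.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Stand-in for the real input; line 2 plays the role of the corrupt line.
printf '1\ncorrupt\n3\n' > infile.xml

# Hypothetical line number of the corrupt line.
bad=2

# Lines before the corrupt line go to A, lines after it go to B.
# The corrupt line itself is written to neither file.
LC_ALL=C awk -v bad="$bad" 'NR < bad { print > "A" } NR > bad { print > "B" }' infile.xml

# Stitch the pieces back together with the fixed line in the middle.
cat A > fixed.xml
echo "my fix" >> fixed.xml
cat B >> fixed.xml
```

Note that if the corrupt line were the first or last line of the file, awk would never create A or B respectively, and the matching `cat` would fail.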

~~~
alexkus
That was my original suggestion (awaiting moderation):-

    
    
        head -n 35185221 infile.xml > tmp.xml
        echo "<load address=’11c1385b’ size=’08’ />" >> tmp.xml
        tail -n +35185223 infile.xml >> tmp.xml

------
alexkus
Works fine for me:-

    
    
        $ cat infile.xml
        1
        <load address=’11c1�����ze=’08′ />
        3
        $ cat seddy.sed
        2s@^.*$@<load address=’11c1385b’ size=’08’ />@
        p
        $ sed -n -f seddy.sed infile.xml
        1
        <load address=’11c1385b’ size=’08’ />
        3
        $
    

So I'm guessing it's locale related (LC_CTYPE is en_GB.UTF-8 for me), or the
string that forms his corrupt line is different from what was pasted. The
output of:-

    
    
        sed -n '35185222p' infile.xml | hexdump -b
    

would help in that regard.
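
If the locale guess is right, one way to test it is to force the C locale for the sed run, so that every byte counts as a character and `^.*$` can match the corrupt bytes. As far as I know, GNU sed's regular expressions do not match invalid multibyte sequences in a UTF-8 locale, which would explain the failure. A minimal sketch, assuming a three-line stand-in file with two invalid UTF-8 bytes on line 2 (the file contents are invented for illustration, and the replacement uses plain ASCII quotes):

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Stand-in file: line 2 contains the bytes 0x81 0xfe, which are not valid UTF-8.
printf '1\n<load address=\201\376 />\n3\n' > infile.xml

# In the C locale each byte is a character, so ^.*$ covers the whole line.
# The | delimiter avoids writing $@ inside double quotes, which the shell
# would expand as the positional parameters.
LC_ALL=C sed "2s|^.*\$|<load address='11c1385b' size='08' />|" infile.xml > fixed.xml

cat fixed.xml
```

For the real file the address `2` would become `35185222`.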

------
VLM
This just looks painful. Not the problem, the 35 million line XML document.
Which probably will not validate after eating that line.

Nonetheless, something like this should drop the corrupt line, though the
constants may still be a bit off:

    
    
        head -n 35185221 infile.xml > OhGodThisIsHorrific.xml
        tail -n +35185223 infile.xml >> OhGodThisIsHorrific.xml

I'm guessing a 35-million-line file is, like, what, the enterprise Java
version of hello world? (just kidding, sorta)

(Edited to fix the line spacing. Also the double greater than is not a typo)

------
pit
This seems like a great question for StackOverflow.

------
tlow
Are you using Unicode handling?

