HTML не является обычным языком, поэтому сначала имейте в виду, что попытка разобрать его с помощью регулярных выражений – это первоклассный билет в безумие. Тем не менее, похоже, что HTML, который вы пытаетесь пережевывать, должен поддаваться довольно простому извлечению. Данные, которые вы хотите извлечь, кажутся следующими:
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>
Итак, мы, вероятно, можем sed
это:
$ sed -n '/HA7Value.*Address_/{ s/VALUE="/%%%/;s/^.*%%%//; s/".*//; p; }' input.html
D90000000C8A9A1D
D90000000C8A9A1D
C00000000C8C9D1D
C00000000C8C9D1D
2D0000000EE97D1D
2D0000000EE97D1D
Объяснить это:
/HA7Value.*Address_/ # Only run on lines that match this expression
{ # Begin code block
s/VALUE="/%%%/ # Replace (only) the first 'VALUE="' with a special marker
s/^.*%%%// # Delete everything up to that marker
s/".*// # Delete from the first '"' to the end of the line
p # Print what's left
} # End code block