I would like to save all the unique usernames present in the file extended.log to a new file, using awk, grep and/or sed.
Below are the field names in my file, which are tab-separated. I only need the values of the "username" field (the 12th field).
"record_id" "client_id" "request_id" "date_time" "elapsed_time" "status" "size" "upload" "download" "bypassed" "client_ip" "username" "method" "url" "http_referer" "useragent" "mime" "filter_name" "filtering_reason" "interface" "cachecode" "peercode" "peer" "request_host" "request_tld" "referer_host" "referer_tld" "range" "time_profiles" "user_groups" "request_profiles" "application_signatures" "categories" "response_profiles" "upload_content_types" "download_content_types" "profiles"
Here is a sample of the file's contents:
"SVZerDLJhIj6G3PA.6575.1466420105.346.1837.1" "1837" "1" "20/Jun/2016:16:25:05" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420107.357.1838.1" "1838" "1" "20/Jun/2016:16:25:07" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420109.367.1840.1" "1840" "1" "20/Jun/2016:16:25:09" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420111.377.1841.1" "1841" "1" "20/Jun/2016:16:25:11" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420113.387.1842.1" "1842" "1" "20/Jun/2016:16:25:13" "5" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420115.399.1843.1" "1843" "1" "20/Jun/2016:16:25:15" "5" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420117.410.1844.1" "1844" "1" "20/Jun/2016:16:25:17" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420119.421.1845.1" "1845" "1" "20/Jun/2016:16:25:19" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420121.431.1846.1" "1846" "1" "20/Jun/2016:16:25:21" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420123.445.1847.1" "1847" "1" "20/Jun/2016:16:25:23" "4" "200" "0" "-" "0" "-" "192.168.12.13" "anonymous@192.168.12.13""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420108.240.1839.1" "1839" "1" "20/Jun/2016:16:25:23" "15623" "200" "2826" "0" "2826" "-" "192.168.0.14" "anonymous@192.168.0.14""CONNECT" "connect://livehelp.safesquid.com:443/" "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "livehelp.safesquid.com" "livehelp.safesquid.com" "safesquid.com" "-" "-" "1K-10K" "" "NO_AUTHENTICATION" "uncachable request,BUSINESS SITES REQ" "" "computersandsoftware" "" "" "" "uncachable"
With awk, on a tab-delimited file:
awk -F '\t' '{ print $12 }' file
This extracts the 12th field. Redirect the output to a new file if you want to keep it.
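A minimal sketch of the redirect, run against a small hypothetical sample (11 placeholder columns, the quoted username in column 12, then one more column, all tab-separated). The `row` helper, the temporary directory and the output name `usernames.txt` are illustrative assumptions, not part of the original log format:

```shell
#!/bin/sh
# Hypothetical helper: 11 placeholder columns, the quoted value in column 12,
# then a 13th column, separated by tabs (printf expands \t).
row() { printf '"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"%s"\t"GET"\n' "$1"; }

dir=$(mktemp -d)
{ row username; row alice; row bob; row alice; } > "$dir/extended.log"

# Print the 12th tab-separated field of every line and save it in a new file.
awk -F '\t' '{ print $12 }' "$dir/extended.log" > "$dir/usernames.txt"
cat "$dir/usernames.txt"
```

At this stage the new file still contains the header value, the surrounding double quotes and the duplicates; the commands that follow deal with each of those.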
To remove the double quotes from the data, you can use
awk -F '\t' '{ sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
Before printing, two substitutions remove the first and last characters of the 12th field (if they are double quotes).
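The two sub() calls can also be folded into a single gsub() whose regular expression is an alternation of a leading quote (^") and a trailing quote ("$). This is a sketch of that variant, not a required change; the sample data and the `row` helper are hypothetical, as above:

```shell
#!/bin/sh
# Hypothetical helper: quoted value in column 12 of a tab-separated line.
row() { printf '"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"%s"\t"GET"\n' "$1"; }

dir=$(mktemp -d)
{ row username; row alice; row bob; } > "$dir/extended.log"

# One gsub() removes a quote at the start (^") and a quote at the end ("$)
# of the 12th field, instead of two separate sub() calls.
awk -F '\t' '{ gsub(/^"|"$/, "", $12); print $12 }' "$dir/extended.log" > "$dir/names.txt"
cat "$dir/names.txt"
```

Like the sub() version, this only touches quotes at the very start and end of the field, so quotes inside a username are left alone.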
To skip the first line if it is a header line:
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
To get only the unique usernames, using awk alone:
awk -F '\t' 'FNR > 1 && !( $12 in seen ) { seen[$12]++; sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
This uses an array keyed on the 12th field to keep track of which usernames have already been seen. If the value of the 12th field is not yet a key in the array, it has not been seen before.
An alternative is to simply test !seen[$12] instead of !( $12 in seen ).
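A common compact form of the same idea puts the increment in the pattern itself: `!seen[$12]++` is true only the first time a value appears, because the post-increment returns the old count (0) before bumping it. This sketch writes the result straight to a new file; the sample data, the `row` helper and the name `unique_users.txt` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: quoted value in column 12 of a tab-separated line.
row() { printf '"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"%s"\t"GET"\n' "$1"; }

dir=$(mktemp -d)
{ row username; row alice; row bob; row alice; row bob; } > "$dir/extended.log"

# FNR > 1 skips the header; !seen[$12]++ lets each distinct value through once,
# in order of first appearance. The quotes are stripped only for printing.
awk -F '\t' 'FNR > 1 && !seen[$12]++ { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' \
    "$dir/extended.log" > "$dir/unique_users.txt"
cat "$dir/unique_users.txt"
```

Because `&&` short-circuits, the header line is never added to `seen`, and the output keeps the order in which usernames first occur in the log rather than sorting them.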
Using sort to get unique (and sorted) usernames:
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file | sort -u
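Since the question also mentions sed, here is a sketch of an equivalent pipeline that swaps awk for cut and sed and saves the sorted, de-duplicated names in a new file. It assumes, as above, that fields are separated by single tabs and that the first line is a header; the sample data and the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: quoted value in column 12 of a tab-separated line.
row() { printf '"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"x"\t"%s"\t"GET"\n' "$1"; }

dir=$(mktemp -d)
{ row username; row carol; row alice; row carol; row bob; } > "$dir/extended.log"

# cut splits on tabs by default and keeps column 12; tail -n +2 drops the
# header; sed strips one leading and one trailing double quote; sort -u
# sorts and removes duplicates.
cut -f 12 "$dir/extended.log" | tail -n +2 | sed 's/^"//; s/"$//' | sort -u > "$dir/unique_users.txt"
cat "$dir/unique_users.txt"
```

The exact sort order of non-ASCII or mixed-case names depends on the current locale; setting LC_ALL=C for sort gives plain byte order if that matters.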