Print word containing string and first word

10

I want to find a string in a line of text and print the string (between spaces) and the first word of the line.

For example:

"This is a single text line"
"Another thing"
"It is better you try again"
"Better"

The list of strings is:

text
thing
try
Better

What I'm trying to get is a table like this:

This [tab] text
Another [tab] thing
It [tab] try
Better

I tried with grep, but nothing worked. Any suggestions?

command-line text-processing regex

— Felipe Lira

So, basically, "if the line contains a string, print the first word + the string". Right?

— Sergiy Kolodyazhnyy

12

Bash/grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" | 
        while read -r line
    do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"

        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi

        echo "$string"
    done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better

— wjandrea

9

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save it as first-plus-word and run it as

perl first-plus-word file.txt text thing try Better

It builds a regex from the input words. Each line is matched against the regex; if there is a match, the first word of the line is printed, and if the matched word differs from the first word, the matched word is printed too.
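For readers more comfortable with Python, the same idea can be sketched roughly as follows. This is not the script above, only an approximation of it: the function name first_plus_word and the sample list are made up for illustration, and the first word is taken with split() rather than with /^\S+/.

```python
import re

def first_plus_word(lines, words):
    """Yield 'first<TAB>match' (or just 'first') for each line containing one of the words."""
    # Escape each word so it is matched literally, then join them into one alternation
    # wrapped in word boundaries, as the Perl script does with quotemeta and \b.
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    for line in lines:
        found = pattern.search(line)
        if not found:
            continue  # none of the words occurs in this line
        first = line.split()[0]
        match = found.group(1)
        # Print the matched word only when it differs from the first word.
        yield first if match == first else first + "\t" + match

# Hypothetical sample input, copied from the question.
sample = [
    "This is a single text line",
    "Another thing",
    "It is better you try again",
    "Better",
]

for row in first_plus_word(sample, ["text", "thing", "try", "Better"]):
    print(row)
```

Because of the word boundaries, a search word such as "thing" would not match inside "Nothing" here, which is a difference from the plain substring approaches in some of the other answers.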

— choroba

9

Here's an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

where file2 is the word list and file1 contains the sentences.
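The logic of the awk one-liner (load the word list into a lookup table, strip the double quotes, then test every field of every sentence) might look like this in Python. The names match_fields, words and sentences are invented for this sketch, and the inputs are in-memory lists rather than the two files.

```python
def match_fields(word_list, sentences):
    """Return one output row per field that appears in the word list."""
    wanted = set(word_list)  # plays the role of the awk array a[$0]
    rows = []
    for sentence in sentences:
        # Equivalent of gsub(/"/,"",$0) followed by awk's whitespace field splitting.
        fields = sentence.replace('"', "").split()
        for i, field in enumerate(fields):
            if field in wanted:
                # The first field is printed alone; otherwise first field, tab, match.
                rows.append(field if i == 0 else fields[0] + "\t" + field)
    return rows

# Hypothetical sample data, copied from the question.
words = ["text", "thing", "try", "Better"]
sentences = [
    '"This is a single text line"',
    '"Another thing"',
    '"It is better you try again"',
    '"Better"',
]

for row in match_fields(words, sentences):
    print(row)
```

Like the awk version, this only matches whole whitespace-separated fields, so a word followed by punctuation (for example "text,") would not be found.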

— chave de aço

2

Nice one! I put it into a script file, paste.ubuntu.com/23063130, just for convenience.

— Sergiy Kolodyazhnyy

8

Here's a Python version:

#!/usr/bin/env python
from __future__ import print_function 
import sys

# List of strings that you want
# to search for in the file. Change it
# as you see fit. Remember the commas.
strings = [
          'text', 'thing',
          'try', 'Better'
          ]


with open(sys.argv[1]) as input_file:
    for line in input_file:
        for string in strings:
            if string in line:
               words = line.strip().split()
               print(words[0],end="")
               if len(words) > 1:
                   print("\t",string)
               else:
                   print("")

Demo:

$> cat input_file.txt                                                          
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt                                      
This    text
Another     thing
It  try
Better

Side note: the script is Python 3 compatible, so you can run it with either python2 or python3.

— Sergiy Kolodyazhnyy

7

Try:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab before Better is a problem, then try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If that is found, it is replaced with the first word on the line (if any), a tab, and the matched word.

ta; b; :a; s/^\t//; p

If the substitution command resulted in a substitution, meaning that one of your words was found on the line, then the ta command tells sed to jump to label a. If not, we branch (b) to the next line. :a defines the label a. So, if one of your words was found, we (a) do the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.
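The two-step flow (substitute, then strip a leading tab only when the substitution happened) can be mimicked in Python as a rough illustration. This is only a sketch: sed_like is a made-up name, [A-Za-z0-9] stands in for [[:alnum:]], and Python's regex engine is not guaranteed to behave like sed's on every input.

```python
import re

# Same shape as the sed pattern: optional "first word + space + anything",
# then one of the search words, then the rest of the line.
PATTERN = re.compile(r"(([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")

def sed_like(line):
    """Return the rewritten line, or None when sed would print nothing."""
    # Group 2 is unset when the search word is the first word, so fall back to "".
    new, count = PATTERN.subn(
        lambda m: (m.group(2) or "") + "\t" + m.group(3), line, count=1
    )
    if count == 0:
        return None  # the "b" branch: no substitution, skip to the next line
    return re.sub(r"^\t", "", new)  # label "a": drop a leading tab, then print

# Hypothetical sample input, copied from the question.
sample = [
    "This is a single text line",
    "Another thing",
    "It is better you try again",
    "Better",
]

for line in sample:
    result = sed_like(line)
    if result is not None:
        print(result)
```

Note that the pattern has no word boundaries, so a search word can also match inside a longer word.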

— John1024

7

A simple bash/sed approach:

$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words 
This    text
Another thing
It  try
    Better

The while read w; do ...; done < words will iterate over each line of the file words and save it as $w. The -n makes sed not print anything by default. The sed command then replaces a double quote followed by non-whitespace (\"(\S*); the parentheses "capture" what \S* matches, the first word, so we can refer to it later as \1), 0 or more characters (.*), then the word we are looking for ($w), and 0 or more characters again (.*). If this matches, we replace it with only the first word, a tab and $w (\1\t$w), and print the line (that's what the p in s///p does).
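To make the control flow explicit, here is a rough Python rendering of the loop: one full pass over the lines per search word, so the output is ordered by word rather than by line. The name per_word_pass and the sample lists are invented for this sketch, and unlike the sed command it escapes each word with re.escape instead of treating it as a regex.

```python
import re

def per_word_pass(words, lines):
    """For each word, scan every line and rewrite the matching ones."""
    rows = []
    for w in words:  # outer loop: "while read w"
        # A double quote, the captured first word, anything, the search word, anything.
        pattern = re.compile(r'"(\S*).*' + re.escape(w) + r".*")
        for line in lines:  # inner loop: sed reads the whole file each time
            new, count = pattern.subn(lambda m: m.group(1) + "\t" + w, line, count=1)
            if count:  # like s///p with -n, only print lines where a substitution happened
                rows.append(new)
    return rows

# Hypothetical sample data, copied from the question (quotes included).
words = ["text", "thing", "try", "Better"]
lines = [
    '"This is a single text line"',
    '"Another thing"',
    '"It is better you try again"',
    '"Better"',
]

for row in per_word_pass(words, lines):
    print(row)
```

When the search word is itself the first word, the capture group ends up empty, which is why the Better row starts with a tab in the output above.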

— Terdon

5

This is the Ruby version

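str_list = ['text', 'thing', 'try', 'Better']

File.open(ARGV[0]) do |f|
  f.each_line do |line|
    words = line.split(' ')
    # Check every search string against every line, instead of pairing
    # line N with string N (which breaks once the file has more lines
    # than there are strings).
    str_list.each do |str|
      next unless line.include?(str)
      if words.length == 1
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end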

The sample text file hello.txt contains

This is a single text line
Another thing
It is better you try again
Better

Running it with ruby source.rb hello.txt results in

This    text
Another thing
It      try
Better

— Anwar