What is the most accurate way to determine the colour of an object?

I wrote a computer program that can detect coins in a static image (.jpeg, .png, etc.) using some standard computer vision techniques (Gaussian blur, thresholding, Hough transform, etc.). Using the ratios of the coins taken from a given image, I can establish with certainty which coins are which. However, I want to increase my confidence levels and also determine whether a coin that I have deduced to be of type A (from the radius ratio) is also of the correct colour. The problem is that for British coins et al. (copper, silver, gold), the respective colours (especially copper and gold) are very similar.

I have a routine that extracts the mean colour of a given coin in the RedGreenBlue (RGB) 'colour space', and routines to convert this colour into the HueSaturationBrightness (HSB or HSV) 'colour space'.

RGB is not very pleasant to work with when trying to tell the three coin colours apart (see the attached [basic] image). I have the following ranges and typical values for the colours of the different coin types:

Note: the typical value here is the one selected using the mean pixel colour of a real image.

**Copper RGB/HSB:** typicalRGB = (153, 117, 89)/(26, 0.42, 0.60).

**Silver RGB/HSB:** typicalRGB = (174, 176, 180)/(220, 0.03, 0.71).

**Gold RGB/HSB:** typicalRGB = (220, 205, 160)/(45, 0.27, 0.86)
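The HSB triples above can be reproduced from the RGB triples with Python's standard `colorsys` module, which is a quick way to check that an RGB-to-HSB routine agrees with the table. This is a sketch: `colorsys.rgb_to_hsv` expects channels in [0, 1] and returns hue as a fraction of a full turn, so the helper `rgb_to_hsb` (an illustrative name) rescales both.

```python
import colorsys

# Typical mean RGB values quoted above (0-255 per channel).
TYPICAL_RGB = {
    "copper": (153, 117, 89),
    "silver": (174, 176, 180),
    "gold": (220, 205, 160),
}

def rgb_to_hsb(rgb):
    """Convert an 8-bit RGB triple to (hue in degrees, saturation, brightness)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # all three results lie in [0, 1]
    return h * 360.0, s, v

for name, rgb in TYPICAL_RGB.items():
    h, s, v = rgb_to_hsb(rgb)
    print(f"{name}: ({h:.0f}, {s:.2f}, {v:.2f})")
```

Rounded as printed, the three lines match the table: (26, 0.42, 0.60), (220, 0.03, 0.71) and (45, 0.27, 0.86).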

I first tried using the 'Euclidean distance' between a given mean coin colour (in RGB) and the typical values for each coin type above, treating the RGB values as a vector; for copper we would have:

$D_{copper} = \sqrt{(R_{type} - R_{copper})^{2} + (G_{type} - G_{copper})^{2} + (B_{type} - B_{copper})^{2}}$

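where $(R_{type}, G_{type}, B_{type})$ is the mean colour of the coin being classified, and the smallest of the three distances $D$ (to copper, silver and gold) would tell us which type the given coin is most likely to be. This method proved to be very inaccurate.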
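The nearest-typical-colour rule described above can be sketched in a few lines of Python. `TYPICAL_RGB` repeats the values quoted earlier; `rgb_distance` and `classify_rgb` are illustrative helpers, not the author's code.

```python
import math

# Typical mean RGB values from the table above.
TYPICAL_RGB = {
    "copper": (153, 117, 89),
    "silver": (174, 176, 180),
    "gold": (220, 205, 160),
}

def rgb_distance(c1, c2):
    """Euclidean distance between two RGB triples (the D of the formula above)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def classify_rgb(mean_rgb):
    """Return the coin type whose typical colour is nearest, plus all distances."""
    distances = {name: rgb_distance(mean_rgb, ref) for name, ref in TYPICAL_RGB.items()}
    return min(distances, key=distances.get), distances

print(classify_rgb((160, 120, 90))[0])  # copper
```

One illustration of why this is fragile: scaling the typical gold value uniformly by 0.7 (a darker exposure with the same hue) gives (154, 143.5, 112), which this rule assigns to copper, because a plain RGB distance mixes brightness with colour.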

I have also tried just comparing the hue of the coins with the typical hues of the types given above. Although in theory HSB is a much better 'colour space' for dealing with varying brightness and saturation levels in the images, it too was not accurate enough.
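Comparing hues needs a circular distance, because hue wraps around at 360 degrees (a hue of 350 is only 20 degrees from 10). A minimal sketch using the typical hues quoted above; the names are illustrative. Note that silver's saturation is only about 0.03, so its hue is poorly defined and can swing widely with small changes in RGB; saturation may be the more reliable cue for silver.

```python
# Typical hues (degrees) from the table above.
TYPICAL_HUE = {"copper": 26.0, "silver": 220.0, "gold": 45.0}

def hue_distance(h1, h2):
    """Smallest angle between two hues, in degrees (0 to 180)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def classify_hue(hue):
    """Return the coin type whose typical hue is angularly closest."""
    return min(TYPICAL_HUE, key=lambda name: hue_distance(hue, TYPICAL_HUE[name]))

print(hue_distance(350, 10))  # 20.0
```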

Question: What is the best method to determine a coin's type based on colour (from a static image)?

Thanks very much for your time.

[Image: typical coin colours]

Edit 1

Note: I have tried all of the ideas discussed below and have achieved next to nothing. Variance in lighting conditions (even within the same image) makes this problem very tough and should be taken into consideration.

Edit 2 (Summary of Outcome)

Thank you for your answers. Further research of my own (including your answers and comments) has highlighted just how tough this problem is in the generic case of arbitrary lighting, arbitrary cameras (mobile devices), fluctuation in coin colour (even for the same species/type), etc. As a starting point I first looked at skin colour recognition (a very active field of research), and there are still numerous problems even with recognising skin colour for Caucasians alone (see this paper for a review of the current techniques). The fact that this problem involves three distinct colour objects, all of which can have continuous and varying chromaticities, makes this topic of computer vision a very hard one to classify and deal with (in fact you could do a good Ph.D. on it!).

I looked into the Gamut Constraint Method from the very helpful post by D.W. below. At first sight this was very promising as a pre-processing step to transform the image and the separate coin objects to colours that are independent of lighting conditions. However, even this technique does not work perfectly (and it involves a library of images/histograms for the mappings, which I don't want to get into), and neither do the much more complex neural-network methods. In fact, this paper states in the abstract that:

"current machine colour constancy algorithms are not good enough for colour-based object recognition.".

That is not to say there aren't more up-to-date papers on this subject out there, but I can't find them, and it does not seem to be a very active research area at the moment.

The answer by AVB was also helpful, and I have looked briefly into L*a*b*.

"The nonlinear relations for L*, a*, and b* are intended to mimic the nonlinear response of the eye. Furthermore, uniform changes of components in the L*a*b* colour space aim to correspond to uniform changes in perceived colour, so the relative perceptual differences between any two colours in L*a*b* can be approximated by treating each colour as a point in a three dimensional space."
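Treating each colour as a point and taking the Euclidean distance in L*a*b* is the CIE76 colour difference, ΔE*ab. The sketch below converts 8-bit sRGB to L*a*b* with the standard sRGB linearisation and the D65 white point. That is an assumption: as noted here, a phone camera's output is device-dependent and may not really be sRGB. Function names are illustrative.

```python
import math

def srgb_to_lab(rgb):
    """8-bit sRGB triple -> (L*, a*, b*), assuming the sRGB transfer curve and a D65 white."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two L*a*b* points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

# Distance between the typical copper and gold colours from the question.
print(delta_e76(srgb_to_lab((153, 117, 89)), srgb_to_lab((220, 205, 160))))
```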

From what I have read, the transformation to this colour space for my device-dependent images will be tricky, but I will look into this in detail (with a view to some sort of implementation) when I have a bit more time.

I am not holding my breath for a concrete solution to this problem, and after the attempt with L*a*b* I shall be neglecting coin colour and looking to shore up my current geometric detection algorithms (accurate elliptic Hough transform, etc.).

Thank you all. As an end note to this question, here is the same image processed with a new geometric detection algorithm, which uses no colour recognition:

[Image: coins detected by the new geometric algorithm]


— MoonKnight

Will the images always have the same colour background? Or can you introduce some other 'colour standard' object into the image? If so, you have a way of adjusting for varying lighting. If not, it could be hard.
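If a 'colour standard' object of known colour (say, a white or grey card) can be put in the frame, one simple correction is a per-channel gain (a diagonal, von Kries style model) that maps the card's observed colour back to its known colour. A minimal sketch; `correct_with_reference` is a hypothetical helper, and the diagonal model is an approximation rather than an exact description of how illumination changes colours.

```python
import numpy as np

def correct_with_reference(image, observed_ref, true_ref):
    """Rescale each RGB channel so the reference object's observed colour maps to its known colour.

    Assumes a diagonal (per-channel gain) illumination model and an 8-bit range.
    """
    gains = np.asarray(true_ref, dtype=float) / np.asarray(observed_ref, dtype=float)
    return np.clip(image.astype(float) * gains, 0.0, 255.0)
```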

— onestop

It's not just obviously coloured light. I am pretty sure that sunlight, fluorescent light, and incandescent light have colours that are different enough to mess up HSB colour-matching, even though our eyes somehow adjust so that we don't perceive things changing colour.

— Peter Shor

(+1) The question is interesting and challenging. I feel that it needs some refinement in order to have a good chance at a good answer. As currently stated it borders on being ill-posed. For a practical solution, you would need to provide some more details on the range of environments in which you wish to be able to perform this classification. What color backgrounds are possible? Always the same number of coins? Will you always have a color image? Good ambient lighting? Knowing these sorts of characteristics can provide guidance toward a solution.

— cardinal

I think the problem you are facing is that of "color constancy". Other search terms would be "discounting the illuminant" or "discounting the background". It is an unsolved problem in vision science.

— caracal

Wish I could +1 again for the nice follow-up! Very interesting stuff.

— Matt Parker

Answers:

Two things, for starters.

One: definitely don't work in RGB. Your default should be the Lab color space (also known as CIE L*a*b*). Discard L. From your image it looks like the a coordinate gives you the most information, but you should probably do a principal component analysis on a and b and work along the first (most important) component, just to keep things simple. If that doesn't work, you can try switching to a 2D model.

Just to give you an idea, in the a channel the three yellowish coins have standard deviations below 6 and means of 137 ("gold"), 154 and 162, so they should be distinguishable.
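The PCA step suggested above could look like this with NumPy, assuming you have pooled the (a*, b*) values of pixels from hand-labelled coins into an (N, 2) array. The first right-singular vector of the centred data is the direction of greatest variance; projecting onto it gives one number per pixel (or per coin), on which per-type thresholds could then be chosen. Function names are illustrative.

```python
import numpy as np

def first_principal_axis(ab):
    """ab: (N, 2) array of (a*, b*) samples. Returns (mean, unit vector of greatest variance)."""
    ab = np.asarray(ab, dtype=float)
    mean = ab.mean(axis=0)
    # SVD of the centred data: rows of vt are the principal directions, largest variance first.
    _, _, vt = np.linalg.svd(ab - mean, full_matrices=False)
    return mean, vt[0]

def project(ab, mean, axis):
    """1-D coordinate of each (a*, b*) sample along the principal axis."""
    return (np.asarray(ab, dtype=float) - mean) @ axis
```

The sign of the returned axis is arbitrary (SVD can return either direction), so thresholds should be chosen after inspecting the projected labelled data.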

Second, the lighting issue. Here you will have to define your problem carefully. If you want to distinguish similar colors under any illumination and in any context, you can't, at least not like this. If you are only worried about local variations in brightness, Lab will mostly take care of that. If you want to work under both daylight and incandescent light, can you guarantee a uniform white background, as in the example image? In general, what are your lighting conditions?

Also, by the looks of it, your image was taken with a fairly cheap camera. It probably has some kind of automatic white-balance feature, which messes up the colors quite a bit; turn it off if you can. It also looks like the image was encoded in YCbCr at some point (this happens a lot with video cameras) or in a similar JPEG variant: the color information is severely subsampled. In your case that might actually be good, since it means the camera made some adjustments to the color channels for you. On the other hand, it probably means that at some point the color information was also quantized more coarsely than the brightness, which is not so good. The main point is that the camera matters, and what you do should depend on the camera you are going to use.

If anything here doesn't make sense, leave a comment.

— AVB

Thanks for your answer. I cannot ensure any of the above. This is for a mobile application that counts coins (an arbitrary number of them) at the click of a button (and is very fast!). So lighting can vary wildly, and there is no consistent background either. I believe that classifying the coin types via colour in this manner is (as you point out) not possible. However, I like your suggestion of using L*a*b* and believe it to be the best answer offered. In light of this, you have the answer and the bounty. Thanks again.

— MoonKnight

In the spirit of brainstorming, I'll share a few ideas you could try:

Try Hue more? It looks like Hue gave you a good discriminator between silver and copper/gold, though not between copper and gold, at least in the single example you have shown here. Have you examined Hue in more detail to see whether it would be a viable feature for distinguishing silver from copper/gold?

I might start by gathering a bunch of example images that you have labeled by hand and computing the hue of each coin in each image. Then you could try histogramming them to see whether Hue looks like a plausible way to discriminate. I might also look at the mean hue of each coin for a few examples like the one you presented here. You could also experiment with saturation, since it looks like it might be useful too.

If that fails, edit your question to show what you tried, and give a few examples that concisely illustrate why it is hard or where it fails.

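A sketch of the hue-histogram experiment: given the hues of hand-labeled coins grouped by type, build one histogram per type and measure how much each pair overlaps using histogram intersection (0 means disjoint, 1 means identical). The input format and helper names are assumptions for illustration; plotting the returned counts (e.g., with matplotlib) gives the visual check described above.

```python
import numpy as np

def hue_histograms(hues_by_label, bins=36):
    """Histogram per-coin hues (degrees) separately for each hand-assigned label."""
    edges = np.linspace(0.0, 360.0, bins + 1)
    hists = {label: np.histogram(np.asarray(hues, dtype=float) % 360.0, bins=edges)[0]
             for label, hues in hues_by_label.items()}
    return hists, edges

def overlap(h1, h2):
    """Histogram intersection of two normalised histograms (0 = disjoint, 1 = identical)."""
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    return float(np.minimum(p, q).sum())
```

Low overlap between the silver histogram and the copper/gold histograms would support using hue for that split; high overlap between copper and gold would confirm the difficulty described in the question.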
Other color spaces? Similarly, you could try transforming to rg chromaticity and experiment to see whether the result is useful for distinguishing silver from copper/gold. It is possible that this helps adjust for variation in illumination, so it is worth trying.

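rg chromaticity divides out overall intensity: $r = R/(R+G+B)$ and $g = G/(R+G+B)$ (the third coordinate is redundant because $r + g + b = 1$). A minimal NumPy sketch; note that it cancels a uniform change in brightness but not a change in the color of the light.

```python
import numpy as np

def rg_chromaticity(rgb):
    """Map RGB values (any array whose last axis has length 3) to (r, g) = (R, G) / (R + G + B)."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    total = np.where(total == 0, 1.0, total)  # black pixels map to (0, 0) instead of NaN
    return rgb[..., :2] / total
```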
Check relative differences between coins, rather than looking at each coin in isolation? I understand that, from the ratios of the coin sizes (radii), you have an initial hypothesis for the type of each coin. If you have $n$ coins, this is an $n$-vector. I suggest testing this whole composite hypothesis at once, rather than testing your hypothesis $n$ times, once for each coin on its own.

Why might this help? It might let you take advantage of the relative hues of the coins, which should be closer to invariant with respect to illumination (assuming relatively uniform lighting) than the individual hue of each coin. For example, for each pair of coins, you could compute the difference in hue and check whether it matches what you would expect given your hypothesis about the two identities. Or you could generate an $n$-vector $p$ with the predicted hues for the $n$ coins; compute an $n$-vector $o$ with the observed hues for the $n$ coins; sort each; and check whether there is a one-to-one correspondence between the hues. Or, given the vectors $p, o$, you could test whether there exists a simple transformation $T$ such that $o \approx T(p)$, i.e., $o_i \approx T(p_i)$ holds for each $i$. You may have to experiment with different possibilities for the class of $T$'s that you allow. One example class is the set of functions $T(x) = x + c \pmod{360}$, where the constant $c$ ranges over all possibilities.

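The last test above, for the class $T(x) = x + c \pmod{360}$, can be done by brute force over $c$, scoring each candidate by the sum of squared circular hue differences. A sketch with illustrative names; a small residual suggests the hypothesized labeling is consistent with a single global hue shift, and the residuals of competing labelings can be compared against each other.

```python
import numpy as np

def circular_diff(a, b):
    """Signed hue difference a - b, wrapped into [-180, 180) degrees."""
    return (np.asarray(a) - np.asarray(b) + 180.0) % 360.0 - 180.0

def best_hue_shift(predicted, observed, step=1.0):
    """Find c in T(x) = x + c (mod 360) minimising the squared circular error between T(p) and o."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    best_c, best_err = 0.0, np.inf
    for c in np.arange(0.0, 360.0, step):
        err = float(np.sum(circular_diff(o, p + c) ** 2))
        if err < best_err:
            best_c, best_err = float(c), err
    return best_c, best_err
```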
Compare to reference images? Rather than using the color of the coin, you might consider trying to match what is printed on the coin. For instance, let's say that you have detected a coin $C$ in the image, and you hypothesize it is a one pound coin. You could take a reference image $R$ of a one pound coin and test whether $R$ seems to match $C$.

You will need to account for differences in pose. Let me start by assuming that you have a head-on image of the coin, as in your example picture. Then the main thing you need to account for is rotation: you don't know a priori how much $C$ is rotated. A simple approach might be to sweep over a range of possible rotation angles $\theta$, rotate $R$ by $\theta$, and check whether $R_\theta$ seems to match $C$. To test for a match, you could use a simple pixel-based diff metric: i.e., for each coordinate $(x,y)$, compute $D(x,y) = R_\theta(x,y) - C(x,y)$ (the difference between the pixel value in $R_\theta$ and the pixel value in $C$); then use an $L_2$ norm (sum of squares) or something similar to combine all of the difference values into a single metric of how close a match you have (i.e., $\sum_{(x,y)} D(x,y)^2$). You will need to use a small enough step increment that the pixel diff is likely to work. For instance, in your example image, the one-pound coin has a radius of about 127 pixels; if you sweep over values of $\theta$, increasing by $0.25$ degrees at each step, then you will only need to try $360/0.25 = 1440$ different rotation values, and the error at the circumference of the coin at the closest approximation to the true $\theta$ should be at most about one-quarter of a pixel (half a step, i.e., $127 \times 0.125\pi/180 \approx 0.28$ pixels), which is small enough that the pixel diff might work out OK.
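A sketch of the rotation sweep, assuming `ref` and `coin` are same-sized grayscale NumPy arrays centered on the coin. To stay self-contained it uses a small hand-written bilinear rotation (`rotate_about_centre`, a hypothetical helper) rather than a library routine, and it compares only pixels inside the inscribed circle so that the image corners do not dominate the score. The default `step_deg=0.25` matches the figure in the text.

```python
import numpy as np

def rotate_about_centre(img, theta_deg):
    """Bilinear rotation of a 2-D array about its centre; samples from outside become 0."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(theta_deg)
    yy, xx = np.indices((h, w), dtype=float)
    # Inverse mapping: for each output pixel, find where it came from in the input.
    ys = cy + (yy - cy) * np.cos(t) - (xx - cx) * np.sin(t)
    xs = cx + (yy - cy) * np.sin(t) + (xx - cx) * np.cos(t)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0
    out = np.zeros((h, w), dtype=float)
    ok = (y0 >= 0) & (x0 >= 0) & (y0 + 1 < h) & (x0 + 1 < w)
    a, b, u, v = y0[ok], x0[ok], fy[ok], fx[ok]
    out[ok] = (img[a, b] * (1 - u) * (1 - v) + img[a + 1, b] * u * (1 - v)
               + img[a, b + 1] * (1 - u) * v + img[a + 1, b + 1] * u * v)
    return out

def best_rotation(ref, coin, step_deg=0.25):
    """Sweep theta, returning (theta, SSD) with the smallest sum of squared differences."""
    h, w = ref.shape
    yy, xx = np.indices((h, w))
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = min(h, w) / 2.0 - 1.0
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2  # compare inside the coin only
    best_theta, best_ssd = 0.0, np.inf
    for theta in np.arange(0.0, 360.0, step_deg):
        d = rotate_about_centre(ref, theta)[mask] - coin[mask]
        ssd = float(np.sum(d * d))
        if ssd < best_ssd:
            best_theta, best_ssd = float(theta), ssd
    return best_theta, best_ssd
```

Comparing the best SSD across several hypothesized reference coins would then indicate which reference matches best.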

You may want to experiment with multiple variations on this idea. For instance, you could work with a grayscale version of the image; the full RGB, using an $L_2$ norm over all three R, G, B differences; the full HSB, using an $L_2$ norm over all three H, S, B differences; or just the Hue, Saturation, or Brightness plane. Another possibility would be to first run an edge detector on both $R$ and $C$, then match up the resulting edge images.

For robustness, you might have multiple different reference images for each coin (in fact, each side of each coin), and try all of the reference images to find the best match.

If the images of the coins aren't taken head-on, then as a first step you may want to compute the ellipse that represents the perimeter of the coin $C$ in the image and infer the angle at which the coin is being viewed. This will let you compute what $R$ would look like at that angle before performing the matching.

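Under an orthographic (distant-camera) approximation, a circle tilted by an angle $t$ appears as an ellipse whose minor-to-major axis ratio is $\cos t$, so the tilt can be read off the fitted ellipse. A minimal sketch; the axis lengths are assumed to come from whatever ellipse fit you already have (e.g., the elliptic Hough transform mentioned in the question), and the function name is illustrative. Stretching the image by major/minor along the minor-axis direction approximately undoes the foreshortening.

```python
import math

def tilt_from_ellipse(major_axis, minor_axis):
    """Viewing tilt (degrees) of a circular coin that appears as an ellipse.

    Assumes orthographic projection, so minor / major = cos(tilt).
    """
    if not 0 < minor_axis <= major_axis:
        raise ValueError("expected 0 < minor_axis <= major_axis")
    return math.degrees(math.acos(minor_axis / major_axis))
```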
Check how color varies as a function of distance from the center? Here is a possible intermediate step in between "the coin's mean color" (a single number, i.e., 0-dimensional) and "the entire image of the coin" (a 2-dimensional image). For each coin, you could compute a 1-dimensional vector or function $f$, where $f(r)$ represents the mean color of the pixels at distance approximately $r$ from the center of the coin. You could then try to match the vector $f_C$ for a coin $C$ in your image against the vector $f_R$ for a reference image $R$ of that coin.

This might let you correct for illumination differences. For instance, you might be able to work in grayscale, or in just a single channel (e.g., Hue, Saturation, or Brightness). Or you might be able to first normalize the function $f$ by subtracting the mean, $g(r) = f(r) - \mu$, where $\mu$ is the mean color of the coin, and then try to match $g_C$ to $g_R$.
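The radial profile $f(r)$ and its mean-subtracted version $g(r)$ can be computed without an explicit pixel loop by binning pixels on their rounded distance from the center. A sketch, assuming a single-channel array (grayscale, hue, etc.), a known center, and a head-on view; the function name is illustrative.

```python
import numpy as np

def radial_profiles(channel, center, max_radius):
    """Return (f, g): f[r] is the mean of `channel` at rounded distance r from `center`,
    and g = f - mu, where mu is the mean over all pixels within max_radius."""
    h, w = channel.shape
    yy, xx = np.indices((h, w))
    r = np.rint(np.hypot(yy - center[0], xx - center[1])).astype(int)
    inside = r <= max_radius
    values = channel[inside].astype(float)
    sums = np.bincount(r[inside], weights=values, minlength=max_radius + 1)
    counts = np.bincount(r[inside], minlength=max_radius + 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        f = sums / counts  # NaN for any radius with no pixels
    mu = values.mean()
    return f, f - mu
```

Because every pixel at a given radius is averaged together, rotating the coin about its center leaves $f$ unchanged, and adding a constant offset to the channel leaves $g$ unchanged.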

The nice thing about this approach is that you don't need to infer how much the coin was rotated: the function $f$ is rotation-invariant.

If you want to experiment with this idea, I would compute the function $f_C$ for a variety of different example images and graph them. Then you should be able to inspect them visually to see whether the function seems to have a relatively consistent shape, regardless of illumination. You might need to try this for multiple different possibilities (grayscale, each of the HSB channels, etc.).

If the coin $C$ might not have been photographed head-on, but possibly from an angle, you'll first need to trace the ellipse of $C$'s perimeter to deduce the angle from which it was photographed and then correct for that in the calculation of $f$.

Look at vision algorithms for color constancy. The computer vision community has studied color constancy, the problem of correcting for an unknown illumination source; see, e.g., this overview. You might explore some of the algorithms derived for this problem; they attempt to infer the illumination source and then correct for it, to derive the image you would have obtained had the picture been taken with the reference illumination source.

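One of the simplest color constancy algorithms is Gray World: assume the average color of the scene is neutral gray and rescale each channel so that the three channel means become equal. This is offered as a sketch of the kind of algorithm such overviews cover, not as the method of any particular paper. Its assumption can fail badly when coins fill much of the frame, since a photo full of copper coins is not gray on average.

```python
import numpy as np

def gray_world(image):
    """Gray World correction for an (H, W, 3) RGB image with values in 0-255."""
    img = image.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    channel_means = np.maximum(channel_means, 1e-12)  # guard against an all-zero channel
    gains = channel_means.mean() / channel_means      # equalise the three channel means
    return np.clip(img * gains, 0.0, 255.0)
```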
Look into Color Constant Color Indexing. The basic idea of CCCI, as I understand it, is to first cancel out the unknown illumination source by replacing each pixel's R value with the ratio between its R-value and one of its neighbor's R-values; and similarly for the G and B planes. The idea is that (hopefully) these ratios should now be mostly independent of the illumination source. Then, once you have these ratios, you compute a histogram of the ratios present in the image, and use this as a signature of the image. Now, if you want to compare the image of the coin $C$ to a reference image $R$, you can compare their signatures to see if they seem to match. In your case, you may also need to adjust for angle if the picture of the coin $C$ was not taken head-on, but this seems like it might help reduce the dependence upon illumination source.
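A simplified sketch of the ratio idea: take the log ratio between each pixel and its right-hand neighbor in each channel, so that a per-channel illumination gain cancels, then histogram those ratios as a signature. This follows the description above rather than being a faithful reproduction of the published CCCI algorithm; the bin count, range, and function name are assumptions.

```python
import numpy as np

def ratio_signature(image, bins=16, limit=2.0):
    """Illumination-insensitive signature of an (H, W, 3) image.

    For each channel, take log(pixel) - log(right-hand neighbour): if the light
    multiplies a channel by a constant gain, that gain cancels in the difference.
    """
    img = np.maximum(image.astype(float), 1.0)  # clamp to avoid log(0) on dark pixels
    log_ratios = np.log(img[:, 1:, :]) - np.log(img[:, :-1, :])
    # Ratios outside [-limit, limit] fall outside the histogram range and are ignored.
    hists = [np.histogram(log_ratios[..., c], bins=bins, range=(-limit, limit))[0]
             for c in range(img.shape[2])]
    sig = np.concatenate(hists).astype(float)
    total = sig.sum()
    return sig / total if total > 0 else sig
```

Two signatures could then be compared with any histogram distance, for example histogram intersection.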

I don't know if any of these has a chance of working, but they are some ideas you could try.

— D.W.

Interesting problem, and good work.

Try using the median color values instead of the mean. This will be more robust to extreme values caused by brightness and saturation. Try using only one of the RGB components instead of all three, choosing the component that best distinguishes the colors. You could plot histograms of the pixel values (e.g., of one of the RGB components) to get an idea of the properties of the pixel distribution; this may suggest a solution that isn't immediately obvious. Try plotting the RGB components in 3D space to see whether they follow some pattern: for example, they might lie close to a line, indicating that a linear combination of the RGB components could be a better classifier than any individual one.
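Taking the per-channel median of a coin's pixels instead of the mean is a one-line change in NumPy. A sketch, assuming the coin's pixels have already been gathered into an (N, 3) array; a few saturated highlight pixels pull the mean upward but barely move the median. The function name is illustrative.

```python
import numpy as np

def coin_colour(pixels, robust=True):
    """Summarise an (N, 3) array of a coin's pixels by its per-channel median (or mean)."""
    px = np.asarray(pixels, dtype=float).reshape(-1, 3)
    return np.median(px, axis=0) if robust else px.mean(axis=0)
```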

— martino

Good shout with the median; in fact I have also coded this, and it too is poor at establishing the correct colour. With the histogram approach, I am conscious of computational expense: as soon as I start looping through pixels in 2D I will incur costs! Nevertheless, it might be worth putting something like this in (as you point out) to establish any correlations. I produced all sorts of plots for the RGB components, and due to the varying lighting conditions (a consequence of taking pictures in different locations) the RGB values can overlap heavily for all three coin types.

— MoonKnight

I have also looked at fitting a model to estimate the posterior probability of a colour-space point belonging to a given coin type. I have also looked at Gaussian mixture modelling, but have not got very far with it yet. I was also told about another (somewhat arbitrary, but simpler) approach, which is to use something like nearest-neighbour interpolation. Thanks for your time.
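A minimal version of the posterior-probability idea: fit one full-covariance Gaussian per coin type to hand-labelled colour samples (a Gaussian mixture with a single component per class) and apply Bayes' rule with equal priors. The class name is hypothetical and the regularisation constant is an arbitrary choice; scikit-learn's `GaussianMixture` would be one way to move to several components per class.

```python
import numpy as np

class GaussianColourClassifier:
    """One Gaussian per coin type; posterior via Bayes' rule with equal priors."""

    def fit(self, samples_by_label, reg=1e-3):
        self.labels = list(samples_by_label)
        self.params = []
        for label in self.labels:
            x = np.asarray(samples_by_label[label], dtype=float)
            mean = x.mean(axis=0)
            # Small diagonal term keeps the covariance invertible for tight clusters.
            cov = np.cov(x, rowvar=False) + reg * np.eye(x.shape[1])
            self.params.append((mean, cov))
        return self

    def posterior(self, point):
        point = np.asarray(point, dtype=float)
        log_liks = []
        for mean, cov in self.params:
            d = point - mean
            _, logdet = np.linalg.slogdet(cov)
            maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
            log_liks.append(-0.5 * (maha + logdet + len(d) * np.log(2 * np.pi)))
        log_liks = np.array(log_liks)
        w = np.exp(log_liks - log_liks.max())  # subtract the max for numerical stability
        return dict(zip(self.labels, w / w.sum()))
```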

— MoonKnight

On a completely different track, another difference between the coins is the design on the front/back (although some may share the same design on one side). How about correlating the set of design templates with the coin's pixels (or using mutual information) to help determine which coin you are looking at? With a combination of ratios, pixel colour and this design matching, you could probably reduce the false discovery rate.
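A sketch of a mutual-information score between a coin patch and a design template, estimated from a joint histogram of their grayscale values. It assumes the template has already been aligned (rotation, scale, tilt) and resampled to the size of the coin patch; the bin count and function name are illustrative. Higher values mean the template explains more of the coin's intensity pattern.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (nats) between two equally sized grayscale patches."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```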

— martino

I thought about that, but it is asking a lot of current recognition software and would be a huge job to write from scratch (OCR??). There is also a lot of variation in the graphics of these coins, which makes such an implementation a nightmare. I'll have a play later and report back on what I find. Thanks again.

— MoonKnight 22/02/12

Why the downvote? If there's an issue with the answer, it'd be helpful to point it out; I can't see one.

— martino