I have six animals, and Apple Photos does a great job of recognizing them by name after I labeled them the first time (the office dog as well). Two of them, however, are gray tabbies (brothers) and it can't distinguish them, so I had to name them with an ampersand ("Harley & Ralph Lauren").
I'm impressed that it can do as well as it does; I just find that amusing.
Same with Google Photos: it groups similar cats as just one. Fun fact: it does the same for human twins.
Google Photos is the same with our family cat; she has her own auto-updated category when we add photos that include her. I love this feature.
Came to say Apple also did a great job tagging my bois, who are both grey-ish cats, even in pictures where they faced backward. No idea how they did that.
What I found impressive was that Apple Photos, given pictures of my cousins when they were 50 or more years old, was able to identify pictures of them as kids. On the other hand, it could never consistently distinguish between my two older brothers (although to be fair, they were identical twins). It also insists that a beagle I once owned was a cat. I mean, sure, he sometimes slept on his back with his paws in the air like a cat, but he was all dog.
On the other hand, it has no understanding of time. I have thousands of photos of me from the 1970s up through today, and Apple Photos is remarkably good at identifying me in all of them. And yet when my daughter was born it started identifying her, as a baby, as me. You'd think you could build a model to grasp the idea that a photo of a baby taken in 2015 is probably not of me.
I raise you the photo-of-a-photo edge case:
The metadata says 2015, but the photo is from 1960.
But you shouldn't optimize for the edge case! (All of my 20k+ photos, dating back to 1905, have correct metadata + GPS).
I would argue that AI is exactly how you should handle these edge cases, though likely with a fine-tuned model.
There would be hints that a photo is from the '60s rather than the 2010s; a human would be able to tell in many cases even without other context.
That is exactly the use case AI is meant to excel at: something that is arguably hard to do algorithmically but possible for an ML model.
I hate to say it, but you are the edge case. Most users are not fixing dates of photos (especially pre-digital scans) or adding GPS data to photos which didn’t originally have it.
I'd suspect there are more people removing metadata from photos that have it than are adding metadata to photos that don't have it.
This is a nice article, but it fails to mention something important. Beyond the computer magic that makes neural networks so powerful, there is a massive human effort, often from people in Sub-Saharan Africa, who spend all day labeling images, text, audio, etc. for the major AI companies [1]. These workers are often exploited and treated as expendable.
It's not all just math. Real people are what make this work.
[1] https://www.theverge.com/features/23764584/ai-artificial-int...
> These workers are often exploited and treated as expendable.
So, common ground with a lot of Hacker News audience?
Don't take me too seriously here, and not to excuse anything, but what would these people be doing if they weren't data labeling? How would they be treated differently?
Presumably, they'd be working for some other multinational, because overall their quality of living is better than working at whatever other local industry exists?
The data labeling job itself strikes me as something dystopian. As if we're the work mules for our AI overlords.
It definitely sounds like you're trying to excuse the labor exploitation of multinational corporations. I was just pointing out that the network doesn't figure out how to classify things without a massive human undertaking. Are you suggesting that if something is common across the globe then we shouldn't complain about it?
> It definitely sounds like you're trying to excuse the labor exploitation of multinational corporations.
That was not my intent. I probably worded my thoughts poorly. Indeed, though I am far more advantaged than those data laborers I’m feeling a bit exploited myself lately.
> Are you suggesting that if something is common across the globe then we shouldn't complain about it?
No. I guess the crux of my comment is: what would they be doing for income otherwise? And which would they choose, given the fact they're being exploited?
Sometimes the choice is not between good and evil, but between evil and less evil. We should be happy less evil is an option at times while hoping for good.
Yes. This is where I was going. Thank you.
If someone has a simple task to do and has scoured the entire globe to find people who can do this task without being pulled away from more important work, they should be praised. Paying Americans prevailing wage for this would be simpler but it would hurt both America and Sub-Saharan Africa.
The implicit idea in your comment that there is no important work in low-GDP countries reminds me of this quote:
"I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops"
I think you are giving too much credit to billion dollar companies that really just want to milk as much labor from poor countries as they can.
I feel that "billion dollar companies wanting to milk as much labor from poor countries as they can" is one of the only forces in the world that will actually take the time to go out and look for African Einsteins instead of just giving the jobs to their friends and neighbors.
It doesn't follow that because I said you can find people without important work in low-GDP countries, that there is no important work there. If you've ever been in one though, there are always people whose job seems to be "watching a single goat" and would be much better off training AIs to identify cats.
You are also assuming that those people would otherwise be doing something that would justify that.
Just like a college student working at McDs to get by, the same could apply here. Cost of living is not equal.
I'm not siding with either of you, to be clear; just offering a different perspective. I feel both points are valid, and without more information both are also irrefutable.
> These days, computers can easily recognize photos of cats, but that’s not because a clever programmer discovered a way to isolate the essence of “catness.”
It could have been, and in some cases it did happen; computer vision didn't wait for neural networks (e.g., OCR). But to hijack a famous quote: "Neural networks are like violence - if it doesn't solve your problems, you are not using enough of it."
> A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.
So a neuron does very basic polynomial interpolation, and by hooking them together you get polynomial regression. I don't know if it's amusing or amazing that people use polynomial regression to write programs now.
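For reference, here is a minimal sketch of the two-input neuron the quoted passage describes (the parameter names and values are mine):

    import math

    def neuron(x1, x2, w1, w2, bias):
        # The three parameters: w1 and w2 weight the inputs, and the bias
        # shifts the neuron's overall preference toward 0 or 1.
        z = w1 * x1 + w2 * x2 + bias
        # Sigmoid squashes the weighted sum into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

    print(neuron(0.0, 0.0, w1=4.0, w2=4.0, bias=-2.0))  # ~0.12, leans "no"
    print(neuron(1.0, 1.0, w1=4.0, w2=4.0, bias=-2.0))  # ~1.00, leans "yes"

Note that without the squashing step this is just a linear function of the inputs, which is what the reply below is getting at.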
> So a neuron does very basic polynomial interpolation and by hooking them together you get polynomial regression
The article glosses over activation functions, which - if non-polynomial - give the entire neural network its non-linearity. A major inflection point was proving that neural network architectures with very few layers (as few as one hidden layer) can approximate any continuous function.
https://en.m.wikipedia.org/wiki/Universal_approximation_theo...
Furthermore, many apparent discontinuities can be removed or smoothed by choosing a more appropriate domain, codomain, or topology. This means a neural network can not only approximate any smooth function, but can also learn to approximate many discontinuous ones, provided they aren't fundamentally discontinuous.
https://en.m.wikipedia.org/wiki/Classification_of_discontinu...
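To make that concrete, here is a toy construction (my own, not from the article or the linked pages): a single hidden layer of sigmoid units can form localized "bumps", and summing one bump per grid cell approximates a continuous function on [0, 1].

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def bump(x, center, width, steep=2000.0):
        # Difference of two shifted sigmoids: close to 1 inside the window
        # around `center`, close to 0 outside. Two hidden units per bump.
        return (sigmoid(steep * (x - (center - width / 2)))
                - sigmoid(steep * (x - (center + width / 2))))

    def approx(f, x, n=50):
        # Sum of n bumps, each scaled by f at the bump's center.
        width = 1.0 / n
        return sum(f((i + 0.5) * width) * bump(x, (i + 0.5) * width, width)
                   for i in range(n))

    f = lambda x: math.sin(2 * math.pi * x)
    for x in (0.1, 0.25, 0.6):
        print(f"true={f(x):+.3f}  approx={approx(f, x):+.3f}")  # near-identical

Deeper networks are used in practice because they are far more parameter-efficient, not because one hidden layer can't do it in principle.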
An interesting follow-up is using various xAI (explainable AI) techniques to then investigate what features in an image the classifier uses to make its decisions. Saliency maps work great for images. When I was playing around with it, the binary classifier I trained from scratch to distinguish cats from dogs ended up basically only looking at eyes. Enough images in the dataset featured cats with visible, open eyes, and the vertical slit is an excellent predictor. It was an interesting lesson that also emphasized how much the training data matters.
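For anyone curious, one of the simplest saliency techniques is occlusion sensitivity, which needs nothing but the trained model. A minimal sketch (`predict` stands in for any classifier returning a "cat" score for an image):

    import numpy as np

    def occlusion_saliency(image, predict, patch=16, stride=8):
        # Slide a gray patch over the image and record how much the
        # classifier's score drops at each position. A big drop means
        # the occluded region mattered (e.g. the eyes, as above).
        h, w, _ = image.shape
        base = predict(image)
        rows = (h - patch) // stride + 1
        cols = (w - patch) // stride + 1
        heat = np.zeros((rows, cols))
        for i, y in enumerate(range(0, h - patch + 1, stride)):
            for j, x in enumerate(range(0, w - patch + 1, stride)):
                occluded = image.copy()
                occluded[y:y + patch, x:x + patch] = 0.5  # gray square
                heat[i, j] = base - predict(occluded)
        return heat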
ExAI feels like a better shortening, both for clarity and given that xAI is a company already.
The term certainly predates the company.
I've heard of explainability for years, but I don't think I've specifically seen the term "xAI" in relation to it.
First I’ve heard of it.
The term comes from a paper, “An explainable artificial intelligence system for small-unit tactical behavior” by van Lent et al. from 2004.
https://cdn.aaai.org/IAAI/2004/IAAI04-019.pdf
It has 490 citations.
DARPA has a whole program named after it: https://www.darpa.mil/research/programs/explainable-artifici...
Notably, the paper uses the capitalization XAI.
Relevant: https://distill.pub/2017/feature-visualization/
This article seemed really basic: no insight other than "it learns the high-dimensional manifold on which cat images lie, thus separating cats from non-cats" (not that simple explanations are bad, but Quanta articles seem to be getting more watered down over time).
The real question is whether we can get some insight as to how exactly it's able to do this. For convolutional neural networks it turns out that you can isolate and study the behavior of individual circuits and try to understand what "traditional image processing" function they perform, and that gives some decent intuition: https://distill.pub/2020/circuits/ - CNNs become less mysterious when you break them down into "edge detectors, curve detectors, shape classifiers, etc."
For LLMs it's a bit harder, but Anthropic did some research in this vein.
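For a feel of what an "edge detector" circuit computes, here is the classic hand-designed Sobel filter, which early CNN layers are often found to rediscover (a sketch assuming SciPy; the function name is mine):

    import numpy as np
    from scipy.signal import convolve2d

    # Horizontal-gradient kernel; its transpose detects vertical gradients.
    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)

    def edge_magnitude(gray):
        # Convolve with both orientations and combine into a gradient
        # magnitude: bright wherever the image intensity changes sharply.
        gx = convolve2d(gray, SOBEL_X, mode="same", boundary="symm")
        gy = convolve2d(gray, SOBEL_X.T, mode="same", boundary="symm")
        return np.hypot(gx, gy)

A learned first-layer CNN filter is the same kind of object: a small array of weights slid across the image.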
Just an anecdote, but back in college, I had an algorithms professor who gave us a classifier problem like the square and triangle boundary problem. His English was poor and nobody understood the problem as he stated it. I got an okay score on it, but never understood it very well.
Anyway, it’s 40 years later and I just read this article and said, “Oh! Now I get it.” A little too late for Dr. Hippe’s class.
Identification has two components: recognition and authentication.
I'm not an expert on neural networks, but from what I've heard, current systems can only be trained to be really good at the former.
I once had a tabby cat. When it ran away, I put up posters with a picture and description. I got several calls about cats in the neighbourhood that had the same tabby colour scheme (recognition). And from a distance they indeed looked the same. But close up, they each had a different eye colour, nose colour, or length of the white "socks" on their paws (authentication).
To do the second step, the system would need to be trained not just on raw pixel data but also on which features to look for to distinguish one cat from another. I think current systems could be brute-forced into doing this, somewhat, by also training on negative examples ... but I feel like that is suboptimal.
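Training on pairs of same/different individuals is roughly what the field calls metric learning. A sketch of the standard triplet loss (my framing, not the commenter's; embeddings are plain NumPy vectors):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # anchor/positive: embeddings of two photos of the SAME cat.
        # negative: an embedding of a DIFFERENT cat (the "negative example").
        # Training pulls same-cat embeddings together and pushes
        # different cats at least `margin` further apart.
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(0.0, d_pos - d_neg + margin)

Once trained this way, "authentication" is just checking whether two photos' embeddings land within a threshold distance of each other.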
Wasn’t it “The Hitchhiker’s Guide to the Galaxy” that humorously described an AI-controlled train system failing because it was looking at the clock instead of the trains?
Seems extremely prescient…
Many years ago one of our cats got out; she was gone for 3 weeks, and we tracked her down using 6 game cameras. Long story short, I have 200,000 images of "wild life"... Last year I used a VLM to catalog all of the images by generating detailed descriptions. I was able to find images of our cat in 3 searches: the same images we had used to identify her originally, which had taken hours each day of combing through thousands of images.
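The workflow is simple enough to sketch. Here `describe` is a placeholder for whatever vision-language model generates the captions (everything else is ordinary text search):

    def build_catalog(image_paths, describe):
        # Caption every image once (slow), so searches are cheap afterwards.
        return {path: describe(path) for path in image_paths}

    def search(catalog, *keywords):
        # Return images whose caption mentions every keyword.
        return [path for path, caption in catalog.items()
                if all(k.lower() in caption.lower() for k in keywords)]

    # catalog = build_catalog(paths, describe)        # run once
    # hits = search(catalog, "tabby", "white socks")  # run many times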
Neat. Anyone know what is used to make the animations? I like the graphic design!
Lottie: https://lottiefiles.com/
Small but effective visual cues, smooth and carefully chromatic.
I am struck by the conceptual framework of classification tasks so snappily rendering clear categories from such fuzziness.
Probably one of the first articles on this topic that I have read all the way to the finish line and fully understood. Thanks.
Same here, I've never done any study of these things other than learning a bit about gradient descent out of interest. But the idea that these networks work as classifiers by figuring out boundary regions was more interesting than I previously believed.
Long have I wanted a cat door that would only open for my cats, not the mean neighborhood one that eats their food. I can’t be the only one. I’ve been meaning to try to build one with a camera, rPi and Google Coral, but never got around to it. There’s the matter of the locking mechanism and more.
I have built two of these for dogs. It's really not hard, whether you go completely from scratch or use something premade.
If you want something mostly premade, go get an Autoslide. If you want to do it completely from scratch:
1. RFID/Bluetooth proximity is much easier to work with than camera + rPi + AI. For the use case you are talking about, AI is not just overkill, but will make it actively harder to achieve your goal (see the sketch below).
2. Locking is pretty easy depending on the motor mechanism - either a cheap relay-driven magnetic lock, or simply a motor that can't be backdriven easily.
Motor-wise, you can either use the rack-and-pinion style that Autoslide does, or a simple linear motor if you don't want to deal with gear tracks.
Overall, I went the Autoslide route and had it all set up and working in an hour or two.
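A rough sketch of the RFID option from point 1, assuming a reader that streams tag IDs over serial and a relay-driven lock on a Raspberry Pi GPIO pin (the pin number, port name, and tag IDs are placeholders for your hardware):

    import time
    import serial           # pyserial
    import RPi.GPIO as GPIO

    ALLOWED_TAGS = {"0008D5A2B1"}   # your pets' tag IDs go here
    LOCK_PIN = 17

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(LOCK_PIN, GPIO.OUT, initial=GPIO.LOW)
    reader = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

    while True:
        tag = reader.readline().strip().decode("ascii", errors="ignore")
        if tag in ALLOWED_TAGS:
            GPIO.output(LOCK_PIN, GPIO.HIGH)   # energize relay, unlock
            time.sleep(5)                      # give the animal time to pass
            GPIO.output(LOCK_PIN, GPIO.LOW)    # lock again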
That's the definition of (entertaining) overengineering: since every house cat should already have an RFID chip, there are doors that use it. Four AA batteries, "low-tech" enough, and it just works.
How do you interface with your cat’s chip? Mine is chipped but it never occurred to me to build a detector.
In my case I bought a pre-built solution by PetSafe that works pretty well. You scan the cat's chip once and it recognizes him/her each time they try to enter. It supports up to 10 cats, IIRC.
You can get off-the-shelf doors and feeders. I use "SureFlap." The RFID pellets vary slightly based on locale and age, but are probably 125 or 134.2 kHz. You can get them off eBay if you want an easier test case. Cheap handheld scanners are about $20.
A 125 kHz reader. The real problem, most of the time, is distance. Cats are curious enough about the doors that they will go right up next to them. Most dogs won't.
Luckily dogs are usually pretty easy to train. My dog will tap the glass on the back door if he wants out. I'm sure I could get him to "present his chip" to a doggie door within a couple of days.
Long ago I read about an automatic cat door that operated simply on the colour of the cat. It worked because the cat was the only red cat in the neighbourhood.
Take a look at SureFlap and OnlyCat. They use RFID chips in the cats.
Until the neighborhood bully gets hold of a Flipper and shakes down well-cared-for cats for copies of their RFID chips... sorry
A neighbourhood bully could do far worse things to a cat, things that involve little to no technology. One of many reasons why my cat stays inside.
Also for your own sanity, mostly. I always wonder if mine will come back each night. Thus far he hasn’t disappointed, but I’m braced for that day.
That said, some people vehemently argue that it’s abus{e,ive} to let cats wander the neighborhood, so thank you for not trying to tell others what to do. It’s become so common that I’m braced for it every time this topic comes up.
Seems more abusive to keep a cat indoors its entire life.
Keeping a street cat locked in the house is maybe about as cruel as letting a house cat roam the street.
Guys, come on, I was joking; I just had that funny image of a neighborhood stray flipping chips from house cats.
For some reason I thought this article would explain how to ID a specific cat, i.e. basically facial recognition for cats.
Is this even something that's possible with current tech? Like, surely cats have some facial features that can be used to uniquely identify them? It would be cool to have a global database of all cats that users would be able to match their photos against. Imagine taking a picture of a cat you see on the street, and it immediately tells you the owner's details and whether it's missing.
I wrote the CatBench vector search playground toy app exactly for this reason! [1] ("cat-similarity search for recommendation engines and cat-fraud detection"). I built it both for learning & fun, but it's also useful for demoing vector search functionality plugged into regular RDBMS application schemas in a business context. I used cats & dogs as it's something everyone understands, instead of diving deep into some narrow, industry-vertical-specific use case.
[1]: https://tanelpoder.com/posts/catbench-vector-search-query-th...
I imagine when they run out of other sensors to add to our phones, they’ll add chip readers so you can just scan for the implanted microchip on a cat you encounter. (said semi-sarcastically since the tech requires close proximity between animal and reader which most cats you encounter on the street will not countenance)
> which most cats you encounter on the street will not countenance
Maybe not with you ;)
Yes, I've worked in this space for dogs (for re-identifying animals that have been vaccinated for rabies). It's a very difficult problem, but mostly because getting/scraping good training data is difficult. You really want lots of paired images of the same animal and that's hard compared to searching for "cat". Plus the usual challenges: animals don't like to stay still so getting good pictures is hard and users must have good guidance for lighting/pose to get the best results. Human facial recognition benefits from strong commercial interest and the most robust methods rely on extras like 3D scanning.
Tricks include facial alignment + cropping and very strong constraints on orientation to make sure you have a good frontal image (apps will give users photo-alignment markers). Otherwise it's a standard visual search. Run a face extraction model to get the crop, warp to standard key points, compute the crop embedding, store it in a database, and do a nearest-neighbour lookup.
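That pipeline is compact enough to sketch end to end. `detect_face`, `warp_to_keypoints`, and `embed` are placeholders for the detector, aligner, and embedding model (embeddings assumed to be NumPy vectors):

    import numpy as np

    def enroll(db, animal_id, image, detect_face, warp_to_keypoints, embed):
        # Crop, align, embed, and store one reference photo.
        crop = warp_to_keypoints(detect_face(image))
        db.append((animal_id, embed(crop)))

    def identify(db, image, detect_face, warp_to_keypoints, embed):
        # Same preprocessing, then a nearest-neighbour lookup over
        # everything enrolled so far.
        query = embed(warp_to_keypoints(detect_face(image)))
        ids, vecs = zip(*db)
        dists = np.linalg.norm(np.stack(vecs) - query, axis=1)
        return ids[int(np.argmin(dists))], float(dists.min())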
There are a few startups doing this. Also look at PetFace which was a benchmark released a year or so ago. Not a huge amount of work in this area compared to humans, but it's of interest to people like cattle farmers as well.
https://github.com/mapooon/PetFace
One of the funny things about LLMs and modern AI is that "the ability to recognize a cat" isn't a trained behavior anymore, as described here. It's an emergent property of training the model to predict a lot of things, and cats happen to be present enough in the data that they're one of the things you can ask a larger model about and have it work.
My favorite work on digging into the models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went digging into the many-layer, many-parameter model and found the features associated with the Golden Gate Bridge. Dialing them up to 11 made Claude bring up the bridge in response to literally everything.
I'm super curious to see how much of this "intuitive" model of neural networks can be backed out effectively, and what that does to how we use it.
[0] https://www.anthropic.com/news/golden-gate-claude
Fun fact: we keep rabbits, and the various random AIs that I have tried over the years classify them as cats so often that a proper "rabbit" classification is rare to come by! The full versions of ChatGPT do it well now, even with trickier photos (when the rabbit keeps their ears flat, for example).
I have a Finnish Lapphund dog and from the right angle AI thinks it's a cat.
Well that's pretty easy. AI is trained on internet content and it's not like there's a lack of cat pictures there lol