
Why do people look by touching?

Every so often, I see someone get irritated that “can I see that?” tends to mean “may I hold that while I look at it?” Given how common this is, and how natural it seems to me to want to hold something while I examine it, I wonder if there is an underlying reason behind it.

Looking at some of the pictures in a recent blog post by Google’s research team, I wonder whether that reason is related to how “quickly” we learn to recognise new objects. I put “quickly” in quotes because we seem to need only “one observation”, while a typical machine-learning system may need thousands of examples to learn from. What if we also need a lot of examples, but don’t realise it because we see them as one continuous sequence?

Human vision isn’t as straightforward as a video played back on a computer, but it’s not totally unreasonable to say we see things “more than once” when we hold them in our hands. Crucially, holding an object also gives us additional information about it: its distance, and therefore its size, comes from proprioception (which tells us where our hand is) and not just from binocular vision; we can rotate it to see it from multiple angles, or move ourselves to see how light falling from different angles changes its appearance; we can bring it closer to our eyes to see fine detail we might have missed from further away; and we can rub the surface to check whether markings on it are permanent or temporary.

So, the hypothesis (conjecture?) is this: humans need to hold things to look at them properly, simply to gather enough information to learn what they look like in general rather than from a single point of view. Likewise, machine-learning systems may seem worse than they really are because they lack the capacity to create realistic alternative perspectives of the things they’ve been tasked with classifying.

I’m not sure how I’d test both parts of this idea. A combination of robot arm, camera, and machine-learning system that manipulates an object it has been asked to learn to recognise is the easy part. Testing the human side is harder: one would need to show people a collection of novel objects, half of which they can hold and half of which they can only observe in a way that actively prevents them from seeing multiple perspectives, and then compare how well they recognise the objects in each group.


Vision

Human vision is both amazingly good and surprisingly weird.

Good, because of things like this: try photographing the Moon and comparing the result with what you can see with your own eyes.

Weird, because of all the different ways to fool it. The faces you see in clouds. The black-and-blue/gold-and-white dress (I see gold and white, which means I’m wrong). The way your eyes keep darting all over the place without you even noticing.

What happens if you force your eyes to stay put?

I have a limited ability to pause my eyes’ saccades. I have no idea how this compares with other people, so I assume anyone can do what I tried.

On a recent sunny day, in some nearby woodland, I focused on the smallest thing I could see, a small dot in the grass near where I sat. I shut one eye, then tried to keep my open eye as still as possible.

It was difficult, and I had to make several attempts, but soon all the things that were moving stood out against all the things that were stationary. That much I expected. What I did not expect was that my perception of the area around the point I was focused on would change.


I didn’t take photos at the time, but this mock-up shows a similar environment and an approximation of the effect. One small region near the point I was looking at tiled itself across much of my central vision. I don’t think the tiles were any particular shape: the mock-up uses squares, but what I saw was {shape of nearby thing} tiled, even though that doesn’t really work in Euclidean space. If I let my vision settle on a different point infinitesimally close to the first, my central vision became tiled with a different patch, with its own shape. The transition from normal vision to the weird tiling seemed to take about a second.

How much of this is consistent with other people’s eyes (and minds), I don’t know. It might be that all human vision (and all brains) does this, or it might be that the way I learned to see is different to the way you learned to see. Or perhaps we learned the same way, and my thinking about computer vision has literally changed the way I see.

Vision is strange. And I’m not at all sure where the boundary is between seeing and thinking; the whole concept is far less clear than it seemed when I was a kid.
