Attention and scene perception

The retinal array contains far more info than we can process
Attention: a family of mechanisms that restrict processing in various ways
External versus internal attention
- External: attention to stimuli in the world
- Internal: our ability to attend to one line of thought as opposed to another or to select one response over another
Overt versus covert attention
- Overt: refers to directing a sense organ at a stimulus (e.g. fixating the eyes on a single word)
- Covert attention: pointing your eyes somewhere while directing your attention elsewhere (e.g. your eyes are on the page, but your attention is on a person)
Divided attention: doing two tasks at once (e.g. reading and listening to music)
Sustained attention: e.g. watching the pot to note the moment the water begins to boil
Selective attention: the ability to pick one out of many stimuli

Selection in space

Posner cueing experiment: reaction time (RT) decreases when a valid cue is given, because you are already attending to the right location; RT increases compared to a control condition when an invalid cue is given
The effect of a cue can be measured by varying the stimulus onset asynchrony (SOA): the interval between the time the cue appears and the time the probe appears
- As the SOA increases to about 150 ms, the magnitude of the cueing effect from a valid peripheral cue increases (e.g. red colouring of the box where the probe will be)
- Symbolic cues take longer to work, presumably because we need to do some work to interpret this type of cue (e.g. red dot, probe right; green dot, probe left). However some symbolic cues, such as arrows, behave like fast peripheral cues
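The cueing effect is usually summarized as a difference in mean RT between conditions. A minimal sketch, assuming a neutral (uncued) control condition: the benefit of a valid cue is neutral minus valid, and the cost of an invalid cue is invalid minus neutral. The function name and all RT values are hypothetical, made up only to show the arithmetic; they are not experimental data.

```python
from statistics import mean

def cueing_effects(valid_rts, neutral_rts, invalid_rts):
    """Return (benefit, cost) in ms from lists of reaction times.

    benefit: how much faster valid-cue trials are than neutral control trials
    cost:    how much slower invalid-cue trials are than neutral control trials
    """
    benefit = mean(neutral_rts) - mean(valid_rts)
    cost = mean(invalid_rts) - mean(neutral_rts)
    return benefit, cost

# Hypothetical RTs in ms (illustrative only, not measured data)
valid = [250, 260, 255]
neutral = [280, 285, 275]
invalid = [310, 320, 315]

benefit, cost = cueing_effects(valid, neutral, invalid)
print(f"benefit = {benefit:.0f} ms, cost = {cost:.0f} ms")  # benefit = 25 ms, cost = 35 ms
```

Running the same comparison separately for each SOA would show how the size of the cueing effect changes as the cue-probe interval grows.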

The spotlight of attention

Attention moves in a manner analogous to the movement of the eyes: when we shift our gaze, our fixation point shifts, and attention can move to a new location in a similar way → spotlight model
Attention might expand from fixation, growing to fill the whole region from the fixation spot to the cued location, and then shrink to include just the cued location → zoom lens model

Visual search

Visual search experiments: search for a target in a display containing distracting elements
- As set size (the number of items in the display) increases, it can get harder to find the target among the distractors
- Efficiency: the ease with which we can work our way through a display

Feature searches are efficient

Feature search: search for a target defined by a single attribute, such as a salient colour or orientation
If the target is salient, that is, if it stands out visually from its neighbours, it does not matter how many distractors there are; the target just seems to pop out of the display
- Apparently we can process the colour or orientation of all the items in the display at once → parallel search
- RT does not change with set size

Many searches are inefficient

Searches become inefficient when the target cannot be told apart from the distractors by a single salient basic feature. For example, an L and a T are easy to tell apart once you look at them, but because they are made of the same parts (a vertical and a horizontal line), you have to look at each item individually to determine whether it is an L or a T. RT then rises with set size, and the time it takes to decide that the target is absent increases even more
Serial self-terminating search: items are examined one at a time until the target is found (or, if the target is absent, until every item has been checked)
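The serial self-terminating account explains why target-absent searches are slower. If the target is equally likely to be anywhere, you find it on average halfway through, after (N + 1) / 2 items; when it is absent you must check all N. A minimal sketch, assuming a hypothetical fixed base time (`base_ms`) and a constant time per item (`ms_per_item`); both values and all function names are illustrative, not taken from any study. For contrast, the `serial=False` branch models a parallel feature search whose RT does not depend on set size.

```python
def expected_items_checked(set_size, target_present):
    """Average number of items inspected in a serial self-terminating search."""
    if target_present:
        # Target equally likely at any position: found on average halfway through
        return (set_size + 1) / 2
    # Target absent: every item must be checked before answering "no"
    return float(set_size)

def predicted_rt(set_size, target_present, base_ms=400.0, ms_per_item=50.0, serial=True):
    """Predicted RT in ms; base_ms and ms_per_item are hypothetical parameters."""
    if not serial:
        # Parallel (feature) search: RT does not change with set size
        return base_ms
    return base_ms + ms_per_item * expected_items_checked(set_size, target_present)

for n in (4, 8, 16):
    print(n, predicted_rt(n, True), predicted_rt(n, False), predicted_rt(n, True, serial=False))
```

Under these assumptions the target-present slope is 25 ms/item and the target-absent slope is 50 ms/item, so absent trials rise twice as steeply, while the parallel search stays flat at 400 ms.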

In real-world searches, basic features guide visual search

Guided search: search in which attention can be restricted to a subset of possible items on the basis of info about the target item’s basic features
Conjunction search: search for a target defined by the presence of two or more attributes
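Guidance can be sketched as a filter applied before serial checking: basic-feature information (here, colour) removes items that cannot be the target, so fewer items need to be inspected one by one. This is a toy sketch, not Guided Search itself; the `Item` class, the helper names and the display are hypothetical. It guides by colour only and places the target last (the worst case) so the counts are deterministic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    colour: str
    orientation: str

def serial_check(candidates, target):
    """Inspect candidates one at a time; return (found, number_inspected)."""
    for inspected, item in enumerate(candidates, start=1):
        if item == target:
            return True, inspected
    return False, len(candidates)

def guide_by_colour(display, target):
    """Restrict attention to items that share the target's colour (one basic feature)."""
    return [item for item in display if item.colour == target.colour]

# Conjunction target: red AND vertical, among green verticals and red horizontals
target = Item("red", "vertical")
display = [Item("green", "vertical")] * 8 + [Item("red", "horizontal")] * 8 + [target]

print(serial_check(display, target))                          # (True, 17)
print(serial_check(guide_by_colour(display, target), target))  # (True, 9)
```

Guidance by a second feature (orientation) would shrink the candidate set further; the point is only that basic features restrict which items are worth attending.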

In real-world searches, the real world guides visual search

Scene-based guidance: info in our understanding of scenes that helps us find specific objects in scenes
- It constitutes a Bayesian prior that tells you how likely it is that any given object in the scene is the target
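The Bayesian-prior idea can be shown with Bayes' rule over candidate locations: the posterior is proportional to the scene-based prior times the likelihood from the visual features. A minimal sketch; the kitchen locations, all probabilities and the function name are hypothetical numbers chosen for illustration, not measured data.

```python
def posterior(prior, likelihood):
    """Bayes' rule over locations: posterior proportional to prior x likelihood."""
    unnormalised = {loc: prior[loc] * likelihood[loc] for loc in prior}
    total = sum(unnormalised.values())
    return {loc: value / total for loc, value in unnormalised.items()}

# Hypothetical: where might a mug be in a kitchen scene? (scene-based prior)
scene_prior = {"counter": 0.6, "shelf": 0.3, "floor": 0.05, "ceiling": 0.05}
# Hypothetical: how mug-like the visual features at each location look
feature_likelihood = {"counter": 0.2, "shelf": 0.8, "floor": 0.8, "ceiling": 0.1}

post = posterior(scene_prior, feature_likelihood)
search_order = sorted(post, key=post.get, reverse=True)
print(search_order)  # ['shelf', 'counter', 'floor', 'ceiling']
```

The shelf and the floor look equally mug-like, but the prior from scene knowledge ranks the floor much lower, which is how the scene guides search.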

The binding problem in visual search

Binding problem: the challenge of tying different attributes of visual stimuli, which are handled by different brain circuits, to the appropriate object so that we perceive a unified object
Pre-attentive stage of processing: occurs before selective attention is deployed
Feature-integration theory: holds that a limited set of basic features can be processed in parallel pre-attentively, but other properties, including the correct binding of features to objects, require attention

Illusory conjunction: a false combination of the features of two or more different objects
- This effect appears when we cannot complete the binding process; as a result, we do the best we can with the info available
Proto-objects: a loose collection of unbound features that will become a recognizable object once attended
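The feature-integration idea can be caricatured in code: basic features are registered in separate maps, attention to a location binds the features found there, and without attention each feature is picked from its own map independently, which can produce an illusory conjunction. This is a toy sketch under loud assumptions: the two-object display, the feature maps, `report_object` and random pairing as a model of unbound features are all hypothetical simplifications, not a model from the literature.

```python
import random

# Hypothetical two-object display: location -> (colour, shape)
DISPLAY = {0: ("red", "O"), 1: ("green", "X")}

# Pre-attentive stage (toy assumption): separate feature maps indexed by location
COLOUR_MAP = {loc: colour for loc, (colour, _) in DISPLAY.items()}
SHAPE_MAP = {loc: shape for loc, (_, shape) in DISPLAY.items()}

def report_object(attended_location=None, rng=random):
    """Report one (colour, shape) pair from the display.

    With attention at a location, the features found there are bound together.
    Without attention, each feature is sampled from its own map independently,
    so the report can combine features from different objects.
    """
    if attended_location is not None:
        return COLOUR_MAP[attended_location], SHAPE_MAP[attended_location]
    colour = rng.choice(list(COLOUR_MAP.values()))
    shape = rng.choice(list(SHAPE_MAP.values()))
    return colour, shape

rng = random.Random(0)
reports = [report_object(rng=rng) for _ in range(20)]
illusory = [r for r in reports if r not in DISPLAY.values()]
print(report_object(attended_location=1))
print(f"{len(illusory)} of {len(reports)} unattended reports were illusory conjunctions")
```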

Attending in time: RSVP and the attentional blink

Rapid serial visual presentation (RSVP): stimuli appear in a stream at one location at a rapid rate. With fairly large, clearly visible stimuli, we are able to pick out a specific one at a presentation rate of 8-10 items per second
Attentional blink: the tendency not to perceive or respond to the second of two different target stimuli amid a rapid stream of distracting stimuli if the second target is presented within 200-500 ms after the observer has responded to the first
- Once attention has been focused on the first to-be-found object, there is a temporary inhibition or loss of control that makes it impossible to coordinate attention to the second to-be-found item for several hundred ms (metaphor: fishing with a net in a less-than-pristine stream)
Repetition blindness: a failure to detect the second occurrence of an identical letter, word or picture in an RSVP stream of stimuli when the second occurrence falls within 200-500 ms of the first
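Because RSVP items arrive at a fixed rate, the 200-500 ms window can be converted into stream positions. At 10 items per second each item lasts 100 ms, so a second target 2 to 5 items after the first lands inside the window. A minimal sketch; "lag" here means how many positions after the first target the second one appears, and the function names are hypothetical.

```python
def item_duration_ms(items_per_second):
    """Time each RSVP item occupies in the stream."""
    return 1000.0 / items_per_second

def in_blink_window(lag, items_per_second, window_ms=(200.0, 500.0)):
    """True if a second target `lag` items after the first falls in the window.

    window_ms uses the 200-500 ms range given in these notes.
    """
    delay_ms = lag * item_duration_ms(items_per_second)
    return window_ms[0] <= delay_ms <= window_ms[1]

for rate in (8, 10):
    lags = [lag for lag in range(1, 9) if in_blink_window(lag, rate)]
    print(f"{rate} items/s: {item_duration_ms(rate):.0f} ms per item, lags in window: {lags}")
# 8 items/s: 125 ms per item, lags in window: [2, 3, 4]
# 10 items/s: 100 ms per item, lags in window: [2, 3, 4, 5]
```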

The physiological basis of attention

Attention could enhance neural activity

Attention effects can be seen even at the first stages of cortical processing. As we progress further into the visual areas of the cortex, the attentional effects become larger. In fact, the effects seen at the earliest cortical stages may well be the result of feedback from these later stages of processing, and this feedback may be a very important part of visual processing

Attention could enhance the processing of a specific type of stimulus

Attention can also be directed to a stimulus property when we want to find a specific kind of stimulus among others (e.g. looking for pennies in a bowl of change enhances the salience of pennies)
If the image of a face is superimposed on the image of a house, the fusiform face area (FFA) becomes more active when observers attend to the face, and the parahippocampal place area (PPA) becomes more active when they attend to the house

Attention and single cells

How attention changes the response of a neuron:
- Response enhancement: attention makes the cell more responsive across the board
- Sharper tuning: attention could make it easier for the neuron to find a weak vertical signal amid the noise of other orientations
- Altered tuning: attention changes the preferences of a neuron; for example, the receptive field of the neuron shifts with the point of attention
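The three ways attention can change a neuron's response can be pictured as three different changes to a tuning curve. A minimal sketch, assuming a Gaussian orientation tuning curve (a common simplification); the parameter values in `EFFECTS` are hypothetical and only show the qualitative pattern: enhancement scales the whole curve, sharpening narrows it, and altered tuning moves its peak (standing in for the shift in the neuron's preference).

```python
import math

def tuning(orientation_deg, preferred=0.0, width=20.0, gain=1.0):
    """Gaussian orientation tuning curve: response to a line at orientation_deg."""
    return gain * math.exp(-((orientation_deg - preferred) ** 2) / (2 * width ** 2))

# Hypothetical parameter changes standing in for each attentional effect
EFFECTS = {
    "baseline": {},
    "response enhancement": {"gain": 1.5},   # same shape, scaled up everywhere
    "sharper tuning": {"width": 12.0},       # narrower curve around the same peak
    "altered tuning": {"preferred": 10.0},   # the preferred orientation moves
}

for name, params in EFFECTS.items():
    peak = max(range(-90, 91), key=lambda o: tuning(o, **params))
    print(f"{name:22s} peak at {peak:+d} deg, "
          f"response at 0 deg = {tuning(0, **params):.2f}, at 30 deg = {tuning(30, **params):.2f}")
```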

If cells are restricting their processing to the object of attention, then sensitivity to neighbouring items might be reduced, as resources are withdrawn from them

Normalization theory: the current response of a neuron is the product of that neuron's built-in receptive field and the effects of attention; this product must then be normalized by neural suppression
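The normalization idea can be sketched in one dimension: each position's excitatory drive is the stimulus times the attentional gain, and the response is that drive divided by the average drive in a surrounding suppressive pool (plus a small constant). This is a toy sketch, not a fitted model: `sigma`, `pool_radius`, the box-shaped pool and the two-stimulus display are all hypothetical choices.

```python
def normalised_responses(stimulus_drive, attention_gain, sigma=0.1, pool_radius=2):
    """Toy normalization: (stimulus x attention) divided by pooled suppression.

    stimulus_drive[i]: how strongly the stimulus drives the neuron at position i
    attention_gain[i]: multiplicative effect of attention at position i (1 = none)
    sigma: hypothetical constant that keeps the denominator above zero
    pool_radius: hypothetical size of the suppressive neighbourhood
    """
    excitatory = [s * a for s, a in zip(stimulus_drive, attention_gain)]
    n = len(excitatory)
    responses = []
    for i in range(n):
        lo, hi = max(0, i - pool_radius), min(n, i + pool_radius + 1)
        suppressive = sum(excitatory[lo:hi]) / (hi - lo)  # average drive in the pool
        responses.append(excitatory[i] / (suppressive + sigma))
    return responses

stimulus = [0, 0, 0, 1, 0, 1, 0, 0, 0]  # two identical stimuli at positions 3 and 5
no_attention = [1.0] * 9
attend_left = [1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # attention doubles the gain at 3

before = normalised_responses(stimulus, no_attention)
after = normalised_responses(stimulus, attend_left)
print(f"attended:   {before[3]:.2f} -> {after[3]:.2f}")  # attended:   2.00 -> 2.86
print(f"unattended: {before[5]:.2f} -> {after[5]:.2f}")  # unattended: 2.00 -> 1.43
```

Because the attended stimulus also feeds the suppressive pool of its neighbour, the unattended stimulus's response drops, which matches the point above that sensitivity to neighbouring items might be reduced.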

Disorders of visual attention

Visual-field defect: a portion of the visual field with no vision or with abnormal vision
Lesion in the parietal lobe: patients have problems directing attention to objects and places on the side contralateral (opposite) to the damaged area. These problems show up in a curious set of clinical symptoms, including neglect and extinction
Neglect:
- The inability to attend or respond to stimuli in the contra-lesional visual field (e.g. ignoring half of the body or half of an object)
- Barbell rotation experiment: initially, the person prefers objects that fall on the red end of the barbell, which is on the right side of the visual field. When the barbell rotates while the person watches, attention stays on the red end even though it is now on the left side of the visual field → object-based account of attention
- Once an object is attended, it stays in attention regardless of its movement into the left half of the visual field

Extinction

Extinction: when the patient is asked to look straight ahead, and an object is presented in either the left or the right half of the visual field, the patient will be able to identify the object. However if two objects are presented simultaneously in both halves of the visual field, only the object in the good half of the field will be reported. The object presented in the contra-lesional half will be perceptually extinguished

Balint syndrome

Balint syndrome: bilateral lesions of the parietal lobes; three major symptoms
1. Spatial localization abilities are greatly reduced; reaching toward an object is very difficult
2. Patients don't move their eyes very much; they tend to gaze fixedly ahead
3. Simultagnosia: the inability to perceive more than one thing at a time, attending to one object eliminates everything else
Patients have much more severe binding errors, since they can only perceive one thing at a time and the features of this thing may have come from several different locations in the world (due to the combination of simultagnosia and the spatial localization deficit)

Perceiving and understanding scenes

Two pathways to scene perception

Selective pathway: early vision provides the proto-objects that can be selected and recognized in the selective pathway
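Non-selective pathway: processes information from across the whole scene without selecting individual objects, computing ensemble statistics and the gist and layout of the scene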

The non-selective pathway computes ensemble statistics

Ensemble statistics: represent knowledge about the properties of a group of objects or, perhaps we should say, an ensemble of proto-objects; features are not bound (e.g. you know that there are smaller fish and that there are bluish fish, but you won't know whether there are small blue fish)
- They are statistics of the ensemble because we know them without knowing the properties of the individual objects
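The fish example can be made concrete: if size and colour are summarized separately, two displays with different pairings of features produce identical ensemble statistics, and only a check that binds size and colour on the same fish can tell them apart. A minimal sketch; the displays, the size cutoff and the function names are hypothetical.

```python
from statistics import mean

def ensemble_stats(fish):
    """Per-feature summaries, with no record of which features go together."""
    return {
        "mean_size": mean(size for size, _ in fish),
        "prop_blue": mean(1.0 if colour == "blue" else 0.0 for _, colour in fish),
    }

def has_small_blue(fish, small_cutoff=5.0):
    """Needs binding: checks size and colour of the same fish."""
    return any(size < small_cutoff and colour == "blue" for size, colour in fish)

display_a = [(2.0, "blue"), (8.0, "yellow")]  # contains a small blue fish
display_b = [(2.0, "yellow"), (8.0, "blue")]  # no small blue fish

print(ensemble_stats(display_a) == ensemble_stats(display_b))  # True
print(has_small_blue(display_a), has_small_blue(display_b))    # True False
```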

The non-selective pathway computes scene gist and layout very quickly

All images can be broken down into sine waves (spatial-frequency components), and it is these components that can be processed so quickly. The spatial-frequency content of scenes can be described along dimensions such as openness or roughness. Oliva and Torralba observed that, along these dimensions, scenes with the same meaning tend to be neighbours (see Figure 7.30, p. 209). We achieve a basic understanding of the meaning of a scene very quickly by analysing the spatial-frequency components of the image (global info). The selective pathway might contribute the identities and relative locations of a few objects (local info). The combination of global and local info might create a representation rich enough to specify a particular scene and to give you the impression of seeing a whole scene.
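The idea that an image can be broken down into sine waves can be illustrated with a discrete Fourier transform of a single row of pixel luminances: a slowly varying row puts its power at a low spatial frequency (coarse layout), while a finely textured row puts it at a high one. This is a one-dimensional toy sketch with hypothetical helper names; it does not implement Oliva and Torralba's scene dimensions, which are built from two-dimensional image spectra.

```python
import cmath
import math

def power_spectrum(signal):
    """Power at each frequency k = 0..N/2 (cycles per row), via a plain DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(coefficient) ** 2)
    return spectrum

def dominant_frequency(signal):
    """Frequency with the most power, ignoring k = 0 (the mean luminance)."""
    spectrum = power_spectrum(signal)
    return max(range(1, len(spectrum)), key=spectrum.__getitem__)

N = 16
coarse = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]  # one slow luminance cycle
fine = [math.sin(2 * math.pi * 6 * t / N) for t in range(N)]    # six rapid cycles (fine texture)
print(dominant_frequency(coarse), dominant_frequency(fine))     # 1 6
```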

Memory for objects and scenes is amazingly good

People are spectacularly good at remembering pictures. After seeing as many as 10,000 different images for about 2 seconds each, observers correctly classify 98% of the items as old or new. Even when the test includes images of the same object in a different pose, the correct classification rate is still 88%.

Memory for objects and scenes is amazingly bad: change blindness

Change blindness: the failure to notice a change between two scenes. If the gist or the meaning of the scene is not altered, quite large changes can pass unnoticed
Changes made to a scene while the eyes are in motion can go undetected

What do we actually see?

Inattentional blindness: a failure to notice a stimulus that would be easily reportable if it were attended