The 9th iteration of the University of Toronto Interdisciplinary Symposium on the Mind (UTism) is here, titled ‘Vision and Visual Perception: How Does Vision Affect Cognition?’

Date: February 1-2, 2020

— Conference Abstract —

Vision is widely considered one of our most important biological senses, if not the most important. It involves a vast network of complex cognitive and visual pathways and has important implications for the internal workings of our body. The relationship between vision and cognition is therefore of major interest in contemporary research. Studying vision may reveal fascinating details about the mind and unearth further connections between the physical and the mental.

UTism will address the main question, “how does vision affect cognition?” from an interdisciplinary perspective based on contemporary research in philosophy, psychology, computer vision, neuroscience, and sign language. UTism seeks to generate relevant and novel insights into the cognitive science of vision and visual perception.

About UTism  

Cognitive science is the interdisciplinary study of the mind, integrating knowledge from psychology, philosophy, computer science, neuroscience, and linguistics. UTism aims to explore contemporary issues in cognitive science through interdisciplinary dialogue among leading minds in cognitive science research. The goal is to bring together the different academic disciplines related to cognitive science to provide a holistic, multifaceted understanding of these issues through an accessible platform.

The main learning outcome from this conference is the novel exposure to interdisciplinary research that will hopefully stimulate your intellectual and personal interests in the study of the mind.



Nico Orlandi (Associate Professor of Philosophy, UC-Santa Cruz)

Talk: How Can Vision Affect Cognition?

Visual perception, like perception more generally, is often taken to be iconic. Vision represents the environment in the way that maps, realist pictures, photographs and diagrams do, rather than in the way in which linguistic propositions do. Cognitive states, such as beliefs, by contrast, are commonly thought to be propositional and to have concepts as constituents. This general picture would seem to suggest that there is a substantial difference between seeing something and thinking about it. It also seems undeniable, however, that, at least in some instances, we acquire cognitive states in virtue of seeing. We think that something is present, for example, because we see it. Or we learn what something is by seeing it repeatedly. In this talk, I offer some reasons for thinking that perception is iconic rather than propositional, and then propose a view of how visual states may come to change format to become part of our conceptual and justificatory repertoire.

Chia-Chien Wu (Post-Doctoral Fellow, Harvard University)

Talk: We Have Underestimated Your Capacity: Multiple Object Awareness

Humans live in a dynamic world that requires us to constantly update our representations of this world in order to function effectively. However, we have only a limited capacity to monitor things around us. The standard visual cognition account of this capacity has been based mainly on studies using the Multiple Object Tracking (MOT) paradigm. In classic MOT experiments, observers (Os) try to track N out of M identical objects as they move around in a display. When Os are asked to find all N tracked items or to indicate whether a particular item was in the tracked set, the calculated capacity is typically about 4 objects. In the real world, items are not usually identical to each other.

We used a “Multiple Identity Tracking” (MIT) task to mimic this. In the MIT task, Os are asked to track N distinct moving objects. We found that capacity was even lower: ~2-3 items. This limitation seems counter-intuitive, as we usually feel much more aware of our surroundings. We propose that the inconsistency arises because we often know the approximate, but not the exact, location of an object, whereas MOT and MIT tasks require the exact location. To capture this partial information, we have developed a new Multiple Object Awareness (MOA) paradigm. It demonstrates that our capacity has been dramatically underestimated. Your MOA capacity may be more representative of your ability to track information in this dynamic world.


John Vervaeke (Assistant Professor, Cognitive Science & Psychology, University of Toronto)

Talk: Seeing into Being on Mars

This talk will explore how scientists construct visual images from the rovers on Mars. Given the time lag, the causal link is delayed and the images are often artificially augmented, and yet the scientists report feeling present on Mars and enactively identify with and through the rovers for very successful performance. This talk will explore the enactive and socially embedded processing that supports the successful visual perception of the Martian landscape, to the point where the scientists feel they are on Mars in and through the rovers.

Jim John (Assistant Professor, Cognitive Science & Philosophy, University of Toronto)

Talk: How to Be a Materialist about Visual Consciousness

Materialists about visual consciousness believe that the qualitative properties of visual states are “nothing but” physical properties of the brain. But they recognize that this view is puzzling and difficult to believe: just how could the reddish quality of seeing a tomato really be nothing but, say, recurrent processing in visual cortex? Anti-materialists seek to derive from this sense of puzzlement positive arguments against materialism. This talk will address the most fundamental of these arguments, the “argument from revelation.” I will argue that there is a plausible way for materialists to respond to this argument and explain why, despite the plausible response, I’m not a materialist and you shouldn’t be one either.

Susanne Ferber (Associate Professor, Psychology & Collaborative Program in Neuroscience, University of Toronto)

Talk: Remembrance of Things Present

How do we keep track of oncoming traffic while checking our rear-view mirrors? Selecting and sustaining a percept after it has been removed from view involves attentional and visual working memory (VWM) processes that enable us to hold in our mind’s eye the contents of our visual awareness and to decide which input will continue forward for further processing. VWM is a temporary storage system for visual input in which information is kept safe for a few seconds across eye movements and other intervening events. From crossing the street to calculating the tip at a restaurant, our success on countless basic and higher-order tasks depends on our ability to mentally represent external events, so that they can be integrated with the attentional system and long-term knowledge to determine our decisions and actions.

Most theories of VWM posit a capacity limit of fewer than four items. This is in stark contrast to our subjective experience of the present moment and of events recalled from memory. Normally the VWM system works well, but it may be disrupted in healthy individuals when it is overloaded, may show different characteristics in populations with neurodevelopmental disorders, and may fail altogether after brain damage. Despite our growing understanding of the neural and cognitive processing limitations of VWM, we know little about how VWM ultimately affects visually-guided behaviour. Indeed, the interactions between this memory system and other cognitive faculties remain relatively uncharted research territory. I will present converging evidence from multidisciplinary studies to elucidate the cognitive and neural mechanisms that govern the interactions of VWM with other cognitive faculties.

John Tsotsos (Professor, Electrical Engineering & Computer Science, York University)

Talk: Towards Understanding the Roles of Saliency in Perception

The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing time efficiency in machine algorithms, which in turn have strengthened the suggestion that human vision might also employ a similar early selection strategy. In fact, this is exactly what Broadbent proposed in his classic 1958 work known widely as Early Selection Theory. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision.

How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization. The question that naturally arises, then, is: what could saliency be good for? Eye movements seem to significantly involve visual saliency. We describe a novel fixation control strategy that performs at human levels. Interestingly, commonly used image-driven saliency alone will not lead to that performance, and it seems that the interaction of several different visual representations is the key.


James Elder (Professor, Human and Computer Vision, York University)

Talk: 2D and 3D Shape from Contour

A considerable portion of our primate visual cortex is involved in the coding of object shape. While objects in our visual world are generally 3D, the boundary of a 3D object projects to the retina as a closed 2D contour, and the shape of this contour provides important information about object shape and identity. In this talk, I will review a series of psychophysical and computational studies that provide insights into how brains and computers can detect and organize image contours to extract 2D and 3D shape representations of the objects that populate both natural and built environments.






Corrine Occhino (Research Faculty, Center on Cognition and Language, Rochester Institute of Technology)

Talk: Embodied Cognition, Visual Language and the Emergence of Form

Vision and visual schematization have an important role to play in conceptualization and in the creation of grammatical structure. Following the embodied turn in cognitive science, we see spoken language and written language as inextricably anchored to the visual modality. In this talk I will argue that signers and signed languages offer a unique look into cognition when the primary sensory mode of experience unfolds in the visual modality. Looking at language processing in American Sign Language (ASL), we find that tasks which do not require overt semantic processing nevertheless are influenced by meaningful, ‘visually iconic’ aspects of signs. My work on iconicity in the visual modality has shown that while iconicity can impact processing, it interacts with several experiential factors including language proficiency and socio-cultural experience.

How can these findings from studies on the role of iconicity in signed language processing be reconciled within a larger, psychologically plausible framework of language and cognition? I will conclude my talk by discussing the mechanisms underlying findings from embodied and situated language processing and the findings presented on iconicity effects in signed language processing. I argue that the key to bringing research on signed languages and spoken languages together is to recognize that language users visualize scenes to make sense of the language that they process. Understanding the interaction between language users’ experiences and the multimodality of the linguistic signal will be important as we investigate the role of vision in the emergence of linguistic structure.