Introduction

  • Formal definition: Inducing targeted behavior in an organism by using artificial sensory stimulation, with the organism having little or no awareness of the interference
    • Targeted behavior: A man-made experience
    • Organism: Any organism, not just human
    • Artificial sensory stimulation: One or more senses are “taken over” (at least partially) by the virtual world
    • No awareness: “Fooled” to feel like the real world; sense of presence
    • Music, movies, and paintings can be thought of as “virtual reality” through this definition
  • Immanuel Kant framed reality as what exists in someone’s mind
  • Jaron Lanier similarly distinguished the real world (the physical world) from the virtual world (the perceived world)
  • Different terms for VR: Augmented reality (AR), mixed reality (MR), XR, telepresence, teleoperation
  • Open Loop vs. Closed Loop: Open-loop systems don’t let the user interact with the virtual world, while closed-loop systems do
  • Components
    • Tracking: Input from the user; captures hand, head, body, etc. movements
    • Software: Renders and controls the virtual world
      • Maintains consistency between real world and virtual world
      • Matched zone: walking in the real world should map to walking in the virtual one
    • Display: Outputs the virtual world to the user
    • The computer links all these things together
  • A VR headset uses two different images for your two eyes in order to create the illusion of depth
    • Instead of obscuring all other vision, AR uses see-through or pass-through displays as lenses in order to overlay virtual objects onto the real world
    • SAR (spatial augmented reality) aims to eliminate wearables (e.g., headsets) entirely and allow a seamless merging of the virtual and real worlds
  • Some challenges with VR headsets
    • Vergence-accommodation conflict: Headsets cannot emulate all aspects of depth; the eyes converge on something far away, but the screen’s focal distance stays the same, causing discomfort to the user
    • Weber’s law and Stevens’ power law: Users can physically feel a difference in sensation depending on the stimulus strength (see the sketch after this list)
      • $P=KS^n$, where $K = \frac{\text{Difference Threshold}}{\text{Standard Weight}}$ is the Weber fraction, $P$ is the perception, and $S$ is the stimulus strength
      • If $n>1$, then we have expansion; if $n<1$, we have compression
        • Electric shocks follow expansion (double the shock is more than double the pain), whereas brightness follows compression (double the light is less than double the brightness)
    • McGurk Effect: If the lip movements and the audio don’t match, you perceive a different sound entirely
  • Early VR devices included stereoscopes, head-mounted displays (HMDs), and Nintendo’s Virtual Boy
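
A minimal sketch of Stevens’ power law in Python, assuming $K = 1$ and the commonly cited exponents of about 3.5 for electric shock and 0.33 for brightness (illustrative values, not measurements):

```python
import numpy as np

def perceived_magnitude(S, K, n):
    """Stevens' power law: P = K * S**n."""
    return K * S ** n

S = np.array([1.0, 2.0])  # a stimulus and its doubling

# Exponents are illustrative: shock ~3.5 (expansion), brightness ~0.33 (compression)
shock = perceived_magnitude(S, K=1.0, n=3.5)
brightness = perceived_magnitude(S, K=1.0, n=0.33)

print(shock[1] / shock[0])            # ~11.3: double the shock feels far more than twice as painful
print(brightness[1] / brightness[0])  # ~1.26: double the light feels far less than twice as bright
```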

Rendering

  • Has multiple inputs
    • 3D world: objects, lights, materials, textures
    • Camera location, orientation, FoV
  • Output: 2D image of the world from the camera
  • Graphics Pipeline
    • Modeling: Coordinate system and objects
    • Viewing: Positions the camera/eye and discards objects that aren’t visible
    • Illumination and shading
    • Rasterization: Creating a 2D image from the 3D world
    • Texture mapping
  • Triangle Soup Model
    • Vertices have a number of attributes, such as coordinates, colors, normals
      • Normals define the direction a face is oriented; a vertex normal can be calculated by averaging the normals of its adjacent faces (see the first sketch after this list)
    • Triangles are defined as objects that connect three vertices (typically referenced as indices into the vertex list)
  • Techniques
    • Rasterization: Project vertices from 3D onto 2D space and draw triangles between them to represent the polygons; done by the GPU
    • Interpolation: Automatically generating transitions between colors, frames, polygons, etc.
      • Computes interpolation weights (barycentric coordinates) from the fragment’s position inside the triangle, then blends the colors/normals of the triangle’s three vertices (see the rasterizer sketch after the Graphics Pipeline list)
  • Transformations
    • Scaling: Apply a scaling matrix, defined as $S(s_x, s_y, s_z)$ onto a point to transform it
      • Matrix has parameters on the diagonal; can be reversed using the inverse matrix, which is equivalent to $S(1/s_x, 1/s_y, 1/s_z)$
    • Rotation: Apply a rotation matrix which rotates a point about one of the three axes using sine and cosine
    • Translation: Must use a 4D matrix and convert the 3-vector into a 4-vector
      • Homogeneous coordinates: A 3-vector and a 4-vector representing the same point in 3D; append an extra 1 as the fourth component
      • All of the previous transformations can be converted into 4D matrices in order to work with the homogenous representation
    • Shearing: Translating points along two of the three axes by amounts proportional to the coordinate on the third axis; changes the shape of the object
  • Can concatenate different transformations onto each other to perform complex operations
    • Most notable is rotating/scaling about a fixed point: translate the point to the origin, perform the transformation, and invert the first translation (see the second sketch after this list)
  • An affine transformation is any transformation using a 4x4 matrix where the last row is 0 0 0 1
    • The degree of a curve can’t be changed, and parallel lines cannot become intersecting lines
  • In projective transformations, parallel lines can intersect and vice versa; used when rendering using a pinhole camera
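
A minimal sketch of per-vertex normals for a triangle soup, assuming vertices as an N×3 NumPy array and triangles as index triples (the tetrahedron data is illustrative):

```python
import numpy as np

def vertex_normals(vertices, triangles):
    """Average each face's normal into its vertices, then normalize."""
    normals = np.zeros_like(vertices)
    for i0, i1, i2 in triangles:
        # Face normal from the cross product of two edge vectors
        face_n = np.cross(vertices[i1] - vertices[i0],
                          vertices[i2] - vertices[i0])
        normals[[i0, i1, i2]] += face_n
    # Normalize each accumulated sum to unit length
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

# A unit tetrahedron as a tiny triangle soup
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(vertex_normals(verts, tris))
```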
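
And a minimal sketch of concatenating homogeneous 4×4 transforms to rotate about a fixed point; a z-axis rotation is assumed for brevity, and note that the matrices apply right to left:

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def scaling(sx, sy, sz):
    # Parameters on the diagonal; the inverse is S(1/sx, 1/sy, 1/sz)
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

# Rotate 90 degrees about the fixed point p = (1, 1, 0):
# translate p to the origin, rotate, then invert the first translation
p = (1.0, 1.0, 0.0)
M = translation(*p) @ rotation_z(np.pi / 2) @ translation(-p[0], -p[1], -p[2])

v = np.array([2.0, 1.0, 0.0, 1.0])  # homogeneous point: append an extra 1
print(M @ v)                        # -> [1., 2., 0., 1.]
```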

Graphics Pipeline

  • Input: Soup of triangles
  • Output: Image from a particular viewpoint, produced in real time
    • The output is put into the frame buffer, which is updated according to the FPS and sent to the monitor
  • Steps (done in the GPU)
    • Vertex Processing: Process the vertices and normals
      • Performs transformations on points and per vertex lighting
      • Transformations include model, view, and projection transforms
    • Rasterization: Convert each projected triangle into a set of fragments (candidate pixels); see the sketch after this list
    • Fragment Processing: Process individual fragments
      • Performs texturing and per fragment lighting
    • Output Merging: Combine fragments into the final 2D image for the display (e.g., via depth testing)
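
A minimal software sketch of the last three stages for one triangle, assuming clip-space input and screen-space (not perspective-correct) interpolation; buffer sizes are illustrative, and in practice all of this runs on the GPU:

```python
import numpy as np

WIDTH, HEIGHT = 16, 16

def edge(a, b, p):
    """Signed area of (a, b, p); the 'edge function' used for coverage."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def render_triangle(verts_clip, colors, color_buf, depth_buf):
    ndc = verts_clip[:, :3] / verts_clip[:, 3:]        # perspective divide
    # Viewport transform: NDC [-1, 1] -> pixel coordinates
    pix = np.column_stack([(ndc[:, 0] + 1) * WIDTH / 2,
                           (1 - ndc[:, 1]) * HEIGHT / 2])
    a, b, c = pix
    area = edge(a, b, c)
    for y in range(HEIGHT):                            # rasterization
        for x in range(WIDTH):
            p = (x + 0.5, y + 0.5)
            u = edge(b, c, p) / area                   # barycentric weights
            v = edge(c, a, p) / area
            w = edge(a, b, p) / area
            if min(u, v, w) < 0:                       # fragment not covered
                continue
            # Fragment processing: interpolate per-vertex attributes
            color = u * colors[0] + v * colors[1] + w * colors[2]
            depth = u * ndc[0, 2] + v * ndc[1, 2] + w * ndc[2, 2]
            if depth < depth_buf[y, x]:                # output merging: depth test
                depth_buf[y, x] = depth
                color_buf[y, x] = color

color_buf = np.zeros((HEIGHT, WIDTH, 3))
depth_buf = np.full((HEIGHT, WIDTH), np.inf)
tri = np.array([[-0.8, -0.8, 0.0, 1.0],
                [ 0.8, -0.8, 0.0, 1.0],
                [ 0.0,  0.8, 0.0, 1.0]])               # clip-space vertices
rgb = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
render_triangle(tri, rgb, color_buf, depth_buf)
```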

Vertex Processing

  • Begins by arranging the objects in the world using a model transform
    • Involves scaling, rotation, translation, and shear transformations to populate world space
  • Positions and orients the camera using a view transform
    • Translate the camera to the origin ($T$) and then rotate it appropriately ($R$); final transformation is $M = RT$
  • Defines properties of the camera (FOV, lens) and projects the 3D space onto the camera using a projection transform
    • Uses gaze direction (shear), FOV, aspect ratio, near plane (image plane), and far plane (cuts off rest of scene) to create the 2D image
    • Displays all objects inside of the view frustum, which is the 3D volume connecting the near and far planes
      • Must be normalized so conversion to window coordinates is easy
    • Final transformation: $v_{clip} = M_{proj} \cdot M_{view} \cdot M_{model} \cdot v$ (see the sketch after this list)
    • Must clip objects to fit on screen by transforming coordinates again
  • TLDR: Takes 3D vertices and puts them on a 2D screen, ensuring that only vertices that are “on-screen” are rendered
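
A minimal sketch of this chain, assuming an OpenGL-style symmetric perspective matrix and an identity camera rotation (all values illustrative):

```python
import numpy as np

def view_transform(camera_pos, R):
    """M_view = R @ T: translate the camera to the origin, then rotate."""
    T = np.eye(4)
    T[:3, 3] = -np.asarray(camera_pos)
    return R @ T

def projection(fov_y, aspect, near, far):
    """Symmetric perspective projection (OpenGL-style clip space assumed)."""
    f = 1.0 / np.tan(fov_y / 2.0)
    M = np.zeros((4, 4))
    M[0, 0] = f / aspect
    M[1, 1] = f
    M[2, 2] = (far + near) / (near - far)
    M[2, 3] = 2.0 * far * near / (near - far)
    M[3, 2] = -1.0                       # puts -z_view into w for the divide
    return M

M_model = np.eye(4)                      # object already placed in world space
M_view = view_transform([0, 0, 5], np.eye(4))  # camera 5 units up the z-axis
M_proj = projection(np.pi / 3, 16 / 9, near=0.1, far=100.0)

v = np.array([0.0, 1.0, 0.0, 1.0])       # homogeneous vertex
v_clip = M_proj @ M_view @ M_model @ v   # v_clip = M_proj . M_view . M_model . v
v_ndc = v_clip[:3] / v_clip[3]           # perspective divide -> normalized coords
print(v_ndc)                             # inside [-1, 1]^3, so it's on-screen
```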