Virtual Reality
COMPSCI 118
Introduction
- Formal definition: Inducing targeted behavior in an organism using artificial sensory stimulation, with little or no awareness of the interference on the part of the organism
- Targeted behavior: A man-made experience
- Organism: Any organism, not just human
- Artificial sensory stimulation: One or more senses are “taken over” (at least partially) by the virtual world
- No awareness: “Fooled” to feel like the real world; sense of presence
- Music, movies, and paintings can be thought of as “virtual reality” through this definition
- Defined by Immanuel Kant as the reality in someone’s mind
- Jaron Lanier also defined a real world (the physical world) and a virtual world (the perceived world)
- Related terms: Augmented reality (AR), mixed reality (MR), XR, telepresence, teleoperation
- Open Loop vs. Closed Loop: Open loop systems don’t respond to the user’s actions, while closed loop systems feed the user’s input back into the experience
- Components
- Tracking: Input from the user; follows hand, head, body, and other movements
- Software: Renders and controls the virtual world
- Maintains consistency between real world and virtual world
- Matched zone: Walking in the real world should correspond to walking in the virtual one
- Display: Outputs the virtual world to the user
- The computer links all these things together
- A VR headset uses two different images for your two eyes in order to create the illusion of depth
- Instead of obscuring all other vision, AR uses pass-through monitors as lenses in order to project virtual objects onto the real world
- SAR (spatial augmented reality) wants to get rid of any wearables (i.e. headsets) and allow for seamless merging between the virtual and real worlds
- Some challenges with VR headsets
- Vergence: Headsets cannot emulate every depth cue; the eyes try to focus on something far away, but the screen stays at a fixed distance, causing discomfort to the user (the vergence-accommodation conflict)
- Weber’s law and Stevens’ power law: Users can physically feel a difference depending on the stimulus (a worked example follows this list)
- $P=KS^n$, where $K = \frac{\text{Difference Threshold}}{\text{Standard Weight}}$ is the Weber fraction, $P$ is the perceived magnitude, and $S$ is the stimulus strength
- If $n>1$, then we have expansion; if $n<1$, we have compression
- Electric shocks follow expansion (double the shock feels like more than double the pain), whereas brightness follows compression (double the light looks like less than double the brightness)
- McGurk Effect: If the lip movements you see and the audio you hear don’t match, you may perceive a different sound entirely
- Early VR displays included stereoscopes, head-mounted displays (HMDs), and Nintendo’s Virtual Boy
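A minimal Python sketch of Stevens’ power law as stated above; the exponents used (3.5 for electric shock, 0.33 for brightness) are approximate textbook values, and the function name is made up for this example:

```python
def stevens_power_law(stimulus: float, k: float = 1.0, n: float = 1.0) -> float:
    """Perceived magnitude P = K * S^n (Stevens' power law)."""
    return k * stimulus ** n

# Expansion (n > 1): doubling an electric shock more than doubles the pain.
pain_1x = stevens_power_law(1.0, n=3.5)
pain_2x = stevens_power_law(2.0, n=3.5)
print(pain_2x / pain_1x)  # ~11.3, far more than 2

# Compression (n < 1): doubling the light less than doubles the brightness.
bright_1x = stevens_power_law(1.0, n=0.33)
bright_2x = stevens_power_law(2.0, n=0.33)
print(bright_2x / bright_1x)  # ~1.26, less than 2
```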
Rendering
- Has multiple inputs
- 3D world: objects, lights, materials, textures
- Camera location, orientation, FoV
- Output: 2D image of the world from the camera
- Graphics Pipeline
- Modeling: Coordinate system and objects
- Viewing: Positions the camera/eye and discards objects that cannot be seen
- Illumination and shading
- Rasterization: Creating a 2D image from the 3D world
- Texture mapping
- Triangle Soup Model
- Vertices have a number of attributes, such as coordinates, colors, normals
- Normals define the direction a face is oriented; a vertex normal can be calculated by averaging the normals of the adjacent faces
- Triangles are defined as objects that connect vertices
- Techniques
- Rasterization: Project vertices from 3D onto 2D space and draw triangles between them to represent the polygons; done by the GPU
- Interpolation: Automatically generating transitions between colors, frames, polygons, etc. (see the sketch at the end of this section)
- Creates interpolation coefficients (barycentric weights) that blend the colors/normals of a triangle’s vertices across its surface
- Transformations
- Scaling: Apply a scaling matrix, defined as $S(s_x, s_y, s_z)$, onto a point to transform it
- Matrix has the scale factors on the diagonal; can be reversed using the inverse matrix, which is equivalent to $S(1/s_x, 1/s_y, 1/s_z)$
- Rotation: Apply a rotation matrix which rotates a point about one of the three axes using sine and cosine
- Translation: Must use a 4x4 matrix and convert the 3-vector into a 4-vector
- Homogeneous coordinates: A 3-vector and a 4-vector representing the same point in 3D; append an extra 1 to the 3-vector
- All of the previous transformations can be converted into 4x4 matrices in order to work with the homogeneous representation
- Shearing: Translating points along two of the three axes by a value proportional to the coordinate on the third axis; affects the shape of the object
- Can concatenate different transformations onto each other to perform complex operations
- Most notable is rotating/scaling about a fixed point by translating to the origin, performing the transformation, and inverting the first translation (see the transformation sketch after this section)
- An affine transformation is any transformation using a 4x4 matrix where the last row is 0 0 0 1
- Affine transformations cannot change the degree of a curve, and parallel lines cannot become intersecting lines
- In projective transformations, parallel lines can intersect and vice versa; used when rendering using a pinhole camera
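A minimal sketch of the barycentric interpolation used during rasterization, assuming a 2D triangle with made-up coordinates and per-vertex colors:

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of point p with respect to 2D triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w  # weights for a, b, c; sum to 1 inside the triangle

# Per-vertex colors (red, green, blue) blended at an interior point.
a, b, c = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])
colors = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
u, v, w = barycentric_weights(np.array([1.0, 1.0]), a, b, c)
print(u * colors[0] + v * colors[1] + w * colors[2])  # -> [0.5 0.25 0.25]
```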
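A minimal numpy sketch of the homogeneous 4x4 transforms above, including the translate-rotate-translate trick for rotating about a fixed point; the helper names are made up for this example:

```python
import numpy as np

def scale(sx, sy, sz):
    """4x4 homogeneous scaling matrix S(s_x, s_y, s_z)."""
    return np.diag([sx, sy, sz, 1.0])

def translate(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotate_z(theta):
    """4x4 rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m

# Homogeneous coordinates: append an extra 1 to the 3D point (2, 0, 0).
p = np.array([2.0, 0.0, 0.0, 1.0])

# Rotate 90 degrees about the fixed point (1, 0, 0): translate the fixed
# point to the origin, rotate, then invert the first translation.
about_fixed = translate(1, 0, 0) @ rotate_z(np.pi / 2) @ translate(-1, 0, 0)
print(about_fixed @ p)  # -> approximately [1, 1, 0, 1]

# The inverse of S(s_x, s_y, s_z) is S(1/s_x, 1/s_y, 1/s_z).
assert np.allclose(np.linalg.inv(scale(2, 3, 4)), scale(1/2, 1/3, 1/4))
```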
Graphics Pipeline
- Input: Soup of triangles
- Output: Image from a particular viewpoint, produced in real time
- The output is put into the frame buffer, which is updated according to the FPS and sent to the monitor
- Steps (done in the GPU; a toy software sketch of these stages follows this list)
- Vertex Processing: Process the vertices and normals
- Performs transformations on points and per vertex lighting
- Transformations include model, view, and projection transforms
- Rasterization: Convert the projected triangles into a set of fragments (candidate pixels)
- Fragment Processing: Process individual fragments
- Performs texturing and per fragment lighting
- Output Merging: Combine the fragments into the final 2D image for the display
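A toy software sketch of these four stages; real GPUs do this in dedicated hardware, and the rasterizer below is deliberately simplified (one flat-colored triangle, identity MVP, character-based frame buffer, counter-clockwise winding assumed):

```python
import numpy as np

WIDTH, HEIGHT = 24, 12

def vertex_stage(verts, mvp):
    """Vertex processing: apply the model-view-projection matrix, then the
    perspective divide, yielding normalized device coordinates (NDC)."""
    out = []
    for v in verts:
        clip = mvp @ np.append(v, 1.0)
        out.append(clip[:3] / clip[3])
    return out

def edge(a, b, p):
    """Signed area test: positive when p lies to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def raster_fragment_merge(ndc, frame):
    """Rasterization: map NDC [-1, 1] to pixel space and emit a fragment for
    every covered pixel; fragment processing writes a flat 'color', and
    output merging stores it in the frame buffer."""
    px = [((v[0] + 1) / 2 * WIDTH, (v[1] + 1) / 2 * HEIGHT) for v in ndc]
    for y in range(HEIGHT):
        for x in range(WIDTH):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            if all(edge(px[i], px[(i + 1) % 3], p) >= 0 for i in range(3)):
                frame[y][x] = "#"

frame = [[" "] * WIDTH for _ in range(HEIGHT)]
tri = [np.array([-0.8, -0.8, 0.0]), np.array([0.8, -0.8, 0.0]), np.array([0.0, 0.8, 0.0])]
raster_fragment_merge(vertex_stage(tri, np.eye(4)), frame)  # identity MVP
print("\n".join("".join(row) for row in frame))
```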
Vertex Processing
- Begins by arranging the objects in the world using a model transform
- Involves scaling, rotation, translation, and shear transformations to populate the world space
- Positions and orients the camera using a view transform
- Translate the camera to the origin ($T$) and then rotate it appropriately ($R$); final transformation is $M = RT$
- Defines properties of the camera (FOV, lens) and projects the 3D space onto the camera using a projection transform
- Uses gaze direction (shear), FOV, aspect ratio, near plane (image plane), and far plane (cuts off rest of scene) to create the 2D image
- Displays all objects inside of the view frustum, which is a 3D volume connecting the near and far planes
- Must be normalized so conversion to window coordinates is easy
- Final transformation matrix: ${v_{clip} = M_{proj} \cdot M_{view} \cdot M_{model} \cdot v}$
- Must clip objects to fit on screen by transforming coordinates again
- TLDR: Takes 3D vertices and puts them on a 2D screen, ensuring that only vertices that are “on-screen” are rendered (see the sketch below)
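A minimal numpy sketch of the full chain $v_{clip} = M_{proj} \cdot M_{view} \cdot M_{model} \cdot v$, assuming an OpenGL-style right-handed convention (the notes don’t fix one); the helper names are made up for this example:

```python
import numpy as np

def look_at(eye, target, up):
    """View transform M = R @ T: translate the camera to the origin (T),
    then rotate it to align with the axes (R)."""
    f = target - eye
    f = f / np.linalg.norm(f)               # forward (gaze direction)
    r = np.cross(f, up)
    r = r / np.linalg.norm(r)               # right
    u = np.cross(r, f)                      # true up
    rot = np.eye(4)
    rot[:3, :3] = np.vstack([r, u, -f])     # R
    trans = np.eye(4)
    trans[:3, 3] = -eye                     # T
    return rot @ trans

def perspective(fov_y, aspect, near, far):
    """Projection transform built from FOV, aspect ratio, and near/far planes."""
    t = 1.0 / np.tan(fov_y / 2)
    m = np.zeros((4, 4))
    m[0, 0] = t / aspect
    m[1, 1] = t
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = 2 * far * near / (near - far)
    m[3, 2] = -1.0
    return m

# v_clip = M_proj @ M_view @ M_model @ v
model = np.eye(4)                                        # object already in world space
view = look_at(np.array([0.0, 0.0, 3.0]),                # camera 3 units back
               np.array([0.0, 0.0, 0.0]),                # looking at the origin
               np.array([0.0, 1.0, 0.0]))
proj = perspective(np.radians(60), 16 / 9, 0.1, 100.0)

v = np.array([0.0, 0.0, 0.0, 1.0])                       # a vertex at the world origin
v_clip = proj @ view @ model @ v
print(v_clip[:3] / v_clip[3])                            # perspective divide -> NDC
```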