Masters Thesis Research

Abstract

3D models representing cities contain massive amounts of geometry and textures. City models can be acquired using aerial imagery and ground laser scans or generated procedurally [Mueller06]. The resulting models can have polygon counts in the billions; e.g. a procedurally generated, high detail model of Pompeii has about 1.4 billion polygons.

Brute force rendering of city models cannot achieve interactive frame rates even on today’s fastest GPUs. Furthermore, entire models are unlikely to fit into system memory.

For my masters thesis, I plan to explore real time rendering of massive city models, including the following techniques.

  • Polygonal simplification [Luebke01] and hierarchical levels of detail (HLOD) generation.
  • Rendering using HLODs [Erikson01]
    • View frustum culling
    • LOD selection
    • Out-of-core memory management
  • Rendering with occlusion culling [Cohen03]

There may be room for innovation by introducing novel combinations of HLOD and occlusion culling, which [Erikson01] sites as future work. [Mueller06] states that LOD techniques for massive city model need to be developed to support real-time rendering.

Furthermore, I hope to apply distributed systems to HLOD generation, which is the aim of my CIS 505 project. See my proposal: PreprocessingCityModels.doc for details. Dr. Boon Thau Loo is advising me on the distributed systems aspects.

My thesis adviser is Dr. Norman Badler.

Work Log

If this is your first visit, read the work log bottom up.

Wednesday 7/30/2008

I read Efficient Occlusion Culling in GPU Gems. This is a good article that provides an introduction to hardware occlusion queries and a strategy to avoid CPU stalls. Here is the basic algorithm:

Check occlusion query from previous frame.
begin query
for each visible object
render with color and depth buffer writes enabled // Potential Occluder
for each previously invisible object
render bounding volume with color and depth buffer writes disabled // Potential Occludee
end query

The author makes a lot interesting points:

  • On newer hardware (well, probably all of today's hardware), rendering with just the depth test is a faster then writing color/depth/etc.
  • If a bounding volume intersects the near plane, it's back faces need to be rendered or the occlusion query should be skipped.
  • Large bounding volumes can lead to a fill rate problem, consider using more then one volume to surround an object.

Tuesday 7/29/2008

I've read Real-Time KD-Tree Construction on Graphics Hardware. In summary, they present a GPU algorithm that takes advantage of the 100-1000s of threads available on a GPU compared to the previous parallel CPU algorithms that only have a few cores available so memory layout becomes important. Their GPU algorithm is as fast as the fastest CPU algorithm. Tree nodes are built in a breath-first order, every node at the same tree level spawns a new thread. CPU algorithms have to go depth-first for nodes near bottom or too many threads will be created. In the GPU algorithm, large, upper-level nodes are built by parallelizing over primitives instead of nodes. They also present a scheme for fast evaluation of node split costs. I have a few ideas from this paper.

  • Since the CPU doesn't do much, except for some coordination during the GPU algorithm, can we come up with a CPU-GPU hybrid algorithm that keeps both busy? Perhaps the CPU is better at the top levels then the GPU. Maybe modify Experiences with Streaming Construction of SAH KD-Trees to use the GPU for lower-level nodes instead of switching to depth-first.
  • My current HLOD code uses uniform partition, as suggested in [Erikson01]. Perhaps we can do the partitioning kd-tree style. Maybe do the simplification on the CPU while the GPU does the partitioning?

I also read the first half of a great paper on using occlusion queries with HLOD: Optimized HLOD Refinement Driven by Hardware Occlusion Queries. In summary, HLOD refinement (moving from parent to children) is based on both the screen-space error (SSE) and the predicted result of hardware occlusion queries. The HOQ is predicted to avoid CPU stalls and GPU starvation. The HOQ is modulated with the SSE so a higher SEE can be tolerated if the node takes up less number of pixels. A linear scale can lead to artifacts so a sigmoid bias function is suggested (can't this just be a cubic Bezier curve?). They compute relative visibility = unoccluded pixels / pixels object projects to (Idea: benefits from tight bounding volume!). When they visit a node, they either stop (render), refine (issue query, visit children), or delay (issue query).

I'd like to extend this work if possible. They use octtree partitioning and are in-core, perhaps I can use kd-tree and go out-of-core.

Saturday 7/26/2008

I've read through some of [Alliez05] (Recent Advances in Compression of 3D Meshes) since this is an area I know almost nothing about. Compression is done either loss-less or lossy. The vertices themselves can be compressed as well as the connectivity (indices). The algorithms are orthogonal to each other. From a quick google search and browsing the CUDA website, it doesn't look like much work has been done on mesh compression using CUDA, although I've seen vertex decompression in vertex shaders before. I thought about a CUDA implementation of the EdgeBreaker algorithm but from what I can tell, it will not map to the GPU well. Here is an interesting statement from that paper that I didn't know:

"For all meshes that are homeomorphic to a sphere, and in fact for most meshes in practice, the number of triangles is roughly twice the number of vertices."

Improving Pre-processing

"Even though the concepts of classic spatial indexes are simple and well understood, constructing them for massive models requires specialized methods that have a low computational complexity and coherent access patterns to avoid I/O thrashing, while at the same time providing optimized space partitions for visibility queries." [Gobbetti08]

Three things are required

  • Boxes of objects, not individual triangles, are used
  • test K heuristically selected planes
  • SAH is computed in single streaming pass

Here are some good slides on kd-trees. There is a lot to read.

Backup Plan

I'm doing a lot of reading looking for an area I can make a contribution. I am going to wait until after SIGGRAPH until I make any decisions but so far I am leaning towards either accelerating pre-processing using CUDA or combing HLOD and occlusion culling in a novel way. If I can't come up with anything I can fall back on my HLOD work from CIS 505: We can try different splitting plane strategies, different strategies for triangles crossing nodes, and another LOD algorithm.

Wednesday 7/23/2008

I've finished reading Cohen's visibility survey [Cohen03]. They site reducing preprocessing time and space, and integrating occlusion culling with other visibility techniques as part of the work remaining so it looks like I'm on the right track.

This survey lead me to [Wonka00] who pre-computes per-region visibility with occluder fusion by taking samples from the region and shrinking each occluder to keep the algorithm conservative. Wonka builds on his occluder shadow paper to implement this algorithm for 2.5D scenes. Perhaps we can extend it to 3D scenes by fitting an non-uniform grid around the scene that is more dense near the "bottom" where more occlusion is expected. As I've said before, we should also try to precompute with CUDA.

I also want to read Level of Detail Based Occlusion Culling for Dynamic Scenes.

Monday 7/21/2008

I've read most of [Martens03]. Can we modify HLOD trees to support hardware occlusion queries (2.5)? We will need big triangles high in the tree and to sort nodes front to back during rendering

Monday 7/14/2008

I finished [Gobbetti08] and don't have much more to say about it. I'm probably not interested in the cache-coherent layouts or mesh compression but I want to skim [Alliez05] before I completely rule out mesh compression.

I've started reading a survey on occlusion culling [Martens03]. I'm considering using CUDA to precompute per-region visibility but need to read a lot more before I know if this is a good idea and if its already been done.

Saturday 7/12/2008

I have a lot to read. I need to focus on reading slower and paying more attention, if if that means reading less.

I've read the first half of [Gobbetti08] and have a few ideas from it:

  • Can our area be out-of-core creation of HLOD (2.1.1)?
  • Can we use cone of normals for (clustered) back-face culling on the CPU (2.1.2)? Since ever building is likely to have half its triangles culled anyway. Use two planes perpendicular to the ground to create 4 clusters. This may cut memory bandwidth in half.
  • When pre-computing per-region visibility, only consider occluders within a threshold distance to make pre-computing fast. People must be doing this already.

Also, I written most of my abstract and will post it in a few days.

Wednesday 7/09/2008

The schedule for my thesis will something like:

  • Today - SIGGRAPH (mid August): Continue reading and researching. Come up with an abstract and outline.
  • SIGGRAPH - Late November: Implementation and writing.

The current goal is to determine to best way to combine HLOD and occlusion culling in urban environments. This is view dependent, in a walk-thru, occlusion culling is highly effective, in a fly-thru, LOD is more important. We will, of course, also survey the field of massive model visualization as it pertains to urban environments.

I've already read a lot but still have the following on my reading list (in order):

  • [Dietrich07] Massive-Model Rendering Techniques: A Tutorial (almost done this one)
  • [Gobbetti08] Technical Strategies for Massive Model Visualization
  • [Cohen03] A Survey of Visibility for Walkthrough Applications
  • [Alliez05] Recent Advances in Compression of 3D Meshes

Tuesday 7/08/2008 (Start of Thesis)

I watched the SIGGRAPH 2007 course on Massive Model Visualization (SIGGRAPH members can access this using the ACM Digital Library. It was a good course but mostly contained thing I'm already familiar with. Here are the more noteworthy points:

  • One technique didn't use LOD and instead used hardware occlusion queries to minimize overdraw. Perhaps we can do some occlusion culling with CUDA? We should look at "Visibility-guided rendering to accelerate 3D graphics hardware performance."
  • Handling general models is hard: different densities, different degrees of depth complexity, different number of materials, etc.
  • People think we will have to ray trace for quality because of number of materials and effects that require multi-pass increase linearly w/ amount of geometry and also hurt frame to frame coherence.
  • There is interest in higher order primitives, such as surfaces, for massive models. Perhaps we can make use of geometry shaders?
  • Other issues: textures, pre-processing time, dynamic changes to model, …

I need to read Massive-Model Rendering Techniques: A Tutorial in IEE Computer Graphics and Applications. It looks really good.

Monday 4/28/2008

I should read this all the way thru at some point: Efficient view-dependent out-of-core visualization

Friday 4/25/2008

Here is the paper: PreprocessingCityModels.pdf
Here is a near final version of the slides: PreprocessingCityModels.ppt

The abstract:

We present a parallel algorithm for creating hierarchical levels of detail (HLODs). Our algorithm exploits two orthogonal degrees of parallelism. First, polygonal simplification executes on separate nodes in parallel. In addition, sub-trees are processed and spatially subdivided in parallel. We demonstrate our algorithm with a multi-threaded C++ implementation that takes advantage of multi-core processors.

Tuesday 4/22/2008

A draft of my slides is now available: PreprocessingCityModels.ppt. I've also made a draft of my paper I haven't cleaned it up enough to the point where I want to post it.

The following timings were done on a hyper-threaded dual core Xeon 2.66 GHz with 2 GB RAM. The city model used is 5,646,041 triangles with 3,394,672 unique vertices and takes up 205,217 KB on disk in the wavefront format. All timings are in seconds.

The following are constants across all timings and have been factored out to isolate the speedup of the parallel algorithm.

Read model into memory ~8.55
Compute per vertex weight ~7.62

2603 nodes are created when the tree height is 4. The single threaded timing is:

Read model into memory 8.45521
Compute per vertex weight 8.01062
Total spatial subdivision 13.452
Total polygonal simplification 4.48443
Total node write disk IO 31.5126
Total 65.9885

When factoring out the constants, the following percentage of time is spent on each task:

Total spatial subdivision 27%
Total polygonal simplification 9%
Total node write disk IO 64%

Without the constants, the total time is:

Total without constants 49.52267

Observe

  • Once per-vertex weights are created, vertex clustering is incredibly fast. It is just 2 passes over the input vertices, 1 pass over the input indices and 1 pass over the output indices.
  • Having each node write to disk generates too much disk IO. A production quality implementation should batch up writes to improve performance.
  • Given the speed of vertex clustering and the IO overhead, how well will this algorithm scale to a distributed system?

The following timings should show simplification and/or sub-tree workers:

Simplification Workers:

None 1 simplification workers 2 simplification workers 3 simplification workers
49.52 42.63 41.6 41.89

Sub-tree Workers

None 2 sub-tree workers 3 sub-tree workers 4 sub-tree workers
49.52 44.66 44.34 44.75

Sub-tree Workers - Only root starts workers

None 2 sub-tree workers 3 sub-tree workers 4 sub-tree workers
49.52 40.94 42.04 43.47

Hybrid of Simplification and Sub-tree Workers

None 1 simplification worker, 2 sub-tree workers (off root)
49.52 45.35

Observe

  • Thread creation is overhead. When using sub-tree workers, it is more efficient to only let the root start threads.
  • Since the single threaded node write disk IO time is 31.51, we can't expect to get below this. 41.6 for 2 simplification workers is pretty good considering the amount of spatial subdivision work the root node needs to do before any more then the first simplification worker can run.
  • The hybrid doesn't appear to perform well but it requires 3 cores and this is a 2 core machine.

The next test removes the node write disk IO to test the raw parallel algorithm. The tree height is 6, which creates 95,479 nodes.

Total spatial subdivision 22.5812
Total polygonal simplification 9.61928
Total time 32.882

The following timings should show simplification and/or sub-tree workers:

Simplification Workers:

None 1 simplification workers 2 simplification workers
32.882 23.65 24.15

Sub-tree Workers - Only root starts workers

None 2 sub-tree workers 3 sub-tree workers
32.882 20.46 21.01

Hybrid of Simplification and Sub-tree Workers

None 1 simplification worker, 2 sub-tree workers (off root) 2 simplification workers, 2 sub-tree workers (off root)
32.882 19.32 19.51

Monday 4/21/2008

I have good news and bad news. Well, it may just be good news. The tree creation in the table from Thursday that took 38 seconds for a single thread now takes 8 seconds for a single thread. I was using the entire vertex data for every simplification and didn't realize that creating the clusters was O(n) with n being the number of vertices not the number of triangles. So I changed the child node creation to pass along just the vertices in the child node and attained a 4x speed increase.

Now the bad news. The parallel algorithm is now less effective since the bottleneck has shifted from the simplification to the disk I/O. That and the fact that the work at the root node is significant (Although I wouldn't go so far as to say that it dominates) and has to be done, to some extent, before the parallelism.

Regardless, the parallel algorithm is still useful for deep trees (e.g. height > 5), which will be the norm for rendering. Furthermore, a nice combination of simplification parallelism and spatial subdivision parallelism appears to provide good results that I can almost rationalize. I will post timings tomorrow.

Saturday 4/19/2008

Here is an draft abstract for the final report:

We present a parallel algorithm for creating hierarchical levels of detail (HLODs). Our algorithm exploits three orthogonal degrees of parallelism. First, polygonal simplification of nodes is executed in parallel. Next, sub-trees are spatial subdivided in parallel. Finally, polygonal simplification algorithms such as vertex clustering can be parallelized. We demonstrate our algorithm with a multi-threaded C++ implementation running on a single machine that scales xxx with the number of cores.

I've also put together about 10 slides so far. Most importantly, I made sure I could load the largest model I have (again from: Pascal Mueller and Simon Haegler). It is 5,646,041 triangles with 3,394,672 unique vertices. Here is a wireframe with no simplification:

10blocks.jpg

Thursday 4/17/2008

I finished making the simplification step run in parallel using threads. This included switching to boost's thread pool and smart pointers and a lot of code cleanup and testing.

When the algorithm starts, the model is simplified on a separate thread to create the root node and the main thread spatial subdivides the model. Here is an image of subdivision into two half-spaces.

division2.jpg

You'll notice that the triangles that are not completely within an AABB are included in both AABBs. I'll address this after the semester. Although it is easy to see what is going on in the above image, subdivision really creates 8 children like the following image.

division8.jpg

The model isn't orientated nicely but as subdivision continues, the empty children will be thrown away so it won't be a big deal.

The model in the above image was made available from Pascal Mueller and Simon Haegler. It is 543,428 triangles with 341,315 unique vertices. It is not a "massive model" still useful for quick performance tests to see how threading the simplification helped performance:

# Worker Threads 0 1 3 10
tree height = 1 3.94 3.34 2.97
tree height = 3 38.36 33.92 24.45 36

All timings are in seconds, using a release build and include constants such as file I/O to read and parse the model (19,576 KB) and the initial pass over the vertices to compute weights.

The tree of height 1 includes the root (depth 0) and its 8 children (depth 1). In this case, simplification is only done for the root but the leaf nodes are still written to disk in worker threads. With the tree height of 3, it could have has many as 1 + 8 + 64 + 512 nodes, so threading should help more. You'll notice that too many threads trash the L2 cache and produced a wide array of timings, all worst than 3 threads.

Currently, individual nodes are just written to disk. In a complete, implementation a final pass would combine the nodes into a single file. Next, I will add concurrency for subdivision and do more detailed testing.

Wednesday 4/16/2008

I made pretty good progress on a parallel implementation. I am using boost for threading, which I have never used before but the learning curve is minimal. I'm using threads instead of separate processes so I can finish coding in time for the end of the semester. Also, when this algorithm runs on a single machine, I am sure threads will perform much better then separate processes. With that being said, it should be straightforward to convert the code to a truly distributed algorithm after the semester ends.

I added concurrency to the polygonal simplification step, so more then one node may be simplified at the same time. Using a semaphore, I restrict the total number of threads in flight. Tomorrow, I will look at doing the spatial subdivision in parallel as well. Finally, I will get timings for large inputs to see the performance improvements.

Saturday 4/12/2008

I cleaned up a lot of the HLOD tree creation code. In particular, I changed the spatial hash table used for vertex clustering to use the hash function suggested in [Teschner03]. I haven't tested it extensively but it appears to perform well. I also made sure vertex weights are computed just once and then used for simplification of geometry in every node of the tree. I want to clean up and few more areas and test some large models, then see how far I can get with the parallel implementation.

Dr. Loo and I came up with several ways to apply concurrency to this algorithm:

  • Parallel Cell Collapse - The cell collapse operation is performed on each cluster during vertex clustering. This operation replaces vertices in a cluster with the highest weighted vertex in the cluster. It is trivial to "collapse" each cluster in parallel. Furthermore, as Dr. Loo noted, this is orthogonal to the additional modifications suggested below.
  • Parallel Simplification - In this model, each call to simplify() is done in parallel (See the pseudo code from Wednesday). The master process is responsible for the spatial subdivision and sends nodes to other processes for simplification. These processes return the simplified node, which the master process writes to disk. In a final pass, the master process aggregates all the nodes written to disk to create a single file to the HLOD tree.
    • This has some additional overhead as opposed to using shared memory. If shared memory is used, the vertices are not duplicated or copied. All nodes just contain indices into the original mesh's vertex buffer. Because of this, it is likely that a threaded, shared memory implementation will outperform a multi-process implementation on a single machine.
  • Parallel Spatial Subdivision - Each call to build_tree() is done in parallel. Therefore each sub-tree is subdivided and simplified in parallel. Similarly to the 2nd method, each process will write its simplified node to disk, which is then aggregated to form the tree.

In addition to concurrency, we will also need to consider out-of-core algorithms such as the one described in [Lindstrom00]. Although this is likely to be outside the scope of this semester.

Friday 4/11/2008

I have a basic implementation of HLOD tree creation working. Each node is just written to a separate binary file on disk. It will take at least one or two more full days before I can start implementing concurrency. After talking with Dr. Loo, we have several ideas about adding concurrency the algorithm, I will post some notes tomorrow.

Thursday 4/10/2008

The Triangle/AABB test turned out to be much more complicated that I though. Fortunately, Tomas Akenine-Möller has a public domain implementation that I tried out. After writing several test cases, I am now happy with it.

I also modified the vertex clustering implementation to return a new set of vertices for the simplified model.

With these changes in place, I am ready to write the code to build the HLOD tree.

Wednesday 4/09/2008

I came up with a straight forward algorithm to build the HLOD tree. First, I will need to code some primitives:

  • Triangle/AABB (axis-aligned bounding box) intersection test. For starters, I'm ok with a boolean result but in the future, I might need the intersection points. Although I don't know the algorithm off the top of my head, this should be easy to code.
  • Modify my vertex clustering implementation to return a new set of vertices for the simplified model. The current implementation returns just a new set of indices, which is desirable in many cases due to the memory savings, but the HLOD will require vertices be stored in every node. Potentially there's room for innovation here by coming up with a memory efficient tree structure.

Once these changes are made, the HLOD tree creation will look like:

build_tree(model, aabb)
   {
      // return if some stopping condition is meet
 
      //
      // RESOLUTION is a constant.  Nodes deeper in the tree are smaller in
      // world space then those above but have the same resolution, so
      // therefore are more detailed.
      //
      simplify(model, RESOLUTION);   // Execute in a separate thread or machine
 
      //
      // Uniformly divide this node into 8 children defined by non-overlapping
      // AABBs just like a standard octtree.  We will improve this in the future
      // by pick a better splitting plane or subdivision strategy, potentially
      // based on model density.
      //
      node children[8];
 
      for each triangle in model
         for each child
            if triangle is in or intersects child.aabb
               child.mesh.add(triangle)             // break inner loop if triangle is complete inside aabb
 
      for each child
         build_tree(child.model, child.aabb)        // recurse, execute each in a separate thread or machine
   }

The above leaves out lots of details like splicing node pointers together, storing world space errors, what happens to result from simplify(), writing the tree to disk and maximum amount of allowed concurrency. Trivially, the algorithm allows an incredible amount of concurrency; I expect excellent results in this area.

I also read parts of "Polygonal Model Simplification using Hierarchical Vertex Clustering." [Strauss00]. This lead me to two other papers suggesting improvements for vertex clustering in addition to floating-cell clustering [Low97]. I'm not modifying my implementation right now but I want to make notes of them: "3d object database simplification using a vertex clustering algorithm" suggests using multiple representative vertices per cluster if needed and "Generating Multiple Levels of Detail from Polygonal Geometry Models" suggests a hierarchical approach. Later, I'll review these in detail.

Monday 4/07/2008

I am happy with my vertex clustering implementation, although I will probably improve it to use floating-cell clustering [Low97] after CIS 505. I am now turning my attention to creating the HLOD tree. I found some recent work [Borgeat05]. They segment the model differently depending on if it is geometry-dense or texture-dense. They aim to keep natural boundaries such as shaders and textures. For geometry-dense models, they recursively subdivide along the axis of maximum variation. In section 3.3, they describe how they parallelize pre-processing.

On the rendering side, they implement geomorphing on the GPU in a vertex shader to avoid cracks. This is different then [Erikson01], who restricts triangles that intersect a node's boundary from simplification. [Lakhia04] takes yet another approach, triangulating triangles that intersect node boundaries to minimize cracks. [Blow04] shows what looks to be the most pragmatic way to eliminate cracks but I will need to read it again when my head hurts less. The same author also argues that simplification that also outputs a normal map can help reduce popping due to changes in per-vertex lighting [Blow02].

These are really a bunch of rendering decisions that I don't need to make right now. I will likely create the HLOD tree using simple uniform or density based subdivision. I will spend one more day reading and start coding on Wednesday.

Saturday 4/05/2008

I tests the vertex clustering algorithm on some more complicated models. Here are three circles; the original with 30 triangles, a simplified one with 19 triangles, and a drastically simplified one with 4 triangles.

circle30.jpg circle19.jpg circle4.jpg

I also tested a procedurally generated city model made available by Pascal Mueller and Simon Haegler. Visit their page to see textured and shaded versions.

The original model, shown below, is 4,656 triangles.

block4656.jpg

The following image shows the model reduced to 1,807 triangles.

block1807.jpg

The following image show the model drastically reduced to just 488 triangles:

block488.jpg

I wrote a quick and dirty Wavefront OBJ model reader to read these files. It would have been painless except std::istringstream is so slow that I had to do all the parsing by hand.

I also read the paper on the out-of-core version of vertex clustering [Lindstrom00]. It sounds like I can increase the fidelity of the simplification by using error quadrics for placing a cluster's representative vertex. This will be good for large models but it will not allow reuse of vertex data for smaller models. I'm not going to make any changes right now though.

Friday 4/04/2008

I've made a ton of progress today; I have an implementation of vertex clustering working on a very simple model. It is already almost 1,000 lines of code, not including unit tests (to be fair, that's including comments though).

In order to assign vertices to clusters, I implemented a spatial hash table as suggested in section 5.1.3 of Level of Detail for 3D Graphics [Luebke03]. The size of the table is based on prime numbers from a handy table I found at PlanetMath.org. The actual hashing function takes (x, y, z) as input and returns ((x * some_prime_number) + (y * another_prime) + (z * last_prime)) % table_size. I wasn't quite sure how to pick the primes here so I just picked them out of thin air. At some point, I might look at my CIS 502 notes on universal hash functions.

Once the spatial hash table was working, it was easy to assign vertices to clusters, then collapse all the vertices in each cluster to the one with the highest weight. Then another pass removes the degenerate triangles created. Its about 6 passes in total. The result is a simplified mesh. Here is a simple test case:

house.jpg simplehouse.jpg

The mesh on the left is simplified from 3 to 1 triangles. Although this mesh is planar, the algorithm works on meshes in 3-space. Before I move onto HLOD tree generation, I want to experiment with more complicated models.

Thursday 4/03/2008

I coded the vertex weighting part of vertex clustering using the silhouette edge probability enhancement suggested in [Low97]. I was able to remove all the trigonometry by introducing a square root using the half angle identity. So the performance of this part should be pretty good. It takes two passes over the vertices. In the first pass, while the adjacency graph is build, weights for the silhouette edge probability and weights for the edge length are computed. In the second pass, the final weight for each vertex is computed using a linearly combination of the two weights. This second pass is required so the edge lengths can be scaled to [0, 1] so they are not unevenly weighted. Of course, if you haven't read the paper, this probably doesn't make any sense.

I want to code this algorithm making as few passes as possible over the data since every pass is going to generate L2 misses for most models due to trashing. Tomorrow, I'll look at clustering the vertices now that they are weighted. That should be easy.

Wednesday 4/02/2008

I started coding vertex clustering. The first step of the algorithm is to assign weights to each vertex based on all of its incident edges. Vertices with higher weights are less likely to be removed during simplification. So I wrote code to covert the input triangle mesh (an array of indices that index into an array of vertices, every three indices represents a triangle) to an adjacency list representation of an undirected graph. From here computing the weights will be easy.

I also read the floating-cell clustering variation [Low97] to vertex clustering and I really like it. It doesn't look much harder to code (most of the code will be the same) but the simplified models are much nicer. The run-time increases from linear time to nlog(n) but the nlog(n) sort only happens once per input model, then the simplification is linear. So when we create the HLOD tree, we only sort once and the simplification for each node is O(n). When you amortize the initial sort over each simplification, I'm sure it will be worth it.

Tomorrow will be a big day of coding.

Monday 3/31/2008

I read about two more chapters of Level of Detail for 3D Graphics [Luebke03], including their coverage of vertex clustering, which I am now confident I can implement pretty quickly. Although they have me tempted to implement the floating-cell clustering variation [Low97], I will stick with uniform grids since I've done the most reading on that version and I want to get the polygonal simplification algorithm out of the way so I can focus on ideas more relevant to CIS 505. I should also mention that the book authors like vertex clustering for massive models because of its O(n) running time.

I've decided to implement vertex clustering instead of trying to integrate with QSlim, mainly because after doing all this reading I've becoming more interested in the details of polygonal simplification. Also, QSlim is GPL and it is likely that I will want to roll this code into my employer's engine, which I'm not sure I can do with QSlim.

The plan is to implement vertex clustering this week, HLOD tree generation next week, then explore utilizing distributed systems the following week. Although tight, this isn't completely insane and leaves a few days for a buffer.

Also, I looked thru the SIGGRAPH 2006 and 2007 materials. I didn't see any interesting papers but I saw two 2007 courses I want to look at after CIS 505 is over.

  • Course 4: State of the art in massive model visualization
    • 2.5 Visibility-guided Rendering to Accelerate 3D Graphics Hardware Performance
    • 2.6 GPU-friendly Accelerated Mesh-based and Mesh-less Techniques for the Output-sensitive Rendering of Huge Complex 3D Models
  • Course 14: Urban design and procedural modeling
  • State of the art in modeling and rendering

Saturday 3/29/2008

I've read Lakhia's masters thesis on city rendering [Lakhia04], which led me to Erikson's paper on HLODs [Erikson01]. Both of which made me realize that the first thing I need to do is figure out what polygonal simplification algorithm I am going to use. This is even more important then reading city models, since they come in such a wide array of formats.

[Lakhia04] uses QSlim. I was able to get version 2.0 (2.1 is the latest) to compile on Windows. This package is Garland's implementation of quadric-based polygonal surface simplification [Garland99]. This algorithm is desirable for many reasons including it can combine vertices not attached by an edge and does not require manifold topology.

I've read Luebke's excellent survey of polygon simplification algorithms [Luebke01] and he recommends this algorithm numerous times. As an alternative, I may implement vertex clustering [Rossignac93]. I've read the paper twice now and think I can implement it in 3 long productive days. Once I have a basic version running, we can consider enhancements such as quadric error metrics and out-of-core simplification [Lindstrom00], which we are going to need long term since the input models will be very large. Furthermore, implementing the simplification algorithm myself may give us insights into how to best take advantage of distributed systems.

I also read about 2 chapters in Level of Detail for 3D Graphics [Luebke03], which so far is an outstanding book. After reading their coverage of vertex clustering and looking at how hard it will be to integrate with QSlim, I will decide if I am going to code vertex clustering or use QSlim.

About Me

References

[Alliez05] P. Alliez and C. Gotsman, Recent Advances in Compression of 3D Meshes, Advances in Multiresolution for Geometric Modelling, N.A. Dodgson et al., eds., Springer, 2005, pages 3–26.

[Borgeat05] Borgeat, L., Godin, G., Blais, F., Massicotte, P., and Lahanier, C. "GoLD: Interactive Display of Huge Colored and Textured Models" Proceedings of ACM SIGGRAPH 2005.

[Blow02] Blow, J. Rendering Level-of-Detail Forecast. The Inner Product. August 2002.

[Blow04] Blow, J. Unified Rendering Level-of-Detail, Part 1. The Inner Product. March 2003.

[Cohen03] Cohen-Or, D., Chrysanthou, Y.L., Silva, C.T., and Durand, F. "A Survey of Visibility for Walkthrough Applications." IEEE Transactions on Visualization and Computer Graphics. Volume 9, Issue 3, July-Sept. 2003 Pages: 412-431.

[Dietrich07] Dietrich A., Gobbetti E., and Eui Yoon S. Massive-Model Rendering Techniques: A Tutorial IEEE Computer Graphics and Applications November/December 2007 (Vol. 27, No. 6), pages 20-34.

[Erikson01] Erikson C., Manocha D., and Baxter, W. V. III. "HLODs for faster display of large static and dynamic environments." Proceedings of the 2001 symposium on Interactive 3D graphics, pages 111-120. ACM Press, 2001.

[Garland99] Garland, M., "Quadric-Based Polygonal Surface Simplification", Ph.D. Thesis. Carnegie Mellon University. 1999.

[Gobbetti08] Gobbetti, E., Kasik, D., and Yoon, S. Technical Strategies for Massive Model Visualization Proceedings of the 2008 ACM symposium on Solid and physical modeling, pages 405-415.

[Strauss00] Strauss, J. S., "Polygonal Model Simplification using Hierarchial Vertex Clustering", Honours Thesis. The University of Western Australia. 200.

[Lakhia04] Lakhia, Ali Abbas Dawood. "Interactive Rendering for Large City Models." Masters Thesis. The University of California at Berkeley, 2004.

[Lindstrom00] Lindstrom, P., "Out-of-Core Simplification of Large Polygonal Models." ACM SIGGRAPH 2000, pages 259 - 262

[low97] Low, K.-L. and Tan, T.-S. "Model Simplification using Vertex-Clustering". Proceedings of 1997 Symposium on Interactive 3D Graphics, April 1997, pages 75 – 82.

[Luebke01] Luebke, D. "A Developer's Survey of Polygonal Simplification Algorithms" IEEE Computer Graphics and Applications. 2001.

[Luebke03] Luebke, D., Reddy, M., Cohen, J., Varshney, A., Watson, B., and Huebner, R. "Level of Detail for 3D Graphics." 2003. Morgan Kaufmann as part of their series in Computer Graphics and Geometric Modeling. ISBN 1-55860-838-9.

[Mueller06] Mueller P., Wonka P., Haegler S., Ulmer A., and Van Gool, Luc. "Procedural Modeling of Buildings." Proceedings of ACM SIGGRAPH 2006.

[Martens03] Martens, R. Occlusion Culling for the Real-Time Display of Complex 3D Models, Masters Thesis, Limburgs Universitair Centrum. 2003.

[Rossignac93] Rossignac, J. and Borrel, P. "Multi-Resolution 3D Approximations for Rendering Complex Scenes." Modeling in Computer Graphics, B. Falciendo and T. L. Kunii, Eds., IFIP series on computer graphics. Springer-Verlag, pages 455-465.

[Teschner03] Teschner, M., Heidelberger, B., Mueller, M., Pomeranets, D., and Gross, M. "Optimized spatial hashing for collision detection of deformable objects." Proceedings Vision, Modeling, Visualization. 2003. pages 47 - 54.

[Wonka00] Wonka, P., Wimmer, M., and Schmalstieg D. Visibility Preprocessing with Occluder Fusion for Urban Walkthroughs Rendering Techniques 2000 (Proceedings of the Eurographics Workshop on Rendering 2000). June 2000. pages 71–82.

Comments

Add your comment

Add a new comment
page_revision: 92, last_edited: 1217421185|%e %b %Y, %H:%M %Z (%O ago)
Unless stated otherwise Content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License