Sebastian Aaltonen

@SebAaltonen

Followers: 20,854 · Following: 229 · Media: 1,223 · Statuses: 22,671

Building a new renderer for HypeHype. Former Unity principal engineer and Ubisoft senior rendering lead. Opinions are my own.

Helsinki, Finland
Joined August 2015
Pinned Tweet
@SebAaltonen
Sebastian Aaltonen
4 years ago
My Digital Dragons 2018 Claybook ray-tracing & GPU physics presentation video was released during my summer vacation: . This presentation has some more info about the coarse RT too (versus my #GDC18 presentation).
@SebAaltonen
Sebastian Aaltonen
a day ago
Or it might autovectorize my hard-coded loop. But it's impossible to tell, because the optimized release-mode test builds currently don't have debug symbols, so breakpoints don't work. Have to modify the CMake files to fix this. Need to check the codegen in the release build.
@SebAaltonen
Sebastian Aaltonen
a day ago
The difference is that the offset counter is a class member instead of a local variable. Wondering if the compiler is able to keep it in a register in the manually inlined case. Also wondering if array[offset++] creates a dependency chain for the CPU's out-of-order (OOO) execution...
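A minimal C++ sketch of the two variants being compared, to make the register question concrete. The class and method names are hypothetical, and the aliasing explanation in the comments is one plausible cause to check in the generated code, not a confirmed diagnosis. It assumes the offset has the same type as the stream elements (`uint32_t` here), in which case a store through the data pointer may legally alias the member counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stream helper (names are illustrative, not the real class).
class StreamWriter {
public:
    explicit StreamWriter(std::uint32_t* data) : m_data(data) {}

    // Variant A: bump the member counter on every write. Because m_offset has
    // the same type as the stream elements, a store through m_data may alias
    // it, so the compiler may reload and re-store m_offset every iteration.
    // Each iteration's load then waits on the previous store: a dependency
    // chain through memory instead of through a register.
    void writeMember(const std::uint32_t* src, std::uint32_t count) {
        for (std::uint32_t i = 0; i < count; ++i)
            m_data[m_offset++] = src[i];
    }

    // Variant B: copy the offset into a local whose address is never taken.
    // It can stay in a register for the whole loop and is written back once.
    void writeLocal(const std::uint32_t* src, std::uint32_t count) {
        std::uint32_t* data = m_data;
        std::uint32_t offset = m_offset;
        for (std::uint32_t i = 0; i < count; ++i)
            data[offset + i] = src[i];
        m_offset = offset + count;
    }

    std::uint32_t offset() const { return m_offset; }

private:
    std::uint32_t* m_data;
    std::uint32_t m_offset = 0;
};
```

If aliasing is the cause, the release-build listing for variant A would show a load and store of `m_offset` inside the loop, while variant B keeps the counter in a register.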
@SebAaltonen
Sebastian Aaltonen
a day ago
I am seeing odd performance for a trivial stream helper class: 0.29ms instead of the optimal 0.21ms. The code should be identical, and the functions should be inlined. I actually see the same behavior if I just copy-paste the trivial wrapper...
@SebAaltonen
Sebastian Aaltonen
a day ago
This is just a userland helper class; the API doesn't enforce this stupidity. I could, of course, write a more advanced user helper class that enforces the order instead of deferring the data emit.
@SebAaltonen
Sebastian Aaltonen
a day ago
The most stupid idea I have: add a debug assert enforcing that the stream setters are called in the correct order, and write the data immediately to the stream. Problem solved. But the user experience is worse.
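A sketch of the immediate-write idea with a debug-only order check. It assumes a hypothetical draw layout where a mask word precedes the draw's data and bit i marks field i; the `DrawField` values and the class name are illustrative, not the renderer's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field order for one draw; bit i of the mask marks field i.
enum class DrawField : std::uint32_t { Pipeline = 0, BindGroup, VertexBuffer, IndexBuffer, DrawArgs };

// Writes each value to the stream immediately instead of deferring the emit.
// Setters must be called in strictly ascending DrawField order; the asserts
// catch violations in debug builds and are compiled out when NDEBUG is set.
class OrderedDrawWriter {
public:
    void beginDraw() {
        m_maskIndex = m_stream.size();
        m_stream.push_back(0);  // placeholder mask word, patched by set()
        m_nextField = 0;
    }

    void set(DrawField field, std::uint32_t value) {
        const auto bit = static_cast<std::uint32_t>(field);
        assert(m_maskIndex < m_stream.size() && "beginDraw() must be called first");
        assert(bit >= m_nextField && "stream setters called out of order");
        m_nextField = bit + 1;
        m_stream[m_maskIndex] |= 1u << bit;  // record which field follows
        m_stream.push_back(value);           // data lands in the stream right away
    }

    const std::vector<std::uint32_t>& stream() const { return m_stream; }

private:
    std::vector<std::uint32_t> m_stream;
    std::size_t m_maskIndex = 0;
    std::uint32_t m_nextField = 0;
};
```

The work of ordering moves to the caller, which is the worse user experience the tweet mentions: nothing is scanned or reordered at emit time.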
@SebAaltonen
Sebastian Aaltonen
a day ago
This would be easy to replace with a tzcnt loop. The data and bits are in the same order in the m_draw struct. That would be less stupid work, but I want to get rid of it all instead of just reducing it.
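A sketch of the tzcnt loop, assuming a hypothetical `DrawState` whose fields are stored in the same order as the dirty bits. The loop runs once per set bit rather than testing every bit. `__builtin_ctz` is the GCC/Clang builtin (C++20 code could use `std::countr_zero` instead); a plain loop is used as a fallback elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the lowest set bit; x must be non-zero. On x86 the GCC/Clang
// builtin typically compiles to tzcnt or bsf.
inline int lowestSetBit(std::uint32_t x) {
    assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(x);
#else
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
#endif
}

// Hypothetical deferred draw state: fields[i] corresponds to bit i of the mask.
struct DrawState {
    std::uint32_t fields[8];
};

// Emits the mask, then only the dirty fields, lowest bit first.
void emitDirty(std::vector<std::uint32_t>& stream, const DrawState& draw, std::uint32_t dirtyMask) {
    assert((dirtyMask >> 8) == 0 && "only 8 fields in this sketch");
    stream.push_back(dirtyMask);
    while (dirtyMask != 0) {
        stream.push_back(draw.fields[lowestSetBit(dirtyMask)]);
        dirtyMask &= dirtyMask - 1;  // clear the lowest set bit
    }
}
```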
@SebAaltonen
Sebastian Aaltonen
a day ago
Who would have guessed: this code is SLOW! I manually hard-coded the shadow draw loop with optimal code and... 0.51ms -> 0.21ms. That's 0.30ms wasted for no good reason. Have to fix this.
[Image]
@SebAaltonen
Sebastian Aaltonen
a day ago
This means that writing 100,000 draw commands to my command stream takes a whopping ~0.5ms of CPU time. I would assume that reading them takes a similar time, since we were bandwidth bound earlier (with 32-byte draw command structs).
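A back-of-the-envelope check of these numbers, taking the ~0.5ms figure at face value. The last two lines assume 32 bytes per command, which is the size of the earlier structs; the current command size isn't stated.

```latex
\begin{aligned}
\frac{0.5\,\text{ms}}{100{,}000\ \text{draws}} &= 5\,\text{ns per draw} \\
100{,}000 \times 32\,\text{B} &= 3.2\,\text{MB} \\
\frac{3.2\,\text{MB}}{0.5\,\text{ms}} &= 6.4\,\text{GB/s}
\end{aligned}
```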
@SebAaltonen
Sebastian Aaltonen
a day ago
Scene object data is persistent in GPU memory: one big uniform buffer with offset binding. The "write data" sections write both the draw stream and the uniforms. The persistent case doesn't write uniforms in the draw loop at all. Nice saving there :)
[Image]
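A minimal sketch of persistent per-object uniforms in one big buffer. It assumes a hypothetical 48-byte per-object struct packed 128 to a bound block, and leaves out the upload of this CPU mirror to the GPU buffer. The point it illustrates: `update()` runs only when an object changes, while the draw loop only emits a block offset and an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct ObjectUniforms { float rows[3][4]; };  // hypothetical 48-byte per-object data

constexpr std::size_t kObjectsPerBlock = 128;  // assumption: one bound range per 128 objects
constexpr std::size_t kBlockBytes = kObjectsPerBlock * sizeof(ObjectUniforms);  // 6144

// CPU mirror of one big persistent uniform buffer (GPU upload not shown).
class PersistentSceneBuffer {
public:
    explicit PersistentSceneBuffer(std::size_t blockCount) : m_bytes(blockCount * kBlockBytes) {}

    // Runs only when an object is created or changes, not in the draw loop.
    void update(std::size_t objectIndex, const ObjectUniforms& u) {
        assert((objectIndex + 1) * sizeof(ObjectUniforms) <= m_bytes.size());
        std::memcpy(m_bytes.data() + objectIndex * sizeof(ObjectUniforms), &u, sizeof(u));
    }

    // What the draw loop emits instead of the uniform data itself: the byte
    // offset of the object's block and the object's index inside that block.
    struct DrawBinding { std::uint32_t blockOffset; std::uint32_t indexInBlock; };
    static DrawBinding binding(std::size_t objectIndex) {
        return { static_cast<std::uint32_t>(objectIndex / kObjectsPerBlock * kBlockBytes),
                 static_cast<std::uint32_t>(objectIndex % kObjectsPerBlock) };
    }

    const unsigned char* data() const { return m_bytes.data(); }

private:
    std::vector<unsigned char> m_bytes;
};
```

Since 6144 is a multiple of 256, every block offset in this sketch also satisfies a 256-byte offset-binding alignment; the real limit should be queried from the graphics API.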
@SebAaltonen
Sebastian Aaltonen
a day ago
Conclusion: persistent GPU scene data can't be in a shader-specific format, because you don't know which shader will use it later. Split the data so that many shaders can use it. An SoA layout works fine here: each shader reads only the streams it cares about.
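A sketch of the SoA idea, assuming hypothetical property streams (the normal matrix and material ID are illustrative and not from the source). Each shader declares the streams it reads, and `bytesRead()` shows that a shadow-style shader touching only world matrices never pulls in main-pass data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Mat3x4 { float rows[3][4]; };  // 3x4 matrix as three vec4 rows
static_assert(sizeof(Mat3x4) == 48, "expected three vec4 rows");

// Hypothetical stream IDs; a shader declares which streams it reads.
enum StreamBits : std::uint32_t {
    kWorldMatrix  = 1u << 0,  // needed by every pass
    kNormalMatrix = 1u << 1,  // hypothetical: main pass only
    kMaterialId   = 1u << 2,  // hypothetical: main pass only
};

// Persistent scene data in SoA form: one array per property, indexed by object.
struct SceneDataSoA {
    std::vector<Mat3x4> worldMatrix;
    std::vector<Mat3x4> normalMatrix;
    std::vector<std::uint32_t> materialId;

    void resize(std::size_t n) {
        worldMatrix.resize(n);
        normalMatrix.resize(n);
        materialId.resize(n);
    }

    // Bytes a shader touches for `count` objects if it reads only `streams`.
    static std::size_t bytesRead(std::uint32_t streams, std::size_t count) {
        std::size_t bytes = 0;
        if (streams & kWorldMatrix)  bytes += count * sizeof(Mat3x4);
        if (streams & kNormalMatrix) bytes += count * sizeof(Mat3x4);
        if (streams & kMaterialId)   bytes += count * sizeof(std::uint32_t);
        return bytes;
    }
};
```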
@SebAaltonen
Sebastian Aaltonen
a day ago
Similarly, we have already split the vertex data into a position stream and a properties stream. The properties stream could be shader-specific, and so could the main-pass UBO data stream (above). Both can still be rendered using the same shadow map vertex shader.
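The same split applied to vertices, as a sketch: a position-only stream that a depth-only shadow vertex shader can use for every mesh, plus a separate properties stream. The property layout below is an assumption for illustration; in practice it may vary per shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position-only stream: all a depth-only shadow vertex shader needs.
struct PositionVertex { float x, y, z; };
static_assert(sizeof(PositionVertex) == 12, "tightly packed positions");

// Hypothetical properties stream; its format is an assumption.
struct PropertyVertex {
    std::int16_t normal[4];  // e.g. snorm16 normal plus padding
    std::uint16_t uv[2];     // e.g. packed 16-bit UV
};

struct MeshStreams {
    std::vector<PositionVertex> positions;
    std::vector<PropertyVertex> properties;
};

// Vertex bytes fetched per pass: the shadow pass binds only the position stream.
std::size_t shadowVertexBytes(const MeshStreams& m) {
    return m.positions.size() * sizeof(PositionVertex);
}

std::size_t mainVertexBytes(const MeshStreams& m) {
    return m.positions.size() * sizeof(PositionVertex) +
           m.properties.size() * sizeof(PropertyVertex);
}
```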
@SebAaltonen
Sebastian Aaltonen
a day ago
This is persistent data, so we don't pay any extra write cost for the unused region (6144 bytes), and no extra data is loaded into caches. Shadow draw performance is identical. The main pass reads matrices from two locations, but that data is shared by all vertices of the draw.
@SebAaltonen
Sebastian Aaltonen
a day ago
The simple solution is to split the data into multiple arrays (SoA layout). This works for both the shadow and main passes. Both matrices are 48 bytes (vec4 padding). We have 6144 bytes of common data followed by 6144 bytes of main-only data. Shadows simply bind the first 6144 bytes.
[Image]
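The layout described above as compile-time arithmetic. It assumes each 48-byte matrix is a 3x4 matrix stored as three vec4 rows and that 6144 bytes therefore holds 128 objects (6144 / 48); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// A 3x4 matrix stored as three vec4 rows: 3 * 16 = 48 bytes.
constexpr std::size_t kMatrixBytes = 3 * 4 * sizeof(float);
constexpr std::size_t kObjectsPerBlock = 128;  // assumed: 6144 / 48

// One uniform block in SoA order: all common (shadow + main) matrices first,
// then all main-only matrices.
constexpr std::size_t kCommonBytes   = kObjectsPerBlock * kMatrixBytes;  // 6144
constexpr std::size_t kMainOnlyBytes = kObjectsPerBlock * kMatrixBytes;  // 6144
constexpr std::size_t kBlockBytes    = kCommonBytes + kMainOnlyBytes;    // 12288
static_assert(kCommonBytes == 6144, "128 matrices of 48 bytes");

struct BindRange { std::size_t offset; std::size_t size; };

// Shadow pass binds only the common prefix of a block.
constexpr BindRange shadowRange(std::size_t block) { return { block * kBlockBytes, kCommonBytes }; }
// Main pass binds the whole block and reads object i's matrix from both halves.
constexpr BindRange mainRange(std::size_t block) { return { block * kBlockBytes, kBlockBytes }; }

// Byte offsets, relative to the start of the bound range, of object i's matrices.
constexpr std::size_t commonMatrixOffset(std::size_t i)   { return i * kMatrixBytes; }
constexpr std::size_t mainOnlyMatrixOffset(std::size_t i) { return kCommonBytes + i * kMatrixBytes; }
```

With these numbers every block starts at a multiple of 12288 bytes, which is also a multiple of 256, a common uniform-buffer offset alignment requirement.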
@SebAaltonen
Sebastian Aaltonen
a day ago
This is what the shadow rendering pass expects:
[Image]