Blog Post Index (2020)

ECS with Python and Tkinter, Part 2


Posted: 2020-12-11
Last Edited: 2020-12-11

Basics of ECS

ECS stands for Entity Component System, and is a way of designing programs that leads to very clean code and easy composition. If you have component A, with its relevant system (controlling said components' behaviour), every entity with component A will exhibit its associated behaviour (as it will be processed by system A). Say that system A causes an entity to oscillate up and down, and that you have another entity which corresponds to a rock in your scene. By simply adding component A to this other entity, it will (hopefully) start being processed by system A and will start moving up and down. This composition makes it really enjoyable to work with ECS.
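
As a sketch of what that composition looks like in practice (all names here are hypothetical stand-ins, not the engine's real API):

import math
from dataclasses import dataclass

# Hypothetical components and storage; a minimal sketch of composition,
# not the engine's actual implementation.
@dataclass
class Transform2D:
    x: float
    y: float

@dataclass
class Oscillator:
    amplitude: float
    phase: float = 0.0

components = {}  # entity id -> {component type: component instance}

def add_component(entity, component):
    components.setdefault(entity, {})[type(component)] = component

def oscillation_system(dt):
    # "System A": every entity with both components bobs up and down.
    for parts in components.values():
        if Transform2D in parts and Oscillator in parts:
            osc = parts[Oscillator]
            osc.phase += dt
            parts[Transform2D].y += osc.amplitude * math.sin(osc.phase) * dt

rock = 1
add_component(rock, Transform2D(x=40.0, y=80.0))
add_component(rock, Oscillator(amplitude=5.0))  # the rock now oscillates
oscillation_system(dt=0.016)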

Entities

Entities are the things in a game: simply unique identifiers, with no data of their own. They represent discrete objects in the game (e.g. the player, rocks in the scene, a hidden manager object). Some ECS architectures store tags and other lightweight metadata on the Entity, which is fine, but I chose not to for purity and simplicity reasons.

#include <stdint.h>

typedef uint64_t entity_t; // as simple as a uint64

typedef struct Entity_t {
    uint64_t id;
    uint64_t flags;
    uint64_t tags;
    // ...
} Entity; // or something more complex

Systems

Systems implement the behaviour seen in a game. From rendering and physics to input handling and enemy AI, everything the player experiences is the result of a system. The only difference between a system and a standalone script is that where a script acts on a single object, a system acts on an iterator of objects (be that an array or a vector). This means that the instruction cache of the CPU is kept hot while processing objects, as the CPU doesn't have to go from processing a Collider to rendering a Mesh to then deserialising some data from the network.

Whilst systems at their simplest process a single component type, in real situations it is often necessary to reference other components on the same entity. For example, when running physics, it is important to know an entity's current position (e.g. stored in a Transform component) and its velocity (e.g. stored in a Velocity component).

One solution is to move velocity data into the Transform. This means that it is no longer possible to have an entity with no velocity, only one with zero velocity. It also makes the Transform component larger, harming performance (as a cacheline can hold fewer Transform components; explained in more detail in the 'Components' section). There is a better way.

Another solution is to have systems work not on individual components, but on unions of components instead. This is what the archetypal ECS architecture is about. By doing this, data can be kept in smaller, more cache-friendly chunks.

class RenderSystem:
    def actions(self):
        return {
            self.process_active: (Transform2D, ScreenElement),
            self.process_stale: (StaleTag,),
        }

This is the approach I took, where each system has an actions() method which returns delegates which process entities, and the archetypes those delegates operate on. This allows a single system to contain many processing methods, and thus keeps related state in one place instead of strewn across many different systems. The above example shows how one system can handle multiple archetypes.
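
Dispatch then becomes generic. Here is a sketch of a main-loop helper (entities_with() is a hypothetical query method, not necessarily the engine's real one):

# Hypothetical dispatcher; entities_with() stands in for however the
# world queries entities matching an archetype.
def run_systems(world, systems):
    for system in systems:
        for delegate, archetype in system.actions().items():
            # Each delegate receives an iterator over every entity
            # holding all the components in its archetype.
            delegate(world.entities_with(*archetype))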

Components

Finally, components. Components hold all the data for the game. Every entity that needs to have its mesh rendered, for example, will have a MeshComponent. Each component stores only the data that it needs, and no more. This separation means that no memory is wasted on entities that don't need a certain component.

Another good reason for storing data in components is that when you have a huge number of entities, cache locality and the efficient filling of the cache become important. Storing all possible state in one struct, and iterating over an array of said struct just to do one operation on it (say, update physics), means that the cache is constantly cold for the next piece of data to operate on.

A little detour to hardware land!

When a CPU wants to fetch some data from RAM, it will load a certain slice (say 64 bytes) into cache first (this is the idea of a cacheline). By loading more data than strictly necessary, the CPU bets that it will need nearby data soon, and so preloads it into cache (which is much, much faster than RAM). The relative timings of cache vs RAM, taken from an Intel pdf (see pg 22), are as follows:

Data Source                                             Latency
L3 Cache hit, line unshared between cores               ~40 cycles
L3 Cache hit, line shared readonly between cores        ~65 cycles
L3 Cache hit, line shared readwrite between cores       ~75 cycles
L3 Cache hit, from other core                           ~100-300 cycles
Local DRAM                                              ~60ns
Remote DRAM                                             ~100ns

For a CPU clocked at 2GHz (slower than modern CPUs), each cycle takes half a nanosecond. This means that an unshared L3 hit takes 20ns, whereas going to RAM takes triple that. Lower caches (L1 and L2) can be on the order of 2-5ns for a hit, meaning that going to RAM takes between 12 and 30 times longer. Thus, the CPU can save a lot of cycles by loading chunks instead of fetching data byte by byte, as long as nearby data is also used (so that there is a cache hit when said data is accessed).

Getting back on track

Getting back to why components are a thing, with an example:

// clean separation into multiple discrete components
typedef struct Transform_t {
    double px, py, pz;
} Transform;

typedef struct Velocity_t {
    double vx, vy, vz;
} Velocity;

typedef struct Mesh_t {
    void *mesh_data;
} Mesh;

// chucking it all into one mixer and pressing "blend"
typedef struct State_t {
    Transform transform;
    Velocity velocity;
    Mesh mesh;
} State;

// ...

int main(void) {
    // this approach will have worse cacheline efficiency
    State game_objects[] = { /* ... */ };
    size_t n_objects = sizeof(game_objects) / sizeof(game_objects[0]);

    for (size_t i = 0; i < n_objects; i++) {
        // pass by pointer so the systems can mutate the state in place
        calculate_physics(&game_objects[i]);
        render_object(&game_objects[i]);
    }

    // this approach will have better cacheline efficiency
    Entity entities[] = { /* ... */ };
    Transform transforms[] = { /* ... */ };
    Velocity velocities[] = { /* ... */ };
    Mesh meshes[] = { /* ... */ };
    size_t n_entities = sizeof(entities) / sizeof(entities[0]);

    for (size_t i = 0; i < n_entities; i++) {
        process_physics(&transforms[i], &velocities[i]);
    }

    for (size_t i = 0; i < n_entities; i++) {
        render(&transforms[i], &meshes[i]);
    }
}

Why is the second approach better? Don't we iterate twice instead of once? The reason the second approach is preferable is that it packs the cacheline much more efficiently.

Below is a visualisation of this difference, with x, y, and z standing in for Transform, Velocity, and Mesh. The dashes show a simplified cacheline.

Mixed State:
| x0 y0 z0 x1 | y1 z1 x2 y2 | z2 x3 y3 z3 | x4 y4 z4    |
| ----------- | ----------- | ----------- | ----------- |

Components:
| x0 y0 x1 y1 | x2 y2 x3 y3 | x4 y4       |
| ----------- | ----------- | ----------- |

| x0 z0 x1 z1 | x2 z2 x3 z3 | x4 z4       |
| ----------- | ----------- | ----------- |

Not only does processing the mixed state do worse on the instruction cache, it also fills the regular data cache worse. Barely one struct fits in a cacheline at a time, and to get the next struct to process, another fetch must be made. This contrasts with the component approach, where each cacheline can fit two entities' worth of data (Transform + Other) at once. When the number of processed entities gets large, the worse cache performance of the mixed-state approach becomes more and more evident.

The fact that you have to iterate twice doesn't really matter. You end up doing the same amount of work (you still do render() and physics() for each entity). The difference is that in the first approach the instruction cache is cold on every iteration, as the CPU alternates between physics code and rendering code. In the second approach it is cold only for the first entity, and hot thereafter. This means that the second approach can outperform the first.

ECS with Python and Tkinter, Part 1


Posted: 2020-12-09
Last Edited: 2020-12-09

The Specification

Recently at Uni we were given a piece of coursework to write a game in Python, except that we weren't allowed to use any non-standard features (no pip). This meant that we were not allowed to use pygame, Pillow, or any other useful library that sane people (especially busy professionals with many important things to do) would like to use. Now, I'm no professional and I certainly don't have many important things to do, but even I see the sense in reusing existing code. It's faster, less error-prone, and helps you stay sane. However, this was not allowed by the coursework specification, leaving only tkinter as the GUI library of (forced) choice.

Tkinter as a Game Engine

For fun and profit I decided to write an ECS game engine in Python. Partly because I haven't ever done an ECS engine and wanted to learn about the upsides and downsides of the ECS paradigm. Partly because I hate myself. Mainly because Python is inherently slow and I wanted to make the game engine as fast as possible to amortise the cost of the language. Thus I started reading up about ECS and Tkinter. A great resource for Tk is effbot.org, which I used extensively during the course of researching for the engine. It seems to be down as of 08-12-2020, which is a shame, but it is still accessible on the Wayback Machine.

Whilst I may come off as negative towards Tkinter, after working with it for a bit I think it's quite good. Its performance surprised me, as it was easily able to draw thousands of canvas entities and move them around without issue at 60 frames per second. It was also really easy to work with: a simple Tk() call initialises the entire library, and then a single Canvas() and canvas.pack() call makes the game screen show up.
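
In code, that setup is about as minimal as it sounds (a sketch; the window size, title, and colours are placeholder values, not the game's actual configuration):

import tkinter as tk

# Minimal Tkinter setup, as described above; dimensions and title are
# placeholders.
root = tk.Tk()
root.title("ECS Demo")

canvas = tk.Canvas(root, width=800, height=600, background="black")
canvas.pack()

# The canvas is the game screen: items drawn on it can be moved cheaply.
rect = canvas.create_rectangle(10, 10, 40, 40, fill="white")
canvas.move(rect, 5, 0)  # shift the item 5 pixels right

root.mainloop()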

Overall I enjoyed the project, and I'll be writing up some further explanations of the ECS engine itself. For now, the code for the game engine is available on my GitHub here.

Website maintenance


Posted: 2020-10-17
Last Edited: 2020-10-17

Purging Node

Since I've remade my website, the fact that I used node has been irking me. To be honest, so has the fact that I use Python; I don't see any other way of using markdown for my posts without serious pain and suffering in shell script. To combat the fact that node pollutes my source tree, I reworked my shell script and my Python build script. They regrettably increased in lines of code, but are at least slightly more readable. I also now use multiprocessing to process the posts in parallel, and I have implemented a very basic CSS/JS compressor which strips newlines, carriage returns, and tabs from the source file (it marginally shrinks the filesize); a sketch of it follows below. For HTML source files, I only remove tabs, since Hoedown uses newlines to format its codeblock output. After removing node, I did some spring cleaning, removing unused icons and the old compilation shell script.
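
The compressor really is as basic as it sounds; something along these lines (a sketch, not the exact script; the filename is a placeholder):

# Sketch of the newline/carriage-return/tab-stripping "minifier";
# the real script's file handling may differ.
def compress(source: str, keep_newlines: bool = False) -> str:
    # keep_newlines mirrors the HTML case, where newlines must survive
    # for Hoedown's codeblock output.
    strip = "\t" if keep_newlines else "\t\r\n"
    return source.translate({ord(c): None for c in strip})

with open("style.css") as f:  # placeholder filename
    minified = compress(f.read())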

Building an OS for dummies


Posted: 2020-10-10
Last Edited: 2020-10-10

I did a thing!

Today I finished a semi-long-term project: I managed to complete the CLFS guide and produce a working distro of Linux for my Raspberry Pi. It is rather barebones at the moment and is not even configured for networking out of the box, but the fact that nothing segfaults immediately after booting makes this a great success! (My version also contains updated packages, as the ones in the guide were all outdated.)

Compiling a (cross-)compiler

For the longest time I was trying to properly configure the toolchain. It turns out that building a toolchain is rather simple, as long as you stick to the correct schedule:

1)   Build barebones binutils for your chosen arch [no libc]
2)   Build barebones gcc for your chosen arch      [no libc]
3)   Compile your libc for your chosen arch
3.1) Compile your libstdcxx for your chosen arch
4)   Recompile binutils for your chosen arch
5)   Recompile gcc for your chosen arch

Step 3.1 can be skipped if you don't want to enable C++ support in your xgcc.

Compiling the base

After compiling the toolchain, I was able to start compiling the basic packages that are required for a working OS (that is, the system libc, libstdc++, kernel, and bootloader). Since the libc and libstdcxx we compiled in the previous step belong to the xbinutils and xgcc, we need to recompile them (I placed the sysroot for the xgcc in the root of the CLFS system, instead of the cross-toolchain root). This allows us to compile whichever packages we want in the sysroot and link against them afterwards, instead of having to compile in the toolchain sysroot and copy libraries to the CLFS sysroot later.

Compiling more packages

In addition to the base packages in the system, we compile busybox to provide a basic unix environment, and zlib to provide compression. On top of these core packages, we also compile dropbear, netplug, and wireless tools to provide an SSH server and a few convenience net utilities. Some configuration is also done via files placed in /etc/.

Conclusion

This was a very fun/frustrating project. At multiple points my builds were failing mysteriously, and I ended up banging my head against a wall. After hopping onto IRC and asking for help, these issues were eventually resolved and I managed to continue with the project. The source code for the build script can be found here.

Transition to a Static Website


Posted: 2020-10-04
Last Edited: 2020-10-04

Recognising the flaws

This website used to be hosted with ASP.NET on my raspi 3B. That classes as extreme overkill for what was (and still is) a 100% static website. Aside from the fact that running Kestrel behind Nginx is slow by itself, the Raspberry Pi is not what one would call a powerful platform. Granted, at 1GHz on 4 cores it's relatively fast, and more than sufficient to use as a daily driver for basic tasks, but when running C# it just wasn't fast enough for what I would consider a decent browsing experience.

Part of the problem was that ASP.NET would precompile each page from its .cshtml.g into an internal representation the first time the page was accessed after a restart of Kestrel. This meant that I would have to sit through a 10-second compile every time I restarted my raspi and visited my website. That alone was atrocious.

In addition, the fact that I was running ASP.NET at all was part of the problem. ASP.NET is a very powerful framework, but I wasn't using it to its full potential; I was using it to render a static site. All of the extra features sat idle and unused whilst Kestrel served the same static pages over and over. It was simply a waste of computing power, since Nginx, which can serve static files by itself, was just proxying requests to Kestrel.

This all led me to seek a replacement for my hosting stack.

Searching for a replacement

The two things that I prioritised when searching for a replacement for ASP.NET were final size (directly correlated with page load speed) and ease of deployment (to make my life as easy as possible). Initially I had my sights set on using a static site generator to build my website. My reasoning was that these generators would allow me to add some dynamic elements should I want to, that they would produce compact final products, and that they would save me time. I picked next.js as my generator of choice.

Next.js produced a relatively small output at ~61KB all in all. The majority of that was the JS framework that comes bundled with next.js, and a small fraction came from the actual content on my website (~15KB). Deployment was relatively easy, since next.js places your static site in the out/ folder by default. Its contents were simply copied via sftp to the webroot on my raspi, and done! I was initially happy with these results, but after re-reading motherfuckingwebsite I decided to return to the drawing board.

Finding the solution

I ended up just rolling my own 'static site generator'. It consists of a simple shell script, which calls a few npm tools (my HTML, CSS, and JS minifiers), and a Python script to actually convert my markdown posts into HTML. It uses misaka (a binding for Hoedown) for the markdown -> HTML conversion, and also places the final product into out/ to make deployment as simple as possible. The source is available on my GitHub here. Whilst this is not as compact as it could be, such as if I did everything from within the shell script, for now I'm happy with it. It does what it is meant to do, and my website is again ~15KB.
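
The markdown step itself boils down to a single misaka call (a sketch; the real script's extensions and file handling may differ, and the paths here are placeholders):

import misaka

# Convert one markdown post into an HTML fragment; "post.md" is a
# placeholder path and the extension list is illustrative.
with open("post.md") as f:
    html = misaka.html(f.read(), extensions=("fenced-code",))

with open("post.html", "w") as f:
    f.write(html)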

Getting sidetracked

I also decided to compile Nginx from scratch, so I could fine-tune which modules were included in the final binary. This meant I had as little redundant code (read: bloat) as possible in the final product. It also allowed me to include the Brotli compression module and the Headers-More module. These were important for increasing security and performance on my raspi. I built Nginx against LibreSSL, just to ensure that I had a solid crypto library backing up my website. All in all it was probably a bit overkill, but I don't mind. To top it off, I also built Tor from source, to host a hidden version of my website. Tor was also built against LibreSSL, as in this case I need the security. The sources for my Nginx and Tor builds can be found here (nginx) and here (tor). Feel free to send pull requests and improve the build scripts!

Hello World!


Posted: 2020-09-10
Last Edited: 2020-09-10

Hello World!

asdasdasdasdasd

  1. Foo
  2. Bar
  3. Baz

Test para 1


Test para 2

H3

asd

H4

asd

H5

asd

H6

asd


1) Foo
2) Bar
3) Baz

fn main() {
    println!("Hello, World!");

    std::process::exit(42)
}