2024-01-10 · 6 min read

Mastering CSS Grid: From Basics to Advanced Layouts

Ramblings on Staging Environments

Admit it, we all hate testing in staging. Say you’re adding a new feature to an API. In staging, the servers that run your service could be shut off because some upstream dependency is broken, or because staging is configured differently than production. So you spend some time figuring out how to get the upstream dependencies back on. Now your server is up and you begin testing your change. Unfortunately, another developer wants to test an upstream service that your service depends on, and they turn it off. Your testing is interrupted and you are back at square one. Bottom line: staging is unreliable and wastes a massive amount of developer time. Things shouldn’t have to be this way.

This is the global staging approach, i.e. all services run in one environment. There are other approaches where each service runs in its own environment and uses the production version of other services. You’d have to spin up the entire environment to test a service though, which is probably way too slow.

Is this an environment problem or a people problem? Perhaps partitioning staging environments by team is better, i.e. cross-team dependencies are frozen and each team can only change its own dependencies.

Possible solutions people have tried

Some research on how to design a good staging environment

Ideas off the top of my head:

Pristine vs Test separation - Separate out core services that are clones of the services in prod, so that they are properly maintained to mirror production. Test servers subscribe upstream to these pristine servers rather than depending on other servers within Test. This reduces cross-team dependencies during testing, but comes at the cost of cloning all key servers with downstreams from production.
Dynamic spin ups - Instead of spinning up staging servers with fixed dependencies on upstreams, we could spin up the transitive closure of all staging servers needed whenever we want to test a service. This could be achieved through Kubernetes container orchestration. To avoid excessive duplication of upstreams, the upstream services could be shared by the relevant services we want to test. A keen reader may wonder how one could achieve this, as some changes may require changing the upstream as well. Well, we could borrow the idea from Docker, where each command creates a new layer. If we assume that each service is composed of a binary and a config, then the layer’s hash is determined by both the binary version and the config’s hash. That way, if an instance with the corresponding hash already exists, that server can be shared by the server we want to test. This approach uses more compute than the previous one, but is more flexible, as it allows developers to change upstream servers for testing their service without affecting other developers’ testing. It does however introduce more complexity, as the developer will need to set out a deployment plan when they want to change upstream dependencies. Assuming upstream changes are not needed most of the time, this approach will carry the same compute costs as approach 1.

INSERT image discussing your idea, also use more personal tone instead of writing as if this is an essay.

https://www.12factor.net/dev-prod-parity - keep development and prod as similar as possible

Limitations in staging - resources,

Why they don’t work

What we can do differently?

Other things to ramble about in future blog articles

CI/CD progress - Progbot: rolling things out gradually, configuration is a PITA, and you cannot roll things back. Bamboo: builds are slow if your build is queued, and logs are too long and hard to search over (you need to download the entire log).
Managing batch jobs/processes - GS does this through procmon and you can see all the dependencies. Having to babysit this causes huge support workloads, but this is partly because of the org setup, e.g. strats making frequent code changes without being able to test them in an environment where they can see the downstream effects. Support workload is centralised at GS, i.e. the risk team supports all global strats, whereas at Optiver, systems are local/regionalised. Teams are also within the same timezone, so if a support issue comes up you can easily reach the team responsible, e.g. abacus -> go to the abacus owner’s desk instead of having to wait on Outlook and email someone that just logged off in NY. The separation between strats and devs is too wide at GS, and they don’t seem to care about the dev environment, whereas at Optiver, abacus qds (strats) sit very close to devs. They also manage pricing/risks within their own service rather than as users that write scripts for an execution environment controlled by devs.

What are nice to haves

Optic - Overview of the entire production or staging setup at any point in time. This system is much better than GSDiscovery, where you had to know the exact 5-10 namespaces to click through to see the dynamic config generated/used by your app, which in most cases is a massive JSON blob that you have to spend ages digging through. The config includes a lot of application data which should be kept separate, e.g. config = timeout, regions allowed to serve, not the API responses which are used to generate the config, like the cycle metadata for every market.
Lightning 2 Dashboard - Seeing the exact runtime state that each risk cycle was in is much more useful than having to dig through the logs yourself. You can also see exactly which parts of the execution state led to errors, e.g. partial failures which led to incorrect figures, or full failures which led to the cycle failing (risk having huge latencies). Sometimes you can even see exactly what the next steps of the cycle are, and all the upstream tasks that might have caused the error, e.g. a partial error upstream causing a full error downstream. This was specifically designed for the DAG execution paradigm at GS running in Java, though. New scripting observability modules should encourage users to model program/error states so that we can clearly see when something has gone wrong, though we kind of already get this with script alerting to HUD EUDF.

CSS Grid has revolutionized how I think about web layouts. After years of struggling with floats and flexbox for complex designs, Grid finally gave me the tools I needed.

The Journey

Starting with simple grid containers, I gradually learned about:

Grid tracks and grid lines
Grid areas and named lines
Responsive grid patterns
Advanced techniques like subgrid

Favorite Techniques

One of my favorite discoveries was using grid-template-areas for semantic layouts. Being able to visualize the layout directly in CSS made everything click.

The key insight was understanding that Grid is two-dimensional - you can control both rows and columns simultaneously, which opens up layouts that were awkward or impractical with previous layout methods.
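
To make the two-dimensional point concrete, here is a minimal sketch of a single item spanning rows and columns at the same time. The class names .gallery and .gallery-feature are made up for illustration, and the track sizes are arbitrary assumptions.

/* Hypothetical gallery: three equal columns, rows at least 100px tall. */
.gallery {
    display: grid;
    grid-template-columns: repeat(3, 1fr);
    grid-auto-rows: minmax(100px, auto);
    gap: 1rem;
}

/* One featured item covers a 2x2 block, placed by grid line numbers. */
.gallery-feature {
    grid-column: 1 / 3; /* from column line 1 to column line 3 */
    grid-row: 1 / 3;    /* from row line 1 to row line 3 */
}

The remaining items are auto-placed around the featured one, so no extra wrappers or nested containers are needed for this kind of layout.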

Example Layout

.grid-container {
    display: grid;
    grid-template-areas: 
        "header header header"
        "sidebar main main"
        "footer footer footer";
    grid-template-columns: 200px 1fr 1fr;
    grid-template-rows: auto 1fr auto;
    gap: 1rem;
}

This approach makes complex layouts much more maintainable and easier to understand.
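
One way this could be extended, as a sketch: assign children to the named areas and then rearrange the same areas on narrow screens. The child selectors (.site-header and friends) and the 600px breakpoint are assumptions for illustration, not part of the layout above.

/* Hypothetical child selectors: map each element to a named area. */
.site-header  { grid-area: header; }
.site-sidebar { grid-area: sidebar; }
.site-main    { grid-area: main; }
.site-footer  { grid-area: footer; }

/* Below an assumed 600px breakpoint, stack everything in one column. */
@media (max-width: 600px) {
    .grid-container {
        grid-template-areas:
            "header"
            "main"
            "sidebar"
            "footer";
        grid-template-columns: 1fr;
        grid-template-rows: auto 1fr auto auto;
    }
}

Because the children only refer to area names, the media query can reorder the whole page without touching the markup or the child rules.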