{"id":11473,"date":"2024-03-13T21:05:00","date_gmt":"2024-03-13T20:05:00","guid":{"rendered":"https:\/\/monodes.com\/predaelli\/?p=11473"},"modified":"2024-03-13T16:08:18","modified_gmt":"2024-03-13T15:08:18","slug":"multi-threading-is-always-the-wrong-design","status":"publish","type":"post","link":"https:\/\/monodes.com\/predaelli\/2024\/03\/13\/multi-threading-is-always-the-wrong-design\/","title":{"rendered":"Multi-threading is always the wrong design"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cWe\u2019ll just do that on a background thread\u201d<\/p>\n<cite>Source: <em><a href=\"https:\/\/unetworkingab.medium.com\/multi-threading-is-always-the-wrong-design-a227be57f107\">Multi-threading is always the wrong design<\/a><\/em><\/cite><\/blockquote>\n\n\n\n<p>Well, really?<\/p>\n\n\n\n<!--more-->\n\n\n\n<!--nextpage-->\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h1 class=\"wp-block-heading\">Multi-threading is always the wrong design<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">\u201cWe\u2019ll just do that on a background thread\u201d<\/h2>\n\n\n\n<p><a href=\"https:\/\/unetworkingab.medium.com\/?source=post_page-----a227be57f107--------------------------------\"><\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/unetworkingab.medium.com\/?source=post_page-----a227be57f107--------------------------------\">uNetworking AB<\/a><\/p>\n\n\n\n<p>Say what you want about Node.js. It sucks, a lot. But it was made with one very accurate observation: multithreading sucks even more.<\/p>\n\n\n\n<p>A CPU with 4 cores doesn\u2019t work like you are taught from entry level computer science. There is no \u201cshared memory\u201d with \u201crandom time access\u201d. That\u2019s a lie, it\u2019s not how a CPU works. It\u2019s not even how RAM works.<\/p>\n\n\n\n<p>A CPU with 4 cores is going to have the capacity of executing 4 seconds of CPU-time per second. 
It does not matter how much \u201cbackground idle threading\u201d you do or don\u2019t. The CPU doesn\u2019t care. You always have 4 seconds of CPU-time per second. That\u2019s an important concept to understand.<\/p>\n\n\n\n<p>If you write a program in the design of Node.js \u2014 isolating a portion of the problem, pinning it to 1 thread on one CPU core, letting it access an isolated portion of RAM with no data sharing, then you have a design that is making as optimal use of CPU-time as possible. It is how you optimize for NUMA systems and CPU cache locality. Even an SMP system is going to perform better if treated as NUMA.<\/p>\n\n\n\n<p>A CPU does not see RAM as some \u201cshared random access memory\u201d. Most of the time you aren\u2019t even touching RAM at all. The CPU operates in an address space that is cached in SRAM in different layers of locality and size. As soon as you have multiple threads access the same memory, either you have cache coherence, threading bugs (which all companies have plenty of, even FAANG companies), or you need synchronization primitives that involve memory barriers that will cause shared cache lines to be sent back and forth as copies between the CPU cores, or caches to be committed to slow DRAM (the exact details depend on the CPU).<\/p>\n\n\n\n<p>In other words, isolating the problem at a high level, tackling it with single-threaded simple code is always going to be a lot faster than having a pool of threads bounce between cores, taking turns handling a shared pool of tasks. What I am saying is that designs like those in Golang, Scala and similar Actor designs are the least optimal for a modern CPU \u2014 even if the ones writing such code think of themselves as superior beings. Hint: they aren\u2019t.<\/p>\n\n\n\n<p>Not only is multithreading detrimental for CPU-time usage efficiency, it also brings tons of complexity very few developers (really) understand. 
In fact, multithreading is such a leaky abstraction that you really must study your exact model of CPU to really understand how it works. So exposing threads to some high level [in terms of abstraction] developer is opening up Pandora\u2019s box for seriously complex and hard-to-trigger bugs. These bugs do not belong in abstract business logic. You aren\u2019t supposed to write business logic that depends on the details of your exact CPU.<\/p>\n\n\n\n<p>Coming back to the idea of 4 seconds of CPU-time per second. The irony is that, since you are splitting the problem in a way that requires synchronization between cores, you are actually introducing more work to be executed in the same CPU-time budget. So you are spending more time on overhead due to synchronization, which does the opposite of what you probably hoped for \u2014 it makes your code even slower, not faster. Even if you think you don\u2019t need synchronization because you are \u201cclearly\u201d mutating a different part of DRAM \u2014 you can still have complex bugs due to false sharing where a cache line spans across the addressed memory of two (\u201cclearly isolated\u201d) threads.<\/p>\n\n\n\n<p>And since you have threads with their own stack, things like zero-copy are practically impossible between threads since, well, they stand at different depths in the stack with different registers. Zero-copy, zero-allocation flows are possible and very easy in single threaded isolated code, duplicated as many times as there are CPU-cores. So if you have 4 CPU cores, you duplicate your entire single threaded code 4 times. This will utilize all CPU-time efficiently, given that the bigger problem can be reasonably cut into isolated parts (which is incredibly easy if you have a significant flow of users). 
And if you don\u2019t have such a flow of users, well then you don\u2019t care about the performance aspect either way.<\/p>\n\n\n\n<p>I\u2019ve seen this mistake made at every possible company you can imagine \u2014 from unknown domestic ones to global FAANG ones. It\u2019s always a matter of pride and thinking that, we, we can manage. We are better. No. It always ends with a wall of text of threading issues once you enable ThreadSanitizer and it always leads to poor CPU-time usage, complex getter functions with return by dynamic copy, and it blows the complexity out of proportion.<\/p>\n\n\n\n<p><mark>The best design is the one where complexity is kept minimal, and where locality is kept maximum.<\/mark> That is where you get to write code that is easy to understand without having these bottomless holes of mindbogglingly complex CPU-dependent memory barrier behaviors. These designs are the easiest to deploy and write. You just make your load balancer cut the problem into isolated sections and spawn as many threads or processes of your entire single threaded program as needed.<\/p>\n\n\n\n<p>Again, say what you want about Node.js, but it does have this thing right. Especially in comparison with legacy languages like C, Java, C++ where threading is \u201ceverything goes\u201d and all kinds of projects do all kinds of crazy threading (and most of them are incredibly error prone). Rust is better here, but still causes the same overhead as discussed above. So while Rust is easier to get bug-free, it still becomes a bad solution.<\/p>\n\n\n\n<p>I hear so often \u2014 \u201cjust throw it on a thread and forget about it\u201d. That is simply the worst use of threading imaginable. You are adding complexity and overhead by making multiple CPU cores cause invalidation of their caches. This thinking often leads to having 30-something threads just do their own thing, sharing inputs and outputs via some shared object. 
It\u2019s terrible in terms of usage of CPU-time and like playing with a loaded revolver.<\/p>\n\n\n\n<p>Rant: over.<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p class=\"excerpt\">\u201cWe\u2019ll just do that on a background thread\u201d Source: Multi-threading is always the wrong design Well, really?<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"https:\/\/monodes.com\/predaelli\/2024\/03\/13\/multi-threading-is-always-the-wrong-design\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"federated","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[278],"tags":[],"class_list":["post-11473","post","type-post","status-publish","format-standard","hentry","category-tricks"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6daft-2Z3","jetpack-related-posts":[{"id":13109,"url":"https:\/\/monodes.com\/predaelli\/2025\/04\/16\/1024cores-distributed-reader-writer-mutex\/","url_meta":{"origin":11473,"position":0},"title":"1024cores &#8211; Distributed Reader-Writer Mutex","author":"Paolo 
Redaelli","date":"2025-04-16","format":false,"excerpt":"1024cores - Distributed Reader-Writer Mutex This is definitively something that I would like to Eiffelize! Now, when we know that traditional reader-writer mutexes do no scale and write sharing is our foe, and that the way to go is state distribution, let's try to create a scalable distributed reader-writer mutex.\u2026","rel":"","context":"In &quot;Agenda&quot;","block_context":{"text":"Agenda","link":"https:\/\/monodes.com\/predaelli\/category\/agenda\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2025\/04\/distributed-reader-writer-mutex-1.webp?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":3482,"url":"https:\/\/monodes.com\/predaelli\/2017\/11\/04\/wipy-2-0-bluetooth-seeed-studio\/","url_meta":{"origin":11473,"position":1},"title":"WiPy 2.0 &#8211; Bluetooth &#8211; Seeed Studio","author":"Paolo Redaelli","date":"2017-11-04","format":false,"excerpt":"WiPy 2.0 Just one word: WOW WiPy 2.0, The tiny Micro Python enabled WiFi & Bluetooth IoT development platform. With a 1KM WiFi range, state of the art Espressif ESP32 chipset and dual processor, the WiPy is all about taking the Internet of Things to the next level. 
Features Basic\u2026","rel":"","context":"In &quot;Senza categoria&quot;","block_context":{"text":"Senza categoria","link":"https:\/\/monodes.com\/predaelli\/category\/senza-categoria\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2017\/11\/1483517401876721-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2017\/11\/1483517401876721-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2017\/11\/1483517401876721-1.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":8347,"url":"https:\/\/monodes.com\/predaelli\/2021\/04\/11\/ffmpeg-to-youtube-live\/","url_meta":{"origin":11473,"position":2},"title":"FFMPEG to Youtube Live","author":"Paolo Redaelli","date":"2021-04-11","format":false,"excerpt":"video - FFMPEG to Youtube Live - Stack Overflow After a lot of trial and error the solution below works pretty much perfectly. To make sure it runs 24\/7 wrap it inside a service of some description. This is with an up to date version of FFMPEG to include -stream_loop\u2026","rel":"","context":"In &quot;Documentations&quot;","block_context":{"text":"Documentations","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10892,"url":"https:\/\/monodes.com\/predaelli\/2023\/10\/19\/does-amds-threadripper-7000-make-a-good-compiling-workstation\/","url_meta":{"origin":11473,"position":3},"title":"Does AMD&#8217;s Threadripper 7000 make a good compiling workstation?","author":"Paolo Redaelli","date":"2023-10-19","format":false,"excerpt":"AMD's Monstrous Threadripper 7000 CPUs Aim For Desktop PC Dominance (pcworld.com) 22 AMD's powerhouse Threadripper chips are back for desktop PCs. ... 
AMD announced three new Ryzen Threadripper 7000-series chips on Thursday, with up to 64 cores and 128 threads -- and the option of installing a \"Pro\"-class Threadripper 700\u2026","rel":"","context":"In &quot;Senza categoria&quot;","block_context":{"text":"Senza categoria","link":"https:\/\/monodes.com\/predaelli\/category\/senza-categoria\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3492,"url":"https:\/\/monodes.com\/predaelli\/2017\/11\/11\/raphael-js-or-kinetic\/","url_meta":{"origin":11473,"position":4},"title":"Raphael.js or Kinetic?","author":"Paolo Redaelli","date":"2017-11-11","format":false,"excerpt":"The biggest difference between RaphaelJS and KineticJS is that RaphaelJS uses SVG and KineticJS uses HTML5 Canvas for visualization. So it really depends on what kind of project you are doing. Here are some useful links which you should check out regarding SVG vs Canvas: Thoughts on when to use\u2026","rel":"","context":"In &quot;Javascript&quot;","block_context":{"text":"Javascript","link":"https:\/\/monodes.com\/predaelli\/category\/javascript\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":7813,"url":"https:\/\/monodes.com\/predaelli\/2020\/12\/08\/is-apple-m1-the-new-amiga\/","url_meta":{"origin":11473,"position":5},"title":"Is Apple M1 the new Amiga?","author":"Paolo Redaelli","date":"2020-12-08","format":false,"excerpt":"I loved Apple. Not the Apple of DRMs and its golden prison where you can't really control\u00a0your hardware; I loved the Apple that loved Software Libero. Then it mutated into a company that crushes people freedoms while smiling. I like to have control of my hardware. 
I don't want to\u2026","rel":"","context":"In &quot;Amiga&quot;","block_context":{"text":"Amiga","link":"https:\/\/monodes.com\/predaelli\/category\/amiga\/"},"img":{"alt_text":"Erik Engheim","src":"https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2020\/12\/01Y9ylHZ8csOxgZr7.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/11473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/comments?post=11473"}],"version-history":[{"count":0,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/11473\/revisions"}],"wp:attachment":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/media?parent=11473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/categories?post=11473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/tags?post=11473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}