{"id":9066,"date":"2022-01-14T14:01:00","date_gmt":"2022-01-14T13:01:00","guid":{"rendered":"https:\/\/monodes.com\/predaelli\/?p=9066"},"modified":"2022-01-14T14:01:02","modified_gmt":"2022-01-14T13:01:02","slug":"getting-started-with-cython-how-to-perform-1-7-billion-calculations-per-second-in-python","status":"publish","type":"post","link":"https:\/\/monodes.com\/predaelli\/2022\/01\/14\/getting-started-with-cython-how-to-perform-1-7-billion-calculations-per-second-in-python\/","title":{"rendered":"Getting started with Cython: How to perform >1.7 billion calculations per second in Python"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\" id=\"e161\"><a href=\"https:\/\/towardsdatascience.com\/getting-started-with-cython-how-to-perform-1-7-billion-calculations-per-second-in-python-b83374cfcf77\">Getting started with Cython: How to perform >1.7 billion calculations per second in Python<\/a><\/h1>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"f1bc\">Combine the ease of Python with the speed of C<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/mikehuls.medium.com\/\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/miro.medium.com\/fit\/c\/56\/56\/1%2AwmCBB5Ad3YGey3fEAu5dFQ.jpeg?w=910&#038;ssl=1\" alt=\"Mike Huls\"\/><\/a><\/figure>\n\n\n\n<p><a class=\"\" href=\"https:\/\/mikehuls.medium.com\/\">Mike Huls<\/a> \u00b7 <a class=\"\" href=\"https:\/\/towardsdatascience.com\/getting-started-with-cython-how-to-perform-1-7-billion-calculations-per-second-in-python-b83374cfcf77\">Dec 8, 2021 \u00b7 11 min read<\/a>
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1400\/0*yf2O6Sn00z8UGlgX\" alt=\"\"\/><figcaption>Cython will add an afterburner to your program (image by <a href=\"https:\/\/unsplash.com\/@o5ky\" rel=\"noreferrer noopener\" target=\"_blank\">Oscar Sutton<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/pBrHNFqcX-M\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a>)<\/figcaption><\/figure>\n\n\n\n<p id=\"ba14\">The main advantage of Python is that it is very developer-friendly and easy to pick up. The design choices behind this have a major downside, though: they cause Python to execute significantly slower than some other languages. This article will show you how to eat your cake and have it too. We\u2019re going to take the bottleneck out of our Python code and use Cython to speed it up dramatically. I strongly recommend reading <a href=\"https:\/\/mikehuls.medium.com\/why-is-python-so-slow-and-how-to-speed-it-up-485b5a84154e\"><strong>this article<\/strong><\/a> first to get a clear idea of the problem we\u2019re trying to solve. Cython will help us to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>write code in a Python-like syntax that Cython then translates into C code. 
We won\u2019t have to write in C ourselves<\/li><li>compile the C code and package it into a Python module that we can import (just like <code class=\"\" data-line=\"\">import time<\/code>)<\/li><li>make our program run more than 70x faster<\/li><li>brag to colleagues about our superfast code<\/li><\/ul>\n\n\n\n<p id=\"743e\">I\u2019ve split this article into 4 parts. First, in <strong>part A<\/strong>, we install dependencies and set things up. Then, in <strong>part B<\/strong>, we focus on simply getting our Cython code to run from Python. Once this is done, we optimize our Cython code in <strong>part C<\/strong> using a handy built-in tool that\u2019ll tell you exactly where the bottlenecks in your code are. Finally, in <strong>part D<\/strong>, we\u2019ll squeeze out the last bit of speed by multiprocessing our module, resulting in over 1.7 billion calculations per second!<br \/>Let\u2019s code!<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>By using Cython to create a Python module and multiprocessing the resulting function, we\u2019ve increased execution speed from 25 thousand evaluations per millisecond (e\/ms) to 1.75 million e\/ms. This is a speed increase of 70x!<\/p><\/blockquote>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"0b7a\">Before we begin<\/h1>\n\n\n\n<p id=\"ee39\">Creating a Cython package has some enormous benefits, but it also takes a bit more effort than regular Python programming. Think about the following before Cythonizing every last line of your code.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Make sure your code is slow for the right reason.<br \/>We can\u2019t write code that waits faster. I recommend first glancing through <a href=\"https:\/\/mikehuls.medium.com\/why-is-python-so-slow-and-how-to-speed-it-up-485b5a84154e\"><strong>this article<\/strong><\/a> because it explains <em>why<\/em> Python is slow, how it works under the hood, and how you can work around certain bottlenecks. 
This way you\u2019ll better understand why Cython is such a good solution to this kind of slowness.<\/li><li>Is concurrency the problem?<br \/>Can your problem be solved by <a href=\"https:\/\/mikehuls.medium.com\/multi-tasking-in-python-speed-up-your-program-10x-by-executing-things-simultaneously-4b4fc7ee71e\"><strong>using threads<\/strong><\/a> (like waiting for an API)? Or would <a href=\"https:\/\/mikehuls.medium.com\/advanced-multi-tasking-in-python-applying-and-benchmarking-threadpools-and-processpools-90452e0f7d40\"><strong>running code in parallel<\/strong><\/a> over multiple CPUs (multiprocessing) speed things up?<\/li><li>Are you a good C programmer or just interested in how Python and C work together? Check out <a href=\"https:\/\/mikehuls.medium.com\/write-your-own-c-extension-to-speed-up-python-x100-626bb9d166e7\"><strong>this article<\/strong><\/a> about how Python modules are written in C (how Python uses C code).<\/li><li>Make sure to use a <a href=\"https:\/\/mikehuls.medium.com\/virtual-environments-for-absolute-beginners-what-is-it-and-how-to-create-one-examples-a48da8982d4b\"><strong>virtual environment<\/strong><\/a>. This is not required by Cython, but it is best practice.<\/li><\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"3e82\">Part A \u2014 Installation and setup<\/h1>\n\n\n\n<p id=\"477f\">Installation is very easy.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">pip install Cython<\/pre>\n\n\n\n<p id=\"3aeb\">To showcase how Cython can speed up CPU-heavy tasks, we\u2019ll use a simple example: we\u2019re going to Cythonize a function that counts the number of prime numbers within a given range. The Python code for this looks like this: <a href=\"https:\/\/towardsdatascience.com\/media\/702886fc5cfe19af49a3a39352a143db\">https:\/\/towardsdatascience.com\/media\/702886fc5cfe19af49a3a39352a143db<\/a><\/p>\n\n\n\n<p id=\"be08\">Note that this is hardly the most efficient way to calculate primes, but that\u2019s not important for now. 
We just want a function that calculates a lot.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/0*Vvi4d4dzvhWjLNtf\" alt=\"\"\/><figcaption>(image by <a href=\"https:\/\/unsplash.com\/@caraventurera\" rel=\"noreferrer noopener\" target=\"_blank\">Cara Fuller<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/34OTzkN-nuc\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a>)<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"7d99\">Part B \u2014 Creating, packaging, and importing<\/h1>\n\n\n\n<p id=\"6169\">First, we\u2019re going to create a very simple Cython function that closely resembles the one we\u2019ve written in Python. The goal of this part is to:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>create the function<\/li><li>compile and package the C code into a Python module<\/li><li>import and use our function<\/li><\/ol>\n\n\n\n<p id=\"26d1\">Further down, we\u2019ll optimize the function to achieve mind-blowing speed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fe5b\">1. Creating the Cython function<\/h2>\n\n\n\n<p id=\"6478\">Let\u2019s create a new file called <code class=\"\" data-line=\"\">primecounter.pyx<\/code> and:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>copy the <code class=\"\" data-line=\"\">prime_count_vanilla_range<\/code> function from the previous part into the file<\/li><li>rename the function we\u2019ve just pasted to <code class=\"\" data-line=\"\">prime_counter_cy<\/code><\/li><\/ul>\n\n\n\n<p id=\"55e8\">For now, we\u2019ll just run the plain Python code through Cython. This is possible because Cython is (almost entirely) a superset of Python: nearly anything you can do in Python, you can do in Cython.<\/p>\n\n\n\n<p id=\"3189\">Just copying the function should already give us a nice speedup because the code is now compiled. Before we can check that, however, we have to build the code into a module that Python can import.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"51b7\">2. 
Compiling and packaging into a Python package<\/h2>\n\n\n\n<p id=\"cea5\">The next step is to instruct Cython to translate the pyx file into C, compile that C code, and package the result into a Python module that we can import and use in our Python code. For this, we\u2019ll need a simple <code class=\"\" data-line=\"\">setup.py<\/code> script that defines what we want to package and how. This is what it looks like: <a href=\"https:\/\/towardsdatascience.com\/media\/28c96eeaac8fac847503caab1f1833a7\">https:\/\/towardsdatascience.com\/media\/28c96eeaac8fac847503caab1f1833a7<\/a><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>You might be familiar with the <code class=\"\" data-line=\"\">setup.py<\/code> script: it is used when creating your own Python package. More on creating your own package <a href=\"https:\/\/mikehuls.medium.com\/create-and-publish-your-own-python-package-ea45bee41cdc\"><strong>here (public package)<\/strong><\/a> and <a href=\"https:\/\/mikehuls.medium.com\/create-your-custom-python-package-that-you-can-pip-install-from-your-git-repository-f90465867893\"><strong>here (private package)<\/strong><\/a>.<\/p><\/blockquote>\n\n\n\n<p id=\"8ae0\">We simply define a list of extensions and pass it to the <code class=\"\" data-line=\"\">setup<\/code> function. In the <code class=\"\" data-line=\"\">Extension<\/code>, we give our module a name. This way we can <code class=\"\" data-line=\"\">import primes<\/code> and later call <code class=\"\" data-line=\"\">primes.prime_counter_cy(0, 1000)<\/code>. First, we\u2019ll build the module. 
The command below compiles it and places the resulting module next to our source files, so that we can import it from this directory:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">python setup.py build_ext --inplace<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"359d\">Autobuilding Cython<\/h2>\n\n\n\n<p id=\"a49e\">You can also use CythonBuilder for compiling, building and packaging your Cython code; check it out on <strong>PyPI<\/strong>.<\/p>\n\n\n\n<p id=\"73c8\"><strong>Troubleshooting<br \/><\/strong>Cython translates the pyx file into a C file, which then has to be compiled into our module. This compilation step needs a C compiler. If you receive a message like <code class=\"\" data-line=\"\">Microsoft Visual C++ 14.0 or greater is required<\/code> (on Windows), it means you don\u2019t have a compiler installed. You can solve this by installing the Microsoft C++ Build Tools, which you can download <a href=\"https:\/\/visualstudio.microsoft.com\/visual-cpp-build-tools\" rel=\"noreferrer noopener\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1e25\">3. 
Importing and using our function<\/h2>\n\n\n\n<p id=\"2a57\">Now that our module is compiled and built, we can easily import and use it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import primes<br \/>print(primes.prime_counter_cy(0, 1000))<br \/># prints 168<\/pre>\n\n\n\n<p id=\"97dd\">When we time the Python and Cython functions, we already see a nice speedup:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 1000<br \/>Total number of evaluations required = 78 thousand<br \/>[py]     2.92ms (25k per ms)<br \/>[cy]     1.58ms (42k per ms)<\/pre>\n\n\n\n<p id=\"991e\">Even though we\u2019ve merely copied the function and have not spent a single second optimizing it, it\u2019s still faster, simply because the code is compiled.<\/p>\n\n\n\n<p id=\"9ff9\">Note that the number of evaluations indicates how many comparisons the function needs to make to find all the primes in the range. Each evaluation consists of multiple calculations, which makes this speedup even more impressive. 
Check out how it\u2019s calculated <a href=\"https:\/\/gist.github.com\/mike-huls\/5347b9e2cd339934857061db39d6abc9\" rel=\"noreferrer noopener\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p id=\"6840\">Now the fun part begins: let\u2019s start optimizing and see how much speed we can squeeze out of our machine!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/0*OfoEXOJbDzNiakFs\" alt=\"\"\/><figcaption>Our code is packaged and ready for tweaking (image by <a href=\"https:\/\/unsplash.com\/@jiaweizhao\" rel=\"noreferrer noopener\" target=\"_blank\">Jackie Zhao<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/W-ypTC6R7_k\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a>)<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"9a55\">Part C \u2014 Optimizing our Cython function<\/h1>\n\n\n\n<p id=\"96f3\">We\u2019re going to tweak the function in the pyx file to speed it up. Let\u2019s first look at the result so that we can go through it: <a href=\"https:\/\/towardsdatascience.com\/media\/6cd99919540fa88c7024fb7047c40fb4\">https:\/\/towardsdatascience.com\/media\/6cd99919540fa88c7024fb7047c40fb4<\/a><\/p>\n\n\n\n<p id=\"7e55\">We\u2019ve added types for the variables and for the function itself. Many of these additions let Cython translate our program into plain C, so that we don\u2019t get bogged down by the Python interpreter. Check out <a href=\"https:\/\/mikehuls.medium.com\/why-is-python-so-slow-and-how-to-speed-it-up-485b5a84154e\"><strong>this article<\/strong><\/a> for more information on why the interpreter slows execution down so much and how to speed it up (spoiler alert: writing a Cython module is one of the ways!)<\/p>\n\n\n\n<p id=\"f2d5\">When a variable in Cython is not typed, Cython falls back to how Python handles variables: each one is stored in a PyObject and checked by the interpreter (again, check out <a href=\"https:\/\/mikehuls.medium.com\/why-is-python-so-slow-and-how-to-speed-it-up-485b5a84154e\"><strong>the article<\/strong><\/a>). 
This is very slow, so by typing our variables we let C handle them directly, which is blazingly fast.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"f1a4\">Adding types<\/h2>\n\n\n\n<p id=\"48dc\">In line 1, we declare <code class=\"\" data-line=\"\">prime_counter_cy<\/code> as a <code class=\"\" data-line=\"\">cpdef<\/code> function. This means that it can be called from both Python and C. Also in line 1, we write <code class=\"\" data-line=\"\">int range_from<\/code>. This way, the compiler knows that range_from is an integer. Because the compiler knows which data to expect, it can skip a lot of checks. The same happens in line 3, where we cdef an integer named prime_count. In the two lines below, we define <code class=\"\" data-line=\"\">num<\/code> and <code class=\"\" data-line=\"\">divnum<\/code>. The special thing about these two integers is that they don\u2019t have a value yet; their values are only set in lines 7 and 8.<\/p>\n\n\n\n<p id=\"f22f\">Just adding types increased performance a lot. Check it out:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 50k<br \/>Total number of evaluations required = 121 million<br \/>[py]        4539ms ( 27k \/ms)<br \/>[cy]        2965ms ( 41k \/ms)<br \/>[cy+types]   265ms (458k \/ms)<\/pre>\n\n\n\n<p id=\"6e2e\">We go from a little over 4.5 seconds to about a quarter of a second: a 17x speed increase from just adding some types.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a20f\">Optimizing further using annotations<\/h2>\n\n\n\n<p id=\"06c9\">All of our variables are typed now; how can we optimize further? Remember the <code class=\"\" data-line=\"\">setup.py<\/code>? In line 8 (see above), we called the <code class=\"\" data-line=\"\">cythonize<\/code> function with <code class=\"\" data-line=\"\">annotate=True<\/code>. 
This creates an HTML file in the same directory as our pyx file.<\/p>\n\n\n\n<p id=\"16e3\">When we open that file in a browser, we see our code annotated with yellow lines that indicate how close to Python each line of code is. Bright yellow means that a line is a lot like Python (read: slow) and white means that it\u2019s closer to C (fast). This is what it looks like when we open our <code class=\"\" data-line=\"\">primecounter.html<\/code> in a browser:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/miro.medium.com\/max\/886\/1%2Arm01SoZvl9RcrDWjBuNytw.png?w=910&#038;ssl=1\" alt=\"\"\/><figcaption>Two of our Cython functions in the annotation file (image by author)<\/figcaption><\/figure>\n\n\n\n<p id=\"665b\">In the image above, you can see what adding types does for the code. You can also click on each line to see the resulting C code. Let\u2019s click on line 28 to see why it\u2019s not completely white.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/miro.medium.com\/max\/581\/1%2AuFSL2zOuAyyKqoIW-r8OwQ.png?w=910&#038;ssl=1\" alt=\"\"\/><figcaption>How one line of code is translated to C (image by author)<\/figcaption><\/figure>\n\n\n\n<p id=\"fe7c\">As you can see in the image above, the generated C code checks for a ZeroDivisionError, just as Python would. This check isn\u2019t necessary here because divnum comes from a range that starts at 2, so it can never be zero.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"3786\">Avoiding unnecessary checks<\/h2>\n\n\n\n<p id=\"40ce\">So let\u2019s optimize further! We\u2019ll add a decorator to our function that tells the compiler to skip the ZeroDivisionError check. 
Only do this when you\u2019re very sure of your code, because skipping checks increases the risk of your program failing: <a href=\"https:\/\/towardsdatascience.com\/media\/e73920e4b16989928b8d7c9e230d0b3e\">https:\/\/towardsdatascience.com\/media\/e73920e4b16989928b8d7c9e230d0b3e<\/a><\/p>\n\n\n\n<p id=\"fd10\">There are many of these so-called compiler directives that you can apply, and many of them are useful for speeding up loops. Read more on these compiler directives <a href=\"https:\/\/cython.readthedocs.io\/en\/latest\/src\/userguide\/source_files_and_compilation.html#compiler-directives\" rel=\"noreferrer noopener\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p id=\"9556\">Let\u2019s check our annotations again:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/miro.medium.com\/max\/440\/1%2AwgWMYJf33MuwBKHP3mVTgg.png?w=910&#038;ssl=1\" alt=\"\"\/><figcaption>Notice that line 44 is completely white now (image by author)<\/figcaption><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 50k<br \/>Total number of evaluations required = 121 million<br \/>[py]        4539ms ( 27k \/ms)<br \/>[cy]        2965ms ( 41k \/ms)<br \/>[cy+types]   265ms (458k \/ms)<br \/>[cy-check]   235ms (517k \/ms)<\/pre>\n\n\n\n<p id=\"70ca\">This small directive shaved off another 30ms, improving execution time by about 11%!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/0*3caRi0r0uWLS7eD0\" alt=\"\"\/><figcaption>For more speed, we\u2019ll need more workers to tackle the task (image by <a href=\"https:\/\/unsplash.com\/@phillshaw\" rel=\"noreferrer noopener\" target=\"_blank\">Phil Shaw<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/zAZYuch7deE\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a>)<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"a29d\">Part D \u2014 Even more speed<\/h1>\n\n\n\n<p id=\"2af7\">Our function is pretty well optimized now: it runs almost entirely as machine code. 
How can we squeeze out even more speed? If you\u2019ve read <a href=\"https:\/\/mikehuls.medium.com\/advanced-multi-tasking-in-python-applying-and-benchmarking-threadpools-and-processpools-90452e0f7d40\"><strong>this article<\/strong><\/a>, you might have an idea. Our code still runs on a single CPU while my laptop has 12, so why not use more? <a href=\"https:\/\/towardsdatascience.com\/media\/23e8453f58edd19137d925a846bff82e\">https:\/\/towardsdatascience.com\/media\/23e8453f58edd19137d925a846bff82e<\/a><\/p>\n\n\n\n<p id=\"c0a5\">The code above creates a process pool (again, read <a href=\"https:\/\/mikehuls.medium.com\/advanced-multi-tasking-in-python-applying-and-benchmarking-threadpools-and-processpools-90452e0f7d40\"><strong>this article<\/strong><\/a>) that divides all jobs over all of my available CPUs. In line 3, we use a function to split the range between the start and end number into one chunk per worker. If we want 0 to 100 with 10 workers, it generates 0 to 9, 10 to 19, 20 to 29, and so on.<\/p>\n\n\n\n<p id=\"3f8e\">Next, we create the jobs by submitting each chunk to the process pool executor. In the last line, we retrieve the result of each job and sum the results to get the total number of primes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"85e2\">Is it worth the investment of setting up?<\/h2>\n\n\n\n<p id=\"46c0\">As <a href=\"https:\/\/mikehuls.medium.com\/multi-tasking-in-python-speed-up-your-program-10x-by-executing-things-simultaneously-4b4fc7ee71e\"><strong>this article<\/strong><\/a> explains, using multiple processes takes a little investment: it takes a while before the processes are created. 
If we benchmark our new multiprocessed function against the single-process versions on the first 10k numbers, it turns out to be slower than any of the other Cython versions:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 10k<br \/>Total number of evaluations required = 5.7 million<br \/>[py]       246ms ( 23k \/ms)<br \/>[cy]       155ms ( 37k \/ms)<br \/>[cy+types]  14ms (423k \/ms)<br \/>[cy-check]  12ms (466k \/ms)<br \/>[cy-mp]    201ms ( 29k \/ms)<\/pre>\n\n\n\n<p id=\"e59e\">Yikes, that\u2019s not fast at all. Let\u2019s see what happens when we check the first 50,000 numbers:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 50k<br \/>Total number of evaluations required = 121 million<br \/>[py]       4761ms ( 25k \/ms)<br \/>[cy]       3068ms ( 40k \/ms)<br \/>[cy+types]  304ms (399k \/ms)<br \/>[cy-check]  239ms (508k \/ms)<br \/>[cy-mp]     249ms (487k \/ms)<\/pre>\n\n\n\n<p id=\"f8b8\">Even at 50k, the faster calculation doesn\u2019t yet make up for the investment of setting up the processes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"b251\">The final test<\/h2>\n\n\n\n<p id=\"0d52\">For the final test, we\u2019re going to find all primes between 0 and 200k. Notice that we were already waiting multiple seconds for the first two methods. Also notice that the total number of required evaluations grows much faster than the range itself (roughly quadratically): going from 50k to 200k multiplies it by about 14. For this reason, we\u2019re only benchmarking the optimized Cython methods:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Finding primes between 0 and 200k<br \/>Total number of evaluations required = 1.7 billion<br \/>[cy+types]  3949ms ( 433k \/ms)<br \/>[cy-check]  3412ms ( 502k \/ms)<br \/>[cy-mp]      978ms (1750k \/ms)<\/pre>\n\n\n\n<p id=\"a526\">And here we see our result: we are executing <strong>1.75 million evaluations per millisecond<\/strong>. 
Notice that the number of actual operations is <a href=\"https:\/\/gist.github.com\/mike-huls\/5347b9e2cd339934857061db39d6abc9\" rel=\"noreferrer noopener\" target=\"_blank\">even higher<\/a>!<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>By using Cython to create a Python module and multiprocessing the resulting function, we\u2019ve increased execution speed from 25k e\/ms to 1.75 million e\/ms. This is a speed increase of 70x!<\/p><\/blockquote>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/0*9hMwxj3qs8MviCMG\" alt=\"\"\/><figcaption>Our code has transformed from a slow car to a hypersonic airplane (image by <a href=\"https:\/\/unsplash.com\/@nasa\" rel=\"noreferrer noopener\" target=\"_blank\">NASA<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/Tquhp9Kqkzk\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a>)<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"9e0a\">Conclusion<\/h1>\n\n\n\n<p id=\"5eb5\">With this article, I hope to have shown that you can extend your Python code with a little Cython to achieve incredible speed increases, combining the ease of coding in Python with the speed of compiled C.<\/p>\n\n\n\n<p id=\"c867\">I hope everything was clear, but if this is not the case, please let me know what I can do to clarify further. 
In the meantime, check out my <a href=\"https:\/\/mikehuls.medium.com\/\">other articles<\/a> on all kinds of programming-related topics like these:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/mikehuls.medium.com\/why-is-python-so-slow-and-how-to-speed-it-up-485b5a84154e\">Why Python is so slow and how to speed it up<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/write-your-own-c-extension-to-speed-up-python-x100-626bb9d166e7\">Write your own C extension to speed up Python x100<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/advanced-multi-tasking-in-python-applying-and-benchmarking-threadpools-and-processpools-90452e0f7d40\">Advanced multi-tasking in Python: applying and benchmarking threadpools and processpools<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/virtual-environments-for-absolute-beginners-what-is-it-and-how-to-create-one-examples-a48da8982d4b\">Virtual environments for absolute beginners \u2014 what is it and how to create one (+ examples)<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/create-and-publish-your-own-python-package-ea45bee41cdc\">Create and publish your own Python package<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/create-your-custom-python-package-that-you-can-pip-install-from-your-git-repository-f90465867893\">Create Your Custom, private Python Package That You Can PIP Install From Your Git Repository<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/create-a-fast-auto-documented-maintainable-and-easy-to-use-python-api-in-5-lines-of-code-with-4e574c00f70e\">Create a fast auto-documented, maintainable, and easy-to-use Python API in 5 lines of code with FastAPI<\/a><\/li><li><a href=\"https:\/\/mikehuls.medium.com\/dramatically-improve-your-database-inserts-with-a-simple-upgrade-6dfa672f1424\">Dramatically improve your database insert speed with a simple upgrade<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p class=\"excerpt\">Getting started with Cython: 
How to perform >1.7 billion calculations per second in Python Combine the ease of Python with the speed of C Mike HulsDec 8, 2021\u00b711 min read The main advantage of Python is that it is very developer-friendly and easy to pick up. These design choices have a major downside, though; they&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"https:\/\/monodes.com\/predaelli\/2022\/01\/14\/getting-started-with-cython-how-to-perform-1-7-billion-calculations-per-second-in-python\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[113],"tags":[],"class_list":["post-9066","post","type-post","status-publish","format-standard","hentry","category-python"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6daft-2me","jetpack-related-posts":[{"id":12976,"url":"https:\/\/monodes.com\/predaelli\/2025\/03\/15\/codon-high-performance-python-compiler\/","url_meta":{"origin":9066,"position":0},"title":"Codon: high-performance Python compiler","author":"Paolo Redaelli","date":"2025-03-15","format":false,"excerpt":"Codon: 
high-performance Python compiler Codon is a high-performance Python implementation that compiles to native machine code without any runtime overhead. Typical speedups over vanilla Python are on the order of 10-100x or more, on a single thread. Codon's performance is typically on par with (and sometimes better than) that of\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/monodes.com\/predaelli\/category\/python\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":9532,"url":"https:\/\/monodes.com\/predaelli\/2022\/08\/09\/7-python-one-liners-that-will-blow-your-mind\/","url_meta":{"origin":9066,"position":1},"title":"7 Python One-Liners that will Blow Your Mind","author":"Paolo Redaelli","date":"2022-08-09","format":false,"excerpt":"Xiaoxu Gao wrote 7 Python One-Liners that will Blow Your Mind Less is more? Photo by Photos by Lanty from Unsplash The term one-liner comes from comedy where a joke is delivered in a single line. A good one-liner is said to be meaningful and concise. This concept also exists\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/monodes.com\/predaelli\/category\/python\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11883,"url":"https:\/\/monodes.com\/predaelli\/2024\/08\/28\/python-shared-memory-in-multiprocessing\/","url_meta":{"origin":9066,"position":2},"title":"Python Shared Memory in Multiprocessing","author":"Paolo Redaelli","date":"2024-08-28","format":false,"excerpt":"Python Shared Memory in Multiprocessing np_array's size=220.0MB With SharedMemory: ... Current memory usage 0.11283MB; Peak: 0.156706MB Time elapsed: 0.99s No SharedMemory: ... Current memory usage 0.026587MB; Peak: 467.558995MB Time elapsed: 5.48s I think I shall go for shared memory! My Amiga formation years requires it! 
Python Shared Memory in Multiprocessing\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/monodes.com\/predaelli\/category\/python\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14807,"url":"https:\/\/monodes.com\/predaelli\/2026\/01\/23\/the-xonsh-shell-python-powered-shell\/","url_meta":{"origin":9066,"position":3},"title":"The Xonsh Shell \u2014 Python-powered shell","author":"Paolo Redaelli","date":"2026-01-23","format":false,"excerpt":"The Xonsh Shell \u2014 Python-powered shell. Python shell. Python in the shell. Shell in Python. Shell and Python. Python and shell.Xonsh (sounds like \"consh\") is a modern, full-featured and cross-platform python shell. The language is a superset of Python 3.6+ with additional shell primitives that you are used to from\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/monodes.com\/predaelli\/category\/python\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/monodes.com\/predaelli\/wp-content\/uploads\/sites\/4\/2026\/01\/xonsh.webp?fit=257%2C399&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":5473,"url":"https:\/\/monodes.com\/predaelli\/2019\/04\/09\/most-popular-programming-languages-c-knocks-python-out-of-top-three-in-new-study-slashdot\/","url_meta":{"origin":9066,"position":4},"title":"Most Popular Programming Languages: C++ Knocks Python Out of Top Three in New Study &#8211; Slashdot","author":"Paolo Redaelli","date":"2019-04-09","format":"link","excerpt":"Is it time to give C++ a second canche or to revive mi interest for Eiffel? 
Source: Most Popular Programming Languages: C++ Knocks Python Out of Top Three in New Study - Slashdot C++ has knocked machine-learning favorite Python out of the top 3 in the TIOBE Index of popular\u2026","rel":"","context":"In &quot;Mood&quot;","block_context":{"text":"Mood","link":"https:\/\/monodes.com\/predaelli\/category\/mood\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3299,"url":"https:\/\/monodes.com\/predaelli\/2017\/08\/25\/micropython-python-for-microcontrollers\/","url_meta":{"origin":9066,"position":5},"title":"MicroPython &#8211; Python for microcontrollers","author":"Paolo Redaelli","date":"2017-08-25","format":false,"excerpt":"MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimised to run on microcontrollers and in constrained environments. Sorgente: MicroPython - Python for microcontrollers Well, pretty neat, as they deliver it in 256k of\u2026","rel":"","context":"In 
&quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/monodes.com\/predaelli\/category\/python\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/9066","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/comments?post=9066"}],"version-history":[{"count":0,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/9066\/revisions"}],"wp:attachment":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/media?parent=9066"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/categories?post=9066"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/tags?post=9066"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}