I promised asyncio with async-await syntax, so here you have it :)
I fixed all the bugs I could find (quite a lot, to be exact, which
is normally a good sign), and as a result I can run programs that
use asyncio with async-await syntax without any errors and with the
same results CPython 3.5 gives.
I ported all of CPython's tests for the async-await syntax and
added some tests that check whether everything works together in a
more complex program. It's all working now.
The GIL problem I described earlier was just an incompatibility of
asyncio with the PyPy interpreter (pyinteractive). The error does not
occur otherwise.
I have been working on the py3.5-async branch in the PyPy
repository lately, and that is also where I checked that everything
is working. I also merged all my work into the main py3.5 branch.
That branch was really broken though, because there are some major
changes from py3k to py3.5. I fixed some things in there, and the
PyPy team managed to fix a lot of it as well. My mentor sometimes
took the time to get important things working so that my workflow
wouldn't get interrupted. While I was working on py3.5-async, a lot
of fixes were made on py3.5, changing the behaviour a bit and
(possibly) breaking some other things. I have not yet checked every
part of my proposal on this branch, but with some help I think I
managed to get everything working there as well. At least it all
looks very promising for now.
Besides that, I also fixed the bugs I could find in the unpacking
feature of my proposal. There were some special cases which led to
errors. For example, given a function defined as
def f(*args, **kwds) and a call like f(5, b=2, *[6, 7]), I got a
TypeError saying "expected string, got list object". The problem
here was that certain elements were in the wrong order on the stack,
so popping them the normal way would not work. I added a fix that
checks for that case; there are probably better solutions, but it
seems to work for now.
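For reference, the result that call should produce can be checked with a few lines of plain Python. This is only a sketch of the expected behaviour, not the fix itself; the function body (returning what it received) is my own choice for illustration.

```python
def f(*args, **kwds):
    # Return what was received so the argument binding can be inspected.
    return args, kwds


# A keyword argument followed by a * unpacking: the unpacked values
# still end up among the positional arguments.
args, kwds = f(5, b=2, *[6, 7])
print(args)  # (5, 6, 7)
print(kwds)  # {'b': 2}
```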
I will probably keep working on py3.5 for a little bit longer,
because I would like to do some more things with the work of my
proposal. It would be interesting to see a benchmark comparing the
speed of my work with CPython. Also, there is still a lot to be done
before py3.5 works correctly, so I might want to look into that as
well.
Here is a link to all of my work:
https://bitbucket.org/pypy/pypy/commits/all?search=Raffael+Tfirst
My experience with GSOC 2016
It's really been great to work with professional people on such a
huge project. My mind was blown when I saw how much I was able to
learn in this short time. The PyPy team motivated me to achieve the
goal of my proposal and got me even more interested in compiler
construction than I already was.
I am glad I took the chance to participate in this year's GSoC. Of
course, my luck in having such a helpful and productive team on my
side is one of the main reasons why I enjoyed it so much.
Thursday, August 18, 2016
Thursday, August 4, 2016
Only bug fixes left!
All changes of CPython have been implemented in PyPy, so all that's left to do now is fixing some bugs. Some minor changes had to be made, because not everything in CPython has to be implemented in PyPy. For example, CPython uses slots to check whether a function exists and then call it. The type of an object holds the information about its valid functions, stored as elements inside structs. Here's an example of how the __await__ function gets called in CPython:
ot = Py_TYPE(o);
if (ot->tp_as_async != NULL) {
    getter = ot->tp_as_async->am_await;
}
if (getter != NULL) {
    PyObject *res = (*getter)(o);
    if (res != NULL) { … }
    return res;
}
PyErr_Format(PyExc_TypeError,
             "object %.100s can't be used in 'await' expression",
             ot->tp_name);
return NULL;
This 'getter' points to the am_await slot in typeobject.c. There, a lookup is done with '__await__' as parameter. If the method exists, it gets called; otherwise an error is raised.
In PyPy all of this is much simpler. In practice I just replace the getter with the lookup for __await__. All I want to do is call the method __await__ if it exists, so that's all there is to it. My code now looks like this:
w_await = space.lookup(self, "__await__")
if w_await is None: …
res = space.get_and_call_function(w_await, self)
if res is not None: …
return res
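To make the idea easier to follow outside of the interpreter, here is a rough pure-Python analogue of that lookup. It is only a sketch and not PyPy code: get_awaitable_iterator and the Ready class are hypothetical names I made up, and getattr on the type stands in for space.lookup.

```python
def get_awaitable_iterator(obj):
    """Call __await__ if the type of obj defines it, else raise TypeError."""
    # Special methods are looked up on the type, not on the instance.
    await_method = getattr(type(obj), "__await__", None)
    if await_method is None:
        raise TypeError(
            "object %.100s can't be used in 'await' expression"
            % type(obj).__name__
        )
    return await_method(obj)


class Ready:
    """Hypothetical awaitable that finishes immediately with a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield from ()      # nothing to wait for
        return self.value  # becomes the result of the await expression


iterator = get_awaitable_iterator(Ready(42))
try:
    next(iterator)
except StopIteration as stop:
    result = stop.value
print(result)  # 42

try:
    get_awaitable_iterator(3)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True, because int has no __await__
```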
I also fixed the _set_sentinel problem I wrote about in the last post. All dependency problems of other (not yet covered) Python features have been fixed. I can already execute simple programs, but as soon as it gets a little more complex and uses certain asyncio features, I get an error about the GIL (global interpreter lock):
“Fatal RPython error: a thread is trying to wait for the GIL, but the GIL was not initialized”
First I have to read some descriptions of the GIL, because I am not sure where this problem could come up in my case.
There are also many minor bugs at the moment. I already fixed one of the bigger ones which didn't allow async or await to be the name of variables. I also just learned that something I implemented does not work with RPython which I wasn't aware of. My mentor is helping me out with that.
I also have to write more tests, because they are the safest and fastest way to check for errors. There are a few things I didn't test enough, so I need to catch up on writing tests a bit.
Things are not moving forward as fast as I would like, because I often run into completely new things which I need to study first (like the GIL in this case, or the memoryview objects from the last blog entry). But there really shouldn't be much left to do now until everything works, so I am pretty optimistic about the time I have left. If I keep pushing to complete this task soon, I am positive my proposal will be successful.
Thursday, July 21, 2016
Progress async and await
It's been some time, but I made quite some progress on the new async feature of Python 3.5! There is still a bit to be done though, and the end of this year's Google Summer of Code is pretty close already. Whether I can finish in time will mostly come down to luck, since I don't know how much I still have to do in order for asyncio to work. The module depends on many new features from Python 3.3 up to 3.5 that have not been implemented in PyPy yet.
Does async and await work already?
Not quite. PyPy now accepts async and await though, and checks pretty much all places where it is allowed and where it is not. In other words, the parser is complete and has been tested.
The code generator is complete as well, so the right opcodes get executed in all cases.
The new bytecode instructions I need to handle are: GET_YIELD_FROM_ITER, GET_AWAITABLE, GET_AITER, GET_ANEXT, BEFORE_ASYNC_WITH and SETUP_ASYNC_WITH.
These opcodes do not work with regular generators, but with coroutine objects. Those are based on generators, however they do not implement __iter__ and __next__ and can therefore not be iterated over. Also, plain generators cannot yield from coroutines; only generator-based coroutines (@asyncio.coroutine in asyncio) can. [1]
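A small check of these properties, as I understand them from PEP 492 and current CPython, looks like this. It is a sketch only; coro and plain_generator are hypothetical names for illustration.

```python
async def coro():
    return 1


c = coro()
has_await = hasattr(c, "__await__")
has_iter = hasattr(c, "__iter__")
has_next = hasattr(c, "__next__")
print(has_await, has_iter, has_next)  # True False False

# A coroutine object cannot be passed to iter().
try:
    iter(c)
    iterable = True
except TypeError:
    iterable = False
print(iterable)  # False


def plain_generator(inner):
    # A plain (undecorated) generator trying to delegate to a coroutine.
    yield from inner


try:
    next(plain_generator(c))
    delegated = True
except TypeError:
    delegated = False
print(delegated)  # False

c.close()  # avoids the "coroutine was never awaited" warning
```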
I started implementing the opcodes, but I can only finish them once asyncio is working: I need to test them constantly, and I can only do that with asyncio, because I am unsure which values normally lie on the stack. The same holds for some functions of coroutine objects. Coroutine objects are working, but they are missing a few functions needed for the async-await syntax feature.
These two things are all that is left to do though; everything else is tested and should therefore work.
What else has been done?
Only implementing async and await would have been too easy I guess. With it comes a problem I already mentioned: the missing dependencies from Python 3.3 up to 3.5.
The module sre (which offers support for regular expressions) was missing a macro named MAXGROUPS (from Python 3.3), and the magic number standing for the number of constants had to be updated as well. The memoryview objects also got an update in Python 3.3 that is needed for an import: they have a method called "cast" now, which converts a memoryview object to any other predefined format.
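To give an idea of what cast does, here is a short sketch using CPython's memoryview. The buffer contents are arbitrary values I picked for illustration.

```python
data = bytearray(range(6))
view = memoryview(data)

# Reinterpret the flat buffer of unsigned bytes as a 2x3 array.
grid = view.cast("B", shape=[2, 3])
print(grid.tolist())  # [[0, 1, 2], [3, 4, 5]]

# Reinterpret unsigned bytes as signed chars: 0xff becomes -1.
signed = memoryview(b"\xff\x01").cast("b")
print(signed.tolist())  # [-1, 1]
```

No data is copied here; the cast views share the memory of the original buffer.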
I just finished implementing this as well, now I am at the point where it says inside threading.py:
_set_sentinel = _thread._set_sentinel
AttributeError: 'module' object has no attribute '_set_sentinel'
What to do next?
My next goal is that asyncio works and the new opcodes are implemented. Hopefully I can write about success in my next blog post, because I am sure I will need some time to test everything afterwards.
A developer tip for execution of asyncio in pyinteractive (--withmod)
(I only write that as a hint because it gets easily skipped in the PyPy doc, or at least it happened to me. The PyPy team already thought about a solution for that though :) )
Asyncio needs some modules in order to work which are by default not loaded in pyinteractive. If someone stumbles across the problem where PyPy cannot find these modules, --withmod does the trick [2]. For now, --withmod-thread and --withmod-select are required.
[1] https://www.python.org/dev/peps/pep-0492/
[2] http://doc.pypy.org/en/latest/getting-started-dev.html#pyinteractive-py-options
Update (23.07.): asyncio can be imported and works! Well that went better than expected :)
For now only the @asyncio.coroutine way of creating coroutines is working, so for example the following code would work:
import asyncio

@asyncio.coroutine
def my_coroutine(seconds_to_sleep=3):
    print('my_coroutine sleeping for: {0} seconds'.format(seconds_to_sleep))
    yield from asyncio.sleep(seconds_to_sleep)

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(my_coroutine())
)
loop.close()
(from http://www.giantflyingsaucer.com/blog/?p=5557)
And to illustrate my goal of this project, here is an example of what I want to work properly:
import asyncio

async def coro(name, lock):
    print('coro {}: waiting for lock'.format(name))
    async with lock:
        print('coro {}: holding the lock'.format(name))
        await asyncio.sleep(1)
        print('coro {}: releasing the lock'.format(name))

loop = asyncio.get_event_loop()
lock = asyncio.Lock()
coros = asyncio.gather(coro(1, lock), coro(2, lock))
try:
    loop.run_until_complete(coros)
finally:
    loop.close()
(from https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492)
The async keyword replaces the @asyncio.coroutine decorator, and await is written instead of yield from. "async with" and "async for" are additional features: they allow execution to be suspended in the "enter" and "exit" methods (asynchronous context managers) and while iterating over asynchronous iterators, respectively.
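Since the example above only covers the context manager case, here is a minimal sketch of an asynchronous iterator as well. Countdown and collect are hypothetical names, and the code is written for current CPython, where __aiter__ returns the iterator directly.

```python
import asyncio


class Countdown:
    """Hypothetical asynchronous iterator counting down to 1."""

    def __init__(self, start):
        self.current = start

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # give other tasks a chance to run
        value = self.current
        self.current -= 1
        return value


async def collect():
    seen = []
    # Each step of the loop may suspend inside __anext__.
    async for number in Countdown(3):
        seen.append(number)
    return seen


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(collect())
finally:
    loop.close()
print(result)  # [3, 2, 1]
```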
Sunday, July 3, 2016
Unpacking done! Starting with Coroutines
It took a bit longer
than anticipated, but the additional unpacking generalizations are
finally completed.
Another good thing:
I am now finally ready to work full time and make up for the time I
lost, as I don't have to invest time into studying anymore.
The unpackings are
done slightly differently than in CPython, because in PyPy I get
objects with a different structure for the maps. So I had to filter
them for the keys and values and do a manual type check to see
whether it really is a dict. For the map unpack with call I had to
implement the intersection check. For that I just check whether a
key is already stored in the dict that gets returned.
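In plain Python, the type check and the intersection check could be sketched roughly like this. It is not the interpreter code; merge_call_kwargs is a hypothetical helper and the error messages are only modelled on what Python prints.

```python
def merge_call_kwargs(*mappings):
    """Merge several ** unpackings of one call into a single dict."""
    merged = {}
    for mapping in mappings:
        # Manual type check, as described above.
        if not isinstance(mapping, dict):
            raise TypeError(
                "argument after ** must be a dict, not %s"
                % type(mapping).__name__
            )
        for key, value in mapping.items():
            # Intersection check: the key must not be stored already.
            if key in merged:
                raise TypeError(
                    "got multiple values for keyword argument %r" % key
                )
            merged[key] = value
    return merged


merged = merge_call_kwargs({"a": 1}, {"b": 2})
print(merged == {"a": 1, "b": 2})  # True

try:
    merge_call_kwargs({"a": 1}, {"a": 2})
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```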
Now it's time to
implement coroutines with async and await syntax. There is still a
problem with the translation of PyPy, which is connected to the
missing coroutines syntax feature. I will need to get this working as
well, starting with implementing async in the parser.
Thursday, June 23, 2016
Progress summary of additional unpacking generalizations
Currently there's only so much to tell about my progress. I fixed a lot of
errors and progressed quite a bit on the unpacking task. The problems
regarding the AST generator and handler are solved. There's still an
error when trying to run PyPy though. Debugging and looking for
errors is quite a tricky undertaking because there are many ways to
check what's wrong. What I already did though is check and review
the whole code for this task, and it is as good as ready as soon as
that (hopefully) last error is fixed. I will probably find it by
comparing the bytecode instructions of CPython and PyPy, but I
still need a bit more info.
As a short description of what I implemented: until now the order of arguments allowed in function calls was predefined. The order was: positional args, keyword args, * unpack, ** unpack. The reason for this was simplicity, because people unfamiliar with the concept might get confused otherwise. Now far more combinations are allowed, setting that concern about confusion aside (as described in PEP 448). So what I had to do was check the arguments for unpackings manually, first going through the positional and then the keyword arguments. Of course some sort of priority has to stay intact, so it is defined that "positional arguments precede keyword arguments and * unpacking; * unpacking precedes ** unpacking" (PEP 448). Pretty much all changes needed for this task are implemented; there's only one more fix and one comparatively less important bytecode instruction (map unpack with call) left to be done.
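A short sketch of what the relaxed rules look like in practice, run on an interpreter that supports PEP 448. The function show and the variable names are hypothetical and only there for illustration.

```python
def show(*args, **kwargs):
    return args, kwargs


first = [1, 2]
second = (3, 4)
opts = {"sep": "-"}
more = {"end": "!"}

# Several * and ** unpackings in one call, mixed with plain arguments.
args, kwargs = show(*first, 0, *second, flag=True, **opts, **more)
print(args)            # (1, 2, 0, 3, 4)
print(sorted(kwargs))  # ['end', 'flag', 'sep']

# Some priority stays intact: a positional argument after a **
# unpacking is still rejected at compile time.
try:
    compile("show(**opts, 1)", "<example>", "eval")
    still_rejected = False
except SyntaxError:
    still_rejected = True
print(still_rejected)  # True
```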
As soon as it works, I will write the next entry in this blog. Also, next in the line is already asyncio coroutines with async and await syntax.
Short Update (25.06.): Because of the changes I had to make in PyPy, pretty much all tests failed for some time, as function calls weren't handled properly. I managed to reduce the number of failing tests by about 1/3 by fixing a lot of errors, so only a little is missing for the whole thing to work again.
Update 2 (26.06.): And the errors are all fixed! As soon as all opcodes are implemented (that's gonna be soon), I will write the promised next blog entry.
Thursday, June 9, 2016
The first two weeks working on PyPy
I could easily summarize my first two official coding weeks by saying: some things worked out while others didn’t! But let’s get more into detail.
The first visible progress!
I am happy to say that matrix multiplication works! That was pretty much the warm-up task, as it really only is an operator which no built-in Python type implements yet. It is connected to a magic method __matmul__() though, which allows user-defined types to implement it. For those who don't know, magic methods get invoked in the background while executing specific statements. In my case, using the @ operator will invoke __matmul__() (or __rmatmul__() for the reverse invocation), and @= will invoke __imatmul__(). If “0 @ x” gets evaluated for example, PyPy will now try to invoke “0.__matmul__(x)”. If that doesn't exist for int, it automatically tries to invoke “x.__rmatmul__(0)”.
As an example the following code already works:
>>> class A:
...     def __init__(self, val):
...         self.value = val
...     def __matmul__(self, other):
...         return self.value + other
...
>>> x = A(5)
>>> x @ 2
7
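The reverse and in-place variants mentioned above can be tried out the same way. This is just a sketch; the class Scale and what its methods return are my own choices for illustration.

```python
class Scale:
    def __init__(self, value):
        self.value = value

    def __matmul__(self, other):
        # Invoked for: Scale @ other
        return ("matmul", self.value, other)

    def __rmatmul__(self, other):
        # Invoked for: other @ Scale, when other has no usable __matmul__
        return ("rmatmul", other, self.value)

    def __imatmul__(self, other):
        # Invoked for: Scale @= other
        self.value = self.value * other
        return self


x = Scale(5)
forward = x @ 2
reverse = 0 @ x  # int has no __matmul__, so x.__rmatmul__(0) is used
x @= 3
print(forward)  # ('matmul', 5, 2)
print(reverse)  # ('rmatmul', 0, 5)
print(x.value)  # 15
```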
The next thing that would be really cool is a working numpy implementation for PyPy to make good use of this operator. But that is not part of my proposal, so it's probably better to focus on that later.
The extended unpacking feature is what I am currently working on. Several things have to be changed in order for this to work.
Parameters of a function are divided into formal (or positional) arguments, as well as non-keyworded (*args) and keyworded (**kwargs) variable-length arguments. I have to make sure that the last two can be used any number of times in function calls. Until now, they were only allowed once per call.
PyPy processes arguments as args, keywords, starargs (*) and kwargs (**) individually. The solution is to compute all arguments only as args and keywords, checking all types inside of them manually instead of relying on a fixed order. I'm currently in the middle of implementing that part while fixing a lot of dependencies I overlooked. I broke a lot of test cases with my changes to an important method, but that should work again soon. The task requires the bytecode instruction BUILD_SET_UNPACK to be implemented. That should already work; it still needs a few tests though.
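The idea of collapsing everything into args and keywords can be sketched in plain Python as follows. This is not how PyPy represents calls internally; flatten_call and the tags "pos", "star", "kw" and "starstar" are made up for this illustration.

```python
def flatten_call(parts):
    """Collapse the pieces of one call into (args, keywords).

    Each part is a tuple: ("pos", value), ("star", iterable),
    ("kw", name, value) or ("starstar", mapping).
    """
    args = []
    keywords = {}

    def add_keyword(name, value):
        if name in keywords:
            raise TypeError(
                "got multiple values for keyword argument %r" % name
            )
        keywords[name] = value

    for part in parts:
        tag = part[0]
        if tag == "pos":
            args.append(part[1])
        elif tag == "star":
            args.extend(part[1])  # * unpacking adds positional values
        elif tag == "kw":
            add_keyword(part[1], part[2])
        elif tag == "starstar":
            for name, value in part[1].items():
                add_keyword(name, value)
        else:
            raise ValueError("unknown part: %r" % (tag,))
    return args, keywords


# Corresponds to a call like f(5, b=2, *[6, 7], **{"c": 3})
args, keywords = flatten_call([
    ("pos", 5),
    ("kw", "b", 2),
    ("star", [6, 7]),
    ("starstar", {"c": 3}),
])
print(args)              # [5, 6, 7]
print(sorted(keywords))  # ['b', 'c']
```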
I also need to get the AST generator working with the new syntax feature. I updated the grammar, but I still get errors when trying to create the AST. I already have some clues about that though.
Some thoughts to my progress so far
The cool thing is, whenever I get stuck on errors or the giant structure of PyPy, the community helps out. That is very considerate as well as convenient, meaning the only thing I have to fight at the moment is time. Since I have to prepare for the final assignments and tests for university, I lose some time I really wanted to spend developing for my GSoC proposal. That means that I will probably fall a week behind what I had planned to accomplish by now. But I am still really calm regarding my progress, because I planned for something like that to happen. That is why I will put much more time into my proposal right after that stressful phase to compensate for the time I lost.
With all that being said, I hope that the extended unpacking feature will be ready and can be used in the coming one or two weeks.
Wednesday, May 18, 2016
GSoC 2016, let the project begin!
First of all I am
really excited to be a part of this year's Google Summer of Code! From
the first moment I heard of this event, I gave it my best to get
accepted. I am happy it all worked out :)
About me
I am a 21-year-old
student at the Technical University of Vienna (TU Wien) and currently
working on my BSc degree in software and information engineering. I
learned about GSoC through a presentation of PyPy, explaining the
project of a former participant. Since I currently attend compiler
construction lectures, I thought this project would greatly increase
my knowledge in developing compilers and interpreters. I was also
looking for an interesting project for my bachelor thesis, and those
are the things that pretty much led me here.
My Proposal
The project I am
(and will be) working on is PyPy (which is an alternative implementation of the Python language [2]). You can check out a short
description of my work here.
Here comes the long
but interesting part!
So basically I work
on implementing Python 3.5 features in PyPy. I already started and
nearly completed matrix multiplication
with the @ operator. It would have been cool to implement the matmul
method just like numpy does (a Python package for N-dimensional
arrays and matrices adding matrix multiplication support), but sadly
the core of numpy is not yet functional in PyPy.
The main advantage
of @ is that you can now write:
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
instead of:
S = dot((dot(H, beta) - r).T, dot(inv(dot(dot(H, V), H.T)), dot(H, beta) - r))
making code much more readable. [1]
I will continue with
the additional unpacking generalizations.
The cool thing of this extension is that it allows multiple
unpackings in a single function call.
Calls like
function(*args,3,4,*args,5)
are possible now.
That feature can also be used in variable assignments.
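For example, on an interpreter with PEP 448 support the following assignments work. The variable names are only for illustration.

```python
args = [1, 2]

# Several unpackings on the right-hand side of an assignment.
as_tuple = *args, 3, 4, *args, 5
as_list = [*args, *"ab"]
as_set = {*args, *[2, 3]}

# With ** in a dict display, later values win for duplicate keys.
as_dict = {**{"a": 1}, "b": 2, **{"a": 3}}

print(as_tuple)        # (1, 2, 3, 4, 1, 2, 5)
print(as_list)         # [1, 2, 'a', 'b']
print(sorted(as_set))  # [1, 2, 3]
print(as_dict == {"a": 3, "b": 2})  # True
```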
The third part of my
proposal is also the main part, and that is asyncio and coroutines with async and await syntax.
To keep this short and understandable: coroutines are functions that
can finish (or 'return' to be precise) and still remember the state
they are in at that moment. When the coroutine is called again at a
later moment, it continues where it stopped, including the values of
the local variables and the next instruction it has to execute.
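A plain generator is the simplest way to see this behaviour in Python. The following is a small sketch with a made-up function, running_total, that keeps its local variable alive between resumptions.

```python
def running_total():
    total = 0
    while True:
        # Execution pauses here; 'total' survives until the next send().
        value = yield total
        total += value


gen = running_total()
start = next(gen)     # runs up to the first yield
after_five = gen.send(5)
after_ten = gen.send(10)
print(start, after_five, after_ten)  # 0 5 15
```

Each send() continues right after the yield where the function stopped, with the local variables exactly as they were left.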
Asyncio is a Python
module that implements those coroutines. Because it is not yet
working in PyPy, I will implement this module to make coroutines
compatible with PyPy.
Python 3.5 also
allows those coroutines to be controlled with “async” and “await”
syntax. That is also a part of my proposal.
I will explain
further details as soon as it becomes necessary in order to
understand my progress.
Bonding Period
The Bonding Period
has been a great experience until now. I have to admit, it was a bit
quiet at first, because I had to finish lots of homework for my
studies. But I already got to learn a bit about the community and my
mentors before the acceptance date of the projects. So I got green
light to focus on the development of my tasks already, which is
great! That is really important for me, because it is not easy to
understand the complete structure of PyPy. Luckily there is
documentation available (here http://pypy.readthedocs.io/en/latest/)
and my mentors help me quite a bit.
My timeline has got
a little change, but with it comes a huge advantage because I will
finish my main part earlier than anticipated, allowing me to focus
more on the implementation of further features.
Until the official
start of the coding period I will polish the code of my first task,
the matrix multiplication, and read all about the parts of PyPy that
I am still a bit uneasy with.
My next blog post will
already tell you about the work of my first official coding weeks,
expect good progress!
[2] http://pypy.org/