By Atul Varma
I couldn't find anything on the web that attempted to teach Python to readers who already knew JavaScript, so I thought I'd give it a shot, since a number of my friends at Mozilla don't know much about Python but know JavaScript incredibly well. The languages actually aren't that dissimilar--in fact, some of JavaScript's latest features have been borrowed directly from Python.
Many thanks to those who have commented in my blog post about this tutorial; this rendition of it contains a number of changes as a result of that feedback.
This is a good time to explain a bit about Python's design philosophy; hopefully it will give you a better idea of whether this is a language you'd like to use.
While not syntactically enforced by many languages, whitespace is semantically meaningful during the reading and writing of code. Take the following example of C-like code:
if (someVar == 1)
  doSomething();
The line doSomething(); is indented after the if statement to indicate that it should only be done if the statement above it is true. Given this, consider what the following code does:
if (someVar == 1)
  doSomething();
  doSomethingElse();
It's clear from the use of whitespace that doSomethingElse(); should also only be executed if the statement it's indented under is true, but this is not the case for C-like languages. Indeed, the programmer must add additional code to tell the compiler what he or she means:
if (someVar == 1) {
  doSomething();
  doSomethingElse();
}
Why does the programmer have to write more code to tell the computer something it should already be able to infer from the use of whitespace?
This is actually a violation of the Don't Repeat Yourself (DRY) principle popularized by Andy Hunt and Dave Thomas. Because extra work is required when moving from a single-line clause to a multiple-line clause, it's a constant source of errors in C-like languages, and stylistic rules and arguments have been spawned as a result of this mistake in language design.
Python is one of the few languages that takes the simpler and more humane approach: whitespace has a consistent semantic meaning to the humans who write code, so the computer should take this into account when it processes the code. This relieves the programmer of having to express the same intent in multiple different ways.
So, you won't see any curly braces delimiting blocks in Python. Instead, if a statement ends with a colon, the next statement needs to be indented and begins a new block. The block ends as soon as an unindented line is encountered, like so:
if someVar == 1:
    doSomething()
    doSomethingElse()
else:
    doOtherThing()
Python is technically like JavaScript in that semicolons are optional, but its community prescribes the opposite convention: that is, semicolons should never be used to delimit statements unless absolutely necessary. This is yet another decision that reduces the cognitive burden on the programmer; indeed, many of the language features covered below were designed with a very careful eye towards readability, reducing cognitive load, and making the process of programming as enjoyable as possible.
Python, when executed with no parameters, just presents an interactive interpreter. It's similar to the SpiderMonkey/Rhino shell and xpcshell if you're familiar with those. All following code examples in this tutorial will be displayed as though they're being executed in it, like so:
>>> 1 + 2
3
>>> # Here's a comment that does nothing.
>>> print "hi!"
hi!
>>> print "This is a long " \
...       "statement that spans multiple lines."
This is a long statement that spans multiple lines.
One built-in function in particular that helps explore things in the built-in shell is dir(), which returns a list of all the attributes attached to an object:
>>> dir("a string")
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
If there's a function you're interested in learning more about, you can look at the built-in documentation metadata associated with the object--known as the docstring--by calling the built-in help() function on the object. For instance, here's how to get help on the string object's join() method:
>>> help("a string".join)
Help on built-in function join:
<BLANKLINE>
join(...)
    S.join(sequence) -> string
<BLANKLINE>
    Return a string which is the concatenation of the strings in the
    sequence.  The separator between elements is S.
<BLANKLINE>
This makes it easy and fun to explore the language and its environs.
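Here's a small sketch of how dir() can be combined with a filter to hide the double-underscore names and list only an object's "public" attributes. The helper name public_names is hypothetical, made up for this illustration, and print is written with parentheses so the snippet runs on both Python 2 and Python 3.

```python
def public_names(obj):
    """Return the attribute names of obj that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

names = public_names("a string")
print(names)

# The docstring that help() displays is also available directly,
# as the __doc__ attribute of the object.
print("a string".join.__doc__)
```

The exact list printed depends on the Python version you're running, since string methods have been added over time.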
Python comes with a standard library that provides a great deal of functionality, from enhanced introspection to serialization, logging, XML processing, database access, testing, networking, data archiving, and more. Extensive documentation for it all is contained in the Python Library Reference.
To use the functionality of a module, you'll use Python's import statement, like so:
>>> import sha
This particular line imports the sha module, which provides access to the SHA-1 message digest algorithm. At this point, sha is an object in your namespace and can be used, for instance, to create a sha object from which to generate a hex digest:
>>> sha.sha("hello").hexdigest()
'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d'
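If you're on Python 2.5 or later, the same digest is available through the hashlib module, which superseded sha; the sha module itself was removed in Python 3. A minimal sketch, assuming Python 2.6 or newer because of the b"" literal:

```python
import hashlib

# hashlib.sha1() operates on bytes; the b prefix keeps this line
# working on Python 2.6+ and Python 3 alike.
digest = hashlib.sha1(b"hello").hexdigest()
print(digest)  # the same hex digest as the sha.sha("hello") call above
```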
It's not hard to create your own modules; you can learn how to do it in the Modules section of the official Python Tutorial.
Strings in Python work a lot like they do in JavaScript, but with some added benefits.
Strings--or any sequence-like object in Python, for that matter--can be indexed by character like they can in JavaScript, with the addition that negative indexes may be used to denote items from the end of the sequence:
>>> "Hello"[-1]
'o'
Any indexable item can generally also be sliced; this is similar to String.slice in JavaScript, only built into the language:
>>> "hello"[2:4]         # Just like "hello".slice(2,4) in JS
'll'
>>> "hello"[2:]          # Just like "hello".slice(2) in JS
'llo'
>>> "hello"[:4]          # Just like "hello".slice(0,4) in JS
'hell'
It's also easy to format strings in Python. If you're familiar with C's sprintf() function, Python's string interpolation operator, %, behaves a bit like it:
>>> "Hello %s, I need %d dollars." % ("bob", 5)
'Hello bob, I need 5 dollars.'
You can find out more in the String Formatting Operations section of the Python Library Reference.
Python's expression syntax is much like that of JavaScript, or any C-like language for that matter:
>>> 9 & 1              # Bitwise operations
1
>>> 2 << 2             # Shifting
8
>>> 5 >= 3             # Comparisons
True
>>> 8 + 2 * (3 + 5)    # Arithmetic
24
>>> 1 == 1             # Equivalence
True
Some C-like expression constructs have been substituted for more readable alternatives:
>>> not True           # 'not' instead of '!'
False
>>> True and True      # 'and' instead of '&&'
True
>>> True or False      # 'or' instead of '||'
True
But there are some elements of C-like expressions that aren't supported, because they tend to be more trouble than they're worth. For instance, some constructs that can be used in expressions in C-like languages can only be used in statements in Python:
>>> a = 5              # Assignment works in statements.
>>> a += 1             # Add-assignment does too.
>>> if a = 1:          # But you can't assign in an expression.
...     pass
Traceback (most recent call last):
  ...
SyntaxError: invalid syntax
The ++ and -- unary assignment operators aren't part of the Python language.
Unlike JavaScript, Python doesn't have a concept of undefined. Instead, things that would normally cause undefined to be returned by JavaScript simply end up raising an exception in Python:
>>> "a string".foo
Traceback (most recent call last):
  ...
AttributeError: 'str' object has no attribute 'foo'
In most cases, this is for the best, as it makes debugging easier.
Python also has an analog to JavaScript's null: it's called None.
There are some differences between Python and JavaScript when it comes to equality; when using the == operator, for instance, Python compares the value of objects rather than their locations in memory:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a == b
True
The above expression is valid JavaScript code, but it would evaluate to false. Python's is operator compares object identity:
>>> a is b
False
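To make the difference between value equality and identity a little more concrete, here's a short sketch. The extra name c is introduced only for this illustration, and print is called with parentheses so the snippet runs on both Python 2 and Python 3.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a            # 'c' is another name for the same list object as 'a'

print(a == b)    # True: same contents
print(a is b)    # False: two distinct list objects
print(a is c)    # True: one object with two names

c.append(4)      # Mutating through 'c' is visible through 'a'...
print(a)         # [1, 2, 3, 4]
print(a == b)    # False: the contents now differ

# None is a singleton, so it is conventionally tested with 'is'.
result = None
print(result is None)  # True
```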
Functions are defined like so:
>>> def foo(x):
...     print "foo called with parameter: %s" % x
They are called as you'd expect:
>>> foo(5)
foo called with parameter: 5
Unlike JavaScript, though, it's not possible to call them with fewer or more arguments than they'd expect:
>>> foo()
Traceback (most recent call last):
  ...
TypeError: foo() takes exactly 1 argument (0 given)
Though it is possible to provide defaults for arguments:
>>> def bar(x, y=1, z=5):
...     return x + y + z
And it's also possible to specify arguments using keywords:
>>> bar(1, z=6)
8
You can also write documentation for functions by providing a string immediately following the function signature:
>>> def foo():
...     "Does something useless"
...     pass
As mentioned earlier, this string is called the docstring; it's actually attached to the function object as its __doc__ attribute. Creating docstrings for your functions not only helps document your code, but also makes it easier for Python users to interactively explore your code, too.
It's also possible for Python functions to have arbitrary argument lists, which is similar to JavaScript's arguments array. And as in JavaScript, functions are first-class citizens and can be passed around as parameters to other functions, returned by functions, and so forth.
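As a rough sketch of both ideas: an arbitrary argument list is declared with *args (extra positional arguments, collected into a tuple) and **kwargs (extra keyword arguments, collected into a dictionary). The function names describe, apply_twice and add_one are hypothetical and exist only for this example.

```python
def describe(first, *args, **kwargs):
    """Return a string summarizing how the function was called."""
    # 'args' is a tuple of the extra positional arguments;
    # 'kwargs' is a dictionary of the extra keyword arguments.
    return "first=%r args=%r kwargs=%r" % (first, args, kwargs)

def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))

def add_one(x):
    return x + 1

print(describe(1))                 # first=1 args=() kwargs={}
print(describe(1, 2, 3, key="v"))  # first=1 args=(2, 3) kwargs={'key': 'v'}
print(apply_twice(add_one, 5))     # 7
```

Note that add_one is passed to apply_twice without parentheses, just as you'd pass a function reference in JavaScript.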
Python, like JavaScript, is lexically scoped when it comes to reading variables.
However, Python's scoping rules for assignment to undefined variables work opposite to JavaScript's; instead of being global by default, variables are local, and there is no analog to var or let. Rather, the global keyword is used to specify that a variable be bound to global instead of local scope:
>>> a = 1              # Define our global variable.
>>> def foo(x):
...     a = x + 1      # 'a' is a new local variable.
>>> def bar(x):
...     global a       # Bind 'a' to the global scope.
...     a = x + 1
>>> foo(5)
>>> a
1
>>> bar(5)
>>> a
6
This is for the best: as it's well-known that global variables should be used as sparingly as possible, it's better for a language interpreter to assume that all new assignments are local unless explicitly told otherwise.
Lists are a lot like JavaScript arrays:
>>> mylist = ["hello", "there"]
Iterating through them is easy:
>>> for i in mylist:
...     print i
hello
there
Strings are just sequences of single-character strings, so they can be used similarly:
>>> for c in "boof":
...     print c
b
o
o
f
Tuples are just like lists, only they're immutable and differentiated from lists by using parentheses instead of brackets:
>>> mytuple = ("hello", "there")
>>> mytuple[0] = "bye"
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment
Tuples with a single item look a little weird, though:
>>> mytuple = ("hello",) # Without the comma, it'd just be a string.
It's also not possible for there to be "holes" in Python lists like there are in JavaScript arrays:
>>> a = [1, 2, 3]
>>> del a[1]           # Deletes '2'
>>> a
[1, 3]
It's also possible to index and slice lists and tuples, just like you can with strings:
>>> ["hello", "there", "dude"][-1]
'dude'
>>> [1, 2, 3][1:2]
[2]
In fact, if the datatype is mutable like lists are, you can even assign to slices:
>>> a = [1, 2, 3, 4]
>>> a[1:3] = [5]
>>> a
[1, 5, 4]
You've already seen examples of for, if, and if...else. Python also supports if...elif:
>>> if 1 == 2:
...     pass
... elif 1 == 1:
...     print "Hooray!"
... else:
...     print "Boo."
Hooray!
It also supports while:
>>> while False:
...     print "This should never display."
However, Python does not have a do...while loop.
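A common substitute for do...while is an infinite while loop whose exit test sits at the bottom of the body, so the body always runs at least once. A minimal sketch, with the variable name attempts chosen just for this example:

```python
attempts = 0
while True:
    attempts += 1          # The body always runs at least once...
    if attempts >= 3:      # ...and the exit test comes at the end.
        break

print(attempts)  # 3
```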
To loop through a range of numbers, you can use the range() built-in function, which returns a list of numbers in the range you specify:
>>> for i in range(3):
...     print i
0
1
2
Python also has break and continue statements, which work as expected.
Dictionaries are a bit like Object literals in JavaScript:
>>> d = {"foo" : 1, "bar" : 2}
>>> d["foo"]
1
Their properties can't be referenced using dot notation, though:
>>> d.foo
Traceback (most recent call last):
  ...
AttributeError: 'dict' object has no attribute 'foo'
Since Python doesn't have a notion of undefined, the easiest way to check whether a dictionary has a key is through the in keyword:
>>> "a" in {"a" : 1, "b" : 2}
True
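Here's a brief sketch of what happens when a key is missing, along with the dictionary's get() method, which returns a default value instead of raising an exception. The variable names are made up for this example.

```python
d = {"a": 1, "b": 2}

print("a" in d)        # True
print("z" in d)        # False

# Indexing with a missing key raises KeyError rather than
# evaluating to something like JavaScript's undefined.
try:
    d["z"]
except KeyError:
    print("no such key")

# get() returns None, or a default you supply, when the key is absent.
missing = d.get("z")
fallback = d.get("z", 0)
print(missing)         # None
print(fallback)        # 0
```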
Dictionaries can also be used as operands for string formatting operations:
>>> d = {"name" : "bob", "money" : 5}
>>> "Hello %(name)s, I need %(money)d dollars." % d
'Hello bob, I need 5 dollars.'
Keys for dictionaries can actually be any immutable type; this means that, for instance, tuples can be used as keys:
>>> a = {(1,2) : 1}
But lists can't:
>>> b = {[1,2] : 1}
Traceback (most recent call last):
  ...
TypeError: list objects are unhashable
Python dictionaries generally aren't used to create arbitrary objects like they are in JavaScript; they don't have prototypes, nor do they have meta-methods. Instead, classes are used to do that sort of thing. In some ways, this is unfortunate, since the simplicity of conflating objects with dictionaries, as JavaScript and Lua do, makes understanding and using them easier. But in exchange, dictionaries do come pre-packaged with a bevy of useful methods.
Classes are pretty straightforward:
>>> class Foo(object):
...     def __init__(self, a):
...         self.a = a
...         print "Foo created."
...     def doThing(self):
...         return self.a + 1
Here Foo is a subclass of object, which is the root object class that any class should ultimately descend from. The constructor is always called __init__() and is invoked like so:
>>> f = Foo(1)
Foo created.
So you don't need to use a new operator or anything as is the case with JS. Calling methods and accessing attributes is straightforward too:
>>> f.a
1
>>> f.doThing()
2
An object's methods are also bound to the object itself once it's created; that is, the self parameter that's passed to them is always the same, unlike the this parameter in JavaScript which changes based on the object the function is attached to:
>>> f = Foo(5)
Foo created.
>>> doThing = f.doThing
>>> doThing()
6
Do make sure that you always remember to include self as an explicit parameter in class methods, though; failure to do so can lead to some strange results:
>>> class Foo(object):
...     def bar(x):
...         return x + 1
>>> f = Foo()
>>> f.bar()
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: 'Foo' and 'int'
>>> f.bar(1)
Traceback (most recent call last):
  ...
TypeError: bar() takes exactly 1 argument (2 given)
As you can see, classes in Python aren't particularly elegant; it's hard to understand exactly why things work the way they do unless you understand how classes are implemented "under the hood", which is unfortunate.
Because classes in Python aren't really prototype-based, it's not easy to dynamically add or remove methods to existing objects on-the-fly--though some will probably tell you that doing such a thing isn't a good idea in the first place. In practice, all of Python's built-in types come with a well-designed retinue of methods, so there's little need for one to want to add methods to them on-the-fly, which certainly isn't the case in JavaScript.
Another advantage of Python's class mechanism is that you get inheritance for free:
>>> class A(object):
...     def foo(self):
...         print "In A.foo()."
>>> class B(A):
...     def bar(self):
...         print "In B.bar()."
>>> b = B()
>>> b.foo()
In A.foo().
>>> b.bar()
In B.bar().
Overriding superclass methods is a bit odd syntactically, though:
>>> class C(B):
...     def foo(self):
...         super(C, self).foo()
...         print "In C.foo()."
>>> c = C()
>>> c.foo()
In A.foo().
In C.foo().
You can achieve the equivalent of JavaScript's getters and setters by creating a property in a class definition:
>>> class Foo(object):
...     def _get_bar(self):
...         print "getting bar!"
...         return 5
...     bar = property(fget = _get_bar)
Not quite as elegant as JavaScript's get keyword in an object initializer, but it gets the job done:
>>> f = Foo()
>>> f.bar
getting bar!
5
Note that since we didn't define a setter, we've effectively created a read-only attribute:
>>> f.bar = 5
Traceback (most recent call last):
  ...
AttributeError: can't set attribute
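For comparison, here's a sketch of a property that does define a setter, using the fset argument of the built-in property() function. The Temperature class and its attribute names are hypothetical, invented for this example.

```python
class Temperature(object):
    def __init__(self, celsius):
        self._celsius = celsius

    def _get_fahrenheit(self):
        return self._celsius * 9.0 / 5.0 + 32

    def _set_fahrenheit(self, value):
        # Store the value internally in Celsius.
        self._celsius = (value - 32) * 5.0 / 9.0

    # Passing both fget and fset makes the attribute readable and writable.
    fahrenheit = property(fget=_get_fahrenheit, fset=_set_fahrenheit)

t = Temperature(100)
print(t.fahrenheit)   # 212.0
t.fahrenheit = 32     # Invokes _set_fahrenheit(t, 32).
print(t._celsius)     # 0.0
```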
Classes can define methods with special names to do all sorts of dynamic things, from operator overloading to custom attribute access and more. You can read more about them in the Python Reference Manual's section on special method names.
Exceptions work as expected, and there are a number of built-in ones.
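As a small illustration of operator overloading through special method names, here's a sketch of a two-dimensional vector. The Vector class is hypothetical; the special names __add__, __eq__ and __repr__ are the ones Python looks up for +, == and the interpreter's display of an object, respectively.

```python
class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):      # Called for 'self + other'.
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # Called for 'self == other'.
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):            # Used when the object is displayed.
        return "Vector(%r, %r)" % (self.x, self.y)

total = Vector(1, 2) + Vector(3, 4)
print(total)                   # Vector(4, 6)
print(total == Vector(4, 6))   # True
```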
Python prefers the term raise to JavaScript's throw, and except to JavaScript's catch. Given this, the following code is fairly self-explanatory:
>>> try:
...     raise Exception("Oof")
... except Exception, e:
...     print "Caught an exception: %s" % e
Caught an exception: Oof
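Here's a slightly longer sketch showing a custom exception class, multiple except clauses, and a finally clause, which plays the same role as finally in JavaScript. The names ParseError and parse_positive are hypothetical. This snippet uses the "except ... as e" spelling, which Python 2.6 and later accept alongside the comma form shown above.

```python
class ParseError(Exception):
    """Hypothetical custom exception, used only for this example."""

def parse_positive(text):
    value = int(text)            # May raise the built-in ValueError.
    if value <= 0:
        raise ParseError("expected a positive number, got %d" % value)
    return value

results = []
attempted = 0
for text in ["5", "-2", "abc"]:
    try:
        results.append(parse_positive(text))
    except ParseError as e:      # Handles our own exception type.
        results.append("parse error: %s" % e)
    except ValueError:           # Handles int() failing on "abc".
        results.append("not a number")
    finally:                     # Runs whether or not an exception occurred.
        attempted += 1

print(results)
print(attempted)  # 3
```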
Function closures are available in Python:
>>> def myfunc():
...     a = 1
...     def wrapped():
...         return a
...     return wrapped
>>> myfunc()()
1
Unlike JavaScript, however, the variable bindings in the closure are "read-only":
>>> def myfunc():
...     a = 1
...     def wrapped():
...         a += 1                 # Doesn't work!
...         return a
...     return wrapped
>>> myfunc()()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'a' referenced before assignment
This means that closures can't be used to access private variables like they can in JavaScript; instead, everything is visible, and implementation-specific variables are conventionally preceded with one or two underscores.
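Note that the restriction is on rebinding the enclosed name, not on mutating the object it refers to. One workaround you may run across in existing code is to keep the value inside a mutable container such as a one-item list. This is only a sketch of that idiom, with make_counter as a hypothetical name:

```python
def make_counter():
    count = [0]              # A one-item list standing in for a variable.
    def increment():
        count[0] += 1        # Mutates the list; 'count' is never rebound.
        return count[0]
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```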
As mentioned at the beginning of this document, some of JavaScript's latest features have been borrowed directly from Python.
In particular, generators, iterators, generator expressions, and list comprehensions work almost identically to their JavaScript 1.7 counterparts.
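For readers who haven't used the JavaScript 1.7 versions, here's a quick sketch of what these features look like in Python. The function name countdown is made up for this example.

```python
def countdown(n):
    """A generator: each 'yield' hands one value to the caller."""
    while n > 0:
        yield n
        n -= 1

# A list comprehension builds a list from any iterable.
squares = [x * x for x in range(5)]

# An optional 'if' clause filters the items.
even_squares = [x * x for x in range(5) if x % 2 == 0]

# A generator expression looks similar but produces values lazily.
total = sum(x * x for x in range(5))

print(list(countdown(3)))  # [3, 2, 1]
print(squares)             # [0, 1, 4, 9, 16]
print(even_squares)        # [0, 4, 16]
print(total)               # 30
```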
As with any language, there are a few wrinkles in Python's design and history that any newcomer should be aware of. I'll try to outline the most important ones below.
Sometimes, strings are the bane of Python programming. Unlike JavaScript, in which every string is unicode, strings in Python are really more like immutable arrays of bytes. Unicode strings are an entirely different type, and unicode literals must be prepended with a u, like so:
>>> u"I am a unicode string."
u'I am a unicode string.'
>>> "I am a non-unicode string."
'I am a non-unicode string.'
The non-intuitiveness of this is due to historical reasons: Python is an older language than JavaScript and dates back to 1991, so the language didn't originally support unicode. When support was added, it was added in a way that didn't break backwards compatibility. This situation will be resolved in Python 3000, the first version of Python to break backwards compatibility with previous versions.
A string with a character encoding may be converted to a unicode object through the decode() method, like so:
>>> "Here is an ellipsis: \xe2\x80\xa6".decode("utf-8")
u'Here is an ellipsis: \u2026'
Conversely, you can convert a unicode object into a string via the encode() method:
>>> u"Here is an ellipsis: \u2026".encode("utf-8")
'Here is an ellipsis: \xe2\x80\xa6'
An exception will be raised if there are characters that aren't supported by the encoding you specify, though:
>>> u"hello\u2026".encode("ascii")
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 5: ordinal not in range(128)
As such, it's a good idea to optionally specify an algorithm to deal with characters that aren't supported by the encoding:
>>> u"hello\u2026".encode("ascii", "ignore")
'hello'
>>> u"hello\u2026".encode("ascii", "xmlcharrefreplace")
'hello&#8230;'
There are also some bumps in the history of Python's class mechanism: until version 2.2, Python's built-in types weren't part of the class hierarchy, and there was no root object class. Classes of that kind are known as old-style classes, and are mentioned here solely because you may run across them when reading old code. They don't support a lot of the things that new-style classes do, and should be avoided if at all possible. You can tell whether an object is an instance of an old-style or new-style class by using the type built-in function:
>>> class OldStyle:              # No superclass means it's old-style.
...     pass
>>> class NewStyle(object):
...     pass
>>> type(OldStyle())
<type 'instance'>
>>> type(NewStyle())
<class 'NewStyle'>
A number of the class mechanisms outlined in this tutorial, such as the property() and super() built-in functions, don't work with old-style classes. Fortunately, as with the string/unicode schism, this confusion will be resolved in Python 3000, which abandons old-style classes to their well-deserved fate.
Python has a coding convention that's generally been embraced throughout the community; almost all libraries use it. It's contained in PEP 8.
One of the most useful features of Python is one of its standard library modules. The doctest module allows you to test interactive interpreter excerpts embedded either in the docstrings of your Python code or separate files to verify their correctness. It's an excellent way to turn your documentation into your unit tests, and it's also how the document you're reading right now is tested for accuracy.
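To give a feel for how this works, here's a minimal sketch of a function whose docstring contains interpreter excerpts, followed by code that runs those excerpts through doctest. The function add is hypothetical; DocTestParser and DocTestRunner are classes in the standard doctest module.

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(1, 2)
    3
    >>> add("py", "thon")
    'python'
    """
    return a + b

# Extract the interpreter excerpts from the docstring...
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)

# ...and run them, comparing the actual output to the expected output.
runner = doctest.DocTestRunner(verbose=False)
failed, attempted = runner.run(test)

print(failed)     # 0
print(attempted)  # 2
```

In an ordinary module, the more common entry point is doctest.testmod(), which finds and runs every excerpt in the module's docstrings; the lower-level classes are used here only so the sketch doesn't depend on how the file is run.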
If you like what you've seen of the language, I highly recommend reading David Beazley's Python Essential Reference, which features a much more thorough and concise overview of the language.
It's also a good idea to become involved with the Python community; it's very friendly and helpful. In particular, you may want to join the tutor mailing list, and a local user group if your area has one.