Bazaar Developer Guide

This document describes the Bazaar internals and the development process. It's meant for people interested in developing Bazaar, and some parts will also be useful to people developing Bazaar plugins.

If you have any questions or something seems to be incorrect, unclear or missing, please talk to us in irc://irc.freenode.net/#bzr, or write to the Bazaar mailing list. To propose a correction or addition to this document, send a merge request or new text to the mailing list.

The current version of this document is available in the file doc/developers/HACKING.txt in the source tree, or at http://doc.bazaar-vcs.org/bzr.dev/en/developer-guide/HACKING.html

See also: Bazaar Developer Documentation Catalog.


Getting Started

Exploring the Bazaar Platform

Before making changes, it's a good idea to explore the work already done by others. Perhaps the new feature or improvement you're looking for is already available in another plugin? If you find a bug, perhaps someone else has already fixed it?

To answer these questions and more, take a moment to explore the overall Bazaar Platform. Here are some links to browse:

If nothing else, perhaps you'll find inspiration in how other developers have solved their challenges.

Planning and Discussing Changes

There is a very active community around Bazaar. Mostly we meet on IRC (#bzr on irc.freenode.net) and on the mailing list. To join the Bazaar community, see http://bazaar-vcs.org/BzrSupport.

If you are planning to make a change, it's a very good idea to mention it on the IRC channel and/or on the mailing list. There are many advantages to involving the community before you spend much time on a change. These include:

  • you get to build on the wisdom of others, saving time
  • if others can direct you to similar code, it minimises the work to be done
  • it assists everyone in coordinating direction, priorities and effort.

In summary, maximising the input from others typically minimises the total effort required to get your changes merged. The community is friendly, helpful and always keen to welcome newcomers.

Bazaar Development in a Nutshell

Looking for a 10-minute introduction to submitting a change? See http://bazaar-vcs.org/BzrGivingBack.

TODO: Merge that Wiki page into this document.

Understanding the Development Process

The development team follows many practices including:

  • a public roadmap and planning process in which anyone can participate
  • time-based milestones everyone can work towards and plan around
  • extensive code review and feedback to contributors
  • complete and rigorous test coverage on any code contributed
  • automated validation that all tests still pass before code is merged into the main code branch.

The key tools we use to enable these practices are:

For further information, see http://bazaar-vcs.org/BzrDevelopment.

Preparing a Sandbox for Making Changes to Bazaar

Bazaar supports many ways of organising your work. See http://bazaar-vcs.org/SharedRepositoryLayouts for a summary of the popular alternatives.

Of course, the best choice for you will depend on numerous factors: the number of changes you may be making, the complexity of the changes, etc. As a starting suggestion though:

  • create a local copy of the main development branch (bzr.dev) by using this command:

    bzr branch http://bazaar-vcs.org/bzr/bzr.dev/ bzr.dev
    
  • keep your copy of bzr.dev pristine (by not developing in it) and keep it up to date (by using bzr pull)

  • create a new branch off your local bzr.dev copy for each issue (bug or feature) you are working on.

This approach makes it easy to go back and make any required changes after a code review. Resubmitting the change is then simple, with no risk of accidentally including edits related to other issues you may be working on. After the changes for an issue are accepted and merged, the associated branch can be deleted or archived as you wish.

Navigating the Code Base

Some of the key files in the top-level directory of the source tree are:

bzr
The command you run to start Bazaar itself. This script is pretty short and just does some checks then jumps into bzrlib.
README
This file covers a brief introduction to Bazaar and lists some of its key features.
NEWS
Summary of changes in each Bazaar release that can affect users or plugin developers.
setup.py
Installs Bazaar system-wide or to your home directory. To perform development work on Bazaar it is not required to run this file: you can simply run the bzr command from the top level directory of your development copy. Note that running setup.py will create a 'build' directory in your development branch. There's nothing wrong with this, but don't be confused by it. The build process puts a copy of the main code base into this build directory, along with some other files. You don't need to go in here for anything discussed in this guide.
bzrlib
Possibly the most exciting folder of all, bzrlib holds the main code base. This is where you will go to edit Python files and contribute to Bazaar.
doc
Holds documentation on a whole range of things on Bazaar from the origination of ideas within the project to information on Bazaar features and use cases. Within this directory there is a subdirectory for each translation into a human language. All the documentation is in the ReStructuredText markup language.
doc/developers
Documentation specifically targeted at Bazaar and plugin developers. (Including this document.)

Automatically-generated API reference information is available at <http://starship.python.net/crew/mwh/bzrlibapi/>.

See also the Bazaar Architectural Overview.

The Code Review Process

All code changes coming in to Bazaar are reviewed by someone else. Normally changes by core contributors are reviewed by one other core developer, and changes from other people are reviewed by two core developers. Use intelligent discretion if the patch is trivial.

Good reviews do take time. They also regularly require a solid understanding of the overall code base. In practice, this means a small number of people often have a large review burden: with knowledge comes responsibility. No one likes their merge requests sitting in a queue going nowhere, so reviewing sooner rather than later is strongly encouraged.

Sending patches for review

If you'd like to propose a change, please post to the bazaar@lists.canonical.com list with a bundle, patch, or link to a branch. Put [PATCH] or [MERGE] in the subject so Bundle Buggy can pick it out, and explain the change in the email message text. Remember to update the NEWS file as part of your change if it makes any changes visible to users or plugin developers. Please include a diff against mainline if you're giving a link to a branch.

You can generate a merge request like this:

bzr send -o bug-1234.patch

A .patch extension is recommended instead of .bundle as many mail clients will send the latter as a binary file.

bzr send can also send mail directly if you prefer; see the help.

Please do NOT put [PATCH] or [MERGE] in the subject line if you don't want it to be merged. If you want comments from developers rather than to be merged, you can put [RFC] in the subject line.

If this change addresses a bug, please put the bug number in the subject line too, in the form [#1] so that Bundle Buggy can recognize it.

If the change is intended for a particular release, mark that in the subject too, e.g. [1.6].

Review cover letters

Please put a "cover letter" on your merge request explaining:

  • the reason why you're making this change
  • how this change achieves this purpose
  • anything else you may have fixed in passing
  • anything significant that you thought of doing, such as a more extensive fix or a different approach, but didn't or couldn't do now

A good cover letter makes reviewers' lives easier because they can decide from the letter whether they agree with the purpose and approach, and then assess whether the patch actually does what the cover letter says. Explaining any "drive-by fixes" or roads not taken may also avoid queries from the reviewer. All in all this should give faster and better reviews. Sometimes writing the cover letter helps the submitter realize something else they need to do. The size of the cover letter should be proportional to the size and complexity of the patch.

Reviewing proposed changes

Anyone is welcome to review code, and reply to the thread with their opinion or comments.

The simplest way to review a proposed change is to just read the patch on the list or in Bundle Buggy. For more complex changes it may be useful to make a new working tree or branch from trunk, and merge the proposed change into it, so you can experiment with the code or look at a wider context.

There are three main requirements for code to get in:

  • Doesn't reduce test coverage: if it adds new methods or commands, there should be tests for them. There is a good test framework and plenty of examples to crib from, but if you are having trouble working out how to test something feel free to post a draft patch and ask for help.
  • Doesn't reduce design clarity, such as by entangling objects we're trying to separate. This is mostly something the more experienced reviewers need to help check.
  • Fixes bugs or improves features, speed, or code simplicity.

Code that goes in should not degrade any of these aspects. Patches that only clean up the code without changing the external behaviour are welcome. The core developers take care to keep the code quality high and understandable while recognising that perfect is sometimes the enemy of good.

Reviews often make people notice other things that should be fixed, but those things should not hold up acceptance of the original fix. New issues can easily be recorded in the Bug Tracker instead.

It's normally much easier to review several smaller patches than one large one. You might want to use bzr-loom to maintain threads of related work, or submit a preparatory patch that will make your "real" change easier.

Checklist for reviewers

  • Do you understand what the code's doing and why?
  • Will it perform reasonably for large inputs, both in memory size and run time? Are there some scenarios where performance should be measured?
  • Is it tested, and are the tests at the right level? Are there both blackbox (command-line level) and API-oriented tests?
  • If this change will be visible to end users or API users, is it appropriately documented in NEWS?
  • Does it meet the coding standards below?
  • If it changes the user-visible behaviour, does it update the help strings and user documentation?
  • If it adds a new major concept or standard practice, does it update the developer documentation?
  • (your ideas here...)

Bundle Buggy and review outcomes

Anyone can "vote" on the mailing list by expressing an opinion. Core developers can also vote using Bundle Buggy. Here are the voting codes and their explanations.

approve: Reviewer wants this submission merged.
tweak: Reviewer wants this submission merged with small changes. (No re-review required.)
abstain: Reviewer does not intend to vote on this patch.
resubmit: Please make changes and resubmit for review.
reject: Reviewer doesn't want this kind of change merged.
comment: Not really a vote. Reviewer just wants to comment, for now.

If a change gets two approvals from core reviewers, and no rejections, then it's OK to come in. Any of the core developers can bring it into the bzr.dev trunk and backport it to maintenance branches if required. The Release Manager will merge the change into the branch for a pending release, if any. As a guideline, core developers usually merge their own changes and volunteer to merge other contributions if they were the second reviewer to agree to a change.

To track the progress of proposed changes, use Bundle Buggy. See http://bundlebuggy.aaronbentley.com/help for a link to all the outstanding merge requests together with an explanation of the columns. Bundle Buggy will also mail you a link to track just your change.

Coding Style Guidelines

hasattr and getattr

hasattr should not be used because it swallows exceptions including KeyboardInterrupt. Instead, say something like

if getattr(thing, 'name', None) is None:
    pass  # thing has no 'name' attribute, or it is set to None
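When None is itself a meaningful value, a private sentinel lets you tell "missing" apart from "set to None" without resorting to hasattr. This is a small sketch; the names _MISSING, describe_name, Named and Unnamed are illustrative only and not part of bzrlib.

```python
# Illustrative sentinel: any unique object() will do.
_MISSING = object()


def describe_name(thing):
    """Describe thing's 'name' attribute without using hasattr."""
    # getattr with a default only treats AttributeError as "missing";
    # any other exception raised during the lookup still propagates.
    value = getattr(thing, 'name', _MISSING)
    if value is _MISSING:
        return 'no name attribute'
    if value is None:
        return 'name is None'
    return 'name is %r' % (value,)


class Named(object):
    name = 'trunk'


class Unnamed(object):
    pass


print(describe_name(Named()))    # name is 'trunk'
print(describe_name(Unnamed()))  # no name attribute
```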

Code layout

Please write PEP 8 compliant code.

One often-missed requirement is that the first line of docstrings should be a self-contained one-sentence summary.

We use 4 space indents for blocks, and never use tab characters. (In vim, set expandtab.)

No trailing white space is allowed.

Unix style newlines (LF) are used.

Each file must have a newline at the end of it.

Lines should be no more than 79 characters if at all possible. Lines that continue a long statement may be indented in either of two ways:

within the parenthesis or other character that opens the block, e.g.:

my_long_method(arg1,
               arg2,
               arg3)

or indented by four spaces:

my_long_method(arg1,
    arg2,
    arg3)

The first is considered clearer by some people; however it can be a bit harder to maintain (e.g. when the method name changes), and it does not work well if the relevant parenthesis is already far to the right. Avoid this:

self.legbone.kneebone.shinbone.toebone.shake_it(one,
                                                two,
                                                three)

but rather

self.legbone.kneebone.shinbone.toebone.shake_it(one,
    two,
    three)

or

self.legbone.kneebone.shinbone.toebone.shake_it(
    one, two, three)

For long lists, we like to add a trailing comma and put the closing character on the following line. This makes it easier to add new items in future:

from bzrlib.goo import (
    jam,
    jelly,
    marmalade,
    )

There should be a space after each comma between function parameters, but no spaces around the = between a keyword name and its value:

call(1, 3, cheese=quark)

In emacs:

;(defface my-invalid-face
;  '((t (:background "Red" :underline t)))
;  "Face used to highlight invalid constructs or other uglyties"
;  )

(defun my-python-mode-hook ()
 ;; setup preferred indentation style.
 (setq fill-column 79)
 (setq indent-tabs-mode nil) ; no tabs, never, I will not repeat
;  (font-lock-add-keywords 'python-mode
;                         '(("^\\s *\t" . 'my-invalid-face) ; Leading tabs
;                            ("[ \t]+$" . 'my-invalid-face)  ; Trailing spaces
;                            ("^[ \t]+$" . 'my-invalid-face)); Spaces only
;                          )
 )

(add-hook 'python-mode-hook 'my-python-mode-hook)

The lines beginning with ';' are comments. Uncomment them if you want tab and whitespace violations to be highlighted prominently.

Module Imports

  • Imports should be done at the top-level of the file, unless there is a strong reason to have them lazily loaded when a particular function runs. Import statements have a cost, so try to make sure they don't run inside hot functions.
  • Module names should always be given fully-qualified, i.e. bzrlib.hashcache not just hashcache.

Naming

Functions, methods or members that are "private" to bzrlib are given a leading underscore prefix. Names without a leading underscore are public not just across modules but to programmers using bzrlib as an API. As a consequence, a leading underscore is appropriate for names exposed across modules but that are not to be exposed to bzrlib API programmers.

We prefer class names to be concatenated capital words (TestCase) and variables, methods and functions to be lowercase words joined by underscores (revision_id, get_revision).

For the purposes of naming, some names are treated as single compound words: "filename", "revno".

Consider naming classes as nouns and functions/methods as verbs.

Try to avoid using abbreviations in names, because there can be inconsistency if other people use the full name.

Standard Names

revision_id not rev_id or revid

Functions that transform one thing to another should be named x_to_y (not x2y as occurs in some old code.)

Destructors

Python destructors (__del__) work differently to those of other languages. In particular, bear in mind that destructors may be called immediately when the object apparently becomes unreferenced, or at some later time, or possibly never at all. Therefore we have restrictions on what can be done inside them.

  1. If you think you need to use a __del__ method ask another developer for alternatives. If you do need to use one, explain why in a comment.
  2. Never rely on a __del__ method running. If there is code that must run, do it from a finally block instead.
  3. Never import from inside a __del__ method, or you may crash the interpreter!!
  4. In some places we raise a warning from the destructor if the object has not been cleaned up or closed. This is considered OK: the warning may not catch every case but it's still useful sometimes.
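The rules above can be sketched as follows, assuming a hypothetical LockedResource class: essential cleanup runs from a finally block, and the destructor only warns if the object was never closed. Note that warnings is imported at module level, never inside __del__.

```python
import warnings


class LockedResource(object):
    """Hypothetical resource that must be explicitly closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        # Rule 4: only warn here; never do essential cleanup or imports.
        if not self.closed:
            warnings.warn('%s was not closed' % (self.__class__.__name__,))


def use_resource():
    resource = LockedResource()
    try:
        pass  # ... work with the resource here ...
    finally:
        # Rule 2: code that must run goes in a finally block.
        resource.close()
    return resource
```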

Factories

In some places we have variables which point to callables that construct new instances. That is to say, they can be used a lot like class objects, but they shouldn't be named like classes:

> I think that things named FooBar should create instances of FooBar when
> called. It's plain confusing for them to do otherwise. When we have
> something that is going to be used as a class - that is, checked for via
> isinstance or other such idioms - then I would call it foo_class, so that
> it is clear that a callable is not sufficient. If it is only used as a
> factory, then yes, foo_factory is what I would use.
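To illustrate the naming rule, here is a sketch in which Branch, open_branch, branch_factory and branch_class are all invented for the example: a variable that is only ever called gets a _factory suffix, while one that is also used with isinstance gets a _class suffix.

```python
class Branch(object):
    """Stand-in class for illustration."""

    def __init__(self, location):
        self.location = location


def open_branch(location):
    """A plain function that builds Branch instances."""
    return Branch(location)


# Only ever called to make objects: name it *_factory, not like a class.
branch_factory = open_branch

# Also used with isinstance(), so any callable would not do: *_class.
branch_class = Branch


def make_and_check(location):
    branch = branch_factory(location)
    if not isinstance(branch, branch_class):
        raise TypeError('factory returned %r' % (branch,))
    return branch
```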

Registries

Several places in Bazaar use (or will use) a registry, which is a mapping from names to objects or classes. The registry allows for loading in registered code only when it's needed, and keeping associated information such as a help string or description.
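As a stand-alone illustration of the idea (not Bazaar's own registry class; SimpleRegistry and its method names are hypothetical), a registry can store either an object or just the module and member name to load later, together with a help string:

```python
import importlib


class SimpleRegistry(object):
    """Hypothetical name -> object mapping with optional lazy loading."""

    def __init__(self):
        # key -> (object or None, (module_name, member_name) or None, help)
        self._entries = {}

    def register(self, key, obj, help_text=None):
        self._entries[key] = (obj, None, help_text)

    def register_lazy(self, key, module_name, member_name, help_text=None):
        # Record only where to find the object; nothing is imported yet.
        self._entries[key] = (None, (module_name, member_name), help_text)

    def get(self, key):
        obj, location, help_text = self._entries[key]
        if location is not None:
            module_name, member_name = location
            module = importlib.import_module(module_name)
            obj = getattr(module, member_name)
            # Cache the loaded object so the import happens only once.
            self._entries[key] = (obj, None, help_text)
        return obj

    def get_help(self, key):
        return self._entries[key][2]

    def keys(self):
        return sorted(self._entries)
```

For example, registry.register_lazy('json', 'json', 'dumps', help_text='Serialise as JSON') defers importing json until registry.get('json') is first called.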

InterObject and multiple dispatch

The InterObject provides for two-way multiple dispatch: matching up for example a source and destination repository to find the right way to transfer data between them.

There is a subclass of InterObject for each type of object that is dispatched this way, e.g. InterRepository. Calling .get() on this class will return an InterObject instance providing the best match for those parameters, and this instance then has methods for operations between the objects.

inter = InterRepository.get(source_repo, target_repo)
inter.fetch(revision_id)

InterRepository also acts as a registry-like object for its subclasses, and they can be added through .register_optimizer. The right one to run is selected by asking each class, in reverse order of registration, whether it .is_compatible with the relevant objects.
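The selection rule can be sketched with a simplified, hypothetical dispatcher. InterObjectSketch, InterRepoSketch and InterSameFormat are invented names, and repositories are modelled as plain dicts; this illustrates the reverse-registration-order lookup rather than reproducing bzrlib's classes.

```python
class InterObjectSketch(object):
    """Hypothetical, simplified two-way dispatcher."""

    _optimisers = []

    def __init__(self, source, target):
        self.source = source
        self.target = target

    @classmethod
    def register_optimizer(cls, optimiser):
        cls._optimisers.append(optimiser)

    @classmethod
    def get(cls, source, target):
        # Ask the most recently registered classes first.
        for optimiser in reversed(cls._optimisers):
            if optimiser.is_compatible(source, target):
                return optimiser(source, target)
        # No specialised match: fall back to the generic implementation.
        return cls(source, target)


class InterRepoSketch(InterObjectSketch):
    _optimisers = []  # a separate list per dispatched type

    def fetch(self, revision_id):
        return 'generic fetch of %s' % (revision_id,)


class InterSameFormat(InterRepoSketch):

    @staticmethod
    def is_compatible(source, target):
        return source['format'] == target['format']

    def fetch(self, revision_id):
        return 'fast copy of %s' % (revision_id,)


InterRepoSketch.register_optimizer(InterSameFormat)
```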

Lazy Imports

To make startup time faster, we use the bzrlib.lazy_import module to delay importing modules until they are actually used. lazy_import uses the same syntax as regular python imports. So to import a few modules in a lazy fashion do:

from bzrlib.lazy_import import lazy_import
lazy_import(globals(), """
import os
import subprocess
import sys
import time

from bzrlib import (
   errors,
   transport,
   revision as _mod_revision,
   )
import bzrlib.transport
import bzrlib.xml5
""")

At this point, all of these exist as ImportReplacer objects, ready to be imported once a member is accessed. Also, when importing a module into the local namespace where its name is likely to clash with variable names, it is recommended to prefix it as _mod_<module>. This makes it clearer that the variable is a module, and these objects should be hidden anyway, since they shouldn't be imported into other namespaces.

While it is possible for lazy_import() to import members of a module using the from module import member syntax, it is recommended to use that syntax only to load submodules (from module import submodule). This is because variables and classes can frequently be used without accessing any of their members, and member access is what triggers the real import. For example:

lazy_import(globals(), """
from module import MyClass
""")

def test(x):
    return isinstance(x, MyClass)

This will incorrectly fail, because MyClass is an ImportReplacer object rather than the real class.

It is also incorrect to assign ImportReplacer objects to other variables. Because the replacer only knows about the original name, it is unable to replace other variables. The ImportReplacer class will raise an IllegalUseOfScopeReplacer exception if it can figure out that this happened, but that requires accessing a member more than once through the new variable, so some bugs are not detected right away.
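To see why aliasing is a problem, consider a much-simplified, hypothetical placeholder (LazyModule is not bzrlib's ImportReplacer, and it does not implement the IllegalUseOfScopeReplacer check). On first member access it imports the real module and swaps it into the scope under the original name only, so any copy of the placeholder stays a placeholder.

```python
import importlib


class LazyModule(object):
    """Hypothetical placeholder that imports a module on first use."""

    def __init__(self, scope, name, module_name):
        self._scope = scope
        self._name = name
        self._module_name = module_name

    def __getattr__(self, attr):
        # Only reached for names not found on the placeholder itself,
        # i.e. on the first real use of the module.
        module = importlib.import_module(self._module_name)
        # Replace the placeholder in the scope under its original name.
        self._scope[self._name] = module
        return getattr(module, attr)


namespace = {}
namespace['json'] = LazyModule(namespace, 'json', 'json')
alias = namespace['json']  # copying the placeholder: discouraged

print(type(namespace['json']).__name__)  # LazyModule (not imported yet)
namespace['json'].dumps([1])             # first use triggers the import
print(type(namespace['json']).__name__)  # module
print(type(alias).__name__)              # LazyModule (never replaced)
```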

The Null revision

The null revision is the ancestor of all revisions. Its revno is 0, its revision-id is null:, and its tree is the empty tree. When referring to the null revision, please use bzrlib.revision.NULL_REVISION. Old code sometimes uses None for the null revision, but this practice is being phased out.
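A sketch of how code that may still receive the legacy None can normalise it. In bzrlib you would import NULL_REVISION from bzrlib.revision rather than redefine it; the constant is repeated here only so the example is self-contained, and normalize_revision_id and is_null_revision are hypothetical helper names.

```python
# Repeated from the text above for self-containment; in bzrlib,
# import NULL_REVISION from bzrlib.revision instead.
NULL_REVISION = 'null:'


def normalize_revision_id(revision_id):
    """Hypothetical helper: map the legacy None to NULL_REVISION."""
    if revision_id is None:
        return NULL_REVISION
    return revision_id


def is_null_revision(revision_id):
    return normalize_revision_id(revision_id) == NULL_REVISION
```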

Object string representations

Python prints objects using their __repr__ method when they are written to logs, exception tracebacks, or the debugger. We want objects to have useful representations to help in determining what went wrong.

If you add a new class you should generally add a __repr__ method unless there is an adequate method in a parent class. There should be a test for the repr.

Representations should typically look like Python constructor syntax, but they don't need to include every value in the object and they don't need to be able to actually execute. They're to be read by humans, not machines. Don't hardcode the classname in the format, so that we get the correct value if the method is inherited by a subclass. If you're printing attributes of the object, including strings, you should normally use %r syntax (to call their repr in turn).

Try to avoid the representation becoming more than one or two lines long. (But balance this against including useful information, and simplicity of implementation.)

Because repr methods are often called when something has already gone wrong, they should be written somewhat more defensively than most code. The object may be half-initialized or in some other way in an illegal state. The repr method shouldn't raise an exception, or it may hide the (probably more useful) underlying exception.

Example:

def __repr__(self):
    return '%s(%r)' % (self.__class__.__name__,
                       self._transport)
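The example above can be made a little more defensive, following the advice about half-initialized objects, while keeping the class name dynamic so subclasses report themselves correctly. Repository and RemoteRepository here are illustrative stand-ins, not the real bzrlib classes.

```python
class Repository(object):
    """Illustrative class; only __repr__ matters here."""

    def __init__(self, transport):
        self._transport = transport

    def __repr__(self):
        # Use the runtime class name so subclasses get the right label,
        # and tolerate a half-initialized object that lacks _transport.
        return '%s(%r)' % (self.__class__.__name__,
                           getattr(self, '_transport', None))


class RemoteRepository(Repository):
    pass


half = Repository.__new__(Repository)  # __init__ never ran
print(repr(half))  # Repository(None)
```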

Exception handling

A bare except statement will catch all exceptions, including ones that really should terminate the program such as MemoryError and KeyboardInterrupt. Bare except clauses should rarely be used unless the exception is later re-raised. Even then, think about whether catching just Exception (which, in Python 2.5 and later, excludes system-exiting exceptions such as KeyboardInterrupt and SystemExit) would be better.
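A sketch of the two acceptable patterns, using hypothetical function names: a bare except that always re-raises (here, to roll back a partial change), and except Exception for a non-essential step whose ordinary failures can be logged and ignored.

```python
import logging


def apply_change(change, rollback):
    """Run change(); if anything at all goes wrong, roll back and re-raise."""
    try:
        return change()
    except:
        # A bare except is acceptable here only because we re-raise.
        rollback()
        raise


def try_optional_step(step):
    """Run a non-essential step, logging ordinary failures."""
    try:
        return step()
    except Exception:
        # Ordinary errors are logged; KeyboardInterrupt and SystemExit
        # are not subclasses of Exception, so they still propagate.
        logging.getLogger(__name__).exception('optional step failed')
        return None
```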

Test coverage

All code should be exercised by the test suite. See Guide to Testing Bazaar for detailed information about writing tests.

Core Topics

Evolving Interfaces

We have a commitment to six months of API stability: any supported symbol in a release of bzr MUST NOT be altered in any way that would break existing code that uses it. That means that method names, parameter ordering, parameter names, variable and attribute names, etc. must not be changed without leaving a 'deprecated forwarder' behind. This even applies to modules and classes.

If you wish to change the behaviour of a supported API in an incompatible way, you need to change its name as well. For instance, if I add an optional keyword parameter to branch.commit - that's fine. On the other hand, if I add a keyword parameter to branch.commit which is a required transaction object, I should rename the API - i.e. to 'branch.commit_transaction'.

When renaming such supported APIs, be sure to leave a deprecated_method (or _function or ...) behind which forwards to the new API. See the bzrlib.symbol_versioning module for decorators that take care of the details for you, such as updating the docstring and issuing a warning when the old API is used.

For unsupported APIs, it does not hurt to follow this discipline, but it's not required. At a minimum, please try to rename things so that callers will get an AttributeError rather than weird results.
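What a deprecated forwarder does can be sketched with only the standard warnings module. In bzrlib you would use the bzrlib.symbol_versioning decorators instead; create_branch, make_branch and _deprecated_alias are hypothetical names for an API renamed from make_branch to create_branch.

```python
import warnings


def create_branch(location):
    """New name for the API."""
    return 'branch at %s' % (location,)


def _deprecated_alias(new_function, old_name):
    """Build a forwarder that warns, then calls new_function."""
    def forwarder(*args, **kwargs):
        warnings.warn('%s is deprecated; use %s instead'
                      % (old_name, new_function.__name__),
                      DeprecationWarning, stacklevel=2)
        return new_function(*args, **kwargs)
    forwarder.__name__ = old_name
    forwarder.__doc__ = 'Deprecated alias for %s.' % (new_function.__name__,)
    return forwarder


# The old name keeps working for existing callers, with a warning.
make_branch = _deprecated_alias(create_branch, 'make_branch')
```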

Deprecation decorators

bzrlib.symbol_versioning provides decorators that can be attached to methods, functions, and other interfaces to indicate that they should no longer be used. For example:

from bzrlib.symbol_versioning import deprecated_in, deprecated_method

@deprecated_method(deprecated_in((0, 1, 4)))
def foo(self):
    return self._new_foo()

To deprecate a static method, you must use deprecated_function (not deprecated_method), placed below the @staticmethod decorator:

@staticmethod
@deprecated_function(deprecated_in((0, 1, 4)))
def create_repository(base, shared=False, format=None):
    pass  # implementation elided

When you deprecate an API, you should not just delete its tests, because then we might introduce bugs in them. If the API is still present at all, it should still work. The basic approach is to use TestCase.applyDeprecated which in one step checks that the API gives the expected deprecation message, and also returns the real result from the method, so that tests can keep running.
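The same idea can be shown with only the standard unittest module (assertWarns needs Python 3.2 or later). This is an analogue for illustration, not TestCase.applyDeprecated itself; old_total and new_total are hypothetical. The test checks both that the deprecation warning is issued and that the old API still returns the right answer.

```python
import unittest
import warnings


def new_total(values):
    return sum(values)


def old_total(values):
    """Deprecated: use new_total instead."""
    warnings.warn('old_total is deprecated; use new_total',
                  DeprecationWarning, stacklevel=2)
    return new_total(values)


class TestOldTotal(unittest.TestCase):

    def test_old_total_still_works(self):
        # Keep testing the deprecated API: it must warn *and* still work.
        with self.assertWarns(DeprecationWarning):
            result = old_total([1, 2, 3])
        self.assertEqual(6, result)
```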

Deprecation warnings will be suppressed for final releases, but not for development versions or release candidates, or when running bzr selftest. This gives developers information about whether their code is using deprecated functions, but avoids confusing users about things they can't fix.

Getting Input

Processing Command Lines

bzrlib has a standard framework for parsing command lines and calling processing routines associated with various commands. See builtins.py for numerous examples.

Standard Parameter Types

There are some common requirements in the library: some parameters need to be unicode safe, some need byte strings, and so on. At the moment we have only codified one specific pattern: Parameters that need to be unicode should be checked via bzrlib.osutils.safe_unicode. This will coerce the input into unicode in a consistent fashion, allowing trivial strings to be used for programmer convenience, but not performing unpredictably in the presence of different locales.
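The coercion idea can be sketched in Python 3 terms, where text is str. This assumes byte strings are decoded as UTF-8 regardless of locale; safe_unicode_sketch is a hypothetical name, and the real bzrlib.osutils.safe_unicode may differ in details such as the exception it raises.

```python
def safe_unicode_sketch(value):
    """Coerce value to text in the spirit of safe_unicode.

    Text is returned unchanged; byte strings are decoded with a fixed
    encoding (UTF-8 here) rather than the user's locale, so the result
    does not vary between machines.
    """
    if isinstance(value, str):
        return value
    try:
        return value.decode('utf-8')
    except UnicodeDecodeError:
        raise ValueError('%r is not a UTF-8 byte string' % (value,))
```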

Writing Output

(The strategy described here is what we want to get to, but it's not consistently followed in the code at the moment.)

bzrlib is intended to be a generically reusable library. It shouldn't write messages to stdout or stderr, because some programs that use it might want to display that information through a GUI or some other mechanism.

We can distinguish two types of output from the library:

  1. Structured data representing the progress or result of an operation. For example, for a commit command this will be a list of the modified files and the final committed revision number and id.

    These should be exposed either through the return value or by calls to a callback parameter.

    A special case of this is progress indicators for long-lived operations, where the caller should pass a ProgressBar object.

  2. Unstructured log/debug messages, mostly for the benefit of the developers or users trying to debug problems. This should always be sent through bzrlib.trace and Python logging, so that it can be redirected by the client.
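A sketch of the two kinds of output, using a hypothetical commit_files function: structured data is returned and optionally reported through a callback, while diagnostics go through Python logging. The library itself never prints.

```python
import logging

# Unstructured diagnostics: the client decides where (or whether)
# they are shown by configuring logging.
log = logging.getLogger('example.commit')


def commit_files(paths, report_file=None):
    """Hypothetical library call that returns structured results.

    report_file, if given, is called once per committed path so that a
    GUI or the command-line client can present progress its own way.
    """
    committed = []
    for path in paths:
        log.debug('committing %s', path)
        committed.append(path)
        if report_file is not None:
            report_file(path)
    # Structured result: returned to the caller, never printed here.
    return {'files': committed, 'count': len(committed)}
```

A command-line client might pass a callback that prints each path, while a GUI could pass one that updates a list widget.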

The distinction between the two is a bit subjective, but in general if there is any chance that a library would want to see something as structured data, we should make it so.

Policy about how output is presented in the text-mode client should live only in the command-line tool.

Progress and Activity Indications

bzrlib has a way for code to display to the user that stuff is happening during a long operation. There are two particular types: activity which means that IO is happening on a Transport, and progress which means that higher-level application work is occurring. Both are drawn together by the ui_factory.

Transport objects are responsible for calling report_transport_activity when they do IO.

Progress uses a model/view pattern: application code acts on a ProgressTask object, which notifies the UI when it needs to be displayed. Progress tasks form a stack. To create a new progress task on top of the stack, call bzrlib.ui.ui_factory.nested_progress_bar(), then call update() on the returned ProgressTask. It can be updated with just a text description, with a numeric count, or with a numeric count and expected total count. If an expected total count is provided the view can show the progress moving along towards the expected total.

The user should call finish on the ProgressTask when the logical operation has finished, so it can be removed from the stack.

Progress tasks have a complex relationship with generators: generators are a very good place to use them, but because Python 2.4 does not allow yield inside a try/finally block, it's hard to clean the tasks up properly. In this case it's probably better to have the code calling the generator allocate a progress task for its use and then call finalize when it's done, which will close it if it was not already closed. The generator should also finish the progress task when it exits, because it may otherwise be a long time until the caller's finally block runs.
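The pattern just described can be sketched with hypothetical stand-ins (ProgressStack and Task are not bzrlib's ui_factory or ProgressTask): the caller allocates the task and finishes it in a finally block, and the generator also finishes it as soon as iteration completes. Finishing is idempotent, so doing both is safe.

```python
class ProgressStack(object):
    """Hypothetical stand-in for a UI that tracks nested tasks."""

    def __init__(self):
        self.tasks = []

    def nested_task(self):
        task = Task(self)
        self.tasks.append(task)
        return task


class Task(object):

    def __init__(self, stack):
        self._stack = stack
        self.message = self.current = self.total = None
        self.done = False

    def update(self, message, current=None, total=None):
        self.message, self.current, self.total = message, current, total

    def finish(self):
        # Idempotent: removing the task twice would be an error.
        if not self.done:
            self.done = True
            self._stack.tasks.remove(self)


def iter_items(items, task):
    """Generator reporting progress on a task owned by the caller."""
    for index, item in enumerate(items):
        task.update('processing', index, len(items))
        yield item
    # Finish promptly rather than waiting for the caller's finally.
    task.finish()


def process(ui, items):
    task = ui.nested_task()
    try:
        return [item.upper() for item in iter_items(items, task)]
    finally:
        # Also covers the case where the generator is abandoned early.
        task.finish()
```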

Displaying help

Bazaar has online help for various topics through bzr help COMMAND or equivalently bzr command -h. We also have help on command options, and on other help topics. (See help_topics.py.)

As with Python docstrings, the first paragraph should be a single-sentence synopsis of the command.

The help for options should be one or more proper sentences, starting with a capital letter and finishing with a full stop (period).

All help messages and documentation should have two spaces between sentences.

Handling Errors and Exceptions

Commands should return non-zero when they encounter circumstances that the user should really pay attention to - which includes trivial shell pipelines.

Recommended values are:

  0. OK.
  1. Conflicts in merge-like operations, or changes are present in diff-like operations.
  2. Unrepresentable diff changes (i.e. binary files that we cannot show a diff of).
  3. An error or exception has occurred.
  4. An internal error occurred (one that shows a traceback.)

Errors are handled through Python exceptions. Exceptions should be defined inside bzrlib.errors, so that we can see the whole tree at a glance.

We broadly classify errors as either internal or not, depending on whether internal_error is set. If we think it's our fault, we show a backtrace, an invitation to report the bug, and possibly other details. This is the default for errors that aren't specifically recognized as being caused by a user error. Otherwise we show a briefer message, unless -Derror was given.

Many errors originate as "environmental errors" which are raised by Python or builtin libraries, for example IOError. These are treated as being our fault, unless they're caught in a particular tight scope where we know that they indicate a user error. For example, if the repository format is not found, the user probably gave the wrong path or URL. But if one of the files inside the repository is not found, then it's our fault: either there's a bug in bzr, or something complicated has gone wrong in the environment that means one internal file was deleted.

Many errors are defined in bzrlib/errors.py but it's OK for new errors to be added near the place where they are used.

Exceptions are formatted for the user by conversion to a string (eventually calling their __str__ method.) As a convenience the ._fmt member can be used as a template which will be mapped to the error's instance dict.
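A minimal sketch of the _fmt convention, using hypothetical classes rather than bzrlib's BzrError: keyword arguments become instance attributes, and __str__ fills the template from the instance dict. The message follows the guideline below (capital letter, no final full stop).

```python
class ExampleError(Exception):
    """Hypothetical base class showing the _fmt convention."""

    _fmt = 'An unknown error occurred'

    def __init__(self, **kwargs):
        Exception.__init__(self)
        # Keyword arguments become attributes that _fmt can refer to.
        self.__dict__.update(kwargs)

    def __str__(self):
        return self._fmt % self.__dict__


class NoSuchRevisionExample(ExampleError):

    _fmt = 'Branch %(branch)s has no revision %(revision)s'
```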

New exception classes should be defined when callers might want to catch that exception specifically, or when it needs a substantially different format string.

  1. If it is something that a caller can recover from, a custom exception is reasonable.
  2. If it is a data consistency issue, using a builtin like ValueError/TypeError is reasonable.
  3. If it is a programmer error (using an api incorrectly) AssertionError is reasonable.
  4. Otherwise, use BzrError or InternalBzrError.

Exception strings should start with a capital letter and should not have a final full stop. If long, they may contain newlines to break the text.

Assertions

Do not use the Python assert statement, either in tests or elsewhere. A source test checks that it is not used. It is ok to explicitly raise AssertionError.

Rationale:

  • It makes the behaviour vary depending on whether bzr is run with -O or not, therefore giving a chance for bugs that occur in one case or the other, several of which have already occurred: assertions with side effects, code which can't continue unless the assertion passes, cases where we should give the user a proper message rather than an assertion failure.
  • It's not that much shorter than an explicit if/raise.
  • It tends to lead to fuzzy thinking about whether the check is actually needed or not, and whether it's an internal error or not.
  • It tends to cause look-before-you-leap patterns.
  • It's unsafe if the check is needed to protect the integrity of the user's data.
  • It tends to give poor messages since the developer can get by with no explanatory text at all.
  • We can't rely on people always running with -O in normal use, so we can't use it for tests that are actually expensive.
  • Expensive checks that help developers are better turned on from the test suite or a -D flag.
  • If used instead of self.assert*() in tests it makes them falsely pass with -O.
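For example, where one might be tempted to write assert revno >= 0, an explicit check like the following keeps working under -O and gives a useful message. This is a sketch; check_revno is a hypothetical function.

```python
def check_revno(revno):
    """Return revno after checking that it is not negative."""
    # Unlike an assert statement, this check still runs under "python -O",
    # and it always carries an explanatory message.
    if revno < 0:
        raise AssertionError("Invalid revno %r: must not be negative" % (revno,))
    return revno
```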

Documenting Changes

When you change bzrlib, please update the relevant documentation for the change you made: Changes to commands should update their help, and possibly end user tutorials; changes to the core library should be reflected in API documentation.

NEWS File

If you make a user-visible change, please add a note to the NEWS file. The description should be written to make sense to someone who's just a user of bzr, not a developer: new functions or classes shouldn't be mentioned, but new commands, changes in behaviour or fixed nontrivial bugs should be listed. See the existing entries for an idea of what should be done.

Within each release, entries in the NEWS file should have the most user-visible changes first. So the order should be approximately:

  • changes to existing behaviour - the highest priority because the user's existing knowledge is incorrect
  • new features - should be brought to their attention
  • bug fixes - may be of interest if the bug was affecting them, and should include the bug number if any
  • major documentation changes
  • changes to internal interfaces

People who made significant contributions to each change are listed in parentheses. This can include reporting bugs (particularly with good details or reproduction recipes), submitting patches, etc.

Commands

The docstring of a command is used by bzr help to generate help output for the command. The 'takes_options' list attribute on a command is used by bzr help to document the options for the command; the command docstring does not need to document them. Finally, the '_see_also' attribute on a command can be used to reference other related help topics.
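A minimal sketch of such a command follows. Command here is a hypothetical stand-in so the snippet runs on its own; a real plugin would subclass bzrlib.commands.Command, and its run method would write to self.outf rather than return a string. The command name, argument and option choices are illustrative only.

```python
class Command(object):
    """Hypothetical stand-in for bzrlib.commands.Command."""

    takes_args = []
    takes_options = []
    _see_also = []


class cmd_hello(Command):
    """Print a friendly greeting.

    This docstring is the help text for the command. The options named in
    takes_options are documented by bzr help automatically, so they are
    not described again here.
    """

    takes_args = ['name?']        # '?' marks the argument as optional
    takes_options = ['verbose']   # names of (assumed) registered options
    _see_also = ['status']        # related help topics to cross-reference

    def run(self, name=None, verbose=False):
        # Returned for illustration; a real command writes to self.outf.
        greeting = 'Hello, %s' % (name or 'world')
        if verbose:
            greeting += '!'
        return greeting
```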

API Documentation

Functions, methods, classes and modules should have docstrings describing how they are used.

The first line of the docstring should be a self-contained sentence.

For the special case of Command classes, this acts as the user-visible documentation shown by the help command.

The docstrings should be formatted as reStructuredText (like this document), suitable for processing using the epydoc tool into HTML documentation.
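For example, a docstring in this style might look like the sketch below. The function is hypothetical; the :param: and :return: entries are reStructuredText field lists, which epydoc recognizes when it processes reStructuredText docstrings.

```python
def count_lines(text):
    """Return the number of lines in a string.

    A final newline does not start an extra, empty line, so text that ends
    with a newline is counted the same as text that does not.

    :param text: The string to examine.
    :return: The number of lines, as an int.
    """
    return len(text.splitlines())
```

The first line is a complete sentence on its own, so it still reads well where only the summary is shown.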

General Guidelines

Copyright

The copyright policy for bzr was recently made clear in this email (edited for grammatical correctness):

The attached patch cleans up the copyright and license statements in
the bzr source. It also adds tests to help us remember to add them
with the correct text.

We had the problem that lots of our files were "Copyright Canonical
Development Ltd" which is not a real company, and some other variations
on this theme. Also, some files were missing the GPL statements.

I want to be clear about the intent of this patch, since copyright can
be a little controversial.

1) The big motivation for this is not to shut out the community, but
just to clean up all of the invalid copyright statements.

2) It has been the general policy for bzr that we want a single
copyright holder for all of the core code. This is following the model
set by the FSF, which makes it easier to update the code to a new
license in case problems are encountered. (For example, if we want to
upgrade the project universally to GPL v3 it is much simpler if there is
a single copyright holder). It also makes it clearer if copyright is
ever debated, there is a single holder, which makes it easier to defend
in court, etc. (I think the FSF position is that if you assign them
copyright, they can defend it in court rather than you needing to, and
I'm sure Canonical would do the same).
As such, Canonical has requested copyright assignments from all of the
major contributors.

3) If someone wants to add code and not attribute it to Canonical, there
is a specific list of files that are excluded from this check. And the
test failure indicates where that is, and how to update it.

4) If anyone feels that I changed a copyright statement incorrectly, just
let me know, and I'll be happy to correct it. Whenever you have large
mechanical changes like this, it is possible to make some mistakes.

Just to reiterate, this is a community project, and it is meant to stay
that way. Core bzr code is copyright Canonical for legal reasons, and
the tests are just there to help us maintain that.

Miscellaneous Topics

Debugging

Bazaar has a few facilities to help debug problems by going into pdb, the Python debugger.

If the BZR_PDB environment variable is set then bzr will go into pdb post-mortem mode when an unhandled exception occurs.
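The mechanism can be pictured with the following sketch. It is not bzr's actual code: run_with_post_mortem is a hypothetical helper, and its environ and debugger parameters exist only so the behaviour can be exercised without an interactive session. Re-raising the exception after debugging is also an assumption of the sketch.

```python
import os
import pdb
import sys


def run_with_post_mortem(func, environ=None, debugger=pdb.post_mortem):
    """Call func(); if it raises and BZR_PDB is set, open a post-mortem debugger."""
    if environ is None:
        environ = os.environ
    try:
        return func()
    except Exception:
        if environ.get('BZR_PDB'):
            # Start the debugger on the traceback of the unhandled exception.
            debugger(sys.exc_info()[2])
        # Re-raise so the usual error report is still produced.
        raise
```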

If you send a SIGQUIT signal to bzr, which can be done by pressing Ctrl-\ on Unix, bzr will go into the debugger immediately. You can continue execution by typing c. This can be disabled if necessary by setting the environment variable BZR_SIGQUIT_PDB=0.

Debug Flags

Bazaar accepts some global options starting with -D such as -Dhpss. These set a value in bzrlib.debug.debug_flags, and typically cause more information to be written to the trace file. Most mutter calls should be guarded by a check of those flags so that we don't write out too much information if it's not needed.
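A guarded trace call looks roughly like the sketch below. In bzrlib the set and the logging function are bzrlib.debug.debug_flags and bzrlib.trace.mutter; here both are simplified stand-ins, and hpss_call is a hypothetical function, so the snippet runs on its own.

```python
# Simplified stand-ins for bzrlib.debug.debug_flags and bzrlib.trace.mutter.
debug_flags = set()
trace_lines = []


def mutter(fmt, *args):
    # Record the formatted message instead of writing a trace file.
    trace_lines.append(fmt % args)


def hpss_call(method, *args):
    # Only pay for the trace output when -Dhpss was given.
    if 'hpss' in debug_flags:
        mutter('hpss call: %s%r', method, args)
    # ... the real request would be sent here ...
```

Without the guard, every call would format and write a trace line even when nobody asked for it.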

Debug flags may have effects other than just emitting trace messages.

Run bzr help global-options to see them all.

These flags may also be set as a comma-separated list in the debug_flags option in e.g. ~/.bazaar/bazaar.conf. (Note that it must be in this global file, not in the branch or location configuration, because it's currently only loaded at startup time.) For instance you may want to always record hpss traces and to see full error tracebacks:

debug_flags = hpss, error
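To show how such a comma-separated value turns into individual flags, here is a sketch that uses the standard library's configparser. bzr itself reads its configuration with ConfigObj, so this is only an approximation, and it assumes the option sits in the [DEFAULT] section of bazaar.conf.

```python
import configparser


def debug_flags_from_config(text):
    """Return the set of debug flags named in bazaar.conf-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Options in the [DEFAULT] section are returned by defaults().
    raw = parser.defaults().get('debug_flags', '')
    # Split on commas and drop surrounding whitespace and empty entries.
    return {flag.strip() for flag in raw.split(',') if flag.strip()}


conf = "[DEFAULT]\ndebug_flags = hpss, error\n"
print(sorted(debug_flags_from_config(conf)))  # prints: ['error', 'hpss']
```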

Jargon

revno
Integer identifier for a revision on the main line of a branch. Revision 0 is always the null revision; others are 1-based indexes into the branch's revision history.
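As an illustration of that numbering, the sketch below maps a revno to a revision id, given a branch's revision history as a list ordered oldest first. The function name and the sample revision ids are made up; 'null:' is the id bzrlib uses for the null revision.

```python
NULL_REVISION = 'null:'


def revno_to_revision_id(revision_history, revno):
    """Return the revision id for revno on the branch's main line."""
    if revno == 0:
        return NULL_REVISION
    if not 1 <= revno <= len(revision_history):
        raise IndexError("No revision %d in a history of %d revisions"
                         % (revno, len(revision_history)))
    # revnos are 1-based, Python lists are 0-based.
    return revision_history[revno - 1]


history = ['joe@example.com-20070601-aaaa', 'joe@example.com-20070602-bbbb']
print(revno_to_revision_id(history, 2))  # prints: joe@example.com-20070602-bbbb
```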

Unicode and Encoding Support

This section discusses various techniques that Bazaar uses to handle characters that are outside the ASCII set.

Command.outf

When a Command object is created, it is given a member variable accessible by self.outf. This is a file-like object, which is bound to sys.stdout, and should be used to write information to the screen, rather than directly writing to sys.stdout or calling print. This file has the ability to translate Unicode objects into the correct representation, based on the console encoding. Also, the class attribute encoding_type will affect how unprintable characters will be handled. This parameter can take one of three values:

replace
Unprintable characters will be represented with a suitable replacement marker (typically '?'), and no exception will be raised. This is for any command which generates text for the user to review, rather than for automated processing. For example: bzr log should not fail if one of the entries has text that cannot be displayed.
strict
Attempting to print an unprintable character will cause a UnicodeError. This is for commands that are intended more as scripting support, rather than plain user review. For example: bzr ls is designed to be used with shell scripting. One use would be bzr ls --null --unknown | xargs -0 rm. If bzr printed a filename with a '?', the wrong file could be deleted. (At the very least, the correct file would not be deleted.) An error is used to indicate that the requested action could not be performed.
exact
Do not attempt to automatically convert Unicode strings. This is used for commands that must handle conversion themselves. For example: bzr diff needs to translate Unicode paths, but should not change the exact text of the contents of the files.
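The first two values are also the names of Python codec error handlers, so a file like outf can be pictured as a codecs writer wrapped around the byte stream. The sketch below is a simplification under that assumption (make_outf is hypothetical, not bzrlib's code), using an in-memory stream and ASCII as the console encoding.

```python
import codecs
import io


def make_outf(stream, encoding, encoding_type):
    """Wrap a binary stream according to a command's encoding_type."""
    if encoding_type == 'exact':
        # The command encodes its own output, so write bytes unchanged.
        return stream
    if encoding_type not in ('replace', 'strict'):
        raise ValueError("Unknown encoding_type %r" % (encoding_type,))
    # 'replace' substitutes '?'; 'strict' raises UnicodeEncodeError.
    return codecs.getwriter(encoding)(stream, errors=encoding_type)


buf = io.BytesIO()
make_outf(buf, 'ascii', 'replace').write('caf\u00e9\n')
print(buf.getvalue())  # prints: b'caf?\n'
```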

bzrlib.urlutils.unescape_for_display

Because Transports work in URLs (as defined earlier), printing the raw URL to the user is usually less than optimal. Characters outside the standard set are printed as escapes, rather than the real character, and local paths would be printed as file:// urls. The function unescape_for_display attempts to unescape a URL, such that anything that cannot be printed in the current encoding stays an escaped URL, but valid characters are generated where possible.
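A much-simplified sketch of the idea follows; it is not bzrlib's implementation. It unescapes one path segment at a time and keeps a segment escaped if it is not valid UTF-8 or cannot be encoded in the target encoding. It skips converting file:// URLs back into local paths, and it ignores edge cases such as escaped slashes.

```python
from urllib.parse import unquote_to_bytes


def unescape_for_display_sketch(url, encoding):
    """Unescape the parts of url that can be shown in encoding."""
    shown = []
    for segment in url.split('/'):
        try:
            text = unquote_to_bytes(segment).decode('utf-8')
            text.encode(encoding)  # raises if the console can't show it
        except UnicodeError:
            # Keep the escaped form rather than print something wrong.
            text = segment
        shown.append(text)
    return '/'.join(shown)


print(unescape_for_display_sketch('http://example.com/my%20caf%C3%A9', 'utf-8'))
# prints: http://example.com/my café
```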

Portability Tips

The bzrlib.osutils module has many useful helper functions, including some more portable variants of functions in the standard library.

In particular, don't use shutil.rmtree unless it's acceptable for it to fail on Windows if some files are readonly or still open elsewhere. Use bzrlib.osutils.rmtree instead.
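The usual technique for the read-only case is to clear the read-only flag and retry from shutil.rmtree's error hook, as in the sketch below. This is an illustration, not the code of bzrlib.osutils.rmtree, and it does not help with files that are still open elsewhere. Python 3.12 added the onexc argument and deprecated onerror, hence the version check.

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    # Windows refuses to delete read-only files: clear the flag, retry once.
    os.chmod(path, stat.S_IWRITE)
    func(path)


def rmtree_allowing_readonly(path):
    """Remove a directory tree, including files marked read-only."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_writable_and_retry)
```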

C Extension Modules

We write some extensions in C using pyrex. We design these to work in three scenarios:

  • User with no C compiler
  • User with C compiler
  • Developers

The recommended way to install bzr is with a C compiler available, so that the extensions can be built; if no C compiler is present, the pure Python versions we supply will work, though more slowly.

For developers we recommend that pyrex be installed, so that the C extensions can be changed if needed.

For the C extensions, the extension module should always match the original python one in all respects (modulo speed). This should be maintained over time.

To create an extension, add rules to setup.py for building it with pyrex, and with distutils. Now start with an empty .pyx file. At the top add "include 'yourmodule.py'". This will import the contents of yourmodule.py into this file at build time; remember that only one module will be loaded at runtime. Now you can subclass classes, or replace functions, and only your changes need to be present in the .pyx file.

Note that pyrex does not support all Python 2.4 programming idioms, so some syntax changes may be required. For example:

  • 'from foo import (bar, gam)' needs to change to not use the brackets.
  • 'import foo.bar as bar' needs to be 'import foo.bar; bar = foo.bar'

If the changes are too dramatic, consider maintaining the python code twice - once in the .pyx, and once in the .py, and no longer including the .py file.

Making Installers for OS Windows

To build a win32 installer, see the instructions on the wiki page: http://bazaar-vcs.org/BzrWin32Installer

Core Developer Tasks

Overview

What is a Core Developer?

While everyone in the Bazaar community is welcome and encouraged to propose and submit changes, a smaller team is responsible for pulling those changes together into a cohesive whole. In addition to the general developer stuff covered above, "core" developers have responsibility for:

  • reviewing changes
  • reviewing blueprints
  • planning releases
  • managing releases (see Releasing Bazaar)

Note

Removing barriers to community participation is a key reason for adopting distributed VCS technology. While DVCS removes many technical barriers, a small number of social barriers are often necessary instead. By documenting how the above things are done, we hope to encourage more people to participate in these activities, keeping the differences between core and non-core contributors to a minimum.

Communicating and Coordinating

While it has many advantages, one of the challenges of distributed development is keeping everyone else aware of what you're working on. There are numerous ways to do this:

  1. Assign bugs to yourself in Launchpad
  2. Mention it on the mailing list
  3. Mention it on IRC

As well as the email notifications that occur when merge requests are sent and reviewed, you can keep others informed of where you're spending your energy by emailing the bazaar-commits list implicitly. To do this, install and configure the Email plugin. One way to do this is to add these configuration settings to your central configuration file (e.g. ~/.bazaar/bazaar.conf on Linux):

[DEFAULT]
email = Joe Smith <joe.smith@internode.on.net>
smtp_server = mail.internode.on.net:25

Then add these lines for the relevant branches in locations.conf:

post_commit_to = bazaar-commits@lists.canonical.com
post_commit_mailer = smtplib

While attending a sprint, RobertCollins' Dbus plugin is useful for the same reason. See the documentation within the plugin for information on how to set it up and configure it.

Submitting Changes

An Overview of PQM

Of the many workflows supported by Bazaar, the one adopted for Bazaar development itself is known as "Decentralized with automatic gatekeeper". To repeat the explanation of this given on http://bazaar-vcs.org/Workflows:

In this workflow, each developer has their own branch or branches, plus read-only access to the mainline. A software gatekeeper (e.g. PQM) has commit rights to the main branch. When a developer wants their work merged, they request the gatekeeper to merge it. The gatekeeper does a merge, a compile, and runs the test suite. If the code passes, it is merged into the mainline.

In a nutshell, here's the overall submission process:

  1. get your work ready (including review except for trivial changes)
  2. push to a public location
  3. ask PQM to merge from that location

Note

At present, PQM always takes the changes to merge from a branch at a URL that can be read by it. For Bazaar, that means a public, typically http, URL.

As a result, the following things are needed to use PQM for submissions:

  1. A publicly available web server
  2. Your OpenPGP key registered with PQM (contact RobertCollins for this)
  3. The PQM plugin installed and configured (not strictly required but highly recommended).

Selecting a Public Branch Location

If you don't have your own web server running, branches can always be pushed to Launchpad. Here's the process for doing that:

Depending on your location throughout the world and the size of your repository though, it is often quicker to use an alternative public location to Launchpad, particularly if you can set up your own repo and push into that. By using an existing repo, push only needs to send the changes, instead of the complete repository every time. Note that it is easy to register branches in other locations with Launchpad so no benefits are lost by going this way.

Note

For Canonical staff, http://people.ubuntu.com/~<user>/ is one suggestion for public http branches. Contact your manager for information on accessing this system if required.

It should also be noted that best practice in this area is subject to change as things evolve. For example, once the Bazaar smart server on Launchpad supports server-side branching, the performance situation will be very different to what it is now (Jun 2007).

Configuring the PQM Plug-In

While not strictly required, the PQM plugin automates a few things and reduces the chance of error. Before looking at the plugin, it helps to understand a little more how PQM operates. Basically, PQM requires an email indicating what you want it to do. The email typically looks like this:

star-merge source-branch target-branch

For example:

star-merge http://bzr.arbash-meinel.com/branches/bzr/jam-integration http://bazaar-vcs.org/bzr/bzr.dev

Note that the command needs to be on one line. The subject of the email will be used for the commit message. The email also needs to be gpg signed with a key that PQM accepts.
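For illustration, composing such a request with Python's standard email package might look like the sketch below. The addresses and branch URLs come from the examples in this section, and make_pqm_request and the subject text are hypothetical; GPG signing and actually sending the message are not shown.

```python
from email.mime.text import MIMEText


def make_pqm_request(from_addr, pqm_addr, source, target, commit_message):
    """Build an (unsigned) PQM merge request email."""
    # The whole command must stay on a single line.
    body = 'star-merge %s %s\n' % (source, target)
    # Plain us-ascii text gets a 7bit transfer encoding, so the long line
    # is not re-encoded.
    msg = MIMEText(body, 'plain', 'us-ascii')
    msg['From'] = from_addr
    msg['To'] = pqm_addr
    # PQM uses the subject as the commit message.
    msg['Subject'] = commit_message
    return msg


msg = make_pqm_request(
    'Joe Smith <joe.smith@internode.on.net>',
    'Bazaar PQM <pqm@bazaar-vcs.org>',
    'http://bzr.arbash-meinel.com/branches/bzr/jam-integration',
    'http://bazaar-vcs.org/bzr/bzr.dev',
    'Fix the frobnicator')
```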

The advantages of using the PQM plugin are:

  1. You can use the config policies to make it easy to set up public branches, so you don't have to ever type the full paths you want to merge from or into.
  2. It checks to make sure the public branch last revision matches the local last revision so you are submitting what you think you are.
  3. It uses the same public_branch and smtp sending settings as bzr-email, so if you have one set up, you have the other mostly set up.
  4. It avoids mail-client line wrapping: Thunderbird, for example, refuses to leave lines unwrapped, and request lines are usually long (they contain two long URLs).

Here are sample configuration settings for the PQM plugin. These are the lines in bazaar.conf:

[DEFAULT]
email = Joe Smith <joe.smith@internode.on.net>
smtp_server=mail.internode.on.net:25

And here are the lines in locations.conf (or branch.conf for dirstate-tags branches):

[/home/joe/bzr/my-integration]
push_location = sftp://joe-smith@bazaar.launchpad.net/%7Ejoe-smith/bzr/my-integration/
push_location:policy = norecurse
public_branch = http://bazaar.launchpad.net/~joe-smith/bzr/my-integration/
public_branch:policy = appendpath
pqm_email = Bazaar PQM <pqm@bazaar-vcs.org>
pqm_branch = http://bazaar-vcs.org/bzr/bzr.dev

Note that the push settings will be added by the first push on a branch. Indeed the preferred way to generate the lines above is to use push with an argument, then copy-and-paste the other lines into the relevant file.

Submitting a Change

Here is one possible recipe once the above environment is set up:

  1. pull bzr.dev => my-integration
  2. merge patch => my-integration
  3. fix up any final merge conflicts (NEWS being the big killer here).
  4. commit
  5. push
  6. pqm-submit

Note

The push step is not required if my-integration is a checkout of a public branch.

Because of defaults, you can type a single message into commit and pqm-submit will reuse it.

Tracking Change Acceptance

The web interface to PQM is https://pqm.bazaar-vcs.org/. After submitting a change, you can visit this URL to confirm it was received and placed in PQM's queue.

When PQM completes processing a change, an email is sent to you with the results.

Reviewing Blueprints

Blueprint Tracking Using Launchpad

New features typically require a fair amount of discussion, design and debate. For Bazaar, that information is often captured in a so-called "blueprint" on our Wiki. Overall tracking of blueprints and their status is done using Launchpad's relevant tracker, https://blueprints.launchpad.net/bzr/. Once a blueprint is ready for review, please announce it on the mailing list.

Alternatively, send an email beginning with [RFC] with the proposal to the list. In some cases, you may wish to attach proposed code or a proposed developer document if that best communicates the idea. Debate can then proceed using the normal merge review processes.

Recording Blueprint Review Feedback

Unlike its Bug Tracker, Launchpad's Blueprint Tracker doesn't currently (Jun 2007) support a chronological list of comment responses. Review feedback can either be recorded on the Wiki hosting the blueprints or by using Launchpad's whiteboard feature.

Planning Releases

Roadmaps

As the two senior developers, Martin Pool and Robert Collins coordinate the overall Bazaar product development roadmap. Core developers provide input and review into this, particularly during sprints. Community members are naturally expected to work on whatever interests them most. The roadmap is valuable, though, because it provides context for understanding where the product is going as a whole and why.

Using Releases and Milestones in Launchpad

TODO ... (Exact policies still under discussion)

Bug Triage

Keeping on top of bugs reported is an important part of ongoing release planning. Everyone in the community is welcome and encouraged to raise bugs, confirm bugs raised by others, and nominate a priority. Practically though, a good percentage of bug triage is often done by the core developers, partially because of their depth of product knowledge.

With respect to bug triage, core developers are encouraged to play an active role with particular attention to the following tasks:

  • keeping the number of unconfirmed bugs low
  • ensuring the priorities are generally right (if everything is marked critical, or everything medium, the priorities are meaningless)
  • looking out for regressions and turning those around sooner rather than later.

Note

As well as prioritizing bugs and nominating them against a target milestone, Launchpad lets core developers offer to mentor others in fixing them.