Bazaar Developer Guide

This document describes the Bazaar internals and the development process. It’s meant for people interested in developing Bazaar, and some parts will also be useful to people developing Bazaar plugins.

If you have any questions or something seems to be incorrect, unclear or missing, please talk to us in irc://irc.freenode.net/#bzr, or write to the Bazaar mailing list. To propose a correction or addition to this document, send a merge request or new text to the mailing list.

The latest developer documentation can be found online at http://doc.bazaar.canonical.com/developers/.

Getting Started

Exploring the Bazaar Platform

Before making changes, it’s a good idea to explore the work already done by others. Perhaps the new feature or improvement you’re looking for is available in another plug-in already? If you find a bug, perhaps someone else has already fixed it?

To answer these questions and more, take a moment to explore the overall Bazaar Platform. Here are some links to browse:

If nothing else, perhaps you’ll find inspiration in how other developers have solved their challenges.

Finding Something To Do

Ad-hoc performance work can also be done. One useful tool is the ‘evil’ debug flag. For instance, running bzr -Devil commit -m "test" will log a backtrace to the bzr log file for every method call which triggers a slow or non-scalable part of the bzr library. Running a given command with -Devil and then checking the log file for backtraces is therefore a good way to find problem function calls that might be nested deep in the code base.

Planning and Discussing Changes

There is a very active community around Bazaar. Mostly we meet on IRC (#bzr on irc.freenode.net) and on the mailing list. To join the Bazaar community, see http://wiki.bazaar.canonical.com/BzrSupport.

If you are planning to make a change, it’s a very good idea to mention it on the IRC channel and/or on the mailing list. There are many advantages to involving the community before you spend much time on a change. These include:

  • you get to build on the wisdom of others, saving time
  • if others can direct you to similar code, it minimises the work to be done
  • it assists everyone in coordinating direction, priorities and effort.

In summary, maximising the input from others typically minimises the total effort required to get your changes merged. The community is friendly, helpful and always keen to welcome newcomers.

Bazaar Development in a Nutshell

One of the fun things about working on a version control system like Bazaar is that the users have a high level of proficiency in contributing back into the tool. Consider the following very brief introduction to contributing back to Bazaar. More detailed instructions are in the following sections.

Making the change

First, get a local copy of the development mainline (see “Why make a local copy of bzr.dev?” below):

$ bzr init-repo ~/bzr
$ cd ~/bzr
$ bzr branch lp:bzr bzr.dev

Now make your own branch:

$ bzr branch bzr.dev 123456-my-bugfix

This will give you a branch called “123456-my-bugfix” that you can work on and commit in. Here, you can study the code, make a fix or a new feature. Feel free to commit early and often (after all, it’s your branch!).

Documentation improvements are an easy place to get started giving back to the Bazaar project. The documentation is in the doc/ subdirectory of the Bazaar source tree.

When you are done, make sure that you commit your last set of changes as well! Once you are happy with your changes, ask for them to be merged, as described below.

Making a Merge Proposal

The Bazaar developers use Launchpad to further enable a truly distributed style of development. Anyone can propose a branch for merging into the Bazaar trunk. To start this process, you need to push your branch to Launchpad. To do this, you will need a Launchpad account and user name, e.g. your_lp_username. You can push your branch to Launchpad directly from Bazaar:

$ bzr push lp:~your_lp_username/bzr/meaningful_name_here

After you have pushed your branch, you will need to propose it for merging to the Bazaar trunk. Go to <https://launchpad.net/~your_lp_username/bzr/meaningful_name_here> and choose “Propose for merging into another branch”. Select “~bzr/bzr/trunk” to hand your changes off to the Bazaar developers for review and merging.

Alternatively, after pushing you can use the lp-propose command to create the merge proposal.

Using a meaningful name for your branch will help you and the reviewer(s) better track the submission. Use a very succinct description of your submission and prefix it with the bug number if needed (lp:~mbp/bzr/484558-merge-directory for example). Alternatively, you can suffix it with the bug number (lp:~jameinel/bzr/export-file-511987).

Review cover letters

Please put a “cover letter” on your merge request explaining:

  • the reason why you’re making this change
  • how this change achieves this purpose
  • anything else you may have fixed in passing
  • anything significant that you thought of doing, such as a more extensive fix or a different approach, but didn’t or couldn’t do now

A good cover letter makes reviewers’ lives easier because they can decide from the letter whether they agree with the purpose and approach, and then assess whether the patch actually does what the cover letter says. Explaining any “drive-by fixes” or roads not taken may also avoid queries from the reviewer. All in all this should give faster and better reviews. Sometimes writing the cover letter helps the submitter realize something else they need to do. The size of the cover letter should be proportional to the size and complexity of the patch.

Why make a local copy of bzr.dev?

Making a local mirror of bzr.dev is not strictly necessary, but it means:

  • You can use that copy of bzr.dev as your main bzr executable, and keep it up-to-date using bzr pull.

  • Certain operations are faster, and can be done when offline. For example:

    • bzr bundle
    • bzr diff -r ancestor:...
    • bzr merge
  • When it’s time to create your next branch, it’s more convenient. When you have further contributions to make, you should do them in their own branch:

    $ cd ~/bzr
    $ bzr branch bzr.dev additional_fixes
    $ cd additional_fixes # hack, hack, hack

Understanding the Development Process

The development team follows many practices including:

  • a public roadmap and planning process in which anyone can participate
  • time based milestones everyone can work towards and plan around
  • extensive code review and feedback to contributors
  • complete and rigorous test coverage on any code contributed
  • automated validation that all tests still pass before code is merged into the main code branch.

The key tools we use to enable these practices are:

For further information, see <http://wiki.bazaar.canonical.com/BzrDevelopment>.

Preparing a Sandbox for Making Changes to Bazaar

Bazaar supports many ways of organising your work. See http://wiki.bazaar.canonical.com/SharedRepositoryLayouts for a summary of the popular alternatives.

Of course, the best choice for you will depend on numerous factors: the number of changes you may be making, the complexity of the changes, etc. As a starting suggestion though:

  • create a local copy of the main development branch (bzr.dev) by using this command:

    bzr branch lp:bzr bzr.dev
  • keep your copy of bzr.dev pristine (by not developing in it) and keep it up to date (by using bzr pull)

  • create a new branch off your local bzr.dev copy for each issue (bug or feature) you are working on.

This approach makes it easy to go back and make any required changes after a code review. Resubmitting the change is then simple with no risk of accidentally including edits related to other issues you may be working on. After the changes for an issue are accepted and merged, the associated branch can be deleted or archived as you wish.

Core Topics

Evolving Interfaces

We don’t change APIs in stable branches: any supported symbol in a stable release of bzr must not be altered in any way that would result in breaking existing code that uses it. That means that method names, parameter ordering, parameter names, variable and attribute names etc must not be changed without leaving a ‘deprecated forwarder’ behind. This even applies to modules and classes.

If you wish to change the behaviour of a supported API in an incompatible way, you need to change its name as well. For instance, if I add an optional keyword parameter to branch.commit - that’s fine. On the other hand, if I add a keyword parameter to branch.commit which is a required transaction object, I should rename the API - e.g. to ‘branch.commit_transaction’.

(Actually, that may break code that provides a new implementation of commit and doesn’t expect to receive the parameter.)

When renaming such supported APIs, be sure to leave a deprecated_method (or _function or ...) behind which forwards to the new API. See the bzrlib.symbol_versioning module for decorators that take care of the details for you - such as updating the docstring, and issuing a warning when the old API is used.

For unsupported APIs, it does not hurt to follow this discipline, but it’s not required. Minimally though, please try to rename things so that callers will at least get an AttributeError rather than weird results.
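The forwarding pattern can be sketched without bzrlib, using only the standard warnings module. The decorator name deprecated_forwarder and the Branch class below are hypothetical stand-ins for illustration; real code should use the decorators from bzrlib.symbol_versioning rather than this sketch.

```python
import functools
import warnings


def deprecated_forwarder(since, replacement):
    """Hypothetical decorator: warn, then run the old entry point."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s was deprecated in %s; use %s instead."
                % (func.__name__, since, replacement),
                DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Branch(object):
    """Illustrative class, not the real bzrlib Branch."""

    def commit_transaction(self, message, transaction):
        # New API: the transaction object is required.
        return (message, transaction)

    @deprecated_forwarder("1.0", "commit_transaction")
    def commit(self, message):
        # Old API kept as a thin forwarder so existing callers still work.
        return self.commit_transaction(message, transaction=None)
```

Callers of the old name keep working and see a DeprecationWarning, while new code calls commit_transaction directly.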

Deprecation decorators

bzrlib.symbol_versioning provides decorators that can be attached to methods, functions, and other interfaces to indicate that they should no longer be used. For example:

@deprecated_method(deprecated_in((0, 1, 4)))
def foo(self):
    return self._new_foo()

To deprecate a static method you must use deprecated_function (not deprecated_method), placed below the staticmethod decorator:

@staticmethod
@deprecated_function(deprecated_in((0, 1, 4)))
def create_repository(base, shared=False, format=None):

When you deprecate an API, you should not just delete its tests, because then we might introduce bugs in it without noticing. If the API is still present at all, it should still work. The basic approach is to use TestCase.applyDeprecated which in one step checks that the API gives the expected deprecation message, and also returns the real result from the method, so that tests can keep running.
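The idea behind applyDeprecated can be sketched with the standard library alone. The helper call_deprecated and the function old_api below are hypothetical and are not the bzrlib implementation: the helper checks that exactly the expected warning is issued and hands back the real return value, so the existing assertions on the result can stay in place.

```python
import warnings


def call_deprecated(expected_message, func, *args, **kwargs):
    """Call func, check that it warns as expected, and return its result."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught
                if issubclass(w.category, DeprecationWarning)]
    if messages != [expected_message]:
        raise AssertionError(
            "expected %r, got %r" % ([expected_message], messages))
    return result


def old_api(x):
    # Hypothetical deprecated function that still has to work.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return x * 2
```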

Deprecation warnings will be suppressed for final releases, but not for development versions or release candidates, or when running bzr selftest. This gives developers information about whether their code is using deprecated functions, but avoids confusing users about things they can’t fix.

Getting Input

Processing Command Lines

bzrlib has a standard framework for parsing command lines and calling processing routines associated with various commands. See builtins.py for numerous examples.

Standard Parameter Types

There are some common requirements in the library: some parameters need to be unicode safe, some need byte strings, and so on. At the moment we have only codified one specific pattern: Parameters that need to be unicode should be checked via bzrlib.osutils.safe_unicode. This will coerce the input into unicode in a consistent fashion, allowing trivial strings to be used for programmer convenience, but not performing unpredictably in the presence of different locales.
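As a rough illustration of this kind of coercion, the sketch below accepts either text or a byte string and always returns text. It is written against Python 3 types and is not the bzrlib code: the name to_text, the choice of UTF-8 as the fixed encoding, and the exception types are all assumptions made for the example. Decoding with a fixed encoding rather than the locale's is what keeps the behaviour predictable.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding byte strings strictly."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            raise ValueError(
                "parameter %r is not valid %s" % (value, encoding))
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)
```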

Writing Output

(The strategy described here is what we want to get to, but it’s not consistently followed in the code at the moment.)

bzrlib is intended to be a generically reusable library. It shouldn’t write messages to stdout or stderr, because some programs that use it might want to display that information through a GUI or some other mechanism.

We can distinguish two types of output from the library:

  1. Structured data representing the progress or result of an operation. For example, for a commit command this will be a list of the modified files and the finally committed revision number and id.

    These should be exposed either through the return code or by calls to a callback parameter.

    A special case of this is progress indicators for long-lived operations, where the caller should pass a ProgressBar object.

  2. Unstructured log/debug messages, mostly for the benefit of the developers or users trying to debug problems. This should always be sent through bzrlib.trace and Python logging, so that it can be redirected by the client.

The distinction between the two is a bit subjective, but in general if there is any chance that a library would want to see something as structured data, we should make it so.

The policy about how output is presented in the text-mode client should be only in the command-line tool.
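A minimal sketch of this separation follows. The function names, the callback signature and the placeholder revision number are hypothetical and only illustrate the shape: the library function reports structured data through a callback and its return value, and the text-mode client alone decides how to render it.

```python
def commit(changed_paths, reporter=None):
    """Hypothetical library call: reports structured results, prints nothing."""
    revno = 1  # placeholder value for the sketch
    for path in changed_paths:
        if reporter is not None:
            reporter("modified", path)
    return revno


def text_ui_reporter(lines):
    """Build a callback for a text client; a GUI could supply its own."""
    def report(kind, path):
        lines.append("%s %s" % (kind, path))
    return report
```

A GUI front end would pass a different callback and never has to capture or parse stdout.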

Progress and Activity Indications

bzrlib has a way for code to display to the user that stuff is happening during a long operation. There are two particular types: activity which means that IO is happening on a Transport, and progress which means that higher-level application work is occurring. Both are drawn together by the ui_factory.

Transport objects are responsible for calling report_transport_activity when they do IO.

Progress uses a model/view pattern: application code acts on a ProgressTask object, which notifies the UI when it needs to be displayed. Progress tasks form a stack. To create a new progress task on top of the stack, call bzrlib.ui.ui_factory.nested_progress_bar(), then call update() on the returned ProgressTask. It can be updated with just a text description, with a numeric count, or with a numeric count and expected total count. If an expected total count is provided the view can show the progress moving along towards the expected total.

The user should call finished() on the ProgressTask when the logical operation has finished, so it can be removed from the stack.

Progress tasks have a complex relationship with generators: it’s a very good place to use them, but because Python 2.4 does not allow finally blocks in generators it’s hard to clean them up properly. In this case it’s probably better to have the code calling the generator allocate a progress task for its use and then call finalize when it’s done, which will close it if it was not already closed. The generator should also finish the progress task when it exits, because it may otherwise be a long time until the finally block runs.
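The model/view split and the task stack can be sketched as follows. Both classes are simplified, hypothetical stand-ins rather than the bzrlib implementation; only the names nested_progress_bar, update and finished are modelled on the bzrlib API, and the show and pop methods are invented for the example.

```python
class ProgressTask(object):
    """Hypothetical model object; the UI decides how to draw it."""

    def __init__(self, ui):
        self.ui = ui
        self.msg = None
        self.current = None
        self.total = None

    def update(self, msg, current=None, total=None):
        self.msg, self.current, self.total = msg, current, total
        self.ui.show(self)

    def finished(self):
        self.ui.pop(self)


class UIFactory(object):
    """Hypothetical view that keeps the stack of active tasks."""

    def __init__(self):
        self.stack = []
        self.shown = []

    def nested_progress_bar(self):
        task = ProgressTask(self)
        self.stack.append(task)
        return task

    def show(self, task):
        self.shown.append((task.msg, task.current, task.total))

    def pop(self, task):
        if task in self.stack:
            self.stack.remove(task)


def copy_files(ui, names):
    task = ui.nested_progress_bar()
    try:
        for i, name in enumerate(names):
            task.update("copying %s" % name, i + 1, len(names))
    finally:
        # Always pop the task, even if the operation fails part-way.
        task.finished()
```

Keeping the allocation and the try/finally in the calling code, as here, is the arrangement suggested above for code that drives a generator.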

Message guidelines

When filenames or similar variables are presented inline within a message, they should be enclosed in double quotes (ASCII 0x22, not chiral Unicode quotes):

bzr: ERROR: No such file "asdf"

When we print just a list of filenames there should not be any quoting: see bug 544297.

https://wiki.ubuntu.com/UnitsPolicy provides a good explanation about which unit should be used when. Roughly speaking, IEC standard applies for base-2 units and SI standard applies for base-10 units:

  • for network bandwidth and disk sizes, use base-10 (Mbits/s, kB/s, GB)
  • for RAM sizes, use base-2 (GiB, TiB)
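The two conventions differ only in the base and the unit names, as this small sketch shows. The function format_size is hypothetical and not part of bzrlib; it formats a byte count with SI prefixes by default and with IEC prefixes on request.

```python
def format_size(num_bytes, binary=False):
    """Format a byte count with SI (base-10) or IEC (base-2) prefixes."""
    if binary:
        base, units = 1024.0, ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        base, units = 1000.0, ["B", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return "%.1f %s" % (value, unit)
        value /= base
    # Anything larger than the last threshold uses the largest unit.
    return "%.1f %s" % (value, units[-1])
```

For example, format_size(1500) gives "1.5 kB", while format_size(2 * 1024 ** 3, binary=True) gives "2.0 GiB".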

Displaying help

Bazaar has online help for various topics through bzr help COMMAND or equivalently bzr command -h. We also have help on command options, and on other help topics. (See help_topics.py.)

As for python docstrings, the first paragraph should be a single-sentence synopsis of the command. These are user-visible and should be prefixed with __doc__ = so help works under python -OO with docstrings stripped.

The help for options should be one or more proper sentences, starting with a capital letter and finishing with a full stop (period).

All help messages and documentation should have two spaces between sentences.

Handling Errors and Exceptions

Commands should return non-zero when they encounter circumstances that the user should really pay attention to - which includes trivial shell pipelines.

Recommended values are:

  0. OK.
  1. Conflicts in merge-like operations, or changes are present in diff-like operations.
  2. Unrepresentable diff changes (i.e. binary files that we cannot show a diff of).
  3. An error or exception has occurred.
  4. An internal error occurred (one that shows a traceback.)
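In code, these statuses might be given names so that commands do not scatter bare numbers. The constant and function names below are hypothetical and do not come from bzrlib. The values count up from 0 for success, which is what lets any non-zero status signal a condition that needs attention.

```python
EXIT_OK = 0
EXIT_CONFLICTS_OR_CHANGES = 1
EXIT_UNREPRESENTABLE_CHANGES = 2
EXIT_ERROR = 3
EXIT_INTERNAL_ERROR = 4


def exit_code_for_diff(has_changes, has_binary_changes):
    """Hypothetical helper choosing the status for a diff-like command."""
    if has_binary_changes:
        return EXIT_UNREPRESENTABLE_CHANGES
    if has_changes:
        return EXIT_CONFLICTS_OR_CHANGES
    return EXIT_OK
```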

Errors are handled through Python exceptions. Exceptions should be defined inside bzrlib.errors, so that we can see the whole tree at a glance.

We broadly classify errors as being either internal or not, depending on whether internal_error is set. If we think it’s our fault, we show a backtrace, an invitation to report the bug, and possibly other details. This is the default for errors that aren’t specifically recognized as being caused by a user error. Otherwise we show a briefer message, unless -Derror was given.

Many errors originate as “environmental errors” which are raised by Python or builtin libraries – for example IOError. These are treated as being our fault, unless they’re caught in a particular tight scope where we know that they indicate a user error. For example if the repository format is not found, the user probably gave the wrong path or URL. But if one of the files inside the repository is not found, then it’s our fault – either there’s a bug in bzr, or something complicated has gone wrong in the environment that means one internal file was deleted.

Many errors are defined in bzrlib/errors.py but it’s OK for new errors to be added near the place where they are used.

Exceptions are formatted for the user by conversion to a string (eventually calling their __str__ method.) As a convenience the ._fmt member can be used as a template which will be mapped to the error’s instance dict.
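The template mechanism can be sketched in a few lines. The base class SketchError below is a hypothetical simplification and not the real BzrError: keyword arguments are stored on the instance, and __str__ interpolates the instance dict into _fmt.

```python
class SketchError(Exception):
    """Hypothetical base class; not the real BzrError."""

    internal_error = False
    _fmt = "Unformatted error"

    def __init__(self, **kwargs):
        Exception.__init__(self)
        # Keyword arguments become attributes, so _fmt can refer to them.
        self.__dict__.update(kwargs)

    def __str__(self):
        return self._fmt % self.__dict__


class NoSuchFile(SketchError):

    _fmt = 'No such file "%(path)s"'
```

With this arrangement a new error usually needs only a class statement and a format string.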

New exception classes should be defined when callers might want to catch that exception specifically, or when it needs a substantially different format string.

  1. If it is something that a caller can recover from, a custom exception is reasonable.
  2. If it is a data consistency issue, using a builtin like ValueError/TypeError is reasonable.
  3. If it is a programmer error (using an api incorrectly) AssertionError is reasonable.
  4. Otherwise, use BzrError or InternalBzrError.

Exception strings should start with a capital letter and should not have a final full stop. If long, they may contain newlines to break the text.

Documenting Changes

When you change bzrlib, please update the relevant documentation for the change you made: Changes to commands should update their help, and possibly end user tutorials; changes to the core library should be reflected in API documentation.

NEWS File

If you make a user-visible change, please add a note to the NEWS file. The description should be written to make sense to someone who’s just a user of bzr, not a developer: new functions or classes shouldn’t be mentioned, but new commands, changes in behaviour or fixed nontrivial bugs should be listed. See the existing entries for an idea of what should be done.

Within each release, entries in the news file should have the most user-visible changes first. So the order should be approximately:

  • changes to existing behaviour - the highest priority because the user’s existing knowledge is incorrect
  • new features - should be brought to their attention
  • bug fixes - may be of interest if the bug was affecting them, and should include the bug number if any
  • major documentation changes, including fixed documentation bugs
  • changes to internal interfaces

People who made significant contributions to each change are listed in parentheses. This can include reporting bugs (particularly with good details or reproduction recipes), submitting patches, etc.

To help with merging, NEWS entries should be sorted lexicographically within each section.

Commands

The docstring of a command is used by bzr help to generate help output for the command. The ‘takes_options’ list attribute on a command is used by bzr help to document the options for the command - the command docstring does not need to document them. Finally, the ‘_see_also’ attribute on a command can be used to reference other related help topics.

API Documentation

Functions, methods, classes and modules should have docstrings describing how they are used.

The first line of the docstring should be a self-contained sentence.

For the special case of Command classes, this acts as the user-visible documentation shown by the help command.

The docstrings should be formatted as reStructuredText (like this document), suitable for processing using the epydoc tool into HTML documentation.

General Guidelines

Miscellaneous Topics

Debugging

Bazaar has a few facilities to help debug problems by going into pdb, the Python debugger.

If the BZR_PDB environment variable is set then bzr will go into pdb post-mortem mode when an unhandled exception occurs.

If you send a SIGQUIT or SIGBREAK signal to bzr then it will drop into the debugger immediately. SIGQUIT can be generated by pressing Ctrl-\ on Unix. SIGBREAK is generated with Ctrl-Pause on Windows (some laptops have this as Fn-Pause). You can continue execution by typing c. This can be disabled if necessary by setting the environment variable BZR_SIGQUIT_PDB=0.
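The post-mortem behaviour amounts to wrapping the program's entry point, roughly as sketched here. The wrapper run_with_post_mortem and its parameters are hypothetical and this is not how bzr itself is structured; only the BZR_PDB variable name comes from the text above.

```python
import os
import pdb
import sys


def run_with_post_mortem(main, environ=None, post_mortem=pdb.post_mortem):
    """Run main(); on an unhandled error, optionally enter the debugger."""
    if environ is None:
        environ = os.environ
    try:
        return main()
    except Exception:
        if environ.get("BZR_PDB"):
            # Inspect the frame where the exception was raised.
            post_mortem(sys.exc_info()[2])
        raise
```

Passing the environment and the debugger entry point as parameters keeps the wrapper easy to exercise in tests without starting an interactive session.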

Debug Flags

Bazaar accepts some global options starting with -D such as -Dhpss. These set a value in bzrlib.debug.debug_flags, and typically cause more information to be written to the trace file. Most mutter calls should be guarded by a check of those flags so that we don’t write out too much information if it’s not needed.

Debug flags may have effects other than just emitting trace messages.

Run bzr help global-options to see them all.

These flags may also be set as a comma-separated list in the debug_flags option in e.g. ~/.bazaar/bazaar.conf. (Note that it must be in this global file, not in the branch or location configuration, because it’s currently only loaded at startup time.) For instance you may want to always record hpss traces and to see full error tracebacks:

debug_flags = hpss, error
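How such flags gate trace output can be sketched as follows. The module-level set mirrors the idea of bzrlib.debug.debug_flags, but the two helper functions are hypothetical: one parses the comma-separated configuration value, the other writes a trace line only when its flag is set.

```python
debug_flags = set()


def set_debug_flags_from_config(value):
    """Parse a comma-separated option value such as "hpss, error"."""
    for name in value.split(","):
        name = name.strip()
        if name:
            debug_flags.add(name)


def mutter_if(flag, message, log):
    """Hypothetical helper: only write the trace line if the flag is set."""
    if flag in debug_flags:
        log.append(message)
```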

Jargon

revno
Integer identifier for a revision on the main line of a branch. Revision 0 is always the null revision; others are 1-based indexes into the branch’s revision history.

Unicode and Encoding Support

This section discusses various techniques that Bazaar uses to handle characters that are outside the ASCII set.

Command.outf

When a Command object is created, it is given a member variable accessible by self.outf. This is a file-like object, which is bound to sys.stdout, and should be used to write information to the screen, rather than directly writing to sys.stdout or calling print. This file has the ability to translate Unicode objects into the correct representation, based on the console encoding. Also, the class attribute encoding_type will affect how unprintable characters will be handled. This parameter can take one of 3 values:

replace
Unprintable characters will be represented with a suitable replacement marker (typically ‘?’), and no exception will be raised. This is for any command which generates text for the user to review, rather than for automated processing. For example: bzr log should not fail if one of the entries has text that cannot be displayed.
strict
Attempting to print an unprintable character will cause a UnicodeError. This is for commands that are intended more as scripting support, rather than plain user review. For example: bzr ls is designed to be used with shell scripting. One use would be bzr ls --null --unknowns | xargs -0 rm. If bzr printed a filename with a ‘?’, the wrong file could be deleted. (At the very least, the correct file would not be deleted). An error is used to indicate that the requested action could not be performed.
exact
Do not attempt to automatically convert Unicode strings. This is used for commands that must handle conversion themselves. For example: bzr diff needs to translate Unicode paths, but should not change the exact text of the contents of the files.
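The three modes map naturally onto the error handlers of Python's codecs module. The function make_outf below is a hypothetical sketch written for Python 3, not the bzrlib code: it wraps a byte stream in an encoding writer for replace and strict, and returns the raw stream untouched for exact.

```python
import codecs
import io


def make_outf(raw_stream, console_encoding, encoding_type):
    """Wrap a byte stream roughly the way the text describes self.outf."""
    if encoding_type == "exact":
        # The command handles conversion itself and writes bytes.
        return raw_stream
    if encoding_type not in ("replace", "strict"):
        raise ValueError("unknown encoding_type %r" % (encoding_type,))
    writer = codecs.getwriter(console_encoding)
    return writer(raw_stream, errors=encoding_type)


# On an ASCII console, "replace" substitutes a marker instead of failing.
buf = io.BytesIO()
make_outf(buf, "ascii", "replace").write("caf\u00e9\n")
# buf.getvalue() is now b"caf?\n"
```

With "strict" the same write raises a UnicodeError instead, so a script never sees a silently altered filename.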

bzrlib.urlutils.unescape_for_display

Because Transports work in URLs (as defined earlier), printing the raw URL to the user is usually less than optimal. Characters outside the standard set are printed as escapes, rather than the real character, and local paths would be printed as file:// urls. The function unescape_for_display attempts to unescape a URL, such that anything that cannot be printed in the current encoding stays an escaped URL, but valid characters are generated where possible.
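The core of that behaviour can be approximated per path segment, as in this simplified sketch. It is not the bzrlib function: the name unescape_segments is hypothetical, it assumes the escapes encode UTF-8, and it does not convert file:// URLs to local paths or protect escaped slashes.

```python
from urllib.parse import unquote


def unescape_segments(url, encoding):
    """Unescape each path segment that the target encoding can represent."""
    result = []
    for segment in url.split("/"):
        try:
            candidate = unquote(segment, encoding="utf-8", errors="strict")
            candidate.encode(encoding)
        except UnicodeError:
            # Not valid UTF-8, or not printable: keep the escaped form.
            candidate = segment
        result.append(candidate)
    return "/".join(result)
```

On a UTF-8 terminal "http://host/caf%C3%A9" is shown with the real character, while on an ASCII terminal the escaped form is kept.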

C Extension Modules

We write some extensions in C using pyrex. We design these to work in three scenarios:

  • User with no C compiler
  • User with C compiler
  • Developers

The recommended way to install bzr is to have a C compiler so that the extensions can be built, but if no C compiler is present, the pure python versions we supply will work, though more slowly.

For developers we recommend that pyrex be installed, so that the C extensions can be changed if needed.

For the C extensions, the extension module should always match the original python one in all respects (modulo speed). This should be maintained over time.

To create an extension, add rules to setup.py for building it with pyrex, and with distutils. Now start with an empty .pyx file. At the top add “include ‘yourmodule.py’”. This will import the contents of yourmodule.py into this file at build time - remember that only one module will be loaded at runtime. Now you can subclass classes, or replace functions, and only your changes need to be present in the .pyx file.

Note that pyrex does not support all Python 2.4 programming idioms, so some syntax changes may be required. For example:

  • ‘from foo import (bar, gam)’ needs to change to not use the brackets.
  • ‘import foo.bar as bar’ needs to be ‘import foo.bar; bar = foo.bar’

If the changes are too dramatic, consider maintaining the python code twice - once in the .pyx, and once in the .py, and no longer including the .py file.

Making Installers for OS Windows

To build a win32 installer, see the instructions on the wiki page: http://wiki.bazaar.canonical.com/BzrWin32Installer

Core Developer Tasks

Overview

What is a Core Developer?

While everyone in the Bazaar community is welcome and encouraged to propose and submit changes, a smaller team is responsible for pulling those changes together into a cohesive whole. In addition to the general developer stuff covered above, “core” developers have responsibility for:

  • reviewing changes
  • reviewing blueprints
  • planning releases
  • managing releases (see Releasing Bazaar)

Note

Removing barriers to community participation is a key reason for adopting distributed VCS technology. While DVCS removes many technical barriers, a small number of social barriers are often necessary instead. By documenting how the above things are done, we hope to encourage more people to participate in these activities, keeping the differences between core and non-core contributors to a minimum.

Communicating and Coordinating

While it has many advantages, one of the challenges of distributed development is keeping everyone else aware of what you’re working on. There are numerous ways to do this:

  1. Assign bugs to yourself in Launchpad
  2. Mention it on the mailing list
  3. Mention it on IRC

As well as the email notifications that occur when merge requests are sent and reviewed, you can keep others informed of where you’re spending your energy by emailing the bazaar-commits list implicitly. To do this, install and configure the Email plugin. One way to do this is to add these configuration settings to your central configuration file (e.g. ~/.bazaar/bazaar.conf):

[DEFAULT]
email = Joe Smith <joe.smith@internode.on.net>
smtp_server = mail.internode.on.net:25

Then add these lines for the relevant branches in locations.conf:

post_commit_to = bazaar-commits@lists.canonical.com
post_commit_mailer = smtplib

While attending a sprint, RobertCollins’ Dbus plugin is useful for the same reason. See the documentation within the plugin for information on how to set it up and configure it.

Planning Releases

Bug Triage

Keeping on top of bugs reported is an important part of ongoing release planning. Everyone in the community is welcome and encouraged to raise bugs, confirm bugs raised by others, and nominate a priority. Practically though, a good percentage of bug triage is often done by the core developers, partially because of their depth of product knowledge.

With respect to bug triage, core developers are encouraged to play an active role with particular attention to the following tasks:

  • keeping the number of unconfirmed bugs low
  • ensuring the priorities are generally right (everything as critical - or medium - is meaningless)
  • looking out for regressions and turning those around sooner rather than later.

Note

As well as prioritizing bugs and nominating them against a target milestone, Launchpad lets core developers offer to mentor others in fixing them.