Bazaar 3.0 User Story Analysis

Date: June 2010, Brisbane
Authors: Andrew Bennetts, Ian Clatworthy, Martin Pool

User stories

(nb: command names and output formats subject to change; the important thing is the shape of the concepts.)

Starting with an existing project

bzr clone lp:do

Gives you a workspace containing a working tree, all the branches and tags in the main Gnome-Do workspace, and all the history referenced by those branches and tags.

The working tree addresses the default branch of Do (typically trunk) and is up to date with the last revision of that branch.

What if you try to commit? Because this is a mirrored branch, you can't change it. If you try to commit, you're invited to make a new branch, or to change this from being mirrored to being bound.

How do you see what you've got?

% bzr show-tags-and-branches  # (or "bzr names"?)
do/trunk [branch,active,default]
do/1.0.x [branch]
do/1.0 [tag]
do/1.0beta1 [tag]
do/1.0rc1 [tag]

This means that trunk is a branch (you can commit to it and update it), the working tree is addressing the trunk branch, and this is also the default branch for a working tree when you create a new workspace.
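The listing above can be read as a set of labels, each with a kind and a couple of flags. A minimal sketch of that reading follows; the Label class and its field names are hypothetical and are not part of bzrlib.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One named pointer into history (hypothetical model, not a bzrlib class)."""
    namespace: str
    name: str
    kind: str              # "branch" (can be committed to) or "tag"
    active: bool = False   # the working tree currently addresses this label
    default: bool = False  # a new workspace's working tree starts here

    def describe(self) -> str:
        # Build the bracketed flag list in the order used by the listing.
        flags = [self.kind]
        if self.active:
            flags.append("active")
        if self.default:
            flags.append("default")
        return f"{self.namespace}/{self.name} [{','.join(flags)}]"


labels = [
    Label("do", "trunk", "branch", active=True, default=True),
    Label("do", "1.0.x", "branch"),
    Label("do", "1.0", "tag"),
]
for label in labels:
    print(label.describe())
```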

% bzr log

Shows the history for the active branch of the working tree.

How does this relate to other workspaces?

% bzr links
do        bound to     lp:do
mbp-do    pushes to    lp:~mbp/do
spiv-do   mirror of    lp:~spiv/do

This means that "do" is bound to the namespace on Launchpad and I can change it only in lockstep with the namespace on Launchpad.

"mbp-do" has the definitive copy on my own machine in this workspace. I can make any changes and they're pushed out later.

"spiv-do" means the definitive copy is on Launchpad and I can't make any changes myself, I can only pull from there. (This saves me committing a change on a local copy of a logical branch that I can't actually directly update.)

I could pop my mirror of spiv's branch into readwrite mode and then change it but then what would happen when I next try to sync?

Will it try to push to his branches? Or will it clobber my changes with his? Or will it (most likely?) just ignore his changes because it's expecting to push from these branches, not to pull into them?
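The three kinds of link differ mainly in where the definitive copy lives, and that decides whether a local commit is allowed. A rough sketch, assuming a hypothetical LinkMode enum (the names are ours; the open question about readwrite mirrors is deliberately not answered here):

```python
from enum import Enum


class LinkMode(Enum):
    """Hypothetical link kinds, named after the 'bzr links' output."""
    BOUND = "bound to"        # changes happen in lockstep with the remote
    PUSHES_TO = "pushes to"   # local copy is definitive, pushed out later
    MIRROR_OF = "mirror of"   # remote copy is definitive, local is read-only


def can_commit(mode: LinkMode) -> bool:
    # A mirror cannot be changed locally; the user is invited to branch
    # or to rebind instead.
    return mode is not LinkMode.MIRROR_OF


def commit_contacts_remote(mode: LinkMode) -> bool:
    # Only a bound namespace must update the remote as part of the commit.
    return mode is LinkMode.BOUND


links = {
    "do": LinkMode.BOUND,
    "mbp-do": LinkMode.PUSHES_TO,
    "spiv-do": LinkMode.MIRROR_OF,
}
for name, mode in links.items():
    print(name, mode.value, "commit ok" if can_commit(mode) else "read-only")
```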

Make your own line of work

% bzr new-line do-mbp/bug1234

This makes a new branch/floating tag/whatever, starting from the current basis revision of your working tree.

By default this switches your working tree to address this new branch, though you could have an option to do otherwise.

Other possible names: switch --create or even tag --branch.
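The behaviour described for new-line is small enough to sketch: record a new label at the tree's basis revision, then (by default) point the tree at it. The WorkingTree class and new_line function below are hypothetical stand-ins, not bzrlib APIs.

```python
class WorkingTree:
    """Hypothetical tree: knows the label it addresses and its basis revision."""

    def __init__(self, label, basis_revision):
        self.label = label
        self.basis_revision = basis_revision


def new_line(labels, tree, name, switch=True):
    """Create label `name` at the tree's basis revision.

    `labels` maps label name -> revision id. By default the tree is
    switched to the new label; pass switch=False to stay put.
    """
    if name in labels:
        raise ValueError(f"label already exists: {name}")
    labels[name] = tree.basis_revision
    if switch:
        tree.label = name
    return name


labels = {"do/trunk": "rev-42"}
tree = WorkingTree("do/trunk", "rev-42")
new_line(labels, tree, "do-mbp/bug1234")
print(tree.label, labels[tree.label])
```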

See the diff that would be committed to trunk

After I've done a feature branch, I want to see the overall diff that will be committed when I merge this to trunk:

bzr diff -rancestor:do/trunk

but if do/trunk is configured as the default submit branch (where?) then I should be able to just do

bzr diff -rsubmit:
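Either way, the base of the diff is a revision common to my branch and trunk. A simplified sketch of finding that base in a revision graph follows. It assumes a plain dict of revision id to parent ids and returns the first common ancestor found by a breadth-first walk, which is cruder than what a real implementation would need for criss-cross merges.

```python
from collections import deque


def ancestors(parents, tip):
    """Return every revision reachable from `tip`, including `tip` itself."""
    seen = set()
    todo = [tip]
    while todo:
        rev = todo.pop()
        if rev not in seen:
            seen.add(rev)
            todo.extend(parents.get(rev, ()))
    return seen


def diff_base(parents, mine, submit):
    """Walk back from `mine` until we reach something in submit's history."""
    theirs = ancestors(parents, submit)
    queue = deque([mine])
    visited = set()
    while queue:
        rev = queue.popleft()
        if rev in theirs:
            return rev
        if rev in visited:
            continue
        visited.add(rev)
        queue.extend(parents.get(rev, ()))
    return None  # unrelated histories


# Hypothetical history: trunk is r1-r2-r3, the feature branched at r2.
parents = {"r1": [], "r2": ["r1"], "r3": ["r2"], "f1": ["r2"], "f2": ["f1"]}
print(diff_base(parents, mine="f2", submit="r3"))
```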

Fetching the latest new stuff from others

bzr sync

by default pulls in updates to all links, and pushes out my own namespaces to all the mirrors of them (perhaps there can be more than one.) Perhaps for the sake of getting your work backed up it should push before it pulls.

You could give options to synchronize only particular links or namespaces.

This should normally be non-interactive and not semantic so that it can be run in the background or as a daemon.

Does this update the working tree? Probably yes, so that "get up to date" is easy.

But what if your working tree is specifically set to an old revision? Perhaps not then?

Perhaps you want to sync something larger than a single workspace, eg to update all of your Bazaar plugins. We could search the directory tree or we could have a list of all of the user's repositories that they care to sync. (That would need to cope with them sometimes being deleted.)
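One way to picture the default sync is as a plan built from the link list: pushes first (so work gets backed up), then pulls, optionally restricted to named links. The plan_sync function and the mode strings are assumptions for illustration; treating bound links as pulled is also a guess.

```python
def plan_sync(links, only=None):
    """Return an ordered list of (action, link name) steps.

    `links` maps link name -> mode ("bound to", "pushes to", "mirror of").
    `only`, if given, restricts the sync to those link names.
    """
    selected = {
        name: mode for name, mode in links.items()
        if only is None or name in only
    }
    # Push first so local work is backed up before anything is pulled.
    pushes = [("push", n) for n, m in selected.items() if m == "pushes to"]
    pulls = [("pull", n) for n, m in selected.items()
             if m in ("mirror of", "bound to")]
    return pushes + pulls


links = {"do": "bound to", "mbp-do": "pushes to", "spiv-do": "mirror of"}
for step in plan_sync(links):
    print(*step)
```

Because the plan is computed without asking the user anything, it fits the non-interactive, run-in-the-background goal.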

Moving your working tree

bzr update

Can move the working tree to address a different label, and possibly a different revision in the history of that label.

Does update ever implicitly fetch? If you're in a bound branch you could expect it to, and it does in bzr 2.2. So perhaps if you're in a mirrored or bound branch it should fetch, unless you tell it not to, or maybe unless you give it a -r option?
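The fetch question reduces to a small decision function. The sketch below picks one possible answer (fetch for bound or mirrored links, unless told not to or given -r); the function name, parameters and mode strings are all hypothetical, and the -r rule is still an open question.

```python
def update_should_fetch(link_mode, revision_given=False, no_fetch=False):
    """Decide whether 'bzr update' fetches before moving the tree.

    link_mode: "bound to", "mirror of" or "pushes to" (assumed names).
    revision_given: the user passed -r, so they want a specific revision.
    no_fetch: the user explicitly asked us not to touch the network.
    """
    if no_fetch or revision_given:
        return False
    # Only links whose definitive copy is elsewhere need a fetch.
    return link_mode in ("bound to", "mirror of")


print(update_should_fetch("bound to"))
print(update_should_fetch("bound to", revision_given=True))
print(update_should_fetch("pushes to"))
```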

Updating a central branch

One way to update a centralized branch is to have a tag set bound to the centralized tag set: any updates to those tags are first applied to the central branch. (Either both updates succeed or neither does.)

So after cloning a workspace that's still looking at the central trunk branch, you say

vim README
bzr commit

then the commit will go into the central trunk.
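A minimal sketch of the "central first" ordering: the central label is moved before the local one, and if the central update is refused the local label is left alone. LabelSet and bound_commit are hypothetical names, and the sketch only covers the case where the central side fails, not a local failure after the central update succeeded.

```python
class LabelSet:
    """A hypothetical set of label tips: label name -> revision id."""

    def __init__(self, tips):
        self.tips = dict(tips)

    def set_tip(self, name, expected_old, new):
        # Refuse the update if someone else moved the label first.
        if self.tips.get(name) != expected_old:
            raise RuntimeError(f"{name} has diverged")
        self.tips[name] = new


def bound_commit(local, central, name, new_rev):
    """Move a bound label: central first, then local."""
    old = local.tips.get(name)
    central.set_tip(name, old, new_rev)  # raises, leaving local untouched
    local.tips[name] = new_rev


central = LabelSet({"trunk": "rev-1"})
local = LabelSet({"trunk": "rev-1"})
bound_commit(local, central, "trunk", "rev-2")
print(central.tips["trunk"], local.tips["trunk"])
```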

Deleting a tag

I can delete a tag or a branch.

Because binding happens at the level of a namespace, not a single label, if I delete a label inside a bound namespace then the deletion propagates:

bzr delete-label do/1.0rc1

So how is this looked up? The working tree has a repository, which has a dictionary of links/namespaces, each of which knows about binding and has a dictionary of labels. It looks conceptually like:

wt.repository.lookup_namespace('do').delete_tag('1.0rc1')

Then if it's a bound namespace, it will automatically synchronously update that remotely.

In this case I've given a name that includes the namespace so it doesn't matter which branch I'm already looking at. Presumably if I said

bzr delete-label testing

it would look that up first within the namespace of the label of my current working tree (or whatever dwim logic you want.)
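The lookup described above can be sketched as two dictionaries and a small name resolver. Everything here is hypothetical (class names, the resolve method, the "propagate"/"local" return values); the dwim rule shown is the simplest one, where an unqualified name falls back to the namespace of the current working tree's label.

```python
class Namespace:
    """Hypothetical namespace: a dictionary of labels plus a binding flag."""

    def __init__(self, name, labels, bound=False):
        self.name = name
        self.labels = dict(labels)  # label name -> revision id
        self.bound = bound

    def delete_label(self, label):
        del self.labels[label]
        # A bound namespace would also update the remote synchronously.
        return "propagate" if self.bound else "local"


class Repository:
    """Hypothetical repository: a dictionary of namespaces."""

    def __init__(self, namespaces):
        self.namespaces = {ns.name: ns for ns in namespaces}

    def resolve(self, spec, current_namespace):
        # "do/1.0rc1" names its namespace; "testing" uses the current one.
        if "/" in spec:
            ns_name, label = spec.split("/", 1)
        else:
            ns_name, label = current_namespace, spec
        ns = self.namespaces[ns_name]
        if label not in ns.labels:
            raise KeyError(spec)
        return ns, label


repo = Repository([
    Namespace("do", {"1.0rc1": "rev-9", "testing": "rev-4"}, bound=True),
])
ns, label = repo.resolve("do/1.0rc1", current_namespace="do")
print(ns.delete_label(label))
```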

Bringing other branches into this workspace

Although the documentation is a different codebase, a different Launchpad project, and a different ancestry, you can still bring it into the same workspace and repository.

% bzr link lp:do-docs
% bzr switch do-docs/trunk

Using multiple working trees

"I want to keep several copies of important trees around, such as the supported branches of my project, so that I can quickly refer to them without switching my main working tree."

Back up all my work

Back up everything important, including: my mirrors of other people's work and my work in progress.

This could be done by running bzr sync or bzr push in a mode where it also syncs label sets that aren't synced by default.

People don't want to do this at the moment because it works branch-at-a-time, doesn't notice branch deletion, doesn't optimize repository transfers across branches, and needs to crawl directories to notice branches.

(Perhaps out of scope and better done by a separate backup tool? But they aren't transaction-aware and may get an incoherent view of the repository.)

Purely remote links

We could have a type of link where no history is stored locally and any reference to a name within it goes to the other repository, so the name would just be a shortcut for the URL.

This might be expensive or annoying if there are operations that implicitly search all namespaces such as tags or dwim lookup. But perhaps we could cope with that.

Publishing a web site in bzr

Want a checkout showing trunk on the remote machine.

Can keep using bzr-upload, bzr-push-and-update, etc.

Could just add support for accessing/transforming working trees over a transport so they can be used just as if they were local.

This is mostly orthogonal but it would be nice if you could say

% bzr checkout bzr+ssh://example.com/www/site
% bzr update -r website/stable bzr+ssh://example.com/www/site

to create the working tree and then to change it remotely to look at a different label. Or you could just push a change there then update it.

Does creating a new remote workspace create a tree by default? This is inconsistent between local and remote, and between branch, push and pull.

Does pushing to this site update the working tree? Presumably not? Web developers might like this: commit, have sync running automatically, and have it automatically update the tree on the remote site. Perhaps an on-push hook or configuration option to say that the tree should be updated.

Access control

OS permissions should normally be the same across the whole workspace.

On the local machine or over dumb transports it's pretty simple: the OS will let you either read the workspace, or write to it, or not. It doesn't make sense to attempt finer-grained control if the user can directly manipulate the files.

If the user's accessing the workspace over a smart protocol we have the chance to vet their operations at a semantic level. The kind of checks we want are:

  • "Read the tree referenced by this label." (They should only see data referenced by that branch?)

    It may not make sense to try to have read access control on a finer level than the whole repository, because the server would need to work out if they have a plausible reason to see something inside a repository.

    So instead: if you don't want people to see something, you need to put it into a separate repository that they can't see.

  • When a label is updated, check policy about the tree it points to. (eg that it has no trailing whitespace.)

  • "Commit to this branch". If you can't commit, perhaps you shouldn't be able to insert records either.

  • "Make new labels", or "change existing labels". (Or "can make specific labels", with general policy hooks.)

  • These people can change links.

  • Update the working tree (assuming that's supported remotely.) You could have policy about allowing it to be updated but only to the tip of a particular branch.

  • "Change configuration."

  • Break locks.

  • Repack.

Most of these seem like they should be hooks with default implementations that look in a configuration file. And the hooks will need some kind of context describing which remote user requested this operation.
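As a rough illustration of hooks with a configuration-backed default: each hook receives a context naming the remote user and the requested operation, and the operation goes ahead only if every hook agrees. The RequestContext fields, the operation names and the policy layout are assumptions, not an existing bzrlib interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    """Who asked for what (hypothetical shape)."""
    user: str
    operation: str               # e.g. "commit", "make_label", "break_lock"
    label: Optional[str] = None  # label the operation applies to, if any


def config_file_hook(policy):
    """Default hook: look the operation up in a parsed configuration.

    `policy` maps operation name -> set of users allowed to perform it.
    Operations missing from the policy are denied.
    """
    def hook(ctx):
        return ctx.user in policy.get(ctx.operation, set())
    return hook


def is_allowed(hooks, ctx):
    # Every registered hook must approve the request.
    return all(hook(ctx) for hook in hooks)


policy = {"commit": {"mbp", "spiv"}, "break_lock": {"mbp"}}
hooks = [config_file_hook(policy)]
print(is_allowed(hooks, RequestContext("spiv", "commit", "do/trunk")))
print(is_allowed(hooks, RequestContext("spiv", "break_lock")))
```

A site could append further hooks, for example one that checks the tree a label points to, without changing the default.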

Garbage collection

After accidentally bringing too much data, we'd like to remove the unwanted data from the repository.

Because we have a more self-contained list of the interesting labels in this repository it's safer to support gc as a built-in operation. Alternatively we can just tell people to branch the whole thing and that should filter out unneeded data.
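With the labels as the list of roots, gc is a reachability walk: anything not reachable from some label is a candidate for removal. A minimal sketch, assuming the revision graph is a plain dict of revision id to parent ids (a real repository would also have to consider working trees and in-progress operations):

```python
def unreferenced_revisions(parents, label_tips):
    """Return revisions not reachable from any label.

    parents: revision id -> list of parent revision ids.
    label_tips: label name -> revision id it points at.
    """
    live = set()
    todo = list(label_tips.values())
    while todo:
        rev = todo.pop()
        if rev not in live:
            live.add(rev)
            todo.extend(parents.get(rev, ()))
    return set(parents) - live


# Hypothetical repository: x1-x2 was fetched by accident and has no label.
parents = {"r1": [], "r2": ["r1"], "x1": [], "x2": ["x1"]}
label_tips = {"do/trunk": "r2", "do/1.0": "r1"}
print(sorted(unreferenced_revisions(parents, label_tips)))
```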

Questions

Deletions from bzr 2.2

UDD impact

Collaboration on stacks of branches

The patches are represented as stacks of branches, and you want to merge the Debian patches with the Ubuntu patches.

Impact on looms

Would probably represent loom threads as labels.

Gives an easier on-ramp: a bunch of loom-specific commands go away, and people can work with them without needing to know about looms. Doesn't need to override bzr status anymore. Looms might still have a command to show you only the labels that are relevant to the current loom.

Is a loom a label namespace? Or can they span namespaces? The latter could be asking for a lot of trouble. How do you move looms across namespaces?

Does branching a loom into a different namespace also branch all of the referenced threads into that namespace? What if they clash with some that already exist?

Could have record-loom take a snapshot of the current label set and record it into a special branch.

Perhaps we'd like fetching a loom to bring across stuff referenced by all revisions referenced by all labels referenced by all previous versions of the loom. That seems to require letting the loom hook fairly deeply into the fetch logic and that may be hard if there are ordering constraints in fetch. (But this is mostly orthogonal to these model changes.)

Impact on plugins

Pipelines

Not much impact? Does want a space for per-branch configuration to point to the previous and next? May work better with more builtin colocated branch support?

Model objects

Working tree

Knows its repository, and knows which branch within that repository it addresses, and which revision (by revision id) it addresses.

Workspace

A filesystem directory that contains:

  • A .bzr directory
  • A repository
  • Optionally, working files (outside of .bzr), in which case it will also have a .bzr/checkout
  • Information about some label sets
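The two model objects above could be sketched as plain data classes. The field names are guesses at the shape of the concepts and do not correspond to existing bzrlib classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """History plus the label sets that reference it (hypothetical)."""
    revisions: dict = field(default_factory=dict)   # revid -> parent revids
    label_sets: dict = field(default_factory=dict)  # namespace -> {label: revid}


@dataclass
class WorkingTree:
    """Knows its repository, its branch and the revision it addresses."""
    repository: Repository
    branch: str
    revision_id: str


@dataclass
class Workspace:
    """A directory with a .bzr, a repository and maybe working files."""
    path: str
    repository: Repository
    working_tree: Optional[WorkingTree] = None

    @property
    def has_checkout(self) -> bool:
        # .bzr/checkout exists only when there are working files.
        return self.working_tree is not None


repo = Repository(label_sets={"do": {"trunk": "rev-42"}})
ws = Workspace("/home/mbp/do", repo)
ws.working_tree = WorkingTree(repo, "do/trunk", "rev-42")
print(ws.has_checkout, ws.working_tree.branch)
```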

Hooks